CGI/Perl Guide forums: Perl Programming Help: Beginner

about "split"

frank
stranger

Apr 22, 2001, 7:33 PM

Post #1 of 19
about "split"

I just started using Perl recently. Could you help me?
My question is:
@tmp1={"text1","text2","text3"};
@tmp2=split(/\,/,@tmp1);
print $tmp2[1];

The result is text2.
OK, now:
@tmp1={"text1","this is a sample,I use it to test split function" ,"text3"};
@tmp2=split(/\,/,@tmp1);
print $tmp2[1];

The result is "this is a sample

But that is not the value I want: the comma in the string ("this is a sample,I use it to test split function") is treated as a separator. Is there any way to get the whole string ("this is a sample,I use it to test split function") as the value of $tmp2[1]? It would be best to use split, but any other way you know is fine too. Thanks.




zanardi
journeyman

Apr 22, 2001, 8:05 PM

Post #2 of 19
Re: about "split"

Well, you've got it all wrong.

You don't need to split those arrays:


Code
@tmp1 = ("text1","text2","text3"); 

print $tmp1[1];

this will print:


Code
text2
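Part of what goes wrong in the original code is the curly braces: on the right-hand side of an assignment, {...} builds an anonymous hash and returns a reference to it, so the array ends up holding a single element. A minimal sketch contrasting the two (the variable names are just illustrative):

```perl
use strict;
use warnings;

# Braces build an anonymous hash and return a reference to it.
# With three values, Perl also warns about an odd number of elements.
my @with_braces = {"text1", "text2", "text3"};

# Parentheses build an ordinary list, so each value is its own element.
my @with_parens = ("text1", "text2", "text3");

print scalar(@with_braces), "\n";   # 1: just the hash reference
print ref($with_braces[0]), "\n";   # HASH
print $with_parens[1], "\n";        # text2
```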

This is how split works:


Code
$tmp1 = 'text1,text2,text3'; 

@tmp2 = split(/\,/,$tmp1);

print $tmp2[1];

this will print:


Code
text2

----------------------
Fueled By ZCom


Jasmine
Administrator / Moderator

Apr 22, 2001, 8:19 PM

Post #3 of 19
Re: about "split"

I think what zanardi was trying to say is that you don't need to split arrays because they're already "split" and indexed.


Code
my @array = ("zero", "one", "two", "three");

These are four separate elements of @array, and each one can be retrieved by its index.


Code
my @array = ("zero", "one", "two", "three"); 

print $array[3]; # prints three

The thing to remember about arrays is that the index number of the first element (in this case, "zero") is 0.

You can split strings using split, so if your string looked like:


Code
my $string = "zero,one,two,three";

You can use split to put each comma-separated piece into its own element of an array.


Code
my @array = split(/,/, $string );

Hope this helps!



(This post was edited by Jasmine on Apr 23, 2001, 5:54 PM)


frank
stranger

Apr 22, 2001, 9:13 PM

Post #4 of 19
Re: about "split"

Sorry for my mistakes, and thank you and zanardi.
Now I have fixed my question as you taught me.

my $string = 'text1,"This is a sample which use it for test",text3';
my @array = split(/./, $string );
print $array[1];

The result is "This is a sample which use it for test"

But this time I put a "," into the string:

my $string = 'text1,"This is a sample,I use it for test",text3';
my @array = split(/./, $string );
print $array[1];

What will the result be?

It is "This is a sample

How can I get the whole string? How can I avoid the effect of a "," inside the string, which is the same mark that split uses? This is what I want to know. Thanks.




zanardi
journeyman

Apr 23, 2001, 12:21 PM

Post #5 of 19
Re: about "split"

Thanks Jasmine =)

Well, I have a question: your split is splitting on periods (.), but you have no periods in your string.
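Strictly speaking, an unescaped . in a regex matches any character, not only a period, so split(/./, ...) treats every character as a separator. All the fields between them are empty, and split discards trailing empty fields, so the result is an empty list. To split on a literal period, the dot has to be escaped. A small sketch using the string from the previous post:

```perl
use strict;
use warnings;

my $string = 'text1,"This is a sample which use it for test",text3';

# /./ matches ANY character: every field is empty, and split discards
# trailing empty fields, so nothing is left.
my @any_char = split(/./, $string);

# /\./ matches a literal period; there is none here, so the whole
# string comes back as a single field.
my @literal_dot = split(/\./, $string);

print scalar(@any_char), "\n";      # 0
print scalar(@literal_dot), "\n";   # 1
```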

Anyway, if you want to split and get rid of the quotes ("), you could use a regex after you split, like this:


Code
my $string = 'text1,"This is a sample which use it for test",text3'; 
my ($string1,$string2,$string3) = split(/,/,$string);
$string2 =~ s/"//g;
print $string2;

That's how I would do it, unless you don't want to get rid of all the quotes, in which case I'd find another way to store my data in a string.



frank
stranger

Apr 23, 2001, 6:42 PM

Post #6 of 19
Re: about "split"

Hi zanardi, thanks for your help.
I understand you, but maybe you didn't read all of my post;
in my last post I wrote:
----------------------------------------------------------------
But this time I put a "," into the string:
my $string = 'text1,"This is a sample,I use it for test",text3';
my @array = split(/./, $string );
print $array[1];
What will the result be?
It is "This is a sample
How can I get the whole string (between the " ")? How can I avoid the effect of a "," inside the string, which is the same mark that split uses? This is what I want to know. Thanks.
-----------------------------------------------------
Sorry for my poor English. Can you answer me? Thanks.



Jasmine
Administrator / Moderator

Apr 23, 2001, 6:54 PM

Post #7 of 19
Re: about "split"

Oopsie, bad fingers! I just fixed it -- thanks for pointing that out zanardi :)



widexl
Novice

Apr 24, 2001, 6:38 AM

Post #8 of 19
Re: about "split"

Hello Frank

I think it is better to use a TAB to split your data.
A tab is written as \t.

$item = qq{text1\t"This is a sample,I use it for test"\ttext3};  # qq{} so \t becomes a real tab
@data = split(/\t/,$item);

You normally can't type a tab into a web form, so you are safe most of the time.
A workaround causes too many problems.

Henk
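One catch with the tab approach: \t only becomes a real tab character inside double quotes (or qq{}). Inside single quotes it stays a literal backslash followed by t, so split(/\t/) has nothing to split on. A quick comparison:

```perl
use strict;
use warnings;

# Single quotes: '\t' is two characters, a backslash and a "t".
my $single = 'text1\t"This is a sample,I use it for test"\ttext3';

# qq{} behaves like double quotes, so \t becomes a real tab and the
# embedded double quotes need no escaping.
my $double = qq{text1\t"This is a sample,I use it for test"\ttext3};

my @from_single = split(/\t/, $single);
my @from_double = split(/\t/, $double);

print scalar(@from_single), "\n";   # 1: no tab was found
print scalar(@from_double), "\n";   # 3
print $from_double[1], "\n";        # "This is a sample,I use it for test"
```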




zanardi
journeyman

Apr 24, 2001, 3:40 PM

Post #9 of 19
Re: about "split"

Yeah, widexl is right. You should use another way of delimiting your data, like a tab as widexl suggested, or the pipe (|), which is the most commonly used.
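If you go with the pipe, remember that | is the alternation metacharacter in a regex, so it has to be escaped in the split pattern. Unescaped, /|/ matches the empty string and splits between every character. A sketch using a made-up pipe-delimited record:

```perl
use strict;
use warnings;

my $line = 'ID|name|address:Washington,US|Tel:123423';

# Escaped: split on a literal pipe. The comma inside a field is harmless.
my @fields = split(/\|/, $line);

# Unescaped: /|/ means "empty or empty", which matches the empty string,
# so split breaks the line into single characters.
my @chars = split(/|/, $line);

print scalar(@fields), "\n";   # 4
print $fields[2], "\n";        # address:Washington,US
```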



frank
stranger

Apr 24, 2001, 6:49 PM

Post #10 of 19
Re: about "split"

Hi folks,
thanks, all.
That's a good idea; it would solve the problem very well. Unfortunately, I can't choose another way to delimit my data. OK, let me explain why I asked the question.

I have some information saved in a database. I can't work with the DB directly; I can only get data out of it. So the data I have looks like this, saved in a text file:
"ID","name","address:Washington,US","Tel:123423"
I have been asked to select some items by ID from the data and report them to someone by mail, so when I used
split(/\,/,$string), the trouble occurred.
Yes, I could split on a mark like "," or ," instead, but I can't be sure that the mark I use in split will not also appear inside the data, and that is a big problem.
All right, if there is no easy way around this, the question is closed.
Thanks zanardi, widexl, and Jasmine. I'm glad to have discussed this with you!
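For data shaped like this, Perl's core Text::ParseWords module can split on commas while respecting double-quoted fields. A minimal sketch using its parse_line function on the sample line above; the second argument (0 here) asks it to strip the quotes from the returned fields:

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

my $line = '"ID","name","address:Washington,US","Tel:123423"';

# First argument: the delimiter (used as a regex). Second: 0 means drop
# the quotes and backslash escapes from the returned fields.
my @fields = parse_line(',', 0, $line);

print scalar(@fields), "\n";   # 4
print $fields[2], "\n";        # address:Washington,US
```

parse_line returns an empty list if it cannot make sense of the quoting (for example, an unclosed quote), so an empty result is worth checking for before trusting the fields.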




Jean
User


Apr 24, 2001, 11:17 PM

Post #11 of 19
Re: about "split"

Another idea:
Split your string into an array of characters and then scan it, saving every character, until a matching pair of double quotes has been seen. The saved string is the first word. Ignore the commas and (optionally) the whitespace before each opening double quote. Continue until the whole string has been processed. Although it is not the fastest way, it will certainly solve your problem.
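A minimal sketch of that character-by-character idea, assuming fields are separated by commas and quoted only with double quotes (no escaped quotes inside a field). The function name is just for illustration:

```perl
use strict;
use warnings;

# Walk the string one character at a time, tracking whether we are
# inside a pair of double quotes. Commas only end a field outside quotes.
sub split_by_scanning {
    my ($string) = @_;
    my @fields;
    my $current   = '';
    my $in_quotes = 0;

    for my $char (split //, $string) {
        if ($char eq '"') {
            $in_quotes = !$in_quotes;   # toggle; the quote itself is not kept
        }
        elsif ($char eq ',' && !$in_quotes) {
            push @fields, $current;     # unquoted comma: the field is finished
            $current = '';
        }
        else {
            $current .= $char;          # ordinary character, or a quoted comma
        }
    }
    push @fields, $current;             # the last field has no comma after it
    return @fields;
}

my @data = split_by_scanning('text1,"This is a sample,I use it for test",text3');
print "$_\n" for @data;
```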

Cheers,


Jean Spector
QA Engineer @ Extent Technologies, Ltd.
mage@lycosmail.com


BigRich
Novice

Apr 24, 2001, 11:56 PM

Post #12 of 19
Re: about "split"

This is a common problem.

my $string = q~"text","City, State","More text",444,123~;

Convert the commas within the quoted text.

$string =~ s/(("|')[^"']+?),([^"']+?\2)/$1~=~$3/g;

$string is now:

"text","City~=~ State","More text",444,123

Now, split on your commas, remove the quotes from the beginning and end of any quoted data and replace the converted commas.

my @data; #split string will go here

foreach (split(/,/, $string)) {
s/^["']|["']$//g; #remove quotes, if any
s/~=~/,/g; #replace commas, if any
push @data, $_; # data now contains split string
}

@data now consists of:

text ($data[0])
City, State ($data[1])
More text ($data[2])
444 ($data[3])
123 ($data[4])


Good luck,

BigRich



japhy
Enthusiast

Apr 25, 2001, 3:37 PM

Post #13 of 19
Re: about "split"

This is a common problem, with a solution using regexes. Please see the code at http://www.pobox.com/~japhy/regexes/comma

Jeff "japhy" Pinyan -- accomplished hacker, teacher, lecturer, and author


BigRich
Novice

Apr 26, 2001, 2:09 AM

Post #14 of 19
Re: about "split"

jeff>This is a common problem, with a solution using regexes. Please see the code at
jeff>http://www.pobox.com/~japhy/regexes/comma

jeff>Jeff "japhy" Pinyan -- accomplished hacker, teacher, lecturer, and author


Try that regex with the following string.

$str = q(foo,bar,"this is a backslash->\"\\\"",blat,"not THIS, comma!",oops,'or, this \'one\'');

I got:
@values =
foo
bar
"this is a backslash->\"\\"
",blat,"
not THIS
comma!"
oops
'or, this \'one\''


The value this is a backslash->"\" is properly quoted, and its internal quotes and backslash are escaped, but it throws your regex out of whack.

I fixed my simple solution so that it also removes the escapes from escaped characters. Run the above through it and it'll give you the data in a ready-to-use format.


Code
#find the commas in quoted strings and convert them to 
# something unique
$str =~ s/(("|')[^,]+?),([^,]+?\2)/$1~=~$3/g;

my @data; #split string will go here

foreach (split(/,/, $str)) {
s/^["']|["']$//g; #remove quotes, if any
s/~=~/,/g; #replace commas, if any
s{\\(.)}{$1}g; #remove escapes
push @data, $_; # data now contains split string
}

results:
@data =
foo
bar
this is a backslash->"\"
blat
not THIS, comma!
oops
or, this 'one'

Every application I've written removes the quotes and escapes before using data from a CSV, so why not handle the problem of internal commas in a CSV while doing so?

It may not be as *cool* as using a big, extended regex but it works.

BigRich



japhy
Enthusiast

Apr 26, 2001, 6:52 AM

Post #15 of 19
Re: about "split"

That's not my regex at fault, BigRich; that's your understanding of the single-quoted string.

Using this string, my regex appears to break:


Code
$str = q{a,bc,"def \"\\\" ghij",klmno}; 

# yields the output:
a
bc
"def \"\\"
ghij"
klmno

Obviously, there's an error somewhere. What is it? The fact that the string you entered does not evaluate to what you think it does.


Code
$str = q{a,bc,"def \"\\\" ghij",klmno}; 
print $str;

This does not print, as one might hope, a,bc,"def \"\\\" ghij",klmno. Rather, it prints a,bc,"def \"\\" ghij",klmno. Do you see the missing backslash? That is what causes my regex to seem to fail.
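The difference is easy to check by counting backslashes. In q{} (as in single quotes) a doubled backslash collapses into one, while a <<'END' here-document takes its text completely literally, which is closer to what a program sees when the same characters arrive on STDIN or from a file:

```perl
use strict;
use warnings;

# The source text contains four backslashes: \" \\ \"
my $from_q = q{a,bc,"def \"\\\" ghij",klmno};

# In a single-quoted here-document, backslashes have no special meaning.
my $from_heredoc = <<'END';
a,bc,"def \"\\\" ghij",klmno
END
chomp $from_heredoc;

# tr with an empty replacement list just counts matching characters.
my $in_q       = ($from_q       =~ tr/\\//);
my $in_heredoc = ($from_heredoc =~ tr/\\//);

print "$from_q\n";       # a,bc,"def \"\\" ghij",klmno
print "$in_q\n";         # 3
print "$in_heredoc\n";   # 4
```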

If, on the other hand, a USER entered the string on STDIN, Perl would not do the \\ -> \ translation, and my regex works fine. It was the data, not the process, that ran afoul.



BigRich
Novice

Apr 26, 2001, 12:15 PM

Post #16 of 19
Re: about "split"

jeff>That's not my regex at fault, BigRich; that's your understanding of the single-quoted string.

Nothing wrong with my understanding of strings, single-quoted or not.

As I read it, you stated that yours was the "intelligent" solution (so any other is un-intelligent?). I saw nothing that stated that the data had to be read in from a file or form submission or STDIN or flew in from LA or shaken gently before serving.

jeff>Obviously, there's an error somewhere. What is it?
jeff>The fact that the string you entered does not evaluate to what you think it does.

I knew exactly what it would evaluate to, but that shouldn't matter to such an "intelligent" solution.

jeff>Rather, it prints a,bc,"def \"\\" ghij",klmno.
jeff>Do you see the missing backslash?

Yep, but the two inner quotes are still escaped. Surely an intelligent solution would figure out that the backslash between the escaped quotes was just a backslash and wasn't supposed to escape the backslash that escaped the second inner quote.

jeff>If, on the other hand, a USER entered the string on STDIN,
jeff>Perl would not do the \\ -> \ translation, and my regex works fine.
jeff>It was the data, not the process, that ran afoul.

Ahhh, now I see, it's "intelligent" as long as the data plays fair, ok.

That's fine, my solutions are both broken and I've never used them(my apologies to the op, but I never stated that they were the "intelligent" solution either).

I've always used a regex based on the one in the FAQ (where 99% of the questions in these forums are already answered), but it's been a long time since I've had to work with a database in which I have no control over the delimiter. After spending all night pounding out code, I didn't feel like digging through my junk drawer to find it, so I used a method I saw in another forum recently. (Funny, the self-proclaimed Perl gods in that group didn't appreciate it either.)
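For reference, the FAQ answer meant here is presumably the perlfaq4 entry on splitting a delimited string except when inside quotes. A sketch adapted from that idiom and wrapped in a sub with an illustrative name; it assumes double-quoted fields that may contain backslash escapes, and it leaves those escapes in place:

```perl
use strict;
use warnings;

sub faq_style_split {
    my ($text) = @_;
    my @fields;
    # $+ is the highest capture group used in the last match, so each
    # alternative pushes its own field; a bare comma pushes undef.
    push(@fields, $+) while $text =~ m{
        "([^\"\\]*(?:\\.[^\"\\]*)*)",?    # a double-quoted field
      | ([^,]+),?                         # an unquoted field
      | ,                                 # an empty field
    }gx;
    # A trailing comma means one more empty field at the end.
    push(@fields, undef) if substr($text, -1, 1) eq ',';
    return @fields;
}

my @data = faq_style_split('"ID","name","address:Washington,US","Tel:123423"');
print "$_\n" for @data;
```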

There's nothing wrong with the method I suggested (sure, it's ugly but so are most of the programmers I know and I don't hold that against them), it's my examples that could use some fine tuning and I think that I've shown that the data doesn't have to play fair for it to work.

Don't mind me, Jeff. I'm the guy who stomps in the mud puddle to see how high the water will splash, sorry if I got your dockers dirty.


BigRich



widexl
Novice

Apr 26, 2001, 1:07 PM

Post #17 of 19
Re: about "split"

Hey BigRich

I can say only one thing:
I like the way you talk.


In Reply To

Don't mind me, Jeff. I'm the guy who stomps in the mud puddle to see how high the water will splash, sorry if I got your dockers dirty.



Henk



japhy
Enthusiast

Apr 26, 2001, 2:00 PM

Post #18 of 19
Re: about "split"

(Confidential to BigRich: I'm wearing cutoff jean shorts; a little mud could do them good.)

First, I never said mine was the "intelligent" solution. My comment in the code states that the regex intelligently splits -- and then I clarify: it understands that backslashed characters can exist, and it pays attention to commas within quoted strings.

Second, any data parsing algorithm depends on getting data that it's expected to parse. If that data is screwy, the algorithm can't be depended upon to return the proper results. Algorithms aren't magical. They're input->output schemes.

(That being said, I have modified my regex to now return matches up to the last successful datum on such improperly formed input. The problem was that I was saying matching a comma was optional, when it should only be optional after the last datum.)

(I have also fixed the definition of a non-quoted string; now, the pattern matches single-quoted strings, double-quoted strings, or strings consisting of non-quote and non-comma characters. This seems more proper, now that I consider it.)
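Going only by that description (the actual regex lives at the linked page and is not reproduced here), a pattern along those lines might look like the following hypothetical sketch. Each datum is a double-quoted string, a single-quoted string, or a run of characters that are neither quotes nor commas, and it must be followed by a comma or, for the last datum only, by the end of the string. Matching stops at the first datum that doesn't fit, so malformed input yields the fields up to that point:

```perl
use strict;
use warnings;

sub split_quoted_fields {
    my ($str) = @_;
    my @fields;
    while ($str =~ m{\G
        (   "(?:[^"\\]|\\.)*"     # double-quoted, backslash escapes allowed
          | '(?:[^'\\]|\\.)*'     # single-quoted, backslash escapes allowed
          | [^"',]*               # bare: no quotes and no commas, may be empty
        )
        (,|\z)                    # a comma, or the end of the string
    }gcx) {
        push @fields, $1;         # fields keep their surrounding quotes
        last if $2 eq '';         # reached the end of the string: last datum
    }
    return @fields;
}

my @good = split_quoted_fields(q{foo,"a, b",'c, d',bar});
my @bad  = split_quoted_fields(q{foo,bar,"unterminated, oops});
print scalar(@good), " ", scalar(@bad), "\n";   # 4 2
```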

As for comprehending backslashes, it requires that backslashes mean what they represent. It seems to me that all data being parsed by this process WOULD be safe (unless you're forming a comma-separated list of values and need to parse it later in the code itself).

Please note the difference between this format and proper CSV format. In proper CSV-formatted strings, double quotes inside of double quotes are input as "" instead of \".
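To make the difference concrete, here is a minimal sketch of parsing that proper-CSV style, where a literal quote inside a quoted field is written as two double quotes. It assumes well-formed input and one line at a time (no line breaks inside fields); the function name is illustrative:

```perl
use strict;
use warnings;

sub parse_csv_line {
    my ($line) = @_;
    my @fields;
    while (1) {
        if ($line =~ /\G"((?:[^"]|"")*)"/gc) {
            # Quoted field: turn each doubled quote back into a single quote.
            (my $field = $1) =~ s/""/"/g;
            push @fields, $field;
        }
        else {
            # Unquoted field: everything up to the next comma (may be empty).
            $line =~ /\G([^,]*)/gc;
            push @fields, $1;
        }
        # Stop unless a comma follows; otherwise go round for the next field.
        last unless $line =~ /\G,/gc;
    }
    return @fields;
}

my @data = parse_csv_line(q{"These are ""quotes"" inside",plain,"a, b"});
print "$_\n" for @data;
```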

I'm gonna get back to work now, in my shorts, and perhaps play some foosball, much to the chagrin of Abercrombie & Fitch clothing-besmirchers.



BigRich
Novice

Apr 29, 2001, 11:45 AM

Post #19 of 19
Re: about "split"

Relax Jeff, I'm just pullin' your chain :-)

Looks like I did make you re-examine your code though, and apparently you've improved it as a result.

As far as "proper" format goes, there is no guarantee that anything will be proper in the www environment where I work. Sure, if the data's coming from, say, MS Access, then I can probably rest assured that it would be ""These are ""quotes"" inside of quotes"". But the data could be generated by some custom, shareware or freeware script that outputs it as \"escaped\" or '"single outside of double"' or whatever, and if there is a one in a million chance that something will break, it will take my clients about 5 minutes to break it.

(You may also want to send a note to O'Reilly about the errata on page 32 of the "Perl Cookbook", where they (T. Christiansen & N. Torkington) use "a \"glug\" bit," in a string to demonstrate how Text::ParseWords can parse CSV files with commas in the fields; let them know that's not a "proper CSV format".)


I'd better get back to work too. I've got to put the finishing touches on this custom database/site management project so I can get paid ($$cha-ching$$), then I may go fishing and have a few cold ones. Ahhh, the life of a freelancer, tough job but somebody's gotta do it.


BigRich


 
 

