Split in several lines

 



hybi
Novice

Jul 23, 2013, 3:31 AM

Post #1 of 14 (1216 views)
Split in several lines

Hey everyone,

I have a .txt file with entries containing article descriptions. Each entry is separated by "@@", so it looks roughly like this:

Title [ whatever ]
Year, Page [ 1996, 1057]
Description [ something something ] @@
Title [ whatever ]
Year, Page [ 1999, 217]
Description [ something something ] @@
....
...


Now I have written a script which separates the entries and saves each entry in a separate file. This is the code:


Code
#!/usr/bin/perl 
use warnings;
use strict;

my $i = 0;
my $split = '@@';
open IN, 'wuw_all.txt' or die "Can't open wuw_all.txt: $!\n";
open OUT, '> Files/wuw0.txt' or die "Can't write to wuw0.txt: $!\n";
while (<IN>)
{
if (/^(.*?)$split(.*)$/) {
print OUT $1 if $1;
close OUT;
$i++;
open OUT, '> Files/wuw' . $i . '.txt' or die "Can't write to wuw${i}.txt: $!\n";
print OUT $2 if $2;
}
else {
print OUT $_;
}
}
close IN;


So far, so good. Now the problem is in these lines:

Year, Page [ 1996, 1057]

Now I would like to extend the script so that this line is split and looks like this in the end:

Year [ 1996 ]
Page [ 1057 ]


Anyone got an idea how to handle that?
Thanks in advance!


BillKSmith
Veteran

Jul 23, 2013, 5:57 AM

Post #2 of 14 (1191 views)
Re: [hybi] Split in several lines

Place this code before the "else".


Code
elsif (m/^Year, Page\s*\[\s*(\d{4}),\s+(\d+)\]\s$/) {
print OUT "Year [$1]\nPage [$2]\n";
}


Are you sure your posted code does what you want? It appears that the first entry goes in one file and the last entry in the other, while all the others are discarded. (When you open a filehandle that is already open, open closes it for you. Refer to the Perl documentation for close: perldoc -f close.)
Good Luck,
Bill
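
The remark above about open closing an already-open filehandle can be checked with a small sketch. It is not code from the thread: the helper names reopen_demo and slurp and the scratch file names are made up, and the files live in a temporary directory created with the core File::Temp module.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: reads a whole file into one string.
sub slurp {
    my ($path) = @_;
    open my $in, '<', $path or die "Cannot read $path: $!";
    local $/;                       # undef separator: read everything at once
    my $text = <$in>;
    close $in;
    return $text;
}

# Hypothetical helper: writes to one file, then reuses the same
# filehandle for a second file WITHOUT an explicit close in between.
sub reopen_demo {
    my $dir    = tempdir( CLEANUP => 1 );
    my $first  = "$dir/first.txt";
    my $second = "$dir/second.txt";

    open my $out, '>', $first or die "Cannot write $first: $!";
    print {$out} "entry one\n";

    # Re-opening $out implicitly closes (and flushes) the first file.
    open $out, '>', $second or die "Cannot write $second: $!";
    print {$out} "entry two\n";
    close $out or die "Cannot close $second: $!";

    return ( slurp($first), slurp($second) );
}

my ( $one, $two ) = reopen_demo();
print "first:  $one", "second: $two";
```

Both files end up with their own content, which is why a loop that keeps re-opening the same handle under a new file name does not lose data, as long as the file name really changes each time.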


hybi
Novice

Jul 23, 2013, 6:24 AM

Post #3 of 14 (1187 views)
Re: [BillKSmith] Split in several lines

Hey Bill,

thanks for the reply and your code!

I am kind of new to Perl so I actually have no idea whether this code is written correctly or whether I have too much stuff in it. But to answer your question: yes, it actually does what I want!
I get over 600 files, with each entry separated as I wanted.

I tried to extend my script with your code, but unfortunately nothing happened :(

I hope I put it in the right place:


Code
..... 
..
print OUT $2 if $2;
}
elsif (m/^Year, Page\s*\[\s*(\d{4}),\s+(\d+)\]\s$/) {
print OUT "Year [$1]\nPage [$2]\n";
}
else {
print OUT $_;
}
}
close IN;



BillKSmith
Veteran

Jul 23, 2013, 8:33 AM

Post #4 of 14 (1177 views)
Re: [hybi] Split in several lines

The problem is probably whitespace at the end of the line. I should have handled it the same way you did in your regex.


Code
elsif (m/^Year, Page\s*\[\s*(\d{4}),\s+(\d+)\]\s*$/) {


Note the asterisk after the last \s.

Forget my comment. I did not understand what you were trying to do and I overlooked the concatenation in the filename. Your approach is unusual, but as long as it works ...

I plan to post more conventional code later.
Good Luck,
Bill
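
To see why the single asterisk matters, here is a small sketch comparing the two patterns from the posts above. The three sample lines are assumptions; the thread does not show what the line endings in the real file looked like.

```perl
use strict;
use warnings;

# The two patterns from the posts above; only the trailing \s differs.
my $strict  = qr/^Year, Page\s*\[\s*(\d{4}),\s+(\d+)\]\s$/;
my $lenient = qr/^Year, Page\s*\[\s*(\d{4}),\s+(\d+)\]\s*$/;

# Hypothetical sample lines with different endings.
my @samples = (
    [ 'plain newline',    "Year, Page [ 1996, 1057]\n" ],
    [ 'space + CRLF',     "Year, Page [ 1996, 1057] \r\n" ],
    [ 'no newline (EOF)', "Year, Page [ 1996, 1057]" ],
);

for my $sample (@samples) {
    my ( $name, $line ) = @$sample;
    printf "%-18s strict=%-8s lenient=%s\n", $name,
        ( $line =~ $strict  ? 'match' : 'no match' ),
        ( $line =~ $lenient ? 'match' : 'no match' );
}
```

The pattern ending in \s$ needs exactly one whitespace character after the bracket, so it only accepts the first sample. The \s*$ version accepts all three.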


hybi
Novice

Jul 24, 2013, 2:33 PM

Post #5 of 14 (1152 views)
Re: [BillKSmith] Split in several lines

Thanks, will be able to try the adjustments tomorrow.


In Reply To
I plan to post more conventional code later.


Much appreciated!


BillKSmith
Veteran

Jul 24, 2013, 4:24 PM

Post #6 of 14 (1145 views)
Re: [hybi] Split in several lines

The following version should be easy to understand. The only 'trick' is the use of the special variable "$/" (see perldoc perlvar). The <> operator reads up to and including the INPUT_RECORD_SEPARATOR (a newline by default).


Code
use strict;
use warnings;

open my $SOURCE, '<', 'wuw_all.txt'
    or die "Cannot open input: $!";
my $i = 0;
local $/ = "@@\n";    # each <$SOURCE> now returns one whole entry
while (my $entry = <$SOURCE>) {
    # Split the combined line into separate Year and Page lines.
    $entry =~ s/Year,\sPage\s\[\s(\d{4}),\s(\d+)\]/Year [ $1 ]\nPage [ $2 ]/xms;
    $i++;
    my $output_file_name = "wuw$i.txt";
    open my $OUT, '>', $output_file_name
        or die "Cannot open $output_file_name: $!";
    print {$OUT} $entry;
    close $OUT;
}
close $SOURCE;
print "Successfully wrote $i files\n";

Good Luck,
Bill
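
For readers who have not met "$/" before, the following sketch shows its effect in isolation. It reads from an in-memory string instead of wuw_all.txt; the helper name read_records and the sample data are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the records of $text, using $separator
# as the input record separator.
sub read_records {
    my ( $text, $separator ) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    local $/ = $separator;          # restored when the sub returns
    my @records = <$fh>;
    close $fh;
    return @records;
}

my $data = "Title [ a ]\nDescription [ x ] \@\@\n"
         . "Title [ b ]\nDescription [ y ] \@\@\n";

my @by_line  = read_records( $data, "\n" );        # default behaviour
my @by_entry = read_records( $data, "\@\@\n" );    # one record per entry

printf "%d lines, %d entries\n", scalar @by_line, scalar @by_entry;
```

With the default separator the data is read as 4 lines; with "@@" plus newline it is read as 2 entries, each still ending in the separator.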


hybi
Novice

Jul 25, 2013, 2:48 AM

Post #7 of 14 (1135 views)
Re: [BillKSmith] Split in several lines

Thanks for your help, Bill!

I have tried your code but it didn't work, and I don't know why :(
But thanks to your adjustment previously, it worked out fine!
Your script seems to be more professional but I would like to stick with my script just because of the fact that it works even if it looks weird :D
I hope it's ok with you.

Now I have another problem. I want to split the last line of each entry, which looks like this, for example:

Description [ EV 2510 ] @@

Note the @@, which is the separator between the entries!

So now the script looks like this:


Code
use warnings; 
use strict;

my $i = 0;
my $split = '@@';

open IN, 'wuw_all.txt' or die "ERROR!\n";
open OUT, '> wuw0.txt' or die "ERROR!\n";

while (<IN>)
{
if (m/^Year, Page\s*\[\s*(\d{4}),\s+(\d+)\s\]\s*$/){
print OUT "Year [ $1 ]\nPage [ $2 ]\n";
}

elsif (m/^Description\s*\[\s*(.*)\s(\d*)\s\]\s(.*)/){
print OUT "Description [ $1 ]\nDescription-Number [ $2 ] @@\n";
}
# I added the @@ again in the last print-line for the split in the next elsif

elsif (/^(.*?)$split(.*)$/) {
print OUT $1;
close OUT;
$i++;
open OUT, '> wuw' . $i . '.txt' or die "ERROR!\n";
print OUT $2;
}

else {
print OUT $_;
}
}
close IN;


Unfortunately the script does not split at the @@ anymore. The middle elsif part with "Description" splits the line like I wished, but the last elsif doesn't seem to work because of the previous elsif.
I guess the problem is with the @@ which I added again previously but I didn't know how to handle it.

Anyone got an idea?
Thanks in advance!
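
One possible reading of the symptom described above: in an if/elsif chain only the first matching branch runs, so a line that matches the Description pattern never reaches the @@ branch, and the output file is never switched. The sketch below separates the two concerns. It is not the poster's final solution; the helper name process_line and its return convention are assumptions, and it assumes that @@ always ends the line.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the rewritten text for one input line
# and a flag telling the caller whether the entry ends on this line.
sub process_line {
    my ($line) = @_;

    # Does the line carry the entry separator? Remove it if so.
    my $ends_entry = ( $line =~ s/\s*\@\@\s*$/\n/ ) ? 1 : 0;

    # The rewrites are independent of the separator check.
    if ( $line =~ /^Year, Page\s*\[\s*(\d{4}),\s+(\d+)\s*\]\s*$/ ) {
        $line = "Year [ $1 ]\nPage [ $2 ]\n";
    }
    elsif ( $line =~ /^Description\s*\[\s*(.*)\s(\d+)\s*\]\s*$/ ) {
        $line = "Description [ $1 ]\nDescription-Number [ $2 ]\n";
    }

    return ( $line, $ends_entry );
}

my ( $text, $done ) = process_line("Description [ EV 2510 ] \@\@\n");
print $text;
print "entry ends here\n" if $done;
```

The caller would print the returned text to the current output file and, when the flag is set, close that file and open the next one.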


BillKSmith
Veteran

Jul 25, 2013, 4:16 AM

Post #8 of 14 (1131 views)
Re: [hybi] Split in several lines

You have just discovered the problem with code such as yours. It can be incredibly hard to modify when the requirements change slightly. (You can bet the changed code will be even harder to change the next time.)

In this case, my code should handle the new data without any change at all! The problem that you are having with it is almost certainly that my definition of $/ does not match your data. I am attaching a copy of my code and the data that I tested it with. The code should be easy to fix once you see the difference between your data and mine.
Good Luck,
Bill
Attachments: hybi.pl (0.49 KB)
  wuw_all.txt (0.24 KB)
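
If the separator in the data does not match "$/" exactly (for example a trailing space, or a carriage return before the newline), the whole file is read as a single record. Since "$/" is a plain string and not a regular expression, one alternative is to read the whole text and split it with a pattern. The following is only a sketch under that assumption; the helper name split_entries and the sample data are made up, and note that it drops the "@@" from each entry.

```perl
use strict;
use warnings;

# Hypothetical helper: splits the whole text on "@@" plus any trailing
# whitespace, so "@@\n", "@@\r\n" and "@@ \n" are all accepted.
# Pieces that contain only whitespace are discarded.
sub split_entries {
    my ($text) = @_;
    return grep { /\S/ } split /\@\@\s*/, $text;
}

# Same data, once with Unix and once with Windows line endings.
my $unix = "Title [ a ] \@\@\nTitle [ b ] \@\@\n";
( my $windows = $unix ) =~ s/\n/\r\n/g;

my @unix_entries    = split_entries($unix);
my @windows_entries = split_entries($windows);

printf "unix: %d entries, windows: %d entries\n",
    scalar @unix_entries, scalar @windows_entries;
```

Both variants yield 2 entries, so the script no longer depends on which line endings the input file happens to use.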


hwnd
User

Jul 25, 2013, 5:03 PM

Post #9 of 14 (1108 views)
Re: [BillKSmith] Split in several lines

Another way you could do this without splitting anything.


Code
use strict;
use warnings;

my $key;
my @data;

my $i = 0;

open my $fh, '<', 'file.txt' or die "failed open $!";

while (<$fh>) {
    chomp;
    if ( /^Title\s\[\s+(.*)\s+\]/ ) {
        $key = qq{Title [ $1 ]} . "\n";
    }
    elsif ( /^Year,\sPage\s\[\s(\d{4}),\s(\d+)\]/ ) {
        $key .= qq{Year [ $1 ]} . "\n" . qq{Page [ $2 ]};
    }
    elsif ( /^Description\s\[\s(.*)\s+\]/ ) {
        push @data, join "\n", $key, qq{Description [ $1 ]};
    }
}

close $fh;

foreach ( @data ) {
    ++$i;
    open my $out, '>', "wuw$i.txt" or die "failed open $!";
    print $out $_;
    close $out;
}



(This post was edited by hwnd on Jul 25, 2013, 5:36 PM)


BillKSmith
Veteran

Jul 25, 2013, 8:51 PM

Post #10 of 14 (1100 views)
Re: [hwnd] Split in several lines

Much better. Why not print your output in the last elsif block rather than pushing it into the @data array?

I notice that you are not printing the '@@' or the final newline. Your operating system may take care of the newline for you, but your code is more portable if you do it explicitly in perl.
Good Luck,
Bill
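
The two suggestions above (write each entry as soon as its Description line is seen, and print the '@@' and the final newline explicitly) could be combined roughly as follows. This is a sketch, not the code from the thread: the helper name write_entries is made up, the patterns are slightly more tolerant of whitespace than the ones posted, and the demo writes into a temporary directory created with the core File::Temp module.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: reads entries from $in and writes each one to
# "$dir/wuwN.txt" as soon as its Description line has been seen.
# Returns the number of files written.
sub write_entries {
    my ( $in, $dir ) = @_;
    my $count = 0;
    my $entry = '';

    while ( my $line = <$in> ) {
        if ( $line =~ /^Title\s*\[\s*(.*?)\s*\]/ ) {
            $entry = "Title [ $1 ]\n";          # a new entry starts here
        }
        elsif ( $line =~ /^Year,\s*Page\s*\[\s*(\d{4}),\s*(\d+)\s*\]/ ) {
            $entry .= "Year [ $1 ]\nPage [ $2 ]\n";
        }
        elsif ( $line =~ /^Description\s*\[\s*(.*?)\s*\]/ ) {
            # Explicit separator and final newline.
            $entry .= "Description [ $1 ] \@\@\n";
            $count++;
            my $file = "$dir/wuw$count.txt";
            open my $out, '>', $file or die "Cannot write $file: $!";
            print {$out} $entry;
            close $out or die "Cannot close $file: $!";
        }
    }
    return $count;
}

my $sample = "Title [ whatever ]\nYear, Page [ 1996, 1057]\n"
           . "Description [ something something ] \@\@\n";
open my $in, '<', \$sample or die "Cannot open in-memory file: $!";
my $dir = tempdir( CLEANUP => 1 );
print "Wrote ", write_entries( $in, $dir ), " file(s) to $dir\n";
```

Writing inside the loop avoids holding all entries in memory, which matters little for 600 entries but keeps the script simple.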


hybi
Novice

Jul 26, 2013, 2:22 AM

Post #11 of 14 (1091 views)
Re: [BillKSmith] Split in several lines


In Reply To
You have just discovered the problem with code such as yours. It can be incredibly hard to modify when the requirements change slightly. (You can bet the changed code will be even harder to change the next time.)

In this case, my code should handle the new data without any change at all! The problem that you are having with it is almost certainly that my definition of $/ does not match your data. I am attaching a copy of my code and the data that I tested it with. The code should be easy to fix once you see the difference between your data and mine.

You're probably right. I changed the example a little because the real data is in German. I attached an excerpt from the original data in which I only changed the titles and numbers. Everything else is unchanged.

I hope you can help me to adjust your script to this excerpt because I'm still struggling to apply your script to it and still nothing happens.
Thanks in advance!
Attachments: bsp.txt (1.19 KB)


BillKSmith
Veteran

Jul 26, 2013, 6:44 AM

Post #12 of 14 (1082 views)
Re: [hybi] Split in several lines

I am unable to duplicate your problem.

Just to be certain that we are using the same thing, I downloaded both my code and your data from this thread. (Both files are in ASCII with Windows newlines.)
I edited my code to get its input from bsp.txt.

When I ran that code (perl 5.16.1 on Windows XP), it split the input into three files, each ending in "@@". Of course, it did not fix the year and page line because it is still looking for the English.

What do you mean by "nothing happens"? How are you sure that the program ran at all? Did you get any error messages? Did you get (at least) one output file? Did you get the message saying how many files were written?

Please try again. Download both files just to be sure that we are using the same code and data. Edit the code to get input from that data file. Run the code. Tell me exactly what happened.

Tell me what OS and perl you are using.
Good Luck,
Bill


Chris Charley
User

Jul 27, 2013, 1:01 PM

Post #13 of 14 (1061 views)
Re: [hybi] Split in several lines

When you say you want to split Description (Abteilung [ EV 1234 ] @@, for example), will there always be 2 fields enclosed in brackets? Might there be only 1, or 3 or more?

How would you want the output to look?

Code
Abteilung        [ EV  ] 
Abteilung [ 1234 ]

Or

Code
Abteilung        [ EV  ] 
[ 1234 ]
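
If the number of fields inside the brackets can vary, splitting the bracket content on whitespace handles one, two, or more fields the same way. The sketch below produces the first of the two layouts shown above; the thread does not say which layout was wanted, and the helper name split_fields is made up.

```perl
use strict;
use warnings;

# Hypothetical helper: turns "Label [ f1 f2 ... ] @@" into one
# "Label [ field ]" line per whitespace-separated field.
sub split_fields {
    my ($line) = @_;
    my ( $label, $inside ) = $line =~ /^(\w+)\s*\[\s*(.*?)\s*\]/
        or return;                  # not a "Label [ ... ]" line

    my @lines;
    for my $field ( split ' ', $inside ) {
        push @lines, sprintf '%s [ %s ]', $label, $field;
    }
    return @lines;
}

print "$_\n" for split_fields("Abteilung [ EV 1234 ] \@\@");
```

For the sample line this prints "Abteilung [ EV ]" and "Abteilung [ 1234 ]" on separate lines; a line with a single field yields a single output line.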



hybi
Novice

Jul 30, 2013, 5:09 AM

Post #14 of 14 (970 views)
Re: [Chris Charley] Split in several lines

Hey everyone, sorry for being absent for the last couple of days.

I have been working on the script for some time and it is done.
Thanks for everyone's help, just wanted you to know that I won't bother you any longer ;)

If anyone is interested in the final solution, I attached the script.

Best regards,
hybi
Attachments: script.txt (1.56 KB)

 
 

