
mhx
Enthusiast / Moderator
Jul 27, 2001, 1:00 PM
Post #6 of 9
Re: Reading in Contents, then Formatting out: Importan
Hi, first of all, the code we posted consisted of actual Perl scripts that worked just fine. (I hope Jasmine won't mind me speaking for her here, but I'm quite sure hers is a tested script ;-) Of course you cannot just upload my script and expect it to work, since it isn't laid out as a CGI script; it's meant to be run from the shell or command line. Since this is the intermediate forum, I was a bit sparing with explanations :) Actually, your problem doesn't seem to be that our code isn't working, but how to embed our code in your own script. That should be quite easy with some basic Perl knowledge, even if you don't fully understand my code. (I hope you understood Jasmine's code, because her explanation was very good and accurate.) Anyway, to explain what my code does (so you know what it doesn't do and can figure out how to extend the script), here is the full script again, just slightly reformatted:
#!/bin/perl -w
use strict;

print map "$_->[3]$_->[0](/a)(br)\n".
          "$_->[3]$_->[2](/a)(br)with $_->[1](br)(br)\n",
      map [ grep !/^\s*$/,
            split /(?:\r?\n)+/ ],
      grep !/^\s*$/,
      split /_+/,
      do { local $/=undef; <DATA> };

__DATA__
_____________________________
John Doe
Silver Hair
Wednesday, September 19 at 8 PM
(a link is here)
______________________________
Mary Bonham
Blue House
Saturday, September 23 at 7:30 PM
(a link is here)

The header should be clear from Jasmine's explanation:
#!/bin/perl -w
use strict;

It just specifies the path to your perl interpreter executable, turns on Perl's warnings (which is a must) and turns on Perl's strict mode (which is also a must). The important part, as you might have figured out already, is
print map "$_->[3]$_->[0](/a)(br)\n".
          "$_->[3]$_->[2](/a)(br)with $_->[1](br)(br)\n",
      map [ grep !/^\s*$/,
            split /(?:\r?\n)+/ ],
      grep !/^\s*$/,
      split /_+/,
      do { local $/=undef; <DATA> };

To understand this, let's go from the bottom up, because that's also the order in which Perl evaluates it. When you see lots of maps, greps and splits, you almost always have to start at the end to understand what's going on ;-)
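If the bottom-up reading is hard to follow, it may help to see the same chain unrolled into separate statements. The following is only a sketch: it assumes the sample data is held in an in-memory string instead of the DATA section (so the example is self-contained), and the intermediate variable names (@chunks, @records, @rows, @out) are made up for illustration. Each statement corresponds to one line of the chain, read from the bottom up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample input in the same layout as the __DATA__ section.
# Assumption: an in-memory string stands in for DATA here.
my $input = <<'END';
_____________________________
John Doe
Silver Hair
Wednesday, September 19 at 8 PM
(a link is here)
______________________________
Mary Bonham
Blue House
Saturday, September 23 at 7:30 PM
(a link is here)
END

# Bottom line of the chain: slurp everything at once by undefining $/.
open my $fh, '<', \$input or die "cannot open in-memory handle: $!\n";
my $text = do { local $/ = undef; <$fh> };
close $fh;

# split /_+/ : cut the text into records at runs of underscores.
# The leading empty field (before the first underscores) is kept by split.
my @chunks = split /_+/, $text;

# grep !/^\s*$/ : drop chunks that contain nothing but whitespace.
my @records = grep !/^\s*$/, @chunks;

# map [ ... ] : turn each record into a reference to an array of its
# non-empty lines (split without a string argument works on $_).
my @rows = map [ grep !/^\s*$/, split /(?:\r?\n)+/ ], @records;

# Top line of the chain: format each array reference as a string.
my @out = map "$_->[3]$_->[0](/a)(br)\n" .
              "$_->[3]$_->[2](/a)(br)with $_->[1](br)(br)\n", @rows;

print @out;
```

Printing @out gives the same text as the one-statement version; the only difference is that every intermediate list gets a name you can inspect, for example with print scalar(@chunks) while debugging.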
do { local $/=undef; <DATA> };

This block reads the whole content from DATA. As I pointed out, DATA is a special filehandle that lets you read directly from a special section of the script (everything following __DATA__). You can use any other filehandle instead of DATA; if you wanted to read from a file called 'address.txt', you would open that file
open ADDRESS, 'address.txt' or die "cannot open address.txt: $!\n";

and replace DATA by ADDRESS:
do { local $/=undef; <ADDRESS> };

Now, why does this read the whole file? $/ is the input record separator, which is normally set to a newline. If you undefine it, as I do in the block above, the readline operator <> will gobble up the whole file at once. So the contents of the whole file are passed into the previous line of our script:

split /_+/,

which just splits them at the long lines of underscores. /_+/ is the regular expression for one or more underscores, and split uses that regex as the separator between the records in the string we just read in. The split function returns a list of all records. Since there is nothing but whitespace (if anything at all) before the first line of underscores, that list contains three elements, the first of which contains only whitespace or is empty. Since we don't want 'empty' records, the previous line filters these out:

grep !/^\s*$/,

This returns only those elements of the list passed in that contain more than just whitespace. The grep function evaluates !/^\s*$/ for each element of the list and keeps only those for which the expression is true. /^\s*$/ is a regex that checks whether a string contains nothing but whitespace characters. Since we want all elements for which this is not the case, we negate the result of the match with a '!'. Now we want the fields of each record. This is done in the following block, which the map function evaluates for each of our two remaining records:
map [ grep !/^\s*$/,
      split /(?:\r?\n)+/ ],

map is quite similar to grep, except that it returns the result of the given expression for each element of the list we feed in. Here the expression is enclosed in square brackets, which means it returns an array reference. So when map is done, we have a list of array references. But what do the referenced arrays contain? The two lines inside the brackets are very similar to what I explained above for split and grep: each record is split into its lines, and the empty lines are filtered out with grep. (split without a string argument works on $_, which map sets to the current record.) The regex /(?:\r?\n)+/ means one or more newline sequences; I used \r?\n to support both Windows (\r\n) and Unix (\n) newline sequences.
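To see why \r?\n matters, here is a small sketch that runs the inner part of the map block on one record with Unix line endings and on the same record converted to Windows line endings. The helper split_record is hypothetical (it is not part of the script above) and simply takes the record as an argument instead of using $_. For comparison, it also splits on /\n+/ alone, which would leave a carriage return at the end of every field.

```perl
use strict;
use warnings;

# Hypothetical helper: the inner part of the map block, applied to a
# record passed in explicitly instead of via $_.
sub split_record {
    my ($record) = @_;
    return [ grep !/^\s*$/, split /(?:\r?\n)+/, $record ];
}

# One record with Unix (\n) line endings, including an empty line,
# and the same record converted to Windows (\r\n) line endings.
my $unix = "\nJohn Doe\nSilver Hair\n\nWednesday, September 19 at 8 PM\n(a link is here)\n";
(my $windows = $unix) =~ s/\n/\r\n/g;

my $from_unix    = split_record($unix);
my $from_windows = split_record($windows);

# For comparison: splitting on /\n+/ alone leaves "\r" at the end of
# each field of the Windows record.
my $naive = [ grep !/^\s*$/, split /\n+/, $windows ];

# Both records yield the same four fields, shown joined with pipes.
print join('|', @$from_unix), "\n";
print join('|', @$from_windows), "\n";
```

The + in the regex also takes care of the empty line between 'Silver Hair' and the date, since several consecutive newline sequences count as a single separator.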
grep !/^\s*$/,
split /(?:\r?\n)+/

The array returned by grep now has 4 elements for each record; in fact, the list returned by map will look like this:
['John Doe', 'Silver Hair', 'Wednesday, September 19 at 8 PM', '(a link is here)'],
['Mary Bonham', 'Blue House', 'Saturday, September 23 at 7:30 PM', '(a link is here)']

We have successfully extracted all the data from our file! Now all that's left is to print it, and this is done by mapping each list element to a string and printing the list of strings returned by map:
print map "$_->[3]$_->[0](/a)(br)\n".
          "$_->[3]$_->[2](/a)(br)with $_->[1](br)(br)\n",

The code may look a bit like magic to a beginner, but you quickly get used to code like this. I hope the explanations above make clear what my script does, and what it doesn't do, so that you can add the necessary code to make it suit your needs. BTW, your other post is exactly the same problem. No difference! You just have to replace the expression in the first map:
print map join('|', @$_)."\n",

This joins the array elements of each record with pipes, appends a newline to each string and prints the resulting list of strings. You could also do the whole thing with only one map:
print map join('|', grep !/^\s*$/, split /(?:\r?\n)+/ )."\n",
      grep !/^\s*$/,
      split /_+/,
      do { local $/=undef; <DATA> };

If you're not familiar with map, grep and split, I recommend reading the manual pages for these functions, e.g. with perldoc -f grep. I hope all this helps.

-- Marcus
s$$ab21b8d15c3d97bd6317286d$;$"=547269736;split'i',join$,,map{chr(($*+= ($">>=1)&1?-hex:hex)+0140)}/./g;$"=chr$";s;.;\u$&;for@_[0,2];print"@_,"