CGI/Perl Guide
Regex matching question

 



earachefl
Novice

Jul 15, 2009, 5:03 AM

Post #1 of 7
Regex matching question

I'm trying to complete an exercise that reads a text file of "Much Ado About Nothing" and writes out only the lines of one particular character. The text file separates the speeches by putting the character's name on its own line, in capitals, followed by the speech, followed by one blank line, e.g.

BEATRICE\n
yada yada yada\n
yada yada yada\n
\n
LEONATO\n

The book suggests using the following pattern match to find the current character and store it in a variable:



Code
unless($line =~ m/[^A-Z\s]/) 
{
$character = $line;
}


which works fine with one exception: it also matches the empty lines (consisting of only the \n character), which I don't want to have happen.
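A small sketch (the helper name `book_test` is hypothetical) makes the problem concrete: the book's test only asks whether the line contains some character outside `[A-Z\s]`. A line holding nothing but `\n` contains no such character, so the match fails and `unless` treats the line as a character name. The sample lines are the ones from this post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The book's test, wrapped in a sub: true when the line contains no
# character other than A-Z and whitespace (so a "\n"-only line passes).
sub book_test {
    my ($line) = @_;
    return $line !~ m/[^A-Z\s]/;
}

for my $line ("BEATRICE\n", "DON JOHN\n", "\n", "Enter LEONATO, HERO, BEATRICE\n") {
    (my $shown = $line) =~ s/\n/\\n/g;    # spell out the newline for display
    printf "%-32s %s\n", $shown, book_test($line) ? 'treated as a name' : 'not a name';
}
```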

So, to boil down my question: any suggestions for the best way to match ONLY lines that consist of one or two all-caps words, e.g.

LEONATO\n

DON JOHN\n

but not lines such as: Enter LEONATO, HERO, BEATRICE\n

Thanks in advance.


strathglass
New User

Jul 15, 2009, 7:01 AM

Post #2 of 7
Re: [earachefl] Regex matching question

 
I think this would work to store character names into $character:


Code
if ($line =~ m/^[A-Z\s]+$/)  
{
$character = $line;
}


It looks for lines consisting only of one or more A-Z or whitespace characters, and hence excludes blank lines as well as lines with lowercase letters, commas, etc.
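A quick check of this pattern against the sample lines from the first post, as a sketch: because `\s` also matches the newline character, a line holding only `\n` satisfies `^[A-Z\s]+$` (the `+` is met by the newline itself), so blank lines are not actually excluded.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $pattern = qr/^[A-Z\s]+$/;

for my $line ("LEONATO\n", "DON JOHN\n", "\n", "Enter LEONATO, HERO, BEATRICE\n") {
    (my $shown = $line) =~ s/\n/\\n/g;    # spell out the newline for display
    printf "%-32s %s\n", $shown, $line =~ $pattern ? 'matches' : 'no match';
}
```

Requiring the first character to be a letter, as in `^[A-Z][A-Z\s]*$`, is one way to rule the blank line out while keeping the rest of the idea.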

-strathglass


ichi
User

Jul 16, 2009, 11:22 PM

Post #3 of 7
Re: [earachefl] Regex matching question

Without much regular expression at all, here's another way that uses the standard string function uc():

Code
while (<>) {
    chomp;
    # skip blank lines: an empty string also passes uc($_) eq $_
    next unless length;
    if ( uc($_) eq $_ ) {
        print "$_\n";
    }
}
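One way to tighten the uc() test without a regex, sketched with a hypothetical helper name `all_caps_line`: `uc($s) eq $s` on its own only says the line has no lowercase letters, so it is also true for an empty line or a line of digits and punctuation. Adding `lc($s) ne $s` requires at least one letter that lc() would change, i.e. at least one uppercase letter.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# True when the line has no lowercase letters (uc leaves it unchanged)
# and at least one uppercase letter (lc does change it).
sub all_caps_line {
    my ($s) = @_;
    chomp $s;
    return uc($s) eq $s && lc($s) ne $s;
}

for my $line ("DON JOHN\n", "\n", "123\n", "Enter LEONATO, HERO, BEATRICE\n") {
    (my $shown = $line) =~ s/\n/\\n/g;    # spell out the newline for display
    printf "%-32s %s\n", $shown, all_caps_line($line) ? 'all caps' : 'rejected';
}
```

Any all-caps line still passes, so a heading that happens to be written in capitals would be taken for a character name as well.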



earachefl
Novice

Jul 17, 2009, 6:49 AM

Post #4 of 7
Re: [strathglass] Regex matching question

Almost, but no cigar... that also matches lines which consist of only the newline character. BUT....


Code
if ($line =~ m/^[A-Z]+\s?[A-Z]*$/)


works! This ensures that the line starts with at least one capital letter, followed by at most one whitespace character, and ends with zero or more capital letters.
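A stricter variant, offered as a sketch rather than the exercise's expected answer: `^[A-Z]+(?: [A-Z]+)?` spells out "one word, optionally followed by a space and a second word", and `[ \t\r]*` before `$` tolerates trailing spaces, tabs, or a carriage return left by Windows line endings. Whether the attached text actually contains any of those is an assumption worth checking.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# One or two all-caps words separated by a single space. Trailing spaces,
# tabs, or a "\r" are allowed; $ also allows the line's final "\n".
my $name_re = qr/^[A-Z]+(?: [A-Z]+)?[ \t\r]*$/;

for my $line ("LEONATO\n", "DON JOHN\n", "DON PEDRO \r\n", "\n",
              "Enter LEONATO, HERO, BEATRICE\n") {
    (my $shown = $line) =~ s/\r/\\r/g;    # spell out \r and \n for display
    $shown =~ s/\n/\\n/g;
    printf "%-32s %s\n", $shown, $line =~ $name_re ? 'name' : 'not a name';
}
```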


earachefl
Novice

Jul 17, 2009, 6:50 AM

Post #5 of 7
Re: [ichi] Regex matching question

Ah, that's smart... I didn't think of it because the exercise called for using a regex.


earachefl
Novice

Jul 17, 2009, 7:41 AM

Post #6 of 7
Re: [earachefl] Regex matching question

Ah, not so fast... this works until, unexpectedly, at line 249 of the text, Don Pedro's name and lines get written to the file. The same happens with Don John's lines at lines 977 and 3101, with Don Pedro's at lines 1173, 1181, 1198, 1224, 1240, 1248, 1257, 1274 and 2048, and with Friar Francis's at line 3121. So out of some 756 character changes, 13 are processed incorrectly.

I'm attaching the txt file that's being processed (MuchAdoAboutNothing.txt) as well as my outputs - Beatrice.txt and my Terminal output.

And here's the complete code that I'm using:

Code
#!/usr/bin/perl

$file = "MuchAdoAboutNothing.txt";

if (-e $file && -r $file)
{
    open (IN, "<$file") || die "Couldn't open $file, $!";
}

while ($line = <IN>)
{
    #if $line consists of only uppercase characters, set $character to $line
    #unless ($line =~ m/[^A-Z\s]/) what the book suggested - not so good
    if ($line =~ m/^[A-Z]+\s?[A-Z]*$/)
    {
        $character = $line;
        #used for debugging character changes
        print ("Character changed to $character");
    }

    #if BEATRICE is the current character, add her lines to the lines array
    if ($character =~ m/BEATRICE/)
    {
        push (@lines, "$line");
    }
}

close(IN);

open(OUT, ">Beatrice.txt") || die ("Couldn't open Beatrice.txt: $!");

print OUT @lines;

close(OUT);


#There's also an issue with what happens when it's still Beatrice's character and there's a scene change or stage direction; those lines also get written to the file as if they were part of Beatrice's lines. But that's a separate issue and I'll work on it myself.
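One hypothesis worth testing (the thread does not confirm it): if the name lines end with one extra whitespace character, such as a trailing space or the `\r` of Windows line endings, then `\s?` absorbs it for a one-word name like BEATRICE, but a two-word name has already spent `\s?` on the space between its words, so `$` fails. That would fit the fact that only DON PEDRO, DON JOHN and FRIAR FRANCIS go wrong. The sketch below (the helper `visible` and the script name `check_names.pl` are hypothetical) prints name-like lines that the posted pattern rejects, with invisible characters spelled out.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Spell out characters that are easy to miss: trailing spaces become <SP>,
# \r, \t and \n are written as escapes, and any other non-printable
# character is shown as \x{..}.
sub visible {
    my ($s) = @_;
    $s =~ s/ (?=[ \t\r\n]*\z)/<SP>/g;
    $s =~ s/\r/\\r/g;
    $s =~ s/\t/\\t/g;
    $s =~ s/\n/\\n/g;
    $s =~ s/([^\x20-\x7E])/sprintf '\\x{%02X}', ord $1/ge;
    return $s;
}

# Usage: perl check_names.pl MuchAdoAboutNothing.txt
# Reports lines that begin with two or more capitals but fail the
# pattern from the post above.
if (@ARGV) {
    while (my $line = <>) {
        next unless $line =~ /^[A-Z]{2,}/;
        next if $line =~ m/^[A-Z]+\s?[A-Z]*$/;
        print "line $.: ", visible($line), "\n";
    }
}
```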


(This post was edited by earachefl on Jul 17, 2009, 7:44 AM)
Attachments: MuchAdoAboutNothing.txt (123 KB)
  Beatrice.txt (15.0 KB)
  Saved Terminal Output.txt (22.4 KB)


FishMonger
Veteran / Moderator

Jul 17, 2009, 9:50 AM

Post #7 of 7
Re: [earachefl] Regex matching question

First, every Perl script you write should include the strict and warnings pragmas, which also means that you need to declare your variables with the my keyword.


Code
#!/usr/bin/perl 

use strict;
use warnings;

my $file = "MuchAdoAboutNothing.txt";


For this assignment, I'd highly recommend processing the file in paragraph mode instead of line-by-line.
http://www.perl.com/pub/a/2004/06/18/variables.html
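For readers unfamiliar with paragraph mode, a small sketch: setting the input record separator `$/` to the empty string makes each read return one blank-line-separated block, so every record is a whole speech headed by its speaker's name. The sample text is made up in the layout described in the first post (the real file is not reproduced here) and is read through an in-memory filehandle so the example runs on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up sample in the layout of the play's text file: name line,
# speech lines, then a blank line (two blank lines after the second speech).
my $text = "BEATRICE\nyada yada yada\nyada yada yada\n\n"
         . "LEONATO\nyada yada\n\n\n"
         . "BEATRICE\nyada\n";

open my $in, '<', \$text or die "Couldn't open in-memory file: $!";

my @speeches;
{
    local $/ = '';         # paragraph mode, the same as perl's -00 switch
    @speeches = <$in>;     # list context: one element per paragraph
}
close $in;

for my $speech (@speeches) {
    my ($speaker) = split /\n/, $speech;    # first line holds the name
    print "$speaker\n";
}
```

This prints BEATRICE, LEONATO and BEATRICE, one per line; the two blank lines after LEONATO's speech still count as a single separator.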

Putting the open call in the if block is unnecessary.

When creating/opening filehandles, it's best to use the three-argument form of open and lexical filehandles instead of barewords. I'd also open both filehandles at the same time and drop the array.

Code
open my $INPUT, '<', $file or die "Couldn't open $file, $!"; 
open my $OUTPUT, '>', 'Beatrice.txt' or die "Couldn't open Beatrice.txt: $!";


Now, putting those things together gives you 95% of the code needed. Your while loop could be reduced to a single simple line.

Here's the complete script as a one-liner:

Code
perl -00 -ne "print if /^BEATRICE/" MuchAdoAboutNothing.txt
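For comparison, here is roughly what the one-liner expands to as a full script that follows the advice above (strict, warnings, three-argument open, lexical filehandles, paragraph mode). It is a sketch: the helper name `copy_speeches` and the command-line usage are hypothetical, and like the one-liner it copies only paragraphs that begin with the name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Copy every paragraph that begins with $name from $in to $out.
sub copy_speeches {
    my ($name, $in, $out) = @_;
    local $/ = '';    # paragraph mode, the same as -00
    while (my $paragraph = <$in>) {
        print {$out} $paragraph if $paragraph =~ /^\Q$name\E/;
    }
}

# Usage: perl beatrice.pl MuchAdoAboutNothing.txt Beatrice.txt
if (@ARGV == 2) {
    my ($infile, $outfile) = @ARGV;
    open my $INPUT,  '<', $infile  or die "Couldn't open $infile: $!";
    open my $OUTPUT, '>', $outfile or die "Couldn't open $outfile: $!";
    copy_speeches('BEATRICE', $INPUT, $OUTPUT);
    close $OUTPUT or die "Couldn't close $outfile: $!";
}
```

Passing the name and the filehandles in keeps the sub reusable for any other character, and each copied paragraph keeps its trailing blank line, so the speeches stay separated in the output file.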

