'Processing a text file using a separate Regex file' help needed please

 



cyberjoe
New User

Apr 19, 2005, 5:37 AM

Post #1 of 3 (918 views)

Hi ppl,

I've been trying to work on another way of executing the following program.

$_ = "My cat can get very annoying at times but then that is to be expected seeing as he is nearly 80 years old";

s/can/cn/g;
s/get/gt/g;
s/very/vry/g;
s/annoying/annoyN/g;
s/times/x/g;
s/then/thn/g;
s/that/tht/g;
s/to/2/g;
s/be/b/g;
s/expected/Xpectd/g;
s/seeing/C_N/g;
s/80/8T/g;
s/years/yrs/g;

print "$_\n";

Obviously this works fine, but what I really want is a base program where the s/// operators and the text file they act on are read in from separate files, and the result is printed to an output file.

I know you can read in separate files for the s/// operator, i.e. a 'replacement word' file and a 'substituted word' file, which then act on the text file, but the s/// operators are just simplifications of what I really want to use, which are RegEx.

So what the program should really look like is:

$_ = "My cat can get very annoying at times but then that is to be expected seeing as he is nearly 80 years old";

RegEx;
RegEx;
RegEx;
RegEx;
RegEx;
RegEx;
RegEx;
RegEx; etc.

print "$_\n";

So I want the RegEx to be read in from one file and the sample $_ from another, then have the RegEx process the sample $_, with the output saved to a third file.

At the command line, I want it to look something like

'perl base_prog.pl Regex.txt $_.txt'

(where Regex.txt and $_.txt will be specified)

I've tried, with minimal success so far.

I have another idea where the base program opens a specified file with the RegEx in it, which then gets saved to another file, which is itself executed as a valid script and continues the process by asking for the text file to be processed by the RegEx.

This idea came about because I was told that operators read into memory in their full form, like the sample s/// operators, cannot then be applied to another file.

The reason I want to specify the RegEx file is because I want to be able to experiment with different combinations of RegEx on text files, which would mean different RegEx files (for ease).

I hope I make sense.

Any help or suggestion would be appreciated.

(Sorry for the long post)


rork
User

Apr 20, 2005, 9:04 AM

Post #2 of 3 (904 views)
Re: [cyberjoe] 'Processing a text file using a separate Regex file' help needed please

The first problem is to read the filenames from the command line.

Code
my $regexp_file = shift; 
my $text_file = shift;


I would make a regexp file that looks like:

can=cn\n
get=gt\n
....

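Then iterate over the file, storing everything in a hash.

Code
my %regexps;
open(REGEXP, "<", $regexp_file) or die "Cannot open $regexp_file: $!";
while (<REGEXP>) {
    chomp;
    # Split on the first "=" only, so the replacement may itself contain "="
    my ($original, $new) = split(/=/, $_, 2);
    $regexps{$original} = $new;
}
close(REGEXP);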


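Open the text file and read it into an array.
Iterate through the array, applying the regexps to each line, and write the result to a file.


Code
open(FILE1, "<", $text_file) or die "Cannot open $text_file: $!";
my @file1 = <FILE1>;
close(FILE1);

open(FILE2, ">", "file2.txt") or die "Cannot open file2.txt: $!";
foreach my $line (@file1) {
    # Note: keys() returns the patterns in no particular order
    foreach my $key (keys %regexps) {
        $line =~ s/$key/$regexps{$key}/g;
    }
    print FILE2 $line;
}
close(FILE2);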


Something like this should work but there might be better ways to do this.
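
One caveat with the hash approach: a Perl hash returns its keys in no guaranteed order, so rules that overlap (say "the" and "then") may be applied in a different order from one run to the next. Below is a minimal sketch that keeps the rules in the order they were written by storing them as an array of pairs instead. The helper name apply_rules and the sample rules are assumptions made for illustration, not something from the posts above.

```perl
use strict;
use warnings;

# Hypothetical helper: apply [pattern, replacement] pairs to a string,
# strictly in the order they are given.
sub apply_rules {
    my ($text, @rules) = @_;
    for my $rule (@rules) {
        my ($from, $to) = @$rule;
        $text =~ s/$from/$to/g;
    }
    return $text;
}

# Rules as they might appear in the file, one "original=new" entry per line.
# split with a limit of 2 cuts on the first "=" only.
my @rules = map { [ split /=/, $_, 2 ] } ('then=thn', 'that=tht', 'to=2', 'be=b');

print apply_rules("but then that is to be expected", @rules), "\n";
# prints "but thn tht is 2 b expected"
```

An array costs nothing extra here, since the rules are only ever walked from first to last and never looked up by key.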
--
Don't reinvent the wheel, use it, abuse it or hack it.


davorg
Thaumaturge / Moderator

Apr 24, 2005, 7:26 AM

Post #3 of 3 (890 views)
Re: [cyberjoe] 'Processing a text file using a separate Regex file' help needed please

I think you're slightly confusing everyone with your description of what you want to do. When you say you want to store regexes in a file, I think that what you actually want is to store the patterns and the replacement strings for a set of substitution operators. If I'm wrong then please correct me.

It's perfectly possible to use variables as both the pattern and the replacement string in a substitution operator so you can write code like this.


Code
#!/usr/bin/perl

use strict;
use warnings;

$_ = "Here is a test string\n";

while (my $subs = <DATA>) {
    chomp $subs;
    my ($from, $to) = split /\t/, $subs, 2;

    s/$from/$to/g;
}

print;

__DATA__
Here[TAB]There
is[TAB]goes
a[TAB]the
test string[TAB]neighbourhood

(Where I've put [TAB] in the code above, you need to replace it with a literal tab character).

It's then a short step to storing the data in a completely separate file.
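
As a rough sketch of that step, assuming the rules live in a file of tab-separated "pattern, replacement" lines and the text to process is in a second file, it might look like the following. The sub names read_rules and process_file, the optional third argument, and the default output name output.txt are all assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read "pattern<TAB>replacement" lines, keeping them in file order.
sub read_rules {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my @rules;
    while (my $line = <$fh>) {
        chomp $line;
        next unless length $line;                   # skip blank lines
        my ($from, $to) = split /\t/, $line, 2;
        push @rules, [ $from, defined $to ? $to : '' ];
    }
    close $fh;
    return @rules;
}

# Apply every rule, in order, to each line of $in_path and write to $out_path.
sub process_file {
    my ($in_path, $out_path, @rules) = @_;
    open my $in,  '<', $in_path  or die "Cannot open $in_path: $!";
    open my $out, '>', $out_path or die "Cannot open $out_path: $!";
    while (my $line = <$in>) {
        for my $rule (@rules) {
            my ($from, $to) = @$rule;
            $line =~ s/$from/$to/g;
        }
        print {$out} $line;
    }
    close $in;
    close $out or die "Cannot close $out_path: $!";
}

# Only run when file names were given on the command line.
if (@ARGV) {
    my ($rules_file, $text_file, $out_file) = @ARGV;
    die "Usage: $0 rules.txt text.txt [output.txt]\n" unless defined $text_file;
    $out_file = 'output.txt' unless defined $out_file;
    process_file($text_file, $out_file, read_rules($rules_file));
}
```

It would be run along the lines of 'perl base_prog.pl Regex.txt text.txt output.txt'. The left-hand column is treated as a real regular expression, so character classes, anchors and the like all work there. The right-hand column is inserted as a plain string, so something like $1 in the replacement is not expanded.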

--
Dave Cross, Perl Hacker, Trainer and Writer
http://www.dave.org.uk/
Get more help at Perl Monks

 
 

