Code help with search script

 



richsark
User

May 14, 2009, 6:01 AM

Post #1 of 16 (1865 views)
Code help with search script

I have a script, but it's sorta broke and I need help tweaking it.


Code
#!/usr/bin/perl

my @files_to_search = ("exportOBJ-sark.txt","ResourceRecText-ALL.txt","exportdnsrr-report.txt");

open (FL, "sark.com.txt");
my %search_hash = ();
while (my $line = <FL>) {
    my @parts = split(/,/,$line);
    $search_hash{$parts[0]} = 1;
    $search_hash{$parts[4]} = 1;
}
close(FL);

foreach my $file (@files_to_search) {
    open (FL, $file);
    while (my $line = <FL>) {
        foreach my $key (keys %search_hash) {
            if ($line =~ /$key/) {
                print $line;
            }
        }
    }
    close(FL);
}


The above script takes an input file named sark.com.txt, which contains this type of info:

ao,300,IN,CNAME,www.sark.com.,,,
bd-integ4-sarkprime,IN,NS,gss-t2i.is.sark.com.,,,,
sarkhub,IN,MX,5,sarkhub1,,,
bpm,IN,NS,gss-t2p.is.sark.com.,,,,
bpmqa,IN,NS,gss-t2i.is.sark.com.,,,,

From the above example, we take one record (for example, bd-integ4-sarkprime,IN,NS,gss-t2i.is.sark.com.,,,,) and strip it down, which should result in:

bd-integ4-sarkprime
and
gss-t2i.is.sark.com

The script should then look for bd-integ4-sarkprime and gss-t2i.is.sark.com in the 3 files (exportOBJ-sark.txt, ResourceRecText-ALL.txt, exportdnsrr-report.txt).
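
The stripping step described above could be sketched roughly as follows. This is only an illustration, not the poster's code: parse_record is a hypothetical helper, and it assumes that the class field is always IN, that an optional numeric TTL may sit in front of it (as in the "ao,300,IN,CNAME,..." sample), and that MX records carry a preference number before the target.

```perl
use strict;
use warnings;

# Hypothetical helper: return (owner name, target) for one sark.com.txt line.
# Assumptions: class is always "IN"; an optional numeric TTL may precede it;
# MX records have a preference value before the target.
sub parse_record {
    my ($line) = @_;
    chomp $line;
    my @f = split /,/, $line;
    my $name = shift @f;
    return unless defined $name && length $name;

    shift @f if @f && $f[0] =~ /^\d+$/;           # skip the optional TTL
    return unless @f >= 3 && $f[0] eq 'IN';       # need class, type and data

    my (undef, $type, @rdata) = @f;
    shift @rdata if $type eq 'MX' && @rdata > 1;  # drop the MX preference
    my $target = $rdata[0];
    return unless defined $target && length $target;

    $target =~ s/\.$//;                           # strip the trailing dot
    return ($name, $target);
}

for my $line ('bd-integ4-sarkprime,IN,NS,gss-t2i.is.sark.com.,,,,',
              'ao,300,IN,CNAME,www.sark.com.,,,') {
    my ($name, $target) = parse_record($line);
    print "$name -> $target\n";
}
```

Locating the target relative to the IN field, rather than by a fixed position, is a design choice made here so that lines with and without a TTL give the same result.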



(BTW, the format above will not change.)

The 3 files it should search, reporting any matches to bd-integ4-sarkprime and gss-t2i.is.sark.com, are:

exportOBJ-sark.txt
(example of contents)
10.0.234.254,cli2821cncx-stmin-00-01,,,is.sark.com,Server,"",,,,,-1,0,,,-1,,3,,
10.0.237.254,cli2821cncx-tgsp-00-01,,,is.sark.com,Server,"",,,,,-1,0,,,-1,,3,,

For "exportOBJ-sark.txt", I can remove the ",,," manually before running the script so the fields join up as cli2821cncx-stmin-00-01.is.sark.com, or can the script do that?
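
The script can do the joining itself. A minimal sketch, assuming the host name is always field 1 and the domain always field 4 (counting from 0), as in the two sample lines; fqdn_from_export_obj is a hypothetical name:

```perl
use strict;
use warnings;

# Hypothetical helper: build host.domain from one exportOBJ-sark.txt line.
# Assumes host is field 1 and domain is field 4 (0-based), per the samples.
sub fqdn_from_export_obj {
    my ($line) = @_;
    chomp $line;
    my ($host, $domain) = (split /,/, $line)[1, 4];
    return unless defined $host   && length $host;
    return unless defined $domain && length $domain;
    return "$host.$domain";
}

my $sample = '10.0.234.254,cli2821cncx-stmin-00-01,,,is.sark.com,Server,"",,,,,-1,0,,,-1,,3,,';
print fqdn_from_export_obj($sample), "\n";   # cli2821cncx-stmin-00-01.is.sark.com
```

The empty fields between the host and the domain are simply skipped by picking fields by index, so nothing has to be removed by hand first.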

ResourceRecText-ALL.txt
(example of contents)
outputRR-sark.com.txt:ResourceRecText=whdgss1infz-qapri1.is.sark.com.
outputRR-sark.com.txt:ResourceRecText=n3mgss1infz-qasec1.is.sark.com.

exportdnsrr-report.txt
(example of contents)
srsservicing.net,sarkprivate.srsservicing.net,IN,A,-1,100.168.67.190,,,0,0,"",""
srsservicing.net,sarkmanagedcontrolprivate.srsservicing.net,IN,A,-1,100.168.67.189,,,0,0,"",""

Can I get some help to tune this up, please?

Thanks


FishMonger
Veteran / Moderator

May 14, 2009, 6:45 AM

Post #2 of 16 (1863 views)
Re: [richsark] Code help with search script

In what way is it "sorta broke"?

Do you realize that only 2 lines from the sark.com.txt sample data that you posted have a value in field 4 (the fifth field, which is what $parts[4] reads)?

Which fields in the other files do you want to match?

You're missing 2 very important lines (use statements) which should be in every Perl script you write.

Code
use strict; 
use warnings;


You should ALWAYS check the return code of an open call and take action if it fails. It's also preferable to use the 3 arg form of open and a lexical var for the filehandle.

Code
my $sark = 'sark.com.txt'; 
open my $FL, '<', $sark or die "failed to open '$sark' $!";



richsark
User

May 14, 2009, 7:41 AM

Post #3 of 16 (1860 views)
Re: [FishMonger] Code help with search script

OK, I made a correction:


Code
$search_hash{$parts[4]} = 1; 
should be:
$search_hash{$parts[3]} = 1;


Maybe I am going about it all wrong. I just want to use a reference file, "sark.com.txt", and then look through various files to see if there are any matches. My issue is that they are all in different formats, which I guess makes it hard to write a Perl (or any) script.

That said, the fields I want to match vary for the other files:

exportOBJ-sark.txt
(example of contents)
10.0.234.254,cli2821cncx-stmin-00-01,,,is.sark.com,Server,"",,,,,-1,0,,,-1,,3,,
10.0.237.254,cli2821cncx-tgsp-00-01,,,is.sark.com,Server,"",,,,,-1,0,,,-1,,3,,

You can see it has lots of crap in there, commas, etc. Essentially I only need

cli2821cncx-stmin-00-01.is.sark.com
and
cli2821cncx-tgsp-00-01.is.sark.com
to be used for matching. The other stuff I don't care about.

for:
ResourceRecText-ALL.txt
(example of contents)
outputRR-sark.com.txt:ResourceRecText=whdgss1infz-qapri1.is.sark.com.
outputRR-sark.com.txt:ResourceRecText=n3mgss1infz-qasec1.is.sark.com.

Just look at the name after the "=".

Lastly, for
exportdnsrr-report.txt
(example of contents)
srsservicing.net,sarkprivate.srsservicing.net,IN,A,-1,100.168.67.190,,,0,0,"",""
srsservicing.net,sarkmanagedcontrolprivate.srsservicing.net,IN,A,-1,100.168.67.189,,,0,0,"",""

I only care about what's after the first ",", so
sarkprivate.srsservicing.net
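
The two simpler formats described above (the name after the "=" and the field after the first ",") could be pulled out along these lines. This is a sketch under the assumption that every line follows the posted samples; both helper names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper for ResourceRecText-ALL.txt:
# return everything after the first "=" on the line.
sub name_from_rr_text {
    my ($line) = @_;
    chomp $line;
    my (undef, $name) = split /=/, $line, 2;
    return $name;
}

# Hypothetical helper for exportdnsrr-report.txt:
# return the second comma-separated field.
sub name_from_dnsrr_report {
    my ($line) = @_;
    chomp $line;
    my $name = (split /,/, $line)[1];
    return $name;
}

print name_from_rr_text('outputRR-sark.com.txt:ResourceRecText=whdgss1infz-qapri1.is.sark.com.'), "\n";
print name_from_dnsrr_report('srsservicing.net,sarkprivate.srsservicing.net,IN,A,-1,100.168.67.190,,,0,0,"",""'), "\n";
```

Note that the name taken from ResourceRecText-ALL.txt keeps its trailing dot, so it will only compare equal to a search key that also ends in a dot.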

Does this help?


richsark
User

May 15, 2009, 4:36 AM

Post #4 of 16 (1850 views)
Re: [richsark] Code help with search script

Hi,

Any updates or help?


KevinR
Veteran


May 15, 2009, 8:48 AM

Post #5 of 16 (1845 views)
Re: [richsark] Code help with search script

See your thread on unix.com.
-------------------------------------------------


richsark
User

May 19, 2009, 6:19 AM

Post #6 of 16 (1816 views)
Re: [KevinR] Code help with search script

Hi,

I'd like to keep this thread open, as it invites ideas from folks who visit here. I am still in search of working code.

Thanks


1arryb
User

May 19, 2009, 6:46 AM

Post #7 of 16 (1810 views)
Re: [richsark] Code help with search script

Hi rich,

You have a choice of approaches. I'd consider:

1. Figure out some tricky uber-regular expression that can handle all of the formats.

2. Examine each file as you open it and determine the format, then choose the appropriate search strategy for the format.

3. Normalize the files into a standard format then use a single search strategy appropriate to the normal form. Normalization can be done either line-by-line or whole-file using temp files.
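
For the line-by-line flavor of option 3, normalization could be as small as the sketch below. The function name is hypothetical, and the rules (trim whitespace, drop one trailing dot, lower-case) are an assumption about what "standard format" should mean for these DNS names; lower-casing is safe because DNS names are case-insensitive.

```perl
use strict;
use warnings;

# Hypothetical helper: reduce a DNS name to one canonical spelling so that
# names taken from differently formatted files compare equal.
sub normalize_name {
    my ($name) = @_;
    return unless defined $name;
    $name =~ s/^\s+|\s+$//g;   # trim surrounding whitespace
    $name =~ s/\.$//;          # drop a single trailing dot
    return lc $name;           # DNS names are case-insensitive
}

# Both spellings end up as the same hash key.
my %seen;
$seen{ normalize_name($_) }++ for 'GSS-T2I.is.sark.com.', 'gss-t2i.is.sark.com';
print join(', ', map { "$_ => $seen{$_}" } sort keys %seen), "\n";
```

Applying the same function to the search keys and to every name extracted from the three files means no temp files are needed.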

Cheers,

Larry


richsark
User

May 19, 2009, 7:18 AM

Post #8 of 16 (1808 views)
Re: [1arryb] Code help with search script

Thanks for the comments,

The issue is that I have millions of lines of text to search, and it will take a long time to weed them all into one common file/format.

As I am not a Perl/awk expert, I would not know an easier way to do this.

Is my request too far out there to accomplish?

Thanks


FishMonger
Veteran / Moderator

May 19, 2009, 7:55 AM

Post #9 of 16 (1804 views)
Re: [richsark] Code help with search script

How many different files/formats do you have? Is it only the 3 you've posted?

I haven't worked up any code, but offhand I'd probably use a dispatch table where the keys are the filenames and the value of each is a reference to a subroutine that handles that file's format (i.e., the field separator and the wanted fields).
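
A dispatch table of that shape might look like the sketch below. It is an illustration under assumptions, not tested against the real data: the field positions come from the posted samples, and %extract_for and search_file are hypothetical names.

```perl
use strict;
use warnings;

# Dispatch table: file name => code ref that returns the candidate name(s)
# found on one line of that file. Field positions are taken from the samples.
my %extract_for = (
    'exportOBJ-sark.txt' => sub {
        my ($host, $domain) = (split /,/, $_[0])[1, 4];
        return () unless defined $host && defined $domain;
        return ("$host.$domain");
    },
    'ResourceRecText-ALL.txt' => sub {
        my (undef, $name) = split /=/, $_[0], 2;
        return () unless defined $name;
        return ($name);
    },
    'exportdnsrr-report.txt' => sub {
        my $name = (split /,/, $_[0])[1];
        return () unless defined $name;
        return ($name);
    },
);

# Hypothetical driver: return every line of $file whose extracted name is a
# key of the hash referenced by $wanted.
sub search_file {
    my ($file, $wanted) = @_;
    my $extract = $extract_for{$file} or die "no extractor for '$file'";
    open my $fh, '<', $file or die "failed to open '$file': $!";
    my @hits;
    while (my $line = <$fh>) {
        chomp $line;
        for my $name ($extract->($line)) {
            push @hits, $line if exists $wanted->{$name};
        }
    }
    close $fh;
    return @hits;
}
```

A call such as search_file('exportdnsrr-report.txt', \%search_hash) would then return the matching lines. Each line costs one hash lookup per extracted name, which matters when the files run to millions of lines.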


richsark
User

May 19, 2009, 9:03 AM

Post #10 of 16 (1802 views)
Re: [FishMonger] Code help with search script

Hi FishMonger,

Only 3 file formats, but with lots of lines. What you suggested is awesome, but it's too far beyond my expertise.

Could you help?


FishMonger
Veteran / Moderator

May 20, 2009, 2:03 PM

Post #11 of 16 (1789 views)
Re: [richsark] Code help with search script

I didn't use a dispatch table (though it's very close), but see if this does what you need.


Code
#!/usr/bin/perl

use strict;
use warnings;
use Data::Dumper;

my %search_hash;
my $sark = 'sark.com.txt';

# Build the lookup hash from fields 0 and 3 of each reference record.
open my $FH, '<', $sark or die "failed to open '$sark' $!";
while (<$FH>) {
    chomp;
    my @parts = (split(/,/))[0,3];
    $search_hash{$parts[0]}++ if $parts[0];
    $search_hash{$parts[1]}++ if $parts[1];
}
close $FH;
#print Dumper \%search_hash;

# file name => [ field delimiter, [wanted field indexes], optional join string ]
my %files = ('exportOBJ-sark.txt'      => [ ',', [1,4], '.'],
             'ResourceRecText-ALL.txt' => [ '=', [1] ],
             'exportdnsrr-report.txt'  => [ ',', [1] ],
            );

foreach my $file ( sort keys %files ) {
    process_file($file, $files{$file});
}


sub process_file {
    my $file = shift;
    my ($delimiter, $fields, $join) = @{$_[0]};

    print "processing: $file\n";

    open my $FH, '<', $file or die "failed to open '$file' $!";
    while (<$FH>) {
        chomp;
        # Keep only the wanted fields; drop any that are missing on short lines.
        my @parts = grep { defined $_ } (split(/\Q$delimiter\E/))[@$fields];

        if ( $join ) {
            @parts = join($join, @parts);
        }

        #print Dumper \@parts;
        foreach my $key ( @parts ) {
            if ( exists $search_hash{$key} ) {
                # Prints the count stored for the key, not the key itself.
                print $search_hash{$key},$/;
            }
        }
    }
    close $FH;
    print "\n\n";
}



richsark
User

May 20, 2009, 6:25 PM

Post #12 of 16 (1783 views)
Re: [FishMonger] Code help with search script

Thanks FishMonger, will try it and let you know

Cheers


richsark
User

May 21, 2009, 5:14 AM

Post #13 of 16 (1776 views)
Re: [richsark] Code help with search script

Hi FishMonger,

Here is an update.

I ran the script. This is what I get in the command window:

$ ./compare.pl
processing: ResourceRecText-ALL.txt


processing: exportOBJ-SARK.txt


processing: exportdnsrr-report.txt
1

Does the 1 mean it found a match? If that's the case, I don't know which one. Is there a way to print what it found?

Thanks


FishMonger
Veteran / Moderator

May 21, 2009, 5:41 AM

Post #14 of 16 (1774 views)
Re: [richsark] Code help with search script

The '1' means a match was found: it is the value stored in %search_hash for the key that matched, not the key itself.

Uncomment the print Dumper line(s) to see the details.


richsark
User

May 21, 2009, 6:59 AM

Post #15 of 16 (1771 views)
Re: [FishMonger] Code help with search script

Yup, I did that, now I get:

$ ./compare.pl
$VAR1 = {
'A' => 3,
'sarldns1.rservices.com.' => 1,
'nyss1.rservices.com.' => 3,
'session.rservices.com.' => 1
};
processing: ResourceRecText-ALL.txt


processing: exportOBJ-SARK.txt


processing: exportdnsrr-report.txt
1

But I still don't know what was matched under "processing: exportdnsrr-report.txt".


FishMonger
Veteran / Moderator

May 21, 2009, 7:09 AM

Post #16 of 16 (1768 views)
Re: [richsark] Code help with search script

Uncomment the other print Dumper line.

If you want to see the full line, then add print $_; just before the chomp statement in the subroutine. Note that this prints every line as it is read, not only the matching ones.

It's very possible that you may need to adjust which fields it's selecting.
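
To report what was matched rather than a bare count, the print inside the matching branch could be swapped for something like the sketch below. format_match is a hypothetical helper, and the sample line passed to it here is made up purely for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: describe one match as "file:line number: key and line".
sub format_match {
    my ($file, $line_no, $key, $line) = @_;
    return "${file}:${line_no}: matched '${key}' in: ${line}\n";
}

# Illustrative call with made-up values.
print format_match('exportdnsrr-report.txt', 2,
                   'session.rservices.com.', 'example line text');
```

Inside the subroutine's loop, the call would be print format_match($file, $., $key, $_); where $. is Perl's built-in line number for the filehandle most recently read.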

 
 

