CGI/Perl Guide
Home: Perl Programming Help: Beginner:
Search for text within a line of text

 



mike3point0
Novice

May 30, 2013, 11:43 AM

Post #1 of 22 (1174 views)
Search for text within a line of text

For some reason, my comparison (the if statement) is not working in my code. I have two lists that I read into arrays, and then I compare the values to see if a string of text from one file is found in any of the lines of text in the second file:



$filenameUNIX = "Unixlist";
$filenameNON  = "Nonprod";
$filenameITM  = "InITM";

open (IN, "$filenameUNIX") || die "Problems: $!";
open (NON, "$filenameNON") || die "Problems: $!";
open (OUT, ">>$filenameITM") or die "Problems writing to file: $!";

@data1contents = <IN>;     # place contents of UNIX list file in an array
@data2contents = <NON>;    # place contents of Nonprod list file in an array

my $finalcount = 0;
my $i = 0;
while ( $i <= $#data1contents ) {
    my $p = 0;
    my $InITMCounter = 0;
    while ( $p <= $#data2contents ) {
        my $i = 0;
        if ( $data2contents[$p] =~ /$data1contents[$i]/ ) {
            print $OUT "$data1contents[$i] Yes";
            $InITMCounter++;
            $p++;
        }
        else {
            $p++;
        }
    }
    my $finalcount = $finalcount + $InITMCounter;

    $i++;
}
# End of main while loop
# End of main while loop



Any suggestions are appreciated.

Mike-



















Laurent_R
Veteran / Moderator

May 30, 2013, 2:50 PM

Post #2 of 22 (1163 views)
Re: [mike3point0] Search for text within a line of text

A number of things are wrong, or could go wrong, there.

1. As a general rule, chomp the lines that you read from a file. Here, you could actually simply chomp the @data1contents and @data2contents arrays. Sometimes you don't really need to do it, but just do it until you know better.

2. You use /$data1contents[$i]/ as a regex; please show us the content of the Unixlist file it comes from. It may have to be sanitized before it is used as a regex pattern (which could be done with the quotemeta function), because it may contain characters that have a special meaning in regex matching.

3. This code:


Code
my $i = 0;
if ( $data2contents[$p] =~ /$data1contents[$i]/ ) {

is almost certainly wrong: your main loop is iterating over the subscripts of @data1contents as $i, presumably to visit each record in that array, but since you declare a new $i inside the inner loop, always equal to 0, it hides the outer one, so you always compare $data2contents[$p] with the first element of @data1contents, which is certainly not your requirement.

4. Much less important: you don't seem to need two variables, $finalcount and $InITMCounter; incrementing just one throughout should be sufficient.
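To make points 1 and 2 concrete, here is a small self-contained sketch (the record values are hypothetical, modelled on host names that appear later in this thread). It assumes the lines arrive with their trailing newline, as they do from <IN> and <NON>. Without chomp, the newline becomes part of the pattern; and `\Q...\E`, the interpolation form of quotemeta, makes a metacharacter such as `.` match only itself.

```perl
use strict;
use warnings;

# True (1) if $needle occurs literally in $haystack; \Q...\E applies quotemeta.
sub contains_literal {
    my ($haystack, $needle) = @_;
    return $haystack =~ /\Q$needle\E/ ? 1 : 0;
}

# Hypothetical records as read from the two files, trailing newline included.
my $server = "ga016d77\n";
my $line   = "ga016d77:KUX\n";

# Unchomped: the pattern is "ga016d77\n", which does not occur in "ga016d77:KUX\n".
my $raw_match = $line =~ /$server/ ? 1 : 0;

# Chomped: the pattern is "ga016d77", which does occur.
chomp(my $clean_server = $server);
my $chomped_match = $line =~ /$clean_server/ ? 1 : 0;

# Unescaped, "." matches any character, so "a.b" also matches "axb".
my $loose  = "axb" =~ /a.b/ ? 1 : 0;
my $strict = contains_literal("axb", "a.b");

print "raw: $raw_match, chomped: $chomped_match\n";   # raw: 0, chomped: 1
print "unescaped: $loose, quotemeta: $strict\n";      # unescaped: 1, quotemeta: 0
```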

As an additional comment, there are much better ways to loop over the elements of an array in Perl, and using them would help you avoid the bug reported in point #3 above.

Consider trying something like this:


Code
chomp @data1contents;
chomp @data2contents;
foreach my $data1 (@data1contents) {
    foreach my $data2 (@data2contents) {
        if ($data2 =~ $data1) {
            # ...
        }
    }
}


This is cleaner, easier to understand and less error prone.

Final point: please put your code in code tags; that preserves the formatting and makes your code easier to read.

And a final final note: there may be other errors in your code that I haven't seen, but we cannot run your code without sample data, so please provide samples of your two files if you need more help. This would be useful anyway to see whether anything needs to be done as per my point #2 above. And with no data, I can neither test your code nor test the code that I offer as a replacement. (Although I am a very great developer, I think that testing my code is not always superfluous ;-). I don't know about you.)


mike3point0
Novice

May 30, 2013, 3:10 PM

Post #3 of 22 (1157 views)
Re: [Laurent_R] Search for text within a line of text

Well, I definitely wouldn't call myself a great developer, but I thank you very much for providing these great insights...

I had tried the foreach in a prior version of this script, but I was getting the same results... Guess that extra 'my $i = 0;' might have had something to do with it...

I thought it might have been because of colons in the 2nd file that I was dumping into @data2contents..

Going to give this a go again, and in the next post, I'll include some samples from the text files if I'm still running into trouble...

Thank you again.

Mike-


Laurent_R
Veteran / Moderator

May 30, 2013, 3:32 PM

Post #4 of 22 (1151 views)
Re: [mike3point0] Search for text within a line of text


In Reply To
Well, I definitely wouldn't call myself a great developer,


Neither would I, of course. I hope it is clear that this was a joke, just there to stress the need for data to run tests.


(This post was edited by Laurent_R on May 30, 2013, 3:33 PM)


mike3point0
Novice

May 30, 2013, 3:45 PM

Post #5 of 22 (1146 views)
Re: [Laurent_R] Search for text within a line of text

No worries. I welcome all suggestions to help me become better at this... P.S. Haven't learned how to use the 'code' function on this site to make my code more presentable, but hopefully, this is good enough.

Okay, here is what I did: I stripped the extra data (colons, etc.) from the source files. So Unixlist now has entries like this:

ga016d4e
ga016d63
ga016d77
ga016d98
ga016a395

Then 'Nonprod' has entries like this:

NC006DEVA0CA
NC006DEVD01E
NC006T00D
NC006VDEVW009
nc006idem001
nc006item001
nc006qaa0f1
nc006t054
nc006t055

I incorporated the changes you suggested in my code:

chomp @data1contents;
chomp @data2contents;

foreach my $data1 (@data1contents) {
    foreach my $data2 (@data2contents) {
        if ( $data2 =~ /$data1/i ) {
            print OUT "$data1 is in $data2\n";
        }
    }
}


And that worked. The results are now looking like this:

ga016d77 is in ga016d77
ga016d98 is in ga016d98
nc006deva0a9 is in nc006deva0a9
nc006deva0aa is in nc006deva0aa
nc006qad0a4 is in nc006qad0a4
nc006qad0a5 is in nc006qad0a5
nc006d0aa is in nc006d0aa
nc006d0ab is in nc006d0ab




So things appear to be working...

Many thanx for your suggestion..



Mike-






recruiter
User

May 30, 2013, 3:55 PM

Post #6 of 22 (1142 views)
Re: [mike3point0] Search for text within a line of text


Quote
I have two lists that I read into an array, and then compare the values to see if a string of text in one file is found in any of the lines of text in the second file



I am just curious, from your statement and with the code you just posted, how are you comparing/matching text?


mike3point0
Novice

May 30, 2013, 4:02 PM

Post #7 of 22 (1138 views)
Re: [hwnd] Search for text within a line of text

Hello, simply put-



Using a foreach for each array, the if statement asks: do I find the string from Unixlist on any line in Nonprod? If I do, then say I found a match; if not, go on to the next entry in Unixlist and do the same thing again, i.e. compare the next Unixlist entry with every line in Nonprod.

The outer foreach cycles through the Unixlist file. The inner foreach goes through every line in Nonprod and uses the if for my comparison. I use the regex style because I know that each line in Nonprod can have additional characters separated by colons. But I stripped that out of each file.

Hope this helps.

Mike-


FishMonger
Veteran / Moderator

May 30, 2013, 6:33 PM

Post #8 of 22 (1128 views)
Re: [mike3point0] Search for text within a line of text


Quote
I use the regex style because I know that each line in Nonprod can have additional characters separated by colons. But I stripped that out of each file.


Did you strip out that extra data only for testing purposes, or also out of the actual source file(s) that you need to compare?

In either case, using nested foreach loops is very inefficient. The approach I'd take would be to parse one of the files and load the desired data into a hash. Then loop over the second file parsing the line as needed and do a simple hash lookup and output the data if the lookup is successful.

Using that approach means that you only need to loop over each file once instead of looping over the full second file for each line in the first file.


mike3point0
Novice

May 31, 2013, 3:41 AM

Post #9 of 22 (1112 views)
Re: [FishMonger] Search for text within a line of text

@Fishmonger - Yes, that's why I stripped out the 'added' data for each line in Nonprod. But I know that in the future the extraneous data could be there. In some of the lines there's a suffix of ':KUX', so a line could be 'ga016d400:KUX', and it could even be 'blahblahblah:ga016D400:KYNA' and so on, which again is why I thought I should use a regex in the comparison.

I was thinking about it this morning: I would probably have to add a subroutine or function to do this before I actually loop through the data to find the pattern matches.



Mike-


FishMonger
Veteran / Moderator

May 31, 2013, 6:13 AM

Post #10 of 22 (1105 views)
Re: [mike3point0] Search for text within a line of text

If you post a reasonable and accurate sample of both files which include some of the problem lines, I will see if I can work up a possible solution. Without that info, I won't because it would end up being a waste of time.


Laurent_R
Veteran / Moderator

May 31, 2013, 6:13 AM

Post #11 of 22 (1104 views)
Re: [mike3point0] Search for text within a line of text

Using a hash as suggested by Fishmonger is of course far more efficient, provided however that the fields in the files are equal. If it is not the case (the fact that you use a regex tells me it is probably not the case), then the two nested loops approach is the right solution.


(This post was edited by Laurent_R on May 31, 2013, 6:33 AM)
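The condition about equal fields can be seen in a few lines: `exists` only succeeds for a byte-for-byte identical key, so a whole Nonprod line never finds the bare host name, while a substring test (here with index) does, at the cost of trying every server. A minimal sketch with hypothetical values:

```perl
use strict;
use warnings;

# Hypothetical server list, stored as hash keys for exact lookups.
my %server = map { $_ => 1 } qw(ga016d77 ga016d98);

my $line = 'ga016d77:KUX';

# The whole line is not a key, so an exact lookup fails.
my $whole_line_hit = exists $server{$line} ? 1 : 0;

# Once the right field is extracted, the exact lookup succeeds.
my ($field) = split /:/, $line;
my $field_hit = exists $server{$field} ? 1 : 0;

# A substring test needs no field extraction, but must try every server.
my $substring_hit = ( grep { index($line, $_) >= 0 } keys %server ) ? 1 : 0;

print "whole line: $whole_line_hit, field: $field_hit, substring: $substring_hit\n";
# whole line: 0, field: 1, substring: 1
```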


FishMonger
Veteran / Moderator

May 31, 2013, 7:20 AM

Post #12 of 22 (1094 views)
Re: [Laurent_R] Search for text within a line of text


In Reply To
Using a hash as suggested by Fishmonger is of course far more efficient, provided however that the fields in the files are equal. If it is not the case (the fact that you use a regex tells me it is probably not the case), then the two nested loops approach is the right solution.


I don't necessarily agree that nested loops would be the right solution. Here's a short example of a non-nested loop using the contrived sample data.


Code
use 5.10.0;
use strict;
use warnings;

my %fields = map { chomp; $_ => 1 } <DATA>;

foreach my $str ('ga016d400:KUX', 'blahblahblah:ga016D400:KYNA') {
    my $wanted = (split /:/, $str)[-2];

    if ( exists $fields{$wanted} ) {
        say "$wanted is in $str";
    }
}

__DATA__
ga016d4e
ga016d63
ga016d77
ga016d98
ga016a395
ga016d400
ga016D400


Outputs:
ga016d400 is in ga016d400:KUX
ga016D400 is in blahblahblah:ga016D400:KYNA

However, the format of the actual data probably does not conform to that sample, so this test case would need to be adjusted accordingly.


(This post was edited by FishMonger on May 31, 2013, 7:20 AM)


mike3point0
Novice

May 31, 2013, 9:56 AM

Post #13 of 22 (1086 views)
Re: [FishMonger] Search for text within a line of text

Hello, here are some lines from the original file:



nc006a172
nc006a22e
nc006a22f
nc006a230
nc006a231
nc006d045
nc006t03c
nc006t040
nc006t042
lxd2101
nc006a0b1
nc006a0b2
nc006a0c5



And here are some lines from the list to compare the first list with in order to find a match:

Name
#10.5.70.164::RCACFG
#10.5.74.57::RCACFG
#10.5.76.43::RCACFG
10_7_213_152:nc006a192:KYNA
10_7_60_164:nc006deva057:KYNA
10_7_92_48:nc006tva123:KYNA
10_7_92_56:nc006deva033:KYNA
10_7_92_57:nc006deva034:KYNA
10_7_92_70:nc006vdeva09c:KYNA
Custom01_d_#1:ga016vdeva003:KYNS
Custom01_d_ase:nc006deva054:KYNS
Custom01_d_e_1:nc006deva08c:KYNS
Custom01_d_e_1:nc006deva08d:KYNS
Custom01_d_ei0:nc006deva08c:KYNS
Custom01_d_ei0:nc006deva08d:KYNS
nc006deva001:KUX
nc006deva002:KUL
nc006deva002:KUX
nc006deva002:PX
nc006deva003:KUL
nc006deva003:KUX



Hope this helps... Again, I was looking to take the servers in the first list and see if I find a match on any of the lines in the 2nd list..



Mike-


FishMonger
Veteran / Moderator

May 31, 2013, 10:40 AM

Post #14 of 22 (1081 views)
Re: [mike3point0] Search for text within a line of text

I assume the server (hostname) in Nonprod will always be in the next to last colon separated field and that the lines beginning with # are to be skipped.

Given those assumptions the approach in the test script I posted will do what you need once the line to skip the "comments" is added.

Here's the updated version.


Code
#!/usr/bin/perl

use 5.10.0;
use strict;
use warnings;

my $server_lst = 'Unixlist';
my $non_prod   = 'Nonprod';
my $itm_found  = 'InITM';

open my $srv_fh, '<', $server_lst or die "failed to open '$server_lst' $!";
open my $non_fh, '<', $non_prod or die "failed to open '$non_prod' $!";
open my $itm_fh, '>>', $itm_found or die "failed to open '$itm_found' $!";

# Load the server names into a hash for fast lookups.
my %server = map { chomp; $_ => 1 } <$srv_fh>;
close $srv_fh;

while ( my $str = <$non_fh> ) {
    chomp $str;    # drop the newline so say() doesn't add a blank line after each match

    next if $str =~ /^#/;

    my $hostname = (split /:/, $str)[-2];

    next unless $hostname;    # sanity check to make sure the hostname field wasn't empty

    if ( exists $server{$hostname} ) {
        say {$itm_fh} "$hostname was found on line $. => $str";
    }
}

close $non_fh;
close $itm_fh;



Laurent_R
Veteran / Moderator

May 31, 2013, 10:41 AM

Post #15 of 22 (1080 views)
Re: [FishMonger] Search for text within a line of text


In Reply To

In Reply To
Using a hash as suggested by Fishmonger is of course far more efficient, provided however that the fields in the files are equal. If it is not the case (the fact that you use a regex tells me it is probably not the case), then the two nested loops approach is the right solution.


I don't necessarily agree that nested loops would be the right solution. Here's a short example of a non-nested loop using the contrived sample data.

(...)


OK, agreed, if we know enough about the data to be able to locate exactly where in the string to look for a possible match, then the hash solution is more performant than nested loops.

It is actually one of the reasons why I asked the OP for data samples.

Looking at the data now provided by the OP, I think it looks probably too messy for a hash solution (unless you can make sense out of it and be sure that no unexpected case will occur).

Since I cannot be sure of where in the string to look in a hash for a possible match, I would probably go for two nested loops, but using the index function will probably be more efficient than a regex.
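A sketch of the nested loops with index, assuming a case-insensitive match is wanted (Mike's loop uses `/i`). index returns the position of a literal substring, or -1 if it is absent, so no quotemeta is needed; both sides are lowercased with lc first. The sub name `find_matches` is hypothetical, and the sample lines are adapted from data posted in this thread, with one upper-cased on purpose.

```perl
use strict;
use warnings;

# For each server, report the first line that contains it as a literal,
# case-insensitive substring. index() returns -1 when there is no match.
sub find_matches {
    my ($servers, $lines) = @_;
    my @hits;
    foreach my $server (@$servers) {
        my $needle = lc $server;
        foreach my $line (@$lines) {
            if ( index(lc($line), $needle) >= 0 ) {
                push @hits, "$server => $line";
                last;    # stop at the first hit; remove to report every line
            }
        }
    }
    return @hits;
}

my @servers = qw(nc006a192 nc006deva033 lxd2101);
my @lines   = (
    '10_7_213_152:nc006a192:KYNA',
    '10_7_92_56:NC006DEVA033:KYNA',    # upper-cased on purpose
    'nc006deva001:KUX',
);

print "$_\n" for find_matches(\@servers, \@lines);
```

One caveat of any substring search: a short name matches inside a longer one, so a Unixlist entry nc006a23 would also be "found" on a line containing nc006a230. That is one reason a field-based lookup is safer when the field can be located.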


mike3point0
Novice

Jun 1, 2013, 5:10 PM

Post #16 of 22 (1041 views)
Re: [Laurent_R] Search for text within a line of text

Laurent_R-

The problem to note is that the lines are not guaranteed to have the same number of fields. Some lines may have two fields, some may have three, and depending on the server app I'm getting the data from, there could be more. That is why I originally thought I could keep it simple and employ a short regex to just look for the specific text on each line of the second file (Nonprod).


Mike-


(This post was edited by mike3point0 on Jun 1, 2013, 5:19 PM)
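If the host name can sit in any colon-separated field, one way to keep the single-pass hash lookup is to test every field of a line rather than one fixed position. This is a minimal sketch, assuming host names never contain a colon, lines are already chomped, and case should be ignored; `hosts_in_line` and the sample data are hypothetical.

```perl
use strict;
use warnings;

# Returns every colon-separated field of $line (lowercased) that is a key of %$servers.
# Assumes $line is chomped, host names contain no colon, and %$servers has lowercase keys.
sub hosts_in_line {
    my ($servers, $line) = @_;
    return grep { exists $servers->{$_} } map { lc($_) } split /:/, $line;
}

# Hypothetical server list; keys are lowercased so the match ignores case.
my %server = map { lc($_) => 1 } qw(ga016d400 nc006deva001);

my @lines = (
    'ga016d400:KUX',
    'blahblahblah:ga016D400:KYNA',
    'nc006deva001:KUX',
    '#10.5.70.164::RCACFG',
    'Name',
);

foreach my $line (@lines) {
    next if $line =~ /^#/;    # skip the "#" lines, as assumed earlier in the thread
    my @found = hosts_in_line(\%server, $line);
    print "@found found in $line\n" if @found;
}
```

Each line is still visited once, and each field costs one hash lookup, so the position of the host name no longer matters.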


mike3point0
Novice

Jun 1, 2013, 5:18 PM

Post #17 of 22 (1039 views)
Re: [FishMonger] Search for text within a line of text

Fishmonger-



Many thanks for the updated script. As I read through it, I wanted to make sure I'm on the right track here...

With a cursory glance, it looks like you are splitting each line into fields with ':' as the delimiter, noting whether the field has data in it or not. My only note here is your assumption about the hostname always being the 2nd field in the NonProd file. With this list, you can't necessarily do that, but now you've got me thinking about getting more specific and just looking at the suffix, because that actually is important to me. I could approach it like this: "Look at each line in the 2nd (NonProd) file, and if it has :KUL or :KUX in the line, search that line and discard all of the others, because the data (or field) before the :KUL/:KUX is in fact the hostname that I am looking for."



Did I get that right... (be gentle... LOL)...



Mike-


(This post was edited by mike3point0 on Jun 1, 2013, 5:19 PM)
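The suffix idea in the post above can be sketched directly: keep only lines whose last field is KUL or KUX and take the field just before it as the host name. The KUL/KUX rule is an assumption taken from that description, not something checked against the real agent list; `hostname_from_line` and the sample data are hypothetical.

```perl
use strict;
use warnings;

# Returns the host name (the field just before a trailing KUL or KUX field),
# or undef for any other line. The KUL/KUX rule is an assumption.
sub hostname_from_line {
    my ($line) = @_;
    chomp $line;
    my @fields = split /:/, $line;
    return undef if @fields < 2;
    return undef unless $fields[-1] eq 'KUL' || $fields[-1] eq 'KUX';
    return $fields[-2];
}

# Hypothetical server list and Nonprod lines.
my %server = map { $_ => 1 } qw(nc006deva001 nc006deva003);

foreach my $line ('nc006deva001:KUX', 'nc006deva002:PX', 'nc006deva003:KUL',
                  '10_7_92_56:nc006deva033:KYNA', 'Name') {
    my $host = hostname_from_line($line);
    next unless defined $host && exists $server{$host};
    print "$host was found => $line\n";
}
```

Note that every match in the successful output posted at the end of this thread comes from a :KYNA line, so a KUL/KUX-only filter would skip those; the accepted suffixes would have to reflect whichever agent types matter.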


Laurent_R
Veteran / Moderator

Jun 2, 2013, 2:33 AM

Post #18 of 22 (1016 views)
Re: [mike3point0] Search for text within a line of text

Hi Mike,

We don't know your data; hopefully you do.

If you can locate with certainty where in the string a match may occur, then the hash solution is better. If you can't be sure to locate that, then you probably need to stay with the nested loops and regex matching.


FishMonger
Veteran / Moderator

Jun 2, 2013, 6:22 AM

Post #19 of 22 (1006 views)
Re: [mike3point0] Search for text within a line of text


Quote
My only note here is your assumption about the hostname always being the 2nd field in the NonProd file.

That is NOT what I meant.


Quote
because the data (or field) before the :KUL/:KUX is in fact the hostname

That IS what I meant and is what my code looks for i.e., the field right before the last field.

Did you try the script and does it do what you need?


(This post was edited by FishMonger on Jun 2, 2013, 6:23 AM)


mike3point0
Novice

Jun 2, 2013, 5:09 PM

Post #20 of 22 (990 views)
Re: [FishMonger] Search for text within a line of text

@Fishmonger- My apologies for what I stated based on your first comment. You introduced a couple of things I hadn't seen before (e.g. map), so I was just trying to see if I could read it correctly. I'm going to try the new script tonight, stay tuned...

Mike-
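For anyone else meeting `map` for the first time: it evaluates its block once per list element, with `$_` aliased to that element, and returns all the results as one list. Returning a `key => 1` pair per element and assigning that list to a hash builds a lookup table, which is what the `%server` line in the script does (its `chomp;` also strips each line's newline first, since `$_` is aliased). The two forms below build the same hash; the host names are just sample values.

```perl
use strict;
use warnings;

my @hosts = qw(nc006a172 nc006a22e nc006a22f);

# map runs the block once per element (with $_ aliased to it) and
# returns the combined list (nc006a172, 1, nc006a22e, 1, ...),
# which the assignment turns into a hash.
my %by_map = map { $_ => 1 } @hosts;

# The same hash built with an explicit loop.
my %by_loop;
foreach my $host (@hosts) {
    $by_loop{$host} = 1;
}

print join(', ', sort keys %by_map), "\n";   # nc006a172, nc006a22e, nc006a22f
```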


mike3point0
Novice

Jun 2, 2013, 5:16 PM

Post #21 of 22 (989 views)
Re: [Laurent_R] Search for text within a line of text

@Laurent - I know the application that I generate the list (Nonprod) from: the list represents agents that might be running on a particular server. The agents run on multiple platforms. For instance, we have agents that run on the mainframe (which is why there might be a line without a server name), agents that run on distributed platforms (which is what I am targeting), and agents that exist solely for monitoring a JVM or a database instance, so there is not necessarily a hostname in every line. That is why I was just trying to get away with a quick regex match to see if a server in my Unix list could be found...

Thank you for the suggestions.

Mike-


mike3point0
Novice

Jun 2, 2013, 5:59 PM

Post #22 of 22 (986 views)
Re: [FishMonger] Search for text within a line of text

@Fishmonger- Ok, I tested the script, and it works beautifully, no need to strip out anything in the original files. Here is some of the output:



nc006qaa073 was found on line 14 => 10_6_106_189:nc006qaa073:KYNA

nc006a23 was found on line 15 => 10_6_106_23:nc006a23:KYNA

nc006a192 was found on line 23 => 10_7_213_152:nc006a192:KYNA

nc006deva057 was found on line 24 => 10_7_60_164:nc006deva057:KYNA

nc006deva033 was found on line 26 => 10_7_92_56:nc006deva033:KYNA

nc006deva034 was found on line 27 => 10_7_92_57:nc006deva034:KYNA

nc006deva112 was found on line 33 => 10_7_93_117:nc006deva112:KYNA



Thank you again-



Mike-

 
 

