Reg ex script

 



esludds
New User

Dec 4, 2013, 6:03 AM

Post #1 of 7 (1758 views)
Reg ex script

I've created a script to write all the IP addresses found in one file to another file. My problem is that it jumps straight to the line "No ip addresses found" and prints that to the file instead of the IP addresses. Can anyone help me identify where I am going wrong? The file being read is Assignment2.rtf and the results are written to ipaddresses.txt.


Code
#!/usr/bin/perl 

open(INFILE, "<<Assignment2.rtf") || die("file not found");
#chooses the file to read
open(OUT, ">ipadresses.txt");
#prints file
$none = "No ip addresses found!";
$line = <IN>;

for ($line)
{
if ($line =~ m/\d{1,2}.\d{1,3}.\d{1,3}.\d{1,3}/)
{
print (OUT $line);
}
else
{
print (OUT $none);
}
}

close(IN);
close(OUT);



FishMonger
Veteran / Moderator

Dec 4, 2013, 6:17 AM

Post #2 of 7 (1755 views)
Re: [esludds] Reg ex script

Start by adding these 2 lines before the open statement and pay attention to the problems that they point out.


Code
use strict; 
use warnings;
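
For reference, here is a small sketch of the kind of diagnostics those two lines produce for the original script. The string eval and the __WARN__ handler are only a way to capture the messages inside one runnable file; they are assumptions of this example, not something the original script needs.

```perl
use strict;
use warnings;

# Under strict, assigning to an undeclared variable is a compile-time error.
# A string eval lets us capture that error instead of aborting the program.
my $compiled = eval q{ $none = "No ip addresses found!"; 1 };
my $strict_error = $compiled ? '' : $@;
print "strict says:   $strict_error";

# Under warnings, reading from a filehandle that was never opened is reported.
my @warnings;
{
    local $SIG{__WARN__} = sub { push @warnings, $_[0] };
    my $line = <IN>;    # the script opened INFILE, not IN
}
print "warnings says: $_" for @warnings;
```

Declaring the variables with my and reading from the handle that was actually opened makes both messages go away.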



(This post was edited by FishMonger on Dec 4, 2013, 6:18 AM)


Laurent_R
Veteran / Moderator

Dec 4, 2013, 8:54 AM

Post #3 of 7 (1745 views)
Re: [esludds] Reg ex script


In Reply To

Code
$line = <IN>;



If anything, this line of code will only read the first line of your input file; no other line will ever be read. This part has to go into a loop that reads the whole file.
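
To illustrate the difference, here is a minimal sketch that reads the same data once with a single scalar assignment and once with a while loop. The sample text and the in-memory filehandle are assumptions made so the example runs without any file on disk.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the real input file.
my $text = "first 10.0.0.1\nsecond\nthird 192.168.1.1\n";

# Scalar assignment: reads exactly one line, then stops.
open my $fh, '<', \$text or die "cannot open in-memory file: $!";
my $line = <$fh>;
my @seen_once = ($line);
close $fh;

# while loop: the read is repeated on every pass until end of file.
open $fh, '<', \$text or die "cannot open in-memory file: $!";
my @seen_all;
while (my $next = <$fh>) {
    push @seen_all, $next;
}
close $fh;

printf "single read saw %d line(s), while loop saw %d line(s)\n",
    scalar @seen_once, scalar @seen_all;
```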


esludds
New User

Dec 4, 2013, 9:11 AM

Post #4 of 7 (1744 views)
Re: [Laurent_R] Reg ex script

How would I put it into a loop? A for loop or a while loop? Thanks for your reply.


BillKSmith
Veteran

Dec 4, 2013, 10:04 AM

Post #5 of 7 (1732 views)
Re: [esludds] Reg ex script

The simplest way to add this feature to your code is to replace

Code
$line = <IN>;  

for ($line)
{


with:

Code
while (my $line = <IN>)
{


Note: The 'my' becomes necessary when you adopt FishMonger's suggestion to use strict.

On an unrelated topic, your regex will match valid IP addresses, but it can make many false matches because a period is a metacharacter in a regex. You must escape the periods with a backslash when you mean for them to match only themselves.
Good Luck,
Bill


FishMonger
Veteran / Moderator

Dec 4, 2013, 10:18 AM

Post #6 of 7 (1731 views)
Re: [esludds] Reg ex script


Quote

Code
open(INFILE, "<<Assignment2.rtf") || die("file not found");  
...
...
$line = <IN>;


Those are not the same filehandle.

You should be using a lexical variable for the filehandle and the 3-arg form of open.

The die statement should include the filename and the reason it failed.

Look at the incorrect mode you specified.

You should be using a while loop, not a for loop.

This is how you should do it.

Code
my $input_file = 'Assignment2.rtf';
open my $in_fh, '<', $input_file or die "failed to open '$input_file' due to: $!";

while (my $line = <$in_fh>) {
    chomp $line;
    # test $line against your regex and print the matches here
}
close $in_fh;
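
Putting these points together, a complete version might look like the sketch below. The sub name extract_ips and the idea of writing "No ip addresses found!" only once, after the whole file has been read, are assumptions about what the original script was meant to do. The regex is kept deliberately simple, with the dots escaped.

```perl
use strict;
use warnings;

# Copy every line that contains something shaped like an IP address from
# $input_file to $output_file; returns the number of lines written.
sub extract_ips {
    my ($input_file, $output_file) = @_;

    open my $in_fh, '<', $input_file
        or die "failed to open '$input_file' due to: $!";
    open my $out_fh, '>', $output_file
        or die "failed to open '$output_file' due to: $!";

    my $found = 0;
    while (my $line = <$in_fh>) {
        if ($line =~ /\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/) {
            print {$out_fh} $line;
            $found++;
        }
    }

    # Report the absence of matches once, after the whole file has been read.
    print {$out_fh} "No ip addresses found!\n" unless $found;

    close $in_fh;
    close $out_fh or die "failed to close '$output_file' due to: $!";
    return $found;
}

# Run only when both file names are given on the command line.
if (@ARGV == 2) {
    my $count = extract_ips(@ARGV);
    print "wrote $count matching line(s) to $ARGV[1]\n";
}
```

Saved under a hypothetical name such as extract_ips.pl, it would be run as: perl extract_ips.pl Assignment2.rtf ipaddresses.txt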



Kenosis
User

Dec 4, 2013, 11:15 AM

Post #7 of 7 (1723 views)
Re: [esludds] Reg ex script

Your IP-matching regex needs a little attention. For example:

Code
use warnings; 
use strict;

while (<DATA>) {
if (m/\d{1,2}.\d{1,3}.\d{1,3}.\d{1,3}/) {
print "Yes: $_";
}
else {
print "No : $_";
}
}

__DATA__
127.0.0.1
9995.333.777.000
255.255.255.0
abc.def.ghi.jkl
10_376_14.0
1234567890


Output:

Code
Yes: 127.0.0.1 
Yes: 9995.333.777.000
Yes: 255.255.255.0
No : abc.def.ghi.jkl
Yes: 10_376_14.0
Yes: 1234567890


Note that it's matching too much. Also, the leading m is optional when the regex is delimited by forward slashes (it is only required with other delimiters, such as m{...}), so it can be removed:


Code
/\d{1,2}.\d{1,3}.\d{1,3}.\d{1,3}/


Didn't you mean \d{1,3} for the first octet, like all the rest? Also, the period in your regex isn't escaped as \., so it will match any character except a newline. Making these changes, you get:

Code
/\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/

which, used in the script above, gives this output:

Code
Yes: 127.0.0.1 
Yes: 9995.333.777.000
Yes: 255.255.255.0
No : abc.def.ghi.jkl
No : 10_376_14.0
No : 1234567890

This is better, but you don't want 9995.333.777.000. You want to match an IP that sits between word boundaries, and a word boundary is represented by \b. So, we now have:

Code
/\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b/

with an output of:

Code
Yes: 127.0.0.1 
No : 9995.333.777.000
Yes: 255.255.255.0
No : abc.def.ghi.jkl
No : 10_376_14.0
No : 1234567890

For this sample data, this matches only what we want matched.

Now, you can go one step further, using the same quantifier notation as in \d{1,3}. Notice that IP addresses have a repeating pattern: (nnn.) x 3 (assuming up to three digits in each octet), followed by nnn as the last octet. You can represent that in a regex as follows:

Code
/\b(?:\d{1,3}\.){3}\d{1,3}\b/

And this generates the same output as the last regex, but it's just a bit shorter.
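
One caveat that the sample data does not exercise: \d{1,3} still accepts octets larger than 255, so 999.999.999.999 would be reported as a match. A possible refinement, sketched here on the assumption that you want the addresses themselves rather than whole lines, is to capture each candidate and check the octet range in plain Perl. The sub name find_ipv4 is hypothetical.

```perl
use strict;
use warnings;

# Return every dotted quad in $text whose four octets are all in 0..255.
sub find_ipv4 {
    my ($text) = @_;
    my @found;
    # Same pattern as above, with a capture group around the whole address;
    # /g in a while loop walks through every match in the string.
    while ($text =~ /\b((?:\d{1,3}\.){3}\d{1,3})\b/g) {
        my $candidate = $1;
        my @octets = split /\./, $candidate;
        push @found, $candidate unless grep { $_ > 255 } @octets;
    }
    return @found;
}

print "$_\n" for find_ipv4("gw 192.168.1.1, bad 999.1.1.1, dns 8.8.8.8");
```

Doing the range check in code keeps the regex readable; an all-regex version is possible but considerably longer.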

Hope this helps!


(This post was edited by Kenosis on Dec 4, 2013, 11:17 AM)

 
 

