Reading contents of all files in a directory

 



relroy
Novice

Apr 15, 2010, 6:14 PM

Post #1 of 10 (1022 views)
Reading contents of all files in a directory

I have a directory with 1000 files which contain email msgs (1 file is attached as sample).
Aim is to extract the IP addresses from the received headers of the messages and extract domains from the body of the messages to give sample output as below:-
File: filename2
IP: 12.2.3.40
Domain: domain2.com
Domain: domain3.com

The current code prints the filenames but isn't extracting the IP addresses from the files; it warns about an uninitialized $_.
-----------------------------------------------------------
#!C:/strawberry/perl/bin/perl
#directory.plx

use strict;
use warnings;

chdir("C:/Documents and Settings/Administrator/Directory101") or die "$!";
opendir (DIR, ".") or die "$!";
my @files = readdir DIR;
close DIR;

local @ARGV = @files;
foreach $ARGV (sort @ARGV)
{
print "Filename: $ARGV \n";
if (m /^d(,3)\.\d(1,3)\.\d(1,3)\.\d(1,3)\$/)
{
print "IP Address: $_ \n";
}
}


Any tips would be welcome. Thank you!


(This post was edited by relroy on Apr 15, 2010, 6:27 PM)
Attachments: 1 (27.4 KB)


roolic
User

Apr 15, 2010, 8:51 PM

Post #2 of 10 (1008 views)
Re: [relroy] Reading contents of all files in a directory

1. Using @ARGV makes no sense here. Your loop variable is $ARGV, so $_ is never set inside the loop.
2. You do not read the file content.
3. The regex is incorrect, and /^ .. $/ assumes the $_ string contains only an IP.
4. There are a lot of IPs in your file sample.

Code
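foreach my $file (sort @files) {
    print "Filename: $file \n";
    open (FILE, "< $file") || die "Cannot open $file: $!";
    # slurp the whole file into one string
    my $filedata = join('',<FILE>);
    close FILE;
    # every dotted quad found anywhere in the file
    foreach ( $filedata =~ /((?:\d{1,3}\.){3}\d{1,3})/g ){
        print "IP Address: $_ \n";
    }
}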



relroy
Novice

Apr 15, 2010, 11:12 PM

Post #3 of 10 (998 views)
Re: [roolic] Reading contents of all files in a directory

Thanks so much for the quick response. We only need the IPs in the 'Received' headers of the messages. For instance, there are 2 such IP addresses in the sample:-

Received: from dev211.mell.com (HELO mell.com) (71.129.195.163)
and
Received: from unknown (HELO mta104.cheetahmail.com) (216.15.189.38)
by www.mell.com with SMTP; 12 Jan 2003 15:21:24 -0000


Also, this code below gives output as
Filename: .
Died at line 13
-----------------------------------------------------------
#!C:/strawberry/perl/bin/perl
#directory.plx

use strict;
use warnings;

chdir("C:/Documents and Settings/Administrator/Directory101") or die "$!";
opendir (DIR, ".") or die "$!";
my @files = readdir DIR;
close DIR;
foreach my $file (sort @files) {
print "Filename: $file \n";
open (FILE, "< $file") || die;
my $filedata = join('',<FILE>);
close FILE;
foreach ( $filedata =~ /((?:\d{1,3}\.){3}\d{1,3})/g ){
print "IP Address: $_ \n";
}
}

Any modifications/tips welcome. Also need to get the domain name eg "circuitcity.com" that appears in the header part of the sample file.


7stud
Enthusiast

Apr 15, 2010, 11:19 PM

Post #4 of 10 (996 views)
Re: [relroy] Reading contents of all files in a directory


Code
use strict; 
use warnings;
use 5.010;

my @words = (
'hello',
'goodbye',
);

#1
for my $word (@words) {
say $word;
}

say '-' x 20;

#2
for my $word (@words) {
say $_;
}


say '-' x 20;

#3
for (@words) {
say $_;
}

say '-' x 20;

#4
for (@words) {
say;
}


--output:--
hello
goodbye
--------------------
Use of uninitialized value $_ in say at 1perl.pl line 17.

Use of uninitialized value $_ in say at 1perl.pl line 17.

--------------------
hello
goodbye
--------------------
hello
goodbye


#1 is the preferred way. #4 is the shortest.

Your regex is completely screwed up. See what this does:


Code
use strict; 
use warnings;
use 5.010;


my $str = '12345';

if ($str =~ /\d(1,3)/) {
say 'yes';
}
else {
say 'no';
}

--output:--
no


What is the reason for that output?

Here's the deal:

1) You don't know what an array is.
2) You don't know how a for loop works.
3) You don't know how regexes work.

How about this: before trying to write a program more sophisticated than printing 'hello world' you read a beginning perl book? You will find that computer languages are not well suited to people who try to guess how to write a program.

There are no shortcuts in computer programming. You can't skip the basics and expect to be able to write programs that work. Learning the basics requires two things: time and effort. If you don't have the time or are not willing to put in the effort, then you have no chance.


(This post was edited by 7stud on Apr 15, 2010, 11:24 PM)


relroy
Novice

Apr 15, 2010, 11:24 PM

Post #5 of 10 (991 views)
Re: [7stud] Reading contents of all files in a directory

Thanks for the reply.

Yes, I have no idea what is going on there. This is for a school assignment and it's the first time I'm learning a programming language (safe to say I haven't learnt much at all), but any tips to get that snippet to work would be helpful.

Yeah, that's why I put this under the beginner part of the forum: I've just begun learning and have no clue how to proceed with this assignment ... the first two assignments he gave were much simpler.


(This post was edited by relroy on Apr 15, 2010, 11:27 PM)


7stud
Enthusiast

Apr 15, 2010, 11:41 PM

Post #6 of 10 (984 views)
Re: [relroy] Reading contents of all files in a directory

Well, you need to understand how a for-loop works, so find some beginning tutorials and read them.

I'm not sure why you are using @ARGV for an array. Are you under the impression that it is hard to create your own array? It's easy as pie:


Code
my @words = ('hello', 'goodbye');


Is it your belief that parentheses () and braces {} are the same thing and that you can just use them interchangeably? Better re-check your regex reference.
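
To make the difference concrete, here is a minimal sketch (the test strings are made up for illustration). In a regex, parentheses group and capture, while braces after an item give a repeat count:

```perl
use strict;
use warnings;

my $str = '12345';

# \d(1,3) means: one digit, followed by the literal text "1,3".
my $with_parens = ($str =~ /\d(1,3)/) ? 'yes' : 'no';

# \d{1,3} means: between one and three digits.
my $with_braces = ($str =~ /\d{1,3}/) ? 'yes' : 'no';

# The parenthesised form only matches when "1,3" really follows a digit.
my $literal = ('91,3' =~ /\d(1,3)/) ? 'yes' : 'no';

print "parens:  $with_parens\n";
print "braces:  $with_braces\n";
print "literal: $literal\n";
```

So the pattern in the original post was asking for literal commas and digits rather than "one to three digits".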

Also, search google for 'code tags'. Read the first 10 results and come back and post what you have learned.

More effort is required on your part.


(This post was edited by 7stud on Apr 15, 2010, 11:47 PM)


7stud
Enthusiast

Apr 16, 2010, 12:24 AM

Post #7 of 10 (972 views)
Re: [7stud] Reading contents of all files in a directory


Quote
Aim is to extract the IP addresses from the received headers of the messages and extract domains from the body of the messages


How will you know where the headers in the file end and the body begins?
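
One common answer, sketched below under some assumptions: in the standard mail format the header section ends at the first empty line, and a header line that starts with whitespace continues the previous one. The helper name split_message and the sample message are made up for illustration; only the Received line is taken from the sample quoted earlier in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: split a raw message into unfolded header lines and body.
sub split_message {
    my ($raw) = @_;
    # Headers end at the first empty line; everything after it is the body.
    my ($head, $body) = split /\r?\n\r?\n/, $raw, 2;
    $head = '' unless defined $head;
    $body = '' unless defined $body;
    # A line starting with whitespace continues the previous header: join them.
    $head =~ s/\r?\n[ \t]+/ /g;
    my @headers = split /\r?\n/, $head;
    return (\@headers, $body);
}

# Made-up message for illustration.
my $raw = join "\n",
    'Received: from unknown (HELO mta104.cheetahmail.com) (216.15.189.38)',
    '  by www.mell.com with SMTP; 12 Jan 2003 15:21:24 -0000',
    'Subject: sample',
    '',
    'Visit http://domain2.com today. Not a header: 10.0.0.1',
    '';

my ($headers, $body) = split_message($raw);

# Keep only Received headers, then pull the parenthesised dotted quad from each.
my @received = grep { /^Received:/i } @$headers;
my @ips = map { /\(((?:\d{1,3}\.){3}\d{1,3})\)/ ? $1 : () } @received;

print "IP: $_\n" for @ips;
```

Because the IP search only looks at the Received headers, the address in the body is ignored, and the body string is left over for the domain search.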


roolic
User

Apr 16, 2010, 2:17 AM

Post #8 of 10 (962 views)
Re: [relroy] Reading contents of all files in a directory


Code
... 
chdir("C:/Documents and Settings/Administrator/Directory101") or die "$!";
opendir (DIR, ".") or die "$!";
my @files = grep { -f $_ } readdir DIR; # plain files only; skips . and ..
closedir DIR;

foreach my $file (sort @files) {
print "Filename: $file \n";
open (FILE, "< $file") || die "Cannot open $file: $!";
# read only lines looking like
# Received: from dev211.mell.com (HELO mell.com) (71.129.195.163)
my @lines = grep( /Received:\s+from.+?\(HELO.+?\)\s+\([\d\.]+\)/, <FILE> );
close FILE;
# checking the lines one by one
foreach my $line ( @lines ){
# using regex to find the domain and IP
if( $line =~ /\(HELO\s+(\S+)\)\s+\(([\d\.]+)\)/ ){
print "IP Address: $2\nDomain: $1\n";
}
}
}


NB: the regex conditions above should be enough for the requested task. However, some garbage may appear depending on the data (for example, [\d\.]+ also accepts strings such as 1.2.3 or 999.999). In that case stricter conditions will be necessary.
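
As a rough sketch of what a stricter condition could look like, the helper below accepts a string only if it is exactly four dot-separated numbers, each no larger than 255. The name is_ipv4 is made up for illustration; the Received line is the one from the sample.

```perl
use strict;
use warnings;

# Hypothetical helper: true only for four dot-separated octets in 0..255.
sub is_ipv4 {
    my ($candidate) = @_;
    # A failed match yields an empty list, so we return 0 straight away.
    my @octets = $candidate =~ /\A(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\z/
        or return 0;
    return (grep { $_ > 255 } @octets) ? 0 : 1;
}

my $line = 'Received: from dev211.mell.com (HELO mell.com) (71.129.195.163)';

# Same loose capture as before, then a strict check on the captured text.
my ($helo, $ip) = $line =~ /\(HELO\s+(\S+)\)\s+\(([\d.]+)\)/;
if (defined $ip && is_ipv4($ip)) {
    print "IP Address: $ip\nDomain: $helo\n";
}
```

Copying the captures into $helo and $ip before calling the helper keeps them safe from the second regex match inside it.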


(This post was edited by roolic on Apr 16, 2010, 2:22 AM)


FishMonger
Veteran / Moderator

Apr 16, 2010, 6:06 AM

Post #9 of 10 (951 views)
Re: [relroy] Reading contents of all files in a directory

Why did you start a new thread for this question when 7stud and roolic are already working with you on this issue?

You've already admitted that this is your class homework assignment, which is what I suspected in the first place, and it appears that you're looking for someone to do this assignment for you.

We don't do other people's homework, but we can give guidance.

Personally I'd start out by using one or more of the modules designed for parsing emails.

Here's a head start.


Code
#!/usr/bin/perl 

use strict;
use warnings;
use MIME::Head;
use Data::Dumper;

my $email = './email.txt';
my $head = MIME::Head->new->from_file($email);
my @all_received = grep { /HELO/ } $head->get('Received');

print Dumper \@all_received;


http://search.cpan.org/search?query=mime%3A%3A&mode=all


relroy
Novice

Apr 16, 2010, 6:33 AM

Post #10 of 10 (944 views)
Re: [FishMonger] Reading contents of all files in a directory

Sorry if I violated some rules. And thanks all who replied. Thank u roolic!

-Thread closed-

 
 

