Extracting a 36 digit word (object handle)

 



umpty
Novice

Jan 23, 2003, 7:44 AM

Post #1 of 17 (14330 views)

Compared to other scripting languages, I'm most impressed with Perl and am trying to learn it as quickly as possible. However, I need your help with a project at work. I need to open each file in a directory of approximately 50,000 files and extract the first 36 digit word (actually an object handle), write the name of each file along with its 36 digit word (object handle) to a text file, close the file, and loop through the remaining files doing the same thing.

This is probably as easy as pie for some of you Perl gurus, and I relish the time when I can be labeled as such. But for now I am crawling and need your input for this project. Please help. Thanks


BackUp
Novice

Jan 23, 2003, 12:29 PM

Post #2 of 17 (14311 views)
Re: [umpty] Extracting a 36 digit word (object handle)

A 36 digit word? lol

Btw 50,000 files in one directory probably isn't good for your OS performance.


Code
my $found = {};
my $dir = '/path/to/dir';

opendir DIR, $dir or die $!;
for (grep { !/^\./ } readdir(DIR)) {
    open FILE, "$dir/$_" or die $!;
    while (<FILE>) {
        if (/\b(.{36})\b/) {
            $found{$_} = $1;
            last;
        }
    }
    close FILE;
}
close DIR;

if (scalar keys %$found) {
    open FILE $new or die $!;
    print FILE join "\n", map { "$_ => $found->{$_}" } keys %$found;
    close FILE;
}


That's my very quick go before I rush for dinner :)


BackUp
Novice

Jan 23, 2003, 12:30 PM

Post #3 of 17 (14309 views)
Re: [umpty] Extracting a 36 digit word (object handle)

Oops make sure you define $new as the file you want to write the results to. Oh and change:

for (

to...

for my $file (

...and change $_ to $file inside the for loop
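The reason for this change is that `while (<FILE>)` assigns each line it reads to `$_`, the same variable a bare `for (...)` uses for the current file name, so inside the inner loop the file name is no longer available. A minimal sketch of the effect follows; the file names and contents are made up, and in-memory filehandles stand in for real files so that it runs anywhere.

```perl
use strict;
use warnings;

# Hypothetical file contents, read through in-memory filehandles.
my %content = (
    'a.wmk' => "first line of a\nsecond line of a\n",
    'b.wmk' => "first line of b\n",
);

# Version 1: bare "for" and bare "while" both use $_.
my @names = ('a.wmk', 'b.wmk');
my %seen_key;
for (@names) {
    open my $fh, '<', \$content{$_} or die $!;
    while (<$fh>) {             # overwrites $_ with the current line
        chomp;
        $seen_key{$_} = 1;      # meant to be the file name, is actually the line
        last;
    }
    close $fh;
}

# Version 2: a named loop variable keeps the file name intact.
my @files = ('a.wmk', 'b.wmk');
my %first_line;
for my $file (@files) {
    open my $fh, '<', \$content{$file} or die $!;
    while (my $line = <$fh>) {
        chomp $line;
        $first_line{$file} = $line;
        last;
    }
    close $fh;
}

print "version 1 keys: ", join(', ', sort keys %seen_key), "\n";
print "version 2 keys: ", join(', ', sort keys %first_line), "\n";
```

In version 1 the hash ends up keyed by line text rather than by file name, and because `$_` is an alias for the current element, the entries of `@names` are overwritten as well.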




(This post was edited by BackUp on Jan 23, 2003, 12:32 PM)


umpty
Novice

Jan 23, 2003, 1:49 PM

Post #4 of 17 (14297 views)
Re: [BackUp] Extracting a 36 digit word (object handle)

Thanks for your reply: I incorporated the following code...

my $found = {};
my $dir = 'C:\Search Express\OGC Images';
opendir DIR, $dir or die $!;
for my $file(grep { !/^\./ } readdir(DIR)) {
    open FILE, "$dir/$file" or die $!;
    while (<FILE>) {
        if (/\b(.{36})\b/) {
            $found{$file} = $1;
            last;
        }
    }
    close FILE;
}
close DIR;
if (scalar keys %$found) {
    open FILE, $new or die $!;
    print FILE join "\n", map { "$_ => $found->{$_}" } keys %$found;
    close FILE;
}


and received the following error:

Name "main::new" used only once: possible typo at imagefiles.pl line 16

Name "main::found" used only once: possible typo at imagefiles.pl line 8

Here are two other caveats:

---I am only interested in files within the directory that end with a .wmk extension.

---I am using ActiveState Perl (Windows)



As stated, thanks for your reply, but if any of you have any further time to look at this, it would be immensely appreciated.

Thanks


BackUp
Novice

Jan 23, 2003, 3:00 PM

Post #5 of 17 (14296 views)
Re: [umpty] Extracting a 36 digit word (object handle)

There were a few typos. Try this:


Code
my $found = {};
my $new = 'C:\Search Express\output.file';
my $dir = 'C:\Search Express\OGC Images';

opendir DIR, $dir or die $!;
# only look at files ending in .wmk
for my $file (grep { /\.wmk$/ } readdir(DIR)) {
    open FILE, "$dir/$file" or die $!;
    while (<FILE>) {
        # remember the first 36-character match, then stop reading this file
        if (/\b(.{36})\b/) {
            $found->{$file} = $1;
            last;
        }
    }
    close FILE;
}
closedir DIR;

if (scalar keys %$found) {
    open FILE, $new or die $!;
    print FILE join "\n", map { "$_ => $found->{$_}" } keys %$found;
    close FILE;
}
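One of the typos corrected above is how the hash is addressed. `$found` is a scalar holding a hash reference, so elements are reached with `$found->{...}`. Writing `$found{...}` refers to a separate, undeclared hash `%found`, which is what the "main::found used only once" warning was pointing at. A small sketch, using the sample file name and handle quoted elsewhere in this thread:

```perl
use strict;
use warnings;

# A scalar that holds a reference to an anonymous hash.
my $found = {};

# The arrow goes through the reference to the hash it points at.
# Writing $found{'xxxxx.wmk'} here would mean the hash %found instead,
# and "use strict" would refuse to compile it because %found is undeclared.
$found->{'xxxxx.wmk'} = 'A6C0BDB4-92A9-11D1-96F8-00805FE246D4';

# %$found dereferences the whole hash, as in the "scalar keys %$found" test.
my $count = scalar keys %$found;

for my $file (sort keys %$found) {
    print "$file => $found->{$file}\n";
}
```

Putting `use strict;` and `use warnings;` at the top of the script turns this kind of slip into a compile-time error.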



umpty
Novice

Jan 25, 2003, 7:39 AM

Post #6 of 17 (14284 views)
Re: [BackUp] Extracting a 36 digit word (object handle)

I ran the script and did not see any output. Correct me if I'm wrong, but shouldn't a new text file have been produced? I can't see anything that the script did.

Thanks


Paul
Enthusiast

Jan 25, 2003, 8:16 AM

Post #7 of 17 (14282 views)
Re: [umpty] Extracting a 36 digit word (object handle)

That's what output.file is...although I just realised I made a booboo.

Change...

open FILE, $new

to

open FILE, ">$new"
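The difference is the open mode. With no mode prefix Perl opens the file for reading, which fails if the file does not exist yet; a leading `>` creates or truncates the file and opens it for writing. The sketch below shows both using the three-argument form of `open` with a lexical filehandle. The temporary directory and the line written to it are illustrative only.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch directory so the sketch does not touch real data (hypothetical path).
my $dir = tempdir(CLEANUP => 1);
my $new = "$dir/output.file";

# '<' is read mode, the default when no mode is given.
# It fails here because output.file does not exist yet.
my $read_ok = open(my $in, '<', $new) ? 1 : 0;
print "read-mode open succeeded? $read_ok\n";

# '>' creates the file (or empties an existing one) and opens it for writing.
open(my $out, '>', $new) or die "Cannot write $new: $!";
print {$out} "xxxxx.wmk => A6C0BDB4-92A9-11D1-96F8-00805FE246D4\n";
close $out or die "Cannot close $new: $!";

my $size = -s $new;   # size in bytes of the file just written
print "wrote $size bytes to $new\n";
```

Keeping the mode as its own argument also avoids surprises when a file name happens to begin with `>` or `<`.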


(This post was edited by Paul on Jan 25, 2003, 8:17 AM)


umpty
Novice

Jan 25, 2003, 9:27 AM

Post #8 of 17 (14278 views)
Re: [Paul] Extracting a 36 digit word (object handle)

I have run it, and run it. It simply does not create an output.file or output.* (anything) file.



I am sending a file with a .wmk extension. Maybe you can run this on your machine to see if it works

Thanks.
Attachments: 8200.WMK (0.61 KB)


Paul
Enthusiast

Jan 25, 2003, 9:35 AM

Post #9 of 17 (14271 views)
Re: [umpty] Extracting a 36 digit word (object handle)

If you made the change I suggested and made sure the paths were correct then it _will_ create a file.


umpty
Novice

Jan 26, 2003, 11:10 AM

Post #10 of 17 (14264 views)
Re: [Paul] Extracting a 36 digit word (object handle)

Paul, excuse me if I'm being dense; but am I to make the last change only at the 4th line from the bottom as such:

open FILE, ">$new" or die $!;
print FILE join "\n", map { "$_ => $found->{$_}" } keys %$found;
close FILE;
}

I'm still trying to get this to work. I do trust that you know what you're doing; it's just that I'm trying to overcome my shortsightedness.

This is the only place that I made the change.

Thanks.


Paul
Enthusiast

Jan 26, 2003, 1:42 PM

Post #11 of 17 (14262 views)
Re: [umpty] Extracting a 36 digit word (object handle)

What happened after you made that change?


umpty
Novice

Jan 26, 2003, 3:03 PM

Post #12 of 17 (14257 views)
Re: [Paul] Extracting a 36 digit word (object handle)

Nothing happened. I ran the script, hoping that the output file was written, but it did not happen.

Dave


Paul
Enthusiast

Jan 26, 2003, 4:38 PM

Post #13 of 17 (14252 views)
Re: [umpty] Extracting a 36 digit word (object handle)

That's not possible. If the script ran, the file has to be created. If the file wasn't created, you would have seen an error.
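One possibility that the thread never confirms: the script in post #5 only opens the output file when at least one handle was found, so if the `grep` matches no file names it finishes silently without creating anything. The attachment in post #8 is named 8200.WMK in upper case, and `/\.wmk$/` is case-sensitive. The sketch below compares the two filters on a made-up directory listing; whether this was the actual cause is an assumption.

```perl
use strict;
use warnings;

# Hypothetical directory listing; 8200.WMK is the attachment name from post #8.
my @listing = ('.', '..', '8200.WMK', 'notes.txt', 'other.wmk');

# The filter used in the thread: lower-case extension only.
my @case_sensitive = grep { /\.wmk$/ } @listing;

# The same filter with the /i modifier, which ignores case.
my @case_insensitive = grep { /\.wmk$/i } @listing;

print "case-sensitive:   @case_sensitive\n";
print "case-insensitive: @case_insensitive\n";
```

Printing a count of the files that matched before processing them is a quick way to tell "no files matched" apart from "the output file could not be written".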


umpty
Novice

Jan 29, 2003, 6:29 PM

Post #14 of 17 (14235 views)
Re: [Paul] Extracting a 36 digit word (object handle)

Paul, thanks for all of your help. It seems I have not installed Perl correctly, because I tried some simple scripts that are supposed to produce an output file. In these examples the output file is created, but the expected data is not written to it.

When installing Perl on XP, is there a module install or some other step that I am supposed to take to ensure that all works well? I simply installed it using the MSI installer.

Thanks Dave


umpty
Novice

Jan 31, 2003, 9:31 PM

Post #15 of 17 (14226 views)
Re: [Paul] Extracting a 36 digit word (object handle)

Hello Paul, I was able to output the file. It does indeed write the name of the file, however it does not get the first 36 character word. For example: I am trying to get the 1st 36 digit word; in this case I need to get (A6C0BDB4-92A9-11D1-96F8-00805FE246D4) and close that file and go to the next. So the output for this file would be:

xxxxx.wmk A6C0BDB4-92A9-11D1-96F8-00805FE246D4

This is my sample xxxxx.wmk file:

uniqueID parent class dateCreated dateMostRecentUpdate expirationDate size pageCount familyID resolutionID targetClass deleteInProgress version documentName account template createdBy mostRecentUpdateBy description format all.kw all.user
A6C0BDB4-92A9-11D1-96F8-00805FE246D4 37AD20C7-5232-11D1-96EE-00805FE246D4 3 19980122000000000 19980122000000000 20790604000000000 0 0 A6C0BDB4-92A9-11D1-96F8-00805FE246D4 0 3 0 0 Perkins, Annettee Watermark ogc ogc

I am almost there. Sorry for not realizing what was happening before.
Thanks, Dave


umpty
Novice

Feb 17, 2003, 11:29 AM

Post #16 of 17 (14184 views)
Re: [umpty] Extracting a 36 digit word (object handle)

Answered my own request after persistently working on the problem. The modification to the if statement that corrected the problem is as follows:

if (/\b([^ \t]{36})\b/) {

Paul, thanks for all your help.
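The change works because `.` matches any character, spaces included, so `\b(.{36})\b` is satisfied by any 36-character stretch that starts and ends on a word boundary, and the header line of the sample file contains one. Excluding spaces and tabs forces the 36 characters to be a single unbroken token. A sketch using the header from the sample file in post #15; the data line is shortened to its first three fields, the header is assumed to be separated by single spaces as it appears in the post, and `first_match` is a helper made up for this example.

```perl
use strict;
use warnings;

# Header line as quoted in post #15, plus a shortened data line.
my @lines = (
    "uniqueID parent class dateCreated dateMostRecentUpdate expirationDate "
      . "size pageCount familyID resolutionID targetClass deleteInProgress "
      . "version documentName account template createdBy mostRecentUpdateBy "
      . "description format all.kw all.user\n",
    "A6C0BDB4-92A9-11D1-96F8-00805FE246D4 37AD20C7-5232-11D1-96EE-00805FE246D4 3\n",
);

# Hypothetical helper: return the first capture from the first line that matches.
sub first_match {
    my ($re, @text) = @_;
    for my $line (@text) {
        if ($line =~ $re) {
            my $hit = $1;
            return $hit;
        }
    }
    return;
}

my $loose  = first_match(qr/\b(.{36})\b/,      @lines);   # original pattern
my $strict = first_match(qr/\b([^ \t]{36})\b/, @lines);   # pattern from post #16

print "loose:  [$loose]\n";
print "strict: [$strict]\n";
```

With this input the original pattern stops on the header line and captures text containing spaces, while the corrected pattern skips the header and captures the handle on the second line.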


davorg
Thaumaturge / Moderator

Feb 18, 2003, 1:49 AM

Post #17 of 17 (14179 views)
Re: [umpty] Extracting a 36 digit word (object handle)

You can probably simplify that to

Code
/\b(\S{36})\b/

\S isn't exactly the same as [^ \t], but it's close enough for most purposes.

--
Dave Cross, Perl Hacker, Trainer and Writer
http://www.dave.org.uk/
Get more help at Perl Monks
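To illustrate the difference mentioned in the last post: `[^ \t]` excludes only the space and tab characters, so it still matches a newline, whereas `\S` excludes every whitespace character. The sketch below checks both classes against a newline, then tries a stricter pattern that assumes every handle has the 8-4-4-4-12 hexadecimal layout of the sample. That layout is an assumption drawn from the single example in the thread.

```perl
use strict;
use warnings;

my $newline = "\n";

# A newline is neither a space nor a tab, so [^ \t] accepts it.
my $matches_not_space_tab = ($newline =~ /\A[^ \t]\z/) ? 1 : 0;

# A newline is whitespace, so \S rejects it.
my $matches_non_whitespace = ($newline =~ /\A\S\z/) ? 1 : 0;

print "[^ \\t] matches newline: $matches_not_space_tab\n";
print "\\S matches newline:     $matches_non_whitespace\n";

# Stricter alternative, assuming handles always look like the sample:
# 8 hex digits, three groups of 4, then 12, separated by hyphens.
my $handle_re = qr/\b([0-9A-F]{8}(?:-[0-9A-F]{4}){3}-[0-9A-F]{12})\b/i;

my $sample = "A6C0BDB4-92A9-11D1-96F8-00805FE246D4 37AD20C7-5232-11D1-96EE-00805FE246D4 3";
my ($handle) = $sample =~ $handle_re;

print "handle: $handle\n";
```

The stricter pattern would skip any other 36-character token, which helps only if the handles really are that uniform.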

 
 

