
umpty
Novice
Feb 1, 2003, 6:42 PM
Post #1 of 2
36 digit string extraction
I got wonderful help from this forum on an earlier problem, and I'm now about 98% of the way to a resolution. My objective is to extract the first 36-digit string from each file with a .wmk extension in a directory. The script I got help with is as follows:

my $found = {};
my $new = 'C:\Search Express\output.txt';
my $dir = 'C:\Search Express\OGC Images';

opendir DIR, $dir or die $!;
for my $file (grep { /\.WMK$/ } readdir(DIR)) {
    open FILE, "$dir/$file" or die $!;
    while (<FILE>) {
        if (/\b(.{36})\b/) {    # I think the problem is this line.
            $found->{$file} = $1;
            last;
        }
    }
    close FILE;
}
close DIR;

if (scalar keys %$found) {
    open FILE, ">$new" or die $!;
    print FILE join "\n", map { "$_ $found->{$_}" } keys %$found;
    close FILE;
}

The following is a sample of the file content:

uniqueID parent class dateCreated dateMostRecentUpdate expirationDate size pageCount familyID resolutionID targetClass deleteInProgress version documentName account template createdBy mostRecentUpdateBy description format all.kw all.user
A6C0BDB4-92A9-11D1-96F8-00805FE246D4 37AD20C7-5232-11D1-96EE-00805FE246D4 3 19980122000000000 19980122000000000 20790604000000000 0 0 A6C0BDB4-92A9-11D1-96F8-00805FE246D4 0 3 0 0 Perkins, Annettee Watermark ogc ogc

Running the script yields the following:

8040.WMK dateMostRecentUpdate expirationDate
240.WMK dateMostRecentUpdate expirationDate
0.WMK dateMostRecentUpdate expirationDate
9200.WMK dateMostRecentUpdate expirationDate

Ideally, it should look like this:

1000.WMK A6C0BD67-92A9P11D1-96F8-00805FE246D4

Any help on this would be greatly appreciated.

Thanks,
David Jones
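One possible direction, offered as a sketch rather than a confirmed fix: `.{36}` accepts any 36 characters between word boundaries, so it can match a run of column names on the header line before the loop ever reaches the data. Assuming the IDs always have the 8-4-4-4-12 hexadecimal layout seen in the sample line (an assumption, since only one sample is shown), the pattern can describe that layout explicitly. The helper name `first_id` and the shortened sample lines are hypothetical and exist only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the first 8-4-4-4-12 hexadecimal ID found in
# the given lines, or undef (in scalar context) if none is present.
# Assumption: every ID follows the layout of the sample data line.
sub first_id {
    my @lines = @_;
    for my $line (@lines) {
        return $1
            if $line =~ /\b([0-9A-F]{8}(?:-[0-9A-F]{4}){3}-[0-9A-F]{12})\b/i;
    }
    return;
}

# Shortened stand-ins for the header line and the data line of a .WMK file.
my @sample = (
    "uniqueID parent class dateCreated dateMostRecentUpdate expirationDate",
    "A6C0BDB4-92A9-11D1-96F8-00805FE246D4 37AD20C7-5232-11D1-96EE-00805FE246D4 3",
);

my $id = first_id(@sample);
print defined $id ? "$id\n" : "no ID found\n";
```

In the original script, the same pattern could stand in for `/\b(.{36})\b/` inside the `while (<FILE>)` loop. The header line would then be skipped because it contains no string of that shape.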