CGI/Perl Guide: Perl Programming Help: Advanced
Matching Unique Field with Action

 



rfransix
Novice

Mar 7, 2011, 8:24 AM

Post #1 of 17
Matching Unique Field with Action

Hi,

A text file contains lines of fields separated by some delimiter (a space, for example). Some lines, but not all, have a value in the second field such as an initial (e.g., A. or B. or Jr.). For those lines we want that value deleted, and we would like to do this as a one-liner. Here's what we have so far, but it omits all the other lines without a period in any field. What say you?

perl -i.orig -ane "print unless $F[2] =~ m{ \A (?: [[:alpha:]] | Jr) \. \z }xms" in.file


The following lines illustrate the issue:

Thomas, A. Alexandria S. Perl Programmer Las Vegas NV.
Williams, Jr., Michael A. C Programmer New York N.Y.
Silver, Susan Java Programmer San Jose CA
Altips, Alvin C. Tcl Programmer Chicago IL.

The challenge is to remove the A. and Jr., from the first two records, leave the remaining fields (including those with periods) intact in all lines, and keep every line in in.file.

Thank you.
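Two details in the attempt above explain why lines go missing: with -a, @F is zero-indexed, so $F[2] is the third field rather than the second, and print unless skips the whole line instead of dropping one field. A minimal sketch of the intended behavior, assuming whitespace-separated fields and that the values to drop are a single letter or Jr followed by a period and an optional comma (strip_second_field is a hypothetical name):

```perl
use strict;
use warnings;

# Hypothetical helper: drop the second field (index 1 after splitting)
# when it is a single letter or "Jr" followed by a period and an optional
# comma. Every other field, and every line, is kept.
sub strip_second_field {
    my ($line) = @_;
    my @f = split ' ', $line;    # same whitespace splitting that perl -a uses
    splice @f, 1, 1
        if @f > 1 and $f[1] =~ m{ \A (?: [[:alpha:]] | Jr ) \. ,? \z }xms;
    return join ' ', @f;
}

for my $line (
    'Thomas, A. Alexandria S. Perl Programmer Las Vegas NV.',
    'Williams, Jr., Michael A. C Programmer New York N.Y.',
    'Silver, Susan Java Programmer San Jose CA',
    'Altips, Alvin C. Tcl Programmer Chicago IL.',
) {
    print strip_second_field($line), "\n";
}
```

Because the fields are split and rejoined, runs of several spaces collapse to one; whether that matters depends on the file.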


Karazam
User

Mar 8, 2011, 12:53 AM

Post #2 of 17
Re: [rfransix] Matching Unique Field with Action

It seems to me you want to edit certain lines? The "print unless" construction will simply delete the matching lines.
If I'm right, something like this should work:


Code
perl -pi.orig -e "s/(\w+,)\s+\w{1,3}[,.]+\s+(.*)/$1 $2/;" in.file


provided that your data is broken up into lines like so:


Code
Thomas, A. Alexandria S. Perl Programmer Las Vegas NV. 
Williams, Jr., Michael A. C Programmer New York N.Y.
Silver, Susan Java Programmer San Jose CA
Altips, Alvin C. Tcl Programmer Chicago IL.



rfransix
Novice

Mar 8, 2011, 8:02 AM

Post #3 of 17
Re: [Karazam] Matching Unique Field with Action

Well done.

Unfortunately, when the 2nd field is something like Jr., III., P.J., or A.B.C., this one-liner keeps it in the file instead of deleting it.

Would you be kind enough to explain how this code works?

Thank you again.


Karazam
User

Mar 8, 2011, 8:52 AM

Post #4 of 17
Re: [rfransix] Matching Unique Field with Action

That's odd: when I run it, it takes care of the Jr., and III., cases. Not the P.J., or A.B.C. cases, though (they didn't appear in your example data).
I made some changes to cover more cases, it looks like this in a script:


Code
#!/usr/bin/perl
use warnings;
use strict;

while (<DATA>) {
    s/
        (\w+,)      # word characters followed by a comma (first capture)
        \s+         # one or more whitespace characters
        (?:         # non-capturing group:
            \w      #   a word character
            \.?     #   followed by an optional period
        )+          # one or more "word character, maybe period" units
        [,.]+       # one or more periods or commas
        \s+         # one or more whitespace characters
        (.*)        # everything else (second capture)
    /$1 $2/x;       # replace the match with capture 1, a space, capture 2
    print;
}


__DATA__
Thomas, A. Alexandria S. Perl Programmer Las Vegas NV.
Williams, Jr., Michael A. C Programmer New York N.Y.
Silver, Susan Java Programmer San Jose CA
Altips, Alvin C. Tcl Programmer Chicago IL.
Doe, III., John B. Lisp Programmer San Francisco CA
Smith, P.J., Jessica Ruby Programmer Las Vegas NV.
Norris, A.B.C. Chuck Python Programmer Chicago IL.


The oneliner version would be


Code
perl -pi.orig -e "s/(\w+,)\s+(?:\w\.?)+[,.]+\s+(.*)/$1 $2/;" in.file


Unless I'm missing something obvious it should work. If it doesn't, could you please attach your original data file?
Might be something in there that's different from the examples.


rfransix
Novice

Mar 8, 2011, 9:25 AM

Post #5 of 17
Re: [Karazam] Matching Unique Field with Action

Here are the real data examples (and that's the ballgame). The remaining Jr. entry is Park; notice that Park does not include a comma. Likewise, the only III entry is Mastando, which also has no comma in the first field:

Egan, Jr., James C. Partner Washington DC 202/682-7036 Litigation
Eskow, A.B.C., Lisa R. Counsel Houston +1 201 386 2956 Litigation
Grossman, Y. Shukie Partner New York +1 212 310 8655 Corporate
Hatcher, R., Todd Associate New York +1 212 310 8700 Tax
Himelfarb, P.J. Partner Washington DC +1 202 682 7197 Corporate
Mastando III., John P. Partner New York +1 212 310 8064 Litigation
Odle, Jr., Robert C. Partner Washington DC +1 202 682 7180 Litigation
Park Jr., Jay H. Associate New York +1 212 310 8947 Corporate
Spake, Jr., Robert V. Associate New York +1 212 310 8794 Litigation
Welty, Ph.D. William P. Associate New York +1 212 310 8411 Corporate
Woodworth, P.C., Andrew J. Associate New York +1 212 310 8852 Corporate
Wright, M. Jarrad Associate Washington DC +1 202 682 7058 Litigation


rfransix
Novice

Mar 8, 2011, 9:35 AM

Post #6 of 17
Re: [rfransix] Matching Unique Field with Action

The pattern-matching approach is great. Can Perl identify fields and then check whether a given field (whether $2, $4, or $6) contains a period, and if so delete it? Likewise, if the field contains any number, delete it, etc.


Karazam
User

Mar 8, 2011, 10:45 AM

Post #7 of 17
Re: [rfransix] Matching Unique Field with Action

Ah, OK: the comma after the first part of the name is optional. Putting a question mark after the comma in the regex handles that:


Code
perl -pi.orig -e "s/(\w+,?)\s+(?:\w\.?)+[,.]+\s+(.*)/$1 $2/;" in.file
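A caveat worth testing before running this with -i: once the comma is optional, the pattern is no longer tied to the first field, so on a line whose second field is not an abbreviation it can match further along. On the earlier sample line for Altips, (\w+,?) can match Alvin, and the substitution then removes the middle initial C. instead. The sketch below compares the regex above with a variant anchored to the start of the line; the anchored pattern and the sub names are assumptions of mine, not something posted in this thread, and they assume the first field never contains whitespace.

```perl
use strict;
use warnings;

# The substitution from the one-liner above, unchanged.
sub unanchored {
    my ($s) = @_;
    $s =~ s/(\w+,?)\s+(?:\w\.?)+[,.]+\s+(.*)/$1 $2/;
    return $s;
}

# Assumed variant: ^(\S+) ties the match to the first field, so only
# an abbreviation sitting in the second field can be removed.
sub anchored {
    my ($s) = @_;
    $s =~ s/^(\S+)\s+(?:\w\.?)+[,.]+\s+/$1 /;
    return $s;
}

for my $line (
    'Altips, Alvin C. Tcl Programmer Chicago IL.',
    'Park Jr., Jay H. Associate New York +1 212 310 8947 Corporate',
    'Mastando III., John P. Partner New York +1 212 310 8064 Litigation',
) {
    print "unanchored: ", unanchored($line), "\n";
    print "anchored:   ", anchored($line), "\n";
}
```

In the same one-liner style, the anchored variant would read perl -pi.orig -e "s/^(\S+)\s+(?:\w\.?)+[,.]+\s+/$1 /;" in.file; it is worth checking the output against the in.file.orig backup.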



Quote
Can Perl identify fields and then evaluate if that field (whether $2, $4, or $6) contains a period then delete it? If the field contains any number then delete it, etc.


Yes of course.


Code
perl -ane "for (@F) { print qq($_ ) unless /\./; } print qq(\n);" in.file


Or use unless /\d/ instead to drop fields that contain digits.
Run perldoc perlrun (or see http://perldoc.perl.org/perlrun.html) for details on the -a and -n flags.
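For reference, perlrun describes -n as wrapping your code in a while (<>) { ... } loop over the input lines, and -a as doing @F = split(' ') at the top of that loop. A sketch of roughly what the one-liner above expands to (the sub name is made up for illustration; like the one-liner, it leaves a trailing space before each newline):

```perl
use strict;
use warnings;

# Rough script form of:
#   perl -ane "for (@F) { print qq($_ ) unless /\./; } print qq(\n);" in.file
# The loop body is wrapped in a sub (hypothetical name) so it can be tried
# on one line; -n would instead run it once per line read by while (<>).
sub drop_fields_with_period {
    local $_ = shift;              # -n puts each input line in $_
    my @F = split(' ', $_);        # -a does this split before your code runs
    my $out = '';
    for (@F) {                     # $_ is now each field in turn
        $out .= "$_ " unless /\./; # keep fields without a period
    }
    return $out . "\n";
}

print drop_fields_with_period("Thomas, A. Alexandria S. Perl Programmer Las Vegas NV.\n");
```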


rfransix
Novice

Mar 8, 2011, 11:07 AM

Post #8 of 17
Re: [Karazam] Matching Unique Field with Action

Indeed. While very useful, this code removes every field that contains a period, regardless of its position. Is it possible to remove only specific fields, say field #2 or field #10?


rfransix
Novice

Mar 8, 2011, 11:19 AM

Post #9 of 17
Re: [rfransix] Matching Unique Field with Action

The /\d/ version removes any field containing a digit, including fields that mix letters and digits. Is it possible to be more surgical and remove only a chosen field, leaving the others intact?


rfransix
Novice

Mar 8, 2011, 11:24 AM

Post #10 of 17
Re: [Karazam] Matching Unique Field with Action

This code:

perl -pi.orig -e "s/(\w+,?)\s+(?:\w\.?)+[,.]+\s+(.*)/$1 $2/;" in.file

removes all fields with a period. Can it be modified to remove only the second field (and, optionally, a field of choice)?


Karazam
User

Mar 8, 2011, 11:30 AM

Post #11 of 17
Re: [rfransix] Matching Unique Field with Action

OK, this comes to mind, but it is starting to be a little too much to cram into a one-liner:


Code
perl -ane "for $i (0 .. $#F) { print qq($F[$i] ) unless $i == 1 and /\./; } print qq(\n);" in.file


Personally, I'd prefer a script at this point:


Code
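#!/usr/bin/perl
use warnings;
use strict;

# Print every field of each input line, except that the second field
# (index 1) is skipped when it contains a period.
while (<>) {
    my @a = split;    # split $_ on whitespace
    for my $i ( 0 .. $#a ) {
        print "$a[$i] " unless $i == 1 and $a[$i] =~ /\./;
    }
    print "\n";
}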
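To make both the field and the test configurable (field #2 or field #10, a period or a digit), the same loop can take them as parameters. A sketch assuming whitespace-separated fields; remove_field_if is a hypothetical name, and field numbers count from 1 as in the question, so field #2 is array index 1:

```perl
use strict;
use warnings;

# Hypothetical helper: remove field number $n (1-based) from $line, but
# only when that field matches $re. All other fields are kept as-is.
sub remove_field_if {
    my ($line, $n, $re) = @_;
    my @fields = split ' ', $line;
    my $i = $n - 1;    # convert to a 0-based array index
    splice @fields, $i, 1
        if $i >= 0 and $i <= $#fields and $fields[$i] =~ $re;
    return join ' ', @fields;
}

my $line = 'Hatcher, R., Todd Associate New York +1 212 310 8700 Tax';
print remove_field_if($line, 2, qr/\./), "\n";   # drops "R.,"
print remove_field_if($line, 3, qr/\d/), "\n";   # "Todd" has no digit: unchanged
print remove_field_if($line, 8, qr/\d/), "\n";   # drops "212"
```

Passing qr/\d/ instead of qr/\./ switches the test to digits; fields are rejoined with single spaces.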



(This post was edited by Karazam on Mar 8, 2011, 11:37 AM)


Karazam
User

Mar 8, 2011, 12:09 PM

Post #12 of 17
Re: [rfransix] Matching Unique Field with Action


Quote
this code: perl -pi.orig -e "s/(\w+,?)\s+(?:\w\.?)+[,.]+\s+(.*)/$1 $2/;" in.file removes all fields with a period.


That's really odd; for me it only touches the second field.


(This post was edited by Karazam on Mar 8, 2011, 12:10 PM)


rfransix
Novice

Mar 8, 2011, 2:22 PM

Post #13 of 17
Re: [Karazam] Matching Unique Field with Action

...bottlecaps...

The script works perfectly. Can it be turned into a one-liner?


rfransix
Novice

Mar 8, 2011, 2:35 PM

Post #14 of 17
Re: [Karazam] Matching Unique Field with Action

perl -ane "for $i (0 .. $#F) { print qq($F[$i] ) unless $i == 1 and /\./; } print qq(\n);"



The above code has the unfortunate side effect of removing the 2nd field if the line contains a period in any other field (for example, when the 3rd field contains A., or B., or C., then the 2nd field is deleted).


Karazam
User

Mar 8, 2011, 10:13 PM

Post #15 of 17
Re: [rfransix] Matching Unique Field with Action

Yes, sorry, it was late at my end. I left out a bit that was present in the script version:


Code
perl -ane "for $i (0 .. $#F) { print qq($F[$i] ) unless $i == 1 and $F[$i] =~ /\./; } print qq(\n);" in.file
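The fix is the binding operator. In the earlier version, the bare /\./ is matched against $_, and under -n that is the whole input line, so field 2 was dropped whenever any field on the line had a period. A sketch contrasting the two conditions on one line (the sub names are illustrative, and fields are joined with single spaces rather than printed with a trailing space):

```perl
use strict;
use warnings;

# Earlier condition: /\./ is tested against $_, i.e. the whole line.
sub old_version {
    local $_ = shift;              # the whole line, as under -n
    my @F = split ' ', $_;         # as under -a
    my @kept;
    for my $i (0 .. $#F) {
        push @kept, $F[$i] unless $i == 1 and /\./;
    }
    return join ' ', @kept;
}

# Corrected condition: the test is bound to the field itself.
sub new_version {
    local $_ = shift;
    my @F = split ' ', $_;
    my @kept;
    for my $i (0 .. $#F) {
        push @kept, $F[$i] unless $i == 1 and $F[$i] =~ /\./;
    }
    return join ' ', @kept;
}

my $line = 'Altips, Alvin C. Tcl Programmer Chicago IL.';
print "old: ", old_version($line), "\n";   # loses "Alvin"
print "new: ", new_version($line), "\n";   # line unchanged
```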



rfransix
Novice

Mar 9, 2011, 12:17 PM

Post #16 of 17
Re: [Karazam] Matching Unique Field with Action

...bottlecaps...

Well done. Hurray!

This is perfect as it allows any field to be declared, and any character of choice.

Good luck to you.


rfransix
Novice

Mar 9, 2011, 2:40 PM

Post #17 of 17
Re: [rfransix] Matching Unique Field with Action

Another similar problem may be solvable by altering the above code. What do you think?

In a file of thousands of lines, we want to keep only the lines with 2 fields and delete all others (the delimiter of choice could be space, tab, period, comma, semicolon, percent, dollar, etc.).
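One way to adapt the approach: split each line on the chosen delimiter and keep it only when there are exactly two fields. A sketch under a few assumptions: the delimiter is a single literal string (\Q...\E escapes characters such as . $ % that are special in a regex), a space means whitespace splitting as with perl -a, and empty fields count, so "x," has two fields. keep_two_field_lines is a hypothetical name:

```perl
use strict;
use warnings;

# Hypothetical helper: return only the lines that split into exactly two
# fields on $delim. A delimiter of ' ' uses awk-style whitespace splitting
# (what perl -a does); any other delimiter is matched literally.
sub keep_two_field_lines {
    my ($delim, @lines) = @_;
    my @kept;
    for my $line (@lines) {
        chomp(my $text = $line);                     # ignore the newline
        my @fields = $delim eq ' '
            ? split(' ', $text)
            : split(/\Q$delim\E/, $text, -1);        # -1 keeps empty fields
        push @kept, $line if @fields == 2;
    }
    return @kept;
}

my @kept = keep_two_field_lines(',', "a,b\n", "a,b,c\n", "solo\n");
print @kept;
```

For a quick comma-separated run, perl -F, -ane "print if @F == 2" in.file should behave similarly; with -F the pattern is a regex, so delimiters such as . or $ need a backslash.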

 
 

