some text manipulations



yaniv_av
Novice

Nov 2, 2002, 7:54 AM


some text manipulations

I have a large text file of articles in this format:
----------------------------------------------------
<REUTERS TOPICS="NO" LEWISSPLIT="TRAIN" CGISPLIT="TRAINING-SET" OLDID="16322" NEWID="1002">
<DATE> 3-MAR-1987 09:19:31.96</DATE>
<TEXT>&#2;
<DATELINE> TAIPEI, March 3 - </DATELINE><BODY>Central bank governor Chang Chi-cheng
rejected a request by textile makers to halt the rise of the
Taiwan dollar against the U.S. Dollar to stop them losing
&#3;</BODY></TEXT>
</REUTERS>
<REUTERS TOPICS="YES" LEWISSPLIT="TRAIN" CGISPLIT="TRAINING-SET" OLDID="16323" NEWID="1003">
<DATE> 3-MAR-1987 09:20:23.32</DATE>
<TEXT>&#2;
<TITLE>NATIONAL FSI INC &lt;NFSI> 4TH QTR LOSS</TITLE>
<DATELINE> DALLAS, March 3 -
</DATELINE><BODY>Shr loss six cts vs profit 19 cts
Net loss 166,000 vs profit 580,000
Revs 3,772,000 vs 5,545,000
adjustments resulting from March 1985 reeacquisition of company
by its original shareholders before August 1985 initial public
offering.
Reuter
&#3;</BODY></TEXT>
</REUTERS>

-------------------------------------------------------
The articles themselves are between <BODY>.....</BODY>.

I have to do two things:
1) Create an array of the "OLDID" values of all the articles in that file.
2) Create an array that contains ONLY the article texts (from <BODY> to </BODY>), removing all the other text in the file.
Can somebody help me with that?
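
For reference, the two tasks can be sketched with plain regular expressions, no modules needed. This is only a minimal sketch, assuming every record looks like the sample above (one <REUTERS ...> opening tag carrying an OLDID attribute, at most one <BODY>...</BODY> pair per record). The sub name extract_articles and the file name in the usage comment are made up for illustration. The body text is returned as-is, so entities such as &#3; and &lt; are still in it.

```perl
use strict;
use warnings;

# Hypothetical helper: takes the whole file as one string and returns
# two array references: the OLDID values and the BODY texts.
sub extract_articles {
    my ($text) = @_;
    my (@oldids, @bodies);

    # Walk the file one <REUTERS ...> ... </REUTERS> record at a time.
    while ($text =~ m{<REUTERS\b([^>]*)>(.*?)</REUTERS>}gs) {
        my ($attrs, $content) = ($1, $2);

        # Task 1: the OLDID attribute from the opening tag.
        my ($oldid) = $attrs =~ /\bOLDID="([^"]*)"/;

        # Task 2: everything between <BODY> and </BODY>.
        my ($body) = $content =~ m{<BODY>(.*?)</BODY>}s;

        # Assumption: records without an OLDID or a BODY are skipped,
        # so the two arrays stay aligned index by index.
        next unless defined $oldid && defined $body;

        push @oldids, $oldid;
        push @bodies, $body;
    }
    return (\@oldids, \@bodies);
}

# Usage: perl extract.pl articles.sgm   (both names are placeholders)
if (@ARGV) {
    local $/;   # slurp mode: read the whole file in one go
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my $text = <$fh>;
    close $fh;

    my ($oldids, $bodies) = extract_articles($text);
    print "$oldids->[$_]: ", length($bodies->[$_]), " characters\n"
        for 0 .. $#$oldids;
}
```

Slurping keeps the sketch short. If the file is too large for that, setting $/ to "</REUTERS>" and reading one record per loop iteration is the usual alternative.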


thodi
stranger

Nov 2, 2002, 9:35 AM


Re: [yaniv_av] some text manipulations

Looks like XML to me. Take a look at XML::Parser (http://search.cpan.org/author/COOPERCL/XML-Parser-2.31/Parser.pm); maybe it is what you need. Or try Text::Balanced (http://search.cpan.org/author/JGOFF/parrot-0.0.8.1/lib/Text/Balanced.pm).


davorg
Thaumaturge / Moderator

Nov 3, 2002, 1:19 AM


Re: [yaniv_av] some text manipulations

Looks like the kind of job that XML::XPath would be perfect for.

--
Dave Cross, Perl Hacker, Trainer and Writer
http://www.dave.org.uk/
Get more help at Perl Monks