some text manipulations

 



yaniv_av
Novice

Nov 2, 2002, 7:54 AM

Post #1 of 3 (543 views)
some text manipulations

I have a large text file of articles in this format:
----------------------------------------------------
<REUTERS TOPICS="NO" LEWISSPLIT="TRAIN" CGISPLIT="TRAINING-SET" OLDID="16322" NEWID="1002">
<DATE> 3-MAR-1987 09:19:31.96</DATE>
<TEXT>&#2;
<DATELINE> TAIPEI, March 3 - </DATELINE><BODY>Central bank governor Chang Chi-cheng
rejected a request by textile makers to halt the rise of the
Taiwan dollar against the U.S. Dollar to stop them losing
&#3;</BODY></TEXT>
</REUTERS>
<REUTERS TOPICS="YES" LEWISSPLIT="TRAIN" CGISPLIT="TRAINING-SET" OLDID="16323" NEWID="1003">
<DATE> 3-MAR-1987 09:20:23.32</DATE>
<TEXT>&#2;
<TITLE>NATIONAL FSI INC &lt;NFSI> 4TH QTR LOSS</TITLE>
<DATELINE> DALLAS, March 3 -
</DATELINE><BODY>Shr loss six cts vs profit 19 cts
Net loss 166,000 vs profit 580,000
Revs 3,772,000 vs 5,545,000
adjustments resulting from March 1985 reeacquisition of company
by its original shareholders before August 1985 initial public
offering.
Reuter
&#3;</BODY></TEXT>
</REUTERS>

-------------------------------------------------------
The articles themselves are between <BODY>.....</BODY>.

I have to do two things:
1) Create an array of the OLDIDs of all the articles in that file.
2) Create an array that contains ONLY the article texts (from <BODY> to </BODY>), removing all the other text in the file.
Can somebody help me with that?


thodi
stranger

Nov 2, 2002, 9:35 AM

Post #2 of 3 (541 views)
Re: [yaniv_av] some text manipulations

Looks like XML to me. Take a look at XML::Parser (http://search.cpan.org/author/COOPERCL/XML-Parser-2.31/Parser.pm); maybe it is what you need. Another option is Text::Balanced (http://search.cpan.org/author/JGOFF/parrot-0.0.8.1/lib/Text/Balanced.pm).


davorg
Thaumaturge / Moderator

Nov 3, 2002, 1:19 AM

Post #3 of 3 (534 views)
Re: [yaniv_av] some text manipulations

Looks like the kind of job that XML::XPath would be perfect for.

--
Dave Cross, Perl Hacker, Trainer and Writer
http://www.dave.org.uk/
Get more help at Perl Monks
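
For readers who want something to start from: the sample in post #1 may not parse as XML as it stands (there is no single root element, and the &#2; / &#3; control-character references are not legal in XML 1.0), so a plain regex pass might be the simpler route. Below is a minimal sketch in core Perl, not the method proposed by any poster in this thread. The helper name extract_articles and the abridged sample string are made up for illustration. It assumes every record is wrapped in <REUTERS ...>...</REUTERS>, that OLDID is double-quoted, and that a record with no <BODY> should get an empty string so that both arrays stay aligned.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): takes the whole file as one
# string and returns two array refs, the OLDID values and the BODY texts.
sub extract_articles {
    my ($text) = @_;
    my (@old_ids, @bodies);

    # Walk the input one <REUTERS ...> ... </REUTERS> record at a time.
    while ($text =~ m{<REUTERS\b([^>]*)>(.*?)</REUTERS>}gs) {
        my ($attrs, $content) = ($1, $2);

        # Pull OLDID out of the opening tag; skip records without one.
        my ($old_id) = $attrs =~ /\bOLDID="([^"]*)"/;
        next unless defined $old_id;

        # Assumption: a record with no <BODY> contributes an empty string.
        my ($body) = $content =~ m{<BODY>(.*?)</BODY>}s;
        $body = '' unless defined $body;

        # Drop the &#3; end-of-text marker and surrounding whitespace.
        $body =~ s/&#3;//g;
        $body =~ s/^\s+|\s+$//g;

        push @old_ids, $old_id;
        push @bodies,  $body;
    }
    return (\@old_ids, \@bodies);
}

# Abridged sample based on the records quoted in post #1.
my $sample = <<'END_SAMPLE';
<REUTERS TOPICS="NO" LEWISSPLIT="TRAIN" CGISPLIT="TRAINING-SET" OLDID="16322" NEWID="1002">
<DATE> 3-MAR-1987 09:19:31.96</DATE>
<TEXT>&#2;
<DATELINE> TAIPEI, March 3 - </DATELINE><BODY>Central bank governor Chang Chi-cheng
rejected a request by textile makers to halt the rise of the
Taiwan dollar against the U.S. Dollar to stop them losing
&#3;</BODY></TEXT>
</REUTERS>
<REUTERS TOPICS="YES" LEWISSPLIT="TRAIN" CGISPLIT="TRAINING-SET" OLDID="16323" NEWID="1003">
<DATE> 3-MAR-1987 09:20:23.32</DATE>
<TEXT>&#2;
<TITLE>NATIONAL FSI INC &lt;NFSI> 4TH QTR LOSS</TITLE>
<DATELINE> DALLAS, March 3 -
</DATELINE><BODY>Shr loss six cts vs profit 19 cts
Net loss 166,000 vs profit 580,000
Reuter
&#3;</BODY></TEXT>
</REUTERS>
END_SAMPLE

my ($old_ids, $bodies) = extract_articles($sample);

print "OLDIDs: @$old_ids\n\n";
print "Article $old_ids->[$_]:\n$bodies->[$_]\n\n" for 0 .. $#$old_ids;
```

For a real file, read the whole thing into one string first (for example by setting $/ to undef before reading the filehandle) and pass that string to extract_articles. Entities such as &lt; are left undecoded in this sketch, and the non-greedy matches would need rethinking if the tags could ever nest.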

 
 

