
yaniv_av
Novice
Nov 2, 2002, 7:54 AM
Post #1 of 3
(981 views)
|
some text manipulations
I have a large text file of articles in this format:
----------------------------------------------------
<REUTERS TOPICS="NO" LEWISSPLIT="TRAIN" CGISPLIT="TRAINING-SET" OLDID="16322" NEWID="1002">
<DATE> 3-MAR-1987 09:19:31.96</DATE>
<TEXT>
<DATELINE> TAIPEI, March 3 - </DATELINE><BODY>Central bank governor Chang Chi-cheng rejected a request by textile makers to halt the rise of the Taiwan dollar against the U.S. Dollar to stop them losing
</BODY></TEXT>
</REUTERS>
<REUTERS TOPICS="YES" LEWISSPLIT="TRAIN" CGISPLIT="TRAINING-SET" OLDID="16323" NEWID="1003">
<DATE> 3-MAR-1987 09:20:23.32</DATE>
<TEXT>
<TITLE>NATIONAL FSI INC <NFSI> 4TH QTR LOSS</TITLE>
<DATELINE> DALLAS, March 3 - </DATELINE><BODY>Shr loss six cts vs profit 19 cts Net loss 166,000 vs profit 580,000 Revs 3,772,000 vs 5,545,000 adjustments resulting from March 1985 reeacquisition of company by its original shareholders before August 1985 initial public offering. Reuter
</BODY></TEXT>
</REUTERS>
-------------------------------------------------------
The articles themselves are between <BODY> and </BODY>.

I have to do two things:
1) Create an array of the OLDID values of all the articles in that file.
2) Create an array that contains ONLY the article text (everything from <BODY> to </BODY>), removing all the other text in the file.

Can somebody help me with that?
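
To make the two tasks concrete, here is a minimal sketch in Python (the post does not name a language, so that choice is an assumption). It uses regular expressions rather than an XML parser, because the sample is not well-formed XML (for example, the bare <NFSI> inside <TITLE>). The names SAMPLE, extract_oldids and extract_bodies are hypothetical, and SAMPLE is a shortened stand-in for the real file.

```python
import re

# Shortened, hypothetical stand-in for the file described in the post.
SAMPLE = '''<REUTERS TOPICS="NO" LEWISSPLIT="TRAIN" CGISPLIT="TRAINING-SET" OLDID="16322" NEWID="1002">
<DATE> 3-MAR-1987 09:19:31.96</DATE>
<TEXT>
<DATELINE> TAIPEI, March 3 - </DATELINE><BODY>Central bank governor Chang Chi-cheng rejected a request
by textile makers to halt the rise of the Taiwan dollar
</BODY></TEXT>
</REUTERS>
<REUTERS TOPICS="YES" LEWISSPLIT="TRAIN" CGISPLIT="TRAINING-SET" OLDID="16323" NEWID="1003">
<DATE> 3-MAR-1987 09:20:23.32</DATE>
<TEXT>
<TITLE>NATIONAL FSI INC <NFSI> 4TH QTR LOSS</TITLE>
<DATELINE> DALLAS, March 3 - </DATELINE><BODY>Shr loss six cts vs profit 19 cts
Reuter
</BODY></TEXT>
</REUTERS>
'''

# OLDID attribute inside an opening <REUTERS ...> tag.
OLDID_RE = re.compile(r'<REUTERS\b[^>]*\bOLDID="([^"]*)"')

# Text between <BODY> and </BODY>; DOTALL lets "." match newlines,
# and the non-greedy ".*?" stops at the first closing tag.
BODY_RE = re.compile(r'<BODY>(.*?)</BODY>', re.DOTALL)


def extract_oldids(text):
    """Return the OLDID values in file order, as strings."""
    return OLDID_RE.findall(text)


def extract_bodies(text):
    """Return the article bodies in file order, with outer whitespace trimmed."""
    return [body.strip() for body in BODY_RE.findall(text)]
```

For the real file, the same functions could be called on the result of reading the whole file into one string. Note that the two lists only line up index by index if every article actually has a <BODY> element; if some articles lack one, matching each <REUTERS>...</REUTERS> block first and extracting both values per block would be safer.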