Split file using regexp

 



perl_nov
New User

Jan 17, 2013, 12:37 PM

Post #1 of 3 (988 views)
Split file using regexp

Hi Gurus,
I am new to Perl and have the following question.

I have huge files, around 400 MB each, which contain CLOB data and come in different scenarios.

I am trying to pass the scenario number as a parameter and get the required modified file based on the scenario number and criteria.

Scenario 1:
file name : scenario_1.txt

Code
1|1212|34353|56575|||||4|~somedata~some data~~~~~~~~~~~~some data~~~~~~~~~~~~~~some data~~~~~~~pi=[10.10.10.10.10],uid=[23231131]}~ 
1|1212|34353|56575|||||4|~somedata~some data~~~~~~~~~~~~some data~~~~~~~~~~~~~~some data~~~~~~~pi=[10.10.10.10.11],uid=[3456]}~
1|1212|34353|56575|||||4|~somedata~some data~~~~~~~~~~~~some data~~~~~~~~~~~~~~some data~~~~~~~pi=[10.10.10.10.12],uid=[659784]}~
1|1212|34353|56575|||||4|~somedata~some data~~~~~~~~~~~~some data~~~~~~~~~~~~~~some data~~~~~~~pi=[10.10.10.10.13],uid=[654812]}~
1|1212|34353|56575|||||4|~somedata~some data~~~~~~~~~~~~some data~~~~~~~~~~~~~~some data~~~~~~~pi=[10.10.10.10.14],uid=[323]}~
1|1212|34353|56575|||||4|~somedata~some data~~~~~~~~~~~~some data~~~~~~~~~~~~~~some data~~~~~~~pi=[10.10.10.10.10],uid=[97945641564]}~
1|1212|34353|56575|||||4|~somedata~some data~~~~~~~~~~~~some data~~~~~~~~~~~~~~some data~~~~~~~pi=[10.10.10.10.10],uid=[1654594]}~


Now I am trying to split the data as shown below and write it to a new file, scenario_1_n.txt.

Each output line should contain all the data up to the last "|", followed by the pi and uid values.

Code
1|1212|34353|56575|||||4|10.10.10.10.10|23231131 
1|1212|34353|56575|||||4|10.10.10.10.11|3456
1|1212|34353|56575|||||4|10.10.10.10.12|659784
1|1212|34353|56575|||||4|10.10.10.10.13|654812
.
.
.
.


Scenario 2:
file name : scenario_2.txt

Code
1|1212|34353|56575|||||4|~somedata~some data~~~~~~~~~~~~some data~~~~~~~~~~~~~~some data~~~~~~~390=10.10.10.10.10,391=23231131,394~ 
1|1212|34353|56575|||||4|~somedata~some data~~~~~~~~~~~~some data~~~~~~~~~~~~~~some data~~~~~~~390=10.10.10.10.11,391=3456,394~
1|1212|34353|56575|||||4|~somedata~some data~~~~~~~~~~~~some data~~~~~~~~~~~~~~some data~~~~~~~390=10.10.10.10.12,391=659784,394~
1|1212|34353|56575|||||4|~somedata~some data~~~~~~~~~~~~some data~~~~~~~~~~~~~~some data~~~~~~~390=10.10.10.10.13,391=654812,394~
1|1212|34353|56575|||||4|~somedata~some data~~~~~~~~~~~~some data~~~~~~~~~~~~~~some data~~~~~~~390=10.10.10.10.14,391=323,394~
1|1212|34353|56575|||||4|~somedata~some data~~~~~~~~~~~~some data~~~~~~~~~~~~~~some data~~~~~~~390=10.10.10.10.10,391=97945641564,394~
1|1212|34353|56575|||||4|~somedata~some data~~~~~~~~~~~~some data~~~~~~~~~~~~~~some data~~~~~~~390=10.10.10.10.10,391=1654594,394~


Now I am trying to split the data as shown below and write it to a new file, scenario_2_n.txt.

Each output line should contain all the data up to the last "|", followed by the data after 390= and 391=.

Code
1|1212|34353|56575|||||4|10.10.10.10.10|23231131 
1|1212|34353|56575|||||4|10.10.10.10.11|3456
1|1212|34353|56575|||||4|10.10.10.10.12|659784
1|1212|34353|56575|||||4|10.10.10.10.13|654812
.
.
.
.


Scenario 3:
file name : scenario_3.txt

Code
1|1212|34353|56575|||||4|~somedata~10.10.10.10.10~123546~~~~~~~~~~~some data~~~~~~~~~~~~~~some data~~~~~~~ 
1|1212|34353|56575|||||4|~somedata~10.10.10.10.11~546~~~~~~~~~~~some data~~~~~~~~~~~~~~some data~~~~~~~
1|1212|34353|56575|||||4|~somedata~10.10.10.10.12~3415646~~~~~~~~~~~some data~~~~~~~~~~~~~~some data~~~~~~~
1|1212|34353|56575|||||4|~somedata~10.10.10.10.13~12156~~~~~~~~~~~some data~~~~~~~~~~~~~~some data~~~~~~~
1|1212|34353|56575|||||4|~somedata~10.10.10.10.10~15464~~~~~~~~~~~some data~~~~~~~~~~~~~~some data~~~~~~~
1|1212|34353|56575|||||4|~somedata~10.10.10.10.10~8465~~~~~~~~~~~some data~~~~~~~~~~~~~~some data~~~~~~~
1|1212|34353|56575|||||4|~somedata~10.10.10.10.10~15654~~~~~~~~~~~some data~~~~~~~~~~~~~~some data~~~~~~~


Now I am trying to split the data as shown below and write it to a new file, scenario_3_n.txt.

Each output line should contain all the data up to the last "|", followed by the data after the second ~ and the third ~.

Code
1|1212|34353|56575|||||4|10.10.10.10.10|123546 
1|1212|34353|56575|||||4|10.10.10.10.11|546
1|1212|34353|56575|||||4|10.10.10.10.12|3415646
1|1212|34353|56575|||||4|10.10.10.10.13|12156
.
.
.
.


Thanks for looking and thanks for your help.


7stud
Enthusiast

Jan 17, 2013, 4:25 PM

Post #2 of 3 (982 views)
Re: [perl_nov] Split file using regexp

Have at it. Good luck.


BillKSmith
Veteran

Jan 17, 2013, 7:22 PM

Post #3 of 3 (968 views)
Re: [perl_nov] Split file using regexp

Scenario_1 should get you going.

Code
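use strict;
use warnings;
while ( <DATA> ) {
    # $1 = the digits and pipes before the final "|" of the prefix,
    # $2 = the pi value, $3 = the uid value.
    # With /s the trailing .* also swallows the newline, so "\n" is added back.
    s/^([\d|]+)\|.+pi=\[([\d.]+)\],uid=\[(\d+)\].*$/$1|$2|$3\n/s;
    print;
}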
__DATA__
1|1212|34353|56575|||||4|~somedata~some data~~~~~~~~~~~~some data~~~~~~~~~~~~~~some data~~~~~~~pi=[10.10.10.10.10],uid=[23231131]}~
1|1212|34353|56575|||||4|~somedata~some data~~~~~~~~~~~~some data~~~~~~~~~~~~~~some data~~~~~~~pi=[10.10.10.10.11],uid=[3456]}~
1|1212|34353|56575|||||4|~somedata~some data~~~~~~~~~~~~some data~~~~~~~~~~~~~~some data~~~~~~~pi=[10.10.10.10.12],uid=[659784]}~
1|1212|34353|56575|||||4|~somedata~some data~~~~~~~~~~~~some data~~~~~~~~~~~~~~some data~~~~~~~pi=[10.10.10.10.13],uid=[654812]}~
1|1212|34353|56575|||||4|~somedata~some data~~~~~~~~~~~~some data~~~~~~~~~~~~~~some data~~~~~~~pi=[10.10.10.10.14],uid=[323]}~
1|1212|34353|56575|||||4|~somedata~some data~~~~~~~~~~~~some data~~~~~~~~~~~~~~some data~~~~~~~pi=[10.10.10.10.10],uid=[97945641564]}~
1|1212|34353|56575|||||4|~somedata~some data~~~~~~~~~~~~some data~~~~~~~~~~~~~~some data~~~~~~~pi=[10.10.10.10.10],uid=[1654594]}~
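
The same idea might carry over to the other two scenarios. What follows is only a sketch based on the sample lines in the first post: the helper names scenario_2 and scenario_3 are made up for illustration, and the patterns assume that "390=" and "391=" appear exactly as shown and that the IP-like value always sits between the second and third "~". Real data may need tighter patterns.

```perl
use strict;
use warnings;

# Hypothetical helpers, one per scenario. Each returns the reformatted
# line, or undef when the line does not have the expected shape.
sub scenario_2 {
    my ($line) = @_;
    # Prefix up to the "|" just before the first "~", then the values
    # that follow 390= and 391=.
    return $line =~ /^([\d|]+)\|~.*390=([\d.]+),391=(\d+)/ ? "$1|$2|$3" : undef;
}

sub scenario_3 {
    my ($line) = @_;
    # Same prefix, then skip the first "~"-delimited field and capture
    # the next two (after the second and third "~").
    return $line =~ /^([\d|]+)\|~[^~]*~([\d.]+)~(\d+)~/ ? "$1|$2|$3" : undef;
}

# Shortened sample lines modelled on the ones in the question.
my $s2 = '1|1212|34353|56575|||||4|~somedata~some data~~~~390=10.10.10.10.10,391=23231131,394~';
my $s3 = '1|1212|34353|56575|||||4|~somedata~10.10.10.10.10~123546~~~~some data~~~~';

print scenario_2($s2), "\n";   # 1|1212|34353|56575|||||4|10.10.10.10.10|23231131
print scenario_3($s3), "\n";   # 1|1212|34353|56575|||||4|10.10.10.10.10|123546
```

Returning undef for lines that do not match leaves it to the caller to decide whether to skip such lines or pass them through unchanged.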

Good Luck,
Bill
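
For the part of the question about passing the scenario number as a parameter, one possible shape is a small driver that picks a pattern by number and streams the file one line at a time, which should keep memory use flat even for 400 MB inputs. This is a sketch under assumptions, not a tested solution for the real data: convert_file, %pattern_for and the script name split_scenario.pl are hypothetical, the file names follow the scenario_N.txt / scenario_N_n.txt convention from the question, and lines that do not match are counted and skipped rather than copied through.

```perl
use strict;
use warnings;

# One pattern per scenario; each captures the prefix and the two values.
# The patterns are guesses based on the sample lines in the question.
my %pattern_for = (
    1 => qr/^([\d|]+)\|.+pi=\[([\d.]+)\],uid=\[(\d+)\]/,
    2 => qr/^([\d|]+)\|~.*390=([\d.]+),391=(\d+)/,
    3 => qr/^([\d|]+)\|~[^~]*~([\d.]+)~(\d+)~/,
);

# Hypothetical helper: read $in_path line by line and write the
# reformatted lines to $out_path. Returns the number of skipped lines.
sub convert_file {
    my ($scenario, $in_path, $out_path) = @_;
    my $re = $pattern_for{$scenario} or die "Unknown scenario '$scenario'\n";
    open my $in,  '<', $in_path  or die "Cannot read $in_path: $!\n";
    open my $out, '>', $out_path or die "Cannot write $out_path: $!\n";
    my $skipped = 0;
    while (my $line = <$in>) {
        if ($line =~ $re) { print {$out} "$1|$2|$3\n" }
        else              { $skipped++ }
    }
    close $in;
    close $out or die "Cannot close $out_path: $!\n";
    return $skipped;
}

# Usage: perl split_scenario.pl 1
# (reads scenario_1.txt, writes scenario_1_n.txt)
if (@ARGV) {
    my $n = shift @ARGV;
    my $skipped = convert_file($n, "scenario_$n.txt", "scenario_${n}_n.txt");
    warn "$skipped line(s) did not match and were skipped\n" if $skipped;
}
```

Keeping the patterns in a hash means a further scenario would only need one more entry, and the skipped-line count gives a quick check on whether a pattern really fits the whole file.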

 
 

