Input record separator issue in Linux

Tejas
User

Jul 8, 2016, 1:23 AM

Post #1 of 9
Input record separator issue in Linux

Hi,

Below is code that reads records separated by blank lines, using local $/ = "\r\n\r\n";

Code
use strict;
use warnings;
use Data::Dumper;

my $ref = parse_report( \*DATA );
print Dumper $ref;

# Read the report one section at a time; sections are separated by blank lines.
sub parse_report
{
    my ( $handle ) = @_;

    my $ref = [ ]; # { };

    local $/ = "\r\n\r\n";

    while ( my $chunk = <$handle> )
    {
        # A section is: heading, dashed rule, column headers, dashed rule, data rows.
        my ( $heading, $cols, $rows ) = split /-{10,}/, $chunk;
        $heading =~ s/\s+$//;

        my ( $heading_b, $time_cst ) = $heading =~ /^(.+)\s+at.*CST\s+Time:\s+(.+)$/; # todo: time utc not always available.

        if ( $rows =~ /----/ )
        {
            $rows = [ ];
        }
        else
        {
            $cols = parse_csv_head( $cols );
            $rows = parse_csv_body( $rows, $cols );
        }

        #$ref->{$time_cst}->{$heading_b} = $rows;
        push @$ref, $rows if ( $heading_b eq 'Start of WIU/Base Station Link Report Summary Part 1' );
    }

    return $ref;
}

# Build column names from a multi-line header: phrases that start at the
# same character offset on different header lines are joined into one name.
sub parse_csv_head
{
    my ( $head ) = @_;

    my $hash = { };

    my $rows = [ split /\n/, $head ];

    for my $row ( @$rows )
    {
        next if $row =~ /^\s*$/;
        while ( $row =~ /(\w.*?)\s{2,}/g )
        {
            my $col = $1;
            my $offset = $-[0];   # start offset of the match (same as length $`)
            push @{$hash->{$offset}}, $col;
        }
    }

    my $head_b = [ map { join ' ', @{$hash->{$_}} } sort { $a <=> $b } keys %$hash ];

    return $head_b;
}

# Split each data row on runs of two or more spaces and key the cells by column name.
sub parse_csv_body
{
    my ( $body, $head ) = @_;

    my $body_b = [ ];

    my $rows = [ split /\n/, $body ];

    for my $row ( @$rows )
    {
        next if $row =~ /^\s*$/;
        my $cells = { };
        @$cells{@$head} = split /\s{2,}/, $row;
        push @$body_b, $cells;
    }

    return $body_b;
}

__DATA__

Start of WIU Report Part 1 Summary at UTC Time: 1467159638 CST Time: Tue Jun 28 19:03:38 2016 to Tue Jun 28 19:20:38 2016
-----------------------------------------------------------------------------------------------------------------------------------------------
Best Avg Msg Idle Time Beacon Time Expected Link Counts WSBE
WIU WIU Mode Link State Comm Lvl Delay in Secs Period Period Msg Rate Total >95% Msg
WIU->ELM ELM->EMPR EMPR-WSSM WIU->WSSM Seconds Seconds Interval Links
-----------------------------------------------------------------------------------------------------------------------------------------------
780200700103 ON DEMAND MONITORING 100 0 5 0 5 900 120 1 2 2 N
780222304203 ON DEMAND MONITORING 100 0 0 0 0 900 120 1 2 2 N
780222304404 ON DEMAND MONITORING 94.38 0 0 0 0 900 120 1 2 0 N
780222306003 ON DEMAND MONITORING 100 0 0 0 0 900 120 1 2 2 N
-----------------------------------------------------------------------------------------------------------------------------------------------

Start of WIU Report Part 2 Alert Summary at UTC Time: 1467159638 CST Time: Tue Jun 28 19:03:38 2016 to Tue Jun 28 19:20:38 2016
------------------------------------------------------------------------------------------------------------------------------
Msg Delay Sequential Msg Loss Frequency Communication WSBE Exec Cell Tx
WIU WIU Mode Link State Alert Time Alert Time Alert Time Alert Time Alert Time Alert Time Alert Time
In Seconds In Seconds In Seconds In Seconds In Seconds In Seconds In Seconds
------------------------------------------------------------------------------------------------------------------------------
780200700103 ON DEMAND MONITORING 588 0 0 0 0 0 0
780222306003 ON DEMAND MONITORING 0 0 0 568 0 0 0
--------------------------------------------------------------------------------------------------------------------------------------------------------------------

Active WIU Alert Summary at UTC Time: 1467159638 CST Time: Tue Jun 28 19:03:38 2016 to Tue Jun 28 19:20:38 2016
--------------------------------------------------------------------------------------------------------------------------------------
WIU Comm Alert ID Delay Alert ID Seq Alert ID Overall Alert ID Freq Alert ID WSBE Exec Alert ID Cell Tx Alert ID
--------------------------------------------------------------------------------------------------------------------------------------
780200700103 0 3 0 0 0 0 0
780222306003 0 0 0 0 8 0 0
-----------------------------------------------------------------------------------------------

Start of WIU/Base Station Link Report Summary Part 1 at UTC Time: 1467159638 CST Time: Tue Jun 28 19:03:38 2016 to Tue Jun 28 19:20:38 2016
------------------------------------------------------------------------------------------------------------------------------------------------
Rcvd Msgs Old Msgs Msg Delay
WIU Base Station WIU Mode Link State Comm Lvl For Interval Received Secs (Avg)
WIU->ELM ELM->EMPR EMPR-WSSM WIU->WSSM
------------------------------------------------------------------------------------------------------------------------------------------------
780200700103 up.v.000315 ON DEMAND MONITORING 100 31 0 0 5 0 5
780200700103 up.v.000945 ON DEMAND MONITORING 100 31 0 0 5 0 5
780222304203 up.v.000654 ON DEMAND MONITORING 100 30 0 0 0 0 0
780222304203 up.v.000657 ON DEMAND MONITORING 100 30 0 0 0 0 0
780222304404 up.v.000654 ON DEMAND MONITORING 94.38 29 0 0 0 0.03448 0.03448
780222304404 up.v.000657 ON DEMAND MONITORING 94.38 29 0 0 0 0 0
780222306003 up.v.000652 ON DEMAND MONITORING 100 37 0 0 0 0 0
780222306003 up.v.000654 ON DEMAND MONITORING 100 37 0 0 0 0 0

Start of WIU/Base Station Link Duplicate Report Summary at UTC Time: 1467159638 CST Time: Tue Jun 28 19:03:38 2016 to Tue Jun 28 19:20:38 2016
----------------------------------------------------------------------------------------------
WIU Base Station WIU Mode Link State Duplicate Message Count
----------------------------------------------------------------------------------------------
780222306003 up.v.000652 ON DEMAND MONITORING 6
780222306003 up.v.000654 ON DEMAND MONITORING 6
------------------------------------------------------------------------------------------------------------------------------------------------


The above code works well on codepad.org, and the record separator picks up the records perfectly.
But when I use the same code on Red Hat Linux, it does not work the same way: it picks up the whole file as one record.
Any ideas what went wrong?


BillKSmith
Veteran

Jul 8, 2016, 4:39 AM

Post #2 of 9
Re: [Tejas] Input record separator issue in Linux

Set $/ = q(); (the empty string). This special value, called paragraph mode, means "break on any number of blank lines". It should work on any OS. Refer to $INPUT_RECORD_SEPARATOR in perldoc perlvar.
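A minimal sketch of paragraph mode, using an in-memory file so it runs anywhere (the sample text is made up). The separating lines must be truly empty: a line holding only spaces, or only a "\r", does not count as blank.

```perl
use strict;
use warnings;

# Records separated by blank lines; note the run of three blank
# lines between the second and third records.
my $text = "rec one, line 1\nrec one, line 2\n\nrec two\n\n\n\nrec three\n";

open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my @records;
{
    local $/ = q();    # paragraph mode: one or more blank lines end a record
    while ( my $chunk = <$fh> ) {
        chomp $chunk;    # in paragraph mode chomp removes all trailing newlines
        push @records, $chunk;
    }
}
close $fh;

printf "%d records\n", scalar @records;
print "<$_>\n" for @records;
```

The extra blank lines between the second and third records are absorbed, so three records come back rather than five.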
Good Luck,
Bill


Laurent_R
Veteran / Moderator

Jul 8, 2016, 10:52 AM

Post #3 of 9
Re: [Tejas] Input record separator issue in Linux

Just for information, a newline in a text file is "\r\n" under Windows and only "\n" under Unix or Linux. So the "\r\n\r\n" sequence will not be recognized as two newlines under Linux.

Setting $/ to the empty string, as suggested by Bill, is probably the best solution that should work on any system.
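To see the difference on Linux, the sketch below reads the same Unix-style text with both separators. The data is made up and count_records is a hypothetical helper; an in-memory file stands in for a real one.

```perl
use strict;
use warnings;

# Hypothetical helper: count the records read from $text with separator $sep.
sub count_records {
    my ( $text, $sep ) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    local $/ = $sep;
    my $n = 0;
    $n++ while <$fh>;
    close $fh;
    return $n;
}

# Unix-style data: every line ends in "\n" only.
my $unix_text = "block one\n\nblock two\n\nblock three\n";

my $crlf_count = count_records( $unix_text, "\r\n\r\n" );   # separator never occurs
my $para_count = count_records( $unix_text, q() );          # paragraph mode

print "with \"\\r\\n\\r\\n\": $crlf_count record(s)\n";
print "with q():         $para_count record(s)\n";
```

With "\r\n\r\n" the whole input comes back as a single record, which matches the symptom in the original post.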


BillKSmith
Veteran

Jul 9, 2016, 5:52 AM

Post #4 of 9
Re: [Laurent_R] Input record separator issue in Linux

Please correct me if my interpretation is wrong. Perl has a lot of documentation related to this subject, but nothing seems to address it directly. I believe that a Perl user almost never needs to know how a newline is represented in a disk file on his OS. Within Perl, a newline is always represented by a single newline character ("\n"). Translation to/from disk files is handled by PerlIO. (The translation 'layer' can be turned off with the function 'binmode'. Note that 'binmode' does not do anything on UNIX because no translation happens there in the first place.) Given this view, it is surprising to read in the original post that $/="\r\n\r\n" ever matches anything (unless binmode is in effect on a Windows system).

Detailed documentation of the 'open' function in perlfunc refers to PerlIO. The document PerlIO includes descriptions of the layers ':crlf' and ':unix' which actually perform the translation.
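A small sketch of that translation, assuming a Unix-like system and a scratch file from the core File::Temp module: the program writes only "\n", the :crlf layer puts "\r\n" on disk, and reading back through :crlf returns a plain "\n" again.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Scratch file name; the file is deleted when the program exits.
my ( $tmp, $name ) = tempfile( UNLINK => 1 );
close $tmp;

# Write through an explicit :crlf layer (the default text layer on Windows).
open my $out, '>:crlf', $name or die "Cannot write $name: $!";
print {$out} "line one\nline two\n";    # the program itself writes only "\n"
close $out or die "Cannot close $name: $!";

# Read the raw bytes back: the layer stored each "\n" as "\r\n".
open my $raw, '<:raw', $name or die "Cannot read $name: $!";
my $bytes = do { local $/; <$raw> };
close $raw;

# Read again through :crlf: the program sees a plain "\n" once more.
open my $txt, '<:crlf', $name or die "Cannot read $name: $!";
my $first = <$txt>;
close $txt;

( my $shown = $bytes ) =~ s/\r/\\r/g;
$shown =~ s/\n/\\n/g;
print "bytes on disk: $shown\n";
print "first line read back: ", ( $first eq "line one\n" ? 'ends in a bare \n' : 'still has \r' ), "\n";
```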
Good Luck,
Bill


Laurent_R
Veteran / Moderator

Jul 10, 2016, 7:53 AM

Post #5 of 9
Re: [BillKSmith] Input record separator issue in Linux

Yes, I think that you are right, although I rarely use Perl under Windows.

I regularly have trouble at work with newline character sequences because we use files on Unix platforms that have sometimes been prepared under Windows (i.e. with Windows newlines). This is the type of case where this matters, because Perl does not know that the file has Windows line endings.

When the platform where the file is prepared is the same as the platform where the Perl program is run, there is usually no problem using "\n" only, and "\r\n" would usually be a problem even under Windows.

Example program written and run under Windows (ActiveState):


Code
use strict; 
use warnings;

my $string = <<END;
line one
line two
END
;
print "Matched\n" if $string =~ /one\n/;


prints out "Matched", whereas it does not print anything if I change the relevant line to:


Code
print "Matched\n" if $string =~ /one\r\n/;
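The Unix-side version of the problem (a file prepared on Windows, read on Unix) can be sketched the same way: a line read without the :crlf layer keeps its "\r", and chomp removes only the "\n". The in-memory data below is made up, and s/\r?\n\z// is one common way to strip either ending, not the only one.

```perl
use strict;
use warnings;

# A Windows-style line as a Unix perl sees it: no translation layer,
# so the "\r" is still there after reading.
my $data = "name=value\r\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my $line = <$fh>;
close $fh;

chomp( my $chomped = $line );              # removes only the trailing "\n"
( my $stripped = $line ) =~ s/\r?\n\z//;   # removes "\r\n" or a bare "\n"

print 'chomp left a trailing \r: ', ( $chomped =~ /\r\z/ ? 'yes' : 'no' ), "\n";
print 'substitution left a trailing \r: ', ( $stripped =~ /\r\z/ ? 'yes' : 'no' ), "\n";
```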



BillKSmith
Veteran

Jul 10, 2016, 1:43 PM

Post #6 of 9
Re: [Laurent_R] Input record separator issue in Linux

You make a good point. Although the OP used <DATA> in his post, his live data could have Windows newlines. If you know that a file has this style of newline, you can translate it the same way that Perl does on Windows by specifying the :crlf layer in your open statement.


Code
use strict; 
use warnings;
# Simulate a windows file on a unix system
my $windows_file = \do{"Some text\r\nSome more text\r\n"};

open my $fh, '<:crlf', $windows_file or die "$!";
$_ = <$fh>;
print "Newline is ", /\r\n$/ ? "not " : "", "translated correctly\n";
close $fh;


Please verify this on a Unix system, replacing the in-memory file with a real Windows-style file.

At first, this all seemed like an interesting diversion, but I have come to doubt that our advice about the null string would work without the translation.
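A sketch that tests exactly that doubt, assuming Linux and an in-memory stand-in for a Windows file (the data is made up and paragraph_records is a hypothetical helper). Without translation, the "blank" lines are really "\r\n", so paragraph mode never sees two consecutive newlines.

```perl
use strict;
use warnings;

# Hypothetical helper: count paragraph-mode records read with the given open mode.
sub paragraph_records {
    my ( $text, $mode ) = @_;
    open my $fh, $mode, \$text or die "Cannot open in-memory file: $!";
    local $/ = q();    # paragraph mode
    my @records = <$fh>;
    close $fh;
    return scalar @records;
}

# Simulated Windows data: every line, including the "blank" ones, ends in "\r\n".
my $windows_text = "block one\r\n\r\nblock two\r\n\r\nblock three\r\n";

my $raw_count  = paragraph_records( $windows_text, '<' );        # no translation
my $crlf_count = paragraph_records( $windows_text, '<:crlf' );   # "\r\n" becomes "\n"

print "without :crlf: $raw_count record(s)\n";
print "with :crlf:    $crlf_count record(s)\n";
```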
Good Luck,
Bill


BillKSmith
Veteran

Jul 11, 2016, 11:48 AM

Post #7 of 9
Re: [Tejas] Input record separator issue in Linux

Tejas, The code that you have posted will not read the posted data correctly on any machine. I believe that the code will work on any unix-like system, but only if the data file uses windows style newlines. It probably is far more trouble than it is worth to write code that can handle either format on either type of machine.

What are your requirements? Will production code always run on the same type of machine? Which one? Will your data always use the same style newlines? If not, will the operator be able to set a run-time option to specify the input style?

You should at least consider the possibility of preprocessing your data into the form your system expects. (My Windows gvim editor needs only a single command to change its save mode.) Of course, in your code, you would have to set $INPUT_RECORD_SEPARATOR ($/) to the slightly idiomatic q() (a null string). Identical code would then work on either machine. Preprocessing would only be necessary if the data was prepared on the other type of machine.
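One way to sketch the preprocessing step in Perl itself, assuming the whole file fits in memory; normalize_newlines is a hypothetical helper, and an editor conversion as described above does the same job. After conversion, the same paragraph-mode read works with no :crlf layer.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: rewrite a file so that every "\r\n" becomes "\n".
sub normalize_newlines {
    my ($path) = @_;
    open my $in, '<:raw', $path or die "Cannot read $path: $!";
    my $content = do { local $/; <$in> };    # slurp the whole file
    close $in;
    $content =~ s/\r\n/\n/g;
    open my $out, '>:raw', $path or die "Cannot write $path: $!";
    print {$out} $content;
    close $out or die "Cannot close $path: $!";
    return;
}

# Demo: a small Windows-style scratch file (removed at program exit).
my ( $tmp_fh, $tmp_name ) = tempfile( UNLINK => 1 );
binmode $tmp_fh, ':raw';
print {$tmp_fh} "block one\r\n\r\nblock two\r\n";
close $tmp_fh or die "Cannot close $tmp_name: $!";

normalize_newlines($tmp_name);

# The same paragraph-mode read now works without any :crlf layer.
open my $fh, '<', $tmp_name or die "Cannot open $tmp_name: $!";
my @records = do { local $/ = q(); <$fh> };
close $fh;

print scalar(@records), " record(s)\n";
```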
Good Luck,
Bill


Tejas
User

Jul 11, 2016, 12:08 PM

Post #8 of 9
Re: [BillKSmith] Input record separator issue in Linux

Yes, the production code always runs on Linux.
Thanks for your input.


Tejas


Tejas
User

Jul 11, 2016, 12:51 PM

Post #9 of 9
Re: [BillKSmith] Input record separator issue in Linux

Thanks Bill
There is a separate post I have created for reading the same data,
and I would like to read this data in reverse order and save it according to the regex match.
Can you please look into it?

I created a separate post because the question is different, though the data is the same.

Appreciate your help

Thanks

