Sorting Chunks of a file

 



sn0rkl3m4st3r
New User

Sep 1, 2011, 6:15 AM

Post #1 of 6 (4107 views)
Sorting Chunks of a file

Hi,



I'm attempting to sort a file based on a few criteria. I have a file that has a similar structure to the following:

#*----------------

GRAPH: <graphname>

stuff

#*----------------

GRAPH: <graphname>

stuff

#*----------------



I need to put these graph chunks in alphabetical order (based on the graph name) while keeping the exact same file structure. Does anyone have any ideas? Maybe some resources on problems similar to this? So far I have this...


Code
 
if (-e $ARGV[0])
{
    #open file for iteration
    open(F, '<', $ARGV[0]) or die("Could not open file: $!");

    #iterating...
    $i = 0;
    $linenumber = 0;
    @hashes = ();
    %graph = ();
    $first = 1;
    $inhash = 0;
    #read lines from file until end of file
    while (defined($line = readline(F)))
    {
        #check for ##*-------- line
        #if in the graph section set the hash values
        #and set the hash in the array
        if ($line =~ m/^#\*+-/)
        {
            #separator line: not part of a graph chunk
            $inhash = 0;
        }
        else
        {
            #print "setting in hash\n";
            $inhash = 1;
        }

        if ($inhash)
        {
            $graph{$linenumber} = $line;
            $linenumber++;
        }
    }
    close(F);
}
else
{
    die("cannot find file");
}
print "@hashes\n";



This is not all my code obviously but it's the basic idea (without all the usage checking). Basically, I've gotten far enough to find the lines that separate the data. Any suggestions would be greatly appreciated! Thanks!


(This post was edited by sn0rkl3m4st3r on Sep 1, 2011, 6:18 AM)


Zhris
Enthusiast

Sep 1, 2011, 3:01 PM

Post #2 of 6 (4094 views)
Re: [sn0rkl3m4st3r] Sorting Chunks of a file [In reply to]

Hey,

There are a number of ways the data could be constructed -> sorted -> printed. Here's my rough method, which merely uses nested array references:


Code
#! /usr/bin/perl 
use strict;
use warnings;
use Data::Dumper;

my $b_regex = q{^#\*\-+}; #break
my $gn_regex = q{^GRAPH: <(.*?)>}; #graph name

my $graphs = [];
my $graph_name = q{};
my $chunk = [];

while (my $line = <DATA>) {
    chomp $line;

    push @$chunk, $line;

    if ($line =~ m/$b_regex/) {
        push @$graphs, [$graph_name, $chunk];
        ($graph_name, $chunk) = (q{}, []);
    }
    elsif ($line =~ m/$gn_regex/) {
        $graph_name = $1;
    }
}

#print Dumper($graphs);

foreach my $graph (sort { $a->[0] cmp $b->[0] } @$graphs) {
    local $" = "\n";
    print "@{$graph->[1]}\n";
}

__DATA__
#*----------------

GRAPH: <graphname2>

stuff

#*----------------

GRAPH: <graphname1>

stuff

#*----------------

GRAPH: <graphname3>

stuff

#*----------------


Chris


(This post was edited by Zhris on Sep 1, 2011, 4:06 PM)


Chris Charley
User

Sep 1, 2011, 4:32 PM

Post #3 of 6 (4077 views)
Re: [sn0rkl3m4st3r] Sorting Chunks of a file [In reply to]

It is possible to sort by 'chunks' where the input record separator, $/, is set equal to "#".


Code
#!/usr/bin/perl 
use strict;
use warnings;

$/ = "#";
my @data;

while (<DATA>) {
    chomp;
    next unless $_;
    last if /^\*----------------\Z/;
    push @data, $_;
}

print "#$_" for sort by_name @data;
print "#*----------------\n";

sub by_name {
    my ($A) = $a =~ /GRAPH: <(.+?)>/;
    my ($B) = $b =~ /GRAPH: <(.+?)>/;
    return $A cmp $B;
}

__DATA__
#*----------------
GRAPH: <graphname2>
stuff
#*----------------
GRAPH: <graphname1>
stuff
#*----------------
GRAPH: <graphname3>
stuff
#*----------------



BillKSmith
Veteran

Sep 1, 2011, 9:00 PM

Post #4 of 6 (4056 views)
Re: [Chris Charley] Sorting Chunks of a file [In reply to]

$/ is not limited to a single character.

The graph name is the first part of each record that varies, so a plain string sort already orders the records by name. We do not have to extract the field.

Your method becomes:


Code
#!/usr/bin/perl
use strict;
use warnings;
$/ = "#*----------------";
print sort <DATA>;
__DATA__
#*----------------
GRAPH: <graphname2>
stuff
#*----------------
GRAPH: <graphname1>
stuff
#*----------------
GRAPH: <graphname3>
stuff
#*----------------

Good Luck,
Bill
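
To see what a multi-character $/ does to the records, here is a minimal sketch. It assumes a shortened separator ("#*----") and reads from an in-memory string instead of DATA; split_records is a hypothetical helper name, not something from the posts above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# split_records is a hypothetical helper, not part of the code above.
# It reads $text record by record, using $separator as $/.
sub split_records {
    my ($text, $separator) = @_;
    local $/ = $separator;                  # any string, not just one character
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    my @records;
    while (my $record = <$fh>) {
        chomp $record;                      # removes a trailing $/ if present
        push @records, $record;
    }
    close $fh;
    return @records;
}

# Shortened separator, assumed for the demo only.
my $sample = "#*----\nGRAPH: <b>\nstuff\n#*----\nGRAPH: <a>\nstuff\n#*----\n";
my @records = split_records($sample, "#*----");

for my $i (0 .. $#records) {
    (my $shown = $records[$i]) =~ s/\n/\\n/g;   # make newlines visible
    print "$i: [$shown]\n";
}
```

Note that chomp strips the whole separator string, and that the text before the first separator and after the last one each come back as their own (nearly empty) record.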


Zhris
Enthusiast

Sep 2, 2011, 12:27 AM

Post #5 of 6 (4038 views)
Re: [BillKSmith] Sorting Chunks of a file [In reply to]

Very nice methods using $/. However, with real data I can imagine Chris's being less efficient than my own method (slower run time), since it runs two chunk-wide regular expressions for every comparison made while sorting. Other things to note: the O/P probably wants to retain empty lines, and Bill's code won't necessarily retain the original order if two graph names are the same, since the order is then decided by "stuff". Also, Bill's output won't be formatted properly with his current code. Just to nitpick beyond the example data!
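
A minimal sketch of those points, assuming every chunk starts with a "#*---" separator line and holds one "GRAPH: <name>" line as in the sample data. sort_chunks is a hypothetical name. The graph name is extracted once per chunk rather than once per comparison, equal names fall back to their original position, and empty lines stay inside their chunk.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns $text with its graph chunks sorted by name.
sub sort_chunks {
    my ($text) = @_;

    # Split just before each separator line; blank lines stay in their chunk.
    my @chunks = split /^(?=#\*-+$)/m, $text;

    # A trailing piece without a GRAPH line is the closing separator.
    my $footer = '';
    $footer = pop @chunks if @chunks && $chunks[-1] !~ /^GRAPH:/m;

    # Extract each name once and remember the original position.
    my @keyed;
    for my $i (0 .. $#chunks) {
        my ($name) = $chunks[$i] =~ /^GRAPH: <(.*?)>/m;
        push @keyed, [ defined $name ? $name : '', $i, $chunks[$i] ];
    }

    # Sort by name; equal names keep their original order.
    my @sorted = sort { $a->[0] cmp $b->[0] or $a->[1] <=> $b->[1] } @keyed;

    return join('', map { $_->[2] } @sorted) . $footer;
}

my $input = join "\n",
    '#*----------------', '', 'GRAPH: <graphname2>', '', 'stuff', '',
    '#*----------------', '', 'GRAPH: <graphname1>', '', 'stuff', '',
    '#*----------------', '';
print sort_chunks($input);
```

Anything before the first separator has no name, so it sorts first and stays at the top of the output.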

Chris


(This post was edited by Zhris on Sep 2, 2011, 12:29 AM)


sn0rkl3m4st3r
New User

Sep 2, 2011, 5:15 AM

Post #6 of 6 (4028 views)
Re: [sn0rkl3m4st3r] Sorting Chunks of a file [In reply to]

Thanks for all the great replies. I actually kept working on it yesterday and was able to achieve a result, which is less elegant than anyone else's... by far. Here is the "final" code:




Code
   

#!/usr/bin/perl -w
#Written and tested by sn0rkl3m4st3r
#Script for sorting graph text files

#Ensure correct number of command line arguments
#and if they're incorrect, print usage and die.
if (@ARGV != 2)
{
    die("Usage: sortit.pl <infile> <outfile>\n");
}

#Check if the file exists
if (-e $ARGV[0])
{
    #open file for iteration
    open(F, "<$ARGV[0]") or die("Could not open file");
    open(G, ">>$ARGV[1]") or die("could not open outfile");

    #initialize variables/arrays
    $i = 0;
    $fseven = 1;
    @graph = ();
    @hsharr = ();
    @names = ();
    $first = 1;
    $inhash = 0;
    $gname = "";
    #read lines from file
    while($line = readline(F))
    {
        #make sure first 7 lines are printed to the new file
        if($fseven <= 7)
        {
            if($fseven == 7)
            {
                $line =~ s/\n//;
            }
            print G "$line";
            $fseven++;
        }

        #check for ##*-------- line
        #if in the graph section set the hash values
        #and set the hash in the array
        if($line =~ m/^#\*+-/)
        {
            #make sure we don't have null first hash/array
            if($first)
            {
                $first = 0;
            }
            else
            {
                #this line won't be part of the array
                $inhash = 0;
                $i = 0;
                #because this isn't the first ##*---- set we've seen
                #we know we have an array to push to another array...don't ask
                push(@hsharr, @graph);
                #reinitialize
                @graph = ();
            }
        }
        elsif(!$first)
        {
            $inhash = 1;
        }

        if($inhash)
        {
            if($line =~ m/^GRAPH:/)
            {
                #obtain the graph name
                @splitar = split(/:/, $line);
                $gname = $splitar[1];
                #remove leading space
                $gname =~ s/^\s+//;
                push (@names, $gname);
                #get rid of leading spaces and enter into array
                $line =~ s/^\s+//;
                $graph[$i] = $line;
                $i++;
            }
            else
            {
                $graph[$i] = $line;
                $i++;
                #print "current graph ($i): @graph\nDone\n\n";
            }
        }
    }

    #sort the list of names for comparison
    @names = sort(@names);
    foreach $name (@names)
    {
        $printed = 0;
        $isequal = 0;
        foreach $line (@hsharr)
        {
            @splitar = split(/:/, $line);
            if($line =~ m/^GRAPH:/)
            {
                #print "trying to format...";
                $gname = $splitar[1];
                $gname =~ s/^\s+//;
                if($gname eq $name)
                {
                    #print "name is: $name";
                    print G "\n#*--------------------------------------------------------\n\n";
                    print G "$line";
                    $isequal = 1;
                }
            }
            elsif(($line =~ m/^END_GRAPH:/) && $isequal)
            {
                print G "$line";
                $isequal = 0;
                $printed = 1;
            }
            elsif($isequal)
            {
                print G "$line";
            }
        }
    }
}
else
{
    die("cannot find file");
}
close(F);
close(G);





There's a lot of stepping on toes here, but I think for the most part it does what it's supposed to do within a reasonable period of time. Also, there are a lot of placeholder variables to keep my iterations correct. In the future I will be optimizing this script to be a little more efficient. Currently, much of the functionality is obviously a bit superfluous given the efficient examples you gave. Thanks a ton for your help!
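
For what it's worth, the same job can be sketched more compactly. The version below is only a sketch under assumptions not confirmed in the thread: everything before the first separator line is treated as the header (instead of counting seven lines), the graph name is whatever follows "GRAPH:" on its line, and a final separator with no GRAPH line after it closes the file. sort_graph_lines and sort_graph_file are hypothetical names, and the usage checking is left out.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: takes the lines of a file and returns them with
# the graph chunks ordered by name. Header and closing separator stay put.
sub sort_graph_lines {
    my @lines = @_;
    my (@header, @chunks, @footer);

    for my $line (@lines) {
        if ($line =~ /^#\*-+/) {
            # A separator line opens a new chunk.
            push @chunks, { name => '', idx => scalar @chunks, lines => [] };
        }
        if (@chunks) {
            my $chunk = $chunks[-1];
            push @{ $chunk->{lines} }, $line;
            if ($chunk->{name} eq '' && $line =~ /^GRAPH:\s*(.*)/) {
                $chunk->{name} = $1;
            }
        }
        else {
            push @header, $line;        # assumption: header ends at first separator
        }
    }

    # A last chunk without a GRAPH line is just the closing separator.
    if (@chunks && $chunks[-1]{name} eq '') {
        my $last = pop @chunks;
        @footer = @{ $last->{lines} };
    }

    # Sort by name; chunks with equal names keep their original order.
    my @sorted = sort { $a->{name} cmp $b->{name} or $a->{idx} <=> $b->{idx} } @chunks;

    return (@header, (map { @{ $_->{lines} } } @sorted), @footer);
}

# Hypothetical wrapper: read $infile, write the sorted result to $outfile.
sub sort_graph_file {
    my ($infile, $outfile) = @_;
    open my $in,  '<', $infile  or die "Cannot open $infile: $!";
    open my $out, '>', $outfile or die "Cannot open $outfile: $!";
    print {$out} sort_graph_lines(<$in>);
    close $in;
    close $out or die "Cannot close $outfile: $!";
}

# Run as: sortit.pl <infile> <outfile>
sort_graph_file(@ARGV) if @ARGV == 2;
```

Compared with the script above, this uses lexical filehandles and three-argument open, opens the outfile for writing rather than appending, and keeps each chunk as its own array so the lines never have to be searched again by name.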


(This post was edited by sn0rkl3m4st3r on Sep 2, 2011, 5:26 AM)

 
 

