Sorting hash by value

 



niall_heavey
Novice

Jul 14, 2010, 7:27 AM

Post #1 of 22 (1383 views)
Sorting hash by value

Hi all, I've been trying to find this on another forum but no joy, so I thought I'd see what else is out there.
I'm quite new to Perl, so bear with me!

I have a CSV file which I am reading in; there is a lot of repeated information in columns 3 and 5. I am trying to see which is the most common entry across both of these columns.
I then want to sort the results in numerical order.
At the moment I have my list of the number of entries for each value from both columns, but now I am trying to sort it numerically (if possible, just the top 5 or 10 entries, as there are hundreds!).

I have searched many tutorial websites and have seen the same piece of code to do this, but for some reason it does not work for me.


Code
      #!/usr/bin/perl 
#use strict;
#use warnings;
#use Text::CSV;

my $dat_file = 'file.csv';

my $index = {};
my $index_test = {};
@data =('col1', 'col2', 'col3', 'col4', 'col5', 'col6', 'col7');

open (F, $dat_file), print "File Opened\n\n" || die ("Could not open file");

while ($line = <F>)

{

($data1,$data2,$data3,$data4,$data5,$data6) = split ',', $line;

if (exists($index{$data3}))
{
$index->{ $data3 } = 1;

}
else
{
$index->{ $data3 } =$index->{ $data3 }+1 ;
}
if (exists($index{$data5}))
{
$index->{ $data5 } = 1;

}

else

{
$index->{ $data5 } = $index->{ $data5 }+1 ;
}

}
foreach $value (sort {$index{$a} cmp $index{$b} }

keys %index)

{

print "$value => $index{$value}\n";

}
close (F);
print"\n";


This is the code I have, pretty sure the problem is somewhere in the
foreach $value (sort {$index{$a} cmp $index{$b} }
section near the end.

I am not getting an error but am just getting "File Opened " output to the screen!

Can anyone see where my problem might be?

Thanks,
N


savo
User

Jul 14, 2010, 8:33 AM

Post #2 of 22 (1377 views)
Re: [niall_heavey] Sorting hash by value

Do you have a sample input file? I may have a play with it later if I get time.


niall_heavey
Novice

Jul 14, 2010, 8:48 AM

Post #3 of 22 (1373 views)
Re: [savo] Sorting hash by value

Ok, thanks for the offer.

I have attached a file with a similar layout. It is columns 3 and 5 that I am interested in. I think Nokia should have 30 entries, Iphone 1, Sony 27, Motorola 1 and HTC 1.

But currently the way it outputs is:

"Nokia" => 30
"Iphone" => 1
"Sony" => 27
"Motorola" => 1
"HTC" => 1

But I am looking for it in order like:

"Nokia" => 30
"Sony" => 27
"Iphone" => 1
"Motorola" => 1
"HTC" => 1

Hopefully you can help!

Thanks very much.


FishMonger
Veteran / Moderator

Jul 14, 2010, 8:55 AM

Post #4 of 22 (1371 views)
Re: [niall_heavey] Sorting hash by value

cmp sorts lexically, not numerically

Change cmp to <=>
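
As a rough illustration of the difference: with cmp the values are compared as strings, so 10 would sort before 9, while <=> compares them as numbers. The sketch below assumes a hash named %count filled with the counts from the sample output above (the name and the top-N cutoff are made up for the example), sorts largest first, and keeps only the first few entries.

```perl
use strict;
use warnings;

# Hypothetical counts, taken from the sample output earlier in the thread.
my %count = ( Nokia => 30, Iphone => 1, Sony => 27, Motorola => 1, HTC => 1 );

# Compare the values numerically, largest first; break ties by name so the
# order is repeatable from run to run.
my @by_count = sort { $count{$b} <=> $count{$a} or $a cmp $b } keys %count;

# Keep only the top N entries (N = 2 here, purely for illustration).
my $top_n = 2;
$top_n = @by_count if $top_n > @by_count;
my @top = @by_count[ 0 .. $top_n - 1 ];

print "$_ => $count{$_}\n" for @top;
```

Swapping $a and $b in the comparison gives ascending order instead.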


niall_heavey
Novice

Jul 14, 2010, 8:57 AM

Post #5 of 22 (1369 views)
Re: [FishMonger] Sorting hash by value

Cool, changed that now.

still no display though!

Thanks.


savo
User

Jul 14, 2010, 9:05 AM

Post #6 of 22 (1366 views)
Re: [niall_heavey] Sorting hash by value

I am a little confused: you say there is no display, but you also show how it outputs?

Could you add the first few lines of file.csv?


BillKSmith
Veteran

Jul 14, 2010, 9:14 AM

Post #7 of 22 (1362 views)
Re: [niall_heavey] Sorting hash by value

Restore "use strict;" and "use warnings" then correct the errors that are reported.

  • You must decide if the symbol "index" is a hash or a reference to a hash and use it consistently.


  • The array @data is never used. Remove it


  • The reference $index_test is never used. Remove it.


  • Use the three argument form of open.


  • Remove ', print "File Opened\n\n"'

    The || operator applies to the string being printed, not to the return value of open, so die can never run.


This should get your script working. Then we can talk about how to improve it.
Good Luck,
Bill
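
To make the open/print point above concrete, here is a small sketch. It assumes a file name that does not exist (the name is invented for the demo) and wraps the original statement in eval only so the result can be inspected. Because || binds more tightly than print's argument list, the original line parses as print( "File Opened\n\n" || die(...) ), and a non-empty string is always true.

```perl
use strict;
use warnings;

# Hypothetical file name that is assumed not to exist.
my $missing = 'no_such_file_for_this_demo.csv';

# The statement from the first post, with a lexical handle instead of F.
# open fails, yet "File Opened" is still printed and die is never called.
my $ok = eval {
    open( my $in, '<', $missing ), print "File Opened\n\n" || die("Could not open file");
    1;
};
print "die was not triggered\n" if $ok;

# 'or' has very low precedence, so here it tests the result of open itself.
my $error = '';
open( my $fh, '<', $missing ) or $error = "Could not open <$missing>: $!";
print "$error\n" if $error;
```

In a real script the right-hand side of 'or' would normally be die rather than an assignment; the assignment is used here only so the sketch keeps running.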


    niall_heavey
    Novice

    Jul 14, 2010, 3:45 PM

    Post #8 of 22 (1351 views)
    Re: [BillKSmith] Sorting hash by value

    Ok, thank you all for your help. With regard to me having no display, what I mean is that nothing is displayed where I try to output the sorted values; I am only getting the values in a raw (unordered) display.

    I have also done some of the things that were suggested, but what do you mean by the 3 argument form of open?
    And
    You must decide if the symbol "index" is a hash or a reference to a hash and use it consistently.

    I am quite new to Perl, so I'm not sure exactly what is meant! Well, I know what the 3 argument form is, but I just don't know what the 3rd argument should be!!

    Thanks again,

    N


    BillKSmith
    Veteran

    Jul 14, 2010, 7:19 PM

    Post #9 of 22 (1346 views)
    Re: [niall_heavey] Sorting hash by value

    You can reference perl documentation by using the perldoc program.
    At your command prompt, type perldoc filename.
    Type perldoc perldoc to learn more options.

    Refer to perldoc -f open for a description of the mode argument of open.


    Code
      

    open( F, '<', $dat_file ) || die 'Could not open file.';



    Refer to perldoc perldata for description of perl data types especially 'hash'.

    If you are not familiar with references, you probably should not be using them.


    Code
    #my $index = {};

    my %index;



    Remove '->' from every use of 'index'. The use strict and use warnings will
    help find all of them.

    You should read:
    perldoc perlreftut for an overview of references.
    perldoc perldsc for examples of references in data structures.
    perldoc perlref for details of references.
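
A short sketch of why mixing the two forms matters: a hash and a scalar holding a hash reference are separate variables, even if they share a name. The variable $index_ref below is a made-up name used to keep the two apart. In the original script the counts were written through the reference, while exists() and the final sort read from the (empty) hash, which would explain why nothing sorted was printed.

```perl
use strict;
use warnings;

my %index;              # a plain hash
my $index_ref = {};     # a scalar holding a reference to a different, anonymous hash

$index{Nokia}++;        # updates %index
$index_ref->{Sony}++;   # updates the anonymous hash, not %index

# Each structure only contains what was written to it.
my $hash_keys = join ',', sort keys %index;
my $ref_keys  = join ',', sort keys %$index_ref;

print "hash: $hash_keys\n";
print "ref:  $ref_keys\n";
```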

    Your logic is unnecessarily complicated, but let's get your code working before
    we address that.
    Good Luck,
    Bill


    FishMonger
    Veteran / Moderator

    Jul 14, 2010, 8:09 PM

    Post #10 of 22 (1344 views)
    Re: [niall_heavey] Sorting hash by value

    There are a number of things that Bill mentioned that I could expand on, but I'll focus on the open call.

    Current best practice standards recommend using a lexical var for the filehandle instead of a bareword. Two of the reasons are 1) reducing the scope of the filehandle, and 2) a lexical var filehandle will automatically close when it goes out of scope.

    The name of the filehandle, as with all vars, should be descriptive.

    In most cases, the die statement should include the reason it failed, which is stored in Perl's $! var, as well as the filename.

    Personally, I prefer to use 'or' instead of '||' when checking the return code of the open call.


    Code
    open my $csv_fh, '<', $dat_file 
    or die "Could not open <$dat_file> for reading. $!";
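
A minimal sketch of the scoping point, under one assumption: so that it runs without a file on disk, it opens an in-memory string (with made-up rows) instead of $dat_file. The handle is declared inside a block and is closed automatically when the block ends.

```perl
use strict;
use warnings;

# Stand-in for the contents of file.csv (made-up rows), so the sketch
# runs without a file on disk.
my $csv_text = "1,a,Nokia,x,Sony\n2,b,Nokia,y,Nokia\n";

my $line_count = 0;
{
    # Opening a reference to a scalar reads from the string instead of a file.
    open my $csv_fh, '<', \$csv_text
        or die "Could not open in-memory data for reading. $!";

    while ( my $line = <$csv_fh> ) {
        $line_count++;
    }
}   # $csv_fh goes out of scope here and is closed automatically

print "Read $line_count lines\n";
```

With a real file, the only change would be passing the file name in place of \$csv_text.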



    niall_heavey
    Novice

    Jul 16, 2010, 5:50 AM

    Post #11 of 22 (1327 views)
    Re: [FishMonger] Sorting hash by value

    Ok, I have tried sorting out the code as recommended,

    With warnings and strict uncommented I am now getting numerous errors (they are really all the same thing though!)

    They are:

    Global symbol "$line" requires explicit package name at simplify.pl line 14.
    Global symbol "$data1" requires explicit package name at simplify.pl line 18.
    Global symbol "$data2" requires explicit package name at simplify.pl line 18.
    Global symbol "$data3" requires explicit package name at simplify.pl line 18.
    Global symbol "$data4" requires explicit package name at simplify.pl line 18.
    Global symbol "$data5" requires explicit package name at simplify.pl line 18.
    Global symbol "$data6" requires explicit package name at simplify.pl line 18.
    Global symbol "$line" requires explicit package name at simplify.pl line 18.
    Global symbol "%index" requires explicit package name at simplify.pl line 20.
    Global symbol "$data3" requires explicit package name at simplify.pl line 20.
    Global symbol "$index" requires explicit package name at simplify.pl line 22.
    Global symbol "$data3" requires explicit package name at simplify.pl line 22.
    Global symbol "$index" requires explicit package name at simplify.pl line 27.
    Global symbol "$data3" requires explicit package name at simplify.pl line 27.
    Global symbol "$index" requires explicit package name at simplify.pl line 27.
    Global symbol "$data3" requires explicit package name at simplify.pl line 27.
    Global symbol "%index" requires explicit package name at simplify.pl line 31.
    Global symbol "$data5" requires explicit package name at simplify.pl line 31.
    Global symbol "$index" requires explicit package name at simplify.pl line 33.
    Global symbol "$data5" requires explicit package name at simplify.pl line 33.
    Global symbol "$index" requires explicit package name at simplify.pl line 38.
    Global symbol "$data5" requires explicit package name at simplify.pl line 38.
    Global symbol "$index" requires explicit package name at simplify.pl line 38.
    Global symbol "$data5" requires explicit package name at simplify.pl line 38.


    I assume since they are all effectively the same error that it might be something very simple that I am missing.

    Thanks for all the help so far!

    N


    FishMonger
    Veteran / Moderator

    Jul 16, 2010, 6:06 AM

    Post #12 of 22 (1323 views)
    Re: [niall_heavey] Sorting hash by value

    The error messages are telling you that you forgot to declare those vars, which is done with the 'my' keyword.


    niall_heavey
    Novice

    Jul 16, 2010, 6:16 AM

    Post #13 of 22 (1321 views)
    Re: [FishMonger] Sorting hash by value

    I'm putting them in like so:

    my $data3 = {};

    would this be sufficient?

    Thanks for the quick reply!


    FishMonger
    Veteran / Moderator

    Jul 16, 2010, 6:51 AM

    Post #14 of 22 (1316 views)
    Re: [niall_heavey] Sorting hash by value

    If you want $data3 to be a reference to a hash, then that would be the correct way to declare it. I doubt that is what you intended.

    An array would probably be more appropriate.

    Code
    my @data = split /,/, $line;



    BillKSmith
    Veteran

    Jul 16, 2010, 7:10 AM

    Post #15 of 22 (1313 views)
    Re: [niall_heavey] Sorting hash by value

    No! $data3 is not a reference to a hash. The messages show that you have not even tried to fix index. Did you read the references?

    You can eliminate all the line and data messages with 'my' in two places


    Code
      

    while (my $line = <F>) {
    my ($data1,$data2,$data3,$data4,$data5,$data6) = split ',', $line;



    Fix index as I described in a previous post. Make the code change that I showed explicitly. The messages that remain will tell you which lines have hash references that must be changed to hash accesses. (Remove '->')
    Good Luck,
    Bill


    niall_heavey
    Novice

    Jul 16, 2010, 7:20 AM

    Post #16 of 22 (1311 views)
    Re: [BillKSmith] Sorting hash by value

    I did read it, but I am very much a beginner at Perl, and at script programming in general, so I did not understand most of it. I have put in the 'my' declarations and the errors are gone.
    I am now getting a number of these warnings:

    Use of uninitialized value in addition (+) at simplify.pl line 32, <F> line 1

    So hopefully I am on the right tracks.....


    FishMonger
    Veteran / Moderator

    Jul 16, 2010, 7:29 AM

    Post #17 of 22 (1311 views)
    Re: [BillKSmith] Sorting hash by value


    Quote

    Code
    my ($data1,$data2,$data3,$data4,$data5,$data6) =


    That screams out that an array should be used instead of the individual sequentially numbered vars.


    Quote

    Code
    split ',', $line


    The first arg for split is a pattern, not a string. So, even though the string is allowed, the proper syntax would be:

    Code
    split /,/, $line
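
Putting the two suggestions together, a sketch might look like the following. The input line is invented, since the real file layout is not shown in the thread, and the names $col3 and $col5 are only for illustration. A plain split like this assumes no field contains a quoted comma; for that case a CSV parser such as the Text::CSV module would be the safer choice.

```perl
use strict;
use warnings;

# A made-up input line; the real file layout is not shown in the thread.
my $line = "1,abc,Nokia,xyz,Sony,9\n";

chomp $line;                    # drop the trailing newline before splitting
my @data = split /,/, $line;    # one array instead of $data1 .. $data6

# Arrays are zero-based, so columns 3 and 5 are indexes 2 and 4.
my ( $col3, $col5 ) = @data[ 2, 4 ];

print "column 3: $col3, column 5: $col5\n";
```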



    BillKSmith
    Veteran

    Jul 16, 2010, 8:44 AM

    Post #18 of 22 (1305 views)
    Re: [niall_heavey] Sorting hash by value

    Sorry, I did not notice that your logic is backwards.


    Code
      

    # if (exists($index{$data3})) {
    if (!exists($index{$data3})) {

    # if (exists($index{$data5})) {
    if (!exists($index{$data5})) {



    I hope that this will "work". I agree with all of FishMonger's recommendations and I can suggest further improvements.
    Good Luck,
    Bill


    niall_heavey
    Novice

    Jul 19, 2010, 8:27 AM

    Post #19 of 22 (1279 views)
    Re: [BillKSmith] Sorting hash by value

    Ok, I have now changed this to a database and have it working (sorting the values). However, for some reason it only works with small files and not the large one that I need.
    It is giving me errors/warnings that I think might be causing the problem. These are:

    Use of uninitialized value in addition (+) at simplify.pl line 41.
    Use of uninitialized value in addition (+) at simplify.pl line 52.
    Use of uninitialized value in addition (+) at simplify.pl line 41.
    Use of uninitialized value in addition (+) at simplify.pl line 52.
    Use of uninitialized value in addition (+) at simplify.pl line 52.
    HASH(0x8647968)


    Line 41 and 52 are as follows:

    Code
    34. if (exists($index{$data3})) 
    35. {
    36. $index->{ $data3 } = 1;
    37.
    38. }
    39. else
    40. {
    41. $index->{ $data3 } = $index->{ $data3 }+1 ;
    42.
    43. }
    44.
    45. if (exists($index{$data5}))
    46. {
    47. $index->{ $data5 } = 1;
    48.
    49. }
    50. else
    51. {
    52. $index->{ $data5 } = $index->{ $data5 }+1 ;
    53.
    54. }

    I have tried a few things to sort out these problems, some of them get rid of the messages but then the code does not sort the values etc.

    Any ideas where the problem might be with these errors?

    Thanks!
    N


    (This post was edited by niall_heavey on Jul 19, 2010, 8:49 AM)


    BillKSmith
    Veteran

    Jul 19, 2010, 9:34 AM

    Post #20 of 22 (1274 views)
    Re: [niall_heavey] Sorting hash by value

    You did not make the changes from my previous post (add the ! to each if).



    I think you can replace lines 34 through 54 with the following two lines.


    Code
      

    $index->{$data3}++;

    $index->{$data5}++;

    Good Luck,
    Bill
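
For context on why the two-line version removes the warnings: adding 1 to a hash entry that does not exist yet means adding to undef, which triggers "Use of uninitialized value in addition (+)", whereas ++ treats a missing entry as 0 without warning. The sketch below is one way the whole counting and sorting step could fit together. It assumes a plain hash %index (as recommended earlier in the thread) rather than a reference, and uses made-up rows in place of the CSV file.

```perl
use strict;
use warnings;

# Made-up rows standing in for the CSV file.
my @lines = ( "1,a,Nokia,x,Sony", "2,b,Nokia,y,Nokia", "3,c,HTC,z,Sony" );

my %index;
for my $line (@lines) {
    my @data = split /,/, $line;

    # ++ treats a missing (undef) entry as 0 and does not warn, so no
    # exists() test is needed.
    $index{ $data[2] }++;    # column 3
    $index{ $data[4] }++;    # column 5
}

# Print the counts, largest first, with ties broken by name.
for my $value ( sort { $index{$b} <=> $index{$a} or $a cmp $b } keys %index ) {
    print "$value => $index{$value}\n";
}
```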


    niall_heavey
    Novice

    Jul 19, 2010, 9:41 AM

    Post #21 of 22 (1273 views)
    Re: [BillKSmith] Sorting hash by value

    I did make the change with the ! but it just displayed all results with a value of 1!

    I have replaced the lines with the 2 you gave me and it got rid of the errors,

    It is currently running for a large section of data so waiting to see what happens.

    When I ran it with the small piece I was still getting the

    HASH(0x9279968)

    above the displayed (sorted) results
    Do you know what this means? Or where it comes from? The value in it does change, by the way (i.e. it's not always 0x9279968).

    Thanks!
    N


    BillKSmith
    Veteran

    Jul 19, 2010, 10:17 AM

    Post #22 of 22 (1270 views)
    Re: [niall_heavey] Sorting hash by value

    This is what perl does when you attempt to print a reference to a hash. The number part is an address which probably is different every time you run the program.

    You will probably need Data::Dumper to track this down. Do not ignore it!
    Good Luck,
    Bill
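
As a small illustration of the above, the sketch below prints a hash reference directly and then through Data::Dumper. The contents of $index are invented for the example; where the stray reference comes from in the actual script is not shown in the thread, so this only demonstrates the symptom and one way to inspect it.

```perl
use strict;
use warnings;
use Data::Dumper;

my $index = { Nokia => 30, Sony => 27 };    # hypothetical hash reference

# Printing the reference itself shows its type and address, e.g. HASH(0x...).
my $as_string = "$index";
print "$as_string\n";

# Data::Dumper shows what the reference points to.
$Data::Dumper::Sortkeys = 1;    # stable key order in the dump
print Dumper($index);

# ref() is a quick way to check whether a value is a reference at all.
print "Found a ", ref($index), " reference\n" if ref $index;
```

Searching the script for a print statement that is handed $index (rather than one of its values) would be a reasonable first place to look.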

     
     

