A file parsing and 2D array/matrix problem.

 



rushadrena
Novice

Aug 25, 2012, 8:32 AM

Post #1 of 62
A file parsing and 2D array/matrix problem.

I am stuck with this complicated problem. I have a list:

**LIST**
[code]
substrate[s]: 3649
product[s]: 3419 3648
substrate[s]: 3645
product[s]: 3647
substrate[s]: 3659
product[s]: 3647
substrate[s]: 3675
product[s]: 3674
substrate[s]: 3674
product[s]: 3490 3489
substrate[s]: 3489
product[s]: 3490
substrate[s]: 3490
product[s]: 3485
substrate[s]: 3485
product[s]: 3486
substrate[s]: 3486
product[s]: 3488
substrate[s]: 3488
product[s]: 3487
substrate[s]: 3487
product[s]: 3877
substrate[s]: 3877
product[s]: 3419
substrate[s]: 3182
product[s]: 1875
substrate[s]: 2809
product[s]: 3182
substrate[s]: 3186
product[s]: 2809 [/code]


I also have one superlist for substrates and one for products:

**SUPERLIST_SUBSTRATE**
[code]
substrate[s]: 3649
substrate[s]: 3645
substrate[s]: 3659
substrate[s]: 3675
substrate[s]: 3674
substrate[s]: 3489
substrate[s]: 3490
substrate[s]: 3485
substrate[s]: 3486
substrate[s]: 3488
substrate[s]: 3487
substrate[s]: 3877
substrate[s]: 3182
substrate[s]: 2809
substrate[s]: 3186
substrate[s]: 3675
substrate[s]: 3492
substrate[s]: 3314
substrate[s]: 3006
substrate[s]: 3049[/code]


**SUPERLIST_PRODUCT**
[code]
product[s]: 3419
product[s]: 3648
product[s]: 3489
product[s]: 3647
product[s]: 3647
product[s]: 3674
product[s]: 3490
product[s]: 3490
product[s]: 3485
product[s]: 3486
product[s]: 3488
product[s]: 3487
product[s]: 3877
product[s]: 3419
product[s]: 1875
product[s]: 3182
product[s]: 2809
product[s]: 3492
product[s]: 3186
product[s]: 3492
product[s]: 1825
product[s]: 2543 [/code]


SUPERLIST_SUBSTRATE and SUPERLIST_PRODUCT encompass all the possible substrates and products in LIST, i.e. the substrates of LIST are a subset of SUPERLIST_SUBSTRATE, and likewise the products of LIST are a subset of SUPERLIST_PRODUCT. Now I want to create a SUPERARRAY with SUPERLIST_SUBSTRATE as rows and SUPERLIST_PRODUCT as columns. Then I want to parse LIST one substrate ID at a time and insert a "1" into the SUPERARRAY for each of that substrate's product IDs. For example, consider the first two lines of LIST:

substrate[s]: 3649
product[s]: 3419 3648

So for substrate ID 3649, the row with ID 3649 is selected from the SUPERARRAY and a "1" is inserted in the columns with IDs 3419 and 3648. And so on for the entire LIST. Basically, the SUPERARRAY will be a matrix.
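One way to build such a SUPERARRAY as a full two-dimensional array is sketched below (an illustration, not the poster's code). It assumes the inputs are already in memory: `$list_text`, `@super_sub` and `@super_prod` are hypothetical names holding a few of the sample lines above, where real code would read LIST.txt and the two superlist files. Because the superlists repeat some IDs (for example 3675 and 3647), each ID is mapped to a row or column index only the first time it is seen.

```perl
use strict;
use warnings;

# Hypothetical sample input; real code would read LIST.txt and the two
# superlist files instead of using these inline values.
my $list_text = "substrate[s]: 3649\nproduct[s]: 3419 3648\n"
              . "substrate[s]: 3645\nproduct[s]: 3647\n";
my @super_sub  = (3649, 3645, 3659, 3649);   # superlists may repeat IDs
my @super_prod = (3419, 3648, 3647, 3647);

# Map each unique ID to a row/column index, in first-seen order.
my (%row_of, %col_of, @rows, @cols);
for my $s (@super_sub)  { next if exists $row_of{$s}; $row_of{$s} = @rows; push @rows, $s }
for my $p (@super_prod) { next if exists $col_of{$p}; $col_of{$p} = @cols; push @cols, $p }

# Start from an all-blank matrix: one row per substrate, one column per product.
my @superarray = map { [ ('') x @cols ] } @rows;

# LIST alternates a substrate line and a product line.
my @lines = split /\n/, $list_text;
for (my $i = 0; $i + 1 < @lines; $i += 2) {
    my ($sub) = $lines[$i] =~ /(\d+)/;
    my @prods = $lines[$i + 1] =~ /(\d+)/g;
    die "substrate $sub is not in the superlist\n" unless exists $row_of{$sub};
    for my $p (@prods) {
        die "product $p is not in the superlist\n" unless exists $col_of{$p};
        $superarray[ $row_of{$sub} ][ $col_of{$p} ] = 1;
    }
}

# Tab-separated output, with an empty cell wherever there is no 1.
print join("\t", '', @cols), "\n";
print join("\t", $rows[$_], @{ $superarray[$_] }), "\n" for 0 .. $#rows;
```

The `die` checks make a LIST entry that is missing from a superlist fail loudly instead of silently growing the matrix.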


(This post was edited by rushadrena on Aug 25, 2012, 8:34 AM)


Laurent_R
Veteran / Moderator

Aug 25, 2012, 10:57 AM

Post #2 of 62
Re: [rushadrena] A file parsing and 2D array/matrix problem.

Hi,

if your list is smaller than the super-arrays, your matrix will be sparse. You probably want to use a hash of hashes and create only the entries that exist in the list (other entries will be undefined).

Something like this (untested):


Code
my %supermatrix;
open my $DATA, '<', $list_in or die "unable to open my list $list_in $!\n";
while (my $line = <$DATA>) {
    chomp $line;
    next unless $line =~ /substrate\D*(\d+)/;   # capture the whole substrate ID
    my $substrate = $1;
    $line = <DATA>; # fetch next line
    my @products;
    (undef, @products) = split ' ', $line;      # drop the "product[s]:" label
    foreach my $prod (@products) {
        $supermatrix{$substrate}{$prod} = 1;
    }
}


At the end of the while loop, your hash of hashes is populated with 1s for each existing combination of substrate and product. Non-existing combinations will be undefined. When using this data structure, you will need to check for the existence of a combination. For example, you may later have in your code:


Code
print "combination $substrate1 $prod1 exists ! \n" if exists $supermatrix{$substrate1}{$prod1};
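For reference, a self-contained variant of this approach is sketched below (not Laurent's tested code). It reads the product line through the same lexical handle as the substrate line, captures the full substrate ID with `\D*`, and checks that a row exists before looking inside it, because `exists $supermatrix{$s}{$p}` on its own would autovivify an empty row for an unknown substrate. Two LIST records are inlined through an in-memory filehandle as a stand-in for LIST.txt, and `pair_exists` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Two LIST records inlined through an in-memory filehandle (stands in for LIST.txt).
my $list_text = "substrate[s]: 3649\nproduct[s]: 3419 3648\n"
              . "substrate[s]: 3674\nproduct[s]: 3490 3489\n";
open my $fh, '<', \$list_text or die "cannot open list: $!";

my %supermatrix;
while (my $sub_line = <$fh>) {
    next unless $sub_line =~ /substrate\D*(\d+)/;   # \D* so the whole ID is captured
    my $substrate = $1;
    my $prod_line = <$fh>;                          # products are on the next line
    last unless defined $prod_line;
    $supermatrix{$substrate}{$_} = 1 for $prod_line =~ /(\d+)/g;
}
close $fh;

# Check a pair without autovivifying a row that does not exist.
sub pair_exists {
    my ($s, $p) = @_;
    return exists $supermatrix{$s} && exists $supermatrix{$s}{$p};
}

for my $pair ([3649, 3648], [3649, 3490], [9999, 3419]) {
    printf "%s %s: %s\n", @$pair, pair_exists(@$pair) ? 'exists' : 'absent';
}
```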



rushadrena
Novice

Aug 25, 2012, 11:42 AM

Post #3 of 62
Re: [Laurent_R] A file parsing and 2D array/matrix problem.

Thanks a lot, Laurent.
But the problem is that my list would at most be ~5 to 10% smaller than the superarrays. Also, I need the output in a format where there are blanks wherever there isn't a "1".
The quality of the output also depends on the number of blanks, because this output will then be compared to 20 other such outputs. So the positions of the blanks and of the 1s are equally important.


FishMonger
Veteran / Moderator

Aug 25, 2012, 12:47 PM

Post #4 of 62
Re: [Laurent_R] A file parsing and 2D array/matrix problem.


Quote

Code
while (my $line = <$DATA>)  { 
...
...
$line = <DATA>; # fetch next line
...


Here's one good example of why you should not use uppercase variable names, especially when the name conflicts with a built-in global (the DATA filehandle).


Laurent_R
Veteran / Moderator

Aug 25, 2012, 1:22 PM

Post #5 of 62
Re: [FishMonger] A file parsing and 2D array/matrix problem.

Right. I originally intended to put the data in the program itself (in the __DATA__ section at the end). I then changed my mind and offered the OP the possibility to read from a file, because I thought an example of opening a file would be more useful. And I forgot to change the second place where the filehandle is used.

I think I did say this was untested code. It is quite easy to correct.


Laurent_R
Veteran / Moderator

Aug 25, 2012, 1:32 PM

Post #6 of 62
Re: [rushadrena] A file parsing and 2D array/matrix problem.

Hmmm, you don't seem to realize that a Cartesian product is far more demanding in terms of memory than you think.

If you have, say, 1,000 substrates in your list (and 1,000 products), you end up with one million possible substrate/product combinations. Most of these combinations are probably useless. This is why I suggest a sparse matrix modelled with a hash of hashes.


rushadrena
Novice

Aug 25, 2012, 2:32 PM

Post #7 of 62
Re: [Laurent_R] A file parsing and 2D array/matrix problem.

Laurent, thanks a lot for all your valuable suggestions and time.
In my case the size of the supermatrix is 762 X 680,
and the LIST has 740 substrates and 600 products. Therefore a minimal representation would mean a considerable loss of information, and the representation is very important for me. Computational resources aren't an issue.
In that respect, could you please suggest a suitable method (one that takes care of the blanks too)?
[EDIT] I would like to save this matrix to a text file.
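Keeping the data sparse in memory does not prevent writing a full matrix to a text file: the blanks can be generated at output time from the superlists. The sketch below assumes a hash of hashes `%supermatrix` (substrate => { product => 1 }) plus the two superlists as arrays, all with hypothetical sample values, and writes tab-separated rows with an empty cell wherever there is no 1. An in-memory string stands in for the output file.

```perl
use strict;
use warnings;

# Hypothetical inputs: the sparse hash of hashes plus the two superlists.
my %supermatrix = (3649 => { 3419 => 1, 3648 => 1 }, 3645 => { 3647 => 1 });
my @super_sub   = (3645, 3649, 3659);
my @super_prod  = (3419, 3647, 3648);

# An in-memory string stands in for the output file (e.g. matrix.txt).
my $text = '';
open my $out, '>', \$text or die "cannot open output: $!";

print {$out} join("\t", '', @super_prod), "\n";        # header: product IDs
for my $s (@super_sub) {
    my $row = $supermatrix{$s} || {};                   # no products -> all blank
    print {$out} join("\t", $s, map { $row->{$_} ? 1 : '' } @super_prod), "\n";
}
close $out or die "cannot close output: $!";
print $text;
```

To write a real file, open it with `open my $out, '>', 'matrix.txt'` instead; the loop stays the same.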


(This post was edited by rushadrena on Aug 25, 2012, 3:36 PM)


rushadrena
Novice

Aug 25, 2012, 11:16 PM

Post #8 of 62
Re: [Laurent_R] A file parsing and 2D array/matrix problem.

I tried testing your code with some print checks inserted.

Code
use strict;
use warnings;

my %supermatrix;
my $list = "LIST.txt";
open my $DATA, '<', $list or die "unable to open my list $list $!\n";
print "\nWORKING ON $list\n";
print `head $list`;
while (my $line = <$DATA>) {
    chomp $line;

    next unless $line =~ /substrate\D*(\d+)/;
    my $substrate = $1;    # the captured ID, not the match result 1

    $line = <$DATA>; # fetch next line
    last unless defined $line;
    print "PRODUCT====$line";
    my @products;
    (undef, @products) = split ' ', $line;
    foreach my $prod (@products) {
        $supermatrix{$substrate}{$prod} = 1;
        print "SUBSTRATE ===$substrate\n";
        print "combination $substrate $prod exists ! \n" if exists $supermatrix{$substrate}{$prod};
    }
}

I am still worried that it won't help in my case, where I need the complete matrix.


Laurent_R
Veteran / Moderator

Aug 26, 2012, 1:05 AM

Post #9 of 62
Re: [rushadrena] A file parsing and 2D array/matrix problem.

From what you described, you really don't need a complete matrix. In a complete matrix, perhaps 99% or more of the elements will be 0 and 1% or less will be 1. The 99% are just useless. You only need to know whether you have a match or not, and for that a sparse matrix is far better: for a specific substrate/product combination, you only need to know whether the element exists.


rushadrena
Novice

Aug 26, 2012, 7:17 AM

Post #10 of 62
Re: [Laurent_R] A file parsing and 2D array/matrix problem.

I'm sorry, Laurent, for any misunderstanding. But in my case only ~15% of the elements will be zero, because the size of the supermatrix is 762 X 680 (518160 elements),
and the LIST has 740 substrates and 600 products (444000 elements).
% blanks = 74160/518160 ≈ 14.3%.
So I can't afford to have a sparse representation. Moreover, I need to create this matrix representation for 10 more such cases (i.e. different LISTs) for the same supermatrix, and all these 10 LISTs have at most 16% blanks relative to the supermatrix.
So please help me.
Thanks again.


Laurent_R
Veteran / Moderator

Aug 26, 2012, 8:29 AM

Post #11 of 62
Re: [rushadrena] A file parsing and 2D array/matrix problem.

Not quite right, Rushadrena. From the data examples you gave, each substrate has 1 or 2 products associated with it, not more (and more often 1 than 2). So you don't get a Cartesian product between 740 substrates and 600 products (444000 elements) at all, but only the combinations that actually exist, i.e., assuming two products per substrate, at most 1480 elements. This is very, very far from the 518160 elements of a full matrix: less than 0.3%. This means that more than 99.7% of the full matrix would be unused or, I would rather say, totally useless and worthless.

The other point is that, anyway, from the way you described your problem, all you really need to know is whether a specific substrate/product combination exists (where to assign the value 1) or not. For that, the solution I suggested is entirely sufficient. The sparse matrix approach I suggested contains exactly as much useful information about your data as a full matrix that takes more than 300 times as much memory (and far longer to load).

I'm ready to make one concession, though. You may want to have a full list of all the possible substrates and a full list of all the possible products, not just those in the input list, so that you can say: although this specific substrate/product combination does not exist in the input list, it is still a possible candidate, since both the product and the substrate exist. If you want that, then all you need is two other simple hashes, one with all the possible 740 substrates and one with all the possible 600 products. So you would end up with two simple hashes and one hash of hashes, keeping about 3,000 elements in memory, still far less than 500 k elements. These two hashes give you a virtual Cartesian product of all possibilities, but you never have to compute the actual Cartesian product.
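The "virtual Cartesian product" described above can be sketched as follows, with hypothetical sample data; `classify` is an illustrative helper name, not something from the thread. A pair is reported as observed if it is in the hash of hashes, as a possible candidate if both IDs appear in the flat superlist hashes, and as unknown otherwise. No full matrix is ever built.

```perl
use strict;
use warnings;

# Sparse data from LIST (hypothetical sample) ...
my %supermatrix = (3649 => { 3419 => 1, 3648 => 1 });
# ... plus one flat hash per superlist.
my %all_sub  = map { $_ => 1 } (3649, 3645, 3659);
my %all_prod = map { $_ => 1 } (3419, 3648, 3647);

# Classify a pair without ever building the full Cartesian product.
sub classify {
    my ($s, $p) = @_;
    return 'observed'  if exists $supermatrix{$s} && exists $supermatrix{$s}{$p};
    return 'candidate' if exists $all_sub{$s} && exists $all_prod{$p};
    return 'unknown';
}

print "$_->[0]/$_->[1]: ", classify(@$_), "\n"
    for [3649, 3648], [3645, 3647], [9999, 3647];
```

Memory grows with the number of IDs plus the number of observed pairs, not with their product.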

But your description of the problem is an extremely strong indication that a sparse matrix is exactly what you need. And a hash of hashes is the ideal data structure to store it, because you need just one (pretty fast) line of code to retrieve the information you need (i.e. whether a given substrate/product combination exists in the input data).

I hope I am being clear in my explanations. I work a lot on quite similar problems, and the one thing you want to avoid, especially when the volume of data grows, is the quadratic burden of a full Cartesian product (or, even worse, an exponential or factorial explosion of possibilities). Some of the problems I work on at my job can be solved within a few hours of computation with techniques similar to the sparse matrix approach described above, but would probably not finish before the final explosion of the sun and the end of the solar system if we tried to compute all the possibilities in a super-matrix approach.

One last example. My company has a database with about 35 million customers and about a million possible products and services. What is stored in the database is the list of services (usually 5 to 20) actually subscribed to by each customer, not a "super-matrix" of all possible customer/service combinations with 0 and 1 recording whether the customer has subscribed to the service. This "supermatrix" would have 35,000 billion elements, would take ages to query and would require disk space that I can't even imagine. What a standard business-oriented database (e.g., Oracle) does is, in effect, implement a slightly more complicated version of the sparse matrix approach I have described.


(This post was edited by Laurent_R on Aug 26, 2012, 8:32 AM)


Chris Charley
User

Aug 26, 2012, 8:57 AM

Post #12 of 62
Re: [rushadrena] A file parsing and 2D array/matrix problem.

I've been following this thread and some questions occurred to me. This would be such a large matrix that you would lose the header information if you had to scroll down a number of rows. Likewise, if you scrolled over to read the columns, you would lose the substrate in the first (leftmost) column.

Here is a sparse table created from the sample LIST.txt file. It lists only the combinations seen, not the 'super' matrix you could create from the SUPERSUB and SUPERPROD lists.


Code
 prod-> 1875 2809 3182 3419 3485 3486 3487 3488 3489 3490 3647 3648 3674 3877 
2809 - - 1 - - - - - - - - - - -
3182 1 - - - - - - - - - - - - -
3186 - 1 - - - - - - - - - - - -
3485 - - - - - 1 - - - - - - - -
3486 - - - - - - - 1 - - - - - -
3487 - - - - - - - - - - - - - 1
3488 - - - - - - 1 - - - - - - -
3489 - - - - - - - - - 1 - - - -
3490 - - - - 1 - - - - - - - - -
3645 - - - - - - - - - - 1 - - -
3649 - - - 1 - - - - - - - 1 - -
3659 - - - - - - - - - - 1 - - -
3674 - - - - - - - - 1 1 - - - -
3675 - - - - - - - - - - - - 1 -
3877 - - - 1 - - - - - - - - - -

^
|
substrate


Would it be better to create a comma separated file, where it could be opened by a spreadsheet program like Excel? Those programs can 'freeze' the column/row headers so you can easily scroll and still keep them visible.
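A minimal sketch of writing such a CSV file from a hash of hashes, with hypothetical sample data; an in-memory string stands in for a file such as matrix.csv. The IDs here are plain integers, so no quoting is needed; for arbitrary text fields a CSV module such as Text::CSV from CPAN would handle quoting.

```perl
use strict;
use warnings;

# Hypothetical sparse data (substrate => { product => 1 }).
my %supermatrix = (2809 => { 3182 => 1 }, 3182 => { 1875 => 1 },
                   3674 => { 3489 => 1, 3490 => 1 });

# Column headers: every product that occurs anywhere, sorted numerically.
my %seen;
my @products = sort { $a <=> $b } grep { !$seen{$_}++ } map { keys %$_ } values %supermatrix;

my $csv = '';
open my $out, '>', \$csv or die "cannot open output: $!";    # 'matrix.csv' in practice
print {$out} join(',', 'substrate', @products), "\n";
for my $s (sort { $a <=> $b } keys %supermatrix) {
    print {$out} join(',', $s, map { $supermatrix{$s}{$_} ? 1 : '' } @products), "\n";
}
close $out or die "cannot close output: $!";
print $csv;
```

A spreadsheet opens this directly, and its freeze-panes feature can then keep the first row and column visible.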


Laurent_R
Veteran / Moderator

Aug 26, 2012, 10:09 AM

Post #13 of 62
Re: [Chris Charley] A file parsing and 2D array/matrix problem.

Yes, you could easily generate a CSV file from the hash of hashes and import the data into a spreadsheet. Or, better yet, you could use a CPAN module to write a spreadsheet file directly, for example Spreadsheet::WriteExcel, Spreadsheet::Write or Spreadsheet::SimpleExcel.

I am happy that you presented the data in such a tabular form, Chris, as it shows Rushadrena graphically how sparse the data actually is. This example has 210 cells (15 rows by 14 columns), and only 17 of them are really useful: already less than 10%. And the more data you add, the larger the ratio of empty cells to useful elements becomes.


rushadrena
Novice

Aug 27, 2012, 2:14 AM

Post #14 of 62
Re: [Laurent_R] A file parsing and 2D array/matrix problem.

Laurent and Chris, thanks a ton for sharing practical views and exploring other areas of the problem space so extensively.
Yes, the supermatrix will be very, very sparse. But let me add the last element of the problem.
I need to create 10 such supermatrices and combine them two at a time. So far I'm able to create a text file for each of these 10 supermatrices.
Now there's the last piece of the puzzle. I have created 10 such matrices (obviously with the same number of rows and columns).
The problem is that I have to concatenate (with a logical OR operation) two such matrices:
INPUT = two matrices A and B (each saved in a separate text file) with the same rows and columns
OUTPUT = a single matrix C, where C[i][j] = A[i][j] OR B[i][j]

Code
==============INPUT======= 
MAT - A
1875 2809 3182 3419
2809 - 1 1 -
3182 1 - - -
3186 1 1 - -
3485 - - - -
3486 - - - -

MAT - B
1875 2809 3182 3419
2809 1 - - 1
3182 - - - -
3186 - 1 1 -
3485 - - - -
3486 - 1 - 1


========== OUTPUT===========
MAT - C
1875 2809 3182 3419
2809 1 1 1 1
3182 1 - - -
3186 1 1 1 -
3485 - - - -
3486 - 1 - 1


That is, an element of C is 1 if either of the corresponding elements of A or B is 1.
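A sketch of this OR operation on the tabular format shown above. It assumes each file holds just the table (a header line of product IDs, then one line per substrate with `1` or `-` per column, whitespace-separated) and that both tables have the same rows and columns. `read_table` is a hypothetical helper name, and the two matrices are inlined as strings where real code would read the files. Only the `1` cells are stored, so the OR reduces to checking whether a cell is set in either table.

```perl
use strict;
use warnings;

# Hypothetical helper: parse a table whose first line holds the product IDs
# and whose other lines are "substrate v1 v2 ...", each v being '1' or '-'.
# Returns row IDs, column IDs and the set of cells that contain a 1.
sub read_table {
    my ($text) = @_;
    my @lines = grep { /\S/ } split /\n/, $text;    # ignore blank lines
    my @cols  = split ' ', shift @lines;
    my (@rows, %ones);
    for my $line (@lines) {
        my ($row, @vals) = split ' ', $line;
        push @rows, $row;
        for my $j (0 .. $#cols) {
            $ones{$row}{ $cols[$j] } = 1 if defined $vals[$j] && $vals[$j] eq '1';
        }
    }
    return (\@rows, \@cols, \%ones);
}

# In practice these strings would be read from the two matrix files.
my $mat_a = <<'END';
1875 2809 3182 3419
2809 - 1 1 -
3182 1 - - -
3186 1 1 - -
3485 - - - -
3486 - - - -
END
my $mat_b = <<'END';
1875 2809 3182 3419
2809 1 - - 1
3182 - - - -
3186 - 1 1 -
3485 - - - -
3486 - 1 - 1
END

my ($rows, $cols, $ones_a) = read_table($mat_a);
my (undef, undef, $ones_b) = read_table($mat_b);    # assumes identical layout

# C[i][j] = A[i][j] OR B[i][j]
my @c;
for my $r (@$rows) {
    push @c, join ' ', $r,
        map { ($ones_a->{$r}{$_} || $ones_b->{$r}{$_}) ? 1 : '-' } @$cols;
}
print join(' ', @$cols), "\n";
print "$_\n" for @c;
```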


(This post was edited by rushadrena on Aug 27, 2012, 2:20 AM)


Laurent_R
Veteran / Moderator

Aug 27, 2012, 7:33 AM

Post #15 of 62
Re: [rushadrena] A file parsing and 2D array/matrix problem.

It is still very easy to concatenate two sparse matrices.

You just need to copy each element of matrix A into matrix B, and matrix B will be the concatenation of the two matrices.

BTW, the matrices don't necessarily have to have the same size.
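This suggestion can be sketched like this with two hypothetical sparse matrices stored as hashes of hashes. Matrix B is modified in place, so copy it first if the original B is still needed. The two matrices deliberately have different rows, to illustrate that they need not be the same size.

```perl
use strict;
use warnings;

# Two hypothetical sparse matrices (substrate => { product => 1 }).
my %A = (2809 => { 2809 => 1, 3182 => 1 }, 3182 => { 1875 => 1 });
my %B = (2809 => { 1875 => 1, 3419 => 1 }, 3486 => { 2809 => 1 });

# Copy every element of A into B; afterwards B holds A OR B.
for my $s (keys %A) {
    $B{$s}{$_} = 1 for keys %{ $A{$s} };
}

for my $s (sort { $a <=> $b } keys %B) {
    print "$s: ", join(' ', sort { $a <=> $b } keys %{ $B{$s} }), "\n";
}
```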


rushadrena
Novice

Aug 27, 2012, 12:58 PM

Post #16 of 62
Re: [Laurent_R] A file parsing and 2D array/matrix problem.

Is there a way I can read a matrix from a text file into Perl so as to access each element one by one? This is what I have tried. It reads the matrix from the file and prints it as it is, but I'm not able to access the elements one by one.

Code
use strict;
use warnings;

open my $fh, '<', 'matrix.txt' or die $!;
chomp( my @lines = <$fh> );
foreach (@lines) {
    print "$_\n";
}

=====CONTENTS of matrix.txt======
1 2 3 4

1 5 6 8

1 7 8 0


Chris Charley
User

Aug 27, 2012, 6:25 PM

Post #17 of 62
Re: [rushadrena] A file parsing and 2D array/matrix problem.

You might not know it, but pairing each of the 10 matrices with every other one will create another 45 matrices, because the number of unique pairings is the number of combinations of 10 items taken 2 at a time.

(10 * 9) / (2 * 1) = 45

So you will have a total of 55 matrices. And you will probably want each in its own file, (or perhaps not).
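The pair count can be checked by enumerating the pairs with two nested index loops, taking only j > i so that each unordered pair appears once (the same pattern Chris uses later in the thread). This sketch only illustrates the counting, n(n-1)/2 pairs for n matrices; the variable names are hypothetical.

```perl
use strict;
use warnings;

# Enumerate every unordered pair (i, j) with i < j among $n matrices.
my $n = 10;
my @pairs;
for my $i (0 .. $n - 1) {
    for my $j ($i + 1 .. $n - 1) {
        push @pairs, [$i, $j];
    }
}
printf "%d pairs, n(n-1)/2 = %d\n", scalar @pairs, $n * ($n - 1) / 2;
```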

Here is some output from what I worked on. It's OK for small tables, but I don't think you will be able to view a 400-column matrix that way. You should probably create a comma-separated values (.csv) file, to be read by a program like Excel.


Code
C:\Old_Data\perlp>perl t1.pl 
Processing file: junk.txt
prod-> 1875 2809 3182 3419 3485 3486 3487 3488 3489 3490 3647 3648 3674 3877
2809 - - 1 - - - - - - - - - - -
3182 1 - - - - - - - - - - - - -
3186 - 1 - - - - - - - - - - - -
3485 - - - - - 1 - - - - - - - -
3486 - - - - - - - 1 - - - - - -
3487 - - - - - - - - - - - - - 1
3488 - - - - - - 1 - - - - - - -
3489 - - - - - - - - - 1 - - - -
3490 - - - - 1 - - - - - - - - -
3645 - - - - - - - - - - 1 - - -
3649 - - - 1 - - - - - - - 1 - -
3659 - - - - - - - - - - 1 - - -
3674 - - - - - - - - 1 1 - - - -
3675 - - - - - - - - - - - - 1 -
3877 - - - 1 - - - - - - - - - -

^
|
substrate

Processing file: another.txt
prod-> 1875 2809 3182 3248 3374 3390 3419 3485 3486 3487 3488 3490 3641 3645 3877
2809 - - 1 - - - - - - - - - - - -
3182 1 - - - - - - - - - - - - - -
3186 - 1 - - - - - - - - - - - - -
3287 - - - - - - - - - - - - - - 1
3489 - - - - - - - - 1 - - - - - -
3490 - - - - - - - 1 - - - - - - -
3491 - - - - - - - - - - - 1 - - -
3499 - - - - - - - - - 1 - - - - -
3609 - - - - - - - - - - - - - 1 -
3645 - - - - - - - - - - - - 1 - -
3647 - - - 1 - - 1 - - - - - - - -
3674 - - - - - 1 - - - - - 1 - - -
3685 - - - - 1 - - - - - - - - - -
3877 - - - - - - 1 - - - - - - - -
3986 - - - - - - - - - - 1 - - - -

^
|
substrate

Combining junk.txt and another.txt
prod-> 1875 2809 3182 3248 3374 3390 3419 3485 3486 3487 3488 3489 3490 3641 3645 3647 3648 3674 3877
2809 - - 1 - - - - - - - - - - - - - - - -
3182 1 - - - - - - - - - - - - - - - - - -
3186 - 1 - - - - - - - - - - - - - - - - -
3287 - - - - - - - - - - - - - - - - - - 1
3485 - - - - - - - - 1 - - - - - - - - - -
3486 - - - - - - - - - - 1 - - - - - - - -
3487 - - - - - - - - - - - - - - - - - - 1
3488 - - - - - - - - - 1 - - - - - - - - -
3489 - - - - - - - - 1 - - - 1 - - - - - -
3490 - - - - - - - 1 - - - - - - - - - - -
3491 - - - - - - - - - - - - 1 - - - - - -
3499 - - - - - - - - - 1 - - - - - - - - -
3609 - - - - - - - - - - - - - - 1 - - - -
3645 - - - - - - - - - - - - - 1 - 1 - - -
3647 - - - 1 - - 1 - - - - - - - - - - - -
3649 - - - - - - 1 - - - - - - - - - 1 - -
3659 - - - - - - - - - - - - - - - 1 - - -
3674 - - - - - 1 - - - - - 1 1 - - - - - -
3675 - - - - - - - - - - - - - - - - - 1 -
3685 - - - - 1 - - - - - - - - - - - - - -
3877 - - - - - - 1 - - - - - - - - - - - -
3986 - - - - - - - - - - 1 - - - - - - - -

^
|
substrate



rushadrena
Novice

Aug 28, 2012, 1:10 AM

Post #18 of 62
Re: [Chris Charley] A file parsing and 2D array/matrix problem.

Dear Chris,
Actually, I don't need to visualize the matrix; I just need to process it as it is, and for that purpose a text file is sufficient.
Chris, could you pass on the code (t1.pl) you have written to compute the OR of two matrices? That would be really helpful.


Laurent_R
Veteran / Moderator

Aug 28, 2012, 4:40 AM

Post #19 of 62
Re: [rushadrena] A file parsing and 2D array/matrix problem.

Hi,

to access the individual elements:

Code
use strict;
use warnings;

my $matrix = "matrix.txt";
open my $fh, "<", $matrix or die "unable to open $matrix $! \n";
chomp( my @lines = <$fh> );
foreach my $line (@lines) {
    my @fields = split ' ', $line;
    print "$_\n" foreach @fields;
}


With your data:


Code
1 2 3 4  
1 5 6 8
1 7 8 0


this prints:


Code
 perl matrix.pl  
1
2
3
4
1
5
6
8
1
7
8
0
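To address the elements individually afterwards, rather than just printing them, one option is to keep each row as an array reference, giving an array of arrays that can be indexed as `$matrix[$row][$col]`. A sketch, with the contents of matrix.txt inlined through an in-memory filehandle; blank lines are skipped because the file in the question has them between rows.

```perl
use strict;
use warnings;

# The contents of matrix.txt, inlined via an in-memory filehandle for illustration.
my $text = "1 2 3 4\n\n1 5 6 8\n\n1 7 8 0\n";
open my $fh, '<', \$text or die "cannot open matrix: $!";

my @matrix;
while (my $line = <$fh>) {
    next unless $line =~ /\S/;             # skip the blank separator lines
    push @matrix, [ split ' ', $line ];    # split ' ' also drops the newline
}
close $fh;

# Element access: row index first, then column index (both zero-based).
print "row 1, column 2: $matrix[1][2]\n";
for my $i (0 .. $#matrix) {
    for my $j (0 .. $#{ $matrix[$i] }) {
        print "[$i][$j] = $matrix[$i][$j]\n";
    }
}
```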



(This post was edited by Laurent_R on Aug 28, 2012, 4:41 AM)


Chris Charley
User

Aug 28, 2012, 8:49 AM

Post #20 of 62
Re: [rushadrena] A file parsing and 2D array/matrix problem.

Giving you a solution without you working through it will not help you learn. I suspect this is a school assignment and I won't help you cheat your teacher.

Even if it's not a school assignment, I'm not sure you would understand the code.

The problem is not difficult. The code you wrote(?) to create 'MAT -A' and 'MAT - B' should be part of the solution.

Post the code that created these 2 matrices, and then see if it can be modified to merge the 2 tables.


rushadrena
Novice

Aug 28, 2012, 10:25 AM

Post #21 of 62
Re: [Chris Charley] A file parsing and 2D array/matrix problem.

Hi there,
Guys, it isn't a school assignment; it's related to my research work. Here's the code I have developed for creating the matrices:

Code
use Modern::Perl;
use File::Slurp qw/read_file/;
use Text::Table;
use Data::Dumper;

my ( %supermatrix, @titles, %seen, @rows );

my @list = read_file 'LIST.txt';

# LIST alternates substrate lines and product lines
for ( my $i = 0 ; $i < @list ; $i += 2 ) {
    my ($substrateID) = $list[$i] =~ /(\d+)/;
    $supermatrix{$substrateID}{$1} = 1 while $list[ $i + 1 ] =~ /(\d+)/g;
}

# read the substrate superlist once, not once per product
my @superSubstrates = read_file 'SUPERLIST_SUBSTRATE.txt';

for my $product ( read_file 'SUPERLIST_PRODUCT.txt' ) {
    my ($productID) = $product =~ /(\d+)/;
    push @titles, $productID unless $seen{$productID}++;

    for my $substrate (@superSubstrates) {
        my ($substrateID) = $substrate =~ /(\d+)/;
        $supermatrix{$substrateID}{$productID} //= '.';    # '.' marks a blank
    }
}

# Text::Table column specs, one per product (no string eval needed)
my @columns = map +{ title => "p$_", align_title => 'center', align => 'center' },
    sort { $a <=> $b } @titles;

for my $y ( sort { $a <=> $b } keys %supermatrix ) {    # rows
    my @row = map { $supermatrix{$y}{$_} }
        sort { $a <=> $b } keys %{ $supermatrix{$y} };   # columns
    push @rows, [ "s$y", @row ];
}

my $tb = Text::Table->new( ' ', @columns );
$tb->load(@rows);
say $tb;

say "\n", Dumper \%supermatrix;

Chris, the OR combination code for the matrices (the one you have written) is the last thing I need.


Chris Charley
User

Aug 29, 2012, 7:20 AM

Post #22 of 62
Re: [rushadrena] A file parsing and 2D array/matrix problem.

I wasn't able to get the code you posted working, so I am posting the solution I came up with:

Code
#!/usr/bin/perl
use strict;
use warnings;

my @matrix;
my @file = qw/junk.txt another.txt/;

for my $file ( @file ) {
    my %data;
    my $sub;

    open my $fh, "<", $file
        or die "Unable to open $file for reading. $!";

    while (<$fh>) {
        if (/substrate\D+(\d+)$/) {
            $sub = $1;
        }
        else { # get the product(s)
            $data{$sub}{$_} = 1 for /\d+/g;
        }
    }
    close $fh or die "Unable to close $file. $!";

    print "Processing file: $file\n";
    process(%data);

    push @matrix, \%data;
}

for my $i (0 .. $#matrix) {
    for my $j ($i+1 .. $#matrix) {
        print "Combining $file[$i] and $file[$j]\n";
        my %data = combine($matrix[$i], $matrix[$j]);
        process(%data);
    }
}

sub process {
    my %data = @_;
    my %seen;
    my @product = sort {$a <=> $b}
                  grep ! $seen{$_}++,
                  map keys %$_, values %data;

    printf "%7s" . "%5s" x @product . "\n", 'prod->', @product;

    for my $substrate (sort {$a <=> $b} keys %data) {
        printf "%7s", $substrate;
        printf "%5s", $data{$substrate}{$_} || '-' for @product;
        print "\n";
    }
    printf "\n%5s\n%5s\n%s\n\n", '^', '|', 'substrate';
}

sub combine {
    my ($matrix1, $matrix2) = @_;
    my %new_hash;

    # copy the inner hashes as well, so that neither input matrix is modified
    for my $matrix ($matrix1, $matrix2) {
        for my $substrate (keys %$matrix) {
            $new_hash{$substrate}{$_} = 1 for keys %{ $matrix->{$substrate} };
        }
    }
    return %new_hash;
}



Laurent_R
Veteran / Moderator

Aug 29, 2012, 10:43 AM

Post #23 of 62
Re: [rushadrena] A file parsing and 2D array/matrix problem.


Rushadrena,

your idea of storing the data in a file in tabular form is probably wrong.

It is far easier to store the hash of hashes structure directly using Data::Dumper. Then you only have to open the file, slurp its content and use eval on it to recreate the hash.
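A sketch of the Data::Dumper round trip described above, using hypothetical sample data. `Dumper` returns Perl source of the form `$VAR1 = {...};`, which would normally be written to a file; evaluating that text later assigns the structure to a `$VAR1` variable declared in the surrounding scope. Setting `$Data::Dumper::Sortkeys` makes the output order stable. Since string `eval` executes whatever code the file contains, this is only appropriate for files you trust.

```perl
use strict;
use warnings;
use Data::Dumper;

my %supermatrix = (3649 => { 3419 => 1, 3648 => 1 }, 3645 => { 3647 => 1 });

# Serialise: Dumper produces Perl source of the form "$VAR1 = {...};".
$Data::Dumper::Sortkeys = 1;              # stable, diff-friendly output
my $dumped = Dumper(\%supermatrix);       # write this string to a file in practice

# Restore: evaluating the text assigns the structure to $VAR1.
# String eval runs arbitrary code, so only do this with trusted files.
my $VAR1;
eval $dumped;
die "could not restore data: $@" if $@;
print "3649/3648 restored\n" if $VAR1->{3649}{3648};
```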

And combining two hashes the way you want is very easy: it only takes three lines of code (as shown in Chris Charley's code suggestion).

Final point: I can see from your code that you're still trying to build the complete matrix instead of a sparse one. This is simply the wrong approach: it takes more space, more time, and more code.


FishMonger
Veteran / Moderator

Aug 29, 2012, 11:23 AM

Post #24 of 62
Re: [Laurent_R] A file parsing and 2D array/matrix problem.


Quote
It is far easier to store the hash of hashes structure directly using Data::Dumper. Then you only have to open the file, slurp its content and use eval on it to recreate the hash.

I'd suggest using the Storable module for that step instead of Data::Dumper and an eval statement.
http://search.cpan.org/~ams/Storable-2.35/Storable.pm
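A minimal Storable version, using the module's `store` and `retrieve` functions with hypothetical sample data; a temporary file from File::Temp stands in for a real file name. `retrieve` returns a reference to a fresh copy of the stored structure, and no `eval` is involved.

```perl
use strict;
use warnings;
use Storable qw(store retrieve);
use File::Temp qw(tempfile);

my %supermatrix = (3649 => { 3419 => 1, 3648 => 1 }, 3645 => { 3647 => 1 });

# A temporary file stands in for something like 'matrix_A.sto'.
my (undef, $file) = tempfile(UNLINK => 1);

store(\%supermatrix, $file) or die "cannot store to $file\n";
my $restored = retrieve($file);          # reference to a new copy of the data
print "3645/3647 restored\n" if $restored->{3645}{3647};
```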


Laurent_R
Veteran / Moderator

Aug 29, 2012, 11:42 AM

Post #25 of 62
Re: [FishMonger] A file parsing and 2D array/matrix problem.

Yes, right, the Storable module is probably even easier to use.
