string match question

 



rpfujiw
New User

Jan 24, 2010, 11:03 PM

Post #1 of 2 (523 views)
string match question

I have a file in this format:

CO Contig1 <SOME DATA ON MULTIPLE LINES>

RD <SOME DATA ON MULTIPLE LINES>

QT <SOME DATA ON MULTIPLE LINES>

CT <SOME DATA ON MULTIPLE LINES>

CO Contig2 <SOME DATA ON MULTIPLE LINES>

RD <SOME DATA ON MULTIPLE LINES>

QT <SOME DATA ON MULTIPLE LINES>

CT <SOME DATA ON MULTIPLE LINES>

etc...

Ideally I want to separate out the data found under each CO heading, and then pull out the data under the RD heading that belongs to that CO. I was thinking of some kind of pattern match between the CO and CT headings. Here's what I have so far.


Code
#!/usr/bin/perl

use strict;
use warnings;
use Getopt::Long;

my ($acefile, $contig, $position);

GetOptions(
    'ace=s'  => \$acefile,
    'cont=s' => \$contig,
    'pos=s'  => \$position,
);

my $usage = "$0 -ace <acefile> -cont <contig> -pos <position>\n";

# All three options are required; print the usage line and quit if any is missing.
unless (defined $acefile && defined $contig && defined $position) {
    print $usage;
    exit 1;
}

# Read the ace file and print the RD sections of the requested contig.
sub get_acefile_data {
    my ($file, $name) = @_;

    open(my $fh, '<', $file)
        or die "Cannot open file \"$file\": $!\n";

    # Slurp the whole file into one string so a match can span several lines.
    my $filedata = do { local $/; <$fh> };
    close $fh;

    # The contig block runs from its CO line up to the next CO line (or end of file).
    my ($contig_data) = $filedata =~ /^(CO \Q$name\E(?!\S).*?)(?=^CO |\z)/ms;
    die "Contig \"$name\" not found in \"$file\"\n" unless defined $contig_data;

    # Each RD section runs up to the next CO, RD, QT or CT heading (or end of block).
    my @read_data = $contig_data =~ /^(RD\b.*?)(?=^(?:CO|RD|QT|CT)\b|\z)/msg;
    print @read_data;
}

get_acefile_data($acefile, $contig);



Any help would be greatly appreciated thanks.


7stud
Enthusiast

Jan 25, 2010, 4:38 AM

Post #2 of 2 (514 views)
Re: [rpfujiw] string match question


Code
use strict;
use warnings;
use 5.010;

my $data =<<ENDOFDATA;
CO Contig1 <SOME DATA ON MULTIPLE LINES>

RD <SOME DATA ON MULTIPLE LINES>

QT <SOME DATA ON MULTIPLE LINES>

CT <SOME DATA ON MULTIPLE LINES>

CO Contig2 <SOME DATA ON MULTIPLE LINES>

RD <SOME DATA ON MULTIPLE LINES>

QT <SOME DATA ON MULTIPLE LINES>

CT <SOME DATA ON MULTIPLE LINES>
ENDOFDATA



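# /s lets . match newlines; /g finds every CO ... RD pair in turn.
# Each capture stops at the first ">", so this only works while a
# section's data ends in ">" as it does in the sample above.
while ($data =~ /(CO.+?>).*?(RD.+?>)/sg) {
    say "----\n$1\n$2\n-----";
}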

--output:--
----
CO Contig1 <SOME DATA ON MULTIPLE LINES>
RD <SOME DATA ON MULTIPLE LINES>
-----
----
CO Contig2 <SOME DATA ON MULTIPLE LINES>
RD <SOME DATA ON MULTIPLE LINES>
-----
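
The regex above depends on every section ending in ">", which is true of the placeholder data but may not be true of a real file. Here is a minimal sketch of a more general approach, assuming only that every section starts on a line beginning with one of the tags CO, RD, QT or CT, as in the original post. It splits the text at those tags and groups the RD sections under the most recent CO. The sub name rd_sections_by_contig and the sample data are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a hash ref mapping each contig name to an
# array ref of the RD sections found under it.
# Assumes every section starts at the beginning of a line with one of the
# tags CO, RD, QT or CT (the layout described in the original post).
sub rd_sections_by_contig {
    my ($text) = @_;
    my (%reads, $current);

    # Split just before each tag at the start of a line, keeping the tag
    # together with the section that follows it.
    for my $section (split /^(?=(?:CO|RD|QT|CT)\b)/m, $text) {
        if ($section =~ /^CO\s+(\S+)/) {
            $current = $1;              # a new contig begins here
            $reads{$current} = [];
        }
        elsif ($section =~ /^RD\b/ && defined $current) {
            $section =~ s/\s+\z//;      # trim trailing blank lines
            push @{ $reads{$current} }, $section;
        }
    }
    return \%reads;
}

# Made-up sample data in the same layout as the question.
my $sample = <<'END';
CO Contig1 contig data
more contig data

RD read1 data
more read data

QT quality data

CT tag data

CO Contig2 contig data

RD read2 data
END

my $reads = rd_sections_by_contig($sample);
print "$_\n" for @{ $reads->{Contig1} };
```

Keeping the result in a hash keyed by contig name means the RD data for any contig can be looked up afterwards without scanning the file again. A data line that itself starts with "CO ", "RD ", "QT " or "CT " would be mistaken for a heading, so the tag list may need adjusting for a real file.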



(This post was edited by 7stud on Jan 25, 2010, 4:39 AM)

 
 

