partial Match and extract

 



birdview
Novice

Mar 27, 2014, 2:19 AM

Post #1 of 8 (2549 views)
partial Match and extract

Hi all,
I have files

file1 (list)
let-7a-1
mir-100
mir-204-3
mir-130c-2
mir-135c-1

file2 (precursor)
>let-7a-1_GeneScaffold_1068dna:genescaffoldgenescaffold:or1:GeneScaffold_1068:1:184962:1REF:125983..126049:+
>let-7a-2_contig883631dna:contigcontig::contig883631:1:3604:1REF:2613..2684:-
>mir-100_contig363446dna:contigcontig::contig363446:1:623:1REF:122..177:-
>mir-130a_GeneScaffold_2375dna:genescaffoldgenescaffold:or1:GeneScaffold_2375:1:349505:1REF:161313..161392:-
>mir-135c-1_GeneScaffold_1609dna:genescaffoldgenescaffold:or1:GeneScaffold_1609:1:1318837:1REF:1266799..1266862:+
>mir-130c_GeneScaffold_3571dna:genescaffoldgenescaffold:or1:GeneScaffold_3571:1:877190:1REF:112284..112343:-



I want to have an output like this
output
let-7a-1 GeneScaffold_1068dna:genescaffoldgenescaffold:or1:GeneScaffold_1068:1:184962:1REF 125983 126049 +
mir-100 contig363446dna:contigcontig::contig363446:1:623:1REF 122 177 -
mir-135c-1 GeneScaffold_1609dna:genescaffoldgenescaffold:or1:GeneScaffold_1609:1:1318837:1REF 1266799 1266862 +

I tried to write this script but I got errors. I appreciate your help

Code
#!/usr/bin/perl 

use warnings;
use strict;

open my $list, "<", "list" or die "cannot open input list: $!";
open my $fh, "<", "precursor" or die "cannot open input pre: $!";

while (my $l = <$list>){
chomp $l;
my ($miRNA) = "$l";
}
while ( my $line = <$fh> ){
chomp $line;
my ($header) = "$line";
my ($miRNAName, $scaffoldName, $contigName, $position) = split /_/, $header;
my ($length, $coordinate) = split /:REF:/, $position;
my ($start, $end) = split /../, $coordinate;
my($strand) = split /:/, $end;

}
if ($miRNAName =~ m/($miRNA)/)
my ($newHeader) = join($miRNAName $scaffoldName'_'$contigName'_'$length':REF' $start $strand);

print "$newHeader\n";

close my $list;
close my $fh;



(This post was edited by FishMonger on Mar 27, 2014, 6:00 AM)


FishMonger
Veteran / Moderator

Mar 27, 2014, 8:48 AM

Post #2 of 8 (2530 views)
Re: [birdview] partial Match and extract


Quote
I tried to write this script but I got errors.

You should always include the errors/warnings in your post.


Code
#!/usr/bin/perl  

use warnings;
use strict;

open my $list, "<", "list" or die "cannot open input list: $!";
open my $fh, "<", "precursor" or die "cannot open input pre: $!";

Very good so far, except I'd add the filename to the die statements.
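As a minimal sketch of that suggestion (the helper name `open_for_reading` is hypothetical, not something from the original script), the path can be kept in a variable so the same value is used for both the open and the error message:

```perl
use strict;
use warnings;

# Hypothetical helper: open a file for reading and name it in the error.
sub open_for_reading {
    my ($path) = @_;
    open my $fh, '<', $path
        or die "cannot open '$path' for reading: $!";
    return $fh;
}
```

With this, a missing file is reported by name, which makes it obvious whether `list` or `precursor` was the one that failed.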


Code
while (my $l = <$list>){  
chomp $l;
my ($miRNA) = "$l";
}

What did you expect that code to do?

Both of your while loops have a problem with the scoping of the vars. In both loops you're declaring/assigning lexical vars that are only available inside the loops, but you're trying to use them outside of those loops. That will be the source of most of the errors you're receiving.
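A small illustration of the scoping point, using two inline strings in place of the real file (the variable names here are only for the example):

```perl
use strict;
use warnings;

my @names;                        # declared before the loop, so it outlives it
for my $l ("let-7a-1\n", "mir-100\n") {
    chomp(my $name = $l);         # $name exists only inside this block
    push @names, $name;
}
# Referring to $name here would be a compile-time error under "use strict".
print "@names\n";                 # prints: let-7a-1 mir-100
```

Anything needed after the loop has to be stored in a variable declared outside it.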

Your first while loop should be building a hash which needs to be declared prior to the loop so that you can use that data later in the script.

Your second while loop should strip off the leading > char and then split the line into 2 scalars. The first one is then used to see if it's a key in the hash built from the first loop. If it is, then output the data.
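The two loops described above could be sketched roughly as follows. This is only one possible reading of the advice: the sub name `select_headers` is hypothetical, it takes the lines as array references instead of reading the files, and it assumes the name is everything before the first underscore.

```perl
use strict;
use warnings;

# Sketch: keep headers whose name (text before the first "_") is in the list.
sub select_headers {
    my ($list_lines, $header_lines) = @_;

    # First loop: build a lookup hash declared outside the loop.
    my %wanted;
    for my $raw (@$list_lines) {
        (my $name = $raw) =~ s/\s+//g;      # drop newline and stray spaces
        $wanted{$name} = 1 if length $name;
    }

    # Second loop: strip the leading ">" and split into two scalars.
    my @hits;
    for my $line (@$header_lines) {
        my $header = $line;
        chomp $header;
        $header =~ s/^>//;
        my ($name, $rest) = split /_/, $header, 2;
        next unless defined $rest && $wanted{$name};
        push @hits, [$name, $rest];
    }
    return @hits;
}
```

Because the hash lookup is an exact match, a list entry such as let-7a-1 does not select the let-7a-2 header.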


birdview
Novice

Mar 27, 2014, 10:26 AM

Post #3 of 8 (2527 views)
Re: [FishMonger] partial Match and extract

Thanks FishMonger.
My expectation was to take a list of strings from file1 (list) and search for each of them in file2 (precursor); in case of a match, the script should take the whole line and turn it into a tab-separated line.
For example,
file1 (list) has let-7a-1. If this string is found in file2 (precursor), as in the following line:
>let-7a-1_GeneScaffold_1068dna:genescaffoldgenescaffold:or1:GeneScaffold_1068:1:184962:1REF:125983..126049:+

then the output should look like this:

let-7a-1 GeneScaffold_1068dna:genescaffoldgenescaffold:or1:GeneScaffold_1068:1:184962:1REF 125983 126049 +

That is, search for the pattern, remove separators such as _, : and .. from the line, and put the remaining pieces into a tab-delimited line.
Thanks


(This post was edited by birdview on Mar 27, 2014, 10:39 AM)


BillKSmith
Veteran

Mar 27, 2014, 1:04 PM

Post #4 of 8 (2515 views)
Re: [birdview] partial Match and extract

Your sample precursor data is not separated into fields the way your code expects. (e.g. Lines 2 and 3 each contain only one underscore.) We cannot fix the code without a much better idea of what you are trying to do.
Good Luck,
Bill


birdview
Novice

Mar 28, 2014, 8:29 AM

Post #5 of 8 (2402 views)
Re: [BillKSmith] partial Match and extract

Thanks BillKSmith. What I am trying to do, as you can see below, is take each string from file1 (list), match it against file2 (precursor), and then rewrite each matching line as a tab-separated line. The problem is that I cannot show tab separation in the post; it ends up as a space. The tab characters in the output file should be:
1) between let-7a-1 and GeneScaffold (replacing the first underscore)
2) between 1REF and the first number following it (replacing the colon)
3) in place of the two dots (..)
4) in place of the last colon (:)
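The four tab positions listed above can be captured in a single regular expression. The following is a sketch under the assumption that every header ends in `:start..end:strand`; the sub name `header_to_tsv` is made up for the example.

```perl
use strict;
use warnings;

# Sketch: turn one precursor header into a tab-separated line.
# Returns nothing if the header does not have the assumed shape.
sub header_to_tsv {
    my ($header) = @_;
    my ($name, $scaffold, $start, $end, $strand) =
        $header =~ /^>?([^_]+)_(.+):(\d+)\.\.(\d+):([+-])\s*$/
        or return;
    return join "\t", $name, $scaffold, $start, $end, $strand;
}
```

Applied to the mir-100 header from file2, this gives mir-100, contig363446dna:contigcontig::contig363446:1:623:1REF, 122, 177 and - as the five tab-separated fields, so headers with only one underscore are handled the same way as the GeneScaffold ones.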


e.g. if let-7a-1 is found in one or more lines of file2, parse each matching line as follows

let-7a-1 GeneScaffold_1068dna:genescaffoldgenescaffold:or1:GeneScaffold_1068:1:184962:1REF 125983 126049 +


INFILE
file1 (list)

Code
let-7a-1 
mir-100
mir-130c
mir-135c-1


file2 (precursor)

Code
>let-7a-1_GeneScaffold_1068dna:genescaffoldgenescaffold:or1:GeneScaffold_1068:1:184962:1REF:125983..126049:+ 
>let-7a-2_contig883631dna:contigcontig::contig883631:1:3604:1REF:2613..2684:-
>mir-100_contig363446dna:contigcontig::contig363446:1:623:1REF:122..177:-
>mir-130a_GeneScaffold_2375dna:genescaffoldgenescaffold:or1:GeneScaffold_2375:1:349505:1REF:161313..161392:-
>mir-135c-1_GeneScaffold_1609dna:genescaffoldgenescaffold:or1:GeneScaffold_1609:1:1318837:1REF:1266799..1266862:+
>mir-130c_GeneScaffold_3571dna:genescaffoldgenescaffold:or1:GeneScaffold_3571:1:877190:1REF:112284..112343:-


OUTFILE
output

Code
let-7a-1                         GeneScaffold_1068dna:genescaffoldgenescaffold:or1:GeneScaffold_1068:1:184962:1REF                125983               126049                 + 
mir-100 contig363446dna:contigcontig::contig363446:1:623:1REF 122 177 -
mir-135c-1 GeneScaffold_1609dna:genescaffoldgenescaffold:or1:GeneScaffold_1609:1:1318837:1REF 1266799 1266862 +



(This post was edited by FishMonger on Mar 28, 2014, 8:53 AM)


FishMonger
Veteran / Moderator

Mar 28, 2014, 8:54 AM

Post #6 of 8 (2391 views)
Re: [birdview] partial Match and extract

Please use the code tags around all code and examples that require the formatting to be retained.

I've updated your post and added the code tags for you. Please use them in future posts.


BillKSmith
Veteran

Mar 28, 2014, 11:34 AM

Post #7 of 8 (2345 views)
Re: [birdview] partial Match and extract

The following code fails only for line 3. It matches one of the names, but does not conform to the format.

The only major change to your code is to abort a record if it does not match any of the names. This required building the regular expression rather than coding it in the script. Note: I intentionally output pipe (|) characters rather than tabs so you can see them.


Code
#!/usr/bin/perl  
use warnings;
use strict;

open my $list, "<", "list" or die "cannot open input list: $!";
my $miRNA = join '|', <$list>;
close $list;

$miRNA =~ s/\s*//g;
$miRNA = qr/$miRNA/;

open my $fh, '<', 'precursor' or die "cannot open input pre: $!";
while ( <$fh> ) {
chomp;
next if !/$miRNA/;
my $header = $_;
my( $miRNAName, $scaffoldName, $contigName, $position )
= split /_/, $header;
my( $length, $coordinate ) = split /:1REF:/, $position;
my( $start, $rest ) = split /\.\./, $coordinate;
my( $end, $strand ) = split /:/, $rest;   # "126049:+" -> end, strand

my $newHeader
= "${miRNAName}\|${scaffoldName}_${contigName}_${length}:1REF\|"
. "${start}\|${end}\|${strand}";

print "$newHeader\n";
}
close $fh;
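
One possible refinement of the regex-building step, offered as a sketch and not as part of the code above: the joined pattern is unanchored, so a name like let-7a-1 would also match a header starting with let-7a-10. Anchoring the alternation between the leading ">" and the first underscore avoids that. The sub name `build_name_regex` is hypothetical.

```perl
use strict;
use warnings;

# Sketch: build one anchored pattern from the lines of the list file.
sub build_name_regex {
    my @names;
    for my $raw (@_) {
        (my $name = $raw) =~ s/\s+//g;          # drop newline and spaces
        push @names, quotemeta $name if length $name;
    }
    my $alternation = join '|', @names;
    return qr/^>(?:$alternation)_/;             # whole name must match
}
```

It could be used as `my $miRNA = build_name_regex(<$list>);` in place of the join and the two substitutions; quotemeta keeps any regex metacharacters in the names from being interpreted.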

Good Luck,
Bill


birdview
Novice

Mar 31, 2014, 3:13 AM

Post #8 of 8 (2020 views)
Re: [BillKSmith] partial Match and extract

Thanks Bill:)

 
 

