
splicing dependent on variating offset

 



lecb
New User

Jun 14, 2014, 4:55 AM

Post #1 of 2 (5148 views)
splicing dependent on variating offset

Hi there,

I have a flat text file containing columns of information. The first column contains a sequence of 50 letters. I'm trying to pull out 20 letters from each line: 10 letters either side of a particular boundary. The boundary position varies from line to line. I don't have a value for the boundary itself, but I do know how far away from it I am.

From the third column I'm trying to extract a signed value (+ or -) that tells me how far away from the boundary I am. Negative numbers mean to the left and positive numbers mean to the right. From that, I can work out the corrected left boundary and take 20 letters from there (using splice) to extract the correct 20 for each line.

The problem is that I don't think I have split my columns correctly. I don't seem to be able to pull out this 20-letter sequence for every line of the first column. I am not sure whether it is because I need to run 2 arrays in parallel, or because I am not actually pulling out what I need. Perhaps you could help me.

Please be warned that my scripting style is not succinct - at least not yet.

Input text file is attached.
Many thanks,
E


Code
#!/usr/bin/perl -w 
use strict;

my $inputfile1 = $ARGV[0];

open (FILE1, $inputfile1) or die "Unable to find file $inputfile1"; ##Opens input file


my @file1 = <FILE1>; #loads inputfile1 data into array
close FILE1;


my @matches;
foreach my $file1 (@file1) {
if($file1 =~ m/splic/) {
push (@matches, $file1); ##loads matches into array @matches
}
}

my @col1; ## column 1
my @col3; ## column 3
foreach my $match(@matches) { ## process each line, splitting columns and move onto next line
my @colsplit = split("\t", $match);
push (@col3, $colsplit[2] . "\n"); ##pushes third column to @col3 array
push (@col1, $colsplit[0] . "\n");

}



my @intron_from_boundary;
my @baseref;


foreach my $col3line(@col3) {
if ($col3line =~ m/([\+|\-]\d+)\w+(\[[ACTG]])/) { ##pulls out + or - and subsequent number and [A]
push (@intron_from_boundary, $1 . "\n"); ##$1 pushes what is in the first set of brackets
push (@baseref, $2 . "\n");
}
}




print "@intron_from_boundary" . "@baseref" . "\n";


## need to take each intronmatch value and work out its position relative to intron/exon boundary


my @new_r_boundary;
my @new_l_boundary;
my $left_of_boundary;
my $right_of_boundary;
my $intron_from_boundary;

## split seq of @col1 into array

my @col1split;

foreach my $col1(@col1) {
@col1split = split(//, $col1);
}


my $i;
foreach my $lines(@col1split) {
$i = 0;
##for -7:

$left_of_boundary = 10; ##10 to the left
$right_of_boundary = 10;

$left_of_boundary = $left_of_boundary + $intron_from_boundary[$i]; ##3 to the left
$right_of_boundary = $right_of_boundary - $intron_from_boundary[0];


my $new_left = 23 - $left_of_boundary; ## 20


my @spliceout = splice @col1split, $new_left, 22; ##want to pull out 3 letters to left of [G] and 16 to the right

$i++;

print "@spliceout" . "\n";
}

Attachments: input.txt (0.56 KB)


lecb
New User

Jun 14, 2014, 6:31 AM

Post #2 of 2 (5076 views)
Re: [lecb] splicing dependent on variating offset

I figured it out! For anyone interested:


Code
#!/usr/bin/perl -w 
use strict;

my $inputfile1 = $ARGV[0];

open (my $fh, '<', $inputfile1) or die "Unable to open file $inputfile1: $!"; ##Opens input file for reading


my @file1 = <$fh>; #loads inputfile1 data into array
close $fh;


my @matches;
foreach my $file1 (@file1) {
if($file1 =~ m/splic/) {
push (@matches, $file1); ##loads matches into array @matches
}
}

my @col1; ## column 1
my @col3; ## column 3
foreach my $match(@matches) { ## process each line, splitting columns and move onto next line
my @colsplit = split("\t", $match);
push (@col3, $colsplit[2] . "\n"); ##pushes third column to @col3 array
push (@col1, $colsplit[0] . "\n");

}



my @intron_from_boundary;
my @baseref;


foreach my $col3line(@col3) {
if ($col3line =~ m/([+-]\d+)\w+(\[[ACTG]])/) { ##pulls out + or - with the number that follows, and the bracketed base, e.g. [A]
push (@intron_from_boundary, $1 . "\n"); ##$1 pushes what is in the first set of brackets
push (@baseref, $2 . "\n");
}
}




print "@intron_from_boundary" . "@baseref" . "\n";


## need to take each intronmatch value and work out its position relative to intron/exon boundary

## split each sequence in @col1 into an array of characters

my $i = 0;
foreach my $col1(@col1) {
my @col1split = split(//, $col1);


##worked example for an offset of -7:

my $left_of_boundary = 10 + $intron_from_boundary[$i]; ##10 + (-7) = 3 letters to the left

my $new_left = 23 - $left_of_boundary; ##23 - 3 = 20

my @spliceout = splice @col1split, $new_left, 22; ##pulls out 3 letters to the left of [G], [G] itself, and 16 to the right

print "@spliceout" . "\n";

++$i;
}
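
For anyone following along, the same window can probably be taken straight from the string with substr, without splitting each sequence into characters first. This is only a rough sketch of the arithmetic used above, not a drop-in replacement: window_around_boundary is a made-up helper name, and the defaults (bracketed base starting at 0-based index 23, 10 letters either side, window of 22 characters including the two brackets) are assumptions taken from the numbers in the script. The test sequence is invented, not taken from input.txt.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script).
# Assumptions: the bracketed base, e.g. [G], starts at 0-based index 23 of
# the sequence, and the window is 22 characters long (20 letters plus the
# two brackets), matching the numbers used in the script above.
sub window_around_boundary {
    my ($seq, $offset, $variant_index, $flank, $length) = @_;
    $variant_index = 23 unless defined $variant_index;
    $flank         = 10 unless defined $flank;
    $length        = 22 unless defined $length;

    # Same arithmetic as the script: start = 23 - (10 + offset)
    my $start = $variant_index - ($flank + $offset);

    # Return nothing if the window would fall outside the sequence
    return if $start < 0 || $start + $length > length $seq;

    return substr($seq, $start, $length);
}

# Invented 50-character test sequence: 23 letters, [G], then 24 letters
my $seq = join('', 'A' .. 'W') . '[G]' . join('', 'a' .. 'x');

my $window = window_around_boundary($seq, -7);
print defined $window ? "$window\n" : "window out of range\n";
# prints: UVW[G]abcdefghijklmnop
```

Unlike the splice version, this returns the window as one string rather than a list of characters, and it returns undef instead of a short or empty result when the offset would push the window past either end of the sequence.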


 
 

