Creating a Sequence 'Array'

 



SteelyDan
New User

Jul 10, 2013, 6:17 AM

Post #1 of 9 (1347 views)
Creating a Sequence 'Array'

Hey guys, I'm brand new to Perl (and to programming in general), and I need to use it to create a script that will easily do the following for me:

In my summer project, I need to take a long sequence of DNA, which is about 3000 characters long

(i.e. AGCCTAGTTAGCCCGAGCTCGGAGCGAGCTATGAG... etc)

And split it up into 21 character long fragments, but each fragment must be shifted over one character. So for example, if I had the sequence

AAATTTAAATTTAAATTTAAATTTXAAATTT

The sequences I would need would be

AAATTTAAATTTAAATTTAAA
AATTTAAATTTAAATTTAAAT
ATTTAAATTTAAATTTAAATT
TTTAAATTTAAATTTAAATTT
TTAAATTTAAATTTAAATTTX
TAAATTTAAATTTAAATTTXA

...and so on.

(I used the X as a reference point so you could see how I need to shift each segment over by one nucleotide)

Then, once all of these are generated, I need to put a little id tag above each of them so it looks like

>Segment_1
AAATTTAAATTTAAATTTAAA
>Segment_2
AATTTAAATTTAAATTTAAAT
>Segment_3
ATTTAAATTTAAATTTAAATT

... and so on.

Then I would save this file and proceed to use it.

Now, by NO MEANS am I asking someone to make this script for me, I just don't really know where to start.

Thanks for any help.

Nick


Laurent_R
Veteran / Moderator

Jul 10, 2013, 9:33 AM

Post #2 of 9 (1333 views)
Re: [SteelyDan] Creating a Sequence 'Array'

I think I would use the substr function, shifting by 1 each time it is called.


Code
my @seq;
my $sequence = "AGCCTAGTTAGCCCGAGCTCGGAGCGAGCTATGAG";
for (my $i = 0; $i < 5; $i++) {
    $seq[$i] = substr $sequence, $i, 21;
}

The @seq array now contains:


Code
0 'AGCCTAGTTAGCCCGAGCTCG'
1 'GCCTAGTTAGCCCGAGCTCGG'
2 'CCTAGTTAGCCCGAGCTCGGA'
3 'CTAGTTAGCCCGAGCTCGGAG'
4 'TAGTTAGCCCGAGCTCGGAGC'


I suppose that's what you want.
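To cover the whole sequence rather than only the first five offsets, the loop bound can be derived from the lengths: a sequence of N characters holds N - 21 + 1 fragments of 21 characters. A minimal sketch along those lines, where the names $frag_len and $last_start are made up for illustration:

```perl
use strict;
use warnings;

my $sequence = "AGCCTAGTTAGCCCGAGCTCGGAGCGAGCTATGAG";
my $frag_len = 21;    # fragment length from the question (illustrative name)

# Last offset at which a full-length fragment still fits.
my $last_start = length($sequence) - $frag_len;

my @seq;
for (my $i = 0; $i <= $last_start; $i++) {
    $seq[$i] = substr $sequence, $i, $frag_len;
}

print scalar(@seq), " fragments\n";
print "$_\n" for @seq;
```

If the sequence is shorter than the fragment length, $last_start is negative and the loop body never runs, so @seq stays empty.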


BillKSmith
Veteran

Jul 10, 2013, 10:16 AM

Post #3 of 9 (1333 views)
Re: [SteelyDan] Creating a Sequence 'Array'

The very first thing you should do is learn to use CPAN then search for modules which you may be able to use. Be sure to check out the string processing functions as well as the BIO section. You might get lucky and find that someone else has already done your project! Even if you do not find a single piece of useful code, the effort is rarely wasted. You will learn subtleties such as name conventions and things to avoid, not to mention a good start for your next project. If you do not understand the documentation for a module, ask for help here.

You did not say how you get your original string. It probably comes in a file. Does it have any header info? If so, how can you tell where the data starts (and ends)? Is it broken into records? Are you sure that the length of the data portion is an exact multiple of twenty-one? What should you do if it is not? What if the data contains unexpected characters?

You plan to compute and store all twenty-one circular permutations of each substring. It might be better to compute each permutation as you need it. You probably should not decide until you have some idea how much time and memory each approach will require.

I do not understand "... and so on." I assume that 1 through 21 refer to the permutations of the first substring. Does 22 refer to the first permutation of the second substring?


There are two ways to manipulate strings in perl. You can work on them directly with the functions index, length, substr, pack, and unpack and with regular expressions. Or, you can split them into an array of characters and then use push, pop, shift, unshift, and splice. In your case, I recommend the first.
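As a rough illustration of the two approaches, here is a sketch that produces the same overlapping fragments both ways. The short test string and the fragment length of 3 are assumptions chosen only to keep the output small, and the array version uses a slice rather than push/shift, which is just one of several ways to do it:

```perl
use strict;
use warnings;

my $sequence = "AAATTTX";
my $frag_len = 3;    # short length chosen only to keep the example small

# Approach 1: work on the string directly with substr.
my @direct;
for my $i (0 .. length($sequence) - $frag_len) {
    push @direct, substr($sequence, $i, $frag_len);
}

# Approach 2: split into single characters, then join array slices.
my @chars = split //, $sequence;
my @sliced;
for my $i (0 .. $#chars - $frag_len + 1) {
    push @sliced, join '', @chars[$i .. $i + $frag_len - 1];
}

print "@direct\n";
print "@sliced\n";
```

Both loops yield the same list; the substr version avoids building a per-character array, which is one reason it tends to be the simpler choice here.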
Good Luck,
Bill


SteelyDan
New User

Jul 10, 2013, 10:31 AM

Post #4 of 9 (1330 views)
Re: [BillKSmith] Creating a Sequence 'Array'

Awesome!

Thanks so much to the both of you!

I now know what CPAN is, and can spend some time messing around to create my first program.

I have been reading up on how Perl works with beginner's books and these posts really helped.

I think what I needed was a couple of pros to highlight the particular functions I would need to use for this.

Thank you very much!
Nick


Laurent_R
Veteran / Moderator

Jul 10, 2013, 4:09 PM

Post #5 of 9 (1320 views)
Re: [SteelyDan] Creating a Sequence 'Array'

Please note that I only tried to give a simple example of how to solve your central problem, i.e. generating a list of sequences. I would almost certainly not write my solution this way. I would probably not use a C-style for loop, and I would probably not store my results in an array, but rather print them directly to the output file.

I was just giving an example of how to generate sequences which are each offset by 1 character compared to the previous one.

Side note: I hope that what I am saying above makes sense in English. I earned a master's degree at a North American university, so I tend to believe that I have an acceptably good command of English, but sometimes I have doubts about the way I express certain things. After all, it is not my mother tongue, and what I say or write may sometimes sound a bit strange to others. Sorry if I am not clear, and don't hesitate to ask for clarification if you need it.
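A version along those lines might look like the sketch below: a range-based loop that prints each fragment, with its >Segment_N header, straight to a filehandle instead of collecting an array. The sub name write_segments and its parameters are hypothetical, not something from the original posts:

```perl
use strict;
use warnings;

# Print every fragment of $frag_len characters, each preceded by a
# ">Segment_N" header line, to the filehandle $out.
# (write_segments is an illustrative name, not an existing function.)
sub write_segments {
    my ($out, $sequence, $frag_len) = @_;
    my $last_start = length($sequence) - $frag_len;
    for my $i (0 .. $last_start) {
        print {$out} ">Segment_", $i + 1, "\n";
        print {$out} substr($sequence, $i, $frag_len), "\n";
    }
    return;
}

write_segments(\*STDOUT, "AGCCTAGTTAGCCCGAGCTCGGAGCGAGCTATGAG", 21);
```

To write to a file instead of the screen, the first argument can be a handle opened for writing with open, for example on a made-up file name such as segments.txt.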


SteelyDan
New User

Jul 11, 2013, 8:10 AM

Post #6 of 9 (1301 views)
Re: [Laurent_R] Creating a Sequence 'Array'

I have been trying to construct my own code now, using yours as a reference, and I can't seem to get it to work or understand what is going on.

Keep in mind, this is all brand new to me

So what I gathered from your code:


Code
my @seq;
my $sequence = "AGCCTAGTTAGCCCGAGCTCGGAGCGAGCTATGAG";
for (my $i = 0; $i < 5; $i++) {
    $seq[$i] = substr $sequence, $i, 21;
}


Is that you 'claim' an array called @seq,
You then claim a variable called $sequence, and set it to equal the sequence of interest
(At this point I'm wondering what the purpose of including "my" in front of them is)
Then, you set up a 'for loop'. $i is your iterator variable which does something I don't really understand. You set i = 0 so that it starts at the beginning (position 0) of the sequence string, and you also set it so that it cannot be greater than 5... Then you make the value of $i increase incrementally by 1. But how does the program know to increase the value of $i at the end of each loop? And why is the "(my $i=0; $i<5; $i++)" in brackets?

Then, the first part of the loop I don't understand "$seq[$i]" ... What does this mean? Shouldn't it say @seq, because we want to create an array with these values? And what is the purpose of the $i in square brackets [$i]? Then you apply the function substring to output a 21 character string from $sequence starting at position $i (0).

Then the loop completes... how does the program know to then increase $i by 1, if the $i++ is not in the loop?

Sorry for my total nubishness, just want to learn the core stuff so I don't have to ask so many questions in the future.

P.S. Your English is perfect... I couldn't even tell it was your 'n'th language.

Cheers
Nick


SteelyDan
New User

Jul 11, 2013, 8:39 AM

Post #7 of 9 (1296 views)
Re: [SteelyDan] Creating a Sequence 'Array'

Hey, so I actually got it working (somewhat) using a different kind of loop.

Here is my code:


Code
#!/usr/bin/perl 
use warnings;

print "Please enter the sequence you would like to fragment.\n";
$sequence = <STDIN>;

$seqlength = length $sequence;

print "Please enter the fragment length.\n";
$fraglength = <STDIN>;

$endloop = $seqlength - $fraglength;

print "The amount of fragments to print is $endloop\n";

$i = 0;

while ($i < $seqlength - $fraglength) {
my $output = substr $sequence, $i, $fraglength;
print $output;
$i++;
}


The only problem I have is that when it splits the sequence, it does so all on the same line. I tried putting a "\n" after the output so that it looks like

print $output, "\n";

but it tells me that its a useless use of a constant, and the sequences are all stuck together in the printed output.

Any help?


Laurent_R
Veteran / Moderator

Jul 11, 2013, 8:42 AM

Post #8 of 9 (1295 views)
Re: [SteelyDan] Creating a Sequence 'Array'

Hi,

some explanations on the posted code.

The my statement is used to declare variables.
@seq is an array of elements (any variable starting with @ is an array), but an individual element of this array will be, for example, $seq[0].

The for loop is a so-called C-style for loop, because it uses the syntax of this loop in the C language.

Its semantics is: for (initialization; condition to keep looping; change applied after each pass) { code to be executed for each value of the $i counter }. The condition is tested before each pass, and the change (here $i++) runs automatically after each pass.

The loop is not just the code between the { and } curly braces; it is that code plus the for (...) line above.

A more "perlish" way to do it is to have the following for syntax:

Code
for my $i (0..4) { ...

which can also be written

Code
foreach my $i (0..4) { ...

which is strictly the same thing, but the word foreach possibly explains better that $i is iterating over the range (0..4).

$seq[0] is the first element of the @seq array.
$seq[1] is the second element of the @seq array.
etc.

In my code, $i takes values 0, 1, 2, 3 and 4.
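To make the comparison concrete, here is a small sketch that runs both loop styles on the sequence from the earlier post and checks that they fill their arrays identically. The array names @c_style and @ranged are invented for this example:

```perl
use strict;
use warnings;

my $sequence = "AGCCTAGTTAGCCCGAGCTCGGAGCGAGCTATGAG";

# C-style loop: initialization; condition tested before each pass;
# step ($i++) executed after each pass.
my @c_style;
for (my $i = 0; $i < 5; $i++) {
    $c_style[$i] = substr $sequence, $i, 21;
}

# Range-based loop: $i takes each value of 0..4 in turn.
my @ranged;
foreach my $i (0 .. 4) {
    $ranged[$i] = substr $sequence, $i, 21;
}

print "same results\n" if "@c_style" eq "@ranged";
print "first element: $c_style[0]\n";              # one element, so the $ sigil
print "element count: ", scalar(@c_style), "\n";   # whole array, so the @ sigil
```

The sigil reflects what is being accessed: $seq[$i] is a single scalar stored in the array @seq, which is why the assignment inside the loop uses $ rather than @.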

I think you should really pick up a good tutorial or a good book on Perl in order to nail down at least the basics.

Cheers,


Laurent_R
Veteran / Moderator

Jul 11, 2013, 8:46 AM

Post #9 of 9 (1294 views)
Re: [SteelyDan] Creating a Sequence 'Array'


Code
print $output, "\n";


should work.
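Two details are worth checking if it does not. The "useless use of a constant" warning typically appears when a semicolon rather than a comma ends up in front of the "\n". Also, a line read from <STDIN> keeps its trailing newline, so without chomp the newline is counted as part of the sequence. A minimal sketch under those assumptions, with the two typed lines replaced by hard-coded stand-ins so it runs without prompting (the sub name fragments is made up):

```perl
use strict;
use warnings;

# Takes the two lines exactly as <STDIN> would return them (trailing
# newline included) and returns the list of fragments.
# (fragments is an illustrative name, not an existing function.)
sub fragments {
    my ($sequence, $fraglength) = @_;
    chomp $sequence;      # drop the newline so it is not counted as a base
    chomp $fraglength;
    my @out;
    my $i = 0;
    while ($i <= length($sequence) - $fraglength) {
        push @out, substr $sequence, $i, $fraglength;
        $i++;
    }
    return @out;
}

# Stand-ins for the two lines typed at the prompts.
my @frags = fragments("AAATTTX\n", "5\n");
print $_, "\n" for @frags;    # comma, not semicolon, before "\n"
```

With the newline removed, the loop condition needs <= rather than < so that the last full-length fragment is included.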

 
 

