
Go here: http://www.ncbi.nlm.nih.gov/BLAST/

Click on the type of query you would like to run, then paste in the sequence you want to check. When the search finishes, you get a list of the matches found in the searched databases. Each match comes with an "Expect" value: the number of matches at least this good that you would expect to turn up purely by chance in a database of that size. For very small values this is effectively the probability of a chance match. Here is one for a 300-base string.

The expect value is 10^-167 (shown as "e-167" in the output), with 0 mismatches.

>gi|5729841|ref|NM_006708.1|   Homo sapiens glyoxalase I (GLO1), mRNA
          Length = 1993

 Score =  595 bits (300), Expect = e-167
 Identities = 300/300 (100%)
 Strand = Plus / Plus

                                                                       
Query: 1   ctagttaaggcggcacagggccgaggcgtagtgtgggtgactcctccgttccttgggtcc 60
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1   ctagttaaggcggcacagggccgaggcgtagtgtgggtgactcctccgttccttgggtcc 60

                                                                       
Query: 61  cgtcgtctgtgatactgcagttcagccatggcagaaccgcagcccccgtccggcggcctc 120
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 61  cgtcgtctgtgatactgcagttcagccatggcagaaccgcagcccccgtccggcggcctc 120

                                                                       
Query: 121 acggacgaggccgccctcagttgctgctccgacgcggaccccagtaccaaggattttcta 180
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 121 acggacgaggccgccctcagttgctgctccgacgcggaccccagtaccaaggattttcta 180

                                                                       
Query: 181 ttgcagcagaccatgctacgagtgaaggatcctaagaagtcactggatttttatactaga 240
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 181 ttgcagcagaccatgctacgagtgaaggatcctaagaagtcactggatttttatactaga 240

                                                                       
Query: 241 gttcttggaatgacgctaatccaaaaatgtgattttcccattatgaagttttcactctac 300
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 241 gttcttggaatgacgctaatccaaaaatgtgattttcccattatgaagttttcactctac 300
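The Expect values in this thread follow from the bit scores. For a bit score S', BLAST reports E = m * n * 2^(-S'), where m and n are the effective lengths of the query and the database. The sketch below assumes only that relation and works backwards from the two hits shown here (595 bits at e-167 above, and 127 bits at 1e-26 for the mouse hit further down) to the search space each one implies. The function names are illustrative, and because BLAST rounds both numbers in its report, the result is approximate.

```python
import math

def expect_from_bits(bit_score: float, search_space: float) -> float:
    """E = m * n * 2**(-S'), where search_space = m * n (effective lengths)."""
    return search_space * 2.0 ** (-bit_score)

def implied_search_space(bit_score: float, evalue: float) -> float:
    """Invert E = m * n * 2**(-S') to recover m * n from a reported hit."""
    # Work in log10 so very large bit scores do not overflow a float.
    return 10 ** (math.log10(evalue) + bit_score * math.log10(2))

human = implied_search_space(595, 1e-167)  # human vs. human hit above
mouse = implied_search_space(127, 1e-26)   # mouse vs. human hit
print(f"{human:.2e} {mouse:.2e}")
```

Both come out at roughly 10^12, which is consistent with the two hits having come from the same search against the same database.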

Here is the result for the mouse sequence compared to the human one: an expect value of 10^-26, with 27 mismatches (145 of 172 bases identical).

>gi|26327652|dbj|AK031832.1|   Mus musculus adult male medulla oblongata cDNA, RIKEN full-length
           enriched library, clone:6330414G20 product:GLYOXALASE I
           homolog [Homo sapiens], full insert sequence
          Length = 959

 Score =  127 bits (64), Expect = 1e-26
 Identities = 145/172 (84%)
 Strand = Plus / Plus

                                                                       
Query: 83  cagccatggcagaaccgcagcccccgtccggcggcctcacggacgaggccgccctcagtt 142
           ||||||||||||| || |||||  ||||| | |||||||| || ||| ||||  |||| |
Sbjct: 46  cagccatggcagagccacagccggcgtccagtggcctcactgatgagaccgctttcagct 105

                                                                       
Query: 143 gctgctccgacgcggaccccagtaccaaggattttctattgcagcagaccatgctacgag 202
           ||||||||||  | ||||| || ||||||||||||||| ||||||| || |||||| || 
Sbjct: 106 gctgctccgatccagaccctagcaccaaggattttctactgcagcaaacgatgctaagaa 165

                                                               
Query: 203 tgaaggatcctaagaagtcactggatttttatactagagttcttggaatgac 254
           | ||||||||||||||||| |||||||||||||| || ||||||||| ||||
Sbjct: 166 ttaaggatcctaagaagtccctggatttttatacgagggttcttggactgac 217
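The 145/172 identity figure, and the 27 differences it implies, can be checked straight from the alignment text. A minimal sketch, assuming the Query and Sbjct rows are copied in as equal-length strings; this alignment has no gaps, so every column is either an identity or a mismatch (a "-" in a gapped row would simply count as a non-identity).

```python
# Query and Sbjct rows of the mouse-vs-human alignment above, one pair per block.
BLOCKS = [
    ("cagccatggcagaaccgcagcccccgtccggcggcctcacggacgaggccgccctcagtt",
     "cagccatggcagagccacagccggcgtccagtggcctcactgatgagaccgctttcagct"),
    ("gctgctccgacgcggaccccagtaccaaggattttctattgcagcagaccatgctacgag",
     "gctgctccgatccagaccctagcaccaaggattttctactgcagcaaacgatgctaagaa"),
    ("tgaaggatcctaagaagtcactggatttttatactagagttcttggaatgac",
     "ttaaggatcctaagaagtccctggatttttatacgagggttcttggactgac"),
]

def count_identities(blocks):
    """Return (identical columns, total columns) over all alignment blocks."""
    same = total = 0
    for query, sbjct in blocks:
        if len(query) != len(sbjct):
            raise ValueError("rows of an alignment block must be the same length")
        same += sum(q == s for q, s in zip(query, sbjct))
        total += len(query)
    return same, total

same, total = count_identities(BLOCKS)
# prints: Identities = 145/172 (84%), differences = 27
print(f"Identities = {same}/{total} ({100 * same // total}%), differences = {total - same}")
```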

I don't think you are responding to my question.

Here's my question in another form:

The article speaks of "One of the chunks was 1.6 million DNA bases long, the other one was over 800,000 bases long."

This implies that there are other chunks of varying length. Presumably there are chunks of length 1, 2, 3, 4, 5, and so forth. Perhaps there are constraints limiting the lengths to multiples of two or four or whatever, but there must be conserved chunks of various lengths.

So what I am asking is: what is the distribution of lengths? How many chunks of length 1, how many of length 2, and so forth?
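For a single pairwise alignment, one way to make "how many 1s, how many 2s" concrete is to treat each unbroken run of identical bases as a conserved chunk and tally the run lengths. The sketch below does only that; it is an illustration of the tally, not the method the article used to define its 1.6-million-base chunks, and `run_length_histogram` is a hypothetical helper name. It is run here on the first block of the mouse-vs-human alignment above.

```python
from collections import Counter

def run_length_histogram(query: str, sbjct: str) -> Counter:
    """Count maximal runs of identical aligned columns, keyed by run length."""
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must be the same length")
    runs = Counter()
    current = 0
    for q, s in zip(query, sbjct):
        if q == s:
            current += 1  # extend the current conserved run
        elif current:
            runs[current] += 1  # a mismatch closes the run
            current = 0
    if current:
        runs[current] += 1  # run that reaches the end of the alignment
    return runs

# First block of the mouse-vs-human alignment (Query 83-142 / Sbjct 46-105).
hist = run_length_histogram(
    "cagccatggcagaaccgcagcccccgtccggcggcctcacggacgaggccgccctcagtt",
    "cagccatggcagagccacagccggcgtccagtggcctcactgatgagaccgctttcagct",
)
for length in sorted(hist):
    print(f"length {length}: {hist[length]}")
```

On a whole-genome alignment the same tally would be run over every aligned region, and you would have to decide how gaps and ambiguous bases break a run.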