Click on the type of query you would like to run, then paste in the sequence you want to check. When the search finishes, you get a list of the matches found in the searched databases. Each match carries an Expect value: the number of matches at least that good you would expect to find by chance in a database of that size, which for very small values is effectively a probability. Here is one for a 300-base string.
The Expect value is 10^-167 with 0 mutations.
>gi|5729841|ref|NM_006708.1| Homo sapiens glyoxalase I (GLO1), mRNA
Length = 1993
Score = 595 bits (300), Expect = e-167
Identities = 300/300 (100%)
Strand = Plus / Plus

Query: 1   ctagttaaggcggcacagggccgaggcgtagtgtgggtgactcctccgttccttgggtcc 60
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1   ctagttaaggcggcacagggccgaggcgtagtgtgggtgactcctccgttccttgggtcc 60

Query: 61  cgtcgtctgtgatactgcagttcagccatggcagaaccgcagcccccgtccggcggcctc 120
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 61  cgtcgtctgtgatactgcagttcagccatggcagaaccgcagcccccgtccggcggcctc 120

Query: 121 acggacgaggccgccctcagttgctgctccgacgcggaccccagtaccaaggattttcta 180
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 121 acggacgaggccgccctcagttgctgctccgacgcggaccccagtaccaaggattttcta 180

Query: 181 ttgcagcagaccatgctacgagtgaaggatcctaagaagtcactggatttttatactaga 240
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 181 ttgcagcagaccatgctacgagtgaaggatcctaagaagtcactggatttttatactaga 240

Query: 241 gttcttggaatgacgctaatccaaaaatgtgattttcccattatgaagttttcactctac 300
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 241 gttcttggaatgacgctaatccaaaaatgtgattttcccattatgaagttttcactctac 300
Here is the result for the mouse compared to human: an Expect value of 10^-26, with 27 mutations in the 172 bases that aligned.
>gi|26327652|dbj|AK031832.1| Mus musculus adult male medulla oblongata cDNA, RIKEN full-length enriched library, clone:6330414G20 product:GLYOXALASE I homolog [Homo sapiens], full insert sequence
Length = 959
Score = 127 bits (64), Expect = 1e-26
Identities = 145/172 (84%)
Strand = Plus / Plus

Query: 83  cagccatggcagaaccgcagcccccgtccggcggcctcacggacgaggccgccctcagtt 142
           ||||||||||||| || |||||  ||||| | |||||||| || ||| ||||  |||| |
Sbjct: 46  cagccatggcagagccacagccggcgtccagtggcctcactgatgagaccgctttcagct 105

Query: 143 gctgctccgacgcggaccccagtaccaaggattttctattgcagcagaccatgctacgag 202
           ||||||||||  | ||||| || ||||||||||||||| ||||||| || |||||| ||
Sbjct: 106 gctgctccgatccagaccctagcaccaaggattttctactgcagcaaacgatgctaagaa 165

Query: 203 tgaaggatcctaagaagtcactggatttttatactagagttcttggaatgac 254
           | ||||||||||||||||| |||||||||||||| || ||||||||| ||||
Sbjct: 166 ttaaggatcctaagaagtccctggatttttatacgagggttcttggactgac 217
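For anyone wondering how the bit score and the Expect value in those two reports relate, here is a rough sketch. It assumes the standard BLAST relation E = m * n * 2^(-S), where S is the bit score and m * n is the effective search space (query length times database length); the function name is mine, and since the reported Expect values are rounded, the result is only an estimate.

```python
import math

def log10_search_space(bit_score: float, log10_evalue: float) -> float:
    """Return log10(m * n) implied by E = m * n * 2**(-bit_score).

    Works in log10 space so very small E-values stay readable.
    """
    return log10_evalue + bit_score * math.log10(2)

# (bit score, log10 of Expect) copied from the two reports above.
hits = {
    "human NM_006708.1": (595, -167.0),
    "mouse AK031832.1": (127, -26.0),
}

for name, (bits, log_e) in hits.items():
    print(name, round(log10_search_space(bits, log_e), 2))
```

Both hits imply a search space of roughly 10^12, which is what you would expect if they came from the same search against the same databases.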
I don't think you are responding to my question.
Here's my question in another form:
The article speaks of "One of the chunks was 1.6 million DNA bases long, the other one was over 800,000 bases long."
There is an implication that there are other chunks of varying length. Presumably there are chunks of length 1, 2, 3, 4, 5, and so forth. Perhaps there are constraints limiting the lengths to multiples of two or four or whatever, but there must be conserved chunks of various lengths.
So what I am asking is, what is the distribution of lengths? How many of length 1, how many of length 2, and so forth?
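To make the question concrete, here is a minimal sketch of the kind of tally I mean, run on a toy example: the first 60 aligned bases of the human/mouse comparison posted above. It assumes a gapless alignment and counts each maximal run of identical bases as one "chunk"; the function name is just for illustration, and this is obviously nowhere near the genome-scale comparison the article describes.

```python
from collections import Counter
from itertools import groupby

def run_length_distribution(query: str, subject: str) -> Counter:
    """Tally the lengths of maximal runs of identical aligned bases."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have equal length")
    # True where the two aligned bases agree, False where they differ.
    matches = [q == s for q, s in zip(query, subject)]
    return Counter(
        sum(1 for _ in group)
        for is_match, group in groupby(matches)
        if is_match
    )

# First alignment block (human query 83-142, mouse subject 46-105).
query = "cagccatggcagaaccgcagcccccgtccggcggcctcacggacgaggccgccctcagtt"
subject = "cagccatggcagagccacagccggcgtccagtggcctcactgatgagaccgctttcagct"

dist = run_length_distribution(query, subject)
for length in sorted(dist):
    print(length, dist[length])
```

The output is a table of chunk length against number of chunks, which is the distribution I am asking about, just for the whole genome instead of 60 bases.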