The point is that 1,243 sequences, each over 100 base pairs with 70% identity, is a humongous conserved "area". There is something called expectation when sequences are compared by BLAST. I'm pretty sure the expectation for this is close to zero. This is what I got when I took a 110 base sequence from Mus musculus cytochrome and compared it to Homo sapiens.
Score = 38.2 bits (19), Expect = 3.9
Identities = 19/19 (100%)
Strand = Plus / Plus
Query: 86     atgggccttcttgctcagt 104
              |||||||||||||||||||
Sbjct: 223143 atgggccttcttgctcagt 223161
The expectation is low but the percent identities are 100%.
There are several errors with your logic.
1) From the BLAST FAQ page here's what they say about what "Expect" means:
Q: What is the Expect (E) value?
The Expect value (E) is a parameter that describes the number of hits one can "expect" to see just by chance when searching a database of a particular size. It decreases exponentially with the Score (S) that is assigned to a match between two sequences. Essentially, the E value describes the random background noise that exists for matches between sequences. For example, an E value of 1 assigned to a hit can be interpreted as meaning that in a database of the current size one might expect to see 1 match with a similar score simply by chance. This means that the lower the E value, or the closer it is to "0", the more "significant" the match is. However, keep in mind that searches with short sequences can be virtually identical and still have a relatively high E value. This is because the calculation of the E value also takes into account the length of the query sequence: shorter sequences have a high probability of occurring in the database purely by chance. For more details please see the calculations in the BLAST Course.
The Expect value can also be used as a convenient way to create a significance threshold for reporting results. You can change the Expect value threshold on most main BLAST search pages. When the Expect value is increased from the default value of 10, a larger list with more low-scoring hits can be reported.
The E value has nothing at all to do with which species are being compared! It makes no judgements about how close two species' sequences "should" be to each other.
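To put rough numbers on that, here is a minimal sketch using the standard bit-score relationship E = m * n * 2^(-S'), where m is the query length, n is the database length and S' is the bit score. The function name and the 3-billion-base database size are my own assumptions for illustration, and BLAST itself applies effective-length corrections, so this will not reproduce the reported 3.9 exactly. With these assumed inputs the 19-base hit comes out with E on the order of 1, nowhere near zero.

```python
def expect_value(bit_score: float, query_len: int, db_len: int) -> float:
    """Approximate E value from a bit score: E = m * n * 2**(-S').

    Uses raw lengths; real BLAST uses effective (edge-corrected) lengths,
    so treat the result as an order-of-magnitude estimate only.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


if __name__ == "__main__":
    query_len = 110            # query length from the quoted post
    db_len = 3_000_000_000     # ASSUMPTION: a roughly genome-sized database

    # The 19/19 hit quoted above scored 38.2 bits.
    short_hit = expect_value(38.2, query_len, db_len)
    # Hypothetical hit with twice the bit score, same query and database.
    long_hit = expect_value(2 * 38.2, query_len, db_len)

    print(f"38.2-bit hit: E ~ {short_hit:.2g}")
    print(f"76.4-bit hit: E ~ {long_hit:.2g}")
```

Note that nothing in the formula refers to the species involved: only the score, the query length and the database size go in.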
2) The cytochrome c gene, being an essential gene, should be highly conserved. I would expect a non-functional stretch of DNA to be less similar between species than the cytochrome c gene! And in fact it is: the knocked-out sequences were 70% identical, a full 30 percentage points lower than your example! Your example contradicts your argument.
3) Your claim was that the mouse gene deserts were much more highly conserved WRT the homologous human gene deserts than we should expect if they were truly junk. You're making a judgement based on the average overall genetic distance between mouse & man, not just one gene (I hope). And we still don't know what the "official" overall % figure is for that.
Pinging the only two people I know of who might know the real figures...