With regard to your question about complexity/entropy and Shakespeare/DNA, I'm not finding a lot of agreement, but the information is fascinating! Here's what I've got so far, emphasis mine.
Lurkers: KC is short for Kolmogorov Complexity.
Consider a monkey typing randomly at a typewriter versus a monkey at a computer. It can be shown that the monkey at a computer is more likely to produce ordered or "interesting" results. Consider the probability that either monkey will produce all the works of Shakespeare. Assuming the text is 1 million bits long, the monkey at a typewriter has probability about 2^-1,000,000 of producing the text. The monkey at the computer has probability 2^-KC(Shakespeare), or 2^-250,000, of producing the text, assuming the Shakespeare is compressible by a factor of four. Neither of these probabilities is large, but one is still many, many orders of magnitude more likely than the other. In this sense, we can say that a computer is an amplifier of order, information or "sense".
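The arithmetic in that passage can be checked in a few lines. This is just a sketch of the quoted figures (1,000,000 bits, 4x compression are the passage's own assumptions, not measured values), working in log2 space since 2^-1,000,000 underflows ordinary floats:

```python
# Probability that a random typist hits one specific n-bit text: 2**-n.
# A monkey at a computer only needs to hit *some* program whose output is
# the text, so its chance is roughly 2**-KC(text).
n_bits = 1_000_000          # assumed length of the works of Shakespeare, in bits
compression = 4             # assumed compressibility factor from the passage
kc_bits = n_bits // compression

log2_p_typewriter = -n_bits   # log2 of 2**-1,000,000
log2_p_computer = -kc_bits    # log2 of 2**-250,000

# How many powers of two likelier the computer monkey is:
amplification_log2 = log2_p_computer - log2_p_typewriter
print(amplification_log2)     # 750000, i.e. a factor of 2**750,000
```

So "many, many orders of magnitude" is an understatement: the ratio is 2^750,000 under these assumptions.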
Of particular importance to this paper is the concept of Kolmogorov Complexity (also known as Algorithmic Information Complexity) introduced independently by Kolmogorov[10], Chaitin[2] and Solomonoff[15]. Given a particular universal Turing machine (UTM) U, the Kolmogorov complexity of a string of characters (i.e. a description) is the length of the shortest program running on U that generates the description.
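As a rough illustration (mine, not the paper's): the true KC is uncomputable, but any real compressor gives an upper bound on it, since a compressed file plus a decompressor is itself a "program" that regenerates the description. A minimal sketch using zlib:

```python
import os
import zlib

def kc_upper_bound(description: bytes) -> int:
    """Crude upper bound on Kolmogorov complexity: the length in bits of a
    zlib-compressed copy.  A real compressor only bounds KC from above --
    the true shortest program may be far smaller."""
    return 8 * len(zlib.compress(description, level=9))

ordered = b"to be or not to be " * 500      # highly repetitive description
random_ish = os.urandom(len(ordered))       # incompressible noise, same length

# The ordered text gets a far smaller bound than the random bytes:
print(kc_upper_bound(ordered) < kc_upper_bound(random_ish))  # True
```

This also previews problem 2 below: the random bytes come out near-maximal in complexity even though, intuitively, they carry no information of interest.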
There are two main problems with Kolmogorov complexity:
1. The dependence on U, as there is no unique way of specifying this.
2. Random sequences have maximum complexity, as by definition a random sequence can have no generating algorithm shorter than simply listing the sequence. As Gell-Mann[8] points out, this contradicts the notion that random sequences should contain no information.
Rather than trying to hide this context dependence, I would prefer to make it a feature: rather than asserting that complexity is a property of some system, treat it as a property of descriptions (which may or may not be about a system). There must also be an interpreter of these descriptions that can answer the question of whether two descriptions are equivalent or not. Consider Shakespeare's Romeo and Juliet. In Act II, ii, line 58, Juliet says "My ears have not yet drunk a hundred words". If we change the word "drunk" to "heard", your average theatregoer will not spot the difference. Perhaps the only one to notice would be a professional actor who has played the scene many times. Therefore the texts differing by the single word "drunk/heard" in this scene are considered equivalent by our hypothetical theatregoer. There will be a whole equivalence class of texts that would be considered to be Romeo and Juliet by our theatregoer.
Once the set of all possible descriptions is given (strings of letters on a page, base pairs in a genome or bits on a computer hard disk, for example), and an equivalence class between descriptions is given, one can apply the Shannon entropy formula to determine the complexity of that description under that interpretation.
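The idea can be sketched numerically (my sketch, with made-up sizes, not the paper's formula): if there are N possible descriptions and the interpreter lumps ω of them together as "the same", the complexity of that description is log2(N) - log2(ω) bits. The more variants the interpreter fails to distinguish (drunk/heard swaps a theatregoer never notices), the lower the complexity:

```python
import math

def complexity(total_descriptions: int, equivalence_class_size: int) -> float:
    """Complexity, in bits, of a description under an interpreter that
    treats `equivalence_class_size` of the `total_descriptions` possible
    descriptions as equivalent to it."""
    return math.log2(total_descriptions) - math.log2(equivalence_class_size)

# Purely illustrative numbers: 2**20 candidate texts of some fixed length,
# of which 2**5 are judged "Romeo and Juliet" by our hypothetical theatregoer.
print(complexity(2**20, 2**5))   # 15.0 bits
# A pickier interpreter (a professional actor) accepts only one exact text:
print(complexity(2**20, 1))      # 20.0 bits
```

Note how the same text has different complexity under different interpreters, which is exactly the context dependence being made a feature rather than hidden.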
New perspectives on Zipf's law in linguistics
I highly recommend this one! The PDF is 101 pages, and begins discussing both Shakespeare and DNA on page 9. They look at the issue from the language view (which is the correct view, IMHO), and at about page 17 offer a hierarchical improvement for figuring complexity.
Information Theory and Algorithmic Complexity: Applications to Linguistic Discourses and DNA Sequences as Complex Systems. Part I: Efficiency of the Genetic Code of DNA --- S. Naranan and V.K. Balasubrahmanyan
The genetic code is a mapping of 64 possible triplet codons from a 4-letter alphabet (A, C, G, U) into 20 amino acids and one STOP signal for protein synthesis. The pattern of degeneracies in codon assignments to amino acids suggests that there is an underlying variable length code, which meets all the optimal coding criteria dictated by Shannon's Information theory. The genetic code can be viewed as an instantaneous, prefix code with unique decipherability and compactness. Optimal codon assignments and average code lengths for 20 species from 10 groups in the phylogenetic tree have been determined from the available data on codon usage, using the Huffman algorithm.
The average binary code length of the genetic code exceeds the optimal average length of the Huffman code only by 2 to 5%, showing that the genetic code is close to optimal. But functionally, the genetic code is a fixed-length code of 6 binary bits (3 bases). This provides the needed redundancy (≈25%) for tolerance of errors in the DNA sequences due to mutations.
This hybrid character of the genetic code, combining the advantages of variable and fixed length codes, supports a speculation that in the past the genetic code could have been a variable length code which has evolved to its modern version. The DNA sequence bears striking similarities to linguistic discourses from an Information Theoretic view. Both are complex adaptive systems with intrinsic elements of order and randomness. The complexity parameter, which we had defined earlier for linguistic discourses, is close to maximal in both DNA and natural languages. In this article, the first of two parts, we have focused on the variable length genetic code. In Part II, we deal with DNA sequence treated as a complex adaptive system and some possible correlation of certain parameters with evolution.
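The Huffman construction the abstract relies on can be sketched in a few lines. This is illustrative only: the usage counts below are made up for four amino acids, not the paper's real codon-usage data for 20 species:

```python
import heapq

def huffman_code_lengths(weights):
    """Huffman code lengths, in bits, for symbols with the given weights.
    Returns {symbol: length}; rarer symbols get longer codes."""
    # Heap entries: (subtree weight, tiebreaker, {symbol: current depth}).
    heap = [(w, i, {sym: 0}) for i, (sym, w) in enumerate(weights.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

# Made-up usage counts for four amino acids (illustrative, not real data):
usage = {"Leu": 4, "Ser": 2, "Arg": 1, "Trp": 1}
lengths = huffman_code_lengths(usage)
total = sum(usage.values())
avg_len = sum(usage[s] * lengths[s] for s in usage) / total
print(lengths)   # Leu gets 1 bit, Ser 2, Arg and Trp 3 each
print(avg_len)   # 1.75 bits vs 2 bits for a fixed-length code over 4 symbols
```

The gap between the variable-length average (1.75 bits here) and the fixed-length cost (2 bits for 4 symbols) is the same kind of comparison the paper makes between the Huffman-optimal code and the fixed 6-bit genetic code.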
1. The dependence on U, as there is no unique way of specifying this.
Yes, that is one problem that has been bothering me about it. The rules are algorithms, and the "shortness" and ability of an algorithm to produce a result are highly dependent on the abilities of the programmer, and yes, even on the intuitional abilities of the person writing it. It seems there is something highly subjective in the formula.
2. Random sequences have maximum complexity
Yes, that bothers me the most as I pointed out before regarding the content of information. Someone may spend his life studying how complex the grains of sand on a beach are, but that will be of little use. On the other hand spending time studying the DNA code can have great benefits.
BTW - we discovered DNA some 50 years ago and we still have barely begun to uncover its secrets.
As a technical nit, the real value for the compressibility of Shakespeare is probably around 40x rather than 4x, based on real compressibility data points. Not terribly important, except that it would mean an order-of-magnitude reduction of the exponent.
This isn't a correct application for our universe. In our universe, two arbitrary machines will not, in general, agree on the complexity of a string. I touched on this in previous posts. The Invariance Theorem is misleading in practical application; our universe doesn't contain UTMs as far as we can tell, which has consequences.