To: Doctor Stochastic
Ah, I see. Thank you for the explanation!

Nevertheless, my questions about entropy and complexity remain - and I would greatly appreciate any comments you may have.

571 posted on 06/27/2003 8:30:09 AM PDT by Alamo-Girl


To: Alamo-Girl
Nevertheless, my questions about entropy and complexity remain - and I would greatly appreciate any comments you may have.

I've been busy this week, and will be driving off to visit my parents shortly, so I won't be able to address this over the weekend. Nonetheless, I will burn a little time right now and see what I can come up with.

The most important concepts to understand in the domain of algorithmic information theory, or AIT (noting that classical information theory fits under this umbrella as a special case of the general framework), are entropy, Kolmogorov complexity, and Solomonoff Induction. All three concepts are tightly related.

(Unfortunately for us, a lot of the work in this field was done by Russian mathematicians, so we don't have simple names like "Bob's Rule" or "Smith Complexity". Difficult pronunciation is a requirement in AIT.)

First, don't think of "information" as bits. The idea of "bits" puts people's minds in the wrong place conceptually. Think of information as "patterns"; in fact, I generally refer to information as "patterns" precisely because it keeps people from construing it as "bits", which is how most people read the word. The difference is important to the definition of entropy.

Kolmogorov complexity has one definition, but two main contexts of interpretation. It is the length of a perfectly efficient representation of a particular pattern; a more mundane term is the "minimum description length" of a pattern. The first aspect that confuses many people is that there are two useful complexities that can be measured. The first is the intrinsic KC of a specific, fixed state, e.g. the KC of "Hamlet" or any other fixed pattern. The second is the intrinsic KC of a dynamic system, which measures the control function of the "machine" required to generate the state space. The important difference from the first measure is that the intrinsic KC of a "machine" in this context is always greater than or equal to the KC of any pattern stored in its state space. For the sake of clarity, I'll call the first useful application the "pattern KC" and the second the "algorithmic KC". The pattern KC is essentially a special interpretation of the algorithmic KC.
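
To make the pattern KC a little more concrete, here is a minimal sketch in Python. It treats the length of a zlib-compressed encoding as a stand-in for the pattern KC; that is only a crude upper bound (the true KC is uncomputable, and zlib is far from a perfectly efficient encoder), but it shows how a highly patterned string has a much smaller minimum description than its raw length, while a patternless string does not. The strings and sizes are arbitrary examples.

    import os
    import zlib

    def pattern_kc_upper_bound(data: bytes) -> int:
        """Length of a zlib encoding: a crude upper bound on the pattern KC."""
        return len(zlib.compress(data, 9))

    patterned = b"to be or not to be " * 100   # 1900 bytes of highly regular text
    patternless = os.urandom(1900)             # 1900 bytes with (almost surely) no pattern

    print(len(patterned), pattern_kc_upper_bound(patterned))      # bound is far below 1900
    print(len(patternless), pattern_kc_upper_bound(patternless))  # bound is roughly 1900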

Entropy is a measure of the information efficiency of a pattern. Traditional information theory only deals with the pattern KC. In this framework, entropy is measured by the gap between the actual length of a representation and its pattern KC: the smaller the gap, the more efficient the representation and the higher its entropy. When the gap is zero, the representation is incompressible and you have achieved maximum entropy. In algorithmic information theory, we extend this concept to algorithmic KCs.
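
Continuing the toy above (and again using zlib output length as the stand-in for the pattern KC), the gap between a representation's length and its pattern KC shrinks to roughly nothing once the representation is already efficient; the names and strings here are purely illustrative.

    import zlib

    def gap(data: bytes) -> int:
        """Representation length minus the (approximate) pattern KC."""
        return len(data) - len(zlib.compress(data, 9))

    verbose = b"to be or not to be " * 100   # an inefficient representation of the pattern
    packed = zlib.compress(verbose, 9)       # a much more efficient representation of it

    print(gap(verbose))  # large gap: low entropy in the sense described above
    print(gap(packed))   # roughly zero (can dip slightly negative from zlib header overhead)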

At this point, you may have difficulty seeing the relationship between the laws of thermodynamics and algorithmic information theory. Thermodynamics is in fact AIT, but with one additional rule that is a property of information in our universe: information can't be moved, only copied. This is counter-intuitive at first glance, but the universe deals with bits the same way our computers do. The consequence of all this is a "transaction theoretic" model of information, which is the basis for the laws of thermodynamics (and many equivalent laws in other fields).
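
As a trivial illustration of "copied, not moved" (assuming nothing beyond how ordinary computer memory behaves): reading a bit pattern never removes it from its source, so what we casually call a "move" is really a copy followed by a separate erase step.

    # Copying a value leaves the original intact; only an explicit overwrite erases it.
    source = 0b1011
    dest = source              # the pattern now exists in two places
    assert source == 0b1011 and dest == 0b1011

    source = 0b0000            # the "move" is completed only by separately erasing the source
    assert dest == 0b1011      # the copy is unaffected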

Working from this, we can show that such algorithmic systems will trend towards more efficient information representations, i.e. increased entropy. To put it another way, all systems in our universe trend towards increased pattern complexity with time (asymptotically approaching the algorithmic KC of the system), and the creation of information is favored. AIT is sufficiently abstract that the particular fundamental rules of the universe it is applied to do not matter. However, when applying it to algorithmic/dynamic interpretations in a given universe, those basic rules do have consequences. So in a nutshell, thermodynamics is AIT grounded with a specific parameter that is a property of information in our universe.
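
Here is a toy simulation of that trend, again leaning on zlib output length as a rough proxy for pattern complexity. The system is an elementary cellular automaton (rule 30), chosen purely for illustration: it starts from a maximally simple state, and the compressed size of the state grows over time as the pattern becomes less and less regular.

    import zlib

    WIDTH, STEPS = 256, 256
    row = [0] * WIDTH
    row[WIDTH // 2] = 1                       # a maximally simple initial state

    def step(cells):
        # Rule 30: new cell = left XOR (center OR right), with wrap-around neighbours.
        n = len(cells)
        return [cells[i - 1] ^ (cells[i] | cells[(i + 1) % n]) for i in range(n)]

    for t in range(STEPS):
        if t % 64 == 0:
            packed = len(zlib.compress(bytes(row), 9))
            print(f"t={t:3d}  state={len(row)} cells  compressed={packed} bytes")
        row = step(row)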

"Solomonoff Induction" is a more advanced and esoteric area closely related to Kolmogorov complexity. In the mathematical abstract, it is a mechanism for discovering/computing the Kolmogorov complexity (and patterns that represent it) of an arbitrary pattern or system. Using SI to compute pattern KCs is relatively easy; finite approximations of this are the basis of data compression schemes. Using SI to universally compute algorithmic KCs is much more difficult -- no tractable implementations have ever been publicly described. This last problem (using SI to efficiently generate algorithmic KC representation approximations) has been my area of research for many years. This could be considered a fundamental pattern transform of computer science (an enormous algorithm space is only describable in such a function if implemented), hence why it is important to solve. One of the special properties of SI is that it can universally "reverse engineer" patterns to the machines/algorithms that created them.

I'm generally going to avoid getting into Solomonoff Induction because it is a complex and difficult-to-grok topic. Entropy and Kolmogorov complexity can be described relatively simply, but the consequences and properties of SI are much more difficult. I haven't done justice to any of these things, but this should at least give you a jumping-off point.

586 posted on 06/27/2003 11:59:22 AM PDT by tortoise (Would you like to buy some rubber nipples?)
