Another minor technical nit: Huffman coding is not an optimal measure of information representation in any universal sense. Being within 2-5% of "optimal" for Huffman coding could very well mean being 25% away from the mathematical optimum. Huffman is only considered optimal from a Shannon information theory perspective, and even there only as a symbol-by-symbol code under a given statistical model (strictly speaking, arithmetic coding is more efficient in the same domain).
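To make the Shannon side of this concrete, here is a minimal Python sketch that compares the average Huffman code length with the Shannon entropy of the same source. It is my own illustration: the two toy distributions (`skewed`, `dyadic`) and the function names are assumptions chosen for the example, not anything taken from the articles under discussion. It only shows the gap between Huffman and the Shannon bound for a fixed statistical model; it says nothing about how far that model is from a better one.

```python
import heapq
import math
from itertools import count


def huffman_code_lengths(probs):
    """Return {symbol: code length in bits} for a Huffman code over probs."""
    if len(probs) == 1:
        # A single symbol still needs one bit in a practical prefix code.
        return {sym: 1 for sym in probs}
    tiebreak = count()  # unique integers so the heap never compares dicts
    heap = [(p, next(tiebreak), {sym: 0}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, d1 = heapq.heappop(heap)
        p2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every leaf in them one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (p1 + p2, next(tiebreak), merged))
    return heap[0][2]


def shannon_entropy(probs):
    """Entropy in bits per symbol: the lower bound for this model."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)


def average_length(probs, lengths):
    """Expected number of bits per symbol under the given code lengths."""
    return sum(probs[s] * lengths[s] for s in probs)


# Hypothetical toy sources, chosen only to illustrate the two extremes.
skewed = {"a": 0.95, "b": 0.05}
dyadic = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

for name, probs in (("skewed", skewed), ("dyadic", dyadic)):
    lengths = huffman_code_lengths(probs)
    h = shannon_entropy(probs)
    avg = average_length(probs, lengths)
    print(f"{name}: entropy={h:.4f} bits, Huffman={avg:.4f} bits/symbol")
```

For the skewed source, Huffman cannot spend less than one whole bit per symbol, while the entropy is roughly 0.29 bits, so the code is several times larger than the Shannon bound. For the dyadic source (all probabilities are powers of 1/2) the two coincide. Arithmetic coding closes the first gap because it is not tied to an integer number of bits per symbol.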
Notably, you seem to discount the importance of Shannon to the biological issues at hand, but many (if not most) of the articles I've found defer to Shannon.
I find this particular difference of opinion very relevant, and I would like to know more about your reasoning for not leaning on Shannon.