Replies

I think they are supposed to be a code for apostrophes and other symbol for quotation marks, but why am I seeing them?

FR pages are supposed to be coded in UTF-8. That means you should be able to copy and paste Russian, Chinese, Arabic, whatever (not to mention curly quotes, em dashes, and funny apostrophes) and expect it to show up correctly in the reader's browser.

That used to be true. However, in the last few days, something got broken in the FR server. Now, if you post anything outside plain 7-bit ASCII, you get garbage. (The distinction matters: normal red-blooded American ASCII is also valid UTF-8, so it passes through unharmed.)

What appears to be happening is that the server scans each post for bytes falling outside the 7-bit ASCII range and substitutes an HTML entity for each such byte. The result is a mess.

For instance, consider the word refugee enclosed in curly scare quotes, e.g., “refugee”. Have a closer look:

“refugee”

Notice that the open and close quotes have different shapes. That's why they are called curly. In 7-bit ASCII they would be the same, because 7-bit ASCII can't represent curli-quotiness:

"refugee"

In the first example, in UTF-8, the curly quotes are each represented by three bytes. The left curly is e2 80 9c, and the right curly is e2 80 9d. When your browser encounters those sequences while operating in UTF-8 rendering mode, it correctly paints the left and right curly shapes on your screen. IOW it just works.
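If you want to check those byte values yourself, here is a quick Python sketch. It has nothing to do with the FR server; it just encodes the two quote characters and prints the bytes in hex.

```python
# Encode the two curly quotes as UTF-8 and show the resulting bytes in hex.
left = "\u201c"   # left double quotation mark
right = "\u201d"  # right double quotation mark

left_hex = left.encode("utf-8").hex(" ")
right_hex = right.encode("utf-8").hex(" ")

print(left_hex)   # e2 80 9c
print(right_hex)  # e2 80 9d
```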

However, if the recently introduced server bug gets a chance to intervene, it replaces e2, 80, 9c, and 9d with HTML entities representing those individual bytes. E.g.,

&#226;&#128;&#156;refugee&#226;&#128;&#157;
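To make the suspected substitution concrete, here is a minimal Python sketch. The function name entity_mangle is made up for illustration, and the decimal &#NNN; entity form is an assumption based on what shows up in mangled posts; nobody outside has seen the actual server code.

```python
def entity_mangle(text: str) -> str:
    """Hypothetical model of the bug: swap each non-ASCII byte for a numeric entity."""
    out = []
    for b in text.encode("utf-8"):
        if b < 0x80:
            out.append(chr(b))      # 7-bit ASCII passes through untouched
        else:
            out.append(f"&#{b};")   # every other byte becomes its own entity
    return "".join(out)

mangled = entity_mangle("\u201crefugee\u201d")
print(mangled)
# &#226;&#128;&#156;refugee&#226;&#128;&#157;
```

Note that plain ASCII text comes out unchanged, which would explain why most posts still look fine.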

Your browser faithfully renders those corrupted entities, resulting in garbage:

â€œrefugeeâ€

A circumflexed a followed by a Euro symbol followed by whatever. LOL!
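You can reproduce the browser's side of this in Python too. This is only a sketch: it relies on html.unescape from the standard library, which follows the HTML5 rules for numeric character references, including the remapping of codes 128 through 159 to Windows-1252 characters. The entity string is the assumed server output for the example word.

```python
import html

# Entity string the suspected server bug would emit for the curly-quoted word.
mangled = "&#226;&#128;&#156;refugee&#226;&#128;&#157;"

# Decode the entities the way an HTML5 browser would.
rendered = html.unescape(mangled)

# Code points of the three characters that replaced the left curly quote.
left_garbage = [hex(ord(c)) for c in rendered[:3]]
print(left_garbage)  # ['0xe2', '0x20ac', '0x153']
```

So the "whatever" after the Euro sign is œ (U+0153) on the left side. On the right side, byte 9d has no Windows-1252 assignment, which is why only two visible characters show up there.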

The fix is obviously to revert the recent change.

Did the change get introduced as a result of the recent Google malware scare?