Replies

I’ll go further: the issue doesn’t occur on the front page, only when you view the thread. That’s where the problem is.

The problem is only with posted stuff. The server thinks it knows better than the user.

The most trivial example is posting non-HTML containing quotes. The server converts them from straight-up 7-bit ASCII to left and right curlies (smart quotes). They display just fine.

Then somebody selects them, posts them as a quote, adds commentary, and submits. Now they are garbage! LOL!

The root of the problem is, the server is looking at the UTF-8 input one byte at a time and supplying an HTML entity for each byte outside 7-bit ASCII, as if every such byte were a character on its own.

E.g., for a left curly double quote (“), the UTF-8 hex is e2 80 9c. The server translates those three bytes to the entities &acirc;&euro;&oelig;, which come out as â€œ, namely, small-a with a circumflex, the euro symbol, and the oe ligature.
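Here is a small Python sketch that reproduces that mangling. It assumes the server reads each UTF-8 byte as one Windows-1252 character, which is a guess that fits the symptoms, not something confirmed about the actual server code.

```python
import html.entities

original = "\u201c"                      # left curly double quote
utf8_bytes = original.encode("utf-8")    # the three bytes e2 80 9c
hex_bytes = " ".join(f"{b:02x}" for b in utf8_bytes)

# Assumption: each byte is treated as a separate Windows-1252 character.
mangled = utf8_bytes.decode("cp1252")

# Name each resulting character with its HTML 4 entity.
entities = "".join(
    f"&{html.entities.codepoint2name[ord(ch)]};" for ch in mangled
)

print(hex_bytes)   # the UTF-8 bytes in hex
print(mangled)     # the three garbage characters
print(entities)    # the entities the server would emit under this assumption
```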

The solution is to convert that mess back to the original Unicode “.
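As a rough sketch of that conversion (in Python here just to show the idea, not the user scripts themselves), you can push the garbage characters back through Windows-1252 to get the original bytes, then read those as UTF-8. The function name `repair` is made up for illustration, and the Windows-1252 step is an assumption about what the server did.

```python
def repair(text: str) -> str:
    """Undo one round of UTF-8-read-as-Windows-1252 damage, if possible."""
    try:
        # Characters -> original bytes -> properly decoded UTF-8.
        return text.encode("cp1252").decode("utf-8")
    except UnicodeError:
        # Not the expected kind of damage; leave the text alone.
        return text


print(repair("\u00e2\u20ac\u0153"))  # the garbage for a left curly double quote
```

This works on the whole string at once, so text that mixes damaged and undamaged non-ASCII characters comes back unchanged. A real script would probably want to fix up individual runs instead.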

This can easily be done client-side, using JavaScript. You just need some user scripts. See here for thread viewing and here for posting.

Interesting. I really think there’s a Windows-1252 conversion in there somewhere, but your commentary also makes some sense. It could be some interaction between the markup translation, detranslation to raw characters, and retranslation to viewable.

The bit I find particularly interesting is that the smart quotes work on the front page, but break on the thread view.

Here’s my dump from copying smart quotes to a file (in a UTF-8 console) and hexdumping it:

[redacted]-imac:~ [redacted]$ hexdump ~/freep.txt
0000000 c3 a2 e2 82 ac 0a
0000006

That looks very much like the issue referenced on Stack Overflow, which is why I suspect Windows-1252 in there somewhere.
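One quick way to check that hunch, sketched in Python: decode the dumped bytes as UTF-8, then see which single Windows-1252 bytes those characters stand for. The byte values are copied from the dump. The rest is only a sanity check under the Windows-1252 assumption.

```python
dumped = bytes.fromhex("c3 a2 e2 82 ac 0a")   # bytes from the hexdump
text = dumped.decode("utf-8")                  # a-circumflex, euro sign, newline

# Map the visible characters back to single Windows-1252 bytes.
prefix = text.rstrip("\n").encode("cp1252")

# Compare against the UTF-8 encoding of each curly quote.
for quote in "\u2018\u2019\u201c\u201d":
    print(repr(quote), quote.encode("utf-8").startswith(prefix))
```

If every line prints True, the two characters in the file are just the first two bytes of a UTF-8 curly quote read as Windows-1252.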