Free Republic
Browse · Search
General/Chat
Topics · Post Article

To: some tech guy
The code trying to make safe HTML is messing up because it doesn’t understand UTF-8.

That's as good a guess as any, as to how the bug got introduced.

Once you understand the source of the problem, it's not too hard to make client side fixes (see the links in #15).

To display a clean page, you need to scan the JavaScript representation of the page's text for cases where the UTF-8 is getting byte-by-byte entified. Solution: a regular expression that finds the entifications and feeds them to a function that reverses them back to the original code point. Apply it to every text snippet in the article, and update any snippet that underwent a change. Decrudified! Done!
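As a rough illustration of that approach (a sketch only, not the code from the linked extensions): assume each UTF-8 byte shows up in the page text as its own character in the U+0080 to U+00FF range, so "é" arrives as "Ã©". The names `MOJIBAKE_RUN`, `decrudify`, and `decrudifyNode` are made up for this example. Browsers remap some entities in the &#128; to &#159; range to Windows-1252 characters, and this sketch does not try to undo that.

```javascript
// Matches runs that look like a UTF-8 lead byte followed by the right
// number of continuation bytes, each byte appearing as one character.
const MOJIBAKE_RUN =
  /[\u00C2-\u00DF][\u0080-\u00BF]|[\u00E0-\u00EF][\u0080-\u00BF]{2}|[\u00F0-\u00F4][\u0080-\u00BF]{3}/g;

// fatal: true makes decode() throw on byte sequences that are not valid UTF-8.
const strictUtf8 = new TextDecoder('utf-8', { fatal: true });

// Hypothetical helper: reverse byte-by-byte entification in one string.
function decrudify(text) {
  return text.replace(MOJIBAKE_RUN, (run) => {
    const bytes = Uint8Array.from(run, (ch) => ch.charCodeAt(0));
    try {
      return strictUtf8.decode(bytes);
    } catch (e) {
      return run; // not valid UTF-8 after all, so leave it alone
    }
  });
}

// Hypothetical helper (browser only): fix every text node under root,
// writing back only the nodes that actually changed.
function decrudifyNode(root) {
  const walker = document.createTreeWalker(root, NodeFilter.SHOW_TEXT);
  let node;
  while ((node = walker.nextNode())) {
    const fixed = decrudify(node.nodeValue);
    if (fixed !== node.nodeValue) node.nodeValue = fixed;
  }
}
```

In a browser, something like `decrudifyNode(document.body)` would run it over the whole page. Text that is already clean passes through unchanged, because plain ASCII never matches the pattern.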

To post clean input, you need to entify any non-ASCII characters (anything outside the 7-bit range) that may be present. And, on the Preview side, you need to do the opposite, so that the user is not buried in strange entities while posting, say, Uncle Volodya's full name (Влади́мир Влади́мирович Пу́тин) in the original Cyrillic.
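A minimal sketch of both directions, assuming decimal numeric character references (`&#NNNN;`) are what gets posted. The function names `entifyNonAscii` and `deentifyNonAscii` are invented for this example and are not taken from the extensions.

```javascript
// Hypothetical helper: replace every non-ASCII code point with a decimal
// numeric character reference before the text is posted.
function entifyNonAscii(text) {
  // Array.from iterates by code point, so astral characters stay whole.
  return Array.from(text, (ch) => {
    const cp = ch.codePointAt(0);
    return cp > 0x7f ? `&#${cp};` : ch;
  }).join('');
}

// Hypothetical helper: the opposite direction, for the Preview box.
// Only non-ASCII references are reversed, so something like &#60; is
// left as an entity rather than turned back into "<".
function deentifyNonAscii(text) {
  return text.replace(/&#(\d+);/g, (match, digits) => {
    const cp = Number(digits);
    return cp > 0x7f && cp <= 0x10ffff ? String.fromCodePoint(cp) : match;
  });
}
```

Under those assumptions the two functions round-trip: running `deentifyNonAscii(entifyNonAscii(s))` on the Cyrillic name above gives back the same string, combining accents included.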

The extensions at the links in #15 take care of all that.

31 posted on 11/20/2015 12:22:50 AM PST by cynwoody


To: cynwoody

Given that I work on both the server and client side, I’ll give my opinion (which is no more or less valid than yours).

Yes, you can fix it with JS, but that doesn’t address the root cause. I was *wrong* about the root cause and I don’t mind admitting that. The code and the db are solid.

There’s just a little bit of code which gets executed when displaying the thread with comments that screws everything up. People have reported that pasting text into the post box works, and that it’s fine on preview. It’s stored in the DB without problem. It shows on the main page just fine. UTF-8 FTW.

Something is filtering text on the display thread view. And that something only understands ASCII. That’s my bet. A $5 bet, if you’re interested ;)
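To make that bet concrete, here is a purely hypothetical sketch of the kind of filter that would produce the symptom: one that walks the text as UTF-8 bytes and escapes every byte above 127 on its own. Nothing here is taken from the actual site code, and the name `asciiOnlyEscape` is invented.

```javascript
// Hypothetical ASCII-only filter: treats the text as a stream of bytes
// and turns each byte above 0x7F into its own numeric entity.
function asciiOnlyEscape(text) {
  const bytes = new TextEncoder().encode(text); // UTF-8 bytes
  let out = '';
  for (const b of bytes) {
    out += b > 0x7f ? `&#${b};` : String.fromCharCode(b);
  }
  return out;
}
```

Fed "é", a filter like this emits two entities, one for each UTF-8 byte, instead of one for the character, which would match text that looks fine in preview and in the DB but not in the thread view.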

Nobody should have to cleanse their input; everything should just work. That’s how it should be. In my opinion.


32 posted on 11/20/2015 12:30:38 AM PST by some tech guy (Stop trying to help, Obama)



FreeRepublic, LLC, PO BOX 9771, FRESNO, CA 93794
FreeRepublic.com is powered by software copyright 2000-2008 John Robinson