Posted on 10/30/2015 9:53:08 PM PDT by saminfl
I'm really a mainframe guy, and after experiencing these problems and finally getting exasperated, I started searching for what was causing them. What I'm sharing is the result of that searching, not intimate knowledge of the subject.
I’m having the same problem with constant religions and seeing the same junk letters and symbols in posts!
Me too
Thank you
iPhone 6+
I’m mixed on this phone btw
Sorry, that should be "re-logins."
From what I understand, the quote you entered is converted and displayed as you intended; however, the numeric code generated for it has additional characters embedded in it. So when you copy and paste it, the conversion now has those extra embedded characters to convert as well. I know that's confusing, and since I am not really an expert on how these conversions work, I may be wrong in my explanation, but it is the only thing that makes sense. In other words, I am making an educated guess based on what my research has turned up about the cause of these issues.
I've got $10 that says the server runs on Windows and is infected.
Yes, but it would be tedious.
As a minimum, you could replace each curly quote or apostrophe with a straight ASCII double or single quote and each em-dash with '--', etc. A more advanced approach would replace each non-ASCII character with the corresponding HTML entity. E.g., you would replace the left curly quote with &ldquo;, the right with &rdquo;, etc.
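The minimal straight-quote substitution can be sketched in JavaScript. The function name `asciifyPunctuation` and its replacement table are illustrative assumptions that cover only a few common characters; anything not in the table (accented letters, for instance) passes through unchanged.

```javascript
// Hypothetical fallback table: typographic punctuation -> plain ASCII.
// Illustrative only, not an exhaustive list.
const ASCII_FALLBACKS = {
  '\u201C': '"',   // left double curly quote
  '\u201D': '"',   // right double curly quote
  '\u2018': "'",   // left single curly quote
  '\u2019': "'",   // right single curly quote / apostrophe
  '\u2014': '--',  // em-dash
  '\u2013': '-',   // en-dash
  '\u2026': '...', // horizontal ellipsis
};

function asciifyPunctuation(s) {
  // Replace only the characters that have a fallback; leave everything else alone.
  return s.replace(/[\u201C\u201D\u2018\u2019\u2014\u2013\u2026]/g,
                   (ch) => ASCII_FALLBACKS[ch]);
}

console.log(asciifyPunctuation('\u201Crefugee\u201D \u2014 it\u2019s')); // "refugee" -- it's
```

This keeps posts readable but loses the typography, which is why the entity approach is the more faithful option.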
You could also automate the process with the following JavaScript function:
    function entify(s) {
      var result = '';
      // for...of yields whole code points, so emoji are not split into surrogate halves
      for (var ch of s) {
        var c = ch.codePointAt(0);
        result += c < 128 ? ch : '&#' + c + ';';
      }
      return result;
    }
Copy it into your browser's JavaScript console and feed it, say, entify('“refugee”'). It will return '&#8220;refugee&#8221;', which will display correctly (8220 and 8221 are the numeric equivalents of &ldquo; and &rdquo;).
But there's an additional problem: it will only display correctly for one generation. As soon as somebody tries to quote your post in a reply, it will be back to the circumflex-A, Euro sign garbage (unless they also perform the entification process).
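A rough sketch of why the fix lasts only one generation, assuming the reader's browser simply turns each numeric entity back into its character (`renderNumericEntities` and `hasNonAscii` are hypothetical helpers, and real HTML parsing handles many more cases): the text a replier copies off the page is non-ASCII again, so it goes back to the server as multi-byte UTF-8.

```javascript
// Simplified stand-in for the browser: decode decimal numeric entities only.
function renderNumericEntities(html) {
  return html.replace(/&#(\d+);/g, (_, n) => String.fromCodePoint(Number(n)));
}

// True if the text contains anything outside 7-bit ASCII, i.e. anything
// that would be submitted as multi-byte UTF-8.
function hasNonAscii(s) {
  return /[^\x00-\x7F]/.test(s);
}

const posted = '&#8220;refugee&#8221;';        // what the original poster submitted
const copied = renderNumericEntities(posted);  // what a replier copies from the page
console.log(hasNonAscii(posted), hasNonAscii(copied)); // false true
```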
You lose.
FR runs nginx on Linux.
http://toolbar.netcraft.com/site_report?url=http://www.freerepublic.com
We all are.
I always preview, and I’m having the problem too.
It's some kind of UTF-8 character encoding issue, I believe. There's probably a conflict between the content people are posting and the character encoding of the HTML page itself.
Such issues often involve apostrophes and the like.
Free Republic coding has a disgronifier built into it. The disgronifier software is on the fritz and has to be recoded strand by strand. Just think of the disgronifier as a text virtualizer, a text emulator and a text simulator all rolled into one. It's not a lot of lines of code, but they must be redone.
According to your link, everything should be hunky-dory, because FR pages use UTF-8 encoding.
But what is happening is that the FR server is now converting incoming UTF-8 to garbage by treating each byte of a multi-byte UTF-8 sequence as a separate Windows-1252 character and replacing it with the corresponding HTML entity. For example, the UTF-8 for the left double curly quote, e2 80 9c, gets replaced by three entities: circumflex-a, the Euro sign, and the oe ligature. As a result, the browser, even though it is correctly in UTF-8 mode, doesn't see the pristine UTF-8 character but the three bogus characters instead.
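The mangling described above can be simulated in a few lines, assuming the server reads each UTF-8 byte as a Windows-1252 character (where 0xe2 is circumflex-a, 0x80 the Euro sign and 0x9c the oe ligature) and emits a decimal numeric entity for it. The names `mangle` and `render`, and the use of numeric rather than named entities, are assumptions for illustration, not FR's actual code.

```javascript
// Windows-1252 characters for bytes 0x80-0x9F. The unassigned bytes 0x81, 0x8D,
// 0x8F, 0x90 and 0x9D map to the C1 control of the same value, as WHATWG does.
const CP1252_HIGH = [
  0x20ac, 0x81, 0x201a, 0x0192, 0x201e, 0x2026, 0x2020, 0x2021,
  0x02c6, 0x2030, 0x0160, 0x2039, 0x0152, 0x8d, 0x017d, 0x8f,
  0x90, 0x2018, 0x2019, 0x201c, 0x201d, 0x2022, 0x2013, 0x2014,
  0x02dc, 0x2122, 0x0161, 0x203a, 0x0153, 0x9d, 0x017e, 0x0178,
];

// All other bytes map to the code point with the same value (same as Latin-1).
function cp1252CodePoint(byte) {
  return byte >= 0x80 && byte <= 0x9f ? CP1252_HIGH[byte - 0x80] : byte;
}

// Simulated bug: encode as UTF-8, then entify each byte as if it were a character.
function mangle(text) {
  let html = '';
  for (const b of new TextEncoder().encode(text)) {
    const cp = cp1252CodePoint(b);
    html += cp < 128 ? String.fromCharCode(cp) : '&#' + cp + ';';
  }
  return html;
}

// What the browser then displays: each entity becomes one visible character.
function render(html) {
  return html.replace(/&#(\d+);/g, (_, n) => String.fromCodePoint(Number(n)));
}

console.log(mangle('\u201C'));         // &#226;&#8364;&#339;
console.log(render(mangle('\u201C'))); // â€œ
```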
Nope. The pages are set to the proper text-encoding, UTF-8.
What's happening is that the server is mangling the text on input. See #55 above.
If what you say is correct, then it sounds like something may have corrupted the translation processing. Strange, though, since you can remove the character, retype it, and it comes out correctly. Definitely a strange problem. Is there anyone at FR who can step through the conversion process, if that is even possible, to determine exactly what is happening? As a mainframe person, I don't really know whether stepping through a process is even possible in this realm.
FR's posting input processor has two modes: lazy mode and HTML mode. In lazy mode, it tries to add enough HTML to make your post look more or less the way you typed it. In HTML mode, it strips out stuff like CSS and JavaScript and makes sure the likes of <i> get closed if any are left open. But, otherwise, it's up to you how your post formats.
Apparently, one of the features of lazy mode is smart quoting. (Your example works in lazy mode, but not in HTML mode.) In lazy mode, if you type straight-up ASCII quotes, it converts them to left and right curly quotes. E.g., if you type "Smart quotes in lazy mode!", hit Preview, and view the source of the page that comes back, you see that the server has converted your straight, 7-bit ASCII quotes into left and right curlies:
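Purely as an illustration of what a smart-quoting pass does (the real lazy-mode code isn't shown here, so `smartQuote` and its rule are assumptions), a naive version might turn a straight double quote into a left curly quote at the start of the text or after whitespace or an opening bracket, and into a right curly quote everywhere else:

```javascript
// Hypothetical, naive smart-quoting: opening position -> U+201C, otherwise -> U+201D.
function smartQuote(s) {
  return s
    .replace(/(^|[\s(\[{])"/g, '$1\u201C') // quotes that open a quotation
    .replace(/"/g, '\u201D');             // every remaining straight quote closes one
}

console.log(smartQuote('"Smart quotes in lazy mode!"')); // “Smart quotes in lazy mode!”
```

A real implementation would also need to handle single quotes and apostrophes and skip quotes inside HTML tag attributes, which this sketch does not.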
<DIV CLASS=body><p>“Smart quotes in lazy mode!”</p></DIV>
If you copy the text echoed in the preview window to the clipboard and feed it through a hex dumper, you see that the smart quotes have been replaced by the appropriate three-byte UTF-8 sequences:
    $ pbpaste|xxd
    0000000: e280 9c53 6d61 7274 2071 756f 7465 7320  ...Smart quotes
    0000010: 696e 206c 617a 7920 6d6f 6465 21e2 809d  in lazy mode!...
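If pbpaste (macOS-only) or xxd isn't handy, a rough equivalent can be run in a browser console or Node.js. This `hexDump` helper is a hypothetical sketch that prints the UTF-8 bytes of a string 16 per line, grouped byte by byte rather than in xxd's two-byte groups, with '.' for non-printable bytes in the text column:

```javascript
// Approximate `xxd` output for the UTF-8 encoding of a string.
function hexDump(text) {
  const bytes = new TextEncoder().encode(text);
  const lines = [];
  for (let off = 0; off < bytes.length; off += 16) {
    const chunk = bytes.slice(off, off + 16);
    const hex = Array.from(chunk, (b) => b.toString(16).padStart(2, '0')).join(' ');
    const text16 = Array.from(chunk, (b) =>
      b >= 0x20 && b < 0x7f ? String.fromCharCode(b) : '.').join('');
    // 16 bytes as "xx " take 47 columns; pad short final lines to keep alignment
    lines.push(off.toString(16).padStart(8, '0') + ': ' + hex.padEnd(47) + '  ' + text16);
  }
  return lines.join('\n');
}

console.log(hexDump('\u201CSmart quotes in lazy mode!\u201D'));
```

The three-byte groups e2 80 9c and e2 80 9d show up exactly where the curly quotes were, matching the xxd output above.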
If you now paste that text back into the input window and retry the Preview, the HTML for the preview window looks like this:
<DIV CLASS=body><p>“Smart quotes in lazy mode!”</p></DIV>
The server has converted the individual bytes of the three-byte UTF-8 characters to entity codes, resulting in crud:
â€œSmart quotes in lazy mode!â€
The bug is that the server is mangling UTF-8 sequences on input.
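The real fix belongs on the server (decode the submitted bytes as UTF-8 rather than byte by byte), but text that has already been mangled can often be recovered by running the process in reverse. A sketch, assuming the server reads each UTF-8 byte as a Windows-1252 character (which matches the circumflex-a, Euro sign, oe ligature pattern); `unmangle` and the table names are hypothetical:

```javascript
// Windows-1252 characters for bytes 0x80-0x9F (unassigned bytes map to C1 controls).
const CP1252_HIGH = [
  0x20ac, 0x81, 0x201a, 0x0192, 0x201e, 0x2026, 0x2020, 0x2021,
  0x02c6, 0x2030, 0x0160, 0x2039, 0x0152, 0x8d, 0x017d, 0x8f,
  0x90, 0x2018, 0x2019, 0x201c, 0x201d, 0x2022, 0x2013, 0x2014,
  0x02dc, 0x2122, 0x0161, 0x203a, 0x0153, 0x9d, 0x017e, 0x0178,
];

// Reverse lookup: displayed character (code point) -> the byte it came from.
const CP1252_TO_BYTE = new Map();
for (let b = 0; b < 256; b++) {
  CP1252_TO_BYTE.set(b >= 0x80 && b <= 0x9f ? CP1252_HIGH[b - 0x80] : b, b);
}

function unmangle(garbled) {
  const bytes = [];
  for (const ch of garbled) {
    const b = CP1252_TO_BYTE.get(ch.codePointAt(0));
    if (b === undefined) throw new Error('not a Windows-1252 character: ' + ch);
    bytes.push(b);
  }
  // fatal: true makes decoding throw if the bytes are not valid UTF-8,
  // i.e. if the input was not actually mangled this way.
  return new TextDecoder('utf-8', { fatal: true }).decode(new Uint8Array(bytes));
}

console.log(unmangle('\u00e2\u20ac\u0153Smart quotes in lazy mode!\u00e2\u20ac\u009d'));
```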
Lucky! I have to use cave drawings!