Posted on 07/18/2011 4:28:59 AM PDT by RobinMasters
"....who, being themselves but fallible and uninspired men, have assumed dominion over the faith of others, setting up their own opinions and modes of thinking as the only true and infallible, and as such endeavoring to impose them on others, hath established and maintained false religions over the greatest part of the world, and through all time..."--The Virginia Act For Establishing Religious Freedom
--Thomas Jefferson, 1786
[In layman's terms, does that mean that when the People purposefully disable their Bravo Sierra meters, eventually the fit will hit the shan?]
Yep.
BOHICA.
OCR doesn't necessarily mean the resulting document is searchable. The software "reads" text and other elements, but what it does to them is up to the user. Making the document searchable doesn't make much sense unless the text is clear and crisp to begin with, which isn't the case here.
oh the ole - OCR that does not produce searchable text...right!
Why didn’t I think of that?
The Mac does have an optimize-for-OCR scan feature - it just does NOTHING like what showed up in the WH_LFCOLB.pdf. (I have tried it several times on similar documents.)
Care to demonstrate what you speculate is happening?
But it ISN'T an "exact copy of the image." It is a "meddled with" copy. OCR doesn't explain the "meddling."
So the image would still have all the letters in their original positions--it didn't "put" the 'R' in the middle of a string of text, that's just where it was on the original form. But because it was fainter than the other letters, it was left as part of the background rather than being extracted with the rest of the text.
What possible reason would the "background" have for being a different pixel resolution and bit depth from the text? Should we believe the programmer was a moron? Your argument that a "fainter" object should have its pixel resolution decreased while at the same time its dynamic range INCREASED makes no sense.
That means it was downsampled along with the rest of the background when the PDF was optimized.
Yeah, about that "optimized" stuff. Your argument previously was that "optimizing" decreases file size and memory requirements. (As if that is of any concern nowadays.) It just now occurred to me that you can get a 4 X REDUCTION in memory size by using the larger pixels, but you get a 7 X INCREASE in memory requirements by switching from a Binary bit map to 8 bit gray-scale, and a 15 X increase by switching to 16 bit Color! Here is the difference.
Binary bitmap: 1 bit per pixel.
8-bit grayscale: 8 bits per pixel.
16-bit color: 16 bits per pixel.
How is this supposed to be a benefit?
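To put rough numbers on it, here is a back-of-the-envelope sketch in Python. The letter-size page and the 300/150 dpi figures are illustrative assumptions on my part, not measurements from the actual file, and this ignores compression entirely - it's just the raw bit-depth arithmetic.

# Rough uncompressed sizes for one scanned page at various resolutions and bit depths.
# Page dimensions and dpi values are assumptions for illustration only.

def page_bytes(dpi, bits_per_pixel, width_in=8.5, height_in=11.0):
    pixels = (width_in * dpi) * (height_in * dpi)
    return pixels * bits_per_pixel / 8

print("1-bit bitmap @ 300 dpi: %7.0f KiB" % (page_bytes(300, 1) / 1024))
print("8-bit gray   @ 150 dpi: %7.0f KiB" % (page_bytes(150, 8) / 1024))   # 4x fewer pixels, 8x the bits each
print("24-bit color @ 150 dpi: %7.0f KiB" % (page_bytes(150, 24) / 1024))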
(It would also mean that if the father's name wasn't there, someone searching for a document containing 'Barack' wouldn't find this, while someone searching for 'ack' would.)
Yeah, I get that. So for the benefit of some dubious secondary feature, they seriously degrade the necessary primary purpose of making an exact copy? And this theory makes more sense to you than different source image formats?
Occam's razor, dude.
Why do you spread such patently false information?
HMMMMM...
He is arguing that it was the unintended consequence of making a copy of the document by using software that was intended to generate a searchable text file. Which begs the question: does the State of Hawaii not own a Xerox machine?
You are trying to pull a "god" out of the Machine. Your "god from the Machine" will not rescue you. He is weak and impotent. :)
That is likely true. Misunderstanding of the 14th amendment is deeply ingrained in the population now. Therefore it is our job to educate the populace. Likely as not, we will collapse as a nation before that opportunity is realized.
The technical specs in the file show the programs used - if you think some crazy process was followed, do us all a favor and replicate it.
As far as I know, they just say it was created with "Mac OS X 10.6.7 Quartz PDFContext," which isn't a program. It's part of the native OS X PDF handling software, which means the file was last touched by a Mac program, probably Preview. But that's all we know.
I remember a joke about a computer that was developed to translate English and Chinese. As the machine was being demonstrated to a reporter they fed in the English phrase "Out of sight, out of mind." Out came a string of Chinese Characters. The reporter said "That's very impressive, but how do I know that's an accurate translation?" The Developer responded, "We'll feed these characters back into the machine and translate them from Chinese to English." They fed the Chinese Characters into the machine and the English Translation came out: "Invisible Idiot."
I think that is EXACTLY what will happen if you took an image file and fed it back through a Mac computer with the appropriate software, with one exception. The idiot wouldn't be invisible. After he saw the result, he would also no longer be an idiot. (one would hope.)
If the application is normally used for searching an existing database of image files for the purpose of making copies or fabricating new replacement documents, it is reasonable to believe someone might make use of the text searching feature. What is not reasonable is to believe there is any good explanation for changing pixel size and dynamic range from one character to another. The only reasonable conclusion is that the output document accurately represents the input image files.
Well, Adobe builds the capability in for some reason, as do other document management system vendors. You don't think that file size matters for people storing thousands of pages, as the users of a document archiving system would?
It just now occurred to me that you can get a 4 X REDUCTION in memory size by using the larger pixels, but you get a 7 X INCREASE in memory requirements by switching from a Binary bit map to 8 bit gray-scale, and a 15 X increase by switching to 16 bit Color!...How is this supposed to be a benefit?
I'll take one more shot at this. There is no switching from a binary bitmap to grayscale. The scanning and processing software recognizes most of the letters either as text (if it's doing OCR) or at least as pure black. It handles those in one way. It recognizes the background as a color image and handles it in another way. Because the 'R' is faint compared to the other letters, it's treated as part of the background and processed as just a gray area of the background image. The whole background image is stored as a color image, and the 'R' is part of it--it's not "switched" to being in color, nor does it have its dynamic range increased. (As I'm sure you realize, if you scan a black-and-white photo at the same settings as you would use for a color photo, you get a file the same size as if it were a color photo. The computer doesn't "understand" that gray isn't really a color--unless you tell it so.) If the background is downsampled, the 'R' is downsampled along with it. The important thing is that the computer doesn't know it's an 'R'. We can recognize it, but the software just thinks it's a gray smudge.
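If it helps, here is a bare-bones Python sketch of the kind of separation I'm describing, using a simple fixed contrast threshold. The threshold value and the 2x downsample factor are assumptions for illustration; real scanning and PDF software is far more sophisticated than this.

# A minimal sketch of threshold-based foreground/background separation.
# The threshold and the downsample factor are illustrative assumptions.
import numpy as np

def separate_layers(gray, threshold=80):
    """Split a grayscale scan (0 = black, 255 = white) into a 1-bit text mask
    and a background layer holding everything too faint to pass the threshold."""
    text_mask = gray < threshold          # dark, high-contrast pixels become the "text" layer
    background = gray.copy()
    background[text_mask] = 255           # punch the recognized text out of the background
    return text_mask, background

def downsample(img, factor=2):
    """Reduce the background's resolution by averaging factor x factor blocks."""
    h = (img.shape[0] // factor) * factor
    w = (img.shape[1] // factor) * factor
    blocks = img[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3)).astype(np.uint8)

# A faint letter (say, gray level 150) never crosses the threshold, so it stays
# in the background layer and gets downsampled along with everything else there.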
Here's something from a vendor of PDF compression software:
"High accuracy recognition rates are achieved by leveraging advanced image processing techniques including: re-sampling, foreground and background separation, auto-rotation, and font learning." [Emphasis mine.]

And this theory makes more sense to you than different source image formats?
What I see is a choice between believing that the BC anomalies are the result of some combination of automatic PDF processing functions; or they're the result of someone sitting down with multiple source files and (using Adobe Illustrator, mind you, not Photoshop or some other tool much better suited to the task) copying and pasting letter by letter--a 'B' from this file, an 'R' from another, a different 'R' from yet another; a box from here, another box from there--to assemble a forgery. Occam's razor only cuts one way for me on that question.
That's to be expected when running an enhanced scan on documents that aren't uniformly clear. Some characters aren't recognized and get rendered along with the background.
For example, the same thing can be seen in this PDF:
I find your theory that the PDF or other electronic format was created in Hawaii much less than useful.
The White House owns/created the electronic copy, until there is actual evidence to the contrary. The WH went to great lengths to detail how the paper copies got to DC. Do you have evidence that the electronic copy was created in Hawaii?
The DESTINATION SURFACE is all one resolution and either a 24 or 32 bit color pixel depth. It can "represent" 4x larger pixels by using 4 pixels of its surface to represent each pixel of the image. It can "represent" binary bit map images by turning all 24 bits on (White) or off (Black). The destination surface resolution is a CONSTANT. It ought not "create" pixels 4x normal size on a background smudge it doesn't recognize. It ought to simply render to the surface the image that is loaded.
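To be concrete about what I mean by a constant-resolution destination surface, here is a Python sketch. The array shapes and the 2x factor are made-up assumptions, but it shows how a half-resolution layer ends up as visible 2x2 blocks when it is rendered onto that surface.

# Sketch: compositing a 1-bit text layer and a half-resolution background layer
# onto one fixed-resolution destination surface. Sizes are illustrative assumptions.
import numpy as np

def composite(text_mask, background_lowres):
    """text_mask: full-resolution boolean array (same size as the upscaled background);
    background_lowres: half-resolution grayscale array."""
    # Nearest-neighbor upscale of the background: each low-res pixel covers a 2x2
    # block on the destination surface, which is exactly the "4x larger pixel" look.
    bg_full = np.repeat(np.repeat(background_lowres, 2, axis=0), 2, axis=1)
    surface = np.stack([bg_full] * 3, axis=-1).astype(np.uint8)  # 24-bit RGB surface
    surface[text_mask] = 0                                        # text pixels rendered pure black
    return surface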
Now you seem to be making the argument that on a supposedly black and white ORIGINAL document, the scanner (and software) cannot distinguish sufficient contrast between the BLACK of the letter, and the WHITE of the page to recognize it as anything but the background, yet our eyes can easily distinguish that it is not?
(As I'm sure you realize, if you scan a black-and-white photo at the same settings as you would use for a color photo, you get a file the same size as if it were a color photo.
It *IS* a color photo. Its colors are grayscale renderings of the three primary colors as represented by the binary bits in the memory surface allocated for this purpose.
The computer doesn't "understand" that gray isn't really a color--unless you tell it so.) If the background is downsampled, the 'R' is downsampled along with it.
Downsampled? A new term for "Deus ex machina"? Yeah, when I don't recognize something, I make the resolution four times worse, rather than just leave it alone.
The important thing is that the computer doesn't know it's an 'R'. We can recognize it, but the software just thinks it's a gray smudge.
If you are making a copy, the computer doesn't need to know what it is, n'est-ce pas?
I have thought about this a bit. A better argument for you would be that the Adobe program is using an MPEG-type compression algorithm on image tokens somehow deemed by the software to need less detail. You could further argue that this is a benefit in applications OTHER than creating exact copies, for which this software might be used most of the time. (Rapid video rendering comes to mind.) Then the question becomes: why did some moron think it was a good idea to do this instead of making an exact copy? (It still doesn't explain the "Halos" around each letter either.)
For example, the same thing can be seen in this PDF:
Okay, now you've got my attention. This document does exactly that, and therefore there must be a reason for it.
Jeopardy music playing............
Okay, I think I've got it. It IS being done for data compression purposes. When the document was scanned, it was scanned at a specific resolution. Some pixels in a scanned line didn't pass a threshold level of contrast and were rendered "as seen" in low resolution, while others were on the other side of the contrast threshold and were rendered as a uniform black at the finer pixel resolution. (It's like Natural born citizen. One type of citizen is certain, and of fine quality, while another is uncertain and of dubious quality. :) )
To make this system work, it makes sense to separate the two data strings into a high-res binary string and a lower-res grayscale (or color) string. (If color isn't needed, significant memory size can be saved if it isn't used.) Both strings would be run-length encoded, but they would render to the same surface at their different resolutions. Upon rendering to the image surface (in memory or on paper), the pixels of one string would interlace with the pixels of the other string to produce a seamless image of both formats in their appropriate locations in the image.
And that is my theory for what is probably going on with your PDF file. It is a great method for compressing a huge amount of data into a very small space, but it is certainly not optimal for reproducing clear and accurate documents.
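For what it's worth, here is a toy Python sketch of the kind of run-length coding I have in mind for the high-contrast stream. It is not any real PDF codec, just an illustration of why a clean 1-bit text layer compresses so well compared to the same pixels stored as grayscale. The low-res gray/color stream would be handled separately and merged back in at render time, as described above.

# Toy run-length encoder for one row of the 1-bit "text" stream (0 = white, 1 = black).
# Purely illustrative; not the algorithm any real PDF software necessarily uses.

def run_length_encode(row_bits):
    """Encode a sequence of 0/1 pixels as (value, run_length) pairs."""
    runs = []
    current, count = row_bits[0], 1
    for bit in row_bits[1:]:
        if bit == current:
            count += 1
        else:
            runs.append((current, count))
            current, count = bit, 1
    runs.append((current, count))
    return runs

# A mostly-white scan line with one letter stroke collapses to a handful of runs:
row = [0] * 40 + [1] * 6 + [0] * 54          # 100 pixels, one 6-pixel black stroke
print(run_length_encode(row))                # [(0, 40), (1, 6), (0, 54)]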
The subsequent hypothesis is that this same process was used on the Obama birth certificate. If this is the case then these particular artifacts have nothing to do with OCR. I will have to ponder this for a bit. If I can find no fault in it, I will have to agree. That particular peculiarity may just be the result of using a piece of software optimized for making small files instead of making accurate copies. If so it would be evidence of stupidity instead of corruption.
Bullshit
Our eyes are much better at discerning meaningful shapes than computers are. That's why CAPTCHAs work.
Downsampled? A new term for "Deus ex machina"? Yeah, when I don't recognize something, I make the resolution four times worse, rather than just leave it alone.
It's the term Adobe uses (and others too, I imagine). I could have sworn it appeared in a previous post of mine. Anyway, first, the point is that the computer didn't recognize it. And second, because of that, it treated it the same way it handled the rest of the background--which was to "downsample" it, i.e., lower its resolution.
If you are making a copy, the computer doesn't need to know what it is, n'est-ce pas?
No, it doesn't. It would have been better if they'd turned off whatever routines caused these anomalies and just scanned the dam thing as a TIFF.
(It still doesn't explain the "Halos" around each letter either.)
Actually, I think that's an argument for the whole thing being a program process rather than intentional copying and pasting. If the latter, there would be no reason for halos, and certainly no reason that when you hid the text "layers," the background would be white behind them. In fact, I don't see any way to explain that in a copy-and-paste scenario.
I agree, but it appears to be White House policy to post scans as PDF files, as that's the only format I can find them in. The most likely reason for this is compliance with accessibility standards, since PDFs can easily be given alternate text and tags that assist with the use of screen readers.
It is speculation. It fits a series of facts. If it is wrong, we shall eventually uncover a fact that shows it to be wrong.
The White House owns/created the electronic copy, until there is actual evidence to the contrary. The WH went to great lengths to detail how the paper copies got to DC. Do you have evidence that the electronic copy was created in Hawaii?
No. But I know of no evidence that it was created in the White House either. I don't regard anything Obama says or does as trustworthy, so I don't consider pronouncements from him or his staff to be evidence.
Our eyes are just optics. Interpretation is done in the brain. In the case of the artifacts, it appears no interpretation is occurring. It looks like a simple string contrast sorting algorithm.
It's the term Adobe uses (and others too, I imagine). I could have sworn it appeared in a previous post of mine. Anyway, first, the point is that the computer didn't recognize it. And second, because of that, it treated it the same way it handled the rest of the background--which was to "downsample" it, i.e., lower its resolution.
Again, I don't think it's "recognizing" anything. On the document provided by Kleon, there are spots on the scan rendered in fine-resolution two-color, and other spots rendered in low-resolution multi-color. The difference seems to be one of contrast, having nothing at all to do with shape.
No, it doesn't. It would have been better if they'd turned off whatever routines caused these anomalies and just scanned the dam thing as a TIFF.
Yes.
Actually, I think that's an argument for the whole thing being a program process rather than intentional copying and pasting. If the latter, there would be no reason for halos, and certainly no reason that when you hid the text "layers," the background would be white behind them. In fact, I don't see any way to explain that in a copy-and-paste scenario.
I explained it in a previous response. Assuming software designed to recognize text and substitute letters, replacing an "A" with an "e" (in a specific position on the document) would require an "A"-sized space to be erased, or the top half of the "A" would show above the "e" which replaced it. If the software is designed to do this on a regular basis, one of the subroutines would be to always erase a capital-letter-sized space.
At the moment, Kleon's document has convinced me that what I originally took to be evidence of pasting from two different formats is more likely the result of a compression algorithm being used to reduce file size. The software that created it appears to be optimized for saving file size rather than accurate reproduction. It appears that you were right and I was wrong.
Ah well, at least one mystery put to bed. (As far as I'm concerned.) Now I will have to reassess some of those other artifacts in light of this realization.