Posted on 06/18/2003 1:32:37 PM PDT by ShadowAce
SCO'S EXECUTIVES have been making some extremely wild claims and ridiculous veiled threats lately, even by their own, er... standards.
SCO's really cranked up its volume of FUD generation recently, from its announced "termination" of IBM's AIX license (which IBM immediately denied and dismissed) to laughably grandiose ambitions to go after virtually every vendor of Unix-based operating systems (excepting Sun, apparently), possibly even including Microsoft. ($10 million doesn't buy very much SCO loyalty anymore, it seems.)
It's almost as though SCO is screaming "All Your Code Belongs to Us" with respect to every Unix or Unix-like system developed since AT&T's ancient System V.
Well, we shall see. Maybe we can all find out soon, and much sooner than SCO would prefer. An INQUIRER reader writes:
"Yesterday I realized how trivial it was to find matching code within two source trees.
"While working on this stuff, I realized that [the] SCO lawsuit is indeed pure FUD, and they will keep it like that till the end. So it seems like the best thing for the linux community now would be to find the matching code ourselves and figure out where it came from. SCO help is not needed. Otherwise Linux is so to speak a sitting duck. If Linux community knows what is very similar and why, that would fully protect Linux in press and leave IBM to annihilate SCO."
I don't know how "fully" effective this might be, because certain press elements are practically extensions of the Vole's propaganda office. It does sound interesting enough to look into closely, though. Our unnamed correspondent continues:
"Since I do not have access to System V code, I took Linux 2.4.20 and BSD-lite 4.4. I'll give the technical details later, but here are the findings:
"[Linux versus] 4.4BSD-Lite
"  lines     Linux            BSD
  200- 260   ...amd7930.c     ...bsd_audio.c
  398- 519   ...slhc.c        ...slcompress.c
  739- 766   ...balloc.c      ...ffs_alloc.c
 2267-2299   ...bonding.c     ...inet_addr.c
[Note: We truncated the full paths for formatting purposes, but the original email is available containing all paths and other details.]
"On the left is the file in the Linux tree, on the right is the file in the 4.4BSD tree. Also the range of matching lines in Linux is given on the left. It is unlikely that I missed any other large matching fragments.
"Now, it seems to be quite likely that the matching Linux-System V code shown to the "experts" by SCO came from one of these files. And all because this is the original BSD code, which got copied everywhere."
This isn't a new theory. Ever since SCO first filed its lawsuit against IBM there has been speculation that they're basing that on old BSD code that was added to both AT&T System V Unix and many other Unix versions. But what's different here is that a process is proposed to identify all such common BSD code, eliminate it, and perhaps do some other things.
As our reader intimates, he's found a clever way to compare Unix source code without viewing the code directly or violating copyrights. We will let him explain in further detail how it's possible to do this:
"Here is the procedure for finding the matching code....
"1. Each file within each source tree is "shredded" into 5 line pieces (1-5, 2-6, 3-7, etc.). MD5 sum is computed for each block of lines. The output is 3 columns: MD5sum, source file, 1st line in the block.
"At this stage, 4.4BSD had [a] ~40Mb file, linux ~160Mb. Potentially, one could shred into smaller or larger pieces, however, with pieces too small there'll be a lot of noise, with pieces too large some matches won't be seen. 5 liners seem to be a good compromise.
"2. Within each source tree the "shredded" file is sorted by MD5sum, and duplicate entries within the same tree are removed completely (these are either trivial 5-line sequences or licensing disclaimers). Unix sort here takes a couple of minutes on a 600MHz P3.
"3. A column indicating the origin of the file is inserted into the file (0 - BSD, 1 - linux). Both Linux and BSD "shredded" files are merged such that MD5sums stay sorted.
"4. At this point a given MD5sum will occur either once or twice, i.e., in both source trees. Here remove all the single lines, and have the 5 liners left that are matching.
"5. Count for each file in Linux tree the number of matches with the BSD tree using the file generated at step 4. Sort this list, and the largest counts will occur for the files with the largest number of matching lines. The range can be extracted from the file from step 4, since at step 1 we kept the address of the 1st line in the block. That is how the info above was generated.
"The beauty of this scheme is that anybody with System V code can inform the Linux community about what is identical without revealing any System V code. And this might actually be legal, since I do not think that there are clauses in the contracts NOT to shred the code and compare it with other code. Also, it is quite easy to stay anonymous since the person who does the analysis need not to reveal him/herself in any way."
If anyone has access to the AT&T System V or later SCO source code, we can pass along our reader's scripts in such a way as to preserve his anonymity. And yours, as the INQUIRER never reveals its journalistic sources. All we'd ask is that you share the results with our readers.
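For readers who want a feel for the five steps described above before seeing the reader's own scripts, here is a minimal Python sketch. It is not his code, which we have not reproduced. It makes several assumptions of its own: each source tree is represented as an in-memory dictionary mapping a file path to its list of lines (a stand-in for walking a real tree), lines are hashed exactly as given with no whitespace normalization, and dictionary lookups replace the sort-and-merge passes he describes. All function names are hypothetical.

```python
import hashlib
from collections import Counter, defaultdict

WINDOW = 5  # lines per "shredded" block, the size the reader settled on


def shred(tree, window=WINDOW):
    """Step 1: yield (md5_hex, path, first_line) for each run of `window` lines.

    `tree` maps a file path to a list of its lines. This dict is an
    assumption of this sketch, standing in for a real source tree.
    """
    for path, lines in tree.items():
        for i in range(len(lines) - window + 1):
            block = "\n".join(lines[i:i + window]).encode("utf-8")
            yield hashlib.md5(block).hexdigest(), path, i + 1  # 1-based line


def unique_blocks(tree, window=WINDOW):
    """Step 2: discard every digest that occurs more than once in one tree."""
    seen = defaultdict(list)
    for digest, path, first in shred(tree, window):
        seen[digest].append((path, first))
    return {d: locs[0] for d, locs in seen.items() if len(locs) == 1}


def matching_blocks(tree_a, tree_b, window=WINDOW):
    """Steps 3-4: keep only digests present in both trees.

    A set intersection is used here in place of merging two sorted files.
    """
    a = unique_blocks(tree_a, window)
    b = unique_blocks(tree_b, window)
    return [(a[d], b[d]) for d in a.keys() & b.keys()]


def rank_files(matches):
    """Step 5: count matching blocks per file of tree A, largest first."""
    return Counter(loc_a[0] for loc_a, _ in matches).most_common()


def locate_from_digests(tree, foreign_digests, window=WINDOW):
    """Find blocks of `tree` whose digest appears in a digest-only list.

    The other party would supply `foreign_digests` (a set of MD5 strings)
    without supplying any source text.
    """
    own = unique_blocks(tree, window)
    return sorted(loc for d, loc in own.items() if d in foreign_digests)
```

In this sketch, someone holding a private tree would only need to hand over `set(unique_blocks(private_tree))`, a bare set of digests, and anyone with the Linux tree could pass it to `locate_from_digests` to see which files and starting lines match. Consecutive starting lines in the result indicate one longer matching fragment, which is presumably how line ranges like those in the table above were derived.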
As our Correspondent Who Prefers to Remain Anonymous says in closing:
"Anyway, I hope we will find what SCO has matching within days."
Indeed. Maybe we can pull the plug on SCO's anti-Linux FUD machine!
Best quote of the piece. All your code are belong to us. These guys are barking up the wrong tree.
It could be the other way around, too, as Caldera had a project trying to run Linux binaries on Unix. It could be that both got some code from BSD. We're not really going to know until this finally gets into court.
OTOH, it makes no sense for IBM to reward SCO's board and shareholders for starting this war. Who knows what could come out of the woodwork then.
You're right about one thing -- it's very entertaining to watch, whether you have a dog in the fight or not.
Dude, that $1B (I heard it was $3B now) is a mere drop in the bucket. IBM paid nearly $2B just in corporate income taxes last year. BFD.
IBM is well positioned to do precisely what Microsoft would likely do in this circumstance: keep the case tied up in the courts for years until SCO runs out of money.
Remember that he also told Algore, "we can win this, and then you'll be the POTUS". But he eventually reached the point where he could go no further.