Free Republic

Rethinking software bloat.
InformationWeek.com ^ | 12/17/01 | Fred Langa

Posted on 12/17/2001 4:33:52 AM PST by damnlimey

Rethinking 'Software Bloat'

Fred Langa takes a trip into his software archives and finds some surprises--at two orders of magnitude.
By Fred Langa

 
Reader Randy King recently performed an unusual experiment that provided some really good end-of-the-year food for thought:
I have an old Gateway here (120 MHz, 32 Mbytes RAM) that I "beefed up" to 128 Mbytes and loaded with--get ready--Win 95 OSR2. OMIGOD! This thing screams. I was in tears laughing at how darn fast that old operating system is. When you really look at it, there's not a whole lot missing from later operating systems that you can't add through some free or low-cost tools (such as an Advanced Launcher toolbar). Of course, Win95 came years before all the slop and bloat was added. I am saddened that more engineering for good solutions isn't performed in Redmond. Instead, it seems to be a "code fast, make it work, hardware will catch up with anything we do" mentality.
It was interesting to read about Randy's experiment, but it started an itch somewhere in the back of my mind. Something about it nagged at me, and I concluded there might be more to this than meets the eye. So, in search of an answer, I went digging in the closet where I store old software.

Factors Of 100
It took some rummaging, but there in a dusty 5.25" floppy tray was my set of install floppies for the first truly successful version of Windows--Windows 3.0--from more than a decade ago.

When Windows 3.0 shipped, systems typically ran at around 25 MHz. Today's top-of-the-line systems run at about 2 GHz--two orders of magnitude, or roughly 100 times, faster.

But today's software doesn't feel 100 times faster. Some things are faster than I remember in Windows 3.0, yes, but little (if anything) in the routine operations seems to echo the speed gains of the underlying hardware. Why?

The answer--on the surface, no surprise--is in the size and complexity of the software. The complete Windows 3.0 operating system was a little less than 5 Mbytes total; it fit on four 1.2-Mbyte floppies. Compare that to current software. Today's Windows XP Professional comes on a setup CD filled with roughly 100 times as much code, a little less than 500 Mbytes total.

That's an amazing symmetry. Today, we have a new operating system with roughly 100 times as much code as a decade ago, running on systems roughly 100 times as fast as a decade ago.

By themselves, those "factors of 100" are worthy of note, but they raise an obvious question: Are we 100 times more productive than a decade ago? Are our systems 100 times more stable? Are we 100 times better off?

While I believe that today's software is indeed better than that of a decade ago, I can't see how it's anywhere near 100 times better. Mostly, that two-orders-of-magnitude increase in code quantity is not matched by anything close to an equal increase in code quality. And software growth without obvious benefit is the very definition of "code bloat."

What's Behind Today's Bloated Code?
Some of the bloat we commonly see in today's software is, no doubt, due to the tools used to create it. For example, a decade ago, low-level assembly-language programming was far more common. Assembly-language code is compact and blazingly fast, but is hard to produce, is tightly tied to specific platforms, is difficult to debug, and isn't well suited for very large projects. All those factors help explain why assembly-language programs--and programmers--are relatively scarce these days.

Instead, most of today's software is produced with high-level programming languages that often include code-automation tools, debugging routines, the ability to support projects of arbitrary scale, and so on. These tools can add an astonishing amount of baggage to the final code.

This real-life example from the Association for Computing Machinery clearly shows the effects of bloat: A simple "Hello, World" program written in assembly comprises just 408 bytes. But the same "Hello, World" program written in Visual C++ takes fully 10,369 bytes--that's 25 times as much code! (For many more examples, see http://www.latech.edu/~acm/HelloWorld.shtml. Or, for a more humorous but less accurate look at the same phenomenon, see http://www.infiltec.com/j-h-wrld.htm. And, if you want to dive into assembly-language programming in any depth, you'll find this list of links helpful.)
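
To make the comparison concrete, here's what such a program looks like at the source level (a minimal sketch in C; exact byte counts will vary with the compiler and linker settings):

    /* hello.c -- one line of user code, but the linked executable
       also carries C runtime startup code, library glue, and
       executable-format overhead.  That baggage, not the program
       logic, is where most of the 10,369 bytes come from. */
    #include <stdio.h>

    int main(void)
    {
        puts("Hello, World");
        return 0;
    }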

Human skill also affects bloat. Programming is wonderfully open-ended, with a multitude of ways to accomplish any given task. All the programming solutions may work, but some are far more efficient than others. A true master programmer may be able to accomplish in a couple of lines of Zen-pure code what a less-skillful programmer might take dozens of lines to do. But true master programmers are also few and far between. The result is that code libraries get loaded with routines that work, but are less than optimal. The software produced with these libraries then institutionalizes and propagates these inefficiencies.
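
To illustrate, here's one small job done two ways (a sketch in C; both versions work, but one is a fraction of the code):

    #include <string.h>   /* for strlen() and size_t */

    /* The terse, idiomatic version -- the kind of "Zen-pure"
       code a master programmer might write: */
    void copy(char *dst, const char *src)
    {
        while ((*dst++ = *src++) != '\0')
            ;
    }

    /* The same job spelled out step by step.  It works, it's
       just bulkier -- and this is the sort of routine that ends
       up institutionalized in code libraries: */
    void copy_verbose(char *dst, const char *src)
    {
        size_t len = strlen(src);
        size_t i;

        for (i = 0; i < len; i++)
            dst[i] = src[i];
        dst[len] = '\0';
    }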

You And I Are To Blame, Too!
All the above reasons matter, but I suspect that "featuritis"--the tendency to add feature after feature with each new software release--probably has more to do with code bloat than any other single factor. And it's hard to pin the blame for this entirely on the software vendors.

Take Windows. That lean 5-Mbyte version of Windows 3.0 was small, all right, but it couldn't even play a CD without add-on third-party software. Today's Windows can play data and music CDs, and even burn new ones. Windows 3.0 could only make primitive noises (bleeps and bloops) through the system speaker; today's Windows handles all manner of audio and video with relative ease. Early Windows had no built-in networking support; today's version natively supports a wide range of networking types and protocols. These--and many more built-in tools and capabilities we've come to expect--all help bulk up the operating system.

What's more, as each version of Windows gained new features, we insisted that it also retain compatibility with most of the hardware and software that had gone before. This never-ending aggregation of new code atop old eventually resulted in Windows 98, by far the most generally compatible operating system ever--able to run a huge range of software on a vast array of hardware. But what Windows 98 delivered in utility and compatibility came at the expense of simplicity, efficiency, and stability.

It's not just Windows. No operating system is immune to this kind of featuritis. Take Linux, for example. Although Linux can do more with less hardware than Windows can, a full-blown, general-purpose Linux workstation installation (complete with graphical interface and an array of the same kinds of tools and features that we've come to expect on our desktops) is hardly what you'd call "svelte." The current mainstream Red Hat 7.2 distribution, for example, calls for 64 Mbytes of RAM and 1.5 to 2 Gbytes of disk space--the same rock-bottom minimums as Windows XP. Other Linux distributions ship on as many as seven CDs. That's right: seven! If that's not rampant featuritis, I don't know what is.

Is The Future Fat Or Lean?
So: Some of what we see in today's huge software packages is indeed simple code bloat, and some of it also is the bundling of the features that we want on our desktops. I don't see the latter changing any time soon. We want the features and conveniences to which we've become accustomed.

But there are signs that we may have reached some kind of plateau with the simpler forms of code bloat. For example, with Windows XP, Microsoft has abandoned portions of its legacy support. With fewer variables to contend with, the result is a more stable, reliable operating system. And over time, with fewer and fewer legacy products to support, there's at least the potential for Windows bloat to slow or even stop.

Linux tends to be self-correcting. If code bloat becomes an issue within the Linux community, someone will develop a "skinny penguin" distribution that pares away the needless code. (Indeed, there already are special-purpose Linux distributions that fit on just a floppy or two.)

While it's way too soon to declare that we've seen the end of code bloat, I believe the signs are hopeful. Maybe, just maybe, the "code fast, make it work, hardware will catch up" mentality will die out, and our hardware can finally get ahead of the curve. Maybe, just maybe, software inefficiency won't consume the next couple orders of magnitude of hardware horsepower.

What's your take? What's the worst example of bloat you know of? Are any companies producing lean, tight code anymore? Do you think code bloat is the result of the forces Fred outlines, or is it more a matter of institutional sloppiness on the part of Microsoft and other software vendors? Do you think code bloat will reach a plateau, or will it continue indefinitely? Join in the discussion!



TOPICS: Editorial; Miscellaneous
To: damnlimey
Thanks for the link! I bookmarked it in Opera.
121 posted on 12/20/2001 3:27:21 AM PST by jammer
[ Post Reply | Private Reply | To 5 | View Replies]

To: Centurion2000
> Serious question for you. Do you sell or market Microsoft products?

I have asked the same question. I don't like to use ad hominem arguments, but he is part of the Microsoft hit team. There are at least two teams like this on the forum who foul up the discussions. One is the M$ hit team, the other is the "anything the gummit does is good" hit team.

122 posted on 12/20/2001 3:38:26 AM PST by jammer
[ Post Reply | Private Reply | To 38 | View Replies]

To: VeritatisSplendor
Good reply about debugging. But there is a bright side--when I introduce a bug, I can always blame Windoze and my users just eat it up (of course, I am honest about it--I only blame the OS--which was a misnomer until W2K--when I think it caused the problem). If it is my bug, I just shrug and say, "Hey, Word for Windows (97 I believe it was) was reportedly released with 5,000 KNOWN bugs." They buy it.
123 posted on 12/20/2001 3:42:13 AM PST by jammer
[ Post Reply | Private Reply | To 12 | View Replies]

To: jammer;bush2000
> Serious question for you. Do you sell or market Microsoft products?

> I have asked the same question. I don't like to use ad hominem arguments, but he is part of the Microsoft hit team. There are at least two teams like this on the forum who foul up the discussions. One is the M$ hit team, the other is the "anything the gummit does is good" hit team.

I only ask because he seems to have every argument against Linux down pat and has pretty much crushed people in debates over the operating system.

BTW, I admin NT, 2000, Citrix, and a little Solaris. I'll keep Windows for the front end and Unix for the back-end iron. That seems to be the best synergism of the two.

124 posted on 12/20/2001 4:02:59 AM PST by Centurion2000
[ Post Reply | Private Reply | To 122 | View Replies]

To: Bush2000
I take back a lot of things I have said and thought about you. My apologies. You make many cogent points on this thread (after the first one).

It doesn't feel good to admit one may have been an ass. :)

125 posted on 12/20/2001 4:06:26 AM PST by jammer
[ Post Reply | Private Reply | To 97 | View Replies]

To: jammer;bush2000
Title: former Microsoft employee replies...
From: fred langa
Email: fred_iwk@langa.com
Date: 17-Dec-01 8:34 PM GMT

Subject: Code Bloat

Hi Fred,

I have a few comments for you on the issue of code bloat.

If you reference any of these comments, please keep my
identity anonymous. I worked as a developer at Microsoft
from 1985 to 1993 and therefore know a bit about how
things happened at MS during that time, but I have gotten
quite tired of people taking out on me all their anger
and frustration at the company and its products so I like
to lay low, as it were....


In your Information Week article you wrote:
> Some of the bloat we commonly see in today's software is, no
> doubt, due to the tools used to create it. For example, a
> decade ago, low-level assembly-language programming was
> far more common. Assembly-language code is compact and
> blazingly fast, but is hard to produce, is tightly tied to specific
> platforms, is difficult to debug, and isn't well suited for very
> large projects. All those factors contribute to the reason why
> assembly language programs--and programmers--are
> relatively scarce these days.

> Instead, most of today's software is produced with high-level
> programming languages that often include code-automation
> tools, debugging routines, the ability to support projects of
> arbitrary scale, and so on. These tools can add an astonishing
> amount of baggage to the final code.

I spent my entire tenure at MS working on the BASIC
language products, from GWBASIC & BASCOM to the last
version of QuickBASIC, to the one-version-only VisualBASIC
for DOS, and then on to the first versions of VisualBASIC
for Windows. I was an assembly language programmer the
whole time, so I'm intimately familiar with what you're
saying about the difference between assembly language
and high level languages.

What you say is true but fails to mention what I believe,
through my own observations of how development changed
at Microsoft, to be the biggest reason for code bloat:
a conscious decision to trade development efficiency
for code efficiency, with object oriented programming
being the culmination of that trend.

Here's an example:
In the early QuickBASIC days, when we were first
adding a real user interface to the BASIC development
tools, the UI code of text mode (non-Windows) products
was written by the developers of the product and built
into the product. It was clean and tight because it
did only what it had to do for that one product, and
was well integrated with that product.

Then it became clear that we, as a company, had several
developers scattered around the different product groups
all writing UI code that was doing essentially the same
job. So a UI group was created, and they built up a set
of shared library routines that each product would then
include and call. Of course, these library routines had
to provide all the features that all the products needed,
as well as doing so through a well-defined API instead
of tight integration with the main product code. This
led to a more efficient development process while also
leading to less efficient code size and speed.
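
[For illustration, here's a rough sketch in C of the two stages described above; all names are invented:]

    /* Stage 1: UI code written inside one product, for that
       product only.  It does exactly one job, with no options: */
    void product_draw_status(const char *text)
    {
        (void)text;  /* write 'text' at a fixed row, fixed color */
    }

    /* Stage 2: the shared UI library.  Serving every product
       through one general API means parameters, option structs,
       and validation most callers never need -- a cleaner
       development process, but bigger and slower code: */
    struct ui_text_options {
        int row, col;
        int fg_color, bg_color;
        int justify;   /* left, center, or right */
        int wrap;      /* clip long text, or wrap it */
    };

    int ui_draw_text(const char *text, const struct ui_text_options *opts)
    {
        (void)text; (void)opts;  /* validate, apply defaults, honor every option... */
        return 0;                /* status code -- one more thing for callers to check */
    }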

Now extrapolate that concept from UIs to other aspects
of the products that could be shared: memory management,
graphics routines, help systems, network support, etc.
Each time a function got removed from tight integration
with the product and replaced with a general purpose
library routine, the development process got cleaner
and the resulting product got bigger and slower.

Note that this is not just a slide into big fat sloppy
code; it was a conscious decision made for business
reasons. It was not a universally popular decision
within the walls of the company, and I was one of those
who was not happy with it. I hated the fact that *my*
products, which I had worked so hard to keep lean and
clean, were being cluttered up with big fat code that
IMO we didn't need. But now I can look back and see
that it was a business decision that made sense. The
proponents of the decision to move development in that
direction always said things like, "don't worry that
it's fatter and slower, the hardware will catch up and
this method lets us get the products out the door and
to the customers sooner". They were right.

The eventual move into object oriented programming
magnified the situation because the development process
of building a product from object libraries was even
more efficient, but the resulting code was even more
fat and slow.

To answer some of your other questions:
> Are any companies producing lean, tight code anymore?

The Palm OS is my favorite example of lean tight code.
The writers of that were constrained by the weakness
of the hardware and so had to write tight code to make
it useful.

> Do you think code bloat is simply the result of
> institutional sloppiness on the part of Microsoft
> and other software vendors, or is there more to it?

I hope I have explained above that it is most definitely
not "simply [...] institutional sloppiness". I'm sure
that there is plenty of sloppy code in Windows, but to
put the whole issue of code bloat down to sloppiness is
much too simplistic. If it was just a matter of being
sloppy, Windows would have already been knocked off its
throne by a product which provides the same functionality
with unquestionably greater efficiency.

> Do you think code bloat will reach a plateau, or will
> it continue indefinitely?

I think it has already reached a plateau of sorts.
The move from single purpose assembly code to high level
languages and object libraries has already been made, and
I don't see on the horizon any similarly large leap left
to make. But I do expect every future version of Windows
to have more features and more help files and more
templates and more examples, so as long as you count
all those megabytes as "code bloat" then yes we'll see
a neverending increase.

Well, that's my 2 cents.

Keep up the good work,
[name withheld]

126 posted on 12/20/2001 2:38:10 PM PST by damnlimey
[ Post Reply | Private Reply | To 125 | View Replies]

To: damnlimey
"don't worry that it's fatter and slower, the hardware will catch up and this
method lets us get the products out the door and to the customers sooner". They were right." "

127 posted on 12/20/2001 2:45:14 PM PST by damnlimey
[ Post Reply | Private Reply | To 126 | View Replies]

To: PatrioticAmerican
> There are many things that 'C' cannot do compared to an assembly design. Ever seen the disassembly of a switch statement? In assembly it can be an indexed jump table.

A decent C compiler will also use an indexed jump table when such is appropriate. One advantage of a C compiler is that a programmer need not commit to a particular implementation of a switch structure; if growth of a project causes a switch/case structure to grow to the point that an indexed jump table is appropriate, the compiler will automatically make the change. It won't always pick the optimal approach (since it can't know how often different cases will occur) but it can still do pretty darned well.
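
[For illustration, a sketch of the kind of switch most compilers will turn into an indexed jump table; the case values here are dense, which is what makes the table attractive:]

    /* Dense case values: most compilers emit an indexed jump
       table for this.  Sparse values (say 1, 50, 900) would
       instead become compare-and-branch code or a search --
       the compiler picks, and re-picks as the code grows. */
    int dispatch(int op, int a, int b)
    {
        switch (op) {
        case 0: return a + b;
        case 1: return a - b;
        case 2: return a * b;
        case 3: return (b != 0) ? a / b : 0;
        default: return 0;
        }
    }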

> Other design considerations involve memory usage. Memory pooling is a useful technique used in assembly that isn't used in 'C'. malloc calls are expensive. You could write such a thing in 'C', but it would be clumsy at best, and probably buggy.

The malloc() family of functions are decent general-purpose memory-allocation routines. If the application requires something more or less sophisticated, a user can substitute his own routines. I've done memory-management routines in C; they're not hard.
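
[For illustration, a fixed-size-block pool of the kind being discussed is only a couple dozen lines of C. A rough sketch, with invented names; real code would also consider alignment guarantees and thread safety:]

    #include <stddef.h>

    #define BLOCK_SIZE 64     /* payload bytes per block */
    #define NUM_BLOCKS 256    /* capacity of the pool    */

    union block {
        union block *next;                 /* link when on the free list */
        unsigned char payload[BLOCK_SIZE]; /* caller's data when in use  */
    };

    static union block arena[NUM_BLOCKS];  /* one static arena, no malloc */
    static union block *free_list;
    static size_t next_unused;

    void *pool_alloc(void)
    {
        if (free_list) {                   /* reuse a freed block */
            union block *b = free_list;
            free_list = b->next;
            return b;
        }
        if (next_unused < NUM_BLOCKS)      /* hand out a fresh block */
            return &arena[next_unused++];
        return NULL;                       /* pool exhausted */
    }

    void pool_free(void *p)
    {
        union block *b = p;                /* push back onto the free list */
        b->next = free_list;
        free_list = b;
    }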

> Size is also a major factor in assembly. 'C' can certainly produce efficient code, but I can squeeze assembly down naturally without much work. Anytime code can be knocked from 1MB to 200K, there will be a significant performance gain.

Decent compilers shouldn't leave you that much room to improve things. Any C code which can be shrunk from 1MB to 200K by rewriting in assembly could probably be shrunk to 300-500K by doing a better job of writing it in C.

Actually, it has historically not been uncommon for good compilers to produce better code than decently written assembly language. This is not due to any fault of the assembly-language programmer, but rather to the fact that compilers don't mind reworking optimizations whenever the code changes.

As a simple example, someone who is writing code in assembly-language needs to decide when he writes a routine what registers will be used by that routine, which ones will be saved on entry, which ones will be trashed, etc. Even if the programmer determines the optimal allocation of registers for a particular version of the code, it may be that a small change to the code results in something no longer "fitting" that used to, or may make it possible to "fit" something that previously didn't. A good compiler/linker package can deal with the first case and take advantage of the second without the programmer having to do extra work. By contrast, if the code was written in assembly, it will be necessary to either rework parts of it or else accept that the register allocation is no longer optimal.

[BTW, I recognize that many compilers aren't that good...]

> Don't forget that the 'C' runtime library gets dragged in.

Hopefully only the parts that actually get used. Though on many compiler/linker platforms, that doesn't seem to be the case.

128 posted on 12/20/2001 10:27:11 PM PST by supercat
[ Post Reply | Private Reply | To 120 | View Replies]

To: damnlimey
Good thoughts from one of the people in the trenches. Thanks.
129 posted on 12/26/2001 6:23:00 AM PST by jammer
[ Post Reply | Private Reply | To 126 | View Replies]



