Free Republic

Rethinking software bloat.
InformationWeek.com ^ | 12/17/01 | Fred Langa

Posted on 12/17/2001 4:33:52 AM PST by damnlimey

Rethinking 'Software Bloat'

Fred Langa takes a trip into his software archives and finds some surprises--at two orders of magnitude.
By Fred Langa

 
Reader Randy King recently performed an unusual experiment that provided some really good end-of-the-year food for thought:
I have an old Gateway here (120 MHz, 32 Mbytes RAM) that I "beefed up" to 128 Mbytes and loaded with--get ready--Win 95 OSR2. OMIGOD! This thing screams. I was in tears laughing at how darn fast that old operating system is. When you really look at it, there's not a whole lot missing from later operating systems that you can't add through some free or low-cost tools (such as an Advanced Launcher toolbar). Of course, Win95 came years before all the slop and bloat was added. I am saddened that more engineering for good solutions isn't performed in Redmond. Instead, it seems to be a "code fast, make it work, hardware will catch up with anything we do" mentality.
It was interesting to read about Randy's experiment, but it started an itch somewhere in the back of my mind. Something about it nagged at me, and I concluded there might be more to this than meets the eye. So, in search of an answer, I went digging in the closet where I store old software.

Factors Of 100
It took some rummaging, but there in a dusty 5.25" floppy tray was my set of install floppies for the first truly successful version of Windows--Windows 3.0--from more than a decade ago.

When Windows 3.0 shipped, systems typically ran at around 25 MHz. Today's top-of-the-line systems run at about 2 GHz--two orders of magnitude, or roughly 100 times, faster.

But today's software doesn't feel 100 times faster. Some things are faster than I remember in Windows 3.0, yes, but little (if anything) in the routine operations seems to echo the speed gains of the underlying hardware. Why?

The answer--on the surface, no surprise--is in the size and complexity of the software. The complete Windows 3.0 operating system was a little less than 5 Mbytes total; it fit on four 1.2-Mbyte floppies. Compare that to current software. Today's Windows XP Professional comes on a setup CD filled with roughly 100 times as much code, a little less than 500 Mbytes total.

That's an amazing symmetry. Today, we have a new operating system with roughly 100 times as much code as a decade ago, running on systems roughly 100 times as fast as a decade ago.

By themselves, those "factors of 100" are worthy of note, but they raise an obvious question: Are we 100 times more productive than a decade ago? Are our systems 100 times more stable? Are we 100 times better off?

While I believe that today's software is indeed better than that of a decade ago, I can't see how it's anywhere near 100 times better. Mostly, that two-orders-of-magnitude increase in code quantity is not matched by anything close to an equal increase in code quality. And software growth without obvious benefit is the very definition of "code bloat."

What's Behind Today's Bloated Code?
Some of the bloat we commonly see in today's software is, no doubt, due to the tools used to create it. For example, a decade ago, low-level assembly-language programming was far more common. Assembly-language code is compact and blazingly fast, but is hard to produce, is tightly tied to specific platforms, is difficult to debug, and isn't well suited for very large projects. All those factors help explain why assembly-language programs--and programmers--are relatively scarce these days.

Instead, most of today's software is produced with high-level programming languages that often include code-automation tools, debugging routines, the ability to support projects of arbitrary scale, and so on. These tools can add an astonishing amount of baggage to the final code.

This real-life example from the Association for Computing Machinery clearly shows the effects of bloat: A simple "Hello, World" program written in assembly comprises just 408 bytes. But the same "Hello, World" program written in Visual C++ takes fully 10,369 bytes--that's 25 times as much code! (For many more examples, see http://www.latech.edu/~acm/HelloWorld.shtml. Or, for a more humorous but less accurate look at the same phenomenon, see http://www.infiltec.com/j-h-wrld.htm. And, if you want to dive into assembly-language programming in any depth, you'll find this list of links helpful.)
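
To make the comparison concrete, here's what such a program looks like at the source level (a minimal sketch in C; exact byte counts will vary with the compiler and linker settings):

    /* hello.c -- one line of user code, but the linked executable
       also carries C runtime startup code, library glue, and
       executable-format overhead.  That baggage, not the program
       logic, is where most of the 10,369 bytes come from. */
    #include <stdio.h>

    int main(void)
    {
        puts("Hello, World");
        return 0;
    }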

Human skill also affects bloat. Programming is wonderfully open-ended, with a multitude of ways to accomplish any given task. All the programming solutions may work, but some are far more efficient than others. A true master programmer may be able to accomplish in a couple of lines of Zen-pure code what a less-skillful programmer might take dozens of lines to do. But true master programmers are also few and far between. The result is that code libraries get loaded with routines that work, but are less than optimal. The software produced with these libraries then institutionalizes and propagates these inefficiencies.
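
To illustrate, here's one small job done two ways (a sketch in C; both versions work, but one is a fraction of the code):

    #include <string.h>   /* for strlen() and size_t */

    /* The terse, idiomatic version -- the kind of "Zen-pure"
       code a master programmer might write: */
    void copy(char *dst, const char *src)
    {
        while ((*dst++ = *src++) != '\0')
            ;
    }

    /* The same job spelled out step by step.  It works, it's
       just bulkier -- and this is the sort of routine that ends
       up institutionalized in code libraries: */
    void copy_verbose(char *dst, const char *src)
    {
        size_t len = strlen(src);
        size_t i;

        for (i = 0; i < len; i++)
            dst[i] = src[i];
        dst[len] = '\0';
    }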

You And I Are To Blame, Too!
All the above reasons matter, but I suspect that "featuritis"--the tendency to add feature after feature with each new software release--probably has more to do with code bloat than any other single factor. And it's hard to pin the blame for this entirely on the software vendors.

Take Windows. That lean 5-Mbyte version of Windows 3.0 was small, all right, but it couldn't even play a CD without add-on third-party software. Today's Windows can play data and music CDs, and even burn new ones. Windows 3.0 could only make primitive noises (bleeps and bloops) through the system speaker; today's Windows handles all manner of audio and video with relative ease. Early Windows had no built-in networking support; today's version natively supports a wide range of networking types and protocols. These--and many more built-in tools and capabilities we've come to expect--all help bulk up the operating system.

What's more, as each version of Windows gained new features, we insisted that it also retain compatibility with most of the hardware and software that had gone before. This never-ending aggregation of new code atop old eventually resulted in Windows 98, by far the most generally compatible operating system ever--able to run a huge range of software on a vast array of hardware. But what Windows 98 delivered in utility and compatibility came at the expense of simplicity, efficiency, and stability.

It's not just Windows. No operating system is immune to this kind of featuritis. Take Linux, for example. Although Linux can do more with less hardware than Windows can, a full-blown, general-purpose Linux workstation installation (complete with graphical interface and an array of the same kinds of tools and features that we've come to expect on our desktops) is hardly what you'd call "svelte." The current mainstream Red Hat 7.2 distribution, for example, calls for 64 Mbytes of RAM and 1.5 to 2 Gbytes of disk space--the same rock-bottom minimums as Windows XP. Other Linux distributions ship on as many as seven CDs. That's right: seven! If that's not rampant featuritis, I don't know what is.

Is The Future Fat Or Lean?
So: Some of what we see in today's huge software packages is indeed simple code bloat, and some of it also is the bundling of the features that we want on our desktops. I don't see the latter changing any time soon. We want the features and conveniences to which we've become accustomed.

But there are signs that we may have reached some kind of plateau with the simpler forms of code bloat. For example, with Windows XP, Microsoft has abandoned portions of its legacy support. With fewer variables to contend with, the result is a more stable, reliable operating system. And over time, with fewer and fewer legacy products to support, there's at least the potential for Windows bloat to slow or even stop.

Linux tends to be self-correcting. If code bloat becomes an issue within the Linux community, someone will develop a "skinny penguin" distribution that pares away the needless code. (Indeed, there already are special-purpose Linux distributions that fit on just a floppy or two.)

While it's way too soon to declare that we've seen the end of code bloat, I believe the signs are hopeful. Maybe, just maybe, the "code fast, make it work, hardware will catch up" mentality will die out, and our hardware can finally get ahead of the curve. Maybe, just maybe, software inefficiency won't consume the next couple orders of magnitude of hardware horsepower.

What's your take? What's the worst example of bloat you know of? Are any companies producing lean, tight code anymore? Do you think code bloat is the result of the forces Fred outlines, or is it more a matter of institutional sloppiness on the part of Microsoft and other software vendors? Do you think code bloat will reach a plateau, or will it continue indefinitely? Join in the discussion!



TOPICS: Editorial; Miscellaneous
To: damnlimey
Thanks for the link! I bookmarked it in Opera.
121 posted on 12/20/2001 3:27:21 AM PST by jammer
[ Post Reply | Private Reply | To 5 | View Replies]

To: Centurion2000
> Serious question for you. Do you sell or market Microsoft products?

I have asked the same question. I don't like to use ad hominem arguments, but he is part of the Microsoft hit team. There are at least two teams like this on the forum who foul up the discussions. One is the M$ hit team, the other is the "anything the gummit does is good" hit team.

122 posted on 12/20/2001 3:38:26 AM PST by jammer
[ Post Reply | Private Reply | To 38 | View Replies]

To: VeritatisSplendor
Good reply about debugging. But there is a bright side--when I introduce a bug, I can always blame Windoze and my users just eat it up (of course, I am honest about it--I only blame the OS--which was a misnomer until W2K--when I think it caused the problem). If it is my bug, I just shrug and say, "Hey, Word for Windows (97 I believe it was) was reportedly released with 5,000 KNOWN bugs." They buy it.
123 posted on 12/20/2001 3:42:13 AM PST by jammer
[ Post Reply | Private Reply | To 12 | View Replies]

To: jammer;bush2000
> Serious question for you. Do you sell or market Microsoft products?

> I have asked the same question. I don't like to use ad hominem arguments, but he is part of the Microsoft hit team. There are at least two teams like this on the forum who foul up the discussions. One is the M$ hit team, the other is the "anything the gummit does is good" hit team.

I only ask because he seems to have every argument against Linux down pat and has pretty much crushed people in debates over the operating system.

BTW, I admin NT, 2000, Citrix, and a little Solaris. I'll keep Windows for the front end and Unix for the back-end iron. That seems to be the best synergism of the two.

124 posted on 12/20/2001 4:02:59 AM PST by Centurion2000
[ Post Reply | Private Reply | To 122 | View Replies]

To: Bush2000
I take back a lot of things I have said and thought about you. My apologies. You make many cogent points on this thread (after the first one).

It doesn't feel good to admit one may have been an ass. :)

125 posted on 12/20/2001 4:06:26 AM PST by jammer
[ Post Reply | Private Reply | To 97 | View Replies]

To: jammer;bush2000
Title: former Microsoft employee replies...
From: fred langa
Email: fred_iwk@langa.com
Date: 17-Dec-01 8:34 PM GMT

Subject: Code Bloat

Hi Fred,

I have a few comments for you on the issue of code bloat.

If you reference any of these comments, please keep my
identity anonymous. I worked as a developer at Microsoft
from 1985 to 1993 and therefore know a bit about how
things happened at MS during that time, but I have gotten
quite tired of people taking out on me all their anger
and frustration at the company and its products so I like
to lay low, as it were....


In your Information Week article you wrote:
> Some of the bloat we commonly see in today's software is, no
> doubt, due to the tools used to create it. For example, a
> decade ago, low-level assembly-language programming was
> far more common. Assembly-language code is compact and
> blazingly fast, but is hard to produce, is tightly tied to specific
> platforms, is difficult to debug, and isn't well suited for very
> large projects. All those factors contribute to the reason why
> assembly language programs--and programmers--are
> relatively scarce these days.

> Instead, most of today's software is produced with high-level
> programming languages that often include code-automation
> tools, debugging routines, the ability to support projects of
> arbitrary scale, and so on. These tools can add an astonishing
> amount of baggage to the final code.

I spent my entire tenure at MS working on the BASIC
language products, from GWBASIC & BASCOM to the last
version of QuickBASIC, to the one-version-only VisualBASIC
for DOS, and then on to the first versions of VisualBASIC
for Windows. I was an assembly language programmer the
whole time, so I'm intimately familiar with what you're
saying about the difference between assembly language
and high level languages.

What you say is true but fails to mention what I believe,
through my own observations of how development changed
at Microsoft, to be the biggest reason for code bloat:
a conscious decision to trade development efficiency
for code efficiency, with object oriented programming
being the culmination of that trend.

Here's an example:
In the early QuickBASIC days, when we were first
adding a real user interface to the BASIC development
tools, the UI code of text mode (non-Windows) products
was written by the developers of the product and built
into the product. It was clean and tight because it
did only what it had to do for that one product, and
was well integrated with that product.

Then it became clear that we, as a company, had several
developers scattered around the different product groups
all writing UI code that was doing essentially the same
job. So a UI group was created, and they built up a set
of shared library routines that each product would then
include and call. Of course, these library routines had
to provide all the features that all the products needed,
as well as doing so through a well-defined API instead
of tight integration with the main product code. This
led to a more efficient development process while also
leading to less efficient code size and speed.
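
[For illustration, here's a rough sketch in C of the two stages described above; all names are invented:]

    /* Stage 1: UI code written inside one product, for that
       product only.  It does exactly one job, with no options: */
    void product_draw_status(const char *text)
    {
        (void)text;  /* write 'text' at a fixed row, fixed color */
    }

    /* Stage 2: the shared UI library.  Serving every product
       through one general API means parameters, option structs,
       and validation most callers never need -- a cleaner
       development process, but bigger and slower code: */
    struct ui_text_options {
        int row, col;
        int fg_color, bg_color;
        int justify;   /* left, center, or right */
        int wrap;      /* clip long text, or wrap it */
    };

    int ui_draw_text(const char *text, const struct ui_text_options *opts)
    {
        (void)text; (void)opts;  /* validate, apply defaults, honor every option... */
        return 0;                /* status code -- one more thing for callers to check */
    }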

Now extrapolate that concept from UIs to other aspects
of the products that could be shared: memory management,
graphics routines, help systems, network support, etc.
Each time a function got removed from tight integration
with the product and replaced with a general purpose
library routine, the development process got cleaner
and the resulting product got bigger and slower.

Note that this is not just a slide into big fat sloppy
code; it was a conscious decision made for business
reasons. It was not a universally popular decision
within the walls of the company, and I was one of those
who was not happy with it. I hated the fact that *my*
products, which I had worked so hard to keep lean and
clean, were being cluttered up with big fat code that
IMO we didn't need. But now I can look back and see
that it was a business decision that made sense. The
proponents of the decision to move development in that
direction always said things like, "don't worry that
it's fatter and slower, the hardware will catch up and
this method lets us get the products out the door and
to the customers sooner". They were right.

The eventual move into object oriented programming
magnified the situation because the development process
of building a product from object libraries was even
more efficient, but the resulting code was even more
fat and slow.

To answer some of your other questions:
> Are any companies producing lean, tight code anymore?

The Palm OS is my favorite example of lean tight code.
The writers of that were constrained by the weakness
of the hardware and so had to write tight code to make
it useful.

> Do you think code bloat is simply the result of
> institutional sloppiness on the part of Microsoft
> and other software vendors, or is there more to it?

I hope I have explained above that it is most definitely
not "simply [...] institutional sloppiness". I'm sure
that there is plenty of sloppy code in Windows, but to
put the whole issue of code bloat down to sloppiness is
much too simplistic. If it was just a matter of being
sloppy, Windows would have already been knocked off its
throne by a product which provides the same functionality
with unquestionably greater efficiency.

> Do you think code bloat will reach a plateau, or will
> it continue indefinitely?

I think it has already reached a plateau of sorts.
The move from single purpose assembly code to high level
languages and object libraries has already been made, and
I don't see on the horizon any similarly large leap left
to make. But I do expect every future version of Windows
to have more features and more help files and more
templates and more examples, so as long as you count
all those megabytes as "code bloat" then yes we'll see
a neverending increase.

Well, that's my 2 cents.

Keep up the good work,
[name withheld]

126 posted on 12/20/2001 2:38:10 PM PST by damnlimey
[ Post Reply | Private Reply | To 125 | View Replies]

To: damnlimey
"don't worry that it's fatter and slower, the hardware will catch up and this
method lets us get the products out the door and to the customers sooner". They were right." "

127 posted on 12/20/2001 2:45:14 PM PST by damnlimey
[ Post Reply | Private Reply | To 126 | View Replies]

To: PatrioticAmerican
> There are many things that 'C' cannot do compared to an assembly design. Ever seen the disassembly of a switch statement? In assembly it can be an indexed jump table.

A decent C compiler will also use an indexed jump table when such is appropriate. One advantage of a C compiler is that a programmer need not commit to a particular implementation of a switch structure; if growth of a project causes a switch/case structure to grow to the point that an indexed jump table is appropriate, the compiler will automatically make the change. It won't always pick the optimal approach (since it can't know how often different cases will occur) but it can still do pretty darned well.
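
[For illustration, a sketch of the kind of switch most compilers will turn into an indexed jump table; the case values here are dense, which is what makes the table attractive:]

    /* Dense case values: most compilers emit an indexed jump
       table for this.  Sparse values (say 1, 50, 900) would
       instead become compare-and-branch code or a search --
       the compiler picks, and re-picks as the code grows. */
    int dispatch(int op, int a, int b)
    {
        switch (op) {
        case 0: return a + b;
        case 1: return a - b;
        case 2: return a * b;
        case 3: return (b != 0) ? a / b : 0;
        default: return 0;
        }
    }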

> Other design considerations involve memory usage. Memory pooling is a useful technique used in assembly that isn't used in 'C'. malloc calls are expensive. You could write such a thing in 'C', but it would be clumsy at best, and probably buggy.

The malloc() family of functions are decent general-purpose memory-allocation routines. If the application requires something more or less sophisticated, a user can substitute his own routines. I've done memory-management routines in C; they're not hard.
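
[For illustration, a fixed-size-block pool of the kind being discussed is only a couple dozen lines of C. A rough sketch, with invented names; real code would also consider alignment guarantees and thread safety:]

    #include <stddef.h>

    #define BLOCK_SIZE 64     /* payload bytes per block */
    #define NUM_BLOCKS 256    /* capacity of the pool    */

    union block {
        union block *next;                 /* link when on the free list */
        unsigned char payload[BLOCK_SIZE]; /* caller's data when in use  */
    };

    static union block arena[NUM_BLOCKS];  /* one static arena, no malloc */
    static union block *free_list;
    static size_t next_unused;

    void *pool_alloc(void)
    {
        if (free_list) {                   /* reuse a freed block */
            union block *b = free_list;
            free_list = b->next;
            return b;
        }
        if (next_unused < NUM_BLOCKS)      /* hand out a fresh block */
            return &arena[next_unused++];
        return NULL;                       /* pool exhausted */
    }

    void pool_free(void *p)
    {
        union block *b = p;                /* push back onto the free list */
        b->next = free_list;
        free_list = b;
    }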

> Size is also a major factor in assembly. 'C' can certainly produce efficient code, but I can squeeze assembly down naturally without much work. Anytime code can be knocked from 1MB to 200K, there will be a significant performance gain.

Decent compilers shouldn't leave you that much room to improve things. Any C code which can be shrunk from 1MB to 200K by rewriting in assembly could probably be shrunk to 300-500K by doing a better job of writing it in C.

Actually, it has historically not been uncommon for good compilers to produce better code than decently written assembly language. This is not due to any fault of the assembly-language programmer, but rather to the fact that compilers don't mind reworking optimizations whenever the code changes.

As a simple example, someone who is writing code in assembly-language needs to decide when he writes a routine what registers will be used by that routine, which ones will be saved on entry, which ones will be trashed, etc. Even if the programmer determines the optimal allocation of registers for a particular version of the code, it may be that a small change to the code results in something no longer "fitting" that used to, or may make it possible to "fit" something that previously didn't. A good compiler/linker package can deal with the first case and take advantage of the second without the programmer having to do extra work. By contrast, if the code was written in assembly, it will be necessary to either rework parts of it or else accept that the register allocation is no longer optimal.

[BTW, I recognize that many compilers aren't that good...]

> Don't forget that the 'C' runtime library gets dragged in.

Hopefully only the parts that actually get used. Though on many compiler/linker platforms, that doesn't seem to be the case.

128 posted on 12/20/2001 10:27:11 PM PST by supercat
[ Post Reply | Private Reply | To 120 | View Replies]

To: damnlimey
Good thoughts from one of the people in the trenches. Thanks.
129 posted on 12/26/2001 6:23:00 AM PST by jammer
[ Post Reply | Private Reply | To 126 | View Replies]



