Posted on 05/16/2013 6:39:16 AM PDT by ShadowAce
High Scalability has a fascinating article up that summarizes a talk by Robert Graham of Errata Security on the development choices needed to support 10 million concurrent connections on a single server. From a small data center perspective, the numbers he is talking about seem astronomical, but not unbelievable. With a new era of Internet-connected devices dawning, the time may have come to question the core architecture of Unix, and therefore Linux and BSD as well.
The core of the talk seems to be that the kernel is too inefficient in how it handles threads and packets to maintain the speed and scalability requirements for web scale computing. Graham recommends moving as much of the data processing as possible away from the kernel and into the application. This means writing device drivers, handling threading and multiple cores, and allocating memory yourself. Graham uses the example of scaling Apache to illustrate how depending on the operating system can actually slow the application when handling several thousand connections per second.
Why? Servers could not handle 10K concurrent connections because of O(n^2) algorithms used in the kernel.
Two basic problems in the kernel:
Connection = thread/process. As a packet came in, the kernel would walk all 10K processes to figure out which thread should handle the packet.
Connections = select/poll (single thread). Same scalability problem. Each packet had to walk a list of sockets.
Solution: fix the kernel to make lookups in constant time
Thread context switches are now constant time regardless of the number of threads.
This came with new, scalable epoll()/IOCompletionPort mechanisms for constant-time socket lookup (sketched below).
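To make the difference concrete, here is a minimal sketch of the epoll() model those points describe: the kernel reports only the descriptors that are ready, so the per-event cost does not grow with the total number of registered connections the way a select()/poll() walk does. The port number and buffer size are arbitrary, and error handling is trimmed for brevity.

```c
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <sys/epoll.h>
#include <sys/socket.h>

#define MAX_EVENTS 64

int main(void)
{
    int listener = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(8080);            /* example port */
    bind(listener, (struct sockaddr *)&addr, sizeof addr);
    listen(listener, SOMAXCONN);

    int epfd = epoll_create1(0);
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = listener };
    epoll_ctl(epfd, EPOLL_CTL_ADD, listener, &ev);

    struct epoll_event events[MAX_EVENTS];
    for (;;) {
        /* Returns only the descriptors with pending events -- no walk
         * over every registered socket, unlike select()/poll(). */
        int n = epoll_wait(epfd, events, MAX_EVENTS, -1);
        for (int i = 0; i < n; i++) {
            if (events[i].data.fd == listener) {
                int client = accept(listener, NULL, NULL);
                struct epoll_event cev = { .events = EPOLLIN, .data.fd = client };
                epoll_ctl(epfd, EPOLL_CTL_ADD, client, &cev);
            } else {
                char buf[4096];
                ssize_t r = read(events[i].data.fd, buf, sizeof buf);
                if (r <= 0)
                    close(events[i].data.fd); /* closing removes the fd from the epoll set */
            }
        }
    }
}
```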
The talk touches on a concept I've been mulling over for months, the inherent complexity of modern data centers. If you are virtualizing, and you probably are, for your application to get to the hardware there are most likely several layers of abstraction that need to be unpacked before the code it is trying to execute actually gets to the CPU, or the data is written to disk. Does virtualization actually solve the problem we have, or is it an approach built from spending far too long in the box? That Graham's solution for building systems that scale for the next decade is to bypass the OS entirely and talk directly to the network and hardware tells me that we might be seeing the first slivers of dusk for the kernel's useful life serving up web applications.
So what would come after Linux? It is possible that researchers in the UK have come up with a solution in Mirage. In a paper quoted on the High Scalability site, the researchers describe Mirage:
Our prototype (dubbed Mirage) is unashamedly academic; it extends the Objective Caml language with storage extensions and a custom run-time to emit binaries that execute as a guest operating system under Xen.
Mirage is, as stated, very academic, and currently very alpha quality, but the idea is compelling: writing applications that compile directly to a complete machine image, something that runs independently without an operating system. Of course, the first objection that comes to mind is that this would lead to writing for specialized hardware, and would mean going back in time thirty years. However, combining a next-generation language with a project like Open Compute would provide open specifications and community-driven development at a low level, ideal for eking out as much performance as possible from the hardware.
No matter which way the industry turns to solve the upcoming challenges of an exploding Internet, the next ten years are sure to be a wild ride.
Today’s application programmers are not capable of writing their own device drivers or handling the stack.
They wouldn’t know malloc() if it bit them in the @ss and they couldn’t free() themselves from a paper bag.
I wrote a proprietary 4GL based on C for multiple platforms for 20 years when there were few standards. It’s doable and maintainable, especially with the standards that exist in the industry today.
The problem, IMHO, is that today’s application programmers - as opposed to systems programmers who write the OS, kernels, device drivers, etc. - are incapable of handling the stack. They code in languages that handle everything for them. It makes for great, object-oriented, reusable code, but it means they’re totally unaware of what’s under the hood. They also have no idea how to open a socket or listen to a port.
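For the sake of illustration, here is a bare-bones sketch of what "opening a socket and listening to a port" actually involves with the plain BSD socket calls the commenter has in mind, roughly what the higher-level frameworks are hiding. The port number and greeting are arbitrary.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <sys/socket.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(9000);                /* example port */

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0) { perror("bind"); return 1; }
    if (listen(fd, 16) < 0) { perror("listen"); return 1; }

    int client = accept(fd, NULL, NULL);        /* blocks until a connection arrives */
    if (client >= 0) {
        const char *msg = "hello\n";
        write(client, msg, strlen(msg));        /* the program owns this connection... */
        close(client);                          /* ...and must close() what it opens */
    }
    close(fd);
    return 0;
}
```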
The author wrote the article as if the Unix architecture is limited to one kernel/OS per server. It’s not.
Damn straight! Pounding my fist on the desk in agreement.
You are correct; in fact you could write an OS that was a valid DOS application.
I was using TP7 to do this for my OS, which I got to the point of being able to recognize commands and change the screen resolution before I shelved the project [due to school and getting stumped on memory management*].
* I was looking for a way to make the memory manager 'tyrannical' and generic [able to handle the small stuff (like variables for the compiler)], in order to cut down on memory leaks.
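One way to read that "tyrannical, generic" memory manager is a simple arena allocator that owns every small allocation and releases them all in one call, which keeps individual leaks from accumulating. That reading is only a guess at what the poster had in mind, and the sketch below is in C rather than Turbo Pascal.

```c
#include <stdlib.h>
#include <string.h>

struct arena {
    char  *base;
    size_t used;
    size_t cap;
};

static int arena_init(struct arena *a, size_t cap)
{
    a->base = malloc(cap);
    a->used = 0;
    a->cap  = cap;
    return a->base != NULL;
}

static void *arena_alloc(struct arena *a, size_t n)
{
    n = (n + 15) & ~(size_t)15;          /* keep allocations 16-byte aligned */
    if (a->used + n > a->cap) return NULL;
    void *p = a->base + a->used;
    a->used += n;
    return p;
}

static void arena_release(struct arena *a)
{
    free(a->base);                        /* everything goes away at once */
    a->base = NULL;
    a->used = a->cap = 0;
}

int main(void)
{
    struct arena a;
    if (!arena_init(&a, 1 << 20)) return 1;
    char *name = arena_alloc(&a, 32);     /* e.g. a compiler symbol name */
    if (name) strcpy(name, "counter");
    arena_release(&a);                    /* no per-object free() calls to forget */
    return 0;
}
```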
Here's an interesting article on Ada outperforming an experienced assembly programmer.
The thing about optimizing compilers is that they can be 'taught' all sorts of "tricks" that the experienced assembly guy might not know.
Sounds like someone is just fishing for funding. Why would you not scale up to support that many connections, with the added benefits of balancing and redundancy?
Balancing and redundancy are good, but the Unix philosophy is rather hostile to the thing that would unlock the full potential: distributed computing. Remember that that OS is heavily reliant on, and intertwined with, C, and C's take on even threads is more of a "let the user [programmer] handle them" affair (i.e. "fork")... with distributed computing one could have the tasking system assign the task to the system with the lowest load [i.e. maintain a priority queue]. ~~ I'm not sure, but I seem to recall IBM's OS/360 had the ability to keep services going while a particular machine (node) was under repair/replacement; VMS probably has that ability too.
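Here is a toy sketch of the "assign work to the least-loaded node" idea the poster mentions. The node names and load numbers are made up, and a real scheduler would keep a priority queue keyed on reported load rather than the linear scan used here for brevity.

```c
#include <stdio.h>

struct node {
    const char *name;
    double load;            /* e.g. most recently reported load average */
};

/* Pick the node with the smallest current load. */
static struct node *least_loaded(struct node *nodes, int n)
{
    struct node *best = &nodes[0];
    for (int i = 1; i < n; i++)
        if (nodes[i].load < best->load)
            best = &nodes[i];
    return best;
}

int main(void)
{
    struct node cluster[] = {
        { "node-a", 0.72 },
        { "node-b", 0.15 },
        { "node-c", 0.40 },
    };
    struct node *target = least_loaded(cluster, 3);
    printf("dispatching task to %s (load %.2f)\n", target->name, target->load);
    return 0;
}
```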
How smart is an industry which uses C/C++ for systems programming instead of Ada... or even LISP... or FORTH...?*
There is honestly no good reason that systems should be written in C/C++, especially given the number of items that are implementation-dependent.
* -- Being commissioned by the DOD, Ada was designed to allow exact representations so it could interface with hardware that had no standard. LISP was the system-language of the LISP-Machine, which had the ability to debug while running, even the system routines. FORTH is actually pretty amazing, allowing for entire systems to be built "in a matchbox".
Great comments. People don’t realize what they have as far as computing power either on their desk or under it. Instead we create problems to try and solve them with the newest TLA (three-letter acronym for those in Rio Linda).
Like ‘big data’. Our company has to move to some huge NoSQL database to handle the data. Not because we need to, but because someone read an article...
I'm impressed.
It's doable and maintainable, especially with the standards that exist in the industry today.
Maintainable? Can you pull out your C compiler and compile the source w/o modification today? How about using another C-compiler?
The problem, IMHO, is that today's application programmers - as opposed to systems programmers who write the OS, kernels, device drivers, etc. - are incapable of handling the stack. They code in languages that handle everything for them. It makes for great, object-oriented, reusable code, but it means they're totally unaware of what's under the hood. They also have no idea how to open a socket or listen to a port.
I generally agree here -- though let's not kid ourselves: Object Oriented isn't always the best choice.
I think really what we're seeing is a failure in the CS-education system; it is surprising how many languages don't have something like Ada's subtypes -- and how many CS graduates don't grasp how useful it is to be able to exclude values. {I.e., in Ada, Positive is a subtype (of Integer) that has the additional constraint of only allowing values greater than zero.} Any CS battery of coursework ought to include enough math to make the advantages thereof obvious.
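C has nothing like Ada's subtypes, so the closest rough approximation is a wrapper type whose constructor rejects excluded values at run time. That is a pale imitation of what Ada's Positive gives you at the type level, but the sketch below shows the "exclude the bad values" idea the poster is describing; the names and the divide example are invented for illustration.

```c
#include <assert.h>
#include <stdio.h>

typedef struct { int value; } positive_t;    /* invariant: value > 0 */

static positive_t make_positive(int v)
{
    assert(v > 0);                           /* Ada would reject this at the type level */
    return (positive_t){ v };
}

static int divide(int dividend, positive_t divisor)
{
    return dividend / divisor.value;         /* divisor can never be zero here */
}

int main(void)
{
    positive_t three = make_positive(3);
    printf("%d\n", divide(12, three));       /* prints 4 */
    return 0;
}
```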
Here’s an idiot excerpt...
“The talk touches on a concept I've been mulling over for months, the inherent complexity of modern data centers. If you are virtualizing, and you probably are, for your application to get to the hardware there are most likely several layers of abstraction that need to be unpacked before the code it is trying to execute actually gets to the CPU, or the data is written to disk. Does virtualization actually solve the problem we have, or is it an approach built from spending far too long in the box?”
A data center needs to be as complex as it needs to be, no more, no less. Operating systems today all essentially do the SAME things at the bottom end; they allow for sharing of hardware resources between multiple user processes.
Some Information Technology (IT) shops are better managed than others. Shops with serious problems have human management problems that make it very difficult to overcome the hurdles of managing their servers. As far as server administration goes, while M$ products will “run right out of the box”, it's a costly mistake to think that managing them will be easier than managing Unix servers, since M$ products historically have default settings and functionality that are inherently the wrong choice, while Unix basically requires the server administrator to visit the configuration, understand all the options, and make their choices. If one could have a purely Unix server environment, and one spent the time to have every option choice well thought out instead of neglecting the “details”, the pure Unix environment would be far more secure than the pure M$ environment. Inevitably, though, today's server environments are mixed, as dictated by the needs of particular applications that IT is required to support. This, of course, makes the labor cost of server administration far greater in smaller shops.
The impetus behind virtualization...
Used to be IT shops would gradually keep adding servers. Some departments in the company would have their own file server. There are email servers. Then, applications would be purchased, and a new set of servers would be purchased: development, test, production for the app. This is just life.
But you’d find mistakes being made. Performance problem? Don’t correctly tune the application, revisit the design, and understand what you’re trying to achieve and how best to do that - no, buy a faster server.
Since MOST CPU time is spent IDLE, we wind up with millions in capital investment in server hardware sitting there depreciating; unable to run fast enough to satisfy users when the poorly tuned apps run, but sitting idle the rest of the time.
With software advertising being ubiquitous, every department started screaming for new applications that they just had to have - and getting approval directly from the top with IT all but cut out of the loop. Thus the crucial factors of “what present capabilities and plans do we have in terms of our existing IT staff and infrastructure” and “what external directions are there and how will they affect our shop” (i.e., should we be moving in this or that technology direction in terms of both hardware and IT training) are all too often not given enough consideration; perhaps lip service, perhaps IT actually liked the idea of new apps and architectures themselves. But instead of preparing by ensuring IT staff expertise FIRST, the business would plunge into the new technology unaware, outsource the required core expertise, and (typically) allow selected IT staff to have at the juicy new project from a backseat role. All too often these staffers would turn around and leave to catapult their careers higher with their newfound “expertise”.
So the “glass tower” of IT was overrun. No longer could IT dictate when changes or new reports were to be completed, who had access to what, etc.
Now, every department finds out what the most popular software is for their tasks, and says to senior executives - “why aren’t we doing that?”. The senior executives all start asking the same question. The salesman is called in for the dog and pony show, IT gets their marching orders, and new servers come rolling in.
Thus we have IT shops with hundreds and oftentimes thousands of physical servers; maintaining them represents work that must be done (installing upgrades, installing new machines, removing old machines, etc.).
Thus we see the drive for virtualization of servers.
You want hundreds, thousands of servers? Well, IT went out and bought virtualization software, so they can provision you a set of new servers without having to purchase, wait for, and set up new hardware. Just clickety-click, bada bing, there are your new servers, let’s install this new software.
Is there overhead to the virtualization? Sure. But sorry to say for the people who created this article, it’s no showstopper with today’s hardware performance.
The inevitable downside? Of course - since it’s now so much easier to create servers, the decision to create new servers is made MUCH more easily today, with the predictable result that the number of virtual servers increases much faster than the number of physical servers used to increase. So IT departments continue buying hardware and continue struggling to keep up. The virtualization itself provides no direct help for keeping software updates applied to all these virtual servers, so IT can get buried trying to maintain them all. And to solve this problem there is the age-old solution of software-based automation and good old-fashioned figuring out of efficient ways to manage the configurations of the software applications running on all those virtual servers.
Hmm ... come to think of it, seems like he is suggesting Windows to me ;)
That proprietary 4GL is still running today, still being modified and enhanced, still compiled on multiple platforms, and still distributed via binary to multitudes of companies. So, yes.
I think really what we're seeing is a failure in the CS-education system;
That's the truth of it right there. And I agree that OO isn't always the best fit. Therein lies the crux of the issue. They're not taught to solve the actual problem with the best tools. They're taught to write software to work around their own lack of knowledge. They don't know how to analyze a specific problem or requirement and then develop an efficient and effective solution. They write inefficient code because "hardware is cheap." They don't know enough to respect memory and bandwidth as the precious resources they still are.
This is true; but it does open the door to those who do.
They're taught to write software to work around their own lack of knowledge. They don't know how to analyze a specific problem or requirement and then develop an efficient and effective solution.
Tell me about it; I recently ran into a situation where randomization for "selecting candidates from a pool" was done via a loop of: get a random pick, and keep looping if it was already selected... fortunately I was allowed to replace this with a Fisher-Yates shuffle. (This was in actual production code.)
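For anyone who hasn't seen it, here is a minimal sketch of that Fisher-Yates replacement: shuffle the pool once, then take the first k entries, with no retry loop and no "already selected" checks. The pool contents, the size, and k are placeholders, and rand() stands in for whatever RNG production code would actually use.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Fisher-Yates: swap each element with a uniformly chosen earlier (or same) slot. */
static void shuffle(int *a, int n)
{
    for (int i = n - 1; i > 0; i--) {
        int j = rand() % (i + 1);   /* fine for a sketch; use a better RNG in production */
        int tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }
}

int main(void)
{
    int pool[10] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
    int k = 3;                      /* number of candidates to select */

    srand((unsigned)time(NULL));
    shuffle(pool, 10);
    for (int i = 0; i < k; i++)
        printf("selected candidate %d\n", pool[i]);
    return 0;
}
```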
That's the truth of it right there. And I agree that OO isn't always the best fit. Therein lies the crux of the issue. They're not taught to solve the actual problem with the best tools.
Which is why I'm rather against using C as a systems-level language; I don't think it's the best tool for the job. -- Sadly we're also seeing this sort of "go with the popular" mentality in application (especially Web) development: nothing else explains why anyone would willingly use PHP in any serious endeavor/project.
Agreed. Another problem with “because someone read about it” is the cloud.
I hate the cloud because I’m set in my ways, but mostly because my basic software development philosophy is “I’ll do it myself, thank you very much.”
I’m not keen on integrating third-party apps into my software and if I don’t have “physical” possession of the data, it ain’t my data anymore. IMHO, corporations are putting their entire businesses at risk when they lose control of their own data. The cloud is not appropriate for mission-critical data.
UNIX is no longer any one kernel, it’s a philosophy. A philosophy that has been tested, tweaked, broken, fixed, and pounded upon for decades.
Its strength is in its diversity and easy modification.
Maybe in ten years we’ll all be using GNU’s HURD, but we’ll still call it all “UNIX”.
All you haters better jump on the train.
I think you and I have lived the same life in IT. I’ve experienced the exact same things. Don’t even get me started on having to keep up with software licenses on all those cores.
re: randomization ... *snort*
Don’t get me started telling stories. I once had to fix a bug in some software that loaded all of the records from a file into an array for the sole purpose of counting the number of records in the file. The database had an integrated method that returned the number of records. And then it cleared the array and began reloading the records in order to update the contents of one field in each record.
*chuckle* I’ve written some things in LAMP because the shop required low cost (read free) and needed something quick and dirty. It was ... quick and dirty. Had to grit my teeth.
I’m a C-lover so we disagree there, but I am always open to other tools.
Yeah; sometimes the better/other 'tools' are really amazing, but you just don't know how to use them (I'm like that with FORTH; it's an intriguing little language... but I can't do jack w/ it yet). Ada's got a lot of great stuff in it (I'm still fairly bad at using it) -- but one thing that impresses me about it is that it was designed to be maintainable (part of the "programming as a human activity" ethos), and I think it really shows in how the new Pre- and Post-conditions (which never go stale due to code/[annotated-]comment impedance mismatch), type invariants (e.g. a point on a unit circle always has to have x**2+y**2 = 1; in Ada 2012 you can specify this [or even that the signature in a header is valid]), and the new quantified expressions for all and for some (i.e. there exists) all play together.
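Ada 2012 expresses those contracts declaratively (Pre, Post, Type_Invariant, for all/for some); C can only approximate them with runtime assertions, but the sketch below shows the same ideas applied to the unit-circle example: a precondition, a postcondition, and the x**2+y**2 = 1 invariant. The rotate function and the tolerance are invented for illustration.

```c
#include <assert.h>
#include <math.h>
#include <stdio.h>

typedef struct { double x, y; } unit_point;

/* Type invariant: the point must stay on the unit circle. */
static int on_unit_circle(unit_point p)
{
    return fabs(p.x * p.x + p.y * p.y - 1.0) < 1e-9;
}

/* Rotate a point on the unit circle by `angle` radians. */
static unit_point rotate(unit_point p, double angle)
{
    assert(on_unit_circle(p));                     /* precondition */
    unit_point r = {
        p.x * cos(angle) - p.y * sin(angle),
        p.x * sin(angle) + p.y * cos(angle),
    };
    assert(on_unit_circle(r));                     /* postcondition: invariant preserved */
    return r;
}

int main(void)
{
    double half_pi = acos(-1.0) / 2.0;
    unit_point p = { 1.0, 0.0 };
    unit_point q = rotate(p, half_pi);
    printf("(%.3f, %.3f)\n", q.x, q.y);            /* roughly (0, 1) */
    return 0;
}
```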
*chuckle* I've written some things in LAMP because the shop required low cost (read free) and needed something quick and dirty. It was ... quick and dirty. Had to grit my teeth.
When I was programming full-time (PHP) we were using LAMP; I'm not sure, but I think that (and maybe your use) might have been a violation of the terms for the free usage license.
Don't get me started telling stories. I once had to fix a bug in some software that loaded all of the records from a file into an array for the sole purpose of counting the number of records in the file. The database had an integrated method that returned the number of records. And then it cleared the array and began reloading the records in order to update the contents of one field in each record.
Too bad; I actually like them -- there's a lot you can learn listening to stories. The Unix-Haters Handbook, for instance, was an amusing and surprisingly insightful collection of stories... despite its age it made some points which are still valid today: one of which is that trying to impose state on a system that was designed to be stateless is... troublesome. (This is why, IMO, HTML 5 [and CSS] is such a bad idea: they're trying to make HTML, which was designed to keep content independent of layout [leaving layout to the browser], carry the layout as well -- IOW, they're trying to go directly against the whole idea of HTML.)
Well said. Much of the rest of your post was pretty much on target as well. I've seen the server creep in virtualized environments first-hand. I think it's amazing how fast virtual server sprawl strikes virtual environments, and how incredibly wasteful it is. Thankfully, it's not as wasteful as physical server sprawl was.
Last job I was at had a huge virtualization push several years ago. One thing that I thought was interesting was some of the stuff that you'd hear from PHBs, because of what they'd been told. They'd look at the Unix folks with a critical eye, and point out that the Windows team was getting almost a hundred-to-one consolidation (or some other ridiculous number), and would ask why consolidation on the Unix side of the house was lucky to get 15:1 (in some cases much, much less). My response was incredulity at their thinking processes. The reason you could cram so many Windows boxes into the virtual environment was because so few of them were actually doing much of anything, because each server was dedicated to a specific application/site or whatever, whereas we'd have a Unix server running Apache that had a hundred IPs plumbed on the box because of the number of sites it was supporting, or we'd have a WebLogic server running multiple clusters that contained multiple JVMs, pretty much maxing out the hardware of the box, both memory and CPU. How the hell do you virtualize that? Well, you can, but you're not gaining much from the consolidation; instead you're doing it because of some of the other capabilities it gives you, such as the abstraction from the hardware itself.
My other question to the PHB would be why they were so happy about what was happening on the Windows side of the house, when what all that consolidation plainly showed was the incredible waste of resources all those single-purpose Windows machines had represented. Yeah, you're saving money now in relation to the flat out waste you had before, but you should keep in mind what had come before, and perhaps consider how that same methodology was impacting all this new 'virtualized' hardware.
Sadly, I never really saw anything come from those discussions, because there was no interest in assigning a dollar cost to the idea of 'one app per server'.