Posted on 05/16/2013 6:39:16 AM PDT by ShadowAce
High Scalability has a fascinating article up that summarizes a talk by Robert Graham of Errata Security on the development choices needed to support 10 million concurrent connections on a single server. From a small data center perspective, the numbers he is talking about seem astronomical, but not unbelievable. With a new era of Internet-connected devices dawning, the time may have come to question the core architecture of Unix, and therefore of Linux and BSD as well.
The core of the talk seems to be that the kernel is too inefficient in how it handles threads and packets to meet the speed and scalability requirements of web-scale computing. Graham recommends moving as much of the data processing as possible out of the kernel and into the application. This means writing device drivers, handling threading and multiple cores, and allocating memory yourself. Graham uses the example of scaling Apache to illustrate how depending on the operating system can actually slow the application down when it is handling several thousand connections per second.
Why? Servers could not handle 10K concurrent connections because of O(n^2) algorithms used in the kernel.
Two basic problems in the kernel:
Connection = thread/process. As a packet came in, the kernel walked down all 10K processes to figure out which thread should handle it.
Connections = select/poll (single thread). Same scalability problem: each packet had to walk a list of sockets.
Solution: fix the kernel to do lookups in constant time.
Threads now context switch in constant time, regardless of the number of threads.
This came with new scalable epoll()/IOCompletionPort APIs that do constant-time socket lookup.
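For readers who have not worked with it, here is a minimal sketch of the pattern that constant-time lookup enables on Linux (my illustration, not code from the talk): each socket is registered with epoll once, and epoll_wait() then returns only the descriptors that are actually ready, instead of the application handing the kernel its entire socket list on every call the way select()/poll() required. The port number and buffer size are arbitrary, and error handling is omitted for brevity.

```c
/* Minimal epoll-based accept/echo loop (Linux).  Unlike select()/poll(),
 * the kernel is not asked to rescan every registered socket on each call:
 * epoll_wait() returns only the descriptors that are ready, so the
 * per-event cost stays roughly constant as the connection count grows.
 * Error handling is trimmed for brevity. */
#include <sys/epoll.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

#define MAX_EVENTS 1024

int main(void)
{
    int listener = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = { 0 };
    addr.sin_family = AF_INET;
    addr.sin_port = htons(8080);
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    bind(listener, (struct sockaddr *)&addr, sizeof(addr));
    listen(listener, SOMAXCONN);

    int epfd = epoll_create1(0);
    struct epoll_event ev = { 0 };
    ev.events = EPOLLIN;
    ev.data.fd = listener;
    epoll_ctl(epfd, EPOLL_CTL_ADD, listener, &ev);   /* register interest once */

    struct epoll_event events[MAX_EVENTS];
    for (;;) {
        int n = epoll_wait(epfd, events, MAX_EVENTS, -1);
        for (int i = 0; i < n; i++) {
            if (events[i].data.fd == listener) {
                /* New connection: add it to the same epoll instance. */
                int client = accept(listener, NULL, NULL);
                struct epoll_event cev = { 0 };
                cev.events = EPOLLIN;
                cev.data.fd = client;
                epoll_ctl(epfd, EPOLL_CTL_ADD, client, &cev);
            } else {
                /* Ready client socket: echo whatever arrived. */
                char buf[4096];
                ssize_t len = read(events[i].data.fd, buf, sizeof(buf));
                if (len <= 0)
                    close(events[i].data.fd);   /* peer closed or error */
                else
                    write(events[i].data.fd, buf, len);
            }
        }
    }
}
```

Windows' I/O completion ports, mentioned in the same breath, rest on the same principle: the kernel notifies the application about completed work rather than re-scanning every connection.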
The talk touches on a concept I've been mulling over for months: the inherent complexity of modern data centers. If you are virtualizing, and you probably are, there are most likely several layers of abstraction between your application and the hardware that must be unwound before the code it is trying to execute actually reaches the CPU, or its data is written to disk. Does virtualization actually solve the problem we have, or is it an approach born of spending far too long in the box? That Graham's solution for building systems that scale for the next decade is to bypass the OS entirely and talk directly to the network and hardware tells me that we might be seeing the first slivers of dusk for the kernel's useful life serving up web applications.
So what would come after Linux? Researchers in the UK may have come up with an answer in Mirage. In a paper quoted on the High Scalability site, the researchers describe Mirage:
Our prototype (dubbed Mirage) is unashamedly academic; it extends the Objective Caml language with storage extensions and a custom run-time to emit binaries that execute as a guest operating system under Xen.
Mirage is, as stated, very academic, and currently very alpha quality, but the idea is compelling: writing applications that compile directly into a complete machine image, something that runs on its own without an operating system. Of course, the first objection that comes to mind is that this would lead to writing for specialized hardware, and would mean going back in time thirty years. However, combining a next-generation language with a project like Open Compute would provide open specifications and community-driven development at a low level, ideal for eking out as much performance as possible from the hardware.
No matter which way the industry turns to solve the upcoming challenges of an exploding Internet, the next ten years are sure to be a wild ride.
“it’s”?
I’ve always said that most writers don’t know how to write.:)
Sounds like DOS.
No—DOS is an OS. He’s talking about incorporating device drivers into applications. No OS on the machine at all.
LOL!
By way of comparison, Windows XP's kernel has been patched so much that it is almost 95% unavailable except through a few secure-channel APIs, and that is a 15-year-old OS. Unix, on the other hand, has been around for a very long time and is reaching the end of its useful lifetime due to an inherent programmatic shortfall in its very core.
The latest generation of Windows and Apple kernels are actually isolated behind an abstraction layer that performs all of the calls to the kernel, essentially freeing up the kernel for higher-order processing. I'm amused that Unix would have this shortcoming considering how successful it's been as an OS, but I suppose even giants must eventually fall.
I'm pretty sure one could do that in DOS.
Essentially an OS could be written into an app to take over the machine.
Yes, assembly/machine language can provide the best performance, but it was a nightmare to write and maintain. Sounds like someone is just fishing for funding. Why would you not scale up to support that many connections, with the added benefit of balancing and redundancy?
It doesn't sound maintainable at all.
Thought-provoking article. I guess I'd have three points.
1. If the number of connections coming into a web server becomes too large, then just spawn multiple instances (probably virtual) and load balance between them. In other words, this really may not be as real a problem as the article would have you believe.
2. Running apps on bare metal - I mean, you could do it, but you'd have to reinvent a lot of wheels: memory management, TCP/IP stacks, disk I/O, semaphores and locks, and the list goes on. At some point someone would say "I can do all that low-level stuff for you so you can focus on your app," and you'd be back to a kernel of one sort or another in pretty short order.
What might be interesting is to have a control plane that runs on top of a conventional kernel and a data plane that bypasses all that stuff and just retrieves the data. I'm not completely sure how that would work, but roughly speaking I could imagine it (a rough sketch follows after these points).
3. The other practical thing that could be done is to try to optimize the pressure points in the existing kernels. Find a way to do lookups in constant or log time instead of O(N). This approach will likely happen.
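To make the control-plane/data-plane split in point 2 a bit more concrete, here is a rough sketch (an illustration, not anything from the talk) using a Linux AF_PACKET socket: the ordinary kernel stack keeps handling everything else, while the application pulls raw frames off one interface and parses them itself. The interface name "eth0" is an assumption, and real kernel-bypass data planes (DPDK, netmap, PF_RING) go much further, mapping the NIC's rings into user space so there is not even a per-packet system call.

```c
/* Rough sketch of a user-space "data plane": pull raw Ethernet frames
 * off one interface with an AF_PACKET socket and parse them in the
 * application, while the conventional kernel stack is left in place as
 * the "control plane" for everything else.  Illustration only; real
 * bypass stacks (DPDK, netmap, PF_RING) map NIC rings into user space
 * to avoid even the per-packet recv() system call. */
#include <sys/socket.h>
#include <linux/if_packet.h>
#include <linux/if_ether.h>
#include <net/if.h>
#include <arpa/inet.h>
#include <unistd.h>
#include <stdio.h>

int main(void)
{
    /* ETH_P_ALL = see every protocol; needs root or CAP_NET_RAW. */
    int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
    if (fd < 0) {
        perror("socket");
        return 1;
    }

    /* Bind to a single interface; "eth0" is an assumption. */
    struct sockaddr_ll sll = { 0 };
    sll.sll_family   = AF_PACKET;
    sll.sll_protocol = htons(ETH_P_ALL);
    sll.sll_ifindex  = if_nametoindex("eth0");
    bind(fd, (struct sockaddr *)&sll, sizeof(sll));

    unsigned char frame[2048];
    for (;;) {
        ssize_t len = recv(fd, frame, sizeof(frame), 0);
        if (len <= 0)
            continue;
        /* From here on the application owns the parsing (Ethernet, IP,
         * TCP), which is exactly the wheel-reinvention point 2 warns
         * about.  We just report the frame length. */
        printf("got a %zd-byte frame\n", len);
    }
}
```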
In my not so humble opinion, this is a bad approach. The vast majority of errors, overruns and causes for BSOD are in the driver layer. The better approach is stability first and then performance improvements. To that end, Minix 3 would be the better approach.
As for the socket-to-app issue, there are a number of approaches that can be taken. An index of sockets, dynamically reordered based upon the most recent traffic, would shorten the lookup time in most high-concurrency situations.
However, I would argue that putting such a load on a single box is not at all a wise choice. Lose that box and lose 10,000 connections!?!?!? Sounds risky to me. Far better to break the load up over a larger base of servers.
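On the socket-index idea above, here is a toy sketch (illustrative names and sizes, not kernel code) of what constant-time lookup means in practice: hash the remote address and port of an arriving packet into a bucket and scan only that short chain, instead of walking every open connection. Moving a hit to the front of its chain is one crude way to get the "most recent traffic first" ordering the comment suggests.

```c
/* Toy sketch of indexed socket lookup: hash the remote (IP, port) of an
 * arriving packet into a bucket and scan only that short chain, instead
 * of walking every open connection as the old O(n) select()-era path
 * did.  A hit is moved to the front of its chain, a crude form of
 * "most recent traffic first" ordering.  All names and sizes are
 * illustrative, not taken from any real kernel. */
#include <stdint.h>
#include <stddef.h>

#define BUCKETS 65536           /* power of two, so mask instead of mod */

struct conn {
    uint32_t src_ip;
    uint16_t src_port;
    struct conn *next;          /* collision chain */
    /* ... per-connection state: TCP window, owning thread, etc. ... */
};

static struct conn *table[BUCKETS];

static unsigned hash(uint32_t ip, uint16_t port)
{
    /* Cheap mixing; a real stack would use a keyed hash to resist
     * attacker-chosen collisions. */
    uint32_t h = ip ^ ((uint32_t)port * 2654435761u);
    h ^= h >> 15;
    return h & (BUCKETS - 1);
}

void conn_insert(struct conn *c)
{
    unsigned b = hash(c->src_ip, c->src_port);
    c->next = table[b];
    table[b] = c;
}

/* Called per inbound packet: expected O(1), independent of how many
 * connections are open overall. */
struct conn *conn_lookup(uint32_t src_ip, uint16_t src_port)
{
    unsigned b = hash(src_ip, src_port);
    struct conn *prev = NULL;
    for (struct conn *c = table[b]; c; prev = c, c = c->next) {
        if (c->src_ip == src_ip && c->src_port == src_port) {
            if (prev) {                 /* move hit to front of chain */
                prev->next = c->next;
                c->next = table[b];
                table[b] = c;
            }
            return c;
        }
    }
    return NULL;                        /* no such connection */
}
```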
I agree with your third point. I foresee the kernels evolving into something much more efficient, rather than the whole ecosystem changing.
Genode is interesting.
http://genode.org/documentation/architecture/index
http://genode.org/documentation/general-overview/index
Hey I know! Replace that old worn out operating system with one of the new modern ones, you know, like Windows 8. That should fix all the problems, right?
Exactly. Hardware is cheap - even cheaper than a whole team of Indian H1B's. :)
There is a reason why we are where we are now. This particular theoretical approach would eventually morph into an OS as people incorporated more base support into each iteration. I think that we are doing a grave disservice to new technologists by not teaching them the technological history of the last 40 decades.
Sorry, I meant 4 decades.
heh—40 works for me too
This article is an enormous pile of crap.
It’s talking about a problem that has no practical significance.
ANY excuse to beat up on Unix, because there are FREE versions of it.
The real problem is from a business point of view. Computers are so fast today that massive applications can be run on a $500 PC. Well, IF (big IF) they are well-written.
We find business customers simply NOT NEEDING more computing power.
What to do ?
Create new buzzwords, create new things for them to do.
Everyone needs a “data warehouse”.
Buy new servers, create copies of your database on them in different forms. Mix and mash the data six ways to Sunday. Write all sorts of reporting tools, do all sorts of analysis (business intelligence buzzword). Spend millions on hardware and consultants to do things like tweak your pricing structure. Now the customer can come up with math to verify for senior management that they are improving top- and bottom-line results. And global warming is definitely man-made.
Back to just how fast computers now are.
Let’s say you have a machine that executes a BILLION instructions every second. Oh, and it has 4 of those processors on its one chip.
It has a network card that will transfer a GIGABYTE every second.
It has 96 GIGABYTES of main memory, i.e., RAM.
It has a TERABYTE of disk storage and can transfer HUNDREDS of MEGABYTES per second to and from it.
And the dang machine costs $600.
Sounds to me like a single person could write ridiculously high-volume applications compared to even ten years ago, and run them on their PC.
Once you get into the realm of businesses that have some capital to throw around, you can EASILY and CHEAPLY have racks of servers, with LOAD BALANCING, to support millions of simultaneous online users.
Now, if your programming staff is a bunch of FREAKING MORONS, you’ll be buried in problems, to be sure. But then again, that’s always been true and always will be.
But if you have a small team of SMART PEOPLE, who RTFM and program accordingly, performance SHOULD BE a non-issue.
It boggles the mind how stupid people can be.