Posted on 05/16/2013 6:39:16 AM PDT by ShadowAce
High Scalability has a fascinating article up that summarizes a talk by Robert Graham of Errata Security on the development choices needed to support 10 million concurrent connections on a single server. From a small data center perspective, the numbers he is talking about seem astronomical, but not unbelievable. With a new era of Internet-connected devices dawning, the time may have come to question the core architecture of Unix, and therefore of Linux and BSD as well.
The core of the talk seems to be that the kernel is too inefficient in how it handles threads and packets to meet the speed and scalability requirements of web-scale computing. Graham recommends moving as much of the data processing as possible out of the kernel and into the application. This means writing device drivers, handling threading and multiple cores, and allocating memory yourself. Graham uses the example of scaling Apache to illustrate how depending on the operating system can actually slow the application down when it is handling several thousand connections per second.
Why? Servers could not handle 10K concurrent connections because of O(n^2) algorithms used in the kernel.
Two basic problems in the kernel:
Connection = thread/process. As a packet came in, the kernel walked down all 10K processes to figure out which thread should handle it.
Connections = select/poll (single thread). Same scalability problem: each packet had to walk a list of sockets.
Solution: fix the kernel to do lookups in constant time.
Threads now context switch in constant time, regardless of the number of threads.
This came with new scalable epoll()/IOCompletionPort APIs that do constant-time socket lookup.
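For readers who have not worked with it, here is a minimal sketch of the pattern that constant-time lookup enables on Linux (my illustration, not code from the talk): each socket is registered with epoll once, and epoll_wait() then returns only the descriptors that are actually ready, instead of the application handing the kernel its entire socket list on every call the way select()/poll() required. The port number and buffer size are arbitrary, and error handling is omitted for brevity.

```c
/* Minimal epoll-based accept/echo loop (Linux).  Unlike select()/poll(),
 * the kernel is not asked to rescan every registered socket on each call:
 * epoll_wait() returns only the descriptors that are ready, so the
 * per-event cost stays roughly constant as the connection count grows.
 * Error handling is trimmed for brevity. */
#include <sys/epoll.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

#define MAX_EVENTS 1024

int main(void)
{
    int listener = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = { 0 };
    addr.sin_family = AF_INET;
    addr.sin_port = htons(8080);
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    bind(listener, (struct sockaddr *)&addr, sizeof(addr));
    listen(listener, SOMAXCONN);

    int epfd = epoll_create1(0);
    struct epoll_event ev = { 0 };
    ev.events = EPOLLIN;
    ev.data.fd = listener;
    epoll_ctl(epfd, EPOLL_CTL_ADD, listener, &ev);   /* register interest once */

    struct epoll_event events[MAX_EVENTS];
    for (;;) {
        int n = epoll_wait(epfd, events, MAX_EVENTS, -1);
        for (int i = 0; i < n; i++) {
            if (events[i].data.fd == listener) {
                /* New connection: add it to the same epoll instance. */
                int client = accept(listener, NULL, NULL);
                struct epoll_event cev = { 0 };
                cev.events = EPOLLIN;
                cev.data.fd = client;
                epoll_ctl(epfd, EPOLL_CTL_ADD, client, &cev);
            } else {
                /* Ready client socket: echo whatever arrived. */
                char buf[4096];
                ssize_t len = read(events[i].data.fd, buf, sizeof(buf));
                if (len <= 0)
                    close(events[i].data.fd);   /* peer closed or error */
                else
                    write(events[i].data.fd, buf, len);
            }
        }
    }
}
```

Windows' I/O completion ports, mentioned in the same breath, rest on the same principle: the kernel notifies the application about completed work rather than re-scanning every connection.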
The talk touches on a concept I've been mulling over for months: the inherent complexity of modern data centers. If you are virtualizing, and you probably are, there are most likely several layers of abstraction between your application and the hardware that must be unwound before the code it is trying to execute actually reaches the CPU, or its data is written to disk. Does virtualization actually solve the problem we have, or is it an approach born of spending far too long in the box? That Graham's solution for building systems that scale for the next decade is to bypass the OS entirely and talk directly to the network and hardware tells me that we might be seeing the first slivers of dusk for the kernel's useful life serving up web applications.
So what would come after Linux? Researchers in the UK may have come up with an answer in Mirage. In a paper quoted on the High Scalability site, the researchers describe Mirage:
Our prototype (dubbed Mirage) is unashamedly academic; it extends the Objective Caml language with storage extensions and a custom run-time to emit binaries that execute as a guest operating system under Xen.
Mirage is, as stated, very academic, and currently very alpha quality, but the idea is compelling: writing applications that compile directly into a complete machine image, something that runs on its own without an operating system. Of course, the first objection that comes to mind is that this would lead to writing for specialized hardware, and would mean going back in time thirty years. However, combining a next-generation language with a project like Open Compute would provide open specifications and community-driven development at a low level, ideal for eking out as much performance as possible from the hardware.
No matter which way the industry turns to solve the upcoming challenges of an exploding Internet, the next ten years are sure to be a wild ride.
“it’s”?
I’ve always said that most writers don’t know how to write.:)
Sounds like DOS.
No—DOS is an OS. He’s talking about incorporating device drivers into applications. No OS on the machine at all.
LOL!
By way of comparison, Windows XP's kernel has been patched so much that it is almost 95% unavailable except through a few secure-channel APIs, and that is a 15-year-old OS. Unix, on the other hand, has been around for a very long time and is reaching the end of its useful lifetime due to an inherent programmatic shortfall in its very core.
The latest generation of Windows and Apple kernels are actually isolated behind an abstraction layer that performs all of the calls to the kernel, essentially freeing up the kernel for higher-order processing. I'm amused that Unix would have this shortcoming considering how successful it's been as an OS, but I suppose even giants must eventually fall.
I'm pretty sure one could do that in DOS.
Essentially an OS could be written into an app to take over the machine.
Yes, assembly/machine language can provide the best performance, but it was a nightmare to write and maintain. Sounds like someone is just fishing for funding. Why would you not scale up to support that many connections, with the added benefit of balancing and redundancy?
It doesn't sound maintainable at all.
Thought-provoking article. I guess I'd have three points.
1. If the number of connections coming into a web server becomes too large, then just spawn multiple instances (probably virtual) and load balance between them. In other words, this really may not be as real a problem as the article would have you believe.
2. Running apps on bare metal - I mean, you could do it, but you'd have to reinvent a lot of wheels: memory management, TCP/IP stacks, disk I/O, semaphores and locks, and the list goes on. At some point someone would say "I can do all that low-level stuff for you so you can focus on your app," and you'd be back to a kernel of one sort or another in pretty short order.
What might be interesting is to have a control plane that runs on top of a conventional kernel and a data plane that bypasses all that stuff and just retrieves the data. I'm not completely sure how that would work, but roughly speaking I could imagine it (a rough sketch follows after these points).
3. The other practical thing that could be done is to try to optimize the pressure points in the existing kernels. Find a way to do lookups in constant or log time instead of O(N). This approach will likely happen.
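To make the control-plane/data-plane split in point 2 a bit more concrete, here is a rough sketch (an illustration, not anything from the talk) using a Linux AF_PACKET socket: the ordinary kernel stack keeps handling everything else, while the application pulls raw frames off one interface and parses them itself. The interface name "eth0" is an assumption, and real kernel-bypass data planes (DPDK, netmap, PF_RING) go much further, mapping the NIC's rings into user space so there is not even a per-packet system call.

```c
/* Rough sketch of a user-space "data plane": pull raw Ethernet frames
 * off one interface with an AF_PACKET socket and parse them in the
 * application, while the conventional kernel stack is left in place as
 * the "control plane" for everything else.  Illustration only; real
 * bypass stacks (DPDK, netmap, PF_RING) map NIC rings into user space
 * to avoid even the per-packet recv() system call. */
#include <sys/socket.h>
#include <linux/if_packet.h>
#include <linux/if_ether.h>
#include <net/if.h>
#include <arpa/inet.h>
#include <unistd.h>
#include <stdio.h>

int main(void)
{
    /* ETH_P_ALL = see every protocol; needs root or CAP_NET_RAW. */
    int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
    if (fd < 0) {
        perror("socket");
        return 1;
    }

    /* Bind to a single interface; "eth0" is an assumption. */
    struct sockaddr_ll sll = { 0 };
    sll.sll_family   = AF_PACKET;
    sll.sll_protocol = htons(ETH_P_ALL);
    sll.sll_ifindex  = if_nametoindex("eth0");
    bind(fd, (struct sockaddr *)&sll, sizeof(sll));

    unsigned char frame[2048];
    for (;;) {
        ssize_t len = recv(fd, frame, sizeof(frame), 0);
        if (len <= 0)
            continue;
        /* From here on the application owns the parsing (Ethernet, IP,
         * TCP), which is exactly the wheel-reinvention point 2 warns
         * about.  We just report the frame length. */
        printf("got a %zd-byte frame\n", len);
    }
}
```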
In my not so humble opinion, this is a bad approach. The vast majority of errors, overruns and causes for BSOD are in the driver layer. The better approach is stability first and then performance improvements. To that end, Minix 3 would be the better approach.
As for the socket-to-app issue, there are a number of approaches that can be taken. An index of sockets, dynamically reordered based upon the most recent traffic, would shorten the lookup time in most high-concurrency situations.
However, I would argue that putting such a load on a single box is not at all a wise choice. Lose that box and lose 10,000 connections!?!?!? Sounds risky to me. Far better to break the load up over a larger base of servers.
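On the socket-index idea above, here is a toy sketch (illustrative names and sizes, not kernel code) of what constant-time lookup means in practice: hash the remote address and port of an arriving packet into a bucket and scan only that short chain, instead of walking every open connection. Moving a hit to the front of its chain is one crude way to get the "most recent traffic first" ordering the comment suggests.

```c
/* Toy sketch of indexed socket lookup: hash the remote (IP, port) of an
 * arriving packet into a bucket and scan only that short chain, instead
 * of walking every open connection as the old O(n) select()-era path
 * did.  A hit is moved to the front of its chain, a crude form of
 * "most recent traffic first" ordering.  All names and sizes are
 * illustrative, not taken from any real kernel. */
#include <stdint.h>
#include <stddef.h>

#define BUCKETS 65536           /* power of two, so mask instead of mod */

struct conn {
    uint32_t src_ip;
    uint16_t src_port;
    struct conn *next;          /* collision chain */
    /* ... per-connection state: TCP window, owning thread, etc. ... */
};

static struct conn *table[BUCKETS];

static unsigned hash(uint32_t ip, uint16_t port)
{
    /* Cheap mixing; a real stack would use a keyed hash to resist
     * attacker-chosen collisions. */
    uint32_t h = ip ^ ((uint32_t)port * 2654435761u);
    h ^= h >> 15;
    return h & (BUCKETS - 1);
}

void conn_insert(struct conn *c)
{
    unsigned b = hash(c->src_ip, c->src_port);
    c->next = table[b];
    table[b] = c;
}

/* Called per inbound packet: expected O(1), independent of how many
 * connections are open overall. */
struct conn *conn_lookup(uint32_t src_ip, uint16_t src_port)
{
    unsigned b = hash(src_ip, src_port);
    struct conn *prev = NULL;
    for (struct conn *c = table[b]; c; prev = c, c = c->next) {
        if (c->src_ip == src_ip && c->src_port == src_port) {
            if (prev) {                 /* move hit to front of chain */
                prev->next = c->next;
                c->next = table[b];
                table[b] = c;
            }
            return c;
        }
    }
    return NULL;                        /* no such connection */
}
```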
I agree with your third point. I foresee the kernels evolving into something much more efficient, rather than the whole ecosystem changing.
Genode is interesting.
http://genode.org/documentation/architecture/index
http://genode.org/documentation/general-overview/index
Hey I know! Replace that old worn out operating system with one of the new modern ones, you know, like Windows 8. That should fix all the problems, right?
Exactly. Hardware is cheap - even cheaper than a whole team of Indian H1B's. :)
There is a reason why we are where we are now. This particular theoretical approach would eventually morph into an OS as people incorporated more base support into each iteration. I think that we are doing a grave disservice to new technologists by not teaching them the technological history of the last 40 decades.
Sorry, I meant 4 decades.
heh—40 works for me too
This article is an enormous pile of crap.
It’s talking about a problem that has no practical significance.
ANY excuse to beat up on Unix, because there are FREE versions of it.
The real problem is from a business point of view. Computers are so fast today that massive applications can be run on a $500 PC. Well, IF (big IF) they are well-written.
We find business customers simply NOT NEEDING more computing power.
What to do ?
Create new buzzwords, create new things for them to do.
Everyone needs a “data warehouse”.
Buy new servers, create copies of your database on them in different forms. Mix and mash the data six ways to Sunday. Write all sorts of reporting tools, do all sorts of analysis (business intelligence buzzword). Spend millions on hardware and consultants to do things like tweak your pricing structure. Now the customer can come up with math to verify for senior management that they are improving top- and bottom-line results. And global warming is definitely man-made.
Back to just how fast computers now are.
Let’s say you have a machine that executes a BILLION instructions every second. Oh, and it has 4 of those processors on its one chip.
It has a network card that will transfer a GIGABYTE every second.
It has 96 GIGABYTES of main memory, i.e., RAM.
It has a TERABYTE of disk storage and can transfer HUNDREDS of MEGABYTES per second to and from it.
And the dang machine costs $600.
Sounds to me like a single person could write ridiculously high-volume applications compared to even ten years ago, and run them on their PC.
Once you get into the realm of businesses that have some capital to throw around, you can EASILY and CHEAPLY have racks of servers, with LOAD BALANCING, to support millions of simultaneous online users.
Now, if your programming staff is a bunch of FREAKING MORONS, you’ll be buried in problems, to be sure. But then again, that’s always been true and always will be.
But if you have a small team of SMART PEOPLE, who RTFM and program accordingly, performance SHOULD BE a non-issue.
It boggles the mind how stupid people can be.