Posted on 04/27/2015 5:48:18 AM PDT by ShadowAce
Linux version 4.0 was released on 15 April, and one of the most discussed new features in this release is "no reboot" kernel patching. With the major distros committing to support the 4.0 kernel and its features (including "no reboot" patching) at some point this year, it's a good time to take a look at what this feature actually does and what difference it will make for you.
First of all, what does it actually mean? Well, for once, this is a feature with a name that describes what it does pretty well. With versions of Linux before 4.0, when the kernel is updated via a patch, the system needs to reboot.
Kernel patches are released for a number of reasons, but fixing security holes is the most common, which is why it's important to install them as soon as possible.
Unlike many other operating systems, Linux can update most parts of the system without a reboot, but the kernel is another matter. Every running process interacts intimately with the kernel, so swapping out parts of it while it is running is quite risky.
On the other hand, rebooting the computer is irksome, and in some cases, where uptime is important, it can be a real issue. This is why "no reboot" kernel patching has been a priority for many administrators.
Recognizing this need, two companies have been hard at work on two different solutions: Red Hat has been working on kpatch, and SUSE on kGraft. Both are designed to accomplish the same task, but they take different approaches and have different strengths.
Kpatch freezes every process and then reroutes calls from the old kernel functions to the new, patched versions before removing the old code. Because it handles every running process in one sweeping move, it is fast: the whole switch takes somewhere between one and forty milliseconds. However, during that window the processes are frozen, which means there is some downtime. It is a mere fraction of a second, but in certain situations even that may be unacceptable.
kGraft, on the other hand, migrates threads one by one as they make system calls (without forcing them to freeze first), until all of the threads are running the patched code. At that point the patch is fully installed and the old code is removed. This approach takes longer to complete the patch, but it does so without any downtime.
Having solved the same problem separately, from two different angles, the two companies then came together in October 2014. They looked at how their different approaches could be fused together, and the result of this merge has been pushed into version 4.0 of the kernel.
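For the curious, the merged infrastructure is exposed to kernel developers as a small API, usually referred to as "livepatch". What follows is a condensed sketch modeled on the sample module shipped with the kernel source (samples/livepatch/livepatch-sample.c); it redirects a single kernel function to a patched replacement. The klp_* calls shown reflect the 4.0-era interface, which has been reworked in later releases, so treat this as an illustration rather than a reference.

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/seq_file.h>
#include <linux/livepatch.h>

/* Replacement for the kernel's cmdline_proc_show(); anything reading
 * /proc/cmdline hits this function once the patch is enabled. */
static int livepatch_cmdline_proc_show(struct seq_file *m, void *v)
{
	seq_printf(m, "%s\n", "this has been live patched");
	return 0;
}

/* Map the old function, by name, to its replacement. */
static struct klp_func funcs[] = {
	{
		.old_name = "cmdline_proc_show",
		.new_func = livepatch_cmdline_proc_show,
	}, { }
};

/* A NULL object name means the patched functions live in vmlinux
 * rather than in a loadable module. */
static struct klp_object objs[] = {
	{
		.funcs = funcs,
	}, { }
};

static struct klp_patch patch = {
	.mod = THIS_MODULE,
	.objs = objs,
};

static int livepatch_init(void)
{
	int ret;

	ret = klp_register_patch(&patch);
	if (ret)
		return ret;
	ret = klp_enable_patch(&patch);
	if (ret) {
		WARN_ON(klp_unregister_patch(&patch));
		return ret;
	}
	return 0;
}

static void livepatch_exit(void)
{
	WARN_ON(klp_disable_patch(&patch));
	WARN_ON(klp_unregister_patch(&patch));
}

module_init(livepatch_init);
module_exit(livepatch_exit);
MODULE_LICENSE("GPL");

In practice, administrators rarely write these modules by hand; tooling such as kpatch-build generates them from a source diff, and once loaded a patch shows up under /sys/kernel/livepatch/, where it can be inspected or disabled.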
So, having described what "no reboot" kernel patching is, and how it works, the next question most users will have is "what difference does it make?"
For desktop users, the difference is relatively trivial. Without 4.0, installing a kernel patch means rebooting the system, which means saving your work and interrupting your workflow. This is irritating and can cause a small hiccup in your productivity. If everyone in a medium or large office has to install a patch on the same day, the hit to productivity is a bit larger. Still, this is a relatively small cost, and it is worthwhile to ensure security.
On the other hand, some servers and critical real-time applications must not be taken down without advance scheduling, even for a few minutes. That is a pain when administrators need to keep the system secure and a patch is released to repair a newly discovered security hole. In cases like these, no-reboot patching becomes a real boon.
But this doesn't mean that system reboots are gone forever. Even on a system with the Linux 4.0 kernel, there will be security updates that still require a reboot, because other, non-kernel components also need patching, and some of them require a reboot as part of the process.
Some critics therefore claim that pouring so much effort and time into no-reboot patching misses the real target: the feature was developed to avoid the cost of rebooting a system, so perhaps developers should instead be trying to make it less expensive to reboot a Linux system.
This is an introduction of a vulnerability.
Servers can avoid reboots for long periods of time, but not forever.
Once you need serious uptime, you will have more than one server providing the same services in either a failover / load balancing type arrangement, and individual servers can be rebooted without creating a service interruption.
Much too risky to allow realtime kernel patches for sake of convenience of no reboot, IMHO.
Depends on the situation. Some research applications can run for days at a time—and they need that uptime. This allows for patching during a multi-day/week job without having to restart that job.
Great! Modern operating systems get something VAX/VMS had 35 years ago. Progress!
It seems that more and more applications and software packages have memory leaks that tend to hold on to chunks of memory, and normally about the only solution is a reboot.
Does OpenVMS have this? I worked with DEC VAX/VMS in the 80's, but haven't played with OpenVMS.
Now you’re down to the set of servers running applications that:
0) have apps that need to run for days without stopping (app too dumb to support saving current state and restarting, but advanced enough to be doing something super critical)
1) the same super-critical apps that are doing some “research” also read and write data over the internet. (apparently the internet-based services they communicate with never go down)
2) the same systems are in dire need of getting security fixes to the kernel applied within a couple days of their release.
Sounds like a TV show, lol. Maybe a sci-fi thriller.
Wait... just a minute...
Hold on, I have to install this new security patch for my kernel...
...
...
...
Ah, ok, done.
Whew!
The terrorist/state-sponsored hackers almost got me!
This is probably another wonderful development by the NSA.
LOL, yeah. Dang memory leaks.
I guess it helps that all the script-kiddie languages use garbage collection, so it’s not uncommon for programmers not to bother worrying about deallocation at all.
Remember the floating-point accumulating error on the old Patriot missile system?
Their workaround was to reboot every so often. Otherwise, according to reports, accuracy was not good.
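(Aside: the arithmetic behind that accumulating error is easy to sketch. Using the commonly cited figures of roughly 0.000000095 s of truncation error per 0.1 s clock tick and a Scud closing speed of about 1,676 m/s, a hundred hours of uptime works out to about a third of a second of clock drift and several hundred metres of tracking error. The short C program below just runs those numbers; the constants are the widely reported approximations, not exact values.)

#include <stdio.h>

int main(void)
{
    /* The Patriot's clock counted tenths of a second as an integer and
     * converted to seconds by multiplying by 0.1 held in a 24-bit
     * fixed-point register. 0.1 has no exact binary representation, so
     * each tick lost roughly 0.000000095 s (the commonly reported figure). */
    const double error_per_tick = 0.000000095;      /* seconds lost per tick */
    const double ticks = 100.0 * 3600.0 * 10.0;     /* ~100 hours of uptime */
    const double drift = error_per_tick * ticks;    /* ~0.34 s of clock drift */
    const double scud_speed = 1676.0;               /* m/s, approximate */

    printf("clock drift after 100 hours: %.2f s\n", drift);
    printf("tracking-gate error: %.0f m\n", drift * scud_speed);
    return 0;
}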
0) have apps that need to run for days without stopping (app too dumb to support saving current state and restarting, but advanced enough to be doing something super critical)
I haven't done application development for quite a while, but I did build in checkpoints for those apps that ran for days. I know my current environment has apps that run for days but do not seem to have checkpointing built in.
I guess software development, like all education, is rapidly going downhill.
Would only need to be a very subtle suggestion, just float the idea.
It’s such a cool feature, I’d hop on that right away!
Who wouldn’t want to write that, how cool; hot kernel updates. Very cool.
Well, technically.
Practically speaking though...
That is really cool.
Multi-system redundancy can make a good system, but less than a perfect one. If a system has to drop a transaction, something suffers.
Claims of risk should be backed up by more than just reiterating the claim. There are plausible risks to an approach like this, such as both kernels needing to support the kernel-to-kernel handoff, or a bug in that handoff itself. But those risks should be listed, not just hand-waved at.
Progress is good, what could go wrong?
If there is a problem, it will manifest in the imperfect handover of process state. I would not want to ever guarantee that every kernel update can be performed this way, or rely upon it. If there is a kernel bug it could negatively impact the ability to do the handoff.
I could see virtualizing the concept, being able to hand off the state of one machine to another machine not just on a local network, but in the cloud. Security would be “a matter of details.”
There would be the usual integrity issues, like assuring that the new kernel is a valid one. These are nothing new, they pertain to all software updates.
I have to admit, I am a tad disappointed. I was hoping that by Linux 4, we would have migrated to the Minix3 kernel, and by 4.2 would be adopting isolated device drivers.
sigh
But patch on the fly is good. Will have to revisit the RHEL certification.
Squeezing that last MIPS out of the machine has taken precedence over structurally elegant but CPU-hungry architectures.