Posted on 03/26/2015 8:27:11 PM PDT by Utilizer
It's a commonly held belief among software developers that avoiding disk access in favor of doing as much work as possible in memory will result in shorter runtimes. The growth of big data has made time-saving techniques such as performing operations in memory more attractive than ever for programmers. New research, though, challenges the notion that in-memory operations are always faster than disk-access approaches and reinforces the need for developers to better understand system-level software.
These findings were recently presented by researchers from the University of Calgary and the University of British Columbia in a paper titled "When In-Memory Computing is Slower than Heavy Disk Usage". They tested the assumption that working in memory is necessarily faster than doing lots of disk writes using a simple example. Specifically, they compared the efficiency of alternative ways to create a 1 MB string and write it to disk. An in-memory version concatenated chunks of a fixed size (1 byte, then 10, then 1,000, then 1,000,000 bytes) in memory, then wrote the result to disk in a single write. The disk-only approach wrote the chunks directly to disk as they were produced (e.g., 1,000,000 writes of 1-byte strings, 100,000 writes of 10-byte strings, and so on).
(Excerpt) Read more at itworld.com ...
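For anyone curious, here is a rough Python sketch of the two approaches as the excerpt describes them (the chunk sizes come from the article; the code itself is illustrative, not the authors' own):

```python
# Rough re-creation of the experiment described above (illustrative only,
# not the authors' code): build a 1 MB payload either by concatenating
# small chunks in memory and writing once, or by writing each chunk
# straight to disk as it is produced.
import os
import time

TOTAL_BYTES = 1_000_000  # 1 MB target, as in the article

def in_memory(chunk_size: int, path: str) -> float:
    """Concatenate chunks in memory, then do a single disk write."""
    start = time.perf_counter()
    chunk = b"x" * chunk_size
    data = b""
    for _ in range(TOTAL_BYTES // chunk_size):
        # bytes are immutable, so each concatenation copies the whole
        # growing buffer; the 1-byte case is therefore intentionally slow
        data += chunk
    with open(path, "wb") as f:
        f.write(data)
    return time.perf_counter() - start

def disk_only(chunk_size: int, path: str) -> float:
    """Write each chunk directly to disk (many small buffered writes)."""
    start = time.perf_counter()
    chunk = b"x" * chunk_size
    with open(path, "wb") as f:
        for _ in range(TOTAL_BYTES // chunk_size):
            f.write(chunk)
    return time.perf_counter() - start

if __name__ == "__main__":
    for size in (1, 10, 1_000, 1_000_000):
        t_mem = in_memory(size, "mem.bin")
        t_disk = disk_only(size, "disk.bin")
        print(f"chunk={size:>9}B  in-memory={t_mem:.3f}s  disk-only={t_disk:.3f}s")
    os.remove("mem.bin")
    os.remove("disk.bin")
```

Note that the "disk" writes above go through the OS's buffered I/O, so most of them land in the page cache rather than on the platter, which is presumably a big part of why the disk-heavy version can hold its own.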
Yes, they are faster than hard disks. And they are reliable enough now.
What if there’s little contention because you use queuing (a bus) and also scale out?
What if your algorithm is smart enough to somehow prevent contention?
Anyone who uses a buffered database knows this is nonsense.
You might be able to adjust for contention, allocating more space to write buffers when the disk queue length increases but memory is available.
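Something like this hypothetical sketch, maybe (disk_queue_length() is a stand-in for a real measurement, such as the in-flight I/O figures in /proc/diskstats on Linux; nothing here is a standard API):

```python
# Hypothetical sketch of the "adjust for contention" idea: hold more data
# in memory while the disk looks busy, flush once the queue drains.
import random

def disk_queue_length() -> int:
    # Placeholder for a real metric (e.g. parsed from /proc/diskstats).
    return random.randint(0, 8)

class AdaptiveWriter:
    def __init__(self, path: str, base_buffer: int = 64 * 1024,
                 max_buffer: int = 8 * 1024 * 1024):
        self.f = open(path, "wb")
        self.base, self.limit = base_buffer, max_buffer
        self.threshold = base_buffer
        self.pending = bytearray()

    def write(self, data: bytes) -> None:
        self.pending += data
        if disk_queue_length() > 2:
            # Disk is busy: buffer more before flushing, up to the ceiling.
            self.threshold = min(self.threshold * 2, self.limit)
        else:
            # Disk is idle: fall back to the small default buffer.
            self.threshold = self.base
        if len(self.pending) >= self.threshold:
            self.flush()

    def flush(self) -> None:
        self.f.write(self.pending)
        self.pending.clear()

    def close(self) -> None:
        self.flush()
        self.f.close()
```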
Maybe scale out and have separate dedicated server machines, like the big-data solutions (Hadoop and so on)? Connect multiple machines with a fast fiber-optic backplane.
SANs, after all, partition loads in exactly this fashion with LUNs, right? That’s why they can handle more throughput, right?
As far as adjusting for contention, why not use something that partitions the writes based on some natural key in the data?
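A minimal sketch of that idea, with made-up file names and keys (nothing here comes from the article):

```python
# Key-based write partitioning: route each record to one of N append-only
# files based on a hash of a "natural key", so concurrent writers rarely
# contend on the same file.
import zlib

NUM_PARTITIONS = 4

def partition_for(key: str) -> int:
    # Stable hash so the same key always lands in the same partition.
    return zlib.crc32(key.encode()) % NUM_PARTITIONS

def write_record(key: str, payload: bytes) -> None:
    with open(f"part-{partition_for(key)}.log", "ab") as f:
        f.write(payload + b"\n")

write_record("customer-42", b"order placed")
write_record("customer-17", b"order shipped")
```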
I think the bottom line is that there is no "one size fits all" answer to the question.
Maybe not for this specific scenario: writing the same string over and over. In fact, concatenating the same string over and over.
After all, don’t many disk controllers (most, in fact?) have their own onboard caching mechanism, including a write cache?
And isn’t it possible that the controller might in fact be able to operate in parallel with the CPU in this specific scenario? Wasn’t there a DOS service (interrupt 26H, the absolute disk write) that allowed writing sectors and would in fact write more than one sector from the same buffer?
Therefore, if the *same* write command (with the same untouched buffer) were repeated over and over, perhaps there is in fact some truth to this.
Huh? What?