Posted on 03/26/2015 8:27:11 PM PDT by Utilizer
It's a commonly held belief among software developers that avoiding disk access in favor of doing as much work as possible in-memory will result in shorter runtimes. The growth of big data has made time-saving techniques such as performing operations in-memory more attractive than ever for programmers. New research, though, challenges the notion that in-memory operations are always faster than disk-access approaches and reinforces the need for developers to better understand system-level software.
These findings were recently presented by researchers from the University of Calgary and the University of British Columbia in a paper titled "When In-Memory Computing is Slower than Heavy Disk Usage." They tested the assumption that working in-memory is necessarily faster than doing many disk writes using a simple example. Specifically, they compared the efficiency of alternative ways to create a 1MB string and write it to disk. An in-memory version concatenated strings of fixed sizes (first 1 byte, then 10, then 1,000, then 1,000,000 bytes) in-memory, then wrote the result to disk in a single write. The disk-only approach wrote the strings directly to disk (e.g., 1,000,000 writes of 1-byte strings, 100,000 writes of 10-byte strings, etc.).
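A minimal sketch of the two strategies described above, written in Python (this is an illustration of the setup, not the authors' actual benchmark code; the chunk size here is fixed at 1,000 bytes, while the paper varied it from 1 byte to 1,000,000):

```python
import os
import tempfile
import time

CHUNK = b"x" * 1000   # piece size; the paper also tried 1, 10, and 1,000,000 bytes
TOTAL = 1_000_000     # target output: a 1 MB file

def in_memory(path):
    # Build the whole string in memory, then do a single disk write.
    s = b""
    for _ in range(TOTAL // len(CHUNK)):
        s += CHUNK                     # repeated concatenation (can copy heavily)
    with open(path, "wb") as f:
        f.write(s)

def disk_only(path):
    # Write each piece straight to the (buffered) file instead.
    with open(path, "wb") as f:
        for _ in range(TOTAL // len(CHUNK)):
            f.write(CHUNK)

for fn in (in_memory, disk_only):
    path = tempfile.mktemp()
    start = time.perf_counter()
    fn(path)
    print(f"{fn.__name__}: {time.perf_counter() - start:.4f}s")
    os.remove(path)
```

With small chunk sizes the in-memory version degrades sharply, because each concatenation can copy everything accumulated so far.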
(Excerpt) Read more at itworld.com ...
I barely know how to turn my computer on, but might a better title to this post be: Can you structure a problem that can be finished faster on disk than in-memory? It pays to be specific.
Given that nearly all operating systems use virtual memory, all bets are off anyway.
Makes sense. Instead of putting the string together in memory and writing it to disk when it’s completed, you are writing to the disk as it’s assembled. So you’re skipping a step.
Interesting comments at article. Many saying that the code was poorly written.
Cheating. If the objective is to build something on the disk, building it on the disk is going to be faster, duh.
Not only that, but when writing to disk it’s actually going to a RAM buffer. So it’s almost as fast anyway.
I can’t believe how retarded this is.
It makes no sense to concatenate the string in memory and then write it to disk, since in either case you will be writing the string sequentially to disk, anyway. Java, Python, C#, and other "managed" languages will always do this more slowly because their strings are immutable, which any decent coder knows.
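To put the immutability point above in concrete terms (a Python sketch; Java and C# coders would reach for StringBuilder the same way): each `+=` on an immutable string can allocate a fresh object and copy everything accumulated so far, while collecting pieces and joining once, or appending to a mutable buffer, avoids the repeated copies.

```python
# Immutable strings: each += may allocate a new object and copy
# all the data accumulated so far (quadratic in the worst case).
pieces = []
for _ in range(10_000):
    pieces.append("x")
joined = "".join(pieces)      # one final allocation and copy

# Mutable buffer, Python's rough analogue of Java's StringBuilder:
buf = bytearray()
for _ in range(10_000):
    buf += b"x"               # amortized O(1) append
assert len(joined) == len(buf) == 10_000
```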
Best approach: find out the allocation block size of a file on disk, pre-allocate one buffer of that size, memory-write to that buffer, flushing the whole block to disk when it's full; this avoids the penalty of zillions of memory allocations and garbage collections, and writes blocks of optimal size.
In most cases, just pre-allocating a moderately sized block of memory without knowing the best block size is good enough and may even be preferable, because the underlying OS is going to optimally block IO, and probably also cache that at a secondary level.
The key point is to avoid over-allocating managed objects, and again, most good coders know to do this, even if people writing stupid research papers don't...
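A minimal sketch of that buffered-block scheme in Python (the 4096-byte block size is an assumption for illustration; on POSIX systems the real filesystem block size can be queried, e.g. via `os.statvfs`):

```python
import os
import tempfile

BLOCK = 4096  # assumed block size; query the filesystem for the real one

def write_blocked(path, chunks):
    buf = bytearray()
    with open(path, "wb", buffering=0) as f:   # unbuffered: we manage the blocks
        for c in chunks:
            buf += c
            while len(buf) >= BLOCK:
                f.write(buf[:BLOCK])           # flush one full block
                del buf[:BLOCK]
        if buf:
            f.write(buf)                       # final partial block

path = tempfile.mktemp()
write_blocked(path, (b"x" * 10 for _ in range(100_000)))  # 1 MB of 10-byte pieces
assert os.path.getsize(path) == 1_000_000
os.remove(path)
```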
The real issue isn't memory vs. disk, it's what the language you are using does to perform the string concatenation operation.
The fastest technique will be one that does string concatenation in memory while the disk write of the previous string section is completing, so that the disk latencies are used for string building. Oh, and of course the string concatenation code should be designed to run in cache and avoid any virtual memory paging or extra memory copy operations.
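One way to sketch that overlap in Python (illustrative names and a simple writer thread; a real implementation might instead use asynchronous or double-buffered I/O): a background thread drains a queue to disk while the main thread keeps assembling the next piece, so string building happens during the disk latency.

```python
import os
import queue
import tempfile
import threading

def overlapped_write(path, chunks):
    q = queue.Queue(maxsize=8)          # bounded, so memory use stays small

    def writer():
        with open(path, "wb") as f:
            while True:
                piece = q.get()
                if piece is None:       # sentinel: no more data
                    break
                f.write(piece)

    t = threading.Thread(target=writer)
    t.start()
    for c in chunks:                    # "build" pieces while the writer runs
        q.put(c)
    q.put(None)
    t.join()

path = tempfile.mktemp()
overlapped_write(path, (b"y" * 1000 for _ in range(1024)))
assert os.path.getsize(path) == 1024 * 1000
os.remove(path)
```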
The key to performance is understanding how the system works, and writing code at a low enough level to be able to control how it interacts with the system. That's why C and C++ still get used.
The technique of "writing 1 byte at a time" to the disk is really just a way of utilizing the buffering present in the I/O system to queue up disk writes. All the interesting stuff is actually happening in memory; however, it's being done by clever system code written by people who understand how to get high performance.
A well written version of the string concatenation test should be able to write data to the disk as fast as the disk can write data.
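The buffering effect described above is easy to see in Python (a sketch; absolute timings depend entirely on the machine): the default file object coalesces tiny writes in memory, while `buffering=0` issues one system call per write.

```python
import os
import tempfile
import time

N = 100_000  # 100 KB, written one byte at a time

def timed_write(path, buffering):
    start = time.perf_counter()
    with open(path, "wb", buffering=buffering) as f:
        for _ in range(N):
            f.write(b"z")
    return time.perf_counter() - start

p1, p2 = tempfile.mktemp(), tempfile.mktemp()
t_buf = timed_write(p1, -1)   # default buffering: tiny writes coalesce in memory
t_raw = timed_write(p2, 0)    # unbuffered: one system call per byte
print(f"buffered {t_buf:.3f}s  unbuffered {t_raw:.3f}s")
os.remove(p1)
os.remove(p2)
```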
I smell a bug.
That's the way the test was written. They saw a flaw of some kind and then wrote a perfectly good piece of code to exploit it. We used to call those bugs, and the people who exploit them hackers.
I’ve spent many a night watching and analyzing processor bus activity on a logic analyzer, alongside a profiler running in the OS, so I find it hard to believe they didn’t just find a bug to exploit.
See Fred's example above...
Assembly is still most efficient, especially if the action has to occur frequently in a system of modest capabilities.
The Story of Mel is still the best.
Thanks for the hint to history. Back in the day knowledge of hardware opportunities was widely used for system optimization, especially in real time systems. Hacking in HEX ruled...
How do you think we got to the Moon?
Yep. The VM paging code in Windows is highly optimized, going all the way back to David Cutler’s Windows NT in 1993 (and DEC VAX/VMS before that).
Generally the fastest way to write a file in Windows is to just call ::CreateFileMapping()/::MapViewOfFile() and scribble away. You avoid the double buffering of ::WriteFile(), and the VM subsystem is smart enough to do readaheads and stride I/O too.
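A portable stand-in for that memory-mapped technique, using Python's `mmap` module rather than the Win32 calls (a sketch only; it doesn't model Windows-specific behavior like readahead):

```python
import mmap
import os
import tempfile

SIZE = 1_000_000
path = tempfile.mktemp()

with open(path, "wb") as f:
    f.truncate(SIZE)                   # pre-size the file so it can be mapped
with open(path, "r+b") as f:
    with mmap.mmap(f.fileno(), SIZE) as m:
        m[:] = b"x" * SIZE             # "scribble" through the mapping;
                                       # the VM subsystem handles the disk I/O
assert os.path.getsize(path) == SIZE
os.remove(path)
```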
Or octal...
I am required to post the article with the complete headline as originally printed, so as to not cause any problems with re-posts or searches.
Once you replace your mechanical hard drive with an SSD, you will then know what fast really is.
I bought a Samsung 850EVO and wow!
I’m never going back.