Free Republic
Browse · Search
General/Chat
Topics · Post Article

Something to think about for the tech/programming lot.
1 posted on 03/26/2015 8:27:11 PM PDT by Utilizer


To: Utilizer

I barely know how to turn my computer on, but might a better title to this post be: Can you structure a problem that can be finished faster on disk than in-memory? It pays to be specific.


2 posted on 03/26/2015 8:35:10 PM PDT by Wingy

To: Utilizer

Given that nearly all operating systems use virtual memory, all bets are off anyway.


3 posted on 03/26/2015 8:35:13 PM PDT by Squawk 8888 (Will steal your comments & post them on Twitter)

To: Utilizer

Makes sense. Instead of putting the string together in memory and writing it to disk when it’s completed, you are writing to the disk as it’s assembled. So you’re skipping a step.
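
For anyone who wants to see the two approaches side by side, here is a rough sketch. It is not the article's code; the function names, the piece being repeated and the count are all made up for illustration. Both functions should leave the same bytes in the file; they differ only in where the string gets assembled.

```python
def concat_then_write(path, piece="1", count=100_000):
    """Build the whole string in memory first, then write it once."""
    text = ""
    for _ in range(count):
        text += piece            # Python strings are immutable, so this may copy
    with open(path, "w") as f:
        f.write(text)


def write_as_assembled(path, piece="1", count=100_000):
    """Hand each piece to the file object as soon as it is produced."""
    with open(path, "w") as f:
        for _ in range(count):
            f.write(piece)       # goes into the file object's buffer, not straight to disk
```

Note that the second version never builds the big string at all, and its small writes still pass through an in-memory buffer before the OS sees them.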


4 posted on 03/26/2015 8:36:51 PM PDT by GMMC0987

To: Utilizer

Interesting comments at article. Many saying that the code was poorly written.


5 posted on 03/26/2015 8:37:46 PM PDT by kosciusko51

To: Utilizer

Cheating. If the objective is to build something on the disk, building it on the disk is going to be faster, duh.


6 posted on 03/26/2015 8:38:08 PM PDT by Paladin2

To: Utilizer

I can’t believe how retarded this is.


8 posted on 03/26/2015 8:44:16 PM PDT by Born to Conserve

To: Utilizer
It's actually a pretty stupid example.

It makes no sense to concatenate the string in memory and then write it to disk, since in either case you will be writing the string sequentially to disk, anyway. Java, Python, C#, and other "managed" languages will always do this more slowly because their strings are immutable, which any decent coder knows.

Best approach: find out the allocation block size of a file on disk, pre-allocate one buffer of that size, write into that buffer in memory, and flush the whole block to disk when it's full. This avoids the penalty of zillions of memory allocations and garbage collections, and writes a block of optimal size.

In most cases, just pre-allocating a moderately sized block of memory without knowing the best block size is good enough and may even be preferable, because the underlying OS is going to optimally block IO, and probably also cache that at a secondary level.

The key point is to avoid over-allocating managed objects, and again, most good coders know to do this, even if people writing stupid research papers don't...
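
A minimal sketch of the single pre-allocated buffer idea, in Python. The class name `BlockWriter`, the helper `build_file` and the 4096-byte default are assumptions for illustration; the real allocation block size depends on the file system and is not looked up here.

```python
class BlockWriter:
    """Collects small writes in one pre-allocated buffer and flushes it when full."""

    def __init__(self, fileobj, block_size=4096):
        self._file = fileobj
        self._buf = bytearray(block_size)    # allocated once, reused for every block
        self._view = memoryview(self._buf)   # lets us slice without copying
        self._used = 0
        self.flush_count = 0                 # bookkeeping for the demo only

    def write(self, data: bytes) -> None:
        offset = 0
        while offset < len(data):
            room = len(self._buf) - self._used
            n = min(room, len(data) - offset)
            self._view[self._used:self._used + n] = data[offset:offset + n]
            self._used += n
            offset += n
            if self._used == len(self._buf):
                self.flush()                 # buffer is full: write the whole block

    def flush(self) -> None:
        if not self._used:
            return
        pending = self._view[:self._used]
        while len(pending):                  # a raw file may accept only part of a write
            written = self._file.write(pending)
            pending = pending[written:]
        self.flush_count += 1
        self._used = 0


def build_file(path, piece=b"1", count=1_000_000, block_size=4096):
    # buffering=0 opens a raw file, so BlockWriter is the only user-space buffer
    with open(path, "wb", buffering=0) as f:
        writer = BlockWriter(f, block_size)
        for _ in range(count):
            writer.write(piece)
        writer.flush()                       # push out the final partial block
        return writer.flush_count
```

With the defaults, `build_file("out.bin")` should issue one write per full 4096-byte block plus one for the leftover, instead of a million one-byte writes.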

9 posted on 03/26/2015 8:47:16 PM PDT by FredZarguna (It looks just like a Telefunken U-47 -- with leather.)

To: Utilizer
Java and Python versions of the code were written..

The real issue isn't memory vs. disk; it's what the language you are using does to perform the string concatenation operation.

The fastest technique will be one that does string concatenation in memory while the disk write of the previous string section is completing, so that the disk latencies are used for string building. Oh, and of course the string concatenation code should be designed to run in cache and avoid any virtual memory paging or extra memory copy operations.

The key to performance is understanding how the system works, and writing code at a low enough level to be able to control how it interacts with the system. That's why C and C++ still get used.

The technique of "writing 1 byte at a time" to the disk is really just a way of utilizing the buffering present in the I/O system to queue up disk writes. All the interesting stuff is actually happening in memory; however, it's being done by clever system code written by people who understand how to get high performance.

A well written version of the string concatenation test should be able to write data to the disk as fast as the disk can write data.
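
To make the buffering point concrete, here is a small sketch that counts how many low-level writes actually happen when a program writes one byte at a time. `CountingRaw` is a hypothetical in-memory stand-in for the disk file, and the 8192-byte buffer size is an assumption; only `io.RawIOBase` and `io.BufferedWriter` come from the standard library.

```python
import io


class CountingRaw(io.RawIOBase):
    """In-memory stand-in for a disk file that counts low-level write calls."""

    def __init__(self):
        super().__init__()
        self.calls = 0
        self.data = bytearray()

    def writable(self):
        return True

    def write(self, b):
        self.calls += 1
        self.data += b           # copy the bytes handed down by the buffer layer
        return len(b)


def one_byte_writes(total=100_000, buffer_size=8192):
    raw = CountingRaw()
    buffered = io.BufferedWriter(raw, buffer_size=buffer_size)
    for _ in range(total):
        buffered.write(b"1")     # what the program "sees": one byte per call
    buffered.flush()
    return raw.calls, len(raw.data)
```

Calling `one_byte_writes()` should report far fewer low-level calls than bytes written, because the buffer layer batches them. A real file adds the OS cache on top of this.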

10 posted on 03/26/2015 8:50:17 PM PDT by freeandfreezing

To: Utilizer

I smell a bug.

That’s the way it was written, the test. It saw the flaw of some nature and then wrote a perfectly good set of conflicting code. We called them bugs and the people who exploit them hackers.

I’ve spent too many nights watching and analyzing processor bus activity on a logic analyzer, alongside a profiler running in the OS, to believe that they didn’t just find a bug to exploit.


11 posted on 03/26/2015 8:52:00 PM PDT by Usagi_yo (If you're not leading, you're struggling to be relevant.)

To: Utilizer

Once you replace your mechanical hard drive with an SSD, you will then know what fast really is.

I bought a Samsung 850EVO and wow!

I’m never going back.


20 posted on 03/26/2015 9:20:09 PM PDT by ltc8k6

To: Utilizer

So cache is not necessarily king?


21 posted on 03/26/2015 9:21:15 PM PDT by Ken H

To: Utilizer
reinforces the need for developers to better understand system-level software.

Yeah, right. Like that's gonna happen. The monkeys churning out code today probably think Big Endian and Little Endian is a children's book about Native Americans.

43 posted on 03/26/2015 10:21:43 PM PDT by BuckeyeTexan (There are those that break and bend. I'm the other kind. ~Steve Earle)

To: Utilizer

doing the operation in memory, then doing a single 1m write to disk, is still FAR faster than 1m 1-byte writes followed by 100k 10-byte writes, etc.

even if the memory version was written as a single byte at a time, it would equate to the 1m 1 byte writes. the other writes would be slower


47 posted on 03/26/2015 11:10:49 PM PDT by sten (fighting tyranny never goes out of style)

To: Utilizer
This problem is simple.

Generally, you should set a buffer size of 4-16K and format your app's output directly into the buffer, if possible. You may wish to use multiple 4-16K buffers, so that you are writing into the current buffer while one of your past buffers is being transferred to disk asynchronously. When you fill the current buffer, it should be queued for output, and you should switch your output-formatting activity to scribble on a previous buffer which has already been written. When you are done, you should remember to queue your final buffer for output and wait until all buffers have been written. Then please close the file.

The optimal buffer size and number of buffers should be determined by experiment.
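
Roughly, the rotating-buffer scheme could look like the sketch below. The function name, the queue-and-thread mechanics and the defaults (two 8192-byte buffers) are assumptions for illustration, not a tuned implementation. Closing the file is left to the caller.

```python
import queue
import threading


def write_with_rotating_buffers(fileobj, chunks, buffer_size=8192, num_buffers=2):
    """Fill one buffer while a background thread writes previously filled ones."""
    free = queue.Queue()       # buffers that are safe to scribble on
    filled = queue.Queue()     # (buffer, used_bytes) pairs waiting for output
    for _ in range(num_buffers):
        free.put(bytearray(buffer_size))
    errors = []

    def writer():
        while True:
            item = filled.get()
            if item is None:               # sentinel: no more buffers coming
                return
            buf, used = item
            try:
                if not errors:             # stop writing after the first failure
                    fileobj.write(memoryview(buf)[:used])
            except OSError as exc:
                errors.append(exc)
            finally:
                free.put(buf)              # hand the buffer back for reuse

    thread = threading.Thread(target=writer)
    thread.start()
    try:
        current, used = free.get(), 0
        for chunk in chunks:
            offset = 0
            while offset < len(chunk):
                n = min(buffer_size - used, len(chunk) - offset)
                current[used:used + n] = chunk[offset:offset + n]
                used += n
                offset += n
                if used == buffer_size:
                    filled.put((current, used))     # queue the full buffer for output
                    current, used = free.get(), 0   # waits until a buffer has been written
        if used:
            filled.put((current, used))             # don't forget the final buffer
    finally:
        filled.put(None)
        thread.join()                               # wait until all buffers are written
    if errors:
        raise errors[0]
```

Usage would be something like `with open("out.bin", "wb") as f: write_with_rotating_buffers(f, pieces)`, where `pieces` is any iterable of bytes objects. Whether two buffers beat one, and at what size, is exactly the thing to measure.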

48 posted on 03/26/2015 11:45:38 PM PDT by cynwoody

To: rdb3; Calvinist_Dark_Lord; JosephW; Only1choice____Freedom; amigatec; Ernest_at_the_Beach; ...

53 posted on 03/27/2015 4:13:00 AM PDT by ShadowAce (Linux -- The Ultimate Windows Service Pack)

To: Utilizer
Performance testing that does disk I/O can produce results in the test that don't translate in production because of disk contention from other processes that may be running alongside the process in production.

One million single writes to disk can be a much different proposition if the test has the disk all to itself than it is on a busy system where every write operation can potentially have to get queued and wait for some other process to release the disk channel.

IMHO

54 posted on 03/27/2015 4:27:38 AM PDT by tacticalogic ("Oh, bother!" said Pooh, as he chambered his last round.)

To: Utilizer
Unless you're specifically calling instructions to flush the write atomically to disk without delay, under modern OSs, you're likely caching the writes anyway.

I'm sure it is possible to construct very narrowly tailored circumstances where what they are describing makes sense, but it's such an artificial construct that it's not really useful. It's simply a reminder to never use the word 'never'.
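
For reference, this is roughly what "specifically calling instructions to flush" looks like in Python. The function name `write_durably` is made up; `flush()` and `os.fsync()` are the standard calls. Without the last two lines, the data may sit in a cache for a while after `write()` returns.

```python
import os


def write_durably(path, data: bytes) -> None:
    with open(path, "wb") as f:
        f.write(data)            # lands in Python's user-space buffer
        f.flush()                # hands the bytes to the OS (page cache)
        os.fsync(f.fileno())     # asks the OS to push them to the storage device
```

Even then, what the drive does with its own on-board cache is outside the program's control, so a benchmark that skips this step is mostly timing memory.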

56 posted on 03/27/2015 6:45:50 AM PDT by zeugma ( The Clintons Could Find a Loophole in a Stop Sign)

To: Utilizer
This is crap.

It only proves that you can design a test to do stupid things that don't really apply in the real world.

First and foremost is the fact that memory is everything. In order for a process to write to disk, it must first put that data in a buffer, which is (gasp) MEMORY. In most modern, enterprise-level systems, there is a ton of cache (more memory) sitting in the disk subsystem to receive the data from the operating system prior to it being written to disk.

Let's see them run an application or database doing real-world work and see how their theory holds up. I got $100 that says "not very well"

57 posted on 03/27/2015 7:06:08 AM PDT by BlueMondaySkipper (Involuntarily subsidizing the parasite class since 1981)

To: Utilizer
Sorry but the example is kind of silly..

In simple terms the test was to get a string of bits written to the disk in a given order

So a one-step operation (write to the disk) is faster than a two-step operation (organize the bits in memory, then write to the disk)....

Gee, that's a shock...(/sarcasm off)

58 posted on 03/27/2015 9:32:46 AM PDT by tophat9000 (An Eye for an Eye, a Word for a Word...nothing more)

To: Utilizer

Anyone who uses a buffered database knows this is nonsense.


63 posted on 03/30/2015 8:08:20 PM PDT by AppyPappy (If you are not part of the solution, there is good money to be made prolonging the problem.)

FreeRepublic, LLC, PO BOX 9771, FRESNO, CA 93794
FreeRepublic.com is powered by software copyright 2000-2008 John Robinson