Free Republic
Browse · Search
General/Chat
Topics · Post Article


One Billion Dollars! Wait… I Mean One Billion Files!!!
Linux Magazine ^ | 6 October 2010 | Jeffrey B. Layton

Posted on 10/08/2010 8:06:52 AM PDT by ShadowAce




1 posted on 10/08/2010 8:06:58 AM PDT by ShadowAce

To: rdb3; Calvinist_Dark_Lord; GodGunsandGuts; CyberCowboy777; Salo; Bobsat; JosephW; ...

2 posted on 10/08/2010 8:07:39 AM PDT by ShadowAce (Linux -- The Ultimate Windows Service Pack)

To: ShadowAce

Hmm. This guy should give me a call. We were putting 100+ million records on Linux Slackware systems using raw devices in 1995. Have since invented something that will easily hold billions of records that can span multiple machines across any network.


3 posted on 10/08/2010 8:17:28 AM PDT by isthisnickcool (Sharia? No thanks.)

To: ShadowAce

I use ext4 on all of my Linux distros when possible. It seems to me that SSDs are the next phase in the evolution of storage. Having recently purchased my first 1 TB SATA disk for my new gaming rig, I ran Windows 7's Windows Experience Index on the system. I scored 7.6, 7.6, 7.7, and 7.7 on the CPU, memory, DirectX, and video tests, but my hard disk dragged the final score down to 5.9 (Microsoft uses the lowest subscore as the overall rating). I was shocked, to say the least; my previous system, with a 150 GB 10K RPM SATA disk, netted a 6.5 on the same test.

Size truly does matter, but interface bandwidth and operating system disk operations appear to be the primary concerns.


4 posted on 10/08/2010 8:22:51 AM PDT by rarestia (It's time to water the Tree of Liberty.)

To: isthisnickcool
> We were putting 100+ million records ... easily hold billions of records...

I assume you mean a database. Most databases have a relatively small number of files on the filesystem, but the files may include millions or billions of records. That is, record =/= file.

So, are you storing each record in a separate file on the filesystem? If not, you're comparing apples and oranges.
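The record-vs-file distinction can be made concrete with a throwaway shell sketch (all paths here are made up, created under a temp directory): the same 1,000 records cost the filesystem 1,000 inodes one way and a single inode the other.

```shell
# Same 1,000 records stored two ways: one file per record vs. one file total.
demo=$(mktemp -d)
mkdir "$demo/per_file"
for i in $(seq 1 1000); do
    echo "record $i" > "$demo/per_file/$i.txt"      # one inode per record
done
seq 1 1000 | sed 's/^/record /' > "$demo/one_file.txt"  # 1,000 records, one inode

files=$(ls "$demo/per_file" | wc -l)     # 1,000 entries for the filesystem to track
records=$(wc -l < "$demo/one_file.txt")  # 1,000 records behind a single entry
echo "$files files vs $records records in one file"
```

A database lives at the right of that spectrum (few files, many records); the article's billion-file problem lives at the left.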

5 posted on 10/08/2010 8:36:48 AM PDT by dayglored (Listen, strange women lying in ponds distributing swords is no basis for a system of government!)

To: ShadowAce

Why wouldn’t you use a database?


6 posted on 10/08/2010 8:41:06 AM PDT by driftdiver (I could eat it raw, but why do that when I have a fire.)

To: ShadowAce

http://en.wikipedia.org/wiki/ZFS

I thought that Apple almost put this into 10.5 (maybe), but had some license issues... Not really sure, but it is good to know that folks are out in front of this.


7 posted on 10/08/2010 8:43:31 AM PDT by LearnsFromMistakes (Yes, I am happy to see you. But that IS a gun in my pocket.)

To: driftdiver
Database for what? This is about filesystems and the number of files--generically.

It's not necessarily about storing photos or databases.

8 posted on 10/08/2010 8:44:41 AM PDT by ShadowAce (Linux -- The Ultimate Windows Service Pack)

To: ShadowAce

Why a database? Because in my experience trying to manage a high volume of individual data files is extremely difficult.

By manage I mean keep track of, update, back up, control access to, and in general ensure the integrity of the data.


9 posted on 10/08/2010 8:46:46 AM PDT by driftdiver (I could eat it raw, but why do that when I have a fire.)

To: ShadowAce

Regardless of whether you’re on an array, Windows with NTFS starts dying with only 20,000 or so files in a single folder. You’re sure to get a lock-up with half a million files.
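I can't vouch for the 20,000-file figure, but a rough per-directory scaling probe is easy to script on any filesystem (throwaway temp directory; adjust the count to taste and wrap the steps in `time` to compare):

```shell
# Create 20,000 empty files in one directory, then count them back.
dir=$(mktemp -d)
cd "$dir"
seq 1 20000 | xargs touch                 # xargs batches argv; no "Argument list too long"
count=$(find . -maxdepth 1 -type f | wc -l)  # find streams names; no sort overhead
echo "created $count files in $dir"
```

On huge directories, unsorted listings (`ls -f`, or `find` as above) are much kinder than plain `ls`, which has to read and sort every entry first.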


10 posted on 10/08/2010 8:46:59 AM PDT by antiRepublicrat


To: driftdiver
Because in my experience trying to manage a high volume of individual data files is extremely difficult.

Yes, it can be. However, keeping the /home directories of thousands of users in a database rather than on a filesystem is not very feasible, is it?

12 posted on 10/08/2010 9:02:57 AM PDT by ShadowAce (Linux -- The Ultimate Windows Service Pack)

To: isthisnickcool

See Post #5


13 posted on 10/08/2010 9:05:44 AM PDT by ShadowAce (Linux -- The Ultimate Windows Service Pack)

To: driftdiver
Why a database? Because in my experience trying to manage a high volume of individual data files is extremely difficult.

On the flip side, managing terabyte-sized files is even more of a pain in my experience.

14 posted on 10/08/2010 12:06:49 PM PDT by antiRepublicrat

To: antiRepublicrat

“managing terabyte-sized files is even more of a pain “

Dunno, the largest I’ve dealt with was 23 TB. It made restoring from the DR site a bit challenging due to bandwidth limitations, but all in all it wasn’t that bad.


15 posted on 10/08/2010 12:50:35 PM PDT by driftdiver (I could eat it raw, but why do that when I have a fire.)

To: driftdiver
Dunno, the largest I’ve dealt with was 23TB.

You can split a database into multiple files on most systems. It makes dealing with them a little easier, and usually improves performance.

16 posted on 10/08/2010 12:52:47 PM PDT by antiRepublicrat

To: antiRepublicrat

This was an Oracle DB hosted on a SAN. We used BCVs and all that. The database itself wasn’t a problem; it was moving 23 TB between datacenters that created an issue.


17 posted on 10/08/2010 12:57:42 PM PDT by driftdiver (I could eat it raw, but why do that when I have a fire.)

To: driftdiver
It was moving 23 TB between datacenters that created an issue.

Never underestimate the bandwidth of a station wagon full of backup tapes.
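The back-of-envelope math behind that old saw, assuming a fully saturated 1 Gb/s link and a 24-hour courier run (both figures invented for illustration):

```shell
# 23 TB over the wire vs. 23 TB of tapes in a station wagon.
bytes=23000000000000            # 23 TB, using 10^12-byte terabytes
link=125000000                  # 1 Gb/s ~= 125 MB/s, assuming a saturated link
net_secs=$((bytes / link))      # seconds to push it over the network
net_days=$((net_secs / 86400))
wagon=$((bytes / 86400 / 1000000))  # effective MB/s if the tapes arrive in 24 h
echo "network: ${net_secs}s (~${net_days} days); station wagon: ~${wagon} MB/s"
```

The courier comes out ahead by roughly 2x here, and the gap only widens as the dataset grows, since the wagon's "bandwidth" scales with payload while the link's does not.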

18 posted on 10/08/2010 2:35:42 PM PDT by antiRepublicrat

To: rarestia
I use ext4 on all of my Linux distros, if possible.

Hm. I tried ext4 but found it unreliable. Then again, that might have been when I migrated to Xubuntu 9.10, which, once installed on this system, was so flaky that I abandoned it for Xubuntu 9.04 and ext3. Stable.

Some flaky hardware? I dunno. Once in a while I have to pull the power plug for a few seconds and re-insert before getting the old box to boot.

19 posted on 10/08/2010 5:33:28 PM PDT by sionnsar (IranAzadi|5yst3m 0wn3d-it's N0t Y0ur5:SONY|TV--it's NOT news you can trust)

To: ShadowAce
Great article! I found the comparison between filesystems interesting. I'd like to see a similar comparison between ext4 and NTFS.

Hmmm... how many files do I have on my computer...

$ sudo find / -print | wc -l
605730

That's a lot of files IMO for a simple desktop computer, but nowhere near what they are talking about.

I've dealt with directories at work with 100k+ files in them (as a result of really stupid programmers in this case), and it's not pretty when you need to do cleanup there. Thank God for xargs!
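For reference, the usual find-plus-xargs cleanup pattern, sketched with a made-up scratch directory; at 100k files a plain `rm dir/*` dies with "Argument list too long" because the glob expands into one enormous argv:

```shell
# Populate a scratch directory with 5,000 junk files, then sweep them out.
junk=$(mktemp -d)
( cd "$junk" && seq 1 5000 | sed 's/$/.tmp/' | xargs touch )

# -print0 / -0 survive spaces and odd filenames; xargs batches rm into sane argv sizes
find "$junk" -name '*.tmp' -print0 | xargs -0 rm -f

left=$(find "$junk" -type f | wc -l)
echo "$left files left"
```

`find ... -delete` does the same job in one process where available; the xargs form generalizes to any command, not just rm.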

File proliferation is actually a serious issue. Desktop users, especially those who have never figured out how to organize files properly, can quickly get into a state where they can't find the files they are looking for, even when they know they're somewhere on the computer. Your average user doesn't know anything about efficiently organizing directories and subdirectories. Heck, I sometimes have problems with it myself, and find myself periodically reorganizing things just to keep them straight.

20 posted on 10/08/2010 7:32:57 PM PDT by zeugma (Ad Majorem Dei Gloriam)



Disclaimer: Opinions posted on Free Republic are those of the individual posters and do not necessarily represent the opinion of Free Republic or its management. All materials posted herein are protected by copyright law and the exemption for fair use of copyrighted works.


FreeRepublic, LLC, PO BOX 9771, FRESNO, CA 93794
FreeRepublic.com is powered by software copyright 2000-2008 John Robinson