Posted on 01/04/2022 3:29:08 PM PST by ShadowAce
Computer maintenance workers at Kyoto University have announced that due to an apparent bug in software used to back up research data, researchers using the University's Hewlett-Packard Cray computing system, called Lustre, have lost approximately 77 terabytes of data. The team at the University's Institute for Information Management and Communication posted a Failure Information page detailing what is known so far about the data loss.
The team, part of the University's Information Department Information Infrastructure Division (Supercomputing), reported that files in the /LARGEO directory (on the DataDirect ExaScaler storage system) were lost during a system backup procedure. Some in the press have suggested that the problem arose from a faulty script that was supposed to delete only old, unneeded log files. The team noted that it was originally thought that approximately 100TB of files had been lost, but that number has since been pared down to 77TB. They note also that the failure occurred on December 16 between 5:50 and 7 p.m. Affected users were immediately notified via email.
The team further notes that approximately 34 million files were lost and that the lost files belonged to 14 known research groups. The team did not release the names of the research groups or describe what sort of research they were conducting, but did note that data from another four groups appears to be restorable. Also unclear is whether the research groups who lost their data will be reimbursed for the money spent conducting research on the university's supercomputer system. Such costs are notoriously high, running into the hundreds of dollars per hour of computing time.
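If the press reports about a faulty cleanup script are accurate, the failure mode is a familiar one: a job meant to delete only old, unneeded log files ends up matching far more than intended. Below is a hypothetical Python sketch, purely illustrative and not the university's or HP's actual script (the /var/log/myapp path, the *.log suffix, and the 30-day threshold are assumptions), showing the usual safeguards: an allowlisted root, a suffix and age filter, and a dry-run default.

```python
#!/usr/bin/env python3
"""Hypothetical log-cleanup sketch (not the script involved in the incident).

Shows the kind of guard rails -- explicit path allowlist, suffix and age
filters, and a dry-run default -- that keep a "delete old log files" job
from deleting anything else.
"""
import argparse
import time
from pathlib import Path

ALLOWED_ROOT = Path("/var/log/myapp")   # assumption: only clean under this tree
MAX_AGE_DAYS = 30                       # assumption: "old" means older than 30 days


def find_old_logs(root: Path, max_age_days: int):
    """Yield *.log files under `root` older than `max_age_days`."""
    cutoff = time.time() - max_age_days * 86400
    for path in root.rglob("*.log"):
        if path.is_file() and path.stat().st_mtime < cutoff:
            yield path


def main() -> None:
    parser = argparse.ArgumentParser(
        description="Delete old log files (dry-run by default).")
    parser.add_argument("--delete", action="store_true",
                        help="actually delete; without this flag, only print candidates")
    args = parser.parse_args()

    for path in find_old_logs(ALLOWED_ROOT, MAX_AGE_DAYS):
        # Skip anything that resolves outside the allowlisted root (e.g. via symlinks).
        if ALLOWED_ROOT not in path.resolve().parents:
            continue
        if args.delete:
            path.unlink()
            print(f"deleted {path}")
        else:
            print(f"would delete {path}")


if __name__ == "__main__":
    main()
```

Keeping deletion behind an explicit --delete flag means an operator sees the full candidate list before anything is actually removed.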
Some news outlets are reporting that the backup system was supplied by Hewlett-Packard and that the failure occurred after an HP software update. The same outlets are also reporting that HP has accepted blame for the data loss and is offering to make amends. The team at the university reported that the backup procedure was halted as soon as it became clear that something was awry, and university officials suggest that in the future, incremental backup procedures will always be used to prevent further data loss.
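For readers unfamiliar with the term, an incremental backup copies only what has changed since the previous run rather than re-copying everything. The sketch below is a toy illustration of that idea, not the university's or HP's actual procedure; the /data/research and /backup/research paths are assumptions, and a real setup would also keep multiple generations and track deleted files.

```python
#!/usr/bin/env python3
"""Minimal incremental-backup sketch (an illustration, not a real procedure).

On each run, only files whose size or modification time changed since the
previous run are copied, and a manifest records the state for the next run.
All paths here are assumptions for the example.
"""
import json
import shutil
from pathlib import Path

SOURCE = Path("/data/research")        # assumption: tree to protect
DEST = Path("/backup/research")        # assumption: backup destination
MANIFEST = DEST / "manifest.json"      # records (size, mtime) from the last run


def load_manifest() -> dict:
    if MANIFEST.exists():
        return json.loads(MANIFEST.read_text())
    return {}


def incremental_backup() -> None:
    previous = load_manifest()
    current = {}
    for src in SOURCE.rglob("*"):
        if not src.is_file():
            continue
        rel = str(src.relative_to(SOURCE))
        stat = src.stat()
        current[rel] = [stat.st_size, stat.st_mtime]
        if previous.get(rel) != current[rel]:
            dst = DEST / rel
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)          # copy only new or changed files
            print(f"backed up {rel}")
    DEST.mkdir(parents=True, exist_ok=True)
    MANIFEST.write_text(json.dumps(current))


if __name__ == "__main__":
    incremental_backup()
```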
Ooops..
Lustre is not a "computing system." It is a filesystem. Still--data loss is data loss, even if it is only 77T. I've got working filesystems measured in petabytes.
We do perform incremental backups every night.
The team, part of the University's Information Department Information Infrastructure Division (Supercomputing), reported that files in the /LARGEO directory (on the DataDirect ExaScaler storage system) were lost during a system backup procedure.
—
A professional IT Dept should have multiple generations of backups. This article makes it sound like they keep a grand total of 1 backup. Doesn’t say much for the Information Infrastructure Division.
All your data are belong to us.
I’ve been doing system backups since the 1970s.
Always, always, always do test restores. Always.
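One way to make that advice routine is to automate it: restore the most recent backup into a scratch directory and compare checksums against the live tree. The sketch below is only an illustration of that idea; the paths, the tar archive format, and the assumption that the source has not changed since the backup was taken are all hypothetical.

```python
#!/usr/bin/env python3
"""Sketch of an automated test restore (illustrative only).

Restores a backup archive into a scratch directory and compares SHA-256
checksums against the live data, so a broken backup is noticed before it
is actually needed. Paths and the restore command are assumptions.
"""
import hashlib
import subprocess
import tempfile
from pathlib import Path

SOURCE = Path("/data/research")          # assumption: the live tree
BACKUP_ARCHIVE = "/backup/research.tar"  # assumption: archive stores paths relative to SOURCE


def sha256(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def test_restore() -> bool:
    """Return True if every file in the backup restores with a matching checksum."""
    with tempfile.TemporaryDirectory() as scratch:
        # Restore the archive into a throwaway directory.
        subprocess.run(["tar", "-xf", BACKUP_ARCHIVE, "-C", scratch], check=True)
        ok = True
        for restored in Path(scratch).rglob("*"):
            if not restored.is_file():
                continue
            original = SOURCE / restored.relative_to(scratch)
            if not original.exists() or sha256(original) != sha256(restored):
                print(f"MISMATCH: {restored.relative_to(scratch)}")
                ok = False
        return ok


if __name__ == "__main__":
    print("test restore passed" if test_restore() else "test restore FAILED")
```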
Good thing they had their data backed up.
uhhh....
/sarc
I still remember one of my roommates telling me the story that he couldn't get his IBM PC-AT with the 100MB hard drive because the store guy would not sell him one. "No one will ever need a 100MB hard drive," he said.
Discount that one off the bat.
Frequent saving helps prevent file loss.
(Thanks Sim City)
I hear Pfizer is very interested in how well the excuse goes over....
I’ve been working in IT for decades and I have never backed up anything. Hahahahaaaa! Most of what people call data is useless.
Fret not, NSA probably has a copy somewhere.
I worked in computer sales decades ago. I could not imagine at the time how someone could ever fill up a 20 megabyte hard drive. One thing I have learned is that computer manufacturers and software writers will find a way to fill up all of the available storage space on the most advanced computer, no matter how large.
You only find you really need it when you don’t have it.
I was privy to one of the most catastrophic data losses ever. During a rather uninteresting database outage, a rogue Unix admin restored old data over new data, about 7 TB, at a well-known aircraft manufacturer. Totally corrupted the backups. The business was down for two weeks until we found an image that could be rolled forward with incremental backups. My contribution was saying, "Greg, don't do it!"
The root cause was that Greg had control over backups, disk allocation, and Unix. There were no checks on his power. When I stepped into the breach, I insisted that backups and disk were assigned to their proper groups. I took care of Unix.
Poor Greg died of a heart attack while still a fairly young man. He always wanted to be seen as the golden knight who rode in to save the damsel in distress. In reality, he was the worst lead Unix admin ever, driven mad by root authority.
The acronym GIGO comes to mind: Garbage In, Garbage Out.
S’why you have to warm to the idea of absorbing the expense of a computer lab so your Admins can test their back-ups and confirm whether they work. Otherwise it’s like buying lifeboats for a ship and not bothering to test whether they float.