Free Republic
Browse · Search
News/Activism
Topics · Post Article

Mars Rover Recovering From Memory Problems
New Scientist ^ | 1-28-2004 | David L Chandler

Posted on 01/28/2004 8:35:05 AM PST by blam

Mars rover recovering from memory problems

13:35 28 January 04

NewScientist.com news service

A full revival of the Mars rover Spirit from its electronic ailments now seems highly likely. Engineers now think there are no real hardware or software problems, but something much easier to fix - a simple overload of files in its onboard memory.

If further testing confirms this diagnosis, that will be very good news for Spirit's twin, Opportunity. Any software bug or hardware weakness would probably be present in both rovers and might require weeks of analysis and repair.

But if, as it appears, the problem is a previously unrecognised limit on the number of files that can be stored in the craft's flash memory, then Opportunity's data collection and file management can be planned to prevent the problem.

This would avoid the bleak situation faced by engineers when Spirit fell silent for more than a day and failed to respond to commands. Having initially described the Spirit's troubles as "critical", mission manager Jennifer Trosper says "the patient is now in rehab".

However, Opportunity has developed a problem of its own, according to another mission manager Jim Erickson. The rover is losing power, apparently due to a heating unit that is switching itself on when it should not. What this will mean for the rover's mission and whether it can be fixed are not yet known.

Coaxed communication

Spirit's controllers have been coaxing the rover back into communication since it ended its silence on Thursday with a single bleep. The engineering data returned has allowed them to piece together what had happened.

The rover first failed halfway through a test of a moving mirror that directs light to the mini-TES instrument. The high-gain antenna was also being used at the time, and the spacecraft entered a "safe mode" associated with antenna problems.

Later data returns showed the craft had entered a repeating cycle of resetting its computer system, preventing it from carrying out anything but the simplest commands. At last count, it had rebooted itself more than 120 times. This constant resetting prevented it from entering its night sleep cycle, needed to conserve its batteries.

But detailed analysis of the start of each reset cycle eventually led to the apparent answer to the mystery. The problem was clearly associated with the handling of files being written to one of its three types of internal memory: a non-volatile 256 megabyte flash memory.

Testing on Monday and Tuesday suggests that it is not the flash memory itself that is at fault, but the software's file-handling system. Unbeknownst to the engineers, there seems to be a limit on the number of files that can be simultaneously stored in the flash memory, even though the overall memory capacity is not full.

The solution is likely to be simply deleting unneeded files, many of which were accumulated during the eight-month journey to Mars. It will require some skillful programming to get the computer to do this without falling back into its resetting cycle, but Trosper says a full recovery is now expected.

David L Chandler


TOPICS: News/Current Events
KEYWORDS: jpl; mars; memory; nasa; problems; recovering; rover; spirit
To: Post Toasties
Also, why couldn't the number of files storable on a given memory be a SW configurable parameter upon reset?

It's not that easy if they are using the FAT filesystem. It's a hard limit.

The FAT filesystem stores the root directory differently than subdirectories. It has a fixed size and goes in a specific location on the media. Even MS couldn't expand it without breaking existing software (but the FAT32 and NTFS filesystems do not have this limit).

Now, if they are indeed using the FAT filesystem and ran into the root directory limit, they could easily solve it by storing the files in subdirectories under the root - the number of files allowed in subdirectories is limited only by the amount of space available.
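To make the difference concrete, here is a toy model, assuming a FAT16-style volume whose root directory is a fixed table of 512 entries (the usual FAT16 default, picked at format time; nobody in this thread knows what the rover actually uses). The class and method names are made up for illustration and this is not real FAT code.

```python
# Toy model only, not a FAT implementation. ROOT_ENTRIES = 512 is the usual
# FAT16 default and is an assumption here, not a fact about the rover.
ROOT_ENTRIES = 512


class RootDirFull(Exception):
    """Raised when the fixed-size root directory has no free slot."""


class ToyFatVolume:
    def __init__(self, root_entries=ROOT_ENTRIES):
        self.root_entries = root_entries
        self.root = {}  # name -> "file", or a dict for a subdirectory

    def create_in_root(self, name):
        # The root directory is a fixed-size table: once every slot is used,
        # creation fails even if the data area is nearly empty.
        if len(self.root) >= self.root_entries:
            raise RootDirFull(name)
        self.root[name] = "file"

    def mkdir(self, name):
        # A subdirectory costs one root slot, like any other root entry.
        if len(self.root) >= self.root_entries:
            raise RootDirFull(name)
        self.root[name] = {}

    def create_in_subdir(self, dirname, name):
        # A subdirectory is stored like an ordinary file and grows as needed,
        # so the root table size does not cap it in this model.
        self.root[dirname][name] = "file"


def demo():
    vol = ToyFatVolume()
    vol.mkdir("DATA")
    for i in range(10_000):
        vol.create_in_subdir("DATA", f"IMG{i:05d}.DAT")
    root_files = 0
    try:
        while True:
            vol.create_in_root(f"F{root_files:05d}.DAT")
            root_files += 1
    except RootDirFull:
        pass
    return len(vol.root["DATA"]), root_files
```

In this sketch the subdirectory takes all 10,000 files without complaint, while the root fills up as soon as its remaining slots are used, however much free space is left.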

41 posted on 01/28/2004 10:05:40 AM PST by Mannaggia l'America
[ Post Reply | Private Reply | To 21 | View Replies]

To: DeepDish
...a seep of ice...

Yes. I posted the other day that it resembled cracked ice. Of course, it could be the remains of flotsam, jetsam, or ligan.

42 posted on 01/28/2004 10:08:21 AM PST by Consort
[ Post Reply | Private Reply | To 39 | View Replies]

To: BigWaveBetty
R'Ver i seek the creator. (/h)
43 posted on 01/28/2004 10:11:53 AM PST by longtermmemmory (Vote!)
[ Post Reply | Private Reply | To 11 | View Replies]

To: blam
8-bit, 16-bit, 32-bit, 64-bit, or 128-bit file system, I wonder? The write command probably failed without giving an error. Easy to miss in testing if you are spot checking reads and writes.
44 posted on 01/28/2004 10:12:52 AM PST by muskogee
[ Post Reply | Private Reply | To 1 | View Replies]

To: brownsfan
Miracles?

They just shot 2 slugs into space, landed them on a foreign planet that 1/2 the attempts to reach have failed, had them land on the planet almost exactly where we wanted them to.... And despite travelling for 8 months in space and how many millions of miles they landed, and for the most part are operating properly....

I don't know what counts as miracles in your book, but do the statistical odds that we would get even one of these things to reach Mars and answer, much less both, with both still working.

45 posted on 01/28/2004 10:15:50 AM PST by HamiltonJay
[ Post Reply | Private Reply | To 20 | View Replies]

I saw on PBS that a lot of software was programmed during flight from earth to mars. They can program them from earth.
46 posted on 01/28/2004 10:25:12 AM PST by meanie monster
[ Post Reply | Private Reply | To 45 | View Replies]

To: Bikers4Bush
Welcome to the real world. No amount of testing will find more than about 1/3 of the bugs. The goal of testing is to find the same 1/3 that would be encountered in the field. They missed one; it happens, a lot, on more expensive and more important systems than this. The good news is it seems to be easily recovered from and has an easy workaround.
47 posted on 01/28/2004 10:26:37 AM PST by discostu (are you in the pocket of the moment)
[ Post Reply | Private Reply | To 4 | View Replies]

To: blam
Mars Rover has Alzheimers?!
48 posted on 01/28/2004 10:28:34 AM PST by NotJustAnotherPrettyFace (Alec "Miserable Failure")
[ Post Reply | Private Reply | To 1 | View Replies]

To: discostu
#46 would explain a lot.
49 posted on 01/28/2004 10:30:17 AM PST by Bikers4Bush (Constitution party here I come. Write in Tancredo in 04'!)
[ Post Reply | Private Reply | To 47 | View Replies]

To: blam
And here I thought Spirit was transmitting back "Renegotiate" until it found out Opportunity had landed and it wasn't the only game in town.
50 posted on 01/28/2004 10:32:00 AM PST by OrioleFan (Republicans believe every day is July 4th, DemocRATs believe every day is April 15th. - Reagan)
[ Post Reply | Private Reply | To 1 | View Replies]

To: brownsfan
The longest test was 9 days!

Maybe I'm just a forgiving kind of guy, but I could see how this could happen. The operating system in use here is a fairly old one with a good track record. It's probably one of the last places they expected to have an issue. They have this huge electro-mechanical machine full of moving parts and bleeding-edge technology for imaging, telecommunications, and locomotion on uncertain surfaces. If you're drawing up the testing budget, how much time do you allocate to debug an OS that's been in service in ten million little widgets around the world for a decade?

The thing is, embedded systems do not often have the problem of managing lots of "files." That's not really what they are about. Embedded-system OS's are about compact size, interrupt latency, and managing lots of little tasks. "File management" is what they do in IT shops; it hardly ever comes up on the factory floor or in a heart-lung machine.

So, they got surprised. It happens to everybody. People who haven't made any mistakes haven't tried enough new things.

I would love to have been on the team that figured out what this was. When it finally dawned on somebody what was causing this, they must have whooped for joy, because this is an easy fix and the $400 million machine is safe. There are a lot of ways this could have turned out worse.

51 posted on 01/28/2004 10:41:00 AM PST by Nick Danger ( With sufficient thrust, pigs fly just fine.)
[ Post Reply | Private Reply | To 7 | View Replies]

To: Mannaggia l'America
FAT32 has a limit for a single folder, but it's pretty absurd (I think it's 2048, might be 10,000ish). The only time I ever run into it is dumping things through the print spooler, because those files are tiny and you can fill the spool directory before filling the hard drive. NTFS is wide open though.
52 posted on 01/28/2004 10:43:38 AM PST by discostu (are you in the pocket of the moment)
[ Post Reply | Private Reply | To 41 | View Replies]

To: Bikers4Bush
That's SOP for these Mars missions. Most if not all of our landers have had the software loaded on the fly as they got to Mars. The orbiter that got mixed up between English and metric measurements and tried to go through the planet had its orbiting routine loaded minutes before it started the maneuver. Fully writing and testing the software has to happen after the hardware is made, so you can either launch with incomplete software and try to get it finished before your object completes its journey, or delay the mission a year and a half and launch then. In general this way is better, though some complications can arise.
53 posted on 01/28/2004 10:49:29 AM PST by discostu (are you in the pocket of the moment)
[ Post Reply | Private Reply | To 49 | View Replies]

To: discostu
I can see the benefits. The only problem being it would seem the hardware guys aren't talking to the software guys.

Think about the advancements in hardware that occur over a 2 year period. That has to make the job for the guys writing the software tough when they aren't thinking about the hardware limitations initially.
54 posted on 01/28/2004 10:54:47 AM PST by Bikers4Bush (Constitution party here I come. Write in Tancredo in 04'!)
[ Post Reply | Private Reply | To 53 | View Replies]

To: gesully
Maybe they just need to upgrade their current version of Windows. Reminds me of an old joke:

There are three engineers in a car: an electrical engineer, a chemical engineer and a Microsoft engineer. Suddenly the car just stops by the side of the road, and the three engineers look at each other wondering what could be wrong. The electrical engineer suggests stripping down the electronics of the car and trying to trace where a fault might have occurred. The chemical engineer, not knowing much about cars, suggests that maybe the fuel is becoming emulsified and getting blocked somewhere. Then, the Microsoft engineer, not knowing much about anything, comes up with a suggestion: "Why don't we close all the windows, get out, get back in, open the windows again, and maybe it'll work!?"

55 posted on 01/28/2004 10:59:56 AM PST by reagan_fanatic (Ain't Skeered...)
[ Post Reply | Private Reply | To 37 | View Replies]

To: brownsfan
OK, that's another way of saying the same thing. A file handle is just an address. The handle is a number that takes memory to store. Someone determined the maximum number of files and allocated a certain amount of memory to store the file handles. There appears to have been no check to prevent exceeding this number. So when the one-too-manyth file handle is added, it overwrites something important in memory. Result is a reboot.

As an old "c" programmer, I know this one quite intimately. When you have to squeeze a lot of code into a tiny amount of memory, you put the obligation on the user not to break things.
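A rough simulation of that failure mode, purely as an illustration of the guess above and not a claim about the rover's actual code. The table size, the "important" neighbor, and the function names are all hypothetical; a Python list stands in for a flat chunk of memory.

```python
MAX_HANDLES = 4        # hypothetical table capacity
CANARY = "important"   # stands in for whatever lives right after the table


def new_memory():
    # Handle table occupies slots 0..MAX_HANDLES-1; the important data
    # sits in the very next slot.
    return [None] * MAX_HANDLES + [CANARY]


def add_handle_unchecked(memory, count, handle):
    # Old-school style: trust the caller and write wherever the index points.
    memory[count] = handle
    return count + 1


def add_handle_checked(memory, count, handle):
    # Same write, but refuse once the table is full.
    if count >= MAX_HANDLES:
        raise OverflowError("handle table full")
    memory[count] = handle
    return count + 1


def demo():
    # Five handles into a table with room for four, no check.
    mem, n = new_memory(), 0
    for h in range(100, 105):
        n = add_handle_unchecked(mem, n, h)
    clobbered = mem[MAX_HANDLES] != CANARY

    # Same five handles, with the bounds check in place.
    mem, n = new_memory(), 0
    refused = False
    try:
        for h in range(100, 105):
            n = add_handle_checked(mem, n, h)
    except OverflowError:
        refused = True
    intact = mem[MAX_HANDLES] == CANARY
    return clobbered, refused, intact
```

The unchecked version quietly overwrites the neighbor on the fifth handle; the checked version costs one comparison per call and fails loudly instead.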

56 posted on 01/28/2004 11:02:14 AM PST by js1138
[ Post Reply | Private Reply | To 22 | View Replies]

To: blam
Mars Rover Recovering From Memory Problems

Gee, I wonder if JPL can help me?

57 posted on 01/28/2004 11:03:01 AM PST by Howlin
[ Post Reply | Private Reply | To 1 | View Replies]

To: BigWaveBetty
Nice picture, but it gives rise to a few questions: For example; Why did they send a wood burning stove and where are they going to find the logs?
58 posted on 01/28/2004 11:03:25 AM PST by scouse
[ Post Reply | Private Reply | To 11 | View Replies]

To: Bikers4Bush
They probably are, but somebody probably forgot. Assume for a second it's because the flash card uses FAT with its well-documented file limit. Probably at some point the hardware guys mentioned the flash card was using FAT, but nobody thought about the implications of that. Most users that go back to the FAT days were taught good file management techniques and probably never ran into the file limit (which, remember, isn't actually a bug, it's a documented design limitation). I've been using computers since DOS 3.1 and didn't run into the file limit until last year, and only hit it because the software I test professionally can (if you configure it inefficiently) run massive quantities of files through the Windows print spooler. If I'd set up that machine as NTFS instead of FAT32 I wouldn't have run into it then either. You're not going to write your software to keep track of the number of files; that's what an OS is for. Keeping from running over the file limit is really a user thing: it's system management and maintenance that on lean systems (which this would be, to maximize storage capacity) is supposed to be done by hand.

In the end I'd guess this is something that didn't make the "to do" list of the operator sending instructions to the rover. Somewhere on there was an entry to check free space and delete files as necessary; now a new entry just got penciled in.
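For what that penciled-in entry might look like on an ordinary desktop system, here is a minimal sketch that keeps a directory under a file-count cap by deleting the oldest files first. The function name, the cap of 7, and the "oldest first" policy are assumptions for illustration; it says nothing about how JPL actually does it.

```python
import os
import tempfile


def prune_oldest(directory, max_files):
    """Delete the oldest files until at most max_files remain.

    Returns the names that were removed, oldest first.
    """
    entries = [e for e in os.scandir(directory) if e.is_file()]
    # Sort by modification time, with the name as a tie-breaker.
    entries.sort(key=lambda e: (e.stat().st_mtime, e.name))
    excess = max(0, len(entries) - max_files)
    removed = []
    for entry in entries[:excess]:
        os.remove(entry.path)
        removed.append(entry.name)
    return removed


def demo():
    with tempfile.TemporaryDirectory() as d:
        for i in range(10):
            path = os.path.join(d, f"log{i:02d}.dat")
            with open(path, "w") as f:
                f.write("x")
            # Force distinct, increasing timestamps so the order is predictable.
            os.utime(path, (1000 + i, 1000 + i))
        removed = prune_oldest(d, max_files=7)
        remaining = sorted(os.listdir(d))
    return removed, remaining
```

Note that the check is on the number of files, not on free space, which is exactly the item that was missing from the list.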
59 posted on 01/28/2004 11:05:00 AM PST by discostu (are you in the pocket of the moment)
[ Post Reply | Private Reply | To 54 | View Replies]

To: Mannaggia l'America
I have several CF cards that do not use FAT; they use various other flash-tuned file systems. Since these probes run VxWorks, the flash more than likely has a file system on it that spreads out the writes to ensure that no one section of the flash is written to more than the others. This makes the flash last longer.
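The basic idea of spreading out the writes can be sketched in a few lines, assuming the simplest possible policy: always write to the block that has been erased the fewest times. Real flash file systems are a good deal more elaborate, and the class below is a made-up illustration, not how any particular product works.

```python
class ToyWearLeveler:
    """Tracks erase counts per block and always picks the least-worn block."""

    def __init__(self, num_blocks):
        self.erase_counts = [0] * num_blocks

    def pick_block(self):
        # Least-erased block wins; min() returns the lowest index on ties.
        return min(range(len(self.erase_counts)),
                   key=self.erase_counts.__getitem__)

    def write(self):
        block = self.pick_block()
        # Flash has to be erased before it is rewritten, so count one erase.
        self.erase_counts[block] += 1
        return block


def demo(num_blocks=8, writes=1000):
    wl = ToyWearLeveler(num_blocks)
    for _ in range(writes):
        wl.write()
    return min(wl.erase_counts), max(wl.erase_counts)
```

With this policy the most-worn and least-worn blocks never differ by more than one erase, whereas rewriting the same block every time would put all the wear on one spot.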
60 posted on 01/28/2004 11:11:27 AM PST by DMCA (Illegal Aliens get out of jail free, Illegal song swapping - pay large fines/goto jail...)
[ Post Reply | Private Reply | To 16 | View Replies]



Disclaimer: Opinions posted on Free Republic are those of the individual posters and do not necessarily represent the opinion of Free Republic or its management. All materials posted herein are protected by copyright law and the exemption for fair use of copyrighted works.

FreeRepublic, LLC, PO BOX 9771, FRESNO, CA 93794
FreeRepublic.com is powered by software copyright 2000-2008 John Robinson