Posted on 10/20/2003 8:36:42 PM PDT by anymouse
Editor's note: the following document was generated in 2003 for internal use at NASA Johnson Space Center.
You can download the entire 100 page report (with appendices, charts, etc.) here (5.6 MB PDF).
The first portion of the document describes how the Space and Life Sciences Division at NASA JSC is supposed to conduct business. The second part of this report (excerpts below) opens by saying "Despite the apparent order of the process described above, the reality of the current program tells a more chaotic story."
The third section of this report "Recommendations" ends with "The issue is clear. Voodoo science is not worth the cost. The limb of the fault tree Life Sciences is perched upon is perilously close to breaking."
The last portion of this report contains a detailed statistical analysis of JSC life science research.
None of the problems described in this document arose overnight. Indeed, they are the result of decades of bad decisions, both at JSC and at NASA HQ. These problems also reflect a failure on the part of advisory committees, both those sponsored by NASA and those chartered external to the agency.
Having been deeply involved myself in the advisory, peer review, and payload integration aspects of NASA's life sciences programs in the 1980s and 1990s, I saw much of this with my own eyes. It hasn't gotten any better.
NASA may soon be handed a new mandate for humans to do new things in space. Unless NASA gets its life sciences research house in order, NASA will not be able to respond to that mandate.
Human Life Sciences Research Aboard the International Space Station and the Space Shuttle: A White Paper
-- LH Kuznetz
06/18/03
CONFIDENTIAL ... NOT FOR DISTRIBUTION
Page 3
Acknowledgment
This manuscript was compiled from the input of many people in the Space and Life Sciences Division and represents 21 months of effort. Special thanks go to Al Feiveson, who provided statistical guidance for the Analytic Hierarchy pairwise method and created the Excel program for the Moving Target Approach. Others whose data formed the foundation of this work include the Experiment Science Managers (ESMs); Internal and External Principal Investigators; Increment Scientists; DSO Flight Experiment Managers; and Medical Operations personnel. Without the input of these passionate and dedicated people, the key conclusions reached in this white paper would not have been possible.
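For readers unfamiliar with it, the Analytic Hierarchy pairwise method referenced above derives priority weights from a matrix of pairwise importance judgments. A minimal sketch of the idea, with hypothetical comparison values rather than the program's actual data:

```python
# Sketch of the Analytic Hierarchy Process (AHP) pairwise method: derive
# priority weights from a reciprocal pairwise-comparison matrix by power
# iteration toward the principal eigenvector. Values are hypothetical.

def ahp_weights(matrix, iterations=100):
    """Approximate the normalized principal eigenvector of a pairwise matrix."""
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(iterations):
        # Multiply the matrix by the current weight vector...
        w_new = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        # ...then renormalize so the weights sum to 1.
        total = sum(w_new)
        w = [x / total for x in w_new]
    return w

# Hypothetical judgments on three items: A vs B = 3 ("A moderately more
# important than B"), A vs C = 5, B vs C = 2; reciprocals below the diagonal.
pairwise = [
    [1.0, 3.0, 5.0],
    [1 / 3, 1.0, 2.0],
    [1 / 5, 1 / 2, 1.0],
]
weights = ahp_weights(pairwise)  # largest weight goes to item A
```

The appeal of the method for a flight queue is that each pairwise judgment is small and defensible, while the eigenvector aggregates them into a single consistent ranking.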
Page 17
2. STATE OF THE CURRENT PROGRAM
Despite the apparent order of the process described above, the reality of the current program tells a more chaotic story. Metrics for the program appear in Figure 7, which reveals that of the 45 experiments either in or about to enter the flight queue, only 13 are designated Red 1, the highest priority, compared to 27 that are Yellow or lower priority. More disturbing is the fact that 18 of these 27 are Yellow 2, the next-to-lowest tier of importance. (The Red, Yellow, Green designation was established by the REMAP commission for a balanced program; see Appendix 1.) Ignoring for the moment what this says about the science value, only 6 of the 45 are countermeasures (15%), while 85% constitute mechanistic or fundamental studies with no clear path to a countermeasure. This is a clear contradiction of the dictate established by the Young Commission that the development of countermeasures to prevent the deleterious effects of microgravity is the primary mission of ISS.
Another foreboding statistic relates to the number of studies that have overlapping objectives (as noted by the Critical Path Risks and Questions) and measure similar parameters, yet are manifested and treated as if they were unrelated. This lack of commonality stretches out the queue far beyond what it needs to be, accumulating costs and waste in return. This is especially true for the Cardiovascular and Neurovestibular Disciplines, with 12 experiments between them. To put it in perspective, at an average duration of 4 years per experiment, they would take 8 years to complete if 6 could be run in parallel and manifested at the same time, as opposed to 48 years if they were run in series. While the latter is a play on extremes, it gets the point across that not combining resources for related experiments is terribly wasteful. Throw in the fact that 6 of the 12 experiments in cardio and neuro are Yellow 2's that clog the queue with second-tier objectives and block new Category Reds from entering, and the stated BR&C goal of obtaining and implementing a countermeasure after 3 Category Red experiments per discipline is pie in the sky.
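The queue arithmetic above is simple enough to check directly; a minimal sketch, using the same deliberately simplified batch-scheduling model the report uses:

```python
import math

def queue_duration(n_experiments, years_each, slots):
    """Calendar time to clear a queue when experiments run in batches of
    `slots` in parallel, each batch taking `years_each` years."""
    batches = math.ceil(n_experiments / slots)
    return batches * years_each

# 12 cardio/neuro experiments at an average of 4 years each:
serial = queue_duration(12, 4, slots=1)    # one at a time  -> 48 years
parallel = queue_duration(12, 4, slots=6)  # 6 in parallel  ->  8 years
```

The 48-versus-8 gap is the report's "play on extremes": real manifesting falls somewhere between, but every related experiment left uncombined pushes the queue toward the serial end.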
Another concern is the quality of the science itself. Few if any of the experiments have valid controls in the usual sense of the word. They appear to, by using pre- and post-flight data on the same individual, but subject-to-subject variations, restrictions imposed by Medical Operations, data sharing limits, and other constraints (addressed below) conspire to confound results. Unfortunately, a "we have to live with it, that's the nature of the beast" mentality has become the mantra, and controlled or cross-disciplinary studies aimed at de-variable-izing the mix are few and far between. Together with the small N typical of most flight experiments, the line between real and wishful science is continually being blurred.
Page 19
NRAs.
The problems above start and end with the solicitation process. The flight NRAs seek research proposals that will "lead to the development of effective countermeasures or operational techniques for problems associated with one of the 12 disciplines covered by the Critical Path Roadmap." However, of the 21 proposals that made the final cut in the 2001 solicitation, only 1 was a bona fide countermeasure (Rubin), and it, in all likelihood, will be downgraded to a ground study prior to flying because it "lacks sufficient ground data pedigree in humans." The pattern is not restricted to Flight NRAs. The 2002 Ground NRA solicitation drew over 100 proposals and only 5 countermeasures (2 of which were non-funded international studies). Such a track record is discouraging, especially in light of the fact that the current flight program is already bloated with 84% mechanistic studies. The problem doesn't end there, however; the underpinnings of the entire NRA process are questionable. Specifically:
The above issues are only half the problem. The other half is the confusion arising from the fact that there is not one NRA process but many, and they confuse not only the principal investigators but also the peer reviewers and NASA implementers overseeing them. The pyramid of Figure 6 is a gross simplification. In effect, there are 9 separate solicitation mechanisms that lead to overlapping and conflicting experiments: NASA Flight NRAs; NSBRI Flight NRAs; CEVP Flight NRAs; NASA Ground NRAs; NSBRI Ground NRAs; CEVP Ground NRAs; SMOs; grants; and unsolicited proposals. In theory, these processes are somehow woven together; in fact, they are anything but.
Page 20
CPR.
The Critical Path Roadmap drives the direction and quality of the NRAs, but it is flawed. In principle, this document attempts to meet the goals of the Young Commission by means of risk reduction, mitigation, and management. In practice its reach far exceeds its grasp. Like a computer model of a complex system, its fidelity rests on the strength of its underlying assumptions and its inputs. By carving the human body into 12 distinct disciplines, treated as if none were interconnected, the CPR attempts to do too much. Since the body is a complex system in which every subsystem depends on every other one, the CPR ought to follow the same philosophy. It does not. The CPR fails because it assumes that the PIs answering the fundamental questions in a particular discipline will be able to cull out the cross-disciplinary effects from other disciplines. In fact, data sharing hurdles thrown in its path by astronaut privacy restrictions and other obstacles negate this assumption. The CPR is supposed to be a "living document," but there are not enough resources, human or otherwise, to change it fast enough to reflect program changes. The conundrum of the CPR can best be appreciated by the following anecdote. A PI team proposing a cardiovascular experiment was told it would have to downscale the study because cardiomyopathy addressed Critical Questions 3.06 and 3.18 under Critical Risks 13 and 14 of the CPR. This was of lesser importance (Yellow and Green) than the arrhythmia portion of the experiment, which addressed Critical Question 3.01 of Critical Risk 13 (Red). The team's response: "We are cardiologists who've been doing this for decades and we believe arrhythmias are caused by cardiomyopathy. The CPR is wrong."
A second problem with the CPR is that the assignment of critical questions to particular experiments is, in many cases, a judgment call. For example, does the Merfeld Sensory Integration experiment, Critical Risk 33, address Critical Question 9.09, 9.25, or both? Does the Alendronate SMO countermeasure (Critical Risk 9) also address Critical Risk 10, and Critical Questions 2.19, 2.98 and 2.06, or all of them? Does Bungo-Levine's CARDIO experiment address Critical Risks 13, 14 and 15, and Critical Questions 3.01, 3.06 and 3.18, or just the highest priority items (Red 1)? Figure 2 lists these overlapping critical questions and risks, and the consequences of misjudgment can be serious. Take Oman's VOILA experiment, for example, addressing Critical Risk 20 in Behavior and Performance and labeled a Green 1 (lowest priority). It also appears to address Critical Risk 33 under Neurovestibular, however, which would elevate it to a Red 1, the highest priority. The difference between these labels could be the difference between flying and deselection. Pierson's SWAB experiment is another example, labeled as both an Immune discipline, Critical Risk 22, and a Food/Nutrition discipline, Critical Risk 8; so is Schneider's TREADMILL, which cuts across 4 disciplines, addressing Critical Risks 49, 19, 17 and 30 (at least it is labeled a cross-disciplinary experiment). These are just some of the issues that arise when trying to use the CPR as a tool to categorize and prioritize flight research experiments. Another problem with the CPR is its apparent disconnect from evidence-based space medicine, i.e., observations gleaned from actual flight experience. The documented history of physiological problems lists behavioral problems and kidney stones as the most frequent and serious, but the CPR pays more attention to cardio and neuro, while kidney stone experiments are ranked as Yellow 2 (Renal Stone).
And while behavior experiments abound in the flight queue (there are 5), there is not a proposed countermeasure in the lot, and 3 of them have redundant objectives. In summary, the CPR is a tool that attempts to extrapolate programmatic priorities using a cookbook approach to the human body, one prone to misinterpretation and confusion.
Page 21
Confounding variables
The program has justified the use of a small subject size, N, through studies such as Evans and Ildstad's Small Clinical Trials, which in the case of human life science flight experiments has subjects serving as their own controls through preflight, in-flight and post-flight data collection. The editors of that work, however, probably never envisioned the number of confounding variables that would negate a small N under spaceflight conditions. Figures 8 and 9 are cases in point, showing the dispersion in two of the most important parameters in human microgravity studies, aerobic capacity and bone loss. In the words of one of the PIs, "The numbers are %change per month with SD in (). If you take +-3*SD as the range (contains 97% of the data), then you see the variability is huge. For example, for total femur trabecular BMD, the percent change is 2.5+-0.9, which means you have a range of 0 to about 5% loss per month. At the end of 6 months, the extreme range is 30% or so of lost total femur trabecular BMD. These data clearly document large variabilities."
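The quoted arithmetic can be reproduced directly (one aside: a ±3 SD band actually covers about 99.7% of normally distributed data, not 97% as quoted). A short sketch using the quoted figures:

```python
# Reproduce the PI's bone-loss range arithmetic from the quoted figures.
mean_loss = 2.5  # % total femur trabecular BMD lost per month (quoted)
sd = 0.9         # standard deviation of the monthly loss rate (quoted)

# +/- 3 SD band on the monthly loss rate (covers ~99.7% of a normal
# distribution, slightly more than the 97% the quote states):
low = mean_loss - 3 * sd   # -0.2 -> effectively no loss
high = mean_loss + 3 * sd  #  5.2 -> roughly 5% per month

# Linear extrapolation over a 6-month increment, ignoring compounding
# exactly as the quoted PI does: the extreme case approaches ~30% loss.
six_month_extreme = high * 6
```

With a spread that runs from essentially zero loss to 5% per month, a pre/post difference in any one subject says very little on its own, which is the point the PI is making.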
How is one to determine the root cause of the data scatter in such studies when multiple parameters vary concurrently, parameters that cannot be culled out because of data sharing issues (addressed later) and different techniques or instruments for measuring the same thing (Biopsy (US) vs. Myon (Russian); Profilaktika vs. CEVIS; different types of ultrasound, DEXAs, MRIs, etc.)? Throw the small N into the mix and conclusions of worth become rare indeed. How, for example, can one justify an N of 3 for the Foot experiment, 4 for H-reflex, and 5 for Spatial Cues under such circumstances? The case of exercise countermeasures is especially noticeable, since they are supposed to be beneficial on multiple fronts, from muscle strength to bone loss. There are three exercise countermeasures on ISS: the TVIS (treadmill), CEVIS (bicycle ergometer) and IRED (resistive force device). The exercise prescriptions used for all 3 went from a research protocol to a countermeasure application before they were fully mature. The effect of exercise on the various organ systems is poorly understood as a consequence. Many assume that exercise mitigates bone loss, for example, but more than one principal investigator has looked at Figures 8 and 9 and wondered how anyone could reach such a conclusion. And while the in-flight exercise countermeasure is mandatory, the exercise prescriptions vary greatly, oftentimes left up to the astronauts to do "their own thing." To compound matters, there are too many fingers in the exercise prescription pie: exercise physiologists, ASCRs, flight surgeons, etc. It is also interesting to note that the ASCRs are under the auspices of Flight Surgeons, while the Exercise Physiology lab is under the Human Adaptation and Countermeasure Office, i.e., flight research. The lab is also beholden to Med Ops, since it is responsible for a number of medical requirements (MRIDs).
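The small-N objection can be made concrete with the standard sample-size approximation for detecting a mean shift at 5% two-sided significance and 80% power. The effect size and SD below are hypothetical stand-ins chosen for scale, not the report's data:

```python
import math

def required_n(delta, sd, z_alpha=1.96, z_beta=0.84):
    """Approximate N needed to detect a mean shift `delta` given standard
    deviation `sd`, at 5% significance (two-sided) and 80% power."""
    return math.ceil(((z_alpha + z_beta) * sd / delta) ** 2)

# Hypothetical: detect a 1%-per-month change when between-subject SD is 0.9.
n = required_n(delta=1.0, sd=0.9)  # already more subjects than Foot's N of 3

# Halve the effect size and the requirement roughly quadruples.
n_small_effect = required_n(delta=0.5, sd=0.9)
```

Even under these generous assumptions the required N exceeds the 3-5 subjects cited above, and every uncorrected confounding variable inflates the effective SD further.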
The concluding example relates to drugs as a confounding variable and the lack of well-thought-out exclusion controls. The soon-to-be-implemented Alendronate study to reduce bone loss would have used the same subjects eligible for other bone loss studies, such as VIBE (using vibration) and Renal Stone (using potassium citrate), had the potential interactions not been inadvertently detected. In short, a systematic, rigorous means of preventing multiple variables from confounding flight research data is sorely lacking. The program is full of self-destructive inconsistencies, many of them owing to poor management (see below).
Page 23
Management.
As should be obvious by now, many of the problems described above are nested in management. Figures 10 and 11 speak volumes to the point. These charts read more like a plan for the invasion of Normandy than one outlining the steps, organizations, boards and individuals involved in the process for developing countermeasures (Fig. 10) and manifesting flight experiments (Fig. 11). From the charts, the number of people involved is likely in the hundreds, and the diagrams themselves are unreadable and unusable. Followed to the letter, these flow charts would inevitably lead to confusion. Another example is shown by Figure 12, describing the review board process experiments must pass through. The sheer number of paths almost screams that "no one wants to make a decision." When astronauts' lives are at stake, ethics and informed consent review is essential, of course, but at some point it becomes overkill. Take the case of alendronate again, a drug that has been in wide use for years with demonstrated benefit for osteoporosis and bone loss. Having gone through one of the strictest regulatory systems on the planet, the FDA's approval process for consumer drugs, it must still be subjected to the twists and turns of Figures 10, 11 and 12 before it can be used in flight. By the time the Alendronate SMO gets through this gauntlet, no one will care, because a list of bisphosphonates that could fill this page (and are already available) will likely have supplanted it on the ground, every one of which will have to follow in its wake. Absurdity number one is that a single decent study on the pharmacokinetics of classes of drugs in microgravity might shortcut by years the process of requiring flight experiments for each individual drug, yet pharmacokinetics is ranked a Yellow 2 by the CPR and not even given top priority, a point of great contention among many Medical Operations personnel.
Absurdity number two is that those same Med Ops personnel are able to prescribe drugs in orbit without knowing their pharmacokinetics. The system allows it to happen both ways, with a lag time of years and an incalculable waste and confusion factor. The underlying cause, in the author's opinion, is a bureaucratic hodgepodge with NO CLEAR CHAIN OF COMMAND, and a management style operating by verbal rather than written orders, exacerbated by an e-mail process that serves as the main organ of communication rather than formal memoranda. Everyone seems to have a hand in the decision-making pot, and duplication runs rampant. The consequence? From the Select for Flight Telecons to the review boards described above, action items are bounced back and forth between multiple codes, organizations and individuals with infrequent resolution.
Page 27
Typical of this "ping-pong" management are the actions of one not-to-be-named review board, which emailed its members not to convene 48 weeks out of the year rather than simply informing them of the 4 weeks to convene. Memorandums of Understanding and Project Plans also abound: how SK should relate to SM, written by SM; how SM should relate to SK, written by SK; a plan to merge 2 major organizations after they were unmerged following a previous merger. While many of these plans never reach fruition, there is a tendency to ignore them even when they do, since it is a safe bet they will be amended before anyone can get comfortable enough to implement them.
Another manifestation of poor management is the time between an experiment's selection and its first flight. From limited data, the average waiting period for DSOs is 2.5 years, while that for ISS experiments is nearly 4 years, a delay that threatens the very foundation and utility of the experiment and tests the nerves of its PIs and flight managers alike. While extenuating circumstances such as schedule shifts, fleet turnaround times and accidents certainly contribute to delays, part of the blame must rest with management. Then there is the problem of conformance, i.e., the number of crew members who actually sign up for an experiment after the crew briefing, shown for the DSO program in Figure 13.
On a typical shuttle flight of 7 crewpersons, participation has averaged only 1.9 subjects, a 28% conformance rate, since STS-95 (Chart B, Appendix 2). The track record on ISS seems better, averaging nearly 50%, but that is deceptive, since only 3 crewmen fly and 1.4 sign up. Poor conformance starts with the crew briefing, the presentation given to each ISS or STS crewmember by the Principal Investigators. Often referred to as the Informed Consent Briefing or ICB, the crew briefing is where informed consent agreements and confidentiality agreements are obtained and BDC (biological data collection) schedules are baselined. Despite the fact that crew participation is essential to the success of the flight research program, the program is strictly voluntary, and PIs must compete against each other for astronauts' services. Allowing for the fact that subjects can be disqualified on medical grounds or excluded from some experiments because they are mutually exclusive of others, the potential subject pool shrinks before it even forms. Then comes the crew time barrier. With less than 20 hours/week available for research (Figure 14), conformance is crippled again as it leaves the starting gate. Next comes R+0, the 4-hour time limit for landing day data collection, and the beat goes on.
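The conformance figures reduce to simple ratios, which makes the shuttle-versus-ISS comparison easy to check:

```python
def conformance(signed_up, crew_size):
    """Fraction of a crew that signs up for an experiment."""
    return signed_up / crew_size

# Averages quoted in the report:
shuttle = conformance(1.9, 7)  # ~0.27, reported as a 28% rate
iss = conformance(1.4, 3)      # ~0.47, the "nearly 50%" ISS figure
```

The higher ISS percentage rests on a pool of only 3 crew, so in absolute terms ISS actually delivers fewer subjects per mission than the shuttle, which is why the report calls the comparison deceptive.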
Page 28-29
Med Ops.
Medical Operations are distinct from flight research management yet so interwoven with it that they must be treated separately. Charts as convoluted as Figures 10-12 would be needed to show all the interactions (none has yet been drawn, so great is the challenge). Although the primary goal of Med Ops is to maintain crew health and well-being during all phases of flight, they have many other responsibilities, including:
Though unintended, many of these responsibilities undercut the research program. The following examples illustrate the point:
Our Russian Friends
Although this too falls under the purview of management, it is so unique that it must, like Med Ops, be addressed separately. The Russians (and ESA) have their own Biomedical Research Program which, at first blush, is supposed to complement our own. But does it? Figure 15 lists Russian human flight experiments on ISS alongside some of their US counterparts. Many of these experiments appear to mirror our own, with the whole being less than the sum of its parts. Cases in point: the Russians don't pay attention to the CPR and are doing experiments in several disciplines that may be duplicative; Russian crewmembers can participate in US experiments if we pay them, while American astronauts cannot participate in Russian experiments; some US PIs arrogantly think of Russian experiments as "tissue paper" (Fitts, for example, who, it must be noted, has the largest cost per year of ANY experiment on the manifest, and an N of 3); the Russians are not invited to, or choose not to attend, the ISLSWG meetings that select experiments from NASA NRAs, preferring their own peer review process, which some NASA overseers view as inferior; the same precious space aboard ISS may be occupied by Russian hardware virtually identical to our own (the Russian experiment Profilaktika uses a bicycle ergometer (velometer) while the CEVIS is available as well); the Russians view the program with different priorities, choosing thermoregulation (the experiment Thermography) as a high priority while the US CPR ignores it; and so on. Life at ESA isn't much different. In the past month alone, 3 flight experiments suddenly appeared with requests for manifesting, one for blood pressure monitoring, another for EKG analysis (Rhythms) and a third for neurovestibular assessment (NeuroCOG), all remarkably similar to current DSO or HRF experiments.
How many ways can one split the hairs of an EKG, one might ask, and how many varieties of 3-D visual imagery can one look at in the "name of science" without a countermeasure appearing? In a program with such limited resources, there shouldn't be many but there are. We may have a joint international crew but we do not have a joint research program. The expression, "A house divided against itself cannot stand," comes to mind.
Page 31
Summary
There are many excuses some would cite for the poor performance described above: it's a vestige of the old days when operations led the food chain; the system is changing and it takes time to mature; ISS is in the build phase and research must await its completion; a flight platform is not a ground platform and one must live with confounding variables; a 3-person crew can't perform like one with 6; STS-107 affected the schedule, and so on. While many of these excuses are valid, three facts stand out: the average experiment takes 5-6 years to complete; it can cost as much as $4M; and the avowed expectation that 3 experiments will yield a single countermeasure in a particular discipline is pie in the sky. The more likely scenario, under the best of circumstances, is that when ISS has reached the end of its useful life, the number of countermeasures so derived will be pitifully small compared to the investment. Under the worst of circumstances, ISS will be in the ocean.
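Taking the report's own figures at face value, a back-of-the-envelope sketch shows the scale of the bill even if the optimistic 3-experiments-per-countermeasure expectation held (worst-case per-experiment cost, timing and overhead ignored):

```python
# Back-of-the-envelope cost implied by the report's own figures.
cost_per_experiment = 4_000_000  # worst-case cost quoted for one experiment
experiments_per_cm = 3           # stated expectation per countermeasure
disciplines = 12                 # disciplines in the Critical Path Roadmap

cost_per_countermeasure = experiments_per_cm * cost_per_experiment  # $12M
program_cost = cost_per_countermeasure * disciplines                # $144M
```

And that is the floor: every redundant Yellow 2 in the queue, every failed countermeasure candidate, and every serialized experiment pushes the real cost per countermeasure above it.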
(Excerpt) Read more at spaceref.com ...
Goodness, that's an excerpt??
NASA needs to concentrate on space engineering.