FR Folding@Home Project Update - More About Protein Folding (20,000,000 Points!)

Pulled from the postings of Dan Ensign, one of the graduate students in the Pande Group:

...

First, let's review some basic physics. The key idea is that of a "trajectory." You might recall Newton's Second Law, F = ma, which means that the acceleration a (change in velocity) that a particle experiences is proportional (by its mass m) to the force F it experiences. This means that if we can catalog all the forces on a particle, we can determine its acceleration. If we know the acceleration, then we can use calculus to determine the particle's position as a function of time, for all time. The result is what's called a 'trajectory' -- a kind of map of where the particle has been and where it will be going. By the way, when I say 'particle,' I mean that we could perform this analysis on atoms, protein molecules, baseballs, the space shuttle, the Sun, or anything in between.

The analysis gets a lot harder the more particles there are in the system -- for instance, if you set up a system with the Earth and the Sun as two particles, experiencing each other's gravity, then you can solve Newton's Second Law very easily and write down a function which describes the position of the Sun and the Earth at all times. If you include the Moon or other planets, then you can't write down functions like this, though you can solve Newton II numerically. This is what we do for FAH -- solve Newton II numerically for thousands of atoms, thousands of times, once every femtosecond or so (that's "ten-to-the-minus-15" seconds). What we get is a trajectory for the protein atoms.
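To make "solve Newton II numerically" a bit more concrete, here is a minimal sketch of one common integration scheme, velocity Verlet, applied to a toy one-particle system. The post does not say which integrator FAH uses, so the scheme, the function name `velocity_verlet`, and the unit-mass-on-a-spring test system are all assumptions chosen for illustration.

```python
import math

def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate F = m*a step by step; return a list of (time, position, velocity)."""
    trajectory = [(0.0, x, v)]
    a = force(x) / mass                       # acceleration from Newton's Second Law
    for step in range(1, n_steps + 1):
        x = x + v * dt + 0.5 * a * dt * dt    # advance the position
        a_new = force(x) / mass               # forces at the new position
        v = v + 0.5 * (a + a_new) * dt        # advance the velocity with the averaged acceleration
        a = a_new
        trajectory.append((step * dt, x, v))
    return trajectory

# Toy system (an assumption, not a protein): unit mass on a unit spring, F = -k*x.
k = 1.0
traj = velocity_verlet(x=1.0, v=0.0, force=lambda x: -k * x,
                       mass=1.0, dt=0.01, n_steps=1000)
t_end, x_end, v_end = traj[-1]
print(f"t = {t_end:.2f}, numerical x = {x_end:.4f}, exact x = {math.cos(t_end):.4f}")
```

The list `traj` is the "trajectory" in the sense used above: the particle's position and velocity at each time step. A real simulation does the same thing for thousands of interacting atoms with a far more complicated force function.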

If we're simulating protein folding, then perhaps the trajectory will result in a folded protein. Perhaps not -- we don't have a way to say for sure how this happens for an arbitrary starting conformation. (But we're studying it, obviously, thanks to our Army of Undea -- oops, I mean FAH clients. The Army of Undead is for a different project entirely.)

Now, on my desktop machine at work, I can simulate a system of about 16,000 atoms moving for 1 nanosecond (ns, or "ten-to-the-minus-9" seconds) in one day. But the protein that I'm folding requires (on average) one microsecond ("ten-to-the-minus-6" seconds) to fold -- and this is a system engineered to fold fast. To get to one microsecond on my desktop machine, I'd have to fold for 1,000 days. Forget about "average" proteins, which might take hundreds of microseconds, or milliseconds, to fold.
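The arithmetic behind the 1,000-day figure can be checked in a few lines. This is only a back-of-the-envelope sketch using the numbers quoted above (1 ns per day, 1 microsecond to fold, roughly one integration step per femtosecond); the variable names are made up for illustration.

```python
NS_PER_DAY = 1.0          # desktop throughput quoted in the post
FOLD_TIME_NS = 1_000.0    # one microsecond expressed in nanoseconds
FS_PER_NS = 1_000_000     # one nanosecond is a million femtoseconds

days_needed = FOLD_TIME_NS / NS_PER_DAY
years_needed = days_needed / 365.25
steps_needed = FOLD_TIME_NS * FS_PER_NS   # assuming one step per femtosecond

print(f"{days_needed:.0f} days, about {years_needed:.1f} years")
print(f"{steps_needed:.0e} integration steps")
```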

Maybe I'd get lucky and the protein would fold in that time; maybe I wouldn't, and they'd find me 35 years later, in some sub-subbasement below the chemistry building at Stanford, a raving lunatic lost to the dregs of Ph.D. research, sneaking out only at night to feed on spilled yeast extract and collecting discarded NMR tubes to wear as primitive jewelry. (I heard this happened to a guy.)

To avoid life-wasting tragedy, we (and when I say "we" I mean, "Someone besides me, but who I know") have recruited hundreds of thousands of generous and interested persons ("you guys") to give us a hand with some of this work. I could run a trajectory for 1,000 days, but instead we've taken a shortcut and decided to run 1,000 or 10,000 or 100,000 trajectories for a few days (or months or years) instead. On average, a few of these trajectories will result in a folded protein (and we have ways of extracting interesting and important information from all of the work done on FAH).
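A rough way to see why many short trajectories can stand in for one long one: if folding behaves like a single random event with a constant rate (an assumption made here for illustration, not something the post states), then the chance that a trajectory of a given length folds is 1 - exp(-length / mean folding time). The trajectory length and count below are hypothetical.

```python
import math

MEAN_FOLD_TIME_NS = 1_000.0   # one microsecond, the average quoted in the post
TRAJECTORY_LENGTH_NS = 10.0   # hypothetical: about ten days at 1 ns per day
N_TRAJECTORIES = 10_000       # one of the counts mentioned in the post

def fraction_folded(length_ns, mean_fold_time_ns):
    """Chance that one trajectory folds, assuming simple exponential kinetics."""
    return 1.0 - math.exp(-length_ns / mean_fold_time_ns)

p_fold = fraction_folded(TRAJECTORY_LENGTH_NS, MEAN_FOLD_TIME_NS)
expected_folded = N_TRAJECTORIES * p_fold

print(f"chance one short trajectory folds: {p_fold:.4f}")
print(f"expected folded trajectories: {expected_folded:.1f}")
```

Under these assumptions each short run has only about a 1% chance of folding, yet roughly 100 of the 10,000 would be expected to fold. Real proteins need not follow such simple kinetics, so treat the numbers as a rough guide only.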

Okay, here it is: The CLONE numbers are labels for each trajectory that we run. Each GENeration is another chunk of time along that trajectory. So, say that I benchmark CLONE0, GEN0 (the first 4 ns). That WU is then done, and the FAH software builds a new WU with starting coordinates (and velocities and stuff) where mine left off. Then the new WU -- GEN1 of CLONE0 -- gets sent to you, and you simulate the next 4 ns. And so on. So CLONE is a label for an individual trajectory, and GENerations are time steps along that trajectory.
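The CLONE/GEN bookkeeping can be sketched as a small data structure. This is not the actual FAH server code; the `WorkUnit` class, its fields, and `next_generation` are hypothetical names, and only the 4 ns chunk length comes from the example above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class WorkUnit:
    run: int                  # group of similar trajectories
    clone: int                # label of one trajectory within the run
    gen: int                  # which chunk of time along that trajectory
    start_ns: float           # simulated time at which this chunk begins
    length_ns: float = 4.0    # chunk length used in the example above

    @property
    def end_ns(self):
        return self.start_ns + self.length_ns

def next_generation(finished):
    """Build the follow-on WU that starts where the finished one left off."""
    return WorkUnit(run=finished.run, clone=finished.clone,
                    gen=finished.gen + 1, start_ns=finished.end_ns,
                    length_ns=finished.length_ns)

wu = WorkUnit(run=0, clone=0, gen=0, start_ns=0.0)   # the benchmarked first WU
for _ in range(3):                                    # three more WUs come back
    wu = next_generation(wu)

print(wu.gen, wu.start_ns, wu.end_ns)
```

A real work unit would also carry the atom coordinates and velocities from the end of the previous chunk; they are left out here to keep the sketch short.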

RUNs are groups of similar CLONEs. All the CLONEs in a RUN have the exact same atoms, the exact same atom positions, the same temperature, etc. The difference is the starting velocities -- the initial motions of all the atoms in the protein are randomized. Although statistically the velocities are determined by the temperature, there are countless ways of partitioning the velocities to the atoms, so we try out 100 or so CLONEs to get a good feel for the sample space. Assigning different velocity sets to the atoms turns out to be wildly important: if the conformation we start with happens to represent the transition state (sort of halfway from folded, halfway from unfolded) then 50 of our 100 CLONEs will fold, and 50 won't.
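One standard way to randomize velocities at a fixed temperature is to draw each velocity component from a Gaussian whose width is set by the temperature and the atom's mass (the Maxwell-Boltzmann distribution). The post does not say how FAH does it, so the sketch below is an assumption; `draw_velocities`, the all-carbon toy system, and using the clone number as the random seed are illustrative choices.

```python
import math
import random

K_B = 1.380649e-23         # Boltzmann constant, J/K
AMU = 1.66053906660e-27    # atomic mass unit, kg

def draw_velocities(masses_amu, temperature_k, seed):
    """Draw one (vx, vy, vz) in m/s per atom from a Maxwell-Boltzmann distribution."""
    rng = random.Random(seed)
    velocities = []
    for m_amu in masses_amu:
        sigma = math.sqrt(K_B * temperature_k / (m_amu * AMU))
        velocities.append(tuple(rng.gauss(0.0, sigma) for _ in range(3)))
    return velocities

# Toy system (an assumption): 1,000 carbon-mass atoms at 300 K.
masses = [12.0] * 1000
# Same atoms and same temperature for every CLONE; only the seed differs.
clones = {clone_id: draw_velocities(masses, 300.0, seed=clone_id)
          for clone_id in range(100)}

print(clones[0][0])
print(clones[1][0])
```

Every CLONE sees the same temperature, so the velocity sets are statistically alike, but no two are identical. Production simulation packages typically also remove any net drift of the whole system, which this sketch skips.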

The different RUNs in a PROJect might, in their simplest form, represent different starting conformations. So, we could start off 100 RUNs of different partially unfolded structures and try to find the one for which half of its CLONEs fold -- then that RUN has the conformation of a representative of the transition state.
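The search for the RUN whose CLONEs fold about half the time amounts to a simple comparison. The folded counts below are made up purely for illustration, and `closest_to_transition_state` is a hypothetical helper, not part of any FAH tool.

```python
def closest_to_transition_state(folded_counts, clones_per_run=100):
    """Return the RUN whose folded fraction is nearest 0.5, plus that fraction."""
    fractions = {run: n / clones_per_run for run, n in folded_counts.items()}
    best = min(fractions, key=lambda run: abs(fractions[run] - 0.5))
    return best, fractions[best]

# Made-up results: RUN number -> how many of its 100 CLONEs ended up folded.
folded_counts = {0: 3, 1: 21, 2: 48, 3: 77, 4: 96}

run_id, fraction = closest_to_transition_state(folded_counts)
print(f"RUN {run_id} folds {fraction:.0%} of the time")
```

With these invented counts, RUN 2 (48 of 100 folded) would be the best candidate for a transition-state conformation.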

So why is this transition state doohickey so important? The folded state is relatively easy to identify, especially if experimentalists have determined the structure for the protein under scrutiny, or for a very similar one. The "unfolded state" is a bit harder, but we can generate unfolded conformations by, say, simulating the folded protein at high temperatures so it "melts," or we can thread the amino acid sequence on a set of randomly coiled noodles, or whatever. But the path which connects "unfolded" protein with folded protein is not so easy to get to -- but if we identify the transition state, then we've found (at least one of) the paths by which proteins fold, and that's research in protein folding.

The RUNs might also represent slightly different proteins -- for instance, different mutants of some protein. They might represent other things that I haven't thought of, but whatever they are they are similar enough to other RUNs in the same PROJect, that, well, they're part of the same project.

So to summarize, when I'm setting up a project, I might do the following:
1. Pick 100 different unfolded or partially unfolded conformations of my protein of interest. These become my RUNs.
2. Then, I set up 100 different CLONEs for each RUN. (Well, I don't actually set them up myself, I just run a program. But I run it really well. And intelligently. And I look good doing it.) Each CLONE contains one WU at this point.
3. Then, I let the (100 RUNs) x (100 CLONEs) = 10,000 WUs loose on the world ("you guys").
4. Then, I go have lunch.
5. I come back weeks later to find WUs crunched and GENerations progressing -- each of the original 10,000 WUs was the beginning of one trajectory, so at the end, I have 10,000 trajectories of 50 or 100 or more ns.
6. Finally, I sift through the data and learn something new about protein folding!
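The counting in the steps above can be sketched in a few lines. The tuple layout (RUN, CLONE, GEN) is just an illustrative convention, and the 50 ns trajectory length is the lower figure mentioned in step 5.

```python
from itertools import product

N_RUNS, N_CLONES = 100, 100
NS_PER_TRAJECTORY = 50    # the lower figure mentioned in step 5

# One starting WU (GEN 0) for every RUN/CLONE pair.
initial_wus = [(run, clone, 0)
               for run, clone in product(range(N_RUNS), range(N_CLONES))]

# Total simulated time once every trajectory has reached 50 ns, in microseconds.
aggregate_us = len(initial_wus) * NS_PER_TRAJECTORY / 1000

print(len(initial_wus), "starting WUs")
print(aggregate_us, "microseconds of simulation in total")
```

Taken together, the 10,000 trajectories add up to hundreds of microseconds of simulated time, far more than one desktop could produce in a single run.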

And so it goes. I'm still new at this, so I haven't actually done steps 4, 5, or 6 yet, but I've got a good handle on 1, 2, and 3, and now it's a matter of waiting (and doing 1, 2, and 3 a lot more).

...

Bruce has just correctly pointed out to me that this isn't always true (although it's true nearly all of the time). In some instances -- when different trajectories are made to interact -- the "next generation" can't be built until all the other CLONEs have returned WUs of the same generation.

This happens for instance when doing "Replica Exchange Molecular Dynamics," for which the different CLONEs would be trajectories run at different temperatures (at least I think this is how it works ...). Sometimes, the atom coordinates between different trajectories need to be swapped in REMD, and hence you need to wait for the CLONEs to all have generation n finished to build GEN n+1 WUs.

I think. Try

http://folding.stanford.edu/papers/rhee_MREMD_2003biophys.pdf

(hope I got that right). In the end, AIUI, doing REMD with FAH is a pain compared to just doing it on a supercomputer -- we'd rather use FAH for its strengths ("a freaking lot of processors").
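The waiting described above, where GEN n+1 can't be built until every CLONE has returned GEN n, can be sketched as a simple completeness check, shown here next to the textbook Metropolis test for swapping two replicas at different temperatures. The post itself is unsure of the REMD details, so treat this as a generic illustration rather than how FAH implements it; the function names, energies, and temperatures are all hypothetical.

```python
import math

K_B = 0.0019872041   # Boltzmann constant in kcal/(mol*K)

def generation_complete(returned, n_clones, gen):
    """True only when every CLONE has returned its WU for this generation."""
    return all((clone, gen) in returned for clone in range(n_clones))

def swap_probability(energy_i, temp_i, energy_j, temp_j):
    """Textbook Metropolis acceptance probability for exchanging two replicas."""
    beta_i = 1.0 / (K_B * temp_i)
    beta_j = 1.0 / (K_B * temp_j)
    exponent = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if exponent >= 0 else math.exp(exponent)

# Hypothetical state: three CLONEs, results returned so far as (clone, gen) pairs.
returned = {(0, 0), (1, 0), (2, 0), (0, 1)}

print(generation_complete(returned, n_clones=3, gen=0))   # all three are back
print(generation_complete(returned, n_clones=3, gen=1))   # still waiting on two
print(swap_probability(-105.0, 300.0, -100.0, 310.0))
```

The check makes the cost plain: one slow or missing WU holds up the whole generation, which fits the remark that REMD on FAH is a pain compared to a supercomputer.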

1 posted on 02/25/2007 10:02:01 AM PST by texas booster

To: 1066AD; 11Bush; A.Hun; abner; AbsoluteGrace; Advil; aft_lizard; ahayes; aliquando; ambrose; AMD; ...

For those wanting to know more about the math behind F@H. I have tried to keep it simple.

Somebody better ping the math and science dudes and dudettes to this thread. My advanced math was 30 years ago and was average at best.

2 posted on 02/25/2007 10:05:02 AM PST by texas booster (Join FreeRepublic's Folding@Home team (Team # 36120))

To: M1Garand; Malsua; manwiththehands; Marie Antoinette; Marie; MarkeyD; MassLengthTime; ...

3 posted on 02/25/2007 10:06:13 AM PST by texas booster (Join FreeRepublic's Folding@Home team (Team # 36120))

To: texas booster

4 posted on 02/25/2007 10:08:41 AM PST by E.G.C.

To: texas booster

5 posted on 02/25/2007 10:10:03 AM PST by UB355 (Slower traffic keep right)

To: texas booster

Sciencespeak for "Sh_t happens"

6 posted on 02/25/2007 10:13:36 AM PST by oyez

To: texas booster

Sciencespeak for "Sh_t happens"

7 posted on 02/25/2007 10:13:41 AM PST by oyez

And I was really hoping to make it at least somewhat comprehensible!

Folding@Home is a project (like SETI) that uses our computers, when at rest, to perform serious calculations on them to do basic research for medicine.

About 200 regular FReepers are now part of the team and contribute nearly 1,050 systems to the effort.

Please hang around and we will help fill in the gaps.

8 posted on 02/25/2007 10:18:07 AM PST by texas booster (Join FreeRepublic's Folding@Home team (Team # 36120))

The sad part is that I have heard of guys that work forever on their Ph.D. only to be rejected.

Wouldn't THAT be a bummer?

9 posted on 02/25/2007 10:20:19 AM PST by texas booster (Join FreeRepublic's Folding@Home team (Team # 36120))

To: texas booster

Folding@Home FAQ for new users:

What is Folding@Home? A Stanford University project to find out how proteins fold.

Why it's important: Proteins folding wrong causes all kinds of diseases, like Alzheimer's, Parkinson's, and forms of cancer. Folding@Home uses novel computational methods and large-scale distributed computing to simulate timescales thousands to millions of times longer than previously achieved. Through Folding@home, scientists now have the horsepower to study the mechanics of protein folding. With its ability to share the workload among hundreds of thousands of computers economically, Folding@home can help scientists understand how proteins snap, or don't, into their predestined shapes - and may help to explain the origins of diseases such as Alzheimer's and apparently unrelated diseases. We're fueling research that could end all that.

How does it work?: You download a safe, tested program (see link below) that is certified by Stanford University. It gets work from Stanford, runs calculations using your spare computer power, and sends the results back to the University.

Is it safe? Yes! Folding@Home rarely affects computer performance in any way and won't compromise your privacy in any way. It only uses the computing power you aren't using so it doesn't slow down other programs.

How do I get started folding for Team FreeRepublic?:
1.) Download the folding program from Stanford University's folding download page (Folding@home Client Download). Type in your desired username.
2.) Type in 36120 for the team number. THIS IS VERY IMPORTANT - if you get the number wrong, you won't be folding for team FreeRepublic!
3.) The third question asks, "Launch automatically at machine startup, installing this as a service?" - We recommend you answer YES. Otherwise you will have to manually start the program after every reboot.

How can my computer help? Even if they were given exclusive access to all of the world's supercomputers, Stanford still wouldn't have as much processing power as they get from the supercluster of people's desktop systems Folding@home relies on. Modern supercomputers are essentially a cluster of hundreds of processors linked by fast networking. But Stanford needed the power of hundreds of thousands of processors, not just hundreds.

There's no reason to not get involved! It's free, easy, and you can know you're helping every minute without lifting a finger.

*******************************************

List of Relevant Folding Links
Why Fold - Watch This !!

Another Folding Clip

Folding@home Client Download

FreeRepublic.com Folder Stats

Extreme Overclockers Stats for FreeRepublic

Another Stats Page

*******************************************
Competition (Not!!) Dummies ..Daily Kos

Dummie Folding Threads #7 #8 #9 #10 #11 #12

**************************************************
Other Useful Stuff - Links

How much are those work units worth? And what are they?
All Projects Listed
Point Summary for Workunits

Stat Image Generator

Fahmon Third Party Monitoring Software

**************************************
Past FreeRepublic Folding threads

#1 #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12 #13 #14 #15 #16 #17 #17 #18 #19 #20 #21 #22 #23 #24 #25 #26 #27 #28 #29 #30 #31

10 posted on 02/25/2007 10:23:12 AM PST by texas booster (Join FreeRepublic's Folding@Home team (Team # 36120))

To: Professional Engineer

11 posted on 02/25/2007 10:28:14 AM PST by Peanut Gallery

To: texas booster

Thanks for the ping, Prof. TB! lol

I lost a box the other day, and have it ready to go off for repair (under warranty).

I was a bit anguished by the loss, but it got me to wondering. You know, it was one of those "one thought leads to another" kind of wondering. It wound up at...How must it feel to Prof. Pande to have nearly 2 million PCs, from all over the world, working for his project?

I'd bet he's pretty darn proud, and appreciative. Have you seen all those accolades and grants he's garnered?

:O)

P

12 posted on 02/25/2007 10:29:52 AM PST by papasmurf (Join Team 36120 Free Republic Folders. Folding@Home Enter Name:FRpapasmurf)

To: texas booster

United Devices does something similar with a nicer screen saver.

13 posted on 02/25/2007 10:53:00 AM PST by martin_fierro (< |:)~)

To: texas booster

PhD in Chemical Physics here.

(Long ago, in a galaxy far, far away.)

You have made grey_whiskers purr.

Cheers!

14 posted on 02/25/2007 12:59:42 PM PST by grey_whiskers (The opinions are solely those of the author and are subject to change without notice.)

To: texas booster

I have tried to keep it simple.

LOL...I got keyboard imprints on my forehead from falling asleep and resting my head while reading about the physics.

In any case, thanks for the post.

FOLD one for the GIPPER!

15 posted on 02/25/2007 12:59:55 PM PST by Drango (A liberal's compassion is limited only by the size of someone else's wallet.)

My shameless self-promotion for this thread:

If you're interested in tracking your folding machine(s) over the web, please Freepmail me.

Available features include:

Timely snapshot of each machine's progress on its current project
Point values for project
Estimated completion dates/times
Warning indicators for machines that appear to have stopped communicating or folding
Warnings for machines that won't make their deadline
Team ranking status box (courtesy of EOC)
Machine stats like PPD (points per day)
Project comparison stats: PPD, average time taken to completion, etc.
It's free-- and always will be
...other features as time permits, and people request

16 posted on 02/25/2007 1:46:15 PM PST by Egon ("If all your friends were named Cliff, would you jump off them??" - Hugh Neutron)

To: texas booster

17 posted on 02/25/2007 1:47:50 PM PST by andyk (Go Matt Kenseth!)

To: texas booster

TB,
Is it possible that the old thread gets a final message when a new thread is started? I usually find myself a few days behind on new threads as I keep the old one up consistently. Might also help other new people.

Many thanks in advance and check your six,
JosephW

18 posted on 02/25/2007 1:54:49 PM PST by JosephW (Mohammad Lied, People die!)

To: texas booster

well THAT made my head explode.
glad you're splaining it though...

I'm just folding

19 posted on 02/25/2007 2:29:40 PM PST by stylin19a

To: grey_whiskers

F@H had a project that solved Schroedinger's equation last year on these proteins!

I still remember the basics but realize that I cannot keep up with real math any more. Still, nice to stay in touch with the smart folks.

If you have a spare computer to toss into the effort please join us. It is really a great way to make use of the spare cycles on your system.

Better yet, get involved and become a beta tester here:

http://forum.folding-community.org/ftopic3045.html

20 posted on 02/25/2007 3:30:05 PM PST by texas booster (Join FreeRepublic's Folding@Home team (Team # 36120))
