1 00:00:00,060 --> 00:00:02,500 The following content is provided under a Creative 2 00:00:02,500 --> 00:00:04,010 Commons license. 3 00:00:04,010 --> 00:00:06,350 Your support will help MIT OpenCourseWare 4 00:00:06,350 --> 00:00:10,720 continue to offer high quality educational resources for free. 5 00:00:10,720 --> 00:00:13,330 To make a donation or view additional materials 6 00:00:13,330 --> 00:00:17,209 from hundreds of MIT courses, visit MIT OpenCourseWare 7 00:00:17,209 --> 00:00:17,834 at ocw.mit.edu. 8 00:00:26,550 --> 00:00:28,270 PROFESSOR: Greg was not here yesterday. 9 00:00:28,270 --> 00:00:29,220 He was away. 10 00:00:29,220 --> 00:00:32,470 And so this is work he's largely done on his own again. 11 00:00:32,470 --> 00:00:34,339 He's done a molecular dynamics simulator. 12 00:00:34,339 --> 00:00:36,380 So he's going to tell us a little bit about that, 13 00:00:36,380 --> 00:00:39,770 and then after that we'll do course evaluations, 14 00:00:39,770 --> 00:00:42,530 hand out some awards, have some cake, and then after that 15 00:00:42,530 --> 00:00:45,885 some pizza, and then play same PlayStation 3s. 16 00:00:45,885 --> 00:00:48,830 All yours. 17 00:00:48,830 --> 00:00:50,790 GREG PINTILIE: OK, so I'll talk about the make 18 00:00:50,790 --> 00:00:53,080 general molecular dynamics algorithm 19 00:00:53,080 --> 00:00:56,930 and then parallelization approaches. 20 00:00:56,930 --> 00:00:59,980 So molecular dynamics is based on a potential energy function, 21 00:00:59,980 --> 00:01:04,519 which includes two terms, bonded terms and non-bonded terms. 22 00:01:04,519 --> 00:01:08,455 And bonded terms are split up into three sub terms. 23 00:01:12,040 --> 00:01:14,867 So don't confuse bonded with-- actually, 24 00:01:14,867 --> 00:01:16,450 don't think that these are non-bonded. 25 00:01:16,450 --> 00:01:18,650 These are actually all sort of grouped as bonded. 26 00:01:18,650 --> 00:01:20,108 I know it's a little bit confusing, 27 00:01:20,108 --> 00:01:22,970 but right now I'm dealing just with the sort of bonded terms. 28 00:01:22,970 --> 00:01:24,930 And what that means is that it deals 29 00:01:24,930 --> 00:01:27,410 with atoms that are connected together covalently. 30 00:01:27,410 --> 00:01:32,610 So for example, the bonds term reflects some energy 31 00:01:32,610 --> 00:01:37,190 due to a displacement between two atoms, which 32 00:01:37,190 --> 00:01:38,570 are certain distance apart. 33 00:01:38,570 --> 00:01:41,930 There is an equilibrium distance that they like to stay at, 34 00:01:41,930 --> 00:01:45,280 and if they vary from that distance 35 00:01:45,280 --> 00:01:49,760 the potential increases as a squared of that separation 36 00:01:49,760 --> 00:01:51,770 distance. 37 00:01:51,770 --> 00:01:57,890 The angles term represents, basically, 38 00:01:57,890 --> 00:01:59,610 the deviation from an equilibrium 39 00:01:59,610 --> 00:02:01,350 angle between three atoms. 40 00:02:01,350 --> 00:02:04,360 So if you imagine these three nodes as three atoms, 41 00:02:04,360 --> 00:02:07,560 if this angle varies away from a certain equilibrium 42 00:02:07,560 --> 00:02:11,260 theta or angle, then the potential energy 43 00:02:11,260 --> 00:02:16,120 is also a quadratic term, which tends 44 00:02:16,120 --> 00:02:22,010 to bring the three atoms to have a certain equilibrium angle. 45 00:02:22,010 --> 00:02:25,350 And then, finally, the dihedral and improper term 46 00:02:25,350 --> 00:02:27,260 is probably one of the more complicated ones. 47 00:02:27,260 --> 00:02:31,990 The improper term basically deals with four atoms. 48 00:02:31,990 --> 00:02:36,080 So if you imagine this is one, two, three, and four. 49 00:02:36,080 --> 00:02:39,090 These three atoms, if you put two planes through them, 50 00:02:39,090 --> 00:02:43,230 the angle that those two planes make to each other 51 00:02:43,230 --> 00:02:46,360 is given a certain equilibrium angle again. 52 00:02:46,360 --> 00:02:50,930 And this should also be a quadratic term, 53 00:02:50,930 --> 00:02:52,650 but the energy also varies quadratically 54 00:02:52,650 --> 00:02:55,760 as that angle is deviated from. 55 00:02:55,760 --> 00:02:59,670 An actual dihedral is the rotation 56 00:02:59,670 --> 00:03:03,330 along a bond between the two central atoms. 57 00:03:03,330 --> 00:03:07,540 And this energy term has actually a cosine form. 58 00:03:11,680 --> 00:03:16,040 So as you rotate it along this bond, say for example, 59 00:03:16,040 --> 00:03:19,990 the energy will very sinusoidally. 60 00:03:19,990 --> 00:03:22,000 And there will be a restoring potential that 61 00:03:22,000 --> 00:03:24,830 will tend to make it either a line, 62 00:03:24,830 --> 00:03:28,320 or sometimes it will make the atom stagger. 63 00:03:28,320 --> 00:03:32,090 So all of these parameters are very sort of molecule and atom 64 00:03:32,090 --> 00:03:32,750 dependent. 65 00:03:32,750 --> 00:03:35,350 And all of these parameters are available, 66 00:03:35,350 --> 00:03:40,450 or have been built over many years by different methods 67 00:03:40,450 --> 00:03:43,870 like ab initio calculations or experimental calculations. 68 00:03:43,870 --> 00:03:46,380 But basically, if you plug those parameters in, for example, 69 00:03:46,380 --> 00:03:51,120 here k represents the spring constant, right? 70 00:03:51,120 --> 00:03:53,370 Because they're basically just like spring constraints 71 00:03:53,370 --> 00:03:54,045 more or less. 72 00:03:54,045 --> 00:03:55,420 Or in this case, it will tell you 73 00:03:55,420 --> 00:03:57,920 how far up the potential, like what this value will 74 00:03:57,920 --> 00:04:02,570 be at the maximum, and will, for example, in this case, 75 00:04:02,570 --> 00:04:06,310 will specify how many fluctuations you 76 00:04:06,310 --> 00:04:08,400 have in the energy function. 77 00:04:08,400 --> 00:04:10,340 And the other parameters are usually 78 00:04:10,340 --> 00:04:14,116 the theta, the equilibrium theta, or the equilibrium, 79 00:04:14,116 --> 00:04:17,170 or the equilibrium bond length. 80 00:04:17,170 --> 00:04:19,279 So this is kind of what you can expect 81 00:04:19,279 --> 00:04:27,120 for it to look like as molecules will tend to move. 82 00:04:27,120 --> 00:04:30,240 The non bonded terms include van-der-Waals terms 83 00:04:30,240 --> 00:04:31,810 and electrostatic terms. 84 00:04:31,810 --> 00:04:35,450 So van-der-Waals forces occur between any two atoms, 85 00:04:35,450 --> 00:04:41,780 and if they're far enough apart, they will attract to each other 86 00:04:41,780 --> 00:04:43,220 according to this formula. 87 00:04:43,220 --> 00:04:45,040 And if they're too close together, 88 00:04:45,040 --> 00:04:47,610 so at equilibrium distance between them 89 00:04:47,610 --> 00:04:51,780 the energy is a minimum, but as they get any closer 90 00:04:51,780 --> 00:04:54,670 there will be a very strong group repulsive force, which 91 00:04:54,670 --> 00:04:56,600 is the r to the 12th term. 92 00:04:56,600 --> 00:04:58,590 And as they get further apart, there 93 00:04:58,590 --> 00:05:03,440 is an attraction force, which varies as r to the sixth. 94 00:05:06,870 --> 00:05:10,450 And then the electrostatic force is just 95 00:05:10,450 --> 00:05:12,630 a simple Coulombic form. 96 00:05:12,630 --> 00:05:15,120 So you basically take two atoms that are, 97 00:05:15,120 --> 00:05:18,050 say oppositely charged, will attract each other. 98 00:05:18,050 --> 00:05:20,480 If they're positively charged they will repel each other. 99 00:05:20,480 --> 00:05:24,320 And it's just a matter of multiplying the charges, 100 00:05:24,320 --> 00:05:26,310 and then dividing by r to get the energy. 101 00:05:29,160 --> 00:05:31,520 OK, so to do molecular dynamics, basically, 102 00:05:31,520 --> 00:05:34,850 you have to take derivatives of all those formulas. 103 00:05:34,850 --> 00:05:35,970 I just showed you. 104 00:05:35,970 --> 00:05:37,719 And I didn't do that all myself. 105 00:05:37,719 --> 00:05:38,510 It was pretty hard. 106 00:05:38,510 --> 00:05:42,940 I just got some code from some current existing 107 00:05:42,940 --> 00:05:44,420 molecular dynamic simulators. 108 00:05:44,420 --> 00:05:46,230 But basically, you take the derivative. 109 00:05:46,230 --> 00:05:48,390 And that basically equates to the force, right? 110 00:05:48,390 --> 00:05:51,150 The typical Newton Newtonian model. 111 00:05:51,150 --> 00:05:53,980 And that gives you the force in every atom, 112 00:05:53,980 --> 00:05:58,590 and then using that force and integration scheme, 113 00:05:58,590 --> 00:06:00,970 the leapfrog Verlet integration scheme just 114 00:06:00,970 --> 00:06:05,420 gives you the velocity at the next half time step. 115 00:06:05,420 --> 00:06:08,860 And then using that velocity at the half time step, 116 00:06:08,860 --> 00:06:13,830 the positions of each atom is updated 117 00:06:13,830 --> 00:06:16,910 based on the positions at the previous time step, 118 00:06:16,910 --> 00:06:18,570 and then some value. 119 00:06:18,570 --> 00:06:20,570 So I will show you a little bit of what 120 00:06:20,570 --> 00:06:23,675 it looks like on a simple molecule first. 121 00:06:27,770 --> 00:06:29,400 So here's a sort of simple molecule, 122 00:06:29,400 --> 00:06:31,550 and it has just an initial configuration 123 00:06:31,550 --> 00:06:35,540 that is just sort of random. 124 00:06:38,990 --> 00:06:43,070 And so one technique that is done 125 00:06:43,070 --> 00:06:45,190 is first the geometry is minimized, 126 00:06:45,190 --> 00:06:47,140 which means that you just take small steps 127 00:06:47,140 --> 00:06:49,800 in the direction of the gradient until you reach the energy 128 00:06:49,800 --> 00:06:50,740 minimum. 129 00:06:50,740 --> 00:06:52,680 And if I do that, I sort of reach 130 00:06:52,680 --> 00:06:54,335 sort of the minimum configuration 131 00:06:54,335 --> 00:06:57,140 that this molecule likes to be in. 132 00:06:57,140 --> 00:07:00,460 And then, if I add some thermal noise, which means that I just 133 00:07:00,460 --> 00:07:02,650 set the initial velocities of the atoms 134 00:07:02,650 --> 00:07:06,370 to some random value based on the temperature, 135 00:07:06,370 --> 00:07:10,680 then the molecule starts sort of jiggling around. 136 00:07:10,680 --> 00:07:13,180 And the nice thing about the Verlet integration scheme 137 00:07:13,180 --> 00:07:14,830 is that it's actually just right, 138 00:07:14,830 --> 00:07:16,650 like the energy is conserved in the system, 139 00:07:16,650 --> 00:07:18,200 although it fluctuates a little bit. 140 00:07:18,200 --> 00:07:19,949 So a lot of integration schemes the energy 141 00:07:19,949 --> 00:07:23,220 tends to be either decrease or increase, 142 00:07:23,220 --> 00:07:24,569 so the system explodes. 143 00:07:24,569 --> 00:07:26,360 But this integration scheme is very stable. 144 00:07:29,880 --> 00:07:33,500 One thing that you saw there is, so I just talked briefly 145 00:07:33,500 --> 00:07:34,390 about kinetic energy. 146 00:07:34,390 --> 00:07:36,440 This is kind of how the initial velocities are 147 00:07:36,440 --> 00:07:40,100 set based on the Boltzmann constant and the temperature. 148 00:07:40,100 --> 00:07:43,520 And the other thing that molecular dynamics usually do 149 00:07:43,520 --> 00:07:45,010 is called Langevin dynamics. 150 00:07:45,010 --> 00:07:48,640 And in this case, some random velocity vector 151 00:07:48,640 --> 00:07:53,490 is added to the normal velocity, which just sort of simulates 152 00:07:53,490 --> 00:07:55,789 collisions with other molecules in the environment. 153 00:07:55,789 --> 00:07:58,080 So this makes the simulation look a bit more realistic. 154 00:07:58,080 --> 00:08:00,820 So instead of the molecule just sitting there in space, 155 00:08:00,820 --> 00:08:06,350 this random sort of factor adds some movement. 156 00:08:06,350 --> 00:08:08,720 And there's also a damping constant, 157 00:08:08,720 --> 00:08:13,250 which sort of opposes the velocity by some constant. 158 00:08:13,250 --> 00:08:16,550 So if you didn't add any more sort of randomness 159 00:08:16,550 --> 00:08:20,700 into the system, the system would just eventually go 160 00:08:20,700 --> 00:08:22,442 to be completely still. 161 00:08:22,442 --> 00:08:23,900 So there would be no more movement, 162 00:08:23,900 --> 00:08:27,600 because this is kind of like a damping sort of factor. 163 00:08:27,600 --> 00:08:31,900 And there are some properties of this random vector that sort 164 00:08:31,900 --> 00:08:34,419 of make some physical sense. 165 00:08:34,419 --> 00:08:36,950 For example, it's independent of previously picked 166 00:08:36,950 --> 00:08:39,650 random vectors, and it also depends on the temperature. 167 00:08:39,650 --> 00:08:42,299 So that if you keep adding this random factor 168 00:08:42,299 --> 00:08:44,322 to the simulation, the temperature 169 00:08:44,322 --> 00:08:45,405 will always stay constant. 170 00:08:50,590 --> 00:08:52,990 Another thing to consider is that these molecules, 171 00:08:52,990 --> 00:08:54,620 if you try to think of them as they 172 00:08:54,620 --> 00:08:56,860 are in a sort of typical biological system, 173 00:08:56,860 --> 00:09:00,030 is that they'll be solvated in water. 174 00:09:00,030 --> 00:09:03,040 So in that case the Coulombic term 175 00:09:03,040 --> 00:09:05,260 is not actually that simple. 176 00:09:05,260 --> 00:09:08,270 You have to include some sort of dielectric constant. 177 00:09:08,270 --> 00:09:10,990 So for example, one that is typically used 178 00:09:10,990 --> 00:09:14,500 and is pretty simple is that the constant is just the distance. 179 00:09:14,500 --> 00:09:18,690 So that makes the electrostatic energy dependent on one over r 180 00:09:18,690 --> 00:09:20,440 squared rather than 1 over r. 181 00:09:20,440 --> 00:09:23,410 So that means that the electrostatic forces drop off 182 00:09:23,410 --> 00:09:27,240 a lot faster as the atoms get further and further apart. 183 00:09:27,240 --> 00:09:33,430 So for example, so two molecules that are oppositely charged, 184 00:09:33,430 --> 00:09:36,030 if you just put them in vacuum, they will eventually 185 00:09:36,030 --> 00:09:37,090 come close together. 186 00:09:37,090 --> 00:09:42,320 But if you add this term, the electrostatic attraction 187 00:09:42,320 --> 00:09:43,160 will be much weaker. 188 00:09:43,160 --> 00:09:45,730 So it will simulate them being solvated in water. 189 00:09:45,730 --> 00:09:47,760 And they will come together eventually 190 00:09:47,760 --> 00:09:49,180 but probably a lot slower. 191 00:09:52,200 --> 00:09:54,040 Another thing that is typically-- 192 00:09:54,040 --> 00:09:56,320 so I'll talk a little bit about the non-bonded forces 193 00:09:56,320 --> 00:09:59,110 right now, which are, again, the electrostatic forces 194 00:09:59,110 --> 00:10:01,000 and the van-der-Waal's forces. 195 00:10:01,000 --> 00:10:03,310 So generally, what has to be done 196 00:10:03,310 --> 00:10:06,110 is, if you consider any single atom 197 00:10:06,110 --> 00:10:09,400 like say this one over here, well, 198 00:10:09,400 --> 00:10:11,340 so first of all the bonded forces basically 199 00:10:11,340 --> 00:10:13,990 include just the atoms that are connected to it, and that's it. 200 00:10:13,990 --> 00:10:17,530 So it's usually the fastest step in the computation. 201 00:10:17,530 --> 00:10:20,780 But for non-bonded forces, you have to consider basically 202 00:10:20,780 --> 00:10:24,410 every other atom in the system. 203 00:10:24,410 --> 00:10:26,512 And I didn't include all of the atoms here, 204 00:10:26,512 --> 00:10:27,970 but basically just to give an idea, 205 00:10:27,970 --> 00:10:31,360 they could be atoms that are really far apart. 206 00:10:31,360 --> 00:10:32,880 So generally, what's done is there 207 00:10:32,880 --> 00:10:35,490 is some cut off radius, outside of which 208 00:10:35,490 --> 00:10:36,840 those forces aren't considered. 209 00:10:36,840 --> 00:10:38,750 And because the forces drop off really fast, 210 00:10:38,750 --> 00:10:40,760 sometimes that's considered to be good enough 211 00:10:40,760 --> 00:10:42,340 for a lot of simulations. 212 00:10:42,340 --> 00:10:45,035 So that's a consideration to take into account 213 00:10:45,035 --> 00:10:48,100 when doing these simulations. 214 00:10:48,100 --> 00:10:52,190 OK, so here's the basic molecular dynamics algorithm. 215 00:10:52,190 --> 00:10:54,580 So you specify how many steps you want to run it to. 216 00:10:54,580 --> 00:11:00,085 And each time step is one femtosecond. 217 00:11:03,280 --> 00:11:04,010 So very small. 218 00:11:06,920 --> 00:11:10,720 So usually what happens is you can find the atom pairs. 219 00:11:10,720 --> 00:11:12,990 So that means all the non-bonded interactions, 220 00:11:12,990 --> 00:11:15,560 you have to make a list, basically an n squared list. 221 00:11:15,560 --> 00:11:17,080 If you have n atoms, you can expect 222 00:11:17,080 --> 00:11:19,680 to have about n squared atom pairs. 223 00:11:19,680 --> 00:11:22,150 But if you consider a cut off, then what's usually done 224 00:11:22,150 --> 00:11:27,110 is only atom pairs within that cut off 225 00:11:27,110 --> 00:11:30,350 are considered and included into that list. 226 00:11:30,350 --> 00:11:32,140 And that computation is pretty expensive, 227 00:11:32,140 --> 00:11:36,790 d it's sometimes only done only every, say every ten steps. 228 00:11:36,790 --> 00:11:39,060 And atoms don't really move that much. 229 00:11:39,060 --> 00:11:40,990 So it's usually pretty accurate. 230 00:11:40,990 --> 00:11:44,639 And if you use a cut off of, say 15 angstroms, and then, well, 231 00:11:44,639 --> 00:11:46,930 the forces have dropped to almost zero by 12 angstroms, 232 00:11:46,930 --> 00:11:48,190 then that's pretty accurate. 233 00:11:48,190 --> 00:11:52,180 So this is usually an optimization. 234 00:11:52,180 --> 00:11:55,210 It optimizes the simulation a little bit. 235 00:11:55,210 --> 00:11:57,250 The next step is to compute the bonded forces. 236 00:11:57,250 --> 00:11:59,140 So that includes the bonds, angles, dihedrals 237 00:11:59,140 --> 00:12:00,160 and improper angles. 238 00:12:03,120 --> 00:12:04,800 And then compute the non-bonded forces, 239 00:12:04,800 --> 00:12:07,310 so that iterates over all the atom pairs. 240 00:12:07,310 --> 00:12:09,990 And then the integration is done to obtain the new atom 241 00:12:09,990 --> 00:12:11,400 positions. 242 00:12:11,400 --> 00:12:13,500 So here's, running on the PlayStation, 243 00:12:13,500 --> 00:12:18,040 this is kind of roughly how this is just running on the PPU 244 00:12:18,040 --> 00:12:19,000 alone. 245 00:12:19,000 --> 00:12:23,350 So finding the atom pairs, if I don't use a cut off, 246 00:12:23,350 --> 00:12:25,405 I can just compute the atom pairs list once, 247 00:12:25,405 --> 00:12:26,530 and then never do it again. 248 00:12:26,530 --> 00:12:29,130 So that's why I put here 0. 249 00:12:29,130 --> 00:12:32,070 But otherwise, generally takes a lot of time 250 00:12:32,070 --> 00:12:33,790 to compute the atom pairs, because it 251 00:12:33,790 --> 00:12:38,650 involves getting the distances between every two atoms. 252 00:12:38,650 --> 00:12:46,350 And this is a pretty big system, maybe about 1,000 atoms. 253 00:12:46,350 --> 00:12:47,670 So that's why. 254 00:12:47,670 --> 00:12:50,650 So these times are not really representative of smaller 255 00:12:50,650 --> 00:12:51,772 systems. 256 00:12:51,772 --> 00:12:53,500 AUDIENCE: What is BF [INAUDIBLE]? 257 00:12:53,500 --> 00:12:54,990 GREG PINTILIE: BF, so that's done 258 00:12:54,990 --> 00:12:56,320 using the brute force approach. 259 00:12:56,320 --> 00:12:58,700 So that means that you check every two atoms. 260 00:12:58,700 --> 00:13:02,990 And this is using a KD tree, so that breaks space up. 261 00:13:06,120 --> 00:13:12,621 So then checking the neighboring atoms is n log n. 262 00:13:12,621 --> 00:13:14,114 Or it's actually a constant factor 263 00:13:14,114 --> 00:13:15,280 after you've built the tree. 264 00:13:15,280 --> 00:13:16,690 Building the tree is n log n. 265 00:13:16,690 --> 00:13:18,520 So using that kind of an algorithm, 266 00:13:18,520 --> 00:13:20,740 you get much better improvement. 267 00:13:20,740 --> 00:13:25,680 So this is a scaling of increasing the system size 268 00:13:25,680 --> 00:13:27,650 by two in the number of atoms. 269 00:13:27,650 --> 00:13:30,010 So say going from 1,000 to 2,000 atoms. 270 00:13:30,010 --> 00:13:34,670 The time too, using a brute force approach, 271 00:13:34,670 --> 00:13:35,840 increases exponentially. 272 00:13:35,840 --> 00:13:39,780 Whereas, using a KD tree, it actually just about doubles. 273 00:13:39,780 --> 00:13:44,334 So that kind of shows that using a better algorithm sometimes 274 00:13:44,334 --> 00:13:46,000 for these things is probably much better 275 00:13:46,000 --> 00:13:49,590 than trying to parallelize it. 276 00:13:49,590 --> 00:13:54,807 Doing the bonded forces, say about 50 milliseconds, 277 00:13:54,807 --> 00:13:57,390 and with cut off it's the same time, because it doesn't really 278 00:13:57,390 --> 00:13:58,700 affect the step. 279 00:13:58,700 --> 00:14:01,480 But then computing the non-bonded forces 280 00:14:01,480 --> 00:14:03,640 is actually the biggest chunk of time. 281 00:14:03,640 --> 00:14:07,870 So that takes about 1.4 seconds for the system. 282 00:14:07,870 --> 00:14:12,760 And then this is considering a million pairs. 283 00:14:12,760 --> 00:14:15,090 With a cut off there is less pairs to consider, 284 00:14:15,090 --> 00:14:18,622 but the time is still the dominant time basically. 285 00:14:18,622 --> 00:14:20,580 So generally, for molecular dynamic simulations 286 00:14:20,580 --> 00:14:22,620 this is the dominant time factor. 287 00:14:22,620 --> 00:14:24,260 And that's what people really focus on, 288 00:14:24,260 --> 00:14:25,990 trying to speed things up. 289 00:14:25,990 --> 00:14:28,460 And then the integration is really fast, 290 00:14:28,460 --> 00:14:31,480 because it just loops over the atoms once. 291 00:14:31,480 --> 00:14:34,370 So there is basically three different parallelization 292 00:14:34,370 --> 00:14:37,440 approaches that I've come across in the literature. 293 00:14:37,440 --> 00:14:40,950 And one of them is referred to as force decomposition. 294 00:14:40,950 --> 00:14:44,420 So if you think of a force operation being, 295 00:14:44,420 --> 00:14:47,850 say a non-bonded force operation where you consider two atoms, 296 00:14:47,850 --> 00:14:51,080 and you compute the force the two, van-der-Waals 297 00:14:51,080 --> 00:14:56,330 and electrostatic forces, one easy way to split this up 298 00:14:56,330 --> 00:15:01,340 amongst, say six processors, is to just give each processor, 299 00:15:01,340 --> 00:15:05,045 say n divided by 6 of these force operations. 300 00:15:08,440 --> 00:15:12,170 So you have to send each processor both atom positions, 301 00:15:12,170 --> 00:15:15,120 and then some other properties of the atoms, such as the mass 302 00:15:15,120 --> 00:15:19,205 and-- actually not the mass, but the charges and the minimum 303 00:15:19,205 --> 00:15:19,705 radius. 304 00:15:22,250 --> 00:15:24,950 And basically, once that computation is done, 305 00:15:24,950 --> 00:15:28,090 the processor just has to either store or return the force 306 00:15:28,090 --> 00:15:29,530 on both atom. 307 00:15:29,530 --> 00:15:32,514 And it has to come back to sort of like the main processor 308 00:15:32,514 --> 00:15:33,930 where all the forces are added up, 309 00:15:33,930 --> 00:15:35,304 and then the integration is done. 310 00:15:39,920 --> 00:15:43,240 So what I just described is sort of the approach I took. 311 00:15:43,240 --> 00:15:45,690 And it's sort of an easy approach 312 00:15:45,690 --> 00:15:49,270 to sort of think of if you couldn't store, 313 00:15:49,270 --> 00:15:51,350 say every atom on every processor. 314 00:15:51,350 --> 00:15:53,620 Because you just have to send each processor 315 00:15:53,620 --> 00:15:56,160 a certain number of these force computation units. 316 00:15:56,160 --> 00:15:58,630 And the processor does then and then returns the forces. 317 00:15:58,630 --> 00:16:01,480 And then you can just add up all the forces on every atom 318 00:16:01,480 --> 00:16:03,110 and do that. 319 00:16:03,110 --> 00:16:06,210 So basically here is the algorithm how it works roughly. 320 00:16:06,210 --> 00:16:10,187 And so basically, first set up the SPUs, 321 00:16:10,187 --> 00:16:12,020 send the control box with, say the addresses 322 00:16:12,020 --> 00:16:15,650 where a these force operations will be stored. 323 00:16:15,650 --> 00:16:20,090 The bonded forces are computed on the PPU, 324 00:16:20,090 --> 00:16:23,835 just for simplicity for now. 325 00:16:23,835 --> 00:16:25,960 And then the computation for the non-bonded forces, 326 00:16:25,960 --> 00:16:30,890 basically, all the non-bonded forces operations remaining 327 00:16:30,890 --> 00:16:31,987 are considered. 328 00:16:31,987 --> 00:16:33,570 And then they're split up into blocks. 329 00:16:33,570 --> 00:16:35,980 So each SPU sent a block with, say 330 00:16:35,980 --> 00:16:37,690 a number of force operations. 331 00:16:37,690 --> 00:16:40,480 And in this case it was 200. 332 00:16:40,480 --> 00:16:43,320 So the control block is sent to the SPU, 333 00:16:43,320 --> 00:16:46,220 the SPU is told to start processing. 334 00:16:46,220 --> 00:16:49,370 And then the PPU waits for each SPU to finish. 335 00:16:49,370 --> 00:16:52,320 And then once the SPU is finished, 336 00:16:52,320 --> 00:16:55,417 the forces are added to the two atoms that are involved 337 00:16:55,417 --> 00:16:56,500 in that force computation. 338 00:16:59,100 --> 00:17:02,250 And then this loop is done once all the SPUs are finished. 339 00:17:02,250 --> 00:17:03,740 And then it goes, in this repeats 340 00:17:03,740 --> 00:17:06,700 until there is non-bonded operations remaining. 341 00:17:06,700 --> 00:17:09,596 So for example here, just to illustrate this, 342 00:17:09,596 --> 00:17:11,720 let's say these are all the non-bonded operations I 343 00:17:11,720 --> 00:17:12,800 have to do. 344 00:17:12,800 --> 00:17:15,310 And I'm going to use two SPUs. 345 00:17:15,310 --> 00:17:18,289 And each SPU can only store, say three force operations. 346 00:17:20,890 --> 00:17:22,730 So each SPU is sent by a block. 347 00:17:22,730 --> 00:17:24,520 I mean that, say these three operations 348 00:17:24,520 --> 00:17:25,435 are sent to each SPU. 349 00:17:25,435 --> 00:17:28,560 The SPU does them. 350 00:17:28,560 --> 00:17:31,720 The next block is sent to the next SPU at the same time, 351 00:17:31,720 --> 00:17:33,510 and then that SPU also does it. 352 00:17:33,510 --> 00:17:36,040 And this is iteration 0 for example. 353 00:17:36,040 --> 00:17:38,860 And this is done until all of the force computations 354 00:17:38,860 --> 00:17:40,500 are done. 355 00:17:40,500 --> 00:17:41,590 So here's a timing. 356 00:17:41,590 --> 00:17:44,680 So you might imagine that, as I said, 357 00:17:44,680 --> 00:17:49,540 there is there is a lot of communication 358 00:17:49,540 --> 00:17:52,410 overhead over here. 359 00:17:52,410 --> 00:17:55,360 So here's how this performs. 360 00:17:55,360 --> 00:18:01,080 So on the PPU, the MD simulation per time step, 361 00:18:01,080 --> 00:18:04,320 runs in 310 milliseconds. 362 00:18:04,320 --> 00:18:09,580 On one SPU, the time was up to 630 milliseconds. 363 00:18:09,580 --> 00:18:12,350 So there is a lot of communication overhead here. 364 00:18:12,350 --> 00:18:15,400 On two SPUs it goes back down to 390. 365 00:18:15,400 --> 00:18:18,560 And as the number of SPUs is increased, 366 00:18:18,560 --> 00:18:19,620 it gets lower and lower. 367 00:18:19,620 --> 00:18:23,310 But it's not that much better than really running it 368 00:18:23,310 --> 00:18:24,400 sequentially. 369 00:18:24,400 --> 00:18:28,120 And the reason is that the way I implemented is pretty rough. 370 00:18:28,120 --> 00:18:32,130 One of the limitations, I mean one very obvious optimization 371 00:18:32,130 --> 00:18:36,190 that should be done is that SPU basically 372 00:18:36,190 --> 00:18:38,350 waits for this whole block to arrive, 373 00:18:38,350 --> 00:18:39,624 and then starts computation. 374 00:18:39,624 --> 00:18:41,040 But really what I should have done 375 00:18:41,040 --> 00:18:47,090 is once one force computation arrives, 376 00:18:47,090 --> 00:18:50,080 you can already do that computation. 377 00:18:50,080 --> 00:18:52,320 Instead of waiting for all of them to be transferred 378 00:18:52,320 --> 00:18:53,903 and then starting to do the operation. 379 00:18:53,903 --> 00:18:56,540 So that I think that could've been a lot better. 380 00:18:56,540 --> 00:18:57,534 AUDIENCE: [INAUDIBLE] 381 00:18:57,534 --> 00:18:58,031 GREG PINTILIE: No. 382 00:18:58,031 --> 00:18:58,906 AUDIENCE: [INAUDIBLE] 383 00:19:03,030 --> 00:19:03,780 GREG PINTILIE: No. 384 00:19:06,790 --> 00:19:11,830 So right, so this is sort of the simplest approach, 385 00:19:11,830 --> 00:19:14,670 I mean sort of a simple approach to implement 386 00:19:14,670 --> 00:19:15,620 on this architecture. 387 00:19:15,620 --> 00:19:18,120 And it sort of scales well as the system grows. 388 00:19:18,120 --> 00:19:20,250 So if you can't store every item, for example, 389 00:19:20,250 --> 00:19:25,660 in every SPU, this doesn't really care. 390 00:19:25,660 --> 00:19:28,177 The system can grow as large as you want. 391 00:19:28,177 --> 00:19:29,760 There's some other approaches that are 392 00:19:29,760 --> 00:19:31,270 done sort of in the literature. 393 00:19:31,270 --> 00:19:34,070 I didn't implement these, but another one 394 00:19:34,070 --> 00:19:35,410 is called atomic decomposition. 395 00:19:35,410 --> 00:19:37,570 So here, if you have, say a number of atoms, 396 00:19:37,570 --> 00:19:41,140 you send each one of these atoms to one SPU. 397 00:19:41,140 --> 00:19:45,370 And then that SPU is solely responsible for computing 398 00:19:45,370 --> 00:19:49,250 the forces on it, and also integrating 399 00:19:49,250 --> 00:19:50,550 to get the new positions. 400 00:19:50,550 --> 00:19:52,060 And then all you have to communicate 401 00:19:52,060 --> 00:19:53,471 is the atomic positions. 402 00:19:53,471 --> 00:19:55,220 But this is a little bit more complicated, 403 00:19:55,220 --> 00:20:00,500 because then every SPU has to know 404 00:20:00,500 --> 00:20:02,670 about all of the force computations 405 00:20:02,670 --> 00:20:04,190 that you want to do. 406 00:20:04,190 --> 00:20:06,610 So for example, in this case it's just a non-bonded, 407 00:20:06,610 --> 00:20:10,151 but the SPU also has to know about-- 408 00:20:10,151 --> 00:20:12,151 AUDIENCE: So that would be like the same example 409 00:20:12,151 --> 00:20:14,520 that we did in [INAUDIBLE]. 410 00:20:14,520 --> 00:20:18,552 GREG PINTILIE: Yeah, it would be. 411 00:20:18,552 --> 00:20:20,260 Yeah, the only problem with that approach 412 00:20:20,260 --> 00:20:23,220 is that if the system was much larger than the SPU could 413 00:20:23,220 --> 00:20:28,450 store, and you couldn't store every single atom on the SPU, 414 00:20:28,450 --> 00:20:34,245 then an operation might refer to atoms that are not there. 415 00:20:37,170 --> 00:20:39,010 So yeah, it would be similar to that. 416 00:20:39,010 --> 00:20:40,950 But it would get more complicated in this case 417 00:20:40,950 --> 00:20:42,450 as a system grew, because you'd have 418 00:20:42,450 --> 00:20:44,650 to keep track of where each atom is and then 419 00:20:44,650 --> 00:20:45,870 implement communication. 420 00:20:45,870 --> 00:20:48,120 If you don't have a certain atom, 421 00:20:48,120 --> 00:20:49,840 to get to that atom position and use it 422 00:20:49,840 --> 00:20:52,020 in the force computation. 423 00:20:52,020 --> 00:20:56,230 So yeah, this approach, OK, I'll talk a little bit more 424 00:20:56,230 --> 00:20:56,735 about this. 425 00:20:56,735 --> 00:20:57,931 This is a-- 426 00:20:57,931 --> 00:21:03,710 AUDIENCE: [INAUDIBLE] I mean not SPU, but [INAUDIBLE]. 427 00:21:03,710 --> 00:21:05,257 GREG PINTILIE: That's true, yeah. 428 00:21:05,257 --> 00:21:07,340 So you only sort of give a certain number of atoms 429 00:21:07,340 --> 00:21:09,860 to every SPU, and then, yeah. 430 00:21:09,860 --> 00:21:10,360 That's true. 431 00:21:10,360 --> 00:21:12,940 And then you would have to sort of store every single force 432 00:21:12,940 --> 00:21:17,243 that that atom is included in on the SPU. 433 00:21:17,243 --> 00:21:18,118 AUDIENCE: [INAUDIBLE] 434 00:21:21,790 --> 00:21:26,390 GREG PINTILIE: Yeah, like when this will become a problem is, 435 00:21:26,390 --> 00:21:29,530 say the system was much larger than you had local store. 436 00:21:29,530 --> 00:21:34,130 Then say you had like 1/10 of the system on SPUs 437 00:21:34,130 --> 00:21:35,900 and then 9/10 of the system on the PPU. 438 00:21:35,900 --> 00:21:38,404 AUDIENCE: [INAUDIBLE] 439 00:21:38,404 --> 00:21:40,320 GREG PINTILIE: Yeah, it would be hard to think 440 00:21:40,320 --> 00:21:41,278 of it scaling properly. 441 00:21:44,950 --> 00:21:47,640 So a special case of this atomic decomposition 442 00:21:47,640 --> 00:21:49,370 is actually spatial decomposition. 443 00:21:49,370 --> 00:21:51,184 OK, I'm almost done. 444 00:21:51,184 --> 00:21:52,975 So a special case of spatial decomposition. 445 00:21:55,710 --> 00:21:58,510 So what makes this approach sort of not good, 446 00:21:58,510 --> 00:22:01,510 and it sort of involves a lot of communication, 447 00:22:01,510 --> 00:22:04,920 is just this fact that some of these force computations, 448 00:22:04,920 --> 00:22:07,130 or force operations that you have could include atoms 449 00:22:07,130 --> 00:22:08,320 on different units. 450 00:22:08,320 --> 00:22:11,130 So if you don't split up these atoms sort of smartly, 451 00:22:11,130 --> 00:22:13,900 then you could end up having to communicate a lot 452 00:22:13,900 --> 00:22:15,330 while computing these forces. 453 00:22:15,330 --> 00:22:17,410 So spatial decomposition tries to address this. 454 00:22:17,410 --> 00:22:21,590 And basically, you store atoms that are close to each other. 455 00:22:21,590 --> 00:22:25,830 So say I had the system, and I wanted to split this up 456 00:22:25,830 --> 00:22:30,315 on six SPUs, then right, I would give each SPU, 457 00:22:30,315 --> 00:22:31,834 say a certain area of space. 458 00:22:31,834 --> 00:22:34,250 But you could see right away what the problem with this is 459 00:22:34,250 --> 00:22:36,010 is that it's not load balanced. 460 00:22:36,010 --> 00:22:39,749 So some SPUs could not have too much work to do. 461 00:22:39,749 --> 00:22:41,790 And there's still some communication, because you 462 00:22:41,790 --> 00:22:45,060 have to communicate between neighboring SPUs, 463 00:22:45,060 --> 00:22:46,770 say some certain cases. 464 00:22:46,770 --> 00:22:48,620 But the communication is still probably 465 00:22:48,620 --> 00:22:51,000 a lot better than in the previous case, 466 00:22:51,000 --> 00:22:52,250 just for atomic decomposition. 467 00:22:55,230 --> 00:22:58,910 So here's the state of the art on parallelization MD. 468 00:22:58,910 --> 00:23:01,260 And it's implemented in this program called 469 00:23:01,260 --> 00:23:05,340 NAMD, which stands for Not Another Molecular Dynamics 470 00:23:05,340 --> 00:23:06,600 simulator. 471 00:23:06,600 --> 00:23:12,630 So initially, NAMD 1.0 used solely a spatial decomposition. 472 00:23:12,630 --> 00:23:14,472 And they try to address this load balancing 473 00:23:14,472 --> 00:23:16,430 by actually creating many more patches than you 474 00:23:16,430 --> 00:23:18,870 have processors. 475 00:23:18,870 --> 00:23:22,260 And then you just sort of split these patches up, 476 00:23:22,260 --> 00:23:24,650 load balance them amongst the different processors. 477 00:23:24,650 --> 00:23:25,650 That didn't really work. 478 00:23:25,650 --> 00:23:28,100 So NAMD 1 didn't parallelize very well. 479 00:23:28,100 --> 00:23:30,580 So what they added later was they 480 00:23:30,580 --> 00:23:33,130 added this idea of patch objects, 481 00:23:33,130 --> 00:23:34,610 and also compute objects. 482 00:23:34,610 --> 00:23:38,710 So basically a patch object puts a number 483 00:23:38,710 --> 00:23:40,346 of patches on a processor, but then you 484 00:23:40,346 --> 00:23:43,550 can also create compute patches, which include 485 00:23:43,550 --> 00:23:45,450 all of these force operations. 486 00:23:45,450 --> 00:23:47,760 And these force operations are also equally distributed 487 00:23:47,760 --> 00:23:48,680 amongst processors. 488 00:23:48,680 --> 00:23:50,180 And then there's a lot communication 489 00:23:50,180 --> 00:23:51,030 between these two. 490 00:23:51,030 --> 00:23:56,410 And it started off with sort message oriented communication. 491 00:23:56,410 --> 00:23:59,330 Then it moved to a dependency oriented communication 492 00:23:59,330 --> 00:24:01,810 so that each compute object would sort 493 00:24:01,810 --> 00:24:05,760 of signal which atoms it needs. 494 00:24:05,760 --> 00:24:09,306 And it also sort of optimized things 495 00:24:09,306 --> 00:24:12,770 by doing communication and force computation at the same time. 496 00:24:12,770 --> 00:24:14,160 So it did that really well. 497 00:24:14,160 --> 00:24:17,170 And actually it had really good parallel performance. 498 00:24:17,170 --> 00:24:21,450 So on this graph it shows the number of processors 499 00:24:21,450 --> 00:24:23,560 times the time per time step. 500 00:24:23,560 --> 00:24:28,800 And as the number of processors is increased, 501 00:24:28,800 --> 00:24:30,290 I mean optimal parallel performance 502 00:24:30,290 --> 00:24:33,040 would be a straight horizontal line, which is pretty 503 00:24:33,040 --> 00:24:34,520 close to what they achieve. 504 00:24:34,520 --> 00:24:36,830 And they also did it on many thousands of processors 505 00:24:36,830 --> 00:24:37,520 on Blue Gene. 506 00:24:37,520 --> 00:24:40,080 But as you might expect, the more processors 507 00:24:40,080 --> 00:24:44,600 you add, the step time actually still sort of levels off 508 00:24:44,600 --> 00:24:46,290 at a certain time. 509 00:24:46,290 --> 00:24:50,864 So I guess I'll skip the sort of live demo I was going to do, 510 00:24:50,864 --> 00:24:52,280 but I'll just show one more thing. 511 00:24:55,930 --> 00:25:00,494 So here is sort of the kind of systems that I was looking at. 512 00:25:00,494 --> 00:25:01,910 This is actually a simulation that 513 00:25:01,910 --> 00:25:05,560 was done on one of the PPUs and then recorded 514 00:25:05,560 --> 00:25:07,060 in positions file. 515 00:25:07,060 --> 00:25:13,610 And there is five different molecules here. 516 00:25:13,610 --> 00:25:15,640 So generally, what happens is first they're just 517 00:25:15,640 --> 00:25:16,460 placed randomly. 518 00:25:16,460 --> 00:25:18,700 Then they're an overlap just very coarsely, 519 00:25:18,700 --> 00:25:22,360 to make sure that huge forces are not 520 00:25:22,360 --> 00:25:23,507 entered into the system. 521 00:25:23,507 --> 00:25:25,215 And I'll just color each one differently. 522 00:25:27,800 --> 00:25:30,960 So here's what happens during the simulation. 523 00:25:30,960 --> 00:25:38,310 And here, in this simulation, actually, the positions 524 00:25:38,310 --> 00:25:40,460 were recorded every 20 time steps. 525 00:25:40,460 --> 00:25:44,070 So that's why it sort of seems to go a lot faster. 526 00:25:44,070 --> 00:25:46,646 If I did it on a single molecule, 527 00:25:46,646 --> 00:25:48,385 I actually done that first. 528 00:25:48,385 --> 00:25:50,630 So here what a single molecule of this looks like. 529 00:25:50,630 --> 00:25:54,130 And this is a protein with eight residues, which is pretty big. 530 00:25:54,130 --> 00:25:56,680 And this is running in real time on my computer right now. 531 00:25:56,680 --> 00:26:03,020 So like on a small system like this, it runs reasonably fast. 532 00:26:03,020 --> 00:26:06,630 But as you add more and more of these things, 533 00:26:06,630 --> 00:26:08,700 the computations do get pretty expensive. 534 00:26:08,700 --> 00:26:12,930 So parallelization really helps. 535 00:26:12,930 --> 00:26:16,634 Well, not in what I implemented, but generally. 536 00:26:16,634 --> 00:26:19,990 Generally, as you saw, NAMD can do a pretty good job 537 00:26:19,990 --> 00:26:22,104 at doing it. 538 00:26:22,104 --> 00:26:25,400 Oh OK, I didn't minimize the system well enough. 539 00:26:25,400 --> 00:26:27,524 But yeah, that's about it. 540 00:26:27,524 --> 00:26:28,690 AUDIENCE: Great. [INAUDIBLE] 541 00:26:35,550 --> 00:26:37,510 AUDIENCE: So basically, on the SPUs, 542 00:26:37,510 --> 00:26:40,940 you managed to get them synchronized properly, data in 543 00:26:40,940 --> 00:26:42,372 and data out? 544 00:26:42,372 --> 00:26:43,580 GREG PINTILIE: Yeah, exactly. 545 00:26:43,580 --> 00:26:45,537 So that's kind of to the extent that I went. 546 00:26:45,537 --> 00:26:46,870 And that still took a long time. 547 00:26:46,870 --> 00:26:48,670 So [INAUDIBLE].