In between writing MPI versions of my codes, I took a break to test a multi-threaded version that would take advantage of my 8-core Mac Pro. I used Intel's compilers and some OpenMP directives, so it was all very easy and took just a day of learning and implementing. The results are impressive: with 14 other applications running (including a Mathematica kernel, Illustrator, iTunes, browsers, etc.), the test code went from 74 seconds on a single core to 16.5 seconds on 6 cores, at which point the speedup saturated. That is a 4.5x speedup (74 / 16.5 ≈ 4.5)! With some loop unrolling and further optimizations I already have in testing, I should get to the 10x holy grail pretty fast.
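For readers who have not seen OpenMP, here is a minimal sketch of the kind of directive involved. It is not my actual test code: the function `sum_of_squares` and its arguments are hypothetical, chosen only to show how a single pragma can spread a loop across cores.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel (not the real test code): sum of squares of x[0..n-1].
 * The pragma asks OpenMP to split the loop iterations across threads and
 * to combine each thread's private partial sum into `total` at the end. */
double sum_of_squares(const double *x, size_t n)
{
    assert(x != NULL || n == 0);

    double total = 0.0;

    /* A signed loop index is used because older OpenMP versions require it. */
    #pragma omp parallel for reduction(+:total)
    for (long i = 0; i < (long)n; i++) {
        total += x[i] * x[i];
    }

    return total;
}
```

If the compiler's OpenMP option is switched off (for example, leaving out `-fopenmp` with GCC), the pragma is ignored and the loop simply runs serially, which makes it easy to compare timings. The number of threads can be set with the standard `OMP_NUM_THREADS` environment variable.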