Shortly after writing the solution I started to wonder whether using Scalala had any impact on the solution's runtime. While Scalala does have the overhead of object creation, it also makes heavy use of specialized classes, which should provide a considerable improvement.
I decided to do some naive benchmarking. These benchmarks are nowhere near scientific, but they should provide a general sense of each solution's runtime. Since I was already benchmarking the two Scala solutions, I decided to also look at the MATLAB/Octave and R solutions.
All benchmarking was done on the same 64-bit laptop. The time to load the data from a file/connection was not included in the benchmarks. Averages are based on 100 iterations of each solution. The runtime of each iteration is stored in a vector/list called runtimes, which is then used to determine the average.
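For the Scala runs, the timing loop can be sketched like this. This is an illustrative sketch only: the names time and solve, and the placeholder workload, are my own, not the post's actual benchmarking code.

```scala
object BenchSketch {
  // Time a block of code, returning elapsed wall-clock seconds.
  def time(body: => Unit): Double = {
    val start = System.nanoTime()
    body
    (System.nanoTime() - start) / 1e9
  }

  def main(args: Array[String]): Unit = {
    // Placeholder workload standing in for the actual solution under test.
    def solve(): Double = (1 to 1500).foldLeft(0.0)((acc, i) => acc + 1.0 / i)

    // 100 timed iterations collected into `runtimes`, as described above.
    val runtimes = Vector.fill(100)(time { solve() })
    println(f"mean = ${runtimes.sum / runtimes.size}%.6f")
    println(f"min  = ${runtimes.min}%.6f")
    println(f"max  = ${runtimes.max}%.6f")
  }
}
```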
 $ octave --version  
 GNU Octave, version 3.4.0  
   
 $ R --version  
 R version 2.13.1 (2011-07-08)  
   
 $ scala -version  
 Scala code runner version 2.9.0.1 -- Copyright 2002-2011, LAMP/EPFL  
First up is the MATLAB/Octave solution:
 > mean(runtimes)  
 ans = 0.033590  
 > min(runtimes)  
 ans = 0.032610  
 > max(runtimes)  
 ans = 0.038699  
On to Alexandre Martins' solution in R:
 > summary(runtimes)  
   Min. 1st Qu. Median  Mean 3rd Qu.  Max.   
 0.03900 0.04100 0.04200 0.04254 0.04325 0.05200  
Next is Scala. We will look first at the iterative solution. For both Scala solutions we will use a simple method to calculate the mean:
 def mean[T](s: Seq[T])(implicit n: Fractional[T]) = n.div(s.sum, n.fromInt(s.size))  
 scala> mean(runtimes)  
 res: Double = 0.02949999999999998  
   
 scala> runtimes.min  
 res: Double = 0.025  
   
 scala> runtimes.max  
 res: Double = 0.159  
  
 scala> runtimes.head  
 res: Double = 0.159  
   
 scala> runtimes.tail.max  
 res: Double = 0.075  
   
 scala> runtimes.take(10)  
 res: scala.collection.immutable.IndexedSeq[Double] = Vector(0.159, 0.075, 0.053, 0.052, 0.033, 0.03, 0.035, 0.031, 0.027, 0.027)  
   
 scala> mean(runtimes.take(10))  
 res: Double = 0.0558
Let us see if the Scalala solution has the same issues:
 scala> mean(runtimes)  
 res: Double = 0.0223  
   
 scala> runtimes.min  
 res: Double = 0.014  
   
 scala> runtimes.max  
 res: Double = 0.424  
  
 scala> runtimes.head  
 res: Double = 0.424  
   
 scala> runtimes.tail.max  
 res: Double = 0.084  
   
 scala> runtimes.take(10)  
 res: scala.collection.immutable.IndexedSeq[Double] = Vector(0.424, 0.084, 0.072, 0.052, 0.044, 0.032, 0.033, 0.043, 0.017, 0.016)  
   
 scala> mean(runtimes.take(10))  
 res: Double = 0.08170000000000002  
Where do most of the inefficiencies lie? What would happen if we increased the number of steps used to find the local minimum from 1500 to 15000?
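For intuition about the second question: since the step count is a fixed parameter of the local-minimum search, its cost should scale roughly linearly with that count. Here is a minimal sketch assuming the search is a simple fixed-step 1-D gradient descent; the cost function, step size, and names here are stand-ins, not the post's actual algorithm.

```scala
object DescentSketch {
  // One-dimensional gradient descent with a fixed step size and step count.
  def descend(df: Double => Double, x0: Double, alpha: Double, steps: Int): Double =
    (1 to steps).foldLeft(x0)((x, _) => x - alpha * df(x))

  def main(args: Array[String]): Unit = {
    val df = (x: Double) => 2.0 * (x - 3.0) // derivative of (x - 3)^2
    // Both runs converge to the minimizer near x = 3; the second simply does
    // ten times the work, which is what the 1500 -> 15000 benchmarks measure.
    println(descend(df, 0.0, 0.1, 1500))
    println(descend(df, 0.0, 0.1, 15000))
  }
}
```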
Octave
 > mean(runtimes)  
 ans = 0.33541  
 > min(runtimes)  
 ans = 0.32408  
 > max(runtimes)  
 ans = 0.39594  
R
 > summary(runtimes)  
   Min. 1st Qu. Median  Mean 3rd Qu.  Max.   
  0.4250 0.4450 0.4525 0.4563 0.4603 0.5310  
Scala iterative solution
 scala> mean(runtimes)  
 res: Double = 0.27539999999999987  
   
 scala> runtimes.min  
 res: Double = 0.264  
   
 scala> runtimes.max  
 res: Double = 0.564  
   
 scala> runtimes.head  
 res: Double = 0.564  
   
 scala> runtimes.tail.max  
 res: Double = 0.321  
   
 scala> runtimes.take(10)  
 res: scala.collection.immutable.IndexedSeq[Double] = Vector(0.564, 0.29, 0.267, 0.268, 0.268, 0.276, 0.267, 0.269, 0.268, 0.27)  
   
 scala> mean(runtimes.take(10))  
 res: Double = 0.3007  
Scala Scalala solution
 scala> mean(runtimes)  
 res: Double = 0.16492  
   
 scala> runtimes.min  
 res: Double = 0.151  
   
 scala> runtimes.max  
 res: Double = 0.811  
   
 scala> runtimes.head  
 res: Double = 0.811  
   
 scala> runtimes.tail.max  
 res: Double = 0.188  
   
 scala> runtimes.take(10)  
 res: scala.collection.immutable.IndexedSeq[Double] = Vector(0.811, 0.159, 0.161, 0.155, 0.159, 0.157, 0.156, 0.157, 0.16, 0.155)  
   
 scala> mean(runtimes.take(10))  
 res: Double = 0.223  
What do all these numbers ultimately mean? Very little. Nowadays, developer productivity is one of the most important factors to pay attention to. If a developer is more productive creating a solution in R, then perhaps a performance penalty is not as important. With the Scala solutions, using a linear algebra library helped simplify the code and keep the focus on the mathematical concepts involved. A better runtime is only an added benefit.
UPDATE: the benchmarking code is available at https://gist.github.com/1201581
UPDATE 2:
The folks over at MathWorks were kind enough to do some benchmarking between MATLAB and Octave. All tests were run on their system, so comparing those numbers with the R/Scala numbers above will not be accurate. Then again, these benchmarks were never comprehensive in the first place.
Octave (3.4.0)
 Mean = 0.029570
 Min  = 0.029214
 Max  = 0.030383
MATLAB (R2011b)
 Mean = 0.0126
 Min  = 0.0125
 Max  = 0.0127
Impressive improvement for this small test.
 
Very cool
The first several passes on the JVM will always be slow. IMHO the JVM HotSpot/JIT etc. is optimized for a scenario where the same code runs for a long time.
So if you don't want to benchmark the startup time, then let your algorithm run for 100 iterations (or even more, I don't know exactly) and then start your benchmark.
If the JVM JIT is similar to the CLR JIT, then you will always pay the bytecode-to-native compilation and optimization cost the first time a routine is executed. The fully optimized code is then used for all further iterations.
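In code, the warm-up approach the commenters describe looks something like the following sketch; benchmark, timed, and the placeholder workload are illustrative names, not anyone's actual harness.

```scala
object WarmupSketch {
  // Time a block of code in seconds.
  def timed(body: => Unit): Double = {
    val t0 = System.nanoTime()
    body
    (System.nanoTime() - t0) / 1e9
  }

  // Run the routine `warmup` times untimed so HotSpot can JIT-compile it,
  // then collect `iters` timed runs of the (hopefully) optimized code.
  def benchmark(solve: () => Unit, warmup: Int = 100, iters: Int = 100): Seq[Double] = {
    (1 to warmup).foreach(_ => solve())
    Vector.fill(iters)(timed(solve()))
  }

  def main(args: Array[String]): Unit = {
    val runtimes = benchmark(() => { (1 to 100000).map(math.sqrt(_)).sum; () })
    println(f"warmed-up mean = ${runtimes.sum / runtimes.size}%.6f s")
  }
}
```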
Anonymous,
I suspected that the HotSpot optimization was kicking in, but I have never profiled code at such a micro level, so I had never seen the optimization in action. Interesting stuff. The JVM is one part of why Scala has become my go-to language.
However, ignoring the startup time might result in an apples-to-oranges comparison. If the code is only meant to be executed once, then startup time is important when doing side-by-side evaluations. It would be interesting to see if the code can be bootstrapped by using a small set of fake data and a low number of iterations.
Because profiling in MATLAB is so easy, it might be interesting to profile the routine and see where the hot spots are, in the MATLAB version at least. If it's all linear algebra kernels (* or inv or whatever) then that would explain why the floor is where it is.
 >> profile on; run_the_code; profile report
is all you need.
What are you going to do in that 0.01 second? :)
ReplyDeleteI was wondering why these comments were appearing days after posting the article. Google Analytics tells me Hacker News is linking to me. Welcome!
I encourage everyone to read the original post. The goal is to learn the math involved in the algorithms, not to simply create the fastest version. I also encourage everyone to create their own implementation in numpy/scipy/Ruby/Clojure/C++/whatever and share it with the world.
Are you using MATLAB or Octave? At the top you show the Octave version, but then refer to MATLAB throughout the rest of the post. Octave and MATLAB are not the same thing -- MATLAB is likely to be much more optimized than Octave. Please clarify.
A pet peeve, but if you want to compare things, a graph or table putting the data side by side does wonders for readability and the flow of arguments.
My 2 cents.
-- Anonymous Coward
RJN,
Octave was used in this example. I wish whoever posted the story to Hacker News had not used the title he/she did, since the intent was never to be an absolute benchmark. I do not have MATLAB, so I cannot benchmark it in any way. Perhaps there is an EC2 instance with MATLAB already installed?
Anon 3:13,
Graphs or tables would only have reinforced the notion that the benchmark was significant, and I do not think the resulting numbers can be used as a definitive benchmark. However, I did find the results of the Scala solutions interesting, since they highlighted the JVM's optimization process, and I did want to plot those values.
(I'm the first Anonymous)
If you have code that has to be benchmarked, then that code will probably run for a long time, and then the warm-up time of the JVM isn't interesting any more.