Q: What is latency and why is it important?
A: Latency can be understood as the "pauses" that the VM performs during execution. It is orthogonal to execution speed. For instance, one VM can be incredibly fast but pause for 1s every once in a while, while another can be slower but never pause for more than 1ms. The second VM might be preferred for realtime applications, such as games. We believe VMs should be predictable and offer low latency, which is why Octane 2.0 introduced specific latency tests.
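To make the distinction concrete, here is a minimal sketch that compares total run time with worst-case pause for two imaginary VMs. The frame times are invented for illustration, and the helper name summarize is hypothetical; this is not how Octane's latency tests are implemented.

```javascript
// Summarize a list of per-frame execution times (in milliseconds).
// total = overall run time, worst = longest single pause.
function summarize(frameTimesMs) {
  const total = frameTimesMs.reduce((sum, t) => sum + t, 0);
  const worst = Math.max(...frameTimesMs);
  return { total, worst };
}

// Hypothetical "fast" VM: 999 frames take 1 ms each, one frame stalls for 1 s.
const fastButPausing = new Array(999).fill(1).concat([1000]);

// Hypothetical "steady" VM: every one of its 1000 frames takes 3 ms.
const slowButSteady = new Array(1000).fill(3);

const fast = summarize(fastButPausing);
const steady = summarize(slowButSteady);

console.log("fast VM:   total", fast.total, "ms, worst pause", fast.worst, "ms");
console.log("steady VM: total", steady.total, "ms, worst pause", steady.worst, "ms");
```

In this made-up data the first VM finishes sooner overall, but its single long stall would be very visible in a game, while the second VM never exceeds a few milliseconds per frame.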
Q: Why did you fix some existing tests in Octane 2.0 and did this impact their score?
A: A test is supposed to measure what it says it measures, not something else. In two of the tests (Regexp, Codeload), it was pointed out that caching of results between runs could skew the result: the VM under test would not perform the work being measured all the way through. We updated both tests to reduce the chance of caching results. The score you see now is much closer to what the test set out to measure. In some VMs, this means that the score might have dropped. Separately, part of Gameboy Emulator's code was supposed to run in strict mode but didn't because of a bug. We fixed this and noted no score impact on any browser.
Q: How do you choose the constant that determines a new test's score?
A: The basic rule is that you should get the same score running Octane v1 or Octane 2.0 on the same machine. To make the balancing of new scores simple, we just make sure that each new test added in Octane 2.0 has the same score as the total Octane v1 score. In this way, the new tests do not change the total score, which is a geometric mean.
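The arithmetic behind this can be checked with a small sketch. The individual scores below are hypothetical, and geometricMean is an illustrative helper rather than Octane's own code; the point is only that appending values equal to the current geometric mean leaves that mean unchanged.

```javascript
// Geometric mean computed via logarithms to avoid overflow on large products.
function geometricMean(scores) {
  const logSum = scores.reduce((sum, s) => sum + Math.log(s), 0);
  return Math.exp(logSum / scores.length);
}

// Hypothetical per-test scores from an Octane v1 run on some machine.
const v1Scores = [8000, 12000, 18000];
const v1Total = geometricMean(v1Scores);

// Assume each new test is calibrated to score exactly the v1 total
// on that same machine.
const v2Scores = v1Scores.concat([v1Total, v1Total]);
const v2Total = geometricMean(v2Scores);

console.log("v1 total:", v1Total.toFixed(2));
console.log("v2 total:", v2Total.toFixed(2));
```

Both totals agree up to floating-point rounding, which is why calibrating new tests this way keeps the overall score stable on the reference machine.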
Q: What is the difference between Octane and other well-known JavaScript benchmarks, such as Kraken or SunSpider?
A: Octane aims to be representative of actual workloads and execution profiles of real web applications. Octane's goal is to be a proxy for the JavaScript application that you'll encounter when running browser games, highly-interactive web pages or online productivity tools.
Many micro-benchmarks, such as SunSpider, were written at a time when JavaScript wasn't used as extensively as a cornerstone of large, rich web applications. Therefore they tend not to measure the performance of JavaScript engines under the demanding JavaScript environment that a modern web application creates. We've tried very hard to avoid making Octane a micro-benchmark that only tests very specific JavaScript features out of context.
Q: Why do you guys at Chrome need to make your own benchmark? Can't you use an existing one?
A: It's not possible to improve something you can't measure. The V8 Engineering Team has always been on the lookout for good measures of real-world JavaScript performance to know what to optimize V8 for. We believe that most other benchmarks don’t adequately stress the performance bottlenecks that are worth optimizing in order to improve everybody’s experience on the web. The web has evolved, and those benchmarks are often not representative, not comprehensive enough or in some cases too “game-able”. That is why the V8 Benchmark Suite was originally created and repeatedly updated over the past years, and why we have released Octane.
Q: What do the scores mean?
A: In a nutshell: bigger is better. Octane measures the time a test takes to complete and then assigns a score that is inversely proportional to the run time (historically, Firefox 2 produced a score of 100 on an old benchmark rig the V8 team used).
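As a rough illustration of an inversely proportional score, the sketch below assumes a hypothetical reference time that maps to a score of 100. The function name scoreFor and the timings are made up for this example and are not taken from Octane's source.

```javascript
// Score is inversely proportional to run time: halving the time doubles
// the score. A run that matches the reference time scores exactly 100.
function scoreFor(referenceTimeMs, measuredTimeMs) {
  return (100 * referenceTimeMs) / measuredTimeMs;
}

// Hypothetical reference time for one test on the reference machine.
const referenceTimeMs = 2000;

console.log(scoreFor(referenceTimeMs, 2000)); // same speed as the reference
console.log(scoreFor(referenceTimeMs, 1000)); // twice as fast
console.log(scoreFor(referenceTimeMs, 4000)); // twice as slow
```

Under these assumptions, a machine that completes the test in half the reference time gets twice the reference score, which matches the "bigger is better" reading.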
Q: Can I compare benchmark scores across Octane releases? What about single test scores?
A: Each test in the suite will never be modified once released. That means each test score can be compared across Octane versions. Every future version of Octane will contain new tests, hence the total score (“Octane Score”) of a version cannot be compared with that of the previous ones.
Q: Is Octane going to be updated? How often?
A: Yes, Octane is evolving together with the web and the kind of applications Chrome's users want to run today and in the future. We are not going to provide a fixed update schedule but as soon as a meaningful new test is identified, it will be considered for inclusion.
Q: I don't think you cover all that's out there, though...
A: ...and you are probably right! Choosing new tests for a benchmark suite is a long process; each test must be carefully evaluated to make sure it illustrates something new and meaningful. We added five new tests to Octane, but there is still a lot of room to improve JavaScript performance in areas we don't measure yet. If you think you've found a real-world JavaScript application that illustrates something important that's not in Octane, let us know by filing a bug. If the code can be packed in a self-contained, open-source friendly way, we'll consider it for addition to one of the next iterations. And we'll optimize for it, you can count on it.