The Benchmark

What's new in Octane 2.0

Octane v1 focused on execution throughput, but in today's highly interactive applications (e.g. games, UI-rich web applications) reducing jankiness (what we call "ensuring low latency") is just as important. The V8 team wants to reduce compiler and garbage collection latency, so we included specific measures for those in Octane 2.0. In order to accomplish that we instrumented two existing tests, Mandreel and Splay, so that they now produce a latency score together with the existing speed score from Octane v1.

Octane 2.0 adds a test to measure how VMs perform on asm.js-like code. The test is taken directly from the zlib sample code from Mozilla's Emscripten test suite. To simulate complex data structure- and memory-intensive applications, we added a test derived from Microsoft's TypeScript compiler. The result measures how fast TypeScript compiles itself.

An important but behind-the-scenes modification in the test harness allows running tests a fixed number of times rather than as many times as possible in a time interval. In the past, a test would run for a minimum number of times or for at least 1 second. However, executing long-running tests a small but variable number of times--which can happen if the test harness limits run counts based on a timeout--leads to higher variance in benchmark results. To make the benchmark scores more consistent between runs, we fixed the iteration count for the long-running benchmarks. For example, the TypeScript-derived and zlib tests are long-running and therefore run only 1 and 3 times respectively, regardless of the time it takes. For TypeScript, a single run is sufficient to measure the full load / compile / execution performance, while for zlib, multiple runs are required to get an overall measurement of startup and throughput performance. For VMs that may use ahead-of-time compilation of asm.js-compliant code, we use eval() to ensure that we measure both preparation and execution time.
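The difference between the two run modes described above can be sketched roughly as follows. This is a minimal illustration, not the actual Octane harness: runBenchmark, fixedIterations, minIterations and the injectable clock are hypothetical names, and the timed branch assumes that both the minimum run count and the one-second window must be satisfied.

```javascript
// Hypothetical harness sketch; names are illustrative, not Octane's real API.
function runBenchmark(benchmark, now = () => Date.now()) {
  const start = now();
  let runs = 0;
  let elapsed = 0;

  if (benchmark.fixedIterations !== undefined) {
    // Fixed mode: always the same number of runs, however long they take.
    for (; runs < benchmark.fixedIterations; runs++) {
      benchmark.run();
    }
    elapsed = now() - start;
  } else {
    // Timed mode (assumed semantics): keep going until the minimum run
    // count is reached and at least one second has elapsed.
    const minRuns = benchmark.minIterations ?? 1;
    const minMillis = 1000;
    while (runs < minRuns || elapsed < minMillis) {
      benchmark.run();
      runs++;
      elapsed = now() - start;
    }
  }

  // In timed mode `runs` depends on machine speed; in fixed mode it does not.
  return { runs, meanMillis: elapsed / runs };
}
```

In timed mode a slow test may complete two runs on one machine and three on another, so the mean is taken over a different mix of cold and warm runs each time. Fixing the count removes that source of variance.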

In addition to the new tests, several existing ones have been fixed:

  • GameBoy: part of the code was supposed to run in strict mode, but it didn't. Now it does. A bug that led to excessive out-of-bounds memory accesses in a TypedArray was also fixed.
  • Regexp: eliminated the possibility of caching some results.
  • CodeLoad: improved the cache busting measures by using a true hash function.
  • DeltaBlue and NavierStokes: minor bug fixes that don't change the overall execution profile.
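As an illustration of the CodeLoad fix, cache busting with a hash function might look like the sketch below. Everything here is an assumption made for the example: the __SALT__ placeholder, the bustCache helper and the choice of FNV-1a are hypothetical and are not taken from the benchmark's source.

```javascript
// FNV-1a, 32-bit: a simple non-cryptographic hash, chosen for illustration.
// It hashes UTF-16 code units here, which matches byte-wise FNV-1a for ASCII.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // convert to an unsigned 32-bit value
}

// Produce a variant of `source` whose text differs on every iteration, so an
// engine cannot reuse code it cached for an earlier, identical string.
function bustCache(source, iteration) {
  const tag = fnv1a(source.length + ":" + iteration).toString(16);
  return source.replace(/__SALT__/g, "v" + tag);
}

// Hypothetical payload: the placeholder becomes part of an identifier.
const template = "function f__SALT__() { return 42; } f__SALT__();";
const first = bustCache(template, 0);
const second = bustCache(template, 1);
// `first` and `second` differ textually but evaluate to the same result.
```

A hash spreads small changes in the iteration number across the whole tag, which makes it harder for an engine to recognize successive variants as the same program than a simple counter suffix would.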

The test suite in detail

Octane 2.0 consists of 17 tests, four more than Octane v1. We have chosen each test in order to cover most use cases encountered in the real web.

We believe a high-performance JavaScript engine should be able to perform well on real-world code, not just synthetic benchmarks. That is how the four new tests have been selected.

  • Richards
    OS kernel simulation benchmark, originally written in BCPL by Martin Richards (539 lines).
    • Main focus: property load/store, function/method calls
    • Secondary focus: code optimization, elimination of redundant code
  • DeltaBlue Fixed
    One-way constraint solver, originally written in Smalltalk by John Maloney and Mario Wolczko (880 lines).
    • Main focus: polymorphism
    • Secondary focus: OO-style programming
  • Raytrace
    Ray tracer benchmark based on code by Adam Burmister (904 lines).
    • Main focus: arguments object, apply
    • Secondary focus: prototype library object, creation pattern
  • Regexp Fixed
    Regular expression benchmark generated by extracting regular expression operations from 50 of the most popular web pages (1761 lines).
    • Main focus: Regular expressions
  • NavierStokes Fixed
    2D Navier-Stokes equations solver that heavily manipulates double-precision arrays. Based on Oliver Hunt's code (387 lines).
    • Main focus: reading and writing numeric arrays.
    • Secondary focus: floating point math.
  • Crypto
    Encryption and decryption benchmark based on code by Tom Wu (1698 lines).
    • Main focus: bit operations
  • Splay
    Data manipulation benchmark that deals with splay trees and exercises the automatic memory management subsystem (394 lines).
    • Main focus: Fast object creation, destruction
  • SplayLatency New
    The Splay test stresses the Garbage Collection subsystem of a VM. SplayLatency instruments the existing Splay code with frequent measurement checkpoints. A long pause between checkpoints is an indication of high latency in the GC. This test measures the frequency of latency pauses, classifies them into buckets and penalizes frequent long pauses with a low score.
    • Main focus: Garbage Collection latency
  • EarleyBoyer
    Classic Scheme benchmarks, translated to JavaScript by Florian Loitsch's Scheme2Js compiler (4684 lines).
    • Main focus: Fast object creation, destruction
    • Secondary focus: closures, arguments object
  • pdf.js
    Mozilla's PDF Reader implemented in JavaScript. It measures decoding and interpretation time (33,056 lines).
    • Main focus: array and typed array manipulation.
    • Secondary focus: math and bit operations, support for future language features (e.g. promises)
  • Mandreel
    Runs the 3D Bullet Physics Engine ported from C++ to JavaScript via Mandreel (277,377 lines).
    • Main focus: emulation
  • MandreelLatency New
    Similar to the SplayLatency test, this test instruments the Mandreel benchmark with frequent time measurement checkpoints. Since Mandreel stresses the VM's compiler, this test provides an indication of the latency introduced by the compiler. Long pauses between measurement checkpoints lower the final score.
    • Main focus: Compiler latency
  • GB Emulator Fixed
    Emulates the portable console's architecture and runs a demanding 3D simulation, all in JavaScript (11,097 lines).
    • Main focus: emulation
  • Code loading Fixed
    Measures how quickly a JavaScript engine can start executing code after loading a large JavaScript program, a social widget being a common example. The source for this test is derived from open source libraries (Closure, jQuery) (1,530 lines).
    • Main focus: JavaScript parsing and compilation
  • Box2DWeb
    Based on Box2DWeb, the popular 2D physics engine originally written by Erin Catto, ported to JavaScript. (560 lines, 9000+ de-minified)
    • Main focus: floating point math.
    • Secondary focus: properties containing doubles, accessor properties.
  • zlib New
    The zlib asm.js/Emscripten test from the Mozilla Emscripten suite, running with workload 1. The code is enclosed in eval(), which guarantees that the running time we measure includes parsing and compilation on all browsers (2,585 lines).
    • Main focus: code compilation and execution
  • TypeScript New
    Microsoft's TypeScript compiler is a complex application. This test measures the time TypeScript takes to compile itself and is a proxy for how well a VM handles complex and sizable JavaScript applications (25,918 lines).
    • Main focus: run complex, heavy applications
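The checkpoint idea behind SplayLatency and MandreelLatency can be sketched as follows. This is only an illustration of the general technique: the LatencyRecorder class, the bucket bounds and the quadratic penalty are assumptions for the example and do not reproduce Octane's actual buckets or scoring formula.

```javascript
// Hypothetical latency recorder; bucket bounds and weighting are illustrative.
class LatencyRecorder {
  constructor(now, bucketBoundsMs = [1, 5, 20, 50]) {
    this.now = now;
    this.bounds = bucketBoundsMs;
    // One counter per bound, plus a final bucket for anything longer.
    this.counts = new Array(bucketBoundsMs.length + 1).fill(0);
    this.last = now();
  }

  // Call at every instrumentation point in the workload. The time since the
  // previous checkpoint includes any GC or compiler pause in between.
  checkpoint() {
    const t = this.now();
    const pause = t - this.last;
    this.last = t;
    let bucket = this.bounds.findIndex((bound) => pause <= bound);
    if (bucket === -1) bucket = this.bounds.length;
    this.counts[bucket]++;
  }

  // Assumed weighting: pauses in higher buckets cost quadratically more, so
  // frequent long pauses dominate the penalty and would lower a score.
  penalty() {
    return this.counts.reduce((sum, count, i) => sum + count * i * i, 0);
  }
}
```

A workload would call checkpoint() at regular points in its inner loop, for example with performance.now as the clock, and derive a score that decreases as penalty() grows.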