Native Client

3D Graphics

3D rendering is a large and complex topic with many aspects that developers must know about. Most of these aspects do not relate specifically to Native Client and are therefore beyond the scope of this document. This document discusses where 3D rendering and Native Client intersect and, where applicable, offers performance and usage suggestions that can improve your experience on the Native Client platform.

Overview

Native Client uses OpenGL ES 2.0 as its rendering API. For more on OpenGL ES 2.0, see the OpenGL ES 2.0 Programming Guide, which includes source code.

Known limitations (last updated 9-Oct-2011)

  • Instancing is not yet supported.
  • Occlusion queries are not yet supported.
  • Binary (precompiled) shaders are not yet supported.

Known bugs (last updated 29-Dec-2011)

  • There is a known problem with compiling shaders on some versions of Mac OS. If you're seeing issues with any sort of matrix transform, be explicit about your casting. For instance, promote a vec3 to a vec4 before transforming it by a mat4, rather than converting the mat4 to a mat3. Note that this problem is not specific to Native Client; it's a driver issue (see additional information).
  • DXT3 and DXT5 support is enabled, but the enums are missing from the GLES headers in the pepper_15 SDK bundle. Define them yourself:
    #define GL_COMPRESSED_RGBA_S3TC_DXT3_EXT 0x83F2
    #define GL_COMPRESSED_RGBA_S3TC_DXT5_EXT 0x83F3
  • Do not call glDisable(GL_TEXTURE_2D). In OpenGL ES 2.0, GL_TEXTURE_2D is not a valid capability for glEnable/glDisable, so the call generates a GL_INVALID_ENUM error, which is logged in Chrome's "about:gpu" tab. This log can grow without bound, causing memory leaks and instability.
  • Do not use GL_FIXED. It's not supported in desktop OpenGL, so emulating it for OpenGL ES 2.0 is slow. GL_FIXED support is turned off by default in the Pepper 3D API. There is an option to turn it on; don't use it.
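If your existing assets happen to store vertex data in 16.16 fixed point (the layout GL_FIXED describes), one way to avoid GL_FIXED entirely is to convert the data to float once, at load time, and upload it as GL_FLOAT. The following is a minimal sketch under that assumption; the function names are hypothetical and not part of any API.

```cpp
#include <cstdint>
#include <vector>

// GL_FIXED values are signed 16.16 fixed point: the low 16 bits hold the
// fraction, so dividing by 2^16 yields the equivalent float value.
inline float FixedToFloat(std::int32_t fixed_value) {
  return static_cast<float>(fixed_value) / 65536.0f;
}

// Hypothetical helper: converts a whole attribute array once (for example,
// when loading an asset) so it can be uploaded as GL_FLOAT instead of GL_FIXED.
std::vector<float> ConvertFixedArray(const std::vector<std::int32_t>& fixed_values) {
  std::vector<float> floats;
  floats.reserve(fixed_values.size());
  for (std::int32_t value : fixed_values) {
    floats.push_back(FixedToFloat(value));
  }
  return floats;
}
```

Doing the conversion once keeps the cost out of the per-frame path, which is the point of avoiding the emulated format.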

Getting started with the Pepper 3D API

To use the Pepper 3D API, you must use version 1.0 of the SDK or greater, and compile your application for the Pepper/Chrome 15 platform or greater.

How to set up the render/update loop

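Currently, the best way to ensure maximum rendering throughput is to pass a completion callback to SwapBuffers and have that callback trigger the next render. Since SwapBuffers is a non-blocking call, this queues up the next frame as soon as the previous one is finished.

// MyInstance stands for your pp::Instance subclass; the name is illustrative.
void MyInstance::Render(int32_t result) {
  // Do real rendering here: issue this frame's GL calls.
  // SwapBuffers is non-blocking; the callback runs Render again once the swap completes.
  m_graphics.SwapBuffers(cbFactory.NewCallback(&MyInstance::Render));
}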

Tips and best practices

As with most graphics APIs, there's a set of "proper" techniques you must use to make sure you're getting the maximum performance out of your application. In addition to the techniques that are applicable to OpenGL ES 2.0, below is a set of tips for getting maximum performance with the Pepper 3D API.


Update indices sparingly.

For security reasons, all indices must be validated. If you change indices, Native Client has to validate them again. Therefore, structure your code so that indices are not updated often.
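To illustrate why changing indices has a cost, here is a hypothetical sketch of the kind of check involved: every index must refer to a vertex that actually exists. Chrome's actual validation may differ in detail; the assumption here is only that the work grows with the number of indices and has to be repeated whenever they change.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical validation sketch: returns true only if every index refers to
// one of the vertex_count vertices available. The loop touches every index,
// so re-validating after each index update costs time proportional to the
// size of the index data.
bool IndicesInRange(const std::vector<std::uint16_t>& indices, std::size_t vertex_count) {
  for (std::uint16_t index : indices) {
    if (index >= vertex_count) {
      return false;  // Out-of-range index: the draw must be rejected.
    }
  }
  return true;
}
```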


Don't use client-side buffers.

In OpenGL ES 2.0 you can use client-side data with glVertexAttribPointer and glDrawElements. It's REALLY SLOW! Don't use it. Instead, whenever possible, use VBOs (Vertex Buffer Objects). Side note: client-side vertex arrays are not available in the OpenGL core profile, including OpenGL 4.0.


Don't mix vertex data with index data.

This is actually off by default. In standard OpenGL ES 2.0 you can create a single buffer and bind it to both GL_ARRAY_BUFFER and GL_ELEMENT_ARRAY_BUFFER. In Pepper 3D, by default, a buffer can be bound to only one of those bind points. There is an option to allow binding a buffer to both, but it requires expensive work, so don't use it.


Don't call glGetXXX or glCheckXXX during rendering.

Calling either of those stalls Chrome's multi-process pipeline, because the call has to wait for a reply from the separate process that executes the GL commands. This is standard advice for OpenGL programs, but it is particularly important for 3D in Chrome. This includes glGetError; avoid calling it in release builds.


Make sure to enable Attrib 0.

In desktop OpenGL you MUST enable attrib 0; in OpenGL ES 2.0 you don't have to. This means that to emulate OpenGL ES 2.0 on top of OpenGL, Chrome has to do some expensive work whenever attrib 0 is not enabled.

In practice, most programs don't run into this, but just in case: the most obvious way it can bite you is if you bind your own attribute locations and don't start at 0. For example, imagine you have a vertex shader with two attributes, "positions" and "normals":

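// Called before glLinkProgram. Argument order is (program, index, name).
glBindAttribLocation(program, 1, "positions");
glBindAttribLocation(program, 2, "normals");

Those two calls would make your shader NOT use attrib 0, in which case Chrome has to do some expensive work internally. Binding "positions" to location 0 and "normals" to location 1 avoids the problem.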
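One simple way to guarantee that attrib 0 is used is to hand out locations consecutively starting at 0. The sketch below assumes you keep your attribute names in a list; AssignAttribLocations is a hypothetical helper, and each resulting pair would be passed to glBindAttribLocation(program, location, name.c_str()) before glLinkProgram, with glEnableVertexAttribArray(0) enabled for the first attribute.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: assigns attribute locations consecutively from 0, so
// the first attribute in the list (for example "positions") always lands on
// attrib 0. Returns (location, name) pairs ready for glBindAttribLocation.
std::vector<std::pair<unsigned, std::string>> AssignAttribLocations(
    const std::vector<std::string>& attribute_names) {
  std::vector<std::pair<unsigned, std::string>> bindings;
  unsigned next_location = 0;
  for (const std::string& name : attribute_names) {
    bindings.emplace_back(next_location++, name);
  }
  return bindings;
}
```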


Avoid reading back output from the GPU to the client.

In other words, don't call glReadPixels. It has to wait for the GPU to finish rendering and for the pixels to be copied back, so it is slow.


Use a smaller plugin and let CSS scale it.

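The size at which your plugin renders and the size at which it is displayed on the page are set separately. CSS controls the display size, whereas the width and height attributes of your <embed> element control the render size. Rendering at a smaller size and letting CSS scale the result up reduces the number of pixels your module has to draw each frame.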
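As a worked example, with sizes chosen purely for illustration: suppose the <embed> width and height are 1280x720 while CSS displays the plugin at 1920x1080. Assuming pixel fill is what limits your frame rate, the arithmetic below shows the saving.

```cpp
#include <cstdint>

// Pixels drawn per frame for a given render size.
constexpr std::uint64_t PixelCount(std::uint64_t width, std::uint64_t height) {
  return width * height;
}

// Render size (the <embed> width/height attributes) versus display size (CSS).
constexpr std::uint64_t kRendered = PixelCount(1280, 720);    // 921,600 pixels
constexpr std::uint64_t kDisplayed = PixelCount(1920, 1080);  // 2,073,600 pixels

// The ratio is exactly 4/9: about 44% of the pixels are drawn each frame.
static_assert(kRendered * 9 == kDisplayed * 4, "1280x720 is 4/9 of 1920x1080");
```

The trade-off is some softness, since the browser upscales the rendered image to the display size.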


Flush the RPC pipe frequently.

Chrome is a multi-process browser: each tab is a separate process that has its own memory, cycles, and containment. Because processes don't directly share memory, communication between tabs/processes is done through a method called RPC, or Remote Procedure Call. RPC communication acts like a packetized networking stream, where small blocks of data are batched up in a buffer and flushed across the RPC pipe to another process at a later time. These RPC flushes come in two flavors:

  • a flush: simply do an asynchronous flush of the buffer
  • a sync flush: flush the data, and wait until the buffer is consumed.

For 3D in Native Client, it's the second kind of flush that you need to worry about.

For security reasons, all 3D calls occur in a separate process. This is not the same process as your current tab, but yet another process that focuses solely on communicating with the graphics API. As such, all the OpenGL ES calls in your NaCl process do nothing more than push command data into an RPC buffer.

The upside of this is that your NaCl process doesn't lose any processing due to graphics API calls. This has been a bane of PC graphics computing for some time now, and it's nice to know that you don't have to split processing between rendering and sim work on your main thread.

The downside, though, is that it's unclear when an RPC flush will occur, and whether it will be a simple flush or a sync flush. Since each GL call pushes a new command onto the buffer, a flush can occur inside any GL call on your NaCl process. This means that when you profile, a seemingly random GL call can appear to spike to 98ms for no reason. This kind of data would lead you to believe that something is wrong with the Native Client GL implementation, but the real cause is RPC flushing.

To keep your rendering overhead low, try to force flushes before heavy processing that doesn't make any RPC calls. For instance, right before you kick off your multithreaded particle work, call a flush so that the RPC pipe will be clear by the time you start making GL calls again. You can do this directly with two GL functions:

  • glFlush - which will force a flush
  • glFinish - which will force a flush with a sync.

To gauge how often to call glFlush, you should do throughput testing of your current rendering pipeline to find the sweet spot.
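The batching behavior described above can be made concrete with a toy model. This is NOT Chrome's implementation: ToyCommandBuffer is a hypothetical class that queues commands and sends them only when the buffer fills up or when the caller flushes explicitly, and it does not model sync flushes. It shows why the flush lands on whichever call happens to fill the buffer, and why flushing before heavy CPU work leaves the buffer empty afterward.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy model of a batched command buffer (not Chrome's real code).
class ToyCommandBuffer {
 public:
  explicit ToyCommandBuffer(std::size_t capacity) : capacity_(capacity) {}

  // Queues one command. Returns true if this call also triggered a flush:
  // the moment an otherwise ordinary call appears to "spike" in a profile.
  bool Push(const std::string& command) {
    pending_.push_back(command);
    if (pending_.size() >= capacity_) {
      Flush();
      return true;
    }
    return false;
  }

  // Sends everything queued so far (the analogue of glFlush).
  void Flush() {
    if (pending_.empty()) return;
    sent_ += pending_.size();
    pending_.clear();
    ++flush_count_;
  }

  std::size_t pending() const { return pending_.size(); }
  std::size_t sent() const { return sent_; }
  std::size_t flush_count() const { return flush_count_; }

 private:
  std::size_t capacity_;
  std::vector<std::string> pending_;
  std::size_t sent_ = 0;
  std::size_t flush_count_ = 0;
};
```

With a capacity of 4, the fourth Push pays for the flush of all four commands; calling Flush() before starting unrelated work means the next frame's calls start from an empty buffer.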


Use HTML where appropriate.

If you're used to making native games, you're probably used to rendering everything yourself. But the browser can already render text and UI very well, and it composites that HTML with your Native Client module using all the standard HTML5 and CSS methods available. Using the Pepper API, you can pass UI information back and forth with JavaScript, which has a rich set of tools for handling complex UI.


Avoid updating a small portion of a large buffer.

This is especially an issue on Windows, where Chrome emulates OpenGL ES 2.0 on top of DirectX. In the current implementation, updating a portion of a buffer requires re-processing the entire buffer. In other words, if you create a buffer (glBufferData) of 10000 bytes and later call glBufferSubData to update just 3 of those bytes, all 10000 bytes have to be re-converted. To avoid this:

  • Separate static vertex data from dynamic data. Put your static data in one buffer and your dynamic data in a different buffer, so that your static data doesn't have to be re-converted.
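The cost model described above (any partial update re-converts the whole buffer) can be written out as arithmetic. The 9,000/1,000-byte split below is an illustrative assumption, not a measurement.

```cpp
#include <cstddef>

// Simplified model of the behavior described above: a partial update
// re-converts the entire buffer it touches, however few bytes change.
constexpr std::size_t BytesReconverted(std::size_t buffer_size,
                                       std::size_t /*bytes_updated*/) {
  return buffer_size;
}

// Illustrative sizes: 9,000 bytes of static data, 1,000 bytes of dynamic data.
constexpr std::size_t kStaticBytes = 9000;
constexpr std::size_t kDynamicBytes = 1000;

// One combined buffer: updating 3 bytes re-converts all 10,000 bytes.
constexpr std::size_t kCombinedCost = BytesReconverted(kStaticBytes + kDynamicBytes, 3);

// Separate buffers: the same update only touches the 1,000-byte dynamic buffer.
constexpr std::size_t kSplitCost = BytesReconverted(kDynamicBytes, 3);

static_assert(kCombinedCost == 10000, "combined buffer is fully re-converted");
static_assert(kSplitCost == 1000, "only the dynamic buffer is re-converted");
```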

General OpenGL advice

Although OpenGL rendering is not Native Client-specific, the following advice may help your application's performance.


Find the proper bottleneck.

Performance in a graphics application is not a one-size-fits-all problem. A frame involves many interconnected pieces (CPU work, driver overhead, GPU work), and any of them can be the bottleneck. Visit Shawn Hargreaves' blog and run through the steps there to test whether you are GPU-bound or CPU-bound before moving forward. Note that if you are GPU-bound, there's very little NaCl can do for you, since it uses the same driver and GPU that your desktop applications use.


Interleaved data is faster to render than non-interleaved data.

Three separate buffers of [position,position,position], [normal,normal,normal], and [texcoord,texcoord,texcoord] are slower to render than one buffer of [position,normal,texcoord,position,normal,texcoord,position,normal,texcoord].
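In C++, the interleaved layout is usually expressed as a struct per vertex. The sketch below assumes 3-component positions and normals and 2-component texture coordinates; the stride and offsets are what you would pass as the stride and pointer (byte offset) arguments of glVertexAttribPointer while the interleaved buffer is bound to GL_ARRAY_BUFFER.

```cpp
#include <cstddef>

// One interleaved vertex: position, normal, and texcoord stored side by side.
struct Vertex {
  float position[3];
  float normal[3];
  float texcoord[2];
};

// Stride between consecutive vertices and each attribute's byte offset.
constexpr std::size_t kStride = sizeof(Vertex);                      // 32 with 4-byte floats
constexpr std::size_t kPositionOffset = offsetof(Vertex, position);  // 0
constexpr std::size_t kNormalOffset = offsetof(Vertex, normal);      // 12
constexpr std::size_t kTexcoordOffset = offsetof(Vertex, texcoord);  // 24
```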


Separate dynamic data from static data.

Assume that you have positions, normals, and texcoords. Further assume that you update positions every frame. It would be best to put positions in one buffer and normals + texcoords in a separate buffer. That way, you can call glBufferData or glBufferSubData on a smaller range of data.
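A minimal sketch of that split, assuming the same attribute sizes as a typical position/normal/texcoord mesh: positions go in their own tightly packed buffer, while normals and texcoords stay interleaved in a second buffer that is uploaded once. The helper names are hypothetical.

```cpp
#include <cstddef>

// Dynamic data: positions change every frame, so they get their own buffer.
struct Position {
  float xyz[3];
};

// Static data: normals and texcoords never change, so they are interleaved
// in a second buffer that is uploaded once.
struct StaticAttributes {
  float normal[3];
  float texcoord[2];
};

// Bytes re-uploaded per frame (e.g. glBufferSubData on the position buffer).
std::size_t PerFrameUploadBytes(std::size_t vertex_count) {
  return vertex_count * sizeof(Position);
}

// Bytes re-uploaded per frame if all three attributes were interleaved in one
// buffer: positions are scattered through it, so the whole range is rewritten.
std::size_t CombinedUploadBytes(std::size_t vertex_count) {
  return vertex_count * (sizeof(Position) + sizeof(StaticAttributes));
}
```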


glBindBuffer is expensive.

Consider putting multiple meshes in a single buffer and using offsets (as long as the buffers are static – see above).
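A sketch of packing several static meshes into one vertex array and one index array, under these assumptions: vertices are interleaved floats with a fixed number of floats per vertex, and indices are 16-bit. OpenGL ES 2.0 has no base-vertex draw call, so each mesh's indices are rebased by the number of vertices already packed; the recorded byte offset and count then map to the last two arguments of glDrawElements. Mesh, DrawRange, and PackMeshes are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical mesh: interleaved float vertex data plus 16-bit indices.
struct Mesh {
  std::vector<float> vertices;
  std::vector<std::uint16_t> indices;
};

// Where one mesh's indices live inside the shared index buffer.
struct DrawRange {
  std::size_t index_byte_offset;  // offset argument of glDrawElements
  std::size_t index_count;        // count argument of glDrawElements
};

struct PackedMeshes {
  std::vector<float> vertices;
  std::vector<std::uint16_t> indices;
  std::vector<DrawRange> ranges;
};

// Concatenates meshes and rebases each mesh's indices. With 16-bit indices
// the total vertex count must not exceed 65,536 (not checked here).
PackedMeshes PackMeshes(const std::vector<Mesh>& meshes, std::size_t floats_per_vertex) {
  PackedMeshes packed;
  for (const Mesh& mesh : meshes) {
    const std::size_t base_vertex = packed.vertices.size() / floats_per_vertex;
    packed.ranges.push_back({packed.indices.size() * sizeof(std::uint16_t), mesh.indices.size()});
    for (std::uint16_t index : mesh.indices) {
      packed.indices.push_back(static_cast<std::uint16_t>(index + base_vertex));
    }
    packed.vertices.insert(packed.vertices.end(), mesh.vertices.begin(), mesh.vertices.end());
  }
  return packed;
}
```

Both arrays are then uploaded once, so a single glBindBuffer per buffer serves every mesh in the batch.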


Check your extensions and max GL features.

Not every GPU supports every extension or has the same number of texture units, vertex attributes, etc. Make sure you check for the features you need.

For example, if you are using non-power-of-2 textures with mipmaps, make sure GL_OES_texture_npot exists. If you are using floating-point textures, make sure GL_OES_texture_float exists. If you are using DXT1, DXT3, or DXT5 textures, make sure GL_EXT_texture_compression_dxt1, GL_CHROMIUM_texture_compression_dxt3, and GL_CHROMIUM_texture_compression_dxt5 exist.
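The extension list from glGetString(GL_EXTENSIONS) is a single space-separated string, so the check should match whole names: a plain substring search would report GL_OES_texture_float as present when only GL_OES_texture_float_linear is. A minimal sketch (HasExtension is a hypothetical helper):

```cpp
#include <sstream>
#include <string>

// Returns true if `name` appears as a whole, space-separated token in an
// extension string such as the one returned by glGetString(GL_EXTENSIONS).
bool HasExtension(const std::string& extensions, const std::string& name) {
  std::istringstream tokens(extensions);
  std::string token;
  while (tokens >> token) {
    if (token == name) return true;
  }
  return false;
}
```

Since glGetString returns const GLubyte*, the call site needs a cast, for example HasExtension(reinterpret_cast<const char*>(glGetString(GL_EXTENSIONS)), "GL_OES_texture_npot"). Query the string once at startup and cache the results rather than calling it during rendering.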

If you are using textures in vertex shaders make sure glGetIntegerv(GL_MAX_VERTEX_TEXTURE_IMAGE_UNITS, ...) returns a value greater than 0.

If you are using more than 8 textures in a single shader make sure glGetIntegerv(GL_MAX_TEXTURE_IMAGE_UNITS, ...) returns a value greater than or equal to the number of simultaneous textures you need.
