I was pleasantly surprised by the JS and WebGL performance on the Oculus Quest.
With a simple frustum-culling system which skips unnecessary draw calls, ROAR runs at a smooth 72 FPS.
The game is GPU-bound.
In a typical frame, sys_render takes 8 ms (83% of the frame), while the CPU-side systems like sys_transform, sys_collide, and sys_cull add up to less than 1 ms!
It shouldn't be surprising that so much time is spent rendering. To start, the headset has to render everything twice, once for each eye!
I used the OVR Metrics Tool to monitor the performance on the Oculus Quest during most of the development.
It displays an overlay with performance measurements on the screen; it's super helpful!
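Per-system timings like the ones above can be collected with performance.now() around each system call. A minimal sketch, assuming each system is a plain function of the frame's delta (the harness is illustrative, not ROAR's actual loop):

```ts
// Hypothetical timing harness; system names match ROAR's, the rest is illustrative.
type System = (delta: number) => void;

const timings: Record<string, number> = {};

function run_timed(name: string, sys: System, delta: number) {
    let start = performance.now();
    sys(delta);
    // Keep the last frame's duration; averaging over many frames is more stable.
    timings[name] = performance.now() - start;
}

// In the frame loop:
//   run_timed("sys_transform", sys_transform, delta);
//   run_timed("sys_cull", sys_cull, delta);
//   run_timed("sys_render", sys_render, delta);
```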
Frustum Culling
I implemented a frustum culling system to reduce the number of draw calls made every frame.
The GPU also clips geometry outside the NDC, but that happens after the vertex shader runs, which means it has no impact on the number of draw calls made.
The scene typically consists of 64 buildings, each made of 5 cubes on average.
Without any optimization, that comes out to ca. 300 draw calls (64 × 5 = 320 cubes, one draw call each) and over 10,000 vertices just for the environment.
There are another ~50 draw calls for the hands and other props.
Each building cube has a dormant particle emitter which activates when the building is set on fire.
A control system slows the fire down and finally puts it out after ~20 seconds to limit the number of particles on the screen.
An emitter is additionally limited to at most 200 particles.
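Here's roughly what such an emitter can look like; the field names are illustrative, not ROAR's exact components:

```ts
// Illustrative particle emitter with a hard cap; not ROAR's exact code.
const MAX_PARTICLES = 200;

interface Particle {
    age: number; // seconds since spawn
}

interface Emitter {
    particles: Particle[];
    frequency: number;  // seconds between spawns; the fire control system
                        // raises this over time until the fire dies out
    since_last: number; // seconds since the last spawn
    lifespan: number;   // how long a particle lives, in seconds
}

function update_emitter(emitter: Emitter, delta: number) {
    emitter.since_last += delta;

    // Spawn only while under the per-emitter cap.
    if (
        emitter.since_last > emitter.frequency &&
        emitter.particles.length < MAX_PARTICLES
    ) {
        emitter.particles.push({age: 0});
        emitter.since_last = 0;
    }

    // Age particles and retire the ones past their lifespan.
    for (let particle of emitter.particles) {
        particle.age += delta;
    }
    emitter.particles = emitter.particles.filter(
        (particle) => particle.age < emitter.lifespan
    );
}
```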
Still, if you go crazy and set the entire city on fire, the GPU would need to render ca. 40,000 particles, each drawn as a textured point with additive blending.
Each emitter is its own draw call, too. In the extreme case, I'd almost reach 500 draw calls and up to 50,000 vertices per frame. This was a problem for performance.
When I first tested it, I saw around 50-60 FPS with a reasonable number of fires started (still pretty good!), and as low as 15 FPS in the extreme case when the whole city was burning.
At this point I considered solving this through game design rather than through optimizations. E.g. I could have made the fire breath require some kind of "fuel" which would be in limited quantity. Breathing fire is fun, though, so I decided to try a technical solution.
The first iteration of the culling system turned off the Render and EmitParticles components for entities outside the camera's frustum, which after normalizing into NDC is just the [-1, 1] cube.
This was enough to get the number of draw calls to around 150 per frame.
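A minimal sketch of that first pass, assuming a gl-matrix math API and bitflag component masks (the names are illustrative, not ROAR's exact code):

```ts
import {mat4, vec4} from "gl-matrix";

// Illustrative component flags.
const Has = {Render: 1 << 0, EmitParticles: 1 << 1, Cull: 1 << 2};

function sys_cull(signatures: number[], positions: Float32Array[], pv: mat4) {
    let clip = vec4.create();
    for (let entity = 0; entity < signatures.length; entity++) {
        if (!(signatures[entity] & Has.Cull)) {
            continue;
        }
        let [x, y, z] = positions[entity];
        vec4.transformMat4(clip, vec4.set(clip, x, y, z, 1), pv);
        // The clip-space test -w <= x, y, z <= w is equivalent to testing
        // against the [-1, 1] NDC cube after the perspective divide.
        let inside =
            -clip[3] <= clip[0] && clip[0] <= clip[3] &&
            -clip[3] <= clip[1] && clip[1] <= clip[3] &&
            -clip[3] <= clip[2] && clip[2] <= clip[3];
        if (inside) {
            signatures[entity] |= Has.Render | Has.EmitParticles;
        } else {
            signatures[entity] &= ~(Has.Render | Has.EmitParticles);
        }
    }
}
```

Note that only the entity's position is tested, not its bounds; this shortcut comes back to bite later.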
Oculus docs recommend ~50-100 draw calls and ~50,000-100,000 triangles or vertices per frame.
I suspect that ROAR gets away with more because it only has two materials and changes them only once per frame.
All textured objects, almost all of which are cubes, are drawn first, and then in a second pass all particles are rendered.
This happens to work great for blending: translucent particles are drawn on top of all the textured objects.
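A sketch of that draw order using plain WebGL2 calls; the two callbacks stand in for the real material passes:

```ts
// Two-pass rendering: opaque textured geometry first, then additive particles.
function render_frame(
    gl: WebGL2RenderingContext,
    draw_textured: () => void,
    draw_particles: () => void,
) {
    // Pass 1: opaque cubes and props, depth-tested and depth-written.
    gl.disable(gl.BLEND);
    gl.enable(gl.DEPTH_TEST);
    gl.depthMask(true);
    draw_textured();

    // Pass 2: particles with additive blending. Depth writes are off so
    // particles don't occlude each other, but the depth test still hides
    // particles behind buildings.
    gl.enable(gl.BLEND);
    gl.blendFunc(gl.SRC_ALPHA, gl.ONE);
    gl.depthMask(false);
    draw_particles();

    gl.depthMask(true);
}
```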
Normalizing into the NDC meant that the culling only caught entities behind the player, far in front of them, or beyond the edges of their peripheral vision.
This was good enough because at the time the player couldn't really move too far from the center of the scene.
Once I implemented locomotion, however, it became possible to move away from the center of the scene, turn around, and see all the buildings.
I was back at 300+ draw calls per frame.
The rendering performance was bad again.
The solution was to use the oldest trick in 3D programming: the fog.
I also decoupled the camera's far distance from the fog distance so that the missiles launched from far away are still rendered.
Thanks to the fog, I can turn off rendering of buildings fairly close to the player which would otherwise be in plain sight.
The number of draw calls is now usually well under 100.
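Fog is usually implemented by mixing the fragment color toward the clear color with distance; once an object is past the fully fogged-out distance, culling it is invisible to the player. A sketch of that distance cull, with an illustrative FOG_DISTANCE and component flags:

```ts
import {vec3} from "gl-matrix";

// Illustrative constants; FOG_DISTANCE must match the shader's fog falloff.
const Has = {Render: 1 << 0};
const FOG_DISTANCE = 50;

function cull_by_fog(signatures: number[], positions: Float32Array[], eye: vec3) {
    for (let entity = 0; entity < signatures.length; entity++) {
        if (vec3.distance(eye, positions[entity] as vec3) > FOG_DISTANCE) {
            signatures[entity] &= ~Has.Render;
        }
    }
}
```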
The culling system still isn't perfect: it only considers objects' positions rather than their bounds. You can sometimes see objects at the edges of the screen disappear too early.
Collisions
The collision detection system is at the center of ROAR's gameplay.
Buildings are rigid bodies which bounce off of each other when they collide.
Other entities use colliders as triggers: when a collision is detected, they run some extra logic.
Example: when a missile detects a collision with another entity, both the missile and the hit entity are destroyed.
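A sketch of that trigger logic, assuming the collision system records this frame's hits on the collider and a destroy helper exists (all names hypothetical):

```ts
// Hypothetical shapes; not ROAR's exact components.
interface Collider {
    collisions: number[]; // ids of entities this collider hit this frame
}

function sys_control_missile(
    missiles: number[], // ids of entities with missile behavior
    colliders: (Collider | undefined)[],
    destroy_entity: (entity: number) => void,
) {
    for (let missile of missiles) {
        let collider = colliders[missile];
        if (collider && collider.collisions.length > 0) {
            for (let other of collider.collisions) {
                destroy_entity(other); // the entity that was hit
            }
            destroy_entity(missile); // and the missile itself
        }
    }
}
```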
Colliders can be static or dynamic.
Static colliders are assumed to never move. They're computed once when they're created. They also never collide with other static colliders.
Dynamic colliders can move freely; they're re-computed every frame. They collide with both static colliders and other dynamic colliders.
Collisions between colliders are computed every frame.
Each dynamic collider is checked against all static colliders.
Each pair of dynamic colliders is checked, too.
This approach has the time complexity of O(n²).
Each pair is actually checked only once, halving the number of intersection checks.
Assuming ca. 200 building cubes and colliders, that's still around 20,000 intersection checks (200² / 2) every frame.
Perhaps surprisingly, it's under 1 ms on the Quest. Not bad.
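A sketch of that broad phase, assuming axis-aligned bounding boxes (the helper names are illustrative):

```ts
// Min-max AABBs; two boxes intersect when they overlap on all three axes.
interface AABB {
    min: [number, number, number];
    max: [number, number, number];
}

function intersect(a: AABB, b: AABB): boolean {
    return (
        a.min[0] <= b.max[0] && a.max[0] >= b.min[0] &&
        a.min[1] <= b.max[1] && a.max[1] >= b.min[1] &&
        a.min[2] <= b.max[2] && a.max[2] >= b.min[2]
    );
}

function sys_collide(
    statics: AABB[],
    dynamics: AABB[],
    on_hit: (a: AABB, b: AABB) => void,
) {
    for (let i = 0; i < dynamics.length; i++) {
        // Each dynamic collider vs. every static collider.
        for (let s of statics) {
            if (intersect(dynamics[i], s)) on_hit(dynamics[i], s);
        }
        // Each dynamic pair checked once: j starts at i + 1.
        for (let j = i + 1; j < dynamics.length; j++) {
            if (intersect(dynamics[i], dynamics[j])) on_hit(dynamics[i], dynamics[j]);
        }
    }
}
```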
I implemented an optimization related to how the collisions between buildings are computed.
Buildings start with their individual cubes' colliders disabled.
Instead, there's one parent collider spanning all cubes.
It's static. Since the cubes' dynamic colliders stay disabled, the number of intersection checks drops dramatically.
When another entity intersects with this static shell, the cubes' colliders wake up, i.e. turn on, and the shell itself is destroyed.
sys_collide now takes less than 0.2 ms.
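A sketch of the wake-up, again with hypothetical component shapes and helpers:

```ts
// Hypothetical shapes; not ROAR's exact code.
interface Shell {
    cube_entities: number[]; // the building's cubes, colliders disabled
    collisions: number[];    // ids of entities intersecting the shell this frame
}

function sys_control_shell(
    shells: (Shell | undefined)[],
    enable_collider: (entity: number) => void,
    destroy_entity: (entity: number) => void,
) {
    for (let entity = 0; entity < shells.length; entity++) {
        let shell = shells[entity];
        if (shell && shell.collisions.length > 0) {
            for (let cube of shell.cube_entities) {
                enable_collider(cube); // wake up the cube's own collider
            }
            destroy_entity(entity); // the shell has served its purpose
        }
    }
}
```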
The main driver for this optimization wasn't the performance of sys_collide.
I considered the original 1 ms good enough.
The five-fold improvement is nice, though.
I prefer to have that extra 0.8 ms for rendering.
Shells helped me improve the rendering performance.
Buildings which haven't yet been interacted with only have a single collider, which means that they only register a single collision with the fire breath.
I can thus control how many cubes in the buildings are set on fire.
Only the bottom cube is set on fire in buildings that are asleep.
This helps limit the number of active fires on the screen, and consequently the number of particles which must be rendered.
Shells are also more convenient to work with.
Building cubes must be top-level entities for the rigid body physics to work properly.
Before the introduction of shells, I spawned each cube of each building independently in one big for loop.
Shells are easier to spawn. It's just one blueprint that creates the entire building at once.
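A sketch of what such a blueprint can look like; spawn_cube and spawn_shell are hypothetical helpers standing in for the real blueprint mixins:

```ts
type Entity = number;

// Hypothetical helpers; cubes stay top-level entities so rigid body
// physics keeps working once they wake up.
declare function spawn_cube(x: number, y: number, z: number): Entity;
declare function spawn_shell(x: number, z: number, height: number, cubes: Entity[]): Entity;

function blueprint_building(x: number, z: number, floors: number) {
    let cubes: Entity[] = [];
    for (let floor = 0; floor < floors; floor++) {
        // Each cube sits one unit above the previous one.
        cubes.push(spawn_cube(x, floor + 0.5, z));
    }
    // One static shell collider spanning the whole building.
    spawn_shell(x, z, floors, cubes);
}
```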