WebXR
Overview
- The WebXR API allows web content to be displayed as VR and AR experiences.
- It supersedes the now-deprecated WebVR API.
- The docs on MDN are exceptional, and they helped me a lot.
- The specification is also very clear and well-written.
- js13kGames' WebXR category allows using A-Frame, Babylon.js, or THREE.js to handle WebXR for you.
- These libraries provide a whole 3D programming environment and API.
- They are 380 KB, 880 KB, and 190 KB respectively, when zipped.
- They don't count towards the 13 KB limit, however.
- In ROAR, I implemented the WebXR support myself.
- ROAR doesn't use any third-party libraries.
- The game loop, the rendering pipeline, the physics… everything is implemented from scratch in vanilla TypeScript.
- It's still less than 13 KB zipped!
- Overall, I enjoyed using the WebXR API.
Pose
- One of the key concepts of the WebXR API is the reference space.
- A reference space is the space requested during the initialization of the XR session.
```ts
let session = await navigator.xr.requestSession("immersive-vr", {
    requiredFeatures: ["local-floor"],
});
game.XrSpace = await session.requestReferenceSpace("local-floor");
```
- Depending on the type, the reference space can represent different origins.
- A `local` reference space is anchored at the center of the headset at the time of the initialization.
- A `local-floor` reference space has its origin at the floor level.
- There are more types of reference spaces: `bounded-floor`, `unbounded`, and `viewer`.
- During the lifetime of an XR session, the position and orientation of the headset and the controllers are reported relative to the origin of the reference space.
- It's also possible to create new reference spaces, translated or rotated relative to the parent one, through the `getOffsetReferenceSpace` method.
- I didn't use this method in ROAR, but see Locomotion for a potential use-case.
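- For illustration, a minimal sketch (not from ROAR) of deriving an offset space; the translation and rotation values are arbitrary:

```ts
// A minimal sketch, not from ROAR: derive a new reference space which is
// shifted 2 meters forward and turned 180° around the Y axis.
let offset = new XRRigidTransform(
    { x: 0, y: 0, z: -2 },       // translation, in meters
    { x: 0, y: 1, z: 0, w: 0 }   // quaternion for a 180° turn around Y
);
let offset_space = game.XrSpace.getOffsetReferenceSpace(offset);
```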
- The WebXR API exposes the position and orientation of the headset and the controllers through the `XRPose` interface.
- For the headset, the pose can be retrieved with the `XRFrame.getViewerPose` method.
- For the controllers, the `XRFrame.getPose` method can be used.
- Both of these methods require a reference space to be passed as an argument.
- The returned poses contain an `XRRigidTransform` which provides the position and orientation of the pose relative to the reference space.
- The transform exposes the position and the orientation as `DOMPointReadOnly`.
- It also exposes the full transformation matrix as a `Float32Array`.
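- For completeness, a minimal sketch of querying a controller's pose, assuming an active frame and the reference space from earlier (`game.XrFrame`, `game.XrSpace`):

```ts
// A minimal sketch of querying controller poses; assumes game.XrFrame and
// game.XrSpace are set, as elsewhere in this article.
for (let source of game.XrFrame!.session.inputSources) {
    if (source.gripSpace) {
        let pose = game.XrFrame!.getPose(source.gripSpace, game.XrSpace);
        if (pose) {
            // pose.transform is an XRRigidTransform relative to game.XrSpace.
            console.log(source.handedness, pose.transform.position);
        }
    }
}
```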
- For regular 3D, Goodluck stores the position data as transformation matrices too.
- The `Transform` component stores the local position, rotation and scale, i.e. relative to the parent transform, or relative to the world if the entity doesn't have a parent.
- The `sys_transform` system is responsible for computing the world-relative transformation matrix, called `Transform.World`, for each entity.
- First, it composes the local position, rotation and scale into a local transformation matrix.
- If the transform has a parent, `sys_transform` then left-multiplies its local matrix by the matrix of the parent.
- This requires that parents are updated before children. Goodluck doesn't strictly guarantee this, but the `instantiate` function creates entity hierarchies in the correct order, which is usually enough.
- The world-relative transformation matrices are then handed off to `sys_render` for rendering.
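- For context, the data involved looks roughly like this; the field names follow this article, but the exact definition in Goodluck may differ:

```ts
// A rough sketch of the Transform component, based on the fields mentioned
// in this article; the exact shape in Goodluck may differ.
type Entity = number;

interface Transform {
    World: Float32Array;                         // world-relative 4x4 matrix
    Self: Float32Array;                          // inverse of World
    Translation: [number, number, number];
    Rotation: [number, number, number, number];  // a quaternion
    Scale: [number, number, number];
    Parent?: Entity;
    Dirty: boolean;
}
```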
- To allow transforms to be driven by XR poses rather than the local position and rotation, I created a new component, `ControlPose`. It stores the kind of the input driver, i.e. the head, the left hand, or the right hand.
- The corresponding `sys_control_pose` system then retrieves the relevant pose from the WebXR API, and assigns the `XRRigidTransform`'s matrix to the entity's `Transform.World`.

```ts
if (control.Kind === ControlPoseKind.Head) {
    let headset = game.XrFrame!.getViewerPose(game.XrSpace);
    transform.World = headset.transform.matrix;
    transform.Dirty = true;
    return;
}
```
- When `sys_transform` runs, it skips entities driven by XR poses when it composes the local transformation matrices from position, rotation, and scale. It doesn't need to do it, because the matrix has already been updated this frame by `sys_control_pose`.

```ts
let transform = game.World.Transform[entity];
if (transform.Dirty) {
    transform.Dirty = false;
    set_children_as_dirty(game.World, transform);

    if (game.XrFrame && game.World.Signature[entity] & Has.ControlPose) {
        // Pose transforms have their World matrix set from XRPose by other
        // systems. Their translation, rotation and scale are ignored.
    } else {
        from_rotation_translation_scale(
            transform.World,
            transform.Rotation,
            transform.Translation,
            transform.Scale
        );
    }

    if (transform.Parent !== undefined) {
        let parent = game.World.Transform[transform.Parent].World;
        multiply(transform.World, parent, transform.World);
    }

    invert(transform.Self, transform.World);
}
```
- `sys_transform` still left-multiplies the transformation matrix by the parent's matrix, if applicable, for all entities.
- It makes it possible to programmatically move the parents of XR-driven entities, which is important for locomotion.
- The world-relative transforms of the XR-driven entities take into account the transforms of their parents.
- Integrating WebXR poses into Goodluck's regular `sys_transform` was important for scaling.
- In WebGL (and OpenGL), positions are expressed in a unitless measure, often called units.
- In WebXR, 1 unit is assumed to equal 1 meter in the real world's space.
- My initial prototype was made of unit cubes, i.e. cubes whose volume is 1 cubic unit.
- That's 1 meter in each dimension.
- I wanted the player to feel huge, so I decided to adjust the scale.
- Rather than scale the buildings down, I scaled the camera's parent up.
- I assumed 1 game-world unit ≈ 33 meters.
- I scaled the camera up x3, so that 1 camera-space unit ≈ 100 meters.
- In other words, 1 real-life meter ≈ 100 game world meters.
- Assuming an average player height of 170 centimeters, their in-game size is ~170 meters.
- According to this chart, that's a lot even for a monster!
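- The unit math, spelled out (a back-of-the-envelope sketch; the constant names are made up for this example):

```ts
// Back-of-the-envelope sketch of the scale math; the names are made up here.
const GAME_UNIT_IN_METERS = 33; // 1 game-world unit ≈ 33 m
const RIG_SCALE = 3;            // the camera's parent is scaled ×3

// 1 meter of the player's real-world space corresponds to:
const REAL_METER_IN_GAME_METERS = GAME_UNIT_IN_METERS * RIG_SCALE; // ≈ 100 m

// A 1.7 m tall player is therefore about 170 in-game meters tall.
const PLAYER_HEIGHT_IN_GAME = 1.7 * REAL_METER_IN_GAME_METERS;     // ≈ 170 m
```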
Rendering
- While in the XR session, the scene is rendered into a special framebuffer provided by the session.
- The rendering is handled by an XR Compositor, maintained by the user agent.
- The compositor is independent from other rendering contexts in the document.
- The compositor manages the frame timing, too.
- While in the XR session, the app must use `XRSession.requestAnimationFrame` rather than the regular `Window.requestAnimationFrame`.

```ts
if (game.XrSession) {
    raf = game.XrSession.requestAnimationFrame(tick);
} else {
    raf = requestAnimationFrame(tick);
}
```

- When invoked by `XRSession.requestAnimationFrame`, the `tick` callback receives a second parameter of type `XRFrame`.
- The `XRFrame` object contains the data about the current positions of the headset and the controllers. See `XRFrame.getViewerPose` and `XRFrame.getPose` in the Pose section above.
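- A sketch of what that callback can look like; the surrounding game loop is simplified here:

```ts
// A simplified sketch of the tick callback; the real game loop does more.
function tick(now: number, frame?: XRFrame) {
    if (frame) {
        // Store the frame so that systems can query poses during this update.
        game.XrFrame = frame;
    }
    // ...run the game's systems, then schedule the next frame...
    if (game.XrSession) {
        raf = game.XrSession.requestAnimationFrame(tick);
    } else {
        raf = requestAnimationFrame(tick);
    }
}
```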
- The framebuffer needs to be bound before any draw calls are made.
```ts
let layer = game.XrFrame!.session.renderState.baseLayer!;
game.Gl.bindFramebuffer(GL_FRAMEBUFFER, layer.framebuffer);
```
- The scene is rendered twice, once for each eye.
- The WebXR API provides the information about how to render in the `XRViewerPose.views` array of `XRView` objects.
- According to the spec, the number of views on the `XRViewerPose` may change dynamically during the lifetime of the XR session, depending on the device and use-case.
- So actually, the scene might be rendered fewer or more times than exactly two. I guess that two views are the most common scenario, though.
- A view stores the dimensions of the slice of the canvas to render to.
- In OpenGL terms, these are the dimensions of the `viewport`.

```ts
for (let eye of camera.Eyes) {
    let viewport = layer.getViewport(eye.Viewpoint);
    game.Gl.viewport(viewport.x, viewport.y, viewport.width, viewport.height);
    // Render the scene from the eye's PoV.
}
```
- A view also stores the information about the position and orientation of the eye, together with its projection matrix.
- The position and the rotation are reported relative to the reference space.
- The projection matrix describes the frustum, i.e. the field of view and the near and the far planes.
- Some parameters of the frustum can be configured through the `XRRenderState` interface.
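- For example, the near and far clipping planes can be adjusted when setting up the session; a sketch with illustrative values, not the ones used in ROAR:

```ts
// A sketch of configuring the frustum via XRRenderState; the values are
// illustrative, not the ones used in ROAR.
session.updateRenderState({
    baseLayer: new XRWebGLLayer(session, game.Gl),
    depthNear: 0.1,
    depthFar: 1000,
});
```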
- With some matrix math, it's possible to compute the world-space transformation matrices for each eye.
- The inverse of this matrix is called the `View` matrix and is used by `sys_cull`.
- The `View` matrix is then left-multiplied by the eye's projection matrix to obtain the so-called `PV` (projection * view) matrix.
- The `PV` matrix is passed as a uniform to all shaders, to allow the scene to be rendered the way it should be visible from the corresponding eye.
```ts
function update_vr(game: Game, entity: Entity, camera: CameraXr) {
    game.Camera = camera;
    let transform = game.World.Transform[entity];
    let pose = game.XrFrame!.getViewerPose(game.XrSpace);

    camera.Eyes = [];
    for (let viewpoint of pose.views) {
        let eye: XrEye = {
            Viewpoint: viewpoint,
            View: create(),
            Pv: create(),
            Position: [0, 0, 0],
            FogDistance: camera.FogDistance,
        };

        // Compute the eye's world matrix.
        multiply(eye.View, transform.World, viewpoint.transform.matrix);
        get_translation(eye.Position, eye.View);

        // Compute the view matrix.
        invert(eye.View, eye.View);

        // Compute the PV matrix.
        multiply(eye.Pv, viewpoint.projectionMatrix, eye.View);

        camera.Eyes.push(eye);
    }
}
```
Locomotion
- I was hesitant to add any locomotion in ROAR.
- I know it can cause motion sickness, and I'm sensitive to it myself.
- The Oculus docs recommend a number of locomotion techniques.
- Most of them sound like they'd require a significant amount of code to get right.
- I was hoping I would get away with no locomotion by design.
- However, I struggled to find a gameplay mechanic that would work with the player only ever standing in the middle of the scene.
- I experimented with artificially moving the player through the city on a straight path, but I felt that it took the focus away from destruction, and was less fun.
- I implemented it as a test in the simplest manner possible.
- Surprisingly, it felt pretty good! I quickly got used to it to a point where it didn't cause any discomfort at all.
- I tested it with a few friends, and they had similar experiences. After a short time playing they would get used to it completely.
- I didn't implement any rotation controls.
- I had trouble with the pivot of the rotation being the origin of the reference space rather than the position of the player in that space.
- It caused the player to rotate the way a clock's hand does: the tip moves faster and farther than the base.
- I think the solution involves the `getOffsetReferenceSpace` method, which I overlooked in my initial API research, but I haven't tried it yet.
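- Here's an untested sketch of that idea: rotating the space around the player's current position instead of around the space's origin. The helper is hypothetical and the math hasn't been verified in ROAR:

```ts
// Untested sketch: rotate the reference space around the player's current
// position rather than around the space's origin. The helper is hypothetical.
function rotate_space_around_player(
    space: XRReferenceSpace,
    pose: XRViewerPose,
    angle: number
) {
    let p = pose.transform.position;
    // To rotate around the player: move the origin to the player, rotate
    // around Y, then move back. Each call returns a new offset space.
    let to_player = new XRRigidTransform({ x: p.x, y: 0, z: p.z });
    let rotate_y = new XRRigidTransform(
        { x: 0, y: 0, z: 0 },
        { x: 0, y: Math.sin(angle / 2), z: 0, w: Math.cos(angle / 2) }
    );
    let back = new XRRigidTransform({ x: -p.x, y: 0, z: -p.z });
    return space
        .getOffsetReferenceSpace(to_player)
        .getOffsetReferenceSpace(rotate_y)
        .getOffsetReferenceSpace(back);
}
```

- The returned space would then replace `game.XrSpace` for subsequent pose queries.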
- I also implemented basic head bobbing during movement.
- The vertical position of the player's head is adjusted slightly depending on where they stand.
- The amount of the adjustment is computed by taking a sine of the player's position on the horizontal plane.
- When the player is moving, the camera moves smoothly on a curved plane.
- This isn't how bobbing works in real life.
- The way humans walk in real life, your head always dips a little bit when you take the first step.
- In my implementation it may actually go up if you happen to be in an XZ position where the sine increases.
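- Roughly, the adjustment looks like this (a sketch; the constants and the rig transform are made up for this example):

```ts
// A sketch of the head bob; the constants and rig_transform are made up here.
const BOB_FREQUENCY = 2.0;
const BOB_AMPLITUDE = 0.05;

let x = rig_transform.Translation[0];
let z = rig_transform.Translation[2];
// The offset depends on where the player stands, not on elapsed time.
rig_transform.Translation[1] = Math.sin((x + z) * BOB_FREQUENCY) * BOB_AMPLITUDE;
rig_transform.Dirty = true;
```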
- I'm not sure if this was a needed addition at all.
- I don't think it helps with the motion sickness.
- It might even intensify it for some.
- I should have done more research on this.
- Fortunately it's only a few bytes.
Grabbing
- Grabbing objects is implemented using dynamic colliders.
- Using collision layers, I was able to define which kinds of objects can be grabbed by the player.
- When a collision is reported, the code checks if the grip button under the middle finger is squeezed.
- If the test passes, the target object is anchored as a child of the hand entity.
- There's a bit of math involved in order to position and rotate the object relative to the hand in a way that doesn't change its position and rotation in the world space.
- This creates a seamless experience of grabbing objects.
- Otherwise, the grabbed object would instantly snap into the same rotation as the hand.
- I always check the left controller first, then the right controller, for the squeeze.
- A non-critical side-effect is that it's possible to use the right hand to grab an object currently being held in the left hand, but not the other way around.
- When the grip button is released, I de-parent the grabbed object and move it back into the world space.
- Again, some matrix math makes sure the position and the rotation are what the player would expect them to be.
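- The gist of that math is to recompute the object's local transform from its world transform whenever its parent changes. A hedged sketch, using the same gl-matrix-style helpers as elsewhere in this article (`get_rotation` and `get_scaling` are assumed here):

```ts
// A hedged sketch, not the actual ROAR code: re-parent an entity while
// preserving its world-space position, rotation and scale.
function reparent_preserving_world(game: Game, entity: Entity, new_parent: Entity) {
    let transform = game.World.Transform[entity];
    let parent = game.World.Transform[new_parent];

    // local = inverse(parent.World) * entity.World
    let local = create();
    invert(local, parent.World);
    multiply(local, local, transform.World);

    // Decompose the local matrix back into translation, rotation and scale.
    get_translation(transform.Translation, local);
    get_rotation(transform.Rotation, local);
    get_scaling(transform.Scale, local);

    transform.Parent = new_parent;
    transform.Dirty = true;
}
```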
- I also animate the hand slightly when the grip button is being pressed.
- The value of the input is reported as an analogue axis with a range of `[0, 1]`.
- This allows me to control the effect gradually, according to how far the grip button is pressed down.
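- A sketch of reading that value; with the `xr-standard` gamepad mapping, the squeeze (grip) button is typically at index 1:

```ts
// A sketch of reading the squeeze value; with the "xr-standard" mapping,
// the grip (squeeze) button is typically buttons[1].
for (let source of game.XrFrame!.session.inputSources) {
    if (source.gamepad) {
        let squeeze = source.gamepad.buttons[1]?.value ?? 0; // in [0, 1]
        // Use `squeeze` to drive the hand's rotation and scale.
    }
}
```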
- I didn't want to spend any size budget on rigging the hand models to animate them when they close.
- Instead, I slightly rotate them and scale down on one axis.
- I guess it kind of, sort of looks OK(-ish).
- No, really, it's good enough!
Debugging
- Debugging WebXR apps is a bit more involved than debugging regular web content.
- For some tasks, the WebXR emulator by the Mozilla Mixed Reality team is very helpful.
- It adds a tab to the Devtools where you can position a virtual headset and controllers.
- Unfortunately, I wasn't able to make any buttons work in the emulator.
- For other tasks and for testing the gameplay, it's best to put the headset on and play in real VR.
- It's a bit hard to quickly iterate when you have to switch between the keyboard and the headset all the time.
- On the Oculus Quest, it's helpful to turn the developer mode on.
- It allows Chrome and Edge to debug the pages open in the Oculus Browser remotely from your computer.
- In Chrome, go to `chrome://inspect/#devices`.
- In Edge, go to `edge://inspect/#devices`.
- Find the tab you're interested in and click Inspect. A regular Devtools window will open.
- If you're using Firefox Reality, you can debug remotely using Firefox on your desktop.
- In Firefox, go to `about:debugging`.
- In Chrome (and Edge) it's also possible to turn on port forwarding between the headset and the computer.
- This is really convenient when developing on your computer using a local dev server.
- WebXR requires `localhost` or HTTPS.
- This means that it's not possible to launch an XR session from a LAN IP, e.g. 192.168....
- The solution is to use port forwarding.
- A port listening on the headset will forward traffic to a port on the computer.
- E.g. `localhost:1234` on my Quest forwards to `localhost:1234` on my computer.
- You can then access your dev server by going to `localhost:1234` in your headset, as if the server was running on the device!
- To enable port forwarding, set it up in the Port forwarding… dialog on `chrome://inspect` or `edge://inspect`.