WebXR
Overview
- The WebXR API allows web content to be displayed as VR and AR experiences.
- It supersedes the now-deprecated WebVR API.
- The docs on MDN are exceptional, and they helped me a lot.
- The specification is also very clear and well-written.
- js13kGames' WebXR category allows using A-Frame, Babylon.js, or THREE.js to handle WebXR for you.
- These libraries provide a whole 3D programming environment and API.
- They are 380 KB, 880 KB, and 190 KB respectively, when zipped.
- They don't count towards the 13 KB limit, however.
- In ROAR, I implemented the WebXR support myself.
- ROAR doesn't use any third-party libraries.
- The game loop, the rendering pipeline, the physics… everything is implemented from scratch in vanilla TypeScript.
- It's still less than 13 KB zipped!
- Overall, I enjoyed using the WebXR API.
Pose
- One of the key concepts of the WebXR API is the reference space.
- A reference space is the space requested during the initialization of the XR session.
```ts
let session = await navigator.xr.requestSession("immersive-vr", {
    requiredFeatures: ["local-floor"],
});
game.XrSpace = await session.requestReferenceSpace("local-floor");
```
- Depending on the type, the reference space can represent different origins.
- A `local` reference space is anchored at the center of the headset at the time of the initialization.
- A `local-floor` reference space has its origin at the floor level.
- There are more types of reference spaces: `bounded-floor`, `unbounded`, and `viewer`.
- During the lifetime of an XR session, the position and orientation of the headset and the controllers are reported relative to the origin of the reference space.
- It's also possible to create new reference spaces, translated or rotated relative to the parent one, through the `getOffsetReferenceSpace` method.
- I didn't use this method in ROAR, but see Locomotion for a potential use-case.
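- For illustration, a minimal sketch (not from ROAR) of deriving an offset space; the translation and rotation values are arbitrary:

```ts
// A minimal sketch, not from ROAR: derive a new reference space which is
// shifted 2 meters forward and turned 180° around the Y axis.
let offset = new XRRigidTransform(
    { x: 0, y: 0, z: -2 },       // translation, in meters
    { x: 0, y: 1, z: 0, w: 0 }   // quaternion for a 180° turn around Y
);
let offset_space = game.XrSpace.getOffsetReferenceSpace(offset);
```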
- The WebXR API exposes the position and orientation of the headset and the controllers through the `XRPose` interface.
- For the headset, the pose can be retrieved with the `XRFrame.getViewerPose` method.
- For the controllers, the `XRFrame.getPose` method can be used.
- Both of these methods require a reference space to be passed as an argument.
- The returned poses contain an `XRRigidTransform` which provides the position and orientation of the pose relative to the reference space.
- The transform exposes the position and the orientation as `DOMPointReadOnly`.
- It also exposes the full transformation matrix as a `Float32Array`.
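- For completeness, a minimal sketch of querying a controller's pose, assuming an active frame and the reference space from earlier (`game.XrFrame`, `game.XrSpace`):

```ts
// A minimal sketch of querying controller poses; assumes game.XrFrame and
// game.XrSpace are set, as elsewhere in this article.
for (let source of game.XrFrame!.session.inputSources) {
    if (source.gripSpace) {
        let pose = game.XrFrame!.getPose(source.gripSpace, game.XrSpace);
        if (pose) {
            // pose.transform is an XRRigidTransform relative to game.XrSpace.
            console.log(source.handedness, pose.transform.position);
        }
    }
}
```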
- For regular 3D, Goodluck stores the position data as transformation matrices too.
- The `Transform` component stores the local position, rotation and scale, i.e. relative to the parent transform, or relative to the world if the entity doesn't have a parent.
- The `sys_transform` system is responsible for computing the world-relative transformation matrix, called `Transform.World`, for each entity.
- First, it composes the local position, rotation and scale into a local transformation matrix.
- If the transform has a parent, `sys_transform` then left-multiplies its local matrix by the matrix of the parent.
- This requires that parents are updated before children. Goodluck doesn't strictly guarantee this, but the `instantiate` function creates entity hierarchies in the correct order, which is usually enough.
- The world-relative transformation matrices are then handed off to `sys_render` for rendering.
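- For context, the data involved looks roughly like this; the field names follow this article, but the exact definition in Goodluck may differ:

```ts
// A rough sketch of the Transform component, based on the fields mentioned
// in this article; the exact shape in Goodluck may differ.
type Entity = number;

interface Transform {
    World: Float32Array;                         // world-relative 4x4 matrix
    Self: Float32Array;                          // inverse of World
    Translation: [number, number, number];
    Rotation: [number, number, number, number];  // a quaternion
    Scale: [number, number, number];
    Parent?: Entity;
    Dirty: boolean;
}
```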
- To allow transforms to be driven by XR poses rather than the local position and rotation, I created a new component, `ControlPose`. It stores the kind of the input driver, i.e. the head, the left hand, or the right hand.
- The corresponding `sys_control_pose` system then retrieves the relevant pose from the WebXR API, and assigns the `XRRigidTransform`'s matrix to the entity's `Transform.World`.

```ts
if (control.Kind === ControlPoseKind.Head) {
    let headset = game.XrFrame!.getViewerPose(game.XrSpace);
    transform.World = headset.transform.matrix;
    transform.Dirty = true;
    return;
}
```
- When `sys_transform` runs, it skips entities driven by XR poses when it composes the local transformation matrices from position, rotation, and scale. It doesn't need to do it, because the matrix has already been updated this frame by `sys_control_pose`.

```ts
let transform = game.World.Transform[entity];
if (transform.Dirty) {
    transform.Dirty = false;
    set_children_as_dirty(game.World, transform);

    if (game.XrFrame && game.World.Signature[entity] & Has.ControlPose) {
        // Pose transforms have their World matrix set from XRPose by other
        // systems. Their translation, rotation and scale are ignored.
    } else {
        from_rotation_translation_scale(
            transform.World,
            transform.Rotation,
            transform.Translation,
            transform.Scale
        );
    }

    if (transform.Parent !== undefined) {
        let parent = game.World.Transform[transform.Parent].World;
        multiply(transform.World, parent, transform.World);
    }

    invert(transform.Self, transform.World);
}
```
- `sys_transform` still left-multiplies the transformation matrix by the parent's matrix, if applicable, for all entities.
- It makes it possible to programmatically move the parents of XR-driven entities, which is important for locomotion.
- The world-relative transforms of the XR-driven entities take into account the transforms of their parents.
- Integrating WebXR poses into Goodluck's regular `sys_transform` was important for scaling.
- In WebGL (and OpenGL), positions are expressed in a unitless measure, often called units.
- In WebXR, 1 unit is assumed to equal 1 meter in the real world's space.
- My initial prototype was made of unit cubes, i.e. cubes whose volume is 1 cubic unit.
- That's 1 meter in each dimension.
- I wanted the player to feel huge, so I decided to adjust the scale.
- Rather than scale the buildings down, I scaled the camera's parent up.
- I assumed 1 game-world unit ≈ 33 meters.
- I scaled the camera up x3, so that 1 camera-space unit ≈ 100 meters.
- In other words, 1 real-life meter ≈ 100 game world meters.
- Assuming an average player height of 170 centimeters, their in-game size is ~170 meters.
- According to this chart, that's a lot even for a monster!
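- The unit math, spelled out (a back-of-the-envelope sketch; the constant names are made up for this example):

```ts
// Back-of-the-envelope sketch of the scale math; the names are made up here.
const GAME_UNIT_IN_METERS = 33; // 1 game-world unit ≈ 33 m
const RIG_SCALE = 3;            // the camera's parent is scaled ×3

// 1 meter of the player's real-world space corresponds to:
const REAL_METER_IN_GAME_METERS = GAME_UNIT_IN_METERS * RIG_SCALE; // ≈ 100 m

// A 1.7 m tall player is therefore about 170 in-game meters tall.
const PLAYER_HEIGHT_IN_GAME = 1.7 * REAL_METER_IN_GAME_METERS;     // ≈ 170 m
```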
Rendering
- While in the XR session, the scene is rendered into a special framebuffer provided by the session.
- The rendering is handled by an XR Compositor, maintained by the user agent.
- The compositor is independent from other rendering contexts in the document.
- The compositor manages the frame timing, too.
- While in the XR session, the app must use `XRSession.requestAnimationFrame` rather than the regular `Window.requestAnimationFrame`.

```ts
if (game.XrSession) {
    raf = game.XrSession.requestAnimationFrame(tick);
} else {
    raf = requestAnimationFrame(tick);
}
```

- When invoked by `XRSession.requestAnimationFrame`, the `tick` callback receives a second parameter of type `XRFrame`.
- The `XRFrame` object contains the data about the current positions of the headset and the controllers. See `XRFrame.getViewerPose` and `XRFrame.getPose` in the Pose section above.
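- A sketch of what that callback can look like; the surrounding game loop is simplified here:

```ts
// A simplified sketch of the tick callback; the real game loop does more.
function tick(now: number, frame?: XRFrame) {
    if (frame) {
        // Store the frame so that systems can query poses during this update.
        game.XrFrame = frame;
    }
    // ...run the game's systems, then schedule the next frame...
    if (game.XrSession) {
        raf = game.XrSession.requestAnimationFrame(tick);
    } else {
        raf = requestAnimationFrame(tick);
    }
}
```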
- The framebuffer needs to be bound before any draw calls are made.
```ts
let layer = game.XrFrame!.session.renderState.baseLayer!;
game.Gl.bindFramebuffer(GL_FRAMEBUFFER, layer.framebuffer);
```
- The scene is rendered twice, once for each eye.
- The WebXR API provides the information about how to render in the `XRViewerPose.views` array of `XRView` objects.
- According to the spec, the number of views on the `XRViewerPose` may change dynamically during the lifetime of the XR session, depending on the device and use-case.
- So actually, the scene might be rendered fewer or more times than exactly two. I guess that two views are the most common scenario, though.
- A view stores the dimensions of the slice of the canvas to render to.
- In OpenGL terms, these are the dimensions of the `viewport`.

```ts
for (let eye of camera.Eyes) {
    let viewport = layer.getViewport(eye.Viewpoint);
    game.Gl.viewport(viewport.x, viewport.y, viewport.width, viewport.height);
    // Render the scene from the eye's PoV.
}
```
- A view also stores the information about the position and orientation of the eye, together with its projection matrix.
- The position and the rotation are reported relative to the reference space.
- The projection matrix describes the frustum, i.e. the field of view and the near and the far planes.
- Some parameters of the frustum can be configured through the `XRRenderState` interface.
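- For example, the near and far clipping planes can be adjusted when setting up the session; a sketch with illustrative values, not the ones used in ROAR:

```ts
// A sketch of configuring the frustum via XRRenderState; the values are
// illustrative, not the ones used in ROAR.
session.updateRenderState({
    baseLayer: new XRWebGLLayer(session, game.Gl),
    depthNear: 0.1,
    depthFar: 1000,
});
```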
- With some matrix math, it's possible to compute the world-space transformation matrices for each eye.
- The inverse of this matrix is called the `View` matrix and is used by `sys_cull`.
- The `View` matrix is then left-multiplied by the eye's projection matrix to obtain the so-called `PV` (projection * view) matrix.
- The `PV` matrix is passed as a uniform to all shaders, to allow the scene to be rendered the way it should be visible from the corresponding eye.
```ts
function update_vr(game: Game, entity: Entity, camera: CameraXr) {
    game.Camera = camera;
    let transform = game.World.Transform[entity];
    let pose = game.XrFrame!.getViewerPose(game.XrSpace);

    camera.Eyes = [];
    for (let viewpoint of pose.views) {
        let eye: XrEye = {
            Viewpoint: viewpoint,
            View: create(),
            Pv: create(),
            Position: [0, 0, 0],
            FogDistance: camera.FogDistance,
        };

        // Compute the eye's world matrix.
        multiply(eye.View, transform.World, viewpoint.transform.matrix);
        get_translation(eye.Position, eye.View);

        // Compute the view matrix.
        invert(eye.View, eye.View);

        // Compute the PV matrix.
        multiply(eye.Pv, viewpoint.projectionMatrix, eye.View);

        camera.Eyes.push(eye);
    }
}
```
Locomotion
- I was hesitant to add any locomotion in ROAR.
- I know it can cause motion sickness, and I'm sensitive to it myself.
- The Oculus docs recommend a number of locomotion techniques.
- Most of them sound like they'd require a significant amount of code to get right.
- I was hoping I would get away with no locomotion by design.
- However, I struggled to find a gameplay mechanic that would work with the player only ever standing in the middle of the scene.
- I experimented with artificially moving the player through the city on a straight path, but I felt that it took the focus away from destruction, and was less fun.
- I implemented it as a test in the simplest manner possible.
- Surprisingly, it felt pretty good! I quickly got used to it to a point where it didn't cause any discomfort at all.
- I tested it with a few friends, and they had similar experiences. After a short time playing they would get used to it completely.
- I didn't implement any rotation controls.
- I had trouble with the pivot of the rotation being the origin of the reference space rather than the position of the player in that space.
- It caused the player to rotate the way a clock's hand does: the tip moves faster and farther than the base.
- I think the solution involves the `getOffsetReferenceSpace` method, which I overlooked in my initial API research, but I haven't tried it yet.
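- Here's an untested sketch of that idea: rotating the space around the player's current position instead of around the space's origin. The helper is hypothetical and the math hasn't been verified in ROAR:

```ts
// Untested sketch: rotate the reference space around the player's current
// position rather than around the space's origin. The helper is hypothetical.
function rotate_space_around_player(
    space: XRReferenceSpace,
    pose: XRViewerPose,
    angle: number
) {
    let p = pose.transform.position;
    // To rotate around the player: move the origin to the player, rotate
    // around Y, then move back. Each call returns a new offset space.
    let to_player = new XRRigidTransform({ x: p.x, y: 0, z: p.z });
    let rotate_y = new XRRigidTransform(
        { x: 0, y: 0, z: 0 },
        { x: 0, y: Math.sin(angle / 2), z: 0, w: Math.cos(angle / 2) }
    );
    let back = new XRRigidTransform({ x: -p.x, y: 0, z: -p.z });
    return space
        .getOffsetReferenceSpace(to_player)
        .getOffsetReferenceSpace(rotate_y)
        .getOffsetReferenceSpace(back);
}
```

- The returned space would then replace `game.XrSpace` for subsequent pose queries.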
- I also implemented basic head bobbing during movement.
- The vertical position of the player's head is adjusted slightly depending on where they stand.
- The amount of the adjustment is computed by taking a sine of the player's position on the horizontal plane.
- When the player is moving, the camera moves smoothly on a curved plane.
- This isn't how bobbing works in real life.
- The way humans walk in real life, your head always dips a little bit when you take the first step.
- In my implementation it may actually go up if you happen to be in an XZ position where the sine increases.
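- Roughly, the adjustment looks like this (a sketch; the constants and the rig transform are made up for this example):

```ts
// A sketch of the head bob; the constants and rig_transform are made up here.
const BOB_FREQUENCY = 2.0;
const BOB_AMPLITUDE = 0.05;

let x = rig_transform.Translation[0];
let z = rig_transform.Translation[2];
// The offset depends on where the player stands, not on elapsed time.
rig_transform.Translation[1] = Math.sin((x + z) * BOB_FREQUENCY) * BOB_AMPLITUDE;
rig_transform.Dirty = true;
```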
- I'm not sure if this was a needed addition at all.
- I don't think it helps with the motion sickness.
- It might even intensify it for some.
- I should have done more research on this.
- Fortunately it's only a few bytes.
Grabbing
- Grabbing objects is implemented using dynamic colliders.
- Using collision layers, I was able to define which kinds of objects can be grabbed by the player.
- When a collision is reported, the code checks if the grip button under the middle finger is squeezed.
- If the test passes, the target object is anchored as a child of the hand entity.
- There's a bit of math involved in order to position and rotate the object relative to the hand in a way that doesn't change its position and rotation in the world space.
- This creates a seamless experience of grabbing objects.
- Otherwise, the grabbed object would instantly snap into the same rotation as the hand.
- I always check the left controller first, then the right controller, for the squeeze.
- A non-critical side-effect is that it's possible to use the right hand to grab an object currently being held in the left hand, but not the other way around.
- When the grip button is released, I de-parent the grabbed object and move it back into the world space.
- Again, some matrix math makes sure the position and the rotation are what the player would expect them to be.
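- The gist of that math is to recompute the object's local transform from its world transform whenever its parent changes. A hedged sketch, using the same gl-matrix-style helpers as elsewhere in this article (`get_rotation` and `get_scaling` are assumed here):

```ts
// A hedged sketch, not the actual ROAR code: re-parent an entity while
// preserving its world-space position, rotation and scale.
function reparent_preserving_world(game: Game, entity: Entity, new_parent: Entity) {
    let transform = game.World.Transform[entity];
    let parent = game.World.Transform[new_parent];

    // local = inverse(parent.World) * entity.World
    let local = create();
    invert(local, parent.World);
    multiply(local, local, transform.World);

    // Decompose the local matrix back into translation, rotation and scale.
    get_translation(transform.Translation, local);
    get_rotation(transform.Rotation, local);
    get_scaling(transform.Scale, local);

    transform.Parent = new_parent;
    transform.Dirty = true;
}
```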
- I also animate the hand slightly when the grip button is being pressed.
- The value of the input is reported as an analogue axis with a range of `[0, 1]`.
- This allows me to control the effect gradually, according to how far the grip button is pressed down.
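- A sketch of reading that value; with the `xr-standard` gamepad mapping, the squeeze (grip) button is typically at index 1:

```ts
// A sketch of reading the squeeze value; with the "xr-standard" mapping,
// the grip (squeeze) button is typically buttons[1].
for (let source of game.XrFrame!.session.inputSources) {
    if (source.gamepad) {
        let squeeze = source.gamepad.buttons[1]?.value ?? 0; // in [0, 1]
        // Use `squeeze` to drive the hand's rotation and scale.
    }
}
```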
- I didn't want to spend any size budget on rigging the hand models to animate them when they close.
- Instead, I slightly rotate them and scale down on one axis.
- I guess it kind of, sort of looks OK(-ish).
- No, really, it's good enough!
Debugging
- Debugging WebXR apps is a bit more involved than debugging regular web content.
- For some tasks, the WebXR emulator by the Mozilla Mixed Reality team is very helpful.
- It adds a tab to the Devtools where you can position a virtual headset and controllers.
- Unfortunately, I wasn't able to make any buttons work in the emulator.
- For other tasks and for testing the gameplay, it's best to put the headset on and play in real VR.
- It's a bit hard to quickly iterate when you have to switch between the keyboard and the headset all the time.
- On the Oculus Quest, it's helpful to turn the developer mode on.
- It allows Chrome and Edge to debug the pages open in the Oculus Browser remotely from your computer.
- In Chrome, go to `chrome://inspect/#devices`.
- In Edge, go to `edge://inspect/#devices`.
- Find the tab you're interested in and click Inspect. A regular Devtools window will open.
- If you're using Firefox Reality, you can debug remotely using Firefox on your desktop.
- In Firefox, go to `about:debugging`.
- In Chrome (and Edge) it's also possible to turn on port forwarding between the headset and the computer.
- This is really convenient when developing on your computer using a local dev server.
- WebXR requires `localhost` or HTTPS.
- This means that it's not possible to launch an XR session from a LAN IP, e.g. 192.168....
- The solution is to use port forwarding.
- A port listening on the headset will forward traffic to a port on the computer.
- E.g. `localhost:1234` on my Quest forwards to `localhost:1234` on my computer.
- You can then access your dev server by going to `localhost:1234` in your headset, as if the server was running on the device!
- To enable port forwarding, set it up in the Port forwarding… dialog on `chrome://inspect` or `edge://inspect`.