Size Optimizations
Approach
- We took a new approach for us this year: start with a full-featured game exceeding the limit, then optimize the code for size, then cut scope.
- A few days before the deadline the full-featured build of the game clocked in at nearly 15 KB.
- We had to shave off over 1.5 KB.
- Focus on simplifying and removing code, at the cost of not handling edge cases.
- We didn't use any external libraries other than
gl-matrix
. We were able to make size optimizations everywhere throughout our codebase.- Rather than
npm install gl-matrix
, we copied parts of it to our repository and rewrote to TypeScript. This allowed us to make small adjustments which benefited the final code size.
- Rather than
- Unfortunately this led to crunch time in the last two days before the deadline.
- It also made it harder to monitor how the size grew. For most of the 4 weeks of development our build size was over the limit.
- We enjoyed the focus on features and gameplay mechanics.
- In the end, we only removed a small number of features.
- We monitored the build size changes after every commit.
- As part of our build process, we ran
gzip
on our HTML, minified JS, and themodels.tfu
file, and displayed the size right in the terminal:$ make Compiling project... Done Bundling files into one... Done Running replacements... Done Stripping newlines... Done Minifying... - parse: 0.105s - rename: 0.000s - compress: 1.058s - scope: 0.026s - mangle: 0.071s - properties: 0.082s - output: 0.035s - total: 1.378s Size gzipped (including HTML): 13881 bytes
- In retrospect, we should have used
7z
rather thangzip
in day-to-day measurements, same as when we created release builds.- The differences between compression formats and methods can be significant.
- A few bytes saved with
gzip
might have a completely different effect on the final release build compressed with7z
. - For future reference, here's the CLI command calling
7z
to create a best-quality Zip archive using the DEFLATE compression method:7z a opt.zip -mx=9 -mfb=256 -mpass=15 \ public/opt/index.html \ public/opt/game.terser.js \ public/opt/models.tfu
- On Ubuntu, install
7z
withapt install p7zip-full
.
- As part of our build process, we ran
- We created a release branch with a leaner version of the codebase.
- We removed the
sys_debug
system which we sometimes turned on to debug collisions or position entities as desired. Together with it, we removed theWireframeMaterial
and theRenderBasic
component, both used to draw wireframes around the bounding boxes of colliders and other invisible entities like particle emitter - We removed
sys_performance
andsys_framerate
, together with the HTML and CSS inindex.html
to display it. - We removed the
GL_COMPILE_STATUS
check, which we used to debug shaders.- We forgot to remove a similar
GL_LINK_STATUS
. It would have saved us 52 bytes!
- We forgot to remove a similar
- We replaced all
throw
statements with earlyreturns
. - We removed the
window.game
global which we sometimes used for debugging. - We would merge
master
intorelease
every couple of days, and then daily during the last week of the development.
- We removed the
- We replaced WebGL constants with custom-defined
const
values which are then inlined by the minifier.- WebGL defines many constants available on the
WebGL2RenderingContext
interface which are then used as arguments to WebGL APIs. For instance, the first argument todrawArrays
anddrawElements
specifies the type of the primitive to render, and is often called the drawing mode:drawArrays(gl.POINTS, ..)
drawElements(gl.TRIANGLES, ..)
- All constants are defined by the spec and are guaranteed to have the same values across all browsers.
- There's a community-maintained list of all constants on GitHub, prefixed with
GL_
, which we use instead of referencing the properties of thegl
context. drawElements(gl.TRIANGLES, ..)
becomesdrawElements(GL_TRIANGLES, ..)
which is then inlined todrawElements(0x0004, ..)
and minified todrawElements(4, ..)
.
- WebGL defines many constants available on the
- We used Chrome's Coverage tools for detecting unvisited code paths.
- The screenshot was taken during the first mission before entering the mine.
worlds/wor_mine
reports unused bytes because it hasn't had a chance to run yet. - Campfires are placed randomly and it just so happened that none was generated in the desert during the playthrough above. Consequently,
get_campfire_blueprint
is reported as unused. - Take the output with a grain of salt. We found that it gave us a few false positives even after a complete playthrough.
- The screenshot was taken during the first mission before entering the mine.
Build Pipeline
- Our build pipeline comprised TypeScript, Rollup,
sed
,tr
, and Terser. We usedmake
to build each stage, one by one (cf.Makefile
). - First, all source files in
src/
are compiled to JS inpublic/js
usingtsc
. - Then, we run Rollup to bundle them all into a single file:
game.rollup.js
.- Our codebase is around 110 files, with plenty of
export
andimport
statement. Thanks to Rollup's tree shaking algorithms, most of unused code was removed from the final build. - This allowed us to keep a number of easing functions in
src/math/easing.ts
knowing that only the ones imported and actually used elsewhere would make it to the final build.
- Our codebase is around 110 files, with plenty of
- Next, we run
sed
to perform a number of replacements which are known to be safe. This producesgame.sed.js
.- All indentation and double-slash comments are removed in this step. The minifier would do it for JS code anyway, but not for shaders (which are stored as template literals inside the JS code).
- In
Game
's constructor we run the following code to shorten the names of WebGL methods:for (let name in this.GL) { // @ts-ignore if (typeof this.GL[name] == "function") { // @ts-ignore this.GL[name.match(/^..|[A-Z]|([1-9].*)/g).join("")] = this.GL[name]; } }
Then, we use
sed
to rename these methods accordingly at the callsites. Here's an excerpt from the list of replacements that we run:s/createVertexArray/crVA/g s/bindVertexArray/biVA/g s/createBuffer()/crB()/ s/bindBuffer/biB/g s/bufferData/buD/g s/enableVertexAttribArray/enVAA/g s/vertexAttribPointer/veAP/g s/vertexAttribDivisor/veAD/
- Next, we use
tr
to remove all newlines and producegame.tr.js
. Again, this is done as a separate step to also remove newlines from shaders.- GLSL is a C-like language and it can be written in one line.
- There's one exception: the version pragma must end with a newline. We use an escape sequence,
\n
, to ensure that it's not removed.let vertex = `#version 300 es\n // Matrices: PV, world, self uniform mat4 p,q,r; // ... `;
- Lastly, we run Terser to minify the code and produce
game.terser.js
, the final artifact of the automated build process.- Since we control the entire code base of the project, we can use the more aggressive
--mangle toplevel
option which also mangles names declared in the top-level scope. - We used the following
--compress
options:passes=3 ecma=9 booleans_as_integers drop_console pure_getters toplevel unsafe unsafe_math hoist_funs
- The star of the show, however, is the
--mangle-props
option.- It makes Terser minify property names, like
Game.ViewportWidth
, consistently across the entire codebase. - When using it, you have to be careful to not minify too much. For instance, you don't want to minify
window.innerWidth
ordocument.querySelector
. - There are a number of ways to only minify some property names.
- You can pass a list of reserved names to Terser, e.g.
["innerWidth", "querySelector", ..]
, but that's hard to maintain and wouldn't scale well in Backcountry. - Or you can decide to keep quoted names:
window["innerWidth"]
, but that's a bit tedious and more typing. - Or you can only mangle property names matching a regex.
- You can pass a list of reserved names to Terser, e.g.
- We chose the regex approach:
--mangle-props regex=/^[A-Z]/
, which is why you'll see a lot of UpperCamelCase names (aka PascalCase) in our code. It's not a common practice to write JavaScript this way, but it turned out to be a really effective way of differentiating between our custom properties and properties defined by various web APIs.- We could have used a prefix or a postfix, e.g.
_
or$
, but we thought it would add visual clutter to the code. It would need to be used for all fields of all interfaces in all components.
- We could have used a prefix or a postfix, e.g.
- In our codebase,
--mangle-props
saves around 700 bytes!
- It makes Terser minify property names, like
- Since we control the entire code base of the project, we can use the more aggressive
- We perform a few manual modifications, too.
- We inline the minified JS into HTML. (This could be easily automated; we'll fix it next year.)
- There's a cost for each individual file added to the Zip archive, so it's best to keep everything in a single file.
- That said, we left
models.tfu
as a separate file because our testing revealed that it was the optimal choice. We tried base64-encoding it and inlining into HTML, but the end result was a bigger Zip archive.
- We rewrite the initialization of arrays holding component data.
- In TypeScript, we typed the arrays holding component data like so:
class Game implements ComponentData { public World: Array<number>; public [Get.Transform]: Array<Transform> = []; public [Get.Render]: Array<Render> = []; // ... }
- Thanks to this approach, every time we access component data through
game[Get.Transform][entity]
, the retrieved element is properly typed. - The JavaScript produced by the compiler, however, is rather verbose:
var _a, _b, // 21 more declarations class Game { constructor() { this.World = []; this[_a] = []; this[_b] = []; // 21 more arrays initialized, one for each component storing data. } } _a = 1 /* Transform */, _b = 2 /* Render */, // 21 more assignments
- Before minification happens, we hand-edit the compiled JavaScript and replace the code above with the following, which saves us 44 bytes.
class Game { constructor() { // ... for (let i = 0; i < 23; i++) { this[i] = []; } } }
- In TypeScript, we typed the arrays holding component data like so:
- We inline the minified JS into HTML. (This could be easily automated; we'll fix it next year.)
Compression
- Simple, even "dumb" code often times compresses better than shorter but more complex equivalents.
- Most of our code uses three basic keywords:
function
,if
andfor
. - We avoided more complex constructs, which otherwise would be preferable and considered elegant and idiomatic in JavaScript.
- With time, we came to appreciate this style. It was like writing C but in TypeScript. A bit more verbose but simple and straightforward.
- Mateusz Tomczyk, the author of The Wandering Wraith, said it best:
(…) zipping algorithms are not interested in the input size. They care about input entropy.
— Mateusz Tomczyk - This can be observed on the following heatmap where bright hot colors represent content which doesn't compress well, and blue cold colors represent content which doesn't add much to the size of the compressed archive.
- The first
function
is bright, but every other occurrence throughout the codebase is practically free. The same is true for thereturn
keyword.
- The first
- Another example of this are
for
loops which we use in systems to iterate over all entities in the scene. After minification and compression, they become gratis:- Here's what the original code looks like, using
sys_transform
as an example. Because we use this pattern consistently across all systems, it compresses very well.export function sys_transform(game: Game, delta: number) { for (let i = 0; i < game.World.length; i++) { if ((game.World[i] & QUERY) == QUERY) { update(game, i); } } }
- Here's what the original code looks like, using
-
gzthermal
is a nice visualization tool which produced the heatmaps above.- We use DEFLATE rather than gzip in the final build, so the actual compression details are different. The heatmap still gives a good idea of what goes into the archive.
- We used the web-based version available at https://github.com/simonw/gzthermal-web.
-
gzthermal.now.sh
appears to be permanently down, but it's fairly easy to set up the server locally with Docker.
-
- A full heatmap of a pre-release version of Backcountry is available at heatmap.png (1.87 MB).
- Most of our code uses three basic keywords:
- Compression is a lottery.
- Sometimes we'd remove some code only to see the size of the final build go up.
- Changing the desert map size to 30x30 cells, same as the town and the mine, reduced the build size by 70 bytes! In terms of bytes added, it was a net-zero change, but it probably made the desert initialization code close enough, or even identical, to some other code, which Zip then compressed much better than before.
- Compression depends on the entirety of the codebase. Optimizing for size early on makes little sense, in particular if gains are tiny, e.g. 10 or 20 bytes.
- If you continue working on the game and add some other code later on, there's a chance that early size optimizations actually hurt the final build size. Measuring the impact of an early change on the final size may be an impossible task.
- It's crucial to only start heavy optimizations when the game is complete.
- We used 7-Zip. It usually achieves better compression than the system archiver. We used the following options:
- Archive format: zip
- Compression level: Ultra
- Compression method: Deflate
- Dictionary Size: 32 KB
- Word size: 256
- Build size: 13,420 bytes. 108 bytes to go.
- Even more compression: Advzip
- Part of the AdvanceCOMP project, a collection of recompression utilities for your .ZIP archives, .PNG snapshots, .MNG video clips and .GZ files.
-
Download the binaries (look for the AdvanceCOMP section), or if you're on Debian/Ubuntu,
apt install advancecomp
. Or you can compile the sources. - Under the hood,
advzip
uses Zopfli, a compression library programmed in C to perform very good, but slow, deflate or zlib compression. Zopfli is developed by Google. -
advzip -z4 -i50000 opt.zip
- It takes ca. 4:30 minutes on an 8th-gen i7.
- In our case, it reduced the final build size by 116 bytes.
- Final build size: 13,304 bytes.
- Mission accomplished :)
- Out of curiosity, we checked compression methods other than Zip's DEFLATE. They left us craving for an update to js13kGames' rules :)
- Zip format, LZMA algorithm: 12,594 bytes.
- Zip format, PPMd algorithm: 12,132 bytes.
- Brotli: 11,926 bytes!
- Brotli is well-supported by browsers and servers, and is MIT-licensed, which makes it a good successor both for the web traffic and the competition.
- Similar to
gzip
, Brotli can only compress individual files. For the purpose of the experiment wetar
-ed our two files into a single archive.
- Allowing more compression methods and algorithms could allow even more creative expression from participants by giving them over 1KB of extra space.
DOMMatrix
- Until the night before the deadline we weren't sure it would be possible to reduce the size of the build without making substantial changes to the gameplay. As a last resort, we maintained a branch with some methods of
gl-matrix
replaced by the nativeDOMMatrix
implementation, which traded size for... performance.- From MDN: The DOMMatrix interface is designed with the intent that it will be used for all matrices within markup, supplanting the SVGMatrix and CSSMatrix interfaces.
inverse
,multiply
,preMultiplySelf
, andtransformPoint
are perfect for 3D maths.DOMMatrix
is available in all modern browsers; we wouldn't have had to ship custom JS implementations of these methods. It would have saved us around 430 bytes.
- Unfortunately, even if
DOMMatrix
is implemented natively, it’s not a good choice for 3D animation. The performance is noticeably worse than that ofgl-matrix
and regular JS arrays, or typed arrays.- Browsing old Intent to Implement threads from 2014, it looks like
DOMMatrix
was specifically designed as a drop-in replacement forSVGMatrix
for use-cases other than WebGL. It isn't ageneric high-performance solution for matrix math
. - One reason for the slowness might actually be the very fact that
DOMMatrix
is native code. Data must be transferred from JS to C++ and back for every operation.-
One old benchmark shows that JS can be orders of magnitude faster than the native
DOMMatrix
. Tested on an i7-8750H 2.20 GHz:- In Chrome,
multiplySelf
is 700x faster in JavaScript than when using the nativeDOMMatrix
method! - In Firefox,
multiplySelf
is 250x faster in JavaScript than when using the nativeDOMMatrix
method. - Other
DOMMatrix
methods perform better, with the JavaScript counterparts being "only" 5-10x faster.
- In Chrome,
- However, a comment in blink-dev argued that the benchmark was not realistic because it only transformed matrices without ever using them.
-
Another benchmark was proposed in which
DOMMatrix
performs better than JS.- Looking at its code today, the
DOMMatrix
test case does less then the JS equivalents. When the test is modified to account for this,DOMMatrix
performs 18x slower than theFloat32Array
snippet. - The JS test cases could benefit from some optimization to avoid unnecessary allocation of interim arrays.
- Looking at its code today, the
-
One old benchmark shows that JS can be orders of magnitude faster than the native
- Another reason might be that the spec requires that a new
DOMMatrix
be created from the object passed as the argument tomultiplySelf
andpreMultiplySelf
. In our case, we could guarantee that it always is an instance ofDOMMatrix
. - Lastly, in practical use,
DOMMatrix
suffers from not being iterable. It cannot be used directly when passing data into the WebGL context. ThetoFloat32Array()
method must be used instead which incurs an extra cost.- This seems to be the reason why both the now-deprecated WebVR API and the current WebXR specification use
Float32Arrays
to represent matrices. - In Backcountry, we experimented with caching
Float32Arrays
produced bytoFloat32Array()
. It improved the situation a little bit but it still wasn’t enough.
- This seems to be the reason why both the now-deprecated WebVR API and the current WebXR specification use
- Browsing old Intent to Implement threads from 2014, it looks like
- We created our own benchmark which is the closest to how we operate on matrices in Backcountry. In our tests,
gl-matrix
easily outperformsDOMMatrix
.DOMMatrix.preMultiplySelf
is 20x slower in Firefox and 50x slower in Chrome, compared togl-matrix
'smultiply
.- In practical use,
gl-matrix
avoids many new array allocations by always taking the target matrix as the first argument to all functions. This allows reusing existing matrices as targets and saves GC cycles.- For instance, to invert the
World
transformation matrix and store the result in theSelf
matrix, you callinvert(transform.Self, transform.World)
.
- For instance, to invert the
- Matrix multiplication is by far the most important and common operation in 3D. It can be used to express all other operations, and is crucial to converting from local space of an object, to world space, to camera space.
- In Backcountry, we typically perform ca. 250 matrix multiplications, 250 matrix inversions, and 2,000 point transformations (matrix × vector multiplications) per frame. We need these operations to be fast.
- Consider a profile of a typical frame on the
DOMMatrix
branch, recorded on a desktop with GeForce GFX 1060, so that we can focus on the CPU running time:- The entire frame takes 12.3 ms and roughly ¾ of that is
sys_transform
andsys_cull
, leaving little time for rendering. sys_transform
performs matrix multiplications and inversions, and caches the results via theDOMMatrix.toFloat32Array()
method. The cached typed arrays are then used in calls to WebGL APIs insys_render
.sys_cull
callstransformPoint
to check whether entities are inside the camera frustum.
- The entire frame takes 12.3 ms and roughly ¾ of that is
- For reference, here's a typical frame on the
master
branch, which usesgl-matrix
.- The frame now takes 4.4 ms, which is almost 3 times faster than the
DOMMatrix
branch. - All CPU-bound systems take ¼ of the frame time, leaving much more time for rendering.
sys_transform
andsys_cull
now take a total of 0.2 ms.
- The frame now takes 4.4 ms, which is almost 3 times faster than the
- On some computers the differences between the two branches were less significant. Often times, these were computers with integrated GPUs and ran Backcountry at 30 FPS, with more time to spare for CPU tasks.
- If you're curious, try the
DOMMatrix
build of Backcountry.
- If you're curious, try the
- We considered great performance one of the most important features in Backcountry, and decided to cut other features in order to not have to use the
DOMMatrix
branch. - Speaking from the perspective of 3D game development, it would be nice to have a high-performance solution for matrix math available as part of the platform.