Thinking Low-Level, Writing High-Level

Write Great Code, vol. 2, by Randall Hyde. What your elegant code costs once the compiler is done with it.

Write Great Code, Volume 2: Thinking Low-Level, Writing High-Level (2nd Edition)

6.6/10

« The machine price list for every line of code: dated in its tools, irreplaceable in its verdicts. »

  • Author: Randall Hyde · author of The Art of Assembly Language
  • Original: No Starch Press, 2020 · 656 pages
  • Edition: Notes based on the 1st ed. (2006) + verified 2nd ed. additions
  • This page: ~10 min read

Book rating across 5 dimensions: Ideas 8/10 · Practical 6/10 · Readability 6/10 · Aged well 6/10 · Examples 7/10

What every construct in your language costs once compiled: switches, loops, objects, strings.

Why this book

Volume 1 taught what the machine does with your data. This one answers the question that follows: what does the compiler do with your statements? Two pieces of code can look equivalent in PHP, C or JavaScript and compile into machine code that differs by an order of magnitude. Hyde's bet: you do not need to write assembly (the language with one machine instruction per line) to know which is which. "Think in assembly; write in a high-level language" (p. 5).

And he attacks the misquote that shields us from the question. "Premature optimization is the root of all evil"? The line, attributed to Tony Hoare and popularized by Donald Knuth, was about counting cycles in assembly, not about an architect choosing structures wisely at design time. Pushing every cost decision to a hypothetical "optimization phase" means the decision never gets made.

The ideas that stick

1. Think in assembly, write in a high-level language

The first generation of high-level programmers came from assembly: they naturally picked the constructs that compiled well, because they could see the machine code behind the statement. The next generation inherited the languages without the eye, and, as Hyde puts it, "most software engineers have no idea about the runtime costs of HLL statements" (p. xviii). His fix is not to go back to assembly; it is to restore the eye.

To the objection "the compiler optimizes better than me", he answers twice:

  • "One of the best-kept secrets in the compiler world is that most compiler benchmarks are rigged" (p. 4): vendors write the reference programs themselves, knowing exactly what their compiler rewards;
  • more fundamental: no optimizer will ever swap your linear search for a binary search. Algorithms and data structures stay your job.
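
To make the second point concrete, here is a minimal sketch of the two searches, assuming a sorted array of int. The function names find_linear and find_binary are illustrative and do not come from the book. No optimization flag turns the first into the second: that choice stays with the programmer.

```c
#include <stddef.h>

/* Linear search: up to n comparisons. Returns the index of key, or -1. */
long find_linear(const int *a, size_t n, int key) {
    for (size_t i = 0; i < n; i++) {
        if (a[i] == key) return (long)i;
    }
    return -1;
}

/* Binary search: about log2(n) comparisons, valid only on a sorted array. */
long find_binary(const int *a, size_t n, int key) {
    size_t lo = 0, hi = n;                 /* search window is [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;   /* avoids overflow of lo + hi */
        if (a[mid] == key) return (long)mid;
        if (a[mid] < key) lo = mid + 1;
        else hi = mid;
    }
    return -1;
}
```

On a million sorted elements, the first version may compare up to a million times, the second about twenty.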

The low-level eye spots the hidden cost behind an innocent line:

// ✗ strlen() RE-COUNTS the whole string on every iteration → huge hidden cost
//   (unless the optimizer can prove that s never changes inside the loop)
for (size_t i = 0; i < strlen(s); i++) { ... }

// ✓ measure ONCE: same result, no trap
size_t n = strlen(s);
for (size_t i = 0; i < n; i++) { ... }

[Illustration: "Write in your language. Think in the machine's."]

2. The optimizer is bounded: help it

Why doesn't the compiler just fix everything? Because perfect optimization is intractable: "a full guaranteed optimization of a modern application could take longer than the known lifetime of the universe" (p. 71). Real compilers run time-boxed heuristics over small windows of code. Messy code, deeply nested conditions, jumps that tangle the control flow: all of it burns the heuristic's budget before it has done anything useful for you.

Hence the book's quiet rule: "Great code works synergistically with the compiler, not against it" (p. 72).

int x = 3 * 4;     // CONSTANT FOLDING → the binary just contains "12"
if (false) foo(); // DEAD-CODE ELIMINATION → foo() vanishes from the binary
y = i * 8;        // STRENGTH REDUCTION → becomes  y = i << 3  (a shift, cheaper)

Three tools the optimizer applies on its own, as long as your code is simple enough for it to spot them:

  • constant folding: computing 3 * 4 once, at compile time;
  • dead-code elimination: deleting what can never run;
  • strength reduction: replacing an expensive operation with a cheaper one, like a multiplication with a bit shift.

And keep in mind the three things it will never do: choose your algorithm, choose your data structure, restructure your architecture.

3. Trust the compiler, then check its work

The book's working method is a loop: write the high-level version, look at the generated code, compare. On a real case, it is eye-opening:

// your source                   // what Compiler Explorer shows (x86-64, -O2)
int twice(int x) {              twice(int):
  return x * 2;                   lea eax, [rdi + rdi]  ; no "mul": an addition!
}                                ret

The compiler swapped the multiplication for an address computation (lea) that simply adds x to itself: exactly the kind of thing you only see by looking. In 2006 that meant gcc -S and disassemblers; today the same loop is one paste into Compiler Explorer.

The book works in C/C++ (with some Pascal), but the loop applies to any natively compiled language: in Go, go build -gcflags=-S prints the assembly, and Compiler Explorer accepts Go and Rust too. For Java or C# there is one more floor: the bytecode (javap -c) is not the final code; the JIT produces the real machine code at runtime.

Two rules survive intact:

  • check at the same optimization level as production: "you should never tweak your high-level code to produce better assembly code at one optimization level and then change the optimization level for your production code" (ch. 6);
  • if two versions compile to the same machine code, "you should use the more readable and maintainable version" (ch. 6).

The whole exercise serves readability as much as speed: micro-rewrites that change nothing get rejected with proof.
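
A small case to try the loop on, as a sketch: three spellings of "double x". The function names are made up for the example, and unsigned is used so that the shift and the overflow are well defined. Mainstream optimizing compilers typically emit the same instruction for all three at -O2, but the point of the method is to paste them into Compiler Explorer and look rather than take that on trust.

```c
/* Three ways to write "double x". If they compile to the same machine code
   at your production optimization level, keep the most readable one. */
unsigned twice_mul(unsigned x)   { return x * 2u; }
unsigned twice_add(unsigned x)   { return x + x; }
unsigned twice_shift(unsigned x) { return x << 1; }
```

If the listings match, the second rule applies: the "clever" shift buys nothing, so x * 2 stays.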

4. Every variable has an address cost

Quick scenery if you have never touched low-level code. The CPU only computes in its registers, a handful of cells inside the chip itself. Everything else lives in RAM, so every computation starts by fetching the data. And that RAM is carved into zones: the stack, where each function call automatically parks its local variables (pushed on call, discarded on return); the globals, sitting at a fixed address for the program's whole life; and the heap, the self-service area for objects and anything allocated on demand. The heap's quirk: you never reach it directly, only through a pointer, that is, a variable holding the data's address. Read the pointer first, the data second: two trips instead of one.

So where a variable lives decides what each access costs. The ladder, from free to expensive:

  • register: "Machine registers are always the most efficient place to keep variables and parameters" (p. 228); the compiler assigns them, mostly better than you would;
  • local on the stack: one short instruction, as long as it sits within the first 127 bytes of the frame, the function's own zone of the stack (beyond that, the instruction must carry a longer address);
  • global: a full 32-bit address embedded in every instruction that touches it, and a poison for the optimizer, which can rarely prove who else modifies it;
  • heap: a pointer load first, then the access, plus the allocator's bookkeeping around it.

In PHP or JavaScript you choose none of this: the engine decides for you, and it puts your objects on the heap. That is one reason an object is structurally more expensive than a scalar, whatever the language.

Two practical corollaries from the book:

  • declare your frequently used scalars (numbers, booleans) first and your big arrays last, so the hot variables stay in the cheap zone of the frame;
  • order struct fields from largest to smallest to avoid invisible padding: a char (1 byte) followed by an int (4 bytes) occupies 8 bytes, not 5, because the int must start on a multiple of 4.
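
The padding rule can be checked with sizeof and offsetof. A minimal sketch, with struct names invented for the example; the exact sizes depend on the platform's ABI, so the figures in the comments are what a typical target with a 4-byte, 4-aligned int produces, not a guarantee.

```c
#include <stddef.h>

/* Small fields scattered around a larger one. */
struct Loose {
    char a;   /* 1 byte, then typically 3 bytes of padding           */
    int  b;   /* must start on a multiple of its alignment           */
    char c;   /* 1 byte, then typically 3 bytes of tail padding      */
};            /* typically 12 bytes for 6 bytes of data              */

/* Same fields, largest first. */
struct Tight {
    int  b;
    char a;
    char c;   /* typically 2 bytes of tail padding                   */
};            /* typically 8 bytes for the same 6 bytes of data      */
```

Printing sizeof(struct Loose) and sizeof(struct Tight) on your own target settles the question for your compiler.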

5. An array access is an address calculation

a[i] looks free; it is actually an address computation: start address + index × element size. Concretely: an array of 4-byte integers starting at address 1000, its element 3 lives at 1000 + 3 × 4 = 1012. The machine runs this computation on every access. If the element size is a power of two, the multiplication becomes a single bit shift, nearly free; if it is 9 bytes, the compiler emits extra instructions at every access. Each added dimension (a[i][j]) adds a multiplication.
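
The formula above, written out as a sketch. The helper names element_address and element_address_2d are hypothetical; they only spell out the arithmetic the compiler generates behind a[i] and a[i][j] for an array stored row by row.

```c
#include <stddef.h>
#include <stdint.h>

/* Address of element `index` in an array starting at `base`. */
uintptr_t element_address(uintptr_t base, size_t index, size_t elem_size) {
    return base + index * elem_size;
}

/* Same computation for a 2D array stored row by row:
   the extra dimension costs one extra multiplication. */
uintptr_t element_address_2d(uintptr_t base, size_t row, size_t col,
                             size_t cols, size_t elem_size) {
    return base + (row * cols + col) * elem_size;
}
```

With the numbers from the text, element_address(1000, 3, 4) gives 1012.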

And the order you walk a 2D array decides cache behavior. Same logic, two loop orders:

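// assumes: int a[ROWS][COLS];

// ✗ column by column: each access jumps a whole row → cache miss
for (int j = 0; j < COLS; j++) for (int i = 0; i < ROWS; i++) a[i][j] = 0;

// ✓ row by row: contiguous accesses, the cache line is reused → up to 10× faster
for (int i = 0; i < ROWS; i++) for (int j = 0; j < COLS; j++) a[i][j] = 0;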

For strictly identical logic, the second can run ten times faster. Direct echo of volume 1's cache-line lesson.
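
To measure the gap yourself, here is a sketch of the two traversals as complete functions. The sizes ROWS and COLS and the function names are arbitrary choices for the example. Both return the same sum; only the order of the memory accesses differs, and an optimizer is free to reorder the loops, so time them at your real optimization level.

```c
#include <stddef.h>

enum { ROWS = 256, COLS = 256 };   /* sizes chosen for illustration */

/* Row by row: consecutive addresses, each cache line is fully used. */
long sum_row_major(int a[ROWS][COLS]) {
    long total = 0;
    for (size_t i = 0; i < ROWS; i++)
        for (size_t j = 0; j < COLS; j++)
            total += a[i][j];
    return total;
}

/* Column by column: each step jumps COLS * sizeof(int) bytes ahead. */
long sum_col_major(int a[ROWS][COLS]) {
    long total = 0;
    for (size_t j = 0; j < COLS; j++)
        for (size_t i = 0; i < ROWS; i++)
            total += a[i][j];
    return total;
}
```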

The honest transposition for web developers: a PHP array or a sparse JavaScript array behaves like a hash map, not like a C array. Every warning the book makes about "purely dynamic arrays" (bookkeeping at each access) applies with extra force; in JS, the closest thing to a C array is a TypedArray.

6. Strings: copying is the enemy

Concrete frame: you are generating an HTML page, a JSON, a CSV, anything built piece by piece. In memory, a string is a row of bytes glued side by side. You cannot just "append at the end": the space next door is already occupied by something else. So on every out += piece, the engine actually does three things: reserve a bigger zone, copy everything accumulated so far, then copy the new piece.

On a short string, invisible. Inside a loop, it becomes copies of copies: on the thousandth turn you re-copy the 999 pieces already copied. To produce a one-megabyte page out of a thousand pieces, you will have moved hundreds of megabytes in total. That is what the book measures: "Copying string data from one place to another in memory is one of the more expensive costs [...]" (p. 300). The remedy fits in one line: accumulate the pieces in an array, and glue only once, at the end.

// ✗ each += copies THE WHOLE accumulated string again
let out = "";
for (const row of rows) out += render(row);

// ✓ build the pieces, copy once
const page = rows.map((row) => render(row)).join("");

Same family of waste: in C, a string's length is stored nowhere; strlen recounts it byte by byte up to the final zero. Calling it in a loop condition means re-reading the whole string on every turn. The reflex transposed to our world: do not recompute inside a loop what does not change (a count(), a query, a regex).

Worth knowing: modern engines absorb part of this cost (V8 defers concatenation, PHP over-reserves space). But the book's mental model stays right, and it is its underlying rule: every string manipulation can hide a full copy. Pass references around, and only copy when someone actually needs their own copy.
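
In C, the same discipline looks like this. A minimal sketch, not the book's code: the StrBuf type and strbuf_append are hypothetical names, and the technique is a growable buffer that remembers its length and doubles its capacity, which is a different route to the same goal as the array-then-join above. Appends no longer re-scan the accumulated text, and the text is re-copied only when the buffer has to grow.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer: it tracks length and capacity itself. */
typedef struct {
    char  *data;
    size_t len;   /* bytes used, excluding the terminating zero */
    size_t cap;   /* bytes allocated */
} StrBuf;

/* Appends piece; returns 0 on success, -1 if allocation fails. */
int strbuf_append(StrBuf *sb, const char *piece) {
    size_t n = strlen(piece);
    if (sb->len + n + 1 > sb->cap) {
        size_t new_cap = sb->cap ? sb->cap : 16;
        while (new_cap < sb->len + n + 1) new_cap *= 2;   /* grow geometrically */
        char *p = realloc(sb->data, new_cap);
        if (!p) return -1;
        sb->data = p;
        sb->cap = new_cap;
    }
    memcpy(sb->data + sb->len, piece, n + 1);   /* copies the piece and its zero */
    sb->len += n;
    return 0;
}
```

Start from StrBuf sb = {0}; and release the result with free(sb.data) when done.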

7. The real price of dynamism

The most valuable chapter for a web developer is the one on variant types, the ancestor of PHP/JS/Python values. Adding two statically typed integers costs 2 or 3 instructions. Adding two variants means inspecting the type of each operand, converting if needed, then dispatching: "It's not at all unreasonable to expect a variant addition operation to require dozens, if not hundreds, of machine instructions" (ch. 12). That single sentence explains why V8 and PHP 8's JIT win so much by specializing types at runtime, and what they fight against.
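
What "inspect, convert, dispatch" means can be sketched with a tagged union. This is an illustration under simple assumptions (two possible types only, names invented for the example), not how any particular engine lays out its values; real variants also handle strings, arrays, null and more, which is where the dozens of instructions come from.

```c
typedef enum { V_INT, V_DOUBLE } VarType;

/* A value that carries its own type tag. */
typedef struct {
    VarType type;
    union { long i; double d; } as;
} Variant;

/* Static add: the types are known at compile time, one add instruction. */
long add_static(long a, long b) { return a + b; }

/* Variant add: inspect both tags, convert if needed, then dispatch. */
Variant add_variant(Variant a, Variant b) {
    Variant r;
    if (a.type == V_INT && b.type == V_INT) {
        r.type = V_INT;
        r.as.i = a.as.i + b.as.i;
    } else {
        double x = (a.type == V_INT) ? (double)a.as.i : a.as.d;
        double y = (b.type == V_INT) ? (double)b.as.i : b.as.d;
        r.type = V_DOUBLE;
        r.as.d = x + y;
    }
    return r;
}
```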

Two siblings in the same family:

  • when you call $object->method(), the machine does not know in advance which code to run: it depends on the object's class, which may override the method. So it must first look up the right version (read the class's card, then find the method's address in it) before executing it. Two detours before the work. Tolerable ("about 10 percent of your application's total performance", ch. 12), until deep hierarchies and getters/setters called everywhere multiply it;
  • calling a function has a fixed price, whatever its content: save where we were, pass the arguments, jump, come back. The book counts ~9 instructions of that machinery wrapping 3 instructions of actual work: like sending a courier ten kilometers to hand over an envelope. Give your functions real work, or let inlining (the compiler pasting the function body in place of the call) remove the trip.
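
The "two detours" of a method call can be written by hand in C. A sketch with hypothetical names (Shape, ShapeVTable, shape_area): each object starts with a pointer to its class's table of method addresses, which plays the role of the "card" described above. Actual object layouts vary by language and compiler.

```c
struct Shape;

/* The class's "card": one slot per overridable method. */
typedef struct {
    int (*area)(const struct Shape *self);
} ShapeVTable;

typedef struct Shape {
    const ShapeVTable *vtable;   /* first detour: which class am I? */
    int w, h;
} Shape;

static int rect_area(const Shape *s)     { return s->w * s->h; }
static int triangle_area(const Shape *s) { return s->w * s->h / 2; }

static const ShapeVTable RECT_VT     = { rect_area };
static const ShapeVTable TRIANGLE_VT = { triangle_area };

/* Dynamic call: load the table pointer, load the method address, then call. */
int shape_area(const Shape *s) {
    return s->vtable->area(s);
}
```

Because the target is only known at runtime, the compiler usually cannot inline such a call, so the fixed call price from the second bullet is paid on top of the lookup.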

8. Branches: the switch is not what you think

What does the compiler do with a switch? It depends on your case values, and that is where everyone gets it wrong.

  • Three or four cases: it generates the same thing as a series of if/else; the value is compared case by case.
  • Many cases with consecutive values (0, 1, 2, 3…): it builds a jump table. Picture an array of addresses where slot n holds "where the code for case n lives": the CPU reads the slot, jumps, done. One single jump, whether the switch has 4 cases or 400. That is what a "fast" switch is.

The trap: the table needs a slot for every value between the smallest and the largest case, including the ones you never use. The book's example: cases 0 through 15, plus one lone case at 10,000. Keeping the table would take 10,001 slots (40,004 bytes at 4 bytes per slot), 9,984 of them empty. No compiler accepts that: it falls back to comparisons (at best keeping a small table for the dense 0 to 15 range), and your "fast" switch becomes a dressed-up if-chain, with nothing warning you.
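
A jump table written by hand shows both the speed and the trap. This is a sketch with invented handler names, using an array of function pointers as a stand-in for the table of code addresses a compiler builds behind a dense switch.

```c
#include <stddef.h>

typedef int (*Handler)(int);

static int op_increment(int x) { return x + 1; }
static int op_double(int x)    { return x * 2; }
static int op_negate(int x)    { return -x; }

/* Slot n holds the address of the code for case n. */
static const Handler JUMP_TABLE[] = { op_increment, op_double, op_negate };

int dispatch(unsigned op, int x) {
    size_t slots = sizeof JUMP_TABLE / sizeof JUMP_TABLE[0];
    if (op >= slots) return x;     /* the "default" case: x unchanged */
    return JUMP_TABLE[op](x);      /* one bounds check, one indexed jump */
}
```

Adding a lone case 10,000 to this design would mean growing JUMP_TABLE to 10,001 slots, which is exactly the waste a compiler refuses.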

Two more habits from these chapters:

  • in a chain of conditions, put first the one that settles the matter most often. With &&, as soon as one test is false the rest is not even evaluated: so put the most-often-false test first. Example: in if (user.isAdmin && expensiveAudit(user)), almost nobody is admin, so the expensive audit almost never runs. With ||, the opposite: most-often-true test first;
  • in f(x) + g(x), which runs first, f or g? JavaScript guarantees it (left to right), C and C++ guarantee nothing, PHP does not promise it everywhere. If f and g modify something along the way (a counter, a cart, a file), the result can depend on the compiler. The safe reflex: call them on two separate lines, then combine the results.
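
The evaluation-order point in C, as a minimal sketch. The counter and the function names are made up for the example. In the first function the two calls may run in either order, so the result depends on the compiler; in the second, one call per statement fixes the order in the source.

```c
static int counter = 0;
static int next_id(void) { return ++counter; }

/* Order of the two calls is unspecified in C: the result may be 12 or 21. */
int ambiguous_order(void) {
    counter = 0;
    return next_id() * 10 + next_id();
}

/* One call per statement: the order is fixed by the source. */
int explicit_order(void) {
    counter = 0;
    int first  = next_id();   /* always 1 */
    int second = next_id();   /* always 2 */
    return first * 10 + second;
}
```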

My take, honestly

I will not play the expert: I am a web developer, I do not read assembly, and half of this book is above my daily work. But that is exactly why it stuck with me. It answers a question I actually ask while coding: of these two ways to write the same thing, which one costs more? And it answers with proof, not slogans.

What I take away, seen from my desk: the variant chapter finally made me understand why PHP and JavaScript pay a tax on every operation, and what V8 or PHP 8's JIT are fighting to win back. The rest gave me orders of magnitude: what is free, what costs, what hides a copy. I will never look at the assembly of my controller on a Tuesday morning. But pasting two versions of a function into Compiler Explorer to settle a micro-optimization debate, that I can do, and it closes the discussion in thirty seconds.

The limits, honestly: half the pages are assembly listings (x86 and PowerPC) that I skimmed, and I read the 1st edition (2006), whose tooling has aged. The 2nd edition (No Starch, August 2020, 656 p.) is the one to buy: per the publisher it covers 64-bit CPUs, ARM, the JVM and the .NET CLR, with examples from Swift and Java.

Bottom line for a web dev: you do not come out an optimizer, you come out less gullible. You stop repeating performance advice heard elsewhere, because you have seen the mechanism underneath. That is already a lot.

Odilon

Still relevant in 2026?

The method more than ever, the details less. JITs (V8, PHP 8) and modern optimizers have absorbed several of the micro-verdicts, and that is precisely why the chapter on what optimizers cannot do (your algorithms, your structures, your tangled control flow) is the most durable part. Compiler Explorer turned the book's laborious workflow into a 30-second habit. And in the AI era the lesson doubles: generated code is exactly the kind of plausible-but-unexamined code this book teaches you to price. Skip in the 1st edition: the PowerPC chapter and the 2006 tooling.

Who is it for?

Read it if

  • You loved volume 1 and want the sequel that prices your statements, not your data
  • You write in a dynamic language and want to know what the engine fights for you
  • You repeat performance tips ("switch is faster") without knowing the mechanism behind them
  • You review AI-generated code and want a cost model for judging it

Skip it if

  • Pages of assembly listings put you off: half the book demonstrates through them
  • You have not read volume 1: the memory and cache notions are assumed
  • You only find the 1st edition: PowerPC and the 2006 toolchain have aged badly

Going further

Start with volume 1, which builds the memory and CPU foundations this book leans on. In the library, Effective TypeScript shows the type-system counterpart of the variant tax, and Fluent Python explores what a dynamic language does under its own hood. My free courses apply the same rule everywhere: explain the real mechanism, not the incantation.
