Algorithms in drawings: binary search, hash tables, Dijkstra and dynamic programming without a single tear.
Why this book
Plenty of web developers, me included, learned the job without a computer science degree. Algorithms are the chapter we skipped, and the one job interviews love most. This book exists precisely for us: 250 pages, 400 hand-drawn doodles, Python code, and zero formal proofs.
I read the 1st edition (2016); the one to buy is the 2nd (2024): code moved to Python 3, new chapters on trees (binary search trees, balanced trees, B-trees), and a rewritten take on NP-hard problems.
The ideas that stick
111 days versus 30 milliseconds
First let's decode the notation on every page. Big O, written O(...), does not measure a duration, but a growth rate: if you double the data, what does the time do? Three families cover 90% of the job:
O(1): constant. Data size changes nothing. Readingarray[500]is as fast asarray[5];O(n): linear. Twice the data = twice the time. A loop walking everything (foreach);O(log n): logarithmic. Doubling the data adds only one step. The magic class, the one of the binary search below.
The n is the number of elements. And the golden rule: keep the term that dominates as n grows, drop the constants. A loop of 3 billion operations and a loop of n operations are both O(n): at scale, the shape of the curve decides, not its coefficient.
The book makes all of this tangible with a story. Bob has to write, for NASA, a function that searches a name in a sorted list of one billion elements. He has two methods to choose from:
- simple search: walk the list from the start, one by one, until the right name. Worst case, the name is last: one billion checks. That is
O(n); - binary search: like looking up a word in a paper dictionary. Open in the middle, the name is before or after, throw away the useless half, repeat on what is left. How many times can you cut one billion in half before only one element remains? One billion → 500 million → 250 million… and you reach 1 in just 30 cuts. So 30 guesses, not one more. That is
O(log n).
Bob's mistake, the one the book wants to break: he tells himself "binary is like 15 times faster". His brain reasons in proportion, as if both methods sat on the same curve at nearby speeds. They do not: one runs one billion operations, the other runs 30. The real ratio is not 15, it is one billion divided by 30, that is 33 million. The book turns it into time (10 ms per operation) to make it palpable: simple search would take "11 days!" (p. 13), binary search 30 milliseconds. For the exact same result.
The moral fits in one word: these two families do not differ a little, they differ in nature. That is why we label them with Big O rather than a stopwatch: O(log n) announces "this curve stays flat even when the data explodes", which no benchmark on a small list will ever reveal.
Fair objection: you will never write your own hash table, the language provides it. But Big O is not for rewriting those structures. It is for picking the right one and spotting the trap. Three everyday examples:
- You hunt duplicates in a list with a loop inside a loop. That is
O(n²). Invisible on 50 elements, it freezes the page on 10,000. ASet(a hash table) brings the same task back toO(n). - Inside a loop, you test
array.includes(x). Eachincludesrescans the whole array, so it isO(n), and the whole loop becomesO(n²). The same thing withSet.has(x)isO(1). - An
unshift(), a mid-arraysplice(), an SQL query with no index: each hides anO(n)behind a line that looks harmless.
So knowing Big O is not about coding a sort. It is about reading your own code and spotting the line that will slow down as the data grows. And incidentally, it is exactly what technical interviews test.
2Arrays and linked lists: the cinema problem
Let's start with what these two words really mean, in memory. A computer's memory is one long row of numbered cells, like a street of addresses.
An array stores its elements in cells glued next to each other. Since it knows the first cell's address and the size of one element, it instantly computes where the 500th lives: start address + 500 × size. So reading any position is free, that is O(1). The flip side: to insert an element in the middle, the cells must stay glued, so you have to shift everything after it over by one. On a big array, that costs, that is O(n).
A linked list does the opposite: its elements are scattered anywhere in memory, and each one holds the address of the next, like a treasure hunt where every clue points to the next. Inserting shifts nothing: you just rewrite two addresses, that is O(1). The flip side: to reach the 500th, there is no computing it, you must follow the clues one by one from the start. So reading is sequential, that is O(n).
Hence the book's metaphor: you and five friends want seats in a packed cinema. The array needs six seats side by side (and everyone moves if a seventh friend shows up). The linked list says "let's split up", and each person just remembers where the next one sat; but to find friend number 5, you go through the first four.
The book's summary: "Arrays allow fast reads. Linked lists allow fast inserts and deletes" (p. 36). And you touch it daily without knowing: in JavaScript, push() appends at the end disturbing nothing (O(1)), but unshift() inserts at the start, shifting everything else over by one (O(n)).
// O(1): drop it at the end, nobody moves
queue.push(customer)
// O(n): inserting at the start shifts EVERYTHING over by one
queue.unshift(customer) // on 1 million elements, you feel it
Python lists, JS arrays, PHP arrays: every everyday structure rests on this choice, made for you. Knowing which one is under your fingers means knowing which operation is free and which one costs.
The book is in Python, but here is where each of the four basic structures lives in the web languages:
| Structure | PHP | JavaScript | Go | Python |
|---|---|---|---|---|
| Array (read O(1)) | SplFixedArray | Array, Int32Array | []T (slice) | list |
| Linked list (insert O(1)) | SplDoublyLinkedList | write it yourself | container/list | deque |
| Set (uniqueness, has O(1)) | $s[$x]=true | Set | map[T]struct{} | set |
| Map (key→value O(1)) | array | Map | map[K]V | dict |
The false friend: PHP's array is NOT an array, it is an ordered hash table. It plays array, list, stack, queue, dictionary and set all by itself, which is why a PHP dev rarely codes the others. The catch is still the chapter's trap: array_unshift() reindexes everything, so O(n). And deque (Python) and SplDoublyLinkedList (PHP) are real linked lists: insertion at both ends in O(1).
3Recursion is for the programmer, not the machine
A recursive function is simply a function that calls itself. The idea surprises, but it exists to break a big problem into a smaller version of the same problem, down to a case so small you can answer it directly. The book reduces it to two mandatory ingredients:
- the base case: the stopping condition, the smallest case you handle without calling the function again;
- the recursive case: the function calls itself on a problem one notch smaller.
Forget the base case, and the function never stops: your countdown prints 3, 2, 1, 0, -1, -2… forever. On a factorial, it looks like this:
function factorial(n) {
if (n === 1) return 1 // base case: we can answer, we stop
return n * factorial(n - 1) // recursive case: one notch smaller
}
But where are all these pending calls stored? On the call stack, which the book draws as a pile of sticky notes: each call sticks a note on top ("I still have to multiply by 3"), and when the base case answers, the notes come off one by one in reverse order. Unrolled on factorial(3):
For a web dev, recursion shines on everything nested by nature: walking a DOM tree, a thread of nested comments (a reply to a reply to a reply, like on Reddit), a deep JSON, a file tree. A plain loop struggles there; recursion matches the shape of the problem. Hence the quote that settles the eternal loop-versus-recursion debate, borrowed from Leigh Caldwell: "Loops may achieve a performance gain for your program. Recursion may achieve a performance gain for your programmer. Choose which is more important in your situation!" (p. 41).
4Divide and conquer: sorting without breaking a sweat
Divide and conquer (D&C) is a strategy, not one precise algorithm. It is the direct extension of recursion (idea 3), in three reflexes:
- find the trivial case: the smallest problem you can solve without thinking (a list of 0 or 1 element is already sorted);
- shrink toward it: cut the problem into smaller pieces of the same kind;
- recombine the pieces' solutions.
The book first illustrates it with a 1,680 × 640 m farm plot to cut into the largest possible equal squares. You lay the biggest square that fits (640 × 640), a strip is left, you repeat on the strip, again and again, until a plot where the division is exact. Answer: 80 × 80 squares. You have just run Euclid's algorithm (the greatest-common-divisor computation, 2,300 years old) without knowing it.
But the use you meet every day is sorting. Quicksort applies the exact three reflexes: pick a random element, the pivot; put everything smaller on the left, everything larger on the right; then sort each side the same way. The trivial case: a side that is empty or has one element is already sorted, you stop.
function quicksort(list) {
if (list.length < 2) return list // trivial case: 0 or 1 element, already sorted
const pivot = list[0]
const smaller = list.slice(1).filter(x => x <= pivot)
const larger = list.slice(1).filter(x => x > pivot)
return [...quicksort(smaller), pivot, ...quicksort(larger)] // recombine
}
The trap, which the book dismantles with call stacks: if you always take the first element as pivot and the list is already sorted, each cut removes only one element, and quicksort degenerates into O(n²). The fix is one word: "if you always choose a random element as the pivot, quicksort completes in O(n log n) on average" (p. 71). In practice you will never write this code: your language's .sort() is already a finely tuned quicksort (or mergesort). But knowing what it does underneath is understanding why sorting costs O(n log n) and not O(n).
Quicksort applies the same instinct to sorting: pick a pivot, split into smaller-than and bigger-than, sort each side the same way. The trap, explained with call-stack drawings: pick the first element of an already-sorted list as your pivot and you hit O(n²); "if you always choose a random element in the array as the pivot, quicksort will complete in O(n log n) time on average" (p. 71).
5The hash table is Maggie, who knows all the prices
The book introduces O(1) with a human. At the grocery store, looking up a price page by page is O(n); in a sorted book, O(log n) (idea 1's binary search). But your colleague Maggie has memorized everything: you say "milk?", she answers "$1.49" instantly, whether the catalog has 10 or 10,000 items. That is what O(1) means: the response time no longer depends on the size.
A hash table is Maggie in code. The secret is a hash function: a grinder that turns a key (text) into a slot number. You feed it "milk", it spits out, say, 3; you store the price in slot 3 of an array. To read it back, you run "milk" through the same grinder, it returns 3, and you go straight to slot 3. No walking: you compute the address instead of searching for it. Read, write, delete, all O(1) on average.
// you already use it every day, without seeing the hash function underneath
const prices = new Map()
prices.set('milk', 1.49) // "milk" → internal slot → drop 1.49 there
prices.get('milk') // "milk" → same slot → 1.49, in O(1)
"Hash tables are probably the most useful complex data structure you'll learn" (p. 78), and you meet them under a thousand names: Map and Set in JS, dict in Python, array in PHP. The book gives them three jobs:
- lookups: mapping a key to a value (DNS, which translates
web-developpeur.cominto an IP address, is one giant hash table); - duplicate prevention: the voter who tries to vote twice is caught in
O(1)(it is theSetthat killed idea 1'sO(n²)loop); - caching: memorize an expensive answer the first time so you never recompute it (Redis, OPcache, the browser's HTTP cache, all hash tables).
The fine print: two different keys can land on the same slot (a collision), then chained into a mini-list in that slot. Too many collisions and the O(1) degrades; that is why the engine watches the fill rate and grows the array when it passes ~70%.
🎨 Illustration to generate : copy this prompt into ChatGPT :
Flat illustration, warm ivory background (#faf7f2), muted green and terracotta palette, no text, minimalist editorial style, a serene grocery store clerk with closed eyes instantly holding up a price tag with one hand, while behind her a long queue of amazed customers and a colleague frantically flipping through an enormous thick price book, deadpan humor, 3:2 aspect ratio
Planned caption: "Maggie answers in O(1), whatever the size of the catalog. Her colleague with the book is doing O(n)."
6The mango seller and the piano: graphs without fear
A graph is just a way to model links: nodes (things) joined by edges (links between those things). A social network is a graph (people, friendships); so is a road map (cities, roads); so is the web (pages, links). As soon as you have "who is connected to whom", you have a graph, and two algorithms extract the essentials.
Breadth-first search (BFS) answers "what is the shortest path from A to B?". The book's example: you want a mango seller in your network. You check your direct friends first (ring 1), then their friends (ring 2), and so on, in ever-wider rings, using a simple queue (first in, first out). The first seller found is necessarily in the nearest ring: that is why "breadth-first search not only finds a path from A to B, it also finds the shortest" (p. 103), shortest in number of hops.
One vital detail: mark people already checked. Peggy can show up via Alice AND via Bob; without marking, you recheck her endlessly and loop forever. (That "seen already?" is a Set, in O(1), idea 5.)
Dijkstra takes over when edges have a weight. Hops stop being enough: on a map, the path with the fewest roads is not the shortest in kilometers. The book's example is a barter chain, from a music book up to a piano, via posters, LPs and drums, each trade costing a little money. Dijkstra always expands the cheapest node reached so far, and finds the $35 total path that a plain hop-count would miss. The limit, stated by the book: a negative weight breaks Dijkstra, "there's an algorithm for that! It's called the Bellman-Ford algorithm" (p. 130).
You use both without knowing: BFS whenever a site suggests "people you may know" (the friends-of-friends in ring 2), and Dijkstra on every Google Maps route (roads weighted by travel time).
7Greedy algorithms, and the wall of NP-complete problems
A greedy algorithm follows a dumb rule: at each step, take the best choice right now, without thinking about what comes next, and hope it adds up to the best overall result. You already do it: to give back 88 cents, you grab the biggest coin that fits (50), then the biggest that fits in what's left (20), and so on. That's greedy, and with our coins it always lands right.
The catch is that "lands right" is not guaranteed. Sometimes greedy gives the exact optimum (scheduling rooms: always pick the class that ends earliest leaves the most room for the next ones). Sometimes it is wrong, and the book proves it with its burglar: a knapsack that carries 35 lb, and three items:
- a stereo worth $3,000, weighing 30 lb;
- a laptop worth $2,000, 20 lb;
- a guitar worth $1,500, 15 lb.
Greedy grabs the most expensive first, the stereo ($3,000, 30 lb). Only 5 lb left: nothing else fits. Loot: $3,000. Except laptop + guitar (20 + 15 = 35 lb exactly) was worth $3,500. The best local choice missed the best global one.
Then the book hits the wall: some problems explode no matter how clever you are. The travelling salesperson who must visit 10 cities by the shortest route faces 3,628,800 possible routes, and each added city multiplies that count. These problems have a name, NP-complete: no known method solves them fast as the size grows, we only know how to test an exploding number of combinations. The book's advice is liberating: for them, stop looking for the perfect solution, take a "good enough" greedy approximation and move on. Recognizing a wall and stopping optimizing is a skill in itself.
8Dynamic programming: a grid, not a formula
Dynamic programming (DP) is the clean answer to idea 7's greedy trap. The principle: instead of grabbing the best item in the moment, you break the big problem into a swarm of tiny sub-problems ("what's the best loot if I only had a 1 lb bag? a 2 lb bag? a 3 lb bag?"), solve each once, and store the answers in a grid you fill step by step. Each cell builds on cells already computed: no recompute, no missed combination.
Back to the burglar, but with a 4 lb knapsack and three items: a $1,500 guitar (1 lb), a $3,000 stereo (4 lb), a $2,000 laptop (3 lb). You draw a grid: one row per item, one column per bag capacity from 1 to 4 lb. Each cell answers one precise question: "with only the items seen up to this row, and a bag of this capacity, what's the best loot?". You fill it row by row, and the final answer emerges in the last cell, bottom right.
The honest warning that makes this the best DP introduction around: "there's no single formula for calculating a dynamic-programming solution" (p. 186). You design the grid per problem. The same grid technique then compares strings (longest common substring: typo "hish" is closer to "fish" than to "vista"), which is what git diff and spell checkers build on.
Three things I didn't know
- The Fourier transform explained in one sentence the book borrows with glee: "given a smoothie, the Fourier transform will tell you the ingredients in the smoothie" (p. 207). It is how MP3 and JPG decide what to throw away.
- Bloom filters can be wrong in only one direction: false positives possible, false negatives impossible. Exactly what Google needs for "have I already crawled this URL?" at a fraction of the memory.
- The farm-plot trick of chapter 4 is Euclid's algorithm, about 2,300 years old. The book never name-drops it, which is very much its style.
My take, honestly
I am the target audience: a web developer whose algorithm training happened on the job, with a faint guilt about it. This book removed the guilt in an afternoon. The doodles are not decoration, they are the explanation: the call stack as sticky notes and the DP grid filled cell by cell taught me more than any lecture would have.
The honest limits: it is a gateway, not a reference. No proofs, no amortized analysis, and the 1st edition's machine-learning chapter (k-nearest neighbors, KNN: classify a data point by the examples it most resembles) reads like a postcard from 2016, written before deep learning ate that world. The 1st edition's Python 2 also shows its age, which the 2nd edition fixes along with adding the missing chapter on trees.
If sorting algorithms ever made you feel stupid, this is the antidote. Read it in a weekend, then let the heavier books make sense.
Odilon
Still relevant in 2026?
More than ever, with a twist: AI assistants now write the quicksorts for you, but they also confidently produce O(n²) code that works fine on 100 items and dies on a million. Recognizing growth rates in generated code is exactly the skill this book installs, and you cannot delegate it. The 2nd edition (2024) keeps the material current: Python 3, tree chapters, a rewritten NP-hard discussion. And the interview industry has not budged: hash tables, BFS and Big O remain the toll you pay at every door.
Who is it for?
Read it if
- You are self-taught and the word "algorithm" still triggers mild dread
- You have interviews coming: this is the fastest respectable preparation ramp
- You review AI-generated code and want to spot the accidental O(n²)
- You teach or mentor juniors: the pedagogy is worth stealing
Skip it if
- You have a CS degree: you know all of this, in more rigorous form
- You want proofs and amortized analysis: this book deliberately has none
- You need advanced material: it stops where CLRS begins
Going further
The book's code is Python: my Python course gets you fluent enough to run every example. In the library, Python Crash Course shares the same gentle-first-book spirit, Designing Data-Intensive Applications shows what these structures become at planetary scale, and Write Great Code vol. 1 explains the memory hierarchy hiding behind every "constant factor".
Comments (0)