Algorithms in drawings: binary search, hash tables, Dijkstra and dynamic programming without a single tear.

Why this book

Plenty of web developers, me included, learned the job without a computer science degree. Algorithms are the chapter we skipped, and the one job interviews love most. This book exists precisely for us: 250 pages, 400 hand-drawn doodles, Python code, and zero formal proofs.

I read the 1st edition (2016); the one to buy is the 2nd (2024): code moved to Python 3, new chapters on trees (binary search trees, balanced trees, B-trees), and a rewritten take on NP-hard problems.

The ideas that stick

111 days versus 30 milliseconds

First let's decode the notation on every page. Big O, written O(...), does not measure a duration, but a growth rate: if you double the data, what does the time do? Three families cover 90% of the job:

O(1): constant. Data size changes nothing. Reading array[500] is as fast as array[5];
O(n): linear. Twice the data = twice the time. A loop walking everything (foreach);
O(log n): logarithmic. Doubling the data adds only one step. The magic class, the one of the binary search below.

The n is the number of elements. And the golden rule: keep the term that dominates as n grows, drop the constants. A loop of 3 billion operations and a loop of n operations are both O(n): at scale, the shape of the curve decides, not its coefficient.

The book makes all of this tangible with a story. Bob has to write, for NASA, a function that searches a name in a sorted list of one billion elements. He has two methods to choose from:

simple search: walk the list from the start, one by one, until the right name. Worst case, the name is last: one billion checks. That is O(n);
binary search: like looking up a word in a paper dictionary. Open in the middle, the name is before or after, throw away the useless half, repeat on what is left. How many times can you cut one billion in half before only one element remains? One billion → 500 million → 250 million… and you reach 1 in just 30 cuts. So 30 guesses, not one more. That is O(log n).

Bob's mistake, the one the book wants to break: he tells himself "binary is like 15 times faster". His brain reasons in proportion, as if both methods sat on the same curve at nearby speeds. They do not: one runs one billion operations, the other runs 30. The real ratio is not 15, it is one billion divided by 30, that is 33 million. The book converts it to time (1 ms per operation): simple search would take "11 days!" (p. 13), binary search 30 milliseconds. Same result.

The moral fits in one word: these two families do not differ a little, they differ in nature. That is why we label them with Big O rather than a stopwatch: O(log n) announces "this curve stays flat even when the data explodes", which no benchmark on a small list will ever reveal.

Six steps in, 98% of the work has vanished. Simple search would still be at 0.0000015%.

What is this for, when I already have built-in structures?

Fair objection: you will never write your own hash table, the language provides it. But Big O is not for rewriting those structures. It is for picking the right one and spotting the trap. Three everyday examples:

You hunt duplicates in a list with a loop inside a loop. That is O(n²). Invisible on 50 elements, it freezes the page on 10,000. A Set (a hash table) brings the same task back to O(n).
Inside a loop, you test array.includes(x). Each includes rescans the whole array, so it is O(n), and the whole loop becomes O(n²). The same thing with Set.has(x) is O(1).
An unshift(), a mid-array splice(), an SQL query with no index: each hides an O(n) behind a line that looks harmless.

So knowing Big O is not about coding a sort. It is about reading your own code and spotting the line that will slow down as the data grows. And incidentally, it is exactly what technical interviews test.

2Arrays and linked lists: the cinema problem

Let's start with what these two words really mean, in memory. A computer's memory is one long row of numbered cells, like a street of addresses.

An array stores its elements in cells glued next to each other. Since it knows the first cell's address and the size of one element, it instantly computes where the 500th lives: start address + 500 × size. So reading any position is free, that is O(1). The flip side: to insert an element in the middle, the cells must stay glued, so you have to shift everything after it over by one. On a big array, that costs, that is O(n).

A linked list does the opposite: its elements are scattered anywhere in memory, and each one holds the address of the next, like a treasure hunt where every clue points to the next. Inserting shifts nothing: you just rewrite two addresses, that is O(1). The flip side: to reach the 500th, there is no computing it, you must follow the clues one by one from the start. So reading is sequential, that is O(n).

Hence the book's metaphor: you and five friends want seats in a packed cinema. The array needs six seats side by side (and everyone moves if a seventh friend shows up). The linked list says "let's split up", and each person just remembers where the next one sat; but to find friend number 5, you go through the first four.

The whole trade-off comes down to one fact: an array's cells are glued together, a list's cells are joined by arrows.

The book's summary: "Arrays allow fast reads. Linked lists allow fast inserts and deletes" (p. 36). And you touch it daily without knowing: in JavaScript, push() appends at the end disturbing nothing (O(1)), but unshift() inserts at the start, shifting everything else over by one (O(n)).

// O(1): drop it at the end, nobody moves
queue.push(customer)

// O(n): inserting at the start shifts EVERYTHING over by one
queue.unshift(customer)   // on 1 million elements, you feel it

Python lists, JS arrays, PHP arrays: every everyday structure rests on this choice, made for you. Knowing which one is under your fingers means knowing which operation is free and which one costs.

These structures in your language, concretely

The book is in Python, but here is where each of the four basic structures lives in the web languages:

Structure	PHP	JavaScript	Go	Python
Array (read O(1))	`SplFixedArray`	`Array`, `Int32Array`	`[]T` (slice)	`list`
Linked list (insert O(1))	`SplDoublyLinkedList`	write it yourself	`container/list`	`deque`
Set (uniqueness, has O(1))	`$s[$x]=true`	`Set`	`map[T]struct{}`	`set`
Map (key→value O(1))	`array`	`Map`	`map[K]V`	`dict`

The false friend: PHP's array is NOT an array, it is an ordered hash table. It plays array, list, stack, queue, dictionary and set all by itself, which is why a PHP dev rarely codes the others. The catch is still the chapter's trap: array_unshift() reindexes everything, so O(n). And deque (Python) and SplDoublyLinkedList (PHP) are real linked lists: insertion at both ends in O(1).

3Recursion is for the programmer, not the machine

A recursive function is simply a function that calls itself. The idea surprises, but it exists to break a big problem into a smaller version of the same problem, down to a case so small you can answer it directly. The book reduces it to two mandatory ingredients:

the base case: the stopping condition, the smallest case you handle without calling the function again;
the recursive case: the function calls itself on a problem one notch smaller.

Forget the base case, and the function never stops: your countdown prints 3, 2, 1, 0, -1, -2… forever. On a factorial, it looks like this:

function factorial(n) {
  if (n === 1) return 1       // base case: we can answer, we stop
  return n * factorial(n - 1)  // recursive case: one notch smaller
}

But where are all these pending calls stored? On the call stack, which the book draws as a pile of sticky notes: each call sticks a note on top ("I still have to multiply by 3"), and when the base case answers, the notes come off one by one in reverse order. Unrolled on factorial(3):

Recursion is not magic: behind it, a stack builds up on the way in and empties on the way out. That stack is what overflows when it crashes.

For a web dev, recursion shines on everything nested by nature: walking a DOM tree, a thread of nested comments (a reply to a reply to a reply, like on Reddit), a deep JSON, a file tree. A plain loop struggles there; recursion matches the shape of the problem. Hence the quote that settles the eternal loop-versus-recursion debate, borrowed from Leigh Caldwell: "Loops may achieve a performance gain for your program. Recursion may achieve a performance gain for your programmer. Choose which is more important in your situation!" (p. 41).

4Divide and conquer: sorting without breaking a sweat

Divide and conquer (D&C) is a strategy, not one precise algorithm. It is the direct extension of recursion (idea 3), in three reflexes:

find the trivial case: the smallest problem you can solve without thinking (a list of 0 or 1 element is already sorted);
shrink toward it: cut the problem into smaller pieces of the same kind;
recombine the pieces' solutions.

The book first illustrates it with a 1,680 × 640 m farm plot to cut into the largest possible equal squares. You lay the biggest square that fits (640 × 640), a strip is left, you repeat on the strip, again and again, until a plot where the division is exact. Answer: 80 × 80 squares. You have just run Euclid's algorithm (the greatest-common-divisor computation, 2,300 years old) without knowing it.

But the use you meet every day is sorting. Quicksort applies the exact three reflexes: pick a random element, the pivot; put everything smaller on the left, everything larger on the right; then sort each side the same way. The trivial case: a side that is empty or has one element is already sorted, you stop.

The pivot cuts the problem into two smaller halves of the same kind. Gluing the sorted sides around the pivot is enough: not one extra comparison.

function quicksort(list) {
  if (list.length < 2) return list   // trivial case: 0 or 1 element, already sorted
  const pivot = list[0]
  const smaller = list.slice(1).filter(x => x <= pivot)
  const larger  = list.slice(1).filter(x => x >  pivot)
  return [...quicksort(smaller), pivot, ...quicksort(larger)]  // recombine
}

The trap, which the book dismantles with call stacks: if you always take the first element as pivot and the list is already sorted, each cut removes only one element, and quicksort degenerates into O(n²). The fix is one word: "if you always choose a random element as the pivot, quicksort completes in O(n log n) on average" (p. 71). In practice you will never write this code: your language's .sort() is already a finely tuned quicksort (or mergesort). But knowing what it does underneath is understanding why sorting costs O(n log n) and not O(n).

5The hash table is Maggie, who knows all the prices

The book introduces O(1) with a human. At the grocery store, looking up a price page by page is O(n); in a sorted book, O(log n) (idea 1's binary search). But your colleague Maggie has memorized everything: you say "milk?", she answers "$1.49" instantly, whether the catalog has 10 or 10,000 items. That is what O(1) means: the response time no longer depends on the size.

A hash table is Maggie in code. The secret is a hash function: a grinder that turns a key (text) into a slot number. You feed it "milk", it spits out, say, 3; you store the price in slot 3 of an array. To read it back, you run "milk" through the same grinder, it returns 3, and you go straight to slot 3. No walking: you compute the address instead of searching for it. Read, write, delete, all O(1) on average.

// you already use it every day, without seeing the hash function underneath
const prices = new Map()
prices.set('milk', 1.49)   // "milk" → internal slot → drop 1.49 there
prices.get('milk')        // "milk" → same slot → 1.49, in O(1)

"Hash tables are probably the most useful complex data structure you'll learn" (p. 78), and you meet them under a thousand names: Map and Set in JS, dict in Python, array in PHP. The book gives them three jobs:

lookups: mapping a key to a value (DNS, which translates web-developpeur.com into an IP address, is one giant hash table);
duplicate prevention: the voter who tries to vote twice is caught in O(1) (it is the Set that killed idea 1's O(n²) loop);
caching: memorize an expensive answer the first time so you never recompute it (Redis, OPcache, the browser's HTTP cache, all hash tables).

The fine print: two different keys can land on the same slot (a collision), then chained into a mini-list in that slot. Too many collisions and the O(1) degrades; that is why the engine watches the fill rate and grows the array when it passes ~70%.

A serene grocery clerk holds up a price tag with her eyes closed while a panicked colleague flips through an enormous price book in front of a line of amazed customers — Maggie answers in O(1), whatever the size of the catalog. Her colleague with the book is doing O(n).

6The mango seller and the piano: graphs without fear

A graph is just a way to model links: nodes (things) joined by edges (links between those things). A social network is a graph (people, friendships); so is a road map (cities, roads); so is the web (pages, links). As soon as you have "who is connected to whom", you have a graph, and two algorithms extract the essentials.

Breadth-first search (BFS) answers "what is the shortest path from A to B?". The book's example: you want a mango seller in your network. You check your direct friends first (ring 1), then their friends (ring 2), and so on, in ever-wider rings, using a simple queue (first in, first out). The first seller found is necessarily in the nearest ring: that is why "breadth-first search not only finds a path from A to B, it also finds the shortest" (p. 103), shortest in number of hops.

You explore ring by ring, never charging straight ahead. That is what guarantees the first result is the nearest.

One vital detail: mark people already checked. Peggy can show up via Alice AND via Bob; without marking, you recheck her endlessly and loop forever. (That "seen already?" is a Set, in O(1), idea 5.)

Dijkstra takes over when edges have a weight and a direction (A → B but not necessarily B → A — a directed graph). Hops stop being enough: on a map, the path with the fewest roads is not the shortest in kilometers. The book's example is a barter chain, from a music book up to a piano, via posters, LPs and drums, each trade costing a little money. Dijkstra always expands the cheapest node reached so far, and finds the $35 total path that a plain hop-count would miss. The limit, stated by the book: a negative weight breaks Dijkstra, "there's an algorithm for that! It's called the Bellman-Ford algorithm" (p. 130).

You use both without knowing: BFS whenever a site suggests "people you may know" (the friends-of-friends in ring 2), and Dijkstra on every Google Maps route (roads weighted by travel time).

7Greedy algorithms, and the wall of NP-complete problems

A greedy algorithm follows a dumb rule: at each step, take the best choice right now, without thinking about what comes next, and hope it adds up to the best overall result. You already do it: to give back 88 cents, you grab the biggest coin that fits (50), then the biggest that fits in what's left (20), and so on. That's greedy, and with our coins it always lands right.

The catch is that "lands right" is not guaranteed. Sometimes greedy gives the exact optimum (scheduling rooms: always pick the class that ends earliest leaves the most room for the next ones). Sometimes it is wrong, and the book proves it with its burglar: a knapsack that carries 35 lb, and three items:

a stereo worth $3,000, weighing 30 lb;
a laptop worth $2,000, 20 lb;
a guitar worth $1,500, 15 lb.

Greedy grabs the most expensive first, the stereo ($3,000, 30 lb). Only 5 lb left: nothing else fits. Loot: $3,000. Except laptop + guitar (20 + 15 = 35 lb exactly) was worth $3,500. The best local choice missed the best global one.

Then the book hits the wall: some problems explode no matter how clever you are. The travelling salesperson who must visit 10 cities by the shortest route faces 3,628,800 possible routes, and each added city multiplies that count. These problems have a name, NP-complete: no known method solves them fast as the size grows, we only know how to test an exploding number of combinations. The book's advice is liberating: for them, stop looking for the perfect solution, take a "good enough" greedy approximation and move on. Recognizing a wall and stopping optimizing is a skill in itself.

8Dynamic programming: a grid, not a formula

Dynamic programming (DP) is the clean answer to idea 7's greedy trap. The principle: instead of grabbing the best item in the moment, you break the big problem into a swarm of tiny sub-problems ("what's the best loot if I only had a 1 lb bag? a 2 lb bag? a 3 lb bag?"), solve each once, and store the answers in a grid you fill step by step. Each cell builds on cells already computed: no recompute, no missed combination.

Back to the burglar, but with a 4 lb knapsack and three items: a $1,500 guitar (1 lb), a $3,000 stereo (4 lb), a $2,000 laptop (3 lb). You draw a grid: one row per item, one column per bag capacity from 1 to 4 lb. Each cell answers one precise question: "with only the items seen up to this row, and a bag of this capacity, what's the best loot?". You fill it row by row, and the final answer emerges in the last cell, bottom right.

The whole of dynamic programming in twelve cells: sub-problems first, and the optimum emerges in the corner.

The honest warning that makes this the best DP introduction around: "there's no single formula for calculating a dynamic-programming solution" (p. 186). You design the grid per problem. The same grid technique then compares strings (longest common substring: the typo "hish" is closer to "fish" than to "vista"). Git diff uses DP to show file differences; spell checkers use a related measure, Levenshtein distance, to suggest corrections.

Three things I didn't know

The Fourier transform explained in one sentence the book borrows with glee: "given a smoothie, the Fourier transform will tell you the ingredients in the smoothie" (p. 207). It is how MP3 and JPG decide what to throw away.
Bloom filters can be wrong in only one direction: false positives possible (saying "yes" when the answer is "no"), false negatives impossible. Exactly what Google needs for "have I already crawled this URL?" at a fraction of the memory.
The farm-plot trick of chapter 4 is Euclid's algorithm, about 2,300 years old. The book never name-drops it, which is very much its style.

My take, honestly

I am the target audience: a web developer whose algorithm training happened on the job, with a faint guilt about it. This book removed the guilt in an afternoon. The doodles are not decoration, they are the explanation: the call stack as sticky notes and the DP grid filled cell by cell taught me more than any lecture would have.

The honest limits: it is a gateway, not a reference. No proofs, no amortized analysis, and the 1st edition's machine-learning chapter (k-nearest neighbors, KNN: classify a data point by the examples it most resembles) reads like a postcard from 2016, written before deep learning ate that world.

The 1st edition's Python 2 also shows its age, which the 2nd edition fixes along with adding the missing chapter on trees.

If sorting algorithms ever made you feel stupid, this is the antidote. Read it in a weekend, then let the heavier books make sense.

Odilon

Still relevant in 2026?

More than ever, with a twist: AI assistants now write the quicksorts for you, but they also confidently produce O(n²) code that works fine on 100 items and dies on a million. Recognizing growth rates in generated code is exactly the skill this book installs, and you cannot delegate it. The 2nd edition (2024) keeps the material current: Python 3, tree chapters, a rewritten NP-hard discussion. And the interview industry has not budged: hash tables, BFS and Big O remain the toll you pay at every door.

Who is it for?

Read it if

You are self-taught and the word "algorithm" still triggers mild dread
You have interviews coming: this is the fastest respectable preparation ramp
You review AI-generated code and want to spot the accidental O(n²)
You teach or mentor juniors: the pedagogy is worth stealing

Skip it if

You have a CS degree: you know all of this, in more rigorous form
You want proofs and amortized analysis: this book deliberately has none
You need advanced material: it stops where CLRS begins

Going further

The book's code is Python: my Python course gets you fluent enough to run every example. In the library, Python Crash Course shares the same gentle-first-book spirit, Designing Data-Intensive Applications shows what these structures become at planetary scale, and Write Great Code vol. 1 explains the memory hierarchy hiding behind every "constant factor".

Grokking Algorithms