I Rebuilt My Courses on the Science of Learning (2026)

I had already built the courses: interactive and bilingual, with a live code editor, quizzes, and diagrams. I was rather proud of them. Then I read an MIT EEG study on cognitive debt, and another on the widening gap between learners in the age of AI. They made me uneasy, because they described a trap my own courses might be falling into: giving the illusion of learning while anchoring nothing for the long term.

So I reworked everything. Here is what I changed, why, and how I rolled it out across 109 lessons without spending three months on it. It's a retrospective that's as pedagogical as it is technical. (I cover the science itself in a dedicated article.)

The problem my courses couldn't see

My lessons were smooth. You read a clear explanation, watched a working example, ticked a multiple-choice quiz, and moved on. Pleasant. And that's exactly the trap. The science of learning is clear: felt fluency is not learning. The easier it feels in the moment, the less it stays.

My quizzes came right after the theory, so they tested working memory, not durable memory. My editors showed the solution before the learner had made the effort to find it. My multiple-choice questions tested recognition (pick the right answer), not production (write it from memory). And nothing brought concepts back over time. In short, I was sanding away every friction, when the good frictions are precisely what carves things in.

Five mechanisms, five desirable difficulties

Robert Bjork calls them desirable difficulties: deliberate efforts that make learning harder in the moment and far more durable afterward. I implemented five, each wired into the right moment of a lesson.

Each mechanism targets a precise moment: predict, produce, re-explain, judge, then come back.

1. Predict before running. Before running a snippet, the preview is blurred and the learner must write what they think it will print. That's the generation effect: attempting before seeing primes the brain to encode, even when you're wrong.

2. Free recall. At the end of a lesson, a question you answer from memory, without looking, then self-assess honestly. Retrieving information carves it in far more than rereading it.

3. Re-explaining after the AI. After each box where the learner queries an AI, we ask them to re-explain in their own words. That closes the loop: checking you understood the answer, not just read it.

4. The accept / reject journal. We show a snippet produced by the AI, sometimes subtly flawed, sometimes fine. The learner decides and justifies, then compares with a senior dev's take. That's negative expertise: learning to say no to generated code.

5. Spaced review. Self-assessed concepts return in a review hub, resurfacing at growing intervals, right when you are about to forget them.
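
To make the first mechanism concrete, here is a minimal sketch of a "predict before running" gate, written as plain state so it can run outside a browser. Every name in it (PredictGate, submitPrediction, predictionMatches) is an assumption for illustration, not the components' actual code. The preview stays locked until the learner commits to a non-empty guess.

```typescript
// Hypothetical state for one "predict before running" block; names are illustrative.
interface PredictGate {
  prediction: string | null; // what the learner wrote, null until submitted
  previewUnlocked: boolean;  // the preview stays blurred while this is false
}

const lockedGate: PredictGate = { prediction: null, previewUnlocked: false };

// Unlock only once the learner has committed to a non-empty guess.
function submitPrediction(gate: PredictGate, text: string): PredictGate {
  const trimmed = text.trim();
  if (trimmed.length === 0) return gate; // no attempt, no reveal
  return { prediction: trimmed, previewUnlocked: true };
}

// Compare the guess with the real output, ignoring surrounding whitespace.
// A wrong guess still unlocks the preview: the attempt matters, not the score.
function predictionMatches(gate: PredictGate, actualOutput: string): boolean {
  return gate.prediction !== null && gate.prediction === actualOutput.trim();
}
```

In a real page, the previewUnlocked flag would presumably drive the blur on the preview, and predictionMatches would only decide which feedback message to show.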

A spaced-repetition hub with no backend

The biggest piece is the fifth. Real spaced repetition needs to remember each notion's state: when it was seen, its interval, its next due date. With no database and no user account, I put it all in the browser's localStorage.

The key was freezing a data contract: each review card is self-contained. When the learner self-assesses, we store not only the verdict, but the question and answer in both languages, the link to the lesson, and the scheduling state. The hub can then replay the card outside its original lesson, entirely client-side. The accepted trade-off: history stays on the device. For a free, no-signup path, that's the right compromise.
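
As a rough illustration of such a contract, the sketch below stores self-contained cards behind a small storage interface. The field names, the storage key, the three verdict values, and the interval rule (double on a confident recall, hold when shaky, reset to one day when forgotten) are all assumptions made for the example; the actual schema and scheduling rule may differ.

```typescript
// All names below are illustrative assumptions, not the site's actual schema.
type Verdict = "known" | "shaky" | "forgotten";

interface Bilingual {
  fr: string;
  en: string;
}

// One self-contained card: enough to replay it outside its lesson.
interface ReviewCard {
  id: string;
  lessonUrl: string;
  question: Bilingual;
  answer: Bilingual;
  verdict: Verdict;
  intervalDays: number; // current gap between two reviews
  lastSeen: string;     // ISO 8601 timestamp
  dueAt: string;        // ISO 8601 timestamp
}

// The content part of a card, before any scheduling state exists.
type CardContent = Pick<ReviewCard, "id" | "lessonUrl" | "question" | "answer">;

// Subset of the Web Storage API; window.localStorage satisfies it in a browser.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const STORAGE_KEY = "review-cards"; // hypothetical key
const DAY_MS = 24 * 60 * 60 * 1000;

// Assumed rule: double after a confident recall, hold when shaky, reset when forgotten.
function nextIntervalDays(previousDays: number, verdict: Verdict): number {
  if (verdict === "forgotten") return 1;
  if (verdict === "shaky") return Math.max(1, previousDays);
  return Math.max(1, previousDays * 2);
}

function loadCards(store: KeyValueStore): ReviewCard[] {
  const raw = store.getItem(STORAGE_KEY);
  if (raw === null) return [];
  try {
    const parsed: unknown = JSON.parse(raw);
    return Array.isArray(parsed) ? (parsed as ReviewCard[]) : [];
  } catch {
    return []; // corrupted storage: start over rather than crash the hub
  }
}

// Save the verdict together with the card content and its new schedule.
function recordVerdict(
  store: KeyValueStore,
  content: CardContent,
  verdict: Verdict,
  now: Date,
): ReviewCard {
  const cards = loadCards(store);
  const previous = cards.filter((c) => c.id === content.id)[0];
  const intervalDays = nextIntervalDays(previous ? previous.intervalDays : 0, verdict);
  const updated: ReviewCard = {
    id: content.id,
    lessonUrl: content.lessonUrl,
    question: content.question,
    answer: content.answer,
    verdict: verdict,
    intervalDays: intervalDays,
    lastSeen: now.toISOString(),
    dueAt: new Date(now.getTime() + intervalDays * DAY_MS).toISOString(),
  };
  const others = cards.filter((c) => c.id !== content.id);
  store.setItem(STORAGE_KEY, JSON.stringify(others.concat([updated])));
  return updated;
}

// Cards the review hub should show at the given moment.
function dueCards(store: KeyValueStore, now: Date): ReviewCard[] {
  return loadCards(store).filter((c) => new Date(c.dueAt).getTime() <= now.getTime());
}

// In-memory stand-in for localStorage, handy for tests outside a browser.
function createMemoryStore(): KeyValueStore {
  const data: { [key: string]: string } = {};
  return {
    getItem: (key) => (Object.prototype.hasOwnProperty.call(data, key) ? data[key] : null),
    setItem: (key, value) => { data[key] = value; },
  };
}
```

In a browser, passing window.localStorage as the store should work, since it exposes getItem and setItem with these signatures. Because each stored card carries its own question, answer, and lesson link, the hub only needs dueCards to rebuild a review session.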

Industrializing across 109 lessons with agents

Adding these blocks by hand across 109 lessons, in two languages, with content specific to each topic, would have been weeks of work. I first validated the components on one pilot lesson, then orchestrated the rest with AI agents: one agent per lesson, each reading the pilot lesson as a reference, applying a relevance rubric, and writing content adapted to the topic.

The golden rule: don't slap all five mechanisms everywhere by reflex. Free recall almost always; a "predict" only if there's a runnable editor; a re-explanation only if there's an AI box; a journal only if a judgeable snippet exists within scope. Two excellent components beat four lukewarm ones.
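
The rubric above can be sketched as a small pure function. The type and field names (LessonFeatures, hasRunnableEditor, and so on) are hypothetical, and the sketch simplifies "almost always" into "always" for free recall. Spaced review is left out because it follows from self-assessment rather than from a per-lesson block.

```typescript
// Hypothetical names; the real rubric given to the agents is not shown in this article.
type Mechanism = "freeRecall" | "predict" | "reExplain" | "journal";

interface LessonFeatures {
  hasRunnableEditor: boolean;   // is there a snippet the learner can run?
  hasAiBox: boolean;            // does the lesson include an "ask the AI" box?
  hasJudgeableSnippet: boolean; // is there an in-scope snippet worth accepting or rejecting?
}

// Return only the mechanisms that make sense for this lesson.
function applicableMechanisms(lesson: LessonFeatures): Mechanism[] {
  const result: Mechanism[] = ["freeRecall"]; // simplification: always included here
  if (lesson.hasRunnableEditor) result.push("predict");
  if (lesson.hasAiBox) result.push("reExplain");
  if (lesson.hasJudgeableSnippet) result.push("journal");
  return result;
}
```

For a lesson with no runnable editor and no AI box but a judgeable command, this returns free recall plus the journal: two components rather than four.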

The result surprised me with its contextual accuracy. On a Git lesson, the journal doesn't judge "code" but a command or a commit. On a SQL lesson, it flags a DELETE without a WHERE. On PHP, a concatenated query vulnerable to injection. Each agent adapted the snippet to the exact topic of its lesson. The irony wasn't lost on me: I used AI to industrialize courses that teach, precisely, not to delegate to it blindly.

What the tests caught

I verified everything in production with automated tests that really interact with the site: the prediction unlocking the preview, the answer revealing itself, the verdict persisting after a reload, French / English parity, zero console errors. In total, more than 230 green tests across all the courses.

And they caught real traps, all of the same sneaky kind: a hidden attribute overridden by a display: flex rule, so a zone showed up too early. It's a detail invisible when reading the code, obvious in a test. An accessibility audit also revealed an unlabeled editor and insufficient color contrast, both fixed right away. The meta-lesson: behavior tests say nothing about whether a color is readable. You also have to look.
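
The hidden trap happens because browsers implement the hidden attribute with a default display: none rule, which any author rule setting display: flex on the same element overrides. A check for it might look like the sketch below. The function name and the injected getDisplay callback are assumptions chosen so the logic can run without a DOM; this is not the actual test suite.

```typescript
// Minimal shape needed from an element; HTMLElement has a boolean `hidden` property.
interface HideableElement {
  hidden: boolean;
}

// Return the elements marked hidden whose computed display still makes them visible.
// getDisplay is injected so the check can run in a browser or against fakes.
function findLeakedHiddenZones<T extends HideableElement>(
  elements: T[],
  getDisplay: (el: T) => string,
): T[] {
  return elements.filter((el) => el.hidden && getDisplay(el) !== "none");
}

// Possible browser usage (not executed here):
//   const zones = Array.prototype.slice.call(document.querySelectorAll("[hidden]"));
//   const leaks = findLeakedHiddenZones(zones, (el) => getComputedStyle(el).display);
```

An empty result means every zone marked hidden is actually hidden. A common fix when the check fails is an author rule that restores display: none for elements carrying the hidden attribute.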

Conclusion

The hardest part wasn't writing the components' code. It was accepting that my courses, which I thought were good, were too comfortable to actually teach. Perceived quality and real effectiveness are two different things, and only the second one counts.

The real question remains, the one no test settles: will people come back to review? Spaced repetition is only worth it if you return. That's why the final piece isn't an algorithm, but a simple "to review" badge reminding you that a concept is waiting. The best learning engine is useless if you forget to start it.