Sharebox started from a simple frustration: Plex and Jellyfin are overkill when you just want to send someone a movie link. The project took shape over a few intense days of work, with an AI as a permanent pair. Not vibe coding, not blind delegation — something in between, and quite different from both.
What I learned about this way of working doesn't match what you usually read. Not the naive enthusiasm ("AI codes for you"), nor the performative scepticism ("it's worthless"). The reality is more nuanced, and very visible in the git history.
A project, not a prompt
The stack choice was deliberate and contrarian: pure PHP 8.1, SQLite in WAL mode, zero external dependencies. No framework, no composer at the start. The AI pushed for Laravel several times. I said no.
That's where real pair programming begins. The AI optimises for what it's seen most in training data: Laravel projects, ORMs, abstraction layers. It's right 80% of the time. For a self-hosted file sharing tool where the main constraint is "no system requirements except PHP and SQLite", it's wrong.
The result: a SQLite database created automatically on first launch, a config file auto-generated from environment variables, Docker-ready without it being the only deployment option. The AI contributed to implementing these choices once they were set — but the choices themselves, I made those.
AI pair programming only works if you know where you're going. The AI is an excellent executor and a decent devil's advocate. It's not an architect.
What it changes in practice
Three real problems where the AI pair genuinely accelerated things.
The three streaming modes
Sharebox streams video. The problem: an H.264/AAC file in an MKV container can't be streamed directly in a browser. An H.265 file can't be played on iOS. A file with image-based subtitles (PGS) can't be displayed natively.
The solution: three modes. Native for files that work as-is (MP4/H.264, zero CPU). Remux for container changes without re-encoding (MKV → MP4, minimal CPU). Transcode for everything else (ffmpeg, high CPU).
The AI produced the codec detection code via ffprobe quickly. What would have taken a day of documentation reading took two hours. The decision logic between the three modes — I wrote that, after challenging every AI proposal that mixed up the cases.
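The decision logic can be sketched as a small pure function. This is my illustration of the three-way split described above, not Sharebox's actual code; the field names mirror what ffprobe reports with `-print_format json -show_streams`, and the codec lists are a simplified assumption.

```php
<?php
// Illustrative sketch of the native / remux / transcode decision.
// Input fields would come from something like:
//   ffprobe -v quiet -print_format json -show_streams -show_format file.mkv

// Codecs a browser can typically play as-is in an MP4 container
// (simplified assumption for the sketch).
const DIRECT_VIDEO = ['h264'];
const DIRECT_AUDIO = ['aac', 'mp3'];

function pickStreamingMode(array $info): string
{
    $videoOk = in_array($info['video_codec'] ?? '', DIRECT_VIDEO, true);
    $audioOk = in_array($info['audio_codec'] ?? '', DIRECT_AUDIO, true);
    $pgsSubs = ($info['subtitle_codec'] ?? '') === 'hdmv_pgs_subtitle';

    // Anything the browser can't decode, or image-based subtitles,
    // forces a full re-encode: high CPU.
    if (!$videoOk || !$audioOk || $pgsSubs) {
        return 'transcode';
    }
    // Right codecs, right container: serve the bytes as-is, zero CPU.
    if (($info['container'] ?? '') === 'mp4') {
        return 'native';
    }
    // Right codecs, wrong container (e.g. MKV): repackage only, minimal CPU.
    return 'remux';
}

// Example: H.264/AAC inside an MKV only needs its container changed.
echo pickStreamingMode([
    'video_codec'    => 'h264',
    'audio_codec'    => 'aac',
    'subtitle_codec' => '',
    'container'      => 'mkv',
]); // remux
```

Keeping this as a pure function over probed metadata is also what makes the three cases easy to challenge one by one, which is exactly where the AI's proposals kept blurring the lines.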
File-locking for FPM workers
PHP-FPM creates parallel workers. When a browser opens two tabs or an iOS player makes parallel range requests, multiple workers can simultaneously start the same ffmpeg process. The result: 800% CPU, corrupted output files.
I described the problem. The AI proposed a Redis mutex. I said no — Redis is a dependency, we stay dependency-free. It proposed a file lock. We implemented it together: a .lock file per segment, flock() with LOCK_EX | LOCK_NB, timeout if the lock isn't acquired within 30 seconds.
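The mechanism looks roughly like this. A minimal sketch, with invented function names; the key detail is `LOCK_NB`, which makes `flock()` fail immediately instead of blocking, so a worker can poll with a deadline rather than pile up behind the lock.

```php
<?php
// Illustrative sketch of a per-segment file lock for PHP-FPM workers.
// Only one worker may spawn ffmpeg for a given segment; the others
// retry until a timeout, then give up (e.g. serve a 503).

function acquireSegmentLock(string $lockPath, int $timeoutSeconds = 30)
{
    // 'c' = create the file if missing, never truncate an existing one.
    $fh = fopen($lockPath, 'c');
    if ($fh === false) {
        return null;
    }
    $deadline = time() + $timeoutSeconds;
    // LOCK_EX | LOCK_NB: exclusive lock, non-blocking attempt.
    while (!flock($fh, LOCK_EX | LOCK_NB)) {
        if (time() >= $deadline) {
            fclose($fh);
            return null;              // another worker kept the lock too long
        }
        usleep(200_000);              // retry every 200 ms
    }
    return $fh;                       // keep this handle open while ffmpeg runs
}

function releaseSegmentLock($fh): void
{
    flock($fh, LOCK_UN);
    fclose($fh);
}
```

The lock lives and dies with the file handle, so even a worker that crashes mid-transcode releases it when the process exits — one reason `flock()` beats a hand-rolled "PID in a file" scheme here.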
The pair worked here because concurrency problems are exactly where AI excels — it's a known, documented pattern with proven solutions. My role was to constrain the solution to the project's rules.
Subtitle caching
Extracting subtitles from an MKV with ffmpeg takes 15–20 seconds for a 2-hour file. Unacceptable at play time. The solution: background extraction on first access, disk cache afterwards.
The AI wrote the cache mechanism. Then we discovered a bug: the browser was caching the empty HTTP response (during extraction), and subsequent requests returned empty even after extraction completed. The AI proposed Cache-Control: no-cache headers. Correct. The final gain: 20 seconds of latency eliminated on subsequent plays.
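The flow, including the fix, can be sketched like this. Names and status codes are my illustration, not Sharebox's actual code; the point is that the "not ready yet" branch must carry `Cache-Control: no-cache` or the browser keeps the empty response forever.

```php
<?php
// Illustrative sketch of the subtitle cache flow. Returns
// [status, headers, body] instead of emitting them, to keep it testable.

function subtitleResponse(string $cachePath): array
{
    // Cache hit: extraction finished earlier, serve it and let the
    // browser cache the result.
    if (is_file($cachePath) && filesize($cachePath) > 0) {
        return [
            200,
            ['Content-Type' => 'text/vtt', 'Cache-Control' => 'public, max-age=86400'],
            file_get_contents($cachePath),
        ];
    }

    // Cache miss: a detached background job would start here, roughly
    //   ffmpeg -i movie.mkv -map 0:s:0 cache.vtt
    // Meanwhile answer "come back later" and forbid caching the empty
    // response -- this header is the bug fix described above.
    return [
        202,
        ['Cache-Control' => 'no-cache, no-store'],
        '',
    ];
}
```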
The limits nobody talks about
AI pair programming has blind spots. Not the obvious bugs — the AI often fixes those faster than I do. The structural blind spots.
Project context doesn't persist. Every session, the AI starts from scratch. It doesn't know you decided not to add an authentication system. It doesn't know deployment must fit on a €3 VPS. It will propose JWTs, refresh tokens, Redis for sessions. You have to re-explain constraints every time — or use a well-maintained CLAUDE.md.
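What goes into such a file is exactly the list of constraints you keep repeating. A hypothetical sketch, its contents invented here as an illustration of the idea:

```markdown
# Project constraints (read before proposing anything)

- Pure PHP 8.1 + SQLite in WAL mode. No Composer packages, no Redis, no framework.
- Must run on a €3 VPS. Docker is one deployment option, never a requirement.
- No authentication system. Share links are the access model.
- Before proposing a new feature, ask whether it belongs in scope at all.
```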
It doesn't maintain scope discipline. This is the most insidious problem. "We could also add..." is a phrase the AI uses often. A notification system. A full REST API. A playlist system. Each suggestion is reasonable in isolation. Together, they turn a simple tool into a complexity monster. Saying no regularly is a skill in itself in this way of working.
It agrees too easily. If you propose a flawed approach, it'll implement it cleanly. A human pair would have said "wait, are you sure?" The AI by default says "here's the implementation." You have to explicitly ask it to challenge your choices.
The pattern that works
After several sessions on Sharebox, the pattern that produces the best results is this:
Pair review, not delegation. "Write me the streaming function" gives something that works but doesn't respect the project's constraints. "Here's my implementation, what's problematic?" gives something useful.
Describe the problem, not the solution. "I have a concurrency problem on FPM workers when multiple requests arrive simultaneously" is far better than "implement a mutex." The second formulation gives a generic answer. The first triggers real thinking about the context.
Never accept the first solution. Not because it's bad — it's often correct. But because there's almost always a constraint the AI hasn't accounted for. Asking "what are the alternatives?" and "what are the limits of this approach?" systematically reveals something useful.
Keep control of git. Every commit is a decision. The AI doesn't commit — you do. This minimal friction is healthy: it forces you to read what's going into the repo rather than merging blindly.
What the git log reveals
Sharebox's recent commit history shows something interesting: the latest commits are almost all corrections and hardening, not features.
Sanitising filenames in Content-Disposition to prevent header injection. Fixing an iOS hoisting bug where browser detection was called before initialisation. Fixing subtitle cache poisoning caused by browser caching. Fixing duplicate ffmpeg processes spawned by parallel requests.
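That first fix is worth a sketch. A filename containing quotes or CR/LF bytes can break the header or inject new ones; the usual defence, following RFC 6266, is a stripped ASCII fallback plus a percent-encoded UTF-8 `filename*` parameter. This is my illustration of the technique, not Sharebox's exact code, and it assumes filenames are valid UTF-8.

```php
<?php
// Illustrative Content-Disposition hardening: strip control characters
// and path separators, provide a pure-ASCII fallback, and carry the
// real name in RFC 6266's filename* form.

function contentDispositionFor(string $filename): string
{
    // Remove control characters (incl. CR/LF) and path separators.
    $clean = preg_replace('/[\x00-\x1F\/\\\\]/', '', $filename);

    // ASCII fallback: replace non-ASCII characters (/u assumes valid
    // UTF-8 input) and anything that could break the quoted string.
    $ascii = preg_replace('/[^\x20-\x7E]/u', '_', $clean);
    $ascii = str_replace(['"', ';'], '_', $ascii);

    return sprintf(
        'Content-Disposition: attachment; filename="%s"; filename*=UTF-8\'\'%s',
        $ascii,
        rawurlencode($clean)
    );
}
```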
That's the reality of AI pair programming on a project built over a few days: the iteration speed is real. You reach a first working state quickly. Then the consolidation work begins — and there, the AI is less useful than during the build phase. Timing bugs, race conditions, browser edge cases: the AI proposes solutions, but diagnosing the actual problem often requires careful log reading that the AI can't do for you.
The refactoring is visible too: over 1000 lines of JS/CSS extracted into separate files. That cleanup, I wanted it. The AI executed it well, but left to itself it would never have initiated it: it would have kept adding code to the same file without complaint.
Conclusion
Sharebox works. It runs on a NAS, serves movies, handles subtitles, survives parallel requests. Building it took a fraction of the time it would have taken me working alone.
What AI pair programming changed is iteration speed on known problems. Video streaming, concurrency handling, codec detection: documented domains where AI is operational fast. What it didn't change: scope decisions, architecture, the discipline not to over-engineer. Those decisions remain human, and they have direct impact on the quality of the result.
The real gain isn't "AI codes for me." It's "I have a pair available 24/7 who knows most patterns, doesn't get tired, and can implement what I validate." That's different. And it's already a lot.