The first real test of ShareBox
with an actual file was an 8 GB MKV — HEVC video, DTS audio, PGS subtitles — ripped from
a Japanese Blu-ray. The bare <video src="..."> I had put in place opened,
spun for two seconds, and stopped in complete silence. No error event,
no console message. Just a spinner going nowhere.
ShareBox started from a simple premise — share large files without a third-party cloud.
Video streaming was supposed to be one feature among others. Except real-world files are
not clean H.264/AAC MP4s: there are MKVs, HEVC encodes, Dolby audio tracks, bitmap
subtitles. The native browser player gives up without a word. So I built a full player:
on the server side, download.php orchestrates three streaming modes, two of them ffmpeg-based.
On the client side, a JS state machine handles mode selection, stall recovery,
image subtitle burn-in, and UX. This post documents what I built — and where it cost
more time than expected.
Three server-side streaming modes
The entry point is download.php?stream=MODE. Three modes are available,
selected dynamically by the JS based on the detected codec.
native — X-Accel-Redirect
For files already playable by the browser (MP4 H.264 + AAC), we delegate to nginx via
X-Accel-Redirect. PHP never touches the bytes, no ffmpeg, no CPU cost.
HTTP byte-range works normally, seeking is instant.
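On the nginx side, this relies on an internal location that maps the redirect path to the storage directory. A minimal sketch — the location name and storage path here are assumptions, not ShareBox's actual configuration:

```nginx
# Illustrative names — adjust to the real storage layout
location /protected-files/ {
    internal;                     # reachable only via X-Accel-Redirect, never directly
    alias /var/sharebox/storage/; # real files live here
}
```

PHP then emits a header('X-Accel-Redirect: /protected-files/' . $filename) and returns immediately; nginx serves the bytes with full Range support.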
remux — zero-cost repackaging
An H.264 MKV cannot be streamed natively by the browser, but the codecs are compatible. The solution: repackage on the fly into a fragmented MP4 without re-encoding.
ffmpeg -i input.mkv \
  -c:v copy -c:a aac \
  -movflags frag_keyframe+empty_moov+default_base_moof \
  -min_frag_duration 300000 \
  -f mp4 pipe:1
-c:v copy: zero video re-encoding. -c:a aac: audio is converted
if needed (DTS, AC3 → AAC). The result is a fragmented MP4 piped directly into the HTTP
response. CPU cost: near zero for video, a few percent for audio conversion.
The flags frag_keyframe+empty_moov+default_base_moof are critical. Without
empty_moov, the browser waits for the end of the file to read the moov atom
(metadata) — which blocks indefinitely on a pipe. Without frag_keyframe,
fragments don't start on keyframes and seeking breaks.
A pitfall that cost me an hour: I had added -fflags +genpts and
first_pts=0 to normalize timestamps. The reasoning seemed sound — some MKVs
have PTS values that don't start at zero, and I wanted to avoid surprises during seeking.
In practice, these options modify PTS in a way that browsers misinterpret on a fragmented
stream: video and audio gradually desync after each seek. Removed, the problem disappeared
immediately.
Normalizing timestamps on a fragmented pipe means "fixing" something the browser already handles fine — and introducing a desync it has no way to correct.
transcode — for incompatible codecs
Unsupported HEVC, VP9, AV1, or Dolby audio that won't pass through: we re-encode.
ffmpeg -i input.mkv \
  -c:v libx264 -preset ultrafast -crf 23 \
  -c:a aac \
  -movflags frag_keyframe+empty_moov+default_base_moof \
  -min_frag_duration 300000 \
  -f mp4 pipe:1
-preset ultrafast sacrifices compression to minimize startup latency.
-crf 23 gives acceptable quality without blowing up the bitrate.
The video is playable within a few seconds, not after a full encode pass.
Concurrency semaphore and clean disconnect
Each ffmpeg process consumes CPU for the entire duration of the stream. Without a limit,
ten concurrent downloads mean ten ffmpeg processes fighting over CPU cores. I added a
PHP semaphore (via sem_acquire) capped at 4 concurrent processes.
$sem = sem_get(ftok(__FILE__, 'f'), 4);
sem_acquire($sem); // blocks until one of the 4 slots frees up

// $cmd is the ffmpeg command built for the selected mode
$proc = proc_open($cmd, [1 => ['pipe', 'w']], $pipes);
$stdout = $pipes[1];

register_shutdown_function(function () use ($sem, $proc) {
    if (is_resource($proc)) {
        proc_terminate($proc); // kill ffmpeg if the client vanished mid-stream
    }
    sem_release($sem);
});

// Streaming loop
while (!feof($stdout) && !connection_aborted()) {
    echo fread($stdout, 65536);
    flush();
}
The connection_aborted() check inside the loop is essential: when the client
closes the tab, PHP detects the disconnect, exits the loop, the shutdown function kills the
ffmpeg process, and releases the semaphore slot. Without this, orphan ffmpeg processes
pile up until reboot.
Infrastructure adjustment: pm.max_children in PHP-FPM was bumped from 5 to 25.
Each active stream monopolizes one FPM worker for its entire duration — unavoidable with
synchronous streaming. With 5 workers, the 6th visitor waits in the dark.
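For reference, the pool setting lives in the PHP-FPM pool file — a fragment with illustrative values (the path and pm mode vary by distro and setup):

```ini
; /etc/php/*/fpm/pool.d/www.conf — illustrative fragment
pm = dynamic
pm.max_children = 25   ; was 5: one worker per active stream, plus headroom
```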
SQLite ffprobe cache
Before choosing the streaming mode, the JS needs to know the file's codec.
ffprobe answers that, but takes 2 to 12 seconds on a large file
(disk access, header parsing). My first version did trial-and-error detection on the
JS side: try native, wait 2 seconds, if nothing plays → try remux. In practice, 2 seconds
is a long wait, and it generates false positives on slow connections. The right solution:
know the codec before starting anything.
SQLite cache keyed by (path, mtime). The key includes the file's
mtime to automatically invalidate the cache if the file is replaced.
Cache hit: ~100ms. Cold: 2-12s (once per file, never again).
function probeVideo(string $path): array {
    $db = new PDO('sqlite:' . PROBE_CACHE_DB);
    $db->exec("CREATE TABLE IF NOT EXISTS probe_cache (key TEXT PRIMARY KEY, data TEXT)");

    // Cache key: path + mtime, so replacing the file invalidates the entry
    $key = hash('xxh64', $path . '|' . filemtime($path));

    $stmt = $db->prepare("SELECT data FROM probe_cache WHERE key = ?");
    $stmt->execute([$key]);
    if ($row = $stmt->fetch()) {
        return json_decode($row['data'], true);
    }

    $cmd = 'ffprobe -v quiet -print_format json -show_streams ' . escapeshellarg($path);
    $result = json_decode(shell_exec($cmd), true);

    $db->prepare("INSERT OR REPLACE INTO probe_cache (key, data) VALUES (?, ?)")
       ->execute([$key, json_encode($result)]);
    return $result;
}
The JS state machine
The client-side player is built around a state object S that evolves
as video events fire. No framework, no external state manager — just a mutable object
watched by targeted event listeners.
const S = {
  step: 'native',     // mode currently being tried
  confirmed: null,    // confirmed working mode (skip re-test on seek)
  stallCount: 0,      // watchdog retry counter
  seekPending: false, // seek in progress (debounce)
};
Probe-driven mode selection
chooseModeFromProbe(streams) inspects the streams returned by
ffprobe and picks the mode:
function chooseModeFromProbe(streams) {
  const video = streams.find(s => s.codec_type === 'video');
  const audio = streams.find(s => s.codec_type === 'audio');
  const vcodec = video?.codec_name;
  const acodec = audio?.codec_name;
  const container = currentFile.ext.toLowerCase();
  if (vcodec === 'h264' && container === 'mkv') return 'remux';
  if (vcodec === 'h264' && container === 'mp4' && acodec === 'aac') return 'native';
  if (vcodec === 'h264' && container === 'mp4') return 'transcode';
  // VP9, AV1, HEVC: try native if the browser claims support
  const probe = document.createElement('video').canPlayType(mimeFor(vcodec));
  return probe !== '' ? 'native' : 'transcode';
}
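mimeFor isn't shown in this post; a plausible minimal version looks like this — the codec parameter strings are assumptions, and real-world matching may require exact profile/level strings extracted from the probe:

```javascript
// Hypothetical helper: maps an ffprobe codec_name to a MIME string
// usable with canPlayType(). The codecs= parameters are approximate
// examples, not derived from the actual file.
function mimeFor(vcodec) {
  const table = {
    h264: 'video/mp4; codecs="avc1.42E01E"',
    hevc: 'video/mp4; codecs="hvc1"',
    vp9:  'video/webm; codecs="vp9"',
    av1:  'video/mp4; codecs="av01.0.04M.08"',
  };
  return table[vcodec] ?? 'video/mp4'; // unknown codec: let the browser decide
}
```

canPlayType returns '', 'maybe', or 'probably'; chooseModeFromProbe only checks for the empty string, so an optimistic 'maybe' still routes to native — which is exactly why the silent-failure fallback below exists.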
Silent HEVC hardware decode failure
Safari, and Chrome on Windows, sometimes claim HEVC support via canPlayType().
When hardware decoding silently fails on these browsers, it does so with extraordinary
discretion: no error event, no stalled event, nothing in the
console. Audio starts playing normally. The image freezes on the first frame — or worse,
stays black. video.videoWidth stays at 0, and that's the only available
signal.
setTimeout(() => {
  if (video.videoWidth === 0 && S.step === 'native') {
    console.warn('Silent HEVC failure → cascade to transcode');
    S.step = 'transcode';
    reloadStream();
  }
}, 1500);
A codec that "supports" a format but fails to decode silently is exactly as useful as one that doesn't support it — except it's much harder to detect.
Stall watchdog with exponential backoff
My first watchdog implementation used a fixed timeout per mode. The problem surfaced immediately on large HEVC files: ffmpeg takes several seconds to start a transcode (parsing, memory allocation, producing the first keyframe). The watchdog would lose patience and relaunch the stream. Two concurrent ffmpeg processes started up, competed for CPU cores, and slowed each other down. The semaphore saturated on the third retry. The user saw a spinner forever with no explanation — while on the server side, zombie ffmpeg processes queued up for the semaphore's four slots.
Exponential backoff per mode fixed that: the first retry waits 20s in transcode mode, the second 40s, capped at 120s. A legitimately slow-starting ffmpeg has time to produce its first fragments before the watchdog panics.
const BASE_TIMEOUTS = { native: 5, remux: 10, transcode: 20, burnSub: 30 };
let stallTimer = null;

function startWatchdog() {
  clearTimeout(stallTimer);
  const base = BASE_TIMEOUTS[S.confirmed ?? S.step] ?? 15;
  const timeout = Math.min(base * Math.pow(2, S.stallCount), 120) * 1000;
  stallTimer = setTimeout(() => {
    S.stallCount++;
    console.warn(`Stall #${S.stallCount}, retrying from ${video.currentTime}s`);
    reloadStream(video.currentTime);
  }, timeout);
}

video.addEventListener('timeupdate', () => startWatchdog());
video.addEventListener('waiting', () => startWatchdog());
The watchdog resets on every timeupdate (active playback). If it fires,
we restart the stream from the current position. stallCount doubles the
delay on each retry, capped at 120s.
Subtitles: text and image tracks
Text subtitles (SRT/ASS) via WebVTT
Text subtitles are extracted by ffmpeg as WebVTT on demand and served as plain text.
On the JS side, I don't use the native <track> element:
positioning is too limited and ASS rendering is non-existent.
I use an absolutely-positioned <div> overlay, positioned via
getBoundingClientRect with an offset for the control bar.
For files with thousands of cues, finding the active cue on each timeupdate
would be O(n). I use a binary search for the initial position, then a pointer that
advances linearly — O(log n) on the first seek, O(1) afterwards.
function findCueIndex(cues, time) {
  let lo = 0, hi = cues.length - 1;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (cues[mid].end < time) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

let cuePtr = 0;
video.addEventListener('timeupdate', () => {
  const t = video.currentTime;
  while (cuePtr < cues.length - 1 && cues[cuePtr].end < t) cuePtr++;
  if (cues[cuePtr].start > t + 1) cuePtr = findCueIndex(cues, t);
  const cue = cues[cuePtr];
  subOverlay.textContent = (cue.start <= t && t <= cue.end) ? cue.text : '';
});
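To make the search behavior concrete, here is a self-contained check with made-up cue times (the function is repeated verbatim so the snippet runs on its own):

```javascript
function findCueIndex(cues, time) {
  let lo = 0, hi = cues.length - 1;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (cues[mid].end < time) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Made-up cues, times in seconds
const cues = [
  { start: 0,  end: 2  },
  { start: 5,  end: 7  },
  { start: 10, end: 12 },
];

// Returns the first cue whose end >= t: the active cue, or the next upcoming one
console.log(findCueIndex(cues, 6));  // → 1 (inside the second cue)
console.log(findCueIndex(cues, 8));  // → 2 (between cues: next cue to show)
console.log(findCueIndex(cues, 99)); // → 2 (past the end: clamped to last index)
```

The "or the next one" case is why the timeupdate handler still checks cue.start <= t before rendering.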
Font size is proportional to video width (2.5%, min 13px),
recalculated by a ResizeObserver on the container element.
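The sizing rule is easy to isolate — a sketch where subFontSize is a hypothetical name, with the ResizeObserver hookup shown as a comment since it needs a browser:

```javascript
// 2.5% of the rendered video width, floored at 13px (values from the article)
function subFontSize(videoWidth) {
  return Math.max(13, Math.round(videoWidth * 0.025));
}

// In the browser (sketch — subOverlay and playerCard assumed from context):
// new ResizeObserver(([entry]) => {
//   subOverlay.style.fontSize = subFontSize(entry.contentRect.width) + 'px';
// }).observe(playerCard);
```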
Image subtitles (PGS/VOBSUB) — burn-in
PGS (Blu-ray) and VOBSUB (DVD) subtitles are bitmap images: CSS overlay is not an option.
The only solution is to burn them directly into the video via ffmpeg, which triggers
a full transcode with the subtitles filter.
This cost me half a day. On my test files — extracts I had prepared myself — the
subtitles filter worked flawlessly. Then I tested on real Blu-ray files:
subtitles shifted, cropped, sometimes absent. The problem: PGS subtitle canvas dimensions
don't always match the video dimensions. A 1080p Blu-ray MKV can carry a subtitle canvas
at 1920x1080, but also at 1920x816 or any value inherited from the original mastering.
A plain subtitles filter stretches or crops without warning. The fix,
found after a while in a 2019 ffmpeg-user mailing list thread, is scale2ref:
fit the subtitle canvas to the video resolution before the overlay.
ffmpeg -i input.mkv \
  -filter_complex \
    "[0:s:0][0:v]scale2ref[sub][vid]; \
     [vid][sub]overlay" \
  -c:v libx264 -preset ultrafast -crf 23 \
  -c:a aac \
  -movflags frag_keyframe+empty_moov+default_base_moof \
  -f mp4 pipe:1
On the JS side, detecting an image subtitle track sets S.step = 'burnSub'
with the track index (burnSub=N in the URL). The watchdog timeout rises
to 30s for this combination — transcode startup with embedded subtitles is slower
than a plain transcode.
UX: controls, mode badge, fullscreen
A few UX decisions are worth explaining. The mode badge (green = REMUX, orange = TRANSCODE, grey = NATIVE) is clickable to cycle modes manually. It's primarily a debugging tool — when remux produces an audio artifact or transcode is inexplicably slow, one click lets you force the other mode without reloading the page. In practice, end users reach for it too when something isn't working.
Fullscreen is triggered on .player-card (the parent div), not on
<video> directly. Calling element.requestFullscreen()
on <video> causes the browser to render its own controls on top of
the custom ones — on Firefox the result is particularly chaotic. On the parent div,
custom controls stay visible and functional. Auto-hide kicks in after 3 seconds of mouse
inactivity, with the cursor hidden. Single tap = play/pause, double tap = fullscreen,
with a 250ms debounce to tell them apart. Keyboard shortcuts (Space/K, left/right arrow
±10s, F, M) map to the same actions.
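The single/double tap disambiguation reduces to a small timer pattern — a sketch, with makeTapHandler and the callback names as illustrative inventions:

```javascript
// Distinguish single vs double tap: hold the first tap for `delay` ms;
// if a second tap lands inside the window, it's a double tap.
function makeTapHandler(onSingleTap, onDoubleTap, delay = 250) {
  let timer = null;
  return function handleTap() {
    if (timer !== null) {
      clearTimeout(timer); // second tap arrived in time: double tap
      timer = null;
      onDoubleTap();
    } else {
      // Wait to see whether a second tap follows before committing
      timer = setTimeout(() => { timer = null; onSingleTap(); }, delay);
    }
  };
}

// Usage sketch: video.addEventListener('click', makeTapHandler(togglePlay, toggleFullscreen));
```

The cost of the pattern is the trade-off the post accepts implicitly: every single tap is delayed by the debounce window before play/pause fires.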
One iOS detail worth its weight in debugging time: Accept-Ranges: none in
PHP response headers for remux and transcode modes. Safari on iOS sends Range
requests on <video> elements even when the source is a pipe. Without
this header, Safari attempts a byte-range on a non-seekable stream and gets a broken
response — playback dies within the first second.

Volume, mute state, and playback speed are persisted in localStorage across sessions.
The requestAnimationFrame throttle on timeupdate prevents stacking
progress bar updates at 60 fps during active playback.
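The throttle itself is a few lines — a sketch with an injectable scheduler so the pattern can be exercised outside a browser (frameThrottle and updateProgressBar are illustrative names):

```javascript
// Collapse bursts of events into at most one callback per animation frame.
function frameThrottle(fn, schedule = requestAnimationFrame) {
  let pending = false;
  return (...args) => {
    if (pending) return; // a frame is already queued for this callback
    pending = true;
    schedule(() => {
      pending = false;
      fn(...args);
    });
  };
}

// Usage sketch: video.addEventListener('timeupdate', frameThrottle(updateProgressBar));
```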
Conclusion
A video player that "actually works" on a heterogeneous file library is much more surface area than expected. Most of the code isn't in the player itself — it's in handling the edge cases: silent codec failures, bitmap subtitles, orphan processes, iOS doing byte-range on a pipe.
If I were starting over: implement the ffprobe probe and SQLite cache first, before writing a single line of JS. Having codec truth available in 100ms completely changes the mode selection logic and simplifies everything downstream. The state machine, the watchdog, the subtitle burn-in — all of it becomes predictable once you know exactly what you're dealing with before you start.
Full source code is on GitHub (ohugonnot/sharebox).