Pagination, filters and errors

8 megabytes in the mobile's face

Back to the library. Its catalogue now holds 12,000 books. A mobile client calls GET /api/books, happy as can be. And there, the server sends back… all 12,000, in one chunk. An 8-megabyte response. The phone chokes, the 4G data allowance melts away, the screen stays blank for five seconds. Nobody will ever read 12,000 books at once, yet the server shipped them all anyway.

The problem isn't the collection. It's serving it whole. A collection isn't returned as is: it's tamed. And you already know the tool for taming it from lesson 2: the query string. Remember the rule set there and in the HTTP course (anatomy of a URL): the query string doesn't name a resource, it filters one. /api/books stays the collection; ?page=2 just says which slice you want.

In this lesson: slicing the collection into pages, filtering it, sorting it, all through the query string. Then, when something goes wrong, returning an error that a program can read, not just a human. That's the whole point of problem+json.

Paginating: slicing the collection

The simplest pagination, the one you start with, is called offset pagination. Two query-string parameters are enough:

GET /api/books?page=2&limit=20

limit says how many items per slice (here 20). page says which slice (here the second, so books 21 to 40). The server returns only those 20 books.
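The arithmetic behind that slice can be sketched in a few lines. This is only an illustration: it assumes pages are numbered from 1, as the example implies, and the name slice_bounds and the in-memory list of books are made up for the occasion.

```python
def slice_bounds(page: int, limit: int) -> tuple[int, int]:
    """Return (offset, end) as 0-based indexes for a 1-based page number."""
    offset = (page - 1) * limit    # how many items to skip
    return offset, offset + limit  # end is exclusive, like a Python slice


# Stand-in for the catalogue: "book 1" ... "book 12000" (illustrative data).
books = [f"book {n}" for n in range(1, 12001)]

offset, end = slice_bounds(page=2, limit=20)
page_items = books[offset:end]
print(page_items[0], "to", page_items[-1])  # book 21 to book 40
```

A real server would pass the same offset and limit to its database query rather than slice a list in memory.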

But 20 bare books aren't enough. The client needs to know where it stands: is it at the end? How many pages are left? That's why the response isn't just an array: it includes metadata.

{
  "data": [ ... 20 books ... ],
  "page": 2,
  "limit": 20,
  "total": 12000
}

With total: 12000 and limit: 20, the client works out on its own that there are 600 pages. It knows it's on page 2, so it can ask for page 3. The data travels in data, the context travels alongside. It's that context that turns a mute list into a navigable collection.
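Here is a minimal sketch of both sides of that exchange: the server wrapping a slice in the envelope, and the client deriving the page count from the metadata. The function names paginate and total_pages are assumptions, and the catalogue is faked with a list of numbers.

```python
import math


def paginate(items: list, page: int, limit: int) -> dict:
    """Server side: wrap one slice in the data/page/limit/total envelope."""
    offset = (page - 1) * limit
    return {
        "data": items[offset:offset + limit],
        "page": page,
        "limit": limit,
        "total": len(items),
    }


def total_pages(total: int, limit: int) -> int:
    """Client side: page count, rounded up to cover a partial last page."""
    return math.ceil(total / limit)


catalogue = list(range(1, 12001))  # fake catalogue: book numbers 1..12000
response = paginate(catalogue, page=2, limit=20)
print(total_pages(response["total"], response["limit"]))  # 600
```

Rounding up matters as soon as the total is not a multiple of limit: 12,001 books at 20 per page need 601 pages, not 600.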

Always cap limit on the server side. If you blindly trust the client, some smart aleck will send ?limit=999999 to pull the whole catalogue at once and make your database sweat. Set a ceiling (say 100): if the requested value goes over, the server silently brings it back to 100. Pagination protects your server as much as it relieves the client: without a ceiling, it protects no one.
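One possible way to write that cap, as a sketch. The ceiling of 100 comes from the text; the default of 20, the floor of 1 and the choice to fall back to the default on a non-numeric value are assumptions made for this example.

```python
from typing import Optional

DEFAULT_LIMIT = 20  # assumed default when the client sends nothing usable
MAX_LIMIT = 100     # the ceiling suggested in the text


def clamp_limit(raw: Optional[str]) -> int:
    """Turn the raw ?limit= value into an integer between 1 and MAX_LIMIT."""
    if raw is None:
        return DEFAULT_LIMIT
    try:
        requested = int(raw)
    except ValueError:
        return DEFAULT_LIMIT  # assumption: ignore garbage instead of rejecting it
    return max(1, min(requested, MAX_LIMIT))


print(clamp_limit("999999"))  # 100
print(clamp_limit("20"))      # 20
```

Echoing the effective value back in the limit field of the response lets the client notice that its request was capped.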

Going further: on huge collections or real-time feeds, offset pagination shows its limits, and cursor pagination is preferred (the client sends back an opaque pointer to the last item seen rather than a page number). It's out of scope here; just remember the term exists.

Filtering and sorting: the same query string

The query string isn't only for slicing. It filters too. Want the available books written by Herbert?

GET /api/books?status=available&author=Herbert

Several filters combine with a logical AND: available and by Herbert. Each parameter narrows the selection. And you can sort the result with a widespread convention: a field prefixed with a - sorts descending.

GET /api/books?sort=-date_added

Here, the most recent first. Without the - (?sort=date_added), it would be ascending, oldest first. The - is just a convention, but it's everywhere: might as well adopt it.
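Both ideas fit in a short sketch: filters combined with AND, then a sort whose leading - flips the order. The helper names and the four sample records are invented for illustration; a real API would usually translate these parameters into a database query.

```python
def apply_filters(books: list[dict], filters: dict[str, str]) -> list[dict]:
    """Keep only the books that match every filter (logical AND)."""
    return [b for b in books if all(b.get(k) == v for k, v in filters.items())]


def apply_sort(books: list[dict], sort: str) -> list[dict]:
    """Sort by one field; a leading '-' means descending."""
    descending = sort.startswith("-")
    field = sort[1:] if descending else sort
    return sorted(books, key=lambda b: b[field], reverse=descending)


# Illustrative sample data, not the real catalogue.
books = [
    {"title": "Dune", "author": "Herbert", "status": "available", "date_added": "2024-01-10"},
    {"title": "Dune Messiah", "author": "Herbert", "status": "borrowed", "date_added": "2024-03-02"},
    {"title": "Children of Dune", "author": "Herbert", "status": "available", "date_added": "2024-05-20"},
    {"title": "Foundation", "author": "Asimov", "status": "available", "date_added": "2024-02-14"},
]

selected = apply_filters(books, {"status": "available", "author": "Herbert"})
newest_first = apply_sort(selected, "-date_added")
print([b["title"] for b in newest_first])  # ['Children of Dune', 'Dune']
```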

Validate every parameter, never trust the URL. If the client asks for ?sort=price while your books have no price field, don't return a 500 or an arbitrarily ordered list. The client formed the request correctly (its syntax is fine), but it targets a field that doesn't exist: that's exactly the 422 Unprocessable Content case from lesson 4. In this course's convention, a request that is understood but inapplicable gets a 422, not a 400. Some APIs do answer 400 for an invalid query parameter; whichever you choose, apply it consistently.
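A whitelist is one simple way to do that validation. In this sketch, the set of sortable fields and the function name check_sort are assumptions; the 422 status follows the rule stated in the lesson.

```python
from typing import Optional

SORTABLE_FIELDS = {"title", "author", "date_added"}  # assumed whitelist


def check_sort(sort: str) -> Optional[dict]:
    """Return None when the sort is acceptable, else a 422 error description."""
    field = sort[1:] if sort.startswith("-") else sort
    if field in SORTABLE_FIELDS:
        return None
    allowed = ", ".join(sorted(SORTABLE_FIELDS))
    return {
        "status": 422,
        "detail": f"Cannot sort by '{field}': allowed fields are {allowed}.",
    }


print(check_sort("-date_added"))        # None
print(check_sort("price")["status"])    # 422
```

Listing the allowed fields in the detail saves the client a trip to the documentation.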

Errors a program can read: problem+json

Speaking of errors. So far we returned the right status code (422, 404…). But the code alone doesn't say why. And too many APIs settle for an ad-hoc text message, different at every endpoint, which makes it impossible for a client program to react cleanly. There's a standard format for this, and it has a name: problem+json, defined by RFC 9457.

RFC 9457 was published in July 2023 and replaces the older RFC 7807, whose essentials it keeps. The response carries the Content-Type application/problem+json: that's what warns the client "this is a structured error, not a normal response". On the adoption side, Spring 6 supports it natively, and Cloudflare adopted it for its API errors in March 2026.
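On the client side, spotting a structured error comes down to reading that header. A small sketch, with is_problem_json as a made-up helper name; it ignores any parameter such as charset that may follow the media type.

```python
def is_problem_json(content_type: str) -> bool:
    """True when a Content-Type header announces application/problem+json."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == "application/problem+json"


print(is_problem_json("application/problem+json; charset=utf-8"))  # True
print(is_problem_json("application/json"))                         # False
```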

The format defines five standard members, all optional, that everyone knows how to read:

type — a URI identifying the kind of problem (a stable identifier, not necessarily a real web page).
title — a short, readable summary, always the same for a given type.
status — the HTTP code, copied into the body to keep it at hand.
detail — the precise explanation of this occurrence.
instance — a URI pointing to the exact occurrence of the error.
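If you want to model those five members in code, a dataclass is one option among others. The class name Problem and its to_dict method are assumptions for this sketch; only the five field names come from the format.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Problem:
    """The five standard problem+json members, all optional."""
    type: Optional[str] = None
    title: Optional[str] = None
    status: Optional[int] = None
    detail: Optional[str] = None
    instance: Optional[str] = None

    def to_dict(self) -> dict:
        """Serialize only the members that were actually set."""
        return {k: v for k, v in asdict(self).items() if v is not None}


problem = Problem(
    type="https://biblio.fr/errors/invalid-isbn",
    title="Invalid ISBN",
    status=422,
)
print(problem.to_dict())
```

Because every member is optional, unset ones are simply left out of the body rather than sent as null.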

Here's a complete 422 error, for an invalid ISBN when creating a book:

HTTP/1.1 422 Unprocessable Content
Content-Type: application/problem+json

{
  "type": "https://biblio.fr/errors/invalid-isbn",
  "title": "Invalid ISBN",
  "status": 422,
  "detail": "ISBN \"978-X\" does not match the 13-digit format.",
  "instance": "/api/books"
}

The strength of the format fits in one sentence: the client can wire code onto type. It tests the URI .../invalid-isbn and triggers the right handling, never parsing an English sentence. The detail stays for the human debugging; the type is for the machine reacting. One error, two readers, one format everyone knows.
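Here is what "wiring code onto type" might look like in a client. The type URI is the one from the example; the returned action names are hypothetical placeholders for whatever your UI does. Falling back to "about:blank" when type is absent follows the default defined by RFC 9457.

```python
INVALID_ISBN = "https://biblio.fr/errors/invalid-isbn"  # type URI from the example


def react_to(problem: dict) -> str:
    """Pick an action from the machine-readable type, never from the prose."""
    kind = problem.get("type", "about:blank")
    if kind == INVALID_ISBN:
        return "highlight-isbn-field"  # hypothetical UI action
    return "show-generic-error"        # hypothetical fallback


error_body = {
    "type": "https://biblio.fr/errors/invalid-isbn",
    "title": "Invalid ISBN",
    "status": 422,
    "detail": "ISBN \"978-X\" does not match the 13-digit format.",
    "instance": "/api/books",
}
print(react_to(error_body))  # highlight-isbn-field
```

The detail text can be reworded or translated at any time without breaking this client, since nothing here reads it.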

Predict before reading on

You call ?page=2&limit=3 on the 12,000 books. How many items will be in data, and how will the client know there are more pages after?

Answer

3 items in data (books 4, 5 and 6: page 2 of a 3-item slice). The client knows pages remain thanks to the metadata: with total: 12000 and limit: 3, it computes 4,000 pages total. It's on page 2, so it can ask for 3,998 more. Without that metadata, it would get 3 books with no way to know whether others exist.
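The "are there more pages?" check the client performs can be written as a one-line comparison. A sketch, with has_next_page as an assumed name:

```python
def has_next_page(page: int, limit: int, total: int) -> bool:
    """True while items remain beyond the current page."""
    return page * limit < total


print(has_next_page(page=2, limit=3, total=12000))     # True
print(has_next_page(page=4000, limit=3, total=12000))  # False
```

On page 2 the client has seen 6 of 12,000 items, so it keeps going; on page 4,000 it has seen all of them and stops.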

A collection sliced into pages

Keep this picture: the whole collection stays on the left, intact. The query string takes one slice off it, and the response carries it along with the metadata block that says where you stand.

Figure: the collection stays whole; the query string takes one slice, and the metadata says where you stand.

Your turn: tame the collection with curl

A simulated terminal, facing the library's API. You'll paginate, filter and sort, watch the server cap an abusive limit, then trigger a real problem+json error on an impossible sort.
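If you'd rather prepare the requests yourself, this sketch builds the URLs for each step from the parameters used throughout the lesson and prints a matching curl command. The host biblio.example is hypothetical; replace it with wherever your API actually runs.

```python
from urllib.parse import urlencode

BASE_URL = "https://biblio.example/api/books"  # hypothetical host


def build_url(**params) -> str:
    """Append the given parameters to the collection URL as a query string."""
    return f"{BASE_URL}?{urlencode(params)}" if params else BASE_URL


exercise_urls = [
    build_url(page=2, limit=20),                      # paginate
    build_url(status="available", author="Herbert"),  # filter
    build_url(sort="-date_added"),                    # sort, newest first
    build_url(limit=999999),                          # abusive limit, should be capped
    build_url(sort="price"),                          # impossible sort, should yield a 422
]

for url in exercise_urls:
    print(f"curl -i '{url}'")
```

The -i flag makes curl print the response headers, which is where you'll see the status line and the Content-Type.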


Next step

Your API is clean and readable. Too much so, perhaps: for now, anyone can write to it. Lesson 7: tokens, and the difference between being identified and being authorized.

Lesson 7: Protecting your API: tokens →