Inside the Jumbo Gemini scanner.
A production AI vision component inside a retail POS. The cashier snaps a photo of a product, the system identifies it, and the cart line lands in under a second. The interesting part is what happens when the model is wrong.
{
  "candidates": [
    { "sku": "PLY-3421", "name": "Wooden Train Set, 24 pc", "confidence": 0.94 },
    { "sku": "PLY-3418", "name": "Wooden Train Set, 12 pc", "confidence": 0.41 }
  ],
  "notes": "High confidence on lead candidate. Box layout matches reference image."
}

- Why: Plenty of toy SKUs ship without scannable barcodes. Manual lookup at the register slows checkout.
- Model: Gemini 2.5 Flash, vision input, JSON-structured output.
- Posture: Async, time-bounded, never on the checkout critical path.
The problem
Toy retail has a long tail of items without scannable barcodes. Loose plush, partial bundles, gift sets the supplier ships unlabeled. The cashier's only options are typing a name from memory or scrolling a category. Both slow the queue.
The scanner gives the register a third input. Point the tablet camera at the product, tap once, the suggested match drops into the cart. When it works, it shaves five to ten seconds per ambiguous item. When it does not, nothing breaks.
How a single scan works
- Cashier taps the scan button on the POS surface, available next to the barcode and manual-search inputs.
- Browser opens the rear camera through getUserMedia. The cashier frames the product and captures a single still image.
- Image is downscaled and JPEG-compressed in the browser before it ever leaves the tablet, keeping the request small.
- POST hits the Next.js route on the edge runtime. Per-user rate limit checked first. Anything above the cap returns 429 immediately, no model call.
- Server calls Gemini 2.5 Flash with the image and a JSON-shaped prompt. Catalog snippet for the active store is included so the model has a closed set to match against.
- Response is parsed and the top candidate above the confidence threshold lands in the cashier's suggestion panel.
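The in-browser prep from the steps above can be sketched with a small sizing helper. The capture chain itself (getUserMedia into a video element, canvas.drawImage, canvas.toBlob as JPEG) only runs in a browser; the portable part is the downscale math. The 768-pixel edge and quality value here are illustrative assumptions, not the production settings.

```typescript
// Illustrative values, not the production ones.
const MAX_EDGE = 768;     // longest side of the uploaded frame
const JPEG_QUALITY = 0.7; // passed to canvas.toBlob("image/jpeg", ...)

// Compute the downscaled dimensions while preserving aspect ratio.
// Frames already within the budget pass through untouched.
function targetDimensions(
  width: number,
  height: number,
  maxEdge: number = MAX_EDGE,
): { width: number; height: number } {
  const longest = Math.max(width, height);
  if (longest <= maxEdge) return { width, height };
  const scale = maxEdge / longest;
  return {
    width: Math.round(width * scale),
    height: Math.round(height * scale),
  };
}
```

In the browser, drawImage gets the computed width and height, and the resulting JPEG blob becomes the POST body.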
The prompt
The prompt is short, specific, and closed. The model is not asked to be creative. It is asked to match an image against a known catalog and report its confidence.
Identify the product in this image. Match against this catalog:
{catalog_snippet}
Return JSON in this exact shape:
{
  "candidates": [
    { "sku": "string", "name": "string", "confidence": 0.0-1.0 }
  ],
  "notes": "string, optional"
}
Return up to 3 candidates ordered by confidence.
If no candidate scores above 0.7, return an empty array.
Do not invent SKUs that are not in the catalog.

Two design calls worth naming. Closed set instead of open-ended description, so the model cannot hallucinate a SKU. Explicit confidence threshold inside the prompt, so the model itself decides when to refuse.
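The closed set is worth enforcing twice: the prompt forbids invented SKUs, and the server can re-validate the parsed response against the catalog anyway. A minimal sketch, assuming shapes and names that are illustrative rather than the production code:

```typescript
type Candidate = { sku: string; name: string; confidence: number };

// Parse the model's JSON and keep only candidates that actually exist in
// the catalog and clear the confidence floor. Malformed output degrades
// to "no suggestion" rather than an error the cashier has to see.
function parseCandidates(raw: string, catalogSkus: Set<string>): Candidate[] {
  let parsed: { candidates?: Candidate[] };
  try {
    parsed = JSON.parse(raw);
  } catch {
    return [];
  }
  return (parsed.candidates ?? []).filter(
    (c) => catalogSkus.has(c.sku) && c.confidence >= 0.7,
  );
}
```

An invented SKU, even one the model reports with high confidence, never reaches the suggestion panel.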
Confidence threshold and fallback
- Above 0.85: top candidate auto-fills the next cart line, cashier confirms with one tap. The common case.
- Between 0.7 and 0.85: top three render as a chooser. Cashier picks or rejects. Two extra taps but no manual lookup.
- Below 0.7: empty result. POS surfaces the standard barcode and manual-search inputs. Same workflow they used before the scanner existed.
- Server timeout at 3 seconds. If Gemini is slow or unhealthy, the request is dropped client-side and the cashier moves on. Sentry catches the failure with user identifiers stripped.
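The three bands above reduce to a small routing function. The thresholds come from the post; the type names and function are illustrative, not the production code.

```typescript
type Candidate = { sku: string; name: string; confidence: number };

type Suggestion =
  | { kind: "autofill"; candidate: Candidate }    // >= 0.85: one-tap confirm
  | { kind: "chooser"; candidates: Candidate[] }  // 0.7-0.85: pick from top three
  | { kind: "fallback" };                         // below 0.7 or empty: manual flow

// Route ranked candidates into one of the three UI behaviors.
function route(candidates: Candidate[]): Suggestion {
  const ranked = [...candidates].sort((a, b) => b.confidence - a.confidence);
  const top = ranked[0];
  if (!top || top.confidence < 0.7) return { kind: "fallback" };
  if (top.confidence >= 0.85) return { kind: "autofill", candidate: top };
  return { kind: "chooser", candidates: ranked.slice(0, 3) };
}
```

The fallback branch returns no candidate at all, which is exactly the pre-scanner workflow.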
AI as progressive enhancement, never a dependency.
The cashier had a working POS before the scanner shipped. They still do, on every code path. The scanner is one input among three, behind a budget cap, behind a timeout, behind a confidence gate. The day Gemini has an outage, the register keeps clearing transactions.
Where it sits in checkout
The POS target is sub-300ms transaction clearance on the shop floor. Gemini round-trips do not fit inside that budget. So the scanner runs in parallel, not in series. The cashier can keep building the cart with barcodes while the scan is pending. When the suggestion lands, it animates into the side panel and the cashier accepts or ignores it.
That decoupling is the whole posture. Latency from the model never becomes latency for the customer.
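The time-bounded, off-critical-path call can be sketched as a race against a timer: a slow or failed model response resolves to "no suggestion" instead of blocking anything. `withTimeout` is an illustrative helper under those assumptions, not the actual code.

```typescript
// Race a promise against a deadline. Both timeout and rejection resolve to
// null, so the caller's only branches are "suggestion arrived" or "move on".
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T | null> {
  return new Promise((resolve) => {
    const timer = setTimeout(() => resolve(null), ms);
    p.then(
      (value) => { clearTimeout(timer); resolve(value); },
      () => { clearTimeout(timer); resolve(null); }, // failures degrade silently
    );
  });
}
```

Wrapping the scan request this way is what keeps model latency from ever becoming checkout latency: the cart-building code path never awaits the model directly.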
What I would change next
- Catalog embedding pre-filter on the server. Right now the catalog snippet covers roughly the active store's whole range. With embeddings, the prompt could carry only the nearest 50 candidates, cutting tokens and latency together.
- On-device fallback model for offline or degraded-network days. Smaller, less accurate, but always available.
- Multi-frame capture for items that look similar from one angle. Capture three frames over half a second, send the best one or vote across them.
- Per-category confidence tuning. Plush is harder than boxed sets. The threshold should know that.
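The embedding pre-filter from the first item above is a standard nearest-neighbor cut. A minimal sketch, assuming the catalog items and the query already have embeddings from some model (the provenance of those vectors is out of scope here):

```typescript
// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Keep only the K catalog items nearest the query embedding; only these
// would be serialized into the prompt's catalog snippet.
function nearestK<T>(
  query: number[],
  items: { item: T; embedding: number[] }[],
  k: number,
): T[] {
  return items
    .map(({ item, embedding }) => ({ item, score: cosine(query, embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map(({ item }) => item);
}
```

At a few thousand SKUs per store a brute-force scan like this is cheap; an index only becomes interesting at catalog scale.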
Why this pattern travels
The shape generalizes. Closed set, structured output, confidence gate, time-bounded call, graceful fallback. Swap product identification for invoice line extraction, document classification, image moderation, or support triage. The prompt and the catalog change. The posture does not.
That is most of what production AI work is. Not getting the model to perform on a demo. Getting the surrounding system to keep shipping when the model does not.
Need an AI feature wired into a real product?
Vision, classification, extraction, agent workflows. I scope and ship them with the same posture: time-bounded, fallback-ready, never on the critical path.