Building QR Menus at Scale for Restaurants
The engineering challenges behind processing thousands of menu photos daily with near-perfect accuracy.
When a restaurant uploads a photo of their printed menu, we have roughly three seconds to extract every dish name, description, price, and category — then structure it into a queryable digital format. At scale, this means processing thousands of menus a day, in dozens of languages, from images taken in poor lighting with consumer smartphones.
This is a genuinely hard problem. And it's the core of Restaurant Management System.
The Pipeline
Our extraction pipeline runs in three stages. First, a layout detection model identifies the menu's structure — columns, sections, headers. Then a specialized OCR model optimized for restaurant typography extracts text. Finally, a language model parses the raw text into structured JSON, resolving ambiguities and normalizing currencies.
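To make the final stage concrete, here is a minimal sketch of what the parsing step aims to produce. The real pipeline uses a language model; this regex-based stand-in (all names hypothetical) only illustrates the target output shape: section headers, item names, and prices normalized to integer cents.

```python
import re

# Matches "Dish name .... $12.50" style lines; "," or "." decimal separators.
PRICE_RE = re.compile(r"(.+?)\s*\.*\s*[$€£]\s*(\d+(?:[.,]\d{2})?)\s*$")

def parse_menu_lines(lines, section="Uncategorized"):
    """Turn raw OCR lines into structured menu items (illustrative only)."""
    items = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        m = PRICE_RE.match(line)
        if m:
            name = m.group(1).strip(" .")
            # Normalize both "12,50" and "12.50" to integer cents.
            price_cents = int(round(float(m.group(2).replace(",", ".")) * 100))
            items.append({"section": section, "name": name, "price_cents": price_cents})
        else:
            # A line with no price is treated as a section header.
            section = line
    return items
```

A line like `Bruschetta ... $6.50` under a `Starters` header comes out as `{"section": "Starters", "name": "Bruschetta", "price_cents": 650}` — the queryable format the rest of the system consumes.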
The whole pipeline runs in a serverless environment, scaling from zero to 1,000 concurrent jobs in under 30 seconds.
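The platform handles the actual scaling, but the same back-pressure shape can be sketched client-side with a semaphore capping in-flight jobs. This is an illustrative sketch, not our production dispatcher; the names and the limit are assumptions.

```python
import asyncio

MAX_CONCURRENT = 1000  # hypothetical cap matching the platform's ceiling

async def run_jobs(jobs, worker, limit=MAX_CONCURRENT):
    """Run worker(job) for every job, never more than `limit` at once."""
    sem = asyncio.Semaphore(limit)

    async def guarded(job):
        async with sem:
            return await worker(job)

    # gather preserves input order in its results.
    return await asyncio.gather(*(guarded(j) for j in jobs))
```

The semaphore means a burst of uploads queues gracefully instead of overwhelming downstream stages.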
Handling Edge Cases
Real-world menus are beautifully chaotic. Hand-written chalkboards. Triple-folded laminated cards. Screenshots of PDFs. We've seen it all. Our test suite now includes 8,400 edge-case menu images, each tagged with the failure mode it was designed to catch.
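A suite like that is most useful when regressions can be reported per failure mode. Here is a hedged sketch of how tagged fixtures might be organized and scored; the type and function names are hypothetical, not our actual test harness.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class EdgeCase:
    image_path: str
    failure_mode: str  # e.g. "handwritten", "glare", "triple-fold"

def failures_by_mode(cases, run_extraction):
    """Run extraction over every tagged fixture; count failures per mode."""
    failed = Counter()
    for case in cases:
        if not run_extraction(case.image_path):
            failed[case.failure_mode] += 1
    return failed
```

Grouping failures by tag makes it obvious whether a model update regressed, say, handwritten chalkboards specifically rather than accuracy overall.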
Lessons Learned
The biggest lesson: never underestimate the diversity of human creativity in menu design. Our most impactful accuracy improvements have come not from model architecture changes but from better training data — specifically, from the thousands of restaurants who've provided feedback when extraction went wrong.