The brief.
Office workers in Việt Nam fight with documents every day: a 200-page contract that arrived as a flattened PDF, an invoice scanned at the wrong angle, a tax form whose tables stop existing the moment you copy them. The brief was simple and unforgiving: make these problems go away in under a minute.
Hybrid client–server processing.
The naïve approach is to send every byte to a server. The right approach is to do as much as possible in the browser, then hand the hard parts to the edge. I split the pipeline so layout analysis and rendering happen client-side, while OCR, table reconstruction, and high-throughput conversion run across edge nodes backed by an object store and a small queue.
OCR that respects Vietnamese.
Vietnamese is a stress test for OCR: stacked diacritics, mixed scripts, fonts trained on English. I tuned a small model for vie+eng and run it at the edge, then layered a deterministic post-processor that knows about tone marks and Vietnamese spacing conventions. The result: documents that read like documents, not like ransom notes.
Concurrency on cheap hardware.
Large files break naïve pipelines. Utility Hub chunks PDFs at safe boundaries, dispatches chunks to a queue, and reassembles them with a deterministic merge that preserves table cells across chunk seams. The whole thing scales horizontally on edge compute without me paying for idle capacity.
What I am proud of.
The architecture is genuinely calm — no GPU baby-sitting, no long-running servers, no surprise bills. The product feels effortless because the system underneath has been fought with, not papered over. That's what I mean when I say "AI-first" and "edge-native" in the same sentence.