I want a real local pipeline: image in, structured JSON out, no cloud dependency, optimized to run on Metal, the ANE, or whatever Apple exposes.
My goal is to infer a JSON struct of variables from an image using an FM. Sounds simple, but as of May 2026 it isn't.
And I really want it.
After a bit of research, I found that llama.cpp provides the optimization and all the necessary low-level work. I just need to write Swift bindings that are worth the trouble...
This is a complete tutorial on how I did it. I will use a receipt-capture example, something like QuickBooks or wise.com, to keep it real and safe.
Bon courage!
What We’re Building
A local inference stack with clear separation of concerns:
llama.cpp as an iOS XCFramework (vendor/llama.cpp/build-apple/llama.xcframework)
Objective-C++ bridge (Controllers/LlamaBridge.h, Controllers/LlamaBridge.mm)
Swift-facing API in Controllers/LLMFunctionsController.swift
Typed decode API:
let result: ReceiptResult = try await LLMFunctionsController.s
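To make the typed decode idea concrete, here is a minimal sketch of the last step only: turning raw model text into a Codable struct. Everything in it is an assumption for illustration. The fields of ReceiptResult, the helper names extractFirstJSONObject and decodeModelOutput, and the error type are hypothetical and are not the article's actual API. It also assumes the model may wrap its JSON in extra chatter, so the first balanced {...} span is pulled out before decoding.

```swift
import Foundation

// Hypothetical receipt schema. The field names are illustrative assumptions,
// not the article's actual ReceiptResult definition.
struct ReceiptResult: Codable, Equatable {
    let merchant: String
    let total: Double
    let currency: String
    let date: String?
}

// Hypothetical error type for this sketch.
enum ModelDecodeError: Error {
    case noJSONObjectFound
}

// Returns the first balanced {...} span in `text`, or nil if there is none.
// Braces inside JSON string literals are ignored so they do not break the depth count.
func extractFirstJSONObject(from text: String) -> String? {
    var depth = 0
    var start: String.Index? = nil
    var inString = false
    var escaped = false

    for index in text.indices {
        let ch = text[index]
        if inString {
            // Inside a string literal: only look for the closing quote.
            if escaped {
                escaped = false
            } else if ch == "\\" {
                escaped = true
            } else if ch == "\"" {
                inString = false
            }
            continue
        }
        switch ch {
        case "\"":
            // Quotes outside any object are ordinary prose, not JSON strings.
            if depth > 0 { inString = true }
        case "{":
            if depth == 0 { start = index }
            depth += 1
        case "}":
            if depth > 0 {
                depth -= 1
                if depth == 0, let s = start {
                    return String(text[s...index])
                }
            }
        default:
            break
        }
    }
    return nil
}

// Decodes any Decodable type from raw model output (hypothetical helper).
func decodeModelOutput<T: Decodable>(_ type: T.Type, from raw: String) throws -> T {
    guard let json = extractFirstJSONObject(from: raw) else {
        throw ModelDecodeError.noJSONObjectFound
    }
    return try JSONDecoder().decode(T.self, from: Data(json.utf8))
}
```

Calling decodeModelOutput(ReceiptResult.self, from: rawText) on a model reply that contains one JSON object either returns a populated struct or throws, which keeps malformed model output from leaking further into the app. Keeping the helper generic means the same path would work for any other Codable schema, not just receipts.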