Haiyang Wu
Edge AI Engineer · Founder, Machina AI
I build production computer-vision and small-model systems that run on the factory floor, not in the cloud.
Founded Machina AI in 2025 to bring zero-cloud visual inspection to high-mix manufacturing — shipped EdgeRunner on NVIDIA Jetson and signed a pilot LOI with a Shenzhen powder metallurgy manufacturer.
Currently open-sourcing SMBoost — a reliability harness for 2B–7B LLMs running at the edge.
About
I came to engineering through mathematics and statistics at UConn, graduating in December 2025. The path into ML and systems work was self-directed — training models that performed well in notebooks, then learning the harder half: running them on constrained hardware, under load, with no second chances.
In 2025 I founded Machina AI to bring computer-vision quality inspection to high-mix, low-volume factories that can't, or won't, send production-line imagery to the cloud. I built EdgeRunner, a real-time vision pipeline combining YOLO detection, ByteTrack, and a small VLM on a Jetson Orin Nano, and took it to a Shenzhen powder metallurgy manufacturer. We signed a pilot LOI. We did not sign a contract. B2B sales on a single-founder timeline turned out to be a different problem from the one I had framed going in; the engineering was the easy half.
The harder problem sat below the stack I was trying to sell. Running a 3B VLM on the Jetson, the model worked when it worked — but its tail behavior (malformed JSON, hallucinated classes, unjustified confidence) was what kept a plant from trusting it. Not raw capability. That observation became SMBoost: a reliability harness for small LLMs in production — grammar-constrained outputs, a hierarchical state machine around inference, invariant checks on structured results, and a scoring layer that flags silent degradation.
I'm looking for the next room to build in — either a team pushing Edge AI into production at real scale, or a continuation of SMBoost as a standalone project.
Projects
EdgeRunner · haiyang5535/edge-runner
Real-time edge AI vision system on NVIDIA Jetson Orin Nano combining YOLO detection with VLM reasoning.
Stack: TensorRT FP16, ByteTrack, GBNF-constrained VLM JSON output, dual-thread architecture.
Results: <12 ms P99 detection latency on Jetson Orin Nano. Signed pilot LOI with a Shenzhen powder metallurgy manufacturer.
SMBoost · haiyang5535/smboost
Open-source reliability harness for 2B–7B local LLMs in production.
Origin: came directly out of a failure mode in the Machina pilot — small VLMs on Jetson worked in the median but broke in the tail, and that tail was the adoption blocker, not raw capability.
Stack: hierarchical state machine, adaptive robustness scorer, GBNF grammar constraints, Pydantic invariants.
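To make the idea concrete, here is a minimal sketch of a validate-and-retry loop around a small model's structured output. It is an illustration under stated assumptions, not SMBoost's actual code: it uses only the Python standard library in place of Pydantic and GBNF, the state machine is flat rather than hierarchical, and every name in it (`ALLOWED_CLASSES`, `Inspection`, `parse_and_check`, `run_inference`, `fake_model`) is hypothetical.

```python
import json
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable, Optional

# Hypothetical label set; a real deployment would define its own classes.
ALLOWED_CLASSES = {"ok", "crack", "burr"}


@dataclass
class Inspection:
    label: str
    confidence: float


def parse_and_check(raw: str) -> Inspection:
    """Parse model output and enforce invariants; raise ValueError on any violation."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    label = data.get("label")
    confidence = data.get("confidence")
    if not isinstance(label, str) or label not in ALLOWED_CLASSES:
        raise ValueError(f"unknown class: {label!r}")
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise ValueError("confidence must be a number")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence out of range")
    return Inspection(label, float(confidence))


class State(Enum):
    GENERATE = auto()
    VALIDATE = auto()
    ACCEPT = auto()
    FAIL = auto()


def run_inference(generate: Callable[[int], str],
                  max_attempts: int = 3) -> Optional[Inspection]:
    """Drive generate -> validate until a result passes or attempts run out."""
    state, attempt, raw, result = State.GENERATE, 0, "", None
    while state not in (State.ACCEPT, State.FAIL):
        if state is State.GENERATE:
            raw = generate(attempt)
            attempt += 1
            state = State.VALIDATE
        elif state is State.VALIDATE:
            try:
                result = parse_and_check(raw)
                state = State.ACCEPT
            except ValueError:
                # Retry while attempts remain; otherwise fail loudly with None.
                state = State.GENERATE if attempt < max_attempts else State.FAIL
    return result


def fake_model(attempt: int) -> str:
    """Stand-in for a small model: truncated JSON, then a bad class, then valid."""
    outputs = [
        '{"label": "crack", "confidence": 0.9',    # malformed JSON
        '{"label": "dent", "confidence": 0.8}',    # class outside the allowed set
        '{"label": "crack", "confidence": 0.93}',  # passes every invariant
    ]
    return outputs[min(attempt, len(outputs) - 1)]


if __name__ == "__main__":
    print(run_inference(fake_model))
    print(run_inference(fake_model, max_attempts=2))
```

With the default three attempts the stand-in model's third output is accepted; with `max_attempts=2` the loop ends in `FAIL` and returns `None`, so the caller sees an explicit failure instead of a malformed result.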
Stack
Python · NVIDIA Jetson · TensorRT · YOLO · VLMs · Computer Vision · ML Systems · Edge Deployment
Contact
- Email: w.haiyang@outlook.com
- GitHub: github.com/haiyang5535
- LinkedIn: linkedin.com/in/h-wu