Run LLMs entirely in the browser with a simple headless React hook, useLLM().
Live demo: http://chat.matt-rickard.com
GitHub: https://github.com/r2d4/react-llm
react-llm/headless lets you customize everything from the system prompt to the user/assistant role names. It manages a WebGPU-powered background worker.
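To make the customization concrete, here is a minimal sketch of how a system prompt and configurable role names might be combined into the text a model sees. The `ChatMessage` and `PromptOptions` types and the `buildPrompt` function are hypothetical names invented for illustration; this is not the library's actual prompt template or API.

```typescript
// Hypothetical types for illustration only; not part of react-llm.
interface ChatMessage {
  role: "user" | "assistant";
  text: string;
}

interface PromptOptions {
  systemPrompt: string;
  userRoleName: string;
  assistantRoleName: string;
}

// Render a conversation as a single prompt string. The prompt ends with the
// assistant's role name so the model continues from that point.
function buildPrompt(messages: ChatMessage[], opts: PromptOptions): string {
  const lines = messages.map((m) => {
    const name = m.role === "user" ? opts.userRoleName : opts.assistantRoleName;
    return `${name}: ${m.text}`;
  });
  return [opts.systemPrompt, ...lines, `${opts.assistantRoleName}:`].join("\n");
}

const prompt = buildPrompt([{ role: "user", text: "Hello" }], {
  systemPrompt: "You are a helpful assistant.",
  userRoleName: "USER",
  assistantRoleName: "ASSISTANT",
});
console.log(prompt);
```

Changing `userRoleName` or `assistantRoleName` changes only the labels in the rendered text, which is why these values are natural knobs to expose to the caller.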
react-llm sets everything up for you: an off-the-main-thread worker that fetches the model from a CDN (Hugging Face), cross-compiles the WebAssembly components (such as the tokenizer and model bindings), and manages the model state (the attention KV cache and more).
Everything runs client-side: the model is cached in the browser, and inference runs there too. Conversations are stored in session storage.
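One way to picture the worker's lifecycle is as a stream of events that the UI folds into a loading status. The sketch below assumes hypothetical event names (`download-progress`, `compile-start`, `ready`, `error`) and a hypothetical `LoadingStatus` shape; the real worker protocol may look quite different.

```typescript
// Hypothetical message shapes for illustration; not the real worker protocol.
type WorkerEvent =
  | { type: "download-progress"; loaded: number; total: number }
  | { type: "compile-start" }
  | { type: "ready" }
  | { type: "error"; message: string };

interface LoadingStatus {
  stage: "idle" | "downloading" | "compiling" | "ready" | "failed";
  progress: number; // download fraction in [0, 1]
  error?: string;
}

const initialStatus: LoadingStatus = { stage: "idle", progress: 0 };

// Pure reducer: given the current status and one worker event, return the
// next status. Keeping it pure makes it easy to drive from a message handler.
function reduceStatus(status: LoadingStatus, event: WorkerEvent): LoadingStatus {
  switch (event.type) {
    case "download-progress":
      return {
        stage: "downloading",
        progress: event.total > 0 ? event.loaded / event.total : 0,
      };
    case "compile-start":
      return { stage: "compiling", progress: 1 };
    case "ready":
      return { stage: "ready", progress: 1 };
    case "error":
      return { stage: "failed", progress: status.progress, error: event.message };
  }
}

const events: WorkerEvent[] = [
  { type: "download-progress", loaded: 50, total: 100 },
  { type: "compile-start" },
  { type: "ready" },
];
const finalStatus = events.reduce(reduceStatus, initialStatus);
console.log(finalStatus.stage);
```

In a browser, a function like this would typically be called from the worker's `message` event handler on the main thread, so the heavy work stays off the UI thread while the page only tracks status.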
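As a rough illustration of keeping conversations in session storage, the sketch below serializes messages as JSON behind a small key-value interface. The `KeyValueStore` interface, the `MemoryStore` stand-in, the `conversation:` key prefix, and the helper names are all assumptions made for this example; they are not taken from the library.

```typescript
// Minimal subset of the Web Storage API used here. In a browser,
// window.sessionStorage satisfies this interface.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// In-memory stand-in so the sketch also runs outside a browser.
class MemoryStore implements KeyValueStore {
  private data = new Map<string, string>();
  getItem(key: string): string | null {
    const value = this.data.get(key);
    return value === undefined ? null : value;
  }
  setItem(key: string, value: string): void {
    this.data.set(key, value);
  }
}

interface StoredMessage {
  role: string;
  text: string;
}

// Hypothetical key prefix; the real storage layout is not specified here.
const KEY_PREFIX = "conversation:";

// Read a conversation, returning an empty list if nothing is stored yet.
function loadConversation(store: KeyValueStore, id: string): StoredMessage[] {
  const raw = store.getItem(KEY_PREFIX + id);
  return raw === null ? [] : (JSON.parse(raw) as StoredMessage[]);
}

// Append one message and write the whole conversation back as JSON.
function appendMessage(
  store: KeyValueStore,
  id: string,
  message: StoredMessage,
): StoredMessage[] {
  const next = [...loadConversation(store, id), message];
  store.setItem(KEY_PREFIX + id, JSON.stringify(next));
  return next;
}

const store = new MemoryStore();
appendMessage(store, "demo", { role: "user", text: "Hello" });
appendMessage(store, "demo", { role: "assistant", text: "Hi there" });
console.log(loadConversation(store, "demo").length);
```

Because session storage is scoped to a single tab and cleared when that tab closes, conversations stored this way do not persist across browser sessions.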
Under the hood, it’s powered by Apache TVM Unity and MLC.