

Created by: Eoin Jordan
GitHub repo: https://github.com/eoinjordan/arduino-edgeai-opencode-starter
Hugging Face models: eoinedge/edgeai-docs-qwen2.5-coder-0.5b-lora, eoinedge/arduino-qwen0.5-lora

Introduction

The earlier guides in this series showed how to load a Qwen1.5 LoRA adapter in Python and query it directly, with or without a FAISS retrieval index. Both approaches work well for scripted question-answering, but neither is a coding assistant: you cannot ask them to read a file, suggest an edit, or build a sketch. This guide sets up OpenCode, an offline terminal-based AI coding assistant, backed by two fine-tuned Qwen2.5-Coder-0.5B models served by llama-server. No API keys, no cloud calls, no internet connection required at runtime. Everything runs on the same hardware covered in the previous guides:
  • Raspberry Pi 4 / Pi 5 — CPU inference, ~5–15 tok/s on Pi 5
  • NVIDIA Jetson Orin / QCS6490 devices — GPU inference via cuBLAS, ~30–80 tok/s
The two models are quantized to Q4_K_M GGUF format (~398 MB each), which means they load and run comfortably on a Pi 5 with 8 GB RAM, or on a Jetson Orin without running out of CUDA memory.
Model                      Hugging Face repo                              What it knows
qwen-edgeai-q4_k_m.gguf    eoinedge/edgeai-docs-qwen2.5-coder-0.5b-lora   Edge Impulse Studio, DSP blocks, SDK, REST API, deployment targets
qwen-arduino-q4_k_m.gguf   eoinedge/arduino-qwen0.5-lora                  Arduino C/C++, UNO R4 WiFi, Nano 33 BLE Sense, library APIs
This guide uses GGUF models served by llama-server — a different runtime from the Python transformers + PEFT approach in the earlier tutorials. The GGUF approach is generally faster for interactive use and does not require PyTorch.
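The quickstart script below downloads both GGUF files for you, but you can also fetch them by hand with wget. The file paths inside the Hugging Face repos are an assumption here; check each repo's file listing if a URL 404s:
wget -O ~/qwen-edgeai-q4_k_m.gguf \
  https://huggingface.co/eoinedge/edgeai-docs-qwen2.5-coder-0.5b-lora/resolve/main/qwen-edgeai-q4_k_m.gguf
wget -O ~/qwen-arduino-q4_k_m.gguf \
  https://huggingface.co/eoinedge/arduino-qwen0.5-lora/resolve/main/qwen-arduino-q4_k_m.gguf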

How it works

Student prompt
    │
    ▼
OpenCode (TUI)
    │  reads opencode/opencode.json
    ▼
llama-server (localhost:8081/v1)
    │  CPU (Pi) or CUDA GPU (Jetson) — fine-tuned Qwen model
    ▼
Response streamed back to OpenCode
llama-server exposes an OpenAI-compatible /v1/chat/completions endpoint. opencode.json points OpenCode at that local endpoint using the @ai-sdk/openai-compatible provider. The two agents defined in .opencode/agents/ (edgeai.md and arduino.md) each carry a different system prompt that focuses the model on its specialist domain — switch agents when you switch models.
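You can smoke-test that endpoint from a shell before wiring up OpenCode. The prompt below is illustrative; llama-server serves whichever GGUF it was started with, regardless of the model field in the request:
curl -s http://127.0.0.1:8081/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen-edgeai", "messages": [{"role": "user", "content": "What is a DSP block?"}], "max_tokens": 64}'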

Prerequisites

  • A Raspberry Pi 4 / Pi 5, a Thundercomm Rubik Pi 3, or an NVIDIA Jetson Orin (Nano, NX, AGX) running 64-bit Linux
  • ~1 GB free disk space for both GGUF models plus the llama.cpp build artefacts
  • git, cmake, build-essential, and wget (the quickstart script installs these)
  • Node.js 18+ and npm for OpenCode (the quickstart script checks and installs if missing)
  • For Jetson GPU inference: CUDA toolkit installed at /usr/local/cuda
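A quick preflight check before running the quickstart, assuming these tools are on your PATH (the script performs equivalent checks):
node --version                       # should print v18.x or newer
cmake --version
/usr/local/cuda/bin/nvcc --version   # Jetson only; confirms the CUDA toolkit location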

Quickstart

Clone the repo and run the one-shot setup script:
git clone https://github.com/eoinjordan/arduino-edgeai-opencode-starter.git
cd arduino-edgeai-opencode-starter
chmod +x scripts/*.sh
bash scripts/quickstart.sh
The script auto-detects your platform and runs four steps:
Step   What happens
1      Installs build tools (cmake, git, wget) via apt-get or Homebrew
2      Clones and builds llama.cpp from source with -DLLAMA_BUILD_SERVER=ON; adds -DGGML_CUDA=ON on Jetson
3      Downloads qwen-edgeai-q4_k_m.gguf and qwen-arduino-q4_k_m.gguf to ~/
4      Installs opencode-ai globally via npm
You can force a platform instead of relying on auto-detection:
bash scripts/quickstart.sh pi       # Raspberry Pi — CPU only
bash scripts/quickstart.sh jetson   # Jetson — CUDA build (sm_87)
bash scripts/quickstart.sh cpu      # Generic Linux / macOS CPU
For Jetson targets other than Orin (sm_87), override the CUDA architecture before running the script:
CUDA_ARCH=72 bash scripts/quickstart.sh jetson   # Xavier = sm_72
If llama-server is already installed (for example via brew install llama.cpp on macOS), the build step is skipped automatically. The quickstart script checks PATH and common install locations before starting the build.
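If you would rather build llama.cpp by hand, the commands below sketch what step 2 of the script does. The flags are standard llama.cpp CMake options, but the exact invocation inside quickstart.sh is an assumption:
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
# On Jetson Orin, append -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=87:
cmake -B build -DLLAMA_BUILD_SERVER=ON
cmake --build build --config Release -j4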

Step 1 — Start the model server

Pick the model that matches the task you are about to work on. Only one model runs at a time; the server starts on port 8081.
# For Edge Impulse / Edge AI questions and code:
./scripts/start-server.sh edgeai

# For Arduino sketch generation and debugging:
./scripts/start-server.sh arduino
Leave this terminal open. The server logs each incoming request to stdout. start-server.sh applies platform-specific settings automatically:
Platform              GPU layers   Threads
Jetson / CUDA         99 (all)     4 (ARM cores; GPU handles inference)
Raspberry Pi          0            min(nproc, 4), capped to reduce thermal throttling
macOS / generic CPU   0            All logical cores
You can override both settings with environment variables if needed:
LLAMA_N_GPU_LAYERS=32 LLAMA_THREADS=6 ./scripts/start-server.sh edgeai
The server configuration passed to llama-server:
--ctx-size 4096    # context window
--n-predict 512    # maximum output tokens per response
--host 127.0.0.1   # bind to loopback only
--port 8081
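start-server.sh is equivalent to invoking llama-server directly. A manual invocation, assuming the model file landed in ~/ as in the quickstart:
llama-server -m ~/qwen-edgeai-q4_k_m.gguf \
  --ctx-size 4096 --n-predict 512 \
  --host 127.0.0.1 --port 8081 \
  --n-gpu-layers 0 --threads 4   # CPU settings; use --n-gpu-layers 99 on Jetson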

Step 2 — Open OpenCode

In a second terminal, navigate to the opencode/ directory and start OpenCode:
cd opencode && opencode
OpenCode reads opencode/opencode.json from the current directory on startup. This file declares both local model providers and sets edgeai/qwen-edgeai as the default:
{
  "$schema": "https://opencode.ai/config.json",
  "model": "edgeai/qwen-edgeai",
  "provider": {
    "edgeai": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "EdgeAI (local)",
      "options": {
        "baseURL": "http://127.0.0.1:8081/v1",
        "apiKey": "local"
      },
      "models": {
        "qwen-edgeai": {
          "name": "Qwen EdgeAI — Edge Impulse docs (local)",
          "limit": { "context": 4096, "output": 512 }
        }
      }
    },
    "arduino": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "Arduino (local)",
      "options": {
        "baseURL": "http://127.0.0.1:8081/v1",
        "apiKey": "local"
      },
      "models": {
        "qwen-arduino": {
          "name": "Qwen Arduino — code generation (local)",
          "limit": { "context": 4096, "output": 512 }
        }
      }
    }
  }
}
Both providers point at the same 127.0.0.1:8081 — switching the model in OpenCode selects which configured profile to use, but you must also restart start-server.sh with the matching argument to load the correct GGUF.
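Because the provider profile and the loaded GGUF can drift apart, it is worth confirming what the server actually has in memory. llama-server answers the OpenAI-compatible model listing, so a one-line check is:
curl -s http://127.0.0.1:8081/v1/models   # the returned model id reflects the loaded GGUF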

Step 3 — Select an agent

Inside OpenCode, press / and type agent to open the agent picker. Two agents are available:
Agent     System prompt focus
edgeai    Edge Impulse Studio workflow, DSP blocks, SDK methods, REST API, deployment targets (Nano 33 BLE Sense, Pi, Jetson)
arduino   Arduino C/C++, setup() / loop() structure, UNO R4 WiFi, Arduino Cloud / App Studio, run_classifier() for Edge Impulse libraries
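The agent files themselves are markdown with a frontmatter header. A minimal sketch of what .opencode/agents/arduino.md could look like; the frontmatter fields shown (description, model) are assumptions about OpenCode's agent format, so treat the files shipped in the repo as authoritative:
---
description: Arduino C/C++ specialist for UNO R4 WiFi and Nano 33 BLE Sense
model: arduino/qwen-arduino
---
You are an Arduino coding assistant. Produce complete sketches with setup()
and loop(), and use run_classifier() when integrating Edge Impulse libraries.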
Switch agent and model server together:
# Terminal 1 — restart for Arduino work:
./scripts/start-server.sh arduino

# Terminal 2 — inside OpenCode, press / → agent → arduino
Example prompts for the edgeai agent:
  • How do I export a trained Edge Impulse model as a C++ library?
  • Walk me through the full project workflow from data collection to deployment
  • How do I call the Edge Impulse REST API to start a training job?
Example prompts for the arduino agent:
  • Read my sketch and add a blinking LED on pin 13
  • Write a sketch for UNO R4 WiFi that reads an accelerometer and prints to Serial
  • Validate the project then build it for arduino:renesas_uno:unor4wifi

Arduino IDE integration with arduino-mcp (optional)

arduino-mcp is an MCP server that gives OpenCode tools to read, write, validate, and build Arduino sketches directly in Arduino IDE 2.0 format. When it is connected, the arduino agent can act on your sketch rather than only describe what to change.
1. Install and start arduino-mcp:
npm install -g arduino-claude-mcp
arduino-claude-mcp          # starts REST server on port 3080
On a Pi UNO Q, run it in Docker instead:
docker run --rm --network host \
  -e ARDUINO_FQBN=arduino:renesas_uno:unor4wifi \
  -v ~/Arduino:/workspace \
  eoinedge/arduino-mcp:latest
2. Add it to the OpenCode MCP config — create opencode/mcp.json:
{
  "mcpServers": {
    "arduino-mcp": {
      "command": "node",
      "args": ["/absolute/path/to/arduino-mcp/build/mcp.js"]
    }
  }
}
3. Use MCP tools inside the arduino agent. With arduino-mcp connected, the agent can call tools directly instead of suggesting changes for you to copy-paste:
Prompt                                     MCP tool
"Read my sketch and add a blinking LED"    read_source → edits → write_source
"Validate my sketch"                       validate
"Build for UNO R4 WiFi"                    build (requires arduino-cli and ARDUINO_FQBN set)
build requires arduino-cli to be installed and the ARDUINO_FQBN environment variable to be set to your target board. On Pi UNO Q setups running the pi-openclaw-mcp-stack, validate and build are also available via the REST gateway on port 3000.
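For reference, roughly what the build tool runs under the hood; the sketch path is illustrative and the exact command arduino-mcp issues is an assumption:
export ARDUINO_FQBN=arduino:renesas_uno:unor4wifi
arduino-cli compile --fqbn "$ARDUINO_FQBN" ~/Arduino/MySketch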

Switching models inside OpenCode

Type /models inside OpenCode to see the configured models and switch between them:
edgeai/qwen-edgeai   — Qwen EdgeAI (local)
arduino/qwen-arduino — Qwen Arduino (local)
Remember to have start-server.sh running with the matching argument — OpenCode talks to whatever model llama-server has loaded, regardless of which provider profile is selected in opencode.json.

Best practices

  • Switch both the server and the agent at the same time to keep the model and system prompt aligned.
  • On a Pi 4 with 4 GB RAM, only one model server should run at a time. Kill the previous server before starting the other.
  • Keep prompts focused. The 0.5B models are fast but have a 4096-token context window. For multi-file projects or long code generation tasks, break the work into smaller steps.
  • For complex multi-step reasoning (e.g. a full Edge Impulse project from scratch), consider running a 3B+ GGUF model instead. Swap the GGUF path in start-server.sh and update the limit.context value in opencode.json accordingly (see the fragment after this list).
  • Validate generated Arduino code in the Serial Monitor or with arduino-cli compile before flashing to hardware.
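For that larger-model swap, the only change on the OpenCode side is the per-model limit block. An illustrative opencode.json fragment, assuming the replacement model is served with an 8192-token context:
"models": {
  "qwen-edgeai": {
    "limit": { "context": 8192, "output": 1024 }
  }
}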

Reference