

Created by: Eoin Jordan
GitHub repo: https://github.com/eoinjordan/pi-openclaw-mcp-stack
Hugging Face model: eoinedge/edgeai-docs-embedding-qwen1.5-0.5b-instruct

Introduction

Edge AI development often happens in environments with limited connectivity — on the factory floor, at a demo table, or embedded in a field device. When that happens, having the Edge Impulse documentation available to a local language model means you can still ask questions about the Studio API, deployment options, DSP blocks, and SDK usage without reaching the internet.

This project fine-tunes the small Qwen2.5-Coder-0.5B-Instruct model using LoRA (Low-Rank Adaptation) on 1,794 Edge Impulse documentation files. The resulting adapter ships on Hugging Face and can be loaded in a few lines of Python on any Linux device. This guide covers how to set it up and run it on two common platforms:
  • Raspberry Pi 4 / Pi 5 — the most accessible Linux edge device for developers
  • Thundercomm Rubik Pi 3 — a QCS6490-based board with 12 TOPS NPU and a Raspberry Pi-compatible form factor
The same workflow runs on any other QCS6490 device such as the Qualcomm Dragonwing RB3 Gen 2 Dev Kit or the Advantech AOM-2721 SOM.

What the adapter covers

The adapter is trained on the full Edge Impulse documentation set as of mid-2026, covering:
  • Studio projects, datasets, and data acquisition
  • DSP and transformation blocks
  • Learning and processing blocks
  • Model deployment and edge inference
  • Python SDK, REST API, and CLI tools
  • Hardware board setup and deployment guides
It is a 0.5B parameter model, so responses are fast even on a CPU. For complex multi-step reasoning, pair it with a retrieval index (see the RAG section below).

Hardware requirements

Device                       RAM            Storage      Notes
Raspberry Pi 4 (4 GB+)       4 GB min       16 GB SD     CPU inference, ~2–4 s/response
Raspberry Pi 5 (8 GB)        8 GB           32 GB SD     Noticeably faster; recommended
Thundercomm Rubik Pi 3       8 GB LPDDR4x   128 GB UFS   QCS6490, Ubuntu 24.04 or Qualcomm Linux
Any QCS6490 Linux device     4 GB min       16 GB min    Same steps as Rubik Pi 3

Setting up the Raspberry Pi

Flash Raspberry Pi OS (64-bit Bookworm) to an SD card using Raspberry Pi Imager, enable SSH during imaging, and boot the Pi. Then SSH in and run:
sudo apt update && sudo apt upgrade -y
sudo apt install -y python3-pip python3-venv git
python3 -m venv ~/edgeai-llm
source ~/edgeai-llm/bin/activate
pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu
pip install transformers peft sentence-transformers faiss-cpu
On a Pi 4, install the CPU-only PyTorch wheel as shown above; the CUDA wheel will not work on this hardware. On a Pi 5 with 8 GB RAM, float32 inference is comfortable. On a Pi 4 with 4 GB, pass low_cpu_mem_usage=True to from_pretrained when loading the base model (the load_model.py script in this guide already does this).

Setting up the Rubik Pi 3 (QCS6490)

The Rubik Pi 3 ships with Ubuntu 24.04 or Qualcomm Linux. Log in with username ubuntu / password ubuntu (you may be prompted to change it on first boot), connect to your network with sudo nmtui, then reboot to sync the system clock. After rebooting:
sudo apt update && sudo apt upgrade -y
sudo apt install -y python3-pip python3-venv git
python3 -m venv ~/edgeai-llm
source ~/edgeai-llm/bin/activate
pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu
pip install transformers peft sentence-transformers faiss-cpu
The Rubik Pi 3’s Adreno 643L GPU is not yet targeted by the standard PyTorch wheel, so CPU inference is used here too. The QCS6490’s eight Kryo 670 cores handle the 0.5B model comfortably at float32, the dtype the load script selects for CPU.
If you want to also use Edge Impulse on this board for model training and deployment, install the Edge Impulse Linux CLI after the steps above. See the Thundercomm Rubik Pi 3 setup guide for full CLI installation instructions.
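The CLI itself is distributed as an npm package, so Node.js is needed first. As a minimal sketch (the Thundercomm guide may pin a specific Node.js version, so defer to it for board-specific details):

```shell
# The Edge Impulse Linux CLI is an npm package, so install Node.js first
sudo apt install -y nodejs npm

# Install the CLI globally
sudo npm install -g edge-impulse-linux

# Run it once to log in and connect the board to an Edge Impulse project
edge-impulse-linux
```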

Load the base model and adapter

With the virtual environment activated, save the following as load_model.py on your device:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_MODEL = "Qwen/Qwen2.5-Coder-0.5B-Instruct"
ADAPTER    = "eoinedge/edgeai-docs-embedding-qwen1.5-0.5b-instruct"

# ARM devices use CPU; this also works on any CUDA host
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype  = torch.float16 if device != "cpu" else torch.float32

tokenizer  = AutoTokenizer.from_pretrained(BASE_MODEL)
base_model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    torch_dtype=dtype,
    low_cpu_mem_usage=True,
)
model = PeftModel.from_pretrained(base_model, ADAPTER)
model.eval()
if device != "cpu":
    model = model.to(device)
The first run downloads the base model (~1 GB) and the adapter (~6 MB) from Hugging Face. On subsequent runs they are loaded from the local cache (~/.cache/huggingface).
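Because both downloads land in the local cache, later runs can be forced fully offline, which is exactly what you want at a demo table with no connectivity. The Hugging Face libraries honour two environment variables for this:

```shell
# Use only the local Hugging Face cache; fail fast instead of trying the network
export HF_HUB_OFFLINE=1
export TRANSFORMERS_OFFLINE=1
```

With these set, python load_model.py resolves both the base model and the adapter from ~/.cache/huggingface and never attempts a network call.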

Ask a question

from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    device=-1,           # -1 = CPU; set to 0 for CUDA
)

questions = [
    "How do I use the Edge Impulse Python SDK to upload data?",
    "What is a DSP block in Edge Impulse?",
    "How do I deploy a model to an Arduino Nano 33 BLE Sense?",
    "How do I call the Edge Impulse REST API to start a training job?",
]

for q in questions:
    # return_full_text=False strips the prompt from the returned text
    out = pipe(q, max_new_tokens=200, do_sample=False, return_full_text=False)
    print(f"Q: {q}\nA: {out[0]['generated_text']}\n")
Typical response time on a Raspberry Pi 5 is 3–6 seconds per answer at float32, and the Rubik Pi 3 lands in the same range. The model has the full documentation vocabulary baked in, so answers about Studio workflows, SDK methods, and deployment targets are grounded in the documentation set rather than general web knowledge.
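The pipeline above feeds the bare question to the model, which works, but Qwen instruct models are trained on the ChatML chat format and usually answer more cleanly when the prompt follows it. transformers applies the template for you via tokenizer.apply_chat_template; the sketch below builds the same format by hand so you can see what the model actually receives (the system prompt wording here is an illustrative assumption, not something the adapter requires):

```python
def build_chatml_prompt(question: str) -> str:
    """Build a ChatML prompt in the format Qwen instruct models expect.

    This mirrors what tokenizer.apply_chat_template(..., add_generation_prompt=True)
    produces for a system + user conversation.
    """
    system = "You are an assistant for Edge Impulse documentation questions."
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{question}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = build_chatml_prompt("What is a DSP block in Edge Impulse?")
print(prompt)
```

Pass the formatted prompt to pipe(...) instead of the bare question. In practice, prefer tokenizer.apply_chat_template so the template always matches the model's own configuration.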

Best practices

  • On a Raspberry Pi 4 with 4 GB RAM, close other services before loading the model to avoid running out of memory: stop any unused services with sudo systemctl stop <service>.
  • Validate generated code before deploying it to hardware or calling it against the Edge Impulse REST API.
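Before loading the model on a 4 GB board, it is worth checking headroom: at float32 the 0.5B base model alone needs roughly 2 GB of RAM (0.5B parameters × 4 bytes). A quick check, with purely illustrative service names:

```shell
# How much RAM is available right now (in kB)?
grep MemAvailable /proc/meminfo

# Examples only: stop whichever services you actually have running
# sudo systemctl stop bluetooth.service
# sudo systemctl stop cups.service
```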
A companion adapter trained on Arduino documentation is available at eoinedge/arduino-qwen0.5-lora. It uses the same Qwen/Qwen2.5-Coder-0.5B-Instruct base and is fine-tuned to write Arduino sketches, understand Arduino library APIs, and answer hardware-level questions. Load it using the same pattern:
BASE_MODEL = "Qwen/Qwen2.5-Coder-0.5B-Instruct"
ADAPTER    = "eoinedge/arduino-qwen0.5-lora"

base_model = AutoModelForCausalLM.from_pretrained(BASE_MODEL, torch_dtype=dtype, low_cpu_mem_usage=True)
model      = PeftModel.from_pretrained(base_model, ADAPTER)
Use this when you need to generate Edge Impulse inference sketches for Arduino Nano 33 BLE Sense or other Arduino targets offline — for example at a demo table without internet access. There is some overlap in the documentation covered by the two adapters, but the Arduino adapter is more focused on hardware-level questions and code generation, while the Edge AI docs adapter covers the full breadth of Studio and API topics.

Reference