This tutorial shows how to run a keyword spotting model on Android for wake word detection, voice commands, and audio event recognition.

What you’ll build

An Android app that:
  • Captures real-time audio from the microphone
  • Recognizes spoken keywords continuously
  • Displays classification results with confidence scores
  • Runs entirely on-device with low latency

Prerequisites

  • A trained audio keyword spotting model
  • Android Studio with the NDK and CMake installed
  • An Android device with a microphone (a USB camera with a built-in mic also works)
  • Basic familiarity with Android development

1. Clone the repository

git clone https://github.com/edgeimpulse/example-android-inferencing.git
cd example-android-inferencing/example_kws

2. Download TensorFlow Lite libraries

cd app/src/main/cpp/tflite
sh download_tflite_libs.sh  # or download_tflite_libs.bat on Windows

3. Export your audio model

  1. In Edge Impulse Studio, go to Deployment
  2. Select Android (C++ library)
  3. Enable EON Compiler (recommended for audio)
  4. Click Build and download the .zip

4. Integrate the model

  1. Extract the downloaded .zip file
  2. Copy all files except CMakeLists.txt to:
    example_kws/app/src/main/cpp/
    
Your structure should be:
app/src/main/cpp/
├── edge-impulse-sdk/
├── model-parameters/
├── tflite-model/
├── native-lib.cpp
└── CMakeLists.txt (existing)

5. Configure audio permissions

The required permissions are already declared in AndroidManifest.xml:
<uses-permission android:name="android.permission.RECORD_AUDIO" />
<uses-feature android:name="android.hardware.microphone" />
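
The manifest entries alone are not enough on Android 6.0 and later: RECORD_AUDIO is a runtime permission, which is why step 6 below has you grant it when prompted. The sample app handles this for you; as an illustrative sketch (the layout name and failure handling are assumptions), a request via the AndroidX Activity Result API looks like:

// MainActivity.kt -- illustrative runtime permission request; the sample app
// ships its own version of this logic
private val requestMicPermission = registerForActivityResult(
    ActivityResultContracts.RequestPermission()
) { granted ->
    if (granted) {
        startAudioRecording()
    } else {
        // Without microphone access there is nothing to classify
        Toast.makeText(this, "Microphone permission is required", Toast.LENGTH_LONG).show()
    }
}

override fun onCreate(savedInstanceState: Bundle?) {
    super.onCreate(savedInstanceState)
    setContentView(R.layout.activity_main)
    requestMicPermission.launch(Manifest.permission.RECORD_AUDIO)
}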

6. Build and run

  1. Open in Android Studio
  2. Build → Make Project
  3. Connect your Android device
  4. Run the app
  5. Grant microphone permission when prompted
Speak one of your keywords and watch the classification results update in real time.

How it works

Audio capture

// MainActivity.kt
private fun startAudioRecording() {
    // SAMPLE_RATE must match the sampling rate the impulse was trained at
    // (16000 Hz here, matching the one-second ring buffer below)
    val bufferSize = AudioRecord.getMinBufferSize(
        SAMPLE_RATE,
        AudioFormat.CHANNEL_IN_MONO,
        AudioFormat.ENCODING_PCM_16BIT
    )
    
    audioRecord = AudioRecord(
        MediaRecorder.AudioSource.MIC,
        SAMPLE_RATE,
        AudioFormat.CHANNEL_IN_MONO,
        AudioFormat.ENCODING_PCM_16BIT,
        bufferSize
    )
    
    audioRecord.startRecording()
    
    // Read audio on a background thread; isRecording is a flag
    // flipped by the teardown routine
    Thread {
        val buffer = ShortArray(bufferSize)
        while (isRecording) {
            val read = audioRecord.read(buffer, 0, buffer.size)
            if (read > 0) {
                processAudioBuffer(buffer, read)
            }
        }
    }.start()
}
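
The excerpt above never stops the recorder; a matching teardown (an assumption mirroring the flag used in the loop, not code from the sample) would be:

// Sketch of the matching teardown; isRecording is assumed to be a @Volatile flag
private fun stopAudioRecording() {
    isRecording = false      // lets the reader thread fall out of its loop
    audioRecord.stop()
    audioRecord.release()
}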

Ring buffer for continuous inference

private val audioRingBuffer = RingBuffer(16000) // 1 second at 16kHz

private fun processAudioBuffer(buffer: ShortArray, size: Int) {
    // Append the newest samples to the ring buffer
    audioRingBuffer.write(buffer, size)
    
    // Once full, the buffer stays full, so each pass runs inference on a
    // sliding one-second window of the most recent audio
    if (audioRingBuffer.isFull()) {
        val features = audioRingBuffer.read()
        val result = runInference(features)
        
        runOnUiThread {
            updateUI(result)
        }
    }
}
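
RingBuffer comes from the example project, not the Android SDK. A minimal sketch of what such a class could look like (the method names follow the usage above, but the implementation details are assumptions, not the repo's exact code):

// Minimal ring buffer sketch: accumulates 16-bit PCM samples and, once full,
// hands back the most recent window in chronological order
class RingBuffer(private val capacity: Int) {
    private val data = ShortArray(capacity)
    private var writePos = 0
    private var count = 0

    @Synchronized
    fun write(samples: ShortArray, size: Int) {
        for (i in 0 until size) {
            data[writePos] = samples[i]
            writePos = (writePos + 1) % capacity
            if (count < capacity) count++
        }
    }

    @Synchronized
    fun isFull(): Boolean = count == capacity

    @Synchronized
    fun read(): ShortArray {
        // Assumes the buffer is full, as in the usage above: the oldest
        // sample then sits at writePos
        val out = ShortArray(capacity)
        for (i in 0 until capacity) {
            out[i] = data[(writePos + i) % capacity]
        }
        return out
    }
}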

Native inference

// native-lib.cpp
#include <jni.h>
#include <cstring>
#include <vector>
#include "edge-impulse-sdk/classifier/ei_run_classifier.h"

// Feature buffer shared with the signal callback below
static std::vector<float> features;

// Callback the SDK uses to pull feature data on demand
static int get_feature_data(size_t offset, size_t length, float *out_ptr) {
    memcpy(out_ptr, features.data() + offset, length * sizeof(float));
    return 0;
}

extern "C" JNIEXPORT jobject JNICALL
Java_com_example_kws_EIClassifierAudio_run(
    JNIEnv* env, jobject, jshortArray audioData) {
    
    // Convert 16-bit PCM samples to floats in [-1, 1)
    jshort* audio = env->GetShortArrayElements(audioData, nullptr);
    int length = env->GetArrayLength(audioData);
    
    features.resize(length);
    for (int i = 0; i < length; i++) {
        features[i] = (float)audio[i] / 32768.0f;
    }
    env->ReleaseShortArrayElements(audioData, audio, JNI_ABORT);
    
    // Wrap the buffer in a signal_t the classifier can read from
    signal_t signal;
    signal.total_length = features.size();
    signal.get_data = &get_feature_data;
    
    // Run the Edge Impulse classifier (debug output off)
    ei_impulse_result_t result = { 0 };
    run_classifier(&signal, &result, false);
    
    // Marshal the result into a Java object (helper defined elsewhere)
    return createResultObject(env, result);
}
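
The JNI symbol Java_com_example_kws_EIClassifierAudio_run implies a Kotlin class EIClassifierAudio in the com.example.kws package with an external run method. A sketch of that binding (the native library name is an assumption; it must match the target defined in CMakeLists.txt):

// EIClassifierAudio.kt -- Kotlin half of the JNI bridge (illustrative)
package com.example.kws

class EIClassifierAudio {
    companion object {
        init {
            System.loadLibrary("native-lib") // name assumed; match CMakeLists.txt
        }
    }

    // Resolved at runtime to Java_com_example_kws_EIClassifierAudio_run above
    external fun run(audioData: ShortArray): ClassificationResult
}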

Result display

private fun updateUI(result: ClassificationResult) {
    // Find the highest-confidence prediction
    val topResult = result.classification.maxByOrNull { it.score } ?: return
    
    // CONFIDENCE_THRESHOLD is app-defined; around 0.6-0.8 is a common
    // starting point, then tune against your own model
    if (topResult.score > CONFIDENCE_THRESHOLD) {
        // Show the detected keyword
        keywordTextView.text = topResult.label
        confidenceTextView.text = "${(topResult.score * 100).toInt()}%"
        
        // Highlight the detection
        detectionIndicator.setBackgroundColor(Color.GREEN)
        
        // Trigger an action (optional)
        onKeywordDetected(topResult.label)
    } else {
        keywordTextView.text = "Listening..."
        detectionIndicator.setBackgroundColor(Color.GRAY)
    }
}
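
Because consecutive one-second windows overlap, a single utterance can push several windows past the threshold in a row. The optional onKeywordDetected hook is a natural place to debounce; a sketch (the 1500 ms interval is an arbitrary assumption):

// Debounce repeated detections of the same keyword (illustrative, not part
// of the sample app)
private var lastKeyword: String? = null
private var lastDetectionMs = 0L

private fun onKeywordDetected(label: String) {
    val now = System.currentTimeMillis()
    if (label == lastKeyword && now - lastDetectionMs < 1500) return
    lastKeyword = label
    lastDetectionMs = now
    // ...trigger the actual action here
}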
