This tutorial builds on the Camera Inference tutorial. You’ll add QNN hardware acceleration to leverage Qualcomm’s Hexagon NPU for significantly faster inference.
Reference code: https://github.com/edgeimpulse/qnn-hardware-acceleration

*Figure: where the QNN TFLite delegate fits in the Qualcomm AI Engine Direct stack*
What you’ll build
An Android application that:
- Runs Edge Impulse object detection models with the Camera2 API
- Accelerates inference using Qualcomm’s HTP/DSP via QNN delegate
- Displays real-time bounding boxes with overlay
- Logs detailed performance metrics to Logcat
Difficulty: Advanced
Performance expectations
Results from YOLOv5 small (480×480, quantized) on the Qualcomm RB3 Gen 2 (QCS6490):

| Path | DSP (µs) | Inference (µs) | Speedup |
|---|---|---|---|
| Without QNN | 5,640 | 5,748 | Baseline |
| With QNN | 3,748 | 527 | ~10.9× faster |
*Logcat timing with and without QNN acceleration: before (CPU-only) vs. after (QNN delegate)*
- Inference: 5,748 → 527 µs ≈ 10.9× faster
- DSP stage: 5,640 → 3,748 µs ≈ 1.5× faster
- Smoother frame times with dedicated accelerator
- Lower power consumption
Prerequisites
- Edge Impulse account: Sign up
- Trained object detection model
- Android Studio: Ladybug 2024.2.2 or later
- Snapdragon device (physical, or via the Qualcomm Device Cloud) with a Hexagon NPU:
- Snapdragon 8 Gen 1/2/3 (mobile)
- Snapdragon 6/7 series (mid-range)
- QRB series (embedded: RB3, RB5, Dragonwing)
- Qualcomm AI Engine Direct SDK: Download from Qualcomm
- Tools: Android API 35, NDK 27.0.12077973, CMake 3.22.1
Supported devices
Devices with a Qualcomm Hexagon NPU (Gen 2 or later):

*Example Snapdragon reference device used for testing*

Mobile:
- Snapdragon 8 Gen 3/2/1
- Snapdragon 7+ Gen 2/3
- Snapdragon 6 Gen 1
Embedded:
- QCS6490 (RB3 Gen 2, Rubik Pi 3)
- QRB5165 (RB5)
- QRB2210 (Arduino UNO Q)
- Dragonwing platforms
Don’t have hardware? Try the Qualcomm Device Cloud with pre-configured Snapdragon devices.
1. Clone the repository
2. Locate Qualcomm AI Engine Direct SDK
Download and install the Qualcomm AI Engine Direct SDK. From it you will need `libQnnTFLiteDelegate.so` built for Android arm64.
Find the delegate directory
macOS/Linux: search your QAIRT installation directory for `libQnnTFLiteDelegate.so`. Windows: open `C:\qairt\<version>\` and search for `libQnnTFLiteDelegate.so`.
The parent folder of that file is your source directory (it also contains the other `libQnn*.so` runtime libraries).
3. Copy QNN libraries
Create the destination directory in your project (typically `app/src/main/jniLibs/arm64-v8a/` for arm64 Android libraries; check the repository layout to confirm) and copy the QNN libraries into it.
Automated script (experimental)
The repository includes a fetch script; you'll need to configure it with your QAIRT SDK path before running it.
4. Deploy your model
In Edge Impulse Studio:
- Go to Deployment
- Select Android (C++ library)
- Enable Quantized (int8) for best QNN performance
- Click Build
- Download the .zip
5. Configure Android manifest
Update `app/src/main/AndroidManifest.xml` (at minimum, the app needs the `android.permission.CAMERA` permission):
6. Build and run
Build in Android Studio
- Connect your Snapdragon device via USB
- Enable USB debugging in Developer Options
- Click Run (green play button)
- Select your device
Monitor performance
Open Logcat and filter by MainActivity:
Verify QNN acceleration
Check that the QNN libraries are loaded:
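One programmatic way to check (a sketch of one possible approach, not necessarily how the reference app does it; `QnnDelegateAvailable` is a hypothetical helper) is to `dlopen` the delegate from native code:

```cpp
#include <android/log.h>
#include <dlfcn.h>

// Hypothetical helper (not part of the reference app): try to dlopen the
// delegate bundled in jniLibs to confirm the dynamic loader can resolve it.
static bool QnnDelegateAvailable() {
    void* handle = dlopen("libQnnTFLiteDelegate.so", RTLD_NOW);
    if (handle == nullptr) {
        __android_log_print(ANDROID_LOG_WARN, "MainActivity",
                            "QNN delegate not loadable: %s", dlerror());
        return false;
    }
    dlclose(handle);
    return true;
}
```

How it works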
QNN TFLite delegate integration
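A minimal sketch of the integration, assuming the names exposed by Qualcomm's `QnnTFLiteDelegate.h` header (`TfLiteQnnDelegateOptionsDefault`, `TfLiteQnnDelegateCreate`, `kHtpBackend`); verify them against the header shipped with your SDK version:

```cpp
#include "QnnTFLiteDelegate.h"               // from the QAIRT SDK include dir
#include "tensorflow/lite/interpreter.h"

// Attach the QNN delegate (HTP backend) to an already-built interpreter.
// The delegate must outlive the interpreter and is released with
// TfLiteQnnDelegateDelete() when you are done.
bool AttachQnnDelegate(tflite::Interpreter& interpreter) {
    TfLiteQnnDelegateOptions options = TfLiteQnnDelegateOptionsDefault();
    options.backend_type = kHtpBackend;      // target the Hexagon NPU

    TfLiteDelegate* delegate = TfLiteQnnDelegateCreate(&options);
    if (delegate == nullptr) {
        return false;  // runtime libs missing or SoC not supported
    }
    // Supported ops move to the HTP; anything unsupported falls back to CPU.
    return interpreter.ModifyGraphWithDelegate(delegate) == kTfLiteOk;
}
```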
Environment configuration
The app automatically sets the required environment variables on startup:
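A sketch of what that can involve, assuming `ADSP_LIBRARY_PATH` is what the Hexagon FastRPC runtime reads to locate its libraries (the exact variables depend on your device and SDK; `ConfigureQnnEnvironment` and `native_lib_dir` are illustrative names):

```cpp
#include <cstdlib>
#include <string>

// Point the Hexagon runtime at the APK's native library directory so it can
// locate the bundled libQnn*.so files. `native_lib_dir` is
// ApplicationInfo.nativeLibraryDir, passed down from the Java/Kotlin side.
void ConfigureQnnEnvironment(const std::string& native_lib_dir) {
    setenv("ADSP_LIBRARY_PATH", native_lib_dir.c_str(), /*overwrite=*/1);
}
```

Project structure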
Customization
Adjust HTP performance mode
In `native-lib.cpp`, modify the QNN options (see the sketch after the list below):
- burst: Maximum speed, higher power (default)
- high_performance: Sustained high performance
- balanced: Balance between speed and power
- low_power: Minimize power consumption
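A hedged sketch of selecting a mode, assuming enum names of the form `kHtpBurst` / `kHtpHighPerformance` and an `htp_options.performance_mode` field from `QnnTFLiteDelegate.h`; check the exact spellings in your SDK's header:

```cpp
// In native-lib.cpp, before creating the delegate:
TfLiteQnnDelegateOptions options = TfLiteQnnDelegateOptionsDefault();
options.backend_type = kHtpBackend;
// Trade power for speed; burst maximizes clocks for short workloads.
options.htp_options.performance_mode = kHtpBurst;
TfLiteDelegate* delegate = TfLiteQnnDelegateCreate(&options);
```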
Enable profiling
When profiling is enabled, the results are written to `/sdcard/qnn_profile.json`.
Optimize model for QNN
In Edge Impulse Studio:
- Use quantization: INT8 models leverage the HTP better than FP32
- Supported operations: Check QNN operator support
- Enable EON Compiler: Optimizes for Qualcomm hardware
Change detection threshold
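Where exactly this lives depends on the app code, but with the Edge Impulse C++ SDK the usual pattern is to filter the bounding boxes returned by `run_classifier()` by their confidence value; a minimal sketch with an illustrative threshold:

```cpp
#include "edge-impulse-sdk/classifier/ei_run_classifier.h"

constexpr float kDetectionThreshold = 0.5f;  // illustrative value; tune per model

// Keep only detections whose confidence clears the threshold.
void HandleDetections(const ei_impulse_result_t& result) {
    for (size_t i = 0; i < result.bounding_boxes_count; i++) {
        const ei_impulse_result_bounding_box_t& bb = result.bounding_boxes[i];
        if (bb.value < kDetectionThreshold) {
            continue;  // below threshold: skip this box
        }
        // Draw/report bb.label at (bb.x, bb.y) with size bb.width × bb.height.
    }
}
```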
Performance tuning tips
Model optimization
- Use INT8 quantization - Essential for HTP acceleration
- Reduce input resolution - 320×320 vs 640×640 can be 4× faster
- Simplify architecture - Fewer layers = better HTP utilization
- Test operator coverage - Check which ops run on HTP vs CPU
Runtime optimization
Frame rate optimization
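One common pattern (an illustration, not necessarily what the reference app does) is to drop camera frames while an inference is still in flight, so frames never queue up behind the accelerator:

```cpp
#include <atomic>
#include <functional>

std::atomic<bool> g_inference_busy{false};

// Call from the camera frame callback. Returns false when the frame was
// dropped because the previous inference had not finished yet.
bool TryRunInference(const std::function<void()>& run_inference) {
    bool expected = false;
    if (!g_inference_busy.compare_exchange_strong(expected, true)) {
        return false;            // still busy: drop this frame
    }
    run_inference();             // e.g. interpreter->Invoke() + post-processing
    g_inference_busy.store(false);
    return true;
}
```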
Benchmark results
Real-world performance on different devices:

| Device | Model | Without QNN | With QNN | Speedup |
|---|---|---|---|---|
| RB3 Gen 2 (QCS6490) | YOLOv5s 480×480 INT8 | 5.7 ms | 0.5 ms | 11.4× |
| Snapdragon 8 Gen 2 | FOMO 96×96 INT8 | 2.1 ms | 0.3 ms | 7.0× |
| Pixel 8 Pro (Tensor G3) | YOLOv5s 320×320 INT8 | N/A | N/A | Not supported |
Google Tensor processors don’t include Hexagon NPU. QNN acceleration only works on Qualcomm Snapdragon devices.
Next steps
Additional resources
- GitHub: qnn-hardware-acceleration
- Qualcomm AI Engine Direct SDK Docs
- QNN TFLite Delegate Guide
- Device Cloud Tutorial
- Edge Impulse Forum
Summary
You’ve successfully enabled Qualcomm QNN hardware acceleration for Edge Impulse models on Android. Key takeaways:
- 10×+ speedup possible with proper configuration
- Drop-in integration - no model retraining required
- INT8 quantization essential for HTP acceleration
- Device-specific - Snapdragon/Dragonwing only
