Written by Hongqiang Wang, Feb 18, 2025
Co-authored with Li He, Alex Angus, Skyler Szot, Shangqing Gu, Shaofei Qi, and Alex Bourd.
Originally published in December 2024; updated in February 2025 with additional information, including newly supported models.
We are thrilled to announce the availability of a new OpenCL backend for llama.cpp, the well-recognized open-source project focused on large language model (LLM) inference. This backend, optimized for Qualcomm Adreno GPUs, supports a range of popular LLMs, including the latest DeepSeek distilled R1 model. This achievement represents a significant step forward in enhancing the performance and flexibility of the llama.cpp project for LLM inference within the AI community. The Adreno OpenCL backend for llama.cpp has now been officially upstreamed to the open-source community via CodeLinaro.
With this update, developers now have two options for running LLM inference workloads on Qualcomm Adreno GPUs: the open-source Machine Learning Compiler (MLC) project, which our team has been actively developing, and the llama.cpp project. For more details on running LLMs with MLC, please refer to this blog: Harnessing Qualcomm Adreno GPU for Generative AI: Open Source approach
The new OpenCL backend has been integrated into the llama.cpp mainline after its initial availability via CodeLinaro. It is primarily based on the OpenCL 3.0 standard, using optional features such as subgroups to achieve optimal performance. The backend has been well tested and optimized for premium Adreno GPUs and can be easily ported to other vendors' GPUs that support the OpenCL 3.0 standard.
Benefits of leveraging OpenCL for Adreno
OpenCL (Open Computing Language), developed by the Khronos Group, is a widely adopted industry standard that allows developers to write efficient and portable parallel programming code that runs on a wide range of devices, including CPUs, GPUs, NPUs, FPGAs, and more, without needing in-depth knowledge of these devices. OpenCL on GPUs, in particular, has empowered developers to harness the immense parallel computing power of modern GPUs for general-purpose GPU (GPGPU) applications, such as image/video/vision signal processing and AI workloads like convolutional neural networks (CNNs) and large language models (LLMs).
As a key member of the OpenCL working group within the Khronos Group, Qualcomm Technologies, Inc. has been actively involved in the standardization of OpenCL.
Being one of the earliest adopters of the OpenCL standard on mobile GPUs, Qualcomm has supported OpenCL across a wide range of SoC devices, including high-end, mid-range, and low-end Android smartphones, IoT devices (like drones), automotive platforms, and Windows on Snapdragon (WoS) devices.
Qualcomm Technologies, Inc. has also provided a comprehensive set of tools (Snapdragon Profiler), OpenCL SDK examples, and an OpenCL programming guide with best practices to help developers get started with OpenCL on Adreno GPUs.
OpenCL on GPUs opens new avenues for developers to leverage the computational power of Adreno GPUs in Snapdragon devices. Offloading computationally intensive workloads such as llama.cpp inference onto the GPU frees up the CPU for other operations. Thanks to OpenCL's openness and portability, the time-to-market for solutions can be significantly reduced, making the return on investment (RoI) highly favorable.
Key features and benefits of using the OpenCL backend for llama.cpp
- Enhanced Performance: The new backend significantly boosts the performance of llama.cpp on Adreno GPUs, enabling faster computations and more efficient processing.
- Broader Compatibility: The backend has been highly optimized for Adreno GPUs. However, because it relies only on standard features, it should also run on other GPUs that support the OpenCL 3.0 standard with subgroup support, ensuring broader compatibility and accessibility.
- High Flexibility: Users may modify and optimize the backend for different GPUs, as the current solution uses only standard OpenCL features. For example, the backend can be extended with vendor-specific extensions when targeting other GPUs.
- Open-Source Collaboration: This update is a testament to the power of open-source collaboration. We have worked closely with the community so that this backend meets the needs of developers and users alike.
Tested Models and Platforms
We have rigorously tested llama.cpp with various large language models to confirm its robustness and performance. These tests include:
- Meta’s Llama models, including Llama 2 and Llama 3, with 7 billion (7B) and 8B parameters
- Gemma 1 and 2 2B models, and Phi-3 mini
- Mistral 7B models
- Bilingual models such as Qwen 1 and 2 7B, and Baichuan 7B
- DeepSeek R1 distilled models
The backend has been tested with many premium devices powered by Snapdragon SoCs:
- Laptops running Windows 11 with Snapdragon X Elite and Snapdragon X Plus chips
- Android smartphones powered by Snapdragon 8 Gen 1, 2, 3, and the latest Snapdragon 8 Elite
How to Build and Run llama.cpp on Android and Snapdragon X Elite with Windows on Snapdragon
llama.cpp with the Adreno OpenCL backend has been well optimized for Android devices powered by Qualcomm Snapdragon 8 Gen 1, 2, 3, and Elite mobile platforms, as well as for the Snapdragon X Elite compute platform running Windows 11. Below are the instructions to build and run llama.cpp on these two platforms.
Steps for Android
List of prerequisite software (other versions may work) and hardware
- Ubuntu 22.04
- Python3, CMake, Make and Ninja
- C/C++ compiler
- Android NDK version 26.3.11579264, installed in ~/android-sdk/ndk/26.3.11579264/
- An Android device powered by Qualcomm Snapdragon 8 Gen 1, 2, 3, or Elite mobile platforms.
Install NDK
cd ~
wget https://dl.google.com/android/repository/commandlinetools-linux-8512546_latest.zip && \
unzip commandlinetools-linux-8512546_latest.zip && \
mkdir -p ~/android-sdk/cmdline-tools && \
mv cmdline-tools latest && \
mv latest ~/android-sdk/cmdline-tools/ && \
rm -rf commandlinetools-linux-8512546_latest.zip
yes | ~/android-sdk/cmdline-tools/latest/bin/sdkmanager "ndk;26.3.11579264"
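A quick sanity check can confirm the NDK toolchain landed where the later build steps expect it (the path below is the install location assumed by the steps above):

```shell
# Verify the NDK toolchain is where the later cmake invocations expect it.
NDK_DIR="$HOME/android-sdk/ndk/26.3.11579264"
if [ -d "$NDK_DIR/toolchains/llvm/prebuilt/linux-x86_64" ]; then
  echo "NDK toolchain found at $NDK_DIR"
else
  echo "NDK toolchain missing: re-run the sdkmanager step" >&2
fi
```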
Install OpenCL headers and ICD loader
The required files for running OpenCL are not directly available in the NDK distribution. Users must download the OpenCL headers and the ICD loader from the official Khronos® OpenCL repos for free. These files are then used along with Android NDK to build the llama.cpp executables.
mkdir -p ~/dev/llm
cd ~/dev/llm
git clone https://github.com/KhronosGroup/OpenCL-Headers && \
cd OpenCL-Headers && \
cp -r CL ~/android-sdk/ndk/26.3.11579264/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/include
cd ~/dev/llm
git clone https://github.com/KhronosGroup/OpenCL-ICD-Loader && \
cd OpenCL-ICD-Loader && \
mkdir build_ndk26 && cd build_ndk26 && \
cmake .. -G Ninja -DCMAKE_BUILD_TYPE=Release \
-DCMAKE_TOOLCHAIN_FILE=$HOME/android-sdk/ndk/26.3.11579264/build/cmake/android.toolchain.cmake \
-DOPENCL_ICD_LOADER_HEADERS_DIR=$HOME/android-sdk/ndk/26.3.11579264/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/include \
-DANDROID_ABI=arm64-v8a \
-DANDROID_PLATFORM=24 \
-DANDROID_STL=c++_shared && \
ninja && \
cp libOpenCL.so ~/android-sdk/ndk/26.3.11579264/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/lib/aarch64-linux-android
Build llama.cpp with the Adreno OpenCL backend
cd ~/dev/llm
git clone https://github.com/ggerganov/llama.cpp && \
cd llama.cpp && \
mkdir build-android && cd build-android
cmake .. -G Ninja \
-DCMAKE_TOOLCHAIN_FILE=$HOME/android-sdk/ndk/26.3.11579264/build/cmake/android.toolchain.cmake \
-DANDROID_ABI=arm64-v8a \
-DANDROID_PLATFORM=android-28 \
-DBUILD_SHARED_LIBS=OFF \
-DGGML_OPENCL=ON
ninja
If built successfully, the executable will be located at build-android/bin
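The binaries can then be pushed to the device with adb and run from a device shell. A minimal deployment sketch follows; the device directory and model file name are illustrative, and the adb steps are skipped if platform-tools are not installed:

```shell
# Illustrative sketch: push the freshly built llama-cli to an Android device
# and run it there. Paths and the model file name are example values.
BUILD_DIR="$HOME/dev/llm/llama.cpp/build-android"
DEVICE_DIR="/data/local/tmp/llama"
MODEL="ggml-model-qwen1.5-7b-chat-Q4_0.gguf"

if command -v adb >/dev/null 2>&1; then
  adb shell mkdir -p "$DEVICE_DIR"
  adb push "$BUILD_DIR/bin/llama-cli" "$DEVICE_DIR/"
  adb push "$MODEL" "$DEVICE_DIR/"
  # -ngl 99 offloads all layers to the Adreno GPU via the OpenCL backend
  adb shell "cd $DEVICE_DIR && ./llama-cli -m $MODEL -ngl 99 -p 'Hello'"
else
  echo "adb not found: install Android platform-tools first" >&2
fi
```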
Steps for Snapdragon X Elite with Windows on Snapdragon
List of prerequisite software (other versions may work) and hardware
- Visual Studio 2022 (community or professional version)
- Python3, CMake and Ninja
- LLVM 19 (can be downloaded from the official LLVM releases page)
- A laptop powered by Snapdragon X Elite
Install OpenCL headers and ICD loader
The required files for building with OpenCL are not included by default on Windows. Users must download the OpenCL headers and the ICD loader from the official Khronos® OpenCL repos for free. These files are then used to build the llama.cpp executables.
mkdir -p ~/dev/llm
cd ~/dev/llm
git clone https://github.com/KhronosGroup/OpenCL-Headers && cd OpenCL-Headers
mkdir build && cd build
cmake .. -G Ninja `
-DBUILD_TESTING=OFF `
-DOPENCL_HEADERS_BUILD_TESTING=OFF `
-DOPENCL_HEADERS_BUILD_CXX_TESTS=OFF `
-DCMAKE_INSTALL_PREFIX="$HOME/dev/llm/opencl"
cmake --build . --target install
cd ~/dev/llm
git clone https://github.com/KhronosGroup/OpenCL-ICD-Loader && cd OpenCL-ICD-Loader
mkdir build && cd build
cmake .. -G Ninja `
-DCMAKE_BUILD_TYPE=Release `
-DCMAKE_PREFIX_PATH="$HOME/dev/llm/opencl" `
-DCMAKE_INSTALL_PREFIX="$HOME/dev/llm/opencl"
cmake --build . --target install
Build llama.cpp
mkdir -p ~/dev/llm
cd ~/dev/llm
git clone https://github.com/ggerganov/llama.cpp && cd llama.cpp
mkdir build && cd build
cmake .. -G Ninja `
-DCMAKE_TOOLCHAIN_FILE="$HOME/dev/llm/llama.cpp/cmake/arm64-windows-llvm.cmake" `
-DCMAKE_BUILD_TYPE=Release `
-DCMAKE_PREFIX_PATH="$HOME/dev/llm/opencl" `
-DBUILD_SHARED_LIBS=OFF `
-DGGML_OPENCL=ON
ninja
If built successfully, the executable will be located at build\bin
Launch the executable
Here is an example of how to run the llama.cpp executable, where -m specifies the model file, -b the batch size, -ngl the number of layers to offload to the GPU, -c the context length, and -p the prompt:
./llama-cli -m ggml-model-qwen1.5-7b-chat-Q4_0.gguf -b 128 -ngl 99 -c 2048 -p "Hello"
Note that the Adreno OpenCL backend is currently optimized for weights quantized with the Q4_0 scheme. Optimizations for weights using other schemes, such as FP16 and Q6, are in progress, and we will provide updates soon.
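To give an idea of what Q4_0 quantization does, here is a simplified Python sketch of 4-bit block quantization in the spirit of the scheme (the real ggml code packs two 4-bit codes per byte and stores scales as fp16; this illustration keeps everything in plain floats):

```python
# Simplified sketch of Q4_0-style block quantization: each block of 32
# weights shares one float scale, and each weight becomes a 4-bit code.
BLOCK = 32  # Q4_0 quantizes weights in blocks of 32

def quantize_block(weights):
    """Map 32 floats to a shared scale plus 4-bit codes in [0, 15]."""
    assert len(weights) == BLOCK
    amax = max(weights, key=abs)        # signed value with largest magnitude
    scale = amax / -8 if amax else 1.0  # chosen so amax maps to an extreme code
    codes = [min(15, max(0, round(w / scale) + 8)) for w in weights]
    return scale, codes

def dequantize_block(scale, codes):
    """Reconstruct approximate weights from the scale and 4-bit codes."""
    return [scale * (q - 8) for q in codes]

weights = [(-1) ** i * (i / 31.0) for i in range(BLOCK)]
scale, codes = quantize_block(weights)
approx = dequantize_block(scale, codes)
err = max(abs(a - w) for a, w in zip(approx, weights))
print(f"max reconstruction error: {err:.4f}")
```

Storing one scale plus thirty-two 4-bit codes per block is roughly a 4x size reduction over FP16 weights, which is why Q4_0 models are attractive on mobile GPUs.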
Future Work
The Qualcomm team is working on bringing more Adreno-specific features into the OpenCL backend. Adreno GPUs support a wide range of extensions that enable better performance and power efficiency. For instance, we support features such as integer dot product and on-chip global memory (please refer to the Adreno SDK from Qualcomm Developer).
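As a rough illustration of why an integer dot-product capability matters for quantized inference: the inner loop of a Q4-style matrix multiply can accumulate products of small integers and apply the float scale factors only once per block, instead of dequantizing every weight first. A hedged Python sketch of that idea (not Adreno-specific code):

```python
# Sketch: dot product over quantized values with deferred scaling.
# The inner loop is pure integer multiply-accumulate, which is exactly
# what a hardware integer dot-product instruction accelerates; the float
# scales are applied once at the end of the block.
def block_dot(q_weights, q_acts, w_scale, a_scale, zero=8):
    acc = 0  # integer accumulation only
    for qw, qa in zip(q_weights, q_acts):
        acc += (qw - zero) * qa
    return acc * w_scale * a_scale  # scaling deferred to once per block

qw = [8, 10, 6, 12]   # 4-bit weight codes around zero-point 8
qa = [3, -1, 2, 4]    # int8 activations
result = block_dot(qw, qa, w_scale=0.125, a_scale=0.5)
```

The result matches what full dequantization followed by a float dot product would produce, but with far less floating-point work per element.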
Conclusion
The addition of the OpenCL GPU backend for Adreno GPUs is a significant step forward for llama.cpp. We are excited to see how this enhancement will be utilized by the community and look forward to your feedback.
Want to know more? Join our Discord community to engage with Qualcomm Technologies’ experts, connect with fellow developers working with our technology and stay updated on the latest developer-focused news and product updates.