WebAssembly SIMD
Overview
ICP supports deterministic WebAssembly SIMD. This is a significant milestone for smart contracts demanding top on-chain performance, such as artificial intelligence (AI), image processing (NFTs), games, scientific decentralized applications (dapps), and more.
A significant performance boost is also possible for "classical" blockchain operations implemented in canisters. For example, reward distribution or cryptographic operations might benefit from the WebAssembly SIMD instructions.
What is WebAssembly SIMD?
WebAssembly SIMD (single instruction, multiple data) is a set of more than 200 deterministic vector instructions defined in the WebAssembly core specification. Because each instruction operates on multiple data elements at once, this parallel processing can significantly accelerate specific tasks within canisters running on ICP.
The SIMD functionality is available on every ICP node.
Developer benefits
WebAssembly SIMD support enables a new level of performance on ICP. Developers can:
- Optimize code for computationally heavy tasks: Identify areas within their canisters that can benefit from SIMD instructions and tailor their code for accelerated performance.
- Unlock new possibilities: Explore novel functionalities and complex applications that were previously limited by processing power.
- Build a future-proof foundation: Stay at the forefront of blockchain innovation.
Using WebAssembly SIMD
There are two main ways to benefit from WebAssembly SIMD in a smart contract:
- Loop auto-vectorization: Enabling the WebAssembly SIMD instructions and recompiling the project might be enough to get a significant performance boost. This is usually simple and low-risk, and it can be a one-line change. It is the recommended first step, but the result depends heavily on the algorithms, libraries, and compilers used.
- SIMD intrinsics: Some computation-heavy functions may be rewritten using direct SIMD instructions. This exposes the full SIMD potential, but in many cases some core canister algorithms must be completely rewritten using the new instructions.
Using loop auto-vectorization
To leverage loop auto-vectorization, the WebAssembly SIMD instructions should be enabled either globally for the entire workspace or locally for specific functions within the canister. Once the instructions are available to the compiler, it automatically converts some ordinary loops into loops that perform computations in parallel.
While the change is easy and error-proof, the result in practice depends on many factors, like the algorithm itself, the compiler optimization level and options, project dependencies, etc.
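As a rough illustration of the kind of loop a compiler may be able to auto-vectorize, the sketch below performs an independent element-wise computation over slices. The function name `mul_add` is hypothetical and not part of any ICP API. Whether the loop is actually vectorized depends on the factors listed above, so treat it as a starting point for experiments rather than a guarantee.

```rust
/// Computes `out[i] = a[i] * b[i] + c[i]` for every index that all four
/// slices have in common. (Hypothetical helper, for illustration only.)
pub fn mul_add(a: &[f32], b: &[f32], c: &[f32], out: &mut [f32]) {
    // Trim all slices to a common length up front. With equal, known
    // lengths the compiler can often drop per-iteration bounds checks,
    // which tends to make the loop easier to vectorize.
    let n = out.len().min(a.len()).min(b.len()).min(c.len());
    let (a, b, c, out) = (&a[..n], &b[..n], &c[..n], &mut out[..n]);

    // Each iteration is independent of the others, which is the shape
    // of loop that auto-vectorization typically targets.
    for i in 0..n {
        out[i] = a[i] * b[i] + c[i];
    }
}
```

Loops with cross-iteration dependencies, such as a running floating-point sum, are generally harder for the compiler to vectorize and may need the intrinsics approach instead.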
Example
To enable WebAssembly SIMD instructions globally for the whole workspace and all its dependencies:
- Rust
Create the `.cargo/config.toml` file with the following content:
[build]
target = ["wasm32-unknown-unknown"]
[target.wasm32-unknown-unknown]
rustflags = ["-C", "target-feature=+simd128"]
To enable WebAssembly SIMD instructions just for a specific function within a canister:
- Rust
#[target_feature(enable = "simd128")]
fn auto_vectorization() {
    // ...
}
WebAssembly SIMD instructions may be enabled by default in future dfx versions, in which case enabling them for a specific function within a canister might have no additional effect.
Using WebAssembly SIMD intrinsics
WebAssembly SIMD instructions are available as platform-specific intrinsics for the wasm32 platform. To use the intrinsics, the WebAssembly SIMD instructions should be enabled as described in the previous section.
Example
Here's a short code snippet demonstrating how to multiply two arrays of four float elements each using a single SIMD instruction:
- Rust
#[inline]
#[target_feature(enable = "simd128")]
pub fn mul4(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    use core::arch::wasm32::*;
    // Load the arrays `a` and `b` into SIMD registers.
    // `v128_load` performs an unaligned load, so the `f32` arrays
    // do not need 16-byte alignment.
    let a = unsafe { v128_load(a.as_ptr() as *const v128) };
    let b = unsafe { v128_load(b.as_ptr() as *const v128) };
    // Multiply the four lanes of `a` and `b` using a single SIMD instruction.
    let c = f32x4_mul(a, b);
    // Store and return the result.
    let mut res = [0.0; 4];
    unsafe { v128_store(res.as_mut_ptr() as *mut v128, c) };
    res
}
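Because `core::arch::wasm32` exists only when compiling for the wasm32 target, code that uses the intrinsics directly does not build for native unit tests. One possible way around this, sketched below, is to gate the SIMD version behind `cfg` attributes and keep a scalar fallback. The name `mul4_portable` is hypothetical, and the sketch assumes SIMD is enabled globally (as in the `.cargo/config.toml` example) so that `target_feature = "simd128"` is set at compile time.

```rust
/// SIMD version: compiled only for wasm32 builds with `simd128` enabled.
#[cfg(all(target_arch = "wasm32", target_feature = "simd128"))]
pub fn mul4_portable(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    use core::arch::wasm32::*;
    // Build the vectors lane by lane, which avoids raw-pointer loads.
    let va = f32x4(a[0], a[1], a[2], a[3]);
    let vb = f32x4(b[0], b[1], b[2], b[3]);
    // Multiply all four lanes with a single SIMD instruction.
    let vc = f32x4_mul(va, vb);
    [
        f32x4_extract_lane::<0>(vc),
        f32x4_extract_lane::<1>(vc),
        f32x4_extract_lane::<2>(vc),
        f32x4_extract_lane::<3>(vc),
    ]
}

/// Scalar fallback: compiled for every other target or configuration,
/// so the same call sites work in native tests.
#[cfg(not(all(target_arch = "wasm32", target_feature = "simd128")))]
pub fn mul4_portable(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    [a[0] * b[0], a[1] * b[1], a[2] * b[2], a[3] * b[3]]
}
```

Both versions return the same lane-wise products, so a native test exercises the scalar path while the canister build uses the SIMD path.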
Frequently asked questions
How do I measure the performance speedup of a canister?
ICP provides the `ic0.performance_counter` system API call to measure a canister's performance. There is also the canbench benchmarking framework.
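A minimal sketch of how such a measurement could be structured is shown below. The helper `measure` is hypothetical, and the counter is passed in as a closure so that the sketch stays self-contained. Inside a canister, the closure would typically wrap the performance counter, for example `|| ic_cdk::api::performance_counter(0)`, assuming the Rust CDK wrapper for `ic0.performance_counter` is available.

```rust
/// Runs `work` and returns its result together with the counter delta
/// observed around it. (Hypothetical helper, for illustration only.)
///
/// `read_counter` stands in for a call to the performance counter;
/// it is injected so the helper can also be exercised outside a canister.
pub fn measure<T>(
    read_counter: impl Fn() -> u64,
    work: impl FnOnce() -> T,
) -> (T, u64) {
    let start = read_counter();
    let result = work();
    let end = read_counter();
    // `saturating_sub` guards against an unexpected counter reset.
    (result, end.saturating_sub(start))
}
```

Measuring the same function before and after enabling SIMD, with identical inputs, gives a direct comparison of the two builds.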
Are there any libraries for artificial intelligence (AI) inference?
The Sonos tract library is a tiny, self-contained TensorFlow and ONNX inference library written in Rust. DFINITY contributed WebAssembly SIMD support to the library. The library is used in some DFINITY AI demos and examples.
References and examples
- WebAssembly SIMD Rust example compares the performance of a naive, optimized, auto-vectorized and SIMD intrinsic matrix multiplication running on ICP.
- WebAssembly core specification for SIMD instructions.