MLIR Sparsifier

Make writing sparse code easy.

Because exploiting sparsity explicitly (by hand) comes at a considerable engineering cost, we propose exploiting sparsity implicitly: sparsity becomes a property of a tensor's type, and a sparse compiler (also known as a sparsifier) uses type polymorphism to generate sparse code fully automatically from a sparsity-agnostic (i.e., “dense”) definition of the computation.

Approach

Rather than introducing a small set of dedicated "sparse ops" and types, we implement generalized sparsity through operator polymorphism: any operation can be combined with any sparse tensor type. This minimizes source changes to existing models, dramatically simplifies their incremental sparsification, and lets the compiler generate highly optimized code automatically for any sparsity pattern.
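As an illustration of this polymorphism, the sketch below uses MLIR's upstream sparse_tensor dialect: an ordinary linalg.add is made sparse purely by giving one operand an annotated tensor type (the encoding label #SV and the tensor shapes are our own illustrative choices, not fixed by the Sparsifier):

  #SV = #sparse_tensor.encoding<{ map = (i) -> (i : compressed) }>

  // The op itself is the ordinary dense linalg.add; sparsity lives
  // entirely in the type annotation of %a.
  func.func @add(%a: tensor<1024xf64, #SV>,
                 %b: tensor<1024xf64>) -> tensor<1024xf64> {
    %init = tensor.empty() : tensor<1024xf64>
    %0 = linalg.add
           ins(%a, %b : tensor<1024xf64, #SV>, tensor<1024xf64>)
           outs(%init : tensor<1024xf64>) -> tensor<1024xf64>
    return %0 : tensor<1024xf64>
  }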

Sparsity Use Case

Sparse tensors arise in problems in science, engineering, machine learning, and data analytics. Programs that operate on such tensors can leverage sparsity to reduce both storage requirements and computation time. In recent years, the emergence and exponential growth of large deep neural networks have made more efficient approaches to sparse tensor computation a necessity. The MLIR Sparsifier is an initiative to extend Google's compiler stack to sparse deep learning workloads across frameworks (JAX, PyTorch) and targets (mobile/server CPU, GPU, and TPU).

Sparsity as a Property

The MLIR Sparsifier treats sparsity as a property of tensors, not a tedious implementation task for the user. Programmers merely annotate sparse tensor types; the compiler then generates sparse code automatically from a sparsity-agnostic definition of the computation. With sparse tensor types as first-class citizens, any operation can be made sparse by simply annotating the tensor types of its operands. Compiler transformations take care of lowering the operation to imperative constructs and sparse storage formats that store and iterate over only the nonzero elements, completely abstracting these complexities away from the user.
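A minimal sketch of what such an annotation looks like for a matrix multiplication, again in the upstream MLIR sparse_tensor dialect (the #CSR label and the tensor shapes are illustrative assumptions):

  #CSR = #sparse_tensor.encoding<{
    map = (i, j) -> (i : dense, j : compressed)
  }>

  // A sparsity-agnostic matrix multiplication: the ordinary
  // linalg.matmul, with only the type of %a marked as sparse (CSR).
  func.func @matmul(%a: tensor<16x32xf64, #CSR>,
                    %b: tensor<32x16xf64>,
                    %c: tensor<16x16xf64>) -> tensor<16x16xf64> {
    %0 = linalg.matmul
           ins(%a, %b : tensor<16x32xf64, #CSR>, tensor<32x16xf64>)
           outs(%c : tensor<16x16xf64>) -> tensor<16x16xf64>
    return %0 : tensor<16x16xf64>
  }

Lowering this through the sparsification pipeline (in recent upstream mlir-opt, for example, the --sparsifier pass pipeline) produces loops that visit only the stored nonzeros of %a.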

MLIR-based

Google’s MLIR Sparsifier capabilities are built on top of MLIR, which provides modern, extensible compiler design. The ability to progressively lower dialects closer to the target hardware during compilation, together with an intuitive transformation mechanism, has made MLIR a popular compiler infrastructure for domain-specific languages that must bridge large semantic gaps, such as compilers for machine learning.

Flexible

The MLIR Sparsifier not only enables non-expert programmers to generate sparse code quickly, but also lets expert programmers explore the full space of possible sparse implementations. Furthermore, the MLIR Sparsifier is not restricted to a specific type of sparsity or hardware. The long-term goal of this initiative is to address different types of sparsity (weight, activation, or semantic) on different devices (CPU, GPU, or TPU).
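For instance, exploring alternative implementations amounts to swapping the encoding annotation: each of the (illustratively named) encodings below selects a different storage scheme for the same 2-D tensor, with no other source changes.

  // Rows stored densely, columns compressed (CSR).
  #CSR  = #sparse_tensor.encoding<{ map = (i, j) -> (i : dense, j : compressed) }>
  // Column-major variant (CSC): dimensions permuted in the map.
  #CSC  = #sparse_tensor.encoding<{ map = (i, j) -> (j : dense, i : compressed) }>
  // Both dimensions compressed (doubly compressed, DCSR).
  #DCSR = #sparse_tensor.encoding<{ map = (i, j) -> (i : compressed, j : compressed) }>
  // Coordinate scheme (COO).
  #COO  = #sparse_tensor.encoding<{ map = (i, j) -> (i : compressed(nonunique), j : singleton) }>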

Long-term Goal

The MLIR Sparsifier team has developed the vision and a pilot implementation of the sparsification technology. The MLIR Sparsifier will be a multi-year initiative that further strengthens Google's leadership in ML and compiler technology. With feedback from our valued customers and partners, including hardware vendors, and with support from the ML framework layer, we aim to identify key workloads and requirements and roll out the sparse ML compiler into production for CPUs, GPUs, and ultimately TPUs.

Learn more about the MLIR Sparsifier with hands-on tutorials!