Sparse tensor types in MLIR

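The sparse tensor encoding is an attribute that encodes information on the sparsity properties of tensors, inspired by the TACO formalization of sparse tensors. This encoding is eventually used by a sparsifier pass to generate sparse code from a sparsity-agnostic representation of the computation. That is, an implicit sparse representation is converted to an explicit sparse representation in which co-iterating loops operate on sparse storage formats rather than on tensors with a sparsity encoding. Compiler passes that run before this sparsifier pass need to be aware of the semantics of tensor types with such a sparsity encoding.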

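In this encoding, we use dimension to refer to the axes of the semantic tensor, and level to refer to the axes of the actual storage format, that is, the operational representation of the sparse tensor in memory. The number of dimensions is usually the same as the number of levels (as in the CSR storage format). However, the encoding can also map dimensions to higher-order levels (for example, to encode a block-sparse BSR storage format) or to lower-order levels (for example, to linearize dimensions as a single level in the storage).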

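The encoding contains a map that provides the following:

An ordered sequence of dimension specifications, each of which defines:
- the dimension-size (implicit from the tensor's dimension-shape)
- a dimension-expression
An ordered sequence of level specifications, each of which includes a required level-type that defines how the level is stored. Each level-type consists of:
- a level-expression, which defines what is stored
- a level-format
- a collection of level-properties that apply to the level-format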

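Each level-expression is an affine expression over dimension-variables. Thus, the level-expressions collectively define an affine map from dimension-coordinates to level-coordinates. The dimension-expressions collectively define the inverse map, which only needs to be provided for elaborate cases where it cannot be inferred automatically.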
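As a rough illustration of how level-expressions and dimension-expressions form a pair of inverse maps, the following Python sketch hand-codes the 2x3 block mapping `(i floordiv 2, j floordiv 3, i mod 2, j mod 3)` and its inverse `(ib * 2 + ii, jb * 3 + jj)`. The function names `dim_to_lvl` and `lvl_to_dim` are hypothetical and are not MLIR APIs; the block sizes are fixed purely for illustration.

```python
# Hypothetical helpers (not MLIR APIs): a hand-written version of the
# affine map and its inverse for 2x3 blocks.
BLOCK_ROWS, BLOCK_COLS = 2, 3


def dim_to_lvl(i, j):
    """Level-expressions: dimension-coordinates -> level-coordinates."""
    return (i // BLOCK_ROWS, j // BLOCK_COLS, i % BLOCK_ROWS, j % BLOCK_COLS)


def lvl_to_dim(ib, jb, ii, jj):
    """Dimension-expressions: level-coordinates -> dimension-coordinates."""
    return (ib * BLOCK_ROWS + ii, jb * BLOCK_COLS + jj)


# The two maps round-trip for every coordinate of a 20x30 tensor.
round_trips = all(
    lvl_to_dim(*dim_to_lvl(i, j)) == (i, j)
    for i in range(20)
    for j in range(30)
)

print(dim_to_lvl(5, 7))  # (2, 2, 1, 1)
print(round_trips)       # True
```

For a map this simple the inverse can be derived mechanically, which is why an explicit inverse is only needed for more elaborate cases.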

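Each dimension can also have an optional SparseTensorDimSliceAttr. Within the sparse storage format, we refer to indices that are stored explicitly as coordinates and to offsets into the storage format as positions.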

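The supported level-formats are the following:

dense : all entries along this level are stored.
compressed : only nonzeros along this level are stored.
loose_compressed : as compressed, but allows for free space between regions.
singleton : a variant of the compressed format, where coordinates have no siblings.
block2_4 : the compression uses a 2:4 encoding per 1x4 block.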

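For a compressed level, each position interval is represented in a compact way with a lower bound pos(i) and an upper bound pos(i+1) - 1, which implies that successive intervals must appear in order without any "holes" in between them. The loose compressed format relaxes these constraints by representing each position interval with a lower bound lo(i) and an upper bound hi(i), which allows intervals to appear in arbitrary order and with elbow room between them.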
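The difference between the two interval representations can be sketched in Python as follows. This is only an illustration: the helper names are hypothetical, the intervals are returned as half-open `[lo, hi)` ranges (so `pos(i+1)` is used as an exclusive bound instead of `pos(i+1) - 1`), and the loose compressed bounds are assumed to be stored as a flat array of `lo, hi` pairs with `hi` exclusive.

```python
def compressed_intervals(pos):
    """Half-open [lo, hi) intervals implied by a compressed positions array."""
    return [(pos[i], pos[i + 1]) for i in range(len(pos) - 1)]


def loose_compressed_intervals(lo_hi):
    """Half-open intervals from an assumed flat array of (lo, hi) pairs."""
    return [(lo_hi[2 * i], lo_hi[2 * i + 1]) for i in range(len(lo_hi) // 2)]


# Compressed: three parents share one array of n + 1 bounds, so the
# intervals are contiguous and in order (the second one is empty).
print(compressed_intervals([0, 2, 2, 5]))
# [(0, 2), (2, 2), (2, 5)]

# Loose compressed: each parent has its own pair, so intervals may appear
# out of order and leave unused positions (here positions 2 and 6).
print(loose_compressed_intervals([0, 2, 7, 7, 3, 6]))
# [(0, 2), (7, 7), (3, 6)]
```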

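By default, each level-type has the property of being unique (no duplicate coordinates at that level) and ordered (coordinates appear sorted at that level). The following properties can be added to a level-format to change this default behavior:

nonunique : duplicate coordinates may appear at the level
nonordered : coordinates may appear in arbitrary order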
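To make the two default properties concrete, the following Python sketch checks them on the coordinates that belong to one position interval of a level. The helpers `is_unique` and `is_ordered` are hypothetical names introduced for illustration only.

```python
def is_unique(coords):
    """True if no coordinate is repeated within the segment."""
    return len(set(coords)) == len(coords)


def is_ordered(coords):
    """True if the coordinates appear in non-decreasing order."""
    return all(a <= b for a, b in zip(coords, coords[1:]))


# Satisfies both defaults.
print(is_unique([0, 2, 5]), is_ordered([0, 2, 5]))  # True True
# Sorted but with duplicates: the level would need `nonunique`.
print(is_unique([0, 0, 3]), is_ordered([0, 0, 3]))  # False True
# No duplicates but unsorted: the level would need `nonordered`.
print(is_unique([4, 1, 3]), is_ordered([4, 1, 3]))  # True False
```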

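In addition to the map, the following two fields are optional:

The required bitwidth for position storage (integral offsets into the sparse storage scheme). A narrow width reduces the memory footprint of overhead storage, as long as the width suffices to define the total required range (that is, the maximum number of stored entries). The choices are 8, 16, 32, 64, or the default 0, which indicates the native bitwidth.
The required bitwidth for coordinate storage (the coordinates of stored entries). A narrow width reduces the memory footprint of overhead storage, as long as the width suffices to define the total required range (that is, the maximum value of each tensor coordinate). The choices are 8, 16, 32, 64, or the default 0, which indicates the native bitwidth.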
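One way to reason about these widths is to pick the narrowest allowed width that still covers the required range. The Python sketch below does that under the assumption that stored values are interpreted as unsigned integers; if an implementation treats them as signed, one bit fewer is available. The helper `narrowest_width` is hypothetical and not part of MLIR.

```python
ALLOWED_WIDTHS = (8, 16, 32, 64)


def narrowest_width(max_value):
    """Smallest allowed width whose unsigned range can hold max_value."""
    if max_value < 0:
        raise ValueError("positions and coordinates are non-negative")
    for width in ALLOWED_WIDTHS:
        if max_value < 2 ** width:
            return width
    raise ValueError("value does not fit in 64 bits")


# An 8x8 tensor has coordinates up to 7, so 8 bits are enough.
print(narrowest_width(7))       # 8
# 255 still fits in 8 bits, 256 does not.
print(narrowest_width(255))     # 8
print(narrowest_width(256))     # 16
# A tensor with up to 70000 stored entries needs 32-bit positions.
print(narrowest_width(70000))   # 32
```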

Examples

The sparse tensor encoding for the CSR (Compressed Sparse Row) format looks like this:

#CSR = #sparse_tensor.encoding<{
  map = (i, j) -> (i : dense, j : compressed)
}>

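This indicates that the first dimension (rows) is mapped to the first level, which is a dense level, indicated by size 4, and that the second dimension (columns) is mapped to the second level, which has a position array and a coordinate array. The value 3 (at [1, 1] of the original matrix) is represented by the second offset pair from the position array, so the row number of value 3 is 1 in the original matrix and its column number is found at index [2 : 4) of the coordinate array. The coordinate array then shows that the column number of value 3 is 1.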
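The lookup described above can be reproduced with a small Python sketch. The 4x4 matrix `A` below is an assumption chosen only so that it matches the description (4 rows, value 3 at [1, 1], row 1 occupying indices [2 : 4) of the coordinate array); it is not taken from the original figure, and `to_csr` is a hypothetical helper.

```python
def to_csr(dense):
    """Build CSR positions/coordinates/values from a dense list of rows."""
    positions, coordinates, values = [0], [], []
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                coordinates.append(j)   # column of each stored entry
                values.append(v)
        positions.append(len(coordinates))  # end of this row's interval
    return positions, coordinates, values


# Assumed example matrix (not from the original figure).
A = [
    [1, 0, 2, 0],
    [0, 3, 0, 4],
    [0, 0, 0, 0],
    [5, 0, 0, 6],
]

positions, coordinates, values = to_csr(A)
print(positions)    # [0, 2, 4, 4, 6]
print(coordinates)  # [0, 2, 1, 3, 0, 3]
print(values)       # [1, 2, 3, 4, 5, 6]

# Row 1 owns the half-open interval [positions[1], positions[2]) = [2, 4).
lo, hi = positions[1], positions[2]
print(coordinates[lo:hi], values[lo:hi])  # [1, 3] [3, 4]
```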

The sparse tensor type for the BSR (Block Sparse Row) format looks like this:

#BSR = #sparse_tensor.encoding<{
  map = (i, j) ->
    ( i floordiv 2 : dense
    , j floordiv 2 : compressed
    , i mod 2 : dense
    , j mod 2 : dense
    )
}>

Consider the following sparse matrix with 2x2 blocks:

Example 2x2 block storage:
 +-----+-----+-----+    +-----+-----+-----+
 | 1 2 | . . | 4 . |    | 1 2 |     | 4 0 |
 | . 3 | . . | . 5 |    | 0 3 |     | 0 5 |
 +-----+-----+-----+ => +-----+-----+-----+
 | . . | 6 7 | . . |    |     | 6 7 |     |
 | . . | 8 . | . . |    |     | 8 0 |     |
 +-----+-----+-----+    +-----+-----+-----+

This is ultimately stored in the following TACO-flavored format:

Stored as:
   positions[1]   : 0 2 3
   coordinates[1] : 0 2 1
   values         : 1.000000 2.000000 0.000000 3.000000
                    4.000000 0.000000 0.000000 5.000000
                    6.000000 7.000000 8.000000 0.000000
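The stored arrays above can be reproduced with a short Python sketch. It assumes the matrix dimensions are exact multiples of the block size and stores every block that contains at least one nonzero, in row-major order within the block; `to_bsr` is a hypothetical helper, not an MLIR API.

```python
def to_bsr(dense, br, bc):
    """Block sparse row storage with br x bc blocks (row-major in-block)."""
    n_rows, n_cols = len(dense), len(dense[0])
    positions, coordinates, values = [0], [], []
    for ib in range(n_rows // br):          # i floordiv br : dense
        for jb in range(n_cols // bc):      # j floordiv bc : compressed
            block = [
                dense[ib * br + ii][jb * bc + jj]
                for ii in range(br)         # i mod br : dense
                for jj in range(bc)         # j mod bc : dense
            ]
            if any(v != 0 for v in block):
                coordinates.append(jb)
                values.extend(block)        # zeros inside a block are kept
        positions.append(len(coordinates))
    return positions, coordinates, values


# The 4x6 example matrix from the diagram above.
A = [
    [1, 2, 0, 0, 4, 0],
    [0, 3, 0, 0, 0, 5],
    [0, 0, 6, 7, 0, 0],
    [0, 0, 8, 0, 0, 0],
]

positions, coordinates, values = to_bsr(A, 2, 2)
print(positions)    # [0, 2, 3]
print(coordinates)  # [0, 2, 1]
print(values)       # [1, 2, 0, 3, 4, 0, 0, 5, 6, 7, 8, 0]
```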

Note that this is literally the NVIDIA block format described in the cuSPARSE documentation.

Block sparse row. (n.d.-b). NVIDIA. https://docs.nvidia.com/cuda/cusparse/_images/bsr.png

NVIDIA's 2:4 structured sparsity is also supported.

The sparse tensor type for it looks like this:

#NV_24 = #sparse_tensor.encoding<{
  map = ( i, j ) -> ( i            : dense,
                      j floordiv 4 : dense,
                      j mod 4      : block2_4),
  crdWidth = 2  // 2-bits for each coordinate
}>

Suppose we use the following sample matrix from the NVIDIA documentation:

Sparse MMA storage example. (n.d.). NVIDIA. https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-instructions-for-sparse-mma

MLIR maps this matrix to the same layout:

coordinates[2]  :
   0 2 0 2 0 2 0 2
   1 3 1 3 1 3 1 3
   0 1 2 3 0 1 2 3
   2 3 0 1 2 3 0 1
   0 1 0 1 0 1 0 1
   0 1 0 1 0 1 0 1
   2 3 2 3 2 3 2 3
   2 3 2 3 2 3 2 3
   0 2 0 2 0 2 0 2
   1 3 1 3 1 3 1 3
   0 1 2 3 0 1 2 3
   2 3 0 1 2 3 0 1
   0 1 0 1 0 1 0 1
   0 1 0 1 0 1 0 1
   2 3 2 3 2 3 2 3
   2 3 2 3 2 3 2 3
values :
  1.000000 2.000000 3.000000 4.000000 1.000000 2.000000 3.000000 4.000000
  5.000000 6.000000 7.000000 8.000000 5.000000 6.000000 7.000000 8.000000
  9.000000 10.000000 11.000000 12.000000 9.000000 10.000000 11.000000 12.000000
  13.000000 14.000000 15.000000 16.000000 13.000000 14.000000 15.000000 16.000000
  17.000000 18.000000 19.000000 20.000000 17.000000 18.000000 19.000000 20.000000
  21.000000 22.000000 23.000000 24.000000 21.000000 22.000000 23.000000 24.000000
  25.000000 26.000000 27.000000 28.000000 25.000000 26.000000 27.000000 28.000000
  29.000000 30.000000 31.000000 32.000000 29.000000 30.000000 31.000000 32.000000
  1.000000 2.000000 3.000000 4.000000 1.000000 2.000000 3.000000 4.000000
  5.000000 6.000000 7.000000 8.000000 5.000000 6.000000 7.000000 8.000000
  9.000000 10.000000 11.000000 12.000000 9.000000 10.000000 11.000000 12.000000
  13.000000 14.000000 15.000000 16.000000 13.000000 14.000000 15.000000 16.000000
  17.000000 18.000000 19.000000 20.000000 17.000000 18.000000 19.000000 20.000000
  21.000000 22.000000 23.000000 24.000000 21.000000 22.000000 23.000000 24.000000
  25.000000 26.000000 27.000000 28.000000 25.000000 26.000000 27.000000 28.000000
  29.000000 30.000000 31.000000 32.000000 29.000000 30.000000 31.000000 32.000000
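The relationship between a dense row and its 2:4 coordinates can be sketched in Python as follows. The dense rows used here are reconstructed from the first two rows of the listing above, not copied from the NVIDIA figure. The helper `compress_2_4` is hypothetical, and its padding rule for blocks with fewer than two nonzeros (keep the lowest unused in-block positions as explicit zeros) is an assumption made only for this sketch.

```python
def compress_2_4(row):
    """Keep two entries per 1x4 block; return in-block coordinates and values."""
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    coordinates, values = [], []
    for start in range(0, len(row), 4):
        block = row[start:start + 4]
        kept = [k for k, v in enumerate(block) if v != 0]
        if len(kept) > 2:
            raise ValueError("more than two nonzeros in a 1x4 block")
        # Assumption: pad with the lowest unused in-block positions.
        for k in range(4):
            if len(kept) == 2:
                break
            if k not in kept:
                kept.append(k)
        kept.sort()
        coordinates.extend(kept)            # each coordinate fits in 2 bits
        values.extend(block[k] for k in kept)
    return coordinates, values


# Rows reconstructed from the first two rows of the listing above.
row0 = [1, 0, 2, 0, 3, 0, 4, 0, 1, 0, 2, 0, 3, 0, 4, 0]
row1 = [0, 5, 0, 6, 0, 7, 0, 8, 0, 5, 0, 6, 0, 7, 0, 8]

print(compress_2_4(row0))
# ([0, 2, 0, 2, 0, 2, 0, 2], [1, 2, 3, 4, 1, 2, 3, 4])
print(compress_2_4(row1))
# ([1, 3, 1, 3, 1, 3, 1, 3], [5, 6, 7, 8, 5, 6, 7, 8])
```

Because every in-block coordinate lies in the range 0 to 3, two bits per coordinate are sufficient, which matches the `crdWidth = 2` comment in the encoding.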

Other examples:

// Sparse vector.
#SparseVector = #sparse_tensor.encoding<{
  map = (i) -> (i : compressed)
}>
... tensor<?xf32, #SparseVector> ...

// Sorted coordinate scheme.
#SortedCOO = #sparse_tensor.encoding<{
  map = (i, j) -> (i : compressed(nonunique), j : singleton)
}>
... tensor<?x?xf64, #SortedCOO> ...

// Batched sorted coordinate scheme, with high encoding.
#BCOO = #sparse_tensor.encoding<{
  map = (i, j, k) -> (i : dense, j : loose_compressed(nonunique), k : singleton)
}>
... tensor<10x10x10xf32, #BCOO> ...

// Compressed sparse row.
#CSR = #sparse_tensor.encoding<{
  map = (i, j) -> (i : dense, j : compressed)
}>
... tensor<100x100xbf16, #CSR> ...

// Doubly compressed sparse column storage with specific bitwidths.
#DCSC = #sparse_tensor.encoding<{
  map = (i, j) -> (j : compressed, i : compressed),
  posWidth = 32,
  crdWidth = 8
}>
... tensor<8x8xf64, #DCSC> ...

// Block sparse row storage (2x3 blocks).
#BSR = #sparse_tensor.encoding<{
  map = ( i, j ) ->
  ( i floordiv 2 : dense,
    j floordiv 3 : compressed,
    i mod 2      : dense,
    j mod 3      : dense
  )
}>
... tensor<20x30xf32, #BSR> ...

// Same block sparse row storage (2x3 blocks) but this time
// also with a redundant reverse mapping, which can be inferred.
#BSR_explicit = #sparse_tensor.encoding<{
  map = { ib, jb, ii, jj }
        ( i = ib * 2 + ii,
          j = jb * 3 + jj) ->
  ( ib = i floordiv 2 : dense,
    jb = j floordiv 3 : compressed,
    ii = i mod 2 : dense,
    jj = j mod 3 : dense)
}>
... tensor<20x30xf32, #BSR_explicit> ...

// ELL format.
// In the simple format for matrix, one array stores values and another
// array stores column indices. The arrays have the same number of rows
// as the original matrix, but only have as many columns as
// the maximum number of nonzeros on a row of the original matrix.
// There are many variants for ELL such as jagged diagonal scheme.
// To implement ELL, map provides a notion of "counting a
// dimension", where every stored element with the same coordinate
// is mapped to a new slice. For instance, ELL storage of a 2-d
// tensor can be defined with the mapping (i, j) -> (#i, i, j)
// using the notation of [Chou20]. Lacking the # symbol in MLIR's
// affine mapping, we use a free symbol c to define such counting,
// together with a constant that denotes the number of resulting
// slices. For example, the mapping [c](i, j) -> (c * 3 * i, i, j)
// with the level-types ["dense", "dense", "compressed"] denotes ELL
// storage with three jagged diagonals that count the dimension i.
#ELL = #sparse_tensor.encoding<{
  map = [c](i, j) -> (c * 3 * i : dense, i : dense, j : compressed)
}>
... tensor<?x?xf64, #ELL> ...

// CSR slice (offset = 0, size = 4, stride = 1 on the first dimension;
// offset = 0, size = 8, and a dynamic stride on the second dimension).
#CSR_SLICE = #sparse_tensor.encoding<{
  map = (i : #sparse_tensor<slice(0, 4, 1)>,
          j : #sparse_tensor<slice(0, 8, ?)>) ->
        (i : dense, j : compressed)
}>
... tensor<?x?xf64, #CSR_SLICE> ...
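As an illustration of the #SortedCOO example above, the following Python sketch builds the storage for a small matrix: the `compressed(nonunique)` level keeps one row coordinate per stored entry (so duplicates appear), and the `singleton` level keeps exactly one column coordinate per row coordinate. The helper `to_sorted_coo` and the matrix are hypothetical, and the coordinates are kept in two separate arrays for readability, which may differ from the layout an implementation actually uses.

```python
def to_sorted_coo(dense):
    """Sorted COO: one (row, column, value) triple per stored entry."""
    entries = sorted(
        (i, j, v)
        for i, row in enumerate(dense)
        for j, v in enumerate(row)
        if v != 0
    )
    positions = [0, len(entries)]               # single top-level interval
    row_coords = [i for i, _, _ in entries]     # compressed(nonunique)
    col_coords = [j for _, j, _ in entries]     # singleton
    values = [v for _, _, v in entries]
    return positions, row_coords, col_coords, values


A = [
    [0, 1, 0],
    [2, 0, 3],
    [0, 0, 0],
]

positions, row_coords, col_coords, values = to_sorted_coo(A)
print(positions)   # [0, 3]
print(row_coords)  # [0, 1, 1]
print(col_coords)  # [1, 0, 2]
print(values)      # [1, 2, 3]
```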
