Operators

Caffe-Supported Operators

Operator               Note
---------------------  ------------------------------------------------------------
ArgMax
BatchNorm
Concat                 max. 256-tensor concat
Convolution            tensor size < 2^31;
                       h*w < 64 if the kernel size is do*h*w*di;
                       round(di/16)*round(do/16) < 512*1024;
                       runs as ordinary Convolution if group = 1, as Depthwise
                       Convolution if group = C, and as multiple Convolutions
                       if 1 < group < C
ConvolutionDepthwise   kernel_size 3*3 is natively supported; any other case
                       falls back to Convolution. Limitation: pad range [0, 1]
CReLU
ContinuationIndicator
Crop
Deconvolution          tensor size < 2^31;
                       h*w < 64 if the kernel size is do*h*w*di;
                       round(di/16)*round(do/16) < 512*1024
Dropout
Eltwise                supports Add, Sub, Mul, and Maximum;
                       supported shapes (4-dimensional, NCHW): (NCHW, const),
                       (NCHW, C), (NCHW, NCHW)
Flatten
InnerProduct           round(di/16)*round(do/16) < 512*1024 if the weight size
                       is do*di
Permute
Pooling                assuming the kernel size is h*w:
                       1. global_pooling = true: h*w <= 255 for uint8 AVG
                          pooling input; h*w <= 65025 for int16 AVG pooling
                          input; h*w <= 65025 for MAX pooling
                       2. global_pooling = false: h*w <= 19*19
PriorBox
Power                  only supports positive integer exponents
Reshape
Reverse
ROIPooling             the rois input has dimension N x 5; N may be greater
                       than 1 only if the second half of the network consists
                       entirely of InnerProduct layers. If any convolution
                       appears in the second half, N must be 1 and that network
                       segment must be executed N times in a loop. For usage
                       details and restrictions, see the note below
ReLU                   input <= 4-dimensional
PReLU                  input <= 4-dimensional
Sigmoid
Slice
Scale
Softmax                to compute along a specific dimension, transpose it to
                       the last (innermost) dimension; max. size 32*512 = 16384
Split
Tanh
Threshold              input must be 4-dimensional
Tile
Upsample               no Upsample operator exists in Caffe; you can manually
                       modify Deconvolution to Upsample. Input must be
                       4-dimensional; H and W must use the same scale
Reorg                  only supports stride = 2
LSTM                   forward only

Please Note:

  • The Upsample operator is defined in prototxt as follows:

    layer { 
        bottom: "layer85-conv" 
        top: "layer86-upsample" 
        name: "layer86-upsample" 
        type: "Upsample" 
        upsample_param { 
            scale: 2 
        } 
    }
    

    The parameter scale has the same meaning as stride in Deconvolution. Note, however, that Upsample is equivalent to a Deconvolution operator whose weights are all 1, as the sketch below illustrates.
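
    As a cross-check (a minimal single-channel numpy sketch; this is not SDK code), the following shows that a stride-s transposed convolution whose s*s kernel is all 1s reproduces the Upsample (pixel-replication) result exactly:

    import numpy as np

    def upsample(x, scale):
        # Nearest-neighbor upsample: replicate each pixel scale x scale times.
        return x.repeat(scale, axis=0).repeat(scale, axis=1)

    def deconv_all_ones(x, scale):
        # Transposed convolution with stride = scale and an (implicit) all-ones
        # scale x scale kernel; stride equals the kernel size, so the scattered
        # patches never overlap.
        h, w = x.shape
        out = np.zeros((h * scale, w * scale), dtype=x.dtype)
        for i in range(h):
            for j in range(w):
                out[i*scale:(i+1)*scale, j*scale:(j+1)*scale] += x[i, j]
        return out

    x = np.arange(6, dtype=np.float32).reshape(2, 3)
    assert np.array_equal(upsample(x, 2), deconv_all_ones(x, 2))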

  • The ROIPooling operator is described in prototxt as follows:

    layer { 
        name: "roi_pool5" 
        type: "ROIPooling" 
        bottom: "conv5_3" 
        bottom: "rois" 
        top: "pool5" 
        roi_pooling_param { 
            pooled_w: 7 
            pooled_h: 7 
            spatial_scale: 0.0625 
        } 
    }
    

    roi_pooling_param supports only pooled_w, pooled_h, and spatial_scale. For the Float model, the rois input is the coordinates output by the RPN layer. For the Fixed and Offline models, the rois input is the RPN layer's output coordinates multiplied by spatial_scale and then quantized to int16 before being sent to the model.
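
    For illustration only (a minimal numpy sketch, not SDK code; the (batch_idx, x1, y1, x2, y2) rois layout and round-to-nearest quantization are assumptions), preparing the rois input of a Fixed or Offline model might look like this:

    import numpy as np

    spatial_scale = 0.0625  # from roi_pooling_param
    # (N x 5) RPN output; assumed layout: (batch_idx, x1, y1, x2, y2)
    rois = np.array([[0, 17.3, 32.8, 211.6, 189.4]], dtype=np.float32)

    rois_fixed = rois.copy()
    rois_fixed[:, 1:] *= spatial_scale                  # map coordinates into feature-map space
    rois_fixed = np.round(rois_fixed).astype(np.int16)  # quantize to int16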

TensorFlow-Supported Operators

Category     Operator               Note
-----------  ---------------------  --------------------------------------------
Convolution  Conv                   limitation: kernel_size H*W < 255
Convolution  DepthwiseConv2dNative  kernel_size 3*3 is natively supported; any
                                    other case falls back to Convolution
Convolution  FullyConnected
Pooling      Max pooling
Pooling      Average pooling
Activation   ReLU
Activation   PReLU
Activation   ReLU6
Activation   LeakyReLU
Activation   Sigmoid
Math         Less
Math         Greater
Math         GreaterEqual
Math         Equal
Math         Add
Math         Sub
Math         Mul
Math         RealDiv                the second operand must be a constant tensor
Math         Maximum
Math         Minimum
Math         Mean
Math         Max
Math         Sqrt
Math         Rsqrt
Math         Round
Math         Softmax                to compute along a specific dimension,
                                    transpose it to the last (innermost)
                                    dimension
Math         FusedBatchNorm
Math         Exp
DMA          Align
DMA          ConcatV2
DMA          Fill
DMA          Gather
DMA          GatherV2
DMA          Pack
DMA          Pad
DMA          SpaceToBatchND
DMA          BatchToSpaceND
DMA          ZerosLike
DMA          Split
DMA          Slice
DMA          Unpack
DMA          Tile
DMA          Reshape
DMA          Transpose
DMA          Resize_bilinear        bilinear interpolation currently supports
                                    only cases that satisfy all of the
                                    following:
                                    (1) integer-fold magnification only;
                                    (2) magnification <= 8x;
                                    (3) 3-dimensional data only, i.e. the N of
                                        NHWC must be 1 (similar to convolution)
Misc         TopKV2
Misc         Shape

ONNX-Supported Operators

Operator           Note
-----------------  -------------------------------------------------------------
Add                supported shapes (4-dimensional, NCHW): (NCHW, const),
                   (NCHW, C), (NCHW, NCHW)
Abs
ArgMax
AveragePool        h*w <= 19*19 if the kernel size is h*w
BatchNorm
Concat             max. 256-tensor concat
Convolution        tensor size < 2^31;
                   h*w < 64 if the kernel size is co*h*w*ci;
                   round(ci/16)*round(co/16) < 512*1024;
                   supports the auto_pad SAME_UPPER attribute; limitation:
                   pads range [0, 7]
Clip               max == 6 is equivalent to ReLU6
DepthwiseConv2D    kernel_size 3*3 is natively supported; any other case falls
                   back to Convolution. Limitation: pad range [0, 1]
Div                refer to the Add operator
Dropout
DepthToSpace       4-dimensional input only
Expand
Exp
Gather             indices must be an int32 constant
Gemm
GlobalAveragePool  if the kernel size is h*w: h*w <= 255 for uint8;
                   h*w <= 65025 for int16
GlobalMaxPool      if the kernel size is h*w: h*w <= 255 for uint8;
                   h*w <= 65025 for int16
LSTM               forward only
MatMul             input <= 4-dimensional;
                   round(di/16)*round(do/16) < 512*1024 if the weight size is
                   do*di
Mul                refer to the Add operator
MaxPool            h*w <= 19*19 if the kernel size is h*w
Max
Pad                constant mode only
Reshape
ReduceSum          input <= 4-dimensional
ReduceMean         input <= 4-dimensional
ReduceMax          input <= 4-dimensional; only one axis can be reduced at a
                   time
Resize             does not support the roi parameter; only scales entire
                   dimensions. In bilinear mode, coordinate_transformation_mode
                   supports only align_corners and asymmetric. Downsampling is
                   not supported. Input must be 4-dimensional; H and W must
                   use the same scale
ReLU               input must be 4-dimensional
PReLU              input <= 4-dimensional
LeakyReLU          input <= 4-dimensional
TanH
Sigmoid
Slice
Softmax            to compute along a specific dimension, transpose it to the
                   last (innermost) dimension; max. size 32*512 = 16384
Split
SpaceToDepth
Squeeze
Sub                refer to the Add operator
Transpose
Tile
Unsqueeze
Upsample           input must be 4-dimensional; H and W must use the same
                   scale

SigmaStar IPU SDK Model Limitations

  • For DepthwiseConv, if the kernel size is greater than 3, the input size must equal the kernel size.

  • For Softmax on a specified dimension, only the innermost dimension is supported (that is, if the tensor is multi-dimensional, Softmax is applied only along the innermost dimension); see the numpy sketch after this list.

  • For TensorFlow networks, minimize the use of DMA operators that move large amounts of data (pure data-transport operators such as Gather, Unpack, Pack, Concat, Reshape, Slice, Tile, Transpose, Pad, and Split). These operators run faster when the C dimension is an integer multiple of 16.

  • Similarly, in Caffe networks, minimize the use of operators such as Split, Concat, Reshape, Flatten, Slice, and Permute.

  • Except for the first layer, the Conv DI dimension of the remaining layers (i.e., the C dimension in NHWC) is in positive proportion to efficiency: the larger the dimension, the higher the efficiency. The maximum supported value is 2048.

  • For math-type operators (Add, Sub, Mul, Div, etc.), efficiency is higher when the right-hand operand is a scalar (a single number) or 1-dimensional (identical across the H and W dimensions, varying only along C).

  • Where possible, avoid network structures in which the output of one operator serves as the input of multiple operators, such as the ResNet residual structure and the GoogLeNet inception module.
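
As a reference for the Softmax limitation above, here is a plain numpy sketch (not SDK code) of the workaround: transpose the target dimension to the innermost position, apply Softmax there, and transpose the result back:

    import numpy as np

    def softmax_innermost(x):
        # Softmax along the last (innermost) dimension, the only placement
        # the IPU supports natively.
        e = np.exp(x - x.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    def softmax_over_axis(x, axis):
        # Move the target axis to the innermost position, apply Softmax
        # there, then move it back.
        x = np.moveaxis(x, axis, -1)
        return np.moveaxis(softmax_innermost(x), -1, axis)

    x = np.random.rand(1, 8, 4, 4).astype(np.float32)
    y = softmax_over_axis(x, axis=1)  # Softmax over C via transposition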