Operators
Caffe-Supported Operators

Operator | Note |
---|---|
ArgMax | |
BatchNorm | |
Concat | Max. 256 tensors per concat |
Convolution | Tensor size < 2^31. For a kernel of size do*h*w*di: h*w < 64 and round(di/16)*round(do/16) < 512*1024 (see the check sketched after this table). Runs as ordinary Convolution if group is 1, as Depthwise Convolution if group is C, and as multiple Convolutions if 1 < group < C. |
ConvolutionDepthwise | Natively supported kernel_size is 3*3; for any other case, ordinary Convolution is performed. Limitation: pad range [0, 1]. |
CReLU | |
ContinuationIndicator | |
Crop | |
Deconvolution | Tensor size < 2^31. For a kernel of size do*h*w*di: h*w < 64 and round(di/16)*round(do/16) < 512*1024. |
Dropout | |
Eltwise | Supports Add, Sub, Mul and Maximum. Supported shapes (4-dimensional vectors, NCHW): (NCHW, const), (NCHW, C), (NCHW, NCHW). |
Flatten | |
InnerProduct | For a weight of size do*di: round(di/16)*round(do/16) < 512*1024. |
Permute | |
Pooling | For a kernel of size h*w: 1. if global_pooling is true, h*w <= 255 for AVGPool with uint8 input, h*w <= 65025 for AVGPool with int16 input, and h*w <= 65025 for MAXPool; 2. if global_pooling is false, h*w <= 19*19. |
PriorBox | |
Power | Supports positive integers only |
Reshape | |
Reverse | |
ROIPooling | The rois input has dimension (N x 5). N may be greater than 1 only when the second half of the network consists entirely of InnerProduct layers; if any convolution exists in the second half, N must be 1 and that network segment must be executed N times in a loop. For detailed usage and restrictions, see the note below. |
ReLU | Input <= 4-dimensional |
PReLU | Input <= 4-dimensional |
Sigmoid | |
Slice | |
Scale | |
Softmax | To compute over a specific dimension, transpose that dimension to the last (innermost) dimension. Max. 32*512 = 16384. |
Split | |
Tanh | |
Threshold | Input = 4-dimensional |
Tile | |
Upsample | The Upsample operator does not exist in Caffe; you can modify a Deconvolution to an Upsample manually. Input = 4-dimensional; H and W must use the same scale. |
Reorg | Only supports stride = 2 |
LSTM | Supports forward only |
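The round(di/16)*round(do/16) < 512*1024 weight-size constraint that appears in the Convolution, Deconvolution and InnerProduct rows can be checked before conversion. Below is a minimal sketch; treating round() as rounding up to whole 16-channel tiles is an assumption, since the tables do not spell out the rounding rule:

```python
def fits_weight_limit(di: int, do: int, limit: int = 512 * 1024) -> bool:
    """Check round(di/16) * round(do/16) < 512*1024 for a do*di weight.

    Assumption: round() is taken as ceil to whole 16-channel tiles;
    the SDK's exact rounding rule may differ.
    """
    tiles_in = (di + 15) // 16    # ceil(di / 16)
    tiles_out = (do + 15) // 16   # ceil(do / 16)
    return tiles_in * tiles_out < limit

print(fits_weight_limit(4096, 4096))      # True:  256 * 256 = 65536 tiles
print(fits_weight_limit(200000, 70000))   # False: 12500 * 4375 tiles
```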
Please Note:
- The Upsample operator is defined as follows in prototxt:

```
layer {
  bottom: "layer85-conv"
  top: "layer86-upsample"
  name: "layer86-upsample"
  type: "Upsample"
  upsample_param {
    scale: 2
  }
}
```

The parameter scale has the same meaning as stride in Deconvolution. Note, however, that Upsample is equivalent to a Deconvolution operator whose weights are all 1.
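To see why the substitution works, here is a minimal numpy sketch (illustration only, not SDK code): a transposed convolution with kernel_size = stride = scale and all-one weights writes each input pixel into its own non-overlapping scale x scale output block, which is exactly nearest-neighbour upsampling on each channel.

```python
import numpy as np

def upsample_nearest(x, scale=2):
    """What the Upsample layer computes on one H*W channel."""
    return np.repeat(np.repeat(x, scale, axis=0), scale, axis=1)

def deconv_all_ones(x, scale=2):
    """Transposed convolution, kernel_size = stride = scale, weights all 1."""
    h, w = x.shape
    out = np.zeros((h * scale, w * scale), dtype=x.dtype)
    for i in range(h):
        for j in range(w):
            # each input pixel lands in its own scale x scale block
            out[i*scale:(i+1)*scale, j*scale:(j+1)*scale] += x[i, j]
    return out

x = np.arange(4, dtype=np.float32).reshape(2, 2)
assert np.array_equal(upsample_nearest(x), deconv_all_ones(x))
```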
- The ROIPooling operator is described as follows in prototxt:

```
layer {
  name: "roi_pool5"
  type: "ROIPooling"
  bottom: "conv5_3"
  bottom: "rois"
  top: "pool5"
  roi_pooling_param {
    pooled_w: 7
    pooled_h: 7
    spatial_scale: 0.0625
  }
}
```

roi_pooling_param supports pooled_w, pooled_h and spatial_scale only. For the Float model, the rois input holds the coordinates output by the rpn layer. For the Fixed and Offline models, the rois input holds the rpn layer's output coordinates multiplied by spatial_scale and then quantized to int16 before being fed to the model.
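For the Fixed and Offline models this means the rois are pre-processed on the host before being fed to the model. A minimal sketch of that step; the leading batch-index column (standard Caffe ROIPooling layout) and the rounding mode are assumptions, not SDK specification:

```python
import numpy as np

def quantize_rois(rois_float, spatial_scale=0.0625):
    """Scale rpn output coordinates by spatial_scale, quantize to int16.

    rois_float: (N, 5) array, assumed [batch_idx, x1, y1, x2, y2];
    the batch index column is left unscaled. np.rint is an assumed
    rounding mode; the SDK may truncate instead.
    """
    rois = np.asarray(rois_float, dtype=np.float32).copy()
    rois[:, 1:] *= spatial_scale
    return np.rint(rois).astype(np.int16)

print(quantize_rois(np.array([[0, 0, 0, 112, 112]])))  # [[0 0 0 7 7]]
```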
TensorFlow-Supported Operators

Category | Operator | Note |
---|---|---|
Convolution | Conv | Limitation: kernel_size h*w < 255 |
Convolution | DepthwiseConv2dNative | Natively supported kernel_size is 3*3; for any other case, ordinary Convolution is performed. |
Convolution | FullyConnected | |
Pooling | Max pooling | |
Pooling | Average Pooling | |
Activation | ReLU | |
Activation | PReLU | |
Activation | ReLU6 | |
Activation | LeakyReLU | |
Activation | Sigmoid | |
Math | Less | |
Math | Greater | |
Math | GreaterEqual | |
Math | Equal | |
Math | Add | |
Math | Sub | |
Math | Mul | |
Math | RealDiv | The second operand must be a constant tensor |
Math | Maximum | |
Math | Minimum | |
Math | Mean | |
Math | Max | |
Math | Sqrt | |
Math | Rsqrt | |
Math | Round | |
Math | Softmax | To compute over a specific dimension, transpose that dimension to the last (innermost) dimension. |
Math | FusedBatchNorm | |
Math | Exp | |
DMA | Align | |
DMA | ConcatV2 | |
DMA | Fill | |
DMA | Gather | |
DMA | GatherV2 | |
DMA | Pack | |
DMA | Pad | |
DMA | SpaceToBatchND | |
DMA | BatchToSpaceND | |
DMA | ZerosLike | |
DMA | Split | |
DMA | Slice | |
DMA | Unpack | |
DMA | Tile | |
DMA | Reshape | |
DMA | Transpose | |
DMA | Resize_bilinear | Bilinear interpolation currently supports only cases that satisfy all of the following: (1) integer-fold magnification only; (2) magnification of at most 8x; (3) 3D data only, i.e. the N of NHWC must be 1 (similar to convolution). See the check sketched after this table. |
Misc | TopKV2 | |
Misc | Shape | |
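The three Resize_bilinear requirements can be validated up front. A minimal sketch with a hypothetical helper name; shapes are NHWC tuples:

```python
def resize_bilinear_supported(in_shape, out_shape):
    """Check: integer-fold magnification, factor <= 8, batch N == 1."""
    n, h, w, _ = in_shape
    _, oh, ow, _ = out_shape
    if n != 1:                        # 3D data only: N of NHWC must be 1
        return False
    for src, dst in ((h, oh), (w, ow)):
        if dst % src != 0:            # integer-fold magnification only
            return False
        if dst // src > 8:            # at most 8x
            return False
    return True

print(resize_bilinear_supported((1, 32, 32, 16), (1, 64, 64, 16)))    # True
print(resize_bilinear_supported((1, 32, 32, 16), (1, 320, 320, 16)))  # False: 10x
```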
Onnx-Supported Operators

Operator | Note |
---|---|
Add | Supported shapes (4-dimensional vectors, NCHW): (NCHW, const), (NCHW, C), (NCHW, NCHW) |
Abs | |
ArgMax | |
AveragePool | For a kernel of size h*w: h*w <= 19*19 |
BatchNorm | |
Concat | Max. 256 tensors per concat |
Convolution | Tensor size < 2^31. For a kernel of size co*h*w*ci: h*w < 64 and round(ci/16)*round(co/16) < 512*1024. Supports the auto_pad SAME_UPPER attribute. Limitation: pads range [0, 7]. |
Clip | max == 6 is equivalent to ReLU6 |
DepthwiseConv2D | Natively supported kernel_size is 3*3; for any other case, ordinary Convolution is performed. Limitation: pad range [0, 1]. |
Div | Refer to the Add operator |
Dropout | |
DepthToSpace | For 4-dimensional input only |
Expand | |
Exp | |
Gather | Indices must be int32 constants |
Gemm | |
GlobalAveragePool | For a kernel of size h*w: h*w <= 255 for uint8, h*w <= 65025 for int16 |
GlobalMaxPool | For a kernel of size h*w: h*w <= 255 for uint8, h*w <= 65025 for int16 |
LSTM | Forward only |
Matmul | Input <= 4-dimensional. For a weight of size do*di: round(di/16)*round(do/16) < 512*1024. |
Mul | Refer to the Add operator |
MaxPool | For a kernel of size h*w: h*w <= 19*19 |
Max | |
Pad | Supports constant mode only |
Reshape | |
ReduceSum | Input <= 4-dimensional |
ReduceMean | Input <= 4-dimensional |
ReduceMax | Input <= 4-dimensional. Only one axis may be reduced at a time. |
Resize | Does not support the roi parameter; this operator only scales entire dimensions. In bilinear mode, the coordinate_transformation_mode parameter supports only align_corners and asymmetric. Downsampling is not supported. Input = 4-dimensional; H and W must use the same scale. |
ReLU | Input = 4-dimensional |
PReLU | Input <= 4-dimensional |
LeakyReLU | Input <= 4-dimensional |
TanH | |
Sigmoid | |
Slice | |
Softmax | To compute over a specific dimension, transpose that dimension to the last (innermost) dimension (see the sketch after this table). Max. 32*512 = 16384. |
Split | |
SpaceToDepth | |
Squeeze | |
Sub | Refer to the Add operator |
Transpose | |
Tile | |
Unsqueeze | |
Upsample | Input = 4-dimensional; H and W must use the same scale |
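The Softmax note in the Caffe, TensorFlow and Onnx tables above means that a Softmax over a non-innermost axis must be expressed as transpose, Softmax, transpose back. A minimal numpy illustration of the layout change (not SDK code):

```python
import numpy as np

def softmax_last_axis(x):
    """Softmax over the innermost dimension, the only axis supported."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Softmax over the C axis of an NCHW tensor: move C to the innermost
# position, apply Softmax there, then move it back.
x = np.random.rand(1, 8, 4, 4).astype(np.float32)
y = np.moveaxis(softmax_last_axis(np.moveaxis(x, 1, -1)), -1, 1)
assert np.allclose(y.sum(axis=1), 1.0)
```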
SigmaStar IPU SDK Model Limitations

- For DepthwiseConv, if kernel size > 3, input size == kernel size must be satisfied.
- For Softmax with a specified dimension, only the innermost dimension is supported (that is, if the tensor is multi-dimensional, Softmax is performed only over the innermost dimension).
- For TensorFlow networks, minimize the use of DMA operators that move large amounts of data (pure data-transport operators such as Gather, Unpack, Pack, Concat, Reshape, Slice, Tile, Transpose, Pad, and Split). If the C dimension is an integer multiple of 16, these operators run faster.
- Likewise, minimize the use of operators such as Split, Concat, Reshape, Flatten, Slice, and Permute in Caffe networks.
- Except for the first convolution layer, the Conv DI dimension of the remaining layers (i.e. the C dimension in NHWC) is proportional to efficiency: the larger the dimension, the higher the efficiency. The maximum supported value is 2048.
- For math operators (Add, Sub, Mul, Div, etc.), efficiency is higher when the right-hand operand is a scalar (single number) or 1-dimensional (identical across the H and W dimensions but varying along C); see the shape sketch after this list.
- Where possible, avoid network structures in which the output of one operator is the input of multiple operators, such as the ResNet residual structure and the GoogLeNet inception module.
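In numpy terms, the operand-shape patterns from the math-operator point above look like this (a sketch of the broadcast shapes only, not SDK code):

```python
import numpy as np

x = np.ones((1, 16, 8, 8), dtype=np.float32)        # NCHW feature map

fast_scalar = x * 2.0                                # (NCHW, const): scalar RHS
per_channel = np.arange(16, dtype=np.float32)        # one value per channel
fast_channel = x + per_channel.reshape(1, 16, 1, 1)  # (NCHW, C): identical over
                                                     # H and W, varying along C
slow_full = x - np.ones_like(x)                      # (NCHW, NCHW): full-tensor RHS
```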