Other Issues

Q: How should I design a network so that it runs efficiently on SigmaStar hardware?

A: First, note that the default data layout of a model after SGS_IPU_SDK conversion is NHWC. For convolution, efficiency is higher when the C dimension of the input shape is larger and the H and W dimensions are smaller. The hardware directly supports Depthwise convolution only with a 3x3 kernel; Depthwise convolutions with other kernel sizes are converted to non-Depthwise convolution for calculation. For the other layers, the C dimension should preferably be an integer multiple of 16, because this speeds up the calculation, especially for Gather, Unpack, Pack, Concat, Reshape, Slice, Tile, Transpose, Pad, Split and similar operators.
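As a rough illustration of the multiple-of-16 guideline above, the sketch below rounds a channel count up to the next multiple of 16. The helper name round_up_channels is hypothetical and not part of SGS_IPU_SDK; whether widening a layer is worthwhile for a given model is a design decision this sketch does not answer.

```python
ALIGNMENT = 16  # the multiple recommended for the C dimension in the answer above


def round_up_channels(channels: int, multiple: int = ALIGNMENT) -> int:
    """Return the smallest multiple of `multiple` that is >= `channels`.

    Hypothetical helper for planning layer widths; not an SDK function.
    """
    if channels <= 0:
        raise ValueError("channels must be positive")
    return ((channels + multiple - 1) // multiple) * multiple


# Show which candidate layer widths already satisfy the guideline.
for c in (3, 24, 32, 100):
    print(c, "->", round_up_channels(c))
```

A width that is already a multiple of 16 (such as 32) is returned unchanged, so the helper can be applied to every layer width when sketching an architecture.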

Q: What is the layout of the image data input to the network?

A: As stated above, the default data layout of a model after SGS_IPU_SDK conversion is NHWC. Hence, for a model whose image input uses RGB channel order, the image data is arranged as ...RGBRGB... If the model's image input uses BGR channel order, the image data is arranged as ...BGRBGR...
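The interleaved arrangement described above can be sketched as follows. Both function names are hypothetical and only illustrate generic NHWC indexing; any alignment or stride requirements of the actual input buffer are not covered here.

```python
def nhwc_offset(n, h, w, c, height, width, channels):
    """Flat index of element (n, h, w, c) in a densely packed NHWC buffer."""
    return ((n * height + h) * width + w) * channels + c


def interleave_planes(planes):
    """Turn planar data [[R...], [G...], [B...]] into interleaved R, G, B, R, G, B, ..."""
    return [value for pixel in zip(*planes) for value in pixel]


# A 1x2 image (two pixels), given as separate R, G and B planes.
planes = [["R0", "R1"], ["G0", "G1"], ["B0", "B1"]]
packed = interleave_planes(planes)
print(packed)

# The B value of pixel (h=0, w=1) sits at the offset computed by nhwc_offset.
print(packed[nhwc_offset(0, 0, 1, 2, height=1, width=2, channels=3)])
```

For a BGR model the same indexing applies; only the order of the planes passed in changes.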

Q: How many box inputs does the NMS support?

A: The TFLite_Detection_NMS operator supports up to 24576 BBox inputs. Because of the nature of the NMS algorithm, this is the maximum on current hardware.

Q: Can a network with non-image inputs work?

A: Yes, but the input data requirements differ slightly from the image input case. See RAWDATA_S16_NHWC Input Model for details.

Q: Is it fast to convert the int16 output from the network to floating point using ARM?

A: In testing, multiplying 1 million int16 values by the scale takes about 8 ms.
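The conversion being timed above is a single multiplication per element. The sketch below shows only that arithmetic in plain Python; the function name and the scale value 0.5 are made up for illustration, and the real scale is whatever the converted model reports for that output. Plain Python will not reproduce the timing quoted for ARM.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def dequantize_int16(values, scale):
    """Convert int16 network outputs to floating point: real = int16 * scale."""
    for v in values:
        if not INT16_MIN <= v <= INT16_MAX:
            raise ValueError(f"{v} is outside the int16 range")
    return [v * scale for v in values]


# Hypothetical scale and output values, chosen only to make the arithmetic visible.
scale = 0.5
raw_output = [-32768, 0, 2, 32767]
print(dequantize_int16(raw_output, scale))
```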

Q: How do I check the on-board operating efficiency of the model?

A: To check the detailed operating efficiency inside the network, modify the program by adding a while(1) loop before the calls to MI_IPU_DestroyCHN and MI_IPU_DestroyDevice so that the process is held temporarily.

After running the process again, open a terminal over Telnet and enter the following command:

echo time_statistic > /proc/mi_modules/mi_ipu/mi_ipu0

The detailed operating efficiency inside the network will be displayed on the serial port.

To check the IPU frequency, use the command below:

cat /sys/dla/clk_rate