Running Offline Network Model on Development Board

We recommend that you verify the model with the simulator before running an offline network model on the development board; the simulator reproduces the on-board operating environment on a PC. First, make sure your development board has been flashed with the latest firmware. Then use dla_classify in the Demo to run inference on a single frame; the result is the top-5 classification output. In the following sections, we provide an in-depth description of the dla_classify.cpp file to illustrate the sequence of MI_IPU API calls. The dla_classify.cpp file is located at: sdk/verify/mi_demo/source/dla_classify/dla_classify.cpp.

Creating an IPU Device

MI_S32 IPUCreateDevice(char* pFirmwarePath, MI_U32 u32VarBufSize) 
{ 
    MI_S32 s32Ret = MI_SUCCESS; 
    MI_IPU_DevAttr_t stDevAttr; 
    stDevAttr.u32MaxVariableBufSize = u32VarBufSize; 
    stDevAttr.u32YUV420_W_Pitch_Alignment = 16; 
    stDevAttr.u32YUV420_H_Pitch_Alignment = 2; 
    stDevAttr.u32XRGB_W_Pitch_Alignment = 16; 
    s32Ret = MI_IPU_CreateDevice(&stDevAttr, NULL, pFirmwarePath, 0); 
    return s32Ret;
}

Input parameter:

pFirmwarePath: Firmware file path. If NULL, the default /config/dla/ipu_firmware.bin is used.

u32VarBufSize: Maximum memory size used by the model's internal variable Tensor buffer.
    u32VarBufSize can be obtained by two methods:

    Method 1: Use parse_net tool.

         The path of the parse_net tool is SGS_IPU_SDK/bin/parse_net. 

         In use, add the fixed-point network model path after the parse_net command. See the following command for example:

         ./parse_net xxx_fixed.sim | grep Variable

         Output:

         | |--> SubGraph[0]: Variable buffer 0xd2f00 @ 0x82f3b40.

         Here, 0xd2f00 is u32VarBufSize after being aligned up to 2. In C/C++, aligning up can be defined as:

         #define alignment_up(a, size) ((a + size - 1) & (~(size - 1)))

         Since 0xd2f00 (864000 in decimal) is already a multiple of 2, u32VarBufSize is 864000.

    Method 2: Use MI API.

         MI_IPU_OfflineModelStaticInfo_t OfflineModelInfo;
         ret = MI_IPU_GetOfflineModeStaticInfo(NULL, modelFilePath, &OfflineModelInfo);
         if (ret != MI_SUCCESS) 
         { 
            std::cerr << "get model variable buffer size failed!" << std::endl;
            return;
         } 
         u32VarBufSize = OfflineModelInfo.u32VariableBufferSize;

          If multiple models are to be run, use the largest u32VarBufSize among them to create the IPU device.

Output parameter:

MI_IPU API error code. See MI_IPU_API for details.
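
For reference, here is a minimal sketch that combines Method 2 with the IPUCreateDevice helper above (the model path is a placeholder and error handling is abbreviated):

MI_IPU_OfflineModelStaticInfo_t stModelInfo;
MI_S32 s32Ret = MI_IPU_GetOfflineModeStaticInfo(NULL, "./mobilenet_v1_fixed.img", &stModelInfo);
if (s32Ret != MI_SUCCESS)
{
    std::cerr << "get model variable buffer size failed!" << std::endl;
    return -1;
}
// Passing NULL as the firmware path uses the default /config/dla/ipu_firmware.bin.
s32Ret = IPUCreateDevice(NULL, stModelInfo.u32VariableBufferSize);
if (s32Ret != MI_SUCCESS)
{
    std::cerr << "create IPU device failed!" << std::endl;
    return -1;
}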

Creating an IPU Channel

MI_S32 IPUCreateChannel(MI_U32* u32Channel, char* pModelImage)
{
    MI_IPUChnAttr_t stChnAttr;

    //create channel
    memset(&stChnAttr, 0, sizeof(stChnAttr));
    stChnAttr.u32InputBufDepth = 2;
    stChnAttr.u32OutputBufDepth = 2;
    return MI_IPU_CreateCHN(u32Channel, &stChnAttr, NULL, pModelImage);
}
MI_U32 u32ChannelID;
ret = IPUCreateChannel(&u32ChannelID, "./mobilenet_v1_fixed.img");

Input parameter:

u32Channel: ID of the created IPU channel.

pModelImage: Offline network model file path.

Output parameter:

MI_IPU API error code. See MI_IPU_API for details.

Getting Model Input and Output Tensor Properties

MI_IPU_SubNet_InputOutputDesc_t desc; 
MI_IPU_GetInOutTensorDesc(u32ChannelID, &desc);

Input parameter:

u32ChnId: ID of IPU channel.

pstDesc: IPU subnet input/output descriptor structure.

Output parameter:

MI_IPU API error code. See MI_IPU_API for details.
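
As a quick check, the descriptor can be used to confirm how many inputs and outputs the model expects. A minimal sketch, assuming the descriptor also exposes a u32OutputTensorCount field alongside the u32InputTensorCount field used later in this section:

MI_IPU_SubNet_InputOutputDesc_t desc;
if (MI_IPU_GetInOutTensorDesc(u32ChannelID, &desc) == MI_SUCCESS)
{
    // u32InputTensorCount is also used below when filling InputTensorVector for zero copy.
    std::cout << "input tensors: " << desc.u32InputTensorCount
              << ", output tensors: " << desc.u32OutputTensorCount << std::endl;
}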

Getting Input and Output Tensor

MI_IPU_TensorVector_t InputTensorVector; 
MI_IPU_TensorVector_t OutputTensorVector; 
MI_IPU_GetInputTensors(u32ChannelID, &InputTensorVector); 
MI_IPU_GetOutputTensors(u32ChannelID, &OutputTensorVector);

Input parameter:

u32ChannelID: ID of IPU channel. 

InputTensorVector: Input IPU Tensor array structure. 

OutputTensorVector: Output IPU Tensor array structure.

Output parameter:

MI_IPU API error code. See MI_IPU_API for details.

Model Data Input

Copying Data

Copy the data to the virtual address of the model input Tensor, and call MI_SYS_FlushInvCache after copying:

MI_U8* pdata = (MI_U8 *)InputTensorVector.astArrayTensors[0].ptTensorData[0]; 
MI_U8* pSrc = (MI_U8 *)Input_Data; 
for(int i = 0; i < imageSize; i++) 
{ 
    *(pdata + i) = *(pSrc + i); 
} 
MI_SYS_FlushInvCache(pdata, imageSize);
  • If the input format of the model is BGR or RGB, do NOT apply stride alignment.

  • If the input format of the model is BGRA or RGBA (A channel located at the high address), the rule for stride alignment is stride = alignment_up(width*4, 16). A sketch of this calculation follows after this list.

    • If the input_format is BGRA, MI_SYS_PixelFormat_e corresponds to E_MI_SYS_PIXEL_FRAME_ARGB8888.

    • If the input_format is RGBA, MI_SYS_PixelFormat_e corresponds to E_MI_SYS_PIXEL_FRAME_ABGR8888.

  • If the input format is YUV_NV12 or GRAY, the rule for stride alignment is stride = alignment_up(width, 16), and the height must be aligned to 2.

    • If the input format is YUV_NV12 or GRAY, MI_SYS_PixelFormat_e corresponds to E_MI_SYS_PIXEL_FRAME_YUV_SEMIPLANAR_420.
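
A minimal sketch of the stride calculation referenced above, using the alignment_up macro shown earlier (the width and height values are placeholders):

#define alignment_up(a, size) ((a + size - 1) & (~(size - 1)))

MI_U32 u32Width  = 224;
MI_U32 u32Height = 225;

// BGRA/RGBA input: 4 bytes per pixel, line stride rounded up to a multiple of 16.
MI_U32 u32RgbaStride = alignment_up(u32Width * 4, 16);   // 896, already a multiple of 16

// YUV_NV12/GRAY input: stride rounded up to 16, height rounded up to 2.
MI_U32 u32YuvStride  = alignment_up(u32Width, 16);        // 224, already a multiple of 16
MI_U32 u32YuvHeight  = alignment_up(u32Height, 2);        // 225 -> 226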

Zero Copying

If another MI module is used and the data is ready for model input, you can directly use the physical address of that MI module without copying the data.

Below we will use MI_SYS module as an example to illustrate the zero-copy data input. Meanwhile, please pay special attention to the following:

When creating the IPU channel, set u32InputBufDepth to 0; this prevents MI_IPU_GetInputTensors from being used for input space allocation:

stChnAttr.u32InputBufDepth = 0;

Since MI_IPU_GetInputTensors is not used, you need to manually assign the number of model inputs described by the network to InputTensorVector.u32TensorCount:

InputTensorVector.u32TensorCount = desc.u32InputTensorCount;

To use MI_SYS APIs to allocate memory space:

MI_S32 s32ret = 0;
MI_PHY phyAddr = 0;
void* pVirAddr = NULL;
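// BufSize is assumed to be the size of the model input buffer (apply the stride alignment rules above if required).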
s32ret = MI_SYS_MMA_Alloc(NULL, BufSize, &phyAddr);
if (s32ret != MI_SUCCESS)
{
    throw std::runtime_error("Alloc buffer failed!");
}
s32ret = MI_SYS_Mmap(phyAddr, BufSize, &pVirAddr, TRUE);
if (s32ret != MI_SUCCESS)
{
    MI_SYS_MMA_Free(phyAddr);
    throw std::runtime_error("Mmap buffer failed!");
}

To pass the virtual address and physical address to InputTensorVector:

InputTensorVector.astArrayTensors[0].ptTensorData[0] = pVirAddr;
InputTensorVector.astArrayTensors[0].phyTensorAddr[0] = phyAddr;

After this, model inference can be performed. With the zero-copy method, there is no need to release the input Tensor, that is, MI_IPU_PutInputTensors is no longer called; instead, unmap and free the buffer yourself once it is no longer needed:

MI_SYS_Munmap(pVirAddr, BufSize);
MI_SYS_MMA_Free(phyAddr);

Model Inference

ret = MI_IPU_Invoke(u32ChannelID, &InputTensorVector, &OutputTensorVector); 
if (ret != MI_SUCCESS) 
{ 
    MI_IPU_DestroyCHN(u32ChannelID); 
    MI_IPU_DestroyDevice(); 
    std::cerr << "IPU invoke failed!!" << std::endl; 
}

Input parameter:

u32ChannelID: ID of IPU channel.

InputTensorVector: Input IPU Tensor array structure.

OutputTensorVector: Output IPU Tensor array structure.

Output parameter:

MI_IPU API error code. See MI_IPU_API for details.
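
After a successful invoke, the output data can be read from OutputTensorVector before the tensors are released. Below is a minimal top-5 sketch for a classification model, assuming the first output tensor holds one float score per class (u32ClassCount is a placeholder; use the real output size of your model):

#include <algorithm>
#include <iostream>
#include <vector>

const MI_U32 u32ClassCount = 1001;   // placeholder class count
float* pfScores = (float*)OutputTensorVector.astArrayTensors[0].ptTensorData[0];

std::vector<int> indices(u32ClassCount);
for (MI_U32 i = 0; i < u32ClassCount; i++)
{
    indices[i] = i;
}
// Keep the 5 highest-scoring class indices at the front, in descending order.
std::partial_sort(indices.begin(), indices.begin() + 5, indices.end(),
                  [pfScores](int a, int b) { return pfScores[a] > pfScores[b]; });
for (int k = 0; k < 5; k++)
{
    std::cout << "top" << (k + 1) << ": class " << indices[k]
              << " score " << pfScores[indices[k]] << std::endl;
}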

Releasing Input and Output Tensor

MI_IPU_PutInputTensors(u32ChannelID, &InputTensorVector); 
MI_IPU_PutOutputTensors(u32ChannelID, &OutputTensorVector);

Input parameter:

u32ChannelID: ID of IPU channel.

InputTensorVector: Input IPU Tensor array structure. 

OutputTensorVector: Output IPU Tensor array structure.

Output parameter:

MI_IPU API error code. See MI_IPU_API for details.

Destroying an IPU Channel

MI_IPU_DestroyCHN(u32ChannelID);

Input parameter:

u32ChannelID: ID of IPU channel.

Output parameter:

MI_IPU API error code. See MI_IPU_API for details.

Destroying an IPU Device

MI_IPU_DestroyDevice();

Input parameter:

None.

Output parameter:

MI_IPU API error code. See MI_IPU_API for details.
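
For reference, here is a minimal sketch that assembles the calls above into one sequence, using the IPUCreateDevice and IPUCreateChannel helpers defined earlier (u32VarBufSize, the model path, imageSize and Input_Data are placeholders, and error handling is abbreviated):

MI_U32 u32ChannelID;
MI_IPU_SubNet_InputOutputDesc_t desc;
MI_IPU_TensorVector_t InputTensorVector;
MI_IPU_TensorVector_t OutputTensorVector;

// 1. Create the IPU device (NULL selects the default firmware path).
IPUCreateDevice(NULL, u32VarBufSize);

// 2. Create an IPU channel for the offline model.
IPUCreateChannel(&u32ChannelID, "./mobilenet_v1_fixed.img");

// 3. Query the model's input/output tensor description.
MI_IPU_GetInOutTensorDesc(u32ChannelID, &desc);

// 4. Get the input/output tensors and fill the input data.
MI_IPU_GetInputTensors(u32ChannelID, &InputTensorVector);
MI_IPU_GetOutputTensors(u32ChannelID, &OutputTensorVector);
memcpy(InputTensorVector.astArrayTensors[0].ptTensorData[0], Input_Data, imageSize);
MI_SYS_FlushInvCache(InputTensorVector.astArrayTensors[0].ptTensorData[0], imageSize);

// 5. Run inference.
MI_IPU_Invoke(u32ChannelID, &InputTensorVector, &OutputTensorVector);

// 6. Process the output, then release the tensors, channel and device.
MI_IPU_PutInputTensors(u32ChannelID, &InputTensorVector);
MI_IPU_PutOutputTensors(u32ChannelID, &OutputTensorVector);
MI_IPU_DestroyCHN(u32ChannelID);
MI_IPU_DestroyDevice();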

Using Py_dla Interface

Py_dla is a Python encapsulation of the MI_IPU interface; it allows quick processing of model output data on the development board and makes it easier to verify model accuracy. The ARM Python 3.8 library is provided at sdk/verify/mi_demo/source/py_dla/Python3.8.tar.bz2; use the following command to decompress it to a local path that can be mounted on the development board:

tar -jxvf Python3.8.tar.bz2

Then configure the path of Python3.8 on the development board:

export LD_LIBRARY_PATH=path/to/Python3.8/usr/lib/:$LD_LIBRARY_PATH
export PATH=path/to/Python3.8/usr/bin:$PATH

The Py_dla interface file, sdk/verify/mi_demo/source/py_dla/_misim.cpython-38-arm-linux-gnueabihf.so, can be imported directly by Python. To use Py_dla, place this file in the same directory as your Python script. The following illustrates how to call _misim.cpython-38-arm-linux-gnueabihf.so in a Python environment, with a description of the related APIs.

First, use the following to import _misim.cpython-38-arm-linux-gnueabihf.so:

from _misim import MI_simulator
from _misim import IPU_GetOfflineModeStaticInfo
from _misim import IPU_CreateDevice

API examples:

IPU_GetOfflineModeStaticInfo: used to get the variable buffer size of the model:

>>> model_path = 'mobilenet_v1_fixed.sim_sgsimg.img'
>>> variable_buffer_size = IPU_GetOfflineModeStaticInfo(model_path)

IPU_CreateDevice: used to create an IPU device. If there are multiple models to run, use the largest variable_buffer_size to create an IPU device.

>>> IPU_CreateDevice(variable_buffer_size)

Optional parameter:

ipu_firmware: used to specify the path to ipu_firmware.bin; the default path is /config/dla/ipu_firmware.bin.

>>> IPU_CreateDevice(variable_buffer_size, '/path/to/ipu_firmware')

MI_simulator: used to create a model instance

>>> model = MI_simulator(model_path)

To get the input details of the model:

>>> in_details = model.get_input_details()
>>> print(in_details)
[{'name': 'input', 'index': 0, 'dtype': <class 'numpy.uint8'>, 'shape': array([  1, 224, 224,   3])}]

To get the output details of the model:

>>> out_details = model.get_output_details()
>>> print(out_details)
[{'name': 'MobilenetV2/Predictions/Reshape_1', 'index': 0, 'dtype': <class 'numpy.float32'>, 'shape': array([   1, 1001])}]

To input data at index 0 (img_data is a numpy.ndarray with the same shape and dtype as the input):

model.set_input(in_details[0]['index'], img_data)

To invoke a model:

model.invoke()

To get the output (numpy.ndarray) at index 0 (it is recommended to copy the returned array with numpy's copy() method, as in the next example):

>>> output0 = model.get_output(out_details[0]['index'])

To get the output at index 1 and copy the returned data into new memory with numpy's copy() method:

>>> output1 = model.get_output(out_details[1]['index']).copy()