Appendix - Preprocess and Input Configuration
Preprocess.py
Since different network models may require different pre-processing methods, the same image pre-processing as the one used for training should be applied, in order to minimize accuracy loss during network conversion. A separate Python file should be written for each pre-processing method. In the image pre-processing file, implement an image processing function (the function name is not restricted) that returns the image data as an np.array. The function should take two parameters:
- Image path
- Normalization flag (norm = True)
The normalization flag distinguishes whether the network model is a floating-point model: in the floating-point phase, the image must be normalized before it is fed to the network. Fixed-point and offline network models, on the other hand, already carry the configuration information from the input_config.ini file and can normalize the input data themselves, so the data fed to the network does not need to be normalized. This matches the way input data is processed on SigmaStar hardware.
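As a minimal sketch of this contract (the resize size and the mean/std values here are illustrative, not prescribed):

```python
import cv2
import numpy as np

def image_preprocess(img_path, norm=True):
    # Illustrative skeleton: load, resize, then branch on the norm flag.
    img = cv2.imread(img_path)
    img = cv2.resize(img, (224, 224))
    if norm:
        # Floating-point model: normalize on the host side.
        img = ((img - 127.5) / 128.0).astype('float32')
    else:
        # Fixed-point / offline model: the model normalizes internally.
        img = np.round(img).astype('uint8')
    return np.expand_dims(img, 0)
```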
The pre-processing file may differ depending on how training_input_formats and input_formats are configured in input_config.ini. Here are some examples.
Case 1: training_input_formats and input_formats are both RGB or BGR
The following preprocess.py example applies to models converted on PC and run with the simulator.
If each channel has a different std_value, divide each channel by its corresponding std_value, e.g. std = [57.375, 57.12, 58.395] (a variant is shown after the example below). Meanwhile, the std_value entries in input_config.ini should be separated by colons (:) and arranged in RGB order, one per channel.
| training_input_formats | input_formats | Data Alignment when Running on Board |
|---|---|---|
| RGB/BGR | RGB/BGR | Alignment not required |
| RGB/BGR | RGBA/BGRA | W = ALIGN_UP(W * 4, xrgb_h_pitch_alignment) / 4 |
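ALIGN_UP in these tables is assumed to have its standard meaning of rounding a value up to the nearest multiple of the given alignment; a minimal sketch:

```python
def align_up(x, alignment):
    # Round x up to the nearest multiple of alignment.
    return (x + alignment - 1) // alignment * alignment

# With a hypothetical xrgb_h_pitch_alignment of 16 and W = 99:
# ALIGN_UP(99 * 4, 16) / 4 = 400 / 4 = 100
print(align_up(99 * 4, 16) // 4)  # 100
```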
```python
import cv2
import numpy as np

def get_image(img_path, resizeH=224, resizeW=224, norm=True,
              meanB=103.53, meanG=116.28, meanR=123.68, std=57.375,
              rgb=False, nchw=False):
    img = cv2.imread(img_path)
    if img is None:
        raise FileNotFoundError('No such image: {}'.format(img_path))
    img_norm = cv2.resize(img, (resizeW, resizeH), interpolation=cv2.INTER_LINEAR)
    if norm:
        # Floating-point stage: subtract the per-channel mean and divide by std.
        img_norm = (img_norm - [meanB, meanG, meanR]) / std
        img_norm = img_norm.astype('float32')
    else:
        # Fixed-point / offline stage: the model normalizes internally.
        img_norm = np.round(img_norm).astype('uint8')
    if rgb:
        img_norm = cv2.cvtColor(img_norm, cv2.COLOR_BGR2RGB)
    if nchw:
        img_norm = np.transpose(img_norm, axes=(2, 0, 1))
    return np.expand_dims(img_norm, 0)

def image_preprocess(img_path, norm=True):
    return get_image(img_path, norm=norm)
```
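If each channel has its own std_value (as in the note above), the normalization branch can be adapted as follows. This assumes the example list std = [57.375, 57.12, 58.395] is in BGR order to match the BGR image read by cv2; the corresponding RGB-ordered input_config.ini entry would then be std_value=58.395:57.12:57.375.

```python
# Per-channel std, in BGR order to match the BGR image read by cv2.
std_bgr = [57.375, 57.12, 58.395]
img_norm = (img_norm - [meanB, meanG, meanR]) / std_bgr
img_norm = img_norm.astype('float32')
```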
Case 2: training_input_formats is RGB and input_formats is GRAY
The following preprocess.py example applies to models converted on PC and run with the simulator.
Grayscale models are models whose input is single-channel, i.e. whose input C dimension = 1. For a grayscale model, configure the input_config.ini file with training_input_formats=RGB and input_formats=GRAY.
| training_input_formats | input_formats | Data Alignment when Running on Board |
|---|---|---|
| RGB/BGR | GRAY | H = ALIGN_UP(H, yuv420_v_pitch_alignment)<br>W = ALIGN_UP(W, yuv420_h_pitch_alignment) |
```python
import cv2
import numpy as np

def get_image(img_path, resizeH=28, resizeW=28, norm=True,
              meanR=33.318, std=1.0, rgb=False, nchw=False):
    img = cv2.imread(img_path, flags=-1)  # read the image unchanged
    if img is None:
        raise FileNotFoundError('No such image: {}'.format(img_path))
    try:
        img_dim = img.shape[2]
    except IndexError:
        img_dim = 1
    if img_dim == 3:
        img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    elif img_dim == 4:
        img = cv2.cvtColor(img, cv2.COLOR_BGRA2GRAY)
    img_norm = cv2.resize(img, (resizeW, resizeH), interpolation=cv2.INTER_LINEAR)
    if norm:
        img_norm = (img_norm - meanR) / std
        img_norm = np.expand_dims(img_norm, axis=2)
        # Pad two dummy channels so the floating-point model sees 3-channel input.
        dummy = np.zeros((resizeH, resizeW, 2))
        img_norm = np.concatenate((img_norm, dummy), axis=2)
        img_norm = img_norm.astype('float32')
    else:
        img_norm = np.expand_dims(img_norm, 2)
        img_norm = np.round(img_norm).astype('uint8')
    if rgb:
        img_norm = cv2.cvtColor(img_norm, cv2.COLOR_BGR2RGB)
    if nchw:
        img_norm = np.transpose(img_norm, axes=(2, 0, 1))  # NCHW
    return np.expand_dims(img_norm, 0)

def image_preprocess(img_path, norm=True):
    return get_image(img_path, norm=norm)
```
Case 3: training_input_formats is RGB or BGR and input_formats is YUV_NV12
The following preprocess.py example applies to models converted on PC and run with the simulator.
| training_input_formats | input_formats | Data Alignment when Running on Board |
|---|---|---|
| RGB/BGR | YUV_NV12 | H = ALIGN_UP(H, yuv420_v_pitch_alignment)<br>W = ALIGN_UP(W, yuv420_h_pitch_alignment) |
```python
import cv2
import numpy as np

def get_image(img_path, resizeH=224, resizeW=224, norm=True,
              meanB=127.5, meanG=127.5, meanR=127.5, std=128.0,
              rgb=False, nchw=False):
    img = cv2.imread(img_path)
    if img is None:
        raise FileNotFoundError('No such image: {}'.format(img_path))
    img_norm = cv2.resize(img, (resizeW, resizeH), interpolation=cv2.INTER_LINEAR)
    if norm:
        # Floating-point stage: subtract the per-channel mean and divide by std.
        img_norm = (img_norm - [meanB, meanG, meanR]) / std
        img_norm = img_norm.astype('float32')
    else:
        # Fixed-point / offline stage: the model normalizes internally.
        img_norm = np.round(img_norm).astype('uint8')
    if rgb:
        img_norm = cv2.cvtColor(img_norm, cv2.COLOR_BGR2RGB)
    if nchw:
        img_norm = np.transpose(img_norm, axes=(2, 0, 1))
    return np.expand_dims(img_norm, 0)

def image_preprocess(img_path, norm=True):
    return get_image(img_path, norm=norm)
```
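Note that the pre-processing file itself still works on BGR/RGB data; the conversion to YUV_NV12 happens inside the toolchain and on the hardware. Purely to illustrate the NV12 layout the board expects (bgr_to_nv12 is a hypothetical helper, not part of the SDK):

```python
import cv2
import numpy as np

def bgr_to_nv12(img):
    # Illustrative only: show how a BGR image maps to NV12
    # (Y plane followed by an interleaved UV plane). H and W must be even.
    h, w = img.shape[:2]
    i420 = cv2.cvtColor(img, cv2.COLOR_BGR2YUV_I420)  # planar, shape (H*3/2, W)
    y = i420[:h, :].reshape(-1)
    u = i420[h:h + h // 4, :].reshape(-1)
    v = i420[h + h // 4:, :].reshape(-1)
    uv = np.empty(u.size + v.size, dtype=img.dtype)
    uv[0::2] = u
    uv[1::2] = v
    return np.concatenate([y, uv])
```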
Case 4: training_input_formats and input_formats are both RAWDATA_S16_NHWC or RAWDATA_F32_NHWC
The following preprocess.py example applies to models converted on PC and run with the simulator.
Running on PC
When training_input_formats and input_formats in input_config.ini are both set to RAWDATA_S16_NHWC, mean_red, mean_green, mean_blue and std_value no longer take effect in fixed-point network models; all preprocessing is done before model input, and input_config.ini does not need to configure mean_red, mean_green, mean_blue or std_value. Floating-point models are used in the same way as image-input models. When running fixed-point models with simulator.py, the preprocessing must be consistent with the one used for the floating-point model, i.e. norm must be True. In other words, a pre-processing Python file for RAWDATA_S16_NHWC must always be written with norm = True. simulator.py will read the original floating-point data, convert it to fixed point and align it, and then feed it to the fixed-point model.
When training_input_formats and input_formats in input_config.ini are both set to RAWDATA_F32_NHWC, simulator.py will read the original floating-point data, and the data will be automatically aligned by an align operator inside the model, with no manual operation needed.
Running on development board
When running a network with RAWDATA_S16_NHWC format on the development board, the input data does not need to be converted to fixed point or aligned.
When running a network with RAWDATA_F32_NHWC format on the development board, the input data does not need to be converted to fixed point or aligned either. See RAWDATA_F32_NHWC Model Conversion for details.
| training_input_formats | input_formats | Data Alignment when Running on Board |
|---|---|---|
| RAWDATA_F32_NHWC | RAWDATA_F32_NHWC | Alignment not required |
| RAWDATA_S16_NHWC | RAWDATA_S16_NHWC | Alignment not required |
```python
import cv2
import numpy as np

def get_image(img_path, resizeH=224, resizeW=224, norm=True, rgb=False, nchw=False):
    img = cv2.imread(img_path)
    if img is None:
        raise FileNotFoundError('No such image: {}'.format(img_path))
    img_norm = cv2.resize(img, (resizeW, resizeH), interpolation=cv2.INTER_LINEAR)
    # RAWDATA input: always return floating-point data (norm must be True).
    img_norm = img_norm.astype('float32')
    if rgb:
        img_norm = cv2.cvtColor(img_norm, cv2.COLOR_BGR2RGB)
    if nchw:
        img_norm = np.transpose(img_norm, axes=(2, 0, 1))
    return np.expand_dims(img_norm, 0)

def image_preprocess(img_path, norm=True):
    return get_image(img_path, norm=norm)
```
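A quick sanity check of the example above (the image path is illustrative):

```python
data = image_preprocess('test.jpg', norm=True)
print(data.shape, data.dtype)  # (1, 224, 224, 3) float32
```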
Model Performance Optimization
For performance optimization of convolution:
- Keep the kernel size at 3x3, especially for the top layer, where possible.
- When the kernel size is 1x1, align the value of the shape at the innermost dimension of the input tensor to 16, where possible.
For the DMA operators:
- The concatenation operator has better performance than the pack operator.
- The split operator has better performance than the slice operator.
- Avoid transpose operations on the innermost dimension, where possible.
- The const operand of an elementwise operator should preferably be the right-hand operand, i.e. input[1].
As a general rule:
- The tensor should preferably be 4-dimensional.
- It is recommended to align the value of the shape at the innermost dimension of the tensor to 32 (see the sketch after this list).
- Softmax should preferably be performed on the innermost dimension only.
- For ReduceMax, ReduceMin and ReduceSum, the reduced dimensions should preferably be adjacent.
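As a sketch of the innermost-dimension alignment recommendation (pad_innermost is an illustrative helper; whether zero-padding actually pays off depends on the model):

```python
import numpy as np

def pad_innermost(x, multiple=32):
    # Zero-pad the innermost dimension up to the next multiple (e.g. 30 -> 32).
    pad = (-x.shape[-1]) % multiple
    if pad == 0:
        return x
    widths = [(0, 0)] * (x.ndim - 1) + [(0, pad)]
    return np.pad(x, widths)

x = np.ones((1, 56, 56, 30), dtype='float32')
print(pad_innermost(x).shape)  # (1, 56, 56, 32)
```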
Rules for combined use of operators:
- For Pad + Conv2D/DepthwiseConv, the Pad will be merged into the convolution and will therefore take little or no extra time.
- For Conv2D/DepthwiseConv + single-constant-operand Mul + single-constant-operand Add, the Mul and Add operators will be fused with the convolution. Since BatchNorm is converted to Mul + Add, combining Conv2D/DepthwiseConv with BatchNorm is fine (illustrated after the note below).
- Pad and avgPooling can be used in combination.
- Consecutive transpose operations will be automatically merged into one.
- All reshape operators will be skipped.
- Consecutive elementwise operators with one constant operand will be merged into one.
- Separate single-constant-operand Mul and single-constant-operand Add operators will be combined into a single MultAdd operator.
Note: Single-constant-operand here means there is only one constant operand for the operator.
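To see why BatchNorm reduces to a single-constant-operand Mul plus Add (standard BatchNorm algebra, independent of this SDK):

```python
import numpy as np

# BatchNorm  y = gamma * (x - mean) / sqrt(var + eps) + beta
# rewritten as y = x * scale + shift, i.e. Mul + Add with constant operands.
def batchnorm_as_muladd(gamma, beta, mean, var, eps=1e-5):
    scale = gamma / np.sqrt(var + eps)  # constant Mul operand
    shift = beta - mean * scale         # constant Add operand
    return scale, shift

# Sanity check against the direct BatchNorm formula:
gamma, beta = np.array([1.5]), np.array([0.2])
mean, var = np.array([0.1]), np.array([4.0])
scale, shift = batchnorm_as_muladd(gamma, beta, mean, var)
x = np.array([0.7])
print(np.allclose(x * scale + shift,
                  gamma * (x - mean) / np.sqrt(var + 1e-5) + beta))  # True
```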