Reference Models

Timing

The tables in this section contain inference timings for a set of representative models. The quantized models have been imported and compiled offline using the SyNAP toolkit. The floating-point models are benchmarked for comparison purposes with the corresponding quantized models.

The mobilenet_v1, mobilenet_v2, posenet and inception models are open-source models available in tflite format from the TensorFlow Hosted Models page: https://www.tensorflow.org/lite/guide/hosted_models

The yolov5 models are available from https://github.com/ultralytics/yolov5, while yolov5_face comes from https://github.com/deepcam-cn/yolov5-face.

Other models come from AI-Benchmark APK: https://ai-benchmark.com/ranking_IoT.html.

Some of the models are Synaptics proprietary, including test models, object detection (mobilenet224), super-resolution and format conversion models.

The model test_64_128x128_5_132_132 has been designed to take maximum advantage of the computational capabilities of the NPU. It consists of 64 5x5 convolutions with a [1, 128, 128, 132] input and output. Its execution requires 913,519,411,200 operations (0.913 TOPs). Inference time shows that in the right conditions VS640 and SL1640 achieve above 1.6 TOP/s, while VS680 and SL1680 are able to achieve above 7.9 TOP/s. For 16-bit inference the maximum TOP/s can be achieved with test_64_64x64_5_132_132: with this model we achieve 0.45 TOP/s on VS640/SL1640 and above 1.7 TOP/s on VS680/SL1680. For actual models used in practice it is very difficult to get close to this level of performance, and it is hard to predict the inference time of a model from the number of operations it contains. The only reliable way is to execute the model on the platform and measure.
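The operation count above can be reproduced with a few lines of Python. This is only a sanity check of the arithmetic quoted in the text, not part of the SyNAP toolkit:

```python
# Operation count of test_64_128x128_5_132_132:
# 64 convolutions, 5x5 kernel, [1, 128, 128, 132] input and output.
layers, k, h, w, c = 64, 5, 128, 128, 132

# Each output element needs k*k*c multiply-accumulates,
# counted as 2 operations each (one multiply, one add).
ops = layers * (h * w * c) * (2 * k * k * c)
print(f"{ops:,} operations")  # 913,519,411,200 -> 0.913 TOPs

def tops_per_second(ops: int, infer_ms: float) -> float:
    """Effective throughput for a measured inference time."""
    return ops / (infer_ms / 1000.0) / 1e12

# An inference time of about 115 ms corresponds to roughly 7.9 TOP/s.
print(f"{tops_per_second(ops, 115.0):.1f} TOP/s")
```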

Remarks:

  • In the following tables all timing values are expressed in milliseconds

  • The Online CPU and Online NPU columns represent the inference time obtained by running the original tflite model directly on the board (online conversion)

  • Online CPU tests have been done with 4 threads (--num_threads=4) on both VS680 and VS640

  • Online CPU tests of floating-point models on VS640 have been done in fp16 mode (--allow_fp16=true)

  • Online NPU tests were executed with the timvx delegate (--external_delegate_path=libvx_delegate.so)

  • The Offline Infer column represents the inference time obtained by using a model converted offline with the SyNAP toolkit (median time over 10 consecutive inferences)

  • The Online timings represent the minimum time measured (for both init and inference). We took the minimum instead of the average because this measure is less sensitive to outliers caused by the test process being temporarily suspended by the CPU scheduler (see the sketch after these remarks)

  • Online timings, in particular for init and CPU inference, can be influenced by other processes running on the board and by the total amount of free memory available. We ran all tests on Android AOSP/64-bit with 4GB of memory on VS680 and 2GB on VS640. Running on Android GMS, on a 32-bit OS, or with less memory can result in longer init and inference times

  • Offline tests have been done with non-contiguous memory allocation and no cache flush

  • Models marked with * come precompiled and preinstalled on the platform
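The two measurement strategies above (minimum for online timings, median over 10 consecutive inferences for offline timings) can be summarized with a small sketch. Here run_inference is a hypothetical callable standing in for one inference on the target, not a SyNAP API:

```python
import statistics
import time

def time_once_ms(run_inference) -> float:
    """Time a single inference, in milliseconds."""
    start = time.perf_counter()
    run_inference()
    return (time.perf_counter() - start) * 1000.0

def online_timing_ms(run_inference, repeats: int = 10) -> float:
    # Minimum over several runs: less sensitive to outliers caused by
    # the test process being temporarily suspended by the CPU scheduler.
    return min(time_once_ms(run_inference) for _ in range(repeats))

def offline_timing_ms(run_inference, repeats: int = 10) -> float:
    # Median over consecutive runs, as used for the Offline NPU Infer column.
    return statistics.median(time_once_ms(run_inference) for _ in range(repeats))
```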

Inference Timings on VS680 and SL1680

These tables show the inference timings for a set of models on VS680 and SL1680. All tests have been done on a 64-bit OS with 4GB of memory. Empty cells correspond to configurations that have not been benchmarked.

Synaptics models

| Model | Online CPU Infer | Online GPU Infer | Online NPU Init | Online NPU Infer | Offline NPU Init | Offline NPU Infer |
|---|---|---|---|---|---|---|
| convert_nv12@1920x1080_rgb@1920x1080 * | | | | | 17.52 | 30.81 |
| convert_nv12@1920x1080_rgb@224x224 * | | | | | 14.25 | 1.27 |
| convert_nv12@1920x1080_rgb@640x360 * | | | | | 13.55 | 5.14 |
| sr_fast_y_uv_1280x720_3840x2160 * | 317 | 32.88 | | | 18.09 | 11.46 |
| sr_fast_y_uv_1920x1080_3840x2160 * | 776 | 50.56 | | | 20.40 | 17.50 |
| sr_qdeo_y_uv_1280x720_3840x2160 * | 149 | 32.49 | | | 21.84 | 20.59 |
| sr_qdeo_y_uv_1920x1080_3840x2160 * | 233 | 38.41 | | | 24.11 | 25.84 |
| mobilenet224_full80 * | | | | | 66.28 | 25.17 |
| mobilenet224_full1 * | | | | | 57.71 | 14.23 |
| test_64_128x128_5_132_132 | | | | | 50.07 | 119.34 |

Open models

| Model | Online CPU Infer | Online GPU Infer | Online NPU Init | Online NPU Infer | Offline NPU Init | Offline NPU Infer |
|---|---|---|---|---|---|---|
| inception_v4_299_quant | 500.54 | | 13502 | 17.80 | 100.79 | 19.59 |
| mobilenet_v1_0.25_224_quant * | 3.37 | | 166 | 0.81 | 2.61 | 0.77 |
| mobilenet_v2_1.0_224_quant * | 18.60 | | 854 | 1.85 | 6.13 | 1.79 |
| posenet_mobilenet_075_float * | 34.44 | | | | | 61.78 |
| posenet_mobilenet_075_quant | 28.60 | | 382 | 6.01 | 1.84 | 2.32 |
| yolov8s-pose * | | | | | 14.61 | 30.79 |
| yolov5m-640x480 | | | 6606 | 113.88 | 54.11 | 118.82 |
| yolov5s-640x480 | | | 2672 | 72.27 | 22.17 | 75.83 |
| yolov5s_face_640x480_onnx_mq * | | | | | 13.00 | 31.88 |

AiBenchmark 4 models

| Model | Online CPU Infer | Online GPU Infer | Online NPU Init | Online NPU Infer | Offline NPU Init | Offline NPU Infer |
|---|---|---|---|---|---|---|
| deeplab_v3_plus_quant | 231.76 | | 4090 | 60.73 | 7.68 | 59.81 |
| dped_quant | 335.63 | | 1019 | 8.93 | 4.74 | 8.82 |
| inception_v3_float | 370.95 | | | | | 436.74 |
| inception_v3_quant | 267.95 | | 7210 | 9.47 | 59.55 | 10.22 |
| mobilenet_v2_b4_quant | 53.54 | | 875 | 12.50 | 11.53 | 13.63 |
| mobilenet_v2_float | 20.84 | | | | | 35.40 |
| mobilenet_v2_quant | 18.72 | | 886 | 2.02 | 9.27 | 1.98 |
| mobilenet_v3_quant | 51.89 | | 1089 | 9.76 | 13.15 | 10.15 |
| pynet_quant | 976.61 | | 3175 | 18.56 | 24.45 | 19.30 |
| srgan_quant | 1513.86 | | 3517 | 54.58 | 14.72 | 56.95 |
| unet_quant | 265.51 | | 487 | 9.34 | 7.73 | 14.80 |
| vgg_quant | 1641.18 | | 2177 | 29.77 | 10.74 | 30.07 |

AiBenchmark 5 models

| Model | Online CPU Infer | Online GPU Infer | Online NPU Init | Online NPU Infer | Offline NPU Init | Offline NPU Infer |
|---|---|---|---|---|---|---|
| crnn_float | 240.53 | | | | | 217.89 |
| crnn_quant | 113.52 | | 40641 | 22.54 | 217.19 | 23.03 |
| deeplab_v3_plus_float | 1181.26 | | | | | 1447.74 |
| deeplab_v3_plus_quant | 674.10 | | 2381 | 97.98 | 16.68 | 103.05 |
| dped_float | 4043.67 | | | | | 2353.72 |
| dped_instance_float | 2236.11 | | | | | 1386.78 |
| dped_instance_quant | 3422.17 | | 288 | 229.41 | 5.43 | 953.44 |
| dped_quant | 2249.53 | | 3266 | 196.30 | 18.70 | 199.15 |
| efficientnet_b4_float | 432.00 | | | | | 579.98 |
| efficientnet_b4_quant | 228.86 | | 9649 | 162.06 | 54.48 | 166.33 |
| esrgan_float | 1445.08 | | | | | 1522.63 |
| esrgan_quant | 770.69 | | 2119 | 93.78 | 5.08 | 101.56 |
| imdn_float | 2553.12 | | | | | 2382.78 |
| imdn_quant | 1350.97 | | 3215 | 165.92 | 8.63 | 155.46 |
| inception_v3_float | 371.39 | | | | | 437.60 |
| inception_v3_quant | 221.61 | | 7254 | 10.26 | 76.98 | 11.38 |
| mobilenet_v2_b8_float | 152.22 | | | | | 203.15 |
| mobilenet_v2_b8_quant | 91.44 | | 889 | 25.91 | 14.69 | 27.18 |
| mobilenet_v2_float | 20.98 | | | | | 36.08 |
| mobilenet_v2_quant | 12.28 | | 968 | 2.11 | 10.04 | 2.07 |
| mobilenet_v3_b4_float | 351.62 | | | | | 461.38 |
| mobilenet_v3_b4_quant | 359.39 | | 1497 | 97.86 | 20.36 | 101.09 |
| mobilenet_v3_float | 91.33 | | | | | 114.77 |
| mobilenet_v3_quant | 97.05 | | 1706 | 19.82 | 15.14 | 20.92 |
| mv3_depth_float | 132.65 | | | | | 194.37 |
| mv3_depth_quant | 218.06 | | 1513 | 71.09 | 15.74 | 90.96 |
| punet_float | 2612.87 | | | | | 1796.33 |
| punet_quant | 1660.59 | | 2019 | 155.60 | 14.06 | 149.79 |
| pynet_float | 2836.85 | | | | | 1620.06 |
| pynet_quant | 2100.18 | | 3441 | 137.39 | 15.80 | 135.94 |
| resnet_float | 0.10 | | | | | 2.86 |
| resnet_quant | 0.41 | | 132 | 0.13 | 3.95 | 0.12 |
| srgan_float | 6192.96 | | | | | 2921.47 |
| srgan_quant | 4220.89 | | 12224 | 200.90 | 29.85 | 208.23 |
| unet_float | 2909.00 | | | | | 2132.16 |
| unet_quant | 1710.41 | | 775 | 69.08 | 19.11 | 95.29 |
| vsr_float | 820.35 | | | | | 974.12 |
| vsr_quant | 580.30 | | 2124 | 155.28 | 20.45 | 133.86 |
| xlsr_float | 518.61 | | | | | 532.46 |
| xlsr_quant | 470.63 | | 1700 | 36.20 | 3.93 | 31.38 |
| yolo_v4_tiny_float | 187.81 | | | | | 157.75 |
| yolo_v4_tiny_quant | 311.65 | | 1406 | 6.62 | 4.69 | 6.03 |

Inference Timings on VS640 and SL1640

These tables show the inference timings for a set of models on VS640 and SL1640. All tests have been done on a 64-bit OS with 2GB of memory.

Synaptics models

| Model | Online CPU Infer | Online GPU Infer | Online NPU Init | Online NPU Infer | Offline NPU Init | Offline NPU Infer |
|---|---|---|---|---|---|---|
| convert_nv12@1920x1080_rgb@1920x1080 * | | | | | 17.48 | 34.49 |
| convert_nv12@1920x1080_rgb@224x224 * | | | | | 15.14 | 1.25 |
| convert_nv12@1920x1080_rgb@640x360 * | | | | | 14.70 | 5.29 |
| sr_fast_y_uv_1280x720_3840x2160 * | 274 | 53.03 | | | 17.87 | 17.01 |
| sr_fast_y_uv_1920x1080_3840x2160 * | 524 | 86.39 | | | 20.35 | 25.90 |
| sr_qdeo_y_uv_1280x720_3840x2160 * | | | | | 20.33 | 26.16 |
| sr_qdeo_y_uv_1920x1080_3840x2160 * | | | | | 22.03 | 33.56 |
| mobilenet224_full80 * | | | | | 718.96 | 52.98 |
| mobilenet224_full1 * | | | | | 595.52 | 36.53 |
| test_64_128x128_5_132_132 | | | | | 63.95 | 563.81 |

Open models

| Model | Online CPU Infer | Online GPU Infer | Online NPU Init | Online NPU Infer | Offline NPU Init | Offline NPU Infer |
|---|---|---|---|---|---|---|
| inception_v4_299_quant | 255.94 | | 21481 | 54.07 | 127.13 | 53.82 |
| mobilenet_v1_0.25_224_quant * | 2.80 | | 244 | 1.00 | 4.99 | 0.93 |
| mobilenet_v2_1.0_224_quant * | 12.31 | | 1203 | 2.40 | 14.21 | 2.31 |
| posenet_mobilenet_075_float * | 27.96 | | | | | 90.06 |
| posenet_mobilenet_075_quant | 18.70 | | 565 | 9.76 | 2.48 | 4.13 |
| yolov8s-pose * | | | | | 20.66 | 54.59 |
| yolov5m-640x480 | | | 10657 | 175.90 | 60.64 | 178.00 |
| yolov5s-640x480 | | | 4264 | 101.73 | 24.90 | 103.36 |
| yolov5s_face_640x480_onnx_mq * | | | | | 27.31 | 59.63 |

AiBenchmark 4 models

| Model | Online CPU Infer | Online GPU Infer | Online NPU Init | Online NPU Infer | Offline NPU Init | Offline NPU Infer |
|---|---|---|---|---|---|---|
| deeplab_v3_plus_quant | 158.81 | | 3679 | 82.47 | 8.31 | 70.85 |
| dped_quant | 134.25 | | 1694 | 25.84 | 6.71 | 25.72 |
| inception_v3_float | 229.07 | | | | | 706.98 |
| inception_v3_quant | 146.25 | | 11130 | 30.21 | 80.70 | 29.82 |
| mobilenet_v2_b4_quant | 47.85 | | 1373 | 18.95 | 13.60 | 18.39 |
| mobilenet_v2_float | 18.27 | | | | | 52.41 |
| mobilenet_v2_quant | 12.44 | | 1282 | 2.57 | 13.81 | 2.44 |
| mobilenet_v3_quant | 47.99 | | 1593 | 12.57 | 16.38 | 11.91 |
| pynet_quant | 447.61 | | 4803 | 57.11 | 31.04 | 56.30 |
| srgan_quant | 829.17 | | 5232 | 121.92 | 15.97 | 121.75 |
| unet_quant | 159.01 | | 745 | 18.58 | 9.93 | 24.20 |
| vgg_quant | 572.70 | | 3258 | 103.66 | 10.65 | 102.65 |

AiBenchmark 5 models

| Model | Online CPU Infer | Online GPU Infer | Online NPU Init | Online NPU Infer | Offline NPU Init | Offline NPU Infer |
|---|---|---|---|---|---|---|
| crnn_float | 140.70 | | | | | 352.45 |
| crnn_quant | 87.40 | | 70679 | 33.96 | 284.46 | 33.66 |
| deeplab_v3_plus_float | 976.50 | | | | | 2465.12 |
| deeplab_v3_plus_quant | 580.25 | | 3889 | 137.92 | 19.01 | 133.41 |
| dped_float | 2197.13 | | | | | 4492.49 |
| dped_instance_float | 1458.77 | | | | | 2556.78 |
| dped_instance_quant | 3565.00 | | 438 | 366.10 | | |
| dped_quant | 1166.57 | | 4025 | 340.13 | 14.97 | 326.37 |
| efficientnet_b4_float | 396.99 | | | | | 945.97 |
| efficientnet_b4_quant | 266.18 | | 15393 | 202.61 | 74.93 | 200.67 |
| esrgan_float | 971.15 | | | | | 2632.86 |
| esrgan_quant | 501.45 | | 2394 | 147.30 | 5.28 | 147.75 |
| imdn_float | 1596.76 | | | | | 4181.09 |
| imdn_quant | 1123.80 | | 5018 | 281.54 | 5.57 | 269.71 |
| inception_v3_float | 228.96 | | | | | 724.73 |
| inception_v3_quant | 157.53 | | 11300 | 30.83 | 96.69 | 30.12 |
| mobilenet_v2_b8_float | 132.81 | | | | | 338.71 |
| mobilenet_v2_b8_quant | 88.94 | | 1395 | 39.21 | 15.75 | 38.26 |
| mobilenet_v2_float | 18.09 | | | | | 52.04 |
| mobilenet_v2_quant | 11.68 | | 1395 | 2.69 | 14.11 | 2.58 |
| mobilenet_v3_b4_float | 327.02 | | | | | 765.60 |
| mobilenet_v3_b4_quant | 364.18 | | 2358 | 136.42 | 22.52 | 135.36 |
| mobilenet_v3_float | 90.62 | | | | | 181.89 |
| mobilenet_v3_quant | 100.55 | | 2586 | 24.74 | 20.79 | 23.85 |
| mv3_depth_float | 189.75 | | | | | 331.05 |
| mv3_depth_quant | 301.72 | | 2307 | 80.36 | 18.43 | 98.45 |
| punet_float | 1856.32 | | | | | 3173.80 |
| punet_quant | 1572.03 | | 2736 | 259.45 | 11.74 | 249.44 |
| pynet_float | 2747.17 | | | | | 2833.27 |
| pynet_quant | 2126.74 | | 5113 | 282.27 | 17.12 | 275.28 |
| resnet_float | 0.08 | | | | | 2.16 |
| resnet_quant | 0.04 | | 205 | 0.17 | 5.09 | 0.23 |
| srgan_float | 3295.12 | | | | | 5420.38 |
| srgan_quant | 1740.11 | | 17125 | 423.24 | 29.30 | 420.17 |
| unet_float | 1726.05 | | | | | 3776.94 |
| unet_quant | 1315.86 | | 1089 | 155.08 | 14.68 | 195.52 |
| vsr_float | 680.60 | | | | | 1905.83 |
| vsr_quant | 621.03 | | 1732 | 200.22 | 11.42 | 156.74 |
| xlsr_float | 595.84 | | | | | 987.90 |
| xlsr_quant | 570.30 | | 2063 | 42.01 | 3.27 | 41.15 |
| yolo_v4_tiny_float | 125.44 | | | | | 254.04 |
| yolo_v4_tiny_quant | 321.50 | | 2064 | 13.68 | 8.08 | 12.73 |

Super Resolution

Synaptics provides two proprietary families of super-resolution models: fast and qdeo. The former provides better inference time, the latter better upscaling quality. They can be tested using the synap_cli_ip application (see synap_cli_ip Application).

These models are preinstalled in $MODELS/image_processing/super_resolution.

Synaptics SuperResolution Models on Y+UV Channels

| Name | Input Image | Output Image | Factor |
|---|---|---|---|
| sr_fast_y_uv_960x540_3840x2160 | 960x540 | 3840x2160 | 4 |
| sr_fast_y_uv_1280x720_3840x2160 | 1280x720 | 3840x2160 | 3 |
| sr_fast_y_uv_1920x1080_3840x2160 | 1920x1080 | 3840x2160 | 2 |
| sr_qdeo_y_uv_960x540_3840x2160 | 960x540 | 3840x2160 | 4 |
| sr_qdeo_y_uv_1280x720_3840x2160 | 1280x720 | 3840x2160 | 3 |
| sr_qdeo_y_uv_1920x1080_3840x2160 | 1920x1080 | 3840x2160 | 2 |
| sr_qdeo_y_uv_640x360_1920x1080 | 640x360 | 1920x1080 | 3 |

Format Conversion

Conversion models can be used to convert an image from NV12 format to RGB. A set of models is provided for the most commonly used resolutions. These models have been generated by taking advantage of the preprocessing feature of the SyNAP toolkit (see Preprocessing) and can be used to convert an image so that it can be fed to a processing model with an RGB input.

These models are preinstalled in $MODELS/image_processing/preprocess and can be tested using the synap_cli_ic2 application (see synap_cli_ic2 Application).
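For reference, the sketch below shows what such a conversion involves. NV12 stores a full-resolution Y plane followed by an interleaved, half-resolution UV plane; the BT.601 full-range coefficients and nearest-neighbour chroma upsampling are assumptions made for illustration, and the actual models additionally rescale the image to the target resolution:

```python
import numpy as np

def nv12_to_rgb(nv12: np.ndarray, width: int, height: int) -> np.ndarray:
    """Convert a flat NV12 buffer (width*height*3/2 bytes) to an RGB image."""
    y = nv12[:width * height].reshape(height, width).astype(np.float32)
    uv = nv12[width * height:].reshape(height // 2, width // 2, 2).astype(np.float32)
    # Upsample chroma to full resolution (nearest neighbour).
    uv = uv.repeat(2, axis=0).repeat(2, axis=1)
    u, v = uv[..., 0] - 128.0, uv[..., 1] - 128.0
    # BT.601 full-range YUV -> RGB (assumed coefficients).
    r = y + 1.402 * v
    g = y - 0.344136 * u - 0.714136 * v
    b = y + 1.772 * u
    return np.clip(np.stack([r, g, b], axis=-1), 0.0, 255.0).astype(np.uint8)
```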

Synaptics Conversion Models NV12 to RGB 224x224

| Name | Input Image (NV12) | Output Image (RGB) |
|---|---|---|
| convert_nv12@426x240_rgb@224x224 | 426x240 | 224x224 |
| convert_nv12@640x360_rgb@224x224 | 640x360 | 224x224 |
| convert_nv12@854x480_rgb@224x224 | 854x480 | 224x224 |
| convert_nv12@1280x720_rgb@224x224 | 1280x720 | 224x224 |
| convert_nv12@1920x1080_rgb@224x224 | 1920x1080 | 224x224 |
| convert_nv12@2560x1440_rgb@224x224 | 2560x1440 | 224x224 |
| convert_nv12@3840x2160_rgb@224x224 | 3840x2160 | 224x224 |
| convert_nv12@7680x4320_rgb@224x224 | 7680x4320 | 224x224 |

Synaptics Conversion Models NV12 to RGB 640x360

| Name | Input Image (NV12) | Output Image (RGB) |
|---|---|---|
| convert_nv12@426x240_rgb@640x360 | 426x240 | 640x360 |
| convert_nv12@640x360_rgb@640x360 | 640x360 | 640x360 |
| convert_nv12@854x480_rgb@640x360 | 854x480 | 640x360 |
| convert_nv12@1280x720_rgb@640x360 | 1280x720 | 640x360 |
| convert_nv12@1920x1080_rgb@640x360 | 1920x1080 | 640x360 |
| convert_nv12@2560x1440_rgb@640x360 | 2560x1440 | 640x360 |
| convert_nv12@3840x2160_rgb@640x360 | 3840x2160 | 640x360 |
| convert_nv12@7680x4320_rgb@640x360 | 7680x4320 | 640x360 |

Synaptics Conversion Models NV12 to RGB 1920x1080

| Name | Input Image (NV12) | Output Image (RGB) |
|---|---|---|
| convert_nv12@426x240_rgb@1920x1080 | 426x240 | 1920x1080 |
| convert_nv12@640x360_rgb@1920x1080 | 640x360 | 1920x1080 |
| convert_nv12@854x480_rgb@1920x1080 | 854x480 | 1920x1080 |
| convert_nv12@1280x720_rgb@1920x1080 | 1280x720 | 1920x1080 |
| convert_nv12@1920x1080_rgb@1920x1080 | 1920x1080 | 1920x1080 |
| convert_nv12@2560x1440_rgb@1920x1080 | 2560x1440 | 1920x1080 |
| convert_nv12@3840x2160_rgb@1920x1080 | 3840x2160 | 1920x1080 |
| convert_nv12@7680x4320_rgb@1920x1080 | 7680x4320 | 1920x1080 |