Lightweight Target Detection Method Based on Adaptive Feature Fusion of Optimized RetinaNet
ZHANG Liguo1,2, JI Xinye1,2, ZHANG Yupeng1,2, GENG Xingshuo1,2, ZHANG Sheng1,2
1. Hebei Key Laboratory of Measurement Technology and Instrumentation, Yanshan University, Qinhuangdao, Hebei 066000, China
2. School of Electrical Engineering, Yanshan University, Qinhuangdao, Hebei 066000, China
Abstract: Target detection algorithms are computationally expensive and structurally complex, which makes them difficult to deploy on embedded platforms with limited computing resources. To address this, a lightweight target detection method based on an optimized RetinaNet with adaptive feature fusion is proposed. First, the algorithm adopts the Ghost Module from GhostNet to reduce the number of model parameters, and a spatial feature fusion mechanism improves the scale invariance of the features. Second, structural reparameterization is introduced to deepen the network during training: the model is trained with multiple branches and merged into a single branch for inference, which further improves detection performance. The method is evaluated on two common target detection datasets, PASCAL VOC2007 and COCO, and achieves an average accuracy of 54.1%, better than that of RetinaNet. The experimental results show that the proposed method occupies 170.71 MB of memory, 44.27% of that occupied by RetinaNet, indicating that the proposed algorithm can greatly improve network inference speed while maintaining accuracy.
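The structural reparameterization mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a RepVGG-style block reduced to a single channel, stride 1, zero padding and no batch normalization, and all function names are hypothetical. A 3x3 branch, a 1x1 branch and an identity branch are evaluated separately, as they would be during training, and then folded into one 3x3 kernel that gives the same output at inference.

```python
# Minimal single-channel sketch of structural reparameterization.
# Assumptions (not from the paper): one channel, stride 1, zero padding,
# no batch normalization; all names here are illustrative only.

def conv2d_same(image, kernel):
    """Cross-correlate a 2D list with an odd-sized square kernel (zero-padded)."""
    h, w = len(image), len(image[0])
    k = len(kernel)
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    y, x = i + di - r, j + dj - r
                    if 0 <= y < h and 0 <= x < w:
                        acc += kernel[di][dj] * image[y][x]
            out[i][j] = acc
    return out

def multi_branch(image, k3, k1):
    """Training-time form: 3x3 branch + 1x1 branch + identity branch."""
    a = conv2d_same(image, k3)
    b = conv2d_same(image, [[k1]])
    return [[a[i][j] + b[i][j] + image[i][j] for j in range(len(image[0]))]
            for i in range(len(image))]

def fuse_kernels(k3, k1):
    """Inference-time form: fold the 1x1 and identity branches into the 3x3 centre."""
    fused = [row[:] for row in k3]
    fused[1][1] += k1 + 1.0  # 1x1 weight plus the identity weight of 1
    return fused

image = [[1.0, 2.0, 0.0, -1.0],
         [0.5, -2.0, 3.0, 1.0],
         [4.0, 0.0, -1.5, 2.0]]
k3 = [[0.1, -0.2, 0.3], [0.0, 0.5, -0.4], [0.2, 0.1, -0.1]]
k1 = 0.7

y_train = multi_branch(image, k3, k1)
y_infer = conv2d_same(image, fuse_kernels(k3, k1))
max_diff = max(abs(y_train[i][j] - y_infer[i][j])
               for i in range(len(image)) for j in range(len(image[0])))
print(max_diff < 1e-9)
```

Because convolution is linear, the three branches sum to a single kernel, so the merged model needs one convolution per block instead of three while producing the same result up to floating-point rounding.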