Title

Recognition of unsafe behaviors of underground personnel based on multi modal feature fusion
Author

WANG Yu (王宇); YU Chunhua (于春华); CHEN Xiaoqing (陈晓青); SONG Jiawei (宋家威)
Organization

School of Mining Engineering, University of Science and Technology Liaoning (辽宁科技大学矿业工程学院)
Lingang Group Beipiao Baoguo Iron Mining Co., Ltd. (凌钢股份北票保国铁矿有限公司)
Abstract

Real-time recognition of underground personnel behavior using artificial intelligence is of great significance for ensuring safe mine production. Behavior recognition methods based on the RGB modality are susceptible to background noise in video images, while methods based on the skeleton modality lack appearance features of people and objects. To address these problems, the two approaches are combined and a method for recognizing unsafe behaviors of underground personnel based on multimodal feature fusion is proposed. The SlowOnly network extracts RGB modality features. The YOLOX and Lite-HRNet networks obtain skeleton modality data, from which the PoseC3D network extracts skeleton modality features. Early fusion and late fusion are applied to the RGB and skeleton modality features, yielding the final recognition result for unsafe behaviors of underground personnel. Experimental results on the public NTU60 RGB+D dataset under the X-Sub protocol show that, among recognition models based on the skeleton modality alone, PoseC3D achieves higher recognition accuracy than GCN (graph convolutional network) methods, reaching 93.1%; the recognition model based on multimodal feature fusion achieves higher accuracy than the model based on the skeleton modality alone, reaching 95.4%. Experimental results on a self-built dataset of underground unsafe behaviors show that the multimodal feature fusion model still achieves the highest recognition accuracy in complex underground environments, reaching 93.3%, and accurately recognizes both similar unsafe behaviors and multi-person unsafe behaviors.
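
As a rough illustration of the late-fusion step mentioned in the abstract, the sketch below combines the per-class scores of an RGB stream and a skeleton stream by a weighted average of their softmax probabilities. The abstract does not state the fusion rule, so score averaging, the `rgb_weight` parameter, the function names, the class labels, and the example scores are all assumptions made for illustration, not the authors' implementation.

```python
import math

def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def late_fusion(rgb_logits, skeleton_logits, rgb_weight=0.5):
    """Weighted average of the two streams' class probabilities.

    Assumption: the abstract does not state the fusion rule; score
    averaging is one common choice, used here only for illustration.
    """
    if len(rgb_logits) != len(skeleton_logits):
        raise ValueError("both streams must score the same set of classes")
    if not 0.0 <= rgb_weight <= 1.0:
        raise ValueError("rgb_weight must lie in [0, 1]")
    rgb_probs = softmax(rgb_logits)
    skeleton_probs = softmax(skeleton_logits)
    return [
        rgb_weight * r + (1.0 - rgb_weight) * s
        for r, s in zip(rgb_probs, skeleton_probs)
    ]

def predict(fused_probs, labels):
    """Return the label with the highest fused probability."""
    best = max(range(len(fused_probs)), key=fused_probs.__getitem__)
    return labels[best]

# Hypothetical labels and scores, not taken from the paper.
LABELS = ["walking", "climbing", "smoking"]
rgb_scores = [1.0, 0.5, 2.0]        # RGB stream favours "smoking"
skeleton_scores = [0.5, 3.0, 0.6]   # skeleton stream favours "climbing"
fused = late_fusion(rgb_scores, skeleton_scores)
print(predict(fused, LABELS))
```

With equal weights, the more confident stream dominates the fused decision. Setting `rgb_weight` closer to 0 or 1 shifts trust toward the skeleton or RGB stream respectively; in practice such a weight would be tuned on validation data.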
KeyWords

intelligent mine; behavior recognition; object detection; pose estimation; multimodal feature fusion; RGB modality; skeleton modality; YOLOX
Foundation

National Natural Science Foundation of China (51174110).
DOI

10.13272/j.issn.1671-251x.2023070055
Citation

WANG Yu, YU Chunhua, CHEN Xiaoqing, et al. Recognition of unsafe behaviors of underground personnel based on multi modal feature fusion[J]. Journal of Mine Automation, 2023, 49(11): 138-144.
Related topic

This article belongs to the special topic "Intelligent video analysis technology for the whole mine" (20 articles).
Figures and tables

Figures (9) / Tables (0)
