Camouflaged object detection (COD) aims to identify targets that are “seamlessly” concealed within their surroundings, which makes it more challenging than traditional object detection [1]. In video camouflaged object detection (VCOD), the high intrinsic variation and increased complexity of the scenes pose additional obstacles. We previously proposed a green method, termed GreenCOD, that combines gradient boosting with deep features extracted from pre-trained deep neural networks (DNNs) to detect camouflaged objects efficiently without back-propagation. This quarter, building on the GreenCOD model, we explore how to detect objects in camouflaged videos in a lightweight, explainable way.
Inspired by the GreenCOD pipeline, our architecture uses an EfficientNetB4 backbone in the feature extraction module for each frame. Input frames are first resized to a standard size of 672×672×3 and then processed by the 8-block EfficientNetB4 backbone, which extracts features at different receptive field sizes. To account for information across these spatial scales, all features are resized and concatenated into a rich feature set. The decision learning module follows a hierarchical architecture.
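The resize-and-concatenate step can be illustrated with a minimal NumPy sketch. This is not the GreenCOD implementation: the feature-map shapes, the target resolution, the nearest-neighbor interpolation, and the helper names `resize_nearest` and `fuse_multiscale` are all assumptions chosen for illustration, and random arrays stand in for the backbone outputs.

```python
import numpy as np

def resize_nearest(feat, out_h, out_w):
    """Nearest-neighbor resize of an (H, W, C) feature map to (out_h, out_w, C)."""
    h, w, _ = feat.shape
    rows = np.arange(out_h) * h // out_h  # source row for each output row
    cols = np.arange(out_w) * w // out_w  # source column for each output column
    return feat[rows[:, None], cols[None, :], :]

def fuse_multiscale(features, out_h, out_w):
    """Resize every feature map to a common size and concatenate along channels."""
    resized = [resize_nearest(f, out_h, out_w) for f in features]
    return np.concatenate(resized, axis=-1)

# Hypothetical stand-ins for three backbone block outputs (shapes are assumed).
rng = np.random.default_rng(0)
block_outputs = [
    rng.standard_normal((84, 84, 32)),
    rng.standard_normal((42, 42, 56)),
    rng.standard_normal((21, 21, 160)),
]

fused = fuse_multiscale(block_outputs, 42, 42)
print(fused.shape)  # (42, 42, 248)
```

Each spatial location of `fused` then carries one feature vector that mixes fine and coarse scales, which is the form a per-location learner can consume.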
XGBoost is then trained on the initial prediction maps together with temporal information. Short-term temporal information is captured by extracting the motion between consecutive frames. The motion flow maps are extracted at a higher resolution, and neighborhood reconstruction is then applied so that each prediction location takes into account the corresponding 4×4 window in the motion map. Initial results on VCOD problems are satisfactory.
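The neighborhood reconstruction described above can be sketched as follows. This is a hedged illustration: it assumes a single-channel motion map (for example, flow magnitude) at exactly 4 times the resolution of the prediction map, with non-overlapping windows. The report does not confirm these details, and the function names `neighborhood_features` and `build_samples` are hypothetical.

```python
import numpy as np

def neighborhood_features(motion, window=4):
    """Turn an (H, W) motion map into (H/window, W/window, window*window)
    so each coarse location holds its flattened window of motion values."""
    H, W = motion.shape
    h, w = H // window, W // window
    blocks = motion[:h * window, :w * window].reshape(h, window, w, window)
    return blocks.transpose(0, 2, 1, 3).reshape(h, w, window * window)

def build_samples(initial_pred, motion, window=4):
    """Stack the initial prediction with its motion window, one row per location."""
    neigh = neighborhood_features(motion, window)
    if neigh.shape[:2] != initial_pred.shape:
        raise ValueError("motion map must be `window` times the prediction size")
    feats = np.concatenate([initial_pred[..., None], neigh], axis=-1)
    return feats.reshape(-1, feats.shape[-1])

# Toy data: a 2x2 initial prediction map and an 8x8 motion map.
motion = np.arange(64, dtype=np.float32).reshape(8, 8)
pred = np.zeros((2, 2), dtype=np.float32)

X = build_samples(pred, motion)
print(X.shape)  # (4, 17): 1 initial prediction value + 16 motion values
```

Rows of `X`, paired with per-location ground-truth labels, are in the tabular form that a gradient boosting learner such as XGBoost expects.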
[1] Fan, Deng-Ping, et al. “Camouflaged object detection.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020.
[2] Cheng, X., Xiong, H., Fan, D., Zhong, Y., Harandi, M.T., Drummond, T., & Ge, Z. (2022). Implicit Motion Handling for Video Camouflaged Object Detection. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 13854-13863.