Pedestrian Re-identification Based on Joint Attention Mechanism and Multimodal Features
DOI: https://doi.org/10.53469/jrse.2025.07(07).04

Keywords: Pedestrian Re-identification, Multimodal Learning, Attention Mechanism, Cross-modal Fusion, Feature Alignment

Abstract
To address the insufficient robustness of single-modal features and the interference caused by cross-modal discrepancies in pedestrian re-identification under complex scenarios, we propose a novel network model that integrates a joint attention mechanism with multimodal features. Built on a residual network backbone, the model introduces a cross-modal self-attention module that adaptively weights features from the RGB, thermal infrared, and depth modalities. A multimodal feature fusion module with three branches, intra-modal enhancement, cross-modal correlation, and modal discrepancy suppression, then constructs comprehensive pedestrian feature representations. For optimization, we combine a modal cosine cross-entropy loss, a cross-modal triplet loss, a center alignment loss, and a modal consistency loss, and update the network with a min-max strategy. The proposed method achieves top-1 accuracy rates of 94.3% and 88.7% on the RegDB and SYSU-MM01 datasets, respectively, demonstrating its effectiveness in multimodal pedestrian re-identification scenarios.
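The abstract does not include implementation details, but the following minimal PyTorch sketch illustrates the general idea of attention-weighted fusion over three modality embeddings and a weighted combination of the four named losses. All module names, dimensions, margins, and loss weights are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of cross-modal attention fusion and a combined loss,
# loosely following the components named in the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalAttentionFusion(nn.Module):
    """Adaptively weights per-modality feature vectors via self-attention."""
    def __init__(self, feat_dim=2048, num_heads=4):
        super().__init__()
        # Treat the three modality embeddings as a length-3 token sequence.
        self.attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(feat_dim)

    def forward(self, rgb, thermal, depth):
        # Each input: (batch, feat_dim), e.g. pooled ResNet backbone features.
        tokens = torch.stack([rgb, thermal, depth], dim=1)   # (B, 3, D)
        attended, _ = self.attn(tokens, tokens, tokens)      # cross-modal weighting
        tokens = self.norm(tokens + attended)                # residual + layer norm
        return tokens.mean(dim=1)                            # fused pedestrian feature

def combined_loss(logits, labels, anchor, positive, negative,
                  centers, fused, per_modal_feats,
                  w_tri=1.0, w_center=0.5, w_consist=0.1):
    """Illustrative weighted sum of the four losses named in the abstract."""
    # Classification loss on (assumed cosine-similarity-based) identity logits.
    ce = F.cross_entropy(logits, labels)
    # Cross-modal triplet loss on embedding triplets.
    tri = F.triplet_margin_loss(anchor, positive, negative, margin=0.3)
    # Center alignment: pull fused features toward their identity centers.
    center = F.mse_loss(fused, centers[labels])
    # Modal consistency: keep per-modality features close to the fused feature.
    consist = sum(F.mse_loss(f, fused) for f in per_modal_feats) / len(per_modal_feats)
    return ce + w_tri * tri + w_center * center + w_consist * consist
```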
License
Copyright (c) 2025 Li Fan

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.