Details
Original language | English |
---|---|
Pages (from-to) | 6811–6837 |
Number of pages | 27 |
Journal | Machine learning |
Volume | 113 |
Issue number | 9 |
Early online date | 15 Jul 2024 |
Publication status | Published - Sept 2024 |
Abstract
Multi-target multi-camera tracking is crucial to intelligent transportation systems. Numerous recent studies have been undertaken to address this issue. Nevertheless, applying these approaches in real-world situations is challenging due to the scarcity of publicly available data and the laborious process of manually annotating new datasets and creating a tailored rule-based matching system for each camera scenario. To address this issue, we present a novel solution termed LaMMOn, an end-to-end transformer and graph neural network-based multi-camera tracking model. LaMMOn consists of three main modules: (1) Language Model Detection (LMD) for object detection; (2) Language and Graph Model Association module (LGMA) for object tracking and trajectory clustering; (3) Text-to-embedding module (T2E) that overcomes the problem of data limitation by synthesizing the object embedding from defined texts. LaMMOn can be run online in real-time scenarios and achieves competitive results on many datasets, e.g., CityFlow (HOTA 76.46%), I24 (HOTA 25.7%), and TrackCUIP (HOTA 80.94%), with an acceptable FPS (from 12.20 to 13.37) for an online application.
Keywords
- Language model, MTMCT, Multi-camera tracking, Object tracking
ASJC Scopus subject areas
- Computer Science(all)
- Software
- Artificial Intelligence
Cite this
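LaMMOn: language model combined graph neural network for multi-target multi-camera tracking in online scenarios. / Nguyen, Tuan T.; Nguyen, Hoang H.; Sartipi, Mina; Fisichella, Marco.
In: Machine learning, Vol. 113, No. 9, 09.2024, p. 6811–6837.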
Research output: Contribution to journal › Article › Research › peer review
TY - JOUR
T1 - LaMMOn
T2 - language model combined graph neural network for multi-target multi-camera tracking in online scenarios
AU - Nguyen, Tuan T.
AU - Nguyen, Hoang H.
AU - Sartipi, Mina
AU - Fisichella, Marco
N1 - Publisher Copyright: © The Author(s) 2024.
PY - 2024/9
Y1 - 2024/9
KW - Language model
KW - MTMCT
KW - Multi-camera tracking
KW - Object tracking
UR - http://www.scopus.com/inward/record.url?scp=85198649632&partnerID=8YFLogxK
U2 - 10.1007/s10994-024-06592-1
DO - 10.1007/s10994-024-06592-1
M3 - Article
AN - SCOPUS:85198649632
VL - 113
SP - 6811
EP - 6837
JO - Machine learning
JF - Machine learning
SN - 0885-6125
IS - 9
ER -