Details
Original language | English |
---|---|
Pages (from-to) | 169-177 |
Number of pages | 9 |
Journal | ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences |
Volume | X-3-2024 |
Publication status | Published - 4 Nov 2024 |
Event | 2024 Symposium on Beyond the Canopy: Technologies and Applications of Remote Sensing - Belem, Brazil (4 Nov 2024 → 8 Nov 2024) |
Abstract
Semantic segmentation is essential in remote sensing, where it underpins applications such as environmental monitoring and land cover classification. Recent work aims to jointly classify data from diverse sensors and epochs to improve predictive accuracy. With the availability of vast Satellite Image Time Series (SITS) data, supervised deep learning methods such as Transformer models become viable options. This paper introduces the Temporal Vision Transformer (ViT), designed to extract features from SITS. These features, which capture the temporal patterns of land cover classes, are integrated with features derived from aerial imagery to improve land cover classification. Drawing inspiration from the success of transformers in natural language processing (NLP), Temporal ViT concurrently extracts spatial and temporal information from SITS data using tailored positional encoding strategies. The proposed approach supports feature learning across both domains and facilitates the integration of encoded SITS data with aerial images. Furthermore, a training strategy is proposed that helps the Temporal ViT focus on classes whose appearance changes over the year. Extensive experiments indicate improved classification performance of Temporal ViT compared to existing state-of-the-art techniques for multi-modal land cover classification. Our model achieves a 3.8% increase in mean intersection over union (IoU) compared to a network relying solely on aerial images.
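The abstract reports its headline result as mean IoU. As a reminder of what that metric measures, the following is a minimal sketch in plain Python, not the authors' evaluation code: the function name `mean_iou`, the choice to skip classes absent from both labellings, and the toy label lists are all assumptions made for illustration.

```python
def mean_iou(y_true, y_pred, num_classes):
    """Average the per-class intersection-over-union of two label sequences.

    Assumption: classes that occur in neither sequence are skipped rather
    than counted as 0 or 1; evaluation protocols differ on this point.
    """
    ious = []
    for c in range(num_classes):
        pairs = list(zip(y_true, y_pred))
        # Pixels labelled c in both the reference and the prediction.
        intersection = sum(1 for t, p in pairs if t == c and p == c)
        # Pixels labelled c in at least one of the two.
        union = sum(1 for t, p in pairs if t == c or p == c)
        if union > 0:
            ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else 0.0


# Hypothetical flattened label maps with three land cover classes.
reference = [0, 0, 1, 1, 2, 2]
predicted = [0, 1, 1, 1, 2, 0]
score = mean_iou(reference, predicted, num_classes=3)
print(f"mean IoU = {score:.3f}")
```

In this toy case the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is 0.5. A real evaluation would typically accumulate a confusion matrix over the whole test set before computing the per-class ratios.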
Keywords
- Land cover classification, Multi-Sensor remote sensing, Satellite image time series, Semantic segmentation, Vision transformer
ASJC Scopus subject areas
- Physics and Astronomy(all)
- Instrumentation
- Environmental Science(all)
- Environmental Science (miscellaneous)
- Earth and Planetary Sciences(all)
- Earth and Planetary Sciences (miscellaneous)
Cite this
In: ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Vol. X-3-2024, 04.11.2024, p. 169-177.
Research output: Contribution to journal › Conference article › Research › peer review
TY - JOUR
T1 - Temporal ViT-U-Net Tandem Model
T2 - 2024 Symposium on Beyond the Canopy: Technologies and Applications of Remote Sensing
AU - Heidarianbaei, Mohammadreza
AU - Kanyamahanga, Hubert
AU - Dorozynski, Mareike
N1 - Publisher Copyright: © Author(s) 2024.
PY - 2024/11/4
Y1 - 2024/11/4
KW - Land cover classification
KW - Multi-Sensor remote sensing
KW - Satellite image time series
KW - Semantic segmentation
KW - Vision transformer
UR - http://www.scopus.com/inward/record.url?scp=85212389099&partnerID=8YFLogxK
U2 - 10.5194/isprs-annals-X-3-2024-169-2024
DO - 10.5194/isprs-annals-X-3-2024-169-2024
M3 - Conference article
AN - SCOPUS:85212389099
VL - X-3-2024
SP - 169
EP - 177
JO - ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences
JF - ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences
SN - 2194-9042
Y2 - 4 November 2024 through 8 November 2024
ER -