Transformer Models For Multi-Temporal Land Cover Classification Using Remote Sensing Images

Research output: Contribution to journal › Conference article › Research › peer review

Authors

  • M. Voelsen
  • S. Lauble
  • F. Rottensteiner
  • C. Heipke

Details

Original language: English
Pages (from-to): 981-990
Number of pages: 10
Journal: ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Volume: 10
Issue number: 1
Publication status: Published - 5 Dec 2023
Event: ISPRS Geospatial Week 2023 - Cairo, Egypt
Duration: 2 Sept 2023 to 7 Sept 2023

Abstract

The pixel-wise classification of land cover, i.e. the task of identifying the physical material of the Earth's surface in an image, is one of the basic applications of satellite image time series (SITS) processing. With the availability of large amounts of SITS, it is possible to use supervised deep learning techniques such as Transformer models to analyse the Earth's surface at global scale and with high spatial and temporal resolution. While most approaches for land cover classification focus on the generation of a mono-temporal output map, we extend established deep learning models to multi-temporal input and output: using images acquired at different epochs, we generate one output map for each input timestep. This has the advantage that the temporal change of land cover can be monitored. In addition, features conflicting over time are not averaged. We extend the Swin Transformer for SITS and introduce a new spatio-temporal transformer block (ST-TB) that extracts spatial and temporal features. We combine the ST-TB with the Swin Transformer block (STB), which is applied in parallel to the individual input timesteps to extract spatial features. Furthermore, we investigate the use of a temporal position encoding and of different patch sizes; the patch size determines how many neighbouring pixels are merged in the input embedding. Using SITS from Sentinel-2, the classification of land cover is improved by +1.8% in the mean F1-Score when using the ST-TB in the first stage of the Swin Transformer compared to a Swin Transformer without the ST-TB layer, and by +1.6% compared to fully convolutional approaches. This demonstrates the advantage of the introduced ST-TB layer for the classification of SITS.
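
The input embedding described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function names, the random projection standing in for learned weights, the sinusoidal form of the temporal encoding, the four-band input and all sizes are assumptions chosen only to show how a patch size merges neighbouring pixels into tokens and how the same tokens can be grouped per timestep (spatial attention) or across all timesteps (spatio-temporal attention).

```python
import numpy as np


def patch_embed(sits, patch_size, weight):
    """Merge patch_size x patch_size neighbouring pixels into one token per timestep.

    sits:   array of shape (T, C, H, W), T timesteps with C spectral bands
    weight: array of shape (C * patch_size**2, D), stand-in for a learned projection
    returns tokens of shape (T, N, D) with N = (H / patch_size) * (W / patch_size)
    """
    T, C, H, W = sits.shape
    p = patch_size
    if H % p or W % p:
        raise ValueError("H and W must be divisible by patch_size")
    # Split both spatial axes into (patch index, offset inside the patch).
    x = sits.reshape(T, C, H // p, p, W // p, p)
    # Bring the two patch indices forward, keep (band, row, col) of each patch together.
    x = x.transpose(0, 2, 4, 1, 3, 5)
    x = x.reshape(T, (H // p) * (W // p), C * p * p)
    return x @ weight


def temporal_position_encoding(days, dim):
    """Sinusoidal encoding of acquisition days (an assumed form), shape (T, dim)."""
    if dim % 2:
        raise ValueError("dim must be even")
    days = np.asarray(days, dtype=np.float64)[:, None]
    freqs = 1.0 / (10000.0 ** (np.arange(0, dim, 2) / dim))
    enc = np.zeros((days.shape[0], dim))
    enc[:, 0::2] = np.sin(days * freqs)
    enc[:, 1::2] = np.cos(days * freqs)
    return enc


# Hypothetical sizes: 3 timesteps, 4 bands, 8 x 8 pixels, patch size 2, 16 features.
rng = np.random.default_rng(0)
T, C, H, W, p, D = 3, 4, 8, 8, 2, 16
sits = rng.random((T, C, H, W))
weight = rng.normal(size=(C * p * p, D))

tokens = patch_embed(sits, p, weight)
tokens = tokens + temporal_position_encoding([0, 10, 20], D)[:, None, :]

spatial_view = tokens                            # attention within each timestep
spatio_temporal_view = tokens.reshape(1, -1, D)  # attention across all timesteps
print(spatial_view.shape, spatio_temporal_view.shape)
```

With patch size 1 every pixel would become its own token, so one output label per pixel and timestep follows directly; larger patch sizes reduce the token count at the cost of requiring an upsampling step before the per-pixel output maps.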

Keywords

    FCN, land cover classification, multi-temporal images, remote sensing, Swin Transformer

Cite this

Transformer Models For Multi-Temporal Land Cover Classification Using Remote Sensing Images. / Voelsen, M.; Lauble, S.; Rottensteiner, F. et al.
In: ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Vol. 10, No. 1, 05.12.2023, p. 981-990.

Voelsen, M, Lauble, S, Rottensteiner, F & Heipke, C 2023, 'Transformer Models For Multi-Temporal Land Cover Classification Using Remote Sensing Images', ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, vol. 10, no. 1, pp. 981-990. https://doi.org/10.5194/isprs-annals-X-1-W1-2023-981-2023
Voelsen, M., Lauble, S., Rottensteiner, F., & Heipke, C. (2023). Transformer Models For Multi-Temporal Land Cover Classification Using Remote Sensing Images. ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 10(1), 981-990. https://doi.org/10.5194/isprs-annals-X-1-W1-2023-981-2023
Voelsen M, Lauble S, Rottensteiner F, Heipke C. Transformer Models For Multi-Temporal Land Cover Classification Using Remote Sensing Images. ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences. 2023 Dec 5;10(1):981-990. doi: 10.5194/isprs-annals-X-1-W1-2023-981-2023
Voelsen, M. ; Lauble, S. ; Rottensteiner, F. et al. / Transformer Models For Multi-Temporal Land Cover Classification Using Remote Sensing Images. In: ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences. 2023 ; Vol. 10, No. 1. pp. 981-990.
@article{10f18da4c3da4e1bb33cac802a5a4b10,
title = "Transformer Models For Multi-Temporal Land Cover Classification Using Remote Sensing Images",
keywords = "FCN, land cover classification, multi-temporal images, remote sensing, Swin Transformer",
author = "M. Voelsen and S. Lauble and F. Rottensteiner and C. Heipke",
note = "Funding Information: We thank the German Land Survey Office of Lower Saxony (Landesamt f{\"u}r Geoinformation und Landesvermessung Niedersachsen - LGLN) for providing the data of the geospatial database and for their support of this project. We thank NVIDIA Corporation for providing GPU resources to this project. ; ISPRS Geospatial Week 2023 ; Conference date: 02-09-2023 Through 07-09-2023",
year = "2023",
month = dec,
day = "5",
doi = "10.5194/isprs-annals-X-1-W1-2023-981-2023",
language = "English",
volume = "10",
pages = "981--990",
number = "1",

}

TY - JOUR

T1 - Transformer Models For Multi-Temporal Land Cover Classification Using Remote Sensing Images

AU - Voelsen, M.

AU - Lauble, S.

AU - Rottensteiner, F.

AU - Heipke, C.

N1 - Funding Information: We thank the German Land Survey Office of Lower Saxony (Landesamt für Geoinformation und Landesvermessung Niedersachsen - LGLN) for providing the data of the geospatial database and for their support of this project. We thank NVIDIA Corporation for providing GPU resources to this project.

PY - 2023/12/5

Y1 - 2023/12/5

KW - FCN

KW - land cover classification

KW - multi-temporal images

KW - remote sensing

KW - Swin Transformer

UR - http://www.scopus.com/inward/record.url?scp=85179017131&partnerID=8YFLogxK

U2 - 10.5194/isprs-annals-X-1-W1-2023-981-2023

DO - 10.5194/isprs-annals-X-1-W1-2023-981-2023

M3 - Conference article

AN - SCOPUS:85179017131

VL - 10

SP - 981

EP - 990

JO - ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences

JF - ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences

SN - 2194-9042

IS - 1

T2 - ISPRS Geospatial Week 2023

Y2 - 2 September 2023 through 7 September 2023

ER -