
Integrating Local and Global Features in a Convolutional Vision Transformer for Wetland Mapping.

Alizadeh Moghaddam, Sayyed Hamed (ORCID: https://orcid.org/0000-0003-2992-4277); Gazor, Saeed (ORCID: https://orcid.org/0000-0003-4368-6682); Homayouni, Saeid (ORCID: https://orcid.org/0000-0002-0214-5356) and Karami, Fahime (2025). Integrating Local and Global Features in a Convolutional Vision Transformer for Wetland Mapping. IEEE Sensors Journal, vol. 25, no. 5, pp. 8674-8683. DOI: 10.1109/JSEN.2025.3529762.

This document is not hosted on EspaceINRS.

Abstract

Accurate classification of hyperspectral images is particularly challenging in wetland regions, where land cover classes are highly similar. While integrating local and global features can enhance feature representation and improve classification accuracy, this approach remains largely unexplored in wetland mapping (WM) applications. In addition, existing methods often suffer from information loss during feature extraction and integration processes. To address these challenges, this article introduces a convolutional vision transformer (ConViT), which combines convolutional neural networks and vision transformers (ViTs) to simultaneously leverage local and global features for WM. ConViT consists of four components: 1) local feature extraction using DenseNet; 2) a novel local embedding for tokenization that preserves spatial relationships; 3) a convolutional-transformer block (CTB); and 4) a softmax-based classifier. By integrating DenseNet with squeeze-and-excitation (SE) layers, ConViT enhances feature reuse and boosts sensitivity to subtle variations in wetland classes. Our contributions include a dual-input CTB with skip connections for richer feature extraction, a unique local embedding method to maintain spatial context, and efficient global-local feature fusion. ConViT outperforms two traditional and seven deep learning classifiers for WM across four datasets. In addition, ConViT exhibits superiority in visual inspection, statistical significance tests, and focused comparison experiments, which underscores its potential for WM. The source code of this work will be available at https://github.com/halizz821/ConViT.
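To make the squeeze-and-excitation (SE) recalibration mentioned in the abstract concrete, the following NumPy sketch shows the generic SE mechanism (global average pooling, a bottleneck of two fully connected layers, and sigmoid channel gating). Shapes, the reduction ratio `r`, and the random weights are illustrative assumptions, not the authors' implementation; see the linked repository for the actual code.

```python
import numpy as np

def se_block(x, w1, w2):
    """Squeeze-and-excitation channel recalibration (illustrative sketch).

    x:  feature map of shape (C, H, W)
    w1: squeeze FC weights of shape (C // r, C)
    w2: excitation FC weights of shape (C, C // r)
    """
    # Squeeze: global average pooling over the spatial dimensions -> (C,)
    z = x.mean(axis=(1, 2))
    # Excitation: bottleneck FC -> ReLU -> FC -> sigmoid gate in (0, 1)
    s = np.maximum(w1 @ z, 0.0)
    s = 1.0 / (1.0 + np.exp(-(w2 @ s)))
    # Recalibrate: scale each channel of x by its learned importance weight
    return x * s[:, None, None]

rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2  # hypothetical channel count and reduction ratio
x = rng.standard_normal((C, H, W))
out = se_block(x,
               rng.standard_normal((C // r, C)),
               rng.standard_normal((C, C // r)))
```

The gating vector `s` multiplies every spatial position of a channel by the same factor, which is how SE layers emphasize informative channels and suppress less useful ones within the DenseNet backbone.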

Document type: Article
Keywords: convolutional neural networks; hyperspectral image classification; vision transformer (ViT); wetland mapping (WM)
Centre: Centre Eau Terre Environnement
Deposit date: 26 March 2025 19:56
Last modified: 26 March 2025 19:56
URI: https://espace.inrs.ca/id/eprint/16396
