SARATR-X: Towards Building A Foundation Model for SAR Target Recognition

Weijie Li (李玮杰), Wei Yang (杨威), Yuenan Hou (侯跃南), Li Liu (刘丽), Yongxiang Liu (刘永祥), and Xiang Li (黎湘)

📖 Introduction | ⚙️ Pre-training | 📊 Classification | 🎯 Detection | 📜 Statement

📖 Introduction

This is the official repository for the paper “SARATR-X: Towards Building A Foundation Model for SAR Target Recognition”. If you find our work useful, please give us a star 🌟 on GitHub and cite our paper using the BibTeX format at the end.

这里是论文 “SARATR-X: Towards Building A Foundation Model for SAR Target Recognition (SARATR-X：迈向SAR目标识别基础模型)” 的代码库。如果您觉得我们的工作有价值，请在 GitHub 上给我们一个星星 🌟，并按页面最后的 BibTeX 格式引用我们的论文。

You can contact us to obtain these datasets and weights by email: [email protected].

可以通过发送邮件至 [email protected] 获取相关数据集和权重。

Abstract:
Despite the remarkable progress in synthetic aperture radar automatic target recognition (SAR ATR), recent efforts have concentrated on detecting and classifying a specific category, e.g., vehicles, ships, airplanes, or buildings. One of the fundamental limitations of the top‑performing SAR ATR methods is that the learning paradigm is supervised, task‑specific, limited‑category, closed‑world learning, which depends on massive amounts of accurately annotated samples that are expensively labeled by expert SAR analysts and have limited generalization capability and scalability. In this work, we make the first attempt towards building a foundation model for SAR ATR, termed SARATR‑X. SARATR‑X learns generalizable representations via self‑supervised learning (SSL) and provides a cornerstone for label‑efficient model adaptation to generic SAR target detection and classification tasks. Specifically, SARATR‑X is trained on 0.18 M unlabelled SAR target samples, which are curated by combining contemporary benchmarks and constitute the largest publicly available dataset to date. Considering the characteristics of SAR images, a backbone tailored for SAR ATR is carefully designed, and a two‑step SSL method endowed with multi‑scale gradient features was applied to ensure the feature diversity and model scalability of SARATR‑X. The capabilities of SARATR‑X are evaluated on classification under few‑shot and robustness settings and detection across various categories and scenes, achieving impressive performance – often competitive with or even superior to prior fully supervised, semi‑supervised, or self‑supervised algorithms.

摘要：
尽管合成孔径雷达自动目标识别（SAR ATR）取得了显著进展，但最近的工作主要集中在对特定类别（如车辆、船舶、飞机或建筑物）的检测和分类上。性能良好的 SAR ATR 方法的一个基本局限是，其学习范式是有监督的、特定任务的、有限类别的、封闭世界的学习，依赖于大量由 SAR 专家高成本标注的准确样本，泛化能力和可扩展性有限。在这项工作中，我们首次尝试为 SAR ATR 建立一个基础模型，称为 SARATR‑X。SARATR‑X 通过自监督学习（SSL）学习可泛化的表征，为标签高效模型适应通用 SAR 目标检测和分类任务提供了基石。具体来说，SARATR‑X 在 0.18 M 个未标记的 SAR 目标样本上进行预训练，这些样本结合了当代基准数据集，构成了迄今为止最大的公开可用预训练数据集。考虑到 SAR 图像的特点，我们精心设计了专用于 SAR ATR 的主干网络，并采用了具有多尺度梯度特征的两步 SSL 方法，以确保 SARATR‑X 的特征多样性和模型可扩展性。我们对 SARATR‑X 的能力进行了评估，包括少样本和鲁棒性设置下的分类以及跨类别、跨场景的检测，其性能令人印象深刻，通常可与之前的全监督、半监督或自监督算法相媲美，甚至更胜一筹。

⚙️ Pre-training

Our code is based on SAR‑JEPA and HiViT.

Requirements

Python 3
CUDA 11.1
PyTorch 1.8+ with CUDA support
timm 0.5.4
tensorboard

Step‑by‑step installation (new version for H800 / MM3.0 – not compatible with the current GitHub version in the paper; please see `saratrx_mmdet3.0.7z` in BaiduYun and Hugging Face)

# Pre‑training environment
conda create -n saratrx python=3.10 -y
conda activate saratrx
pip install torch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 --index-url https://download.pytorch.org/whl/cu121
pip install timm==0.5.4 tensorboard
pip install opencv-python opencv-python-headless
pip install numpy==1.26.4

# Detection environment
cd detection
pip install "setuptools<80.9"
pip install -U openmim
mim install mmengine
pip install mmcv==2.1.0 -f https://download.openmmlab.com/mmcv/dist/cu118/torch2.1/index.html
pip install -v -e .
pip install pycocotools==2.0.7

# Classification environment
cd classification
cd Dassl.pytorch
pip install -r requirements.txt
python setup.py develop

Step‑by‑step installation (old version for 3090)

conda create -n saratrx python=3.9 -y
conda activate saratrx
cd pre-training

pip install torch==1.8.2 torchvision==0.9.2 torchaudio==0.8.2 --extra-index-url https://download.pytorch.org/whl/lts/1.8/cu111
pip install timm==0.5.4 tensorboard
pip install -r requirements_pretrain.txt

Pre‑training dataset

Dataset	Year	Task	#Imgs.	Img. Size	#Targets	#Scenes	Res. (m)	Band	Pol.	Target description
MSTAR	1995	Cls.	14,577	128–193	10	1	0.3	X	Single	Fine‑grained vehicle dataset
Sandia MiniSAR	2006	Det.	3,927	224	≥1	≥7	0.1	Ku	Single	Terrestrial targets in urban, deserts, and others
SARSim	2017	Cls.	21,168	139	14	3	0.3	X	Single	Simulation vehicle dataset
SAMPLE	2019	Cls.	5,380	128	10	2	0.3	X	Single	Simulation and measured vehicle dataset
SIVED	2023	Det.	1,044	512	≥1	≥4	0.1–0.3	X/Ku/Ka	Single	Synthetic vehicle dataset
OpenSARShip	2017	Cls.	26,679	9–445	14	10	2.3–17.4	C	Double	Fine‑grained ship slices
SAR‑Ship	2019	Det.	39,729	256	≥1	≥4	3–25	C	Quad	Ship dataset in complex scenes
AIR‑SARShip	2019	Det.	801	512–1000	≥1	≥3	1–3	C	Single	Ship dataset in complex scenes
HRSID	2020	Det.	5,604	800	≥1	≥2	0.5–3	C/X	Quad	Instance‑level ship dataset
SSDD	2021	Det.	1,160	214–668	≥1	≥2	1–15	C/X	Quad	Ship dataset
SADD	2022	Det.	883	224	≥1	≥2	0.5–3	X	Single	Aircraft dataset
SAR‑AIRcraft	2023	Det.	18,818	512	≥7	≥3	1	C	Single	Aircraft dataset
MSAR	2022	Det.	28,499	256–2048	≥4	≥6	1	C	Quad	Terrestrial and maritime targets
OGSOD	2023	Det.	18,331	256	≥3	≥2	3	C	Double	Targets include bridges, oil tanks, and harbours

Start pre‑training with SAR images

We consider that constructing self‑supervised signals for SAR images, which suffer from many disturbances in imaging quality, requires a combination of noise suppression and feature compression. The initial weight is from mae_hivit_base_1600ep.pth.

python -m torch.distributed.launch --nproc_per_node 8 --master_port 12345 --use_env main_pretrain.py \
    --data_path <imagenet-path> --output_dir <pretraining-output-path> \
    --model mae_hivit_base_dec512d6b --mask_ratio 0.75 \
    --batch_size 100 --accum_iter 1 --blr 1.5e-4 --weight_decay 0.05 --epochs 800 --warmup_epochs 5

Q1: How do I use my dataset?
A1: Change --data_path and modify the data loading code if needed in main_pretrain.py and datasets.py.

# Dataset parameters
parser.add_argument('--data_path', default='D:\\2023_SARatrX_1\Pre-Train Data\\186K_notest\\', type=str,
                    help='dataset path')

from util.datasets import load_data
# dataset_train = datasets.ImageFolder(os.path.join(args.data_path), transform=transform_train)
dataset_train = load_data(os.path.join(args.data_path), transform=transform_train)
print(len(dataset_train))

Q2: How do we make improvements?
A2: You can add more high‑quality data and try more data augmentation methods. Besides, we suggest improvements to HiViT’s attention mechanism in models_hivit.py and our proposed SAR target features in models_hivit_mae.py.

# Simple augmentation
transform_train = transforms.Compose([
    transforms.RandomResizedCrop(args.input_size, scale=(0.2, 1.0), interpolation=3),  # 3 = bicubic
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(contrast=0.5),
    transforms.ToTensor(),
    # transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

# SAR feature
self.sarfeature1 = GF(nbins=self.nbins, pool=self.cell_sz, kensize=9,
                      img_size=self.img_size, patch_size=self.patch_size)
self.sarfeature2 = GF(nbins=self.nbins, pool=self.cell_sz, kensize=13,
                      img_size=self.img_size, patch_size=self.patch_size)
self.sarfeature3 = GF(nbins=self.nbins, pool=self.cell_sz, kensize=17,
                      img_size=self.img_size, patch_size=self.patch_size)
target = torch.cat([self.patchify(self.sarfeature1(imgs)),
                    self.patchify(self.sarfeature2(imgs)),
                    self.patchify(self.sarfeature3(imgs))], dim=-1)

Q3: How to load ImageNet pre‑training weights?
A3: See main_pretrain.py.

# Define the model
model = models.__dict__[args.model](norm_pix_loss=args.norm_pix_loss)
checkpoint = torch.load('./mae_hivit_base_1600ep.pth', map_location='cpu')
# Load pre‑trained model
msg = model.load_state_dict(checkpoint, strict=False)
print(msg)

📊 Classification

Our few‑shot learning is based on Dassl. You need to install it and use our modified Dassl.pytorch\dassl\utils\tools.py and Dassl.pytorch\dassl\data\transforms\transforms.py from our modified zip for SAR single‑channel amplitude images. Then you can run our MIM_linear.sh for evaluations.

Linear probing with SAR‑VSA

Three open‑source target datasets (MSTAR, FUSARShip, and SAR‑ACD) were used to construct a fine‑grained classification dataset, SAR‑VSA, with 25 categories to evaluate the effectiveness of the proposed improvements.

Fine‑grained category	#Train	#Test
anti‑aircraft (ZSU234)	299	274
bulldozer (D7)	299	274
howitzer (2S1)	299	274
infantry vehicle (BMP2)	698	587
main battle tank (T62)	299	273
main battle tank (T72)	691	582
patrol car (BRDM2)	298	274
personnel carrier (BTR60)	256	195
personnel carrier (BTR70)	233	196
truck (ZIL131)	299	274
bridge	1,023	438
coastal land	707	303
land patch	1,137	487
sea clutter wave	1,378	590
sea patch	1,250	535
ship (cargo)	366	156
ship (fishing)	248	106
ship (tanker)	150	64
ship (others)	312	133
strong false alarms	299	128
aircraft (Airbus A220)	91	373
aircraft (Airbus A330)	97	415
aircraft (Comac ARJ21)	103	411
aircraft (Boeing 737)	100	428
aircraft (Boeing 787)	113	391

MSTAR’s SOC and EOC

Modification on Dassl. We perform center cropping for different slice sizes of the MSTAR dataset to prevent correlation between target categories and image sizes during resizing.

# Dassl.pytorch.zip\Dassl.pytorch\dassl\data\transforms\transforms.py

def _build_transform_test(cfg, choices, target_size, normalize):
    if cfg.OUTPUT_DIR.split('/')[-4].split('_')[0] == 'MSTAR':
        tfm_test += [CenterCrop([128, 128])]

🎯 Detection

We use MMDetection. For large‑image inference, see SARATR-X/detection/huge_image_infer.py.

Requirements

Python 3
CUDA 11.1
PyTorch 1.8+ with CUDA support
timm 0.5.4
mmcv‑full 1.6.0
opencv‑python
apex

Step‑by‑step installation (old version for 3090)

pip install torch==1.8.2 torchvision==0.9.2 torchaudio==0.8.2 --extra-index-url https://download.pytorch.org/whl/lts/1.8/cu111
pip install mmcv-full==1.6.0 -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.8/index.html
pip install opencv-python timm==0.5.4
sh ../install_apex.sh

pip install -e .
pip install -r requirements_detection.txt

Fine‑tuning

chmod -R +x tools
./tools/dist_train.sh configs/_hivit_/hivit_base_SARDet.py 8 --work-dir ./work_dirs/SARDet

📜 Statement

This project is released under the CC BY‑NC 4.0 license.
本项目采用 CC BY‑NC 4.0 协议发布。
If you have any questions or need additional data, code, or weight files, please contact us at [email protected].
如有任何问题或者需要其他数据、代码和权重文件，请通过 [email protected] 联系我们。
If you find our work useful, please give us a star 🌟 on GitHub and cite our paper using the following BibTeX entries:
如果您觉得我们的工作有价值，请在 GitHub 上给我们一颗星星 🌟，并按以下 BibTeX 格式引用我们的论文：

@ARTICLE{li2025saratr,
  author={Li, Weijie and Yang, Wei and Hou, Yuenan and Liu, Li and Liu, Yongxiang and Li, Xiang},
  journal={IEEE Transactions on Image Processing}, 
  title={SARATR-X: Toward Building a Foundation Model for SAR Target Recognition}, 
  year={2025},
  volume={34},
  number={},
  pages={869-884},
  doi={10.1109/TIP.2025.3531988}
}

@ARTICLE{li2024predicting,
  title = {Predicting gradient is better: Exploring self-supervised learning for SAR ATR with a joint-embedding predictive architecture},
  journal = {ISPRS Journal of Photogrammetry and Remote Sensing},
  volume = {218},
  pages = {326-338},
  year = {2024},
  issn = {0924-2716},
  doi = {https://doi.org/10.1016/j.isprsjprs.2024.09.013},
  url = {https://www.sciencedirect.com/science/article/pii/S0924271624003514},
  author = {Li, Weijie and Yang, Wei and Liu, Tianpeng and Hou, Yuenan and Li, Yuxuan and Liu, Zhen and Liu, Yongxiang and Liu, Li}
}

Name		Name	Last commit message	Last commit date
Latest commit History 34 Commits
classification		classification
detection		detection
example		example
pre-training		pre-training
LICENSE		LICENSE
README.md		README.md
requirements_detection.txt		requirements_detection.txt
requirements_pretrain.txt		requirements_pretrain.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

SARATR-X: Towards Building A Foundation Model for SAR Target Recognition

Weijie Li (李玮杰), Wei Yang (杨威), Yuenan Hou (侯跃南), Li Liu (刘丽), Yongxiang Liu (刘永祥), and Xiang Li (黎湘)

📖 Introduction

⚙️ Pre-training

Requirements

Step‑by‑step installation (new version for H800 / MM3.0 – not compatible with the current GitHub version in the paper; please see `saratrx_mmdet3.0.7z` in BaiduYun and Hugging Face)

Step‑by‑step installation (old version for 3090)

Pre‑training dataset

Start pre‑training with SAR images

📊 Classification

Linear probing with SAR‑VSA

MSTAR’s SOC and EOC

🎯 Detection

Requirements

Step‑by‑step installation (old version for 3090)

Fine‑tuning

📜 Statement

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

SARATR-X: Towards Building A Foundation Model for SAR Target Recognition

Weijie Li (李玮杰), Wei Yang (杨威), Yuenan Hou (侯跃南), Li Liu (刘丽), Yongxiang Liu (刘永祥), and Xiang Li (黎湘)

📖 Introduction

⚙️ Pre-training

Requirements

Step‑by‑step installation (new version for H800 / MM3.0 – not compatible with the current GitHub version in the paper; please see saratrx_mmdet3.0.7z in BaiduYun and Hugging Face)

Step‑by‑step installation (old version for 3090)

Pre‑training dataset

Start pre‑training with SAR images

📊 Classification

Linear probing with SAR‑VSA

MSTAR’s SOC and EOC

🎯 Detection

Requirements

Step‑by‑step installation (old version for 3090)

Fine‑tuning

📜 Statement

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Step‑by‑step installation (new version for H800 / MM3.0 – not compatible with the current GitHub version in the paper; please see `saratrx_mmdet3.0.7z` in BaiduYun and Hugging Face)

Packages