Fine-tune and evaluate VisionMamba (Vim) on the Flame wildfire dataset. Propose, implement, and compare hybrid models against existing SOTA models that do not use self-attention, thereby avoiding its computational cost. The goal is to verify or refute the claim that Mamba (VisionMamba, in this case) and State Space Models in general can capture long-range dependencies without the need for self-attention.
Right now, we're working on three branches:
- main: This branch contains the original code from the VisionMamba repository. We have used this branch to fine-tune and evaluate the VisionMamba model on the Flame wildfire dataset.
- control: This branch contains code for our controlled experiments with other models.
- hybrid: This branch contains code for our VisionMambaTiny+EfficientNetB0 hybrid model.
- Added support for single-GPU training and fine-tuning, bypassing the torch.distributed module
- Addressed bugs related to conv1d versioning in the mamba module's setup.py
- Added support for splitting train data into train and validation sets
- Added support for binary classifier
- Control experiment conditions
- Added test.py for separate evaluation on a previously unseen test set
- Bash scripts for training, fine-tuning, and testing separate variants of the VisionMamba model
- Code refactoring for better readability
- Documentation on environment setup
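As an illustration of the train/validation split mentioned above, here is a minimal, self-contained sketch. The `split_indices` helper is hypothetical and for illustration only; the actual implementation lives in this repo's training scripts.

```python
import random

def split_indices(n, val_fraction=0.2, seed=42):
    """Deterministically split dataset indices into train/val subsets.

    Shuffles with a seeded RNG so the split is reproducible across runs.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_val = int(n * val_fraction)
    # First n_val shuffled indices go to validation, the rest to training.
    return idx[n_val:], idx[:n_val]

# Example: 100 samples, 80/20 train/val split.
train_idx, val_idx = split_indices(100, val_fraction=0.2)
print(len(train_idx), len(val_idx))  # 80 20
```

The resulting index lists can be passed to `torch.utils.data.Subset` to build the two dataloaders.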
For detailed setup instructions, please go through our Notion page.
Please direct all VisionMamba-related queries to Sayeedur Rahman (sayeedur.rahman@g.bracu.ac.bd) or open an issue on GitHub.
| Model | #param. | Top-1 Acc. | Top-5 Acc. | Huggingface Repo |
|---|---|---|---|---|
| Vim-tiny | 7M | 76.1 | 93.0 | https://huggingface.co/hustvl/Vim-tiny-midclstok |
| Vim-tiny+ | 7M | 78.3 | 94.2 | https://huggingface.co/hustvl/Vim-tiny-midclstok |
| Vim-small | 26M | 80.5 | 95.1 | https://huggingface.co/hustvl/Vim-small-midclstok |
| Vim-small+ | 26M | 81.6 | 95.4 | https://huggingface.co/hustvl/Vim-small-midclstok |
| Vim-base | 98M | 81.9 | 95.8 | https://huggingface.co/hustvl/Vim-base-midclstok |
Notes:
- + means that we fine-tune at finer granularity with a short schedule.
To evaluate Vim-Ti on ImageNet-1K, run:
```bash
python main.py --eval --resume /path/to/ckpt --model vim_tiny_patch16_224_bimambav2_final_pool_mean_abs_pos_embed_with_midclstok_div2 --data-path /path/to/imagenet
```

This project is based on Mamba (paper, code), Causal-Conv1d (code), and DeiT (paper, code). Thanks for their wonderful work.
If you find Vim useful in your research or applications, please consider giving us a star 🌟 and citing it with the following BibTeX entry:
@inproceedings{vim,
title={Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model},
author={Zhu, Lianghui and Liao, Bencheng and Zhang, Qian and Wang, Xinlong and Liu, Wenyu and Wang, Xinggang},
booktitle={Forty-first International Conference on Machine Learning}
}