From a7a71a605aec4c726c30599538277406c620caa2 Mon Sep 17 00:00:00 2001
From: Huadai Liu <22160146@zju.edu.cn>
Date: Tue, 24 Mar 2026 15:02:46 +0800
Subject: [PATCH 1/2] Release PrismAudio (ICLR 2026)
---
README.md | 62 ++++++++++++++++++++++++++++++++++++++++---------------
1 file changed, 45 insertions(+), 17 deletions(-)
diff --git a/README.md b/README.md
index 88ac552..553d253 100644
--- a/README.md
+++ b/README.md
@@ -11,8 +11,8 @@
-
-
+
+
@@ -37,6 +37,19 @@
---
+## Repository layout
+
+This **ThinkSound** GitHub repository hosts two related projects on separate branches:
+
+| Branch | Project | Documentation |
+|--------|---------|----------------|
+| **`master`** | **ThinkSound** (NeurIPS 2025) — unified Any2Audio generation with CoT-guided flow matching | This file: **`README.md`** |
+| **`prismaudio`** | **PrismAudio** (ICLR 2026) — follow-up work on video-to-audio generation with multi-dimensional CoT-RL | **`README.md`** on the [`prismaudio`](https://github.com/liuhuadai/ThinkSound/tree/prismaudio) branch |
+
+For **ThinkSound**, use branch **`master`** (this README). For **PrismAudio**, check out **`prismaudio`** and follow **`README.md`** there.
+
+---
+
**ThinkSound** is a unified Any2Audio generation framework with flow matching guided by Chain-of-Thought (CoT) reasoning.
PyTorch implementation for multimodal audio generation and editing: generate or edit audio from video, text, and audio, powered by step-by-step reasoning from Multimodal Large Language Models (MLLMs).
@@ -45,10 +58,11 @@ PyTorch implementation for multimodal audio generation and editing: generate or
---
## 📰 News
-- **2026.01.26** 🎉 PrismAudio has been accepted to the **ICLR 2026 Main Conference**! We plan to release the project in February 2026.
-- **2025.11.25** 🔥[Online PrismAudio Demo](http://prismaudio-project.github.io/) is live - try it now!
-- **2025.11.25** 🔥[PrismAudio paper](https://arxiv.org/pdf/2511.18833) released on arXiv, the first multi-dimensional CoT-RL framework for Video-to-Audio Generation!
-- **2025.09.19** 🎉 ThinkSound has been accepted to the **NeurIPS 2025 Main Conference**!
+- **2026.03.24** 🔥 **PrismAudio** (sequel to ThinkSound, different project name) is released in the same repo on branch [`prismaudio`](https://github.com/liuhuadai/ThinkSound/tree/prismaudio) — see **`README.md`** there for setup and models.
+- **2026.01.26** 🎉 PrismAudio accepted to **ICLR 2026 Main Conference** (code/docs on `prismaudio`).
+- **2025.11.25** 🔥 [Online PrismAudio Demo](http://prismaudio-project.github.io/) is live.
+- **2025.11.25** 🔥 [PrismAudio paper](https://arxiv.org/pdf/2511.18833) on arXiv — multi-dimensional CoT-RL for video-to-audio.
+- **2025.09.19** 🎉 **ThinkSound** accepted to the **NeurIPS 2025 Main Conference**!
- **2025.09.01** Our AudioCoT dataset is now open-sourced and available on [Hugging Face](https://huggingface.co/datasets/liuhuadai/AudioCoT)!
- **2025.07.17** 🧠 Finetuning enabled: training and finetuning code is now publicly available, along with clear usage instructions to help you customize and extend ThinkSound with your own data.
- **2025.07.15** 📦 Simplified installation and usability: dependencies on PyPI for easy cross-platform setup; Windows `.bat` scripts automate environment creation and script running.
@@ -61,6 +75,19 @@ PyTorch implementation for multimodal audio generation and editing: generate or
---
+
+
+### Follow-up: PrismAudio (same repo, `prismaudio` branch)
+
+**PrismAudio** (ICLR 2026) is the successor to ThinkSound, developed under a new name but kept in this repository on branch **`prismaudio`**. Installation, checkpoints, and citation details are in **[`README.md` on that branch](https://github.com/liuhuadai/ThinkSound/blob/prismaudio/README.md)**.
+
+👉 Run `git checkout prismaudio` locally, or open the [`prismaudio`](https://github.com/liuhuadai/ThinkSound/tree/prismaudio) branch directly on GitHub.
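The branch mechanics above can be sketched with a minimal local demo. A throwaway repository stands in for `https://github.com/liuhuadai/ThinkSound` (cloning the real one needs network access); only the branch names `master` and `prismaudio` are taken from this README:

```shell
# Minimal local sketch of the two-branch layout described above.
# A throwaway repo stands in for the real ThinkSound repository;
# only the branch names (master, prismaudio) come from this README.
set -e
demo=$(mktemp -d)/demo
git init -q "$demo" && cd "$demo"
git config user.email demo@example.com
git config user.name demo

git checkout -qb master                       # ThinkSound lives here
echo "# ThinkSound" > README.md
git add README.md && git commit -qm "ThinkSound README on master"

git checkout -qb prismaudio                   # PrismAudio branches off
echo "# PrismAudio" > README.md
git commit -qam "PrismAudio README on prismaudio"

# Each branch carries its own README, exactly as in the real repository:
git checkout -q master     && head -n 1 README.md
git checkout -q prismaudio && head -n 1 README.md
```

In the real repository the same two commands — `git checkout master` and `git checkout prismaudio` — toggle between the ThinkSound and PrismAudio code and documentation.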
+
+
+
+---
+
+
## 🚀 Features
- **Any2Audio**: Generate audio from arbitrary modalities — video, text, audio, or their combinations.
@@ -89,7 +116,8 @@ ThinkSound decomposes audio generation and editing into three interactive stages
**Environment Preparation:**
```bash
-git clone https://github.com/liuhuadai/ThinkSound.git
+# ThinkSound lives on branch master; for PrismAudio, clone with -b prismaudio instead (see README.md on that branch).
+git clone -b master https://github.com/liuhuadai/ThinkSound.git
cd ThinkSound
conda create -n thinksound python=3.10
conda activate thinksound
@@ -174,15 +202,6 @@ See [`Training.md`](docs/Training.md)
---
-## 📝 TODO & Future Plans
-* - [ ] Release a more powerful foundation model covering multiple domains to provide more engaging and immersive foley creation
-* - [ ] Add support for additional modalities and downstream tasks
-* - [ ] Release models at different scales
-* - [x] Open-source AudioCoT dataset and automated pipeline
-* - [x] Release training scripts for ThinkSound models
-* - [x] A beginner-friendly Windows quick-start README
----
-
## 📄 License
@@ -216,7 +235,7 @@ For providing an easy-to-use framework for audio generation, as well as the VAE
## 📖 Citation
-If you find ThinkSound useful in your research or work, please cite our paper:
+If you find our project useful in your research or work, please cite our paper:
```bibtex
@misc{liu2025thinksoundchainofthoughtreasoningmultimodal,
@@ -228,6 +247,15 @@ If you find ThinkSound useful in your research or work, please cite our paper:
primaryClass={eess.AS},
url={https://arxiv.org/abs/2506.21448},
}
+@misc{liu2025prismaudiodecomposedchainofthoughtsmultidimensional,
+ title={PrismAudio: Decomposed Chain-of-Thoughts and Multi-dimensional Rewards for Video-to-Audio Generation},
+ author={Huadai Liu and Kaicheng Luo and Wen Wang and Qian Chen and Peiwen Sun and Rongjie Huang and Xiangang Li and Jieping Ye and Wei Xue},
+ year={2025},
+ eprint={2511.18833},
+ archivePrefix={arXiv},
+ primaryClass={cs.SD},
+ url={https://arxiv.org/abs/2511.18833},
+}
```
---
From 4e2cfd57efaaf40c90460d8111b6a3c1e7dfb2f0 Mon Sep 17 00:00:00 2001
From: Huadai Liu <22160146@zju.edu.cn>
Date: Tue, 24 Mar 2026 15:05:09 +0800
Subject: [PATCH 2/2] Release PrismAudio (ICLR 2026)
---
README.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/README.md b/README.md
index 553d253..9ded2d5 100644
--- a/README.md
+++ b/README.md
@@ -58,7 +58,7 @@ PyTorch implementation for multimodal audio generation and editing: generate or
---
## 📰 News
-- **2026.03.24** 🔥 **PrismAudio** (sequel to ThinkSound, different project name) is released in the same repo on branch [`prismaudio`](https://github.com/liuhuadai/ThinkSound/tree/prismaudio) — see **`README.md`** there for setup and models.
+- **2026.03.24** 🔥 **PrismAudio** is released in the same repo on branch [`prismaudio`](https://github.com/liuhuadai/ThinkSound/tree/prismaudio) — see **`README.md`** there for setup and models.
- **2026.01.26** 🎉 PrismAudio accepted to **ICLR 2026 Main Conference** (code/docs on `prismaudio`).
- **2025.11.25** 🔥 [Online PrismAudio Demo](http://prismaudio-project.github.io/) is live.
- **2025.11.25** 🔥 [PrismAudio paper](https://arxiv.org/pdf/2511.18833) on arXiv — multi-dimensional CoT-RL for video-to-audio.