openclaw | claude | v1.5.0
ACE-Step Lyrics Transcription
Lyrics transcription skill using the ACE-Step 1.5 audio model. Supports two backends (OpenAI Whisper and ElevenLabs Scribe) and shows excellent API key hygiene.
7.6/10
Verified
Mar 9, 2026
// RATINGS
🟢 ProSkills Score (AI Verified): 7.6/10
📍 Not yet listed on ClawHub or SkillsMP
// README
<h1 align="center">ACE-Step 1.5</h1>
<h1 align="center">Pushing the Boundaries of Open-Source Music Generation</h1>
<p align="center">
<a href="https://ace-step.github.io/ace-step-v1.5.github.io/">Project</a> |
<a href="https://huggingface.co/ACE-Step/Ace-Step1.5">Hugging Face</a> |
<a href="https://modelscope.cn/models/ACE-Step/Ace-Step1.5">ModelScope</a> |
<a href="https://huggingface.co/spaces/ACE-Step/Ace-Step-v1.5">Space Demo</a> |
<a href="https://discord.gg/PeWDxrkdj7">Discord</a> |
<a href="https://arxiv.org/abs/2602.00744">Technical Report</a> |
<a href="https://github.com/ace-step/awesome-ace-step">Awesome ACE-Step</a>
</p>
<p align="center">
<img src="./assets/organization_logos.png" width="100%" alt="StepFun Logo">
</p>
## Table of Contents
- [✨ Features](#-features)
- [⚡ Quick Start](#-quick-start)
- [🚀 Launch Scripts](#-launch-scripts)
- [📚 Documentation](#-documentation)
- [📖 Tutorial](#-tutorial)
- [🏗️ Architecture](#️-architecture)
- [🦁 Model Zoo](#-model-zoo)
- [🔬 Benchmark](#-benchmark)
## 📝 Abstract
🚀 We present ACE-Step v1.5, a highly efficient open-source music foundation model that brings commercial-grade generation to consumer hardware. On commonly used evaluation metrics, ACE-Step v1.5 achieves quality beyond most commercial music models while remaining extremely fast—under 2 seconds per full song on an A100 and under 10 seconds on an RTX 3090. The model runs locally with less than 4GB of VRAM, and supports lightweight personalization: users can train a LoRA from just a few songs to capture their own style.
🌉 At its core lies a novel hybrid architecture where the Language Model (LM) functions as an omni-capable planner: it transforms simple user queries into comprehensive song blueprints—scaling from short loops to 10-minute compositions—while synthesizing metadata, lyrics, and captions via Chain-of-Thought to guide the Diffusion Transformer (DiT). ⚡ Uniquely, this alignment is achieved through intrinsic reinforcement learning relying solely on the model's internal mechanisms, thereby eliminating the biases inherent in external reward models or human preferences. 🎚️
🔮 Beyond standard synthesis, ACE-Step v1.5 unifies precise stylistic control with versatile editing capabilities—such as cover generation, repainting, and vocal-to-BGM conversion—while maintaining strict adherence to prompts across 50+ languages. This paves the way for powerful tools that seamlessly integrate into the creative workflows of music artists, producers, and content creators. 🎸
## ✨ Features
<p align="center">
<img src="./assets/application_map.png" width="100%" alt="ACE-Step Framework">
</p>
### ⚡ Performance
- ✅ **Ultra-Fast Generation** — Under 2s per full song on A100, under 10s on RTX 3090 (0.5s to 10s on A100 depending on think mode & diffusion steps)
- ✅ **Flexible Duration** — Supports 10 seconds to 10 minutes (600s) audio generation
- ✅ **Batch Generation** — Generate up to 8 songs simultaneously
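
The duration and batch limits listed above can be checked before a request is sent. The following is a minimal sketch, assuming a hypothetical helper named `validate_request` that is not part of ACE-Step. Only the 10 to 600 second range and the cap of 8 songs come from the list; the lower bound of 1 for the batch size is an assumption.

```python
MIN_DURATION_S = 10    # from the feature list: 10 seconds minimum
MAX_DURATION_S = 600   # from the feature list: 10 minutes maximum
MAX_BATCH_SIZE = 8     # from the feature list: up to 8 songs at once


def validate_request(duration_s: float, batch_size: int = 1) -> None:
    """Raise ValueError if a request falls outside the documented limits.

    Hypothetical helper for illustration only; not an ACE-Step API.
    """
    if not MIN_DURATION_S <= duration_s <= MAX_DURATION_S:
        raise ValueError(
            f"duration must be {MIN_DURATION_S}-{MAX_DURATION_S}s, got {duration_s}"
        )
    # Assumption: a batch contains at least one song.
    if not 1 <= batch_size <= MAX_BATCH_SIZE:
        raise ValueError(
            f"batch size must be 1-{MAX_BATCH_SIZE}, got {batch_size}"
        )
```

Calling `validate_request(180, batch_size=4)` returns silently, while `validate_request(601)` raises `ValueError`.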
### 🎵 Generation Quality
- ✅ **Commercial-Grade Output** — Quality beyond most commercial music models (between Suno v4.5 and Suno v5)
- ✅ **Rich Style Support** — 1000+ instruments and styles with fine-grained timbre description
- ✅ **Multi-Language Lyrics** — Supports 50+ languages with lyrics prompt for structure & style control
### 🎛️ Versatility & Control
| Feature | Description |
|---------|-------------|
| ✅ Reference Audio Input | Use reference audio to guide generation style |
| ✅ Cover Generation | Create covers from existing audio |
| ✅ Repaint & Edit | Selective local audio editing and regeneration |
| ✅ Track Separation | Separate audio into individual stems |
| ✅ Multi-Track Generation | Add layers like Suno Studio's "Add Layer" feature |
| ✅ Vocal2BGM | Auto-generate accompaniment for vocal tracks |
| ✅ Metadata Control | Control duration, BPM, key/scale, time signature |
| ✅ Simple Mode | Generate full songs from simple descriptions |
| ✅ Query Rewriting | Auto LM expansion of tags and lyrics |
| ✅ Audio Understanding | Extract BPM, key/scale, time signature & caption from audio |
| ✅ LRC Generation | Auto-generate lyric timestamps for generated music |
| ✅ LoRA Training | One-click annotation & training in Gradio. 8 songs train in 1 hour on an RTX 3090 (12GB VRAM) |
| ✅ Quality Scoring | Automatic quality assessment for generated audio |
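
For readers unfamiliar with the LRC format mentioned in the table: it is a plain-text lyrics format in which each line starts with a `[mm:ss.xx]` time tag. The sketch below shows how timed lyric lines map to that layout. The function names `format_lrc_timestamp` and `to_lrc` are hypothetical, and this is not how ACE-Step itself derives or writes timestamps.

```python
def format_lrc_timestamp(seconds: float) -> str:
    """Format a time offset as an LRC tag: [mm:ss.xx] (hundredths of a second)."""
    if seconds < 0:
        raise ValueError("timestamp must be non-negative")
    total_cs = round(seconds * 100)            # work in integer centiseconds
    minutes, rem_cs = divmod(total_cs, 6000)   # 6000 centiseconds per minute
    secs, cs = divmod(rem_cs, 100)
    return f"[{minutes:02d}:{secs:02d}.{cs:02d}]"


def to_lrc(lines: list[tuple[float, str]]) -> str:
    """Render (start_seconds, lyric_text) pairs as LRC text, ordered by time."""
    ordered = sorted(lines, key=lambda item: item[0])
    return "\n".join(f"{format_lrc_timestamp(t)}{text}" for t, text in ordered)
```

For example, `to_lrc([(75.5, "second line"), (0, "first line")])` yields two lines tagged `[00:00.00]` and `[01:15.50]`.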
## 🔔 Staying ahead
Star ACE-Step on GitHub to be notified of new releases.

## 🤝 Partners
<p align="center">
<a href="https://www.comfy.org/"><img src="https://registry.comfy.org/_next/static/media/logo_blue.9ac227d3.png" alt="ComfyUI" height="40" style="margin: 5px;"></a>
<a href="https://zilliz.com/"><img src="https://avatars.githubusercontent.com/u/18416694" alt="Zilliz" height="40" style="margin: 5px;"></a>
<a href="https://milvus.io/"><img src="https://miro.medium.com/v2/resize:fit:2400/1*-VEGyAgcIBD62XtZWavy8w.png" alt="Milvus" height="40" style="margin: 5px;"></a>
<a href="https://zeabur.com/"><img src="https://zeabur.notion.site/image/attachment%3A43bc244b-9a2d-4b96-9646-8392aa6fc862%3Alogo-dark_1.svg?table=block&id=318a221c-948e-8056-b3c0-f9c39ce543ba&spaceId=ba37aeb9-0937-401d-aa41-ce1d3b6ff778&userId=&cache=v2" alt="Zeabur" height="40" width="40" style="margin: 5px;"></a>
</p>
## ⚡ Quick Start
> **Requirements:** Python 3.11-3.12, CUDA GPU recommended (also supports MPS / ROCm / Intel XPU / CPU)
>
> **Note:** ROCm on Windows requires Python 3.12 (AMD officially provides Python 3.12 wheels only)
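
Before installing, it may help to confirm that the active interpreter matches the stated requirement. This is a small sketch using only the standard library. The helper name `is_supported` is hypothetical, and the supported set simply mirrors the "Python 3.11-3.12" requirement above.

```python
import sys

# Mirrors the requirement stated above: Python 3.11 or 3.12.
SUPPORTED_VERSIONS = {(3, 11), (3, 12)}


def is_supported(version_info=sys.version_info) -> bool:
    """Return True if the (major, minor) pair is one of the supported versions."""
    return tuple(version_info[:2]) in SUPPORTED_VERSIONS


if __name__ == "__main__":
    major, minor = sys.version_info[:2]
    status = "supported" if is_supported() else "not supported"
    print(f"Python {major}.{minor} is {status}")
```
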
```bash
# 1. Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh # macOS / Linux
# powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex" # Windows
# 2. Clone & install
git clone https://github.com/ACE-Step/ACE-Step-1.5.git
cd ACE-Step-1.5
uv sync
# 3. Launch Gradio UI (models auto-download on first run)
uv run acestep
# Or launch REST API server
uv run acestep-api
```
Open http://localhost:7860 (Gradio) or http://localhost:8001 (API).
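
If neither page loads, a quick way to see whether the servers are listening is to probe the two ports. The sketch below uses only Python's standard `socket` module and makes no assumption about ACE-Step's API routes. The helper name `is_port_open` is hypothetical, and the port numbers are the ones given above.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    for name, port in (("Gradio UI", 7860), ("REST API", 8001)):
        state = "reachable" if is_port_open("localhost", port) else "not reachable"
        print(f"{name} on port {port}: {state}")
```

A port reported as reachable only means something accepts TCP connections there; it does not confirm that the process is ACE-Step.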
> 📦 **Windows users:** A [portable package](https://files.acemusic.ai/acemusic/win/ACE-Step-1.5.7z) with pre-installed dependencies is available. See [Installation Guide](./docs/en/INSTALL.md#-windows-portable-package).
> 📦 **macOS users:** A [portable package](https://files.acemusic.ai/acemusic/mac/ACE-Step-1.5.zip) with pre-installed dependencies is available. See [Installation Guide](./docs/en/INSTALL.md#-macos-portable-package).
> 📖 **Full installation guide** (AMD/ROCm, Intel GPU, CPU, environment variables, command-line options): [English](./docs/en/INSTALL.md) | [中文](./docs/zh/INSTALL.md) | [日本語](./docs/ja/INSTALL.md)
### 💡 Which Model Should I Choose?
| Your GPU VRAM | Recommended LM Model | Backend | Notes |
|---------------|---------------------|---------|-------|
| **≤6GB** | None (DiT only) | — | LM disabled by default; INT8 quantization + full CPU offload |
| **6-8GB** | `acestep-5Hz-lm-0.6B` | `pt` | Lightweight LM with PyTorch backend |
| **8-16GB** | `acestep-5Hz-lm-0.6B` / `1.7B` | `vllm` | 0.6B for 8-12GB, 1.7B for 12-16GB |
| **16-24GB** | `acestep-5Hz-lm-1.7B` | `vllm` | 4B available on 20GB+; no offload needed on 20GB+ |
| **≥24GB** | `acestep-5Hz-lm-4B` | `vllm` | Best quality, all models fit without offload |
The UI automatically selects the best configuration for your GPU. All settings (LM model, backend, offloading, quantization) are tier-aware and pre-configured.
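
The table above can be read as a simple lookup from VRAM to a recommended LM and backend. The sketch below mirrors the table only and is not the UI's actual selection logic. `recommend_lm` and `LMChoice` are hypothetical names. The table leaves the shared boundaries ambiguous, so assigning exactly 8 GB and 12 GB to the higher tier is an assumption.

```python
from typing import NamedTuple, Optional


class LMChoice(NamedTuple):
    lm_model: Optional[str]  # None means DiT only, with the LM disabled
    backend: Optional[str]   # "pt", "vllm", or None when no LM is loaded


def recommend_lm(vram_gb: float) -> LMChoice:
    """Map GPU VRAM (in GB) to the table's recommended LM model and backend.

    Hypothetical helper for illustration; boundary handling is an assumption.
    """
    if vram_gb <= 6:
        return LMChoice(None, None)
    if vram_gb < 8:
        return LMChoice("acestep-5Hz-lm-0.6B", "pt")
    if vram_gb < 12:
        return LMChoice("acestep-5Hz-lm-0.6B", "vllm")
    if vram_gb < 24:
        # The table notes that the 4B model is also available on 20GB+.
        return LMChoice("acestep-5Hz-lm-1.7B", "vllm")
    return LMChoice("acestep-5Hz-lm-4B", "vllm")
```

For example, `recommend_lm(10)` returns the 0.6B model with the `vllm` backend, and `recommend_lm(24)` returns the 4B model.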
> 📖 GPU compatibility details: [English](./docs/en/GPU_COMPATIBILITY.md) | [中文](./docs/zh/GPU_COMPATIBILITY.md) | [日本語](./docs/ja/GPU_COMPATIBILITY.md) | [한국어](./docs/ko/GPU_COMPATIBILITY.md)
## 🚀 Launch Scripts
Ready-to-use launch scripts for all platforms with auto environment detection, update checking, and dependency installation.
| Platform | Scripts | Backend |
|----------|---------|---------|
| **Windows** | `start_gradio_ui.bat`, `start_api_server.bat` | CUDA |
| **Windows (ROCm)** | `start_gradio_ui_rocm.bat`, `start_api_server_roc
// REPO STATS
8.2k stars
142 open issues
Last commit: 1mo ago
// SOURCE
View on GitHub
// PROSKILLS SCORE
7.6/10 (Good)
BREAKDOWN
Code Quality: 7/10
Documentation: 8/10
Functionality: 7/10
Maintenance: 8/10
Security: 8/10
Uniqueness: 8/10
Usefulness: 7/10