openclaw · claude · v1.5.0

ACE-Step Lyrics Transcription

ace-step · @ace-step · 8.2k stars · last commit 1mo ago · 142 open issues

Lyrics transcription skill built on the ACE-Step 1.5 audio model. It supports two transcription backends (OpenAI Whisper and ElevenLabs Scribe) and shows excellent API key hygiene.

7.6/10
Verified
Mar 9, 2026

// RATINGS

GitHub Stars
⭐⭐⭐⭐⭐ 8.2k

Very popular

🟢 ProSkills Score · AI Verified
7.6/10

📍 Not yet listed on ClawHub or SkillsMP

// README

<h1 align="center">ACE-Step 1.5</h1>
<h1 align="center">Pushing the Boundaries of Open-Source Music Generation</h1>
<p align="center">
    <a href="https://ace-step.github.io/ace-step-v1.5.github.io/">Project</a> |
    <a href="https://huggingface.co/ACE-Step/Ace-Step1.5">Hugging Face</a> |
    <a href="https://modelscope.cn/models/ACE-Step/Ace-Step1.5">ModelScope</a> |
    <a href="https://huggingface.co/spaces/ACE-Step/Ace-Step-v1.5">Space Demo</a> |
    <a href="https://discord.gg/PeWDxrkdj7">Discord</a> |
    <a href="https://arxiv.org/abs/2602.00744">Technical Report</a> |
    <a href="https://github.com/ace-step/awesome-ace-step">Awesome ACE-Step</a>
</p>
<p align="center">
    <img src="./assets/organization_logos.png" width="100%" alt="StepFun Logo">
</p>

## Table of Contents

- [✨ Features](#-features)
- [⚡ Quick Start](#-quick-start)
- [🚀 Launch Scripts](#-launch-scripts)
- [📚 Documentation](#-documentation)
- [📖 Tutorial](#-tutorial)
- [🏗️ Architecture](#️-architecture)
- [🦁 Model Zoo](#-model-zoo)
- [🔬 Benchmark](#-benchmark)

## 📝 Abstract

🚀 We present ACE-Step v1.5, a highly efficient open-source music foundation model that brings commercial-grade generation to consumer hardware. On commonly used evaluation metrics, ACE-Step v1.5 achieves quality beyond most commercial music models while remaining extremely fast: under 2 seconds per full song on an A100 and under 10 seconds on an RTX 3090. The model runs locally with less than 4GB of VRAM, and supports lightweight personalization: users can train a LoRA from just a few songs to capture their own style.

🌉 At its core lies a novel hybrid architecture where the Language Model (LM) functions as an omni-capable planner: it transforms simple user queries into comprehensive song blueprints, scaling from short loops to 10-minute compositions, while synthesizing metadata, lyrics, and captions via Chain-of-Thought to guide the Diffusion Transformer (DiT).
⚡ Uniquely, this alignment is achieved through intrinsic reinforcement learning relying solely on the model's internal mechanisms, thereby eliminating the biases inherent in external reward models or human preferences. 🎚️

🔮 Beyond standard synthesis, ACE-Step v1.5 unifies precise stylistic control with versatile editing capabilities (such as cover generation, repainting, and vocal-to-BGM conversion) while maintaining strict adherence to prompts across 50+ languages. This paves the way for powerful tools that seamlessly integrate into the creative workflows of music artists, producers, and content creators. 🎸

## ✨ Features

<p align="center">
    <img src="./assets/application_map.png" width="100%" alt="ACE-Step Framework">
</p>

### ⚡ Performance

- ✅ **Ultra-Fast Generation**: Under 2s per full song on A100, under 10s on RTX 3090 (0.5s to 10s on A100 depending on think mode & diffusion steps)
- ✅ **Flexible Duration**: Supports 10 seconds to 10 minutes (600s) audio generation
- ✅ **Batch Generation**: Generate up to 8 songs simultaneously

### 🎵 Generation Quality

- ✅ **Commercial-Grade Output**: Quality beyond most commercial music models (between Suno v4.5 and Suno v5)
- ✅ **Rich Style Support**: 1000+ instruments and styles with fine-grained timbre description
- ✅ **Multi-Language Lyrics**: Supports 50+ languages with lyrics prompt for structure & style control

### 🎛️ Versatility & Control

| Feature | Description |
|---------|-------------|
| ✅ Reference Audio Input | Use reference audio to guide generation style |
| ✅ Cover Generation | Create covers from existing audio |
| ✅ Repaint & Edit | Selective local audio editing and regeneration |
| ✅ Track Separation | Separate audio into individual stems |
| ✅ Multi-Track Generation | Add layers like Suno Studio's "Add Layer" feature |
| ✅ Vocal2BGM | Auto-generate accompaniment for vocal tracks |
| ✅ Metadata Control | Control duration, BPM, key/scale, time signature |
| ✅ Simple Mode | Generate full songs from simple descriptions |
| ✅ Query Rewriting | Auto LM expansion of tags and lyrics |
| ✅ Audio Understanding | Extract BPM, key/scale, time signature & caption from audio |
| ✅ LRC Generation | Auto-generate lyric timestamps for generated music |
| ✅ LoRA Training | One-click annotation & training in Gradio. 8 songs, 1 hour on 3090 (12GB VRAM) |
| ✅ Quality Scoring | Automatic quality assessment for generated audio |

## 🔔 Staying ahead

Star ACE-Step on GitHub and be instantly notified of new releases

![](assets/star.gif)

## 🤝 Partners

<p align="center">
    <a href="https://www.comfy.org/"><img src="https://registry.comfy.org/_next/static/media/logo_blue.9ac227d3.png" alt="ComfyUI" height="40" style="margin: 5px;"></a>
    <a href="https://zilliz.com/"><img src="https://avatars.githubusercontent.com/u/18416694" alt="Zilliz" height="40" style="margin: 5px;"></a>
    <a href="https://milvus.io/"><img src="https://miro.medium.com/v2/resize:fit:2400/1*-VEGyAgcIBD62XtZWavy8w.png" alt="Milvus" height="40" style="margin: 5px;"></a>
    <a href="https://zeabur.com/"><img src="https://zeabur.notion.site/image/attachment%3A43bc244b-9a2d-4b96-9646-8392aa6fc862%3Alogo-dark_1.svg?table=block&id=318a221c-948e-8056-b3c0-f9c39ce543ba&spaceId=ba37aeb9-0937-401d-aa41-ce1d3b6ff778&userId=&cache=v2" alt="Zeabur" height="40" width="40" style="margin: 5px;"></a>
</p>

## ⚡ Quick Start

> **Requirements:** Python 3.11-3.12, CUDA GPU recommended (also supports MPS / ROCm / Intel XPU / CPU)
>
> **Note:** ROCm on Windows requires Python 3.12 (AMD officially provides Python 3.12 wheels only)

```bash
# 1. Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh        # macOS / Linux
# powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"  # Windows

# 2. Clone & install
git clone https://github.com/ACE-Step/ACE-Step-1.5.git
cd ACE-Step-1.5
uv sync

# 3. Launch Gradio UI (models auto-download on first run)
uv run acestep

# Or launch REST API server
uv run acestep-api
```

Open http://localhost:7860 (Gradio) or http://localhost:8001 (API).

> 📦 **Windows users:** A [portable package](https://files.acemusic.ai/acemusic/win/ACE-Step-1.5.7z) with pre-installed dependencies is available. See [Installation Guide](./docs/en/INSTALL.md#-windows-portable-package).

> 📦 **macOS users:** A [portable package](https://files.acemusic.ai/acemusic/mac/ACE-Step-1.5.zip) with pre-installed dependencies is available. See [Installation Guide](./docs/en/INSTALL.md#-macos-portable-package).

> 📖 **Full installation guide** (AMD/ROCm, Intel GPU, CPU, environment variables, command-line options): [English](./docs/en/INSTALL.md) | [中文](./docs/zh/INSTALL.md) | [日本語](./docs/ja/INSTALL.md)

### 💡 Which Model Should I Choose?

| Your GPU VRAM | Recommended LM Model | Backend | Notes |
|---------------|---------------------|---------|-------|
| **≤6GB** | None (DiT only) | n/a | LM disabled by default; INT8 quantization + full CPU offload |
| **6-8GB** | `acestep-5Hz-lm-0.6B` | `pt` | Lightweight LM with PyTorch backend |
| **8-16GB** | `acestep-5Hz-lm-0.6B` / `1.7B` | `vllm` | 0.6B for 8-12GB, 1.7B for 12-16GB |
| **16-24GB** | `acestep-5Hz-lm-1.7B` | `vllm` | 4B available on 20GB+; no offload needed on 20GB+ |
| **≥24GB** | `acestep-5Hz-lm-4B` | `vllm` | Best quality, all models fit without offload |

The UI automatically selects the best configuration for your GPU. All settings (LM model, backend, offloading, quantization) are tier-aware and pre-configured.

> 📖 GPU compatibility details: [English](./docs/en/GPU_COMPATIBILITY.md) | [中文](./docs/zh/GPU_COMPATIBILITY.md) | [日本語](./docs/ja/GPU_COMPATIBILITY.md) | [한국어](./docs/ko/GPU_COMPATIBILITY.md)

## 🚀 Launch Scripts

Ready-to-use launch scripts for all platforms with auto environment detection, update checking, and dependency installation.

| Platform | Scripts | Backend |
|----------|---------|---------|
| **Windows** | `start_gradio_ui.bat`, `start_api_server.bat` | CUDA |
| **Windows (ROCm)** | `start_gradio_ui_rocm.bat`, `start_api_server_roc
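
The VRAM tiers in the "Which Model Should I Choose?" table can be read as a simple lookup. The sketch below is only an illustration of that table, not the project's own selection logic (the README says the UI picks the configuration automatically). The names `LMChoice` and `recommend_lm` are hypothetical, and because the table's ranges share their endpoints (6, 8, 16, 24 GB), the boundary handling here is an assumption.

```python
from typing import NamedTuple, Optional


class LMChoice(NamedTuple):
    """Hypothetical container for one row of the README's VRAM table."""
    model: Optional[str]    # None means "DiT only", i.e. the LM is disabled
    backend: Optional[str]  # "pt" (PyTorch) or "vllm"; None when no LM is used


def recommend_lm(vram_gb: float) -> LMChoice:
    """Map GPU VRAM (in GB) to the LM model and backend listed in the table.

    Boundary handling is an assumption: a value exactly on a shared
    endpoint is assigned to the higher tier, except 6 GB, which the
    table writes as "<=6GB".
    """
    if vram_gb <= 6:
        return LMChoice(None, None)                      # DiT only
    if vram_gb < 8:
        return LMChoice("acestep-5Hz-lm-0.6B", "pt")     # 6-8GB tier
    if vram_gb < 12:
        return LMChoice("acestep-5Hz-lm-0.6B", "vllm")   # 8-12GB part of the 8-16GB tier
    if vram_gb < 24:
        return LMChoice("acestep-5Hz-lm-1.7B", "vllm")   # 12-16GB and 16-24GB tiers
    return LMChoice("acestep-5Hz-lm-4B", "vllm")         # >=24GB tier


if __name__ == "__main__":
    for gb in (4, 7, 10, 16, 24):
        print(gb, recommend_lm(gb))
```

The table also notes that the 4B model is available from 20GB upward; the sketch keeps the table's recommended 1.7B model for that range rather than guessing when the larger model is the better choice.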

// REPO STATS

8.2k stars
142 open issues
Last commit: 1mo ago

// PROSKILLS SCORE

7.6/10

Good

BREAKDOWN

Code Quality: 7/10
Documentation: 8/10
Functionality: 7/10
Maintenance: 8/10
Security: 8/10
Uniqueness: 8/10
Usefulness: 7/10
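
The listing does not say how the overall score is derived from the breakdown. One reading that matches the numbers is an unweighted mean rounded to one decimal place; the sketch below checks that arithmetic. The formula and the function name `overall_score` are assumptions, not something the listing documents.

```python
from statistics import mean

# Breakdown scores copied from the listing above (each out of 10).
breakdown = {
    "Code Quality": 7,
    "Documentation": 8,
    "Functionality": 7,
    "Maintenance": 8,
    "Security": 8,
    "Uniqueness": 8,
    "Usefulness": 7,
}


def overall_score(scores: dict) -> float:
    """Unweighted mean of the category scores, rounded to one decimal.

    Assumed formula: the listing does not state how the overall
    score is actually computed.
    """
    return round(mean(scores.values()), 1)


if __name__ == "__main__":
    # 53 / 7 = 7.571..., which rounds to 7.6
    print(overall_score(breakdown))
```

Under this assumption the seven scores sum to 53, and 53 / 7 ≈ 7.57, which rounds to the displayed 7.6/10.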

// DETAILS

Category: content
Author: @ace-step
Version: v1.5.0
Price: Free