CHANGELOG MAY 08, 2026 6 min

Seedance 2.0: ByteDance video with audio out of the box

We added Seedance 2.0 — a new ByteDance video model. It makes clips up to 15 seconds with audio out of the box: dialogue, ambient sound and effects. Drive it from text, an image, a video or an audio track.

NetRoom

EDITORIAL, NETROOM

The short version

The NetRoom catalog now has Seedance 2.0 — a new video model from ByteDance. It makes clips up to 15 seconds long, and it writes the soundtrack for you: dialogue, ambient noise and SFX are generated and synced to the picture in the same pass. You can drive it from text, from an image, from another video, or from an audio track.

What it does

Seedance 2.0 covers most of the work people used to chain across three or four separate tools.

Video from text. Type what you want to see and the model builds it. Long prompts are fine: you can describe several scenes, camera moves and lines of dialogue, and the model cuts the result into shots.
Video from an image. Drop in a single picture as the starting frame and the rest is filled in. You can also pin a starting and an ending frame so the transition between them is exactly what you want.
Video from a reference clip. Upload a sample video — the model borrows pacing, camera moves and the layout of the scene, then rewrites the action and the look from your prompt.
Video to match an audio track. Send a clip of dialogue or ambient sound, and the visuals lock to its timeline so lips meet the lines and motion sits on the beat.
Edit and extend. Make targeted changes to a clip you already have, or extend it to a longer shot without breaking continuity.

Audio

This is the headline feature. Seedance 2.0 writes a dual-channel synchronized audio track right alongside the video — dialogue, room tone, sound effects, all in one pass. No separate TTS service afterwards, no manual lip-sync. You get an MP4 with sound out of the box. If you need it silent, there's a single toggle in the form to mute it.

Quality and formats

Three quality tiers: 480p, 720p and 1080p. Aspect ratios cover everything platforms ask for: horizontal 16:9, vertical 9:16 for Reels and TikTok, square 1:1, classic 4:3 and 3:4, and a wide cinematic 21:9 for teasers and YouTube headers. Clip length is one of six fixed options: 3, 5, 8, 10, 12 or 15 seconds. There's also an automatic mode where the model picks the length from the prompt.
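For anyone scripting around these options, here is a minimal sketch of a settings check. The option values are the ones listed above; the function and constant names are hypothetical and are not part of any NetRoom or ByteDance API.

```python
# Hypothetical helper: the option values come from the post, the names do not.
RESOLUTIONS = {"480p", "720p", "1080p"}
ASPECT_RATIOS = {"16:9", "9:16", "1:1", "4:3", "3:4", "21:9"}
DURATIONS_SEC = (3, 5, 8, 10, 12, 15)


def check_settings(resolution, aspect_ratio, duration_sec=None):
    """Return a list of problems; an empty list means every value is a listed option.

    duration_sec=None stands in for the automatic length mode.
    """
    problems = []
    if resolution not in RESOLUTIONS:
        problems.append(f"unknown resolution: {resolution}")
    if aspect_ratio not in ASPECT_RATIOS:
        problems.append(f"unknown aspect ratio: {aspect_ratio}")
    if duration_sec is not None and duration_sec not in DURATIONS_SEC:
        problems.append(f"unsupported length: {duration_sec}s")
    return problems


def nearest_duration(wanted_sec):
    """Pick the shortest listed length that covers wanted_sec, capped at 15."""
    for d in DURATIONS_SEC:
        if d >= wanted_sec:
            return d
    return DURATIONS_SEC[-1]
```

A call such as check_settings("720p", "9:16", 8) returns an empty list, while a 7-second request is flagged and nearest_duration(7) suggests 8 instead.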

Pricing

Pricing is per second of output. Video from text or image at 480p costs $0.07/sec; at 720p, $0.16/sec. The video-to-video mode is heavier on compute, so it starts at $0.13/sec at 480p and $0.28/sec at 720p. NetRoom shows the live price in RUB on the model page: see /model/seedance-20.
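The arithmetic is simple enough to sketch. The rates below are the USD figures quoted above; the mode labels and function name are made up for illustration. No 1080p rate is quoted here, so the sketch refuses to guess one, and the video-to-video figures are "from" prices, so treat those results as lower bounds.

```python
from decimal import Decimal

# USD per second of output, as quoted in the post.
# Mode labels are hypothetical; video-to-video rates are starting prices.
RATES_USD_PER_SEC = {
    ("text_or_image", "480p"): Decimal("0.07"),
    ("text_or_image", "720p"): Decimal("0.16"),
    ("video_to_video", "480p"): Decimal("0.13"),
    ("video_to_video", "720p"): Decimal("0.28"),
}


def estimate_cost_usd(mode, resolution, seconds):
    """Multiply the quoted per-second rate by the clip length in seconds."""
    key = (mode, resolution)
    if key not in RATES_USD_PER_SEC:
        raise ValueError(f"no rate quoted for {mode} at {resolution}")
    return RATES_USD_PER_SEC[key] * seconds


if __name__ == "__main__":
    print(estimate_cost_usd("text_or_image", "720p", 10))   # 1.60
    print(estimate_cost_usd("video_to_video", "720p", 15))  # 4.20
```

So a 10-second clip from text at 720p comes to $1.60, and a 15-second video-to-video clip at 720p starts at $4.20. The live RUB price on the model page remains the number to trust.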

Where it shines

Seedance 2.0 hits hardest in workflows that used to need a videographer, a sound designer and an editor working together.

Social ads. 5–10 second clips for Reels, TikTok and Shorts: dialogue, ambience, effects — done in one pass.
Concept and previz. Check an idea quickly before committing real production time. The camera moves, objects behave with weight, and that's usually enough to know whether the shot lands.
Dubbing and re-voicing. Hand it a dialogue track and get a clip that matches it. Useful for adapting existing material to another language or another voice.
Extending short clips. Turn a 5-second clip into a 15-second shot without breaking style.
Cinematic formats. The 21:9 ratio covers teasers, trailers and letterbox openers.

What to keep in mind

One clip is up to 15 seconds. For longer pieces, generate several or use the extend mode.
No negative prompt. If you want to keep something out of the shot, phrase it positively and back it up with a reference.
ByteDance applies strict moderation on uploaded images. Real human faces, ID documents and recognizable personal data are rejected on the spot — that's their policy and it can't be worked around. Use generated or anonymized inputs.
1080p and long clips don't render instantly — three minutes and up. NetRoom waits the job out and notifies you when it's ready.
Pinning frames and uploading references are different modes. Within a single request you pick one — either you set the start/end frame, or you attach reference images, videos and audio.
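The last rule is easy to trip over, so here is a minimal sketch of how a script could catch it before submitting anything. The ClipRequest class and its field names are hypothetical; they only model the "one mode per request" constraint described above, not the real form or any real API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ClipRequest:
    """Hypothetical request shape; the field names are illustrative only."""
    prompt: str
    start_frame: Optional[str] = None   # path to a pinned first frame
    end_frame: Optional[str] = None     # path to a pinned last frame
    references: List[str] = field(default_factory=list)  # images, videos, audio

    def mode(self) -> str:
        """Name the mode the request falls into, or reject a mixed request."""
        pinned = self.start_frame is not None or self.end_frame is not None
        if pinned and self.references:
            raise ValueError("pick one: pinned start/end frames or references")
        if pinned:
            return "pinned_frames"
        if self.references:
            return "references"
        return "text_only"
```

A request with only start_frame set resolves to "pinned_frames"; adding a reference file to the same request raises a ValueError instead of silently dropping one of the inputs.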

How to try it

Open /model/seedance-20 — every mode is in the form. Pick what you want to make (video from text, image, another video, or to match audio), choose resolution and length, attach references if you need them, and hit Generate. No code, no parameter wrangling.
