The pipeline, as small parts.
Every step of the network ships in the repo as its own script. Use the scripts stand-alone, fork them, or embed them in your own product. The website is one way to consume the rails; the repo is the other.
publish.py
Flow A — the website pipeline: download → transcribe → diarize → translate → neural TTS → publish to the catalog.
publish_clone.py
Flow B / Plan B — cross-lingual voice clone. Each speaker's own voice, speaking Mandarin, via Qwen3-TTS in-context cloning.
publish_clone_vc.py
Flow C / Plan C — clean native TTS + seed-vc timbre transfer. Clear, standard pronunciation in the speaker's voice; the hard multi-speaker case.
glossary.py
Keeps crypto / AI / web3 / startup proper nouns in English during translation, using three layers: a translation instruction, source masking, and Chinese→English repair.
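To make the source-masking idea concrete, here is a minimal sketch. It is not the code in glossary.py: the term list, the placeholder format, and the function names mask_terms and unmask_terms are all assumptions for illustration. The idea is to swap each protected term for an opaque token before translation, then swap the original back afterwards.

```python
import re

# Hypothetical glossary; the actual term list in glossary.py is not shown here.
GLOSSARY = ["Ethereum", "OpenAI", "zero-knowledge proof", "Y Combinator"]


def mask_terms(text, terms):
    """Replace each glossary term with a placeholder token.

    Returns the masked text and a token -> original-term mapping.
    """
    mapping = {}
    # Longest terms first, so multi-word terms are masked before shorter ones.
    for term in sorted(terms, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)

        def repl(match):
            token = f"[[T{len(mapping)}]]"  # assumed placeholder format
            mapping[token] = match.group(0)
            return token

        text = pattern.sub(repl, text)
    return text, mapping


def unmask_terms(text, mapping):
    """Put the original English terms back into the translated text."""
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text


source = "OpenAI and Ethereum both came up."
masked, mapping = mask_terms(source, GLOSSARY)
# Stand-in for the translator's output; a real model call would go here.
translated = masked.replace(" and ", " 和 ").replace(" both came up.", " 都被提到了。")
restored = unmask_terms(translated, mapping)
```

Masking alone depends on the translator passing the tokens through untouched, which is presumably why the pipeline pairs it with an instruction and a repair pass.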
One command, multilingual.
The pipeline is scriptable end-to-end. Point an agent at a URL and it returns a publishable, Chinese-dubbed episode with transcripts and metadata.
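One way to picture "scriptable end-to-end" is as a list of stages applied in order. The sketch below is only a toy model: the Episode class, make_stage, and run are hypothetical names, and each stage body is a placeholder that records its name instead of doing real work. Only the stage order is taken from the Flow A description.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Episode:
    """Hypothetical container passed from stage to stage."""
    url: str
    log: List[str] = field(default_factory=list)


Stage = Callable[[Episode], Episode]


def make_stage(name: str) -> Stage:
    """Build a placeholder stage that only records that it ran."""
    def stage(ep: Episode) -> Episode:
        ep.log.append(name)  # real work (download, ASR, TTS, ...) would go here
        return ep
    return stage


# Stage order taken from the Flow A description; the bodies are stand-ins.
FLOW_A = [make_stage(n) for n in (
    "download", "transcribe", "diarize", "translate", "tts", "publish",
)]


def run(url: str, stages: List[Stage]) -> Episode:
    """Feed one URL through every stage in order."""
    ep = Episode(url=url)
    for stage in stages:
        ep = stage(ep)
    return ep


episode = run("https://example.com/some-interview", FLOW_A)
```

Keeping each stage a plain function is what makes it possible to use one stand-alone or swap it out, for example replacing the TTS stage to get a clone-based flow.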
Every episode is addressable.
Each episode has a stable URL — /interview/<id> — with audio, bilingual transcript, and metadata. Embed it in your own client.
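For a client that embeds episodes, the stable path can be built from an id with a small helper. This is a sketch under assumptions: the base URL is a placeholder, episode_url is a hypothetical name, and nothing here says what the page at that address returns beyond what is described above.

```python
from urllib.parse import quote


def episode_url(base_url: str, episode_id: str) -> str:
    """Build the /interview/<id> address for one episode.

    The id is percent-encoded so unusual characters cannot change the path.
    """
    safe_id = quote(str(episode_id), safe="")
    return f"{base_url.rstrip('/')}/interview/{safe_id}"


# example.com is a placeholder host, not the real site.
link = episode_url("https://example.com/", "abc123")
```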