前言
通过SoftVC VITS Singing Voice Conversion训练AI实现音色转换
必须使用NVIDIA显卡,显存必须6G以上
下载项目
1 2
| git clone https://github.com/svc-develop-team/so-vits-svc.git cd so-vits-svc
|
下载依赖
创建虚拟环境
1 2
| python -m venv venv venv\Scripts\activate.bat
|
安装PyTorch
1
| pip install torch==2.3.0 torchvision==0.18.0 torchaudio==2.3.0 --index-url https://download.pytorch.org/whl/cu118
|
安装playsound
1 2
| pip install --upgrade wheel pip install playsound==1.3.0
|
安装omegaconf-2.0.5
- pip需要<24.1才能安装omegaconf的2.0.5版本
1 2
| python -m pip install --upgrade pip==24.0 pip install omegaconf==2.0.5
|
安装其他依赖
so-vits-svc/1
| pip install -r requirements_win.txt
|
1
| Successfully installed Flask-2.1.2 Flask_Cors-3.0.10 Jinja2-3.1.4 PyAudio-0.2.12 PyWavelets-1.6.0 Six-1.16.0 SoundFile-0.10.3.post1 Werkzeug-3.0.4 absl-py-2.1.0 aiofiles-23.2.1 aiohappyeyeballs-2.4.0 aiohttp-3.10.5 aiosignal-1.3.1 altair-5.4.1 annotated-types-0.7.0 antlr4-python3-runtime-4.8 anyio-4.4.0 async-timeout-4.0.3 attrs-24.2.0 audioread-3.0.1 bitarray-2.9.2 certifi-2024.8.30 cffi-1.17.1 charset-normalizer-2.1.1 click-8.1.7 colorama-0.4.6 contourpy-1.2.1 cycler-0.12.1 cython-3.0.11 decorator-5.1.1 edge_tts-6.1.12 exceptiongroup-1.2.2 fairseq-0.12.2 faiss-cpu-1.8.0.post1 fastapi-0.1.17 ffmpeg-python-0.2.0 ffmpy-0.4.0 filelock-3.16.0 fonttools-4.53.1 frozenlist-1.4.1 fsspec-2024.9.0 future-1.0.0 gradio-4.26.0 gradio-client-0.15.1 grpcio-1.66.1 h11-0.14.0 httpcore-1.0.5 httpx-0.27.2 huggingface-hub-0.24.7 hydra-core-1.0.7 idna-3.10 imageio-2.35.1 importlib-metadata-8.5.0 importlib-resources-6.4.5 itsdangerous-2.2.0 joblib-1.4.2 jsonschema-4.23.0 jsonschema-specifications-2023.12.1 kiwisolver-1.4.7 langdetect-1.0.9 librosa-0.9.1 llvmlite-0.43.0 loguru-0.7.2 lxml-5.3.0 markdown-3.7 markdown-it-py-3.0.0 markupsafe-2.1.5 matplotlib-3.8.4 mdurl-0.1.2 mpmath-1.3.0 multidict-6.1.0 narwhals-1.8.1 networkx-3.2.1 numba-0.60.0 numpy-1.22.4 onnx-1.16.2 onnxoptimizer-0.3.13 onnxsim-0.4.36 orjson-3.10.7 packaging-24.1 pandas-2.2.2 pillow-10.4.0 platformdirs-4.3.3 playsound-1.3.0 pooch-1.8.2 portalocker-2.10.1 praat-parselmouth-0.4.4 protobuf-5.28.1 pycparser-2.22 pydantic-2.9.1 pydantic-core-2.23.3 pydub-0.25.1 pygments-2.18.0 pynvml-11.5.3 pyparsing-3.1.4 python-dateutil-2.9.0.post0 python-multipart-0.0.9 pytz-2024.2 pywin32-306 pyworld-0.3.0 referencing-0.35.1 regex-2024.9.11 requests-2.28.1 resampy-0.4.3 rich-13.8.1 rpds-py-0.20.0 ruff-0.6.5 sacrebleu-2.4.3 safetensors-0.4.5 scikit-image-0.19.3 scikit-learn-1.5.2 scikit-maad-1.3.12 scipy-1.7.3 semantic-version-2.10.0 shellingham-1.5.4 sniffio-1.3.1 sounddevice-0.4.5 starlette-0.19.1 sympy-1.13.2 tabulate-0.9.0 tensorboard-2.17.1 tensorboard-data-server-0.7.2 tensorboardX-2.6.2.2 threadpoolctl-3.5.0 tifffile-2024.8.30 tokenizers-0.19.1 tomlkit-0.12.0 torch-2.4.1 torchaudio-2.4.1 torchcrepe-0.0.23 tqdm-4.63.0 transformers-4.44.2 typer-0.12.5 tzdata-2024.1 urllib3-1.26.20 uvicorn-0.30.6 websockets-11.0.3 win32-setctime-1.1.0 yarl-1.11.1 zipp-3.20.2
|
下载预训练模型
下载预训练底模
准备声音素材
- 要求总计2小时以上的声音素材
- 需人生干音作为声音素材,尽可能的去除噪音和混响
传送门
- 声音文件格式需为
.wav格式
- 声音文件重采样为44100Hz单声道
- 声音文件需切片,每个切片时长为5~15秒
传送门
- 切片后的声音文件放置在
dataset_raw目录下
1 2 3 4
| + dataset_raw + <voice_name> - 01.wav - 02.wav
|
重采样至 44100Hz 单声道
自动划分训练集、验证集,以及自动生成配置文件
1
| python preprocess_flist_config.py --speech_encoder vec768l12
|
创建文件夹
训练
1
| python train.py -c configs/config.json -m 44k
|

推理
- 把用于推理的原歌曲人声
<source_file>.wav放到raw文件夹下
-m "logs/44k/G_100.pth":指定
-n <speaker_name>:声音角色
-s <source_file>.wav:用于推理的原歌曲人声
1
| python inference_main.py -m "logs/44k/G_100.pth" -c "configs/config.json" -n "<source_file>.wav" -t 0 -s "<speaker_name>"
|
完成
参考文献
SUC-DriverOld/so-vits-svc-Deployment-Documents