编辑 | blame | 历史 | 原始文档

Speaker Verification

Note:
The modelscope pipeline supports all the models in
model zoo
to inference and finetine. Here we take the model of xvector_sv as example to demonstrate the usage.

Inference with pipeline

Quick start

Speaker verification

from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

inference_sv_pipline = pipeline(
    task=Tasks.speaker_verification,
    model='damo/speech_xvector_sv-zh-cn-cnceleb-16k-spk3465-pytorch'
)

# The same speaker
rec_result = inference_sv_pipline(audio_in=(
    'https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/sv_example_enroll.wav',
    'https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/sv_example_same.wav'))
print("Similarity", rec_result["scores"])

# Different speakers
rec_result = inference_sv_pipline(audio_in=(
    'https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/sv_example_enroll.wav',
    'https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/sv_example_different.wav'))
print("Similarity", rec_result["scores"])

Speaker embedding extraction

from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

# Define extraction pipeline
inference_sv_pipline = pipeline(
    task=Tasks.speaker_verification,
    model='damo/speech_xvector_sv-zh-cn-cnceleb-16k-spk3465-pytorch'
)
# Extract speaker embedding
rec_result = inference_sv_pipline(
    audio_in='https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/sv_example_enroll.wav')
speaker_embedding = rec_result["spk_embedding"]

Full code of demo, please ref to infer.py.

API-reference

Define pipeline

  • task: Tasks.speaker_verification
  • model: model name in model zoo, or model path in local disk
  • ngpu: 1 (Default), decoding on GPU. If ngpu=0, decoding on CPU
  • output_dir: None (Default), the output path of results if set
  • batch_size: 1 (Default), batch size when decoding
  • sv_threshold: 0.9465 (Default), the similarity threshold to determine
    whether utterances belong to the same speaker (it should be in (0, 1))

Infer pipeline for speaker embedding extraction

Infer pipeline for speaker verification

Inference with you data

Use wav1.scp or fbank.scp to organize your own data to extract speaker embeddings or perform speaker verification.
In this case, the output_dir should be set to save all the embeddings or scores.

Inference with multi-threads on CPU

You can inference with multi-threads on CPU as follow steps:
1. Set ngpu=0 while defining the pipeline in infer.py.
2. Split wav.scp to several files e.g.: 4
shell split -l $((`wc -l < wav.scp`/4+1)) --numeric-suffixes wav.scp splits/wav.scp.
3. Start to extract embeddings
shell for wav_scp in `ls splits/wav.scp.*`; do infer.py ${wav_scp} outputs/$((basename ${wav_scp})) done
4. The embeddings will be saved in outputs/*

Inference with multi GPU

Similar to inference on CPU, the difference are as follows:

Step 1. Set ngpu=1 while defining the pipeline in infer.py.

Step 3. specify the gpu device with CUDA_VISIBLE_DEVICES:
shell for wav_scp in `ls splits/wav.scp.*`; do CUDA_VISIBLE_DEVICES=1 infer.py ${wav_scp} outputs/$((basename ${wav_scp})) done