| | |
| | | <link rel="stylesheet" type="text/css" href="_static/css/bootstrap-theme.min.css" /> |
| | | <meta name="viewport" content="width=device-width, initial-scale=1.0"> |
| | | |
| | | <title>Baseline — MULTI-PARTY MEETING TRANSCRIPTION CHALLENGE 2.0</title> |
| | | <link rel="stylesheet" type="text/css" href="_static/pygments.css" /> |
| | | <link rel="stylesheet" type="text/css" href="_static/guzzle.css" /> |
| | | <script data-url_root="./" id="documentation_options" src="_static/documentation_options.js"></script> |
| | |
| | | <li class="right" > |
| | | <a href="Track_setting_and_evaluation.html" title="Track & Evaluation" |
| | | accesskey="P">previous</a> |</li> |
| | | <li class="nav-item nav-item-0"><a href="index.html">MULTI-PARTY MEETING TRANSCRIPTION CHALLENGE 2.0</a> »</li> |
| | | <li class="nav-item nav-item-this"><a href="">Baseline</a></li> |
| | | </ul> |
| | | </div> |
| | |
| | | </div> |
| | | <div id="left-column"> |
| | | <div class="sphinxsidebar"><a href="index.html" class="text-logo">MULTI-PARTY MEETING TRANSCRIPTION CHALLENGE 2.0</a> |
| | | <div class="sidebar-block"> |
| | | <div class="sidebar-wrapper"> |
| | | <div id="main-search"> |
| | |
| | | <h1>Baseline<a class="headerlink" href="#baseline" title="Permalink to this heading">¶</a></h1> |
| | | <section id="overview"> |
| | | <h2>Overview<a class="headerlink" href="#overview" title="Permalink to this heading">¶</a></h2> |
| | | <p>We will release an E2E SA-ASR (Kanda et al., 2021) baseline built on <a class="reference external" href="https://github.com/alibaba-damo-academy/FunASR">FunASR</a> according to the timeline. The model architecture is shown in Figure 3. The SpeakerEncoder is initialized with a pre-trained speaker verification model from ModelScope. This speaker verification model is also used to extract the speaker embeddings for the speaker profiles.</p> |
| | | <p><img alt="model architecture" src="_images/sa_asr_arch.png" /></p> |
| | | </section> |
| | | <section id="quick-start"> |
| | | <h2>Quick start<a class="headerlink" href="#quick-start" title="Permalink to this heading">¶</a></h2> |
| | | <p>To run the baseline, first you need to install FunASR and ModelScope. (<a class="reference external" href="https://alibaba-damo-academy.github.io/FunASR/en/installation.html">installation</a>)<br /> |
| | | There are two startup scripts, <code class="docutils literal notranslate"><span class="pre">run.sh</span></code> for training and evaluating on the old eval and test sets, and <code class="docutils literal notranslate"><span class="pre">run_m2met_2023_infer.sh</span></code> for inference on the new test set of the Multi-Channel Multi-Party Meeting Transcription 2.0 (<a class="reference external" href="https://alibaba-damo-academy.github.io/FunASR/m2met2/index.html">M2MET2.0</a>) Challenge.<br /> |
| | | Before running <code class="docutils literal notranslate"><span class="pre">run.sh</span></code>, you must manually download and unpack the <a class="reference external" href="http://www.openslr.org/119/">AliMeeting</a> corpus and place it in the <code class="docutils literal notranslate"><span class="pre">./dataset</span></code> directory:</p> |
| | | <div class="highlight-shell notranslate"><div class="highlight"><pre><span></span>dataset |
| | | <span class="p">|</span>——<span class="w"> </span>Eval_Ali_far |
| | | <span class="p">|</span>——<span class="w"> </span>Eval_Ali_near |
| | | <span class="p">|</span>——<span class="w"> </span>Test_Ali_far |
| | | <span class="p">|</span>——<span class="w"> </span>Test_Ali_near |
| | | <span class="p">|</span>——<span class="w"> </span>Train_Ali_far |
| | | <span class="p">|</span>——<span class="w"> </span>Train_Ali_near |
| | | </pre></div> |
| | | </div> |
| | | <p>Before running <code class="docutils literal notranslate"><span class="pre">run_m2met_2023_infer.sh</span></code>, you need to place the new test set <code class="docutils literal notranslate"><span class="pre">Test_2023_Ali_far</span></code> (to be released after the challenge starts), which contains only raw audio, in the <code class="docutils literal notranslate"><span class="pre">./dataset</span></code> directory. Then put the given <code class="docutils literal notranslate"><span class="pre">wav.scp</span></code>, <code class="docutils literal notranslate"><span class="pre">wav_raw.scp</span></code>, <code class="docutils literal notranslate"><span class="pre">segments</span></code>, <code class="docutils literal notranslate"><span class="pre">utt2spk</span></code> and <code class="docutils literal notranslate"><span class="pre">spk2utt</span></code> in the <code class="docutils literal notranslate"><span class="pre">./data/Test_2023_Ali_far</span></code> directory:</p> |
| | | <div class="highlight-shell notranslate"><div class="highlight"><pre><span></span>data/Test_2023_Ali_far |
| | | <span class="p">|</span>——<span class="w"> </span>wav.scp |
| | | <span class="p">|</span>——<span class="w"> </span>wav_raw.scp |
| | | <span class="p">|</span>——<span class="w"> </span>segments |
| | | <span class="p">|</span>——<span class="w"> </span>utt2spk |
| | | <span class="p">|</span>——<span class="w"> </span>spk2utt |
| | | </pre></div> |
| | | </div> |
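<p>As a quick sanity-check sketch, the inference-side layout above can be created ahead of time (the five manifest file names come from this page; the files created here are empty placeholders, not the real challenge release):</p>

```shell
# Sketch: lay out the directory that run_m2met_2023_infer.sh expects.
# The real wav.scp, wav_raw.scp, segments, utt2spk and spk2utt are
# provided with the challenge release; these are empty placeholders.
mkdir -p data/Test_2023_Ali_far
for f in wav.scp wav_raw.scp segments utt2spk spk2utt; do
  touch "data/Test_2023_Ali_far/$f"
done
ls data/Test_2023_Ali_far
```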
| | | <p>For more details, see the baseline <a class="reference external" href="https://github.com/alibaba-damo-academy/FunASR/blob/main/egs/alimeeting/sa-asr/README.md">README</a>.</p> |
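<p>For orientation only: Kaldi-style recipes such as this one conventionally use two-column, whitespace-separated manifest files. The recording and speaker IDs below are invented for illustration; consult the linked README for the authoritative format:</p>

```shell
# Hypothetical two-column manifests (IDs and paths are made up).
mkdir -p data/Test_2023_Ali_far
# wav.scp: <recording-id> <path-to-wav>
echo 'R0001_M0001 /path/to/R0001_M0001.wav' > data/Test_2023_Ali_far/wav.scp
# utt2spk: <utterance-id> <speaker-id>
echo 'R0001_M0001-SPK001 SPK001' > data/Test_2023_Ali_far/utt2spk
# every line should have exactly two whitespace-separated fields
awk 'NF != 2 { bad = 1 } END { exit bad }' data/Test_2023_Ali_far/wav.scp && echo 'wav.scp OK'
```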
| | | </section> |
| | | <section id="baseline-results"> |
| | | <h2>Baseline results<a class="headerlink" href="#baseline-results" title="Permalink to this heading">¶</a></h2> |
| | |
| | | <li class="right" > |
| | | <a href="Track_setting_and_evaluation.html" title="Track & Evaluation" |
| | | >previous</a> |</li> |
| | | <li class="nav-item nav-item-0"><a href="index.html">MULTI-PARTY MEETING TRANSCRIPTION CHALLENGE 2.0</a> »</li> |
| | | <li class="nav-item nav-item-this"><a href="">Baseline</a></li> |
| | | </ul> |
| | | </div> |