python/FunASR-XL.git

£/sphinx.addnodesdocument)}(    rawsourcechildren]docutils.nodessection)}(hhh](h    title)}(hBaselineh]h    TextBaseline}(parenth    _documenthsourceNlineNuba
attributes}(ids]classes]names]dupnames]backrefs]utagnamehhKh5/mnt/yhliang/workspace/FunASR/docs/m2met2/Baseline.mdhhhhubh)}(hhh](h)}(hOverviewh]hOverview}(hh0hhhNhNubah}(h!]h#]h%]h']h)]uh+hhKhh,hh-hhubh        paragraph)}(hXWe will release an E2E SA-ASR baseline conducted on [FunASR](https://github.com/alibaba-damo-academy/FunASR) at the time according to the timeline. The model architecture is shown in Figure 3. The SpeakerEncoder is initialized with a pre-trained speaker verification model from ModelScope. This speaker verification model is also be used to extract the speaker embedding in the speaker profile.h](h4We will release an E2E SA-ASR baseline conducted on }(hh@hhhNhNubh        reference)}(hFunASRh]hFunASR}(hhJhhhNhNubah}(h!]h#]h%]h']h)]refuri.https://github.com/alibaba-damo-academy/FunASRuh+hHhKhh,hh@hhubhX at the time according to the timeline. The model architecture is shown in Figure 3. The SpeakerEncoder is initialized with a pre-trained speaker verification model from ModelScope. This speaker verification model is also be used to extract the speaker embedding in the speaker profile.}(hh@hhhNhNubeh}(h!]h#]h%]h']h)]uh+h>hKhh,hh-hhubh?)}(h.![model archietecture](images/sa_asr_arch.png)h]h    image)}(hmodel archietectureh]h}(h!]h#]h%]h']h)]uriimages/sa_asr_arch.pngalthl
candidates}*husuh+hhhKhh,hhdhhubah}(h!]h#]h%]h']h)]uh+h>hKhh,hh-hhubeh}(h!]overviewah#]h%]overviewah']h)]slugoverviewuh+h
hKhh,hhhhubh)}(hhh](h)}(hQuick starth]hQuick start}(hhhhhNhNubah}(h!]h#]h%]h']h)]uh+hhKhh,hhhhubh?)}(hXtTo run the baseline, first you need to install FunASR and ModelScope. ([installation](https://alibaba-damo-academy.github.io/FunASR/en/installation.html))  
There are two startup scripts, `run.sh` for training and evaluating on the old eval and test sets, and `run_m2met_2023_infer.sh` for inference on the new test set of the Multi-Channel Multi-Party Meeting Transcription 2.0 ([M2MeT2.0](https://alibaba-damo-academy.github.io/FunASR/m2met2/index.html)) Challenge.  
Before running `run.sh`, you must manually download and unpack the [AliMeeting](http://www.openslr.org/119/) corpus and place it in the `./dataset` directory:h](hGTo run the baseline, first you need to install FunASR and ModelScope. (}(hhhhhNhNubhI)}(hinstallationh]hinstallation}(hh£hhhNhNubah}(h!]h#]h%]h']h)]hXBhttps://alibaba-damo-academy.github.io/FunASR/en/installation.htmluh+hHhKhh,hhhhubh)}(hhhhhNhNubh    raw)}(h<br />
h]h<br />
}(hh¸hhhNhNubah}(h!]h#]h%]h']h)]formathtml    xml:spacepreserveuh+h¶hhhhhh,hKubh·)}(h\\
h]h\\
}(hhÊhhhNhNubah}(h!]h#]h%]h']h)]formatlatexhÈhÉuh+h¶hhhhhh,hKubhThere are two startup scripts, }(hhhhhNhNubh    literal)}(hrun.shh]hrun.sh}(hhàhhhNhNubah}(h!]h#]h%]h']h)]uh+hÞhKhh,hhhhubh@ for training and evaluating on the old eval and test sets, and }(hhhhhNhNubhß)}(hrun_m2met_2023_infer.shh]hrun_m2met_2023_infer.sh}(hhòhhhNhNubah}(h!]h#]h%]h']h)]uh+hÞhKhh,hhhhubh_ for inference on the new test set of the Multi-Channel Multi-Party Meeting Transcription 2.0 (}(hhhhhNhNubhI)}(hM2MeT2.0h]hM2MeT2.0}(hjhhhNhNubah}(h!]h#]h%]h']h)]hX?https://alibaba-damo-academy.github.io/FunASR/m2met2/index.htmluh+hHhKhh,hhhhubh) Challenge.}(hhhhhNhNubh·)}(h<br />
h]h<br />
}(hjhhhNhNubah}(h!]h#]h%]h']h)]formathÇhÈhÉuh+h¶hhhhhh,hKubh·)}(h\\
h]h\\
}(hj&hhhNhNubah}(h!]h#]h%]h']h)]formathÙhÈhÉuh+h¶hhhhhh,hKubhBefore running }(hhhhhNhNubhß)}(hrun.shh]hrun.sh}(hj9hhhNhNubah}(h!]h#]h%]h']h)]uh+hÞhKhh,hhhhubh,, you must manually download and unpack the }(hhhhhNhNubhI)}(h
AliMeetingh]h
AliMeeting}(hjKhhhNhNubah}(h!]h#]h%]h']h)]hXhttp://www.openslr.org/119/uh+hHhKhh,hhhhubh corpus and place it in the }(hhhhhNhNubhß)}(h    ./dataseth]h    ./dataset}(hj^hhhNhNubah}(h!]h#]h%]h']h)]uh+hÞhKhh,hhhhubh directory:}(hhhhhNhNubeh}(h!]h#]h%]h']h)]uh+h>hKhh,hhhhubh     literal_block)}(hdataset
|── Eval_Ali_far
|── Eval_Ali_near
|── Test_Ali_far
|── Test_Ali_near
|── Train_Ali_far
|── Train_Ali_near
h]hdataset
|── Eval_Ali_far
|── Eval_Ali_near
|── Test_Ali_far
|── Test_Ali_near
|── Train_Ali_far
|── Train_Ali_near
}hjxsbah}(h!]h#]h%]h']h)]languageshellhÈhÉuh+jvhh,hKhhhhubh?)}(hXHBefore running `run_m2met_2023_infer.sh`, you need to place the new test set `Test_2023_Ali_far` (to be released after the challenge starts) in the `./dataset` directory, which contains only raw audios. Then put the given `wav.scp`, `wav_raw.scp`, `segments`, `utt2spk` and `spk2utt` in the `./data/Test_2023_Ali_far` directory.h](hBefore running }(hjhhhNhNubhß)}(hrun_m2met_2023_infer.shh]hrun_m2met_2023_infer.sh}(hjhhhNhNubah}(h!]h#]h%]h']h)]uh+hÞhKhh,hjhhubh%, you need to place the new test set }(hjhhhNhNubhß)}(hTest_2023_Ali_farh]hTest_2023_Ali_far}(hj¢hhhNhNubah}(h!]h#]h%]h']h)]uh+hÞhKhh,hjhhubh4 (to be released after the challenge starts) in the }(hjhhhNhNubhß)}(h    ./dataseth]h    ./dataset}(hj´hhhNhNubah}(h!]h#]h%]h']h)]uh+hÞhKhh,hjhhubh? directory, which contains only raw audios. Then put the given }(hjhhhNhNubhß)}(hwav.scph]hwav.scp}(hjÆhhhNhNubah}(h!]h#]h%]h']h)]uh+hÞhKhh,hjhhubh, }(hjhhhNhNubhß)}(hwav_raw.scph]hwav_raw.scp}(hjØhhhNhNubah}(h!]h#]h%]h']h)]uh+hÞhKhh,hjhhubh, }(hjhhhh,hKubhß)}(hsegmentsh]hsegments}(hjêhhhNhNubah}(h!]h#]h%]h']h)]uh+hÞhKhh,hjhhubh, }(hjhhhh,hKubhß)}(hutt2spkh]hutt2spk}(hjühhhNhNubah}(h!]h#]h%]h']h)]uh+hÞhKhh,hjhhubh and }(hjhhhNhNubhß)}(hspk2utth]hspk2utt}(hjhhhNhNubah}(h!]h#]h%]h']h)]uh+hÞhKhh,hjhhubh in the }(hjhhhNhNubhß)}(h./data/Test_2023_Ali_farh]h./data/Test_2023_Ali_far}(hj hhhNhNubah}(h!]h#]h%]h']h)]uh+hÞhKhh,hjhhubh directory.}(hjhhhNhNubeh}(h!]h#]h%]h']h)]uh+h>hKhh,hhhhubjw)}(hldata/Test_2023_Ali_far
|── wav.scp
|── wav_raw.scp
|── segments
|── utt2spk
|── spk2utt
h]hldata/Test_2023_Ali_far
|── wav.scp
|── wav_raw.scp
|── segments
|── utt2spk
|── spk2utt
}hj8sbah}(h!]h#]h%]h']h)]languageshellhÈhÉuh+jvhh,hKhhhhubh?)}(h}For more details you can see [here](https://github.com/alibaba-damo-academy/FunASR/blob/main/egs/alimeeting/sa-asr/README.md)h](hFor more details you can see }(hjHhhhNhNubhI)}(hhereh]hhere}(hjPhhhNhNubah}(h!]h#]h%]h']h)]hXXhttps://github.com/alibaba-damo-academy/FunASR/blob/main/egs/alimeeting/sa-asr/README.mduh+hHhKhh,hjHhhubeh}(h!]h#]h%]h']h)]uh+h>hKhh,hhhhubeh}(h!]quick-startah#]h%]quick startah']h)]hquick-startuh+h
hKhh,hhhhubh)}(hhh](h)}(hBaseline resultsh]hBaseline results}(hjqhhhNhNubah}(h!]h#]h%]h']h)]uh+hhKhh,hjnhhubh?)}(hX¢The results of the baseline system are shown in Table 3. The speaker profile adopts the oracle speaker embedding during training. However, due to the lack of oracle speaker label during evaluation, the speaker profile provided by an additional spectral clustering is used. Meanwhile, the results of using the oracle speaker profile on Eval and Test Set are also provided to show the impact of speaker profile accuracy.h]hX¢The results of the baseline system are shown in Table 3. The speaker profile adopts the oracle speaker embedding during training. However, due to the lack of oracle speaker label during evaluation, the speaker profile provided by an additional spectral clustering is used. Meanwhile, the results of using the oracle speaker profile on Eval and Test Set are also provided to show the impact of speaker profile accuracy.}(hjhhhNhNubah}(h!]h#]h%]h']h)]uh+h>hK hh,hjnhhubh?)}(h.![baseline_result](images/baseline_result.png)h]hi)}(hbaseline_resulth]h}(h!]h#]h%]h']h)]htimages/baseline_result.pnghvjhw}hyjsuh+hhhK"hh,hjhhubah}(h!]h#]h%]h']h)]uh+h>hK"hh,hjnhhubeh}(h!]baseline-resultsah#]h%]baseline resultsah']h)]hbaseline-resultsuh+h
hKhh,hhhhubeh}(h!]baselineah#]h%]baselineah']h)]hbaselineuh+h
hKhh,hhhhubah}(h!]h#]h%]h']h)]sourceh,uh+hcurrent_sourceNcurrent_lineNsettingsdocutils.frontendValues)}(hN    generatorN    datestampNsource_linkN
source_urlN toc_backlinksentryfootnote_backlinksK sectnum_xformKstrip_commentsNstrip_elements_with_classesN strip_classesNreport_levelK
halt_levelKexit_status_levelKdebugNwarning_streamN    tracebackinput_encoding    utf-8-siginput_encoding_error_handlerstrictoutput_encodingutf-8output_encoding_error_handlerjØerror_encodingUTF-8error_encoding_error_handlerbackslashreplace language_codeenrecord_dependenciesNconfigN    id_prefixhauto_id_prefixid dump_settingsNdump_internalsNdump_transformsNdump_pseudo_xmlNexpose_internalsNstrict_visitorN_disable_configN_sourceh,_destinationN _config_files]file_insertion_enabledraw_enabledKline_length_limitM'pep_referencesNpep_base_urlhttps://peps.python.org/pep_file_url_templatepep-%04drfc_referencesNrfc_base_url&https://datatracker.ietf.org/doc/html/    tab_widthKtrim_footnote_reference_spacesyntax_highlightlongsmart_quotessmartquotes_locales]character_level_inline_markupdoctitle_xform docinfo_xformKsectsubtitle_xform image_loadinglinkembed_stylesheetcloak_email_addressessection_self_linkenvNubreporterNindirect_targets]substitution_defs}(wordcount-wordsh    substitution_definition)}(h222h]h222}hjsbah}(h!]h#]h%]wordcount-wordsah']h)]uh+jhh,ubwordcount-minutesj)}(h1h]h1}hj&sbah}(h!]h#]h%]wordcount-minutesah']h)]uh+jhh,ubusubstitution_names}(wordcount-wordsjwordcount-minutesj%urefnames}refids}nameids}(j±j®hhjjjgj¨j¥u    nametypes}(j±hjjj¨uh!}(j®hhh-jghj¥jnu footnote_refs} citation_refs} autofootnotes]autofootnote_refs]symbol_footnotes]symbol_footnote_refs]    footnotes]    citations]autofootnote_startKsymbol_footnote_startK
id_countercollectionsCounter}Rparse_messages]transform_messages]transformerNinclude_log]
decorationNhh
myst_slugs}(j´Kj®BaselinehKhOverviewjmKjgQuick startj«Kj¥Baseline resultsuub.