r sphinx.addnodesdocument)}( rawsource children]docutils.nodessection)}(hhh](h title)}(hBaselineh]h TextBaseline
}(parenth _documenthsourceNlineNuba
|
attributes}(ids]classes]names]dupnames]backrefs]utagnamehhKh5/mnt/yhliang/workspace/FunASR/docs/m2met2/Baseline.mdhhhhubh)}(hhh](h)}(hOverviewh]hOverview
}(hh0hhhNhNubah}(h!]h#]h%]h']h)]uh+hhKhh,hh-hhubh paragraph)}(hX¦ We will release an E2E SA-ASR~\cite{kanda21b_interspeech} baseline conducted on [FunASR](https://github.com/alibaba-damo-academy/FunASR) at the time according to the timeline. The model architecture is shown in Figure 3. The SpeakerEncoder is initialized with a pre-trained speaker verification model from ModelScope. This speaker verification model is also be used to extract the speaker embedding in the speaker profile.h](hPWe will release an E2E SA-ASR~\cite{kanda21b_interspeech} baseline conducted on
}(hh@hhhNhNubh reference)}(hFunASRh]hFunASR
}(hhJhhhNhNubah}(h!]h#]h%]h']h)]refuri.https://github.com/alibaba-damo-academy/FunASRuh+hHhKhh,hh@hhubhX at the time according to the timeline. The model architecture is shown in Figure 3. The SpeakerEncoder is initialized with a pre-trained speaker verification model from ModelScope. This speaker verification model is also be used to extract the speaker embedding in the speaker profile.
}(hh@hhhNhNubeh}(h!]h#]h%]h']h)]uh+h>hKhh,hh-hhubh?)}(h.h]h image)}(hmodel archietectureh]h}(h!]h#]h%]h']h)]uriimages/sa_asr_arch.pngalthl
|
candidates}*husuh+hhhKhh,hhdhhubah}(h!]h#]h%]h']h)]uh+h>hKhh,hh-hhubeh}(h!]overviewah#]h%]overviewah']h)]slugoverviewuh+h
|
hKhh,hhhhubh)}(hhh](h)}(hQuick starth]hQuick start
}(hhhhhNhNubah}(h!]h#]h%]h']h)]uh+hhKhh,hhhhubh?)}(h.#TODO: fill with the README.md of the baselineh]h.#TODO: fill with the README.md of the baseline
}(hhhhhNhNubah}(h!]h#]h%]h']h)]uh+h>hKhh,hhhhubeh}(h!]quick-startah#]h%]quick startah']h)]hquick-startuh+h
|
hKhh,hhhhubh)}(hhh](h)}(hBaseline resultsh]hBaseline results
}(hhµhhhNhNubah}(h!]h#]h%]h']h)]uh+hhK
|
hh,hh²hhubh?)}(hX¢ The results of the baseline system are shown in Table 3. The speaker profile adopts the oracle speaker embedding during training. However, due to the lack of oracle speaker label during evaluation, the speaker profile provided by an additional spectral clustering is used. Meanwhile, the results of using the oracle speaker profile on Eval and Test Set are also provided to show the impact of speaker profile accuracy.h]hX¢ The results of the baseline system are shown in Table 3. The speaker profile adopts the oracle speaker embedding during training. However, due to the lack of oracle speaker label during evaluation, the speaker profile provided by an additional spectral clustering is used. Meanwhile, the results of using the oracle speaker profile on Eval and Test Set are also provided to show the impact of speaker profile accuracy.
}(hhÃhhhNhNubah}(h!]h#]h%]h']h)]uh+h>hKhh,hh²hhubh?)}(h.h]hi)}(hbaseline resulth]h}(h!]h#]h%]h']h)]htimages/baseline_result.pnghvh×hw}hyhßsuh+hhhK hh,hhÑhhubah}(h!]h#]h%]h']h)]uh+h>hK hh,hh²hhubeh}(h!]baseline-resultsah#]h%]baseline resultsah']h)]hbaseline-resultsuh+h
|
hK
|
hh,hhhhubeh}(h!]baselineah#]h%]baselineah']h)]hbaselineuh+h
|
hKhh,hhhhubah}(h!]h#]h%]h']h)]sourceh,uh+hcurrent_sourceNcurrent_lineNsettingsdocutils.frontendValues)}(hN generatorN datestampNsource_linkN
|
source_urlN toc_backlinksentryfootnote_backlinksK sectnum_xformKstrip_commentsNstrip_elements_with_classesN strip_classesNreport_levelK
|
halt_levelKexit_status_levelKdebugNwarning_streamN tracebackinput_encoding utf-8-siginput_encoding_error_handlerstrictoutput_encodingutf-8output_encoding_error_handlerj error_encodingUTF-8error_encoding_error_handlerbackslashreplace language_codeenrecord_dependenciesNconfigN id_prefixhauto_id_prefixid dump_settingsNdump_internalsNdump_transformsNdump_pseudo_xmlNexpose_internalsNstrict_visitorN_disable_configN_sourceh,_destinationN _config_files]file_insertion_enabledraw_enabledKline_length_limitM'pep_referencesNpep_base_urlhttps://peps.python.org/pep_file_url_templatepep-%04drfc_referencesNrfc_base_url&https://datatracker.ietf.org/doc/html/ tab_widthKtrim_footnote_reference_spacesyntax_highlightlongsmart_quotessmartquotes_locales]character_level_inline_markupdoctitle_xform docinfo_xformKsectsubtitle_xform image_loadinglinkembed_stylesheetcloak_email_addressessection_self_linkenvNubreporterNindirect_targets]substitution_defs}(wordcount-wordsh substitution_definition)}(h130h]h130
}hjZ sbah}(h!]h#]h%]wordcount-wordsah']h)]uh+jX hh,ubwordcount-minutesjY )}(h1h]h1
}hjj sbah}(h!]h#]h%]wordcount-minutesah']h)]uh+jX hh,ubusubstitution_names}(wordcount-wordsjW wordcount-minutesji urefnames}refids}nameids}(hõhòh
hh®h«hìhéu nametypes}(hõh
h®hìuh!}(hòhhh-h«hhéh²u footnote_refs} citation_refs} autofootnotes]autofootnote_refs]symbol_footnotes]symbol_footnote_refs] footnotes] citations]autofootnote_startKsymbol_footnote_startK
|
id_countercollectionsCounter}
Rparse_messages]transform_messages]transformerNinclude_log]
|
decorationNhh
|
myst_slugs}(høKhòBaselinehKhOverviewh±Kh«Quick starthïK
|
héBaseline resultsuub.
|