# Introduction
## Call for participation

Automatic speech recognition (ASR) and speaker diarization have made significant strides in recent years, resulting in a surge of speech technology applications across many domains. However, meetings pose unique challenges to speech technologies because of their complex acoustic conditions and diverse speaking styles, including overlapping speech, a variable number of speakers, far-field signals in large conference rooms, and environmental noise and reverberation.

Over the years, several challenges have been organized to advance meeting transcription, including the Rich Transcription evaluations and the Computational Hearing in Multisource Environments (CHiME) challenges. The latest iteration of the CHiME challenge focuses on distant automatic speech recognition and on developing systems that generalize across different array topologies and application scenarios. However, while progress has been made in English meeting transcription, language differences remain a significant barrier to achieving comparable results in non-English languages such as Mandarin. The Multimodal Information Based Speech Processing (MISP) and Multi-Channel Multi-Party Meeting Transcription (M2MeT) challenges have been instrumental in advancing Mandarin meeting transcription. The MISP challenge addresses audio-visual distant multi-microphone signal processing in everyday home environments, while the M2MeT challenge tackles overlapping speech in offline meeting rooms.

The ICASSP 2022 M2MeT challenge focused on meeting scenarios and comprised two main tasks: speaker diarization and multi-speaker automatic speech recognition. The former identifies who spoke when in a meeting, while the latter transcribes speech from multiple speakers simultaneously, which is technically difficult because of overlapping speech and acoustic interference.

Building on the success of the previous M2MeT challenge, we are excited to propose the M2MeT2.0 challenge as an ASRU 2023 challenge special session. In the original M2MeT challenge, the evaluation metric was speaker-independent: a system was scored only on what was said, not on which speaker said it. To address this limitation and move current multi-talker ASR systems closer to practical use, the M2MeT2.0 challenge introduces the speaker-attributed ASR task, with two sub-tracks: fixed and open training conditions. The speaker-attributed ASR task tackles the practical and challenging problem of identifying "who spoke what and when". To facilitate reproducible research in this field, we provide a comprehensive overview of the dataset, rules, evaluation metrics, and baseline systems. Furthermore, following the timeline below, we will release a carefully curated test set of approximately 10 hours of audio. The new test set is designed to let researchers validate and compare their models' performance and advance the state of the art in this area.
## Timeline (AOE Time)

- April 29, 2023: Challenge and registration open.
- May 8, 2023: Baseline release.
- May 15, 2023: Registration deadline, the due date for participants to join the challenge.
- June 9, 2023: Test data release and leaderboard open.
- June 13, 2023: Final submission deadline.
- June 19, 2023: Evaluation results and rankings release.
- July 3, 2023: Deadline for paper submission.
- July 10, 2023: Deadline for final paper submission.
- December 12 to 16, 2023: ASRU Workshop and challenge session.
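The timeline dates are given in AoE (Anywhere on Earth) time, which is the fixed offset UTC-12. As a minimal sketch, assuming the usual convention that a date-only deadline means the last second of that calendar day in AoE, the snippet below converts such a deadline to UTC using only the Python standard library. The helper name `aoe_deadline_utc` is hypothetical and not part of FunASR or the challenge tooling.

```python
from datetime import datetime, timedelta, timezone

# AoE ("Anywhere on Earth") is the fixed offset UTC-12: a date-only deadline
# has passed only once that calendar day has ended everywhere on the planet.
AOE = timezone(timedelta(hours=-12), "AoE")


def aoe_deadline_utc(year: int, month: int, day: int) -> datetime:
    """Return the last second of the given AoE calendar day, expressed in UTC.

    Assumption: a date-only deadline means 23:59:59 AoE on that day.
    """
    end_of_day = datetime(year, month, day, 23, 59, 59, tzinfo=AOE)
    return end_of_day.astimezone(timezone.utc)


if __name__ == "__main__":
    # Final submission deadline: June 13, 2023 (AoE).
    print(aoe_deadline_utc(2023, 6, 13).isoformat())  # 2023-06-14T11:59:59+00:00
```

Because AoE is the latest time zone in use, a deadline on June 13 AoE is still open until midday on June 14 UTC; convert the result with `astimezone()` to obtain the equivalent local time.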
## Guidelines

Interested participants, whether from academia or industry, must register for the challenge by completing the Google form below. The registration deadline is May 15, 2023.

[M2MeT2.0 Registration](https://docs.google.com/forms/d/e/1FAIpQLSf77T9vAl7Ym-u5g8gXu18SBofoWRaFShBo26Ym0-HDxHW9PQ/viewform?usp=sf_link)

Within three working days, the challenge organizer will send email invitations to eligible teams. All qualified teams must adhere to the challenge rules, which will be published on the challenge page. Before the rankings are released, each participant must submit a system description document detailing their approach and methods. The organizer will select the top-ranking submissions for inclusion in the ASRU 2023 Proceedings.