add m2met2 registration form
| | |
| | | ## Overview of training data |
| | | In the fixed training condition, the training dataset is restricted to three publicly available corpora, namely, AliMeeting, AISHELL-4, and CN-Celeb. To evaluate the performance of the models trained on these datasets, we will release a new Test set called Test-2023 for scoring and ranking. We will describe the AliMeeting dataset and the Test-2023 set in detail. |
| | | ## Detail of AliMeeting corpus |
| | | AliMeeting contains 118.75 hours of speech data in total. The dataset is divided into 104.75 hours for training (Train), 4 hours for evaluation (Eval) and 10 hours as test set (Test) for scoring and ranking. Specifically, the Train and Eval sets contain 212 and 8 sessions, respectively. Each session consists of a 15 to 30-minute discussion by a group of participants. The total number of participants in Train and Eval sets is 456 and 25, respectively, with balanced gender coverage. |
| | | AliMeeting contains 118.75 hours of speech data in total. The dataset is divided into 104.75 hours for training (Train), 4 hours for evaluation (Eval) and 10 hours as test set (Test) for scoring and ranking. Specifically, the Train, Eval and Test sets contain 212, 8 and 20 sessions, respectively. Each session consists of a 15 to 30-minute discussion by a group of participants. The total number of participants in Train, Eval and Test sets is 456, 25 and 60, respectively, with balanced gender coverage. |
| | | |
| | | The dataset is collected in 13 meeting venues, which are categorized into three types: small, medium, and large rooms with sizes ranging from 8 m$^{2}$ to 55 m$^{2}$. Different rooms give us a variety of acoustic properties and layouts. The detailed parameters of each meeting venue will be released together with the Train data. The type of wall material of the meeting venues covers cement, glass, etc. Other furnishings in meeting venues include sofa, TV, blackboard, fan, air conditioner, plants, etc. During recording, the participants of the meeting sit around the microphone array which is placed on the table and conduct a natural conversation. The microphone-speaker distance ranges from 0.3 m to 5.0 m. All participants are native Chinese speakers speaking Mandarin without strong accents. During the meeting, various kinds of indoor noise including but not limited to clicking, keyboard, door opening/closing, fan, bubble noise, etc., are made naturally. For both Train and Eval sets, the participants are required to remain in the same position during recording. There is no speaker overlap between the Train and Eval sets. An example of the recording venue from the Train set is shown in Fig. 1. |
| | | |
| | |
| | | |
| | | ## Guidelines |
| | | |
| | | Interested participants, whether from academia or industry, must register for the challenge by completing a Google form, which will be available here. The deadline for registration is May 5, 2023. |
| | | Interested participants, whether from academia or industry, must register for the challenge by completing the Google form below. The deadline for registration is May 5, 2023. |
| | | |
| | | [M2MET2.0 Registration](https://docs.google.com/forms/d/e/1FAIpQLSf77T9vAl7Ym-u5g8gXu18SBofoWRaFShBo26Ym0-HDxHW9PQ/viewform?usp=sf_link) |
| | | |
| | | Within three working days, the challenge organizer will send email invitations to eligible teams to participate in the challenge. All qualified teams are required to adhere to the challenge rules, which will be published on the challenge page. Prior to the ranking release time, each participant must submit a system description document detailing their approach and methods. The organizer will select the top three submissions to be included in the ASRU2023 Proceedings. |
| | |
| | | # Track & Evaluation |
| | | ## Speaker-Attributed ASR (Main Track) |
| | | The speaker-attributed ASR task poses a unique challenge of transcribing speech from multiple speakers and assigning a speaker label to the transcription. Figure 2 illustrates the difference between the speaker-attributed ASR task and the multi-speaker ASR task. This track allows for the use of the AliMeeting, Aishell4, and Cn-Celeb datasets as constrained data sources during both training and evaluation. The AliMeeting dataset, which was used in the M2MeT challenge, includes Train, Eval, and Test sets. Additionally, a new Test-2023 set, consisting of approximately 10 hours of meeting data recorded in an identical acoustic setting as the AliMeeting corpus, will be released soon for challenge scoring and ranking. It's worth noting that the organizers will not provide the near-field audio, transcriptions, or oracle timestamps. Instead, segments containing multiple speakers will be provided on the Test-2023 set, which can be obtained using a simple voice activity detection (VAD) model. |
| | | ## Speaker-Attributed ASR |
| | | The speaker-attributed ASR task poses a unique challenge of transcribing speech from multiple speakers and assigning a speaker label to the transcription. Figure 2 illustrates the difference between the speaker-attributed ASR task and the multi-speaker ASR task. This track allows for the use of the AliMeeting, Aishell4, and Cn-Celeb datasets as constrained data sources during both training and evaluation. The AliMeeting dataset, which was used in the M2MeT challenge, includes Train, Eval, and Test sets. Additionally, a new Test-2023 set, consisting of approximately 10 hours of meeting data recorded in an identical acoustic setting as the AliMeeting corpus, will be released soon for challenge scoring and ranking. It's worth noting that the organizers will not provide the near-field audio, transcriptions, or oracle timestamps of the Test-2023 set. Instead, segments containing multiple speakers will be provided, which can be obtained using a simple voice activity detection (VAD) model. |
| | | |
| | |  |
| | | |
| | |
| | | </ul> |
| | | </li> |
| | | <li class="toctree-l1"><a class="reference internal" href="Track_setting_and_evaluation.html">Track & Evaluation</a><ul> |
| | | <li class="toctree-l2"><a class="reference internal" href="Track_setting_and_evaluation.html#speaker-attributed-asr-main-track">Speaker-Attributed ASR (Main Track)</a></li> |
| | | <li class="toctree-l2"><a class="reference internal" href="Track_setting_and_evaluation.html#speaker-attributed-asr">Speaker-Attributed ASR</a></li> |
| | | <li class="toctree-l2"><a class="reference internal" href="Track_setting_and_evaluation.html#evaluation-metric">Evaluation metric</a></li> |
| | | <li class="toctree-l2"><a class="reference internal" href="Track_setting_and_evaluation.html#sub-track-arrangement">Sub-track arrangement</a></li> |
| | | </ul> |
| | |
| | | </ul> |
| | | </li> |
| | | <li class="toctree-l1"><a class="reference internal" href="Track_setting_and_evaluation.html">Track & Evaluation</a><ul> |
| | | <li class="toctree-l2"><a class="reference internal" href="Track_setting_and_evaluation.html#speaker-attributed-asr-main-track">Speaker-Attributed ASR (Main Track)</a></li> |
| | | <li class="toctree-l2"><a class="reference internal" href="Track_setting_and_evaluation.html#speaker-attributed-asr">Speaker-Attributed ASR</a></li> |
| | | <li class="toctree-l2"><a class="reference internal" href="Track_setting_and_evaluation.html#evaluation-metric">Evaluation metric</a></li> |
| | | <li class="toctree-l2"><a class="reference internal" href="Track_setting_and_evaluation.html#sub-track-arrangement">Sub-track arrangement</a></li> |
| | | </ul> |
| | |
| | | </ul> |
| | | </li> |
| | | <li class="toctree-l1"><a class="reference internal" href="Track_setting_and_evaluation.html">Track & Evaluation</a><ul> |
| | | <li class="toctree-l2"><a class="reference internal" href="Track_setting_and_evaluation.html#speaker-attributed-asr-main-track">Speaker-Attributed ASR (Main Track)</a></li> |
| | | <li class="toctree-l2"><a class="reference internal" href="Track_setting_and_evaluation.html#speaker-attributed-asr">Speaker-Attributed ASR</a></li> |
| | | <li class="toctree-l2"><a class="reference internal" href="Track_setting_and_evaluation.html#evaluation-metric">Evaluation metric</a></li> |
| | | <li class="toctree-l2"><a class="reference internal" href="Track_setting_and_evaluation.html#sub-track-arrangement">Sub-track arrangement</a></li> |
| | | </ul> |
| | |
| | | </section> |
| | | <section id="detail-of-alimeeting-corpus"> |
| | | <h2>Detail of AliMeeting corpus<a class="headerlink" href="#detail-of-alimeeting-corpus" title="Permalink to this heading">¶</a></h2> |
| | | <p>AliMeeting contains 118.75 hours of speech data in total. The dataset is divided into 104.75 hours for training (Train), 4 hours for evaluation (Eval) and 10 hours as test set (Test) for scoring and ranking. Specifically, the Train and Eval sets contain 212 and 8 sessions, respectively. Each session consists of a 15 to 30-minute discussion by a group of participants. The total number of participants in Train and Eval sets is 456 and 25, respectively, with balanced gender coverage.</p> |
| | | <p>AliMeeting contains 118.75 hours of speech data in total. The dataset is divided into 104.75 hours for training (Train), 4 hours for evaluation (Eval) and 10 hours as test set (Test) for scoring and ranking. Specifically, the Train, Eval and Test sets contain 212, 8 and 20 sessions, respectively. Each session consists of a 15 to 30-minute discussion by a group of participants. The total number of participants in Train, Eval and Test sets is 456, 25 and 60, respectively, with balanced gender coverage.</p> |
| | | <p>The dataset is collected in 13 meeting venues, which are categorized into three types: small, medium, and large rooms with sizes ranging from 8 m<span class="math notranslate nohighlight">\(^{2}\)</span> to 55 m<span class="math notranslate nohighlight">\(^{2}\)</span>. Different rooms give us a variety of acoustic properties and layouts. The detailed parameters of each meeting venue will be released together with the Train data. The type of wall material of the meeting venues covers cement, glass, etc. Other furnishings in meeting venues include sofa, TV, blackboard, fan, air conditioner, plants, etc. During recording, the participants of the meeting sit around the microphone array which is placed on the table and conduct a natural conversation. The microphone-speaker distance ranges from 0.3 m to 5.0 m. All participants are native Chinese speakers speaking Mandarin without strong accents. During the meeting, various kinds of indoor noise including but not limited to clicking, keyboard, door opening/closing, fan, bubble noise, etc., are made naturally. For both Train and Eval sets, the participants are required to remain in the same position during recording. There is no speaker overlap between the Train and Eval sets. An example of the recording venue from the Train set is shown in Fig. 1.</p> |
| | | <p><img alt="meeting room" src="_images/meeting_room.png" /></p> |
| | | <p>The number of participants within one meeting session ranges from 2 to 4. To ensure the coverage of different overlap ratios, we select various meeting topics during recording, including medical treatment, education, business, organization management, industrial production and other daily routine meetings. The average speech overlap ratios of the Train, Eval and Test sets are 42.27%, 34.76% and 42.8%, respectively. More details of AliMeeting are shown in Table 1. A detailed overlap ratio distribution of meeting sessions with different numbers of speakers in the Train, Eval and Test sets is shown in Table 2.</p> |
| | |
| | | </ul> |
| | | </li> |
| | | <li class="toctree-l1"><a class="reference internal" href="Track_setting_and_evaluation.html">Track & Evaluation</a><ul> |
| | | <li class="toctree-l2"><a class="reference internal" href="Track_setting_and_evaluation.html#speaker-attributed-asr-main-track">Speaker-Attributed ASR (Main Track)</a></li> |
| | | <li class="toctree-l2"><a class="reference internal" href="Track_setting_and_evaluation.html#speaker-attributed-asr">Speaker-Attributed ASR</a></li> |
| | | <li class="toctree-l2"><a class="reference internal" href="Track_setting_and_evaluation.html#evaluation-metric">Evaluation metric</a></li> |
| | | <li class="toctree-l2"><a class="reference internal" href="Track_setting_and_evaluation.html#sub-track-arrangement">Sub-track arrangement</a></li> |
| | | </ul> |
| | |
| | | </section> |
| | | <section id="guidelines"> |
| | | <h2>Guidelines<a class="headerlink" href="#guidelines" title="Permalink to this heading">¶</a></h2> |
| | | <p>Interested participants, whether from academia or industry, must register for the challenge by completing a Google form, which will be available here. The deadline for registration is May 5, 2023.</p> |
| | | <p>Interested participants, whether from academia or industry, must register for the challenge by completing the Google form below. The deadline for registration is May 5, 2023.</p> |
| | | <p><a class="reference external" href="https://docs.google.com/forms/d/e/1FAIpQLSf77T9vAl7Ym-u5g8gXu18SBofoWRaFShBo26Ym0-HDxHW9PQ/viewform?usp=sf_link">M2MET2.0 registration</a></p> |
| | | <p>Within three working days, the challenge organizer will send email invitations to eligible teams to participate in the challenge. All qualified teams are required to adhere to the challenge rules, which will be published on the challenge page. Prior to the ranking release time, each participant must submit a system description document detailing their approach and methods. The organizer will select the top three submissions to be included in the ASRU2023 Proceedings.</p> |
| | | </section> |
| | | </section> |
| | |
| | | </ul> |
| | | </li> |
| | | <li class="toctree-l1"><a class="reference internal" href="Track_setting_and_evaluation.html">Track & Evaluation</a><ul> |
| | | <li class="toctree-l2"><a class="reference internal" href="Track_setting_and_evaluation.html#speaker-attributed-asr-main-track">Speaker-Attributed ASR (Main Track)</a></li> |
| | | <li class="toctree-l2"><a class="reference internal" href="Track_setting_and_evaluation.html#speaker-attributed-asr">Speaker-Attributed ASR</a></li> |
| | | <li class="toctree-l2"><a class="reference internal" href="Track_setting_and_evaluation.html#evaluation-metric">Evaluation metric</a></li> |
| | | <li class="toctree-l2"><a class="reference internal" href="Track_setting_and_evaluation.html#sub-track-arrangement">Sub-track arrangement</a></li> |
| | | </ul> |
| | |
| | | </ul> |
| | | </li> |
| | | <li class="toctree-l1"><a class="reference internal" href="Track_setting_and_evaluation.html">Track & Evaluation</a><ul> |
| | | <li class="toctree-l2"><a class="reference internal" href="Track_setting_and_evaluation.html#speaker-attributed-asr-main-track">Speaker-Attributed ASR (Main Track)</a></li> |
| | | <li class="toctree-l2"><a class="reference internal" href="Track_setting_and_evaluation.html#speaker-attributed-asr">Speaker-Attributed ASR</a></li> |
| | | <li class="toctree-l2"><a class="reference internal" href="Track_setting_and_evaluation.html#evaluation-metric">Evaluation metric</a></li> |
| | | <li class="toctree-l2"><a class="reference internal" href="Track_setting_and_evaluation.html#sub-track-arrangement">Sub-track arrangement</a></li> |
| | | </ul> |
| | |
| | | </ul> |
| | | </li> |
| | | <li class="toctree-l1 current"><a class="current reference internal" href="#">Track & Evaluation</a><ul> |
| | | <li class="toctree-l2"><a class="reference internal" href="#speaker-attributed-asr-main-track">Speaker-Attributed ASR (Main Track)</a></li> |
| | | <li class="toctree-l2"><a class="reference internal" href="#speaker-attributed-asr">Speaker-Attributed ASR</a></li> |
| | | <li class="toctree-l2"><a class="reference internal" href="#evaluation-metric">Evaluation metric</a></li> |
| | | <li class="toctree-l2"><a class="reference internal" href="#sub-track-arrangement">Sub-track arrangement</a></li> |
| | | </ul> |
| | |
| | | |
| | | <section id="track-evaluation"> |
| | | <h1>Track & Evaluation<a class="headerlink" href="#track-evaluation" title="Permalink to this heading">¶</a></h1> |
| | | <section id="speaker-attributed-asr-main-track"> |
| | | <h2>Speaker-Attributed ASR (Main Track)<a class="headerlink" href="#speaker-attributed-asr-main-track" title="Permalink to this heading">¶</a></h2> |
| | | <p>The speaker-attributed ASR task poses a unique challenge of transcribing speech from multiple speakers and assigning a speaker label to the transcription. Figure 2 illustrates the difference between the speaker-attributed ASR task and the multi-speaker ASR task. This track allows for the use of the AliMeeting, Aishell4, and Cn-Celeb datasets as constrained data sources during both training and evaluation. The AliMeeting dataset, which was used in the M2MeT challenge, includes Train, Eval, and Test sets. Additionally, a new Test-2023 set, consisting of approximately 10 hours of meeting data recorded in an identical acoustic setting as the AliMeeting corpus, will be released soon for challenge scoring and ranking. It's worth noting that the organizers will not provide the near-field audio, transcriptions, or oracle timestamps. Instead, segments containing multiple speakers will be provided on the Test-2023 set, which can be obtained using a simple voice activity detection (VAD) model.</p> |
| | | <section id="speaker-attributed-asr"> |
| | | <h2>Speaker-Attributed ASR<a class="headerlink" href="#speaker-attributed-asr" title="Permalink to this heading">¶</a></h2> |
| | | <p>The speaker-attributed ASR task poses a unique challenge of transcribing speech from multiple speakers and assigning a speaker label to the transcription. Figure 2 illustrates the difference between the speaker-attributed ASR task and the multi-speaker ASR task. This track allows for the use of the AliMeeting, Aishell4, and Cn-Celeb datasets as constrained data sources during both training and evaluation. The AliMeeting dataset, which was used in the M2MeT challenge, includes Train, Eval, and Test sets. Additionally, a new Test-2023 set, consisting of approximately 10 hours of meeting data recorded in an identical acoustic setting as the AliMeeting corpus, will be released soon for challenge scoring and ranking. It's worth noting that the organizers will not provide the near-field audio, transcriptions, or oracle timestamps of the Test-2023 set. Instead, segments containing multiple speakers will be provided, which can be obtained using a simple voice activity detection (VAD) model.</p> |
| | | <p><img alt="task difference" src="_images/task_diff.png" /></p> |
| | | </section> |
| | | <section id="evaluation-metric"> |
| | |
| | | ## Overview of training data |
| | | In the fixed training condition, the training dataset is restricted to three publicly available corpora, namely, AliMeeting, AISHELL-4, and CN-Celeb. To evaluate the performance of the models trained on these datasets, we will release a new Test set called Test-2023 for scoring and ranking. We will describe the AliMeeting dataset and the Test-2023 set in detail. |
| | | ## Detail of AliMeeting corpus |
| | | AliMeeting contains 118.75 hours of speech data in total. The dataset is divided into 104.75 hours for training (Train), 4 hours for evaluation (Eval) and 10 hours as test set (Test) for scoring and ranking. Specifically, the Train and Eval sets contain 212 and 8 sessions, respectively. Each session consists of a 15 to 30-minute discussion by a group of participants. The total number of participants in Train and Eval sets is 456 and 25, respectively, with balanced gender coverage. |
| | | AliMeeting contains 118.75 hours of speech data in total. The dataset is divided into 104.75 hours for training (Train), 4 hours for evaluation (Eval) and 10 hours as test set (Test) for scoring and ranking. Specifically, the Train, Eval and Test sets contain 212, 8 and 20 sessions, respectively. Each session consists of a 15 to 30-minute discussion by a group of participants. The total number of participants in Train, Eval and Test sets is 456, 25 and 60, respectively, with balanced gender coverage. |
| | | |
| | | The dataset is collected in 13 meeting venues, which are categorized into three types: small, medium, and large rooms with sizes ranging from 8 m$^{2}$ to 55 m$^{2}$. Different rooms give us a variety of acoustic properties and layouts. The detailed parameters of each meeting venue will be released together with the Train data. The type of wall material of the meeting venues covers cement, glass, etc. Other furnishings in meeting venues include sofa, TV, blackboard, fan, air conditioner, plants, etc. During recording, the participants of the meeting sit around the microphone array which is placed on the table and conduct a natural conversation. The microphone-speaker distance ranges from 0.3 m to 5.0 m. All participants are native Chinese speakers speaking Mandarin without strong accents. During the meeting, various kinds of indoor noise including but not limited to clicking, keyboard, door opening/closing, fan, bubble noise, etc., are made naturally. For both Train and Eval sets, the participants are required to remain in the same position during recording. There is no speaker overlap between the Train and Eval sets. An example of the recording venue from the Train set is shown in Fig. 1. |
| | | |
| | |
| | | |
| | | ## Guidelines |
| | | |
| | | Interested participants, whether from academia or industry, must register for the challenge by completing a Google form, which will be available here. The deadline for registration is May 5, 2023. |
| | | Interested participants, whether from academia or industry, must register for the challenge by completing the Google form below. The deadline for registration is May 5, 2023. |
| | | |
| | | [M2MET2.0 registration](https://docs.google.com/forms/d/e/1FAIpQLSf77T9vAl7Ym-u5g8gXu18SBofoWRaFShBo26Ym0-HDxHW9PQ/viewform?usp=sf_link) |
| | | |
| | | Within three working days, the challenge organizer will send email invitations to eligible teams to participate in the challenge. All qualified teams are required to adhere to the challenge rules, which will be published on the challenge page. Prior to the ranking release time, each participant must submit a system description document detailing their approach and methods. The organizer will select the top three submissions to be included in the ASRU2023 Proceedings. |
| | |
| | | # Track & Evaluation |
| | | ## Speaker-Attributed ASR (Main Track) |
| | | The speaker-attributed ASR task poses a unique challenge of transcribing speech from multiple speakers and assigning a speaker label to the transcription. Figure 2 illustrates the difference between the speaker-attributed ASR task and the multi-speaker ASR task. This track allows for the use of the AliMeeting, Aishell4, and Cn-Celeb datasets as constrained data sources during both training and evaluation. The AliMeeting dataset, which was used in the M2MeT challenge, includes Train, Eval, and Test sets. Additionally, a new Test-2023 set, consisting of approximately 10 hours of meeting data recorded in an identical acoustic setting as the AliMeeting corpus, will be released soon for challenge scoring and ranking. It's worth noting that the organizers will not provide the near-field audio, transcriptions, or oracle timestamps. Instead, segments containing multiple speakers will be provided on the Test-2023 set, which can be obtained using a simple voice activity detection (VAD) model. |
| | | ## Speaker-Attributed ASR |
| | | The speaker-attributed ASR task poses a unique challenge of transcribing speech from multiple speakers and assigning a speaker label to the transcription. Figure 2 illustrates the difference between the speaker-attributed ASR task and the multi-speaker ASR task. This track allows for the use of the AliMeeting, Aishell4, and Cn-Celeb datasets as constrained data sources during both training and evaluation. The AliMeeting dataset, which was used in the M2MeT challenge, includes Train, Eval, and Test sets. Additionally, a new Test-2023 set, consisting of approximately 10 hours of meeting data recorded in an identical acoustic setting as the AliMeeting corpus, will be released soon for challenge scoring and ranking. It's worth noting that the organizers will not provide the near-field audio, transcriptions, or oracle timestamps of the Test-2023 set. Instead, segments containing multiple speakers will be provided, which can be obtained using a simple voice activity detection (VAD) model. |
| | | |
| | |  |
| | | |
| | |
| | | </ul> |
| | | </li> |
| | | <li class="toctree-l1"><a class="reference internal" href="Track_setting_and_evaluation.html">Track & Evaluation</a><ul> |
| | | <li class="toctree-l2"><a class="reference internal" href="Track_setting_and_evaluation.html#speaker-attributed-asr-main-track">Speaker-Attributed ASR (Main Track)</a></li> |
| | | <li class="toctree-l2"><a class="reference internal" href="Track_setting_and_evaluation.html#speaker-attributed-asr">Speaker-Attributed ASR</a></li> |
| | | <li class="toctree-l2"><a class="reference internal" href="Track_setting_and_evaluation.html#evaluation-metric">Evaluation metric</a></li> |
| | | <li class="toctree-l2"><a class="reference internal" href="Track_setting_and_evaluation.html#sub-track-arrangement">Sub-track arrangement</a></li> |
| | | </ul> |
| | |
| | | </ul> |
| | | </li> |
| | | <li class="toctree-l1"><a class="reference internal" href="Track_setting_and_evaluation.html">Track & Evaluation</a><ul> |
| | | <li class="toctree-l2"><a class="reference internal" href="Track_setting_and_evaluation.html#speaker-attributed-asr-main-track">Speaker-Attributed ASR (Main Track)</a></li> |
| | | <li class="toctree-l2"><a class="reference internal" href="Track_setting_and_evaluation.html#speaker-attributed-asr">Speaker-Attributed ASR</a></li> |
| | | <li class="toctree-l2"><a class="reference internal" href="Track_setting_and_evaluation.html#evaluation-metric">Evaluation metric</a></li> |
| | | <li class="toctree-l2"><a class="reference internal" href="Track_setting_and_evaluation.html#sub-track-arrangement">Sub-track arrangement</a></li> |
| | | </ul> |
| | |
| | | </ul> |
| | | </li> |
| | | <li class="toctree-l1"><a class="reference internal" href="Track_setting_and_evaluation.html">Track & Evaluation</a><ul> |
| | | <li class="toctree-l2"><a class="reference internal" href="Track_setting_and_evaluation.html#speaker-attributed-asr-main-track">Speaker-Attributed ASR (Main Track)</a></li> |
| | | <li class="toctree-l2"><a class="reference internal" href="Track_setting_and_evaluation.html#speaker-attributed-asr">Speaker-Attributed ASR</a></li> |
| | | <li class="toctree-l2"><a class="reference internal" href="Track_setting_and_evaluation.html#evaluation-metric">Evaluation metric</a></li> |
| | | <li class="toctree-l2"><a class="reference internal" href="Track_setting_and_evaluation.html#sub-track-arrangement">Sub-track arrangement</a></li> |
| | | </ul> |
| | |
[], "commun": [], "laboratori": [], "hccl": [], "xian": [], "lead": [], "aslp": [], "npu": [], "200": [], "refer": 6, "journal": [], "ieee": 4, "acm": [], "transact": [], "multimedia": [], "interspeech": [], "icassp": [], "acl": [], "best": [], "award": [], "flagship": [], "hi": [], "interact": [], "dr": [], "editor": [], "ae": [], "tran": [], "activ": 6, "serv": [], "chair": [], "mani": [], "committe": [], "member": [], "aik": 4, "lee": 4, "institut": 4, "infocomm": 4, "star": 4, "singapor": 4, "kongaik": 4, "org": 4, "start": [], "off": [], "him": [], "career": [], "leader": [], "strateg": [], "plan": [], "2018": [], "2020": [], "spent": [], "half": [], "nec": [], "corpor": [], "japan": [], "veri": [], "much": [], "voic": 6, "biometr": [], "modal": [], "proud": [], "great": [], "featur": [], "bio": [], "idiom": [], "platform": [], "return": [], "now": [], "analyt": [], "pi": [], "elsevi": [], "sinc": [], "2016": [], "2017": [], "2021": [], "am": [], "elect": [], "2019": [], "zhiji": 4, "yan": 4, "princip": 4, "engin": 4, "alibaba": 4, "yzj": 4, "inc": 4, "hold": [], "phd": [], "electr": [], "expert": [], "review": [], "academ": [], "synthesi": [], "voiceprint": [], "appli": 4, "servic": [], "ant": [], "financi": [], "titl": [], "One": [], "100": 6, "grassroot": [], "shiliang": 4, "zhang": 4, "sly": 4, "zsl": 4, "graduat": [], "mainli": [], "understand": [], "machin": [], "learn": [], "40": [], "mainstream": [], "dozen": [], "patent": [], "after": [], "obtain": [5, 6], "doctor": [], "intellig": [], "direct": [], "fundament": [], "damo": [], "academi": [], "yanmin": 4, "qian": 4, "shanghai": 4, "jiao": 4, "tong": 4, "yanminqian": 4, "sjtu": 4, "b": [], "huazhong": [], "wuhan": [], "tsinghua": [], "beij": [], "2012": [], "2013": [], "where": 6, "2015": [], "cambridg": [], "k": [], "isca": [], "found": [], "kaldi": [], "toolkit": [], "than": [], "110": [], "4000": [], "citat": [], "kei": [], "word": [], "spot": [], "zhuo": 4, "chen": 4, "microsoft": 4, "usa": 4, 
"zhuc": 4, "columbia": [], "york": [], "ny": [], "author": [], "coauthor": [], "80": [], "peer": [], "6000": [], "ten": [], "separ": [], "diaris": [], "event": [], "won": [], "contribut": [], "sourc": 6, "wsj0": [], "2mix": [], "libricss": [], "benchmark": [], "jelinek": [], "student": [], "push": [], "jian": 4, "wu": 4, "wujian": 4, "master": [], "robust": [], "enhanc": [], "dereverber": [], "public": [], "1200": [], "chime5": [], "dn": [], "ffsvc": [], "slt": [], "taslp": [], "spl": [], "hui": 4, "bu": 4, "ceo": 4, "foundat": 4, "buhui": 4, "aishelldata": 4, "artifici": [], "korea": [], "2014": [], "founder": [], "dmash": [], "mia": [], "databas": [], "project": [], "co": [], "forum": [], "should": 5, "augment": 5, "allow": [5, 6], "ad": 5, "speed": 5, "perturb": 5, "tone": 5, "chang": 5, "permit": 5, "purpos": 5, "instead": [5, 6], "util": [5, 6], "tune": 5, "violat": 5, "strictli": [5, 6], "prohibit": [5, 6], "fine": 5, "fusion": 5, "structur": 5, "encourag": 5, "cpcer": [5, 6], "lower": 5, "judg": 5, "superior": 5, "forc": 5, "align": 5, "frame": 5, "level": 5, "classif": 5, "basi": 5, "shallow": 5, "end": 5, "e": [5, 6], "g": 5, "la": 5, "rnnt": 5, "transform": [5, 6], "come": 5, "right": 5, "interpret": 5, "belong": 5, "case": 5, "circumst": 5, "coordin": 5, "assign": 6, "illustr": 6, "aishell4": 6, "constrain": 6, "addition": 6, "corpu": 6, "soon": 6, "simpl": 6, "detect": 6, "vad": 6, "concaten": 6, "minimum": 6, "permut": 6, "charact": 6, "error": 6, "rate": 6, "calcul": 6, "step": 6, "firstli": 6, "hypothesi": 6, "chronolog": 6, "order": 6, "secondli": 6, "cer": 6, "repeat": 6, "lowest": 6, "tthe": 6, "insert": 6, "Ins": 6, "substitut": 6, "delet": 6, "del": 6, "output": 6, "text": 6, "frac": 6, "mathcal": 6, "n_": 6, "usag": 6, "third": 6, "hug": 6, "face": 6, "list": 6, "clearli": 6, "privat": 6, "manual": 6, "simul": 6, "thei": 6, "mandatori": 6, "clear": 6, "scheme": 6, "delight": 7, "introduct": 7, "contact": 7, "index": [], "modul": [], "search": 
[]}, "objects": {}, "objtypes": {}, "objnames": {}, "titleterms": {"baselin": 0, "overview": [0, 2], "quick": 0, "start": 0, "result": 0, "contact": 1, "dataset": 2, "train": [2, 6], "data": 2, "detail": 2, "alimeet": 2, "corpu": 2, "get": 2, "introduct": 3, "call": 3, "particip": 3, "timelin": 3, "aoe": 3, "time": 3, "guidelin": 3, "organ": 4, "rule": 5, "track": 6, "evalu": 6, "speaker": 6, "attribut": 6, "asr": 6, "main": 6, "metric": 6, "sub": 6, "arrang": 6, "i": 6, "fix": 6, "condit": 6, "ii": 6, "open": 6, "asru": 7, "2023": 7, "multi": 7, "channel": 7, "parti": 7, "meet": 7, "transcript": 7, "challeng": 7, "2": 7, "0": 7, "m2met2": 7, "content": 7, "indic": [], "tabl": []}, "envversion": {"sphinx.domains.c": 2, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 8, "sphinx.domains.index": 1, "sphinx.domains.javascript": 2, "sphinx.domains.math": 2, "sphinx.domains.python": 3, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx": 57}, "alltitles": {"Baseline": [[0, "baseline"]], "Overview": [[0, "overview"]], "Quick start": [[0, "quick-start"]], "Baseline results": [[0, "baseline-results"]], "Contact": [[1, "contact"]], "Datasets": [[2, "datasets"]], "Overview of training data": [[2, "overview-of-training-data"]], "Detail of AliMeeting corpus": [[2, "detail-of-alimeeting-corpus"]], "Get the data": [[2, "get-the-data"]], "Introduction": [[3, "introduction"]], "Call for participation": [[3, "call-for-participation"]], "Timeline(AOE Time)": [[3, "timeline-aoe-time"]], "Guidelines": [[3, "guidelines"]], "Organizers": [[4, "organizers"]], "Rules": [[5, "rules"]], "Track & Evaluation": [[6, "track-evaluation"]], "Speaker-Attributed ASR (Main Track)": [[6, "speaker-attributed-asr-main-track"]], "Evaluation metric": [[6, "evaluation-metric"]], "Sub-track arrangement": [[6, "sub-track-arrangement"]], "Sub-track I (Fixed Training Condition):": [[6, "sub-track-i-fixed-training-condition"]], "Sub-track II (Open Training Condition):": 
[[6, "sub-track-ii-open-training-condition"]], "ASRU 2023 MULTI-CHANNEL MULTI-PARTY MEETING TRANSCRIPTION CHALLENGE 2.0 (M2MeT2.0)": [[7, "asru-2023-multi-channel-multi-party-meeting-transcription-challenge-2-0-m2met2-0"]], "Contents:": [[7, null]]}, "indexentries": {}}) |
| | | Search.setIndex({"docnames": ["Baseline", "Contact", "Dataset", "Introduction", "Organizers", "Rules", "Track_setting_and_evaluation", "index"], "filenames": ["Baseline.md", "Contact.md", "Dataset.md", "Introduction.md", "Organizers.md", "Rules.md", "Track_setting_and_evaluation.md", "index.rst"], "titles": ["Baseline", "Contact", "Datasets", "Introduction", "Organizers", "Rules", "Track & Evaluation", "ASRU 2023 MULTI-CHANNEL MULTI-PARTY MEETING TRANSCRIPTION CHALLENGE 2.0 (M2MeT2.0)"], "terms": {"we": [0, 2, 3, 7], "releas": [0, 2, 3, 6], "an": [0, 2, 3, 6], "e2": 0, "sa": 0, "asr": [0, 3, 7], "cite": 0, "kanda21b_interspeech": 0, "conduct": [0, 2], "funasr": 0, "time": [0, 6], "accord": [0, 3], "timelin": [0, 2], "The": [0, 2, 3, 5, 6], "model": [0, 2, 3, 5, 6], "architectur": 0, "i": [0, 2, 3, 5], "shown": [0, 2], "figur": [0, 6], "3": [0, 2, 3], "speakerencod": 0, "initi": 0, "pre": [0, 6], "train": [0, 3, 5, 7], "speaker": [0, 2, 3, 7], "verif": 0, "from": [0, 2, 3, 5, 6], "modelscop": [0, 6], "thi": [0, 3, 5, 6], "also": [0, 2, 6], "us": [0, 2, 5, 6], "extract": 0, "embed": 0, "profil": 0, "todo": 0, "fill": 0, "readm": 0, "md": 0, "system": [0, 3, 5, 6, 7], "ar": [0, 2, 3, 5, 6, 7], "tabl": [0, 2], "adopt": 0, "oracl": [0, 6], "dure": [0, 2, 6], "howev": [0, 3, 6], "due": [0, 3], "lack": 0, "label": [0, 5, 6], "evalu": [0, 2, 3, 7], "provid": [0, 2, 6, 7], "addit": [0, 6], "spectral": 0, "cluster": 0, "meanwhil": 0, "eval": [0, 2, 5, 6], "test": [0, 2, 3, 5, 6], "set": [0, 2, 3, 5, 6], "show": 0, "impact": 0, "accuraci": [0, 6], "If": [1, 5, 6], "you": 1, "have": [1, 3], "ani": [1, 5, 6], "question": 1, "about": 1, "m2met2": [1, 3], "0": [1, 2, 3], "challeng": [1, 3, 5, 6], "pleas": 1, "u": [1, 2], "email": [1, 3, 4], "m2met": [1, 3, 6, 7], "alimeet": [1, 6], "gmail": 1, "com": [1, 4], "wechat": 1, "group": [1, 2], "In": [2, 3, 5], "fix": [2, 3, 7], "condit": [2, 3, 7], "restrict": 2, "three": [2, 3, 6], "publicli": [2, 6], "avail": [2, 6], "corpora": 
2, "name": 2, "aishel": [2, 4, 6], "4": [2, 6], "cn": [2, 4, 6], "celeb": [2, 6], "To": [2, 3, 7], "perform": [2, 3], "new": [2, 3, 6], "call": 2, "2023": [2, 3, 5, 6], "score": [2, 6], "rank": [2, 3, 6], "describ": 2, "contain": [2, 6], "118": 2, "75": 2, "hour": [2, 3, 6], "speech": [2, 3, 6, 7], "total": [2, 6], "divid": [2, 6], "104": 2, "10": [2, 3, 6], "specif": [2, 6], "212": 2, "8": 2, "20": 2, "session": [2, 3, 6, 7], "respect": 2, "each": [2, 3, 6], "consist": [2, 6], "15": 2, "30": 2, "minut": 2, "discuss": 2, "particip": [2, 5, 6], "number": [2, 3, 6], "456": 2, "25": 2, "60": 2, "balanc": 2, "gender": 2, "coverag": 2, "collect": 2, "13": [2, 3], "meet": [2, 3, 6], "venu": 2, "which": [2, 3, 6], "categor": 2, "type": 2, "small": 2, "medium": 2, "larg": [2, 3], "room": [2, 3], "size": 2, "rang": 2, "m": 2, "2": [2, 6], "55": 2, "differ": [2, 3, 5, 6], "give": 2, "varieti": 2, "acoust": [2, 3, 6], "properti": 2, "layout": 2, "paramet": [2, 5], "togeth": 2, "wall": 2, "materi": 2, "cover": 2, "cement": 2, "glass": 2, "etc": 2, "other": 2, "furnish": 2, "includ": [2, 3, 5, 6], "sofa": 2, "tv": 2, "blackboard": 2, "fan": 2, "air": 2, "condition": 2, "plant": 2, "record": [2, 6], "sit": 2, "around": 2, "microphon": [2, 3], "arrai": [2, 3], "place": 2, "natur": 2, "convers": 2, "distanc": 2, "5": [2, 3], "all": [2, 3, 5, 6], "nativ": 2, "chines": 2, "speak": [2, 3], "mandarin": [2, 3], "without": 2, "strong": 2, "accent": 2, "variou": [2, 3], "kind": 2, "indoor": 2, "nois": [2, 3, 5], "limit": [2, 3, 5], "click": 2, "keyboard": 2, "door": 2, "open": [2, 3, 7], "close": 2, "bubbl": 2, "made": [2, 3], "For": 2, "both": [2, 6], "requir": [2, 3, 6], "remain": [2, 3], "same": [2, 5], "posit": 2, "There": 2, "overlap": [2, 3], "between": [2, 6], "exampl": 2, "fig": 2, "1": 2, "within": [2, 3], "one": [2, 5], "ensur": 2, "ratio": 2, "select": [2, 3, 5, 6], "topic": 2, "medic": 2, "treatment": 2, "educ": 2, "busi": 2, "organ": [2, 3, 5, 6, 7], "manag": 2, "industri": 
[2, 3], "product": 2, "daili": 2, "routin": 2, "averag": 2, "42": 2, "27": 2, "34": 2, "76": 2, "more": 2, "A": [2, 4], "distribut": 2, "were": 2, "ident": [2, 6], "compris": [2, 3, 7], "therebi": 2, "share": 2, "similar": 2, "configur": 2, "field": [2, 3, 6], "signal": [2, 3], "headset": 2, "onli": [2, 5, 6], "": [2, 6], "own": 2, "transcrib": [2, 3, 6], "It": [2, 6], "worth": [2, 6], "note": [2, 6], "far": [2, 3], "audio": [2, 3, 6], "synchron": 2, "common": 2, "transcript": [2, 3, 5, 6], "prepar": 2, "textgrid": 2, "format": 2, "inform": [2, 3], "durat": 2, "id": 2, "segment": [2, 6], "timestamp": [2, 6], "mention": 2, "abov": 2, "can": [2, 3, 5, 6], "download": 2, "openslr": 2, "via": 2, "follow": [2, 5], "link": 2, "particularli": 2, "baselin": [2, 3, 7], "conveni": 2, "script": 2, "automat": [3, 7], "recognit": [3, 7], "diariz": 3, "signific": 3, "stride": 3, "recent": 3, "year": 3, "result": 3, "surg": 3, "technologi": 3, "applic": 3, "across": 3, "domain": 3, "present": 3, "uniqu": [3, 6], "complex": [3, 5], "divers": 3, "style": 3, "variabl": 3, "confer": 3, "environment": 3, "reverber": [3, 5], "over": 3, "sever": 3, "been": 3, "advanc": [3, 7], "develop": [3, 6], "rich": 3, "comput": [3, 5], "hear": 3, "multisourc": 3, "environ": 3, "chime": 3, "latest": 3, "iter": 3, "ha": 3, "particular": 3, "focu": 3, "distant": 3, "gener": 3, "topologi": 3, "scenario": 3, "while": 3, "progress": 3, "english": 3, "languag": [3, 5], "barrier": 3, "achiev": 3, "compar": 3, "non": 3, "multimod": 3, "base": 3, "process": [3, 6], "misp": 3, "multi": [3, 5, 6], "channel": 3, "parti": [3, 6], "instrument": 3, "seek": 3, "address": 3, "problem": 3, "visual": 3, "everydai": 3, "home": 3, "focus": 3, "tackl": 3, "issu": 3, "offlin": 3, "icassp2022": 3, "two": [3, 5, 7], "main": 3, "task": [3, 6, 7], "former": 3, "involv": [3, 6], "identifi": 3, "who": 3, "spoke": 3, "when": 3, "latter": 3, "aim": 3, "multipl": [3, 6], "simultan": 3, "pose": [3, 6], "technic": 3, "difficulti": 
3, "interfer": 3, "build": [3, 6, 7], "success": [3, 7], "previou": 3, "excit": 3, "propos": [3, 7], "asru2023": [3, 7], "special": [3, 5, 7], "origin": [3, 5], "metric": [3, 7], "wa": [3, 6], "independ": 3, "meant": 3, "could": 3, "determin": 3, "correspond": [3, 5], "further": 3, "current": [3, 7], "talker": [3, 7], "toward": 3, "practic": 3, "attribut": [3, 7], "sub": [3, 5, 7], "track": [3, 5, 7], "what": 3, "facilit": [3, 7], "reproduc": [3, 7], "research": [3, 4, 7], "offer": 3, "comprehens": [3, 7], "overview": [3, 7], "dataset": [3, 5, 6, 7], "rule": [3, 7], "furthermor": 3, "carefulli": 3, "curat": 3, "approxim": [3, 6], "design": 3, "enabl": 3, "valid": 3, "state": [3, 6, 7], "art": [3, 7], "area": 3, "mai": 3, "th": 3, "registr": 3, "deadlin": 3, "date": 3, "join": 3, "june": 3, "9": 3, "data": [3, 5, 6], "rd": 3, "final": [3, 5, 6], "submiss": 3, "19": 3, "juli": 3, "paper": [3, 6], "decemb": 3, "12": 3, "nd": 3, "16": 3, "asru": 3, "workshop": 3, "interest": 3, "whether": 3, "academia": 3, "must": [3, 5, 6], "regist": 3, "complet": 3, "googl": 3, "form": 3, "below": 3, "work": 3, "dai": 3, "send": 3, "invit": 3, "elig": [3, 5], "team": 3, "qualifi": 3, "adher": [3, 5], "publish": 3, "page": 3, "prior": 3, "submit": 3, "descript": [3, 6], "document": 3, "detail": [3, 6], "approach": [3, 5], "method": 3, "top": 3, "proceed": 3, "lei": 4, "xie": 4, "professor": 4, "northwestern": 4, "polytechn": 4, "univers": 4, "china": 4, "lxie": 4, "nwpu": 4, "edu": 4, "kong": 4, "aik": 4, "lee": 4, "senior": 4, "scientist": 4, "institut": 4, "infocomm": 4, "star": 4, "singapor": 4, "kongaik": 4, "ieee": 4, "org": 4, "zhiji": 4, "yan": 4, "princip": 4, "engin": 4, "alibaba": 4, "yzj": 4, "inc": 4, "shiliang": 4, "zhang": 4, "sly": 4, "zsl": 4, "yanmin": 4, "qian": 4, "shanghai": 4, "jiao": 4, "tong": 4, "yanminqian": 4, "sjtu": 4, "zhuo": 4, "chen": 4, "appli": 4, "microsoft": 4, "usa": 4, "zhuc": 4, "jian": 4, "wu": 4, "wujian": 4, "hui": 4, "bu": 4, "ceo": 4, 
"foundat": 4, "buhui": 4, "aishelldata": 4, "should": 5, "augment": 5, "allow": [5, 6], "ad": 5, "speed": 5, "perturb": 5, "tone": 5, "chang": 5, "permit": 5, "purpos": 5, "instead": [5, 6], "util": [5, 6], "tune": 5, "violat": 5, "strictli": [5, 6], "prohibit": [5, 6], "fine": 5, "fusion": 5, "structur": 5, "encourag": 5, "cpcer": [5, 6], "lower": 5, "judg": 5, "superior": 5, "forc": 5, "align": 5, "obtain": [5, 6], "frame": 5, "level": 5, "classif": 5, "basi": 5, "shallow": 5, "end": 5, "e": [5, 6], "g": 5, "la": 5, "rnnt": 5, "transform": [5, 6], "come": 5, "right": 5, "interpret": 5, "belong": 5, "case": 5, "circumst": 5, "coordin": 5, "assign": 6, "illustr": 6, "aishell4": 6, "constrain": 6, "sourc": 6, "addition": 6, "corpu": 6, "soon": 6, "simpl": 6, "voic": 6, "activ": 6, "detect": 6, "vad": 6, "concaten": 6, "minimum": 6, "permut": 6, "charact": 6, "error": 6, "rate": 6, "calcul": 6, "step": 6, "firstli": 6, "refer": 6, "hypothesi": 6, "chronolog": 6, "order": 6, "secondli": 6, "cer": 6, "repeat": 6, "possibl": 6, "lowest": 6, "tthe": 6, "insert": 6, "Ins": 6, "substitut": 6, "delet": 6, "del": 6, "output": 6, "text": 6, "frac": 6, "mathcal": 6, "n_": 6, "100": 6, "where": 6, "usag": 6, "third": 6, "hug": 6, "face": 6, "list": 6, "clearli": 6, "privat": 6, "manual": 6, "simul": 6, "thei": 6, "mandatori": 6, "clear": 6, "scheme": 6, "delight": 7, "introduct": 7, "contact": 7}, "objects": {}, "objtypes": {}, "objnames": {}, "titleterms": {"baselin": 0, "overview": [0, 2], "quick": 0, "start": 0, "result": 0, "contact": 1, "dataset": 2, "train": [2, 6], "data": 2, "detail": 2, "alimeet": 2, "corpu": 2, "get": 2, "introduct": 3, "call": 3, "particip": 3, "timelin": 3, "aoe": 3, "time": 3, "guidelin": 3, "organ": 4, "rule": 5, "track": 6, "evalu": 6, "speaker": 6, "attribut": 6, "asr": 6, "metric": 6, "sub": 6, "arrang": 6, "i": 6, "fix": 6, "condit": 6, "ii": 6, "open": 6, "asru": 7, "2023": 7, "multi": 7, "channel": 7, "parti": 7, "meet": 7, "transcript": 7, 
"challeng": 7, "2": 7, "0": 7, "m2met2": 7, "content": 7}, "envversion": {"sphinx.domains.c": 2, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 8, "sphinx.domains.index": 1, "sphinx.domains.javascript": 2, "sphinx.domains.math": 2, "sphinx.domains.python": 3, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx": 57}, "alltitles": {"Baseline": [[0, "baseline"]], "Overview": [[0, "overview"]], "Quick start": [[0, "quick-start"]], "Baseline results": [[0, "baseline-results"]], "Contact": [[1, "contact"]], "Datasets": [[2, "datasets"]], "Overview of training data": [[2, "overview-of-training-data"]], "Detail of AliMeeting corpus": [[2, "detail-of-alimeeting-corpus"]], "Get the data": [[2, "get-the-data"]], "Introduction": [[3, "introduction"]], "Call for participation": [[3, "call-for-participation"]], "Timeline(AOE Time)": [[3, "timeline-aoe-time"]], "Guidelines": [[3, "guidelines"]], "Organizers": [[4, "organizers"]], "Rules": [[5, "rules"]], "Track & Evaluation": [[6, "track-evaluation"]], "Speaker-Attributed ASR": [[6, "speaker-attributed-asr"]], "Evaluation metric": [[6, "evaluation-metric"]], "Sub-track arrangement": [[6, "sub-track-arrangement"]], "Sub-track I (Fixed Training Condition):": [[6, "sub-track-i-fixed-training-condition"]], "Sub-track II (Open Training Condition):": [[6, "sub-track-ii-open-training-condition"]], "ASRU 2023 MULTI-CHANNEL MULTI-PARTY MEETING TRANSCRIPTION CHALLENGE 2.0 (M2MeT2.0)": [[7, "asru-2023-multi-channel-multi-party-meeting-transcription-challenge-2-0-m2met2-0"]], "Contents:": [[7, null]]}, "indexentries": {}}) |
| | |
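| | | In the fixed training condition, the training data is restricted to three publicly available corpora: AliMeeting, AISHELL-4, and CN-Celeb. To evaluate the models submitted by participants, we will release a new test set (Test-2023) for scoring and ranking. Below we describe the AliMeeting dataset and the Test-2023 test set in detail. |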
| | | |
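| | | ## Introduction to the AliMeeting dataset |
| | | AliMeeting contains 118.75 hours of speech data in total, comprising a 104.75-hour training set (Train), a 4-hour evaluation set (Eval), and a 10-hour test set (Test). The Train and Eval sets contain 212 and 8 sessions, respectively; in each session, several speakers hold a 15- to 30-minute discussion. The Train and Eval sets involve 456 and 25 participants in total, respectively, with a balanced gender ratio. |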
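| | | AliMeeting contains 118.75 hours of speech data in total, comprising a 104.75-hour training set (Train), a 4-hour evaluation set (Eval), and a 10-hour test set (Test). The Train, Eval, and Test sets contain 212, 8, and 20 sessions, respectively; in each session, several speakers hold a 15- to 30-minute discussion. The Train, Eval, and Test sets involve 456, 25, and 60 participants in total, respectively, with a balanced gender ratio. |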
| | | |
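| | | The dataset was collected in 13 different meeting rooms, categorized by size as small, medium, or large, with floor areas ranging from 8 to 55 square meters. The rooms differ in layout and acoustic properties, and the detailed parameters of each room will also be sent to participants. Wall materials in the meeting venues include cement, glass, etc. Furnishings include sofas, TVs, blackboards, fans, air conditioners, plants, etc. During recording, the microphone array is placed on the table and the speakers sit around it, holding a natural conversation. The distance between the microphone array and the speakers ranges from about 0.3 to 5.0 meters. All speakers are native Chinese speakers who speak Mandarin without strong accents. Various kinds of indoor noise may occur during recording, including keyboard sounds, doors opening and closing, fan noise, bubble noise, etc. All speakers remain in the same position throughout the recording and do not walk around. There is no speaker overlap between the training and evaluation sets. Figure 1 shows the layout of one meeting room and the topology of the microphone array. |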
| | | |
| | |  |
| | | |
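| | | The number of speakers per session ranges from 2 to 4. To cover a variety of meeting scenarios, we selected a range of meeting topics, including regular meetings on medical treatment, education, business, organizational management, industrial production, and other subjects. The average speech overlap ratios of the Train, Eval, and Test sets are 42.27\% and 34.76\%, respectively. Details of the AliMeeting Train, Eval, and Test sets are given in Table 1. Table 2 shows the speech overlap ratio and the number of sessions for meetings with different numbers of speakers in the Train, Eval, and Test sets. |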
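| | | The number of speakers per session ranges from 2 to 4. To cover a variety of meeting scenarios, we selected a range of meeting topics, including regular meetings on medical treatment, education, business, organizational management, industrial production, and other subjects. The average speech overlap ratios of the Train, Eval, and Test sets are 42.27\%, 34.76\%, and 42.8\%, respectively. Details of the AliMeeting Train, Eval, and Test sets are given in Table 1. Table 2 shows the speech overlap ratio and the number of sessions for meetings with different numbers of speakers in the Train, Eval, and Test sets. |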
| | | |
| | |  |
| | | Test-2023æµè¯éç±20åºä¼è®®ç»æï¼è¿äºä¼è®®æ¯å¨ä¸AliMeetingæ°æ®éç¸åç声å¦ç¯å¢ä¸å½å¶çãTest-2023æµè¯éä¸çæ¯ä¸ªä¼è®®ç¯èç±2å°4个åä¸è
ç»æå¹¶ä¸ä¸AliMeetingæµè¯éçé
ç½®ç¸ä¼¼ã |
| | |
| | | |
| | | All prospective participants, from both academia and industry, should fill in the Google form below on or before May 5, 2023: |
| | | |
| | | [M2MET2.0 registration](https://docs.google.com/forms/d/e/1FAIpQLSf77T9vAl7Ym-u5g8gXu18SBofoWRaFShBo26Ym0-HDxHW9PQ/viewform?usp=sf_link) |
| | | |
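| | | The organizers will notify eligible teams by email within 3 working days. Teams must abide by the challenge rules that will be published on the challenge website. Before the rankings are released, each participant must submit a system description document detailing the methods and models used. The organizers will select the top three for inclusion in the ASRU2023 proceedings. |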
| | |
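| | | # Track Setting and Evaluation |
| | | ## Speaker-Attributed ASR (Main Track) |
| | | The speaker-attributed ASR task requires recognizing each speaker's speech from overlapped speech and assigning a speaker label to the recognized content. Figure 2 illustrates the main differences between the speaker-attributed ASR task and the multi-speaker ASR task. In this challenge, the AliMeeting, AISHELL-4, and CN-Celeb datasets may be used as constrained data sources. The AliMeeting dataset used in the M2MeT challenge contains training, evaluation, and test sets, all of which may be used for training and evaluation in M2MET2.0. In addition, a new Test-2023 set containing about 10 hours of meeting data will be released according to the timeline and used for scoring and ranking in the challenge. It is worth noting that the organizers will not provide the near-field headset audio, transcriptions, or oracle timestamps. Instead of oracle timestamps for each speaker, the organizers will provide segments containing multiple speakers in the Test-2023 set. These segments can be obtained with a simple VAD model. |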
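| | | ## Speaker-Attributed ASR |
| | | The speaker-attributed ASR task requires recognizing each speaker's speech from overlapped speech and assigning a speaker label to the recognized content. Figure 2 illustrates the main differences between the speaker-attributed ASR task and the multi-speaker ASR task. In this challenge, the AliMeeting, AISHELL-4, and CN-Celeb datasets may be used as constrained data sources. The AliMeeting dataset used in the M2MeT challenge contains training, evaluation, and test sets, all of which may be used for training and evaluation in M2MET2.0. In addition, a new Test-2023 set containing about 10 hours of meeting data will be released according to the timeline and used for scoring and ranking in the challenge. It is worth noting that, for the Test-2023 test set, the organizers will no longer provide the near-field headset audio, transcriptions, or oracle timestamps. Instead, they will provide segments containing multiple speakers, which can be obtained with a simple VAD model. |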
| | | |
| | |  |
| | | |
| | |
| | | where $\mathcal N_{\text{Ins}}$, $\mathcal N_{\text{Sub}}$, and $\mathcal N_{\text{Del}}$ are the numbers of characters for the three error types, and $\mathcal N_{\text{Total}}$ is the total number of characters. |
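
As a rough illustration of the quantities defined here, the following Python sketch computes a character error rate from the character-level edit distance, which equals the combined count of insertions, substitutions, and deletions. It assumes the rate is that count divided by the total number of reference characters, times 100. The `cp_cer` helper further assumes that the challenge metric takes the lowest rate over all speaker permutations and that there are as many hypothesis speakers as reference speakers. All function names are hypothetical, and this is not the official scoring script.

```python
from itertools import permutations


def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance: N_Ins + N_Sub + N_Del."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """CER in percent, assuming (N_Ins + N_Sub + N_Del) / N_Total * 100."""
    return edit_distance(ref, hyp) / len(ref) * 100


def cp_cer(refs: list[str], hyps: list[str]) -> float:
    """Lowest CER over speaker permutations (assumed definition).

    refs and hyps hold one concatenated transcript per speaker and
    are assumed to have the same length.
    """
    total = sum(len(r) for r in refs)
    best = min(
        sum(edit_distance(r, h) for r, h in zip(refs, perm))
        for perm in permutations(hyps)
    )
    return best / total * 100
```

The permutation search is factorial in the number of speakers, which stays small here because each session has only 2 to 4 speakers.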
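| | | ## Sub-track arrangement |
| | | ### Sub-track I (Fixed Training Condition): |
| | | When building their systems, participants may use only AliMeeting, AISHELL-4, and CN Celeb; the use of any additional data is strictly prohibited. Participants may use any third-party open-source pre-trained models, such as those available on [Hugging Face](https://huggingface.co/models) and [ModelScope](https://www.modelscope.cn/models). Participants must list the names and links of the pre-trained models they use in the final system description document. |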
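| | | When building their systems, participants may use only AliMeeting, AISHELL-4, and CN-Celeb; the use of any additional data is strictly prohibited. Participants may use any third-party open-source pre-trained models, such as those available on [Hugging Face](https://huggingface.co/models) and [ModelScope](https://www.modelscope.cn/models). Participants must list the names and links of the pre-trained models they use in the final system description document. |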
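| | | ### Sub-track II (Open Training Condition): |
| | | In addition to the fixed-condition data, participants may use any publicly available, privately recorded, or simulated datasets. However, participants must clearly list the data they use. Likewise, participants may use any third-party open-source pre-trained models, but must explicitly list the data and model links in the final system description document. If simulated data is used, please describe the data simulation scheme in detail. |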
| | |
| | | </ul> |
| | | </li> |
| | | <li class="toctree-l1"><a class="reference internal" href="%E8%B5%9B%E9%81%93%E8%AE%BE%E7%BD%AE%E4%B8%8E%E8%AF%84%E4%BC%B0.html">Track Setting and Evaluation</a><ul> |
| | | <li class="toctree-l2"><a class="reference internal" href="%E8%B5%9B%E9%81%93%E8%AE%BE%E7%BD%AE%E4%B8%8E%E8%AF%84%E4%BC%B0.html#id2">Speaker-Attributed ASR (Main Track)</a></li> |
| | | <li class="toctree-l2"><a class="reference internal" href="%E8%B5%9B%E9%81%93%E8%AE%BE%E7%BD%AE%E4%B8%8E%E8%AF%84%E4%BC%B0.html#id2">Speaker-Attributed ASR</a></li> |
| | | <li class="toctree-l2"><a class="reference internal" href="%E8%B5%9B%E9%81%93%E8%AE%BE%E7%BD%AE%E4%B8%8E%E8%AF%84%E4%BC%B0.html#id3">Evaluation metric</a></li> |
| | | <li class="toctree-l2"><a class="reference internal" href="%E8%B5%9B%E9%81%93%E8%AE%BE%E7%BD%AE%E4%B8%8E%E8%AF%84%E4%BC%B0.html#id4">Sub-track arrangement</a></li> |
| | | </ul> |
| | |
| | | </ul> |
| | | </li> |
| | | <li class="toctree-l1"><a class="reference internal" href="%E8%B5%9B%E9%81%93%E8%AE%BE%E7%BD%AE%E4%B8%8E%E8%AF%84%E4%BC%B0.html">Track Setting and Evaluation</a><ul> |
| | | <li class="toctree-l2"><a class="reference internal" href="%E8%B5%9B%E9%81%93%E8%AE%BE%E7%BD%AE%E4%B8%8E%E8%AF%84%E4%BC%B0.html#id2">Speaker-Attributed ASR (Main Track)</a></li> |
| | | <li class="toctree-l2"><a class="reference internal" href="%E8%B5%9B%E9%81%93%E8%AE%BE%E7%BD%AE%E4%B8%8E%E8%AF%84%E4%BC%B0.html#id2">Speaker-Attributed ASR</a></li> |
| | | <li class="toctree-l2"><a class="reference internal" href="%E8%B5%9B%E9%81%93%E8%AE%BE%E7%BD%AE%E4%B8%8E%E8%AF%84%E4%BC%B0.html#id3">Evaluation metric</a></li> |
| | | <li class="toctree-l2"><a class="reference internal" href="%E8%B5%9B%E9%81%93%E8%AE%BE%E7%BD%AE%E4%B8%8E%E8%AF%84%E4%BC%B0.html#id4">Sub-track arrangement</a></li> |
| | | </ul> |
| | |
| | | </ul> |
| | | </li> |
| | | <li class="toctree-l1"><a class="reference internal" href="%E8%B5%9B%E9%81%93%E8%AE%BE%E7%BD%AE%E4%B8%8E%E8%AF%84%E4%BC%B0.html">Track Setting and Evaluation</a><ul> |
| | | <li class="toctree-l2"><a class="reference internal" href="%E8%B5%9B%E9%81%93%E8%AE%BE%E7%BD%AE%E4%B8%8E%E8%AF%84%E4%BC%B0.html#id2">Speaker-Attributed ASR (Main Track)</a></li> |
| | | <li class="toctree-l2"><a class="reference internal" href="%E8%B5%9B%E9%81%93%E8%AE%BE%E7%BD%AE%E4%B8%8E%E8%AF%84%E4%BC%B0.html#id2">Speaker-Attributed ASR</a></li> |
| | | <li class="toctree-l2"><a class="reference internal" href="%E8%B5%9B%E9%81%93%E8%AE%BE%E7%BD%AE%E4%B8%8E%E8%AF%84%E4%BC%B0.html#id3">Evaluation metric</a></li> |
| | | <li class="toctree-l2"><a class="reference internal" href="%E8%B5%9B%E9%81%93%E8%AE%BE%E7%BD%AE%E4%B8%8E%E8%AF%84%E4%BC%B0.html#id4">Sub-track arrangement</a></li> |
| | | </ul> |
| | |
| | | Search.setIndex({"docnames": ["index", "\u57fa\u7ebf", "\u6570\u636e\u96c6", "\u7b80\u4ecb", "\u7ec4\u59d4\u4f1a", "\u8054\u7cfb\u65b9\u5f0f", "\u89c4\u5219", "\u8d5b\u9053\u8bbe\u7f6e\u4e0e\u8bc4\u4f30"], "filenames": ["index.rst", "\u57fa\u7ebf.md", "\u6570\u636e\u96c6.md", "\u7b80\u4ecb.md", "\u7ec4\u59d4\u4f1a.md", "\u8054\u7cfb\u65b9\u5f0f.md", "\u89c4\u5219.md", "\u8d5b\u9053\u8bbe\u7f6e\u4e0e\u8bc4\u4f30.md"], "titles": ["ASRU 2023 \u591a\u901a\u9053\u591a\u65b9\u4f1a\u8bae\u8f6c\u5f55\u6311\u6218 2.0", "\u57fa\u7ebf", "\u6570\u636e\u96c6", "\u7b80\u4ecb", "\u7ec4\u59d4\u4f1a", "\u8054\u7cfb\u65b9\u5f0f", "\u7ade\u8d5b\u89c4\u5219", "\u8d5b\u9053\u8bbe\u7f6e\u4e0e\u8bc4\u4f30"], "terms": {"m2met": [0, 3, 5, 7], "asru2023": [0, 3], "m2met2": [0, 3, 5, 7], "contact": [], "funasr": 1, "sa": 1, "asr": [1, 3, 7], "speakerencod": 1, "modelscop": [1, 7], "todo": 1, "fill": 1, "with": 1, "the": 1, "readm": 1, "md": 1, "of": 1, "baselin": [1, 2], "aishel": [2, 7], "cn": [2, 4, 7], "celeb": [2, 7], "test": [2, 6, 7], "2023": [2, 3, 6, 7], "118": 2, "75": 2, "104": 2, "train": 2, "eval": [2, 6], "10": [2, 3, 7], "212": 2, "15": 2, "30": 2, "456": 2, "25": 2, "13": [2, 3], "55": 2, "42": 2, "27": 2, "34": 2, "76": 2, "20": 2, "textgrid": 2, "id": 2, "openslr": 2, "automat": 3, "speech": 3, "recognit": 3, "speaker": 3, "diariz": 3, "rich": 3, "transcript": 3, "evalu": 3, "chime": 3, "comput": 3, "hear": 3, "in": 3, "multisourc": 3, "environ": 3, "misp": 3, "multimod": 3, "inform": 3, "base": 3, "process": 3, "multi": 3, "channel": 3, "parti": 3, "meet": 3, "assp2022": 3, "19": 3, "12": 3, "asru": 3, "workshop": 3, "lxie": 4, "nwpu": 4, "edu": 4, "kong": 4, "aik": 4, "lee": 4, "star": 4, "kongaik": 4, "ieee": 4, "org": 4, "zhiji": 4, "yzj": 4, "alibaba": 4, "inc": 4, "com": [4, 5], "sli": 4, "zsl": 4, "yanminqian": 4, "sjtu": 4, "zhuc": 4, "microsoft": 4, "wujian": 4, "ceo": 4, "buhui": 4, "aishelldata": 4, "alimeet": [5, 7], "gmail": 5, "cpcer": [6, 7], "las": 6, 
"rnnt": 6, "transform": 6, "aishell4": 7, "vad": 7, "cer": 7, "ins": 7, "sub": 7, "del": 7, "text": 7, "frac": 7, "mathcal": 7, "n_": 7, "total": 7, "time": 7, "100": 7, "hug": 7, "face": 7}, "objects": {}, "objtypes": {}, "objnames": {}, "titleterms": {"asru": 0, "2023": 0, "indic": [], "and": [], "tabl": [], "alimeet": 2, "aoe": 3, "contact": []}, "envversion": {"sphinx.domains.c": 2, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 8, "sphinx.domains.index": 1, "sphinx.domains.javascript": 2, "sphinx.domains.math": 2, "sphinx.domains.python": 3, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx": 57}, "alltitles": {"\u8054\u7cfb\u65b9\u5f0f": [[5, "id1"]], "\u57fa\u7ebf": [[1, "id1"]], "\u57fa\u7ebf\u6982\u8ff0": [[1, "id2"]], "\u5feb\u901f\u5f00\u59cb": [[1, "id3"]], "\u57fa\u7ebf\u7ed3\u679c": [[1, "id4"]], "\u6570\u636e\u96c6": [[2, "id1"]], "\u6570\u636e\u96c6\u6982\u8ff0": [[2, "id2"]], "Alimeeting\u6570\u636e\u96c6\u4ecb\u7ecd": [[2, "alimeeting"]], "\u83b7\u53d6\u6570\u636e": [[2, "id3"]], "\u7b80\u4ecb": [[3, "id1"]], "\u7ade\u8d5b\u4ecb\u7ecd": [[3, "id2"]], "\u65f6\u95f4\u5b89\u6392(AOE\u65f6\u95f4)": [[3, "aoe"]], "\u7ade\u8d5b\u62a5\u540d": [[3, "id3"]], "\u7ec4\u59d4\u4f1a": [[4, "id1"]], "\u7ade\u8d5b\u89c4\u5219": [[6, "id1"]], "\u8d5b\u9053\u8bbe\u7f6e\u4e0e\u8bc4\u4f30": [[7, "id1"]], "\u8bf4\u8bdd\u4eba\u76f8\u5173\u7684\u8bed\u97f3\u8bc6\u522b (\u4e3b\u8d5b\u9053)": [[7, "id2"]], "\u8bc4\u4f30\u65b9\u6cd5": [[7, "id3"]], "\u5b50\u8d5b\u9053\u8bbe\u7f6e": [[7, "id4"]], "\u5b50\u8d5b\u9053\u4e00 (\u9650\u5b9a\u8bad\u7ec3\u6570\u636e):": [[7, "id5"]], "\u5b50\u8d5b\u9053\u4e8c (\u5f00\u653e\u8bad\u7ec3\u6570\u636e):": [[7, "id6"]], "ASRU 2023 \u591a\u901a\u9053\u591a\u65b9\u4f1a\u8bae\u8f6c\u5f55\u6311\u6218 2.0": [[0, "asru-2023-2-0"]], "\u76ee\u5f55:": [[0, null]]}, "indexentries": {}}) |
| | | Search.setIndex({"docnames": ["index", "\u57fa\u7ebf", "\u6570\u636e\u96c6", "\u7b80\u4ecb", "\u7ec4\u59d4\u4f1a", "\u8054\u7cfb\u65b9\u5f0f", "\u89c4\u5219", "\u8d5b\u9053\u8bbe\u7f6e\u4e0e\u8bc4\u4f30"], "filenames": ["index.rst", "\u57fa\u7ebf.md", "\u6570\u636e\u96c6.md", "\u7b80\u4ecb.md", "\u7ec4\u59d4\u4f1a.md", "\u8054\u7cfb\u65b9\u5f0f.md", "\u89c4\u5219.md", "\u8d5b\u9053\u8bbe\u7f6e\u4e0e\u8bc4\u4f30.md"], "titles": ["ASRU 2023 \u591a\u901a\u9053\u591a\u65b9\u4f1a\u8bae\u8f6c\u5f55\u6311\u6218 2.0", "\u57fa\u7ebf", "\u6570\u636e\u96c6", "\u7b80\u4ecb", "\u7ec4\u59d4\u4f1a", "\u8054\u7cfb\u65b9\u5f0f", "\u7ade\u8d5b\u89c4\u5219", "\u8d5b\u9053\u8bbe\u7f6e\u4e0e\u8bc4\u4f30"], "terms": {"m2met": [0, 3, 5, 7], "asru2023": [0, 3], "m2met2": [0, 3, 5, 7], "funasr": 1, "sa": 1, "asr": [1, 3, 7], "speakerencod": 1, "modelscop": [1, 7], "todo": 1, "fill": 1, "with": 1, "the": 1, "readm": 1, "md": 1, "of": 1, "baselin": [1, 2], "aishel": [2, 7], "cn": [2, 4, 7], "celeb": [2, 7], "test": [2, 6, 7], "2023": [2, 3, 6, 7], "118": 2, "75": 2, "104": 2, "train": 2, "eval": [2, 6], "10": [2, 3, 7], "212": 2, "15": 2, "30": 2, "456": 2, "25": 2, "13": [2, 3], "55": 2, "42": 2, "27": 2, "34": 2, "76": 2, "20": 2, "textgrid": 2, "id": 2, "openslr": 2, "automat": 3, "speech": 3, "recognit": 3, "speaker": 3, "diariz": 3, "rich": 3, "transcript": 3, "evalu": 3, "chime": 3, "comput": 3, "hear": 3, "in": 3, "multisourc": 3, "environ": 3, "misp": 3, "multimod": 3, "inform": 3, "base": 3, "process": 3, "multi": 3, "channel": 3, "parti": 3, "meet": 3, "assp2022": 3, "19": 3, "12": 3, "asru": 3, "workshop": 3, "lxie": 4, "nwpu": 4, "edu": 4, "kong": 4, "aik": 4, "lee": 4, "star": 4, "kongaik": 4, "ieee": 4, "org": 4, "zhiji": 4, "yzj": 4, "alibaba": 4, "inc": 4, "com": [4, 5], "sli": 4, "zsl": 4, "yanminqian": 4, "sjtu": 4, "zhuc": 4, "microsoft": 4, "wujian": 4, "ceo": 4, "buhui": 4, "aishelldata": 4, "alimeet": [5, 7], "gmail": 5, "cpcer": [6, 7], "las": 6, "rnnt": 6, 
"transform": 6, "aishell4": 7, "vad": 7, "cer": 7, "ins": 7, "sub": 7, "del": 7, "text": 7, "frac": 7, "mathcal": 7, "n_": 7, "total": 7, "time": 7, "100": 7, "hug": 7, "face": 7}, "objects": {}, "objtypes": {}, "objnames": {}, "titleterms": {"asru": 0, "2023": 0, "alimeet": 2, "aoe": 3}, "envversion": {"sphinx.domains.c": 2, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 8, "sphinx.domains.index": 1, "sphinx.domains.javascript": 2, "sphinx.domains.math": 2, "sphinx.domains.python": 3, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx": 57}, "alltitles": {"ASRU 2023 \u591a\u901a\u9053\u591a\u65b9\u4f1a\u8bae\u8f6c\u5f55\u6311\u6218 2.0": [[0, "asru-2023-2-0"]], "\u76ee\u5f55:": [[0, null]], "\u57fa\u7ebf": [[1, "id1"]], "\u57fa\u7ebf\u6982\u8ff0": [[1, "id2"]], "\u5feb\u901f\u5f00\u59cb": [[1, "id3"]], "\u57fa\u7ebf\u7ed3\u679c": [[1, "id4"]], "\u6570\u636e\u96c6": [[2, "id1"]], "\u6570\u636e\u96c6\u6982\u8ff0": [[2, "id2"]], "Alimeeting\u6570\u636e\u96c6\u4ecb\u7ecd": [[2, "alimeeting"]], "\u83b7\u53d6\u6570\u636e": [[2, "id3"]], "\u7b80\u4ecb": [[3, "id1"]], "\u7ade\u8d5b\u4ecb\u7ecd": [[3, "id2"]], "\u65f6\u95f4\u5b89\u6392(AOE\u65f6\u95f4)": [[3, "aoe"]], "\u7ade\u8d5b\u62a5\u540d": [[3, "id3"]], "\u7ec4\u59d4\u4f1a": [[4, "id1"]], "\u8054\u7cfb\u65b9\u5f0f": [[5, "id1"]], "\u7ade\u8d5b\u89c4\u5219": [[6, "id1"]], "\u8d5b\u9053\u8bbe\u7f6e\u4e0e\u8bc4\u4f30": [[7, "id1"]], "\u8bf4\u8bdd\u4eba\u76f8\u5173\u7684\u8bed\u97f3\u8bc6\u522b": [[7, "id2"]], "\u8bc4\u4f30\u65b9\u6cd5": [[7, "id3"]], "\u5b50\u8d5b\u9053\u8bbe\u7f6e": [[7, "id4"]], "\u5b50\u8d5b\u9053\u4e00 (\u9650\u5b9a\u8bad\u7ec3\u6570\u636e):": [[7, "id5"]], "\u5b50\u8d5b\u9053\u4e8c (\u5f00\u653e\u8bad\u7ec3\u6570\u636e):": [[7, "id6"]]}, "indexentries": {}}) |
| | |
| | | </ul> |
| | | </li> |
| | | <li class="toctree-l1"><a class="reference internal" href="%E8%B5%9B%E9%81%93%E8%AE%BE%E7%BD%AE%E4%B8%8E%E8%AF%84%E4%BC%B0.html">Track Setting and Evaluation</a><ul> |
| | | <li class="toctree-l2"><a class="reference internal" href="%E8%B5%9B%E9%81%93%E8%AE%BE%E7%BD%AE%E4%B8%8E%E8%AF%84%E4%BC%B0.html#id2">Speaker-Attributed ASR (Main Track)</a></li> |
| | | <li class="toctree-l2"><a class="reference internal" href="%E8%B5%9B%E9%81%93%E8%AE%BE%E7%BD%AE%E4%B8%8E%E8%AF%84%E4%BC%B0.html#id2">Speaker-Attributed ASR</a></li> |
| | | <li class="toctree-l2"><a class="reference internal" href="%E8%B5%9B%E9%81%93%E8%AE%BE%E7%BD%AE%E4%B8%8E%E8%AF%84%E4%BC%B0.html#id3">Evaluation Metric</a></li> |
| | | <li class="toctree-l2"><a class="reference internal" href="%E8%B5%9B%E9%81%93%E8%AE%BE%E7%BD%AE%E4%B8%8E%E8%AF%84%E4%BC%B0.html#id4">Sub-track Arrangement</a></li> |
| | | </ul> |
| | |
| | | </ul> |
| | | </li> |
| | | <li class="toctree-l1"><a class="reference internal" href="%E8%B5%9B%E9%81%93%E8%AE%BE%E7%BD%AE%E4%B8%8E%E8%AF%84%E4%BC%B0.html">Track Setting and Evaluation</a><ul> |
| | | <li class="toctree-l2"><a class="reference internal" href="%E8%B5%9B%E9%81%93%E8%AE%BE%E7%BD%AE%E4%B8%8E%E8%AF%84%E4%BC%B0.html#id2">Speaker-Attributed ASR (Main Track)</a></li> |
| | | <li class="toctree-l2"><a class="reference internal" href="%E8%B5%9B%E9%81%93%E8%AE%BE%E7%BD%AE%E4%B8%8E%E8%AF%84%E4%BC%B0.html#id2">Speaker-Attributed ASR</a></li> |
| | | <li class="toctree-l2"><a class="reference internal" href="%E8%B5%9B%E9%81%93%E8%AE%BE%E7%BD%AE%E4%B8%8E%E8%AF%84%E4%BC%B0.html#id3">Evaluation Metric</a></li> |
| | | <li class="toctree-l2"><a class="reference internal" href="%E8%B5%9B%E9%81%93%E8%AE%BE%E7%BD%AE%E4%B8%8E%E8%AF%84%E4%BC%B0.html#id4">Sub-track Arrangement</a></li> |
| | | </ul> |
| | |
| | | </section> |
| | | <section id="alimeeting"> |
| | | <h2>Detail of AliMeeting corpus<a class="headerlink" href="#alimeeting" title="Permalink to this heading">¶</a></h2> |
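| | | <p>AliMeeting contains 118.75 hours of speech data in total: 104.75 hours for training (Train), 4 hours for evaluation (Eval) and 10 hours for testing (Test). The Train and Eval sets contain 212 and 8 sessions, respectively, and each session consists of a 15 to 30-minute discussion among several speakers. The Train and Eval sets have 456 and 25 participants in total, respectively, with balanced gender coverage.</p> |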
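| | | <p>AliMeeting contains 118.75 hours of speech data in total: 104.75 hours for training (Train), 4 hours for evaluation (Eval) and 10 hours for testing (Test). The Train, Eval and Test sets contain 212, 8 and 20 sessions, respectively, and each session consists of a 15 to 30-minute discussion among several speakers. The Train, Eval and Test sets have 456, 25 and 60 participants in total, respectively, with balanced gender coverage.</p> |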
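| | | <p>The dataset was collected in 13 different meeting rooms, categorized by size as small, medium and large, with floor areas ranging from 8 to 55 square meters. The rooms differ in layout and acoustic properties, and the detailed parameters of each room will also be provided to participants. The wall materials of the meeting venues include cement, glass, etc. Furnishings include sofas, TVs, blackboards, fans, air conditioners, plants, etc. During recording, the microphone array is placed on the table and the speakers sit around it and hold a natural conversation. The distance between the microphone array and the speakers ranges from about 0.3 to 5.0 meters. All speakers are native Chinese speakers speaking Mandarin without strong accents. Various kinds of indoor noise may occur during recording, including keyboard sounds, doors opening and closing, fan noise, bubble noise, etc. All speakers remain in the same position throughout the recording and do not walk around. There is no speaker overlap between the Train and Eval sets. Figure 1 shows the layout of one meeting room and the topology of the microphone array.</p> |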
| | | <p><img alt="meeting room" src="_images/meeting_room.png" /></p> |
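| | | <p>The number of speakers in each session ranges from 2 to 4. To cover meeting scenarios with varied content, we selected a range of meeting topics, including medical treatment, education, business, organization management, industrial production and other routine meetings. The average speech overlap ratios of the Train and Eval sets are 42.27% and 34.76%, respectively. Details of the AliMeeting Train, Eval and Test sets are given in Table 1. Table 2 shows the speech overlap ratio and the number of sessions for meetings with different numbers of speakers in the Train, Eval and Test sets.</p> |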
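| | | <p>The number of speakers in each session ranges from 2 to 4. To cover meeting scenarios with varied content, we selected a range of meeting topics, including medical treatment, education, business, organization management, industrial production and other routine meetings. The average speech overlap ratios of the Train, Eval and Test sets are 42.27%, 34.76% and 42.8%, respectively. Details of the AliMeeting Train, Eval and Test sets are given in Table 1. Table 2 shows the speech overlap ratio and the number of sessions for meetings with different numbers of speakers in the Train, Eval and Test sets.</p> |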
| | | <p><img alt="dataset detail" src="_images/dataset_detail.png" /> |
| | | The Test-2023 set consists of 20 sessions recorded in the same acoustic environment as the AliMeeting corpus. Each session in the Test-2023 set has 2 to 4 participants, a configuration similar to that of the AliMeeting Test set.</p> |
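| | | <p>We also recorded the near-field audio signal of each speaker with a headset microphone and ensured that only that speaker's own speech was transcribed. Note that the far-field audio recorded by the microphone array and the near-field audio recorded by the headset microphones are synchronized in time. All transcripts of each session are stored in TextGrid format and contain the session duration, speaker information (number of speakers, speaker ID, gender, etc.), the total number of segments for each speaker, and the timestamp and transcription of each segment.</p> |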
| | |
| | | </ul> |
| | | </li> |
| | | <li class="toctree-l1"><a class="reference internal" href="%E8%B5%9B%E9%81%93%E8%AE%BE%E7%BD%AE%E4%B8%8E%E8%AF%84%E4%BC%B0.html">Track Setting and Evaluation</a><ul> |
| | | <li class="toctree-l2"><a class="reference internal" href="%E8%B5%9B%E9%81%93%E8%AE%BE%E7%BD%AE%E4%B8%8E%E8%AF%84%E4%BC%B0.html#id2">Speaker-Attributed ASR (Main Track)</a></li> |
| | | <li class="toctree-l2"><a class="reference internal" href="%E8%B5%9B%E9%81%93%E8%AE%BE%E7%BD%AE%E4%B8%8E%E8%AF%84%E4%BC%B0.html#id2">Speaker-Attributed ASR</a></li> |
| | | <li class="toctree-l2"><a class="reference internal" href="%E8%B5%9B%E9%81%93%E8%AE%BE%E7%BD%AE%E4%B8%8E%E8%AF%84%E4%BC%B0.html#id3">Evaluation Metric</a></li> |
| | | <li class="toctree-l2"><a class="reference internal" href="%E8%B5%9B%E9%81%93%E8%AE%BE%E7%BD%AE%E4%B8%8E%E8%AF%84%E4%BC%B0.html#id4">Sub-track Arrangement</a></li> |
| | | </ul> |
| | |
| | | <section id="id3"> |
| | | <h2>Challenge Registration<a class="headerlink" href="#id3" title="Permalink to this heading">¶</a></h2> |
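| | | <p>All prospective participants from academia and industry should complete the Google form below on or before May 5, 2023:</p> |
| | | <p><a class="reference external" href="https://docs.google.com/forms/d/e/1FAIpQLSf77T9vAl7Ym-u5g8gXu18SBofoWRaFShBo26Ym0-HDxHW9PQ/viewform?usp=sf_link">M2MET2.0 Registration</a></p> |
| | | <p>Within three working days, the organizers will notify eligible teams by email. Teams must follow the challenge rules, which will be published on the challenge website. Before the rankings are released, each participant must submit a system description document detailing the methods and models used. The organizers will select the top three systems for inclusion in the ASRU2023 proceedings.</p> |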
| | | </section> |
| | | </section> |
| | |
| | | </ul> |
| | | </li> |
| | | <li class="toctree-l1"><a class="reference internal" href="%E8%B5%9B%E9%81%93%E8%AE%BE%E7%BD%AE%E4%B8%8E%E8%AF%84%E4%BC%B0.html">Track Setting and Evaluation</a><ul> |
| | | <li class="toctree-l2"><a class="reference internal" href="%E8%B5%9B%E9%81%93%E8%AE%BE%E7%BD%AE%E4%B8%8E%E8%AF%84%E4%BC%B0.html#id2">Speaker-Attributed ASR (Main Track)</a></li> |
| | | <li class="toctree-l2"><a class="reference internal" href="%E8%B5%9B%E9%81%93%E8%AE%BE%E7%BD%AE%E4%B8%8E%E8%AF%84%E4%BC%B0.html#id2">Speaker-Attributed ASR</a></li> |
| | | <li class="toctree-l2"><a class="reference internal" href="%E8%B5%9B%E9%81%93%E8%AE%BE%E7%BD%AE%E4%B8%8E%E8%AF%84%E4%BC%B0.html#id3">Evaluation Metric</a></li> |
| | | <li class="toctree-l2"><a class="reference internal" href="%E8%B5%9B%E9%81%93%E8%AE%BE%E7%BD%AE%E4%B8%8E%E8%AF%84%E4%BC%B0.html#id4">Sub-track Arrangement</a></li> |
| | | </ul> |
| | |
| | | </ul> |
| | | </li> |
| | | <li class="toctree-l1"><a class="reference internal" href="%E8%B5%9B%E9%81%93%E8%AE%BE%E7%BD%AE%E4%B8%8E%E8%AF%84%E4%BC%B0.html">Track Setting and Evaluation</a><ul> |
| | | <li class="toctree-l2"><a class="reference internal" href="%E8%B5%9B%E9%81%93%E8%AE%BE%E7%BD%AE%E4%B8%8E%E8%AF%84%E4%BC%B0.html#id2">Speaker-Attributed ASR (Main Track)</a></li> |
| | | <li class="toctree-l2"><a class="reference internal" href="%E8%B5%9B%E9%81%93%E8%AE%BE%E7%BD%AE%E4%B8%8E%E8%AF%84%E4%BC%B0.html#id2">Speaker-Attributed ASR</a></li> |
| | | <li class="toctree-l2"><a class="reference internal" href="%E8%B5%9B%E9%81%93%E8%AE%BE%E7%BD%AE%E4%B8%8E%E8%AF%84%E4%BC%B0.html#id3">Evaluation Metric</a></li> |
| | | <li class="toctree-l2"><a class="reference internal" href="%E8%B5%9B%E9%81%93%E8%AE%BE%E7%BD%AE%E4%B8%8E%E8%AF%84%E4%BC%B0.html#id4">Sub-track Arrangement</a></li> |
| | | </ul> |
| | |
| | | </ul> |
| | | </li> |
| | | <li class="toctree-l1"><a class="reference internal" href="%E8%B5%9B%E9%81%93%E8%AE%BE%E7%BD%AE%E4%B8%8E%E8%AF%84%E4%BC%B0.html">Track Setting and Evaluation</a><ul> |
| | | <li class="toctree-l2"><a class="reference internal" href="%E8%B5%9B%E9%81%93%E8%AE%BE%E7%BD%AE%E4%B8%8E%E8%AF%84%E4%BC%B0.html#id2">Speaker-Attributed ASR (Main Track)</a></li> |
| | | <li class="toctree-l2"><a class="reference internal" href="%E8%B5%9B%E9%81%93%E8%AE%BE%E7%BD%AE%E4%B8%8E%E8%AF%84%E4%BC%B0.html#id2">Speaker-Attributed ASR</a></li> |
| | | <li class="toctree-l2"><a class="reference internal" href="%E8%B5%9B%E9%81%93%E8%AE%BE%E7%BD%AE%E4%B8%8E%E8%AF%84%E4%BC%B0.html#id3">Evaluation Metric</a></li> |
| | | <li class="toctree-l2"><a class="reference internal" href="%E8%B5%9B%E9%81%93%E8%AE%BE%E7%BD%AE%E4%B8%8E%E8%AF%84%E4%BC%B0.html#id4">Sub-track Arrangement</a></li> |
| | | </ul> |
| | |
| | | </ul> |
| | | </li> |
| | | <li class="toctree-l1 current"><a class="current reference internal" href="#">Track Setting and Evaluation</a><ul> |
| | | <li class="toctree-l2"><a class="reference internal" href="#id2">Speaker-Attributed ASR (Main Track)</a></li> |
| | | <li class="toctree-l2"><a class="reference internal" href="#id2">Speaker-Attributed ASR</a></li> |
| | | <li class="toctree-l2"><a class="reference internal" href="#id3">Evaluation Metric</a></li> |
| | | <li class="toctree-l2"><a class="reference internal" href="#id4">Sub-track Arrangement</a></li> |
| | | </ul> |
| | |
| | | <section id="id1"> |
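| | | <h1>Track Setting and Evaluation<a class="headerlink" href="#id1" title="Permalink to this heading">¶</a></h1> |
| | | <section id="id2"> |
| | | <h2>Speaker-Attributed ASR (Main Track)<a class="headerlink" href="#id2" title="Permalink to this heading">¶</a></h2> |
| | | <p>The speaker-attributed ASR task requires recognizing each speaker's speech from overlapped audio and assigning a speaker label to the recognized content. Figure 2 illustrates the main difference between the speaker-attributed ASR task and the multi-speaker ASR task. In this challenge, the AliMeeting, AISHELL-4 and CN-Celeb datasets may be used as constrained data sources. The AliMeeting dataset used in the M2MeT challenge contains Train, Eval and Test sets, which may be used for training and evaluation in M2MET2.0. In addition, a new Test-2023 set containing about 10 hours of meeting data will be released according to the challenge schedule and used for scoring and ranking. Note that the organizers will not provide the headset near-field audio, transcriptions, or oracle timestamps. Instead of oracle timestamps for each speaker, the organizers will provide segments containing multiple speakers on the Test-2023 set. These segments can be obtained with a simple VAD model.</p> |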
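| | | <h2>Speaker-Attributed ASR<a class="headerlink" href="#id2" title="Permalink to this heading">¶</a></h2> |
| | | <p>The speaker-attributed ASR task requires recognizing each speaker's speech from overlapped audio and assigning a speaker label to the recognized content. Figure 2 illustrates the main difference between the speaker-attributed ASR task and the multi-speaker ASR task. In this challenge, the AliMeeting, AISHELL-4 and CN-Celeb datasets may be used as constrained data sources. The AliMeeting dataset used in the M2MeT challenge contains Train, Eval and Test sets, which may be used for training and evaluation in M2MET2.0. In addition, a new Test-2023 set containing about 10 hours of meeting data will be released according to the challenge schedule and used for scoring and ranking. Note that for the Test-2023 set, the organizers will no longer provide the headset near-field audio, transcriptions, or oracle timestamps. Instead, they will provide segments containing multiple speakers, which can be obtained with a simple VAD model.</p> |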
| | | <p><img alt="task difference" src="_images/task_diff.png" /></p> |
| | | </section> |
| | | <section id="id3"> |
| | |
| | | <h2>Sub-track Arrangement<a class="headerlink" href="#id4" title="Permalink to this heading">¶</a></h2> |
| | | <section id="id5"> |
| | | <h3>Sub-track I (Fixed Training Condition):<a class="headerlink" href="#id5" title="Permalink to this heading">¶</a></h3> |
| | | <p>Participants may use only AliMeeting, AISHELL-4 and CN Celeb when building their systems; the use of any additional data is strictly prohibited. Participants may use any third-party open-source pre-trained model, such as those available on <a class="reference external" href="https://huggingface.co/models">Hugging Face</a> and <a class="reference external" href="https://www.modelscope.cn/models">ModelScope</a>, and must list the names and links of the pre-trained models they use in the final system description document.</p> |
| | | <p>Participants may use only AliMeeting, AISHELL-4 and CN-Celeb when building their systems; the use of any additional data is strictly prohibited. Participants may use any third-party open-source pre-trained model, such as those available on <a class="reference external" href="https://huggingface.co/models">Hugging Face</a> and <a class="reference external" href="https://www.modelscope.cn/models">ModelScope</a>, and must list the names and links of the pre-trained models they use in the final system description document.</p> |
| | | </section> |
| | | <section id="id6"> |
| | | <h3>Sub-track II (Open Training Condition):<a class="headerlink" href="#id6" title="Permalink to this heading">¶</a></h3> |
| | |
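| | | In the fixed training condition, the training data is restricted to three publicly available corpora, namely AliMeeting, AISHELL-4 and CN-Celeb. To evaluate the performance of the models submitted by participants, we will release a new test set, Test-2023, for scoring and ranking. Below we describe the AliMeeting dataset and the Test-2023 set in detail. |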
| | | |
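| | | ## Detail of AliMeeting corpus |
| | | AliMeeting contains 118.75 hours of speech data in total: 104.75 hours for training (Train), 4 hours for evaluation (Eval) and 10 hours for testing (Test). The Train and Eval sets contain 212 and 8 sessions, respectively, and each session consists of a 15 to 30-minute discussion among several speakers. The Train and Eval sets have 456 and 25 participants in total, respectively, with balanced gender coverage. |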
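| | | AliMeeting contains 118.75 hours of speech data in total: 104.75 hours for training (Train), 4 hours for evaluation (Eval) and 10 hours for testing (Test). The Train, Eval and Test sets contain 212, 8 and 20 sessions, respectively, and each session consists of a 15 to 30-minute discussion among several speakers. The Train, Eval and Test sets have 456, 25 and 60 participants in total, respectively, with balanced gender coverage. |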
| | | |
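| | | The dataset was collected in 13 different meeting rooms, categorized by size as small, medium and large, with floor areas ranging from 8 to 55 square meters. The rooms differ in layout and acoustic properties, and the detailed parameters of each room will also be provided to participants. The wall materials of the meeting venues include cement, glass, etc. Furnishings include sofas, TVs, blackboards, fans, air conditioners, plants, etc. During recording, the microphone array is placed on the table and the speakers sit around it and hold a natural conversation. The distance between the microphone array and the speakers ranges from about 0.3 to 5.0 meters. All speakers are native Chinese speakers speaking Mandarin without strong accents. Various kinds of indoor noise may occur during recording, including keyboard sounds, doors opening and closing, fan noise, bubble noise, etc. All speakers remain in the same position throughout the recording and do not walk around. There is no speaker overlap between the Train and Eval sets. Figure 1 shows the layout of one meeting room and the topology of the microphone array. |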
| | | |
| | |  |
| | | |
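| | | The number of speakers in each session ranges from 2 to 4. To cover meeting scenarios with varied content, we selected a range of meeting topics, including medical treatment, education, business, organization management, industrial production and other routine meetings. The average speech overlap ratios of the Train and Eval sets are 42.27\% and 34.76\%, respectively. Details of the AliMeeting Train, Eval and Test sets are given in Table 1. Table 2 shows the speech overlap ratio and the number of sessions for meetings with different numbers of speakers in the Train, Eval and Test sets. |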
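| | | The number of speakers in each session ranges from 2 to 4. To cover meeting scenarios with varied content, we selected a range of meeting topics, including medical treatment, education, business, organization management, industrial production and other routine meetings. The average speech overlap ratios of the Train, Eval and Test sets are 42.27\%, 34.76\% and 42.8\%, respectively. Details of the AliMeeting Train, Eval and Test sets are given in Table 1. Table 2 shows the speech overlap ratio and the number of sessions for meetings with different numbers of speakers in the Train, Eval and Test sets. |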
| | | |
| | |  |
| | | Test-2023æµè¯éç±20åºä¼è®®ç»æï¼è¿äºä¼è®®æ¯å¨ä¸AliMeetingæ°æ®éç¸åç声å¦ç¯å¢ä¸å½å¶çãTest-2023æµè¯éä¸çæ¯ä¸ªä¼è®®ç¯èç±2å°4个åä¸è
ç»æå¹¶ä¸ä¸AliMeetingæµè¯éçé
ç½®ç¸ä¼¼ã |
| | |
| | | |
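| | | All prospective participants from academia and industry should complete the Google form below on or before May 5, 2023: |
| | | |
| | | [M2MET2.0 Registration](https://docs.google.com/forms/d/e/1FAIpQLSf77T9vAl7Ym-u5g8gXu18SBofoWRaFShBo26Ym0-HDxHW9PQ/viewform?usp=sf_link) |
| | | |
| | | Within three working days, the organizers will notify eligible teams by email. Teams must follow the challenge rules, which will be published on the challenge website. Before the rankings are released, each participant must submit a system description document detailing the methods and models used. The organizers will select the top three systems for inclusion in the ASRU2023 proceedings. |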
| | |
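| | | # Track Setting and Evaluation |
| | | ## Speaker-Attributed ASR (Main Track) |
| | | The speaker-attributed ASR task requires recognizing each speaker's speech from overlapped audio and assigning a speaker label to the recognized content. Figure 2 illustrates the main difference between the speaker-attributed ASR task and the multi-speaker ASR task. In this challenge, the AliMeeting, AISHELL-4 and CN-Celeb datasets may be used as constrained data sources. The AliMeeting dataset used in the M2MeT challenge contains Train, Eval and Test sets, which may be used for training and evaluation in M2MET2.0. In addition, a new Test-2023 set containing about 10 hours of meeting data will be released according to the challenge schedule and used for scoring and ranking. Note that the organizers will not provide the headset near-field audio, transcriptions, or oracle timestamps. Instead of oracle timestamps for each speaker, the organizers will provide segments containing multiple speakers on the Test-2023 set. These segments can be obtained with a simple VAD model. |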
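| | | ## Speaker-Attributed ASR |
| | | The speaker-attributed ASR task requires recognizing each speaker's speech from overlapped audio and assigning a speaker label to the recognized content. Figure 2 illustrates the main difference between the speaker-attributed ASR task and the multi-speaker ASR task. In this challenge, the AliMeeting, AISHELL-4 and CN-Celeb datasets may be used as constrained data sources. The AliMeeting dataset used in the M2MeT challenge contains Train, Eval and Test sets, which may be used for training and evaluation in M2MET2.0. In addition, a new Test-2023 set containing about 10 hours of meeting data will be released according to the challenge schedule and used for scoring and ranking. Note that for the Test-2023 set, the organizers will no longer provide the headset near-field audio, transcriptions, or oracle timestamps. Instead, they will provide segments containing multiple speakers, which can be obtained with a simple VAD model. |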
| | | |
| | |  |
| | | |
| | |
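| | | where $\mathcal N_{\text{Ins}}$, $\mathcal N_{\text{Sub}}$ and $\mathcal N_{\text{Del}}$ are the numbers of characters for the three error types (insertions, substitutions and deletions), and $\mathcal N_{\text{Total}}$ is the total number of characters. |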
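
To make the quantities $\mathcal N_{\text{Ins}}$, $\mathcal N_{\text{Sub}}$, $\mathcal N_{\text{Del}}$ and $\mathcal N_{\text{Total}}$ concrete, the following is a minimal Python sketch of a character error rate for a single reference/hypothesis pair. It assumes the usual definition, (N_Ins + N_Sub + N_Del) / N_Total × 100, with N_Total taken as the number of reference characters. The function names `error_counts` and `cer` are hypothetical, this is not the official scoring script, and it does not implement the speaker-permutation step of cpCER.

```python
def error_counts(ref, hyp):
    """Return (n_ins, n_sub, n_del) for a minimum-edit alignment of hyp to ref."""
    n, m = len(ref), len(hyp)
    # dp[i][j] = (total_errors, ins, sub, dele) for ref[:i] versus hyp[:j]
    dp = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = (i, 0, 0, i)      # every reference character deleted
    for j in range(1, m + 1):
        dp[0][j] = (j, j, 0, 0)      # every hypothesis character inserted
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            t, a, b, c = dp[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (t, a, b, c)              # match, no new error
            else:
                diag = (t + 1, a, b + 1, c)      # substitution
            t, a, b, c = dp[i][j - 1]
            ins = (t + 1, a + 1, b, c)           # insertion
            t, a, b, c = dp[i - 1][j]
            dele = (t + 1, a, b, c + 1)          # deletion
            # Tuples compare on total_errors first, so this keeps a minimum-edit path.
            dp[i][j] = min(diag, ins, dele)
    _, n_ins, n_sub, n_del = dp[n][m]
    return n_ins, n_sub, n_del


def cer(ref, hyp):
    """Character error rate in percent, relative to the reference length."""
    if not ref:
        raise ValueError("reference must contain at least one character")
    n_ins, n_sub, n_del = error_counts(ref, hyp)
    return 100.0 * (n_ins + n_sub + n_del) / len(ref)
```

For example, `cer("abcd", "abxd")` counts one substitution over four reference characters. When several alignments have the same total number of errors, the split between the three error types can differ, but the resulting rate is the same.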
| | | ## Sub-track Arrangement |
| | | ### Sub-track I (Fixed Training Condition): |
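| | | Participants may use only AliMeeting, AISHELL-4 and CN Celeb when building their systems; the use of any additional data is strictly prohibited. Participants may use any third-party open-source pre-trained model, such as those available on [Hugging Face](https://huggingface.co/models) and [ModelScope](https://www.modelscope.cn/models), and must list the names and links of the pre-trained models they use in the final system description document. |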
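| | | Participants may use only AliMeeting, AISHELL-4 and CN-Celeb when building their systems; the use of any additional data is strictly prohibited. Participants may use any third-party open-source pre-trained model, such as those available on [Hugging Face](https://huggingface.co/models) and [ModelScope](https://www.modelscope.cn/models), and must list the names and links of the pre-trained models they use in the final system description document. |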
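| | | ### Sub-track II (Open Training Condition): |
| | | In addition to the fixed-condition data, participants may use any publicly available, privately recorded or simulated dataset. However, participants must clearly list the data they use. Likewise, participants may use any third-party open-source pre-trained model, but must explicitly list the data and model links used in the final system description document. If simulated data is used, please describe the data simulation scheme in detail. |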