When most of our interactions went virtual, the need for automatic support for the smooth running of online events such as project meetings became more pressing. Summarizing meeting contents is one such form of support. Meeting minutes keep a record of what was discussed at a meeting. They are usually a written document with little or no structure (perhaps a hierarchical bulleted list) aimed at informing both participants and non-participants of what happened during the meeting. ‘Automatic minuting’ tools would help readers grasp the contents of a meeting quickly. People adopt different styles when ‘taking notes’ or ‘recording minutes’ of a meeting. The minutes also depend on the category of the meeting, the intended audience, and the goal or objective of the meeting. Existing text and speech summarization methods come close to this task. However, automatic minuting is challenging due to the absence of agreed-upon guidelines, the variety of minuting practices, and the lack of extensive background research.
We propose AutoMin, the first shared task on automatic minuting of meeting proceedings. Our objective is to drive community efforts towards understanding the challenges of the task and developing tools for this important use case, especially in the current world, which has had to go online far more than expected. With this shared task, we invite the speech and natural language processing community to investigate the challenges of automatic minuting with real meeting data in two different settings: technical project meetings (in both English and Czech) and parliamentary proceedings (in English).
We propose one main task and two subsidiary tasks. The subsidiary tasks are optional.
The data for the shared task will be available in the following GitHub repository. More task-specific details on using the data will be provided in due time on our website. https://github.com/ELITR/automin-2021
Aside from the data we release, we recommend the following datasets for training, although their domains do not match ours:
In any case, please describe clearly in your system paper which data you used and how. A comprehensive list of summarization datasets can be found here:
We ask participants to host their system runs in their own GitHub repository and to share the link with us, together with the exact system requirements and environment needed to run their code. Participants should also submit their automatically generated outputs. In general, you will be expected to submit the outputs of your system for Task A, and optionally for Task B and/or Task C, in a fairly simple plain-text format. Please refer to the submission page for details.
All teams are required to submit a brief technical report describing their method. Please use the Interspeech template for your system description reports. All reports must be at least 2 pages long and at most 5 pages excluding references (for a single task) or 8 pages (for multiple tasks). Reports must be written in English. Authors should submit their papers to minute@ufal.mff.cuni.cz. The proceedings will be added to the ISCA archive.
We will additionally invite selected authors to submit a full paper to a special issue of the open-access journal Information, published by MDPI and indexed in Scopus, ESCI (Web of Science), Ei Compendex, DBLP, and many other databases. The journal submissions will undergo further review. Authors of invited papers should be aware that the final submitted manuscript must contain at least 50% new content and no more than 30% text copied from the proceedings paper.
For further information about this task and dataset, please contact: