Text Summarization of Transcripts from Online Meetings – Students Research in ML and DL at Durham College

April 28 @ 6:00 pm - 7:00 pm

Text summarization is a technique for generating a concise and precise summary of a long text that focuses on the sections conveying useful information without losing the overall meaning. It aims to condense lengthy documents into shorter versions, a task that is difficult and costly to perform manually. With the current explosion of data circulating in the digital space, most of it unstructured text, there is a need for tools that let people extract insights from that data quickly.

In situations where it is essential to keep track of what is being said, such as during an online lecture, many people take notes. Good note-taking does not mean writing down every word that is spoken; it means producing comprehensive outlines of what is discussed. The key to good note-taking lies in writing concise yet informative summaries.

In this seminar, we will discuss how we tried to address the difficulties of note-taking by building an application that produces notes from transcripts generated by the Automatic Speech Recognition (ASR) technology of online meeting platforms. We experimented with six summarization models for this application, including transformer-based models pre-trained on large corpora. The datasets used are a transcript dataset acquired from online meeting platforms and the Extreme Summarization (XSum) dataset. We evaluated the models using the ROUGE metrics (ROUGE-1, ROUGE-2, and ROUGE-L) and selected the best-performing model as the final model. We also built a bot that uses Telegram's API to share the generated summaries with users via group chat.
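The abstract does not say which ROUGE implementation was used for model selection. As a rough illustration of what the three scores measure, here is a minimal Python sketch that computes ROUGE-1 and ROUGE-2 (clipped n-gram overlap) and ROUGE-L (longest common subsequence) as precision, recall, and F1. The function names are hypothetical, tokenization is assumed to be plain lowercase whitespace splitting, and established toolkits such as the `rouge-score` package apply their own tokenization and optional stemming, so their numbers may differ.

```python
from collections import Counter


def _tokens(text):
    # Assumption: lowercase + whitespace split; real toolkits tokenize more carefully.
    return text.lower().split()


def _prf(overlap, cand_total, ref_total):
    # Precision is relative to the candidate summary, recall to the reference.
    p = overlap / max(cand_total, 1)
    r = overlap / max(ref_total, 1)
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def rouge_n(candidate, reference, n):
    """ROUGE-N: overlap of n-grams, each n-gram counted at most as often as in the reference."""
    cand = Counter(ngrams(_tokens(candidate), n))
    ref = Counter(ngrams(_tokens(reference), n))
    overlap = sum((cand & ref).values())  # Counter '&' keeps the minimum count per n-gram
    return _prf(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence, using a rolling dynamic-programming row."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, 1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L: precision/recall/F1 based on the longest common subsequence of tokens."""
    c, r = _tokens(candidate), _tokens(reference)
    return _prf(lcs_length(c, r), len(c), len(r))


if __name__ == "__main__":
    reference = "the model summarizes the meeting transcript"
    candidate = "the model summarizes the transcript"
    print(f"ROUGE-1 F1: {rouge_n(candidate, reference, 1)[2]:.3f}")  # 0.909
    print(f"ROUGE-2 F1: {rouge_n(candidate, reference, 2)[2]:.3f}")  # 0.667
    print(f"ROUGE-L F1: {rouge_l(candidate, reference)[2]:.3f}")     # 0.909
```

In the example, the candidate omits one reference word, so ROUGE-1 and ROUGE-L keep perfect precision but lose recall, while ROUGE-2 drops further because two reference bigrams are broken. Comparing models on all three scores, as the abstract describes, rewards both word coverage and preserved word order.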