Scalable Understanding of Multilingual MediA (SUMMA) is a European Horizon 2020 research project that aims to develop a highly scalable platform for automatically monitoring public broadcast and web-based news sources, enabling news agencies and journalists to cope with world-scale amounts of information. To this end, a multilingual machine learning stream-processing pipeline is being developed that integrates several technologies, including Automatic Speech Recognition (ASR), Machine Translation (MT), Online Clustering and Text Summarization. In this talk we’ll focus on Priberam’s research effort to develop the Online Clustering component of this project, which enables the discovery of relevant storylines across multiple languages from streaming news data.
Online news clustering for crosslingual media monitoring
March 21, 2017
1:00 pm