Online news clustering for crosslingual media monitoring

March 21, 2017

1:00 pm

Scalable Understanding of Multilingual MediA (SUMMA) is a European Horizon 2020 research project that aims to develop a highly scalable platform for automatically monitoring public broadcast and web-based news sources, enabling news agencies and journalists to cope with world-scale amounts of information. To this end, the project is developing a multilingual machine learning stream-processing pipeline that integrates several technologies, including Automatic Speech Recognition (ASR), Machine Translation (MT), Online Clustering and Text Summarization, among many others. In this talk we’ll focus on Priberam’s research effort to develop the project's Online Clustering component, which enables the discovery of relevant storylines across multiple languages from streaming news data.

Sebastião Miranda

Sebastião Miranda is a Software Engineer at Priberam, where he has been working on search engine architecture and algorithms, news clustering and other natural language processing applications. He holds an MSc in Electrical and Computer Engineering from Instituto Superior Técnico (University of Lisbon, 2014), and is also interested in high-performance computing, artificial intelligence and embedded systems.

Seminars

Latest seminars

Cost-Sensitive Learning to Defer to Multiple Experts
March 2, 2026
Fair Federated Learning under Group-Specific Distributed Concept Drift
February 24, 2026
Machine learning models can become unfair when different groups experience changes in data over time, a phenomenon called group-specific concept…
Unlocking Latent Discourse Translation in LLMs Through Quality-Aware Decoding
June 17, 2025
Large language models (LLMs) have emerged as strong contenders in machine translation. Yet, they often fall behind specialized neural machine…
Speech as a Biomarker for Disease Detection
May 20, 2025
Today’s overburdened health systems face numerous challenges, exacerbated by an aging population. Speech emerges as a ubiquitous biomarker with strong…