ML-Inception: understanding where and why models work (and don’t work)

June 21, 2022

1:00 pm

A subgroup discovery-based method has recently been proposed to understand the behavior of models in the (original) feature space. The subgroups identified represent areas of the feature space where the model obtains better or worse predictive performance than on average. For instance, in the marketing domain, the approach extracts subgroups such as: for customers who have a higher income and are younger, the random forest achieves higher accuracy than on average. Here, we propose the use of metalearning to analyze those subgroups in the metafeature space, where they are characterized in a domain-independent way, using statistical and information-theoretic properties. We then use association rules to relate characteristics of the subgroups to improvement or degradation of the performance of models. For instance, in the same domain, the approach extracts rules such as: when the class entropy decreases and the mutual information increases in the subgroup data, the random forest achieves lower accuracy. We illustrate the approach with some empirical results.
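As a rough illustration of what characterizing a subgroup in the metafeature space could look like, the sketch below computes two of the metafeatures named in the abstract (class entropy and mutual information) on a whole dataset and on one subgroup, then takes the difference. It is not the speaker's implementation: the toy records, the column names (`age`, `income`, `channel`, `buys`) and the helper functions are all hypothetical, and only the Python standard library is used.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return sum((c / n) * log2(n / c) for c in Counter(values).values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two discrete sequences."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def metafeatures(rows, feature, target):
    """Two illustrative metafeatures of a set of records."""
    xs = [r[feature] for r in rows]
    ys = [r[target] for r in rows]
    return {
        "class_entropy": entropy(ys),
        "mutual_information": mutual_information(xs, ys),
    }


# Hypothetical marketing-style records, invented for this sketch.
data = [
    {"age": "young", "income": "high", "channel": "web", "buys": 1},
    {"age": "young", "income": "high", "channel": "web", "buys": 1},
    {"age": "young", "income": "high", "channel": "store", "buys": 1},
    {"age": "young", "income": "low", "channel": "web", "buys": 0},
    {"age": "old", "income": "high", "channel": "store", "buys": 0},
    {"age": "old", "income": "low", "channel": "store", "buys": 0},
    {"age": "old", "income": "low", "channel": "web", "buys": 1},
    {"age": "old", "income": "high", "channel": "web", "buys": 0},
]

# A subgroup described in the original feature space: younger, higher income.
subgroup = [r for r in data if r["age"] == "young" and r["income"] == "high"]

full = metafeatures(data, "channel", "buys")
sub = metafeatures(subgroup, "channel", "buys")

# Change of each metafeature inside the subgroup relative to the whole data.
delta = {name: sub[name] - full[name] for name in full}
print(delta)
```

In a pipeline of the kind the abstract describes, differences like `delta` would presumably be discretized (for example into "increases" and "decreases") and paired with the change in model performance on the subgroup before association rules are mined; that step is omitted here.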

Carlos Soares

Carlos Soares is an Associate Professor at the Faculty of Engineering of U. Porto, where he holds the positions of Subdirector of the Dep. of Informatics Engineering, Director of the Ph.D. programme on Informatics Engineering and Adjunct Director of the M.Sc. programme on Data Science and Engineering. Carlos teaches at the Porto Business School, where he is the co-Director of the executive programme on Business Intelligence & Analytics. He is also an External Advisor for Intelligent Systems at Fraunhofer Portugal AICOS, a researcher at LIACC and a collaborator at LIAAD-INESC TEC. The focus of his research is on metalearning/autoML, but he has a general interest in Data Science. He has participated in 20+ national and international R&D and consulting projects. Carlos regularly collaborates with companies, including Feedzai, Accenture and InovRetail. He has published/edited several books and 150+ papers in journals and conferences (90+/125+ indexed by ISI/Scopus) and supervised 10+/50+ Ph.D./M.Sc. theses. His recent participation in the organization of events includes ECML PKDD 2015, IDA 2016 and Discovery Science 2021, as programme co-chair. In 2009, he was awarded the Scientific Merit and Excellence Award of the Portuguese AI Association.
