The year 2023 was a fruitful one for open-source LLMs, with the community managing to surpass the original ChatGPT model. The wave continued into 2024-2025, and the gap to the best closed-source models has now narrowed to a few months. This talk will go over the major model architecture, training, and inference changes that pushed the state of the art in LLMs and VLMs over the last year.
From Llama 3 to Deepseek R1 and beyond: a year of LLMs in retrospective
March 25, 2025
10:48 am
João Gante
João Gante is a Machine Learning Engineer on the Open-Source team at Hugging Face, leading text generation in the "transformers" library. João has 7 years of experience in the AI industry, as well as a PhD in AI applied to telecommunications from Instituto Superior Técnico.

