QUotation Attribution Corpus

This page provides links and information about the PÚBLICO 1994 QUotation Attribution Corpus (QUAC). This corpus was created within the scope of Marta Quintão’s Master’s thesis. It is based on the CHAVE corpus, which contained 100,000 unmarked news articles from the Portuguese newspaper PÚBLICO. A fraction of the original corpus was manually annotated with information about quotes and the authors of those quotes. Partial information about co-reference is also provided. A total of 212 annotated news articles and 971 annotated quotes are present in the corpus.

Download the PÚBLICO 1994 Quotation Attribution Corpus here.


The PÚBLICO 1994 Quotation Attribution Corpus is owned by Marta Quintão and licensed under the Creative Commons Attribution-ShareAlike 4.0 International License. For the legal licensing terms, please see the LICENSE.txt file in the archive. You can find a human-readable summary of the license (which is not a substitute for the license) here.


If you use this corpus, please cite the following work: Marta E. Quintão, Quotation Attribution for Portuguese News Corpora. M.Sc. Thesis. Técnico Lisboa/UTL: Portugal, 2014.


The original (unmarked news) dataset CHAVE was compiled within the CLEF initiative by Linguateca and is available here. It contains news from the newspapers “PÚBLICO” and “Folha de São Paulo” for the years 1994 and 1995. Marta Quintão would like to thank Priberam for providing the news articles and supporting the M.Sc. thesis which led to the creation of this corpus and, in particular, André F. T. Martins, Miguel B. Almeida, and Prof. Mário Figueiredo (supervisors of the M.Sc. thesis), as well as Mariana S. C. Almeida, for their help and for providing the necessary tools.