We take inspiration from recent research on sentiment analysis that interprets text based on the subjective attitude of the author. We consider related tasks where a piece of text is interpreted to predict some extrinsic, real-valued outcome of interest that can be observed in non-text data. Examples include:
- The interpretation of an annual financial report from a company to its shareholders is the risk incurred by investing in the company in the coming year.
- The interpretation of a critic’s review of a film is the film’s box office success.
- The interpretation of a political blog post is the response it garners from readers.
- The interpretation of a day’s microblog feeds is the public’s opinion about a particular issue.
In all of these cases, one aspect of the text’s meaning is observable from objective real-world data, though perhaps not until some time after the text is published (respectively: return volatility, gross revenue, user comments, and traditional polls). We propose a generic approach to text-driven forecasting that is expected to benefit from linguistic analysis while remaining neutral to different theories of language. A highly attractive property of this line of research is that evaluation is objective, inexpensive, and theory-neutral. The approach also introduces some methodological challenges.
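To make the setup concrete, here is a minimal sketch of text-driven forecasting as regression from text to a real-valued outcome. It is an illustration under stated assumptions, not the method presented in the talk: the bag-of-words features, the ridge-style linear model, the function names, and the toy documents with their target values are all hypothetical.

```python
from collections import Counter


def featurize(text, vocab):
    """Map a text to word counts over a fixed vocabulary (unseen words are ignored)."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocab]


def train(docs, targets, l2=0.01, lr=0.1, epochs=2000):
    """Fit a linear model with an L2 penalty on the weights by batch gradient descent."""
    vocab = sorted({word for doc in docs for word in doc.lower().split()})
    rows = [featurize(doc, vocab) for doc in docs]
    n = len(docs)
    weights = [0.0] * len(vocab)
    bias = 0.0
    for _ in range(epochs):
        # Residual = current prediction minus observed outcome, per document.
        residuals = [
            bias + sum(w * x for w, x in zip(weights, row)) - y
            for row, y in zip(rows, targets)
        ]
        grad_bias = sum(residuals) / n
        grad_weights = [
            sum(r * row[j] for r, row in zip(residuals, rows)) / n + l2 * weights[j]
            for j in range(len(vocab))
        ]
        bias -= lr * grad_bias
        weights = [w - lr * g for w, g in zip(weights, grad_weights)]
    return vocab, weights, bias


def predict(text, model):
    """Forecast the real-valued outcome for a new text."""
    vocab, weights, bias = model
    return bias + sum(w * x for w, x in zip(weights, featurize(text, vocab)))


def mean_squared_error(docs, targets, model):
    """Objective evaluation: compare forecasts with the observed outcomes."""
    errors = [(predict(doc, model) - y) ** 2 for doc, y in zip(docs, targets)]
    return sum(errors) / len(errors)


# Hypothetical toy data: report snippets paired with a made-up volatility score.
DOCS = ["loss risk debt", "growth profit", "risk loss", "profit growth stable"]
TARGETS = [0.9, 0.1, 0.8, 0.2]

if __name__ == "__main__":
    model = train(DOCS, TARGETS)
    print("forecast for 'risk debt':", predict("risk debt", model))
    print("training MSE:", mean_squared_error(DOCS, TARGETS, model))
```

The evaluation step is just a comparison of forecasts with outcomes that are later observed in non-text data, so it requires no annotation and no commitment to a linguistic theory. In practice the error would be measured on held-out, later-dated documents rather than on the training set.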
We conjecture that forecasting tasks, when considered in concert, will be a driving force in domain-specific, empirical, and extrinsically useful natural language analysis. Further, this research direction will push NLP to consider the language of a more diverse subset of the population, and may support inquiry in the social sciences about foreknowledge and communication in societies.
This talk includes joint work with Ramnath Balasubramanyan, William Cohen, Dipanjan Das, Kevin Gimpel, Mahesh Joshi, Shimon Kogan, Dimitry Levin, Brendan O’Connor, Bryan Routledge, Jacob Sagi, and Tae Yano.