Friday, November 15 @ 2:35pm
TCL 123 (Wege Auditorium)
Methodological Challenges in Computational Text Analysis
Computational text analysis is a field that uses computational methods (primarily from natural language processing and machine learning) to answer research questions about large collections of text data (e.g., news articles, court documents, social media posts, congressional speeches, and electronic health records). In this talk, I will discuss two recent projects that address methodological challenges in computational text analysis research. First, computational text analysis often relies on supervised classifiers to infer aggregate measures over groups of documents (for example, the proportion of tweets each day that express positive sentiment towards a politician), a task called “prevalence estimation.” I show that previous methods that aggregate and adjust discriminative classifiers are biased, and I develop a novel hybrid discriminative-generative model that is more robust to shifts in prevalence between training and testing. Second, most computational text analysis methods have focused on descriptive measurements of corpora or on predictive tasks, but social research is often concerned with causal inference. I will describe our new quasi-experimental design framework for causal text matching, which involves computing distance scores between pairs of texts via supervised text classifiers and unsupervised embedding representations. I apply this method to a dataset of scraped Reddit comments and estimate the effects of deleting comments on Reddit users’ behavior.
Katherine Keith is a fourth-year PhD student in the College of Information and Computer Sciences at the University of Massachusetts Amherst. Her research with her advisor, Brendan O’Connor, focuses on computational text analysis, natural language processing, and machine learning. She graduated summa cum laude from Lewis & Clark College in Portland, Oregon, in 2015 with a degree in Mathematics and a minor in Chinese. After completing her undergraduate degree, she received a Fulbright English Teaching Assistantship and taught in Kinmen, Taiwan, for twelve months before beginning her graduate studies. She was awarded a Bloomberg Data Science PhD Fellowship in 2019.