Who Voted in 2016? Using Fuzzy Forests to Understand Voter Turnout
Objective: What can machine learning tell us about who voted in 2016? There are numerous competing theories of voter turnout, and a large number of covariates are required to assess which theory best explains turnout. This article is a proof of concept that machine learning can help overcome this curse of dimensionality and reveal important insights in studies of political phenomena.
Methods: We use fuzzy forests, an extension of random forests, to screen variables for a parsimonious but accurate prediction. Fuzzy forests achieve accurate variable importance measures in the face of high-dimensional and highly correlated data. Our data come from the 2016 Cooperative Congressional Election Study.
Results: Fuzzy forests selected only a small number of covariates as major correlates of 2016 turnout while still achieving high predictive performance.
Conclusion: Our analysis yields three conclusions about turnout in 2016: registration and voting procedures were important; political issues were important (especially Obamacare, climate change, and fiscal policy); and few demographic variables other than age were strongly associated with turnout. We conclude that fuzzy forests are an important methodology for studying overdetermined questions in the social sciences.
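The screening idea described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation and makes several assumptions: correlated covariates are grouped by a simple correlation threshold (a stand-in for whatever module-detection step the actual analysis used), importance is scikit-learn's impurity-based `feature_importances_` rather than a permutation-based measure, and the data are synthetic rather than drawn from the Cooperative Congressional Election Study. All function names (`correlation_modules`, `screen_module`, `fuzzy_forest_sketch`, `make_synthetic`) are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def correlation_modules(X, threshold=0.5):
    """Group features into modules: connected components of the graph that
    links two features when |correlation| >= threshold (an assumed,
    simplified stand-in for a dedicated module-detection method)."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    unassigned = set(range(corr.shape[0]))
    modules = []
    while unassigned:
        seed = unassigned.pop()
        module, frontier = [seed], [seed]
        while frontier:
            j = frontier.pop()
            linked = [k for k in unassigned if corr[j, k] >= threshold]
            unassigned.difference_update(linked)
            module.extend(linked)
            frontier.extend(linked)
        modules.append(sorted(module))
    return modules


def screen_module(X, y, features, keep, drop_fraction=0.25, seed=0):
    """Recursive feature elimination: refit a random forest and drop the
    least important features until only `keep` of them remain."""
    features = list(features)
    while len(features) > keep:
        rf = RandomForestClassifier(n_estimators=100, random_state=seed)
        rf.fit(X[:, features], y)
        order = np.argsort(rf.feature_importances_)  # ascending importance
        n_drop = max(1, int(drop_fraction * len(features)))
        n_drop = min(n_drop, len(features) - keep)
        dropped = set(order[:n_drop].tolist())
        features = [f for i, f in enumerate(features) if i not in dropped]
    return features


def fuzzy_forest_sketch(X, y, keep_per_module=2, final_keep=5, seed=0):
    """Screen within each module, then select among all survivors jointly."""
    survivors = []
    for module in correlation_modules(X):
        keep = min(keep_per_module, len(module))
        survivors += screen_module(X, y, module, keep, seed=seed)
    final = screen_module(X, y, survivors, min(final_keep, len(survivors)), seed=seed)
    rf = RandomForestClassifier(n_estimators=200, random_state=seed)
    rf.fit(X[:, final], y)
    ranked = sorted(zip(final, rf.feature_importances_), key=lambda t: -t[1])
    return [(int(f), float(imp)) for f, imp in ranked]


def make_synthetic(n=600, seed=0):
    """Synthetic data: three modules of four correlated features each, plus
    four independent noise features. Only features 0 and 4 drive the outcome."""
    rng = np.random.default_rng(seed)
    latent = rng.normal(size=(n, 3))
    blocks = [latent[:, [m]] + 0.3 * rng.normal(size=(n, 4)) for m in range(3)]
    X = np.hstack(blocks + [rng.normal(size=(n, 4))])
    logits = 2.0 * X[:, 0] - 2.0 * X[:, 4]
    y = (logits + rng.normal(size=n) > 0).astype(int)
    return X, y


if __name__ == "__main__":
    X, y = make_synthetic()
    for feature, importance in fuzzy_forest_sketch(X, y):
        print(f"feature {feature}: importance {importance:.3f}")
```

Screening within modules before the joint selection step keeps a block of highly correlated covariates from crowding out the rest, which is the property the abstract attributes to fuzzy forests in high-dimensional, highly correlated data.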
Additional Information
© 2020 by the Southwestern Social Science Association. Issue Online: 24 March 2020; Version of Record online: 16 February 2020. An earlier version of this paper was presented as a poster at the Polmeth 2018 conference; we thank conference participants for their comments and suggestions.
Supplemental Material - ssqu12777-sup-0001-suppmat.zip