CaltechAUTHORS
  A Caltech Library Service

Scaling Data from Multiple Sources

Enamorado, Ted and López-Moctezuma, Gabriel and Ratkovic, Marc (2021) Scaling Data from Multiple Sources. Political Analysis, 29 (2). pp. 212-235. ISSN 1047-1987. doi:10.1017/pan.2020.24. https://resolver.caltech.edu/CaltechAUTHORS:20210517-124437095

Use this Persistent URL to link to this item: https://resolver.caltech.edu/CaltechAUTHORS:20210517-124437095

Abstract

We introduce a method for scaling two datasets from different sources. The proposed method estimates a latent factor common to both datasets as well as an idiosyncratic factor unique to each. In addition, it offers a flexible modeling strategy that permits the scaled locations to be a function of covariates, and efficient implementation allows for inference through resampling. A simulation study shows that our proposed method improves over existing alternatives in capturing the variation common to both datasets, as well as the latent factors specific to each. We apply our proposed method to vote and speech data from the 112th U.S. Senate. We recover a shared subspace that aligns with a standard ideological dimension running from liberals to conservatives, while recovering the words most associated with each senator’s location. In addition, we estimate a word-specific subspace that ranges from national security to budget concerns, and a vote-specific subspace with Tea Party senators on one extreme and senior committee leaders on the other.


Item Type: Article
Related URLs (URL, URL type, description):
  https://doi.org/10.1017/pan.2020.24, DOI, Article
  https://doi.org/10.7910/DVN/FOUVEL, DOI, Dataset
  https://doi.org/10.24433/CO.3824807.v1, DOI, Code
ORCID (author, ORCID):
  Enamorado, Ted, 0000-0002-2022-7646
Additional Information: © The Author(s) 2020. Published by Cambridge University Press on behalf of the Society for Political Methodology. Published 23 November 2020. We are grateful to Arthur Spirling, Jacob Neihesel, John Londregan, and Dustin Tingley, and to audiences at Princeton University, New York University, and the University of Buffalo, for helpful comments and suggestions. Data Availability Statement: Replication code for this article has been published in Code Ocean, a computational reproducibility platform that enables users to run the code, and can be viewed interactively at Enamorado et al. (2020a) or https://doi.org/10.24433/CO.3824807.v1. A preservation copy of the same code and data can also be accessed via Harvard Dataverse at Enamorado et al. (2020b) or https://doi.org/10.7910/DVN/FOUVEL.
Subject Keywords: multidimensional scaling, principal component analysis, U.S. Senate
Issue or Number: 2
DOI: 10.1017/pan.2020.24
Record Number: CaltechAUTHORS:20210517-124437095
Persistent URL: https://resolver.caltech.edu/CaltechAUTHORS:20210517-124437095
Official Citation: Enamorado, T., López-Moctezuma, G., & Ratkovic, M. (2021). Scaling Data from Multiple Sources. Political Analysis, 29(2), 212-235. doi:10.1017/pan.2020.24
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 109152
Collection: CaltechAUTHORS
Deposited By: Tony Diaz
Deposited On: 17 May 2021 19:54
Last Modified: 29 Jul 2021 15:49