Unconventional Regression Models for Count and Binary Data
2017
Dissertation
Count and binary data have become popular dependent variables in studies across many fields, driven in particular by the growing availability of data on human and social behavior. The standard models for count data are Poisson and negative binomial regression; the standard models for binary data are logistic and probit regression. Like any regression model, these standard models have limitations, and in some situations other methods perform better. Among these alternatives, we choose two unconventional models: Conway-Maxwell-Poisson (CMP) regression for count data and the linear probability model (LPM) for binary data. For CMP, we develop new model extensions and estimation frameworks, including an iteratively reweighted least squares (IRLS) algorithm, a generalized additive model, and a tree-based varying coefficient model. For the LPM, we critically evaluate its properties for estimation and prediction.

(a) Conway-Maxwell-Poisson (CMP) regression is a popular model for count data because it can capture both underdispersion and overdispersion. However, CMP regression is limited when dealing with complex nonlinear relationships, and with today's wide availability of count data, there is a need for count data models that can capture such relationships. In this dissertation, we first present a varying coefficient model for the CMP distribution in which the regression coefficient is modeled as a low-dimensional function of moderator variables, and we then extend the formulation to allow high-dimensional coefficient functions. The latter is fitted using a new tree-based method. Our contributions to CMP regression include:

1. We propose a flexible estimation framework for CMP regression based on IRLS and then extend it to allow additive components (generalized additive models, GAMs) as well as varying coefficient models (VCMs), estimated with penalized splines.
Because the CMP distribution belongs to the exponential family, IRLS is guaranteed to converge under certain regularity conditions. We illustrate the usefulness of this method through extensive simulation studies and an application to real data from a bike sharing system in Washington, DC.

2. Our proposed tree-based varying coefficient model offers further flexibility. We implement it for the CMP distribution within the model-based (MOB) recursive partitioning framework, chosen because it is computationally less intensive. We also provide an alternative to the default exhaustive search for estimating the split point, which eases model fitting in general and for the CMP distribution in particular. We illustrate the usefulness of our method through extensive simulations and a real application to the Washington, DC, bike sharing data.

(b) Linear regression is among the most popular statistical models in social science research. Linear probability models (LPMs), that is, linear regression models applied to a binary outcome, are used in many disciplines. In this dissertation, we first evaluate the LPM for three common uses of binary outcome models: inference and estimation, prediction and classification, and selection bias. We compare its performance to logit and probit regression under different sample sizes, error distributions, and other settings. Second, we relax the parametric assumption and carry out a similar comparison for the nonparametric extensions of the LPM, logit, and probit models. We find that the coefficient signs, statistical significance, and marginal effects from the LPM are similar to those from logit and probit. In addition, LPM estimators are consistent for the true parameters up to a multiplicative scalar. For classification and selection bias, the LPM is on par with logit and probit in terms of class separation and ranking, and it is a viable alternative in selection models.
The LPM falls short when predicted probabilities are of interest, because its predicted probabilities can fall outside the unit interval. We illustrate some of these results by modeling prices in online auctions, using data from eBay.

For each study, in addition to the methodological derivations, we use both extensive simulation studies and real data applications to illustrate the usefulness of the proposed methods.
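The claim that CMP captures both underdispersion and overdispersion can be checked numerically. The sketch below assumes the common CMP parameterization P(Y = y) = λ^y / ((y!)^ν Z(λ, ν)), approximates the normalizing constant Z(λ, ν) by truncating its infinite series at `max_y` (a cutoff chosen for illustration), and compares the variance to the mean for ν < 1, ν = 1, and ν > 1. The function name `cmp_moments` is hypothetical and not taken from the dissertation.

```python
import math

def cmp_moments(lam, nu, max_y=200):
    """Mean and variance of CMP(lam, nu), where P(Y = y) is proportional to lam**y / (y!)**nu.

    The normalizing constant Z(lam, nu) = sum_y lam**y / (y!)**nu has no closed
    form, so the series is truncated at max_y (ample for moderate means).
    """
    log_w = [y * math.log(lam) - nu * math.lgamma(y + 1) for y in range(max_y + 1)]
    shift = max(log_w)                      # subtract the max to avoid overflow
    w = [math.exp(v - shift) for v in log_w]
    z = sum(w)
    p = [v / z for v in w]
    mean = sum(y * py for y, py in enumerate(p))
    var = sum((y - mean) ** 2 * py for y, py in enumerate(p))
    return mean, var

for nu in (0.5, 1.0, 2.0):
    mean, var = cmp_moments(3.0, nu)
    print(f"nu={nu}: mean={mean:.3f}, var={var:.3f}, var/mean={var / mean:.3f}")
```

With ν = 1 the distribution reduces to the Poisson, so the variance equals the mean; ν < 1 pushes the variance above the mean (overdispersion) and ν > 1 pulls it below (underdispersion).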
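The IRLS framework and its penalized-spline extension for CMP regression can be sketched as follows, under simplifying assumptions that are not the dissertation's implementation: the dispersion ν is treated as known, the model is log λ = Bβ (for fixed ν, log λ is the canonical parameter, so IRLS coincides with Newton's method with weights Var(Y_i)), the additive term uses a linear truncated-power spline basis with a ridge penalty on the knot coefficients, and the smoothing parameter is fixed by hand rather than selected from the data. All names (`cmp_probs`, `cmp_sample`, `cmp_penalized_irls`, `smooth`) are hypothetical.

```python
import math
import numpy as np

def cmp_probs(eta, nu, max_y=100):
    """Truncated CMP pmf: row i is P(Y = 0..max_y) with log(lambda_i) = eta_i."""
    ys = np.arange(max_y + 1)
    log_fact = np.array([math.lgamma(k + 1.0) for k in ys])
    log_w = np.outer(eta, ys) - nu * log_fact
    log_w -= log_w.max(axis=1, keepdims=True)      # stabilize before exponentiating
    p = np.exp(log_w)
    return ys, p / p.sum(axis=1, keepdims=True)

def cmp_sample(eta, nu, rng, max_y=100):
    """Inverse-CDF draws from the truncated CMP pmf (used only to simulate data)."""
    _, p = cmp_probs(eta, nu, max_y)
    u = rng.random(len(eta))[:, None]
    return np.minimum((np.cumsum(p, axis=1) < u).sum(axis=1), max_y)

def cmp_penalized_irls(B, y, nu, penalty, n_iter=100, tol=1e-8):
    """Penalized IRLS for log(lambda) = B @ beta with nu held fixed.

    Weights are W = diag(Var(Y_i)) and the working response is
    z = eta + (y - mu) / Var(Y_i). Each step solves (B'WB + penalty) beta = B'Wz;
    a zero penalty matrix gives plain (unpenalized) IRLS.
    """
    beta = np.zeros(B.shape[1])
    for _ in range(n_iter):
        eta = B @ beta
        ys, p = cmp_probs(eta, nu)
        mu = p @ ys                                  # E[Y_i]
        var = p @ ys**2 - mu**2                      # Var(Y_i)
        lhs = (B.T * var) @ B + penalty
        rhs = B.T @ (var * eta + (y - mu))           # equals B'Wz without dividing by var
        beta_new = np.linalg.solve(lhs, rhs)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

rng = np.random.default_rng(0)
n, nu = 2000, 1.5                                    # nu treated as known here
x = rng.uniform(-1, 1, n)
t = rng.uniform(0, 1, n)
f_true = 0.5 * np.sin(2 * np.pi * t)                 # smooth effect of t

# Parametric CMP regression, log(lambda) = 0.5 + 0.6 x, fitted by plain IRLS.
X = np.column_stack([np.ones(n), x])
y_lin = cmp_sample(X @ np.array([0.5, 0.6]), nu, rng)
beta_lin = cmp_penalized_irls(X, y_lin, nu, np.zeros((2, 2)))

# Additive CMP model, log(lambda) = 0.5 + 0.6 x + f(t), with f in a linear
# truncated-power spline basis and a ridge penalty on the knot terms only.
knots = np.linspace(0.05, 0.95, 19)
B = np.column_stack([np.ones(n), x, t, np.maximum(t[:, None] - knots, 0.0)])
D = np.diag([0.0, 0.0, 0.0] + [1.0] * len(knots))
smooth = 1.0                                         # hypothetical, fixed by hand
y_add = cmp_sample(0.5 + 0.6 * x + f_true, nu, rng)
beta_gam = cmp_penalized_irls(B, y_add, nu, smooth * D)
f_hat = B[:, 2:] @ beta_gam[2:]                      # estimated f(t), up to a constant

print("parametric fit:", beta_lin.round(3))
print("additive fit, coefficient of x:", round(float(beta_gam[1]), 3))
```

A varying coefficient term β(z)·x fits the same pattern: multiply x column-wise by a spline basis in the moderator z and penalize the resulting coefficients.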
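The LPM findings on marginal effects, proportionality to the true parameters, and out-of-range predictions can be illustrated by a small simulation. This sketch assumes a probit data-generating process with jointly Gaussian regressors, a setting in which the OLS slopes of the LPM are known to equal the probit coefficients times E[φ(α + x'β)], that is, the probit average marginal effects (a consequence of Stein's lemma). It is an illustration under those assumptions, not the dissertation's simulation design; all parameter values are hypothetical.

```python
import math
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
alpha, beta = 0.3, np.array([1.0, -2.0, 0.5])

# Assumed design: jointly Gaussian regressors and a probit model for y.
X = rng.standard_normal((n, 3))
index = alpha + X @ beta
y = (index + rng.standard_normal(n) > 0).astype(float)

# LPM: ordinary least squares of the 0/1 outcome on an intercept and X.
Z = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
slopes = coef[1:]

# Probit average marginal effects at the true parameters: beta * mean(phi(index)).
phi = np.exp(-0.5 * index**2) / math.sqrt(2.0 * math.pi)
ame = beta * phi.mean()

# LPM fitted "probabilities" are unbounded and can leave the unit interval.
fitted = Z @ coef
share_outside = np.mean((fitted < 0.0) | (fitted > 1.0))

print("LPM slopes:         ", slopes.round(4))
print("probit AMEs (truth):", ame.round(4))
print("slope ratios:       ", (slopes / slopes[0]).round(3), "vs", beta / beta[0])
print(f"LPM fitted values outside [0, 1]: {share_outside:.1%}")
```

The slope ratios match the ratios of the probit coefficients, which is the sense in which the LPM recovers the parameters up to a multiplicative scalar; with non-Gaussian regressors this proportionality generally holds only approximately.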
| Field | Value |
|---|---|
| Title | Unconventional Regression Models for Count and Binary Data |
| Author / Contributor | Chatla, Suneel Babu ; 蘇尼爾 |
| Publication | 2017 |
| Media type | Dissertation |