Appendix B: Assessing the Impact of Newspaper-Based Selection Bias


In terms of statistical modeling, selection bias occurs when the distribution of the studied variables among cases included in the analysis (non-missing cases) differs from their distribution among cases that potentially belong to the sample but are excluded from it (missing cases).  To determine whether our data suffers from this problem, we need to know whether the included cases, that is, NBA players for whom information on social origin is available, differ significantly from NBA players for whom such information is lacking.  Specifically, considering our hypotheses, we need to ask: “Did newspaper-based selection bias result in richer data on players from advantaged social origins?”  If so, this form of bias would have affected our results.

To answer the question above, we must identify the circumstances under which journalists would report on advantaged NBA players more often than on disadvantaged ones.  Considering that newspapers are in the business of attracting customers, it is reasonable to expect that the media focus their human-interest stories on players who are among the best and, hence, the most popular.  Selection bias, then, would favor ‘eminent’ NBA players, i.e. those who are among the best in the league: more detailed information would be given about them, including their social class and family structure background.

According to our theory, players from advantaged social origins are more likely to have made it into the NBA.  Extending this assumption to ‘eminence’, namely that the more advantaged the player’s social origins, the more likely he is to acquire the skills necessary to be among the best in the NBA, we formulate the following hypotheses for the mechanism that could lead newspaper-based selection bias to affect our data:

Hypothesis 1:  The more eminent an NBA player, the more likely newspapers are to report detailed information on his social class and family structure background.

Hypothesis 2:  NBA players from more advantaged origins are more likely to be eminent players.

Hypothesis 3:  Insofar as both of the above expectations are met, newspaper articles would cover in greater detail the social class and family structure background of NBA players from more advantaged origins.

If Hypothesis 3 were empirically supported, our data would be biased. While we cannot test it directly, we can determine whether the first two expectations of the newspaper-based selection mechanism hold.  Specifically, for selection bias to occur, advantaged social origins must be related to being eminent (Hypothesis 2), which in turn must be related to being included in our data (Hypothesis 1).  If, on the other hand, the relationship between advantaged origins and being an eminent NBA player is not significant, then the news articles have not reported selectively on players from advantaged backgrounds (although they may have reported more often on eminent ones).  In other words, as long as Hypothesis 2 does not find support, we can conclude that NBA players for whom detailed social class and family structure background information is available are not more likely to come from advantaged origins than NBA players for whom we lack such information (missing cases).

To perform this test, we first determined how the 155 players in our subpopulation rank on eminence. We measured eminence as position in the NBA draft, scored so that a higher score indicates an earlier draft pick.  In particular, those who were the first pick overall in a given year’s NBA Draft score highest.  Since not all players were drafted, non-draftees (6.5% of the 155 players) are coded with the lowest possible score.
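The coding rule described above can be sketched as follows. This is only an illustration under stated assumptions: the text does not report the exact scale, so the number of draft slots (`MAX_PICK = 60`), the value 0 for non-draftees, and the function name `eminence_score` are all hypothetical.

```python
from typing import Optional

# Hypothetical assumption: 60 draft slots. The source does not state the scale.
MAX_PICK = 60


def eminence_score(draft_pick: Optional[int], max_pick: int = MAX_PICK) -> int:
    """Map a draft position to an eminence score (higher = earlier pick).

    Undrafted players (draft_pick is None) get the lowest possible score,
    assumed here to be 0.
    """
    if draft_pick is None:
        return 0
    if not 1 <= draft_pick <= max_pick:
        raise ValueError(f"draft_pick must be between 1 and {max_pick}")
    # The first overall pick receives the highest score.
    return max_pick - draft_pick + 1


scores = [eminence_score(p) for p in (1, 2, 60, None)]
print(scores)  # [60, 59, 1, 0]
```

Any monotone recoding of draft position would preserve the ordering of players; only the scale of the regression coefficients would change.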

We use these draft-position scores as the independent variable in logistic regressions in which the dependent variable is the existence of data on social class and family structure background (1 = yes, 0 = no).
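As a minimal sketch of this kind of model, the code below fits a one-predictor logistic regression by Newton-Raphson using only the Python standard library. The function name `fit_logistic` and the toy values in `eminence` and `has_data` are hypothetical and are not the study's data; the software actually used for the analysis is not stated in the text.

```python
import math


def fit_logistic(x, y, iterations=25):
    """Fit P(y = 1) = 1 / (1 + exp(-(a + b * x))) by Newton-Raphson.

    Returns the intercept a, the slope b, and the maximized log-likelihood.
    """
    a, b = 0.0, 0.0
    for _ in range(iterations):
        # Score vector (g0, g1) and information matrix entries at (a, b).
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Newton step: solve the 2x2 system (information matrix) * delta = score.
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a += da
        b += db
        if abs(da) < 1e-10 and abs(db) < 1e-10:
            break
    loglik = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
        loglik += math.log(p) if yi == 1 else math.log(1.0 - p)
    return a, b, loglik


# Hypothetical toy data: eminence scores and whether background data exist.
eminence = [0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60]
has_data = [0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1]

a, b, loglik = fit_logistic(eminence, has_data)
print(f"intercept = {a:.3f}, slope = {b:.3f}, log-likelihood = {loglik:.3f}")
```

A positive slope in such a model corresponds to the pattern stated in Hypothesis 1: the probability that background data exist rises with eminence.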

Table 1 Logistic Regression of Existence of Data (Yes = 1, No = 0) on Social Class and Family-Structure Background on Players’ Eminence

                       Social Class   Family Structure   Both Combined
  Players’ Eminence
  -2 Log Likelihood
  Model Chi Square
  Cox and Snell R2
  N                    155            155                155

Table 1 presents the logistic regression results. Supporting Hypothesis 1, more eminent players are significantly more likely to have information on social class, on family structure, and on both variables combined.  For each dependent variable, the model fit is satisfactory.   
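For readers unfamiliar with the fit measures listed in Table 1, the sketch below shows how they are conventionally derived from the log-likelihoods of the fitted model and of an intercept-only model. The function names and the numeric inputs are hypothetical illustrations, not values from our analysis.

```python
import math


def null_log_likelihood(n_ones, n):
    """Log-likelihood of an intercept-only logistic model."""
    if not 0 < n_ones < n:
        raise ValueError("the outcome must take both values 0 and 1")
    p = n_ones / n
    return n_ones * math.log(p) + (n - n_ones) * math.log(1.0 - p)


def fit_statistics(ll_model, ll_null, n):
    """Return (-2 log likelihood, model chi-square, Cox and Snell R2)."""
    minus2ll = -2.0 * ll_model
    # Likelihood-ratio test of the fitted model against the null model.
    model_chi2 = 2.0 * (ll_model - ll_null)
    # Cox and Snell R2 = 1 - (L_null / L_model) ** (2 / n).
    cox_snell_r2 = 1.0 - math.exp(-model_chi2 / n)
    return minus2ll, model_chi2, cox_snell_r2


# Hypothetical log-likelihoods for a sample of 100 cases.
minus2ll, model_chi2, cox_snell_r2 = fit_statistics(-90.0, -100.0, 100)
print(f"{minus2ll:.1f} {model_chi2:.1f} {cox_snell_r2:.3f}")  # 180.0 20.0 0.181
```

With a single predictor, the model chi-square has one degree of freedom, so values above roughly 3.84 are significant at the 0.05 level.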

To assess whether advantaged social origins are positively and significantly related to players’ eminence (Hypothesis 2), we employed ordinary least squares regression of players’ eminence separately on three independent variables: advantaged social class (middle to upper social class origins), advantaged family structure background (two-parent family), and compounded advantage (both advantaged social class and family structure background).  Results (see Table 2) show that none of the advantaged origins variables has a significant effect on eminence.

Table 2 Ordinary Least Squares Regression of Players’ Eminence on Advantaged Social Origin Variables

                                             B (standard error)

  Middle to Upper Social Class Background    4.34 (4.16)
  Constant                                   36.29* (3.49)
  R2                                         0.01
  N                                          80

  Two Parent Family Structure                -2.67 (3.23)
  Constant                                   40.50* (2.26)
  R2                                         0.01
  N                                          105

  Both Forms of Advantage                    2.77 (3.98)
  Constant                                   39.10* (2.67)
  R2                                         0.01
  N                                          70

* p < 0.01
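A minimal sketch of the regressions summarized in Table 2 follows. The helper `simple_ols` and the toy data are hypothetical; only the coefficient and standard-error pairs in `table2` are taken from the table. Dividing each coefficient by its standard error gives a t ratio, and all three ratios fall well below the value of roughly 2 in absolute terms needed for significance at conventional levels with samples of this size.

```python
import math


def simple_ols(x, y):
    """Regress y on one predictor x; return B, its SE, the constant, t, and R2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    constant = mean_y - slope * mean_x
    sse = sum((yi - (constant + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    # Standard error of the slope, using n - 2 residual degrees of freedom.
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    return {
        "B": slope,
        "SE": se_slope,
        "constant": constant,
        "t": slope / se_slope,
        "R2": 1.0 - sse / sst,
    }


# Hypothetical toy data: 1 = advantaged origin, 0 = not; eminence scores.
advantaged = [0, 0, 0, 0, 1, 1, 1, 1]
eminence = [30, 40, 35, 45, 38, 42, 36, 48]
result = simple_ols(advantaged, eminence)
print({key: round(value, 3) for key, value in result.items()})

# Coefficients and standard errors as reported in Table 2.
table2 = {
    "Middle to Upper Social Class Background": (4.34, 4.16),
    "Two Parent Family Structure": (-2.67, 3.23),
    "Both Forms of Advantage": (2.77, 3.98),
}
t_ratios = {label: b / se for label, (b, se) in table2.items()}
for label, t in t_ratios.items():
    print(f"{label}: t = {t:.2f}")
```

With a binary predictor, the slope equals the difference in mean eminence between the two groups and the constant equals the mean of the reference group.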

Substantively, the lack of support for Hypothesis 2 allows us to conclude that our data is not biased toward advantaged NBA players: in terms of social class and/or family background, players for whom this information is available are not substantively different from players for whom such information is lacking (i.e. missing cases).


