The choice of cases for empirical analysis is a central topic in the methods literature. The argument that quantitative and qualitative research follow different templates, which is probably made most forcefully in A Tale of Two Cultures, does not come as a surprise. What I want to focus on here are the arguments on case selection in qualitative research. Dating back to Lijphart and Eckstein, we have a repertoire of types of cases and corresponding selection strategies such as the typical case, the diverse case, and the most-likely case. What is often ignored in the literature on case selection in qualitative research is the difference between the identification of cases and the choice among them. Peter Starke and I make this point in a paper concerning case selection based on regression results, but it extends to genuine qualitative research and other forms of multi-method designs.
Imagine we are interested in whether high levels of social and political inequality are sufficient for the occurrence of revolutions. In our data, we have ten cases in which a high level of inequality coincides with a revolution; these cases are typical because they meet the definition of sufficiency as “if X, then Y”. This means we have identified the group of typical cases we could examine with process tracing, but we still have to choose one. Which case should we select if all of them are qualitatively identical? (This question is moot when one works with fuzzy sets because there is a clear criterion for intentional case selection, which is what Carsten Schneider and I work out informally here.)
Three answers are on offer, only one of which is sound from the viewpoint of methods. First, one should choose a substantively important case. While it is valuable to explain empirically important cases (the archetypical example being the French Revolution in a study of revolutions), this is not the route I recommend. When we are interested in learning something about theory based on the empirical analysis of cases (referred to as theory-centered, theory-oriented, or theory-focused studies), the only relevant criterion is the extent to which a case is useful for achieving this purpose. Substantive importance is an entirely different criterion; a substantively interesting case might be well explained by our theory, but it might not be. Take the example of the French Revolution, which is substantively relevant because it changed the European landscape at the end of the 18th century and the beginning of the 19th century, and shaped the history of Europe more generally. Is this significant to a general theory explaining the occurrence and non-occurrence of revolutions? It could be, but probably not, because the substantive importance of that revolution follows from its consequences in Europe, while we are interested in explaining the occurrence of revolutions. For this reason, one should not be concerned when a theory (or hypothesis) does not fit a substantively important case; this is not the standard in which we are interested when cases are instrumental for developing general theoretical arguments.
Put differently, the argument is that the substantive importance of a case drives the expectation that the theory must be able to explain it, i.e., that it is a most-likely case. Failure to explain this case would then severely weaken the theory. But this is misleading because how likely it is that a case is explained by a theory depends on theory-relevant features of the case (explained, for example, in chapters 3 and 8 of my case study book). In a study of revolutions, for instance, a high degree of political and social inequality in a country might render the case most likely to confirm the hypothesis. If the high level of inequality is also what makes the case substantively important, we have an overlap between substantive and theoretical importance, but this would be a coincidence. Theoretical relevance derives from theoretically relevant elements of a case, not from its substantive importance.
Second, we can select a case that we are able to study given our resource constraints. This is a legitimate guideline for case selection in practice, but one should be aware that it can introduce bias. The bias occurs when the constraint guiding the choice of cases is systematically related to the research question. Suppose you are interested in revolutions in former colonies. You have Spanish and English colonies in your population, but you do not speak Spanish and therefore study only English colonies. This is fine as long as there is no reason to expect that Spanish and English colonies differ systematically from each other, which is what you must evaluate in light of your theoretical argument.
The third option, which is the one I find most coherent and thus convincing, is stratified random selection (each set comprising qualitatively identical cases forming a stratum). Random selection has a difficult road in qualitative research, but this is only because the idea of choosing cases randomly is so closely (and incorrectly) tied to quantitative research and the estimation of marginal effects. However, there are different reasons why one might choose randomly in qualitative research. (Fearon and Laitin develop a reason in the context of multi-method research, which I am not entirely convinced is correct.) One viable reason in qualitative research is simply that there is no case selection criterion we can rely on after we have identified qualitatively identical cases. Pending evidence to the contrary, every case serves the theoretical purpose equally well, and when there is no reason to favor one case over another, why not choose randomly within each stratum?
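To make the distinction between identification and choice concrete, here is a minimal Python sketch of the procedure described above. It is only an illustration under stated assumptions: the case labels and their crisp-set scores are invented, and the function names (identify_typical_cases, stratum_of, choose_within_strata) are hypothetical helpers, not part of any existing package. Cases are first grouped into strata of qualitatively identical cases, and one case is then drawn at random from each stratum.

```python
import random
from collections import defaultdict

# Hypothetical crisp-set data: the labels and scores are invented for illustration.
cases = [
    {"name": "Case A", "high_inequality": True,  "revolution": True},
    {"name": "Case B", "high_inequality": True,  "revolution": True},
    {"name": "Case C", "high_inequality": True,  "revolution": True},
    {"name": "Case D", "high_inequality": True,  "revolution": False},
    {"name": "Case E", "high_inequality": False, "revolution": False},
    {"name": "Case F", "high_inequality": False, "revolution": False},
]


def identify_typical_cases(cases):
    """Identification step: cases that are in line with 'if X, then Y'
    because both X (high inequality) and Y (revolution) are present."""
    return [c for c in cases if c["high_inequality"] and c["revolution"]]


def stratum_of(case):
    """Cases with the same crisp-set scores on X and Y are treated as
    qualitatively identical and therefore belong to the same stratum."""
    return (case["high_inequality"], case["revolution"])


def choose_within_strata(cases, stratum_key, seed=None):
    """Choice step: draw one case at random from each stratum.
    A fixed seed makes the draw reproducible and documentable."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for case in cases:
        strata[stratum_key(case)].append(case)
    return {label: rng.choice(members) for label, members in strata.items()}


typical = identify_typical_cases(cases)
selected = choose_within_strata(cases, stratum_of, seed=2024)

print("Typical cases:", [c["name"] for c in typical])
for label, case in selected.items():
    print("Stratum", label, "->", case["name"])
```

Fixing and reporting the seed is a design choice rather than a requirement: it lets readers verify that the selected case was in fact drawn at random and not picked after the fact.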