Qualitatively speaking: Deciding how much data and analysis is enough

Sims, D; Cilliers, F

doi:10.7196/AJHPE.2023.v15i1.1657

African Journal of Health Professions Education

On-line version ISSN 2078-5127

Afr. J. Health Prof. Educ. (Online) vol.15 n.1 Pretoria Mar. 2023

https://doi.org/10.7196/AJHPE.2023.v15i1.1657

FORUM

D Sims^I; F Cilliers^II

^I BSc, BMedSc Hons, MSc Med, PhD; Office of Teaching and Learning, Faculty of Economic and Management Sciences, University of the Western Cape, Cape Town, South Africa
^II MB ChB, Hons BSc (MedSc), MPhil, PhD; Department of Health Sciences Education, Faculty of Health Sciences, University of Cape Town, South Africa

ABSTRACT

As I traverse my (post)doctoral journey, reworking my thesis into publications, I am again immersed in a debate around the utility of the concept of data saturation. I believe this debate to be emblematic of the process of unlearning and relearning that unfolded during my doctoral journey, as I moved from a biomedical sciences background into qualitative educational research.

Data saturation, a concept that originated in grounded theory, has been described as the point at which no new codes emerge from the data or at which themes stabilise - data collection and analysis may then cease, as information redundancy has been 'achieved'. Seductive as the concept is, qualitative researchers must be wary of persistent, dominant positivist assumptions that surreptitiously impose themselves on our research.^[1,2] The concept of data saturation implies exhaustive comprehension and absolutes, both of which are incompatible with an interpretative qualitative research paradigm. Indeed, attempts to explicitly operationalise the (mysterious and murky) concept of data saturation reveal a positivist bent, as exemplified by practices such as code replication, code frequency and code-book stability.

Qualitative research is intrinsically, and unapologetically, contextually responsive; multiple perspectives and a plurality of interpretations are legitimate - yet this does not make the research any less rigorous. Rigour is achieved through credibility, transferability, dependability and confirmability. Data saturation aligns more strongly with a neo-positivist paradigm that strives for reliability and generalisability. The concepts of information power, theoretical sufficiency and conceptual depth offer alternatives to data saturation for defensible decision-making around ceasing data collection and analysis.

Information power speaks to the characteristics and quality of data collected - not the quantity.^[3] Information power is based on dynamic interactions between the aim and scope of the study, specificity of the sample, use of established theory, quality of dialogue and analysis strategy.^[3]

If the aim of a study is broad, more participants are needed; if narrow, fewer. The more relevant the information that participants hold, the fewer participants are needed. If participants are deeply and richly experienced in the phenomenon being investigated, the sample specificity is 'dense', and fewer are needed; yet, if the sample specificity is sparse, more are required. If a study is theoretically grounded, less data will possibly need to be collected, but if use of theory is low, more will be needed. If the quality of data collected is strong, e.g. articulate, relevant and progressively developed through a productive interviewer-interviewee relationship, less data will be needed; however, if dialogue is weak, e.g. ambiguous or unfocused, more will be needed. If a study is a single in-depth case, fewer participants are needed than for an exploratory cross-case study, in which a broad range of variations of the phenomenon being investigated is required.

Sample specificity can, to an extent, be addressed before data collection through purposive sampling; yet, like the other characteristics, it requires ongoing evaluation during iterative data collection and analysis. Information power is multidimensional; therefore, researchers need to consider these dynamic interactions thoughtfully and critically when determining how 'powerful' their data are.

Theoretical sufficiency refers to data adequacy rather than data saturation.^[1,4] These concepts are not about ceasing data collection and analysis when no new codes emerge, but rather about whether there is enough evidence to support the claims being made and to build the proposed theory. It is not the number, frequency or prevalence of the codes that matters, but their meaning, the relationships between them and the credibility of the explanations they offer regarding the phenomenon being investigated. What matters is whether the findings are warranted by the data and analysis.

Conceptual depth is related to but more specific than theoretical sufficiency. 'To reach conceptual density is not to reach a final limit, beyond which it is impossible to achieve new insights, but it is to reach a sufficient depth of understanding that can allow the researcher to theorise.'^[5] Is there a wide enough range of evidence to illustrate concepts and broader themes in the findings? Do the proposed concepts and themes connect in rich and complex ways, with extensive relationships and variations explained? Is there subtlety and richness in the findings? Do these resonate with the existing literature? In short, to what extent has the phenomenon been explored - how deep and dense is the theorisation?

Theoretical sufficiency and conceptual depth are achieved by striving for conceptual coherence and alignment between the research questions, sampling, theorising and theories drawn upon. Sampling, data collection, analysis and interpretation must be reported, using thick descriptions of each. Provide robust evidence, for example participant quotations, to illustrate concepts and themes. Articulate the subtlety and richness, variation and novelties of the findings. Demonstrate how they relate to (confirm, build upon or contradict) the existing literature. Adequately explore and address disconfirming evidence, negative cases, uncertainties and study limitations.

Achieving multifaceted rigour in qualitative research is not about having a final, all-encompassing result, but about gathering enough evidence to develop a defensible finding. Importantly, establishing rigour is an ongoing and iterative process. Determining how much is enough with regard to sampling cannot be done a priori; rather, a researcher may argue for, or against, ceasing data collection and analysis as the phenomenon is explored and understood, the developing findings are constructed and their quality is evaluated.

In too many papers, concepts such as data saturation, information power, theoretical sufficiency and conceptual depth are bandied about merely as a way of ticking a methodological box. Whatever approach is adopted, decisions regarding data collection and analysis are ultimately a function of theory and pragmatism. Feasibility and limitations are elements of all studies; therefore, prioritising quality over quantity is crucial, especially in resource-constrained settings. Detailed descriptions of the study context, the appropriateness and adequacy of sampling, and the richness of the data are essential. Instead of asking, 'How much?', we should focus on whether the research is educationally imaginative, socially significant and theoretically illuminative. It is only in the doing of qualitative educational research that one can answer how much is enough.

Declaration. None.

Acknowledgements. None.

Author contributions. DS conceptualised the article; DS and FC contributed equally to writing and editing the manuscript.

Funding. None.

Conflicts of interest. None.

References

1. Varpio L, Ajjawi R, Monrouxe L, O'Brien B, Rees C. Shedding the cobra effect: Problematising thematic emergence, triangulation, saturation and member checking. Med Educ 2017;51(1):40-50. https://doi.org/10.1111/medu.13124

2. Braun V, Clarke V. To saturate or not to saturate? Questioning data saturation as a useful concept for thematic analysis and sample-size rationales. Qual Res Sport Exercise Health 2021;13(2):201-216. https://doi.org/10.1080/2159676X.2019.1704846

3. Malterud K, Siersma VD, Guassora AD. Sample size in qualitative interview studies: Guided by information power. Qual Health Res 2016;26(13):1753-1760. https://doi.org/10.1177/1049732315617444

4. Vasileiou K, Barnett J, Thorpe S, Young T. Characterising and justifying sample size sufficiency in interview-based studies: Systematic analysis of qualitative health research over a 15-year period. BMC Med Res Methodol 2018;18(1):148. https://doi.org/10.1186/s12874-018-0594-7

5. Nelson J. Using conceptual depth criteria: Addressing the challenge of reaching saturation in qualitative research. Qual Res 2017;17(5):554-570. https://doi.org/10.1177/1468794116679873

Correspondence:
D Sims
dsims@uwc.ac.za

Accepted 25 July 2022