Classifications for Cesarean Section: A Systematic Review

Background Rising cesarean section (CS) rates are a major public health concern and cause worldwide debates. To propose and implement effective measures to reduce or increase CS rates where necessary requires an appropriate classification. Despite several existing CS classifications, there has not yet been a systematic review of these. This study aimed to 1) identify the main CS classifications used worldwide, 2) analyze advantages and deficiencies of each system. Methods and Findings Three electronic databases were searched for classifications published 1968–2008. Two reviewers independently assessed classifications using a form created based on items rated as important by international experts. Seven domains (ease, clarity, mutually exclusive categories, totally inclusive classification, prospective identification of categories, reproducibility, implementability) were assessed and graded. Classifications were tested in 12 hypothetical clinical case-scenarios. From a total of 2948 citations, 60 were selected for full-text evaluation and 27 classifications identified. Indications classifications present important limitations and their overall score ranged from 2–9 (maximum grade = 14). Degree of urgency classifications also had several drawbacks (overall scores 6–9). Woman-based classifications performed best (scores 5–14). Other types of classifications require data not routinely collected and may not be relevant in all settings (scores 3–8). Conclusions This review and critical appraisal of CS classifications is a methodologically sound contribution to establish the basis for the appropriate monitoring and rational use of CS. Results suggest that women-based classifications in general, and Robson's classification, in particular, would be in the best position to fulfill current international and local needs and that efforts to develop an internationally applicable CS classification would be most appropriately placed in building upon this classification. 
The use of a single CS classification will facilitate auditing, analyzing and comparing CS rates across different settings and help to create and implement effective strategies specifically targeted to optimize CS rates where necessary.


Introduction
The worldwide rise in cesarean section (CS) rates is becoming a major public health concern and cause of considerable debate due to potential maternal and perinatal risks, cost issues and inequity in access. [1][2][3][4] The increase in CS rates observed in many developed and middle-income countries contrasts sharply with the very low rates in numerous low-resource settings, along with lack of access to emergency obstetric care. According to recent data, in Middle Africa, only 1.8% of all live birth deliveries occur by CS, compared to 24.3% in North America and 31% in Central America. [5] The main determinants of this disparity and specific reasons for the increase in CS rates in most of the world remain unclear.
In order to propose and implement effective measures to reduce or increase CS rates where necessary, it is first essential to identify what groups of women are undergoing CS and investigate the underlying reasons for trends in different settings. This requires the use of a classification system that can best monitor and compare CS rates in a standardized, reliable, consistent and action-oriented manner. Such a classification system should be applicable internationally and useful for clinicians and public health authorities. Ideally, such a system should be simple, clinically relevant, accountable, replicable and verifiable. [6] Over the last decades, several CS classification systems have been created and proposed for different purposes. [6][7][8][9][10][11][12] However, to our knowledge, there has not been a systematic review of the existing CS classification systems, analyzing advantages and deficiencies of each system. This gap motivated the present study. We believe this review is a necessary step in the process of developing a standardized and internationally accepted methodological framework for monitoring, auditing, analyzing and comparing CS rates.
The objectives of this study were 1) to identify the main available classification systems for CS through a systematic review of the literature, and 2) to analyze qualitatively and compare the advantages and deficiencies of each system through a pre-defined comparative framework based on criteria recognized as important by an international panel of experts.

Methods
This study has two components: 1) an enquiry to experts about critical characteristics of a classification for CS, and 2) a systematic review of the literature to identify and critically appraise available classifications.

1) Questionnaire to panel of experts
A panel of 46 multidisciplinary international experts was contacted by email or in person and asked to collaborate with this study by answering a questionnaire on classifications of CS (Figure S1). They were asked to grade a total of 18 proposed characteristics of a classification system for CS from 1 to 9 (1 = not important; 9 = essential). These characteristics were divided into four main domains (see Table 1): i) General characteristics, ii) Requirements, equipment, necessary skills, iii) Use, and iv) Number and content of categories. Their answers were tabulated in an Excel spreadsheet and ranked according to frequency. Results from this analysis provided the basis for the data-extraction form and assessment of each classification.

2) Systematic Review
Types of studies. Any study that described a theoretical or practical (i.e. actually tested in patients) CS classification system or model was eligible for inclusion in this review, regardless of the level (e.g. facility, regional, national) in which it was applied. We included studies regardless of whether or not the main purpose of the manuscript was to propose a classification (i.e. the classification could be a secondary outcome in the study).
Type of participants. Only studies presenting CS classification systems for low-risk or unselected/general obstetric patients were included.
Type of classification systems. Any type of CS classification system described in sufficient detail to be understandable and replicable was accepted. Any system or model that systematically grouped or organized CS, obstetric populations or other items (traits, characteristics, variables, attributes) potentially related to the performance of CS into categories was considered a classification. Whenever a classification was presented in more than one publication, data were extracted initially from the original source and complemented, if necessary, with information presented in subsequent publications that reported on its use.
Search strategy for identification of studies. Three electronic databases were searched (MEDLINE, EMBASE and LILACS) for articles published from inception to November 26 2008. The search strategy used the following general terms, expanded and adapted for each database: "classification" or "taxonomy" or "nomenclature" or "terminology" and "cesarean section" or "cesarean delivery" or "abdominal delivery" (exact terms presented in Figure S2). There were no language or country restrictions. Classic review articles, textbooks and published letters were also examined for potentially eligible studies. We checked the references of all articles chosen for full-text evaluation. Experts were contacted and emails sent to authors of potentially eligible studies, inquiring about details, unpublished material and their knowledge of other relevant studies on CS classification.
Screening and data extraction. All citations identified were downloaded into Reference Manager® software version 10. The citations were organized and duplicates deleted. Two investigators (MRT and APB) independently screened the results of the electronic searches to select potentially relevant citations based on title and abstracts, according to the criteria defined above. Discrepancies were resolved through consensus. When a citation was considered relevant or when title/abstract was deemed insufficient for decision on inclusion/exclusion, the full texts were retrieved and evaluated. All articles selected at first screening were read and abstracted individually by the two reviewers using a structured data-extraction form specifically created for this review (Figure S3). Data extracted were compared and discussed by the two reviewers and a final extraction form was compiled. Information extracted from each article included: 1) main purpose of classification, 2) type of study (theoretical versus clinical), 3) characteristics of study and site (setting, CS rate, number of cases, inclusion/exclusion criteria), 4) general characteristics of the system, 5) requirements and skills for implementation, 6) potential use of the classification, 7) specific characteristics of the classification, 8) main strengths and weaknesses of the system reported by the authors, and 9) main strengths and weaknesses of the system as per reviewers. When data in the original publication were not sufficiently detailed, authors were contacted for additional information. In order to ensure consistency in the assessment of the classifications over time, the reviewers compared newly extracted with previously extracted articles and forms.
Semi-qualitative evaluation of classifications. A general comparison table was constructed describing the main characteristics, strengths and weaknesses of each classification system. Seven specific domains (ease of use, clarity, exclusiveness of categories, inclusiveness of classification, possibility of using classification prospectively, reproducibility and requirements for implementation) were graded (2 = good; 1 = median; 0 = poor). The final grade of each classification ranged from 0 to 14; the higher the grade, the better the classification. Each classification was assessed and scored independently by the two reviewers; the answers were then compared and discussed until a consensus was reached.
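The grading scheme described above can be illustrated with a short sketch. It assumes that the overall grade is the simple sum of the seven domain grades, which is consistent with the stated range of 0 to 14; the domain keys, the function name and the example grades are hypothetical and are not taken from the review's data-extraction form.

```python
from typing import Dict

# The seven domains named in the text (key names are illustrative assumptions).
DOMAINS = (
    "ease_of_use",
    "clarity",
    "mutually_exclusive",
    "totally_inclusive",
    "prospective_identification",
    "reproducibility",
    "implementability",
)

# Grades used in the text: 2 = good, 1 = median, 0 = poor.
VALID_GRADES = {0, 1, 2}


def overall_grade(grades: Dict[str, int]) -> int:
    """Sum the seven domain grades into an overall grade between 0 and 14.

    Assumes every domain has been graded; "not applicable" entries are
    not handled in this sketch.
    """
    missing = [d for d in DOMAINS if d not in grades]
    if missing:
        raise ValueError(f"missing domain grades: {missing}")
    for domain in DOMAINS:
        if grades[domain] not in VALID_GRADES:
            raise ValueError(f"invalid grade for {domain}: {grades[domain]}")
    return sum(grades[d] for d in DOMAINS)


# Hypothetical grades for one classification, for illustration only.
example = {
    "ease_of_use": 2,
    "clarity": 1,
    "mutually_exclusive": 2,
    "totally_inclusive": 2,
    "prospective_identification": 0,
    "reproducibility": 1,
    "implementability": 1,
}
print(overall_grade(example))
```

With only three possible grades per domain, two classifications with different strengths can reach the same total, so the per-domain grades remain informative alongside the sum.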
To assess each classification beyond a theoretical model, we created a set of 12 different clinical case-scenarios (Figure S4).
After reading and extracting data from each classification system, the two reviewers independently tested the classification using these 12 clinical cases. As opposed to the data extraction, the results of these case scenarios were not compared, reviewed or discussed between the reviewers since we aimed to assess interrater agreement. Performance of each classification was assessed by: a) the agreement between the two reviewers in classifying each case in one of the proposed categories (reproducibility); b) the possibility of including each of the 12 cases in no more than one of the categories proposed by the classification (exclusiveness); and c) the ability to include each of the 12 cases into a specific category (inclusiveness).
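As a rough illustration of the three performance measures, the sketch below counts, over a set of case-scenarios, how often two reviewers agree, how often a case fits no more than one category, and how often a case fits at least one category. The review does not specify how these counts were computed, so the representation (each case mapped to the set of categories a reviewer considered applicable) and all names and category labels are assumptions.

```python
from typing import FrozenSet, List

# For one reviewer: one entry per case-scenario, holding every category the
# reviewer judged applicable (an empty set means no category fit the case).
Assignment = FrozenSet[str]


def agreement_count(reviewer_a: List[Assignment], reviewer_b: List[Assignment]) -> int:
    """Number of cases that both reviewers classified identically (reproducibility)."""
    if len(reviewer_a) != len(reviewer_b):
        raise ValueError("both reviewers must classify the same cases")
    return sum(1 for a, b in zip(reviewer_a, reviewer_b) if a == b)


def exclusive_count(assignments: List[Assignment]) -> int:
    """Number of cases placed in no more than one category (exclusiveness)."""
    return sum(1 for categories in assignments if len(categories) <= 1)


def inclusive_count(assignments: List[Assignment]) -> int:
    """Number of cases placed in at least one category (inclusiveness)."""
    return sum(1 for categories in assignments if len(categories) >= 1)


# Four hypothetical cases (the review used 12); category labels are invented.
reviewer_a = [
    frozenset({"dystocia"}),
    frozenset({"previous_cs", "dystocia"}),  # fits two categories
    frozenset(),                             # fits no category
    frozenset({"fetal_distress"}),
]
reviewer_b = [
    frozenset({"dystocia"}),
    frozenset({"previous_cs"}),
    frozenset(),
    frozenset({"fetal_distress"}),
]

print(agreement_count(reviewer_a, reviewer_b))
print(exclusive_count(reviewer_a))
print(inclusive_count(reviewer_a))
```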

Results

1) Questionnaire to panel of experts
Of the 46 experts contacted, 38 returned the questionnaire on CS classifications (82% response rate). For each of the first three domains: (i) general characteristics, (ii) requirements, equipment and skills, and (iii) use of the classification, the median grade was either 8 or 9 (over a maximum score of 9). Table 1 presents the average grade given to each of the questions in these domains. According to the experts, a CS classification should provide clearly defined and unambiguous categories, the data needed should be easy to obtain and it should be useful to help change clinical practice. Two-thirds of the experts (25/38) answered that ideally, a classification should have "between 6 and 10" main categories, while the rest suggested "5 at the most" (data not shown).

2) Systematic Review
The search strategy yielded 1076 citations in MEDLINE and EMBASE and 1872 in LILACS. A total of 60 were selected for full-text evaluation (Figure 1). A total of 20 relevant studies were included (Table 2); one study [4] presented three classifications and two studies [13,14] presented two classifications each. These 27 classifications were grouped into 4 general types, according to the main unit being classified: indication (N = 12), [4,7,15-24] degree of urgency (N = 5), [13,14,25,26] woman characteristics (N = 4) [6,27-29] and other systems (N = 6). [4,13,30-32] Table 2 presents the main characteristics and performance of the 27 classifications, the overall score obtained and the results of the 12 case scenarios. Table 3 shows the main general strengths and weaknesses of each of the 4 general types of classifications. Outlines of each of the 27 classifications are provided in Figure S5.
Table 2 notes. Code: 2 = good; 1 = regular; 0 = poor; - = not applicable. 1-Easy: how much effort or time it takes to understand main concepts, logic and rules of the classification. 2-Clarity: clear, objective, precise and unambiguous definitions given for each category. 3-Mutually exclusive: each unit being classified by the system (e.g. woman or CS) can only be placed in a single one of the existing categories. 4-Totally inclusive: each and every unit being classified can be placed in at least one of the categories. 5-Prospective identification of categories: allows classification of the patient into one of the categories before she is taken to the operating theater. 6-Reproducibility: probability that the same case would be classified in the same category by different raters. 7-Implementability: human and material requirements needed to introduce and maintain the classification in continuous use. doi:10.1371/journal.pone.0014566.t002
Indication based classifications. Table 4 presents the main details of the 12 classifications that belong to this category. Four [19,20,23,24] of the twelve indication classifications presented only theoretical models. The other eight classifications [4,7,15-18,21,22] were tested on actual patients in studies with sample sizes ranging from 498 to 454,668 deliveries and CS rates from 0.6 to 25%. Only three classifications [15,16,22] provided clearly defined and unambiguous categories. For example, Althabe et al [15] proposed a CS classification along with a guideline containing specific, precise and clear definitions for indications such as dystocia, acute intrapartum fetal distress and several maternal indications, whereas Anderson's [7] classification used these same terms but did not provide any details or parameters on how to decide that this was indeed the indication for the CS. Therefore, Althabe's classification was considered clear in its definition of categories, while Anderson's was considered unclear. On the other hand, Anderson's classification provided clear hierarchical rules on how to classify a woman with more than one indication for CS (e.g. a case with previous CS and dystocia), while Althabe's classification did not provide instructions on how to deal with such cases, which could theoretically be classified in more than one category. This led us to grade the system proposed by Anderson as being mutually exclusive, while Althabe's classification scored poorly on this characteristic. Only two classifications offered mutually exclusive categories [7,18] and five were totally inclusive [4,7,15,16,21], meaning that each and every possible indication could be placed in at least one of the categories provided by the authors. Over half of these classifications were judged easy to implement. [4,7,17,20,22-24] None of the classifications allowed prospective identification for all categories, and in two classifications [7,19] less than half of the categories could be prospectively identified. This refers to the possibility of including a woman in one of the existing indication categories provided by the authors before she is actually taken to the operating theater. Two classifications, Althabe's and Anderson's [7,15], obtained the best overall grade for this group of classifications (9 out of a maximum of 14 points).
Urgency based classifications. Table 5 presents the main characteristics of these classifications. All five classifications based on degree of urgency had been tested in real life, in studies with sample sizes ranging from 18 to 407 cases in settings with CS rates ranging from 17.7% to 27%. All were judged easy to understand and implement. Three had mutually exclusive categories, [14,25] and two were totally inclusive. [13,14,26] None of the classifications allowed prospective identification for all categories, and in two, less than half of the categories proposed could be prospectively identified. Van Dillen's classification [14] obtained the best overall grade (9 out of a possible maximum of 14) (Table 2).
Woman-based classifications. Table 6 presents the main characteristics of all four women-based classifications. These were tested in real life, with samples ranging from 2876 to 222,013 births, in settings with CS rates ranging from 7.9% to 31%. Three classifications presented mutually exclusive categories, [6,27,28] two were totally inclusive, [6,28] and two were judged very easy to implement. [6,27] Although the 10-group (Robson's) classification [6] received the maximum grade in this type of classification, the 8-group (Denk) [28] and the case-mix (Cleary) [27] classifications also obtained high grades (Table 2).
Other types of classifications. The six other types of classifications are presented in Table 7. One of these classifications was only a theoretical model that was not tested in real life; [31] the other five were tested in studies involving from 137 to 32,222 cases in settings with CS rates ranging from 23% to 35%. These classifications proposed from 3 to 21 main categories and up to 39 subcategories. None of these classifications had mutually exclusive categories and two were totally inclusive. [30,32]

Discussion
This review identified 27 classification systems which were grouped into 4 general types, based on the main unit being classified. Classifications based on indications for CS were the most frequent type. The main question answered by this type of classification is "why" the CS was performed, information that is routinely recorded and available in any maternity unit, making this type of system easy to implement in any setting. On the negative side, almost all of the models in this group had categories that were not mutually exclusive and had low reproducibility. Due to these main drawbacks, the disagreement between reviewers in the case-scenarios was high (see Table 2); in six classifications there was disagreement in at least 6 of the 12 case-scenarios. Main weaknesses of these systems include: a) poor/unclear definitions for some of the most common conditions that lead to CS (e.g. dystocia, fetal distress) and therefore questionable inter-rater reproducibility; b) categories not mutually exclusive, implying that there would need to be some kind of hierarchy guideline to classify cases with >1 primary indication; c) not being totally inclusive, unless an extensive list of indications is provided or an "Other indications" category is created; and d) not being very useful to change clinical practice, since most of the indications are not prospectively identifiable (Table 3). This type of classification proposed the largest number of categories, although some, such as Anderson's, [7] were quite simple. This specific classification, together with Althabe's, had the highest rating in this group. Unlike others in this group, Anderson's classification was judged easy to understand and implement, had good inter-rater reproducibility and was totally inclusive. A unique asset of this model was that it presented clear hierarchical rules for classifying cases with >1 indication for CS, which made the categories mutually exclusive.
Classifications based on degree of urgency for CS were also theoretically easy to understand and implement due to the reduced number of categories proposed (Table 2). This type of classification, which basically answers "when" (or how quickly) the CS should be performed, could improve communication between health professionals (nurses, obstetricians, anesthesiologists) and thus potentially lead to better maternal and perinatal outcomes. A weak point of several of these classifications is the lack of clear and unambiguous definitions for each of the proposed categories, which could compromise inter-rater reproducibility, comparability and interpretation. Three of the five presented disagreement between reviewers in 50% or more of the case-scenarios. Additionally, the cut-offs (time to delivery intervals) proposed to define each category are subjective and not evidence-based. Finally, the amount of information provided by these systems is very limited, and therefore this type of classification would have to be complemented by other types in order to be more useful.
Classifications based on woman characteristics basically tell us "who" is being submitted to CS, based on maternal and pregnancy characteristics. These represented 4/27 systems identified. Most of these classifications are conceptually easy and simple, and have relatively few, clearly defined categories which are mutually exclusive and allow cases to be prospectively identified upon admission, which could be useful to change clinical practice. Due to all these characteristics, these classifications could be easily implemented and would be highly reproducible, as shown by the high agreement in the case scenarios (Table 2). Although most of these classifications are totally inclusive, the case-mix types are not, since they only assess CS in a subgroup of women with a specific set of predefined characteristics, such as Cleary's [27] "standard primipara". Robson's 10-group [6] and Denk's 8-group [28] classifications obtained the highest overall theoretical ratings and also performed very well on the practical case scenarios.
Other types of classifications, which represented 6/27 classifications, address questions such as "where" the CS is being performed, "by whom", "how" (under what conditions and circumstances) or combinations of questions. By focusing on aspects often overlooked by other classifications, these systems provide administrators with useful information about aspects that could affect maternal and perinatal outcomes and perhaps need more attention and investment. However, some of these classifications would need improvement and clearer definitions of some categories. Moreover, several of these systems are only theoretical and have never been tested in a real setting. Since some of the data required are not usually collected in most maternities, these systems would require some effort and time to be implemented, and not all items in these classifications will be relevant or applicable in all settings.
Based on the methodology used in this systematic review, Anderson's [7] and Althabe's [15] classifications obtained the highest grades and the best performance for indication-based classifications. This can be attributed to the fact that these two classifications provide very clear definitions of categories and precise decision rules or hierarchy on how to classify a case with >1 indication into a single specific category. In the degree of urgency systems, Van Dillen 2009a [14] was the best rated classification. Robson's 10-group model [6] was in first place among the women-based classifications and obtained the highest overall grade and best performance on the case-scenarios.
Each of these classifications offers intrinsic advantages and disadvantages and could be considered more or less useful depending on the objectives of the user. The two classifications with the best overall scores among the women-based classifications (Robson and Denk [6,28]) are easy to understand, clear, mutually exclusive, totally inclusive, reproducible and allow prospective identification of categories. Additionally, they offer flexibility to adapt to different clinical settings, an important aspect if one wishes to implement modifications in clinical protocols to decrease or increase CS rates. Robson's classification offers the possibility of subdividing three of its main categories into subcategories: women with a singleton, cephalic fetus at term who undergo CS either after induction (groups 2a, nulliparas, and 4a, multiparas) or electively (groups 2b, nulliparas, and 4b, multiparas), and women with either one or more than one previous CS (groups 5a and 5b, respectively). These subdivisions would provide important information and help to understand differences in these 3 categories between different settings or at the same setting over time. Despite the fact that the "10-group classification" would actually become a "13-group classification", these subdivisions do not add any substantial amount of work since the information needed is routinely available in maternal charts. A problem with the women-based classifications is that they do not present why (indications) or when (degree of urgency) the CS was performed, which are also important aspects. After a thorough and careful analysis of a large number of classifications and systems for cesarean deliveries, we acknowledge that, at present, there is no single ideal classification that suits all settings and would fulfill the expectations and needs of every health professional. The choice of a specific classification will depend on the main objectives of the professionals who are going to use it.
However, given the flexibility of some of the existing classifications, we believe it would be possible to create a hybrid model based on the woman-characteristics system with additional layers of other classifications for each of the individual categories proposed in the woman's classification. For instance, Van Dillen [14] and/or a modified version of Anderson's indication system [7] could be used within each of the 10 (or 13) categories proposed by Robson [6] or the 8 categories proposed by Denk. [28] This would allow comparison of degree of urgency for CS as well as indications in a homogeneous group of women, for example multiparas at term in spontaneous labor with a singleton cephalic fetus (Robson's group 3a), which represent a large proportion of all deliveries in any setting.
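One way to picture such a hybrid model is as a record that carries a woman-based group plus optional urgency and indication layers, tabulated within each group. The sketch below is only an illustration under stated assumptions: the record fields, the function name and every category label are hypothetical, and none of them reproduce the actual categories of the Robson, Van Dillen or Anderson systems.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class CesareanRecord:
    """One CS described by three layers (all labels are hypothetical)."""
    woman_group: str  # woman-based layer: "who"
    urgency: str      # degree-of-urgency layer: "when"
    indication: str   # indication layer: "why"


def urgency_by_group(records: Iterable[CesareanRecord]) -> Dict[str, Counter]:
    """Count urgency categories separately within each woman-based group."""
    table: Dict[str, Counter] = defaultdict(Counter)
    for record in records:
        table[record.woman_group][record.urgency] += 1
    return dict(table)


# Invented records, for illustration only.
records = [
    CesareanRecord("group_3", "emergency", "fetal_distress"),
    CesareanRecord("group_3", "emergency", "dystocia"),
    CesareanRecord("group_3", "elective", "maternal_request"),
    CesareanRecord("group_5", "elective", "previous_cs"),
]

for group, counts in sorted(urgency_by_group(records).items()):
    print(group, dict(counts))
```

Because each record belongs to exactly one woman-based group, the urgency (or indication) counts within a group describe a homogeneous set of women, which is the comparison the hybrid model is meant to allow.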
This systematic review had several strong points, starting with its uniqueness. This is the first study specifically designed to retrieve, analyze and critically appraise existing classifications for CS. We developed a broad search strategy, in order to capture the largest possible number of publications on this topic. We tried to reduce bias by using a panel of experts to determine what variables to analyze and two independent reviewers to extract data and test each classification in practical case-scenarios.
Potential limitations included difficulties in retrieving articles through electronic databases, possibly due to the lack of appropriate keywords to index this topic. We also acknowledge the possible existence of other unpublished CS classifications that could not be located, despite efforts to contact experts. Additionally, despite the use of strict methodology and double data extraction at all steps of the systematic review, there is always potential for subjectivity in the semi-qualitative assessment of the classifications. We also acknowledge that the scoring system presented in Table 2 may have limitations. To the best of our knowledge, there are no validated tools for assessing the characteristics of any classification system. This led us to create such a tool, which we tried to keep as simple and objective as possible. However, the use of only three possible grades for each of the domains of the classifications, although straightforward and easy, may be questionable.
Overall, we detected a basic need for clear, unambiguous and precise definitions for common obstetrical diagnoses and terms used to define categories in many of the classifications. Standardization of these terms is an essential step to improve inter-rater reproducibility and allow consistent and reliable comparison of information at the same setting over time and between different settings at various levels (local, regional and national). Specifically, in the indication classifications, terms/diagnoses such as fetal distress, dystocia, failure to progress, cephalo-pelvic disproportion, obstructed labor, macrosomia, failed induction and failed trial of labor would need to be more clearly defined using unambiguous and preferably evidence-based terminology. Furthermore, it would be preferable to avoid the need for sophisticated equipment or technology (such as electronic fetal monitors or scalp pH) not routinely available in low-resource settings. Despite a few discrepancies, the terms and definitions used in the degree of urgency classifications (e.g. urgent, emergency, crash, scheduled and elective) tend to be more precisely defined, but none of them are evidence-based. Therefore, there is a need to conduct studies that assess whether there are any significant differences in maternal and perinatal morbidity and mortality according to the time interval between decision and incision (or actual delivery). Only then would it be possible to establish more precise cut-offs to define each of these categories.
In the context of international recognition of the difficulties in understanding and controlling the increase and inequitable use of CS worldwide, this systematic review suggests that, among all classifications identified, women-based classifications in general, and Robson's classification, in particular, would be in the best position to fulfill current international and local needs, and that efforts to develop an internationally applicable CS classification would be most appropriately placed in building upon this classification. The dissemination and implementation of a single CS classification system will allow auditing, analyzing and comparing rates of CS across different hospitals, cities, countries and regions. With a clear understanding of why, when, where, how and on whom CS are being performed, it would then be possible to propose and implement effective strategies and actions specifically targeted at high-risk groups, and thus possibly reduce or increase the rate of CS in order to continue improving maternal and perinatal outcomes.

Figure S1 Survey questionnaire. Questionnaire sent to international panel of experts to rate items considered important in a classification for cesarean sections.