Scientific Wealth in Middle East and North Africa: Productivity, Indigeneity, and Specialty in 1981–2013

Several developing countries seek to build knowledge-based economies by expanding their scientific research capabilities. Characterizing the state and direction of progress in this arena is challenging but important. Here, we employ three metrics: a classical metric of productivity (publications per person); an adapted metric, which we denote Revealed Scientific Advantage (developed from work used to compare publications in scientific fields among countries), to characterize disciplinary specialty; and a new metric, scientific indigeneity (defined as the ratio of publications with domestic corresponding authors), to characterize the locus of scientific activity, which also serves as a partial proxy for local absorptive capacity. These metrics, which use population and publications data that are available for most countries, allow the characterization of some key features of the national scientific enterprise. The trends in productivity and indigeneity, when compared across other countries and regions, can serve as indicators of strength or fragility in national research ecosystems, and the trends in specialty can allow regional policy makers to assess the extent to which the areas of research focus align (or do not align) with regional priorities. We apply the metrics to study the Middle East and North Africa (MENA), a region where science and technology capacity will play a key role in national economic diversification. We analyze 9.8 million publication records from 1981 to 2013 in 17 countries of MENA, from Morocco to Iraq, and compare them with selected countries throughout the world. The results show that international collaborators increasingly drove scientific activity in MENA. The median indigeneity reached 52% in 2013 (indicating that almost half of the corresponding authors were located in foreign countries). Additionally, the regional disciplinary focus in chemical and petroleum engineering is waning, with modest growth in the life sciences.
We find repeated patterns of stagnation and contraction of scientific activity for several MENA countries contributing to a widening productivity gap on an international comparative yardstick. The results prompt questions about the strength of the developing scientific enterprise and highlight the need for consistent long-term policy for effectively addressing regional challenges with domestic research.


Section A: Methods and Data
We used the peer-reviewed published journal paper as the basic unit of scientific output, and conducted our analysis of scientific activity in the MENA countries on data sourced from the Science Citation Index-Expanded™, accessed through the Web of Science™ Core Collection, which includes over 8500 leading scientific and technical journals. Although the outputs of scientific activity extend well beyond published papers, bibliometric analysis allows for comparisons and quantitative measurements of the system that are traceable over time. Measuring the quantity of research output is an important part of informed policy making, and is employed by government agencies tasked with assessing national scientific activities (20). Citations and impact factors can also provide important insights. However, we focus on quantity of output because the countries we study are seeking to develop nascent scientific research activities, for which publication volume is a more useful assessment measure.
We analyzed seventeen countries: Morocco, Libya, Algeria, Tunisia, Sudan, Egypt, Jordan, Lebanon, Syria, Iraq, Yemen, Saudi Arabia, Oman, United Arab Emirates, Bahrain, Kuwait, and Qatar. Arabic is the dominant language in all of these countries, but scientific and technical research is primarily published in English (~98-99%). The known English-language publication bias (23) will therefore not significantly distort the productivity comparison among the MENA countries, although it will affect the cases of China and some other countries that are provided as context for the MENA data.

Data collection and analysis.
Publications data for this study were obtained from the Science Citation Index-Expanded using advanced search queries executed in the online search engine of the Web of Science™ Core Collection. We included full journal articles only (and did not account for letters, reviews, conference papers, books, or other publications). The queries were performed during 2014 and 2015. Because the set of journals indexed in the database grows continually, exact publication counts may differ somewhat for queries executed at a later time. The focus of our work was on analyzing total scientific publications in nations that have not had significant research activities in the recent past. Therefore, we chose the Science Citation Index-Expanded (rather than the more widely used Science Citation Index) since it has wider coverage (although of varying journal quality).
The yearly population data, used for computing per capita annual publications for each country, were obtained from the Data Bank web portal of the World Bank (24). Population data used for 2013 are shown in Table B.

Publications Data
We obtained publications volume data for each country using advanced search queries of the form "CU='<country name>'", with publication type specified as 'journal article' and language as 'English'. The advanced search field tag 'CU' searches for countries in the address fields within records (25). The results of each country's search were counted by year using the Web of Science analysis tool.

Author Location Data
For author location data, we obtained the full citation records of all publications between 1981 and 2013 for Kuwait, Qatar, UAE, Saudi Arabia, Bahrain, Oman, Yemen, Sudan, Libya, Syria, and Iraq. For the other countries (with larger numbers of publications), we obtained random samples of 500 records for each year and analyzed the address information of reprint authors in each sample (using our text-parsing routines coded in Matlab™) to statistically estimate indigeneity.

Subject Area Data
We obtained subject area data for each country by analyzing the country results in multi-year intervals (1981-85, 1986-90, 1991-95, 1996-2000, 2001-2005, 2006-2010, and 2011-2013) with the 'research area' analysis in Web of Science. This provides a count of papers for each area. It should be noted that a paper may be assigned multiple subject areas (e.g., it may have two areas, Ecology and Marine & Freshwater Biology, associated with it). We found the world total annual publications in each area by running advanced search queries of the form "WC=<subject area name>" and then sorting the results by year. We conducted this search for 175 areas (Table A) that we determined to be relevant, leaving out areas from the social sciences (such as economics, business management, etc.). The subject areas were consolidated into 15 categories (Table A) to allow for simpler presentation of results and high-level insights. The categories were based on results reported in (21), wherein a systematic decomposition of a journal-journal citation matrix was used to identify inter-connected disciplines. We used those results with some modifications for engineering disciplines that are relevant for the regional economies in the MENA.

Publications Volume
The total publications for a country i in year t was defined as:
$$P_i(t) = \#\{\text{publications in year } t \text{ with at least one author address in country } i\} \qquad (1)$$
The whole-counting approach was used: for instance, if a publication had three co-authors, and one of the co-authors had an address in Kuwait, the publication would be included as a full count for Kuwait. This approach provided an upper-limit accounting of the publications for each country. The attribution for each country was made only on the basis of address information; the citizenship or national origin of authors was not taken into account.
The global share of each country was computed for each year t as:
$$S_i(t) = \frac{P_i(t)}{\sum_{k=1}^{N} P_k(t)} \qquad (2)$$
where N is the number of countries with journal publication records in year t.
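The whole-counting rule and the global share described above can be illustrated with a minimal Python sketch. The record layout (a list of author-address countries per paper), the function names, and the toy data are hypothetical. Taking the share denominator as the sum of whole counts over all countries is an assumption made for illustration, not a confirmed detail of the original computation.

```python
from collections import Counter


def whole_counts(records):
    """Whole counting: each paper adds one full count to every distinct
    country that appears among its author addresses."""
    counts = Counter()
    for countries in records:
        # set() ensures a paper with several addresses in the same
        # country is still counted only once for that country.
        counts.update(set(countries))
    return counts


def global_shares(counts):
    """Share of each country in the summed whole counts (assumed denominator)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no publications to compute shares from")
    return {country: n / total for country, n in counts.items()}


# Hypothetical toy data: one inner list of address countries per paper.
records = [
    ["Kuwait", "USA", "USA"],  # three co-authors, one in Kuwait
    ["Kuwait"],
    ["Egypt", "Kuwait"],
    ["USA"],
]
counts = whole_counts(records)
shares = global_shares(counts)
```

Because an internationally co-authored paper is counted once for each participating country, the summed whole counts exceed the number of distinct papers, which is why the approach gives an upper-limit accounting.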

Scientific Productivity
Productivity was measured as the ratio of annual publications to population for each country. We computed the scientific productivity of country i in year t as:
$$\mathrm{Productivity}_i(t) = \frac{P_i(t)}{\mathrm{Pop}_i(t)} \qquad (3)$$
where $P_i(t)$ is the total number of publications of country i in year t and $\mathrm{Pop}_i(t)$ is its total population in that year. This ratio (of total publications to total population) has been used in past work (1,2). Ideally, the number of scientists and researchers should be used instead of the total population of a country. The productivity measures computed for the US and OECD countries typically use data on the scientific research workforce. These data, however, are not available for many countries where the science and technology sectors are not well developed. In such cases, the total population serves as a proxy variable for determining productivity.
The productivity values computed with total population numbers have to be treated with caution, since the demographics in MENA countries are heavily skewed towards younger ages. The 0-14 years age group constitutes 28% of the population on average in the region (Table B). In the selected countries used for comparison, the 0-14 years age group constitutes 19% of the population on average (Table B).
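As a rough illustration of why the age structure calls for caution, the sketch below computes per capita productivity and then a variant that excludes the 0-14 age group. The variant is only an illustrative sensitivity check and is not a measure used in this study. The function names, the per-million scaling, and the publication and population figures are hypothetical. Only the 28% and 19% age shares come from the text.

```python
def productivity(publications, population, per=1_000_000):
    """Publications per `per` inhabitants (per-million scaling is an assumption)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return publications * per / population


def productivity_excluding_children(publications, population, share_0_14,
                                    per=1_000_000):
    """Illustrative variant: divide by the population aged 15 and over."""
    if not 0.0 <= share_0_14 < 1.0:
        raise ValueError("share_0_14 must be in [0, 1)")
    return productivity(publications, population * (1.0 - share_0_14), per)


# Hypothetical country: 1200 papers in a year, 4 million inhabitants.
base = productivity(1200, 4_000_000)
# Same output and population, under the two average age structures in the text.
mena_like = productivity_excluding_children(1200, 4_000_000, 0.28)
comparison_like = productivity_excluding_children(1200, 4_000_000, 0.19)
```

For identical publication counts and populations, the younger age structure yields a larger upward adjustment, so per capita figures understate MENA productivity relative to the comparison group by a modest factor.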

Scientific Indigeneity
In this measure we are interested in assessing the extent of the scientific output that can be attributed to researchers resident in a country. Co-location of researchers is an important issue, since it affects the speed and type of knowledge transfer, and the efficiency and quality of collaborations driven by shared research interests. Furthermore, in some cases it has been found that geographic proximity is important for university-firm interactions (14), and this can have important implications for workforce training as well as innovation and industrial competitiveness. The impact of geography on knowledge transfer is an active area of research given new globalizing trends, increased mobility, and ease of communications.
We use the country addresses of corresponding authors to account for scientific output for each country. The corresponding author is often fully knowledgeable about the work that is presented in the paper and manages the paper through the peer-review process. She may be the researcher who has done the primary work, or the senior researcher who has been a central part of the work. Our choice of corresponding author strikes a balance on the issue of first and last author contributions: in some fields the first author is the researcher who has done the primary work, whereas in others the last author is the main driver of the research. We compute the indigeneity of country i's scientific publications as:
$$\mathrm{Indigeneity}_i(t) = \frac{x_i(t)}{P_i(t)} \qquad (4)$$
where $x_i(t)$ is the number of publications in year t whose corresponding author has an address in country i, and $P_i(t)$ is the total number of publications of country i in year t.
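A minimal Python sketch of this ratio is given below. It assumes each record has already been reduced to two hypothetical fields, `author_countries` (all address countries on the paper) and `corresponding_country` (country of the corresponding author). The original analysis used text-parsing routines in Matlab, which are not reproduced here.

```python
def indigeneity(records, country):
    """Fraction of a country's papers whose corresponding author is domestic.

    A paper belongs to `country` if the country appears in any author
    address (whole counting). Returns None if the country has no papers.
    """
    total = 0
    domestic = 0
    for rec in records:
        if country in rec["author_countries"]:
            total += 1
            if rec["corresponding_country"] == country:
                domestic += 1
    if total == 0:
        return None
    return domestic / total


# Hypothetical records for a single year.
records = [
    {"author_countries": {"Qatar", "USA"}, "corresponding_country": "USA"},
    {"author_countries": {"Qatar"}, "corresponding_country": "Qatar"},
    {"author_countries": {"Qatar", "UK"}, "corresponding_country": "Qatar"},
    {"author_countries": {"Qatar", "France"}, "corresponding_country": "France"},
    {"author_countries": {"Egypt"}, "corresponding_country": "Egypt"},
]
qatar_indigeneity = indigeneity(records, "Qatar")
```

In this toy data set, four papers carry a Qatar address and two of them have a corresponding author in Qatar, so half of the country's output is attributed to domestic corresponding authors.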
We recognize that there are limitations with this approach. The contact addresses of the corresponding author may not accurately reflect the location where the published work was actually conducted. Furthermore, researchers frequently move, or often have multiple concurrent affiliations with institutions located in different countries. Nonetheless, the author country addresses provide a verifiable and quantifiable measure for assessing general patterns of location of scientific activities and collaborations, and we utilize this information to conduct our analysis. There is also no reason to believe that this introduces significant biases in the cross-country comparison.

Scientific Specialization: Revealed Scientific Advantage (RSA)
We analyzed the subject areas of publications for each country and compared the share of each subject area in a country's publications with its share in total world publications. Using the Revealed Comparative Advantage (RCA) concept from international trade theory (26), we defined the Revealed Scientific Advantage for a country i in subject j as:
$$\mathrm{RSA}_{ij} = \frac{P_{ij} / P_i}{W_j / W} \qquad (5)$$
where $P_{ij}$ is the number of publications of country i in subject j, $P_i$ is the total number of publications of country i, $W_j$ is the number of world publications in subject j, and $W$ is the total number of world publications. The Revealed Scientific Advantage (RSA) is the RCA of a country's publications. It is computed as the fraction of publications in subject j within country i's total publications, normalized by the fraction of publications in subject j in total world publications. RSA gives a measure of how the publications output in a subject differs from the world average. Past research has focused on relative citation impact and publications RCA of countries within different fields of science (1). It was found that scientifically strong countries (such as the US and Japan) and scientifically weak countries (such as Papua New Guinea) had no particular pattern of specialization. To the best of the authors' knowledge, there have been no recent assessments of field-specific specialization for the MENA region, nor an updated analysis for the comparison group.
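The RSA computation described above can be sketched in a few lines of Python. The function name, the dictionary layout, and the counts are hypothetical. Because a paper may carry several subject areas, this sketch takes each total as the sum of the subject-level counts, which is an assumption about how totals were formed.

```python
def revealed_scientific_advantage(country_counts, world_counts):
    """RSA per subject: country share of a subject divided by world share.

    Both arguments map subject name -> publication count. Totals are the
    sums of the subject-level counts (an assumption of this sketch).
    """
    country_total = sum(country_counts.values())
    world_total = sum(world_counts.values())
    if country_total == 0 or world_total == 0:
        raise ValueError("totals must be positive")
    rsa = {}
    for subject, world_n in world_counts.items():
        if world_n == 0:
            continue  # world share is zero, so RSA is undefined for this subject
        country_share = country_counts.get(subject, 0) / country_total
        world_share = world_n / world_total
        rsa[subject] = country_share / world_share
    return rsa


# Hypothetical counts for one country and for the world in one period.
country = {"Agriculture": 30, "Physics": 10, "Chemistry": 60}
world = {"Agriculture": 100, "Physics": 400, "Chemistry": 500}
rsa = revealed_scientific_advantage(country, world)
```

A value above 1 indicates that the subject takes a larger share of the country's output than of world output. In the toy data, agriculture is over-represented by a factor of three while physics is under-represented.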

Section B: Recent developments in Science and Technology in MENA
Over the last several decades, Arab countries in the Middle East and North Africa (MENA) region have lagged behind in scientific research (5), with insufficient government support and a lack of long-term focus on building local capacity in science and technology. However, in recent years the oil-dependent economies of Saudi Arabia, United Arab Emirates, and Qatar in particular have sought to expand their scientific and technological capacity for economic diversification (27,28). In 2011, Saudi Arabia was among the top 40 countries in the world for R&D spending (29), and in 2014, 56 billion US dollars, representing 24% of the total Saudi national budget, was allocated for education and training (30). Many regional universities that had historically focused only on teaching and professional training are embracing research as a key part of their core mission. This adoption of the Humboldtian model in the region, wherein universities are not only repositories of knowledge but also contribute to creating new knowledge (27,31), is evident in the expansion of research faculty in science and engineering departments, the creation of new and expanded laboratories, and the establishment of science and technology parks in the region. Middle Eastern universities are actively targeting foreign researchers to relocate to the region to jumpstart and expand local research (32). These efforts, from increased funding to attracting scientific research talent, are bearing some results, as publications in MENA countries have been on an upward trajectory (Fig. A).

Section C: Publications Growth Rates in MENA Countries and Selected Comparison Countries
We computed rates of change in productivity using a 3-year moving average for 1991-2013. We included data only from 1991 onwards since scientific activity was small in many MENA countries in the 1980s. For year y, the average of the growth in y-2, y-1, and y was computed. For 1991, we used data for 1989, 1990, and 1991. The descriptive statistics for the last two decades are shown in Fig. A. The results for individual countries show wide-ranging variation, with repeating cycles of decline, stagnation, and growth for countries in the Arabian Peninsula (Fig. B and C). Since 2007, Saudi Arabia and Qatar have shown sustained growth, starting from approximately zero in 2007 and reaching 35% and 28%, respectively, in 2013. This sustained growth led Saudi Arabia, in 2012, to surpass Egypt and Tunisia (long the scientifically dominant nations of the region) in total publications volume (see Figure A in Section B). There has been, however, debate regarding the rapid increase in publications from Saudi Arabia (35), and some experts in these discussions noted the difficulty of using publications data to assess research in a country with extensive programs for enlisting international researchers for visiting affiliations at Saudi institutions. The countries in the Levant (Jordan, Syria, Lebanon) and North Africa also show cycles of growth and decline, with the effects of socio-political turmoil and military conflict prominent for Iraq (with a negative productivity growth rate from 1991 to 2000), along with declining growth for Tunisia and Egypt since 2011 and a sharp decline for Syria in 2012.
The indigeneity was computed for each country for the years between 2000 and 2013. A full data set was used for eleven cases (shown above), while samples of 500 or 1000 records were used for the rest. In cases where the total publications for a particular year were fewer than the sample size (i.e., fewer than 500 or 1000), the total number of publications was used.
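The sampling rule for indigeneity, together with a 95% margin of error for the resulting proportion, could be sketched as follows. This is a minimal illustration under stated assumptions: the normal approximation for a proportion and the finite population correction are choices made here, since the exact formula behind the reported margins is not stated. The function names and numbers are hypothetical.

```python
import math
import random


def draw_sample(records, sample_size, seed=0):
    """Random sample without replacement; uses all records if there are
    fewer than `sample_size` of them."""
    k = min(sample_size, len(records))
    return random.Random(seed).sample(records, k)


def indigeneity_with_margin(domestic, n_examined, n_total, z=1.96):
    """Estimated indigeneity and its margin of error (z=1.96 for 95%).

    Assumes the normal approximation with a finite population correction.
    The margin is zero when every record of the year was examined.
    """
    if not 0 < n_examined <= n_total:
        raise ValueError("need 0 < n_examined <= n_total")
    p = domestic / n_examined
    if n_examined == n_total:
        return p, 0.0
    standard_error = math.sqrt(p * (1.0 - p) / n_examined)
    correction = math.sqrt((n_total - n_examined) / (n_total - 1))
    return p, z * standard_error * correction


# Hypothetical year: 5000 papers, 500 sampled, 250 with a domestic
# corresponding author.
estimate, margin = indigeneity_with_margin(250, 500, 5000)
# Hypothetical small year: only 300 papers exist, so all are used.
small_year = draw_sample(list(range(300)), 500)
```

Under these assumptions the margin is largest when the estimated proportion is 0.5, which is why a single maximum value per country can bound the yearly margins.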
The margin of error in the indigeneity computation for each year was equal to or less than the maximum margin of error (at the 95% confidence level) shown for each country in Table D.
When the relationship between a country's productivity and indigeneity was analyzed, we found that in most cases there was a negative linear trend; however, Saudi Arabia, Qatar, and the United Arab Emirates showed an almost exponential decline in indigeneity with increasing productivity over time (Fig. E-G in this section). The productivity increases in these countries have occurred in step with decreasing levels of indigeneity. There were exceptions, such as Turkey, South Korea, and China, where the negative trend is negligible, indicating that productivity gains have been made through expansion of local capacity (for which the fraction of domestic corresponding authors serves as a proxy indicator).
The changing patterns of specialization for the countries in the comparison group can be seen in Fig. B and C in this section. The trends for China show a high initial focus (RSA value) in materials sciences, physics, and mathematics, confirming previous findings of a sharp focus of specialization during early stages of development (1), followed by falling levels of RSA that indicate an evening of emphasis in national research across fields. An initial large focus in mechanical/industrial/aeronautical engineering, mathematics, and civil engineering/environmental sciences is also evident for Singapore.
Algeria shows growing focus in computer science and electrical engineering and in mechanical/industrial/aeronautical engineering, and a somewhat sustained focus in materials science. Algeria, Tunisia, and Morocco show emphasis in mathematics and physics. Tunisia also shows growing focus in agriculture (Fig. E). Egypt, the largest country in the group, shows no particular specialization in any discipline.
Figures D and E collectively show that Egypt, the largest country in the group, has no specific specialization, though agriculture, chemistry/chemical engineering, and civil engineering/environmental sciences show somewhat higher RSA values (~1.5). Sudan, a country with small scientific output, focused on agriculture, infectious diseases, and general medicine/health, with RSA values up to 6 and 8 in some cases. Libya, another small country in terms of publications, showed strong emphasis (with RSA up to 5) in geological science/petroleum engineering (which also corresponds to the country's oil resources). It also shows higher RSA values (~2) in chemistry/chemical engineering, environmental science/civil engineering, infectious diseases, and general medicine/health.
Bahrain, Qatar, and the UAE, the smaller countries, show some of the highest RSA values (~4-7) in the 1981-1985 period. The UAE and Qatar also show consistently rising trends in mechanical/industrial/aeronautical engineering and in computer science/electrical engineering. Two specific patterns emerging concurrently across most of these countries are interesting: a rise in RSA for civil and environmental engineering in 1991-1995, and an increase in RSA in general medicine and health in the 2001-2005 period. The trends in geological science/petroleum engineering and in chemical engineering/chemistry, subjects that relate closely to the largest economic and industrial sectors (oil and gas) in these countries, are largely of falling RSA.
In this group of countries, the rising trends follow patterns similar to those observed for the Gulf countries, with mostly growing emphasis in computer science/electrical engineering and, more modestly (but consistently), in biomedical sciences.