Abstract
Introduction
As many web platforms adopt collaborative content editing models, the gender gap has become one of the chief concerns, because the technology may in practice restrict content editing to one gender.
Objective
This study aims to analyze the Arabic Wikipedia, the largest collaborative content editing platform on the Arabic web, in terms of gender behavior and differences in user activities.
Methods
This study is the first to address the gender gap in Arabic Wikipedia, characterize users’ gender through their behavior, and then address changes in characteristics over the past five years. This study analyzes parts of Arabic Wikipedia offline by linking article pages and page edit histories to user profiles of known genders.
Results
This study reported that a gender gap exists in Arabic Wikipedia. The results reported differences over the past five years between both genders in terms of tasks and user behavior. One aspect that indicated similarity is the period of active time over months/years. Differences were observed in the reported number of increasing users, activities, responsibilities, and average actions performed.
Citation: Al-Shboul B, Al-Qudah DA, Boshmaf H, Abu-Salih B, Beseiso M, Al-Saqqa S (2024) Arabic Wikipedia users’ personalized behavior analysis considering gender gap. PLoS ONE 19(10): e0312176. https://doi.org/10.1371/journal.pone.0312176
Editor: Omar Enzo Santangelo, Regional Health Care and Social Agency of Lodi, ITALY
Received: December 20, 2023; Accepted: October 2, 2024; Published: October 21, 2024
Copyright: © 2024 Al-Shboul et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The data was downloaded from Wikipedia free dump files, an open access site for all researchers working on Wikipedia. No authorization is required. The site was last accessed in December 2023. In addition, all extracted, then utilized contents, were compressed in an attached file containing all data required to generate tables and figures. We have uploaded our dataset/database to both: FAIRsharing and Kaggle. FAIRsharing.org: AWGS; Arabic Wikipedia Gender Study, FAIRsharing ID: https://fairsharing.org/5847, Last Edited: Saturday, September 28th 2024, 1:57, Last Editor:bashar, Last Accessed: Saturday, September 28th 2024, 1:57 https://www.kaggle.com/datasets/basharshboul1981/arabic-wikipedia-gender-study.
Funding: The author(s) received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Owing to the availability of extensive data and the acquired role of collaborative writing, information sharing, and personalized user profiling, many users have become active and essential contributors to available e-content. Wikipedia is one of the primary platforms with the highest reputation, number of users, and content. Although the content available on Wikipedia is extensive and estimated to be more than 4.3 billion words including all English Wikipedia articles as of June 2023, Arabic content is estimated to be slightly more than 1.2 million words including all articles. Arabic user contributions remain unexplored, particularly from a gender perspective. Arabic content and users on the internet remain poorly understood because of the complexity of the language, its multiple dialects, and the cultural challenges surrounding women’s contributions on the internet. A few studies have been conducted on the gender gap in Arabic Wikipedia. In 2013, only 11% of Arabic Wikipedia editors were identified as women. A 2016 study revealed that only 19% of Wikipedia editors in Arabic were women and only 15% of all Wikipedia articles in Arabic were about women. A 2018 study revealed that the gender gap in Arabic Wikipedia is wider in some countries than in others. This study aims to quantify gender-related issues by studying editing profiles to support Arabic Wikipedia as a more inclusive and representative resource.
The gender gap in Arabic Wikipedia is a complex issue with a few indicators, such as 1) the small number of women editors, 2) systemic biases in editing culture, and 3) gender preference in writing articles. Because digital platforms such as Wikipedia offer powerful means to amplify women’s voices and perspectives, they enable women to share their thoughts, experiences, and creative ideas and to seek opportunities. This newfound freedom empowers women to contribute toward enriching the content landscape with diverse voices and narratives that may have been marginalized or underrepresented in conventional media [1]. This technology has the potential to free women from various gender roles and stereotypes of their roles in society by helping them to advance in work, education, and intellectual life [2, 3]. The digitization of women’s roles in collaborative content writing undeniably reshapes the narrative of gender in the field and brings forth a powerful shift toward diversity and inclusivity. The digital landscape has enabled more women to work remotely as freelancers. Many women take advantage of this flexibility to engage in collaborative writing projects, contributing content from the comfort of their homes or preferred locations. This transformation not only amplifies the voices of women, but also underscores the immense talent and creativity they contribute to the world of content creation [1].
This study aims to explore the behavior of Arab users of Wikipedia, with an emphasis on gender in relation to contributions, resources, and time dedicated to their involvement. This study was conducted using information retrieval techniques and text analysis of contributors over the last five years with the aim of bridging discourses on information technology, gender, and feminism. Thus, the conceptual framework builds on creating a comprehensive and critical understanding of how gender–power dynamics influence the initiation and expansion of women’s contributions to Wikipedia. Gender and feminist theories have been applied to obtain a deeper understanding of the digitization of women’s roles in collaborative content writing. Cyberfeminism, an offshoot of feminism, focuses on the intersection of gender and technology, suggesting that individuals’ experiences are shaped by multiple intersecting identities including gender, race, and class [4]. In the context of collaborative content writing, digitization can offer women from diverse backgrounds an equal platform to share their perspectives and experiences [1, 3].
Consequently, this study aims to answer gender-related questions by analyzing Wikipedia contributors in terms of gender. These are:
- How does gender contribute toward characterizing user profile of Arabic Wikipedia contributors?
- Is it possible to characterize the gender of Wikipedia contributors in terms of the size of contributions?
- Is it possible to characterize the gender of Wikipedia contributors in terms of categories of contributions?
- Is it possible to characterize the gender of Wikipedia contributors in terms of date and time allocated?
- Is it possible to characterize the gender of Wikipedia contributors in terms of topics of interest?
- What are the main page categories of interests of Arabic Wikipedia contributors from a gender perspective?
- Is there a significant difference between genders in terms of categories of interest?
- How do Wikipedia Arabic user contributions vary according to gender over the past five years?
This study primarily examines Arabic Wikipedia contributors by profiling users based on their gender. It highlights how contributor gender plays a significant role in characterizing their behavior, topics of interest, time allocated, and patterns of contribution. This study differs from other studies in that gender itself is not the topic of discussion; rather, user profiling topics are discussed in terms of gender theory, with specialized input from a gender specialist to provide insights into contributors’ behavior.
Related work
With the revolution of the web, most of the content generated is user-profile-centric, that is, personalized [5], including social media, e-learning, e-government, and collaborative writing platforms. Wikipedia is an encyclopedia-scale example of a collaborative writing platform, which is currently maintained by a community of volunteers using an editing system called MediaWiki. “Wikipedia is the largest and most read reference work in history and has consistently been one of the ten most popular websites, ranked 7th in October 2023” [6].
Wikipedia has been a rich source of research over the years, as many scholars have investigated the content, users, technologies, and contributions. As Wikipedia is more frequently leveraged to correct misinformation [7, 8], train machine learning tools [9], and enhance search engine results [10], the gender biases that exist on the platform can easily propagate throughout the digital landscape [11].
Studies on Wikipedia’s culture of democracy are divided, as some claim that it is bureaucratic [12, 13], whereas others claim it is democratic [14–16].
As gender gap studies focusing on cyberfeminism discuss various aspects of limitations on online access and participation for women [1, 17, 18], many studies emphasize the importance of a movement to support gender equality and online coexistence among genders [19].
Despite the growing body of literature revealing gender disparities in online participation and contribution [20–22], there are limited studies on these gaps using comprehensive measures, particularly in specific contexts such as Arabic Wikipedia. This study highlights the pronounced gender gap in content contribution and page editing activities. Various explanations have been offered for these gender differences, including unequal access to resources and societal processes, such as gender socialization and Social Role Theory (SRT) [23–25]. In a stratified society, such as the Arab region, understanding women’s experiences necessitates an examination of the gender concepts, power structures, and emotional relations that govern their lives [26].
This study differs from extant literature in that it focuses on characterizing Arabic Wikipedia users through their editing behavior, indicating differences and gaps between genders, and then matching the findings with those of SRT.
Methodology
Parts of Arabic Wikipedia were downloaded on 20/06/2023 including pages, articles, logs, categories, and modifications, among other details, with the total size exceeding 23GB. A sample Wikipedia page structure is found in Fig 1. The figure shows how page history may be included in the page metadata including: page id, time stamp, contributors, among many others. One page may have more than one contributor and may have been modified many times. Contributors’ ids were used to match page contributors with their Wikipedia page edit history files.
Another Structured Query Language (SQL) version was downloaded to clearly understand the relationship between Wikipedia pages and other information such as categories, modification logs, and users, with the total size exceeding 21GB. The analysis was performed using a server with a Xeon Gold CPU, 96 GB of DDR4 RAM, and an NVMe SSD. Page history was the chief part of the analysis, where users, their actions, and their action dates were collected. Wikipedia history pages follow the schema described on the Mediawiki_history_dumps#Schema_details page (last visited December 4, 2023). This schema is shown in Table 1, and a short description of the studied actions is listed in Table 2 in the supporting information section. According to Wikipedia, the dumps used and their content are licensed under the GNU Free Documentation License (GFDL) and the Creative Commons Attribution-Share-Alike 3.0 License [27].
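The collection of users, actions, and action dates described above could be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the four-column tab-separated layout, the sample rows, and the helper name `count_actions` are assumptions made for the example (the actual dump carries many more fields per row).

```python
import csv
from collections import Counter

# Assumed, simplified column layout for illustration only.
FIELDS = ["event_entity", "event_type", "event_timestamp", "event_user_text"]


def count_actions(tsv_lines, fields=FIELDS):
    """Count events per (year, user, entity, action type).

    `tsv_lines` is any iterable of tab-separated lines without a header row.
    """
    reader = csv.DictReader(
        tsv_lines, fieldnames=fields, delimiter="\t", quoting=csv.QUOTE_NONE
    )
    counts = Counter()
    for row in reader:
        year = row["event_timestamp"][:4]  # timestamps assumed to start with YYYY
        key = (year, row["event_user_text"], row["event_entity"], row["event_type"])
        counts[key] += 1
    return counts


# Hypothetical sample rows (not taken from the dataset).
sample = [
    "page\tcreate\t2019-03-01 12:00:00.0\tUserA",
    "revision\tcreate\t2019-03-01 12:05:00.0\tUserA",
    "page\tcreate\t2019-04-01 12:00:00.0\tUserA",
    "user\trename\t2020-01-15 08:30:00.0\tUserB",
]
actions = count_actions(sample)
print(actions[("2019", "UserA", "page", "create")])
```

Keying the counter on year, user, entity, and action type keeps the per-year and per-action averages discussed later a simple aggregation over this one structure.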
The schema provides detailed information used to perform our analysis, except for gender. Therefore, lists of Arabic men and women users were collected from the Wikipedia category pages (تصنيف:رجال_ويكيبيديون ، تصنيف:نساء_ويكيبيديات, translated in [28]), the usernames were matched to the users found in the history pages, and the matched users are reported in this work. After removing automatic bots, an 89% match of usernames was reported for the last five years, excluding 2023, as the log for this year was incomplete in June 2023. The remaining 11% were inactive, deleted, or banned by Wikipedia administrators for various reasons. Edited or created pages, revisions, user management, and event actions were extracted and analyzed for each user. Consequently, users have been studied from various perspectives, including their possible actions as collaborators and/or editors.
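The username matching step above might look like the following sketch. The function names, the "man"/"woman" labels, the sample usernames, and the normalization rule (treating underscores and spaces as interchangeable) are all assumptions for illustration; the source does not describe how names were normalized or how bots were detected.

```python
def normalize(name):
    """Normalize a username (assumption: underscores and spaces are equivalent)."""
    return name.replace("_", " ").strip()


def match_gender(history_users, men, women, bots=()):
    """Label history users found in the gender category lists.

    Returns ({username: gender}, match_rate), where match_rate is the share of
    listed (non-bot) usernames that also appear in the edit history.
    """
    bot_set = {normalize(b) for b in bots}
    active = {normalize(u) for u in history_users} - bot_set
    labels = {}
    for gender, names in (("man", men), ("woman", women)):
        for name in names:
            key = normalize(name)
            if key in active:
                labels[key] = gender
    listed = {normalize(n) for n in list(men) + list(women)} - bot_set
    rate = len(labels) / len(listed) if listed else 0.0
    return labels, rate


# Hypothetical usernames, not taken from the dataset.
labels, rate = match_gender(
    history_users=["Alice_A", "Bob B", "EditBot", "Carol"],
    men=["Bob B", "Dan"],
    women=["Alice A"],
    bots=["EditBot"],
)
print(labels, round(rate, 2))
```

In this toy run, "Dan" is listed but never appears in the history, which corresponds to the inactive, deleted, or banned accounts that make up the unmatched remainder.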
Results and discussions
The results revealed several interesting findings. The gender gap in Arabic Wikipedia is reflected in the small number of women editors shown in the statistics. There are also systemic biases in editing culture, as shown in Fig 2, and there is a gender preference in writing articles. For example, the number of users constantly editing pages on Wikipedia differs significantly (t-test, α = 0.01) between the two genders, as presented in Fig 2. Although the numbers change with similar patterns (that is, they increase and decrease together), the difference between the two genders remains high. At the peak of the chart (i.e., 2020), constantly editing men accounted for 76% of the total number of page edits, representing the lowest ratio between men and women. The reported ratio was approximately 80:20, with slight differences over time (except for 2020). This gender gap in digitization engagement has been reported in almost all digital transformation policies in the Arab region, including those of Jordan [29], the United Arab Emirates [30], and Qatar [31].
Because digital platforms such as Wikipedia offer powerful means to amplify women’s voices and perspectives, they enable women to share their thoughts, experiences, and creative ideas and to seek opportunities. This newfound freedom empowers women to contribute toward enriching the content landscape with diverse voices and narratives that may have been marginalized or underrepresented in conventional media [1]. The statistics, however, show that women are underrepresented on Wikipedia as well.
This technology has the potential to free women from various gender roles and stereotypes of their roles in society by helping them to advance in work, education, and intellectual life [2, 3]. The digitization of women’s roles in collaborative content writing undeniably reshapes the narrative of gender in the field and brings forth a powerful shift toward diversity and inclusivity. The digital landscape has enabled more women to work remotely as freelancers. Many women take advantage of this flexibility to engage in collaborative writing projects, contributing content from the comfort of their homes or preferred locations. This transformation not only amplifies the voices of women, but also underscores the immense talent and creativity they contribute to the world of content creation [1]. The results reveal that the numbers of men and women increased almost constantly in an approximately linear manner, with an average share of 23% women to 77% men, as presented in Fig 3. In 2020, the share of women rose relative to that of men (27% and 73%, respectively). These results are consistent with those reported for the Spanish Wikipedia [32]. In addition, over the last five years, the number of contributing women on Wikipedia increased roughly fourfold, from slightly over 14 thousand in 2018 to approximately 60 thousand in 2022; nevertheless, the absolute increase in the number of contributing men was larger, from slightly more than 56 thousand in 2018 to slightly more than 240 thousand in 2022.
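As a quick arithmetic check on the contributor counts quoted above, the growth factor and the women's share can be computed directly. The inputs below are the rounded figures from the text (about 14 and 60 thousand women, about 56 and 240 thousand men), so the outputs are approximate; the helper names are hypothetical.

```python
import math


def growth(start, end):
    """Return (growth factor, equivalent number of doublings)."""
    factor = end / start
    return factor, math.log2(factor)


def share(part, other):
    """Share of `part` in the total of `part` and `other`."""
    return part / (part + other)


# Rounded counts quoted in the text (approximations, not exact data).
women_2018, women_2022 = 14_000, 60_000
men_2018, men_2022 = 56_000, 240_000

women_factor, women_doublings = growth(women_2018, women_2022)
men_factor, men_doublings = growth(men_2018, men_2022)

print(f"women: x{women_factor:.2f} ({women_doublings:.1f} doublings)")
print(f"men:   x{men_factor:.2f} ({men_doublings:.1f} doublings)")
print(f"women's share 2018: {share(women_2018, men_2018):.0%}")
print(f"women's share 2022: {share(women_2022, men_2022):.0%}")
```

With these rounded inputs both groups grow by a similar factor, so the women's share at the two endpoints stays near the 80:20 ratio reported earlier, while the absolute gap widens.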
The gender gap evident in Wikipedia contributions, particularly within the context of Arabic Wikipedia, is inadequately documented and signifies an intricate phenomenon that underscores broader societal challenges intertwined with cultural norms and gender roles. As a collaborative online encyclopedia, Wikipedia relies on voluntary contributions from a global array of contributors. Nonetheless, this study elucidates the notable underrepresentation of women among Arabic Wikipedia editors and contributors. The genesis of this gender gap is multifaceted, with cultural norms and gender roles playing pivotal roles in shaping its dynamics. Cultural norms frequently prescribe traditional gender roles and influence perceptions of appropriate behaviors for men and women. Stereotypes linking women to domestic roles or less technically oriented pursuits may deter their active involvement in fields such as technology or contributions to platforms such as Wikipedia that are perceived as predominantly male-dominated. Moreover, cultural expectations surrounding women’s roles in caregiving and household management may restrict the time available for endeavors such as Wikipedia editing, with societal expectations prioritizing women’s domestic responsibilities over their contributions to online platforms, thereby reinforcing traditional gender norms. Conversely, the COVID-19 pandemic provided women with distinct opportunities to engage in remote employment, work from home, and create novel job prospects that align with their role as mothers. This surge in digitalization has led to the emergence of new job opportunities in various fields. The widespread embracing of remote work practices has enabled women to more effectively reconcile their professional responsibilities with their caregiving duties, as reflected in our study results.
The experiments indicated differences between men’s and women’s behavior in performing actions over the last five years, summarized in three categories: page actions (page create, delete, and move), revision actions (page create), and user actions (user create and user rename). Table 3 summarizes the average per gender in the last five years, with the differences at the bottom of the table. A t-test with the null hypothesis of equal means indicated a significant difference between the two genders at the 95% confidence level (t-value = 1.823, α = 0.05). The supporting information section provides a detailed table presenting the average number of actions per year.
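A two-sample t statistic of the kind used above can be sketched with the standard library alone. The source does not state which t-test variant was applied, so this example assumes Welch's unequal-variance form; the input values are placeholders rather than the values in Table 3, and the result is not expected to reproduce the reported t-value.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's two-sample t statistic and approximate degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)  # squared standard errors
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Placeholder per-year averages for two groups (not the paper's data).
group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8, 10]

t_stat, dof = welch_t(group_a, group_b)
print(f"t = {t_stat:.4f}, df = {dof:.3f}")
```

The statistic would then be compared with the critical value of the t distribution at the chosen α and the computed degrees of freedom; a statistics package can also return the p-value directly.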
A summary of the average actions per year for men and women is presented in Fig 4. The figure shows that men generally performed more actions than women, with a few exceptions in favor of women. For example, in 2018 and 2020, women performed a higher average number of page actions than men. Women also performed a higher number of revisions in 2019 and 2020. Moreover, the figure demonstrates that both genders performed few user actions over the past five years.
Date-based statistics
The average action count per year for each gender is reported in Table 4. In addition, as shown in Table 5, January of each year is the busiest month in terms of the number of active users, followed by December, May, and April in descending order. This suggests that most users may have had more time to contribute during these months, and that they may have been university students.
Further, as shown in Table 6, the numbers indicate that there are no significant differences (t-test, α = 0.01 and α = 0.05) between days of the month, as the number of active users is (488 ± 30) users, with the exception of the 31st of each month, as not all months have 31 days.
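The day-of-month summary above (mean ± standard deviation of active users, leaving out the 31st) could be computed along these lines. The event tuples, the timestamp format, and the function names are assumptions for illustration; the toy input is far smaller than the real history and does not reproduce the reported figures.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, stdev


def active_users_by_day(events):
    """Count distinct users per day of month from (user, timestamp) pairs."""
    users = defaultdict(set)
    for user, timestamp in events:
        day = datetime.fromisoformat(timestamp).day
        users[day].add(user)
    return {day: len(names) for day, names in sorted(users.items())}


def summarize(counts, exclude=(31,)):
    """Mean and sample standard deviation, skipping days not in every month."""
    values = [n for day, n in counts.items() if day not in exclude]
    return mean(values), stdev(values)


# Hypothetical events (user, ISO-like timestamp), not taken from the dataset.
events = [
    ("a", "2020-01-01 10:00:00"),
    ("b", "2020-01-01 11:00:00"),
    ("a", "2020-02-01 09:00:00"),
    ("c", "2020-01-02 08:00:00"),
    ("a", "2020-01-31 08:00:00"),
]
counts = active_users_by_day(events)
avg, sd = summarize(counts)
print(counts, f"{avg:.2f} ± {sd:.2f}")
```

Excluding the 31st by default mirrors the exception noted in the text, since that day occurs in only seven months of the year and would otherwise pull the average down.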
The number of active users has fluctuated since 2018, with a significant drop in 2022. Several explanations may be proposed for these changes, including changes in marriage and divorce rates during the COVID-19 pandemic; however, since biographical data for Wikipedians are not available, these assumptions cannot be confirmed [33, 34]. The supporting materials section provides detailed tables of the day and month statistics. For example, Table 4 shows the average action count per year for each gender: the average number of page create actions performed by men in 2018 was 21.9, compared to 18.52 for women. Table 5 shows the sum of edits per month for every year. This table is particularly important because it shows how women’s interaction changes over the year, specifically at the times when stereotypical gender roles are performed. For example, in February, June, and December, interactions decrease compared to other months because of exam periods at schools or universities and holidays.
Conclusion
This study reported that a gender gap exists in Arabic Wikipedia. The results reported differences over the past five years between both genders in terms of tasks and user behavior. One aspect that indicated similarity is the period of active time over months/years. Differences were observed in the reported number of increasing users, activities, responsibilities, and average actions performed.
Acknowledgments
The authors would like to thank the reviewers for their valuable insights and comments on this paper. Also, a special thanks to Editage proofreading service for their continuous support.
References
- 1. Suharnanik S. (2022). Cyberfeminism: The Opportunity and Challenges Of Social Media For Indonesian Women Empowerment. Jurnal Komunikasi Korporasi dan Media (JASIMA), 3(2), 118–136.
- 2. Piliang Y. A. (2001). Cyberspace, Cyborg dan Cyber-Feminism: Politik Teknologi dan Masa Depan Relasi Gender. Jurnal Perempuan, 18, 7–20.
- 3. Hidayah R. R. (2023). Cyberfeminism’s Resistance to Women’s Marginalization: Gender Discourse Analysis on Magdalene.co Website. Journal of Gender Equality and Millennium Development, 1(1), 21–29.
- 4. Korabik K. (2015). The intersection of gender and work–family guilt. In Mills M. J., Gender and the work-family experience: An intersection of two domains (pp. 141–157). Springer.
- 5. Challam V. G. (2007). Contextual search using ontology-based user profiles. Large Scale Semantic Access to Content (Text, Image, Video, and Sound) (RIAO ’07), (pp. 612–617). Paris.
- 6. https://www.similarweb.com/
- 7. Cohen N. (2018, April 6). Conspiracy videos? Fake news? Enter Wikipedia, the ‘good cop’ of the Internet. Retrieved from The Washington Post: https://www.washingtonpost.com/outlook/conspiracy-videos-fake-news-enter-wikipedia-the-good-cop-of-the-internet/2018/04/06/ad1f018a-3835-11e8-8fd2-49fe3c675a89_story.html
- 8. Hughes T. S. (2020, August 22). Helping People Better Assess the Stories They See in News Feed with the Context Button. Retrieved March 5, 2024, from Facebook Newsroom: https://about.fb.com/news/2018/04/newsfeed-fyi-more-context/
- 9. https://www.wikipedia.org/
- 10. Lewandowski D. S. (2011). Ranking of Wikipedia articles in search engines revisited: Fair ranking for reasonable quality? Journal of the American Society for Information Science and Technology, 62(1), 117–132.
- 11. Langrock I., & González-Bailón S. (2022). The Gender Divide in Wikipedia: Quantifying and Assessing the Impact of Two Feminist Interventions. Journal of Communication, 72(3), 297–321.
- 12. Shaw A. H. (2014). Laboratories of Oligarchy? How the Iron Law Extends to Peer Production: Laboratories of Oligarchy. Journal of Communication, 64(2), 215–238.
- 13. Niederer S. D. (2010). Wisdom of the crowd or technicity of content? Wikipedia as a sociotechnical system. New Media & Society, 12, 1368–1387.
- 14. Benkler Y. (2006). The wealth of networks: How social production transforms markets and freedom. Yale University Press.
- 15. Cooke R. (2020, February 17). Wikipedia Is the Last Best Place on the Internet. Retrieved from Wired: https://www.wired.com/story/wikipedia-online-encyclopedia-best-place-internet/
- 16. Maher K. (2018, October 18). Wikipedia is a mirror of the world’s gender biases. Retrieved from Wikimedia Foundation: https://wikimediafoundation.org/news/2018/10/18/wikipedia-mirror-world-gender-biases/
- 17. Gao G. S. (2020). Towards a ‘virtual’ world: Social isolation and struggles during the COVID‐19 pandemic as single women living alone. Gender, Work & Organization, 27(5), 754–762. pmid:32837008
- 18. Elena-Bucea A. C.-J. (2021). Assessing the role of age, education, gender and income on the digital divide: Evidence for the European Union. Information Systems Frontiers, 23, 1007–1021.
- 19. Dinana H. O. (2022). Marketing and Advertising in the Online-to-offline (O2O) World. IGI Global.
- 20. Hinnosaar M. (2019). Gender inequality in new media: Evidence from Wikipedia. Journal of economic behavior & organization, 163, 262–276.
- 21. Master A. H. (2020). Cultural stereotypes and sense of belonging contribute to gender gaps in STEM. Grantee Submission, 12(1), 152–198.
- 22. Robinson A. L. (2021). How to close the gender gap in political participation: Lessons from matrilineal societies in Africa. British Journal of Political Science, 51(1), 68–92.
- 23. Eagly A. H. (1987). Sex differences in social behavior: A social-role interpretation. Lawrence Erlbaum Associates, Inc.
- 24. Eagly A. H. (2000). Social role theory of sex differences and similarities: A current appraisal. (Eckes T. T., Ed.) The developmental social psychology of gender, 123–174.
- 25. Zirra A. R. (2023). Gender differences in the prevalence of Parkinson’s disease. Movement Disorders Clinical Practice, 10(1), 86–93. pmid:36699001
- 26. Boshmaf H. (2023). Jordanian Women Entrepreneurs and the Role of Social Media: The Road to Empowerment. Dirasat: Human and Social Sciences, 50(3), 139–152.
- 27. License information about Wikimedia dump downloads: https://dumps.wikimedia.org/legal.html (Last visited 3/3/2024)
- 28. Translation: Category: Wikipedia_Women, Category: Wikipedia_Men, respectively
- 29. Jordan Digital Transformation Policy 2020: https://www.modee.gov.jo/EBV4.0/Root_Storage/EN/1/Jordan_Digital_Transformation_Strategy_2020_English_Unofficial_Translation.pdf
- 30. UAE National Digital Government Strategy: https://u.ae/en/about-the-uae/strategies-initiatives-and-awards/strategies-plans-and-visions/government-services-and-digital-transformation/uae-national-digital-government-strategy
- 31. Qatar Digital Inclusion Policy: https://mcit.gov.qa/en/digital-society/digital-inclusion
- 32. Minguillón J. M. (2021). Exploring the gender gap in the Spanish Wikipedia: Differences in engagement and editing practices. PLoS ONE, 16(2). pmid:33621229
- 33. Hoehn-Velasco L. B.-M. (2023). Marriage and divorce during a pandemic: the impact of the COVID-19 pandemic on marital formation and dissolution in Mexico. Review of economics of the household, 1–32. pmid:37361559
- 34. https://jorinfo.dos.gov.jo/