A metamodel for mobile forensics investigation domain

With the rapid development of technology, mobile phones have become an essential tool in crime fighting and criminal investigation. However, many mobile forensics investigators face difficulties with the investigation process in their domain. These difficulties stem from the field's heavy reliance on knowledge which, although a valuable resource, is scattered and widely dispersed. This dispersion not only makes investigation difficult for new investigators, resulting in a substantial waste of time, but also leads to ambiguity in the concepts and terminologies of the mobile forensics domain. This paper develops a metamodeling-based approach for the mobile forensics domain. The approach helps identify the common concepts of mobile forensics through the development of the Mobile Forensics Metamodel (MFM). In addition, it simplifies the investigation process and enables investigation teams to capture and reuse specialized forensic knowledge, thereby supporting training and knowledge management activities. Furthermore, it reduces the difficulty and ambiguity in the mobile forensics domain. A validation process was performed to ensure the completeness and correctness of the MFM. The validation was conducted using two techniques, which led to improvements and adjustments to the metamodel. The last version of the adjusted metamodel was named MFM 1.2.


Introduction
The worldwide use of mobile phone devices is increasing daily. Ericsson President and CEO Hans Vestberg expects that by 2020, 50 billion mobile phones will be connected to the web, compared to five billion now [1]. This confirms an earlier prediction that by 2020 mobile phones will be the primary devices of digital communication [2]. Fig 1 shows that 76 percent of the devices used in 2014 were mobile phones [3]. According to a recent report by Patrik Cerwall (2015), the number of mobile phone users in Q1 2015 was around 7.2 billion, which equals the world's population [4].
Mobile device forensics (MF) is considered a new field compared to other branches of digital forensics such as computer and database forensics. According to the authors in [5], it is the branch of digital forensics relating to the recovery of digital evidence from a mobile device under forensically sound conditions. The phrase 'mobile device' often applies to mobile phones. However, these devices are currently used for many other activities in our daily lives, for instance, checking e-mails, taking photos, browsing the Internet, conducting business transactions, recording location data and much more. Alongside these productive activities, mobile phone crime is on the rise, and cybercrime is now moving to mobile phone devices; examples include committing fraud via email, harassment through text messages, distribution of child pornography, terrorism and selling drugs. MF has many interacting elements, including people, authority, investigation teams, resources, procedures and policy. The sophistication of the crimes and the variety of mobile phone devices used in these offenses are becoming major challenges to investigators [6]. In addition, the volume of data and the complexity of investigation are among the major issues in MF [7]. Furthermore, investigators may not have a clear view of which potential evidence to start the investigation with.
Previous studies have mostly discussed mobile forensics only in data acquisition terms and only in a problem solving scenario, as a subset of computer forensics. These studies did not take mobile forensics beyond the paradigm known as computer forensics. Additionally, they have not addressed the elements of MF comprehensively, and previous research in the MF domain did not focus on modeling the case domain information involved in investigations [8]. The existing mobile forensic models are not based on any metamodels but rather constitute proprietary solutions, mainly focused on frameworks and other aspects of models. Metamodeling has been promoted by the efforts of the Object Management Group (OMG) [9]. This paper develops a Mobile Forensics Metamodel (MFM) in order to clarify all the necessary activities required by investigators for conducting their task. In addition, it creates a unified view of mobile forensics in the form of a metamodel that can be seen as a language for this domain. A metamodeling approach is applied to ensure that the resulting metamodel is complete and consistent.

Table 1. Mobile phone digital crimes [11].
Crime | Description | Evidence Source
Harassment | Sending any type of message (text, sexual, photo, video) that contains harassing and threatening words. | Address books; history log files.
Trafficking Drugs | Criminals using mobile phones to distribute drugs and coordinate activities between them. | Gallery photos; call history; contact lists; cell site locations.
Terrorism | Dangerous actions against civilians to achieve political or organizational goals, either by using mobile phones as a bomb (e.g., the Mumbai terrorist attack in 2008 and the commuter trains in Madrid in 2004) or by using the mobile to coordinate activities and share information. | Cell site locations; call history; electronic money transfers.

Table 3. Concept extraction (columns: Concept, Total).
The rest of this paper is organized as follows: the background of MF is summarized in Section 2; Section 3 presents and describes the development process of our mobile forensics metamodel, based on a metamodeling approach; finally, the conclusion is presented in Section 4.

Background
The rapid change in the technology of mobile phones has provided opportunities for criminal activities. The crimes conducted through mobile phones include fraud, drugs and pornography, as discussed in [10], which indicated that these crimes are growing with the increase in the number of mobile devices. According to the National Institute of Justice [11], many digital crimes are committed annually through mobile phone devices due to the proliferation of these devices in most countries. Thus, mobile phone devices contain a great deal of digital evidence for digital investigation processes [12]. The purpose of extracting digital evidence from mobile phone devices is to use it in court proceedings, as these devices are now frequently used in criminal activities [13]. Evidence extracted from mobile phones has played a significant role in forensic investigation in recent years, and many murder convictions have been partly based on evidence gathered from the mobile phones of the perpetrators or their victims [14]. For instance, mobile phone evidence was used in the prosecution of Ian Huntley, who killed two girls, and was also used to locate and arrest suspects in the failed London car bomb attacks in 2007 [12]. Some of the types of crimes conducted through the use of mobiles, and the evidence sources contained in the mobile devices, are shown in Table 1.
The rapid proliferation of mobile phones on the market caused a demand for forensic examination of the devices, which could not be met by existing computer forensic techniques. Much research has been conducted in the MF domain. While some studies have discussed MF for devices in general, the majority of previous studies were concerned with smartphone forensics. A study in [15] tested and analyzed data remnants of instant messaging from Facebook and Skype to identify evidence from these data. However, validated frameworks and methods to extract mobile phone data are practically non-existent [16]. The rapid development of mobile phone devices has made it difficult to design a single forensic tool or set of standards specific to one platform [12]. Furthermore, the lack of hardware, software and standardization in mobile phone devices is one of the significant difficulties in the MF domain [17]. This makes the investigation process a hard task. It is also easy to tamper with digital evidence in mobile phones through overwriting or through remote commands received from the wireless network [18].
Moreover, the lack of standardization is a major issue in the field of MF. Advanced development in technology, as well as the variety of mobile devices and operating systems (OSs), makes it difficult to develop a common framework or standardization model [17,19,20]. In addition, as stated in [21], the major issue in mobile phone investigation is that there is no standard forensic model or standard process for the forensic examination of smartphones. Research by Hoog concluded that digital forensic investigators and security engineers have faced difficulties dealing with mobile phone crimes due to their lack of knowledge management [22].
Additionally, it has been suggested that members of the legal profession need to increase their level of understanding and knowledge of mobile phone forensic terminology, techniques and procedures [13]. Moreover, it has been claimed that a major issue in law enforcement agencies in many countries is the lack of knowledge management [23]. Therefore, forensic investigators face difficult challenges when conducting the forensic investigation processes related to digital crimes, particularly for mobile phones. In a recent NIST Mobile Forensics Workshop (2014) [24] conducted by researchers in the MF domain, the issues related to the MF domain were discussed. It was indicated that investigators are struggling with the MF domain because they do not have sufficient knowledge, training and education related to the proper seizure procedures for mobile devices, the appropriate transport procedures and proper forensic examination and analysis [24]. Furthermore, while a number of digital forensic process models have been developed by various organizations worldwide, there are no agreed forensic investigation and legislative delegation procedures to adhere to.

Table 5. Sample of concept definitions.
Concept | Definition
Chain of Custody | A process that tracks the movement of evidence through its collection, preservation, and analysis lifecycle by documenting each person who handled the evidence and the date/time it was collected or transferred.
Documentation | A continuous activity required in all the stages and used for documenting the crime scene (Photographing, Sketching, and Recording).
Extraction | A process to acquire data from a mobile phone using acquisition methods, which are manual acquisition, logical acquisition, and physical acquisition.
PhysicalAcquisition | A process that enables the examiner to search the contents of the removable media and potentially recover deleted files.

Mobile forensic metamodel
In this paper, the authors use a metamodeling approach to identify the common concepts of the MF domain. This approach has been promoted by the efforts of the Object Management Group (OMG) to create interoperable and reusable components. Metamodeling is an activity that generalizes a domain by collecting all the domain concepts and partitioning the domain problems into sub-domain problems [39]. Through this approach we developed our metamodel for MF. A metamodel is thus a special kind of model: it identifies domain features and related concepts (like any other model) but is created with the intent to formally describe the semantics underpinning a formal modelling language. Without a metamodel, the semantics of domain models can be ambiguous [40]. Many previous studies have used a metamodeling approach for managing domain knowledge. The study reported in [39] used this approach to develop a generic metamodel for Multi Agent Systems (MAS), following a 6-step process. Later, a metamodel for managing disaster management knowledge was developed [40], using an 8-step metamodel creation process. Moreover, the study in [41] used a 7-step metamodeling process to design a comprehensive and general-purpose metamodel for metacognition that supports artificial intelligence systems. To develop the MFM, we used an 8-step metamodeling process adapted from [39], [40] and [41], which are the works most closely related to this study and which present a thorough and structured process for identifying major concepts and their relationships.

Identification of common phases of domain
The purpose of identifying the common phases of the domain is to facilitate the extraction of concepts in the domain. According to [5,42], the common phases of MF are Preservation, Acquisition, Examination & Analysis and Reporting. The National Institute of Standards and Technology (NIST) also recommends these phases. Preservation is the process of securely maintaining custody of property without altering or changing the content of data that reside on devices and removable media. Acquisition is the process of obtaining information from a mobile device and its associated media. In this process, the potential digital evidence is extracted from sources such as the mobile device's internal memory, SIM card memory and SD memory, using acquisition methods. Examination & Analysis are the processes used to uncover digital evidence such as hidden data. The results are obtained by applying established, scientifically based methods and should describe the content and state of the data fully. Finally, Reporting is the process of preparing a detailed summary of all the steps taken and conclusions reached in the investigation of a case.

Model collection and classification
This step includes collecting several MF models from a variety of sources, including books, journal papers, conference papers and reports that were found through Google Scholar, ScienceDirect, IEEE Xplore, PLOS ONE, Springer Link and Google. The collection of models was conducted using different keywords such as "mobile forensics model", "smartphone forensics analysis", "mobile forensics preservation", "mobile forensics acquisition", "mobile forensics examination", "mobile forensics identification" and "evidence extraction of mobile device". Among the collected models, some cover all four phases of MF, while others cover three, two or even only one phase. Hence, based on the number of phases included, a model is called either a "general model" or a "specific model". A model is called a "general model" if it covers at least three phases of MF; otherwise it is called a "specific model". For this study, a total of 41 models were collected, of which 31 were considered general models and 10 specific models. These models were selected based on their clarity and how well domain knowledge is presented through them. The collected models were then classified into three different sets (Set I, Set V1 and Set V2) for development and validation of the MFM. These sets are formed according to how broadly the models cover the four phases of MF. Set I, which includes 21 general models, is used to create the initial metamodel, while Set V1, which includes 10 specific models, and Set V2, which includes 10 general models, are used for validation of the metamodel in Step 8.
The purpose of the first validation (Set V1) is to identify any missing concepts in the initial metamodel, because the specific models provide more information for each phase of the MF domain than the general models do. While Set V1 concentrates on validating the MFM against specific MF models, the second validation (Set V2) focuses on general MF models. It is worth mentioning that including general models with wider coverage in this set provides a better indication of the frequency of concepts across the models, which is necessary to evaluate the importance of individual concepts included in the MFM. Table 2 shows the models in each set.
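The general/specific rule described above can be sketched in a few lines of Python. This is only an illustration, not the authors' tooling: the function name `classify_model` and the sample inputs are hypothetical, and only the threshold (at least three of the four phases) comes from the text.

```python
# Hypothetical sketch of the general/specific classification rule.
MF_PHASES = frozenset(
    {"Preservation", "Acquisition", "Examination & Analysis", "Reporting"}
)


def classify_model(covered_phases):
    """Return 'general' if a model covers at least three MF phases, else 'specific'."""
    # Ignore duplicates and anything that is not one of the four MF phases.
    covered = set(covered_phases) & MF_PHASES
    return "general" if len(covered) >= 3 else "specific"


# Illustrative inputs only; these are not models taken from Table 2.
example_full = classify_model(
    ["Preservation", "Acquisition", "Examination & Analysis", "Reporting"]
)
example_single = classify_model(["Acquisition"])
print(example_full, example_single)
```

In practice the phase coverage of each collected model would still have to be judged by reading the model, so a helper like this only records the outcome of that manual assessment.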

Concept extraction
This step is an important process in the metamodeling approach. Its purpose is to identify concepts among the models that could potentially be included in the MFM. Concepts should be extracted from the textual content (main body) of a mobile forensic model in order to avoid missing concepts or picking up unrelated ones. The main body contains the developed model. For instance, Xian's "Symbian smartphones forensic process model" [31] covers five processes for Symbian smartphones, and we extracted the related concepts under each of these processes. Extracted concepts had to be related to the MF domain; otherwise, they were excluded. Similarly to the procedures in [39,41,82,83], the concepts were extracted manually from each model in Set I. We adapted the concept extraction process from [84-87]. A description of the concept extraction process is presented below:
• Concept recognition: this step is based on a linguistic approach. To be recognized, a concept should consist of a noun, an adjective + noun, or a compound noun. For instance, "Investigator" and "Court" are nouns; "Faraday Bag" and "Chain of Custody" are compound nouns, whereas "Acquired Data" and "Volatile Evidence" are adjective + noun.
• Concept categories: candidate concepts of the metamodel are represented as: i. Actor (active concepts), such as Investigator, Forensic Specialist and Audience.
The concept extraction process is shown in Fig 3. The outcomes of the concept extraction are shown in Table 3. We extracted 725 general concepts from Set I (21 models in total).
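The three linguistic patterns used for concept recognition can be illustrated with a small sketch. The paper performed this step manually, so the code below is not the authors' procedure: the toy part-of-speech lexicon `POS` and the function `recognize` are assumptions made purely for illustration, and treating "of" as a connector inside compound nouns is a simplification introduced here to cover "Chain of Custody".

```python
# Toy part-of-speech lexicon (an assumption for illustration only; the paper
# extracted concepts by hand, and a real pipeline would need a proper tagger).
POS = {
    "investigator": "NOUN", "court": "NOUN", "faraday": "NOUN", "bag": "NOUN",
    "chain": "NOUN", "custody": "NOUN", "data": "NOUN", "evidence": "NOUN",
    "acquired": "ADJ", "volatile": "ADJ", "of": "ADP", "quickly": "ADV",
}


def recognize(phrase):
    """Return the pattern a candidate phrase matches, or None if it matches none."""
    tags = [POS.get(word.lower()) for word in phrase.split()]
    if not tags or None in tags:
        return None  # empty phrase, or a word missing from the toy lexicon
    if tags == ["NOUN"]:
        return "noun"
    if tags == ["ADJ", "NOUN"]:
        return "adjective + noun"
    # Compound noun: two or more nouns, optionally joined by "of".
    nouns_only = [tag for tag in tags if tag != "ADP"]
    starts_and_ends_with_noun = tags[0] == "NOUN" and tags[-1] == "NOUN"
    if (
        len(nouns_only) >= 2
        and all(tag == "NOUN" for tag in nouns_only)
        and starts_and_ends_with_noun
    ):
        return "compound noun"
    return None


for candidate in ["Investigator", "Faraday Bag", "Chain of Custody", "Acquired Data"]:
    print(candidate, "->", recognize(candidate))
```

A phrase that matches one of the patterns is only a candidate; whether it is related to the MF domain still has to be decided by the analyst.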

Selection and identification of common concepts
In this step, we identified the common concepts among those extracted in step 3 (725 concepts in total), based on concepts that have been widely used in the domain of MF. Some concepts used different names with a similar meaning. For example, the concepts "Incident" in models [43,50], "Case" in model [52] and "Crime" in models [45,47,48,55,59,61] have a similar meaning. Hence, we grouped these concepts into one common concept, "Crime", as shown in Table 4. In addition, concepts that have a single name, such as "Securing Scene" in models [42,45,46,48,59,61], are considered common concepts. The remainder of the selection of common concepts is shown in S1 Table. For concepts that shared the same meaning, we used three features to select the name of the common concept: Frequency (occurrence), Generality and Definition. Therefore, if two or more concepts have similar meanings, the concept name that ranks highest on frequency, generality and definition is selected for inclusion in the metamodel, while the other names are excluded. For example, the shared meaning of the concepts Classification, Identification and Recognition is: "used by investigator to identify type of mobile device and its operating system, people in the crime scene, external data storage and potential evidence sources". The concept "Identification" is selected as the common concept because it occurs in more models than Classification and Recognition. Hence "Identification" is included in the metamodel and Classification and Recognition are excluded. Indeed, the main priority for selecting the common concept is the high frequency (occurrence) of the concept among all models. The outcome of this step is the selection of 82 common concepts, as shown in S1 Table.
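The frequency criterion can be illustrated with the "Crime" example. The sketch below is a simplification under stated assumptions: the function name `select_common_name` is hypothetical, only frequency is modelled (Generality and Definition are judged by the analyst and are not encoded here), and ties are resolved arbitrarily by taking the first name with the top count. The model numbers are the citation numbers given in the text.

```python
# Models (by citation number) in which each same-meaning name appears,
# as listed in the text for the Incident / Case / Crime group.
occurrences = {
    "Incident": [43, 50],
    "Case": [52],
    "Crime": [45, 47, 48, 55, 59, 61],
}


def select_common_name(name_to_models):
    """Pick the name used by the most distinct models (frequency criterion only)."""
    return max(name_to_models, key=lambda name: len(set(name_to_models[name])))


common = select_common_name(occurrences)
print(common)  # Crime
```

Because "Crime" appears in six models against two for "Incident" and one for "Case", it is the name retained for the group.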

Short-listing and reconciliation of definitions
In this step, we provide a short list of definitions for every concept selected in step 4. A harmonization of the definitions in the metamodel is required when two or more concepts have the same definition, or when two or more concepts have the same concept name. The definition chosen for each concept must be precise and widely agreed in the mobile forensic domain [39,82]. In addition, differences between definitions were reconciled to ensure an internally consistent set of metamodel terms. Definitions were chosen based on consistency with earlier choices, where possible; otherwise, hybrid definitions created from multiple sources were introduced. Where two or more sources used a concept definition differently, we selected the usage that was most coherent with the rest of the chosen concepts, trying at all times to preserve generality. For instance, the concept of "Documentation" was defined differently in five models: Kaur [62] defines it as "Document all the steps". Ayers [42] defines it as "an essential activity in providing individuals the ability to re-create the process from beginning to end and documenting the crime Scene (Photographing, Sketching, and Recording)". Dancer [52] defines it as "an activity that takes place within each phase of forensics investigation and therefore should not be a standalone activity in any forensics examination". Mumba [50] defines it as "a process to improve efficiency by ensuring documentation of all steps is clearly undertaken during a mobile forensic investigation". Ramabhadran [45] defines it as "a continuous activity required in all the stages and is quite critical for maintaining proper chain of custody". As a result, the concept of "Documentation" in our metamodel is defined as "a continuous activity required in all the phases of mobile forensic and used for documenting the crime scene through (Photographing, Sketching and Recording)".
A sample of the list of short definitions is shown in Table 5. The rest of the concept definitions are shown in S2 Table.

Classification of common concepts
In this step, the selected concepts are classified into one of the MF phases: Preservation, Acquisition, Examination & Analysis and Reporting [5,42]. The classification into phases is shown in Table 6.

Relationship identification among concepts
In this step, we determine the relationships between our MFM concepts. Mobile forensics investigation has four common phases: preservation, acquisition, examination and analysis, and reporting. Therefore, the resultant MFM is represented in four different diagrams: the Preservation-phase, the Acquisition-phase, the Examination and Analysis-phase and the Reporting-phase. Figs 4-7 illustrate our initial MFM 1.0 diagrams for each phase. The resultant metamodel includes the relationships between concepts and represents the semantics of the MF domain. We established the relationships between concepts based on the semantics discovered and identified during the survey of MF models. We used three kinds of relationship symbols: Association, Specialization and Aggregation. Association indicates functional relationships between concepts. Specialization represents hierarchies between concepts using the relationship 'Is A Kind Of'. Aggregation represents relationships between concepts that are composed of other concepts using the relationship 'Is A Group Of'. For example, the Acquisition-phase class (Fig 5) has a central concept, ForensicLab. The aggregation symbol is used to describe the relationships between ForensicLab and other concepts including Extraction, ForensicTool and ForensicExaminer. Another example is the association that describes the relation between the 'Evidence' and 'Presentation' concepts in the Reporting-phase class (Fig 7). The relationship between the 'InternalMemory' and 'VolatileEvidence' concepts is represented using 'Is A Kind Of' in the Acquisition-phase class (Fig 5).
MF is a continuous process with activities linking phases at different points. Correspondingly, in our MFM, relationships are identified not only among concepts within the same phase, but also among concepts from different phases. Concepts from classes in different phases can be linked so that a continuous MF process is formed. Linkages across phases are established either through relationships among concepts from different phases or through common concepts among phases. For example, an association relationship 'Requires' can link the concept "ForensicTool" (from the Acquisition-phase) to the concept "Preparation" (from the Preservation-phase). Another example is the association relationship 'Requires' that links the concept "Evidence" in the Reporting-phase class to the "Collection" concept in the Preservation-phase class. Table 7 illustrates examples of relationships that link concepts from different phases. Additionally, linkages across phases are also established through common concepts between phases. The use of the concept "Crime" shows that the investigation task should start from the preservation phase in the mobile forensic investigation process, while the use of the concept "Documentation" shows that the four phases require overlapping sets of documentation for their phase activities.
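One way to picture the three relationship kinds and the cross-phase links is as a small typed edge list. The sketch below is an assumption-laden illustration rather than the MFM itself: the class names `Kind` and `Relationship` are hypothetical, the direction of each edge follows the wording of the examples above, and Extraction and ForensicExaminer are assumed to belong to the Acquisition-phase because the aggregation is described for that class. The specialization example is left out of the data because the text does not state its direction.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    ASSOCIATION = "Association"
    SPECIALIZATION = "Is A Kind Of"
    AGGREGATION = "Is A Group Of"


@dataclass(frozen=True)
class Relationship:
    source: str
    source_phase: str
    target: str
    target_phase: str
    kind: Kind
    label: str = ""  # association name, when the text gives one

    def crosses_phases(self):
        return self.source_phase != self.target_phase


ACQ, PRES, REP = "Acquisition", "Preservation", "Reporting"

relationships = [
    # ForensicLab aggregates other Acquisition-phase concepts (phase membership
    # of Extraction and ForensicExaminer is an assumption).
    Relationship("ForensicLab", ACQ, "Extraction", ACQ, Kind.AGGREGATION),
    Relationship("ForensicLab", ACQ, "ForensicTool", ACQ, Kind.AGGREGATION),
    Relationship("ForensicLab", ACQ, "ForensicExaminer", ACQ, Kind.AGGREGATION),
    # Association inside the Reporting-phase (no label is given in the text).
    Relationship("Evidence", REP, "Presentation", REP, Kind.ASSOCIATION),
    # Cross-phase 'Requires' associations.
    Relationship("ForensicTool", ACQ, "Preparation", PRES, Kind.ASSOCIATION, "Requires"),
    Relationship("Evidence", REP, "Collection", PRES, Kind.ASSOCIATION, "Requires"),
]

cross_phase = [r for r in relationships if r.crosses_phases()]
for r in cross_phase:
    print(f"{r.source} ({r.source_phase}) -{r.label}-> {r.target} ({r.target_phase})")
```

Filtering on `crosses_phases()` returns exactly the two 'Requires' links, which is the kind of cross-phase linkage that Table 7 is said to illustrate.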

Metamodel validation
In this section, we discuss the validation process of our proposed MFM. The purpose of the validation process is to measure the soundness and quality of the proposed metamodel [88]. A metamodel requires validation to meet the requirements of generality, expressiveness and completeness of the artifact, and to ensure the completeness and correctness of the proposed metamodel. For the validation process, the following two commonly used techniques [89,90] were used:
• Comparison with other models: this technique is used to verify that each concept of a validation model can be represented with some of the metamodel concepts. In this technique, we added some concepts to the metamodel.
• Frequency-based selection: the purpose of this validation technique is to verify the frequency of the metamodel concepts appearing in a set of models. In this technique, we deleted some concepts from the metamodel.
These validation techniques are described in the next subsections.

Comparison with other models.
The purpose of this validation technique is to ensure that each model included in Set V1 is represented in MFM (shown in S3 Table). If a concept of some model in Set V1 could not be represented in MFM, we consider this concept a candidate to add to MFM. In this process, we added four new concepts to MFM: Hypothesis, Imaging, DataExamined and Archiving. Table 8 illustrates these new concepts. The relationships between the new concepts and the concepts that comprise the MFM are shown in Table 9. The outcome of this technique was version MFM 1.1.
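The core of this comparison is a coverage check: any concept of a validation model that has no counterpart in the metamodel becomes a candidate for addition. The sketch below shows the idea as a set difference. It is a deliberate simplification: the paper matches concepts by meaning, whereas this hypothetical `candidate_concepts` helper matches by exact name, and both concept sets are invented for illustration (only the names Hypothesis and Imaging echo concepts the text reports as added).

```python
def candidate_concepts(metamodel_concepts, validation_model_concepts):
    """Concepts of a validation model that have no counterpart in the metamodel."""
    return sorted(set(validation_model_concepts) - set(metamodel_concepts))


# Hypothetical excerpts, not the real MFM 1.0 or a real Set V1 model.
mfm_excerpt = {"Crime", "Evidence", "Extraction", "Documentation"}
validation_model = {"Crime", "Evidence", "Hypothesis", "Imaging"}

missing = candidate_concepts(mfm_excerpt, validation_model)
print(missing)  # ['Hypothesis', 'Imaging']
```

An empty result for every model in the validation set would indicate that the metamodel already represents all of their concepts.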

Frequency-based selection.
We used 10 models (Set V2 in Table 2) to perform this validation. The purpose of this technique is to evaluate the importance of individual concepts in the model developed [91]. This technique performs two tasks. In the first task, we collect concepts from the models of Set V2 and compare them with the concepts in MFM 1.1, as shown in S4 Table. In this task, not all phases changed to the same extent; for example, the Preservation-phase of MFM 1.1 only gained the Collection concept, as shown in Fig 12. The second task of frequency-based selection is to score each concept according to its frequency. Concepts with a low score are revisited and are liable for deletion. To estimate an importance value for each concept in MFM, we used the 'Degree of Confidence (DoC)'. This value identifies the expected probability that an MFM concept is used in a randomly chosen mobile forensic model. Five categories of concepts are defined based on their DoC, ranging from Very Strong (DoC value: 100-70%) to Very Mild (DoC value: 10-0%).
Very Strong refers to a concept that appears many times in the Set V2 models, while Very Mild is the other end of the scale. For example, the MFM concept Identification has a strong DoC value of 80%. Tables 10-13 have three main parts. The left part of each table contains the concepts for each phase in MFM 1.1. The middle part contains the 10 models of Set V2 whose concepts were compared against the concepts of MFM 1.1. The right part contains the concept frequency (score) for each concept. Each row of these tables corresponds to one concept of the phase in MFM 1.1.
Table 12. Frequency result of Examination and Analysis-phase concepts.
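A DoC score of this kind can be sketched as the share of validation models in which a concept appears. The formula below is an assumption inferred from the description of DoC as a usage probability and from the 80% example; the function names `degree_of_confidence` and `doc_band` are hypothetical, the model contents are invented, and only the two band boundaries stated in this section (Very Strong and Very Mild) are encoded.

```python
def degree_of_confidence(concept, models):
    """Percentage of models that contain the concept (assumed DoC formula)."""
    if not models:
        raise ValueError("at least one model is required")
    hits = sum(1 for concepts in models if concept in concepts)
    return 100.0 * hits / len(models)


def doc_band(doc):
    """Map a DoC percentage to a band; only the two stated bands are encoded."""
    if doc >= 70:
        return "Very Strong"
    if doc <= 10:
        return "Very Mild"
    return "intermediate (band boundaries not listed in this section)"


# Ten hypothetical validation models: eight mention Identification, two do not.
models = [{"Identification", "Crime"}] * 8 + [{"Crime"}] * 2

doc_identification = degree_of_confidence("Identification", models)
doc_unused = degree_of_confidence("SomeUnusedConcept", models)
print(doc_identification, doc_band(doc_identification))
print(doc_unused, doc_band(doc_unused))
```

With ten models, a concept found in eight of them scores 80%, and a concept found in none scores 0% and would be revisited as a candidate for deletion.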
In Tables 10-13, we compared each concept of the Preservation, Acquisition, Examination & Analysis and Reporting phases against the models of Set V2 to find the frequency of each concept in these models. The results show that the concepts EnvironmentalEffect, FirstResponder, InvestigationStrategy, Sketching and Shock in the Preservation-phase (Table 10) have a low score, whereas concepts such as Crime, MobileDevice, Documentation and Investigator have a high score. In Table 11, the Acquisition-phase has two concepts with a low score. In Table 12, concepts such as AnalysisData, ExaminationData, ForensicTool and Documentation have a higher score in the Examination & Analysis phase. In Table 13, the concepts Evidence, Result, Investigator and CourtOfLaw are examples of concepts with a high score, whereas concepts such as Archiving, Conclusion and TechnicalExpert have a low score in the Reporting-phase. Concepts with a higher score are more important in the MF domain. In contrast, concepts that have a low score are revisited and are liable for deletion. The concept Tampering is deleted because its DoC value was zero, which means this concept is rarely recognized in the mobile forensic models. By revisiting MFM, it is found that the other four low-scoring concepts (EnvironmentalEffect, PatternMatching, TechnicalExpert and LegalExpert) are to be kept, as they are common across varying MF domains.
As a result of frequency-based selection, the classes for the Preservation and Examination & Analysis phases have been changed, whereas the classes for the Acquisition and Reporting phases remain unchanged. Many people who are directly (e.g., forensic investigators, cybersecurity agencies, police officers) or indirectly (e.g., law enforcement agencies, IT companies) involved in mobile forensic operations generally do not have a complete view of how different mobile forensic activities can be conducted. Through its four sets of classes (preservation, acquisition, examination & analysis and reporting), MFM can provide a picture of how all mobile forensic actions should be performed. Additionally, the developed metamodel facilitates the sharing of MF knowledge. It presents a new metamodeling-based approach to guide mobile forensics practitioners on how to conduct the mobile forensics investigation process properly. It is a specific artifact that describes a mobile forensics language. As the MFM offers a modelling guideline to many domain users, various users can quickly find decision solutions from semantic models. Moreover, the resultant metamodel provides investigators with logical and sensible investigation concepts that may be needed during the investigation process. Most of the concepts and terminologies of the mobile forensics domain were used in the MFM.

Conclusion
The issues and challenges of mobile forensics investigation have been presented and discussed in this paper. Based on our observation, the lack of knowledge management in mobile forensics has led to certain problems in this domain. These are i) the difficulty of investigation for new investigators, ii) ambiguity in the concepts and terminologies of mobile forensics and iii) the difficulty in understanding the various processes involved in this domain. To overcome these issues, the metamodeling approach has been selected and discussed briefly in this paper. We used 21 models (Set I) for the initial development of MFM. In the second iteration, 10 models (Set V1) were used for validation (using the technique of comparison against other models) to identify any missing concepts in the initial version of the metamodel and to ensure its broad coverage. In the third iteration, we used another 10 models (Set V2) for a second validation (using frequency-based selection) to evaluate the importance of individual concepts. These two validations improved the expressiveness and the completeness of the concepts in MFM. Our MFM contributes to increasing the knowledge of both internal and external stakeholders in the digital forensics domain. We hope that the MFM will help increase the efficiency of mobile forensic investigation in various forensic agencies. The MFM presents the required concepts that could assist designers in modelling the respective aspects when designing a mobile forensic enabled system and service.
Our future work based on results gathered from this paper is to continue to develop a repository based on the MFM to store MF knowledge and to allow a responsive and flexible MF approach.
Supporting information S1