A model-guided symbolic execution approach for network protocol implementations and vulnerability detection

Formal techniques have been devoted to analyzing whether network protocol specifications violate security policies; however, these methods cannot detect vulnerabilities in the implementations of the network protocols themselves. Symbolic execution can be used to analyze the paths of the network protocol implementations, but for stateful network protocols, it is difficult to reach the deep states of the protocol. This paper proposes a novel model-guided approach to detect vulnerabilities in network protocol implementations. Our method first abstracts a finite state machine (FSM) model, then utilizes the model to guide the symbolic execution. This approach achieves high coverage of both the code and the protocol states. The proposed method is implemented and applied to test numerous real-world network protocol implementations. The experimental results indicate that the proposed method is more effective than traditional fuzzing methods such as SPIKE at detecting vulnerabilities in the deep states of network protocol implementations.


Introduction
Network protocol implementations are often prone to vulnerabilities, and formal verification techniques, which analyze specifications, cannot address problems in the implementations. Fuzz testing and symbolic execution are widely applied to detect vulnerabilities in network protocol implementations. However, these methods have difficulty reaching the deep states of stateful network protocols, which involve complex interactions and state transitions, because they do not fully exploit the packet interaction and state transition information.
In this article, we propose a novel approach that uses an FSM model to guide the symbolic execution. More precisely, we utilize the L* [1] online learning algorithm to construct the FSM model. The FSM for a stateful protocol is presented in Fig 1. We first build a prototype model-guided symbolic execution system to explore the protocol states and detect vulnerabilities in the deep states of the protocol. Then, we use the prototype system to test several real-world network protocol implementations and compare this system with the traditional fuzzing tool SPIKE.

Related work
In this section, we briefly present research works related to the current study [2,3]. Fuzzing has been used to detect network protocol implementation vulnerabilities for more than 20 years [4]. Several tools exist that are specifically aimed at network protocol implementations. SPIKE [5], developed by Dave Aitel, is a framework that provides an API to assist in creating fuzzers for network protocol implementations. PROTOS [6], developed by the Oulu University Secure Programming Group, generates input packets intelligently based on protocol specifications. However, these methods do not model stateful protocols and cannot reach the deep states of network protocols.
Symbolic execution is a powerful technique for analyzing program behavior, identifying bugs, and generating tests [7]. The main concept underlying symbolic execution is to use symbolic input values instead of concrete input values. This approach treats the paths as symbolic constraints and solves the constraints to produce a concrete input as a test case. Symbolic execution has been applied to test network protocol implementations. SymNV [8] and SymbexNet [9], developed by JaeSeung Song, combine symbolic execution with rule-based specifications based on KLEE [10], a symbolic execution engine. KleeNet [11] integrates KLEE to detect vulnerabilities in wireless sensor networks. SymNet [12], proposed by Raimonds Sasnauskas, is a testing environment for unmodified protocol implementations running on diverse operating systems. It was designed on top of the S2E platform [13], runs on the QEMU [14] virtual machine, and adopts KLEE as its symbolic engine. Traditional fuzzing and symbolic execution methods do not make full use of the protocol state information; thus, they have difficulty reaching the deep states of network protocol implementations. Jingling Zhao [15] proposed a smart fuzzing algorithm based on a regression finite state machine (RFSM-Fuzzing) that can test the robustness of wireless network protocols and find potential flaws. MACE [16] uses concolic execution [17] to build an abstract FSM model to guide further state exploration. Our previous work [18], which combines network analysis and binary reverse engineering, is an advanced fuzz testing method for detecting vulnerabilities in network protocols. However, because it does not make full use of the protocol state information, this method cannot reach the deep states of the network protocol. As a result, it cannot detect the vulnerabilities related to the deep states.

Protocol inference
A state machine model is the standard way to model a protocol. To infer the protocol, state machine inference is used to extract a state machine model from each network protocol implementation. This approach learns the state machine by sending network protocol packets and observing the response packets. There are two types of FSMs: the Moore machine [19] and the Mealy machine [20]. The Moore machine's outputs depend solely on the current state, while the Mealy machine's outputs depend on both the current state and the inputs. Because a protocol implementation interacts with its environment through input and output packets, the Mealy machine is more suitable for protocol inference.
Inference algorithm: L* was the first automata learning algorithm to use an active approach [1], and it is still the most widely used active approach in formal verification. Niese proposed a modified version of the L* algorithm to infer Mealy machines [21] that involves a teacher and a learner. The teacher possesses knowledge about a deterministic Mealy machine, while the learner has no knowledge about the Mealy machine except for its inputs and outputs. The learner learns about the Mealy machine by querying the teacher.
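The Mealy behaviour described above can be illustrated with a minimal Python sketch. The toy protocol (states `INIT` and `READY`, messages `HELLO` and `DATA`, and the responses) is purely hypothetical and is not taken from any protocol studied in this paper; it only shows that each output is determined by the pair (current state, input).

```python
from dataclasses import dataclass, field


@dataclass
class MealyMachine:
    """Deterministic Mealy machine: outputs depend on (state, input)."""
    initial: str
    # Maps (state, input symbol) -> (next state, output symbol).
    transitions: dict = field(default_factory=dict)

    def run(self, inputs):
        """Feed a sequence of inputs from the initial state; return the outputs."""
        state = self.initial
        outputs = []
        for symbol in inputs:
            state, out = self.transitions[(state, symbol)]
            outputs.append(out)
        return outputs


# Hypothetical two-state protocol, used only for illustration.
toy = MealyMachine(
    initial="INIT",
    transitions={
        ("INIT", "HELLO"): ("READY", "OK"),
        ("INIT", "DATA"): ("INIT", "ERR"),
        ("READY", "HELLO"): ("READY", "ERR"),
        ("READY", "DATA"): ("READY", "ACK"),
    },
)

# The same DATA packet is answered differently depending on the state.
print(toy.run(["DATA", "HELLO", "DATA"]))  # ['ERR', 'OK', 'ACK']
```

A Moore machine would instead attach one output to each state, which fits request/response protocols less naturally.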
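The teacher and learner roles can be sketched as follows. This is not the full L* algorithm (closedness checks, consistency checks, and equivalence queries are omitted) and it is not the LearnLib API; it only illustrates, under these assumptions, how a learner fills an observation table with output queries and groups prefixes with identical rows into candidate states. The hidden machine and all names are hypothetical.

```python
class Teacher:
    """Answers output queries about a hidden deterministic Mealy machine."""

    def __init__(self, initial, transitions):
        self.initial = initial
        self.transitions = transitions  # (state, input) -> (next state, output)
        self.queries = 0

    def output_query(self, word):
        """Reset the machine, send `word`, and return the output of the last input."""
        self.queries += 1
        state, out = self.initial, None
        for symbol in word:
            state, out = self.transitions[(state, symbol)]
        return out


def fill_table(teacher, prefixes, suffixes):
    """Observation table: the row of a prefix lists the outputs seen for each suffix."""
    return {
        p: tuple(teacher.output_query(p + s) for s in suffixes)
        for p in prefixes
    }


# Hypothetical hidden machine; the learner only sees inputs and outputs.
teacher = Teacher(
    initial="INIT",
    transitions={
        ("INIT", "HELLO"): ("READY", "OK"),
        ("INIT", "DATA"): ("INIT", "ERR"),
        ("READY", "HELLO"): ("READY", "ERR"),
        ("READY", "DATA"): ("READY", "ACK"),
    },
)

alphabet = ("HELLO", "DATA")
prefixes = [(), ("HELLO",), ("DATA",)]   # access sequences tried so far
suffixes = [(a,) for a in alphabet]      # distinguishing suffixes

table = fill_table(teacher, prefixes, suffixes)
candidate_states = set(table.values())   # prefixes with equal rows are merged
print(len(candidate_states))             # 2
```

In a real setting, each output query corresponds to resetting the implementation under test and exchanging packets with it, which is why the number of queries matters.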

Symbolic execution
James C. King proposed symbolic execution in 1976 [22]. It takes input in the form of symbolic values rather than concrete values. The path constraints are the branch conditions along each path, which depend on the symbolic values. They are collected as expressions over the symbolic values from the starting point to the current point. The constraints are resolved by the constraint solver, which yields a concrete value as an input. The concrete input drives program execution to the current point.
Symbolic execution is more efficient than testing with concrete inputs: each path needs to be tested only once, because a single symbolic value has the same effect as testing the path with every concrete value that satisfies the path constraints. An example of symbolic execution is shown in Figs 2 and 3: when symbolic execution results in 5, the symbolic expression is (x < 17) ∧ (x > 10 ∧ x < 20), which is the same as executing the path with the concrete values x = 11, 12, 13, 14, 15, 16.
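The correspondence between the path condition and the concrete values can be checked with a few lines of Python. A symbolic execution engine would pass the constraint to a solver; this sketch instead enumerates a small integer range (the bounds -100 to 100 are an arbitrary assumption) to list every value that satisfies it.

```python
def path_condition(x: int) -> bool:
    """Path condition from the example: (x < 17) and (x > 10 and x < 20)."""
    return (x < 17) and (x > 10 and x < 20)


# Brute-force stand-in for a constraint solver over a small, assumed range.
solutions = [x for x in range(-100, 101) if path_condition(x)]
print(solutions)  # [11, 12, 13, 14, 15, 16]
```

Any one of these values is a valid test case for the path, which is why a single symbolic run covers all of them.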

Framework design
This study presents a new model-guided symbolic execution approach to detect flaws in network protocol implementations. The basic concept is to link the program paths and states of the protocol using an FSM to guide the symbolic execution. This approach helps in exploring the deep states of network protocol implementations. Fig 4 presents the framework of our method, which consists of the following components: a message format extractor, a protocol model extractor, an input packet injector, a model-guided symbolic executor, and an exception monitor.
The method proceeds as follows: first, we extract the message formats from the protocol specification of the target network protocol implementation. Second, we automatically infer an abstract FSM of the network protocol implementation. Third, after acquiring the message formats and the protocol model, we use the message formats to construct symbolic packets, which are used as the input of symbolic execution, and we leverage the protocol model to guide the symbolic execution to improve the coverage of both program paths and protocol states. Finally, we monitor the symbolic execution procedure; when exceptions occur, we report the crashes and record the test cases.

Message format extractor
A Request for Comments (RFC) is a document that describes a protocol specification. The document contains descriptions of the services, the types and formats of the messages exchanged, and the rules governing the reactions of the entities involved. Network developers have implemented many different network protocol software applications based on these standard protocol specifications. The relationship between the protocol description, its specification, and the implementation of the ISAKMP protocol is shown in Fig 5. This protocol is described in RFC2408 [23] and has been implemented in Openswan, Wireshark, Adaptive Security Appliances, and OpenBSD. The packet format of an ISAKMP header is depicted in  We analyze the RFC documents of the target network protocol implementations by focusing on the description of the message formats. In our work, the message formats are manually extracted from the RFC documents.
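As an illustration of what a manually extracted message format can look like in code, the sketch below parses the fixed 28-byte ISAKMP header defined in RFC 2408 (two 8-byte cookies, next payload, version, exchange type, flags, message ID, and length). The class and function names are our own assumptions, and the sample packet is constructed by hand rather than captured from a real exchange.

```python
import struct
from typing import NamedTuple

# Network byte order: two 8-byte cookies, four 1-byte fields, two 32-bit fields.
ISAKMP_HEADER = struct.Struct("!8s8sBBBBII")


class IsakmpHeader(NamedTuple):
    initiator_cookie: bytes
    responder_cookie: bytes
    next_payload: int
    major_version: int
    minor_version: int
    exchange_type: int
    flags: int
    message_id: int
    length: int


def parse_isakmp_header(data: bytes) -> IsakmpHeader:
    """Parse the fixed ISAKMP header at the start of `data`."""
    if len(data) < ISAKMP_HEADER.size:
        raise ValueError("packet is shorter than the ISAKMP header")
    (icookie, rcookie, next_payload, version,
     exchange_type, flags, message_id, length) = ISAKMP_HEADER.unpack_from(data)
    # The version byte holds the major version in the high 4 bits
    # and the minor version in the low 4 bits.
    return IsakmpHeader(icookie, rcookie, next_payload,
                        version >> 4, version & 0x0F,
                        exchange_type, flags, message_id, length)


# Hand-built sample header (illustrative values only).
sample = (bytes(range(8)) + bytes(8) + bytes([1, 0x10, 2, 0])
          + (0).to_bytes(4, "big") + (28).to_bytes(4, "big"))
header = parse_isakmp_header(sample)
print(header.major_version, header.minor_version, header.length)  # 1 0 28
```

Field offsets obtained this way are what later allow individual header fields to be selected as symbolic.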

Protocol model extractor
We leverage the L* inference algorithm to infer the FSM model for the network protocol implementation. L* probes a black box with a sequence of messages, listens to the responses, and builds an FSM from the responses [16]. The component called the protocol model extractor learns an abstract model of the target implementation of the network protocol by producing sequences of messages used as input. The produced input is further modified based on the network protocol implementation and its output. The proposed model utilizes the message format extracted by the message format extractor component to generate many input messages as the starting inputs to infer the first FSM. Subsequently, the FSM guides the symbolic execution to explore the possible states of the network protocol implementation. The symbolic execution produces numerous input and output messages. These messages are employed to further infer and evolve the FSM, and the process repeats. If the target is a known protocol, we can acquire the RFC documents and use them to assist in inferring the FSM, which produces a more accurate model. The FSM of the DHCP protocol is illustrated in Fig 7.

Input packet injector
To leverage symbolic execution for network protocol implementations, we need to inject the input packet into memory by intercepting the network-related functions and their arguments (Table 1). In this way, we can inject the input packet into memory without changing or recompiling the source code of the network protocol implementations.
After injecting the input packet into memory, the packets are marked as symbolic packets by replacing some of the bytes of concrete input packets with symbolic values. To reduce the path explosion problem, specific interesting bytes can be selected for replacement. Fig 8 depicts an example of packet marking in which the ID field is targeted for symbolic values.
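The marking step can be sketched in Python as follows. The `Sym` placeholder type, the function name, and the field offsets are hypothetical; in the actual system the marking is performed on guest memory by the symbolic execution engine, not on a Python list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sym:
    """Placeholder for one symbolic byte (hypothetical stand-in for an engine value)."""
    name: str


def mark_symbolic(packet: bytes, offset: int, length: int, label: str):
    """Return the packet as a list in which the chosen field is symbolic.

    Bytes outside [offset, offset + length) keep their concrete values, which
    limits the number of paths the engine has to explore.
    """
    if offset < 0 or length < 0 or offset + length > len(packet):
        raise ValueError("field lies outside the packet")
    cells = list(packet)
    for i in range(length):
        cells[offset + i] = Sym(f"{label}_{i}")
    return cells


# Illustrative 5-byte packet whose 2-byte "id" field starts at offset 2.
packet = bytes([0x01, 0x02, 0xAA, 0xBB, 0x03])
marked = mark_symbolic(packet, offset=2, length=2, label="id")
print(marked)  # [1, 2, Sym(name='id_0'), Sym(name='id_1'), 3]
```

Choosing a narrow field keeps the rest of the packet well-formed, so execution gets past early validity checks before the symbolic bytes start to influence branching.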

Model-guided symbolic executor
The model-guided symbolic execution leverages the FSM to guide the symbolic execution to concentrate on exploring the interesting states. With the FSM, we can easily obtain an input that leads to a given state. As shown in Figs 9 and 10, the conventional symbolic execution approach can become stuck in the program paths of state S1; however, the model-guided symbolic execution leverages the FSM to inject packet 2, causing the program to transition to state S2. Similarly, if we intend to explore state S4, we can inject packet 4.
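One way to obtain the packets that lead to a target state is a breadth-first search over the inferred FSM, sketched below. The transition table is a hypothetical four-state example chosen to resemble the scenario described above; it is not the exact model shown in the figures.

```python
from collections import deque


def packets_to_reach(transitions, initial, target):
    """Return a shortest packet sequence driving the FSM from `initial` to `target`.

    `transitions` maps (state, packet) -> next state. Returns None if the
    target state is unreachable.
    """
    queue = deque([(initial, [])])
    seen = {initial}
    while queue:
        state, path = queue.popleft()
        if state == target:
            return path
        for (src, packet), dst in transitions.items():
            if src == state and dst not in seen:
                seen.add(dst)
                queue.append((dst, path + [packet]))
    return None


# Hypothetical FSM, for illustration only.
fsm = {
    ("S1", "packet1"): "S1",
    ("S1", "packet2"): "S2",
    ("S2", "packet3"): "S3",
    ("S2", "packet4"): "S4",
}

print(packets_to_reach(fsm, "S1", "S2"))  # ['packet2']
print(packets_to_reach(fsm, "S1", "S4"))  # ['packet2', 'packet4']
```

The returned packets would be replayed concretely to bring the implementation into the target state, after which the next packet is marked symbolic and explored.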

Exception monitor
The exception monitor component tracks the test cases and produces a crash report. This module leverages dynamic kernel instrumentation to catch unhandled exceptions. It also tracks CPU states and the execution context, which are useful for analysis in the later stages. When a network protocol implementation does not handle an exception, the default exception handling functions are called. In the proposed work, these functions are used to track test cases and produce the crash report. On Windows systems, we intercept the default exception handler function, UnhandledExceptionFilter. On *nix systems, we intercept the exit function of the process, whose address can be acquired from the System.map file.
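Looking up a function address in System.map can be sketched as below. Each line of the file has the form "address type symbol"; the sample content, its addresses, and the choice of `do_exit` as the symbol to look up are assumptions made for illustration only.

```python
def load_symbols(system_map_text: str) -> dict:
    """Parse System.map content into a {symbol name: address} dictionary."""
    symbols = {}
    for line in system_map_text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        address, _kind, name = parts
        # Keep the first occurrence if a name appears more than once.
        symbols.setdefault(name, int(address, 16))
    return symbols


# Made-up excerpt; real addresses differ per kernel build.
SAMPLE_SYSTEM_MAP = """\
c1000000 T _text
c1034a50 T do_exit
c1034f10 T do_group_exit
"""

symbols = load_symbols(SAMPLE_SYSTEM_MAP)
print(hex(symbols["do_exit"]))  # 0xc1034a50
```

The resolved address is what an instrumentation hook would be registered against so that process termination can be tied back to the test case that triggered it.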

Implementation
The proposed method is implemented on top of S2E, a full-system selective symbolic execution platform [13]. S2E reuses parts of the QEMU virtual machine [14], the KLEE symbolic execution engine [10], the LLVM tool chain [24], and the STP constraint solver [25]. This paper proposes an extension to S2E to leverage model-guided symbolic execution and find flaws in network protocol implementations. To infer the state machines for the network protocol implementations, we adopt LearnLib [26], which uses a modified version of the L* algorithm.

Evaluation
We evaluate the effectiveness of our proposed method by testing several known vulnerabilities in real-world File Transfer Protocol (FTP) implementations (S1 File). We compare our proposed method with the general fuzzing tool SPIKE (S2 File).
The experiments were conducted by hosting the QEMU virtual platform on two machines, namely, a host machine and a guest machine. The host machine includes an Intel Core i7-  Comparisons of the results of the prototype system and SPIKE for vulnerability detection are shown in Table 2. The empirical results indicate that our prototype system can detect the known vulnerabilities and does so more effectively than SPIKE.

Conclusion
This study presented a new method based on model-guided symbolic execution. The proposed method associates program paths with protocol states. It uses an FSM to guide the symbolic execution, allowing it to analyze interesting deep states of network protocol implementations. We built a prototype system and used it to test for several known vulnerabilities that exist in real-world implementations of various network protocols. The empirical results indicate that the proposed method is effective at detecting vulnerabilities.
In future work, we plan to test several other network protocols with our method and to improve its efficiency.