Random domain name and address mutation (RDAM) for thwarting reconnaissance attacks

Network address shuffling is a novel moving target defense (MTD) that invalidates the address information collected by the attacker by dynamically changing or remapping the host’s network addresses. However, most network address shuffling methods are limited by the limited address space and rely on the host’s static domain name to map to its dynamic address; therefore these methods cannot effectively defend against random scanning attacks, and cannot defend against an attacker who knows the target’s domain name. In this paper, we propose a network defense method based on random domain name and address mutation (RDAM), which increases the scanning space of the attacker through a dynamic domain name method and reduces the probability that a host will be hit by an attacker scanning IP addresses using the domain name system (DNS) query list and the time window methods. Theoretical analysis and experimental results show that RDAM can defend against scanning attacks and worm propagation more effectively than general network address shuffling methods, while introducing an acceptable operational overhead.


Introduction
One important phase in most cyber attacks is network reconnaissance, where the attacker attempts to gather information about the potential target, including its IP address, operating system, and network services, searching for vulnerabilities to exploit and attack. It has been reported that attackers can spend an average of 45 percent of their time performing reconnaissance [1]. Therefore, mitigating the network reconnaissance phase is an effective network defense strategy.
In conventional networks, the host's network address and service port remain unchanged throughout the service lifetime, and this static service characteristic gives the attacker sufficient time to scan the target for detailed information and deploy a subsequent attack. However, traditional security systems such as firewalls, intrusion detection systems (IDS), and antivirus software are passive and incomplete in nature, and can detect only known attacks.
Moving target defense (MTD) [2] is a diversity defense that increases the attacker's costs and risk of detection by constantly changing the attack surface of the target, rendering the information collected by the attacker ineffective. Network address and service port shuffling a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 must communicate through a shared symmetric key and a synchronized timestamp, and maintain multiple IPv6 address for each node at any given time. RPAH (Random Port and Address Hopping) [6] is a port and address hopping technique proposed by Dunlop et al.; this method builds port and address hopping tunnels via the address hopping gateway (AHG), port hopping engine (PHG), and port and address hopping gateway (PAHG). This method requires adding several hardware devices in the network and deployment software in server hosts. In addition, in RPAH, the destination port for the client host's connection to the service remains untouched, and therefore, the port hopping does not increase the scanning space. These researchers also proposed TPAH (TAP-based Port and Address Hopping) [7], in which both communication sides generate port and address pair bases on a shared secret key and time-synchronization information. DYNAT (Dynamic Network Address Translation) [1] and APOD (Applications That Participate in Their Own Defense) [8] use address and port hopping approaches to disguise the host's identity; NAH (Network Address Hopping) [9] enhances the security of data by transferring data across multiple data connections named channels. These three works aim to resist sniffer attacks, but they cannot defend against scanbased reconnaissance attacks.
In the mutation pattern, address shuffling does not need to be synchronized, and the client host usually connects to the server host by a DNS request/respond or other third-party mechanisms. NASR (Network Address Space Randomization) [10] provides a LAN-level network address randomization scheme against hit-list worms; this method uses dynamic host configuration protocol (DHCP) to reassign addresses, and relies upon DNS to reestablish connection after the DHCP update by querying the updated address with the host's hostname. This method provides limited unpredictability and mutation speed because the mutation is limited within the network address space and DHCP update reconfiguration. OF-RHM (OpenFlow Random Host Mutation) [11], RHM (Random Host Mutation) [12][13] [14], and spatio-temporal address mutation [15] are proposed by Jafarian et al.; OF-RHM provides an IP mutation technique for a software defined network (SDN). In OF-RHM, the SDN controller dynamically assigns each host a random virtual IP which is translated to/from the real IP of the host, and hosts reach each other via the virtual IP acquired from the DNS request. OF-RHM uses the SDN controller to mutate the above virtual addresses and maintain connections by maintaining flow table entries. RHM implements a similar IP mutation technique in traditional networks. In spatio-temporal address mutation, the ephemeral IP address of the host is spatially changed based on the requestor identity and temporally changed based on the time. The above techniques can resist scanning attacks and worm propagation, but they have limited effects on scanning attacks and cannot prevent worms from spreading because of the limited IP address space. In addition, because the above techniques rely on the host's static domain name for the DNS request/response to establish the connection, they cannot defend against an attacker who already knows the target's domain name. The same limitations apply to the SDN shuffle approach [16], in which the SDN controller orders the server to install network address translation (NAT) rules to translate the dynamic IP address. The SDN shuffle approach modifies all server hosts and the DNS server to cooperate with the SDN controller. Other network address shuffling techniques that employ the mutation pattern include dynamic honeypot networks [17] [18] and MOTAG (MOving Target defense mechanism AGainst Internet denial-of-service attacks) [19], which is proposed against denial-of-service (DDoS) attacks, while our work aims at resisting reconnaissance attacks, increasing the cost to attackers through the dynamic mutation of network addresses and domain names.
Regarding the theoretical analysis of the effectiveness of network address shuffling, Carroll et al. [25] presented probabilistic models for non-repeat scanning in static address and perfect shuffling networks, where perfect shuffling is characterized by uniform shuffling where the change rate is greater than scanning rate. Crouse et al. [26] presented probability models for honeynet based on the above models. Their analysis shows that network address shuffling has a limited defense effect against scanning attacks and that perfect shuffling is beneficial only if the network serves a relatively small number of vulnerable computers. Al-Shaer et al. [12] [13] theoretically analyze the probability of non-repeat scanning in the uniform mutation network, considering the ratio of the attacker scanning rate to the defender mutation rate. Likewise, this analysis is also limited to non-repeat scanning. Several other theoretical researches related to network address shuffling include the general theory of MTD [27] [28], qualitative evaluations of network-based MTD methods [29], and comparisons of different MTD techniques [30].

Basic principles of RDAM
In a statically configured network, hosts communicate through static IP addresses, and therefore the attacker can scan the IP address space, identify a target hit-list after receiving responses, and execute persistent attacks using the hit-list. In a network address shuffling network based on domain name mapping, the IP address of the host constantly changes, and therefore the attacker cannot maintain a target hit-list of effective IP addresses. However, the domain name of the host remains static, and therefore the attacker can maintain a domainbased hit-list to attack. In addition, limited by the limited IP address space, network address shuffling cannot effectively defend against random scanning attacks.
RDAM dynamically changes the host's IP address and domain name based on the SDN architecture. A legitimate client host must first make a DNS request before connecting to the service, and the client can connect to the service only if the correct domain name is requested. The client can retrieve the correct domain names through a security authentication server. Therefore in RDAM, the attacker cannot establish a target list of domain names or addresses, and it is difficult for the attacker to directly access a host in the network through the IP address. While the domain name space is much larger than the address space, RDAM greatly increases the difficulty to an attacker trying to scan the network based on domain names. switches are configured to modify and forward packets quickly; and the controller acts as the central manager of all software-defined switches, managing their flow tables and cooperating with the authoritative DNS server, the DHCP server, and the security authentication server. In RDAM, the authoritative DNS server is authorized for the internal hosts' domain names, and the security authentication server, which applies existing proof-of-work (PoW) schemes [31] [32] to protect the client authentication channel, presents different dynamic domain name lists to authorized users according to their authority.

Basic architecture of RDAM
The internal hosts in RDAM include public hosts that provide services to the external network (Internet) and private hosts that cannot be accessed by the external network. Internal hosts communicate with each other through internal addresses, while public hosts communicate with the external network through external addresses. If private hosts need to access the external network, the gateway must deploy the NAT functionality to realize the communication with the external network.

Address and domain name mutation
In RDAM, the mutation space to be considered includes the internal address space, the external address space, and the domain name space.
Internal addresses and external addresses should be reserved addresses and public addresses in the IPv4 address space, respectively. Denote S R as all reserved addresses and S P as all public addresses; then the internal address space is S I & S R with a size m I = |S I | 2 24 + 2 20 + 2 16 ; the external address space is S E & S P with a size (the number of available public addresses) m E = |S E | that is very limited because of the IPv4 address space allocation strategy.
We use third-level domain (3LD) names for the hosts' domain names, where the 3LD is a random string of length l > 1. Because domain names consist of letters (case-insensitive), numbers, and connectors (not at the beginning or end), then S D is the domain name space with a size of m D = |S D | = 36 2 × 37 l−2 . By setting the appropriate l, m D can be much larger than m E (m D ) m E ). In particular, when l is 6, the size of the domain name space can be the same order of magnitude as the entire IPv4 address space size.
Suppose there are n I internal hosts in the RDAM network Hosts ¼ fh 1 ; h 2 ; . . . ; h n I g, of which there are n E n I public hosts Servers ¼ fs 1 ; s 2 ; . . . ; s n E g. Denote mac i as the MAC address of host h i , and if i as the interface of the software-defined switch to which h i is connected. In particular, denote if E as the interface of the software-defined switch to which the gateway is connected, and GMAC as the MAC address of the gateway's interface connecting to the internal network. Then the address and domain name mutation method can be executed as follows.

Internal address mutation.
In RDAM, the internal address iIP i 2 S I of host h i is the host's real address. In order to protect communications within a physical subnet (a group of hosts physically connected through common switches), the internal addresses of all the hosts in RDAM occupy different subnets of the internal address space so that the internal hosts cannot communicate directly through common switches. Denote L as the mask length of all subnets; then the subnet of iIP i is S I i ¼ subnet iIP i =L, and 8i 6 ¼ j; S I i \ S I j ¼ 0. Therefore, the number of internal hosts supported by RDAM satisfies n I m I /2 32−L .
In order to ensure normal communication and confuse the attacker in RDAM, each subnet S I i has a virtual gateway, the IP address of which is gIP i 2 S I i and gIP i 6 ¼ iIP i , and the MAC address gMAC i is generated randomly. The virtual gateway can respond to some basic protocols such as address resolution protocol (ARP) and Internet control message protocols (ICMP). After the connection to the network has been established, each host will obtain the MAC address of its virtual gateway through ARP protocol for subsequent communications.
The controller can directly obtain DNS requests from internal hosts; therefore in RDAM, for each host h i we record its internal DNS query list Q I i & S I , which is the address list of the internal hosts that have queried the given host's domain name (the address list will be cleared after the domain name changed). Only the connection requests of the hosts in this list are allowed to forward to h i . Therefore, if the attacker does not know the internal host's domain name, then even if the attacker knows the host's internal address, the attacker still cannot access the host, so the internal addresses can mutate over a long period or remain unchanged.
In addition, the internal host can obtain the internal address through DHCP, and can also use a statically configured internal address; however, the subnet of the configured address cannot be repeated by other internal hosts. If hosts in the same physical subnet are configured with the same subnet, then they can directly access each other without protection. Therefore, in RDAM, all the hosts should be configured in separate subnets.

External address mutation.
Because of the recursive query, the controller may not obtain the IP address of the external host in the DNS request from the external network; therefore, it is not possible to identify the external host issuing the DNS query. To address this issue, RDAM uses the time window method to reduce the probability that the attacker can access the public host through the IP address. For each public host s i , we record its external DNS query time t Q i , which is the time of the latest DNS query for its external address eIP i 2 S E from the external network. Only within the following win i time range can external hosts access s i through eIP i .
In order to calculate the time window of each public host, RDAM counts the number of times each public host is DNS-queried by an external host through its domain name to obtain the average speed λ i (t) of the DNS query for s i at time t. It can be assumed that the time interval of external hosts issuing the DNS query for s i follows an exponential distribution with parameter λ i (t). Then the probability that external hosts can access s i through eIP i is In RDAM, we set the above probability of each public host as p. Then the time window of s i is win i ¼ À 1 l i ðtÞ lnð1 À pÞ; that is, the more a public host is DNS-queried by external hosts per unit time, the smaller its time window will be.
In the time window, the attacker could still access the public host through the external address, and therefore the external address of the public host still needs to be dynamically mutated at a high frequency. Denote R E i as the external address mutation rate of public host s i ; then the next external address mutation time t E i is the current mutation time plus 1=R E i . R E i can be pre-configured, and the more critical a public host is, the higher its mutation rate should be. then the next domain name mutation time after time t is nextðtÞ Because the domain name space has sufficient size, it can guarantee that in a sufficient number of domain name mutation cycles, a domain name will not be repeated. Assume the nonrepeat cycle number of the domain name mutation is x; then RDAM records the domain names D x & S D that are assigned in the last x cycles. Assume φ denotes the maximum ratio of the number of allocated domain names |D x | = xn I to the domain name space size; then the non-repeat cycle number of domain names supported by RDAM satisfies x min(m D /n I −1,φm D /n I ).

Packet processing algorithm.
In RDAM, software-defined switches modify and forward transmission control protocol (TCP) and user datagram protocol (UDP) packets according to flow tables; packets that have no matching flows in flow tables and ARP, ICMP, DNS, and DHCP packets are forwarded to the controller. The controller processes the above packets and installs the flows necessary for new connections in the software-defined switches in the path. Each connection must be associated with some fixed flows, thus ensuring end-toend connectivity continuity.
The general packet processing algorithm is presented in Algorithm 1. The flows in the algorithm are the combined description of all software-defined switches in the path, in fact the action of the flows in the source and destination switches should only contain part of the modification action, and the intermediate switches should only contain the forward action. The flow chart of Algorithm 1 is presented in Fig 2.  Random domain name and address mutation (RDAM) for thwarting reconnaissance attacks UDP, and ICMP packets from an internal host to an external address. Connection failure packets include: 1)DNS packets querying currently unused internal domain names; 2) TCP-SYN, UDP, and ICMP packets connecting to currently unused internal or external addresses; 3) TCP-SYN, UDP, and ICMP packets from a source address that is an internal address and is not in the DNS query list of the destination host; and 4) TCP-SYN, UDP, and ICMP packets that the destination host is a public host, the source address is an external address, and the time is not in the time window of the destination host.

Domain name and external address mutation algorithm.
The domain names of all hosts are periodically changed at fixed times, and no assigned domain name will be repeated in x cycles; therefore, the mutation space is S D − D x . The external addresses of all public hosts are changed randomly according to their mutation rate, and no external address that is assigned to mutated hosts can be the same as the external addresses of all non-mutated hosts; therefore, the mutation space is S E À feIP i js i 2 Servers^t E i > nowg. The general domain name and external address mutation algorithm is presented in Algorithm 2.

end for end for
In the random selection of the domain name and external address in the above algorithm, we consider the following two methods: 1) uniform mutation, that is, the domain names and the external addresses are uniformly selected in the mutation space; and 2) failure-prior mutation, that is, the destination domain names and the addresses of the connection failure packets are preferentially selected.

Communication protocol
In RDAM, the client host must use the dynamic domain name of the server host to access the services it provides, and the client host can obtain the dynamic domain name of the server host in two ways: 1) for temporary services, the server host obtains its domain name through rDNS (nslookup) and informs the client host; or 2) for long-term services, the client host accesses the security authentication server to obtain the dynamic domain name list according to its authority, and finds the dynamic domain name of the server host in the list. Fig 3 shows the communication process between internal hosts. After obtaining the domain name of the server host, the client host sends a DNS query with the domain name, the controller and the DNS server response the server host's internal address; the TTL value in the DNS response is the valid time of the current domain name; and the controller records the client host in the internal DNS query list (steps 1-3). The client host can then initiate the connection using the address of the server host, and the software-defined switch forwards the initial packet to the controller (step 4). The controller installs relevant flows in the software-defined switches in the path according to Algorithm 1 (step 5). Future packets of this connection will be modified and forwarded in software-defined switches according to the above installed flows in the flow table (steps 6-10). Fig 4 shows the communication process of an external host requesting a public host's service. In this case, the client host obtains the external address of the server host through a DNS request; the TTL value in the DNS response is the minimum value of the valid time of the current external address and the time window; and the controller records the external DNS query time (steps 1-3). Similar to the above scenario, the initial packet of the client host is forwarded to the controller by a software-defined switch (step 4). According to Algorithm 1, after confirming the time window, the controller installs relevant flows in the software-defined switches in the path (step 5). The rest of the packets will be modified and forwarded in software-defined switches according to these flows (steps 6-10).

Additional techniques for robustness
In order to enhance the availability and security of the network, RDAM also used several other techniques to improve the network functionality and increase defense against reconnaissance attacks.

Application level gateway service (ALG).
When an external host accesses a public host, the destination IP address is different from the actual IP address of the server host. However, in some special application protocols, such as file transfer protocol (FTP), generic routing encapsulation (GRE), etc., the payload contains the server host's IP address to establish connections. Therefore, IP address mutation may disrupt the normal communication of these protocols. RDAM implements application level gateway service (ALG) in the switch connected to the gateway to perform the detection and transformation of the payload when the internal address and the external address of the public host are interchanged, ensuring normal communication through the protocol without requiring the user to implement any special configuration.

Fingerprint mutation.
The attacker can use fingerprinting tools such as Nmap to identify the target host's fingerprint information, such as operating system, set of services, etc., for the purpose of finding a target host's vulnerabilities. RDAM uses the recognition and replacement method to realize the confusion of some fingerprint information [33], including the version of the server software in FTP and the hypertext transfer protocol (HTTP), and the operating system version.

Implementation and evaluation
In order to investigate and validate the feasibility of RDAM, we implemented a RDAM prototype and deployed it in our laboratory network according to Fig 1, in which all the software-defined switches and the controller were implemented in the X86 platform using Intel's DPDK (Data Plane Development Kit) [34]. The network's external address space was a Class C subnet, and there were 36 hosts in the network distributed within three physical subnets. One subnet included a web server and a FTP server, which were both set as public hosts, while in the other subnets, four internal hosts enabled the remote desktop service, of which two hosts were set as public hosts.
We ran several network activities during mutation, including: 1) web browsing, FTP file download, and remote desktop connections of internal hosts and external hosts to public hosts; 2) remote desktop connections of internal hosts to private hosts; and 3) connection of internal hosts to the Internet. All network activities behaved normally, and long-lived connections were maintained during mutations. The network quickly completed the DNS responses and the establishment of connections, while the established data flow packet transmission delay was negligible. Our implementation proved that RDAM is feasible in real networks.
In the following, we will present the defense performance of RDAM against scanning attacks and worm propagation, as well as its cost, through theoretical analysis and simulation experiments.

Scanning attack
Scanning is the primary step in most network attacks. The attacker uses scanning tools, such as Nmap, to discover the active hosts in the target network, and then carries out further attacks. In this paper, we consider the following two typical scanning strategies: 1) non-repeat scans, which never scan the same address twice, usually employed in scanning tools; and 2) repeatable scans, in which the addresses scanned are generated randomly so they may be repeated, usually employed in worms and other malwares. In the following, we will analyze the defense performance of RDAM when using uniform mutation and failure-prior mutation by calculating the expected ratio of hosts missed by the scanner (to total hosts).

Repeatable scans.
In repeatable scans, the address or domain name scanned is randomly generated. Therefore, regardless of which mutation method the defender uses, the probability of the scanner hitting a designated host is the same as the probability in the static address environment.
Here, we analyze repeatable scanning of domain names and IP addresses from internal and external scanners in RDAM.
(1) Domain name scans: First, consider that the internal/external scanner scans domain names. Because the size of the domain name space is m D , the probability of the scanner hitting each host in each scan is 1/m D . Then, in k scans the scanner misses each host with probability (1−1/m D ) k . Because the probability of each host being hit is independent of each other host, the probability of the scanner missing each host is also the expected ratio of missed hosts. Therefore, the expected ratio of missed hosts in k scans is Thus, when the number of scans is the same as the external address space size m E , RDAM allows ð1 À 1=m D Þ m E % e À m E =m D of the hosts to be missed by the scanner. (2) IP address scans by an external scanner: In RDAM, the probability that each public host can be accessed through its external address is p, so the probability of the scanner hitting each host in each scan is p/m E . Therefore, the expected ratio of missed hosts in k scans is Thus, when the number of scans is m E , RDAM allows ð1 À p=m E Þ m E % e À p of the hosts to be missed by the scanner.
(3) IP address scans by an internal scanner: In RDAM, internal hosts can access another host only when they are in the internal DNS query list; therefore, the internal scanner cannot hit any host by scanning IP addresses, and the expected ratio of missed hosts in k scans is P miss ðkÞ ¼ 1: To perform simulation experiments, we created a network of SDN switches using Mininet [35], and developed the RDAM controller using POX [36] to manage the SDN network. The network included n I = 12000 internal hosts, of which n E = 6000 hosts were public hosts; the external address space was a Class B subnet (m E = 65536); and the length of the 3LD was 4 (l = 4). Fig 5 shows the theoretical and experimental results of the ratio of missed hosts by the internal and external repeatable scanners scanning domain names and IP addresses as the scanning number increases. In the simulation, every data point was taken as the average of 10 runs, and the scanner executed a total of m E scans. In existing network address shuffling techniques, such as NASR [10], OF-RHM [11], etc., internal and external scanners can directly access an internal host once they hit the host's IP address, because of the lack of the DNS query list and the time window methods. Therefore, the result of the internal/external scanner scanning IP addresses in OF-RHM is equivalent to the result of the external scanner scanning IP addresses in RDAM when p = 1 in Fig 5(B).
From the figure, we can see that the experimental results are basically the same as those predicted by the theoretical analysis. RDAM allows 96.2% of hosts to be missed by internal and external scanners scanning domain names; an internal scanner cannot hit any host by scanning IP addresses; for external scanners scanning IP addresses, when p = 1, that is, no time window, RDAM allows only 37% of hosts to be missed, while when p = 0.2, RDAM allows 81% of hosts to be missed by adjusting the time window setting.

Non-repeat scans in uniform mutation.
Assume that r denotes the ratio of the attacker scanning rate to the defender mutation rate; that the external address mutation rate of all public hosts is the same; and that they mutate at the same time. When r 1, the domain name (IP address) will be changed between any two scans; therefore, the scanning process is equivalent to a repeatable scan, and the scan results are consistent with those presented in the previous section.
Next, we analyze the case when r ! 1, assuming r is an integer.
(1) Domain name scans: First, consider that the internal/external scanners scan domain names. Because the hosts' domain names are uniformly distributed in the domain name space after each mutation, between two mutations, the probability of the scanner hitting each host in k 0 scan is k 0 /m D . k m D scans can be divided into bk/rc+1 mutations; therefore, the expected ratio of missed hosts in k scans, which is also the probability of the scanner missing each host, is P miss ðr; kÞ ¼ ð1 À ðk À bk=rcrÞ=m D Þð1 À r=m D Þ bk=rc : Thus, we can see that P miss (r,k) decreases as r increases, and 1) when r ( k, P miss ðr; kÞ % ð1 À r=m D Þ bk=rc % ð1 À r=m D Þ k=r % e À k=m D , P miss (r,k) reaches its maximum, and RDAM achieves the best defense performance; 2) when r < k^r ! k, P miss (k) % 1−r/m D ; and 3) when r ! k, P miss (r,k) % 1−r/m D , the domain name mutation offers no additional defense performance compared to the static environment.
From the above analysis we can obtain 1 À m E =m D P miss ðr; Therefore, when the number of scans is m E , RDAM allows 1−m E /m D of the hosts to be missed by the scanner.
(2) IP address scans by an external scanner: In RDAM, the probability that each public host can be accessed through its external address is p. Therefore, the expected ratio of missed hosts in k scans is P miss ðr; kÞ ¼ ð1 À pðk À bk=rcrÞ=m E Þð1 À pr=m E Þ bk=rc : Thus, we can obtain 1−p P miss (r,m E ) e −p . Therefore, when the number of scans is m E , RDAM allows at most e −p and at least 1−p of the hosts to be missed by the scanner. (3) IP address scans by an internal scanner: In RDAM, the internal scanner cannot hit any host by scanning IP addresses. Therefore, the expected ratio of missed hosts in k scans is P miss ðr; kÞ ¼ 1 Fig 6 shows the theoretical and experimental results of the ratio of hosts missed by the internal and external non-repeat scanners scanning domain names and IP addresses when the defender adopts uniform mutation as r increases. The simulation environment is the same as that described in Section 5.1.1. In the simulation, every data point was taken as the average of 10 runs, and the scanner executed a total of m E scans. In particular, the result of the internal/ external scanner scanning IP addresses in OF-RHM when the defender adopts uniform mutation is equivalent to the result of the external scanner scanning IP addresses in RDAM when p = 1 in Fig 6(B).
From the figure, we can see that the experimental results are basically the same as those predicted by the theoretical analysis. RDAM allows 96.2% of hosts to be missed by internal and external scanners scanning domain names; internal scanners cannot hit any host by scanning IP addresses; for external scanners scanning IP addresses, when p = 1, that is, no time window, RDAM allows at most 37% of hosts to be missed, and the ratio decreases as r increases, while when p = 0.2, RDAM allows 80% of hosts to be missed by adjusting the time window setting.

Non-repeat scans in failure-prior mutation.
Assume that r denotes the ratio of the attacker scanning rate to the defender mutation rate; that the external address mutation rate of all public hosts is the same; and that they mutate at the same time. Similarly, when r 1, the scan results are consistent with those presented in Section 5.1.1.
Next, we also analyze the case when r ! 1, assuming r is an integer.
(1) Domain name scans: First, consider that the internal/external scanner scans domain names, and denote the number of hosts as n (n = n I for the internal scanner and n = n E for the external scanner). In failure-prior mutation, the domain names missed by the scanner are preferentially selected as hosts' domain names, and the scanner will never scan those domain names in non-repeat scans, so some hosts will not be hit by the scanner. Denote X i as the number of inaccessible hosts in the i th mutation interval; then X 1 = 0.
In the i th mutation interval, the domain names of the remaining n−X i hosts are uniformly distributed in the domain name space, the size of which is m D −X i . In this interval, the probability of the scanner hitting each host in k 0 scans is nÀ X i n k 0 m D À X i , and the scanning number of hitting a host follows an hypergeometric distribution; thus, the expected hitting number is k 0 (n−X i )/ (m D −X i ). The scanner can execute r scans in this interval, and therefore the number of missed domain names is r−r(n−X i )/(m D −X i ), which is the number of inaccessible hosts in the i+1 th mutation interval. Considering that the number of hosts is n, we can obtain X i+1 = min(n,r−r (n−X i )/(m D −X i )).
k m D scans could be divided into bk/rc + 1 mutations; therefore, the expected ratio of missed hosts in k scans, which is also the probability of the scanner missing each host, is P miss ðr; kÞ ¼ 1 À ðk À bk=rcrÞ Thus we can obtain that 1) when r ( k^r ( n, X i is negligible, so P miss ðr; kÞ % e À k=m D ; 2) when r < k^X i < n, the larger r is, the larger X i is, and the better the defense performance is, so P miss (r,k) increases as r increases; 3) when r < k^X 2 = n, namely, n/(1−n/m D ) r < k, X i = n,i > 1, we can obtain P miss (r,k) % 1−r/m D , so P miss (r,k) decreases as r increases; and 4) when r ! k, P miss (r,k) % 1−k/m D .
From the above analysis we find that when r = n/(1−n/m D ), P miss (r,k) reaches its maximum; when r ! k, P miss (r,k) reaches its minimum, so 1 À k=m D P miss ðr; kÞ 1 À n=ð1À n=m D Þ m D ¼ m D À 2n m D À n . Therefore, when the number of scans is m E , RDAM allows at most m D À 2n m D À n and at least 1−m E /m D of the hosts to be missed by the scanner.
(2) IP address scans by an external scanner: In RDAM, the probability that each public host can be accessed through its external address is p. Therefore, the expected ratio of missed hosts in k scans is P miss ðr; kÞ ¼ 1 À pðk À bk=rcrÞ Similarly, we find that when r < k^X 2 = n, namely n/(1−pn E /m E ) r < k, P miss (r,k) % 1 −pr/m E can reach its maximum; when r ! k, P miss (r,k) % 1−pk/m D reaches its minimum, so 1 À p P miss ðr; m E Þ m E À 2pn E m E À pn E . Therefore, when the number of scans is m E , RDAM allows at most m E À 2pn E m E À pn E and at least 1−p of the hosts missed to be by the scanner. (3) IP address scans by an internal scanner: In RDAM, the internal scanner cannot hit any host by scanning IP addresses. Therefore, the expected ratio of missed hosts in k scans is P miss ðr; kÞ ¼ 1 Fig 7 shows the theoretical and experimental results of the ratio of hosts missed by the internal and external non-repeat scanners scanning domain names and IP addresses when the defender adopts failure-prior mutation as r increases. The simulation environment is the same as that presented in Section 5.1.1. In the simulation, every data point was taken as the average of 10 runs, and the scanner executed a total of m E scans. In particular, the result of the internal/external scanner scanning IP addresses in OF-RHM when the defender adopts failureprior mutation is equivalent to the result of the external scanner scanning IP addresses in RDAM when p = 1 in Fig 7(B).
From the figure, we can see that the experimental results are basically the same as those predicted by the theoretical analysis. For the internal scanner (external scanner) scanning domain names, the ratio of missed hosts first increases then decreases as r increases, and RDAM allows at most 99.7% (99.3%) and at least 96.2% of hosts to be missed. The internal scanner cannot hit any host by scanning IP addresses. For the external scanner scanning IP addresses, the ratio of missed hosts also first increases then decreases as r increases; when p = 1, that is, no time window, RDAM allows at most 89.9% of hosts to be missed, while when p = 0.2, RDAM allows at most 98.1% and at least 80% of hosts to be missed by adjusting the time window setting.
Through the analysis of the above three cases, we can see that RDAM increases the scanning space of the scanner through the dynamic domain name method, and reduces the probability that the scanner can access hosts by scanning the IP address through the DNS query list and time window. RDAM offers better defense performance than the general network address shuffling method, whether it is for the repeatable scanning attack or the non-repeat scanning attack.

Worm propagation
Computer worms are programs that self-propagate across a network, exploiting security or policy vulnerabilities in widely used services. "Scan-based worms" propagate by scanning random IP addresses and infecting active vulnerable targets, which need no human activation and thus can propagate much faster than "email worms" [37]. Many well-known worms, such as Code Red [38], Blaster [39], Witty worm [40], Conficker [41], Storm [42], etc., are all scanbased worms. In this paper, we consider worms based on IP address and domain name scanning, even though there are no known worms based on domain name scanning currently [43].
In this paper, we conducted experiments on worm propagation in the same simulation environment as in Section 5.1.1. Worm propagation behavior was experimented using a worm daemon [44], which consists of a client module and a server module. The client module is used to send "infection" datagrams to scanned targets in the target space on a specific UDP port; the server module is used to listen on the UDP port, and start the client module after receiving an "infection" datagram. The target space of the worm based on IP address scanning is the external address space, and the target space of the worm based on domain name scanning is the domain name space. In RDAM, the domain names and external addresses of all Random domain name and address mutation (RDAM) for thwarting reconnaissance attacks hosts are uniformly distributed, and therefore the attacker can obtain no information about a non-uniform vulnerable-host distribution; the best advantage the attacker can gain from a uniform distribution is to use uniform scanning, that is, randomly selecting targets in the target space [45]. Therefore, in the experiment, the worms use uniform scanning strategy. Fig 8  shows the propagation of worms based on IP address and domain name scanning in RDAM, and the propagation of the worm based on IP address scanning when all internal hosts dynamically mutate in the external address space using OF-RHM. The results in the figure were all taken from the averages of five experimental results. In each experiment, we ran server modules in 5% of the internal hosts and initially started two client modules in a public host and an external host, respectively; the worm scanning speed was 10 times/s, and each experiment lasted 1 hour.
From the experiments and above analyses, we can see that worms based on IP address scanning have the following characteristics in RDAM: 1) worms cannot propagate within internal hosts because internal hosts cannot directly access each other by scanning IP addresses; and 2) worms cannot infect private hosts because external infected hosts cannot access the private host and internal hosts cannot infect each other. Therefore, worms based on IP address scanning in RDAM are equivalent to external repeatable scanning attacks scanning external address spaces. Following the analysis in Section 5.1.1, RDAM can effectively reduce the ratio of infected public hosts through adjustments to the time window, and from Fig 8, we can see that the propagation results of worms based on IP address scanning in RDAM are consistent with the results shown in Fig 5. In summary, RDAM can effectively defend against worms based on IP address scanning.
With regard to worms based on domain name scanning, an attacker who already knows the domain name space of RDAM can spread a worm based on domain name scanning by setting the domain name scanning space in the worm. However, the space can be easily changed by re-configuration. In addition, according to [46], the propagation time of a uniform scan worm is proportional to the size of the scanning space. Therefore, RDAM can greatly increase the propagation time of worms based on domain name scanning, and from Fig 8, we can see that the propagation of the worm based on domain name scanning is significantly slower than the propagation of the worm based on IP address scanning in OFRHM. In summary, RDAM can effectively defend against worms based on domain name scanning. Random domain name and address mutation (RDAM) for thwarting reconnaissance attacks

Overhead
The space overhead of RDAM mainly includes the size of the flow table. The time overhead of RDAM mainly includes the delay from the DNS request to establish a connection and the delay resulting from packet modification and forwarding in software-defined switches.

Flow table size.
In RDAM, each TCP/UDP session needs to establish two flow table entries in the software-defined switches in the path. Assume in RDAM, each internal host establishes on average w 1 sessions with other internal hosts per second, w 2 sessions with external hosts per second, and each session on average takes θ seconds to terminate. Then according to Little's law in queuing theory, the mean length of the flow table is (w 1 + 2w 2 )n I θ. Fig 9 shows the flow table lengths for various session establishment rates (w 1 and w 2 /w 1 ) when n I = 12000, θ = 10s.

Operational delay.
We tested the operational delay caused by RDAM in our laboratory prototype. The software-defined switches and the controller were implemented in the X86 platform operating at 4 GHz using Intel's DPDK; the controller had two cores and 4 GB of memory; the switches had six cores and 8 GB of memory. All internal hosts, the DNS server, the DHCP server, and the security authentication server had two cores operating at 3.7 GHz and 4 GB of memory; the internal hosts ran the Win7 operating system, and the servers ran the Windows Server 2003 operating system. External hosts were generated through a server running Hyper-V with 32 cores operating at 3.9 GHz and 64 GB of memory; each external host had one core and 512 MB of memory running the Windows XP operating system. We tested 3 scenarios: scenarios with no flow table entry and 1 million flow table entries in the switches, and a scenario without RDAM, whereby hosts access each other using IP addresses through common switches. In each scenario, we measured the delay results of connections between internal hosts and connections of external hosts accessing public hosts, ran 100 trials respectively by using 25 internal hosts/external hosts visiting 4 public hosts. Table 1 shows the experimental results of the connection delay (the delay from the DNS request to establish a connection) and the transmission delay (the delay resulting from packet processing in software-defined switches after establishing the connection). From the results, we can see that the connection delay is unlikely to be noticed by users, and the transmission delay is negligible compared with the whole packet transmission process. Random domain name and address mutation (RDAM) for thwarting reconnaissance attacks

Discussion
RDAM is a network-based defense method, so it can be used simultaneously with host-based defense methods, such as firewall, to strengthen the network security. Also, RDAM does not affect the normal use of VPN (Virtual Private Network), because 1) when a VPN connection is established, the corresponding flow table entries are installed in the SDN switches, so VPN communications will not be interrupted due to domain name and network address mutation, and 2) as we described in Section 4.4.1, RDAM implements ALG in the switch connected to the gateway to perform the detection and address information replacement of the TCP/UDP payloads including those of VPN protocols.
The topology of RDAM is that a number of hosts and physical subnets are connected to the SDN network composed of SDN switches and the controller. However, RDAM does not support the topology that hosts and physical subnets are connected to the SDN network behind routers, because communications within routers will not be forwarded to the controller (if they can, the routers are actually SDN switches), and thus will not be protected by RDAM.
In addition, because internal hosts cannot access each other by scanning IP addresses, so some LAN software based on scanning IP addresses cannot work normally in RDAM. In further work, we will consider establishing a virtual network in RDAM, mapping the IP addresses of the hosts using LAN software to the network to allow these hosts to communicate through a specific port and protocol without leaking the IP addresses.

Conclusion
In this paper, we presented a defense technique based on assigning random domain names and network addresses, called RDAM, which defends against adversarial reconnaissance and scanning by dynamically mutating domain names and IP addresses. In RDAM, both the domain name and network address are changed dynamically, and an internal host can access other internal hosts only when it is in the DNS query list, and an external host can access a public host only when it is in the public host's time window. In addition, all internal hosts occupy different subnets and communicate through their virtual gateways.
We have presented the basic architecture and communication protocols in this paper, and evaluated the defense performance against scanning attacks and worm propagation, as well as the overhead, through theoretical analysis and experiment. Our evaluation and experimentation show that RDAM can effectively defend against scanning attacks and worm propagation by increasing the scanning space of the scanner through the dynamic domain name method, and reducing the probability that the scanner can access hosts by scanning the IP address through the DNS query list and time window, while satisfying practical operation overhead demand. Random domain name and address mutation (RDAM) for thwarting reconnaissance attacks