ScaleDefrag: Design and implementation of a scalable file defragmentation tool for flash-based SSDs

doi:10.1371/journal.pone.0348520

Table 1.

Categories and comparison with previous defragmentation studies.

Fig 1.

Motivational performance evaluation.

Fragmented: throughput measured after file generation and before defragmentation. Defragmented: throughput measured after running e4defrag on the same workload. (A) Application performance of the varmail (highly fragmented) and fileserver (moderately fragmented) workloads in the fragmented and defragmented states. (B) Read and write throughput of e4defrag during defragmentation. Defragmentation increases application throughput, but e4defrag still underutilizes the SSD bandwidth.

Fig 2.

Overall procedure of the serialized defragmentation process in the existing defragmentation tool (e4defrag).

e4defrag processes files sequentially with a single defragger, which limits multi-core parallelism and I/O concurrency.

Fig 3.

Synchronous I/O operation in the existing defragmentation tool (e4defrag).

The defragger issues and completes scattered-block I/O one request at a time (i.e., at most one outstanding request), which underutilizes the SSD's internal parallelism and increases defragmentation time.

Fig 4.

Overall architecture of ScaleDefrag.

Fig 5.

Overall process of file information collection in ScaleDefrag.

Fig 6.

Information collector and multiple defraggers in ScaleDefrag.
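The collector-plus-defraggers arrangement named in this caption can be illustrated with a producer/consumer sketch. This is only a minimal illustration, not ScaleDefrag's implementation: the shared queue, the sentinel-based shutdown, and the names `run_defraggers` and `relocate` are all assumptions introduced here, since the caption does not say how work is distributed among defraggers.

```python
import queue
import threading


def run_defraggers(files, num_defraggers, relocate):
    """Hypothetical sketch: one collector feeds files to several defragger threads.

    `relocate` is a placeholder callable standing in for the per-file
    relocation work; it is not an API of e4defrag or ScaleDefrag.
    """
    work = queue.Queue()
    done = []
    lock = threading.Lock()

    def defragger():
        while True:
            item = work.get()
            if item is None:  # sentinel: the collector has no more files
                return
            result = relocate(item)
            with lock:  # protect the shared result list
                done.append(result)

    threads = [threading.Thread(target=defragger) for _ in range(num_defraggers)]
    for t in threads:
        t.start()

    # The "information collector" role: enqueue every file to be processed.
    for f in files:
        work.put(f)
    # One sentinel per defragger so that every thread terminates.
    for _ in threads:
        work.put(None)
    for t in threads:
        t.join()
    return done
```

Because the threads pull from a shared queue, the completion order is not deterministic; callers that need a stable order should sort or index the results.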

Fig 7.

Parallel checking of file fragmentation in ScaleDefrag.

Fig 8.

Parallel file relocation in ScaleDefrag.

Fig 9.

Synchronous and asynchronous defragmentation I/O timeline in e4defrag and ScaleDefrag, respectively.

(A) Synchronous I/O timeline in e4defrag. The defragger reads one page (P1, P2, P3) at a time, submits a single I/O request, and waits for its completion before preparing and issuing the next request. As a result, the device time for successive pages does not overlap and much of the potential parallelism in the storage device is left unused. (B) Asynchronous I/O timeline in ScaleDefrag. The defragger first collects multiple pages related to the same file (P1-P3 and P4-P6), submits their I/O requests, and then waits for completion. This allows the device times of different pages to overlap and exposes higher I/O parallelism, reducing the effective defragmentation time.
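The difference between the two timelines can be made concrete with a toy cost model. The sketch below is an idealization under stated assumptions, not a measurement: `prep` (host-side time to prepare one page request) and `device` (device service time for one request) are hypothetical parameters in arbitrary time units, and the asynchronous case assumes the device services every request in a batch fully in parallel.

```python
def sync_time(pages, prep, device):
    """One outstanding request at a time: each page pays prep + device in series."""
    return pages * (prep + device)


def async_time(pages, prep, device, batch):
    """Batched submission: prepare up to `batch` pages, submit them, wait once.

    Assumes (idealized) that the device times of all requests in a batch
    overlap completely, so each batch costs n * prep + device.
    """
    if batch < 1:
        raise ValueError("batch must be at least 1")
    total = 0
    remaining = pages
    while remaining > 0:
        n = min(batch, remaining)
        total += n * prep + device
        remaining -= n
    return total


# Six pages in two batches of three, mirroring P1-P3 and P4-P6 in the caption.
print(sync_time(6, prep=1, device=4))            # 30
print(async_time(6, prep=1, device=4, batch=3))  # 14
```

With a batch size of 1 the asynchronous model reduces to the synchronous one, which matches the "at most one outstanding request" behavior described for e4defrag.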

Table 2.

Filebench and FFSB workload configurations.

Fig 10.

Normalized execution time of the existing and proposed schemes on a flash-based SSD.

(A) Defragmenting highly fragmented files generated with the Varmail workload. (B) Defragmenting moderately fragmented files generated with the FFSB workload. (C) Defragmenting less fragmented files generated with the Fileserver workload. (D) Defragmenting lightly fragmented files generated with the OLTP workload. The y-axis shows execution time normalized to e4defrag with one thread (lower is better), and the x-axis shows the number of defragger threads. Across all workloads, ScaleDefrag with both schemes (PD+Async) achieves the shortest defragmentation time.
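The normalization used on the y-axis is a simple ratio against the single-threaded e4defrag run. A minimal sketch follows; the function name `normalize` and all timing values are made-up placeholders for illustration and are not results from the paper.

```python
def normalize(times_by_threads, baseline):
    """Divide each execution time by the baseline (e4defrag, one thread).

    Values below 1.0 mean the run finished faster than the baseline.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return {threads: t / baseline for threads, t in times_by_threads.items()}


# Placeholder numbers only (seconds); NOT measurements from the paper.
baseline_e4defrag_1_thread = 100.0
hypothetical_times = {1: 80.0, 2: 40.0, 4: 20.0}

print(normalize(hypothetical_times, baseline_e4defrag_1_thread))
# {1: 0.8, 2: 0.4, 4: 0.2}
```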

Fig 11.

Throughput of the existing and proposed schemes on a flash-based SSD (PD: parallel defragmentation, Async: asynchronous I/O).

(A) Defragmenting highly fragmented files generated with the Varmail workload. (B) Defragmenting moderately fragmented files generated with the FFSB workload. (C) Defragmenting less fragmented files generated with the Fileserver workload. (D) Defragmenting lightly fragmented files generated with the OLTP workload. The y-axis shows average defragmentation throughput (MB/s), and the x-axis shows the number of defragger threads. In all workloads, ScaleDefrag with both schemes (PD+Async) achieves the highest throughput.

Table 3.

Baseline comparison summary across workloads. Execution time, defragmentation throughput, and peak memory usage for e4defrag and ScaleDefrag (PD+Async: parallel defragmentation with asynchronous I/O).

Table 4.

Fragmentation states of files in the evaluation workloads.

Fig 12.

Impact of the defragmentation schemes on a co-running application (Standalone: running FIO alone).

(A) Normalized FIO execution time when running alone (Standalone) and when co-running with e4defrag or ScaleDefrag under different fragmentation states. (B) Defragmentation time of e4defrag and ScaleDefrag for the same states. Compared with the existing tool, ScaleDefrag both shortens defragmentation time and reduces the slowdown of the co-running application.

Fig 13.

Core scalability of ScaleDefrag on a manycore machine with a CT250MX500 SSD.

(A) Normalized defragmentation time of e4defrag and ScaleDefrag as the number of defraggers increases from 1 to 64 (lower is better). (B) Defragmentation throughput (MB/s) under the same settings. ScaleDefrag continues to reduce execution time and increase throughput as more cores are used.

Fig 14.

Core scalability of ScaleDefrag on a manycore machine with an Intel Optane 900p SSD.

(A) Normalized defragmentation time of the existing tool (e4defrag) and ScaleDefrag as the number of defraggers increases from 1 to 64 (lower is better). (B) Defragmentation throughput (MB/s) under the same settings. ScaleDefrag continues to reduce execution time and increase throughput as more cores are used, reaching 4.57× higher throughput than e4defrag at 64 cores.

Table 5.

Performance breakdown of e4defrag and ScaleDefrag (PD: parallel defragmentation, Async: asynchronous I/O).
