Table 1.
Categories and comparison with previous defragmentation studies.
Fig 1.
Motivational performance evaluation.
Fragmented: throughput measured after file generation and before defragmentation. Defragmented: throughput measured after running e4defrag on the same workload. (A) Application performance of the varmail (highly fragmented) and fileserver (moderately fragmented) workloads in the fragmented and defragmented states. (B) Read and write throughput of e4defrag during defragmentation. Defragmentation increases application throughput, but e4defrag itself still underutilizes SSD bandwidth.
Fig 2.
Overall procedure of the serialized defragmentation process in the existing defragmentation scheme (e4defrag).
e4defrag processes files sequentially with a single defragger, which limits multi-core parallelism and I/O concurrency.
Fig 3.
Synchronous I/O operation in the existing defragmentation tool (e4defrag).
The defragger issues and completes scattered-block I/O one request at a time (i.e., at most one outstanding request), which underutilizes SSD internal parallelism and increases defragmentation time.
Fig 4.
Overall architecture of ScaleDefrag.
Fig 5.
Overall process of file information collection in ScaleDefrag.
Fig 6.
Information collector and multiple defraggers in ScaleDefrag.
Fig 7.
Parallel checking of file fragmentation in ScaleDefrag.
Fig 8.
Parallel file relocation in ScaleDefrag.
Fig 9.
Synchronous and asynchronous defragmentation I/O timeline in e4defrag and ScaleDefrag, respectively.
(A) Synchronous I/O timeline in e4defrag. The defragger reads one page (P1, P2, P3) at a time, submits a single I/O request, and waits for its completion before preparing and issuing the next request. As a result, the device time for successive pages does not overlap and much of the potential parallelism in the storage device is left unused. (B) Asynchronous I/O timeline in ScaleDefrag. The defragger first collects multiple pages related to the same file (P1-P3 and P4-P6), submits their I/O requests, and then waits for completion. This allows the device times of different pages to overlap and exposes higher I/O parallelism, reducing the effective defragmentation time.
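The difference between the two timelines can be illustrated with a small analytic model. This is only a sketch under idealized assumptions, not the e4defrag or ScaleDefrag implementation: the function names, the per-page device times, and the batch size of three are hypothetical, and the device is assumed to service every request in a batch fully in parallel.

```python
from typing import List


def sync_timeline(latencies_ms: List[int]) -> int:
    """At most one outstanding request: device times never overlap,
    so the total is the sum of the per-page device times."""
    return sum(latencies_ms)


def batched_timeline(latencies_ms: List[int], batch_size: int) -> int:
    """Submit batch_size requests together, then wait for all of them.

    Idealized assumption: the device services a whole batch in parallel,
    so each batch costs only as much as its slowest request."""
    total = 0
    for start in range(0, len(latencies_ms), batch_size):
        total += max(latencies_ms[start:start + batch_size])
    return total


# Hypothetical device time (ms) for pages P1..P6.
pages = [10, 10, 10, 10, 10, 10]
print(sync_timeline(pages))        # 60
print(batched_timeline(pages, 3))  # 20
```

With a batch size of 1 the batched model degenerates to the synchronous case, which matches panel (A); larger batches correspond to the overlapped device times in panel (B).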
Table 2.
Filebench and FFSB workload configurations.
Fig 10.
Normalized execution time of the existing and proposed schemes on flash-based SSD.
(A) Defragmenting highly fragmented files generated with Varmail workload. (B) Defragmenting moderately fragmented files generated with FFSB workload. (C) Defragmenting less fragmented files generated with Fileserver workload. (D) Defragmenting little fragmented files generated with OLTP workload. The y-axis shows execution time normalized to e4defrag with one thread (lower is better), and the x-axis shows the number of defragger threads. Across all workloads, ScaleDefrag with both schemes (PD+Async) achieves the shortest defragmentation time.
Fig 11.
Throughput of the existing and proposed schemes on flash-based SSD (PD: parallel defragmentation, Async: asynchronous I/O).
(A) Defragmenting highly fragmented files generated with Varmail workload. (B) Defragmenting moderately fragmented files generated with FFSB workload. (C) Defragmenting less fragmented files generated with Fileserver workload. (D) Defragmenting little fragmented files generated with OLTP workload. The y-axis shows average defragmentation throughput (MB/s), and the x-axis shows the number of defragger threads. In all workloads, ScaleDefrag with both schemes (PD+Async) achieves the highest throughput.
Table 3.
Baseline comparison summary across workloads. Execution time, defragmentation throughput, and peak memory usage for e4defrag and ScaleDefrag (PD+Async: parallel defragmentation with asynchronous I/O).
Table 4.
Fragmentation states of files in the evaluation workloads.
Fig 12.
Impact of the defragmentation schemes on a co-running application (Standalone: running FIO alone).
(A) Normalized FIO execution time when running alone (Standalone) and when co-running with e4defrag or ScaleDefrag under different fragmentation states. (B) Defragmentation time of e4defrag and ScaleDefrag for the same states. Compared with the existing tool, ScaleDefrag both shortens defragmentation time and reduces the slowdown of the co-running application.
Fig 13.
Core scalability of ScaleDefrag on a manycore machine with a CT250MX500 SSD.
(A) Normalized defragmentation time of e4defrag and ScaleDefrag as the number of defraggers increases from 1 to 64 (lower is better). (B) Defragmentation throughput (MB/s) under the same settings. ScaleDefrag continuously reduces execution time and increases throughput as more cores are used.
Fig 14.
Core scalability of ScaleDefrag on a manycore machine with an Intel Optane 900p SSD.
(A) Normalized defragmentation time of the existing tool and ScaleDefrag as the number of defraggers increases from 1 to 64 (lower is better). (B) Defragmentation throughput (MB/s) under the same settings. ScaleDefrag continually reduces execution time and increases throughput as more cores are used, reaching 4.57× higher throughput than e4defrag at 64 cores.
Table 5.
Performance breakdown of e4defrag and ScaleDefrag (PD: parallel defragmentation, Async: asynchronous I/O).