Linux Disk Sequential Write vs. Random Write

Preface

  • Random writes force the disk head to keep seeking to different tracks, which sharply reduces efficiency
  • Sequential writes rarely require the head to change tracks, and when a track change is needed the seek is short

Introduction to fio

  • Several key indicators in fio's output report:
    • slat: submission latency, the time taken to submit the I/O to the kernel (Submission latency)
    • clat: completion latency, the time from I/O submission to I/O completion (Completion latency)
    • lat: total latency, from the moment fio creates the I/O until it completes
    • bw: throughput (bandwidth)
    • iops: number of I/O operations per second

Synchronous Write Test

Synchronous Random Write

  • Use the strace tool to record the system calls fio makes
    strace -f -tt -o /tmp/randwrite.log -D fio -name=randwrite -rw=randwrite \
    -direct=1 -bs=4k -size=1G -numjobs=1  -group_reporting -filename=/tmp/test.db
    
  • Run the test and look at the key parts of the report
    root@wilson-ubuntu:~# strace -f -tt -o /tmp/randwrite.log -D fio -name=randwrite -rw=randwrite \
    > -direct=1 -bs=4k -size=1G -numjobs=1  -group_reporting -filename=/tmp/test.db
    randwrite: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=1
    fio-2.2.10
    Starting 1 process
    ...
    randwrite: (groupid=0, jobs=1): err= 0: pid=26882: Wed Aug 14 10:39:02 2019
      write: io=1024.0MB, bw=52526KB/s, iops=13131, runt= 19963msec
        clat (usec): min=42, max=18620, avg=56.15, stdev=164.79
         lat (usec): min=42, max=18620, avg=56.39, stdev=164.79
    ...
        bw (KB  /s): min=50648, max=55208, per=99.96%, avg=52506.03, stdev=1055.83
    ...
    
    Run status group 0 (all jobs):
      WRITE: io=1024.0MB, aggrb=52525KB/s, minb=52525KB/s, maxb=52525KB/s, mint=19963msec, maxt=19963msec
    
    Disk stats (read/write):
    ...
      sda: ios=0/262177, merge=0/25, ticks=0/7500, in_queue=7476, util=36.05%
    
  • Key figures
    • clat: the average is about 56 µs
    • lat: the average is about 56 µs
    • bw: throughput is about 52 MB/s
  • System call trace: for a random write, every write() is preceded by an lseek() that repositions the file offset (a minimal C sketch of this pattern follows the trace below)
    root@wilson-ubuntu:~# more /tmp/randwrite.log
    ...
    26882 10:38:41.919904 lseek(3, 665198592, SEEK_SET) = 665198592
    26882 10:38:41.919920 write(3, "\220\240@\6\371\341\277>\0\200\36\31\0\0\0\0\202\2\7\320\343\6H\26P\340\277\370\330\30e\30"..., 4096) = 4096
    26882 10:38:41.919969 lseek(3, 4313088, SEEK_SET) = 4313088
    26882 10:38:41.919985 write(3, "\220\240@\6\371\341\277>\0\200\36\31\0\0\0\0\202\2\7\320\343\6H\26P\340\277\370\330\30e\30"..., 4096) = 4096
    26882 10:38:41.920032 lseek(3, 455880704, SEEK_SET) = 455880704
    26882 10:38:41.920048 write(3, "\220\240@\6\371\341\277>\0\200\36\31\0\0\0\0\202\2\7\320\343\6H\26P\340\277\370\330\30e\30"..., 4096) = 4096
    26882 10:38:41.920096 lseek(3, 338862080, SEEK_SET) = 338862080
    26882 10:38:41.920112 write(3, "\220\240@\6\371\341\277>\0\2402\24\0\0\0\0\202\2\7\320\343\6H\26P\340\277\370\330\30e\30"..., 4096) = 4096
    26882 10:38:41.920161 lseek(3, 739086336, SEEK_SET) = 739086336
    26882 10:38:41.920177 write(3, "\220\240@\6\371\341\277>\0\2402\24\0\0\0\0\202\2\7\320\343\6H\26P\340\277\370\330\30e\30"..., 4096) = 4096
    26882 10:38:41.920229 lseek(3, 848175104, SEEK_SET) = 848175104
    26882 10:38:41.920245 write(3, "\220\240@\6\371\341\277>\0\2402\24\0\0\0\0\202\2\7\320\343\6H\26P\340\277\370\330\30e\30"..., 4096) = 4096
    26882 10:38:41.920296 lseek(3, 1060147200, SEEK_SET) = 1060147200
    26882 10:38:41.920312 write(3, "\220\240@\6\371\341\277>\0\2402\24\0\0\0\0\202\2\7\320\343\6H\26P\340\277\370\330\30e\30"..., 4096) = 4096
    26882 10:38:41.920362 lseek(3, 863690752, SEEK_SET) = 863690752
    26882 10:38:41.920377 write(3, "\220\240@\6\371\341\277>\0\2402\24\0\0\0\0\202\2\7\320\343\6H\26P\340\277\370\330\30e\30"..., 4096) = 4096
    26882 10:38:41.920428 lseek(3, 279457792, SEEK_SET) = 279457792
    26882 10:38:41.920444 write(3, "\220\240@\6\371\341\277>\0\2402\24\0\0\0\0\202\2\7\320\343\6H\26P\340\277\370\330\30e\30"..., 4096) = 4096
    26882 10:38:41.920492 lseek(3, 271794176, SEEK_SET) = 271794176
    26882 10:38:41.920508 write(3, "\220\240@\6\371\341\277>\0\2402\24\0\0\0\0\202\2\7\320\343\6H\26P\340\277\370\330\30e\30"..., 4096) = 4096
    26882 10:38:41.920558 lseek(3, 1067864064, SEEK_SET) = 1067864064
    26882 10:38:41.920573 write(3, "\220\240@\6\371\341\277>\0\2402\24\0\0\0\0\202\2\7\320\343\6H\26P\340\277\370\330\30e\30"..., 4096) = 4096
    ...
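  • The trace shows the two system calls fio's synchronous engine issues for every random write: an lseek() to the new offset followed by a write(). The sketch below reproduces that pattern in C purely as an illustration (it is not fio's source; the file name, block size and file size are simply taken from the test parameters); running it under strace should show the same alternating lseek/write pairs:
    /* Minimal sketch of the lseek+write pattern seen in the trace above.
       Illustration only, not fio's source.  Build: gcc -o randwrite randwrite.c */
    #define _GNU_SOURCE                      /* for O_DIRECT */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    #define BLOCK  4096UL                    /* matches -bs=4k   */
    #define BLOCKS ((1UL << 30) / BLOCK)     /* matches -size=1G */

    int main(void)
    {
        void *buf;
        if (posix_memalign(&buf, BLOCK, BLOCK) != 0) return 1;  /* O_DIRECT needs an aligned buffer */
        memset(buf, 0xab, BLOCK);

        int fd = open("/tmp/test.db", O_WRONLY | O_CREAT | O_DIRECT, 0644);
        if (fd < 0) { perror("open"); return 1; }

        srand(42);
        for (unsigned long i = 0; i < BLOCKS; i++) {
            /* pick a random block-aligned offset, then seek + write:
               the same two system calls strace recorded for every I/O */
            off_t off = (off_t)(rand() % BLOCKS) * BLOCK;
            if (lseek(fd, off, SEEK_SET) < 0)   { perror("lseek"); return 1; }
            if (write(fd, buf, BLOCK) != BLOCK) { perror("write"); return 1; }
        }
        close(fd);
        free(buf);
        return 0;
    }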
    

Synchronous Sequential Write

  • Sequential write test command
    strace -f -tt -o /tmp/write.log -D fio -name=write -rw=write \
    -direct=1 -bs=4k -size=1G -numjobs=1  -group_reporting -filename=/tmp/test.db
    
  • As the report shows, throughput increases to around 70 MB/s
    write: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=1
    fio-2.2.10
    Starting 1 process
    Jobs: 1 (f=1): [W(1)] [100.0% done] [0KB/70432KB/0KB /s] [0/17.7K/0 iops] [eta 00m:00s]
    write: (groupid=0, jobs=1): err= 0: pid=27005: Wed Aug 14 10:53:02 2019
      write: io=1024.0MB, bw=70238KB/s, iops=17559, runt= 14929msec
        clat (usec): min=43, max=7464, avg=55.95, stdev=56.24
         lat (usec): min=43, max=7465, avg=56.15, stdev=56.25
    ...
        bw (KB  /s): min=67304, max=72008, per=99.98%, avg=70225.38, stdev=1266.88
    ...
    
    Run status group 0 (all jobs):
      WRITE: io=1024.0MB, aggrb=70237KB/s, minb=70237KB/s, maxb=70237KB/s, mint=14929msec, maxt=14929msec
    
    Disk stats (read/write):
    ...
      sda: ios=0/262162, merge=0/10, ticks=0/6948, in_queue=6932, util=46.49%
    
  • System call trace
    root@wilson-ubuntu:~# more /tmp/write.log
    ...
    27046 10:54:28.194508 write(3, "\0\0\23\0\0\0\0\0\0\300\16\0\0\0\0\0\0\360\t\0\0\0\0\0\0\320\17\0\0\0\0\0"..., 4096) = 4096
    27046 10:54:28.194568 write(3, "\0\0\23\0\0\0\0\0\0\300\16\0\0\0\0\0\0\220\24\0\0\0\0\0\0\320\17\0\0\0\0\0"..., 4096) = 4096
    27046 10:54:28.194627 write(3, "\0\0\23\0\0\0\0\0\0\300\16\0\0\0\0\0\0\220\24\0\0\0\0\0\0\320\17\0\0\0\0\0"..., 4096) = 4096
    27046 10:54:28.194687 write(3, "\0\0\23\0\0\0\0\0\0\300\16\0\0\0\0\0\0\220\24\0\0\0\0\0\0\320\17\0\0\0\0\0"..., 4096) = 4096
    27046 10:54:28.194747 write(3, "\0\0\23\0\0\0\0\0\0\300\16\0\0\0\0\0\0\220\24\0\0\0\0\0\0\320\17\0\0\0\0\0"..., 4096) = 4096
    27046 10:54:28.194807 write(3, "\0\0\23\0\0\0\0\0\0\300\16\0\0\0\0\0\0\220\24\0\0\0\0\0\0\320\17\0\0\0\0\0"..., 4096) = 4096
    27046 10:54:28.194868 write(3, "\0\0\23\0\0\0\0\0\0\300\16\0\0\0\0\0\0\220\24\0\0\0\0\0\0\320\17\0\0\0\0\0"..., 4096) = 4096
    27046 10:54:28.194928 write(3, "\0\0\23\0\0\0\0\0\0\300\16\0\0\0\0\0\0\220\24\0\0\0\0\0\0\320\17\0\0\0\0\0"..., 4096) = 4096
    27046 10:54:28.194988 write(3, "\0\0\23\0\0\0\0\0\0\300\16\0\0\0\0\0\0\220\24\0\0\0\0\0\0\320\17\0\0\0\0\0"..., 4096) = 4096
    27046 10:54:28.195049 write(3, "\0\0\23\0\0\0\0\0\0\300\16\0\0\0\0\0\0\220\24\0\0\0\0\0\0\320\17\0\0\0\0\0"..., 4096) = 4096
    27046 10:54:28.195110 write(3, "\0\0\23\0\0\0\0\0\0\300\16\0\0\0\0\0\0\220\24\0\0\0\0\0\0\320\17\0\0\0\0\0"..., 4096) = 4096
    27046 10:54:28.195197 write(3, "\0\0\23\0\0\0\0\0\0\300\16\0\0\0\0\0\0\220\24\0\0\0\0\0\0\320\17\0\0\0\0\0"..., 4096) = 4096
    27046 10:54:28.195262 write(3, "\0\0\23\0\0\0\0\0\0\300\16\0\0\0\0\0\0\220\24\0\0\0\0\0\0\320\17\0\0\0\0\0"..., 4096) = 4096
    27046 10:54:28.195330 write(3, "\0\0\23\0\0\0\0\0\0\300\16\0\0\0\0\0\0\220\24\0\0\0\0\0\0\320\17\0\0\0\0\0"..., 4096) = 4096
    27046 10:54:28.195426 write(3, "\0\0\23\0\0\0\0\0\0\300\16\0\0\0\0\0\0\220\24\0\0\0\0\0\0\320\17\0\0\0\0\0"..., 4096) = 4096
    27046 10:54:28.195497 write(3, "\0\0\23\0\0\0\0\0\0\300\16\0\0\0\0\0\0\220\24\0\0\0\0\0\0\320\17\0\0\0\0\0"..., 4096) = 4096
    27046 10:54:28.195567 write(3, "\0\0\23\0\0\0\0\0\0\300\16\0\0\0\0\0\0\220\24\0\0\0\0\0\0\320\17\0\0\0\0\0"..., 4096) = 4096
    27046 10:54:28.195637 write(3, "\0\0\23\0\0\0\0\0\0\300\16\0\0\0\0\0\0\220\24\0\0\0\0\0\0\320\17\0\0\0\0\0"..., 4096) = 4096
    27046 10:54:28.195704 write(3, "\0\0\23\0\0\0\0\0\0\300\16\0\0\0\0\0\0\220\24\0\0\0\0\0\0\320\17\0\0\0\0\0"..., 4096) = 4096
    27046 10:54:28.195757 write(3, "\0\0\23\0\0\0\0\0\0\300\16\0\0\0\0\0\0\220\24\0\0\0\0\0\0\320\17\0\0\0\0\0"..., 4096) = 4096
    27046 10:54:28.195807 write(3, "\0\0\23\0\0\0\0\0\0\300\16\0\0\0\0\0\0\220\24\0\0\0\0\0\0\320\17\0\0\0\0\0"..., 4096) = 4096
    27046 10:54:28.195859 write(3, "\0\0\23\0\0\0\0\0\0\300\16\0\0\0\0\0\0\220\24\0\0\0\0\0\0\320\17\0\0\0\0\0"..., 4096) = 4096
    27046 10:54:28.195910 write(3, "\0\0\23\0\0\0\0\0\0\300\16\0\0\0\0\0\0\220\24\0\0\0\0\0\0\320\17\0\0\0\0\0"..., 4096) = 4096
    27046 10:54:28.195961 write(3, "\0\0\23\0\0\0\0\0\0\300\16\0\0\0\0\0\0\220\24\0\0\0\0\0\0\320\17\0\0\0\0\0"..., 4096) = 4096
    27046 10:54:28.196012 write(3, "\0\0\23\0\0\0\0\0\0\300\16\0\0\0\0\0\0\220\24\0\0\0\0\0\0\320\17\0\0\0\0\0"..., 4096) = 4096
    27046 10:54:28.196062 write(3, "\0\0\23\0\0\0\0\0\0\300\16\0\0\0\0\0\0\220\24\0\0\0\0\0\0\320\17\0\0\0\0\0"..., 4096) = 4096
    27046 10:54:28.196112 write(3, "\0\0\23\0\0\0\0\0\0\300\16\0\0\0\0\0\0 \26\0\0\0\0\0\0\320\17\0\0\0\0\0"..., 4096) = 4096
    27046 10:54:28.196162 write(3, "\0\0\23\0\0\0\0\0\0\300\16\0\0\0\0\0\0 \26\0\0\0\0\0\0\320\17\0\0\0\0\0"..., 4096) = 4096
    27046 10:54:28.196213 write(3, "\0\0\23\0\0\0\0\0\0\300\16\0\0\0\0\0\0 \26\0\0\0\0\0\0\320\17\0\0\0\0\0"..., 4096) = 4096
    27046 10:54:28.196265 write(3, "\0\0\23\0\0\0\0\0\0\300\16\0\0\0\0\0\0 \26\0\0\0\0\0\0\320\17\0\0\0\0\0"..., 4096) = 4096
    27046 10:54:28.196314 write(3, "\0\0\23\0\0\0\0\0\0\300\16\0\0\0\0\0\0 \26\0\0\0\0\0\0\320\17\0\0\0\0\0"..., 4096) = 4096
    27046 10:54:28.196363 write(3, "\0\0\23\0\0\0\0\0\0\300\16\0\0\0\0\0\0 \26\0\0\0\0\0\0\320\17\0\0\0\0\0"..., 4096) = 4096
    27046 10:54:28.196414 write(3, "\0\0\23\0\0\0\0\0\0\300\16\0\0\0\0\0\0 \26\0\0\0\0\0\0\320\17\0\0\0\0\0"..., 4096) = 4096
    27046 10:54:28.196472 write(3, "\0\0\23\0\0\0\0\0\0\300\16\0\0\0\0\0\0 \26\0\0\0\0\0\0\320\17\0\0\0\0\0"..., 4096) = 4096
    27046 10:54:28.196524 write(3, "\0\0\23\0\0\0\0\0\0\300\16\0\0\0\0\0\0 \26\0\0\0\0\0\0\320\17\0\0\0\0\0"..., 4096) = 4096
    27046 10:54:28.196573 write(3, "\0\0\23\0\0\0\0\0\0\300\16\0\0\0\0\0\0 \26\0\0\0\0\0\0\320\17\0\0\0\0\0"..., 4096) = 4096
    ...
    
  • Sequential writing does not need to reposition the file offset before each call, so the trace contains only write() calls (see the sketch below)
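  • For comparison, the sequential case is just a loop of write() calls on the same descriptor; the kernel advances the file offset itself. A minimal sketch under the same assumptions as the random-write sketch above (illustration only, not fio's source):
    /* Sequential counterpart: no lseek before each write, the kernel advances
       the file offset itself.  Illustration only.  Build: gcc -o seqwrite seqwrite.c */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        void *buf;
        if (posix_memalign(&buf, 4096, 4096) != 0) return 1;    /* O_DIRECT alignment */
        memset(buf, 0xab, 4096);

        int fd = open("/tmp/test.db", O_WRONLY | O_CREAT | O_DIRECT, 0644);
        if (fd < 0) { perror("open"); return 1; }

        for (unsigned long i = 0; i < (1UL << 30) / 4096; i++)  /* 1G in 4k blocks */
            if (write(fd, buf, 4096) != 4096) { perror("write"); return 1; }

        close(fd);
        free(buf);
        return 0;
    }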

Asynchronous Sequential Write

  • The earlier fio reports showed no slat because those tests used a synchronous engine: with synchronous I/O, submission and completion happen as a single operation, so the submission latency is effectively part of the completion latency and fio omits the slat line
  • Asynchronous sequential write test command
    # same as the synchronous sequential write command, with -ioengine=libaio added
    fio -name=write -rw=write -ioengine=libaio -direct=1 -bs=4k -size=1G -numjobs=1  -group_reporting -filename=/tmp/test.db
    
  • The slat metric now appears, and the average lat is approximately slat + clat; after switching to asynchronous I/O, throughput improves substantially, to around 120 MB/s (a minimal libaio sketch follows the report)
    write: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=1
    fio-2.2.10
    Starting 1 process
    Jobs: 1 (f=1): [W(1)] [100.0% done] [0KB/119.3MB/0KB /s] [0/30.6K/0 iops] [eta 00m:00s]
    write: (groupid=0, jobs=1): err= 0: pid=27258: Wed Aug 14 11:14:36 2019
      write: io=1024.0MB, bw=120443KB/s, iops=30110, runt=  8706msec
        slat (usec): min=3, max=70, avg= 4.31, stdev= 1.56
        clat (usec): min=0, max=8967, avg=28.13, stdev=55.68
         lat (usec): min=22, max=8976, avg=32.53, stdev=55.72
    ...
        bw (KB  /s): min=118480, max=122880, per=100.00%, avg=120467.29, stdev=1525.68
    ...
    
    Run status group 0 (all jobs):
      WRITE: io=1024.0MB, aggrb=120442KB/s, minb=120442KB/s, maxb=120442KB/s, mint=8706msec, maxt=8706msec
    
    Disk stats (read/write):
    ...
      sda: ios=0/262147, merge=0/1, ticks=0/6576, in_queue=6568, util=74.32%
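  • Because libaio splits each I/O into a submit step and a reap step, slat (time spent submitting) and clat (time from submission to completion) become separately measurable. The sketch below shows one such cycle using the libaio userspace API; it is an illustration of the mechanism, not fio's implementation, and assumes the same test file and a single 4 KB direct write:
    /* One asynchronous write with libaio, mirroring -ioengine=libaio -iodepth=1.
       Illustration only, not fio's code.  Build: gcc aio_write.c -laio */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <libaio.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        void *buf;
        if (posix_memalign(&buf, 4096, 4096) != 0) return 1;    /* O_DIRECT alignment */
        memset(buf, 0xab, 4096);

        int fd = open("/tmp/test.db", O_WRONLY | O_CREAT | O_DIRECT, 0644);
        if (fd < 0) { perror("open"); return 1; }

        io_context_t ctx = 0;
        if (io_setup(1, &ctx) < 0) return 1;                    /* queue depth 1, like iodepth=1 */

        struct iocb cb, *cbs[1] = { &cb };
        io_prep_pwrite(&cb, fd, buf, 4096, 0);                  /* one 4k write at offset 0 */

        /* submission: this is the step slat measures */
        if (io_submit(ctx, 1, cbs) != 1) return 1;

        /* completion: waiting here corresponds to clat */
        struct io_event ev;
        if (io_getevents(ctx, 1, 1, &ev, NULL) != 1) return 1;

        io_destroy(ctx);
        close(fd);
        free(buf);
        return 0;
    }
  • An strace of such a program shows io_submit() and io_getevents() calls instead of plain write(), which is why submission latency can be reported separately from completion latency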
    
