KVM guests – compressed storage – enabling TRIM support on virtio disks.

TRIM is commonly used to notify an SSD drive that some of the data it holds is garbage and can be discarded; it extends SSD lifetime and releases otherwise reserved space on the disk. It turns out that’s not the only scenario where TRIM comes in handy. Compression is a popular way to save space when deploying virtual machines, and it’s an attractive option especially since the performance penalty is marginal.

But there’s one compression-specific issue some admins aren’t aware of or misunderstand. Left unaddressed, it can cause serious problems impacting all KVM guests on such storage. I will describe it in the example scenario below; note that it’s not a ZFS-specific issue.

Let’s say we:

  • defined ZFS pool ‘t1’:
hv:~# zpool create t1 mirror sda9 sdb9
hv:~# zfs get compression t1
NAME          PROPERTY     VALUE     SOURCE
t1            compression  off       default
  • created sparse (thin-provisioned) datasets in that pool, one per KVM guest:
hv:~# zfs create -o compression=on -V 100G -s t1/guest1
hv:~# zfs get compression t1/guest1
NAME          PROPERTY     VALUE     SOURCE
t1/guest1     compression  on        local
  • created a KVM guest which will use the compressed dataset we just made:
hv:~# virt-install --virt-type kvm --name C6-guest1 --ram 8096 --disk /dev/zvol/t1/guest1 \
    --network bridge=bridge4,model=virtio --graphics vnc,password=trivialpw,listen=0.0.0.0 \
    --noautoconsole --os-type=linux --os-variant=centos6.10 \
    --cdrom=/opt/ISOs/CentOS-6.9-x86_64-minimal.iso --vcpus=8


Initial space usage on the pool and inside the KVM guest:

hv:~# zfs list -r t1 ; zpool list t1
NAME        USED  AVAIL     REFER  MOUNTPOINT
t1          584M  27.5G       96K  /t1
t1/guest1   583M  27.5G      583M  -
NAME  SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP DEDUP    HEALTH  ALTROOT
t1    29G   584M  28.4G        -         -     0%     1%  1.00x    ONLINE  -

Inside the KVM guest:
[root@C6-guest1 ~]# df -h /
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1        99G  719M   93G   1% /


Now, any amount of data deleted inside the KVM guest (let it be 10GB of random data) immediately becomes free space available again to that KVM guest:

[root@C6-guest1 ~]# touch 10GB.dat ; shred -n1 -s10G 10GB.dat ; df -h /
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1        99G   11G   83G  12% /

[root@C6-guest1 ~]# rm -f 10GB.dat ; df -h /
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1        99G  719M   93G   1% /

Outside the guest, however, the dataset still shows either the same space usage as before the deletion, or a released amount much smaller than the size of the data deleted:

hv:~# zfs list -r t1 ; zpool list t1
NAME       USED  AVAIL     REFER  MOUNTPOINT
t1         10.7G  17.4G       96K  /t1
t1/guest1  10.7G  17.4G     10.7G  -
NAME  SIZE ALLOC  FREE   CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
t1    29G  10.7G  18.3G        -         -     0%    36%  1.00x    ONLINE  -

This is because the data wasn’t really deleted, only marked as available for overwriting whenever the OS needs to write again. As a result, there is now 10GB of garbage data on the VM’s storage. From the VM’s perspective there is no problem, as it perceives that garbage as 10GB of available space. Outside the VM, however, it is a problem: the ZFS pool with the compressed dataset has no idea it now holds 10GB of garbage, because from outside the KVM guest there is no way to tell which data is garbage and which isn’t. That’s 10GB less space available on the ZFS pool, affecting the remaining KVM guests.

That’s where TRIM would come in handy.

[root@C6-guest1 ~]# fstrim -v /
fstrim: /: FITRIM ioctl failed: Operation not supported

Bummer. In many cases the virtual disk controller defaults to virtio-blk (for example, the current version of Softaculous’s Virtualizor), which doesn’t support the discard feature (virtio-blk actually does support discard since kernel 5.0, but that kernel may not be an option for many reasons), so a workaround is needed.
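Whether a given virtual disk accepts discard at all can be checked from inside the guest before going further; lsblk reports the discard limits the kernel sees (this assumes a guest whose lsblk supports the --discard option; on very old guests the sysfs file /sys/block/<dev>/queue/discard_granularity carries the same information):

```shell
# Check which block devices advertise discard support. Non-zero
# DISC-GRAN/DISC-MAX values mean the device accepts TRIM requests;
# all zeros (typical for virtio-blk on pre-5.0 kernels) means fstrim
# will fail with "Operation not supported", as above.
lsblk --discard
```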

To release the 10GB of garbage data from the ZFS dataset without TRIM, the guest’s OS has to overwrite the garbage with zeros (or any other stream with a high compression ratio). The dataset can then compress 10GB of data made from a single repeated character, reducing it to – wild guess – a few bytes or kilobytes:

[root@C6-guest1 ~]# dd if=/dev/zero of=zero bs=4M ; rm -f zero
dd: writing `zero': No space left on device
24910+0 records in
24909+0 records out
104477106176 bytes (104 GB) copied, 801.62 s, 130 MB/s

After that the 10GB on the dataset has been reclaimed:

hv:~# zfs list -r t1 ; zpool list t1
NAME        USED  AVAIL     REFER  MOUNTPOINT
t1          559M  27.5G       96K  /t1
t1/guest1   558M  27.5G      558M  -
NAME  SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
t1    29G   561M  28.5G        -         -     0%     1%   1.00x    ONLINE  -

However TRIM is the preferred way.

For TRIM to work inside a KVM guest, the virtual disk must support the ‘discard’ feature. Here’s how to set that up:

1. Create a new file named new-virtio-scsi-ctl.xml containing the SCSI controller definition to which the SCSI drives will be attached; add the content below and save the file:

<controller type='scsi' model='virtio-scsi' index='5'/>

2. Define the new SCSI drive: create a new file new-virtio-scsi-drive.xml and fill it as below. Note that the ‘controller’ number must match the index of the controller we just defined above:

<disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none' discard='unmap'/>
      <source dev='/dev/sdb8'/>
      <target dev='sdb' bus='scsi'/>
      <address type='drive' controller='5' bus='0' target='0' unit='1'/>
</disk>
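Hand-written XML is easy to typo and virsh’s error messages can be terse, so it’s worth confirming the fragment is well-formed and actually carries discard='unmap' before attaching it. A quick sketch, assuming python3 is available on the hypervisor (the heredoc recreates the file from step 2; adapt the path if yours differs):

```shell
# Recreate the drive definition from step 2, then verify it parses and
# that discard='unmap' is present on the <driver> element.
cat > new-virtio-scsi-drive.xml <<'XML'
<disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none' discard='unmap'/>
      <source dev='/dev/sdb8'/>
      <target dev='sdb' bus='scsi'/>
      <address type='drive' controller='5' bus='0' target='0' unit='1'/>
</disk>
XML
python3 - <<'EOF'
import xml.etree.ElementTree as ET
disk = ET.parse("new-virtio-scsi-drive.xml").getroot()
print("discard =", disk.find("driver").get("discard"))
EOF
```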

3. Import both the controller and the new disk definition into our example KVM guest, named C6-guest-13:

# virsh list
 Id   Name             State
--------------------------------
 37   C6-guest-13     running

The controller:

# virsh attach-device 37 --config --live new-virtio-scsi-ctl.xml 
Device attached successfully

Inside the KVM guest, dmesg will show something like:

scsi host2: Virtio SCSI HBA

The drive:

# virsh attach-device 37 --config --live new-virtio-scsi-drive.xml
Device attached successfully

And the guest’s dmesg will show:

sd 2:0:0:1: [sdb] Attached SCSI disk

Let’s quickly test the drive for fstrim support. Execute the commands below inside the KVM guest (bc and lsscsi are needed):

# new virtio-scsi drive name as seen in the guest
drv=sdb
# create a ~5GB partition with fdisk (o = new DOS label, n/p/1 = new
# primary partition, last sector = 5GB worth of sectors), then format,
# mount, list SCSI devices and trim the new filesystem
p=$(echo -e "o\nn\np\n1\n\n$(bc <<< "5*1024^3/$(cat /sys/block/$drv/queue/hw_sector_size)")\np\nw" | \
    fdisk /dev/${drv} | grep ^/ | cut -d' ' -f1) && mkfs.ext4 ${p} 1>/dev/null
mount -v ${p} /mnt
lsscsi
fstrim -v /mnt

The output should look as below:

Building a new DOS disklabel with disk identifier 0xc58806bb.
mke2fs 1.42.9 (28-Dec-2013)
mount: /dev/sdb1 mounted on /mnt.
[0:0:0:0]    cd/dvd  QEMU     QEMU DVD-ROM     2.5+  /dev/sr0
[6:0:0:0]    disk    QEMU     QEMU HARDDISK    2.5+  /dev/sdb
/mnt: 4.8 GiB (5128200192 bytes) trimmed

That’s it.
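Once discard works, trimming can be scheduled so reclaimed space flows back to the pool regularly. A possible cron fragment – the schedule and mount points are just an illustration, and mount points are listed explicitly because the fstrim shipped with CentOS 6 predates the --all option:

```shell
# /etc/cron.d/fstrim — example: trim / and /mnt every Sunday at 03:00
0 3 * * 0 root /sbin/fstrim / ; /sbin/fstrim /mnt
```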

To convert an existing ‘virtio’ disk to ‘virtio-scsi’, just add a SCSI controller and adjust the existing disk definition to match the new controller, as in the example above.
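For instance, a disk originally attached as virtio-blk (target bus ‘virtio’, guest device vda) could be rewritten along these lines; the zvol path and controller index are taken from the examples above, so substitute your own:

```xml
<disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none' discard='unmap'/>
      <source dev='/dev/zvol/t1/guest1'/>
      <!-- was: <target dev='vda' bus='virtio'/> -->
      <target dev='sda' bus='scsi'/>
      <address type='drive' controller='5' bus='0' target='0' unit='0'/>
</disk>
```

Keep in mind the device name inside the guest changes (e.g. /dev/vda becomes /dev/sda), so the guest’s /etc/fstab and bootloader configuration may need updating before the switch.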
