Fixing Unavailable LXD Storage Pool Due to Renamed ZFS Dataset

2023-05-06 Technical Salty Fish

Today, when I used lxc launch to create a new container, I was greeted with this error message:

Error: Failed instance creation: Failed creating instance from image: Storage pool is unavailable on this server

I confirmed that my only storage pool, the default one, is indeed unavailable:

$ lxc storage list
+---------+--------+-------------------+-------------+---------+-------------+
|  NAME   | DRIVER |      SOURCE       | DESCRIPTION | USED BY |    STATE    |
+---------+--------+-------------------+-------------+---------+-------------+
| default | zfs    | zroot/var/lib/lxc |             | 6       | UNAVAILABLE |
+---------+--------+-------------------+-------------+---------+-------------+

Digression: Don't Encrypt Your Zroot

And I suddenly understood why. When I first set up my current Arch Linux installation, I was young and naive: I configured root on ZFS and enabled native ZFS encryption on the entire zpool. As a result, when I purchased a new laptop this year, I had some trouble migrating the OS. I thought that migration would be as easy as a zfs send -R -w zroot@migrate and a zfs recv zroot, but ZFS complained immediately:

cannot receive new filesystem stream: destination 'zroot' exists
must specify -F to overwrite it

Apparently I could not destroy the zpool and receive the snapshot into the void, nor could I use -F to overwrite zroot, because ZFS would complain again:

cannot receive new filesystem stream: zfs receive -F cannot be used to destroy an encrypted filesystem or overwrite an unencrypted one with an encrypted one

At that point I recalled hearing some wise people suggest that you should NEVER encrypt your zpool, and should instead create an encrypted dataset under your zpool to use as the "de facto zroot". I did not listen to these words, so here I am, regretting what I did. Please, if you are reading this article and considering setting up root on ZFS, take this advice and avoid making the same mistake.

The good news is that I could switch to the recommended setup easily, by recreating zroot as a new, unencrypted pool and receiving the original zroot into the encrypted dataset zroot/crypt. That was what I did, and as a result all my datasets now have an extra component in their paths: zroot/var/lib/lxc became zroot/crypt/var/lib/lxc.

The (Seemingly) Easy Fix

The renamed dataset explains why LXD reports the storage pool as unavailable. Fixing it should be easy: just run lxc storage edit default and you will see the configuration file:

### some comments, omitted

config:
  source: zroot/var/lib/lxc
  volatile.initial_source: zroot/var/lib/lxc
  zfs.pool_name: zroot/var/lib/lxc
description: ""
name: default
driver: zfs
used_by:
- /1.0/images/a88b48c87792028ccab2a65ae2a4a8eaf8d37100309031f5696e1bc7b24ebd57
- /1.0/instances/some_container
- /1.0/instances/some_other_container
- /1.0/profiles/default
status: Unavailable
locations:
- none

We should be done as soon as we replace zroot/var/lib/lxc with zroot/crypt/var/lib/lxc... or should we? Just as we save, LXD complains:

Config parsing error: Pool source cannot be changed when not in pending state
Press enter to open the editor again or ctrl+c to abort change

Huh. LXD does not allow us to change the source property of already created storage pools. What should we do now?

Working Around LXD's Restriction

Attempt #1 (Does NOT Work)

Remember that we are using ZFS, the best, most flexible modern filesystem (IMHO). I quickly came up with a workaround: creating a new LXD storage pool with the updated source. However, this again did not work:

$ lxc storage create default2 zfs source=zroot/crypt/var/lib/lxc
Error: Provided ZFS pool (or dataset) isn't empty, run "sudo zfs list zroot/crypt/var/lib/lxc" to see existing entries

Attempt #2 (Does NOT Work)

So LXD does not allow initializing a storage pool with existing data. Fine - I guess we have to do this the hard way:

  1. We will rename zroot/crypt/var/lib/lxc. I prefer to append an underscore.
  2. We will create a new LXD storage pool with the ZFS backend and source zroot/crypt/var/lib/lxc. Note that the specified ZFS dataset will be automatically created by LXD.
  3. We will overwrite zroot/crypt/var/lib/lxc with the contents of zroot/crypt/var/lib/lxc_, by taking a snapshot and doing a standard send/recv.
  4. We will edit the configuration of the new storage pool to resemble the old one.

The commands are fairly straightforward:

$ zfs rename zroot/crypt/var/lib/lxc zroot/crypt/var/lib/lxc_
$ zfs snapshot -r zroot/crypt/var/lib/lxc_@fix
$ lxc storage create default2 zfs source=zroot/crypt/var/lib/lxc
$ zfs destroy zroot/crypt/var/lib/lxc
$ zfs send -R -w zroot/crypt/var/lib/lxc_@fix | sudo zfs recv -F zroot/crypt/var/lib/lxc

Then, time to edit the configuration file:

$ lxc storage edit default2

I thought I would be able to save all the changes I made to the configuration file, but it turned out that LXD does not rename storage pools and simply ignores changes to name. At this point, starting a previously created container still does not work, because it still refers to the old pool:

Error: Storage pool "default" unavailable on this server

Attempt #3 (Works!)

LXD internally uses an SQLite database to store its configuration, so we can always fall back to tampering with its DB manually. This is a bit dangerous, but it seems we have to do it anyway...

The configuration of the ZFS-based storage pool is stored in the storage_pools_config table:

$ lxd sql global "SELECT * FROM storage_pools_config;"
+----+-----------------+---------+-------------------------+-------------------+
| id | storage_pool_id | node_id |           key           |       value       |
+----+-----------------+---------+-------------------------+-------------------+
| 6  | 2               | 1       | zfs.pool_name           | zroot/var/lib/lxc |
| 7  | 2               | 1       | source                  | zroot/var/lib/lxc |
| 8  | 2               | 1       | volatile.initial_source | zroot/var/lib/lxc |
+----+-----------------+---------+-------------------------+-------------------+

We just need to update these values!

$ lxd sql global "UPDATE storage_pools_config SET value='zroot/crypt/var/lib/lxc';"
Rows affected: 3

Note: If you have multiple storage pools configured, add a WHERE clause to restrict the update to the rows you actually want to change! I did not need one because default is my only storage pool.
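For setups with more than one pool, a narrower statement could be built along the following lines. This is only a sketch: the helper name build_update_sql is hypothetical, the pool id 2 is taken from the table output above, and the column names are the ones shown in that output. The helper only prints the SQL, so it can be reviewed before it is handed to lxd sql.

```shell
#!/bin/sh
# Hypothetical helper: build an UPDATE that only touches the three
# source-related keys of a single storage pool.
build_update_sql() {
    pool_id=$1      # storage_pool_id of the pool to fix, e.g. 2
    new_source=$2   # new dataset, e.g. zroot/crypt/var/lib/lxc
    # The values are pasted into the SQL text, so reject anything that
    # is not a plain number or that contains a single quote.
    case $pool_id in
        ''|*[!0-9]*) echo "pool id must be numeric" >&2; return 1 ;;
    esac
    case $new_source in
        ''|*\'*) echo "invalid dataset name" >&2; return 1 ;;
    esac
    printf "UPDATE storage_pools_config SET value='%s' WHERE storage_pool_id=%s AND key IN ('source','volatile.initial_source','zfs.pool_name');\n" \
        "$new_source" "$pool_id"
}

# Example (not executed here):
#   lxd sql global "SELECT id, name FROM storage_pools;"
#   lxd sql global "$(build_update_sql 2 zroot/crypt/var/lib/lxc)"
```

Looking up the id in storage_pools first avoids guessing which storage_pool_id belongs to which pool name.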

Within a minute, the storage pool should become available again:

$ lxc storage list
+---------+--------+-------------------------+-------------+---------+---------+
|  NAME   | DRIVER |         SOURCE          | DESCRIPTION | USED BY |  STATE  |
+---------+--------+-------------------------+-------------+---------+---------+
| default | zfs    | zroot/crypt/var/lib/lxc |             | 6       | CREATED |
+---------+--------+-------------------------+-------------+---------+---------+
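Rather than re-running lxc storage list by hand, the state could be polled with a small script. The sketch below assumes that lxc storage list --format csv prints the same columns as the table above, with the state in the last column; pool_state and wait_for_pool are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helper: print the STATE column for one storage pool.
pool_state() {
    lxc storage list --format csv |
        awk -F, -v pool="$1" '$1 == pool { print $NF }'
}

# Hypothetical helper: poll until the pool reports CREATED.
# Arguments: pool name, number of attempts (default 12),
# seconds between attempts (default 5).
wait_for_pool() {
    pool=$1
    tries=${2:-12}
    interval=${3:-5}
    while [ "$tries" -gt 0 ]; do
        [ "$(pool_state "$pool")" = "CREATED" ] && return 0
        tries=$((tries - 1))
        sleep "$interval"
    done
    return 1
}

# Example (not executed here):
#   wait_for_pool default && lxc start some_container
```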