Message ID | 85226cf68d7a72a034f0c0895b96b2557169755b.1698917826.git.anand.jain@oracle.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [RFC] btrfs: pick device with lowest devt for show_devname | expand |
On Thu, Nov 02, 2023 at 07:10:48PM +0800, Anand Jain wrote: > In a non-single-device Btrfs filesystem, if Btrfs is already mounted and > if you run the command 'mount -a,' it will fail and the command > 'umount <device>' also fails. As below: > > ---------------- > $ cat /etc/fstab | grep btrfs > UUID=12345678-1234-1234-1234-123456789abc /btrfs btrfs defaults,nofail 0 0 > > $ mkfs.btrfs -qf --uuid 12345678-1234-1234-1234-123456789abc /dev/sda2 /dev/sda1 > $ mount --verbose -a > / : ignored > /btrfs : successfully mounted > > $ ls -l /dev/disk/by-uuid | grep 12345678-1234-1234-1234-123456789abc > lrwxrwxrwx 1 root root 10 Nov 2 17:43 12345678-1234-1234-1234-123456789abc -> ../../sda1 > > $ cat /proc/self/mounts | grep btrfs > /dev/sda2 /btrfs btrfs rw,relatime,space_cache=v2,subvolid=5,subvol=/ 0 0 > > $ findmnt --df /btrfs > SOURCE FSTYPE SIZE USED AVAIL USE% TARGET > /dev/sda2 btrfs 2G 5.8M 1.8G 0% /btrfs > > $ mount --verbose -a > / : ignored > mount: /btrfs: /dev/sda1 already mounted or mount point busy. > $echo $? > 32 > > $ umount /dev/sda1 > umount: /dev/sda1: not mounted. > $ echo $? > 32 > ---------------- > > I assume (RFC) this is because '/dev/disk/by-uuid,' '/proc/self/mounts,' > and 'findmnt' do not all reference the same device, resulting in the > 'mount -a' and 'umount' failures. However, an empirically found solution > is to align them using a rule, such as the disk with the lowest 'devt,' > for a multi-device Btrfs filesystem. > > I'm not yet sure (RFC) how to create a udev rule to point to the disk with > the lowest 'devt,' as this kernel patch does, and I believe it is > possible. > > And this would ensure that '/proc/self/mounts,' 'findmnt,' and > '/dev/disk/by-uuid' all reference the same device. > > After applying this patch, the above test passes. Unfortunately, > /dev/disk/by-uuid also points to the lowest 'devt' by chance, even though > no rule has been set as of now. As shown below. Does this mean the devid of the device shown in /proc/self/mount won't be the lowest? Here the devid is the logical device number, while devt is some internal identifier or at least not something I'd consider a good identifier from user perspective. The lowest devid has been there for a long time so I'd consider this as behaviour change which can potentially break things.
On 11/3/23 04:26, David Sterba wrote: > On Thu, Nov 02, 2023 at 07:10:48PM +0800, Anand Jain wrote: >> In a non-single-device Btrfs filesystem, if Btrfs is already mounted and >> if you run the command 'mount -a,' it will fail and the command >> 'umount <device>' also fails. As below: >> >> ---------------- >> $ cat /etc/fstab | grep btrfs >> UUID=12345678-1234-1234-1234-123456789abc /btrfs btrfs defaults,nofail 0 0 >> >> $ mkfs.btrfs -qf --uuid 12345678-1234-1234-1234-123456789abc /dev/sda2 /dev/sda1 >> $ mount --verbose -a >> / : ignored >> /btrfs : successfully mounted >> >> $ ls -l /dev/disk/by-uuid | grep 12345678-1234-1234-1234-123456789abc >> lrwxrwxrwx 1 root root 10 Nov 2 17:43 12345678-1234-1234-1234-123456789abc -> ../../sda1 >> >> $ cat /proc/self/mounts | grep btrfs >> /dev/sda2 /btrfs btrfs rw,relatime,space_cache=v2,subvolid=5,subvol=/ 0 0 >> >> $ findmnt --df /btrfs >> SOURCE FSTYPE SIZE USED AVAIL USE% TARGET >> /dev/sda2 btrfs 2G 5.8M 1.8G 0% /btrfs >> >> $ mount --verbose -a >> / : ignored >> mount: /btrfs: /dev/sda1 already mounted or mount point busy. >> $echo $? >> 32 >> >> $ umount /dev/sda1 >> umount: /dev/sda1: not mounted. >> $ echo $? >> 32 >> ---------------- >> >> I assume (RFC) this is because '/dev/disk/by-uuid,' '/proc/self/mounts,' >> and 'findmnt' do not all reference the same device, resulting in the >> 'mount -a' and 'umount' failures. However, an empirically found solution >> is to align them using a rule, such as the disk with the lowest 'devt,' >> for a multi-device Btrfs filesystem. >> >> I'm not yet sure (RFC) how to create a udev rule to point to the disk with >> the lowest 'devt,' as this kernel patch does, and I believe it is >> possible. >> >> And this would ensure that '/proc/self/mounts,' 'findmnt,' and >> '/dev/disk/by-uuid' all reference the same device. >> >> After applying this patch, the above test passes. Unfortunately, >> /dev/disk/by-uuid also points to the lowest 'devt' by chance, even though >> no rule has been set as of now. As shown below. > > Does this mean the devid of the device shown in /proc/self/mount won't > be the lowest? Here the devid is the logical device number, while devt > is some internal identifier or at least not something I'd consider a > good identifier from user perspective. > > The lowest devid has been there for a long time so I'd consider this as > behaviour change which can potentially break things. It's not the lowest devid, but rather the latest_bdev since commit 6605fd2f394b ('btrfs: use latest_dev in btrfs_show_devname'). We need a rule for choosing a device in a multi-device filesystem that works both inside and outside the kernel. The major-minor (devt) is the only consistent option. OR Any other ideas? Thanks, Anand
On Fri, Nov 03, 2023 at 06:55:51AM +0800, Anand Jain wrote: > On 11/3/23 04:26, David Sterba wrote: > > On Thu, Nov 02, 2023 at 07:10:48PM +0800, Anand Jain wrote: > >> In a non-single-device Btrfs filesystem, if Btrfs is already mounted and > >> if you run the command 'mount -a,' it will fail and the command > >> 'umount <device>' also fails. As below: > >> > >> ---------------- > >> $ cat /etc/fstab | grep btrfs > >> UUID=12345678-1234-1234-1234-123456789abc /btrfs btrfs defaults,nofail 0 0 > >> > >> $ mkfs.btrfs -qf --uuid 12345678-1234-1234-1234-123456789abc /dev/sda2 /dev/sda1 > >> $ mount --verbose -a > >> / : ignored > >> /btrfs : successfully mounted > >> > >> $ ls -l /dev/disk/by-uuid | grep 12345678-1234-1234-1234-123456789abc > >> lrwxrwxrwx 1 root root 10 Nov 2 17:43 12345678-1234-1234-1234-123456789abc -> ../../sda1 > >> > >> $ cat /proc/self/mounts | grep btrfs > >> /dev/sda2 /btrfs btrfs rw,relatime,space_cache=v2,subvolid=5,subvol=/ 0 0 > >> > >> $ findmnt --df /btrfs > >> SOURCE FSTYPE SIZE USED AVAIL USE% TARGET > >> /dev/sda2 btrfs 2G 5.8M 1.8G 0% /btrfs > >> > >> $ mount --verbose -a > >> / : ignored > >> mount: /btrfs: /dev/sda1 already mounted or mount point busy. > >> $echo $? > >> 32 > >> > >> $ umount /dev/sda1 > >> umount: /dev/sda1: not mounted. > >> $ echo $? > >> 32 > >> ---------------- > >> > >> I assume (RFC) this is because '/dev/disk/by-uuid,' '/proc/self/mounts,' > >> and 'findmnt' do not all reference the same device, resulting in the > >> 'mount -a' and 'umount' failures. However, an empirically found solution > >> is to align them using a rule, such as the disk with the lowest 'devt,' > >> for a multi-device Btrfs filesystem. > >> > >> I'm not yet sure (RFC) how to create a udev rule to point to the disk with > >> the lowest 'devt,' as this kernel patch does, and I believe it is > >> possible. > >> > >> And this would ensure that '/proc/self/mounts,' 'findmnt,' and > >> '/dev/disk/by-uuid' all reference the same device. > >> > >> After applying this patch, the above test passes. Unfortunately, > >> /dev/disk/by-uuid also points to the lowest 'devt' by chance, even though > >> no rule has been set as of now. As shown below. > > > > Does this mean the devid of the device shown in /proc/self/mount won't > > be the lowest? Here the devid is the logical device number, while devt > > is some internal identifier or at least not something I'd consider a > > good identifier from user perspective. > > > > The lowest devid has been there for a long time so I'd consider this as > > behaviour change which can potentially break things. > > It's not the lowest devid, but rather the latest_bdev since commit > 6605fd2f394b ('btrfs: use latest_dev in btrfs_show_devname'). > > We need a rule for choosing a device in a multi-device filesystem that > works both inside and outside the kernel. The major-minor (devt) is the > only consistent option. I'm not sure how users interpret the device path in the proc/mounts output so changing that can potentially break something. OTOH if the device hasn't been always the lowest one (i.e. no assumptions because of the mentioned commit), then we can use the devt.
On Thu, Nov 02, 2023 at 07:10:48PM +0800, Anand Jain wrote: > In a non-single-device Btrfs filesystem, if Btrfs is already mounted and > if you run the command 'mount -a,' it will fail and the command > 'umount <device>' also fails. As below: > > ---------------- > $ cat /etc/fstab | grep btrfs > UUID=12345678-1234-1234-1234-123456789abc /btrfs btrfs defaults,nofail 0 0 > > $ mkfs.btrfs -qf --uuid 12345678-1234-1234-1234-123456789abc /dev/sda2 /dev/sda1 > $ mount --verbose -a > / : ignored > /btrfs : successfully mounted > > $ ls -l /dev/disk/by-uuid | grep 12345678-1234-1234-1234-123456789abc > lrwxrwxrwx 1 root root 10 Nov 2 17:43 12345678-1234-1234-1234-123456789abc -> ../../sda1 > > $ cat /proc/self/mounts | grep btrfs > /dev/sda2 /btrfs btrfs rw,relatime,space_cache=v2,subvolid=5,subvol=/ 0 0 > > $ findmnt --df /btrfs > SOURCE FSTYPE SIZE USED AVAIL USE% TARGET > /dev/sda2 btrfs 2G 5.8M 1.8G 0% /btrfs > > $ mount --verbose -a > / : ignored > mount: /btrfs: /dev/sda1 already mounted or mount point busy. > $echo $? > 32 > > $ umount /dev/sda1 > umount: /dev/sda1: not mounted. > $ echo $? > 32 > ---------------- > > I assume (RFC) this is because '/dev/disk/by-uuid,' '/proc/self/mounts,' > and 'findmnt' do not all reference the same device, resulting in the > 'mount -a' and 'umount' failures. However, an empirically found solution > is to align them using a rule, such as the disk with the lowest 'devt,' > for a multi-device Btrfs filesystem. > > I'm not yet sure (RFC) how to create a udev rule to point to the disk with > the lowest 'devt,' as this kernel patch does, and I believe it is > possible. > > And this would ensure that '/proc/self/mounts,' 'findmnt,' and > '/dev/disk/by-uuid' all reference the same device. > > After applying this patch, the above test passes. Unfortunately, > /dev/disk/by-uuid also points to the lowest 'devt' by chance, even though > no rule has been set as of now. As shown below. > > ---------------- > $ mkfs.btrfs -qf --uuid 12345678-1234-1234-1234-123456789abc /dev/sda2 /dev/sda1 > > $ mount --verbose -a > / : ignored > /btrfs : successfully mounted > > $ ls -l /dev/disk/by-uuid | grep 12345678-1234-1234-1234-123456789abc > lrwxrwxrwx 1 root root 10 Nov 2 17:53 12345678-1234-1234-1234-123456789abc -> ../../sda1 > > $ cat /proc/self/mounts | grep btrfs > /dev/sda1 /btrfs btrfs rw,relatime,space_cache=v2,subvolid=5,subvol=/ 0 0 > > $ findmnt --df /btrfs > SOURCE FSTYPE SIZE USED AVAIL USE% TARGET > /dev/sda1 btrfs 2G 5.8M 1.8G 0% /btrfs > > $ mount --verbose -a > / : ignored > /btrfs : already mounted > $echo $? > 0 > > $ umount /dev/sda1 > $echo $? > 0 > ---------------- > > Signed-off-by: Anand Jain <anand.jain@oracle.com> > --- > fs/btrfs/super.c | 15 +++++++++++++-- > 1 file changed, 13 insertions(+), 2 deletions(-) > > diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c > index 66bdb6fd83bd..d768917cc5cc 100644 > --- a/fs/btrfs/super.c > +++ b/fs/btrfs/super.c > @@ -2301,7 +2301,18 @@ static int btrfs_unfreeze(struct super_block *sb) > > static int btrfs_show_devname(struct seq_file *m, struct dentry *root) > { > - struct btrfs_fs_info *fs_info = btrfs_sb(root->d_sb); > + struct btrfs_fs_devices *fs_devices = btrfs_sb(root->d_sb)->fs_devices; > + struct btrfs_device *device; > + struct btrfs_device *first_device = NULL; > + > + list_for_each_entry(device, &fs_devices->devices, dev_list) { > + if (first_device == NULL) { > + first_device = device; > + continue; > + } > + if (first_device->devt > device->devt) > + first_device = device; > + } I think we agree in principle that the devt can be used to determine the device to print in show_devname, however I'd like to avoid iterating the device list here. We used to have it, first with mutex protection, then RCU. That we can simply print the latest_dev is quite convenient. I suggest to either add a separate fs_info variable to set the desired device and only print it here, or reuse latest_dev for that. Adding a separate variable for that should be safer as latest_dev is used in various way and I can't say it's always clear.
On 11/25/23 00:19, David Sterba wrote: > On Thu, Nov 02, 2023 at 07:10:48PM +0800, Anand Jain wrote: >> In a non-single-device Btrfs filesystem, if Btrfs is already mounted and >> if you run the command 'mount -a,' it will fail and the command >> 'umount <device>' also fails. As below: >> >> ---------------- >> $ cat /etc/fstab | grep btrfs >> UUID=12345678-1234-1234-1234-123456789abc /btrfs btrfs defaults,nofail 0 0 >> >> $ mkfs.btrfs -qf --uuid 12345678-1234-1234-1234-123456789abc /dev/sda2 /dev/sda1 >> $ mount --verbose -a >> / : ignored >> /btrfs : successfully mounted >> >> $ ls -l /dev/disk/by-uuid | grep 12345678-1234-1234-1234-123456789abc >> lrwxrwxrwx 1 root root 10 Nov 2 17:43 12345678-1234-1234-1234-123456789abc -> ../../sda1 >> >> $ cat /proc/self/mounts | grep btrfs >> /dev/sda2 /btrfs btrfs rw,relatime,space_cache=v2,subvolid=5,subvol=/ 0 0 >> >> $ findmnt --df /btrfs >> SOURCE FSTYPE SIZE USED AVAIL USE% TARGET >> /dev/sda2 btrfs 2G 5.8M 1.8G 0% /btrfs >> >> $ mount --verbose -a >> / : ignored >> mount: /btrfs: /dev/sda1 already mounted or mount point busy. >> $echo $? >> 32 >> >> $ umount /dev/sda1 >> umount: /dev/sda1: not mounted. >> $ echo $? >> 32 >> ---------------- >> >> I assume (RFC) this is because '/dev/disk/by-uuid,' '/proc/self/mounts,' >> and 'findmnt' do not all reference the same device, resulting in the >> 'mount -a' and 'umount' failures. However, an empirically found solution >> is to align them using a rule, such as the disk with the lowest 'devt,' >> for a multi-device Btrfs filesystem. >> >> I'm not yet sure (RFC) how to create a udev rule to point to the disk with >> the lowest 'devt,' as this kernel patch does, and I believe it is >> possible. >> >> And this would ensure that '/proc/self/mounts,' 'findmnt,' and >> '/dev/disk/by-uuid' all reference the same device. >> >> After applying this patch, the above test passes. Unfortunately, >> /dev/disk/by-uuid also points to the lowest 'devt' by chance, even though >> no rule has been set as of now. As shown below. >> >> ---------------- >> $ mkfs.btrfs -qf --uuid 12345678-1234-1234-1234-123456789abc /dev/sda2 /dev/sda1 >> >> $ mount --verbose -a >> / : ignored >> /btrfs : successfully mounted >> >> $ ls -l /dev/disk/by-uuid | grep 12345678-1234-1234-1234-123456789abc >> lrwxrwxrwx 1 root root 10 Nov 2 17:53 12345678-1234-1234-1234-123456789abc -> ../../sda1 >> >> $ cat /proc/self/mounts | grep btrfs >> /dev/sda1 /btrfs btrfs rw,relatime,space_cache=v2,subvolid=5,subvol=/ 0 0 >> >> $ findmnt --df /btrfs >> SOURCE FSTYPE SIZE USED AVAIL USE% TARGET >> /dev/sda1 btrfs 2G 5.8M 1.8G 0% /btrfs >> >> $ mount --verbose -a >> / : ignored >> /btrfs : already mounted >> $echo $? >> 0 >> >> $ umount /dev/sda1 >> $echo $? >> 0 >> ---------------- >> >> Signed-off-by: Anand Jain <anand.jain@oracle.com> >> --- >> fs/btrfs/super.c | 15 +++++++++++++-- >> 1 file changed, 13 insertions(+), 2 deletions(-) >> >> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c >> index 66bdb6fd83bd..d768917cc5cc 100644 >> --- a/fs/btrfs/super.c >> +++ b/fs/btrfs/super.c >> @@ -2301,7 +2301,18 @@ static int btrfs_unfreeze(struct super_block *sb) >> >> static int btrfs_show_devname(struct seq_file *m, struct dentry *root) >> { >> - struct btrfs_fs_info *fs_info = btrfs_sb(root->d_sb); >> + struct btrfs_fs_devices *fs_devices = btrfs_sb(root->d_sb)->fs_devices; >> + struct btrfs_device *device; >> + struct btrfs_device *first_device = NULL; >> + >> + list_for_each_entry(device, &fs_devices->devices, dev_list) { >> + if (first_device == NULL) { >> + first_device = device; >> + continue; >> + } >> + if (first_device->devt > device->devt) >> + first_device = device; >> + } > > I think we agree in principle that the devt can be used to determine the > device to print in show_devname, however I'd like to avoid iterating the > device list here. We used to have it, first with mutex protection, then > RCU. That we can simply print the latest_dev is quite convenient. > > I suggest to either add a separate fs_info variable to set the desired > device and only print it here, or reuse latest_dev for that. > I got it. Thx. However, I am still thinking about the fix. more below. > Adding a separate variable for that should be safer as latest_dev is > used in various way and I can't say it's always clear. Libmount and libblkid use the device path from the mount table and udev, respectively. These output gets string-matched during 'umount <dev>', 'findmnt', and 'mount -a'. However, in the case of Btrfs, there may be more than one device per FSID, and there is neither a master device nor a consolidating pseudo device for public relations, similar to LVM. A rule, such as selecting the device with the lowest maj:min, works if it is implemented in both systemd and btrfs.ko. But, this approach does not resolve the 'umount' problem, such as 'umount <device-which-mount-table-did-not-show>'. I am skeptical about whether we have a strong case to create a single pseudo device per multi-device Btrfs filesystem, such as, for example '/dev/btrfs/<fsid>-<random>/rootid=5' and which means pseudo device will carry the btrfs-magic and the actual blk devices something else. OR for now, regarding the umount issue mentioned above, we just can document it for the users to be aware of. Any feedback is greatly appreciated. Thanks.
On 11/25/23 09:09, Anand Jain wrote: > > > On 11/25/23 00:19, David Sterba wrote: >> On Thu, Nov 02, 2023 at 07:10:48PM +0800, Anand Jain wrote: >>> In a non-single-device Btrfs filesystem, if Btrfs is already mounted and >>> if you run the command 'mount -a,' it will fail and the command >>> 'umount <device>' also fails. As below: >>> >>> ---------------- >>> $ cat /etc/fstab | grep btrfs >>> UUID=12345678-1234-1234-1234-123456789abc /btrfs btrfs >>> defaults,nofail 0 0 >>> >>> $ mkfs.btrfs -qf --uuid 12345678-1234-1234-1234-123456789abc >>> /dev/sda2 /dev/sda1 >>> $ mount --verbose -a >>> / : ignored >>> /btrfs : successfully mounted >>> >>> $ ls -l /dev/disk/by-uuid | grep 12345678-1234-1234-1234-123456789abc >>> lrwxrwxrwx 1 root root 10 Nov 2 17:43 >>> 12345678-1234-1234-1234-123456789abc -> ../../sda1 >>> >>> $ cat /proc/self/mounts | grep btrfs >>> /dev/sda2 /btrfs btrfs rw,relatime,space_cache=v2,subvolid=5,subvol=/ >>> 0 0 >>> >>> $ findmnt --df /btrfs >>> SOURCE FSTYPE SIZE USED AVAIL USE% TARGET >>> /dev/sda2 btrfs 2G 5.8M 1.8G 0% /btrfs >>> >>> $ mount --verbose -a >>> / : ignored >>> mount: /btrfs: /dev/sda1 already mounted or mount point busy. >>> $echo $? >>> 32 >>> >>> $ umount /dev/sda1 >>> umount: /dev/sda1: not mounted. >>> $ echo $? >>> 32 >>> ---------------- >>> >>> I assume (RFC) this is because '/dev/disk/by-uuid,' '/proc/self/mounts,' >>> and 'findmnt' do not all reference the same device, resulting in the >>> 'mount -a' and 'umount' failures. However, an empirically found solution >>> is to align them using a rule, such as the disk with the lowest 'devt,' >>> for a multi-device Btrfs filesystem. >>> >>> I'm not yet sure (RFC) how to create a udev rule to point to the disk >>> with >>> the lowest 'devt,' as this kernel patch does, and I believe it is >>> possible. >>> >>> And this would ensure that '/proc/self/mounts,' 'findmnt,' and >>> '/dev/disk/by-uuid' all reference the same device. >>> >>> After applying this patch, the above test passes. Unfortunately, >>> /dev/disk/by-uuid also points to the lowest 'devt' by chance, even >>> though >>> no rule has been set as of now. As shown below. >>> >>> ---------------- >>> $ mkfs.btrfs -qf --uuid 12345678-1234-1234-1234-123456789abc >>> /dev/sda2 /dev/sda1 >>> >>> $ mount --verbose -a >>> / : ignored >>> /btrfs : successfully mounted >>> >>> $ ls -l /dev/disk/by-uuid | grep 12345678-1234-1234-1234-123456789abc >>> lrwxrwxrwx 1 root root 10 Nov 2 17:53 >>> 12345678-1234-1234-1234-123456789abc -> ../../sda1 >>> >>> $ cat /proc/self/mounts | grep btrfs >>> /dev/sda1 /btrfs btrfs rw,relatime,space_cache=v2,subvolid=5,subvol=/ >>> 0 0 >>> >>> $ findmnt --df /btrfs >>> SOURCE FSTYPE SIZE USED AVAIL USE% TARGET >>> /dev/sda1 btrfs 2G 5.8M 1.8G 0% /btrfs >>> >>> $ mount --verbose -a >>> / : ignored >>> /btrfs : already mounted >>> $echo $? >>> 0 >>> >>> $ umount /dev/sda1 >>> $echo $? >>> 0 >>> ---------------- >>> >>> Signed-off-by: Anand Jain <anand.jain@oracle.com> >>> --- >>> fs/btrfs/super.c | 15 +++++++++++++-- >>> 1 file changed, 13 insertions(+), 2 deletions(-) >>> >>> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c >>> index 66bdb6fd83bd..d768917cc5cc 100644 >>> --- a/fs/btrfs/super.c >>> +++ b/fs/btrfs/super.c >>> @@ -2301,7 +2301,18 @@ static int btrfs_unfreeze(struct super_block *sb) >>> static int btrfs_show_devname(struct seq_file *m, struct dentry *root) >>> { >>> - struct btrfs_fs_info *fs_info = btrfs_sb(root->d_sb); >>> + struct btrfs_fs_devices *fs_devices = >>> btrfs_sb(root->d_sb)->fs_devices; >>> + struct btrfs_device *device; >>> + struct btrfs_device *first_device = NULL; >>> + >>> + list_for_each_entry(device, &fs_devices->devices, dev_list) { >>> + if (first_device == NULL) { >>> + first_device = device; >>> + continue; >>> + } >>> + if (first_device->devt > device->devt) >>> + first_device = device; >>> + } >> >> I think we agree in principle that the devt can be used to determine the >> device to print in show_devname, however I'd like to avoid iterating the >> device list here. We used to have it, first with mutex protection, then >> RCU. That we can simply print the latest_dev is quite convenient. >> >> I suggest to either add a separate fs_info variable to set the desired >> device and only print it here, or reuse latest_dev for that. >> > > I got it. Thx. However, I am still thinking about the fix. more below. > >> Adding a separate variable for that should be safer as latest_dev is >> used in various way and I can't say it's always clear. > > > Libmount and libblkid use the device path from the mount table and udev, > respectively. These output gets string-matched during 'umount <dev>', > 'findmnt', and 'mount -a'. However, in the case of Btrfs, there may be > more than one device per FSID, and there is neither a master device nor > a consolidating pseudo device for public relations, similar to LVM. > > A rule, such as selecting the device with the lowest maj:min, works if > it is implemented in both systemd and btrfs.ko. But, this approach does > not resolve the 'umount' problem, such as > 'umount <device-which-mount-table-did-not-show>'. > > I am skeptical about whether we have a strong case to create a single > pseudo device per multi-device Btrfs filesystem, such as, for example > '/dev/btrfs/<fsid>-<random>/rootid=5' and which means pseudo device > will carry the btrfs-magic and the actual blk devices something else. > > OR for now, regarding the umount issue mentioned above, we just can > document it for the users to be aware of. > > Any feedback is greatly appreciated. > How about if we display the devices list in the options, so that user-land libs have something in the mount-table that tells all the devices part of the fsid? For example: $ cat /proc/self/mounts | grep btrfs /dev/sda1 /btrfs btrfs rw,relatime,space_cache=v2,subvolid=5,subvol=/,device=/dev/sda2,device=/dev/sdb3 0 0 Thanks.
On 27/11/2023 12.48, Anand Jain wrote: > > > On 11/25/23 09:09, Anand Jain wrote: [...] >> I am skeptical about whether we have a strong case to create a single >> pseudo device per multi-device Btrfs filesystem, such as, for example >> '/dev/btrfs/<fsid>-<random>/rootid=5' and which means pseudo device >> will carry the btrfs-magic and the actual blk devices something else. >> >> OR for now, regarding the umount issue mentioned above, we just can >> document it for the users to be aware of. >> >> Any feedback is greatly appreciated. >> > > How about if we display the devices list in the options, so that > user-land libs have something in the mount-table that tells all > the devices part of the fsid? > > For example: > $ cat /proc/self/mounts | grep btrfs > > /dev/sda1 /btrfs btrfs rw,relatime,space_cache=v2,subvolid=5,subvol=/,device=/dev/sda2,device=/dev/sdb3 0 0 > When I developed code to find a btrfs mount point from a disk, I had to consider all the devices involved and check if one is in /proc/self/mounts. Putting the devices list as device=<xxx>,device=<yyy> doesn't change anything because the code has to manage a btrfs filesistem as "special" in any case. To get the map <btrfs-uuid> <-> <devices-list> I used libblkid. I think that a "saner" way to manage this issue, is to patch "mount" to consider the special needing of btrfs. Pay attention to consider also events like, removing a device, adding a device: after these events how /dev/disk/by-uuid/ would be updated ? What about bcachefs ? Does it have the same issue ? If yes this may be a further reason to patch "mount" instead relying to a rule (pick the lowest devt) spread for all the projects (systemd, mount...). > Thanks. >
On 28/11/2023 16:00, Goffredo Baroncelli wrote: > On 27/11/2023 12.48, Anand Jain wrote: >> >> >> On 11/25/23 09:09, Anand Jain wrote: > [...] >>> I am skeptical about whether we have a strong case to create a single >>> pseudo device per multi-device Btrfs filesystem, such as, for example >>> '/dev/btrfs/<fsid>-<random>/rootid=5' and which means pseudo device >>> will carry the btrfs-magic and the actual blk devices something else. >>> >>> OR for now, regarding the umount issue mentioned above, we just can >>> document it for the users to be aware of. >>> >>> Any feedback is greatly appreciated. >>> >> >> How about if we display the devices list in the options, so that >> user-land libs have something in the mount-table that tells all >> the devices part of the fsid? >> >> For example: >> $ cat /proc/self/mounts | grep btrfs >> >> /dev/sda1 /btrfs btrfs >> rw,relatime,space_cache=v2,subvolid=5,subvol=/,device=/dev/sda2,device=/dev/sdb3 0 0 >> > > When I developed code to find a btrfs mount point from a disk, I had to > consider all the devices involved and check if one is in /proc/self/mounts. > > Putting the devices list as device=<xxx>,device=<yyy> doesn't change > anything because > the code has to manage a btrfs filesistem as "special" in any case. > To get the map <btrfs-uuid> <-> <devices-list> I used libblkid. > > I think that a "saner" way to manage this issue, is to patch "mount" to > consider the special needing of btrfs. > > Pay attention to consider also events like, removing a device, adding a > device: > after these events how /dev/disk/by-uuid/ would be updated ? > > What about bcachefs ? Does it have the same issue ? If yes this may be > a further reason to patch "mount" instead relying to a rule (pick the > lowest devt) spread for all the projects (systemd, mount...). The display-devices-list in /proc/self/mounts method simplifies device listing, bypassing the need for sysfs or Btrfs-progs and it replaces the lowest-devt approach proposed earlier. And, yeah, all multi-device filesystems would need a special case handing in libmount. Udev's /dev/disk/by-uuid, gets updated upon an (over)write sb event. I don't know its updater, it is not in util-linux, I find no rules in /etc/udev either. Regarding libblkid for Btrfs device discovery, I m little confused what are you referring to, an example would be helpful. Thanks.
On Wed, Nov 29, 2023 at 2:28 AM Anand Jain <anand.jain@oracle.com> wrote: > > > The display-devices-list in /proc/self/mounts method simplifies device > listing, bypassing the need for sysfs or Btrfs-progs and it replaces the > lowest-devt approach proposed earlier. And, yeah, all multi-device > filesystems would need a special case handing in libmount. > > Udev's /dev/disk/by-uuid, gets updated upon an (over)write sb event. I > don't know its updater, it is not in util-linux, I find no rules in Today it is 60-persistent-storage.rules which is part of systemd. Currently the logic makes the last device claiming the same symlink win. There was PR to keep the first created symlink, but it caused some regression test failure and was reverted. > /etc/udev either. > Packages install rules in /usr/lib/udev (may be /lib/udev on older distributions).
On 29/11/2023 00.28, Anand Jain wrote: > > > On 28/11/2023 16:00, Goffredo Baroncelli wrote: >> On 27/11/2023 12.48, Anand Jain wrote: >>> >>> >>> On 11/25/23 09:09, Anand Jain wrote: >> [...] >>>> I am skeptical about whether we have a strong case to create a single >>>> pseudo device per multi-device Btrfs filesystem, such as, for example >>>> '/dev/btrfs/<fsid>-<random>/rootid=5' and which means pseudo device >>>> will carry the btrfs-magic and the actual blk devices something else. >>>> >>>> OR for now, regarding the umount issue mentioned above, we just can >>>> document it for the users to be aware of. >>>> >>>> Any feedback is greatly appreciated. >>>> >>> >>> How about if we display the devices list in the options, so that >>> user-land libs have something in the mount-table that tells all >>> the devices part of the fsid? >>> >>> For example: >>> $ cat /proc/self/mounts | grep btrfs >>> >>> /dev/sda1 /btrfs btrfs rw,relatime,space_cache=v2,subvolid=5,subvol=/,device=/dev/sda2,device=/dev/sdb3 0 0 >>> >> >> When I developed code to find a btrfs mount point from a disk, I had to >> consider all the devices involved and check if one is in /proc/self/mounts. >> >> Putting the devices list as device=<xxx>,device=<yyy> doesn't change anything because >> the code has to manage a btrfs filesistem as "special" in any case. >> To get the map <btrfs-uuid> <-> <devices-list> I used libblkid. [...] > > Regarding libblkid for Btrfs device discovery, I m little confused what are you referring to, an example would be helpful. I developed a little utility to build for each btrfs filesystem: - all the devices involved - all the mountpoint (if any) where the filesystem is mounted and the subvolume used as root. It was nice because it got all these information only using: - libblkid - parsing /proc/self/mountinfo > > Thanks.
On 02/11/2023 12.10, Anand Jain wrote: > In a non-single-device Btrfs filesystem, if Btrfs is already mounted and > if you run the command 'mount -a,' it will fail and the command > 'umount <device>' also fails. As below: > > ---------------- > $ cat /etc/fstab | grep btrfs > UUID=12345678-1234-1234-1234-123456789abc /btrfs btrfs defaults,nofail 0 0 > > $ mkfs.btrfs -qf --uuid 12345678-1234-1234-1234-123456789abc /dev/sda2 /dev/sda1 > $ mount --verbose -a > / : ignored > /btrfs : successfully mounted > > $ ls -l /dev/disk/by-uuid | grep 12345678-1234-1234-1234-123456789abc > lrwxrwxrwx 1 root root 10 Nov 2 17:43 12345678-1234-1234-1234-123456789abc -> ../../sda1 > > $ cat /proc/self/mounts | grep btrfs > /dev/sda2 /btrfs btrfs rw,relatime,space_cache=v2,subvolid=5,subvol=/ 0 0 > > $ findmnt --df /btrfs > SOURCE FSTYPE SIZE USED AVAIL USE% TARGET > /dev/sda2 btrfs 2G 5.8M 1.8G 0% /btrfs > > $ mount --verbose -a > / : ignored > mount: /btrfs: /dev/sda1 already mounted or mount point busy. > $echo $? > 32 > > $ umount /dev/sda1 > umount: /dev/sda1: not mounted. > $ echo $? > 32 > ---------------- > > I assume (RFC) this is because '/dev/disk/by-uuid,' '/proc/self/mounts,' > and 'findmnt' do not all reference the same device, resulting in the > 'mount -a' and 'umount' failures. However, an empirically found solution > is to align them using a rule, such as the disk with the lowest 'devt,' > for a multi-device Btrfs filesystem. What about creating dedicated helper like mount.btrfs and umount.btrfs that can manage all these kind of details ? mount.btrfs may also help to have a better understand about problem like a device missing. > > I'm not yet sure (RFC) how to create a udev rule to point to the disk with > the lowest 'devt,' as this kernel patch does, and I believe it is > possible. > > And this would ensure that '/proc/self/mounts,' 'findmnt,' and > '/dev/disk/by-uuid' all reference the same device. > > After applying this patch, the above test passes. Unfortunately, > /dev/disk/by-uuid also points to the lowest 'devt' by chance, even though > no rule has been set as of now. As shown below. > > ---------------- > $ mkfs.btrfs -qf --uuid 12345678-1234-1234-1234-123456789abc /dev/sda2 /dev/sda1 > > $ mount --verbose -a > / : ignored > /btrfs : successfully mounted > > $ ls -l /dev/disk/by-uuid | grep 12345678-1234-1234-1234-123456789abc > lrwxrwxrwx 1 root root 10 Nov 2 17:53 12345678-1234-1234-1234-123456789abc -> ../../sda1 > > $ cat /proc/self/mounts | grep btrfs > /dev/sda1 /btrfs btrfs rw,relatime,space_cache=v2,subvolid=5,subvol=/ 0 0 > > $ findmnt --df /btrfs > SOURCE FSTYPE SIZE USED AVAIL USE% TARGET > /dev/sda1 btrfs 2G 5.8M 1.8G 0% /btrfs > > $ mount --verbose -a > / : ignored > /btrfs : already mounted > $echo $? > 0 > > $ umount /dev/sda1 > $echo $? > 0 > ---------------- > > Signed-off-by: Anand Jain <anand.jain@oracle.com> > --- > fs/btrfs/super.c | 15 +++++++++++++-- > 1 file changed, 13 insertions(+), 2 deletions(-) > > diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c > index 66bdb6fd83bd..d768917cc5cc 100644 > --- a/fs/btrfs/super.c > +++ b/fs/btrfs/super.c > @@ -2301,7 +2301,18 @@ static int btrfs_unfreeze(struct super_block *sb) > > static int btrfs_show_devname(struct seq_file *m, struct dentry *root) > { > - struct btrfs_fs_info *fs_info = btrfs_sb(root->d_sb); > + struct btrfs_fs_devices *fs_devices = btrfs_sb(root->d_sb)->fs_devices; > + struct btrfs_device *device; > + struct btrfs_device *first_device = NULL; > + > + list_for_each_entry(device, &fs_devices->devices, dev_list) { > + if (first_device == NULL) { > + first_device = device; > + continue; > + } > + if (first_device->devt > device->devt) > + first_device = device; > + } > > /* > * There should be always a valid pointer in latest_dev, it may be stale > @@ -2309,7 +2320,7 @@ static int btrfs_show_devname(struct seq_file *m, struct dentry *root) > * the end of RCU grace period. > */ > rcu_read_lock(); > - seq_escape(m, btrfs_dev_name(fs_info->fs_devices->latest_dev), " \t\n\\"); > + seq_escape(m, rcu_str_deref(first_device->name), " \t\n\\"); > rcu_read_unlock(); > > return 0;
On Tue, Nov 28, 2023 at 09:00:06AM +0100, Goffredo Baroncelli wrote: > On 27/11/2023 12.48, Anand Jain wrote: > > > > > > On 11/25/23 09:09, Anand Jain wrote: > [...] > >> I am skeptical about whether we have a strong case to create a single > >> pseudo device per multi-device Btrfs filesystem, such as, for example > >> '/dev/btrfs/<fsid>-<random>/rootid=5' and which means pseudo device > >> will carry the btrfs-magic and the actual blk devices something else. > >> > >> OR for now, regarding the umount issue mentioned above, we just can > >> document it for the users to be aware of. > >> > >> Any feedback is greatly appreciated. > >> > > > > How about if we display the devices list in the options, so that > > user-land libs have something in the mount-table that tells all > > the devices part of the fsid? > > > > For example: > > $ cat /proc/self/mounts | grep btrfs > > > > /dev/sda1 /btrfs btrfs rw,relatime,space_cache=v2,subvolid=5,subvol=/,device=/dev/sda2,device=/dev/sdb3 0 0 > > > > When I developed code to find a btrfs mount point from a disk, I had to > consider all the devices involved and check if one is in /proc/self/mounts. > > Putting the devices list as device=<xxx>,device=<yyy> doesn't change anything because > the code has to manage a btrfs filesistem as "special" in any case. > To get the map <btrfs-uuid> <-> <devices-list> I used libblkid. > > I think that a "saner" way to manage this issue, is to patch "mount" to > consider the special needing of btrfs. This sounds like a good option although it needs a special case, but as you say parsing the devices in the mounts list would be needed a spcieal handling anyway. I'd rather not put all the device paths to the listed mount options though, feels like a wrong place for that. > Pay attention to consider also events like, removing a device, adding a device: > after these events how /dev/disk/by-uuid/ would be updated ? The by-uuid links are another place that would need to be reliably kept in sync with the changes (add/remove/replace device) as it's a cache and depends on the events. Caches can get stale, events lost. If we can get a solution that is based on a run time detection/analysis and does not even depend on what kernel module publishes as the main device I'd say we can achieve the best result.
On Wed, Nov 29, 2023 at 09:54:02PM +0100, Goffredo Baroncelli wrote: > On 29/11/2023 00.28, Anand Jain wrote: > > > > > > On 28/11/2023 16:00, Goffredo Baroncelli wrote: > >> On 27/11/2023 12.48, Anand Jain wrote: > >>> > >>> > >>> On 11/25/23 09:09, Anand Jain wrote: > >> [...] > >>>> I am skeptical about whether we have a strong case to create a single > >>>> pseudo device per multi-device Btrfs filesystem, such as, for example > >>>> '/dev/btrfs/<fsid>-<random>/rootid=5' and which means pseudo device > >>>> will carry the btrfs-magic and the actual blk devices something else. > >>>> > >>>> OR for now, regarding the umount issue mentioned above, we just can > >>>> document it for the users to be aware of. > >>>> > >>>> Any feedback is greatly appreciated. > >>>> > >>> > >>> How about if we display the devices list in the options, so that > >>> user-land libs have something in the mount-table that tells all > >>> the devices part of the fsid? > >>> > >>> For example: > >>> $ cat /proc/self/mounts | grep btrfs > >>> > >>> /dev/sda1 /btrfs btrfs rw,relatime,space_cache=v2,subvolid=5,subvol=/,device=/dev/sda2,device=/dev/sdb3 0 0 > >>> > >> > >> When I developed code to find a btrfs mount point from a disk, I had to > >> consider all the devices involved and check if one is in /proc/self/mounts. > >> > >> Putting the devices list as device=<xxx>,device=<yyy> doesn't change anything because > >> the code has to manage a btrfs filesistem as "special" in any case. > >> To get the map <btrfs-uuid> <-> <devices-list> I used libblkid. > [...] > > > > Regarding libblkid for Btrfs device discovery, I m little confused what are you referring to, an example would be helpful. > > I developed a little utility to build for each btrfs filesystem: > - all the devices involved > - all the mountpoint (if any) where the filesystem is mounted and the subvolume used as root. > > It was nice because it got all these information only using: > - libblkid > - parsing /proc/self/mountinfo I think if there's one consistent approach based on libblkid then all the related tools and projects can use that.
On 05/12/2023 18.44, David Sterba wrote: > On Wed, Nov 29, 2023 at 09:54:02PM +0100, Goffredo Baroncelli wrote: [...] >> I developed a little utility to build for each btrfs filesystem: >> - all the devices involved >> - all the mountpoint (if any) where the filesystem is mounted and the subvolume used as root. >> >> It was nice because it got all these information only using: >> - libblkid >> - parsing /proc/self/mountinfo > > I think if there's one consistent approach based on libblkid then all > the related tools and projects can use that. > Just now I send some patches to evaluate. See [PATCH 0/4] RFC: add the btrfs_info helper BR
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index 66bdb6fd83bd..d768917cc5cc 100644 --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -2301,7 +2301,18 @@ static int btrfs_unfreeze(struct super_block *sb) static int btrfs_show_devname(struct seq_file *m, struct dentry *root) { - struct btrfs_fs_info *fs_info = btrfs_sb(root->d_sb); + struct btrfs_fs_devices *fs_devices = btrfs_sb(root->d_sb)->fs_devices; + struct btrfs_device *device; + struct btrfs_device *first_device = NULL; + + list_for_each_entry(device, &fs_devices->devices, dev_list) { + if (first_device == NULL) { + first_device = device; + continue; + } + if (first_device->devt > device->devt) + first_device = device; + } /* * There should be always a valid pointer in latest_dev, it may be stale @@ -2309,7 +2320,7 @@ static int btrfs_show_devname(struct seq_file *m, struct dentry *root) * the end of RCU grace period. */ rcu_read_lock(); - seq_escape(m, btrfs_dev_name(fs_info->fs_devices->latest_dev), " \t\n\\"); + seq_escape(m, rcu_str_deref(first_device->name), " \t\n\\"); rcu_read_unlock(); return 0;
In a non-single-device Btrfs filesystem, if Btrfs is already mounted and if you run the command 'mount -a,' it will fail and the command 'umount <device>' also fails. As below: ---------------- $ cat /etc/fstab | grep btrfs UUID=12345678-1234-1234-1234-123456789abc /btrfs btrfs defaults,nofail 0 0 $ mkfs.btrfs -qf --uuid 12345678-1234-1234-1234-123456789abc /dev/sda2 /dev/sda1 $ mount --verbose -a / : ignored /btrfs : successfully mounted $ ls -l /dev/disk/by-uuid | grep 12345678-1234-1234-1234-123456789abc lrwxrwxrwx 1 root root 10 Nov 2 17:43 12345678-1234-1234-1234-123456789abc -> ../../sda1 $ cat /proc/self/mounts | grep btrfs /dev/sda2 /btrfs btrfs rw,relatime,space_cache=v2,subvolid=5,subvol=/ 0 0 $ findmnt --df /btrfs SOURCE FSTYPE SIZE USED AVAIL USE% TARGET /dev/sda2 btrfs 2G 5.8M 1.8G 0% /btrfs $ mount --verbose -a / : ignored mount: /btrfs: /dev/sda1 already mounted or mount point busy. $echo $? 32 $ umount /dev/sda1 umount: /dev/sda1: not mounted. $ echo $? 32 ---------------- I assume (RFC) this is because '/dev/disk/by-uuid,' '/proc/self/mounts,' and 'findmnt' do not all reference the same device, resulting in the 'mount -a' and 'umount' failures. However, an empirically found solution is to align them using a rule, such as the disk with the lowest 'devt,' for a multi-device Btrfs filesystem. I'm not yet sure (RFC) how to create a udev rule to point to the disk with the lowest 'devt,' as this kernel patch does, and I believe it is possible. And this would ensure that '/proc/self/mounts,' 'findmnt,' and '/dev/disk/by-uuid' all reference the same device. After applying this patch, the above test passes. Unfortunately, /dev/disk/by-uuid also points to the lowest 'devt' by chance, even though no rule has been set as of now. As shown below. ---------------- $ mkfs.btrfs -qf --uuid 12345678-1234-1234-1234-123456789abc /dev/sda2 /dev/sda1 $ mount --verbose -a / : ignored /btrfs : successfully mounted $ ls -l /dev/disk/by-uuid | grep 12345678-1234-1234-1234-123456789abc lrwxrwxrwx 1 root root 10 Nov 2 17:53 12345678-1234-1234-1234-123456789abc -> ../../sda1 $ cat /proc/self/mounts | grep btrfs /dev/sda1 /btrfs btrfs rw,relatime,space_cache=v2,subvolid=5,subvol=/ 0 0 $ findmnt --df /btrfs SOURCE FSTYPE SIZE USED AVAIL USE% TARGET /dev/sda1 btrfs 2G 5.8M 1.8G 0% /btrfs $ mount --verbose -a / : ignored /btrfs : already mounted $echo $? 0 $ umount /dev/sda1 $echo $? 0 ---------------- Signed-off-by: Anand Jain <anand.jain@oracle.com> --- fs/btrfs/super.c | 15 +++++++++++++-- 1 file changed, 13 insertions(+), 2 deletions(-)