diff mbox series

[1/2] btrfs: avoid unnecessary device path update for the same device

Message ID bb5f08852d68f7424b1113deb74586527912c290.1727154543.git.wqu@suse.com (mailing list archive)
State New, archived
Headers show
Series btrfs: enhance btrfs device path rename | expand

Commit Message

Qu Wenruo Sept. 24, 2024, 5:17 a.m. UTC
[PROBLEM]
It is very common for udev to trigger device scan, and every time a
mounted btrfs device got re-scan from different soft links, we will get
unnecessary renames like this:

 BTRFS: device fsid 2378be81-fe12-46d2-a9e8-68cf08dd98d5 devid 1 transid 7 /proc/self/fd/3 (253:2) scanned by mount_by_fd (11096)
 BTRFS info (device dm-2): first mount of filesystem 2378be81-fe12-46d2-a9e8-68cf08dd98d5
 BTRFS info (device dm-2): using crc32c (crc32c-intel) checksum algorithm
 BTRFS info (device dm-2): using free-space-tree
 BTRFS info: devid 1 device path /proc/self/fd/3 changed to /dev/dm-2 scanned by (udev-worker) (11092)
 BTRFS info: devid 1 device path /dev/dm-2 changed to /dev/mapper/test-scratch1 scanned by (udev-worker) (11092)

The program "mount_by_fd" has the following simple source code:

 #include <fcntl.h>
 #include <stdio.h>
 #include <sys/mount.h>

 int main(int argc, char *argv[]) {
	int fd = open(argv[1], O_RDWR);
	char path[256];

	snprintf(path, sizeof(path), "/proc/self/fd/%d", fd);
	return mount(path, argv[2], "btrfs", 0, NULL);
 }

Note that, all the above paths are all just pointing to "/dev/dm-2".
The "/proc/self/fd/3" is the proc fs, describing all the opened fds.
The "/dev/mapper/test-scratch1" is the soft link created by LVM2.

[CAUSE]
Inside device_list_add(), we are using a very primitive way checking if
the device has changed, strcmp().

Which can never handle links well, no matter if it's hard or soft links.

So every different link of the same device will be treated as different
device, causing the unnecessary device name update.

[FIX]
Introduce a helper, is_same_device(), and use path_equal() to properly
detect the same block device.
So that the different soft links won't trigger the rename race.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/volumes.c | 28 +++++++++++++++++++++++++++-
 1 file changed, 27 insertions(+), 1 deletion(-)

Comments

Filipe Manana Sept. 24, 2024, 11:47 a.m. UTC | #1
On Tue, Sep 24, 2024 at 6:17 AM Qu Wenruo <wqu@suse.com> wrote:
>
> [PROBLEM]
> It is very common for udev to trigger device scan, and every time a
> mounted btrfs device got re-scan from different soft links, we will get
> unnecessary renames like this:
>
>  BTRFS: device fsid 2378be81-fe12-46d2-a9e8-68cf08dd98d5 devid 1 transid 7 /proc/self/fd/3 (253:2) scanned by mount_by_fd (11096)
>  BTRFS info (device dm-2): first mount of filesystem 2378be81-fe12-46d2-a9e8-68cf08dd98d5
>  BTRFS info (device dm-2): using crc32c (crc32c-intel) checksum algorithm
>  BTRFS info (device dm-2): using free-space-tree
>  BTRFS info: devid 1 device path /proc/self/fd/3 changed to /dev/dm-2 scanned by (udev-worker) (11092)
>  BTRFS info: devid 1 device path /dev/dm-2 changed to /dev/mapper/test-scratch1 scanned by (udev-worker) (11092)
>
> The program "mount_by_fd" has the following simple source code:
>
>  #include <fcntl.h>
>  #include <stdio.h>
>  #include <sys/mount.h>
>
>  int main(int argc, char *argv[]) {
>         int fd = open(argv[1], O_RDWR);
>         char path[256];
>
>         snprintf(path, sizeof(path), "/proc/self/fd/%d", fd);
>         return mount(path, argv[2], "btrfs", 0, NULL);
>  }
>
> Note that, all the above paths are all just pointing to "/dev/dm-2".
> The "/proc/self/fd/3" is the proc fs, describing all the opened fds.

This is redundant to say, we all know everything inside /proc belongs
to procfs, and self/fd/X corresponds to an open fd by the calling
task.

> The "/dev/mapper/test-scratch1" is the soft link created by LVM2.

What would be more useful here is to have all the steps to reproduce
the problem:

1) How you invoke that test program, what the arguments are;
2) Any other steps that one should run to reproduce the problem.

Can we also get a test case for fstests?
Even if it's caused by a race and it doesn't always trigger, with
several people often running fstests frequently, possible regressions
will be quickly detected.

>
> [CAUSE]
> Inside device_list_add(), we are using a very primitive way checking if
> the device has changed, strcmp().
>
> Which can never handle links well, no matter if it's hard or soft links.
>
> So every different link of the same device will be treated as different
> device, causing the unnecessary device name update.
>
> [FIX]
> Introduce a helper, is_same_device(), and use path_equal() to properly
> detect the same block device.
> So that the different soft links won't trigger the rename race.
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Since there's a publicly accessible report for this, at
https://bugzilla.suse.com/show_bug.cgi?id=1230641, can we please at
least get a Link tag here?

> ---
>  fs/btrfs/volumes.c | 28 +++++++++++++++++++++++++++-
>  1 file changed, 27 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 995b0647f538..b713e4ebb362 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -732,6 +732,32 @@ const u8 *btrfs_sb_fsid_ptr(const struct btrfs_super_block *sb)
>         return has_metadata_uuid ? sb->metadata_uuid : sb->fsid;
>  }
>
> +static bool is_same_device(struct btrfs_device *device, const char *new_path)
> +{
> +       struct path old = { .mnt = NULL, .dentry = NULL };
> +       struct path new = { .mnt = NULL, .dentry = NULL };

This can simply be: ... = { 0 }

But I don't mind this more verbose initialization.

With the change log fixes:

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Thanks.

> +       bool is_same = false;
> +       int ret;
> +
> +       if (!device->name)
> +               goto out;
> +
> +       old_path = rcu_str_deref(device->name);
> +       ret = kern_path(old_path, LOOKUP_FOLLOW, &old);
> +       if (ret)
> +               goto out;
> +       ret = kern_path(new_path, LOOKUP_FOLLOW, &new);
> +       if (ret)
> +               goto out;
> +       if (path_equal(&old, &new))
> +               is_same = true;
> +out:
> +       path_put(&old);
> +       path_put(&new);
> +       return is_same;
> +}
> +
>  /*
>   * Add new device to list of registered devices
>   *
> @@ -852,7 +878,7 @@ static noinline struct btrfs_device *device_list_add(const char *path,
>                                 MAJOR(path_devt), MINOR(path_devt),
>                                 current->comm, task_pid_nr(current));
>
> -       } else if (!device->name || strcmp(device->name->str, path)) {
> +       } else if (!device->name || !is_same_device(device, path)) {
>                 /*
>                  * When FS is already mounted.
>                  * 1. If you are here and if the device->name is NULL that
> --
> 2.46.1
>
>
Qu Wenruo Sept. 24, 2024, 9:12 p.m. UTC | #2
在 2024/9/24 21:17, Filipe Manana 写道:
> On Tue, Sep 24, 2024 at 6:17 AM Qu Wenruo <wqu@suse.com> wrote:
>>
>> [PROBLEM]
>> It is very common for udev to trigger device scan, and every time a
>> mounted btrfs device got re-scan from different soft links, we will get
>> unnecessary renames like this:
>>
>>   BTRFS: device fsid 2378be81-fe12-46d2-a9e8-68cf08dd98d5 devid 1 transid 7 /proc/self/fd/3 (253:2) scanned by mount_by_fd (11096)
>>   BTRFS info (device dm-2): first mount of filesystem 2378be81-fe12-46d2-a9e8-68cf08dd98d5
>>   BTRFS info (device dm-2): using crc32c (crc32c-intel) checksum algorithm
>>   BTRFS info (device dm-2): using free-space-tree
>>   BTRFS info: devid 1 device path /proc/self/fd/3 changed to /dev/dm-2 scanned by (udev-worker) (11092)
>>   BTRFS info: devid 1 device path /dev/dm-2 changed to /dev/mapper/test-scratch1 scanned by (udev-worker) (11092)
>>
>> The program "mount_by_fd" has the following simple source code:
>>
>>   #include <fcntl.h>
>>   #include <stdio.h>
>>   #include <sys/mount.h>
>>
>>   int main(int argc, char *argv[]) {
>>          int fd = open(argv[1], O_RDWR);
>>          char path[256];
>>
>>          snprintf(path, sizeof(path), "/proc/self/fd/%d", fd);
>>          return mount(path, argv[2], "btrfs", 0, NULL);
>>   }
>>
>> Note that, all the above paths are all just pointing to "/dev/dm-2".
>> The "/proc/self/fd/3" is the proc fs, describing all the opened fds.
>
> This is redundant to say, we all know everything inside /proc belongs
> to procfs, and self/fd/X corresponds to an open fd by the calling
> task.
>
>> The "/dev/mapper/test-scratch1" is the soft link created by LVM2.
>
> What would be more useful here is to have all the steps to reproduce
> the problem:
>
> 1) How you invoke that test program, what the arguments are;
> 2) Any other steps that one should run to reproduce the problem.
>
> Can we also get a test case for fstests?

 From my tests with ext4, it's really btrfs' ability to change path name
halfway saving us.

For ext4, the weird device name will stick there forever, and I do not
even know if this is the expected behavior or not.

> Even if it's caused by a race and it doesn't always trigger, with
> several people often running fstests frequently, possible regressions
> will be quickly detected.

So far I haven't hit a case where the proc name can stay persistent,
udev rescan always saves us.

I guess the only reliable way to get such situation is to disable udev
or on systems without udev completely.

>
>>
>> [CAUSE]
>> Inside device_list_add(), we are using a very primitive way checking if
>> the device has changed, strcmp().
>>
>> Which can never handle links well, no matter if it's hard or soft links.
>>
>> So every different link of the same device will be treated as different
>> device, causing the unnecessary device name update.
>>
>> [FIX]
>> Introduce a helper, is_same_device(), and use path_equal() to properly
>> detect the same block device.
>> So that the different soft links won't trigger the rename race.
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>
> Since there's a publicly accessible report for this, at
> https://bugzilla.suse.com/show_bug.cgi?id=1230641, can we please at
> least get a Link tag here?

Sure.

>
>> ---
>>   fs/btrfs/volumes.c | 28 +++++++++++++++++++++++++++-
>>   1 file changed, 27 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index 995b0647f538..b713e4ebb362 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -732,6 +732,32 @@ const u8 *btrfs_sb_fsid_ptr(const struct btrfs_super_block *sb)
>>          return has_metadata_uuid ? sb->metadata_uuid : sb->fsid;
>>   }
>>
>> +static bool is_same_device(struct btrfs_device *device, const char *new_path)
>> +{
>> +       struct path old = { .mnt = NULL, .dentry = NULL };
>> +       struct path new = { .mnt = NULL, .dentry = NULL };
>
> This can simply be: ... = { 0 }

Unfortunately that will trigger designated-init warning since the
structure has __randomize_layout attribute.

>
> But I don't mind this more verbose initialization.
>
> With the change log fixes:
>
> Reviewed-by: Filipe Manana <fdmanana@suse.com>

Thanks for the review,
Qu

>
> Thanks.
>
>> +       bool is_same = false;
>> +       int ret;
>> +
>> +       if (!device->name)
>> +               goto out;
>> +
>> +       old_path = rcu_str_deref(device->name);
>> +       ret = kern_path(old_path, LOOKUP_FOLLOW, &old);
>> +       if (ret)
>> +               goto out;
>> +       ret = kern_path(new_path, LOOKUP_FOLLOW, &new);
>> +       if (ret)
>> +               goto out;
>> +       if (path_equal(&old, &new))
>> +               is_same = true;
>> +out:
>> +       path_put(&old);
>> +       path_put(&new);
>> +       return is_same;
>> +}
>> +
>>   /*
>>    * Add new device to list of registered devices
>>    *
>> @@ -852,7 +878,7 @@ static noinline struct btrfs_device *device_list_add(const char *path,
>>                                  MAJOR(path_devt), MINOR(path_devt),
>>                                  current->comm, task_pid_nr(current));
>>
>> -       } else if (!device->name || strcmp(device->name->str, path)) {
>> +       } else if (!device->name || !is_same_device(device, path)) {
>>                  /*
>>                   * When FS is already mounted.
>>                   * 1. If you are here and if the device->name is NULL that
>> --
>> 2.46.1
>>
>>
>
diff mbox series

Patch

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 995b0647f538..b713e4ebb362 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -732,6 +732,32 @@  const u8 *btrfs_sb_fsid_ptr(const struct btrfs_super_block *sb)
 	return has_metadata_uuid ? sb->metadata_uuid : sb->fsid;
 }
 
+static bool is_same_device(struct btrfs_device *device, const char *new_path)
+{
+	struct path old = { .mnt = NULL, .dentry = NULL };
+	struct path new = { .mnt = NULL, .dentry = NULL };
+	char *old_path;
+	bool is_same = false;
+	int ret;
+
+	if (!device->name)
+		goto out;
+
+	old_path = rcu_str_deref(device->name);
+	ret = kern_path(old_path, LOOKUP_FOLLOW, &old);
+	if (ret)
+		goto out;
+	ret = kern_path(new_path, LOOKUP_FOLLOW, &new);
+	if (ret)
+		goto out;
+	if (path_equal(&old, &new))
+		is_same = true;
+out:
+	path_put(&old);
+	path_put(&new);
+	return is_same;
+}
+
 /*
  * Add new device to list of registered devices
  *
@@ -852,7 +878,7 @@  static noinline struct btrfs_device *device_list_add(const char *path,
 				MAJOR(path_devt), MINOR(path_devt),
 				current->comm, task_pid_nr(current));
 
-	} else if (!device->name || strcmp(device->name->str, path)) {
+	} else if (!device->name || !is_same_device(device, path)) {
 		/*
 		 * When FS is already mounted.
 		 * 1. If you are here and if the device->name is NULL that