diff mbox series

[1/2] btrfs-progs: Ignore path device during device scan

Message ID 20210928123730.393551-1-nborisov@suse.com (mailing list archive)
State New, archived
Headers show
Series [1/2] btrfs-progs: Ignore path device during device scan | expand

Commit Message

Nikolay Borisov Sept. 28, 2021, 12:37 p.m. UTC
Currently btrfs-progs will happily enumerate any device which has a
btrfs filesystem on it. For the majority of use cases that's fine and
there haven't been any problems with that. However, there was a recent
report that in multipath scenario when running "btrfs fi show" after a
path flap instead of the multipath device being show the path device is
shown. So a multipath filesystem might look like:

Label: none  uuid: d3c1261f-18be-4015-9fef-6b35759dfdba
	Total devices 1 FS bytes used 192.00KiB
	devid    1 size 10.00GiB used 536.00MiB path /dev/mapper/3600140501cc1f49e5364f0093869c763

/dev/mapper/xxx can actually be backed by an arbitrary number of path,
which in turn are presented to the system as ordinary scsi devices i.e
/dev/sdd. If a path flaps and a user re-runs 'btrfs fi show' the output
would look like:

Label: none  uuid: d3c1261f-18be-4015-9fef-6b35759dfdba
	Total devices 1 FS bytes used 192.00KiB
	devid    1 size 10.00GiB used 536.00MiB path /dev/sdd

Turns out the output of this command is consumed by libraries and the
presence of a path device rather than the actual multipath one can cause
issues.

Fix this by relying on the fact that path devices are tagged with the
DM_MULTIPATH_DEVICE_PATH attribute by the respective udev scripts. In
order to access it an optional dependency on libudev is added, if the
library can't be found then device enumeration will continue working
as it was before the commit. Since libudev doesn't have static library
for now support for this behavior in case of static builds is going to
be disabled. Fallback code for static builds will come in a future
commit.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 Makefile             |  2 +-
 Makefile.inc.in      |  2 +-
 common/device-scan.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
 configure.ac         |  9 +++++++++
 4 files changed, 55 insertions(+), 2 deletions(-)

Comments

Anand Jain Sept. 28, 2021, 11:03 p.m. UTC | #1
On 28/09/2021 20:37, Nikolay Borisov wrote:
> Currently btrfs-progs will happily enumerate any device which has a
> btrfs filesystem on it. For the majority of use cases that's fine and
> there haven't been any problems with that.

> However, there was a recent
> report

  Could you point to the report or if it is internal?

  Kernel message has the process of name for the device scan.
  We don't have to fix the btrfs-progs end if it is not doing it.

> that in multipath scenario when running "btrfs fi show" after a
> path flap 

  It is better to use 'btrfs fi show -m' it provides kernel perspective.

  What do you mean by path flap here? Do you mean a device-path in a 
multi-path config disappeared forever or failed temporarily?

> instead of the multipath device being show the path device is
> shown. So a multipath filesystem might look like:
> 
> Label: none  uuid: d3c1261f-18be-4015-9fef-6b35759dfdba
> 	Total devices 1 FS bytes used 192.00KiB
> 	devid    1 size 10.00GiB used 536.00MiB path /dev/mapper/3600140501cc1f49e5364f0093869c763

> 
> /dev/mapper/xxx can actually be backed by an arbitrary number of path,

  It is not arbitrary it depends on the number of HBAs configured to the 
storage/LUN in a SAN.

> which in turn are presented to the system as ordinary scsi devices i.e
> /dev/sdd.

  Yeah, it is problematic. Other OSs got away with that approach by 
introducing a path consolidator between the HBA and the target driver.

> If a path flaps and a user re-runs 'btrfs fi show' the output
> would look like:

> Label: none  uuid: d3c1261f-18be-4015-9fef-6b35759dfdba
> 	Total devices 1 FS bytes used 192.00KiB
> 	devid    1 size 10.00GiB used 536.00MiB path /dev/sdd

  Kernel messages will know when we change the device path and who 
called the device scan. Could you pls put those messages here?

Thanks, Anand

> Turns out the output of this command is consumed by libraries and the
> presence of a path device rather than the actual multipath one can cause
> issues.
> 

> Fix this by relying on the fact that path devices are tagged with the
> DM_MULTIPATH_DEVICE_PATH attribute by the respective udev scripts.

> In
> order to access it an optional dependency on libudev is added, if the
> library can't be found then device enumeration will continue working
> as it was before the commit.

> Since libudev doesn't have static library
> for now support for this behavior in case of static builds is going to
> be disabled. Fallback code for static builds will come in a future
> commit.
> 
> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
> ---
>   Makefile             |  2 +-
>   Makefile.inc.in      |  2 +-
>   common/device-scan.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
>   configure.ac         |  9 +++++++++
>   4 files changed, 55 insertions(+), 2 deletions(-)
> 
> diff --git a/Makefile b/Makefile
> index 93fe4c2b3e08..e96f66a36b46 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -129,7 +129,7 @@ LIBS = $(LIBS_BASE) $(LIBS_CRYPTO)
>   LIBBTRFS_LIBS = $(LIBS_BASE) $(LIBS_CRYPTO)
>   
>   # Static compilation flags
> -STATIC_CFLAGS = $(CFLAGS) -ffunction-sections -fdata-sections
> +STATIC_CFLAGS = $(CFLAGS) -ffunction-sections -fdata-sections -DSTATIC_BUILD
>   STATIC_LDFLAGS = -static -Wl,--gc-sections
>   STATIC_LIBS = $(STATIC_LIBS_BASE)
>   
> diff --git a/Makefile.inc.in b/Makefile.inc.in
> index 9f49337147b8..c995aef97219 100644
> --- a/Makefile.inc.in
> +++ b/Makefile.inc.in
> @@ -27,7 +27,7 @@ CRYPTO_CFLAGS = @GCRYPT_CFLAGS@ @SODIUM_CFLAGS@ @KCAPI_CFLAGS@
>   SUBST_CFLAGS = @CFLAGS@
>   SUBST_LDFLAGS = @LDFLAGS@
>   
> -LIBS_BASE = @UUID_LIBS@ @BLKID_LIBS@ -L. -pthread
> +LIBS_BASE = @UUID_LIBS@ @BLKID_LIBS@ @LIBUDEV_LIBS@ -L. -pthread
>   LIBS_COMP = @ZLIB_LIBS@ @LZO2_LIBS@ @ZSTD_LIBS@
>   LIBS_PYTHON = @PYTHON_LIBS@
>   LIBS_CRYPTO = @GCRYPT_LIBS@ @SODIUM_LIBS@ @KCAPI_LIBS@
> diff --git a/common/device-scan.c b/common/device-scan.c
> index b5bfe844104b..2ed0e34d3664 100644
> --- a/common/device-scan.c
> +++ b/common/device-scan.c
> @@ -14,6 +14,10 @@
>    * Boston, MA 021110-1307, USA.
>    */
>   
> +#ifdef STATIC_BUILD
> +#undef HAVE_LIBUDEV
> +#endif
> +
>   #include "kerncompat.h"
>   #include <sys/ioctl.h>
>   #include <stdlib.h>
> @@ -25,6 +29,10 @@
>   #include <dirent.h>
>   #include <blkid/blkid.h>
>   #include <uuid/uuid.h>
> +#ifdef HAVE_LIBUDEV
> +#include <sys/stat.h>
> +#include <libudev.h>
> +#endif
>   #include "kernel-lib/overflow.h"
>   #include "common/path-utils.h"
>   #include "common/device-scan.h"
> @@ -364,6 +372,37 @@ void free_seen_fsid(struct seen_fsid *seen_fsid_hash[])
>   	}
>   }
>   
> +#ifdef HAVE_LIBUDEV
> +static bool is_path_device(char *device_path)
> +{
> +	struct udev *udev = NULL;
> +	struct udev_device *dev = NULL;
> +	struct stat dev_stat;
> +	const char *val;
> +	bool ret = false;
> +
> +	if (stat(device_path, &dev_stat) < 0)
> +		return false;
> +
> +	udev = udev_new();
> +	if (!udev)
> +		goto out;
> +
> +	dev = udev_device_new_from_devnum(udev, 'b', dev_stat.st_rdev);
> +	if (!dev)
> +		goto out;
> +
> +	val = udev_device_get_property_value(dev, "DM_MULTIPATH_DEVICE_PATH");
> +	if (val && atoi(val) > 0)
> +		ret = true;
> +out:
> +	udev_device_unref(dev);
> +	udev_unref(udev);
> +
> +	return ret;
> +}
> +#endif
> +
>   int btrfs_scan_devices(int verbose)
>   {
>   	int fd = -1;
> @@ -394,6 +433,11 @@ int btrfs_scan_devices(int verbose)
>   		/* if we are here its definitely a btrfs disk*/
>   		strncpy_null(path, blkid_dev_devname(dev));
>   
> +#ifdef HAVE_LIBUDEV
> +		if (is_path_device(path))
> +			continue;
> +#endif
> +
>   		fd = open(path, O_RDONLY);
>   		if (fd < 0) {
>   			error("cannot open %s: %m", path);
> diff --git a/configure.ac b/configure.ac
> index 038c2688421c..d0ceb0d70d16 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -304,6 +304,15 @@ PKG_STATIC(UUID_LIBS_STATIC, [uuid])
>   PKG_CHECK_MODULES(ZLIB, [zlib])
>   PKG_STATIC(ZLIB_LIBS_STATIC, [zlib])
>   
> +PKG_CHECK_EXISTS([libudev], [pkg_config_libudev=yes], [pkg_config_libudev=no])
> +if test "x$pkg_config_libudev" = xyes; then
> +	PKG_CHECK_MODULES([LIBUDEV], [libudev])
> +	AC_DEFINE([HAVE_LIBUDEV], [1], [Define to 1 if libudev is available])
> +else
> +	AC_MSG_CHECKING([for LIBUDEV])
> +	AC_MSG_RESULT([no])
> +fi
> +
>   AC_ARG_ENABLE([zstd],
>   	AS_HELP_STRING([--disable-zstd], [build without zstd support]),
>   	[], [enable_zstd=yes]
>
Nikolay Borisov Sept. 29, 2021, 6:59 a.m. UTC | #2
On 29.09.21 г. 2:03, Anand Jain wrote:
> On 28/09/2021 20:37, Nikolay Borisov wrote:
>> Currently btrfs-progs will happily enumerate any device which has a
>> btrfs filesystem on it. For the majority of use cases that's fine and
>> there haven't been any problems with that.
> 
>> However, there was a recent
>> report
> 
>  Could you point to the report or if it is internal?

Internal

> 
>  Kernel message has the process of name for the device scan.
>  We don't have to fix the btrfs-progs end if it is not doing it.
> 
>> that in multipath scenario when running "btrfs fi show" after a
>> path flap 
> 
>  It is better to use 'btrfs fi show -m' it provides kernel perspective.

[146822.972653] BTRFS warning: duplicate device /dev/sdd devid 1
generation 8 scanned by systemd-udevd (6254)

[146823.060984] BTRFS info (device dm-0): devid 1 device path
/dev/mapper/3600140501cc1f49e5364f0093869c763 changed to /dev/dm-0
scanned by systemd-udevd (6266)

[146823.075084] BTRFS info (device dm-0): devid 1 device path /dev/dm-0
changed to /dev/mapper/3600140501cc1f49e5364f0093869c763 scanned by
systemd-udevd (6266)


btrfs fi show -m is actually consistent with always showing the mapper
device.

> 
>  What do you mean by path flap here? Do you mean a device-path in a
> multi-path config disappeared forever or failed temporarily?

flap means going up and down. The gist is that btrfs fi show would show
the latest device being scanned, which in the case of multipath device
could be any of the paths.

> 
>> instead of the multipath device being show the path device is
>> shown. So a multipath filesystem might look like:
>>
>> Label: none  uuid: d3c1261f-18be-4015-9fef-6b35759dfdba
>>     Total devices 1 FS bytes used 192.00KiB
>>     devid    1 size 10.00GiB used 536.00MiB path
>> /dev/mapper/3600140501cc1f49e5364f0093869c763
> 
>>
>> /dev/mapper/xxx can actually be backed by an arbitrary number of path,
> 
>  It is not arbitrary it depends on the number of HBAs configured to the
> storage/LUN in a SAN.

And this precisely means that the number of paths is arbitrary, based on
particular configuration.

> 
>> which in turn are presented to the system as ordinary scsi devices i.e
>> /dev/sdd.
> 

<snip>
Anand Jain Sept. 29, 2021, 11:13 a.m. UTC | #3
On 29/09/2021 14:59, Nikolay Borisov wrote:
> 
> 
> On 29.09.21 г. 2:03, Anand Jain wrote:
>> On 28/09/2021 20:37, Nikolay Borisov wrote:
>>> Currently btrfs-progs will happily enumerate any device which has a
>>> btrfs filesystem on it. For the majority of use cases that's fine and
>>> there haven't been any problems with that.
>>
>>> However, there was a recent
>>> report
>>
>>   Could you point to the report or if it is internal?
> 
> Internal
> 
>>
>>   Kernel message has the process of name for the device scan.
>>   We don't have to fix the btrfs-progs end if it is not doing it.
>>
>>> that in multipath scenario when running "btrfs fi show" after a
>>> path flap
>>
>>   It is better to use 'btrfs fi show -m' it provides kernel perspective.
> 
> [146822.972653] BTRFS warning: duplicate device /dev/sdd devid 1
> generation 8 scanned by systemd-udevd (6254)
> 
> [146823.060984] BTRFS info (device dm-0): devid 1 device path
> /dev/mapper/3600140501cc1f49e5364f0093869c763 changed to /dev/dm-0
> scanned by systemd-udevd (6266)

   So /dev/sdd is a device-path of the multi-path
      /dev/mapper/3600140501cc1f49e5364f0093869c763 aka /dev/dm-0
   which, the kernel rejected to use, that's good.

> [146823.075084] BTRFS info (device dm-0): devid 1 device path /dev/dm-0
> changed to /dev/mapper/3600140501cc1f49e5364f0093869c763 scanned by
> systemd-udevd (6266)

  systemd-udevd is scanning the same device too many times.

> btrfs fi show -m is actually consistent with always showing the mapper
> device.

  Do you mean either of /dev/mapper/3600140501cc1f49e5364f0093869c763
  or /dev/dm-0 but, never /dev/sdd? That's expected as per the above
  messages which is not a problem.


>>   What do you mean by path flap here? Do you mean a device-path in a
>> multi-path config disappeared forever or failed temporarily?
> 
> flap means going up and down. The gist is that btrfs fi show would show
> the latest device being scanned, which in the case of multipath device
> could be any of the paths.

  Do you mean 'btrfs fi show' shows a device of the multi-path however,
  'btrfs fi show -m' shows the correct multi-path device?

<snip>
Nikolay Borisov Sept. 29, 2021, 11:25 a.m. UTC | #4
On 29.09.21 г. 14:13, Anand Jain wrote:
<snip>

>> flap means going up and down. The gist is that btrfs fi show would show
>> the latest device being scanned, which in the case of multipath device
>> could be any of the paths.
> 
>  Do you mean 'btrfs fi show' shows a device of the multi-path however,
>  'btrfs fi show -m' shows the correct multi-path device?

Yes, that's a problem purely in btrfs-progs, as the path devices are
opened exclusively for the purpose of multiapth.
> 
> <snip>
>
Anand Jain Sept. 29, 2021, 12:44 p.m. UTC | #5
>>> flap means going up and down. The gist is that btrfs fi show would show
>>> the latest device being scanned, which in the case of multipath device
>>> could be any of the paths.

   But why the problem is only when a device flaps? Or it doesn't matter?

>>   Do you mean 'btrfs fi show' shows a device of the multi-path however,
>>   'btrfs fi show -m' shows the correct multi-path device?
> 
> Yes, that's a problem purely in btrfs-progs, as the path devices are
> opened exclusively for the purpose of multiapth.

  Ok. All parts of the test case is with an unmounted btrfs, I am 
clarified, now.
Nikolay Borisov Sept. 29, 2021, 12:51 p.m. UTC | #6
On 29.09.21 г. 15:44, Anand Jain wrote:
> 
> 
>>>> flap means going up and down. The gist is that btrfs fi show would show
>>>> the latest device being scanned, which in the case of multipath device
>>>> could be any of the paths.
> 
>   But why the problem is only when a device flaps? Or it doesn't matter?

Because when the device re-appears it will be the last device scanned by
btrfs scanning code. It can be reproduced by following steps:

1) Validate 'btrfs fi show' is showing /dev/mapper/xxxx for all fs's
1) Unplug one of the cables from the FC adapter < this can be simulated
by simply doing 'echo 1 > /sys/block/sdd/device/delete' for the given
path device >
2) Wait for the paths/drives associated with the downed port to disappear
3) Check again that 'btrfs fi show' still lists the /dev/mapper entry
4) Reattach the cable to the hba port <this can be simulated by
rescanning the HBA or w/e bus you have: echo "- - -" >
/sys/class/scsi_host/host1/scan >
5) Check that 'btrfs fi show' is now shows /dev/sdX devices for all mpio
fs's

> 
>>>   Do you mean 'btrfs fi show' shows a device of the multi-path however,
>>>   'btrfs fi show -m' shows the correct multi-path device?
>>
>> Yes, that's a problem purely in btrfs-progs, as the path devices are
>> opened exclusively for the purpose of multiapth.
> 
>  Ok. All parts of the test case is with an unmounted btrfs, I am
> clarified, now.
> 
> 
>
David Sterba Sept. 29, 2021, 6:31 p.m. UTC | #7
On Tue, Sep 28, 2021 at 03:37:29PM +0300, Nikolay Borisov wrote:
> Currently btrfs-progs will happily enumerate any device which has a
> btrfs filesystem on it. For the majority of use cases that's fine and
> there haven't been any problems with that. However, there was a recent
> report that in multipath scenario when running "btrfs fi show" after a
> path flap instead of the multipath device being show the path device is
> shown. So a multipath filesystem might look like:
> 
> Label: none  uuid: d3c1261f-18be-4015-9fef-6b35759dfdba
> 	Total devices 1 FS bytes used 192.00KiB
> 	devid    1 size 10.00GiB used 536.00MiB path /dev/mapper/3600140501cc1f49e5364f0093869c763
> 
> /dev/mapper/xxx can actually be backed by an arbitrary number of path,
> which in turn are presented to the system as ordinary scsi devices i.e
> /dev/sdd. If a path flaps and a user re-runs 'btrfs fi show' the output
> would look like:
> 
> Label: none  uuid: d3c1261f-18be-4015-9fef-6b35759dfdba
> 	Total devices 1 FS bytes used 192.00KiB
> 	devid    1 size 10.00GiB used 536.00MiB path /dev/sdd
> 
> Turns out the output of this command is consumed by libraries and the
> presence of a path device rather than the actual multipath one can cause
> issues.
> 
> Fix this by relying on the fact that path devices are tagged with the
> DM_MULTIPATH_DEVICE_PATH attribute by the respective udev scripts. In
> order to access it an optional dependency on libudev is added, if the
> library can't be found then device enumeration will continue working
> as it was before the commit. Since libudev doesn't have static library
> for now support for this behavior in case of static builds is going to
> be disabled. Fallback code for static builds will come in a future
> commit.
> 
> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
> ---
>  Makefile             |  2 +-
>  Makefile.inc.in      |  2 +-
>  common/device-scan.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
>  configure.ac         |  9 +++++++++
>  4 files changed, 55 insertions(+), 2 deletions(-)
> 
> diff --git a/Makefile b/Makefile
> index 93fe4c2b3e08..e96f66a36b46 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -129,7 +129,7 @@ LIBS = $(LIBS_BASE) $(LIBS_CRYPTO)
>  LIBBTRFS_LIBS = $(LIBS_BASE) $(LIBS_CRYPTO)
>  
>  # Static compilation flags
> -STATIC_CFLAGS = $(CFLAGS) -ffunction-sections -fdata-sections
> +STATIC_CFLAGS = $(CFLAGS) -ffunction-sections -fdata-sections -DSTATIC_BUILD

Could you please split the configure/makefile changes to a separate
patch? For progs the patch separation rules/recommendations are less
strict as for kernel but at least the build and code changes should be
separate as each has a different justification. Doing libudev and
-DSTATIC_BUILD is probably fine in one patch.

>  STATIC_LDFLAGS = -static -Wl,--gc-sections
>  STATIC_LIBS = $(STATIC_LIBS_BASE)
>  
> diff --git a/Makefile.inc.in b/Makefile.inc.in
> index 9f49337147b8..c995aef97219 100644
> --- a/Makefile.inc.in
> +++ b/Makefile.inc.in
> @@ -27,7 +27,7 @@ CRYPTO_CFLAGS = @GCRYPT_CFLAGS@ @SODIUM_CFLAGS@ @KCAPI_CFLAGS@
>  SUBST_CFLAGS = @CFLAGS@
>  SUBST_LDFLAGS = @LDFLAGS@
>  
> -LIBS_BASE = @UUID_LIBS@ @BLKID_LIBS@ -L. -pthread
> +LIBS_BASE = @UUID_LIBS@ @BLKID_LIBS@ @LIBUDEV_LIBS@ -L. -pthread
>  LIBS_COMP = @ZLIB_LIBS@ @LZO2_LIBS@ @ZSTD_LIBS@
>  LIBS_PYTHON = @PYTHON_LIBS@
>  LIBS_CRYPTO = @GCRYPT_LIBS@ @SODIUM_LIBS@ @KCAPI_LIBS@
> diff --git a/common/device-scan.c b/common/device-scan.c
> index b5bfe844104b..2ed0e34d3664 100644
> --- a/common/device-scan.c
> +++ b/common/device-scan.c
> @@ -14,6 +14,10 @@
>   * Boston, MA 021110-1307, USA.
>   */
>  
> +#ifdef STATIC_BUILD
> +#undef HAVE_LIBUDEV
> +#endif
> +
>  #include "kerncompat.h"
>  #include <sys/ioctl.h>
>  #include <stdlib.h>
> @@ -25,6 +29,10 @@
>  #include <dirent.h>
>  #include <blkid/blkid.h>
>  #include <uuid/uuid.h>
> +#ifdef HAVE_LIBUDEV
> +#include <sys/stat.h>
> +#include <libudev.h>
> +#endif
>  #include "kernel-lib/overflow.h"
>  #include "common/path-utils.h"
>  #include "common/device-scan.h"
> @@ -364,6 +372,37 @@ void free_seen_fsid(struct seen_fsid *seen_fsid_hash[])
>  	}
>  }
>  
> +#ifdef HAVE_LIBUDEV
> +static bool is_path_device(char *device_path)

Please rename it to is_multipath_device, we've been using 'path' for
normal paths and I find using 'path' confusing in this context.

> +{
> +	struct udev *udev = NULL;
> +	struct udev_device *dev = NULL;
> +	struct stat dev_stat;
> +	const char *val;
> +	bool ret = false;
> +
> +	if (stat(device_path, &dev_stat) < 0)
> +		return false;
> +
> +	udev = udev_new();
> +	if (!udev)
> +		goto out;
> +
> +	dev = udev_device_new_from_devnum(udev, 'b', dev_stat.st_rdev);
> +	if (!dev)
> +		goto out;
> +
> +	val = udev_device_get_property_value(dev, "DM_MULTIPATH_DEVICE_PATH");
> +	if (val && atoi(val) > 0)
> +		ret = true;
> +out:
> +	udev_device_unref(dev);
> +	udev_unref(udev);
> +
> +	return ret;
> +}
> +#endif
> +
>  int btrfs_scan_devices(int verbose)
>  {
>  	int fd = -1;
> @@ -394,6 +433,11 @@ int btrfs_scan_devices(int verbose)
>  		/* if we are here its definitely a btrfs disk*/
>  		strncpy_null(path, blkid_dev_devname(dev));
>  
> +#ifdef HAVE_LIBUDEV
> +		if (is_path_device(path))
> +			continue;
> +#endif

I'd rather avoid inline ifdef, the cleaner way is to define 2 versions
based on the macro like

#ifdef HAVE_LIBUDEV
static bool is_path_device(char *device_path)
{
... as above
}

#else

static bool is_path_device(char *device_path)
{
	return false;
}

#endif

and inside btrfs_scan_devices do

	if (is_multipath_device(...))

> +
>  		fd = open(path, O_RDONLY);
>  		if (fd < 0) {
>  			error("cannot open %s: %m", path);
> diff --git a/configure.ac b/configure.ac
> index 038c2688421c..d0ceb0d70d16 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -304,6 +304,15 @@ PKG_STATIC(UUID_LIBS_STATIC, [uuid])
>  PKG_CHECK_MODULES(ZLIB, [zlib])
>  PKG_STATIC(ZLIB_LIBS_STATIC, [zlib])
>  
> +PKG_CHECK_EXISTS([libudev], [pkg_config_libudev=yes], [pkg_config_libudev=no])
> +if test "x$pkg_config_libudev" = xyes; then
> +	PKG_CHECK_MODULES([LIBUDEV], [libudev])
> +	AC_DEFINE([HAVE_LIBUDEV], [1], [Define to 1 if libudev is available])
> +else
> +	AC_MSG_CHECKING([for LIBUDEV])
> +	AC_MSG_RESULT([no])
> +fi
> +
>  AC_ARG_ENABLE([zstd],
>  	AS_HELP_STRING([--disable-zstd], [build without zstd support]),
>  	[], [enable_zstd=yes]
> -- 
> 2.17.1
David Sterba Sept. 29, 2021, 6:38 p.m. UTC | #8
On Wed, Sep 29, 2021 at 09:59:15AM +0300, Nikolay Borisov wrote:
> On 29.09.21 г. 2:03, Anand Jain wrote:
> > On 28/09/2021 20:37, Nikolay Borisov wrote:
> >> Currently btrfs-progs will happily enumerate any device which has a
> >> btrfs filesystem on it. For the majority of use cases that's fine and
> >> there haven't been any problems with that.
> > 
> >> However, there was a recent
> >> report
> > 
> >  Could you point to the report or if it is internal?
> 
> Internal
> 
> > 
> >  Kernel message has the process of name for the device scan.
> >  We don't have to fix the btrfs-progs end if it is not doing it.
> > 
> >> that in multipath scenario when running "btrfs fi show" after a
> >> path flap 
> > 
> >  It is better to use 'btrfs fi show -m' it provides kernel perspective.
> 
> [146822.972653] BTRFS warning: duplicate device /dev/sdd devid 1
> generation 8 scanned by systemd-udevd (6254)
> 
> [146823.060984] BTRFS info (device dm-0): devid 1 device path
> /dev/mapper/3600140501cc1f49e5364f0093869c763 changed to /dev/dm-0
> scanned by systemd-udevd (6266)
> 
> [146823.075084] BTRFS info (device dm-0): devid 1 device path /dev/dm-0
> changed to /dev/mapper/3600140501cc1f49e5364f0093869c763 scanned by
> systemd-udevd (6266)
> 
> 
> btrfs fi show -m is actually consistent with always showing the mapper
> device.
> 
> > 
> >  What do you mean by path flap here? Do you mean a device-path in a
> > multi-path config disappeared forever or failed temporarily?
> 
> flap means going up and down.

Please use a more verbose description like

  "a path can flap (go up and down)"
Anand Jain Sept. 30, 2021, 1:02 a.m. UTC | #9
On 29/09/2021 20:51, Nikolay Borisov wrote:
> 
> 
> On 29.09.21 г. 15:44, Anand Jain wrote:
>>
>>
>>>>> flap means going up and down. The gist is that btrfs fi show would show
>>>>> the latest device being scanned, which in the case of multipath device
>>>>> could be any of the paths.
>>
>>    But why the problem is only when a device flaps? Or it doesn't matter?
> 


> Because when the device re-appears it will be the last device scanned by
> btrfs scanning code.

  It shouldn't be so fragile that we merely depend on the order of the
  btrfs device enumeration. Device orders can be random. It shouldn't
  matter. Even if it is working fine before the disappear-reappear cycle,
  it is just by luck.

  To show the list of btrfs devices that are unmounted, we use
  btrfs_scan_devices(). In btrfs_scan_devices(), we extensively use
  libblkid to enumerate btrfs devices. Few important lines of it are
  as below.

-----------------
367 int btrfs_scan_devices(int verbose)

<snip>

381         ret = blkid_get_cache(&cache, NULL);
382         if (ret < 0) {
383                 errno = -ret;
384                 error("blkid cache get failed: %m");
385                 return ret;
386         }
387         blkid_probe_all(cache);
388         iter = blkid_dev_iterate_begin(cache);
389         blkid_dev_set_search(iter, "TYPE", "btrfs");
390         while (blkid_dev_next(iter, &dev) == 0) {
391                 dev = blkid_verify(cache, dev);
392                 if (!dev)
393                         continue;
394                 /* if we are here its definitely a btrfs disk*/
395                 strncpy_null(path, blkid_dev_devname(dev));
-----------------

  So, you mean to say the blkid_dev_set_search() always finds all three
  paths containing the same fsid+uuid+devid?
  But, only its order varies when the a underlying device disappears and
  reappears.

  For example:
   Device order before /dev/sdb disappeared
    /dev/sda  MAJ:MIN??
    /dev/sdb  MAJ:MIN??
    /dev/mapper/3600140501cc1f49e5364f0093869c763  MAJ:MIN??

   Device order after /dev/sdb reappeared
    /dev/sda  MAJ:MIN??
    /dev/mapper/3600140501cc1f49e5364f0093869c763  MAJ:MIN??
    /dev/sdb  MAJ:MIN??

  Could you please help to find the MAJ:MIN of the devices before and
  after the disappear-reappear cycle? Are we sure the reappeared device
  has the same MAJ:MIN as before? is it shown as a new device? If not
  then it could be block layer problem too.

  So as said we shouldn't depend on the order of the device enumeration.
  Your fix makes sense to an extent but, still depend on the device
  order of reporting. Let us say we dd a device to another device
  then, btrfs fi show will show the last enumerated device by blkid
  (I think). If yes then, it is wrong.

  Could we make it order neutral? I don't know how. I think the only
  choice is to list all the devices in a tree format. Similar to
  lsblk(8).

Thanks, Anand

> It can be reproduced by following steps: > 1) Validate 'btrfs fi show' is showing /dev/mapper/xxxx for all fs's
> 1) Unplug one of the cables from the FC adapter < this can be simulated
> by simply doing 'echo 1 > /sys/block/sdd/device/delete' for the given
> path device >
> 2) Wait for the paths/drives associated with the downed port to disappear
> 3) Check again that 'btrfs fi show' still lists the /dev/mapper entry
> 4) Reattach the cable to the hba port <this can be simulated by
> rescanning the HBA or w/e bus you have: echo "- - -" >
> /sys/class/scsi_host/host1/scan >
> 5) Check that 'btrfs fi show' is now shows /dev/sdX devices for all mpio
> fs's
> 
>>
>>>>    Do you mean 'btrfs fi show' shows a device of the multi-path however,
>>>>    'btrfs fi show -m' shows the correct multi-path device?
>>>
>>> Yes, that's a problem purely in btrfs-progs, as the path devices are
>>> opened exclusively for the purpose of multiapth.
>>
>>   Ok. All parts of the test case is with an unmounted btrfs, I am
>> clarified, now.
diff mbox series

Patch

diff --git a/Makefile b/Makefile
index 93fe4c2b3e08..e96f66a36b46 100644
--- a/Makefile
+++ b/Makefile
@@ -129,7 +129,7 @@  LIBS = $(LIBS_BASE) $(LIBS_CRYPTO)
 LIBBTRFS_LIBS = $(LIBS_BASE) $(LIBS_CRYPTO)
 
 # Static compilation flags
-STATIC_CFLAGS = $(CFLAGS) -ffunction-sections -fdata-sections
+STATIC_CFLAGS = $(CFLAGS) -ffunction-sections -fdata-sections -DSTATIC_BUILD
 STATIC_LDFLAGS = -static -Wl,--gc-sections
 STATIC_LIBS = $(STATIC_LIBS_BASE)
 
diff --git a/Makefile.inc.in b/Makefile.inc.in
index 9f49337147b8..c995aef97219 100644
--- a/Makefile.inc.in
+++ b/Makefile.inc.in
@@ -27,7 +27,7 @@  CRYPTO_CFLAGS = @GCRYPT_CFLAGS@ @SODIUM_CFLAGS@ @KCAPI_CFLAGS@
 SUBST_CFLAGS = @CFLAGS@
 SUBST_LDFLAGS = @LDFLAGS@
 
-LIBS_BASE = @UUID_LIBS@ @BLKID_LIBS@ -L. -pthread
+LIBS_BASE = @UUID_LIBS@ @BLKID_LIBS@ @LIBUDEV_LIBS@ -L. -pthread
 LIBS_COMP = @ZLIB_LIBS@ @LZO2_LIBS@ @ZSTD_LIBS@
 LIBS_PYTHON = @PYTHON_LIBS@
 LIBS_CRYPTO = @GCRYPT_LIBS@ @SODIUM_LIBS@ @KCAPI_LIBS@
diff --git a/common/device-scan.c b/common/device-scan.c
index b5bfe844104b..2ed0e34d3664 100644
--- a/common/device-scan.c
+++ b/common/device-scan.c
@@ -14,6 +14,10 @@ 
  * Boston, MA 021110-1307, USA.
  */
 
+#ifdef STATIC_BUILD
+#undef HAVE_LIBUDEV
+#endif
+
 #include "kerncompat.h"
 #include <sys/ioctl.h>
 #include <stdlib.h>
@@ -25,6 +29,10 @@ 
 #include <dirent.h>
 #include <blkid/blkid.h>
 #include <uuid/uuid.h>
+#ifdef HAVE_LIBUDEV
+#include <sys/stat.h>
+#include <libudev.h>
+#endif
 #include "kernel-lib/overflow.h"
 #include "common/path-utils.h"
 #include "common/device-scan.h"
@@ -364,6 +372,37 @@  void free_seen_fsid(struct seen_fsid *seen_fsid_hash[])
 	}
 }
 
+#ifdef HAVE_LIBUDEV
+static bool is_path_device(char *device_path)
+{
+	struct udev *udev = NULL;
+	struct udev_device *dev = NULL;
+	struct stat dev_stat;
+	const char *val;
+	bool ret = false;
+
+	if (stat(device_path, &dev_stat) < 0)
+		return false;
+
+	udev = udev_new();
+	if (!udev)
+		goto out;
+
+	dev = udev_device_new_from_devnum(udev, 'b', dev_stat.st_rdev);
+	if (!dev)
+		goto out;
+
+	val = udev_device_get_property_value(dev, "DM_MULTIPATH_DEVICE_PATH");
+	if (val && atoi(val) > 0)
+		ret = true;
+out:
+	udev_device_unref(dev);
+	udev_unref(udev);
+
+	return ret;
+}
+#endif
+
 int btrfs_scan_devices(int verbose)
 {
 	int fd = -1;
@@ -394,6 +433,11 @@  int btrfs_scan_devices(int verbose)
 		/* if we are here its definitely a btrfs disk*/
 		strncpy_null(path, blkid_dev_devname(dev));
 
+#ifdef HAVE_LIBUDEV
+		if (is_path_device(path))
+			continue;
+#endif
+
 		fd = open(path, O_RDONLY);
 		if (fd < 0) {
 			error("cannot open %s: %m", path);
diff --git a/configure.ac b/configure.ac
index 038c2688421c..d0ceb0d70d16 100644
--- a/configure.ac
+++ b/configure.ac
@@ -304,6 +304,15 @@  PKG_STATIC(UUID_LIBS_STATIC, [uuid])
 PKG_CHECK_MODULES(ZLIB, [zlib])
 PKG_STATIC(ZLIB_LIBS_STATIC, [zlib])
 
+PKG_CHECK_EXISTS([libudev], [pkg_config_libudev=yes], [pkg_config_libudev=no])
+if test "x$pkg_config_libudev" = xyes; then
+	PKG_CHECK_MODULES([LIBUDEV], [libudev])
+	AC_DEFINE([HAVE_LIBUDEV], [1], [Define to 1 if libudev is available])
+else
+	AC_MSG_CHECKING([for LIBUDEV])
+	AC_MSG_RESULT([no])
+fi
+
 AC_ARG_ENABLE([zstd],
 	AS_HELP_STRING([--disable-zstd], [build without zstd support]),
 	[], [enable_zstd=yes]