[1/2] common/rc: wait for udev before creating dm targets

Message ID 165886491692.1585061.2529733779998396096.stgit@magnolia (mailing list archive)
State New, archived
Series: dmerror: support external log and rt devices

Commit Message

Darrick J. Wong July 26, 2022, 7:48 p.m. UTC
From: Darrick J. Wong <djwong@kernel.org>

Every now and then I see a failure when running generic/322 on btrfs:

QA output created by 322
failed to create flakey device

Looking in the 322.full file, I see:

device-mapper: reload ioctl on flakey-test (253:0) failed: Device or resource busy
Command failed.

And looking in dmesg, I see:

device-mapper: table: 8:3: linear: Device lookup failed (-16)
device-mapper: ioctl: error adding target to table

/dev/block/8:3 corresponds to the SCRATCH_DEV on this system.  Given the
failures in 322.out, I think this is caused by generic/322 calling
_init_flakey -> _dmsetup_create -> $DMSETUP_PROG create being unable to
open SCRATCH_DEV exclusively.  Add a call to $UDEV_SETTLE_PROG prior to
the creation of the target to try to calm the system down sufficiently
that the test can proceed.

Note that I don't have any hard evidence that it's udev at fault here --
the few times I've caught this thing, udev *has* been active spraying
error messages for nonexistent sysfs paths to journald and adding a
'udevadm settle' seems to fix it... but that's still only
circumstantial.  Regardless, it seems to have fixed the test failure.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 common/rc |    5 +++++
 1 file changed, 5 insertions(+)

Comments

Zorro Lang July 28, 2022, 5:52 p.m. UTC | #1
On Tue, Jul 26, 2022 at 12:48:36PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Every now and then I see a failure when running generic/322 on btrfs:
> 
> QA output created by 322
> failed to create flakey device
> 
> Looking in the 322.full file, I see:
> 
> device-mapper: reload ioctl on flakey-test (253:0) failed: Device or resource busy
> Command failed.
> 
> And looking in dmesg, I see:
> 
> device-mapper: table: 8:3: linear: Device lookup failed (-16)
> device-mapper: ioctl: error adding target to table
> 
> /dev/block/8:3 corresponds to the SCRATCH_DEV on this system.  Given the
> failures in 322.out, I think this is caused by generic/322 calling
> _init_flakey -> _dmsetup_create -> $DMSETUP_PROG create being unable to
> open SCRATCH_DEV exclusively.  Add a call to $UDEV_SETTLE_PROG prior to
> the creation of the target to try to calm the system down sufficiently
> that the test can proceed.
> 
> Note that I don't have any hard evidence that it's udev at fault here --
> the few times I've caught this thing, udev *has* been active spraying
> error messages for nonexistent sysfs paths to journald and adding a
> 'udevadm settle' seems to fix it... but that's still only
> circumstantial.  Regardless, it seems to have fixed the test failure.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  common/rc |    5 +++++
>  1 file changed, 5 insertions(+)
> 
> 
> diff --git a/common/rc b/common/rc
> index f4469464..60a9bacd 100644
> --- a/common/rc
> +++ b/common/rc
> @@ -4815,6 +4815,11 @@ _dmsetup_remove()
>  
>  _dmsetup_create()
>  {
> +	# Wait for udev to settle so that the dm creation doesn't fail because
> +	# some udev subprogram opened one of the block devices mentioned in the
> +	# table string w/ O_EXCL.  Do it again at the end so that an immediate
> +	# device open won't also fail.
> +	$UDEV_SETTLE_PROG >/dev/null 2>&1

No objection from me, as you've shown that it fixes your issue :)

Reviewed-by: Zorro Lang <zlang@redhat.com>

>  	$DMSETUP_PROG create "$@" >>$seqres.full 2>&1 || return 1
>  	$DMSETUP_PROG mknodes >/dev/null 2>&1
>  	$UDEV_SETTLE_PROG >/dev/null 2>&1
>

Christoph Hellwig July 28, 2022, 6:41 p.m. UTC | #2
I have no idea what except for udev this could be, so if the patch
works for you:

Reviewed-by: Christoph Hellwig <hch@lst.de>

Naohiro Aota Aug. 8, 2022, 5:32 a.m. UTC | #3
On Tue, Jul 26, 2022 at 12:48:36PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Every now and then I see a failure when running generic/322 on btrfs:
> 
> QA output created by 322
> failed to create flakey device
> 
> Looking in the 322.full file, I see:
> 
> device-mapper: reload ioctl on flakey-test (253:0) failed: Device or resource busy
> Command failed.
> 
> And looking in dmesg, I see:
> 
> device-mapper: table: 8:3: linear: Device lookup failed (-16)
> device-mapper: ioctl: error adding target to table
> 
> /dev/block/8:3 corresponds to the SCRATCH_DEV on this system.  Given the
> failures in 322.out, I think this is caused by generic/322 calling
> _init_flakey -> _dmsetup_create -> $DMSETUP_PROG create being unable to
> open SCRATCH_DEV exclusively.  Add a call to $UDEV_SETTLE_PROG prior to
> the creation of the target to try to calm the system down sufficiently
> that the test can proceed.
> 
> Note that I don't have any hard evidence that it's udev at fault here --
> the few times I've caught this thing, udev *has* been active spraying
> error messages for nonexistent sysfs paths to journald and adding a
> 'udevadm settle' seems to fix it... but that's still only
> circumstantial.  Regardless, it seems to have fixed the test failure.

FYI, I often hit a similar issue while testing btrfs on zoned devices.

I used the following bpftrace one-liner to confirm that udev was competing
with dmsetup when the failure occurred.

$ sudo bpftrace -e 'kfunc:bd_prepare_to_claim { printf("%s %x %x\n", comm, args->bdev->bd_dev, args->holder)}'
...
mkfs.btrfs fd00000 b103b640
systemd-udevd fd00000 b103b640
systemd-udevd fd00000 b103b640
dmsetup fd00000 b06fb655  
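
For anyone reading the trace: the second column is bd_dev printed in hex,
which is the kernel-internal dev_t (12-bit major in the high bits, 20-bit
minor in the low bits). A minimal sketch of decoding it into major:minor
follows; decode_bd_dev is a hypothetical helper name, not part of fstests
or bpftrace.

```shell
# decode_bd_dev is a hypothetical helper for illustration only.
# $1: kernel-internal dev_t in hex without the 0x prefix, e.g. fd00000
decode_bd_dev() {
	# major is everything above the low 20 bits, minor is the low 20 bits
	echo "$(( 0x$1 >> 20 )):$(( 0x$1 & 0xfffff ))"
}

decode_bd_dev fd00000	# prints 253:0
```

So every process in the trace above is claiming the same block device,
253:0, and systemd-udevd shows up between mkfs.btrfs and dmsetup.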

> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  common/rc |    5 +++++
>  1 file changed, 5 insertions(+)
> 
> 
> diff --git a/common/rc b/common/rc
> index f4469464..60a9bacd 100644
> --- a/common/rc
> +++ b/common/rc
> @@ -4815,6 +4815,11 @@ _dmsetup_remove()
>  
>  _dmsetup_create()
>  {
> +	# Wait for udev to settle so that the dm creation doesn't fail because
> +	# some udev subprogram opened one of the block devices mentioned in the
> +	# table string w/ O_EXCL.  Do it again at the end so that an immediate
> +	# device open won't also fail.
> +	$UDEV_SETTLE_PROG >/dev/null 2>&1
>  	$DMSETUP_PROG create "$@" >>$seqres.full 2>&1 || return 1
>  	$DMSETUP_PROG mknodes >/dev/null 2>&1
>  	$UDEV_SETTLE_PROG >/dev/null 2>&1
>
Patch

diff --git a/common/rc b/common/rc
index f4469464..60a9bacd 100644
--- a/common/rc
+++ b/common/rc
@@ -4815,6 +4815,11 @@  _dmsetup_remove()
 
 _dmsetup_create()
 {
+	# Wait for udev to settle so that the dm creation doesn't fail because
+	# some udev subprogram opened one of the block devices mentioned in the
+	# table string w/ O_EXCL.  Do it again at the end so that an immediate
+	# device open won't also fail.
+	$UDEV_SETTLE_PROG >/dev/null 2>&1
 	$DMSETUP_PROG create "$@" >>$seqres.full 2>&1 || return 1
 	$DMSETUP_PROG mknodes >/dev/null 2>&1
 	$UDEV_SETTLE_PROG >/dev/null 2>&1
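
The ordering that results from the hunk above can be exercised without root
or a real device-mapper setup. What follows is only a sketch under stated
assumptions: fake_settle and fake_dmsetup are hypothetical stand-ins that
record their invocation, and seqres points at a throwaway temp path. In
fstests these variables name the real udev settle and dmsetup programs. The
function body is the post-patch _dmsetup_create without the comment.

```shell
# Hypothetical stand-ins that only record the order in which they are called.
calls=""
fake_settle() { calls="${calls:+$calls }settle"; }
fake_dmsetup() { calls="${calls:+$calls }dmsetup-$1"; }

UDEV_SETTLE_PROG=fake_settle
DMSETUP_PROG=fake_dmsetup
seqres="${TMPDIR:-/tmp}/dmsetup-sketch.$$"	# assumption: scratch log path

_dmsetup_create()
{
	# settle before the create so nothing else still holds the device open
	$UDEV_SETTLE_PROG >/dev/null 2>&1
	$DMSETUP_PROG create "$@" >>$seqres.full 2>&1 || return 1
	$DMSETUP_PROG mknodes >/dev/null 2>&1
	# settle again so an immediate open of the new device does not race
	$UDEV_SETTLE_PROG >/dev/null 2>&1
}

_dmsetup_create flakey-test --table "0 8 linear /dev/null 0"
echo "$calls"	# prints: settle dmsetup-create dmsetup-mknodes settle
rm -f "$seqres.full"
```

With the stubs replaced by the real programs, the first settle is the one
this patch adds; the trailing settle was already there.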