
[v2,0/4] Fixes for pNFS SCSI layout PR key registration

Message ID: 20240621162227.215412-6-cel@kernel.org


Chuck Lever June 21, 2024, 4:22 p.m. UTC
From: Chuck Lever <chuck.lever@oracle.com>

The double registration/unregistration I observed was actually the
registration and unregistration of two separate block devices: one
for /media/test and one for /media/scratch. So, that was a false
alarm.

The complete fstests run shows:

Failures: generic/126 generic/355 generic/450 generic/740

unknown: run fstests generic/108 at 2024-06-21 10:13:58
systemd[1]: Started fstests-generic-108.scope - /usr/bin/bash -c test -w /proc/self/oom_score_adj && echo 250 > /proc/self/oom_score_adj; exec ./tests/generic/108.
kernel: sd 6:0:0:1: reservation conflict
kernel: sd 6:0:0:1: [sdb] tag#30 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
kernel: sd 6:0:0:1: [sdb] tag#30 CDB: Read(10) 28 00 00 00 00 00 00 01 00 00
kernel: reservation conflict error, dev sdb, sector 0 op 0x0:(READ) flags 0x0 phys_seg 32 prio class 2
systemd[1]: fstests-generic-108.scope: Deactivated successfully.

These errors appear in the system journal only when the whole
fstests series is run. I can see the "block_rq_complete [-52]" in
the trace log. But the test output shows:

generic/108       [not run] require cel-nfsd:/export/nfs-pnfs-fs-s to be valid block disk

generic/450 is also failing:

generic/450       - output mismatch (see /data/fstests-install/xfstests/results/cel-nfs-pnfs/6.10.0-rc4-gd24c98202dbe/nfs_pnfs/generic/450.out.bad)
    --- tests/generic/450.out	2024-06-20 16:50:06.548035014 -0400
    +++ /data/fstests-install/xfstests/results/cel-nfs-pnfs/6.10.0-rc4-gd24c98202dbe/nfs_pnfs/generic/450.out.bad	2024-06-21 10:44:02.600634341 -0400
    @@ -8,4 +8,6 @@
     direct read the second block contains EOF
     direct read a sector at (after) EOF
     direct read the last sector past EOF
    +expect [2093056,4096,0], got [2093056,4096,4096]
     direct read at far away from EOF
    +expect [104857600,4096,0], got [104857600,4096,4096]
    ...

However, this might be a bug that predates this series.

The other three reported failures (generic/126, generic/355, and generic/740) are typical for NFSv4.1.

---
Changes since RFC:
- re-order the series to place fixes first
- address review comments as best I can

Chuck Lever (4):
  nfs/blocklayout: Fix premature PR key unregistration
  nfs/blocklayout: Use bulk page allocation APIs
  nfs/blocklayout: Report only when /no/ device is found
  nfs/blocklayout: SCSI layout trace points for reservation key
    reg/unreg

 fs/nfs/blocklayout/blocklayout.c | 13 ++++-
 fs/nfs/blocklayout/blocklayout.h |  8 ++-
 fs/nfs/blocklayout/dev.c         | 72 +++++++++++++++++---------
 fs/nfs/nfs4trace.c               |  7 +++
 fs/nfs/nfs4trace.h               | 88 ++++++++++++++++++++++++++++++++
 fs/nfs/pnfs_dev.c                | 15 +++---
 6 files changed, 166 insertions(+), 37 deletions(-)


Benjamin Coddington June 21, 2024, 6:03 p.m. UTC
On 21 Jun 2024, at 12:22, cel@kernel.org wrote:

Looks good. I like the bitops over the bool for pr_registered.

For the series:
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>

Ben