[v1,00/11] Put struct nfsd4_copy on a diet

Message ID 165852076926.11403.44005570813790008.stgit@manet.1015granger.net (mailing list archive)

Message

Chuck Lever July 22, 2022, 8:18 p.m. UTC
While testing NFSD for-next, I noticed svc_generic_init_request()
was an unexpected hot spot on NFSv4 workloads. The perf report
shows that the hot path in there is:

	memset(rqstp->rq_argp, 0, procp->pc_argsize);
	memset(rqstp->rq_resp, 0, procp->pc_ressize);

For an NFSv4 COMPOUND,

	procp->pc_argsize = sizeof(nfsd4_compoundargs),

struct nfsd4_compoundargs on my system is more than 17KB! This is
due to the size of the iops field:

	struct nfsd4_op                 iops[8];

Each struct nfsd4_op contains a union of the arguments for each
NFSv4 operation. Each argument is typically less than 128 bytes,
except that struct nfsd4_copy and struct nfsd4_copy_notify are each
larger than 2KB.
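
A minimal user-space sketch of the size arithmetic described above. Every
struct name and field size here is a hypothetical stand-in, not the real
NFSD definition; it only illustrates that a union is as large as its
biggest member, that an array of eight operations multiplies that cost,
and that swapping an embedded buffer for a pointer shrinks the whole array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real NFSD argument structures differ. */
struct small_args { char data[128]; };          /* a "typical" argument */
struct big_args   { char server_info[2048]; };  /* large buffer embedded by value */
struct slim_args  { char *server_info; };       /* same data, allocated on demand */

/* A union is sized by its largest member. */
union fat_union  { struct small_args small; struct big_args big; };
union slim_union { struct small_args small; struct slim_args slim; };

struct op_fat  { int opnum; union fat_union u; };
struct op_slim { int opnum; union slim_union u; };

/* Eight inline operations, mirroring the iops[8] field. */
struct compound_fat  { struct op_fat iops[8]; };
struct compound_slim { struct op_slim iops[8]; };

size_t compound_fat_size(void)  { return sizeof(struct compound_fat); }
size_t compound_slim_size(void) { return sizeof(struct compound_slim); }

/* Sanity checks on the relationship between the two layouts. */
void check_sizes(void)
{
	/* One oversized member inflates every slot of the array. */
	assert(compound_fat_size() >= 8 * sizeof(struct big_args));
	/* Replacing the embedded buffer with a pointer removes that cost. */
	assert(compound_slim_size() < compound_fat_size());
}
```

The trade-off in the slim layout is that the large buffer now has to be
allocated and freed separately, which is where a memory-orphaning concern
can arise.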

I'm not yet totally convinced this series never orphans memory, but
it does reduce the size of nfsd4_compoundargs to just over 4KB. The
remaining size is due to struct nfsd4_copy still being almost 500
bytes. I don't see more low-hanging fruit there, though.

---

Chuck Lever (11):
      NFSD: Shrink size of struct nfsd4_copy_notify
      NFSD: Shrink size of struct nfsd4_copy
      NFSD: Reorder the fields in struct nfsd4_op
      NFSD: Make nfs4_put_copy() static
      NFSD: Make boolean fields in struct nfsd4_copy into atomic bit flags
      NFSD: Refactor nfsd4_cleanup_inter_ssc() (1/2)
      NFSD: Refactor nfsd4_cleanup_inter_ssc() (2/2)
      NFSD: Refactor nfsd4_do_copy()
      NFSD: Remove kmalloc from nfsd4_do_async_copy()
      NFSD: Add nfsd4_send_cb_offload()
      NFSD: Move copy offload callback arguments into a separate structure


 fs/nfsd/nfs4callback.c |  37 +++++----
 fs/nfsd/nfs4proc.c     | 165 +++++++++++++++++++++--------------------
 fs/nfsd/nfs4xdr.c      |  30 +++++---
 fs/nfsd/state.h        |   1 -
 fs/nfsd/xdr4.h         |  54 ++++++++++----
 5 files changed, 163 insertions(+), 124 deletions(-)

--
Chuck Lever

Comments

Olga Kornievskaia July 26, 2022, 7:45 p.m. UTC | #1
Chuck,

Are there pre-reqs for this series? I tried to apply the patches
on top of 5.19-rc6 but I get the following compile error:

fs/nfsd/nfs4proc.c: In function ‘nfsd4_setup_inter_ssc’:
fs/nfsd/nfs4proc.c:1539:34: error: passing argument 1 of
‘nfsd4_interssc_connect’ from incompatible pointer type
[-Werror=incompatible-pointer-types]
  status = nfsd4_interssc_connect(&copy->cp_src, rqstp, mount);
                                  ^~~~~~~~~~~~~
fs/nfsd/nfs4proc.c:1414:43: note: expected ‘struct nl4_server *’ but
argument is of type ‘struct nl4_server **’
 nfsd4_interssc_connect(struct nl4_server *nss, struct svc_rqst *rqstp,
                        ~~~~~~~~~~~~~~~~~~~^~~
cc1: some warnings being treated as errors
make[2]: *** [scripts/Makefile.build:249: fs/nfsd/nfs4proc.o] Error 1
make[1]: *** [scripts/Makefile.build:466: fs/nfsd] Error 2
make: *** [Makefile:1843: fs] Error 2
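
A minimal sketch of why the compiler objects, assuming (as the error
message suggests) that cp_src has become a pointer field rather than an
embedded struct. The types and helpers below are hypothetical stand-ins,
not the NFSD code; they only show that taking the address of a pointer
field yields a pointer-to-pointer, so the call site has to drop the `&`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the server description. */
struct nl4_server_sketch { int id; };

/* Assumed "before" layout: the description is embedded by value. */
struct copy_embedded { struct nl4_server_sketch cp_src; };

/* Assumed "after" layout: only a pointer is stored. */
struct copy_pointer { struct nl4_server_sketch *cp_src; };

/* Callee expects a plain pointer to the description. */
static int connect_sketch(struct nl4_server_sketch *nss)
{
	return nss ? nss->id : -1;
}

/* With an embedded field, &copy->cp_src has the expected pointer type. */
int connect_embedded(struct copy_embedded *copy)
{
	return connect_sketch(&copy->cp_src);
}

/*
 * With a pointer field, &copy->cp_src would be a pointer-to-pointer,
 * which is the mismatch the compiler reports. Pass the field itself.
 */
int connect_pointer(struct copy_pointer *copy)
{
	return connect_sketch(copy->cp_src);
}
```

Note that the pointer variant compiles once the `&` is dropped, but the
callee can now be handed NULL if nothing allocated the description.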

Olga Kornievskaia July 27, 2022, 4:18 p.m. UTC | #2
Hi Chuck,

To make it compile I did:
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 7196bcafdd86..f6deffc921d0 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1536,7 +1536,7 @@ nfsd4_setup_inter_ssc(struct svc_rqst *rqstp,
        if (status)
                goto out;

-       status = nfsd4_interssc_connect(&copy->cp_src, rqstp, mount);
+       status = nfsd4_interssc_connect(copy->cp_src, rqstp, mount);
        if (status)
                goto out;

But when I tried to run nfstest_ssc, the first test (intra01) made
the server oops:

[ 9569.551100] CPU: 0 PID: 2861 Comm: nfsd Not tainted 5.19.0-rc6+ #73
[ 9569.552385] Hardware name: VMware, Inc. VMware Virtual
Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
[ 9569.555043] RIP: 0010:nfsd4_copy+0x28b/0x4e0 [nfsd]
[ 9569.556662] Code: 24 38 49 89 94 24 10 01 00 00 49 8b 56 08 48 8d
79 08 49 89 94 24 18 01 00 00 49 8b 56 10 48 83 e7 f8 49 89 94 24 20
01 00 00 <48> 8b 06 48 89 01 48 8b 86 04 04 00 00 48 89 81 04 04 00 00
48 29
[ 9569.561792] RSP: 0018:ffffb092c0c97dd0 EFLAGS: 00010282
[ 9569.563112] RAX: ffff99b5465c2460 RBX: ffff99b5a68828e0 RCX: ffff99b5853b6000
[ 9569.565196] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff99b5853b6008
[ 9569.567140] RBP: ffffb092c0c97e10 R08: ffffffffc0bf3c24 R09: 0000000000000228
[ 9569.568929] R10: ffff99b54b0e9268 R11: ffff99b564326998 R12: ffff99b5543dfc00
[ 9569.570477] R13: ffff99b5a6882950 R14: ffff99b5a68829f0 R15: ffff99b546edc000
[ 9569.572052] FS:  0000000000000000(0000) GS:ffff99b5bbe00000(0000)
knlGS:0000000000000000
[ 9569.573926] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 9569.575281] CR2: 0000000000000000 CR3: 0000000076c36002 CR4: 00000000001706f0
[ 9569.577586] Call Trace:
[ 9569.578220]  <TASK>
[ 9569.578770]  ? nfsd4_proc_compound+0x3d2/0x730 [nfsd]
[ 9569.579945]  nfsd4_proc_compound+0x3d2/0x730 [nfsd]
[ 9569.581055]  nfsd_dispatch+0x146/0x270 [nfsd]
[ 9569.581987]  svc_process_common+0x365/0x5c0 [sunrpc]
[ 9569.583122]  ? nfsd_svc+0x350/0x350 [nfsd]
[ 9569.583986]  ? nfsd_shutdown_threads+0x90/0x90 [nfsd]
[ 9569.585129]  svc_process+0xb7/0xf0 [sunrpc]
[ 9569.586169]  nfsd+0xd5/0x190 [nfsd]
[ 9569.587170]  kthread+0xe8/0x110
[ 9569.587898]  ? kthread_complete_and_exit+0x20/0x20
[ 9569.588934]  ret_from_fork+0x22/0x30
[ 9569.589759]  </TASK>
[ 9569.590224] Modules linked in: rdma_ucm ib_uverbs rpcrdma rdma_cm
iw_cm ib_cm ib_core nfsd nfs_acl lockd grace ext4 mbcache jbd2 fuse
xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipt_REJECT
nf_reject_ipv4 nft_compat nf_tables nfnetlink tun bridge stp llc bnep
vmw_vsock_vmci_transport vsock snd_seq_midi snd_seq_midi_event
intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul
vmw_balloon ghash_clmulni_intel joydev pcspkr btusb btrtl btbcm
btintel snd_ens1371 uvcvideo snd_ac97_codec videobuf2_vmalloc ac97_bus
videobuf2_memops videobuf2_v4l2 videobuf2_common snd_seq snd_pcm
videodev bluetooth mc rfkill ecdh_generic ecc snd_timer snd_rawmidi
snd_seq_device snd vmw_vmci soundcore i2c_piix4 auth_rpcgss sunrpc
ip_tables xfs libcrc32c sr_mod cdrom sg ata_generic crc32c_intel
ata_piix nvme ahci libahci nvme_core t10_pi crc64_rocksoft serio_raw
crc64 vmwgfx drm_ttm_helper ttm drm_kms_helper syscopyarea sysfillrect
sysimgblt fb_sys_fops vmxnet3 drm libata
[ 9569.610612] CR2: 0000000000000000
[ 9569.611375] ---[ end trace 0000000000000000 ]---
[ 9569.612424] RIP: 0010:nfsd4_copy+0x28b/0x4e0 [nfsd]
[ 9569.613472] Code: 24 38 49 89 94 24 10 01 00 00 49 8b 56 08 48 8d
79 08 49 89 94 24 18 01 00 00 49 8b 56 10 48 83 e7 f8 49 89 94 24 20
01 00 00 <48> 8b 06 48 89 01 48 8b 86 04 04 00 00 48 89 81 04 04 00 00
48 29
[ 9569.617410] RSP: 0018:ffffb092c0c97dd0 EFLAGS: 00010282
[ 9569.618487] RAX: ffff99b5465c2460 RBX: ffff99b5a68828e0 RCX: ffff99b5853b6000
[ 9569.620097] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff99b5853b6008
[ 9569.621710] RBP: ffffb092c0c97e10 R08: ffffffffc0bf3c24 R09: 0000000000000228
[ 9569.623398] R10: ffff99b54b0e9268 R11: ffff99b564326998 R12: ffff99b5543dfc00
[ 9569.625019] R13: ffff99b5a6882950 R14: ffff99b5a68829f0 R15: ffff99b546edc000
[ 9569.627456] FS:  0000000000000000(0000) GS:ffff99b5bbe00000(0000)
knlGS:0000000000000000
[ 9569.629249] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 9569.630433] CR2: 0000000000000000 CR3: 0000000076c36002 CR4: 00000000001706f0
[ 9569.632043] Kernel panic - not syncing: Fatal exception



Dai Ngo July 27, 2022, 4:38 p.m. UTC | #3
Hi Olga,

I got the same problem. Can you try this patch:

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 21830cc1ed0a..18dd708ff846 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -1921,14 +1921,15 @@ nfsd4_decode_copy(struct nfsd4_compoundargs *argp, struct nfsd4_copy *copy)
  
         if (xdr_stream_decode_u32(argp->xdr, &count) < 0)
                 return nfserr_bad_xdr;
-       if (count == 0) { /* intra-server copy */
-               __set_bit(NFSD4_COPY_F_INTRA, &copy->cp_flags);
-               return nfs_ok;
-       }
  
         copy->cp_src = svcxdr_tmpalloc(argp, sizeof(*copy->cp_src));
         if (copy->cp_src == NULL)
-               return nfserrno(-ENOMEM);       /* XXX: jukebox? */
+               return nfserrno(-ENOMEM);
+       if (count == 0) { /* intra-server copy */
+               __set_bit(NFSD4_COPY_F_INTRA, &copy->cp_flags);
+               return nfs_ok;
+       } else
+               __clear_bit(NFSD4_COPY_F_INTRA, &copy->cp_flags);
  
         /* decode all the supplied server addresses but use only the first */
         status = nfsd4_decode_nl4_server(argp, copy->cp_src);
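
A user-space sketch of the ordering issue this patch addresses. The oops
reported earlier in the thread shows RSI and CR2 both zero in nfsd4_copy(),
which appears consistent with reading through a source pointer that was
never allocated on the intra-server path; that reading is an inference,
not something confirmed in the thread. All names below are hypothetical
stand-ins for the decoder and its structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the decoded COPY state. */
struct server_sketch { char addr[64]; };
struct copy_sketch {
	bool intra;                 /* true for an intra-server copy */
	struct server_sketch *src;  /* separately allocated source description */
};

/* Early-return ordering: an intra-server copy leaves src unallocated. */
int decode_early_return(struct copy_sketch *copy, unsigned int count)
{
	copy->src = NULL;
	copy->intra = false;
	if (count == 0) {
		copy->intra = true;
		return 0;
	}
	copy->src = calloc(1, sizeof(*copy->src));
	return copy->src ? 0 : -1;
}

/* Allocate-first ordering: src is valid on both paths. */
int decode_alloc_first(struct copy_sketch *copy, unsigned int count)
{
	copy->src = calloc(1, sizeof(*copy->src));
	if (copy->src == NULL)
		return -1;
	copy->intra = (count == 0);
	return 0;
}

/* A later consumer may only read *copy->src when this holds. */
bool src_is_usable(const struct copy_sketch *copy)
{
	return copy->src != NULL;
}
```

An alternative design would keep the early return and make every later
consumer check the intra flag before touching the source description;
allocating first is simpler but spends an allocation on copies that never
use it.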


-Dai

Chuck Lever July 27, 2022, 5:15 p.m. UTC | #4
> On Jul 27, 2022, at 12:18 PM, Olga Kornievskaia <aglo@umich.edu> wrote:
> 
> Hi Chuck,

Sorry for the delay, I was traveling.

> To make it compile I did:
> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> index 7196bcafdd86..f6deffc921d0 100644
> --- a/fs/nfsd/nfs4proc.c
> +++ b/fs/nfsd/nfs4proc.c
> @@ -1536,7 +1536,7 @@ nfsd4_setup_inter_ssc(struct svc_rqst *rqstp,
>        if (status)
>                goto out;
> 
> -       status = nfsd4_interssc_connect(&copy->cp_src, rqstp, mount);
> +       status = nfsd4_interssc_connect(copy->cp_src, rqstp, mount);
>        if (status)
>                goto out;

Yes, the same bug was reported by the day-0 kbot. v1 was kind of an RFC,
as I hadn't fully tested it. Sorry for mislabeling it.

I will post a v2 of this series with this fixed and with Dai's
fix for nfsd4_decode_copy(). Stand by.



--
Chuck Lever
Olga Kornievskaia July 27, 2022, 5:52 p.m. UTC | #5
After applying Dai's patch I got further, but then hit the next panic
(below). Before that, "inter01" failed with ECOMM. In the trace, after
the COPY is placed the server returns ESTALE in CB_OFFLOAD, and then
CLOSE fails with BAD_SESSION (basically, something went really wrong
on the server). A few more tests then failed in a similar fashion, and
the oops happened on cleanup.

[  842.455939] list_del corruption. prev->next should be
ffff9aaa8b5f0c78, but was ffff9aaab2713508. (prev=ffff9aaab2713510)
[  842.460118] ------------[ cut here ]------------
[  842.461599] kernel BUG at lib/list_debug.c:53!
[  842.462962] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[  842.464587] CPU: 1 PID: 500 Comm: kworker/u256:28 Not tainted 5.18.0 #70
[  842.466656] Hardware name: VMware, Inc. VMware Virtual
Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
[  842.470309] Workqueue: nfsd4 laundromat_main [nfsd]
[  842.471898] RIP: 0010:__list_del_entry_valid.cold.3+0x37/0x4a
[  842.473792] Code: e8 02 d8 fe ff 0f 0b 48 c7 c7 c0 bb b6 b0 e8 f4
d7 fe ff 0f 0b 48 89 d1 48 89 f2 48 89 fe 48 c7 c7 70 bb b6 b0 e8 dd
d7 fe ff <0f> 0b 48 89 fe 48 c7 c7 38 bb b6 b0 e8 cc d7 fe ff 0f 0b 48
89 ee
[  842.479607] RSP: 0018:ffffa996c0ca7de8 EFLAGS: 00010246
[  842.481828] RAX: 000000000000006d RBX: ffff9aaa8b5f0c60 RCX: 0000000000000002
[  842.484769] RDX: 0000000000000000 RSI: ffffffffb0b64d55 RDI: 00000000ffffffff
[  842.487252] RBP: ffff9aaab9b62000 R08: 0000000000000000 R09: c0000000ffff7fff
[  842.489939] R10: 0000000000000001 R11: ffffa996c0ca7c00 R12: ffffa996c0ca7e50
[  842.492215] R13: ffff9aaab9b621b0 R14: fffffffffffffd12 R15: ffff9aaab9b62198
[  842.494406] FS:  0000000000000000(0000) GS:ffff9aaafbe40000(0000)
knlGS:0000000000000000
[  842.496939] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  842.498759] CR2: 000055a8b4e96010 CR3: 0000000003a18001 CR4: 00000000001706e0
[  842.500957] Call Trace:
[  842.501740]  <TASK>
[  842.502479]  _free_cpntf_state_locked+0x36/0x90 [nfsd]
[  842.504157]  laundromat_main+0x59e/0x8b0 [nfsd]
[  842.505594]  ? finish_task_switch+0xbd/0x2a0
[  842.507247]  process_one_work+0x1c8/0x390
[  842.508538]  worker_thread+0x30/0x360
[  842.509670]  ? process_one_work+0x390/0x390
[  842.510957]  kthread+0xe8/0x110
[  842.511938]  ? kthread_complete_and_exit+0x20/0x20
[  842.513422]  ret_from_fork+0x22/0x30
[  842.514533]  </TASK>
[  842.515219] Modules linked in: rdma_ucm ib_uverbs rpcrdma rdma_cm
iw_cm ib_cm ib_core nfsd nfs_acl lockd grace ext4 mbcache jbd2 fuse
xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipt_REJECT
nf_reject_ipv4 nft_compat nf_tables nfnetlink tun bridge stp llc bnep
vmw_vsock_vmci_transport vsock intel_rapl_msr snd_seq_midi
snd_seq_midi_event intel_rapl_common crct10dif_pclmul crc32_pclmul
vmw_balloon ghash_clmulni_intel pcspkr joydev btusb uvcvideo btrtl
btbcm btintel videobuf2_vmalloc videobuf2_memops snd_ens1371
videobuf2_v4l2 snd_ac97_codec ac97_bus videobuf2_common snd_seq
videodev snd_pcm bluetooth rfkill mc snd_timer snd_rawmidi
ecdh_generic snd_seq_device ecc snd soundcore vmw_vmci i2c_piix4
auth_rpcgss sunrpc ip_tables xfs libcrc32c sr_mod cdrom sg ata_generic
nvme nvme_core t10_pi crc32c_intel crc64_rocksoft serio_raw crc64
vmwgfx vmxnet3 drm_ttm_helper ata_piix ttm drm_kms_helper syscopyarea
sysfillrect sysimgblt fb_sys_fops ahci libahci drm libata
[  842.541753] ---[ end trace 0000000000000000 ]---
[  842.543403] RIP: 0010:__list_del_entry_valid.cold.3+0x37/0x4a
[  842.545170] Code: e8 02 d8 fe ff 0f 0b 48 c7 c7 c0 bb b6 b0 e8 f4
d7 fe ff 0f 0b 48 89 d1 48 89 f2 48 89 fe 48 c7 c7 70 bb b6 b0 e8 dd
d7 fe ff <0f> 0b 48 89 fe 48 c7 c7 38 bb b6 b0 e8 cc d7 fe ff 0f 0b 48
89 ee
[  842.551346] RSP: 0018:ffffa996c0ca7de8 EFLAGS: 00010246
[  842.552999] RAX: 000000000000006d RBX: ffff9aaa8b5f0c60 RCX: 0000000000000002
[  842.555151] RDX: 0000000000000000 RSI: ffffffffb0b64d55 RDI: 00000000ffffffff
[  842.557503] RBP: ffff9aaab9b62000 R08: 0000000000000000 R09: c0000000ffff7fff
[  842.559694] R10: 0000000000000001 R11: ffffa996c0ca7c00 R12: ffffa996c0ca7e50
[  842.561956] R13: ffff9aaab9b621b0 R14: fffffffffffffd12 R15: ffff9aaab9b62198
[  842.564300] FS:  0000000000000000(0000) GS:ffff9aaafbe40000(0000)
knlGS:0000000000000000
[  842.567357] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  842.569273] CR2: 000055a8b4e96010 CR3: 0000000003a18001 CR4: 00000000001706e0
[  842.571598] Kernel panic - not syncing: Fatal exception
[  842.573674] Kernel Offset: 0x2e800000 from 0xffffffff81000000
(relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 1101.134589] ---[ end Kernel panic - not syncing: Fatal exception ]---
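
For anyone not familiar with the check that fires in the log above: the
"list_del corruption" message comes from a sanity test on a doubly linked
list entry just before it is unlinked. Below is a minimal userspace sketch
of that kind of check. It is not the kernel's code and not a diagnosis of
this oops; the type and helper names are stand-ins written for illustration,
and the double removal is only one generic way to trip such a check.

```c
#include <stdbool.h>
#include <stdio.h>

/* Userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Both neighbours must still point back at the entry being removed. */
static bool list_del_entry_valid(struct list_head *entry)
{
	if (entry->prev->next != entry) {
		fprintf(stderr,
			"list_del corruption. prev->next should be %p, but was %p\n",
			(void *)entry, (void *)entry->prev->next);
		return false;
	}
	if (entry->next->prev != entry) {
		fprintf(stderr,
			"list_del corruption. next->prev should be %p, but was %p\n",
			(void *)entry, (void *)entry->next->prev);
		return false;
	}
	return true;
}

/*
 * Unlink entry if the check passes. Unlike the kernel, this sketch leaves
 * entry's own pointers stale afterwards, so a second removal of the same
 * entry fails the check (the neighbours no longer point back at it).
 */
static bool list_del_checked(struct list_head *entry)
{
	if (!list_del_entry_valid(entry))
		return false;
	entry->next->prev = entry->prev;
	entry->prev->next = entry->next;
	return true;
}
```

With two entries on a list, removing the first one succeeds; removing it a
second time is rejected because prev->next now points at the other entry.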

On Wed, Jul 27, 2022 at 1:15 PM Chuck Lever III <chuck.lever@oracle.com> wrote:
>
>
>
> > On Jul 27, 2022, at 12:18 PM, Olga Kornievskaia <aglo@umich.edu> wrote:
> >
> > Hi Chuck,
>
> Sorry for the delay, I was traveling.
>
> > To make it compile I did:
> > diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> > index 7196bcafdd86..f6deffc921d0 100644
> > --- a/fs/nfsd/nfs4proc.c
> > +++ b/fs/nfsd/nfs4proc.c
> > @@ -1536,7 +1536,7 @@ nfsd4_setup_inter_ssc(struct svc_rqst *rqstp,
> >        if (status)
> >                goto out;
> >
> > -       status = nfsd4_interssc_connect(&copy->cp_src, rqstp, mount);
> > +       status = nfsd4_interssc_connect(copy->cp_src, rqstp, mount);
> >        if (status)
> >                goto out;
>
> Yes, same bug was reported by the day-0 kbot. v1 was kind of an RFC,
> as I hadn't fully tested it. Sorry for mislabeling it.
>
> I will post a v2 of this series with this fixed and with Dai's
> fix for nfsd4_decode_copy(). Stand by.
>
>
> > But when I tried to run nfstest_ssc, the first test (intra01) made
> > the server oops:
> >
> > [ 9569.551100] CPU: 0 PID: 2861 Comm: nfsd Not tainted 5.19.0-rc6+ #73
> > [ 9569.552385] Hardware name: VMware, Inc. VMware Virtual
> > Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
> > [ 9569.555043] RIP: 0010:nfsd4_copy+0x28b/0x4e0 [nfsd]
> > [ 9569.556662] Code: 24 38 49 89 94 24 10 01 00 00 49 8b 56 08 48 8d
> > 79 08 49 89 94 24 18 01 00 00 49 8b 56 10 48 83 e7 f8 49 89 94 24 20
> > 01 00 00 <48> 8b 06 48 89 01 48 8b 86 04 04 00 00 48 89 81 04 04 00 00
> > 48 29
> > [ 9569.561792] RSP: 0018:ffffb092c0c97dd0 EFLAGS: 00010282
> > [ 9569.563112] RAX: ffff99b5465c2460 RBX: ffff99b5a68828e0 RCX: ffff99b5853b6000
> > [ 9569.565196] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff99b5853b6008
> > [ 9569.567140] RBP: ffffb092c0c97e10 R08: ffffffffc0bf3c24 R09: 0000000000000228
> > [ 9569.568929] R10: ffff99b54b0e9268 R11: ffff99b564326998 R12: ffff99b5543dfc00
> > [ 9569.570477] R13: ffff99b5a6882950 R14: ffff99b5a68829f0 R15: ffff99b546edc000
> > [ 9569.572052] FS:  0000000000000000(0000) GS:ffff99b5bbe00000(0000)
> > knlGS:0000000000000000
> > [ 9569.573926] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 9569.575281] CR2: 0000000000000000 CR3: 0000000076c36002 CR4: 00000000001706f0
> > [ 9569.577586] Call Trace:
> > [ 9569.578220]  <TASK>
> > [ 9569.578770]  ? nfsd4_proc_compound+0x3d2/0x730 [nfsd]
> > [ 9569.579945]  nfsd4_proc_compound+0x3d2/0x730 [nfsd]
> > [ 9569.581055]  nfsd_dispatch+0x146/0x270 [nfsd]
> > [ 9569.581987]  svc_process_common+0x365/0x5c0 [sunrpc]
> > [ 9569.583122]  ? nfsd_svc+0x350/0x350 [nfsd]
> > [ 9569.583986]  ? nfsd_shutdown_threads+0x90/0x90 [nfsd]
> > [ 9569.585129]  svc_process+0xb7/0xf0 [sunrpc]
> > [ 9569.586169]  nfsd+0xd5/0x190 [nfsd]
> > [ 9569.587170]  kthread+0xe8/0x110
> > [ 9569.587898]  ? kthread_complete_and_exit+0x20/0x20
> > [ 9569.588934]  ret_from_fork+0x22/0x30
> > [ 9569.589759]  </TASK>
> > [ 9569.590224] Modules linked in: rdma_ucm ib_uverbs rpcrdma rdma_cm
> > iw_cm ib_cm ib_core nfsd nfs_acl lockd grace ext4 mbcache jbd2 fuse
> > xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipt_REJECT
> > nf_reject_ipv4 nft_compat nf_tables nfnetlink tun bridge stp llc bnep
> > vmw_vsock_vmci_transport vsock snd_seq_midi snd_seq_midi_event
> > intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul
> > vmw_balloon ghash_clmulni_intel joydev pcspkr btusb btrtl btbcm
> > btintel snd_ens1371 uvcvideo snd_ac97_codec videobuf2_vmalloc ac97_bus
> > videobuf2_memops videobuf2_v4l2 videobuf2_common snd_seq snd_pcm
> > videodev bluetooth mc rfkill ecdh_generic ecc snd_timer snd_rawmidi
> > snd_seq_device snd vmw_vmci soundcore i2c_piix4 auth_rpcgss sunrpc
> > ip_tables xfs libcrc32c sr_mod cdrom sg ata_generic crc32c_intel
> > ata_piix nvme ahci libahci nvme_core t10_pi crc64_rocksoft serio_raw
> > crc64 vmwgfx drm_ttm_helper ttm drm_kms_helper syscopyarea sysfillrect
> > sysimgblt fb_sys_fops vmxnet3 drm libata
> > [ 9569.610612] CR2: 0000000000000000
> > [ 9569.611375] ---[ end trace 0000000000000000 ]---
> > [ 9569.612424] RIP: 0010:nfsd4_copy+0x28b/0x4e0 [nfsd]
> > [ 9569.613472] Code: 24 38 49 89 94 24 10 01 00 00 49 8b 56 08 48 8d
> > 79 08 49 89 94 24 18 01 00 00 49 8b 56 10 48 83 e7 f8 49 89 94 24 20
> > 01 00 00 <48> 8b 06 48 89 01 48 8b 86 04 04 00 00 48 89 81 04 04 00 00
> > 48 29
> > [ 9569.617410] RSP: 0018:ffffb092c0c97dd0 EFLAGS: 00010282
> > [ 9569.618487] RAX: ffff99b5465c2460 RBX: ffff99b5a68828e0 RCX: ffff99b5853b6000
> > [ 9569.620097] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff99b5853b6008
> > [ 9569.621710] RBP: ffffb092c0c97e10 R08: ffffffffc0bf3c24 R09: 0000000000000228
> > [ 9569.623398] R10: ffff99b54b0e9268 R11: ffff99b564326998 R12: ffff99b5543dfc00
> > [ 9569.625019] R13: ffff99b5a6882950 R14: ffff99b5a68829f0 R15: ffff99b546edc000
> > [ 9569.627456] FS:  0000000000000000(0000) GS:ffff99b5bbe00000(0000)
> > knlGS:0000000000000000
> > [ 9569.629249] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 9569.630433] CR2: 0000000000000000 CR3: 0000000076c36002 CR4: 00000000001706f0
> > [ 9569.632043] Kernel panic - not syncing: Fatal exception
> >
> >
> >
> > On Tue, Jul 26, 2022 at 3:45 PM Olga Kornievskaia <aglo@umich.edu> wrote:
> >>
> >> Chuck,
> >>
> >> Are there pre-reqs for this series? I had tried to apply the patches
> >> on top of 5.19-rc6 but I get the following compile error:
> >>
> >> fs/nfsd/nfs4proc.c: In function ‘nfsd4_setup_inter_ssc’:
> >> fs/nfsd/nfs4proc.c:1539:34: error: passing argument 1 of
> >> ‘nfsd4_interssc_connect’ from incompatible pointer type
> >> [-Werror=incompatible-pointer-types]
> >>  status = nfsd4_interssc_connect(&copy->cp_src, rqstp, mount);
> >>                                  ^~~~~~~~~~~~~
> >> fs/nfsd/nfs4proc.c:1414:43: note: expected ‘struct nl4_server *’ but
> >> argument is of type ‘struct nl4_server **’
> >> nfsd4_interssc_connect(struct nl4_server *nss, struct svc_rqst *rqstp,
> >>                        ~~~~~~~~~~~~~~~~~~~^~~
> >> cc1: some warnings being treated as errors
> >> make[2]: *** [scripts/Makefile.build:249: fs/nfsd/nfs4proc.o] Error 1
> >> make[1]: *** [scripts/Makefile.build:466: fs/nfsd] Error 2
> >> make: *** [Makefile:1843: fs] Error 2
> >>
> >> On Fri, Jul 22, 2022 at 4:36 PM Chuck Lever <chuck.lever@oracle.com> wrote:
> >>>
> >>> While testing NFSD for-next, I noticed svc_generic_init_request()
> >>> was an unexpected hot spot on NFSv4 workloads. Drilling into the
> >>> perf report, it shows that the hot path in there is:
> >>>
> >>> 1208         memset(rqstp->rq_argp, 0, procp->pc_argsize);
> >>> 1209         memset(rqstp->rq_resp, 0, procp->pc_ressize);
> >>>
> >>> For an NFSv4 COMPOUND,
> >>>
> >>>        procp->pc_argsize = sizeof(nfsd4_compoundargs),
> >>>
> >>> struct nfsd4_compoundargs on my system is more than 17KB! This is
> >>> due to the size of the iops field:
> >>>
> >>>        struct nfsd4_op                 iops[8];
> >>>
> >>> Each struct nfsd4_op contains a union of the arguments for each
> >>> NFSv4 operation. Each argument is typically less than 128 bytes
> >>> except that struct nfsd4_copy and struct nfsd4_copy_notify are both
> >>> larger than 2KB each.
> >>>
> >>> I'm not yet totally convinced this series never orphans memory, but
> >>> it does reduce the size of nfsd4_compoundargs to just over 4KB. This
> >>> is still due to struct nfsd4_copy being almost 500 bytes. I don't
> >>> see more low-hanging fruit there, though.
> >>>
> >>> ---
> >>>
> >>> Chuck Lever (11):
> >>>      NFSD: Shrink size of struct nfsd4_copy_notify
> >>>      NFSD: Shrink size of struct nfsd4_copy
> >>>      NFSD: Reorder the fields in struct nfsd4_op
> >>>      NFSD: Make nfs4_put_copy() static
> >>>      NFSD: Make boolean fields in struct nfsd4_copy into atomic bit flags
> >>>      NFSD: Refactor nfsd4_cleanup_inter_ssc() (1/2)
> >>>      NFSD: Refactor nfsd4_cleanup_inter_ssc() (2/2)
> >>>      NFSD: Refactor nfsd4_do_copy()
> >>>      NFSD: Remove kmalloc from nfsd4_do_async_copy()
> >>>      NFSD: Add nfsd4_send_cb_offload()
> >>>      NFSD: Move copy offload callback arguments into a separate structure
> >>>
> >>>
> >>> fs/nfsd/nfs4callback.c |  37 +++++----
> >>> fs/nfsd/nfs4proc.c     | 165 +++++++++++++++++++++--------------------
> >>> fs/nfsd/nfs4xdr.c      |  30 +++++---
> >>> fs/nfsd/state.h        |   1 -
> >>> fs/nfsd/xdr4.h         |  54 ++++++++++----
> >>> 5 files changed, 163 insertions(+), 124 deletions(-)
> >>>
> >>> --
> >>> Chuck Lever
> >>>
>
> --
> Chuck Lever
>
>
>
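
The compile error quoted above is the usual symptom of a struct field that
has become a pointer while a caller still takes its address. A minimal
sketch follows, assuming cp_src was previously stored inline; the struct
layouts, the "id" field, and the connect_to() helper are hypothetical
stand-ins, not the definitions in fs/nfsd.

```c
#include <stddef.h>

/* Hypothetical stand-in; the real struct nl4_server is defined elsewhere. */
struct nl4_server {
	int id;
};

/* Field stored inline: callers pass its address. */
struct copy_embedded {
	struct nl4_server cp_src;
};

/* Field stored as a pointer: callers pass the pointer itself. */
struct copy_pointer {
	struct nl4_server *cp_src;
};

/* Hypothetical consumer taking the same parameter type as in the error. */
static int connect_to(struct nl4_server *nss)
{
	return nss ? nss->id : -1;
}

static int use_embedded(struct copy_embedded *copy)
{
	/* &copy->cp_src has type struct nl4_server * here. */
	return connect_to(&copy->cp_src);
}

static int use_pointer(struct copy_pointer *copy)
{
	/*
	 * copy->cp_src is already a struct nl4_server *. Writing
	 * &copy->cp_src here would yield struct nl4_server **, which is
	 * the incompatible pointer type the compiler complained about.
	 */
	return connect_to(copy->cp_src);
}
```

Dropping the "&" is therefore the natural one-line fix once the field is a
pointer, as in the diff quoted above.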
Chuck Lever July 27, 2022, 6:04 p.m. UTC | #6
> On Jul 27, 2022, at 1:52 PM, Olga Kornievskaia <aglo@umich.edu> wrote:
> 
> After applying Dai's patch I got further, and then hit the next panic
> (below). Before that, the "inter01" test failed with ECOMM. In the
> trace, after the COPY is placed the server returns ESTALE in
> CB_OFFLOAD, and then the CLOSE fails with BAD_SESSION (basically,
> something went really wrong on the server). A few more tests failed
> in a similar fashion, and the oops happens on cleanup.

What test should I run to reproduce this?


> [  842.455939] list_del corruption. prev->next should be
> ffff9aaa8b5f0c78, but was ffff9aaab2713508. (prev=ffff9aaab2713510)
> [  842.460118] ------------[ cut here ]------------
> [  842.461599] kernel BUG at lib/list_debug.c:53!
> [  842.462962] invalid opcode: 0000 [#1] PREEMPT SMP PTI
> [  842.464587] CPU: 1 PID: 500 Comm: kworker/u256:28 Not tainted 5.18.0 #70
> [  842.466656] Hardware name: VMware, Inc. VMware Virtual
> Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
> [  842.470309] Workqueue: nfsd4 laundromat_main [nfsd]
> [  842.471898] RIP: 0010:__list_del_entry_valid.cold.3+0x37/0x4a
> [  842.473792] Code: e8 02 d8 fe ff 0f 0b 48 c7 c7 c0 bb b6 b0 e8 f4
> d7 fe ff 0f 0b 48 89 d1 48 89 f2 48 89 fe 48 c7 c7 70 bb b6 b0 e8 dd
> d7 fe ff <0f> 0b 48 89 fe 48 c7 c7 38 bb b6 b0 e8 cc d7 fe ff 0f 0b 48
> 89 ee
> [  842.479607] RSP: 0018:ffffa996c0ca7de8 EFLAGS: 00010246
> [  842.481828] RAX: 000000000000006d RBX: ffff9aaa8b5f0c60 RCX: 0000000000000002
> [  842.484769] RDX: 0000000000000000 RSI: ffffffffb0b64d55 RDI: 00000000ffffffff
> [  842.487252] RBP: ffff9aaab9b62000 R08: 0000000000000000 R09: c0000000ffff7fff
> [  842.489939] R10: 0000000000000001 R11: ffffa996c0ca7c00 R12: ffffa996c0ca7e50
> [  842.492215] R13: ffff9aaab9b621b0 R14: fffffffffffffd12 R15: ffff9aaab9b62198
> [  842.494406] FS:  0000000000000000(0000) GS:ffff9aaafbe40000(0000)
> knlGS:0000000000000000
> [  842.496939] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  842.498759] CR2: 000055a8b4e96010 CR3: 0000000003a18001 CR4: 00000000001706e0
> [  842.500957] Call Trace:
> [  842.501740]  <TASK>
> [  842.502479]  _free_cpntf_state_locked+0x36/0x90 [nfsd]
> [  842.504157]  laundromat_main+0x59e/0x8b0 [nfsd]
> [  842.505594]  ? finish_task_switch+0xbd/0x2a0
> [  842.507247]  process_one_work+0x1c8/0x390
> [  842.508538]  worker_thread+0x30/0x360
> [  842.509670]  ? process_one_work+0x390/0x390
> [  842.510957]  kthread+0xe8/0x110
> [  842.511938]  ? kthread_complete_and_exit+0x20/0x20
> [  842.513422]  ret_from_fork+0x22/0x30
> [  842.514533]  </TASK>
> [  842.515219] Modules linked in: rdma_ucm ib_uverbs rpcrdma rdma_cm
> iw_cm ib_cm ib_core nfsd nfs_acl lockd grace ext4 mbcache jbd2 fuse
> xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipt_REJECT
> nf_reject_ipv4 nft_compat nf_tables nfnetlink tun bridge stp llc bnep
> vmw_vsock_vmci_transport vsock intel_rapl_msr snd_seq_midi
> snd_seq_midi_event intel_rapl_common crct10dif_pclmul crc32_pclmul
> vmw_balloon ghash_clmulni_intel pcspkr joydev btusb uvcvideo btrtl
> btbcm btintel videobuf2_vmalloc videobuf2_memops snd_ens1371
> videobuf2_v4l2 snd_ac97_codec ac97_bus videobuf2_common snd_seq
> videodev snd_pcm bluetooth rfkill mc snd_timer snd_rawmidi
> ecdh_generic snd_seq_device ecc snd soundcore vmw_vmci i2c_piix4
> auth_rpcgss sunrpc ip_tables xfs libcrc32c sr_mod cdrom sg ata_generic
> nvme nvme_core t10_pi crc32c_intel crc64_rocksoft serio_raw crc64
> vmwgfx vmxnet3 drm_ttm_helper ata_piix ttm drm_kms_helper syscopyarea
> sysfillrect sysimgblt fb_sys_fops ahci libahci drm libata
> [  842.541753] ---[ end trace 0000000000000000 ]---
> [  842.543403] RIP: 0010:__list_del_entry_valid.cold.3+0x37/0x4a
> [  842.545170] Code: e8 02 d8 fe ff 0f 0b 48 c7 c7 c0 bb b6 b0 e8 f4
> d7 fe ff 0f 0b 48 89 d1 48 89 f2 48 89 fe 48 c7 c7 70 bb b6 b0 e8 dd
> d7 fe ff <0f> 0b 48 89 fe 48 c7 c7 38 bb b6 b0 e8 cc d7 fe ff 0f 0b 48
> 89 ee
> [  842.551346] RSP: 0018:ffffa996c0ca7de8 EFLAGS: 00010246
> [  842.552999] RAX: 000000000000006d RBX: ffff9aaa8b5f0c60 RCX: 0000000000000002
> [  842.555151] RDX: 0000000000000000 RSI: ffffffffb0b64d55 RDI: 00000000ffffffff
> [  842.557503] RBP: ffff9aaab9b62000 R08: 0000000000000000 R09: c0000000ffff7fff
> [  842.559694] R10: 0000000000000001 R11: ffffa996c0ca7c00 R12: ffffa996c0ca7e50
> [  842.561956] R13: ffff9aaab9b621b0 R14: fffffffffffffd12 R15: ffff9aaab9b62198
> [  842.564300] FS:  0000000000000000(0000) GS:ffff9aaafbe40000(0000)
> knlGS:0000000000000000
> [  842.567357] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  842.569273] CR2: 000055a8b4e96010 CR3: 0000000003a18001 CR4: 00000000001706e0
> [  842.571598] Kernel panic - not syncing: Fatal exception
> [  842.573674] Kernel Offset: 0x2e800000 from 0xffffffff81000000
> (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> [ 1101.134589] ---[ end Kernel panic - not syncing: Fatal exception ]---
> 
> On Wed, Jul 27, 2022 at 1:15 PM Chuck Lever III <chuck.lever@oracle.com> wrote:
>> 
>> 
>> 
>>> On Jul 27, 2022, at 12:18 PM, Olga Kornievskaia <aglo@umich.edu> wrote:
>>> 
>>> Hi Chuck,
>> 
>> Sorry for the delay, I was traveling.
>> 
>>> To make it compile I did:
>>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
>>> index 7196bcafdd86..f6deffc921d0 100644
>>> --- a/fs/nfsd/nfs4proc.c
>>> +++ b/fs/nfsd/nfs4proc.c
>>> @@ -1536,7 +1536,7 @@ nfsd4_setup_inter_ssc(struct svc_rqst *rqstp,
>>>       if (status)
>>>               goto out;
>>> 
>>> -       status = nfsd4_interssc_connect(&copy->cp_src, rqstp, mount);
>>> +       status = nfsd4_interssc_connect(copy->cp_src, rqstp, mount);
>>>       if (status)
>>>               goto out;
>> 
>> Yes, same bug was reported by the day-0 kbot. v1 was kind of an RFC,
>> as I hadn't fully tested it. Sorry for mislabeling it.
>> 
>> I will post a v2 of this series with this fixed and with Dai's
>> fix for nfsd4_decode_copy(). Stand by.
>> 
>> 
>>> But when I tried to run nfstest_ssc, the first test (intra01) made
>>> the server oops:
>>> 
>>> [ 9569.551100] CPU: 0 PID: 2861 Comm: nfsd Not tainted 5.19.0-rc6+ #73
>>> [ 9569.552385] Hardware name: VMware, Inc. VMware Virtual
>>> Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
>>> [ 9569.555043] RIP: 0010:nfsd4_copy+0x28b/0x4e0 [nfsd]
>>> [ 9569.556662] Code: 24 38 49 89 94 24 10 01 00 00 49 8b 56 08 48 8d
>>> 79 08 49 89 94 24 18 01 00 00 49 8b 56 10 48 83 e7 f8 49 89 94 24 20
>>> 01 00 00 <48> 8b 06 48 89 01 48 8b 86 04 04 00 00 48 89 81 04 04 00 00
>>> 48 29
>>> [ 9569.561792] RSP: 0018:ffffb092c0c97dd0 EFLAGS: 00010282
>>> [ 9569.563112] RAX: ffff99b5465c2460 RBX: ffff99b5a68828e0 RCX: ffff99b5853b6000
>>> [ 9569.565196] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff99b5853b6008
>>> [ 9569.567140] RBP: ffffb092c0c97e10 R08: ffffffffc0bf3c24 R09: 0000000000000228
>>> [ 9569.568929] R10: ffff99b54b0e9268 R11: ffff99b564326998 R12: ffff99b5543dfc00
>>> [ 9569.570477] R13: ffff99b5a6882950 R14: ffff99b5a68829f0 R15: ffff99b546edc000
>>> [ 9569.572052] FS:  0000000000000000(0000) GS:ffff99b5bbe00000(0000)
>>> knlGS:0000000000000000
>>> [ 9569.573926] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [ 9569.575281] CR2: 0000000000000000 CR3: 0000000076c36002 CR4: 00000000001706f0
>>> [ 9569.577586] Call Trace:
>>> [ 9569.578220]  <TASK>
>>> [ 9569.578770]  ? nfsd4_proc_compound+0x3d2/0x730 [nfsd]
>>> [ 9569.579945]  nfsd4_proc_compound+0x3d2/0x730 [nfsd]
>>> [ 9569.581055]  nfsd_dispatch+0x146/0x270 [nfsd]
>>> [ 9569.581987]  svc_process_common+0x365/0x5c0 [sunrpc]
>>> [ 9569.583122]  ? nfsd_svc+0x350/0x350 [nfsd]
>>> [ 9569.583986]  ? nfsd_shutdown_threads+0x90/0x90 [nfsd]
>>> [ 9569.585129]  svc_process+0xb7/0xf0 [sunrpc]
>>> [ 9569.586169]  nfsd+0xd5/0x190 [nfsd]
>>> [ 9569.587170]  kthread+0xe8/0x110
>>> [ 9569.587898]  ? kthread_complete_and_exit+0x20/0x20
>>> [ 9569.588934]  ret_from_fork+0x22/0x30
>>> [ 9569.589759]  </TASK>
>>> [ 9569.590224] Modules linked in: rdma_ucm ib_uverbs rpcrdma rdma_cm
>>> iw_cm ib_cm ib_core nfsd nfs_acl lockd grace ext4 mbcache jbd2 fuse
>>> xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipt_REJECT
>>> nf_reject_ipv4 nft_compat nf_tables nfnetlink tun bridge stp llc bnep
>>> vmw_vsock_vmci_transport vsock snd_seq_midi snd_seq_midi_event
>>> intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul
>>> vmw_balloon ghash_clmulni_intel joydev pcspkr btusb btrtl btbcm
>>> btintel snd_ens1371 uvcvideo snd_ac97_codec videobuf2_vmalloc ac97_bus
>>> videobuf2_memops videobuf2_v4l2 videobuf2_common snd_seq snd_pcm
>>> videodev bluetooth mc rfkill ecdh_generic ecc snd_timer snd_rawmidi
>>> snd_seq_device snd vmw_vmci soundcore i2c_piix4 auth_rpcgss sunrpc
>>> ip_tables xfs libcrc32c sr_mod cdrom sg ata_generic crc32c_intel
>>> ata_piix nvme ahci libahci nvme_core t10_pi crc64_rocksoft serio_raw
>>> crc64 vmwgfx drm_ttm_helper ttm drm_kms_helper syscopyarea sysfillrect
>>> sysimgblt fb_sys_fops vmxnet3 drm libata
>>> [ 9569.610612] CR2: 0000000000000000
>>> [ 9569.611375] ---[ end trace 0000000000000000 ]---
>>> [ 9569.612424] RIP: 0010:nfsd4_copy+0x28b/0x4e0 [nfsd]
>>> [ 9569.613472] Code: 24 38 49 89 94 24 10 01 00 00 49 8b 56 08 48 8d
>>> 79 08 49 89 94 24 18 01 00 00 49 8b 56 10 48 83 e7 f8 49 89 94 24 20
>>> 01 00 00 <48> 8b 06 48 89 01 48 8b 86 04 04 00 00 48 89 81 04 04 00 00
>>> 48 29
>>> [ 9569.617410] RSP: 0018:ffffb092c0c97dd0 EFLAGS: 00010282
>>> [ 9569.618487] RAX: ffff99b5465c2460 RBX: ffff99b5a68828e0 RCX: ffff99b5853b6000
>>> [ 9569.620097] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff99b5853b6008
>>> [ 9569.621710] RBP: ffffb092c0c97e10 R08: ffffffffc0bf3c24 R09: 0000000000000228
>>> [ 9569.623398] R10: ffff99b54b0e9268 R11: ffff99b564326998 R12: ffff99b5543dfc00
>>> [ 9569.625019] R13: ffff99b5a6882950 R14: ffff99b5a68829f0 R15: ffff99b546edc000
>>> [ 9569.627456] FS:  0000000000000000(0000) GS:ffff99b5bbe00000(0000)
>>> knlGS:0000000000000000
>>> [ 9569.629249] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [ 9569.630433] CR2: 0000000000000000 CR3: 0000000076c36002 CR4: 00000000001706f0
>>> [ 9569.632043] Kernel panic - not syncing: Fatal exception
>>> 
>>> 
>>> 
>>> On Tue, Jul 26, 2022 at 3:45 PM Olga Kornievskaia <aglo@umich.edu> wrote:
>>>> 
>>>> Chuck,
>>>> 
>>>> Are there pre-reqs for this series? I had tried to apply the patches
>>>> on top of 5.19-rc6 but I get the following compile error:
>>>> 
>>>> fs/nfsd/nfs4proc.c: In function ‘nfsd4_setup_inter_ssc’:
>>>> fs/nfsd/nfs4proc.c:1539:34: error: passing argument 1 of
>>>> ‘nfsd4_interssc_connect’ from incompatible pointer type
>>>> [-Werror=incompatible-pointer-types]
>>>> status = nfsd4_interssc_connect(&copy->cp_src, rqstp, mount);
>>>>                                 ^~~~~~~~~~~~~
>>>> fs/nfsd/nfs4proc.c:1414:43: note: expected ‘struct nl4_server *’ but
>>>> argument is of type ‘struct nl4_server **’
>>>> nfsd4_interssc_connect(struct nl4_server *nss, struct svc_rqst *rqstp,
>>>>                       ~~~~~~~~~~~~~~~~~~~^~~
>>>> cc1: some warnings being treated as errors
>>>> make[2]: *** [scripts/Makefile.build:249: fs/nfsd/nfs4proc.o] Error 1
>>>> make[1]: *** [scripts/Makefile.build:466: fs/nfsd] Error 2
>>>> make: *** [Makefile:1843: fs] Error 2
>>>> 
>>>> On Fri, Jul 22, 2022 at 4:36 PM Chuck Lever <chuck.lever@oracle.com> wrote:
>>>>> 
>>>>> While testing NFSD for-next, I noticed svc_generic_init_request()
>>>>> was an unexpected hot spot on NFSv4 workloads. Drilling into the
>>>>> perf report, it shows that the hot path in there is:
>>>>> 
>>>>> 1208         memset(rqstp->rq_argp, 0, procp->pc_argsize);
>>>>> 1209         memset(rqstp->rq_resp, 0, procp->pc_ressize);
>>>>> 
>>>>> For an NFSv4 COMPOUND,
>>>>> 
>>>>>       procp->pc_argsize = sizeof(nfsd4_compoundargs),
>>>>> 
>>>>> struct nfsd4_compoundargs on my system is more than 17KB! This is
>>>>> due to the size of the iops field:
>>>>> 
>>>>>       struct nfsd4_op                 iops[8];
>>>>> 
>>>>> Each struct nfsd4_op contains a union of the arguments for each
>>>>> NFSv4 operation. Each argument is typically less than 128 bytes
>>>>> except that struct nfsd4_copy and struct nfsd4_copy_notify are both
>>>>> larger than 2KB each.
>>>>> 
>>>>> I'm not yet totally convinced this series never orphans memory, but
>>>>> it does reduce the size of nfsd4_compoundargs to just over 4KB. This
>>>>> is still due to struct nfsd4_copy being almost 500 bytes. I don't
>>>>> see more low-hanging fruit there, though.
>>>>> 
>>>>> ---
>>>>> 
>>>>> Chuck Lever (11):
>>>>>     NFSD: Shrink size of struct nfsd4_copy_notify
>>>>>     NFSD: Shrink size of struct nfsd4_copy
>>>>>     NFSD: Reorder the fields in struct nfsd4_op
>>>>>     NFSD: Make nfs4_put_copy() static
>>>>>     NFSD: Make boolean fields in struct nfsd4_copy into atomic bit flags
>>>>>     NFSD: Refactor nfsd4_cleanup_inter_ssc() (1/2)
>>>>>     NFSD: Refactor nfsd4_cleanup_inter_ssc() (2/2)
>>>>>     NFSD: Refactor nfsd4_do_copy()
>>>>>     NFSD: Remove kmalloc from nfsd4_do_async_copy()
>>>>>     NFSD: Add nfsd4_send_cb_offload()
>>>>>     NFSD: Move copy offload callback arguments into a separate structure
>>>>> 
>>>>> 
>>>>> fs/nfsd/nfs4callback.c |  37 +++++----
>>>>> fs/nfsd/nfs4proc.c     | 165 +++++++++++++++++++++--------------------
>>>>> fs/nfsd/nfs4xdr.c      |  30 +++++---
>>>>> fs/nfsd/state.h        |   1 -
>>>>> fs/nfsd/xdr4.h         |  54 ++++++++++----
>>>>> 5 files changed, 163 insertions(+), 124 deletions(-)
>>>>> 
>>>>> --
>>>>> Chuck Lever
>>>>> 
>> 
>> --
>> Chuck Lever
>> 
>> 
>> 
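
To make the size argument in the quoted cover letter concrete: a union is
as large as its largest member, so one oversized argument struct inflates
every slot of an inline array. The sketch below uses made-up struct names
and sizes (128 and 2048 bytes) chosen only to mirror the orders of
magnitude mentioned above; it is not the layout in fs/nfsd/xdr4.h.

```c
#include <stddef.h>

/* Illustrative argument structs; sizes are assumptions, not measurements. */
struct small_args {
	char data[128];
};

struct big_args {
	char data[2048];
};

/* Large member stored inline: the union is at least 2048 bytes. */
union op_inline {
	struct small_args small;
	struct big_args big;
};

/* Large member behind a pointer: the union shrinks to the small member. */
union op_indirect {
	struct small_args small;
	struct big_args *big;
};

struct op_a {
	int opnum;
	union op_inline u;
};

struct op_b {
	int opnum;
	union op_indirect u;
};

/* Eight inline ops per compound, as with the iops[8] field above. */
struct compound_a {
	struct op_a iops[8];
};

struct compound_b {
	struct op_b iops[8];
};
```

With these assumed sizes, struct compound_a exceeds 16KB while struct
compound_b stays near 1KB, which is the kind of reduction that makes a
per-request memset() of the whole structure much cheaper.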

--
Chuck Lever
Olga Kornievskaia July 27, 2022, 6:21 p.m. UTC | #7
On Wed, Jul 27, 2022 at 2:04 PM Chuck Lever III <chuck.lever@oracle.com> wrote:
>
>
>
> > On Jul 27, 2022, at 1:52 PM, Olga Kornievskaia <aglo@umich.edu> wrote:
> >
> > After applying Dai's patch I got further, and then hit the next panic
> > (below). Before that, the "inter01" test failed with ECOMM. In the
> > trace, after the COPY is placed the server returns ESTALE in
> > CB_OFFLOAD, and then the CLOSE fails with BAD_SESSION (basically,
> > something went really wrong on the server). A few more tests failed
> > in a similar fashion, and the oops happens on cleanup.
>
> What test should I run to reproduce this?

I'm running "./nfstest_ssc". It ran through all the tests, with "inter15"
being the last, then started "cleanup", and that's what panicked the server.

It's been a while since I tested SSC, so I'll undo all the patches
and re-run the tests to make sure the code worked before.

> > [  842.455939] list_del corruption. prev->next should be
> > ffff9aaa8b5f0c78, but was ffff9aaab2713508. (prev=ffff9aaab2713510)
> > [  842.460118] ------------[ cut here ]------------
> > [  842.461599] kernel BUG at lib/list_debug.c:53!
> > [  842.462962] invalid opcode: 0000 [#1] PREEMPT SMP PTI
> > [  842.464587] CPU: 1 PID: 500 Comm: kworker/u256:28 Not tainted 5.18.0 #70
> > [  842.466656] Hardware name: VMware, Inc. VMware Virtual
> > Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
> > [  842.470309] Workqueue: nfsd4 laundromat_main [nfsd]
> > [  842.471898] RIP: 0010:__list_del_entry_valid.cold.3+0x37/0x4a
> > [  842.473792] Code: e8 02 d8 fe ff 0f 0b 48 c7 c7 c0 bb b6 b0 e8 f4
> > d7 fe ff 0f 0b 48 89 d1 48 89 f2 48 89 fe 48 c7 c7 70 bb b6 b0 e8 dd
> > d7 fe ff <0f> 0b 48 89 fe 48 c7 c7 38 bb b6 b0 e8 cc d7 fe ff 0f 0b 48
> > 89 ee
> > [  842.479607] RSP: 0018:ffffa996c0ca7de8 EFLAGS: 00010246
> > [  842.481828] RAX: 000000000000006d RBX: ffff9aaa8b5f0c60 RCX: 0000000000000002
> > [  842.484769] RDX: 0000000000000000 RSI: ffffffffb0b64d55 RDI: 00000000ffffffff
> > [  842.487252] RBP: ffff9aaab9b62000 R08: 0000000000000000 R09: c0000000ffff7fff
> > [  842.489939] R10: 0000000000000001 R11: ffffa996c0ca7c00 R12: ffffa996c0ca7e50
> > [  842.492215] R13: ffff9aaab9b621b0 R14: fffffffffffffd12 R15: ffff9aaab9b62198
> > [  842.494406] FS:  0000000000000000(0000) GS:ffff9aaafbe40000(0000)
> > knlGS:0000000000000000
> > [  842.496939] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [  842.498759] CR2: 000055a8b4e96010 CR3: 0000000003a18001 CR4: 00000000001706e0
> > [  842.500957] Call Trace:
> > [  842.501740]  <TASK>
> > [  842.502479]  _free_cpntf_state_locked+0x36/0x90 [nfsd]
> > [  842.504157]  laundromat_main+0x59e/0x8b0 [nfsd]
> > [  842.505594]  ? finish_task_switch+0xbd/0x2a0
> > [  842.507247]  process_one_work+0x1c8/0x390
> > [  842.508538]  worker_thread+0x30/0x360
> > [  842.509670]  ? process_one_work+0x390/0x390
> > [  842.510957]  kthread+0xe8/0x110
> > [  842.511938]  ? kthread_complete_and_exit+0x20/0x20
> > [  842.513422]  ret_from_fork+0x22/0x30
> > [  842.514533]  </TASK>
> > [  842.515219] Modules linked in: rdma_ucm ib_uverbs rpcrdma rdma_cm
> > iw_cm ib_cm ib_core nfsd nfs_acl lockd grace ext4 mbcache jbd2 fuse
> > xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipt_REJECT
> > nf_reject_ipv4 nft_compat nf_tables nfnetlink tun bridge stp llc bnep
> > vmw_vsock_vmci_transport vsock intel_rapl_msr snd_seq_midi
> > snd_seq_midi_event intel_rapl_common crct10dif_pclmul crc32_pclmul
> > vmw_balloon ghash_clmulni_intel pcspkr joydev btusb uvcvideo btrtl
> > btbcm btintel videobuf2_vmalloc videobuf2_memops snd_ens1371
> > videobuf2_v4l2 snd_ac97_codec ac97_bus videobuf2_common snd_seq
> > videodev snd_pcm bluetooth rfkill mc snd_timer snd_rawmidi
> > ecdh_generic snd_seq_device ecc snd soundcore vmw_vmci i2c_piix4
> > auth_rpcgss sunrpc ip_tables xfs libcrc32c sr_mod cdrom sg ata_generic
> > nvme nvme_core t10_pi crc32c_intel crc64_rocksoft serio_raw crc64
> > vmwgfx vmxnet3 drm_ttm_helper ata_piix ttm drm_kms_helper syscopyarea
> > sysfillrect sysimgblt fb_sys_fops ahci libahci drm libata
> > [  842.541753] ---[ end trace 0000000000000000 ]---
> > [  842.543403] RIP: 0010:__list_del_entry_valid.cold.3+0x37/0x4a
> > [  842.545170] Code: e8 02 d8 fe ff 0f 0b 48 c7 c7 c0 bb b6 b0 e8 f4
> > d7 fe ff 0f 0b 48 89 d1 48 89 f2 48 89 fe 48 c7 c7 70 bb b6 b0 e8 dd
> > d7 fe ff <0f> 0b 48 89 fe 48 c7 c7 38 bb b6 b0 e8 cc d7 fe ff 0f 0b 48
> > 89 ee
> > [  842.551346] RSP: 0018:ffffa996c0ca7de8 EFLAGS: 00010246
> > [  842.552999] RAX: 000000000000006d RBX: ffff9aaa8b5f0c60 RCX: 0000000000000002
> > [  842.555151] RDX: 0000000000000000 RSI: ffffffffb0b64d55 RDI: 00000000ffffffff
> > [  842.557503] RBP: ffff9aaab9b62000 R08: 0000000000000000 R09: c0000000ffff7fff
> > [  842.559694] R10: 0000000000000001 R11: ffffa996c0ca7c00 R12: ffffa996c0ca7e50
> > [  842.561956] R13: ffff9aaab9b621b0 R14: fffffffffffffd12 R15: ffff9aaab9b62198
> > [  842.564300] FS:  0000000000000000(0000) GS:ffff9aaafbe40000(0000)
> > knlGS:0000000000000000
> > [  842.567357] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [  842.569273] CR2: 000055a8b4e96010 CR3: 0000000003a18001 CR4: 00000000001706e0
> > [  842.571598] Kernel panic - not syncing: Fatal exception
> > [  842.573674] Kernel Offset: 0x2e800000 from 0xffffffff81000000
> > (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> > [ 1101.134589] ---[ end Kernel panic - not syncing: Fatal exception ]---
> >
> > On Wed, Jul 27, 2022 at 1:15 PM Chuck Lever III <chuck.lever@oracle.com> wrote:
> >>
> >>
> >>
> >>> On Jul 27, 2022, at 12:18 PM, Olga Kornievskaia <aglo@umich.edu> wrote:
> >>>
> >>> Hi Chuck,
> >>
> >> Sorry for the delay, I was traveling.
> >>
> >>> To make it compile I did:
> >>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> >>> index 7196bcafdd86..f6deffc921d0 100644
> >>> --- a/fs/nfsd/nfs4proc.c
> >>> +++ b/fs/nfsd/nfs4proc.c
> >>> @@ -1536,7 +1536,7 @@ nfsd4_setup_inter_ssc(struct svc_rqst *rqstp,
> >>>       if (status)
> >>>               goto out;
> >>>
> >>> -       status = nfsd4_interssc_connect(&copy->cp_src, rqstp, mount);
> >>> +       status = nfsd4_interssc_connect(copy->cp_src, rqstp, mount);
> >>>       if (status)
> >>>               goto out;
> >>
> >> Yes, same bug was reported by the day-0 kbot. v1 was kind of an RFC,
> >> as I hadn't fully tested it. Sorry for mislabeling it.
> >>
> >> I will post a v2 of this series with this fixed and with Dai's
> >> fix for nfsd4_decode_copy(). Stand by.
> >>
> >>
> >>> But when I tried to run nfstest_ssc, the first test (intra01) made
> >>> the server oops:
> >>>
> >>> [ 9569.551100] CPU: 0 PID: 2861 Comm: nfsd Not tainted 5.19.0-rc6+ #73
> >>> [ 9569.552385] Hardware name: VMware, Inc. VMware Virtual
> >>> Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
> >>> [ 9569.555043] RIP: 0010:nfsd4_copy+0x28b/0x4e0 [nfsd]
> >>> [ 9569.556662] Code: 24 38 49 89 94 24 10 01 00 00 49 8b 56 08 48 8d
> >>> 79 08 49 89 94 24 18 01 00 00 49 8b 56 10 48 83 e7 f8 49 89 94 24 20
> >>> 01 00 00 <48> 8b 06 48 89 01 48 8b 86 04 04 00 00 48 89 81 04 04 00 00
> >>> 48 29
> >>> [ 9569.561792] RSP: 0018:ffffb092c0c97dd0 EFLAGS: 00010282
> >>> [ 9569.563112] RAX: ffff99b5465c2460 RBX: ffff99b5a68828e0 RCX: ffff99b5853b6000
> >>> [ 9569.565196] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff99b5853b6008
> >>> [ 9569.567140] RBP: ffffb092c0c97e10 R08: ffffffffc0bf3c24 R09: 0000000000000228
> >>> [ 9569.568929] R10: ffff99b54b0e9268 R11: ffff99b564326998 R12: ffff99b5543dfc00
> >>> [ 9569.570477] R13: ffff99b5a6882950 R14: ffff99b5a68829f0 R15: ffff99b546edc000
> >>> [ 9569.572052] FS:  0000000000000000(0000) GS:ffff99b5bbe00000(0000)
> >>> knlGS:0000000000000000
> >>> [ 9569.573926] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>> [ 9569.575281] CR2: 0000000000000000 CR3: 0000000076c36002 CR4: 00000000001706f0
> >>> [ 9569.577586] Call Trace:
> >>> [ 9569.578220]  <TASK>
> >>> [ 9569.578770]  ? nfsd4_proc_compound+0x3d2/0x730 [nfsd]
> >>> [ 9569.579945]  nfsd4_proc_compound+0x3d2/0x730 [nfsd]
> >>> [ 9569.581055]  nfsd_dispatch+0x146/0x270 [nfsd]
> >>> [ 9569.581987]  svc_process_common+0x365/0x5c0 [sunrpc]
> >>> [ 9569.583122]  ? nfsd_svc+0x350/0x350 [nfsd]
> >>> [ 9569.583986]  ? nfsd_shutdown_threads+0x90/0x90 [nfsd]
> >>> [ 9569.585129]  svc_process+0xb7/0xf0 [sunrpc]
> >>> [ 9569.586169]  nfsd+0xd5/0x190 [nfsd]
> >>> [ 9569.587170]  kthread+0xe8/0x110
> >>> [ 9569.587898]  ? kthread_complete_and_exit+0x20/0x20
> >>> [ 9569.588934]  ret_from_fork+0x22/0x30
> >>> [ 9569.589759]  </TASK>
> >>> [ 9569.590224] Modules linked in: rdma_ucm ib_uverbs rpcrdma rdma_cm
> >>> iw_cm ib_cm ib_core nfsd nfs_acl lockd grace ext4 mbcache jbd2 fuse
> >>> xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipt_REJECT
> >>> nf_reject_ipv4 nft_compat nf_tables nfnetlink tun bridge stp llc bnep
> >>> vmw_vsock_vmci_transport vsock snd_seq_midi snd_seq_midi_event
> >>> intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul
> >>> vmw_balloon ghash_clmulni_intel joydev pcspkr btusb btrtl btbcm
> >>> btintel snd_ens1371 uvcvideo snd_ac97_codec videobuf2_vmalloc ac97_bus
> >>> videobuf2_memops videobuf2_v4l2 videobuf2_common snd_seq snd_pcm
> >>> videodev bluetooth mc rfkill ecdh_generic ecc snd_timer snd_rawmidi
> >>> snd_seq_device snd vmw_vmci soundcore i2c_piix4 auth_rpcgss sunrpc
> >>> ip_tables xfs libcrc32c sr_mod cdrom sg ata_generic crc32c_intel
> >>> ata_piix nvme ahci libahci nvme_core t10_pi crc64_rocksoft serio_raw
> >>> crc64 vmwgfx drm_ttm_helper ttm drm_kms_helper syscopyarea sysfillrect
> >>> sysimgblt fb_sys_fops vmxnet3 drm libata
> >>> [ 9569.610612] CR2: 0000000000000000
> >>> [ 9569.611375] ---[ end trace 0000000000000000 ]---
> >>> [ 9569.612424] RIP: 0010:nfsd4_copy+0x28b/0x4e0 [nfsd]
> >>> [ 9569.613472] Code: 24 38 49 89 94 24 10 01 00 00 49 8b 56 08 48 8d
> >>> 79 08 49 89 94 24 18 01 00 00 49 8b 56 10 48 83 e7 f8 49 89 94 24 20
> >>> 01 00 00 <48> 8b 06 48 89 01 48 8b 86 04 04 00 00 48 89 81 04 04 00 00
> >>> 48 29
> >>> [ 9569.617410] RSP: 0018:ffffb092c0c97dd0 EFLAGS: 00010282
> >>> [ 9569.618487] RAX: ffff99b5465c2460 RBX: ffff99b5a68828e0 RCX: ffff99b5853b6000
> >>> [ 9569.620097] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff99b5853b6008
> >>> [ 9569.621710] RBP: ffffb092c0c97e10 R08: ffffffffc0bf3c24 R09: 0000000000000228
> >>> [ 9569.623398] R10: ffff99b54b0e9268 R11: ffff99b564326998 R12: ffff99b5543dfc00
> >>> [ 9569.625019] R13: ffff99b5a6882950 R14: ffff99b5a68829f0 R15: ffff99b546edc000
> >>> [ 9569.627456] FS:  0000000000000000(0000) GS:ffff99b5bbe00000(0000)
> >>> knlGS:0000000000000000
> >>> [ 9569.629249] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>> [ 9569.630433] CR2: 0000000000000000 CR3: 0000000076c36002 CR4: 00000000001706f0
> >>> [ 9569.632043] Kernel panic - not syncing: Fatal exception
> >>>
> >>>
> >>>
> >>> On Tue, Jul 26, 2022 at 3:45 PM Olga Kornievskaia <aglo@umich.edu> wrote:
> >>>>
> >>>> Chuck,
> >>>>
> > >>>> Are there pre-reqs for this series? I tried to apply the patches
> > >>>> on top of 5.19-rc6 but I get the following compile error:
> >>>>
> >>>> fs/nfsd/nfs4proc.c: In function ‘nfsd4_setup_inter_ssc’:
> >>>> fs/nfsd/nfs4proc.c:1539:34: error: passing argument 1 of
> >>>> ‘nfsd4_interssc_connect’ from incompatible pointer type
> >>>> [-Werror=incompatible-pointer-types]
> >>>> status = nfsd4_interssc_connect(&copy->cp_src, rqstp, mount);
> >>>>                                 ^~~~~~~~~~~~~
> >>>> fs/nfsd/nfs4proc.c:1414:43: note: expected ‘struct nl4_server *’ but
> >>>> argument is of type ‘struct nl4_server **’
> >>>> nfsd4_interssc_connect(struct nl4_server *nss, struct svc_rqst *rqstp,
> >>>>                       ~~~~~~~~~~~~~~~~~~~^~~
> >>>> cc1: some warnings being treated as errors
> >>>> make[2]: *** [scripts/Makefile.build:249: fs/nfsd/nfs4proc.o] Error 1
> >>>> make[1]: *** [scripts/Makefile.build:466: fs/nfsd] Error 2
> >>>> make: *** [Makefile:1843: fs] Error 2
> >>>>
> >>>> On Fri, Jul 22, 2022 at 4:36 PM Chuck Lever <chuck.lever@oracle.com> wrote:
> >>>>>
> >>>>> While testing NFSD for-next, I noticed svc_generic_init_request()
> >>>>> was an unexpected hot spot on NFSv4 workloads. Drilling into the
> >>>>> perf report, it shows that the hot path in there is:
> >>>>>
> >>>>> 1208         memset(rqstp->rq_argp, 0, procp->pc_argsize);
> >>>>> 1209         memset(rqstp->rq_resp, 0, procp->pc_ressize);
> >>>>>
> >>>>> For an NFSv4 COMPOUND,
> >>>>>
> >>>>>       procp->pc_argsize = sizeof(nfsd4_compoundargs),
> >>>>>
> >>>>> struct nfsd4_compoundargs on my system is more than 17KB! This is
> >>>>> due to the size of the iops field:
> >>>>>
> >>>>>       struct nfsd4_op                 iops[8];
> >>>>>
> >>>>> Each struct nfsd4_op contains a union of the arguments for each
> >>>>> NFSv4 operation. Each argument is typically less than 128 bytes
> >>>>> except that struct nfsd4_copy and struct nfsd4_copy_notify are both
> >>>>> larger than 2KB each.
> >>>>>
> >>>>> I'm not yet totally convinced this series never orphans memory, but
> >>>>> it does reduce the size of nfsd4_compoundargs to just over 4KB. This
> >>>>> is still due to struct nfsd4_copy being almost 500 bytes. I don't
> >>>>> see more low-hanging fruit there, though.
> >>>>>
> >>>>> ---
> >>>>>
> >>>>> Chuck Lever (11):
> >>>>>     NFSD: Shrink size of struct nfsd4_copy_notify
> >>>>>     NFSD: Shrink size of struct nfsd4_copy
> >>>>>     NFSD: Reorder the fields in struct nfsd4_op
> >>>>>     NFSD: Make nfs4_put_copy() static
> >>>>>     NFSD: Make boolean fields in struct nfsd4_copy into atomic bit flags
> >>>>>     NFSD: Refactor nfsd4_cleanup_inter_ssc() (1/2)
> >>>>>     NFSD: Refactor nfsd4_cleanup_inter_ssc() (2/2)
> >>>>>     NFSD: Refactor nfsd4_do_copy()
> >>>>>     NFSD: Remove kmalloc from nfsd4_do_async_copy()
> >>>>>     NFSD: Add nfsd4_send_cb_offload()
> >>>>>     NFSD: Move copy offload callback arguments into a separate structure
> >>>>>
> >>>>>
> >>>>> fs/nfsd/nfs4callback.c |  37 +++++----
> >>>>> fs/nfsd/nfs4proc.c     | 165 +++++++++++++++++++++--------------------
> >>>>> fs/nfsd/nfs4xdr.c      |  30 +++++---
> >>>>> fs/nfsd/state.h        |   1 -
> >>>>> fs/nfsd/xdr4.h         |  54 ++++++++++----
> >>>>> 5 files changed, 163 insertions(+), 124 deletions(-)
> >>>>>
> >>>>> --
> >>>>> Chuck Lever
> >>>>>
> >>
> >> --
> >> Chuck Lever
> >>
> >>
> >>
>
> --
> Chuck Lever
>
>
>
Olga Kornievskaia July 27, 2022, 6:48 p.m. UTC | #8
On Wed, Jul 27, 2022 at 2:21 PM Olga Kornievskaia <aglo@umich.edu> wrote:
>
> On Wed, Jul 27, 2022 at 2:04 PM Chuck Lever III <chuck.lever@oracle.com> wrote:
> >
> >
> >
> > > On Jul 27, 2022, at 1:52 PM, Olga Kornievskaia <aglo@umich.edu> wrote:
> > >
> > > > After applying Dai's patch I got further... I hit the next panic
> > > > (below)... Before that, the "inter01" test failed with ECOMM. In the
> > > > trace, after the COPY is placed, the server returns ESTALE in
> > > > CB_OFFLOAD, then the close fails with BAD_SESSION (basically,
> > > > something really wrong happened on the server)... A few more tests
> > > > then failed in a similar fashion. The oops happens on cleanup.
> >
> > What test should I run to reproduce this?
>
> I'm running "./nfstest_ssc". It ran through all the tests, with "inter15"
> being last, then started "cleanup" and that's what panicked the server.
>
> It's been a while since I tested ssc... so I'll undo all the patches
> and re-run the tests to make sure that the code worked before.

It looks like the code got broken before this patch set. The ESTALE in
CB_OFFLOAD leading to the ECOMM error happens without your patches,
followed by the kernel panic. I'll do my best to git bisect to where
the problem first occurred.

Dai Ngo July 27, 2022, 7:23 p.m. UTC | #9
On 7/27/22 11:48 AM, Olga Kornievskaia wrote:
> On Wed, Jul 27, 2022 at 2:21 PM Olga Kornievskaia <aglo@umich.edu> wrote:
>> On Wed, Jul 27, 2022 at 2:04 PM Chuck Lever III <chuck.lever@oracle.com> wrote:
>>>
>>>
>>>> On Jul 27, 2022, at 1:52 PM, Olga Kornievskaia <aglo@umich.edu> wrote:
>>>>
>>>> After applying Dai's patch I got further... I hit the next panic
>>>> (below)... Before that, the "inter01" test failed with ECOMM. In the
>>>> trace, after the COPY is placed, the server returns ESTALE in
>>>> CB_OFFLOAD, then the close fails with BAD_SESSION (basically,
>>>> something really wrong happened on the server)... A few more tests
>>>> then failed in a similar fashion. The oops happens on cleanup.
>>> What test should I run to reproduce this?
>> I'm running "./nfstest_ssc". It ran through all the tests, with "inter15"
>> being last, then started "cleanup" and that's what panicked the server.
>>
>> It's been a while since I tested ssc... so I'll undo all the patches
>> and re-run the tests to make sure that the code worked before.
> It looks like the code got broken before this patch set. The ESTALE in
> CB_OFFLOAD leading to the ECOMM error happens without your patches,
> followed by the kernel panic. I'll do my best to git bisect to where
> the problem first occurred.

I think this is what led to the list_del corruption problem:

Jul 27 12:14:23 nfsvmd07 kernel: ==================================================================
Jul 27 12:14:23 nfsvmd07 kernel: BUG: KASAN: use-after-free in __list_del_entry_valid+0x16e/0x180
Jul 27 12:14:23 nfsvmd07 kernel: Read of size 8 at addr ffff8881189c8230 by task kworker/u2:1/23
Jul 27 12:14:23 nfsvmd07 kernel:
Jul 27 12:14:23 nfsvmd07 kernel: CPU: 0 PID: 23 Comm: kworker/u2:1 Not tainted 5.19.0-rc7+ #1
Jul 27 12:14:23 nfsvmd07 kernel: Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
Jul 27 12:14:23 nfsvmd07 kernel: Workqueue: nfsd4 laundromat_main [nfsd]
Jul 27 12:14:23 nfsvmd07 kernel: Call Trace:
Jul 27 12:14:23 nfsvmd07 kernel: <TASK>
Jul 27 12:14:23 nfsvmd07 kernel: dump_stack_lvl+0x57/0x7d
Jul 27 12:14:23 nfsvmd07 kernel: print_report.cold+0xf8/0x654
Jul 27 12:14:23 nfsvmd07 kernel: ? __list_del_entry_valid+0x16e/0x180
Jul 27 12:14:23 nfsvmd07 kernel: kasan_report+0x8a/0x190
Jul 27 12:14:23 nfsvmd07 kernel: ? pm_suspend.cold+0x4e2/0x4e2
Jul 27 12:14:23 nfsvmd07 kernel: ? __list_del_entry_valid+0x16e/0x180
Jul 27 12:14:23 nfsvmd07 kernel: __list_del_entry_valid+0x16e/0x180
Jul 27 12:14:23 nfsvmd07 kernel: __list_del_entry+0xa/0xb0 [nfsd]
Jul 27 12:14:23 nfsvmd07 kernel: _free_cpntf_state_locked+0x75/0x170 [nfsd]
Jul 27 12:14:23 nfsvmd07 kernel: laundromat_main.cold+0x23/0x28 [nfsd]
Jul 27 12:14:23 nfsvmd07 kernel: ? release_lock_stateid+0x70/0x70 [nfsd]
Jul 27 12:14:23 nfsvmd07 kernel: ? rcu_read_lock_sched_held+0x81/0xb0
Jul 27 12:14:23 nfsvmd07 kernel: ? rcu_read_lock_bh_held+0x90/0x90
Jul 27 12:14:23 nfsvmd07 kernel: process_one_work+0x7cc/0x1350
Jul 27 12:14:23 nfsvmd07 kernel: ? lockdep_hardirqs_on_prepare+0x410/0x410
Jul 27 12:14:23 nfsvmd07 kernel: ? queue_delayed_work_on+0x90/0x90
Jul 27 12:14:23 nfsvmd07 kernel: ? rwlock_bug.part.0+0x90/0x90
Jul 27 12:14:23 nfsvmd07 kernel: worker_thread+0x55d/0xe80
Jul 27 12:14:23 nfsvmd07 kernel: ? process_one_work+0x1350/0x1350
Jul 27 12:14:23 nfsvmd07 kernel: kthread+0x29e/0x340
Jul 27 12:14:23 nfsvmd07 kernel: ? kthread_complete_and_exit+0x20/0x20
Jul 27 12:14:23 nfsvmd07 kernel: ret_from_fork+0x1f/0x30
Jul 27 12:14:23 nfsvmd07 kernel: </TASK>
Jul 27 12:14:23 nfsvmd07 kernel:
Jul 27 12:14:23 nfsvmd07 kernel: Allocated by task 4051:
Jul 27 12:14:23 nfsvmd07 kernel: kasan_save_stack+0x1e/0x40
Jul 27 12:14:23 nfsvmd07 kernel: __kasan_slab_alloc+0x64/0x80
Jul 27 12:14:23 nfsvmd07 kernel: kmem_cache_alloc+0xeb/0x2c0
Jul 27 12:14:23 nfsvmd07 kernel: nfs4_alloc_stid+0x29/0x430 [nfsd]
Jul 27 12:14:23 nfsvmd07 kernel: nfsd4_lock+0x1e9e/0x3cb0 [nfsd]
Jul 27 12:14:23 nfsvmd07 kernel: nfsd4_proc_compound+0xd75/0x26c0 [nfsd]
Jul 27 12:14:23 nfsvmd07 kernel: nfsd_dispatch+0x4e8/0xc00 [nfsd]
Jul 27 12:14:23 nfsvmd07 kernel: svc_process_common+0xb51/0x1af0 [sunrpc]
Jul 27 12:14:23 nfsvmd07 kernel: svc_process+0x361/0x4f0 [sunrpc]
Jul 27 12:14:23 nfsvmd07 kernel: nfsd+0x2d6/0x570 [nfsd]
Jul 27 12:14:23 nfsvmd07 kernel: kthread+0x29e/0x340
Jul 27 12:14:23 nfsvmd07 kernel: ret_from_fork+0x1f/0x30
Jul 27 12:14:23 nfsvmd07 kernel:
Jul 27 12:14:23 nfsvmd07 kernel: Freed by task 4051:
Jul 27 12:14:23 nfsvmd07 kernel: kasan_save_stack+0x1e/0x40
Jul 27 12:14:23 nfsvmd07 kernel: kasan_set_track+0x21/0x30
Jul 27 12:14:23 nfsvmd07 kernel: kasan_set_free_info+0x20/0x30
Jul 27 12:14:23 nfsvmd07 kernel: __kasan_slab_free+0xf0/0x160
Jul 27 12:14:23 nfsvmd07 kernel: kmem_cache_free.part.0+0x7f/0x1c0
Jul 27 12:14:23 nfsvmd07 kernel: free_ol_stateid_reaplist+0x12b/0x200 [nfsd]
Jul 27 12:14:23 nfsvmd07 kernel: nfsd4_close+0x58e/0xe10 [nfsd]
Jul 27 12:14:23 nfsvmd07 kernel: nfsd4_proc_compound+0xd75/0x26c0 [nfsd]
Jul 27 12:14:23 nfsvmd07 kernel: nfsd_dispatch+0x4e8/0xc00 [nfsd]
Jul 27 12:14:23 nfsvmd07 kernel: svc_process_common+0xb51/0x1af0 [sunrpc]
Jul 27 12:14:23 nfsvmd07 kernel: svc_process+0x361/0x4f0 [sunrpc]
Jul 27 12:14:23 nfsvmd07 kernel: nfsd+0x2d6/0x570 [nfsd]
Jul 27 12:14:23 nfsvmd07 kernel: kthread+0x29e/0x340
Jul 27 12:14:23 nfsvmd07 kernel: ret_from_fork+0x1f/0x30
Jul 27 12:14:23 nfsvmd07 kernel:
Jul 27 12:14:23 nfsvmd07 kernel: The buggy address belongs to the object at ffff8881189c8228 which belongs to the cache nfsd4_stateids of size 360
Jul 27 12:14:23 nfsvmd07 kernel: The buggy address is located 8 bytes inside of 360-byte region [ffff8881189c8228, ffff8881189c8390)
Jul 27 12:14:23 nfsvmd07 kernel:
Jul 27 12:14:23 nfsvmd07 kernel: The buggy address belongs to the physical page:
Jul 27 12:14:23 nfsvmd07 kernel: page:000000009faa88de refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1189c8
Jul 27 12:14:23 nfsvmd07 kernel: flags: 0x8000000000000200(slab|zone=2)
Jul 27 12:14:23 nfsvmd07 kernel: raw: 8000000000000200 ffff8881008a0950 ffffea000399e380 ffff888108fd9d00
Jul 27 12:14:23 nfsvmd07 kernel: raw: 0000000000000000 ffff8881189c8080 0000000100000009
Jul 27 12:14:23 nfsvmd07 kernel: page dumped because: kasan: bad access detected
Jul 27 12:14:23 nfsvmd07 kernel:
Jul 27 12:14:23 nfsvmd07 kernel: Memory state around the buggy address:
Jul 27 12:14:23 nfsvmd07 kernel: ffff8881189c8100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
Jul 27 12:14:23 nfsvmd07 kernel: ffff8881189c8180: fb fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc
Jul 27 12:14:23 nfsvmd07 kernel: >ffff8881189c8200: fc fc fc fc fc fa fb fb fb fb fb fb fb fb fb fb
Jul 27 12:14:23 nfsvmd07 kernel:                                     ^
Jul 27 12:14:23 nfsvmd07 kernel: ffff8881189c8280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
Jul 27 12:14:23 nfsvmd07 kernel: ffff8881189c8300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
Jul 27 12:14:23 nfsvmd07 kernel: ==================================================================

I think nfs4_free_ol_stateid also needs to remove the
nfs4_cpntf_state from the s2s_cp_stateids list; I am still
validating this.
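
To illustrate the ordering problem, here is a minimal userspace sketch
in C. It is a model, not the nfsd code: struct parent_stid, struct
cpntf_state, and all helper names are hypothetical stand-ins, and the
list helpers only imitate the kernel's struct list_head API. The
assumption being modeled is that a copy-notify state is linked onto a
list head embedded in its parent stateid, so freeing the parent first
leaves the child pointing into freed memory. Unlinking the children
before freeing the parent avoids that.

```c
#include <stdlib.h>

/* Userspace imitation of the kernel's circular doubly linked list. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
	h->next = h->prev = h;
}

static void list_add(struct list_head *n, struct list_head *h)
{
	n->next = h->next;
	n->prev = h;
	h->next->prev = n;
	h->next = n;
}

/* Unlink an entry and leave it pointing at itself. */
static void list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	list_init(e);
}

static int list_empty(const struct list_head *h)
{
	return h->next == h;
}

/* Hypothetical stand-in for the parent stateid that owns the list head. */
struct parent_stid {
	struct list_head cp_list;
};

/* Hypothetical stand-in for a copy-notify state hanging off the parent. */
struct cpntf_state {
	struct list_head cp_list;
};

/* Detach every child before the memory holding the list head goes away. */
static void parent_free(struct parent_stid *p)
{
	while (!list_empty(&p->cp_list))
		list_del_init(p->cp_list.next);
	free(p);
}

/* Later cleanup (the "laundromat" role): only touches the child itself. */
static void cpntf_free(struct cpntf_state *c)
{
	list_del_init(&c->cp_list);
	free(c);
}

/* Returns 1 if the child was fully detached when the parent was freed. */
static int teardown_demo(void)
{
	struct parent_stid *p = malloc(sizeof(*p));
	struct cpntf_state *c = malloc(sizeof(*c));
	int detached;

	if (!p || !c) {
		free(p);
		free(c);
		return 0;
	}
	list_init(&p->cp_list);
	list_init(&c->cp_list);
	list_add(&c->cp_list, &p->cp_list);

	parent_free(p);		/* parent goes first, as in CLOSE */
	detached = list_empty(&c->cp_list);
	cpntf_free(c);		/* no access to the freed parent */
	return detached;
}
```

Calling teardown_demo() from a main() should return 1. If parent_free()
skipped the unlink loop, cpntf_free() would write through c->cp_list.prev
into the freed parent, which is the kind of access the KASAN report
above flags.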

-Dai

>
>>>> [  842.455939] list_del corruption. prev->next should be
>>>> ffff9aaa8b5f0c78, but was ffff9aaab2713508. (prev=ffff9aaab2713510)
>>>> [  842.460118] ------------[ cut here ]------------
>>>> [  842.461599] kernel BUG at lib/list_debug.c:53!
>>>> [  842.462962] invalid opcode: 0000 [#1] PREEMPT SMP PTI
>>>> [  842.464587] CPU: 1 PID: 500 Comm: kworker/u256:28 Not tainted 5.18.0 #70
>>>> [  842.466656] Hardware name: VMware, Inc. VMware Virtual
>>>> Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
>>>> [  842.470309] Workqueue: nfsd4 laundromat_main [nfsd]
>>>> [  842.471898] RIP: 0010:__list_del_entry_valid.cold.3+0x37/0x4a
>>>> [  842.473792] Code: e8 02 d8 fe ff 0f 0b 48 c7 c7 c0 bb b6 b0 e8 f4
>>>> d7 fe ff 0f 0b 48 89 d1 48 89 f2 48 89 fe 48 c7 c7 70 bb b6 b0 e8 dd
>>>> d7 fe ff <0f> 0b 48 89 fe 48 c7 c7 38 bb b6 b0 e8 cc d7 fe ff 0f 0b 48
>>>> 89 ee
>>>> [  842.479607] RSP: 0018:ffffa996c0ca7de8 EFLAGS: 00010246
>>>> [  842.481828] RAX: 000000000000006d RBX: ffff9aaa8b5f0c60 RCX: 0000000000000002
>>>> [  842.484769] RDX: 0000000000000000 RSI: ffffffffb0b64d55 RDI: 00000000ffffffff
>>>> [  842.487252] RBP: ffff9aaab9b62000 R08: 0000000000000000 R09: c0000000ffff7fff
>>>> [  842.489939] R10: 0000000000000001 R11: ffffa996c0ca7c00 R12: ffffa996c0ca7e50
>>>> [  842.492215] R13: ffff9aaab9b621b0 R14: fffffffffffffd12 R15: ffff9aaab9b62198
>>>> [  842.494406] FS:  0000000000000000(0000) GS:ffff9aaafbe40000(0000)
>>>> knlGS:0000000000000000
>>>> [  842.496939] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> [  842.498759] CR2: 000055a8b4e96010 CR3: 0000000003a18001 CR4: 00000000001706e0
>>>> [  842.500957] Call Trace:
>>>> [  842.501740]  <TASK>
>>>> [  842.502479]  _free_cpntf_state_locked+0x36/0x90 [nfsd]
>>>> [  842.504157]  laundromat_main+0x59e/0x8b0 [nfsd]
>>>> [  842.505594]  ? finish_task_switch+0xbd/0x2a0
>>>> [  842.507247]  process_one_work+0x1c8/0x390
>>>> [  842.508538]  worker_thread+0x30/0x360
>>>> [  842.509670]  ? process_one_work+0x390/0x390
>>>> [  842.510957]  kthread+0xe8/0x110
>>>> [  842.511938]  ? kthread_complete_and_exit+0x20/0x20
>>>> [  842.513422]  ret_from_fork+0x22/0x30
>>>> [  842.514533]  </TASK>
>>>> [  842.515219] Modules linked in: rdma_ucm ib_uverbs rpcrdma rdma_cm
>>>> iw_cm ib_cm ib_core nfsd nfs_acl lockd grace ext4 mbcache jbd2 fuse
>>>> xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipt_REJECT
>>>> nf_reject_ipv4 nft_compat nf_tables nfnetlink tun bridge stp llc bnep
>>>> vmw_vsock_vmci_transport vsock intel_rapl_msr snd_seq_midi
>>>> snd_seq_midi_event intel_rapl_common crct10dif_pclmul crc32_pclmul
>>>> vmw_balloon ghash_clmulni_intel pcspkr joydev btusb uvcvideo btrtl
>>>> btbcm btintel videobuf2_vmalloc videobuf2_memops snd_ens1371
>>>> videobuf2_v4l2 snd_ac97_codec ac97_bus videobuf2_common snd_seq
>>>> videodev snd_pcm bluetooth rfkill mc snd_timer snd_rawmidi
>>>> ecdh_generic snd_seq_device ecc snd soundcore vmw_vmci i2c_piix4
>>>> auth_rpcgss sunrpc ip_tables xfs libcrc32c sr_mod cdrom sg ata_generic
>>>> nvme nvme_core t10_pi crc32c_intel crc64_rocksoft serio_raw crc64
>>>> vmwgfx vmxnet3 drm_ttm_helper ata_piix ttm drm_kms_helper syscopyarea
>>>> sysfillrect sysimgblt fb_sys_fops ahci libahci drm libata
>>>> [  842.541753] ---[ end trace 0000000000000000 ]---
>>>> [  842.543403] RIP: 0010:__list_del_entry_valid.cold.3+0x37/0x4a
>>>> [  842.545170] Code: e8 02 d8 fe ff 0f 0b 48 c7 c7 c0 bb b6 b0 e8 f4
>>>> d7 fe ff 0f 0b 48 89 d1 48 89 f2 48 89 fe 48 c7 c7 70 bb b6 b0 e8 dd
>>>> d7 fe ff <0f> 0b 48 89 fe 48 c7 c7 38 bb b6 b0 e8 cc d7 fe ff 0f 0b 48
>>>> 89 ee
>>>> [  842.551346] RSP: 0018:ffffa996c0ca7de8 EFLAGS: 00010246
>>>> [  842.552999] RAX: 000000000000006d RBX: ffff9aaa8b5f0c60 RCX: 0000000000000002
>>>> [  842.555151] RDX: 0000000000000000 RSI: ffffffffb0b64d55 RDI: 00000000ffffffff
>>>> [  842.557503] RBP: ffff9aaab9b62000 R08: 0000000000000000 R09: c0000000ffff7fff
>>>> [  842.559694] R10: 0000000000000001 R11: ffffa996c0ca7c00 R12: ffffa996c0ca7e50
>>>> [  842.561956] R13: ffff9aaab9b621b0 R14: fffffffffffffd12 R15: ffff9aaab9b62198
>>>> [  842.564300] FS:  0000000000000000(0000) GS:ffff9aaafbe40000(0000)
>>>> knlGS:0000000000000000
>>>> [  842.567357] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> [  842.569273] CR2: 000055a8b4e96010 CR3: 0000000003a18001 CR4: 00000000001706e0
>>>> [  842.571598] Kernel panic - not syncing: Fatal exception
>>>> [  842.573674] Kernel Offset: 0x2e800000 from 0xffffffff81000000
>>>> (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
>>>> [ 1101.134589] ---[ end Kernel panic - not syncing: Fatal exception ]---
>>>>
>>>> On Wed, Jul 27, 2022 at 1:15 PM Chuck Lever III <chuck.lever@oracle.com> wrote:
>>>>>
>>>>>
>>>>>> On Jul 27, 2022, at 12:18 PM, Olga Kornievskaia <aglo@umich.edu> wrote:
>>>>>>
>>>>>> Hi Chuck,
>>>>> Sorry for the delay, I was traveling.
>>>>>
>>>>>> To make it compile I did:
>>>>>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
>>>>>> index 7196bcafdd86..f6deffc921d0 100644
>>>>>> --- a/fs/nfsd/nfs4proc.c
>>>>>> +++ b/fs/nfsd/nfs4proc.c
>>>>>> @@ -1536,7 +1536,7 @@ nfsd4_setup_inter_ssc(struct svc_rqst *rqstp,
>>>>>>        if (status)
>>>>>>                goto out;
>>>>>>
>>>>>> -       status = nfsd4_interssc_connect(&copy->cp_src, rqstp, mount);
>>>>>> +       status = nfsd4_interssc_connect(copy->cp_src, rqstp, mount);
>>>>>>        if (status)
>>>>>>                goto out;
>>>>> Yes, same bug was reported by the day-0 kbot. v1 was kind of an RFC,
>>>>> as I hadn't fully tested it. Sorry for mislabeling it.
>>>>>
>>>>> I will post a v2 of this series with this fixed and with Dai's
>>>>> fix for nfsd4_decode_copy(). Stand by.
>>>>>
>>>>>
>>>>>> But when I tried to run nfstest_ssc, the first test (intra01) made
>>>>>> the server oops:
>>>>>>
>>>>>> [ 9569.551100] CPU: 0 PID: 2861 Comm: nfsd Not tainted 5.19.0-rc6+ #73
>>>>>> [ 9569.552385] Hardware name: VMware, Inc. VMware Virtual
>>>>>> Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
>>>>>> [ 9569.555043] RIP: 0010:nfsd4_copy+0x28b/0x4e0 [nfsd]
>>>>>> [ 9569.556662] Code: 24 38 49 89 94 24 10 01 00 00 49 8b 56 08 48 8d
>>>>>> 79 08 49 89 94 24 18 01 00 00 49 8b 56 10 48 83 e7 f8 49 89 94 24 20
>>>>>> 01 00 00 <48> 8b 06 48 89 01 48 8b 86 04 04 00 00 48 89 81 04 04 00 00
>>>>>> 48 29
>>>>>> [ 9569.561792] RSP: 0018:ffffb092c0c97dd0 EFLAGS: 00010282
>>>>>> [ 9569.563112] RAX: ffff99b5465c2460 RBX: ffff99b5a68828e0 RCX: ffff99b5853b6000
>>>>>> [ 9569.565196] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff99b5853b6008
>>>>>> [ 9569.567140] RBP: ffffb092c0c97e10 R08: ffffffffc0bf3c24 R09: 0000000000000228
>>>>>> [ 9569.568929] R10: ffff99b54b0e9268 R11: ffff99b564326998 R12: ffff99b5543dfc00
>>>>>> [ 9569.570477] R13: ffff99b5a6882950 R14: ffff99b5a68829f0 R15: ffff99b546edc000
>>>>>> [ 9569.572052] FS:  0000000000000000(0000) GS:ffff99b5bbe00000(0000)
>>>>>> knlGS:0000000000000000
>>>>>> [ 9569.573926] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>> [ 9569.575281] CR2: 0000000000000000 CR3: 0000000076c36002 CR4: 00000000001706f0
>>>>>> [ 9569.577586] Call Trace:
>>>>>> [ 9569.578220]  <TASK>
>>>>>> [ 9569.578770]  ? nfsd4_proc_compound+0x3d2/0x730 [nfsd]
>>>>>> [ 9569.579945]  nfsd4_proc_compound+0x3d2/0x730 [nfsd]
>>>>>> [ 9569.581055]  nfsd_dispatch+0x146/0x270 [nfsd]
>>>>>> [ 9569.581987]  svc_process_common+0x365/0x5c0 [sunrpc]
>>>>>> [ 9569.583122]  ? nfsd_svc+0x350/0x350 [nfsd]
>>>>>> [ 9569.583986]  ? nfsd_shutdown_threads+0x90/0x90 [nfsd]
>>>>>> [ 9569.585129]  svc_process+0xb7/0xf0 [sunrpc]
>>>>>> [ 9569.586169]  nfsd+0xd5/0x190 [nfsd]
>>>>>> [ 9569.587170]  kthread+0xe8/0x110
>>>>>> [ 9569.587898]  ? kthread_complete_and_exit+0x20/0x20
>>>>>> [ 9569.588934]  ret_from_fork+0x22/0x30
>>>>>> [ 9569.589759]  </TASK>
>>>>>> [ 9569.590224] Modules linked in: rdma_ucm ib_uverbs rpcrdma rdma_cm
>>>>>> iw_cm ib_cm ib_core nfsd nfs_acl lockd grace ext4 mbcache jbd2 fuse
>>>>>> xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipt_REJECT
>>>>>> nf_reject_ipv4 nft_compat nf_tables nfnetlink tun bridge stp llc bnep
>>>>>> vmw_vsock_vmci_transport vsock snd_seq_midi snd_seq_midi_event
>>>>>> intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul
>>>>>> vmw_balloon ghash_clmulni_intel joydev pcspkr btusb btrtl btbcm
>>>>>> btintel snd_ens1371 uvcvideo snd_ac97_codec videobuf2_vmalloc ac97_bus
>>>>>> videobuf2_memops videobuf2_v4l2 videobuf2_common snd_seq snd_pcm
>>>>>> videodev bluetooth mc rfkill ecdh_generic ecc snd_timer snd_rawmidi
>>>>>> snd_seq_device snd vmw_vmci soundcore i2c_piix4 auth_rpcgss sunrpc
>>>>>> ip_tables xfs libcrc32c sr_mod cdrom sg ata_generic crc32c_intel
>>>>>> ata_piix nvme ahci libahci nvme_core t10_pi crc64_rocksoft serio_raw
>>>>>> crc64 vmwgfx drm_ttm_helper ttm drm_kms_helper syscopyarea sysfillrect
>>>>>> sysimgblt fb_sys_fops vmxnet3 drm libata
>>>>>> [ 9569.610612] CR2: 0000000000000000
>>>>>> [ 9569.611375] ---[ end trace 0000000000000000 ]---
>>>>>> [ 9569.612424] RIP: 0010:nfsd4_copy+0x28b/0x4e0 [nfsd]
>>>>>> [ 9569.613472] Code: 24 38 49 89 94 24 10 01 00 00 49 8b 56 08 48 8d
>>>>>> 79 08 49 89 94 24 18 01 00 00 49 8b 56 10 48 83 e7 f8 49 89 94 24 20
>>>>>> 01 00 00 <48> 8b 06 48 89 01 48 8b 86 04 04 00 00 48 89 81 04 04 00 00
>>>>>> 48 29
>>>>>> [ 9569.617410] RSP: 0018:ffffb092c0c97dd0 EFLAGS: 00010282
>>>>>> [ 9569.618487] RAX: ffff99b5465c2460 RBX: ffff99b5a68828e0 RCX: ffff99b5853b6000
>>>>>> [ 9569.620097] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff99b5853b6008
>>>>>> [ 9569.621710] RBP: ffffb092c0c97e10 R08: ffffffffc0bf3c24 R09: 0000000000000228
>>>>>> [ 9569.623398] R10: ffff99b54b0e9268 R11: ffff99b564326998 R12: ffff99b5543dfc00
>>>>>> [ 9569.625019] R13: ffff99b5a6882950 R14: ffff99b5a68829f0 R15: ffff99b546edc000
>>>>>> [ 9569.627456] FS:  0000000000000000(0000) GS:ffff99b5bbe00000(0000)
>>>>>> knlGS:0000000000000000
>>>>>> [ 9569.629249] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>> [ 9569.630433] CR2: 0000000000000000 CR3: 0000000076c36002 CR4: 00000000001706f0
>>>>>> [ 9569.632043] Kernel panic - not syncing: Fatal exception
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Jul 26, 2022 at 3:45 PM Olga Kornievskaia <aglo@umich.edu> wrote:
>>>>>>> Chuck,
>>>>>>>
>>>>>>> Are there pre-reqs for this series? I tried to apply the patches
>>>>>>> on top of 5.19-rc6 but I get the following compile error:
>>>>>>>
>>>>>>> fs/nfsd/nfs4proc.c: In function ‘nfsd4_setup_inter_ssc’:
>>>>>>> fs/nfsd/nfs4proc.c:1539:34: error: passing argument 1 of
>>>>>>> ‘nfsd4_interssc_connect’ from incompatible pointer type
>>>>>>> [-Werror=incompatible-pointer-types]
>>>>>>> status = nfsd4_interssc_connect(&copy->cp_src, rqstp, mount);
>>>>>>>                                  ^~~~~~~~~~~~~
>>>>>>> fs/nfsd/nfs4proc.c:1414:43: note: expected ‘struct nl4_server *’ but
>>>>>>> argument is of type ‘struct nl4_server **’
>>>>>>> nfsd4_interssc_connect(struct nl4_server *nss, struct svc_rqst *rqstp,
>>>>>>>                        ~~~~~~~~~~~~~~~~~~~^~~
>>>>>>> cc1: some warnings being treated as errors
>>>>>>> make[2]: *** [scripts/Makefile.build:249: fs/nfsd/nfs4proc.o] Error 1
>>>>>>> make[1]: *** [scripts/Makefile.build:466: fs/nfsd] Error 2
>>>>>>> make: *** [Makefile:1843: fs] Error 2
>>>>>>>
>>>>>>> On Fri, Jul 22, 2022 at 4:36 PM Chuck Lever <chuck.lever@oracle.com> wrote:
>>>>>>>> While testing NFSD for-next, I noticed svc_generic_init_request()
>>>>>>>> was an unexpected hot spot on NFSv4 workloads. Drilling into the
>>>>>>>> perf report, it shows that the hot path in there is:
>>>>>>>>
>>>>>>>> 1208         memset(rqstp->rq_argp, 0, procp->pc_argsize);
>>>>>>>> 1209         memset(rqstp->rq_resp, 0, procp->pc_ressize);
>>>>>>>>
>>>>>>>> For an NFSv4 COMPOUND,
>>>>>>>>
>>>>>>>>        procp->pc_argsize = sizeof(nfsd4_compoundargs),
>>>>>>>>
>>>>>>>> struct nfsd4_compoundargs on my system is more than 17KB! This is
>>>>>>>> due to the size of the iops field:
>>>>>>>>
>>>>>>>>        struct nfsd4_op                 iops[8];
>>>>>>>>
>>>>>>>> Each struct nfsd4_op contains a union of the arguments for each
>>>>>>>> NFSv4 operation. Each argument is typically less than 128 bytes
>>>>>>>> except that struct nfsd4_copy and struct nfsd4_copy_notify are both
>>>>>>>> larger than 2KB each.
>>>>>>>>
>>>>>>>> I'm not yet totally convinced this series never orphans memory, but
>>>>>>>> it does reduce the size of nfsd4_compoundargs to just over 4KB. This
>>>>>>>> is still due to struct nfsd4_copy being almost 500 bytes. I don't
>>>>>>>> see more low-hanging fruit there, though.
>>>>>>>>
>>>>>>>> ---
>>>>>>>>
>>>>>>>> Chuck Lever (11):
>>>>>>>>      NFSD: Shrink size of struct nfsd4_copy_notify
>>>>>>>>      NFSD: Shrink size of struct nfsd4_copy
>>>>>>>>      NFSD: Reorder the fields in struct nfsd4_op
>>>>>>>>      NFSD: Make nfs4_put_copy() static
>>>>>>>>      NFSD: Make boolean fields in struct nfsd4_copy into atomic bit flags
>>>>>>>>      NFSD: Refactor nfsd4_cleanup_inter_ssc() (1/2)
>>>>>>>>      NFSD: Refactor nfsd4_cleanup_inter_ssc() (2/2)
>>>>>>>>      NFSD: Refactor nfsd4_do_copy()
>>>>>>>>      NFSD: Remove kmalloc from nfsd4_do_async_copy()
>>>>>>>>      NFSD: Add nfsd4_send_cb_offload()
>>>>>>>>      NFSD: Move copy offload callback arguments into a separate structure
>>>>>>>>
>>>>>>>>
>>>>>>>> fs/nfsd/nfs4callback.c |  37 +++++----
>>>>>>>> fs/nfsd/nfs4proc.c     | 165 +++++++++++++++++++++--------------------
>>>>>>>> fs/nfsd/nfs4xdr.c      |  30 +++++---
>>>>>>>> fs/nfsd/state.h        |   1 -
>>>>>>>> fs/nfsd/xdr4.h         |  54 ++++++++++----
>>>>>>>> 5 files changed, 163 insertions(+), 124 deletions(-)
>>>>>>>>
>>>>>>>> --
>>>>>>>> Chuck Lever
>>>>>>>>
>>>>> --
>>>>> Chuck Lever
>>>>>
>>>>>
>>>>>
>>> --
>>> Chuck Lever
>>>
>>>
>>>