Message ID | 20171008225546.16538.99253.stgit@klimt.1015granger.net (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
> On Oct 8, 2017, at 7:13 PM, Chuck Lever <chuck.lever@oracle.com> wrote: > > During each NFSv4 callback Call, an RDMA Send completion frees the > page that contains the RPC Call message. If the upper layer determines > that a retransmit is necessary, this is too soon. > > One possible symptom: after a GARBAGE_ARGS response an NFSv4.1 > callback request, the following BUG fires on the NFS server: > > kernel: BUG: Bad page state in process kworker/0:2H pfn:7d3ce2 > kernel: page:ffffea001f4f3880 count:-2 mapcount:0 mapping: (null) index:0x0 > kernel: flags: 0x2fffff80000000() > kernel: raw: 002fffff80000000 0000000000000000 0000000000000000 fffffffeffffffff > kernel: raw: dead000000000100 dead000000000200 0000000000000000 0000000000000000 > kernel: page dumped because: nonzero _refcount > kernel: Modules linked in: cts rpcsec_gss_krb5 ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm > ocfs2_nodemanager ocfs2_stackglue rpcrdm a ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad > rdma_cm ib_cm iw_cm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel > kvm irqbypass crct10dif_pc lmul crc32_pclmul ghash_clmulni_intel pcbc iTCO_wdt > iTCO_vendor_support aesni_intel crypto_simd glue_helper cryptd pcspkr lpc_ich i2c_i801 > mei_me mf d_core mei raid0 sg wmi ioatdma ipmi_si ipmi_devintf ipmi_msghandler shpchp > acpi_power_meter acpi_pad nfsd nfs_acl lockd auth_rpcgss grace sunrpc ip_tables xfs > libcrc32c mlx4_en mlx4_ib mlx5_ib ib_core sd_mod sr_mod cdrom ast drm_kms_helper > syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ahci crc32c_intel libahci drm > mlx5_core igb libata mlx4_core dca i2c_algo_bit i2c_core nvme > kernel: ptp nvme_core pps_core dm_mirror dm_region_hash dm_log dm_mod dax > kernel: CPU: 0 PID: 11495 Comm: kworker/0:2H Not tainted 4.14.0-rc3-00001-g577ce48 #811 > kernel: Hardware name: Supermicro Super Server/X10SRL-F, BIOS 1.0c 09/09/2015 > kernel: Workqueue: ib-comp-wq ib_cq_poll_work [ib_core] > kernel: Call Trace: > kernel: dump_stack+0x62/0x80 > kernel: bad_page+0xfe/0x11a > kernel: free_pages_check_bad+0x76/0x78 > kernel: free_pcppages_bulk+0x364/0x441 > kernel: ? ttwu_do_activate.isra.61+0x71/0x78 > kernel: free_hot_cold_page+0x1c5/0x202 > kernel: __put_page+0x2c/0x36 > kernel: svc_rdma_put_context+0xd9/0xe4 [rpcrdma] > kernel: svc_rdma_wc_send+0x50/0x98 [rpcrdma] > > This issue exists all the way back to v4.5, but refactoring and code > re-organization prevents this simple patch from applying to kernels > older than v4.12. The fix is the same, however, if someone needs to > backport it. > > Reported-by: Ben Coddington <bcodding@redhat.com> > Fixes: 5d252f90a800 ('svcrdma: Add class for RDMA backwards ... ') > Cc: stable@vger.kernel.org # v4.12 > Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Ping. Reviewed-by gratefully accepted! It would be good if this patch can get into 4.14. > --- > net/sunrpc/xprtrdma/svc_rdma_backchannel.c | 6 +++++- > 1 file changed, 5 insertions(+), 1 deletion(-) > > Hi Ben, Jeff- > > I was able to reproduce the problem here with a rigged Linux client. > This is my proposed fix. Fix also tested with a prototype Solaris > client similar to the one that exposed the problem. > > > diff --git a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c > index ec37ad8..1854db2 100644 > --- a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c > +++ b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c > @@ -132,6 +132,10 @@ static int svc_rdma_bc_sendto(struct svcxprt_rdma *rdma, > if (ret) > goto out_err; > > + /* Bump page refcnt so Send completion doesn't release > + * the rq_buffer before all retransmits are complete. > + */ > + get_page(virt_to_page(rqst->rq_buffer)); > ret = svc_rdma_post_send_wr(rdma, ctxt, 1, 0); > if (ret) > goto out_unmap; > @@ -164,7 +168,6 @@ static int svc_rdma_bc_sendto(struct svcxprt_rdma *rdma, > return -EINVAL; > } > > - /* svc_rdma_sendto releases this page */ > page = alloc_page(RPCRDMA_DEF_GFP); > if (!page) > return -ENOMEM; > @@ -183,6 +186,7 @@ static int svc_rdma_bc_sendto(struct svcxprt_rdma *rdma, > { > struct rpc_rqst *rqst = task->tk_rqstp; > > + put_page(virt_to_page(rqst->rq_buffer)); > kfree(rqst->rq_rbuffer); > } > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Chuck Lever -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sun, 2017-10-08 at 19:13 -0400, Chuck Lever wrote: > During each NFSv4 callback Call, an RDMA Send completion frees the > page that contains the RPC Call message. If the upper layer determines > that a retransmit is necessary, this is too soon. > > One possible symptom: after a GARBAGE_ARGS response an NFSv4.1 > callback request, the following BUG fires on the NFS server: > > kernel: BUG: Bad page state in process kworker/0:2H pfn:7d3ce2 > kernel: page:ffffea001f4f3880 count:-2 mapcount:0 mapping: (null) index:0x0 > kernel: flags: 0x2fffff80000000() > kernel: raw: 002fffff80000000 0000000000000000 0000000000000000 fffffffeffffffff > kernel: raw: dead000000000100 dead000000000200 0000000000000000 0000000000000000 > kernel: page dumped because: nonzero _refcount > kernel: Modules linked in: cts rpcsec_gss_krb5 ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm > ocfs2_nodemanager ocfs2_stackglue rpcrdm a ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad > rdma_cm ib_cm iw_cm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel > kvm irqbypass crct10dif_pc lmul crc32_pclmul ghash_clmulni_intel pcbc iTCO_wdt > iTCO_vendor_support aesni_intel crypto_simd glue_helper cryptd pcspkr lpc_ich i2c_i801 > mei_me mf d_core mei raid0 sg wmi ioatdma ipmi_si ipmi_devintf ipmi_msghandler shpchp > acpi_power_meter acpi_pad nfsd nfs_acl lockd auth_rpcgss grace sunrpc ip_tables xfs > libcrc32c mlx4_en mlx4_ib mlx5_ib ib_core sd_mod sr_mod cdrom ast drm_kms_helper > syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ahci crc32c_intel libahci drm > mlx5_core igb libata mlx4_core dca i2c_algo_bit i2c_core nvme > kernel: ptp nvme_core pps_core dm_mirror dm_region_hash dm_log dm_mod dax > kernel: CPU: 0 PID: 11495 Comm: kworker/0:2H Not tainted 4.14.0-rc3-00001-g577ce48 #811 > kernel: Hardware name: Supermicro Super Server/X10SRL-F, BIOS 1.0c 09/09/2015 > kernel: Workqueue: ib-comp-wq ib_cq_poll_work [ib_core] > kernel: Call Trace: > kernel: dump_stack+0x62/0x80 > kernel: bad_page+0xfe/0x11a > kernel: free_pages_check_bad+0x76/0x78 > kernel: free_pcppages_bulk+0x364/0x441 > kernel: ? ttwu_do_activate.isra.61+0x71/0x78 > kernel: free_hot_cold_page+0x1c5/0x202 > kernel: __put_page+0x2c/0x36 > kernel: svc_rdma_put_context+0xd9/0xe4 [rpcrdma] > kernel: svc_rdma_wc_send+0x50/0x98 [rpcrdma] > > This issue exists all the way back to v4.5, but refactoring and code > re-organization prevents this simple patch from applying to kernels > older than v4.12. The fix is the same, however, if someone needs to > backport it. > > Reported-by: Ben Coddington <bcodding@redhat.com> > Fixes: 5d252f90a800 ('svcrdma: Add class for RDMA backwards ... ') > Cc: stable@vger.kernel.org # v4.12 > Signed-off-by: Chuck Lever <chuck.lever@oracle.com> > --- > net/sunrpc/xprtrdma/svc_rdma_backchannel.c | 6 +++++- > 1 file changed, 5 insertions(+), 1 deletion(-) > > Hi Ben, Jeff- > > I was able to reproduce the problem here with a rigged Linux client. > This is my proposed fix. Fix also tested with a prototype Solaris > client similar to the one that exposed the problem. > > > diff --git a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c > index ec37ad8..1854db2 100644 > --- a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c > +++ b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c > @@ -132,6 +132,10 @@ static int svc_rdma_bc_sendto(struct svcxprt_rdma *rdma, > if (ret) > goto out_err; > > + /* Bump page refcnt so Send completion doesn't release > + * the rq_buffer before all retransmits are complete. > + */ > + get_page(virt_to_page(rqst->rq_buffer)); Looks fairly reasonable, but is this enough? Could rq_buffer be larger than a page? > ret = svc_rdma_post_send_wr(rdma, ctxt, 1, 0); > if (ret) > goto out_unmap; > @@ -164,7 +168,6 @@ static int svc_rdma_bc_sendto(struct svcxprt_rdma *rdma, > return -EINVAL; > } > > - /* svc_rdma_sendto releases this page */ > page = alloc_page(RPCRDMA_DEF_GFP); > if (!page) > return -ENOMEM; > @@ -183,6 +186,7 @@ static int svc_rdma_bc_sendto(struct svcxprt_rdma *rdma, > { > struct rpc_rqst *rqst = task->tk_rqstp; > > + put_page(virt_to_page(rqst->rq_buffer)); > kfree(rqst->rq_rbuffer); > } > >
> On Oct 14, 2017, at 6:38 AM, Jeff Layton <jlayton@redhat.com> wrote: > > On Sun, 2017-10-08 at 19:13 -0400, Chuck Lever wrote: >> During each NFSv4 callback Call, an RDMA Send completion frees the >> page that contains the RPC Call message. If the upper layer determines >> that a retransmit is necessary, this is too soon. >> >> One possible symptom: after a GARBAGE_ARGS response an NFSv4.1 >> callback request, the following BUG fires on the NFS server: >> >> kernel: BUG: Bad page state in process kworker/0:2H pfn:7d3ce2 >> kernel: page:ffffea001f4f3880 count:-2 mapcount:0 mapping: (null) index:0x0 >> kernel: flags: 0x2fffff80000000() >> kernel: raw: 002fffff80000000 0000000000000000 0000000000000000 fffffffeffffffff >> kernel: raw: dead000000000100 dead000000000200 0000000000000000 0000000000000000 >> kernel: page dumped because: nonzero _refcount >> kernel: Modules linked in: cts rpcsec_gss_krb5 ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm >> ocfs2_nodemanager ocfs2_stackglue rpcrdm a ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad >> rdma_cm ib_cm iw_cm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel >> kvm irqbypass crct10dif_pc lmul crc32_pclmul ghash_clmulni_intel pcbc iTCO_wdt >> iTCO_vendor_support aesni_intel crypto_simd glue_helper cryptd pcspkr lpc_ich i2c_i801 >> mei_me mf d_core mei raid0 sg wmi ioatdma ipmi_si ipmi_devintf ipmi_msghandler shpchp >> acpi_power_meter acpi_pad nfsd nfs_acl lockd auth_rpcgss grace sunrpc ip_tables xfs >> libcrc32c mlx4_en mlx4_ib mlx5_ib ib_core sd_mod sr_mod cdrom ast drm_kms_helper >> syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ahci crc32c_intel libahci drm >> mlx5_core igb libata mlx4_core dca i2c_algo_bit i2c_core nvme >> kernel: ptp nvme_core pps_core dm_mirror dm_region_hash dm_log dm_mod dax >> kernel: CPU: 0 PID: 11495 Comm: kworker/0:2H Not tainted 4.14.0-rc3-00001-g577ce48 #811 >> kernel: Hardware name: Supermicro Super Server/X10SRL-F, BIOS 1.0c 09/09/2015 >> kernel: Workqueue: ib-comp-wq ib_cq_poll_work [ib_core] >> kernel: Call Trace: >> kernel: dump_stack+0x62/0x80 >> kernel: bad_page+0xfe/0x11a >> kernel: free_pages_check_bad+0x76/0x78 >> kernel: free_pcppages_bulk+0x364/0x441 >> kernel: ? ttwu_do_activate.isra.61+0x71/0x78 >> kernel: free_hot_cold_page+0x1c5/0x202 >> kernel: __put_page+0x2c/0x36 >> kernel: svc_rdma_put_context+0xd9/0xe4 [rpcrdma] >> kernel: svc_rdma_wc_send+0x50/0x98 [rpcrdma] >> >> This issue exists all the way back to v4.5, but refactoring and code >> re-organization prevents this simple patch from applying to kernels >> older than v4.12. The fix is the same, however, if someone needs to >> backport it. >> >> Reported-by: Ben Coddington <bcodding@redhat.com> >> Fixes: 5d252f90a800 ('svcrdma: Add class for RDMA backwards ... ') >> Cc: stable@vger.kernel.org # v4.12 >> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> >> --- >> net/sunrpc/xprtrdma/svc_rdma_backchannel.c | 6 +++++- >> 1 file changed, 5 insertions(+), 1 deletion(-) >> >> Hi Ben, Jeff- >> >> I was able to reproduce the problem here with a rigged Linux client. >> This is my proposed fix. Fix also tested with a prototype Solaris >> client similar to the one that exposed the problem. >> >> >> diff --git a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c >> index ec37ad8..1854db2 100644 >> --- a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c >> +++ b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c >> @@ -132,6 +132,10 @@ static int svc_rdma_bc_sendto(struct svcxprt_rdma *rdma, >> if (ret) >> goto out_err; >> >> + /* Bump page refcnt so Send completion doesn't release >> + * the rq_buffer before all retransmits are complete. >> + */ >> + get_page(virt_to_page(rqst->rq_buffer)); > > Looks fairly reasonable, but is this enough? Could rq_buffer be larger > than a page? Not in the current implementation. See xprt_rdma_bc_allocate. Thanks for the review! >> ret = svc_rdma_post_send_wr(rdma, ctxt, 1, 0); >> if (ret) >> goto out_unmap; >> @@ -164,7 +168,6 @@ static int svc_rdma_bc_sendto(struct svcxprt_rdma *rdma, >> return -EINVAL; >> } >> >> - /* svc_rdma_sendto releases this page */ >> page = alloc_page(RPCRDMA_DEF_GFP); >> if (!page) >> return -ENOMEM; >> @@ -183,6 +186,7 @@ static int svc_rdma_bc_sendto(struct svcxprt_rdma *rdma, >> { >> struct rpc_rqst *rqst = task->tk_rqstp; >> >> + put_page(virt_to_page(rqst->rq_buffer)); >> kfree(rqst->rq_rbuffer); >> } >> >> > > -- > Jeff Layton <jlayton@redhat.com> > -- > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Chuck Lever -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat, 2017-10-14 at 09:29 -0400, Chuck Lever wrote: > > On Oct 14, 2017, at 6:38 AM, Jeff Layton <jlayton@redhat.com> wrote: > > > > On Sun, 2017-10-08 at 19:13 -0400, Chuck Lever wrote: > > > During each NFSv4 callback Call, an RDMA Send completion frees the > > > page that contains the RPC Call message. If the upper layer determines > > > that a retransmit is necessary, this is too soon. > > > > > > One possible symptom: after a GARBAGE_ARGS response an NFSv4.1 > > > callback request, the following BUG fires on the NFS server: > > > > > > kernel: BUG: Bad page state in process kworker/0:2H pfn:7d3ce2 > > > kernel: page:ffffea001f4f3880 count:-2 mapcount:0 mapping: (null) index:0x0 > > > kernel: flags: 0x2fffff80000000() > > > kernel: raw: 002fffff80000000 0000000000000000 0000000000000000 fffffffeffffffff > > > kernel: raw: dead000000000100 dead000000000200 0000000000000000 0000000000000000 > > > kernel: page dumped because: nonzero _refcount > > > kernel: Modules linked in: cts rpcsec_gss_krb5 ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm > > > ocfs2_nodemanager ocfs2_stackglue rpcrdm a ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad > > > rdma_cm ib_cm iw_cm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel > > > kvm irqbypass crct10dif_pc lmul crc32_pclmul ghash_clmulni_intel pcbc iTCO_wdt > > > iTCO_vendor_support aesni_intel crypto_simd glue_helper cryptd pcspkr lpc_ich i2c_i801 > > > mei_me mf d_core mei raid0 sg wmi ioatdma ipmi_si ipmi_devintf ipmi_msghandler shpchp > > > acpi_power_meter acpi_pad nfsd nfs_acl lockd auth_rpcgss grace sunrpc ip_tables xfs > > > libcrc32c mlx4_en mlx4_ib mlx5_ib ib_core sd_mod sr_mod cdrom ast drm_kms_helper > > > syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ahci crc32c_intel libahci drm > > > mlx5_core igb libata mlx4_core dca i2c_algo_bit i2c_core nvme > > > kernel: ptp nvme_core pps_core dm_mirror dm_region_hash dm_log dm_mod dax > > > kernel: CPU: 0 PID: 11495 Comm: kworker/0:2H Not tainted 4.14.0-rc3-00001-g577ce48 #811 > > > kernel: Hardware name: Supermicro Super Server/X10SRL-F, BIOS 1.0c 09/09/2015 > > > kernel: Workqueue: ib-comp-wq ib_cq_poll_work [ib_core] > > > kernel: Call Trace: > > > kernel: dump_stack+0x62/0x80 > > > kernel: bad_page+0xfe/0x11a > > > kernel: free_pages_check_bad+0x76/0x78 > > > kernel: free_pcppages_bulk+0x364/0x441 > > > kernel: ? ttwu_do_activate.isra.61+0x71/0x78 > > > kernel: free_hot_cold_page+0x1c5/0x202 > > > kernel: __put_page+0x2c/0x36 > > > kernel: svc_rdma_put_context+0xd9/0xe4 [rpcrdma] > > > kernel: svc_rdma_wc_send+0x50/0x98 [rpcrdma] > > > > > > This issue exists all the way back to v4.5, but refactoring and code > > > re-organization prevents this simple patch from applying to kernels > > > older than v4.12. The fix is the same, however, if someone needs to > > > backport it. > > > > > > Reported-by: Ben Coddington <bcodding@redhat.com> > > > Fixes: 5d252f90a800 ('svcrdma: Add class for RDMA backwards ... ') > > > Cc: stable@vger.kernel.org # v4.12 > > > Signed-off-by: Chuck Lever <chuck.lever@oracle.com> > > > --- > > > net/sunrpc/xprtrdma/svc_rdma_backchannel.c | 6 +++++- > > > 1 file changed, 5 insertions(+), 1 deletion(-) > > > > > > Hi Ben, Jeff- > > > > > > I was able to reproduce the problem here with a rigged Linux client. > > > This is my proposed fix. Fix also tested with a prototype Solaris > > > client similar to the one that exposed the problem. > > > > > > > > > diff --git a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c > > > index ec37ad8..1854db2 100644 > > > --- a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c > > > +++ b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c > > > @@ -132,6 +132,10 @@ static int svc_rdma_bc_sendto(struct svcxprt_rdma *rdma, > > > if (ret) > > > goto out_err; > > > > > > + /* Bump page refcnt so Send completion doesn't release > > > + * the rq_buffer before all retransmits are complete. > > > + */ > > > + get_page(virt_to_page(rqst->rq_buffer)); > > > > Looks fairly reasonable, but is this enough? Could rq_buffer be larger > > than a page? > > Not in the current implementation. See xprt_rdma_bc_allocate. > > Thanks for the review! > No problem. > > > > ret = svc_rdma_post_send_wr(rdma, ctxt, 1, 0); > > > if (ret) > > > goto out_unmap; > > > @@ -164,7 +168,6 @@ static int svc_rdma_bc_sendto(struct svcxprt_rdma *rdma, > > > return -EINVAL; > > > } > > > > > > - /* svc_rdma_sendto releases this page */ > > > page = alloc_page(RPCRDMA_DEF_GFP); > > > if (!page) > > > return -ENOMEM; > > > @@ -183,6 +186,7 @@ static int svc_rdma_bc_sendto(struct svcxprt_rdma *rdma, > > > { > > > struct rpc_rqst *rqst = task->tk_rqstp; > > > > > > + put_page(virt_to_page(rqst->rq_buffer)); > > > kfree(rqst->rq_rbuffer); > > > } > > > > > > > > > > -- > > Jeff Layton <jlayton@redhat.com> > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > -- > Chuck Lever > > > Might be worth making sure the thing isn't larger than a page with a WARN_ON, or at least a comment in there in case that ever changes. Otherwise, I think this is fine: Reviewed-by: Jeff Layton <jlayton@redhat.com> -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c index ec37ad8..1854db2 100644 --- a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c +++ b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c @@ -132,6 +132,10 @@ static int svc_rdma_bc_sendto(struct svcxprt_rdma *rdma, if (ret) goto out_err; + /* Bump page refcnt so Send completion doesn't release + * the rq_buffer before all retransmits are complete. + */ + get_page(virt_to_page(rqst->rq_buffer)); ret = svc_rdma_post_send_wr(rdma, ctxt, 1, 0); if (ret) goto out_unmap; @@ -164,7 +168,6 @@ static int svc_rdma_bc_sendto(struct svcxprt_rdma *rdma, return -EINVAL; } - /* svc_rdma_sendto releases this page */ page = alloc_page(RPCRDMA_DEF_GFP); if (!page) return -ENOMEM; @@ -183,6 +186,7 @@ static int svc_rdma_bc_sendto(struct svcxprt_rdma *rdma, { struct rpc_rqst *rqst = task->tk_rqstp; + put_page(virt_to_page(rqst->rq_buffer)); kfree(rqst->rq_rbuffer); }
During each NFSv4 callback Call, an RDMA Send completion frees the page that contains the RPC Call message. If the upper layer determines that a retransmit is necessary, this is too soon. One possible symptom: after a GARBAGE_ARGS response an NFSv4.1 callback request, the following BUG fires on the NFS server: kernel: BUG: Bad page state in process kworker/0:2H pfn:7d3ce2 kernel: page:ffffea001f4f3880 count:-2 mapcount:0 mapping: (null) index:0x0 kernel: flags: 0x2fffff80000000() kernel: raw: 002fffff80000000 0000000000000000 0000000000000000 fffffffeffffffff kernel: raw: dead000000000100 dead000000000200 0000000000000000 0000000000000000 kernel: page dumped because: nonzero _refcount kernel: Modules linked in: cts rpcsec_gss_krb5 ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue rpcrdm a ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pc lmul crc32_pclmul ghash_clmulni_intel pcbc iTCO_wdt iTCO_vendor_support aesni_intel crypto_simd glue_helper cryptd pcspkr lpc_ich i2c_i801 mei_me mf d_core mei raid0 sg wmi ioatdma ipmi_si ipmi_devintf ipmi_msghandler shpchp acpi_power_meter acpi_pad nfsd nfs_acl lockd auth_rpcgss grace sunrpc ip_tables xfs libcrc32c mlx4_en mlx4_ib mlx5_ib ib_core sd_mod sr_mod cdrom ast drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ahci crc32c_intel libahci drm mlx5_core igb libata mlx4_core dca i2c_algo_bit i2c_core nvme kernel: ptp nvme_core pps_core dm_mirror dm_region_hash dm_log dm_mod dax kernel: CPU: 0 PID: 11495 Comm: kworker/0:2H Not tainted 4.14.0-rc3-00001-g577ce48 #811 kernel: Hardware name: Supermicro Super Server/X10SRL-F, BIOS 1.0c 09/09/2015 kernel: Workqueue: ib-comp-wq ib_cq_poll_work [ib_core] kernel: Call Trace: kernel: dump_stack+0x62/0x80 kernel: bad_page+0xfe/0x11a kernel: free_pages_check_bad+0x76/0x78 kernel: free_pcppages_bulk+0x364/0x441 kernel: ? ttwu_do_activate.isra.61+0x71/0x78 kernel: free_hot_cold_page+0x1c5/0x202 kernel: __put_page+0x2c/0x36 kernel: svc_rdma_put_context+0xd9/0xe4 [rpcrdma] kernel: svc_rdma_wc_send+0x50/0x98 [rpcrdma] This issue exists all the way back to v4.5, but refactoring and code re-organization prevents this simple patch from applying to kernels older than v4.12. The fix is the same, however, if someone needs to backport it. Reported-by: Ben Coddington <bcodding@redhat.com> Fixes: 5d252f90a800 ('svcrdma: Add class for RDMA backwards ... ') Cc: stable@vger.kernel.org # v4.12 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> --- net/sunrpc/xprtrdma/svc_rdma_backchannel.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) Hi Ben, Jeff- I was able to reproduce the problem here with a rigged Linux client. This is my proposed fix. Fix also tested with a prototype Solaris client similar to the one that exposed the problem. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html