diff mbox

svcrdma: Preserve CB send buffer across retransmits

Message ID 20171016161133.29152.48259.stgit@klimt.1015granger.net (mailing list archive)
State New, archived
Headers show

Commit Message

Chuck Lever Oct. 16, 2017, 4:14 p.m. UTC
During each NFSv4 callback Call, an RDMA Send completion frees the
page that contains the RPC Call message. If the upper layer
determines that a retransmit is necessary, this is too soon.

One possible symptom: after a GARBAGE_ARGS response an NFSv4.1
callback request, the following BUG fires on the NFS server:

kernel: BUG: Bad page state in process kworker/0:2H  pfn:7d3ce2
kernel: page:ffffea001f4f3880 count:-2 mapcount:0 mapping:          (null) index:0x0
kernel: flags: 0x2fffff80000000()
kernel: raw: 002fffff80000000 0000000000000000 0000000000000000 fffffffeffffffff
kernel: raw: dead000000000100 dead000000000200 0000000000000000 0000000000000000
kernel: page dumped because: nonzero _refcount
kernel: Modules linked in: cts rpcsec_gss_krb5 ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm
ocfs2_nodemanager ocfs2_stackglue rpcrdm a ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad
rdma_cm ib_cm iw_cm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel
kvm irqbypass crct10dif_pc lmul crc32_pclmul ghash_clmulni_intel pcbc iTCO_wdt
iTCO_vendor_support aesni_intel crypto_simd glue_helper cryptd pcspkr lpc_ich i2c_i801
mei_me mf d_core mei raid0 sg wmi ioatdma ipmi_si ipmi_devintf ipmi_msghandler shpchp
acpi_power_meter acpi_pad nfsd nfs_acl lockd auth_rpcgss grace sunrpc ip_tables xfs
libcrc32c mlx4_en mlx4_ib mlx5_ib ib_core sd_mod sr_mod cdrom ast drm_kms_helper
syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ahci crc32c_intel libahci drm
mlx5_core igb libata mlx4_core dca i2c_algo_bit i2c_core nvme
kernel: ptp nvme_core pps_core dm_mirror dm_region_hash dm_log dm_mod dax
kernel: CPU: 0 PID: 11495 Comm: kworker/0:2H Not tainted 4.14.0-rc3-00001-g577ce48 #811
kernel: Hardware name: Supermicro Super Server/X10SRL-F, BIOS 1.0c 09/09/2015
kernel: Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
kernel: Call Trace:
kernel: dump_stack+0x62/0x80
kernel: bad_page+0xfe/0x11a
kernel: free_pages_check_bad+0x76/0x78
kernel: free_pcppages_bulk+0x364/0x441
kernel: ? ttwu_do_activate.isra.61+0x71/0x78
kernel: free_hot_cold_page+0x1c5/0x202
kernel: __put_page+0x2c/0x36
kernel: svc_rdma_put_context+0xd9/0xe4 [rpcrdma]
kernel: svc_rdma_wc_send+0x50/0x98 [rpcrdma]

This issue exists all the way back to v4.5, but refactoring and code
re-organization prevents this simple patch from applying to kernels
older than v4.12. The fix is the same, however, if someone needs to
backport it.

Reported-by: Ben Coddington <bcodding@redhat.com>
BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=314
Fixes: 5d252f90a800 ('svcrdma: Add class for RDMA backwards ... ')
Cc: stable@vger.kernel.org # v4.12
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
---
 net/sunrpc/xprtrdma/svc_rdma_backchannel.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

Hey Bruce-

It would be great if this could go into v4.14-rc, but I can live
with v4.15.

Jeff's review suggested adding a comment documenting the requirement
that rqst->rq_buffer is backed by a single page. However, all of the
server-side transport mechanics are page-based, so I'm not sure this
wouldn't be a redundant comment. Suggestions welcome.



--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Chuck Lever Oct. 23, 2017, 6:12 p.m. UTC | #1
> On Oct 16, 2017, at 12:14 PM, Chuck Lever <chuck.lever@oracle.com> wrote:
> 
> During each NFSv4 callback Call, an RDMA Send completion frees the
> page that contains the RPC Call message. If the upper layer
> determines that a retransmit is necessary, this is too soon.
> 
> One possible symptom: after a GARBAGE_ARGS response an NFSv4.1
> callback request, the following BUG fires on the NFS server:
> 
> kernel: BUG: Bad page state in process kworker/0:2H  pfn:7d3ce2
> kernel: page:ffffea001f4f3880 count:-2 mapcount:0 mapping:          (null) index:0x0
> kernel: flags: 0x2fffff80000000()
> kernel: raw: 002fffff80000000 0000000000000000 0000000000000000 fffffffeffffffff
> kernel: raw: dead000000000100 dead000000000200 0000000000000000 0000000000000000
> kernel: page dumped because: nonzero _refcount
> kernel: Modules linked in: cts rpcsec_gss_krb5 ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm
> ocfs2_nodemanager ocfs2_stackglue rpcrdm a ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad
> rdma_cm ib_cm iw_cm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel
> kvm irqbypass crct10dif_pc lmul crc32_pclmul ghash_clmulni_intel pcbc iTCO_wdt
> iTCO_vendor_support aesni_intel crypto_simd glue_helper cryptd pcspkr lpc_ich i2c_i801
> mei_me mf d_core mei raid0 sg wmi ioatdma ipmi_si ipmi_devintf ipmi_msghandler shpchp
> acpi_power_meter acpi_pad nfsd nfs_acl lockd auth_rpcgss grace sunrpc ip_tables xfs
> libcrc32c mlx4_en mlx4_ib mlx5_ib ib_core sd_mod sr_mod cdrom ast drm_kms_helper
> syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ahci crc32c_intel libahci drm
> mlx5_core igb libata mlx4_core dca i2c_algo_bit i2c_core nvme
> kernel: ptp nvme_core pps_core dm_mirror dm_region_hash dm_log dm_mod dax
> kernel: CPU: 0 PID: 11495 Comm: kworker/0:2H Not tainted 4.14.0-rc3-00001-g577ce48 #811
> kernel: Hardware name: Supermicro Super Server/X10SRL-F, BIOS 1.0c 09/09/2015
> kernel: Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
> kernel: Call Trace:
> kernel: dump_stack+0x62/0x80
> kernel: bad_page+0xfe/0x11a
> kernel: free_pages_check_bad+0x76/0x78
> kernel: free_pcppages_bulk+0x364/0x441
> kernel: ? ttwu_do_activate.isra.61+0x71/0x78
> kernel: free_hot_cold_page+0x1c5/0x202
> kernel: __put_page+0x2c/0x36
> kernel: svc_rdma_put_context+0xd9/0xe4 [rpcrdma]
> kernel: svc_rdma_wc_send+0x50/0x98 [rpcrdma]
> 
> This issue exists all the way back to v4.5, but refactoring and code
> re-organization prevents this simple patch from applying to kernels
> older than v4.12. The fix is the same, however, if someone needs to
> backport it.
> 
> Reported-by: Ben Coddington <bcodding@redhat.com>
> BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=314
> Fixes: 5d252f90a800 ('svcrdma: Add class for RDMA backwards ... ')
> Cc: stable@vger.kernel.org # v4.12
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> Reviewed-by: Jeff Layton <jlayton@redhat.com>
> ---
> net/sunrpc/xprtrdma/svc_rdma_backchannel.c |    6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
> 
> Hey Bruce-
> 
> It would be great if this could go into v4.14-rc, but I can live
> with v4.15.

Ping!


> Jeff's review suggested adding a comment documenting the requirement
> that rqst->rq_buffer is backed by a single page. However, all of the
> server-side transport mechanics are page-based, so I'm not sure this
> wouldn't be a redundant comment. Suggestions welcome.
> 
> 
> diff --git a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
> index ec37ad8..1854db2 100644
> --- a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
> +++ b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
> @@ -132,6 +132,10 @@ static int svc_rdma_bc_sendto(struct svcxprt_rdma *rdma,
> 	if (ret)
> 		goto out_err;
> 
> +	/* Bump page refcnt so Send completion doesn't release
> +	 * the rq_buffer before all retransmits are complete.
> +	 */
> +	get_page(virt_to_page(rqst->rq_buffer));
> 	ret = svc_rdma_post_send_wr(rdma, ctxt, 1, 0);
> 	if (ret)
> 		goto out_unmap;
> @@ -164,7 +168,6 @@ static int svc_rdma_bc_sendto(struct svcxprt_rdma *rdma,
> 		return -EINVAL;
> 	}
> 
> -	/* svc_rdma_sendto releases this page */
> 	page = alloc_page(RPCRDMA_DEF_GFP);
> 	if (!page)
> 		return -ENOMEM;
> @@ -183,6 +186,7 @@ static int svc_rdma_bc_sendto(struct svcxprt_rdma *rdma,
> {
> 	struct rpc_rqst *rqst = task->tk_rqstp;
> 
> +	put_page(virt_to_page(rqst->rq_buffer));
> 	kfree(rqst->rq_rbuffer);
> }
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
Chuck Lever



--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
J. Bruce Fields Oct. 23, 2017, 8:41 p.m. UTC | #2
Thanks for the ping, apologies for the delay.

On Mon, Oct 16, 2017 at 12:14:33PM -0400, Chuck Lever wrote:
> During each NFSv4 callback Call, an RDMA Send completion frees the
> page that contains the RPC Call message. If the upper layer
> determines that a retransmit is necessary, this is too soon.
> 
> One possible symptom: after a GARBAGE_ARGS response an NFSv4.1
> callback request, the following BUG fires on the NFS server:
...
> It would be great if this could go into v4.14-rc, but I can live
> with v4.15.

OK, I'll be slightly lazy and queue this up for 4.15....

> Jeff's review suggested adding a comment documenting the requirement
> that rqst->rq_buffer is backed by a single page. However, all of the
> server-side transport mechanics are page-based, so I'm not sure this
> wouldn't be a redundant comment. Suggestions welcome.

If it's always a page then I guess the way to make it ridiculously
obvious would be to keep a pointer to the page instead and use
page_address() as necessary, but.... I'm guessing that's inconvenient
for other users.

So, no suggestions.

--b.

> diff --git a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
> index ec37ad8..1854db2 100644
> --- a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
> +++ b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
> @@ -132,6 +132,10 @@ static int svc_rdma_bc_sendto(struct svcxprt_rdma *rdma,
>  	if (ret)
>  		goto out_err;
>  
> +	/* Bump page refcnt so Send completion doesn't release
> +	 * the rq_buffer before all retransmits are complete.
> +	 */
> +	get_page(virt_to_page(rqst->rq_buffer));
>  	ret = svc_rdma_post_send_wr(rdma, ctxt, 1, 0);
>  	if (ret)
>  		goto out_unmap;
> @@ -164,7 +168,6 @@ static int svc_rdma_bc_sendto(struct svcxprt_rdma *rdma,
>  		return -EINVAL;
>  	}
>  
> -	/* svc_rdma_sendto releases this page */
>  	page = alloc_page(RPCRDMA_DEF_GFP);
>  	if (!page)
>  		return -ENOMEM;
> @@ -183,6 +186,7 @@ static int svc_rdma_bc_sendto(struct svcxprt_rdma *rdma,
>  {
>  	struct rpc_rqst *rqst = task->tk_rqstp;
>  
> +	put_page(virt_to_page(rqst->rq_buffer));
>  	kfree(rqst->rq_rbuffer);
>  }
>  
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
index ec37ad8..1854db2 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
@@ -132,6 +132,10 @@  static int svc_rdma_bc_sendto(struct svcxprt_rdma *rdma,
 	if (ret)
 		goto out_err;
 
+	/* Bump page refcnt so Send completion doesn't release
+	 * the rq_buffer before all retransmits are complete.
+	 */
+	get_page(virt_to_page(rqst->rq_buffer));
 	ret = svc_rdma_post_send_wr(rdma, ctxt, 1, 0);
 	if (ret)
 		goto out_unmap;
@@ -164,7 +168,6 @@  static int svc_rdma_bc_sendto(struct svcxprt_rdma *rdma,
 		return -EINVAL;
 	}
 
-	/* svc_rdma_sendto releases this page */
 	page = alloc_page(RPCRDMA_DEF_GFP);
 	if (!page)
 		return -ENOMEM;
@@ -183,6 +186,7 @@  static int svc_rdma_bc_sendto(struct svcxprt_rdma *rdma,
 {
 	struct rpc_rqst *rqst = task->tk_rqstp;
 
+	put_page(virt_to_page(rqst->rq_buffer));
 	kfree(rqst->rq_rbuffer);
 }