diff mbox

[v2,1/9] svcrdma: Do not send XDR roundup bytes for a write chunk

Message ID 20160208172430.13423.18896.stgit@klimt.1015granger.net (mailing list archive)
State New, archived

Commit Message

Chuck Lever Feb. 8, 2016, 5:24 p.m. UTC
When the Linux server writes an odd-length data item into a Write
chunk, it finishes with XDR pad bytes. If the data item is smaller
than the Write chunk, the pad bytes are written at the end of the
data item, but still inside the chunk. That can expose these zero
bytes to the RPC consumer on the client.

XDR pad bytes are inserted in order to preserve the XDR alignment
of the next XDR data item in an XDR stream. But Write chunks do not
appear in the payload XDR stream, and only one data item is allowed
in each chunk. So XDR padding is unneeded.

Thus the server should not write XDR pad bytes in Write chunks.

I believe this is not an operational problem. The only affected case
is a short NFS READ returned in a Write chunk, and that happens only
when the read request extends past EOF. The exposed bytes are zeroes
anyway, and there is no file data in the client's buffer past EOF.

In any case, current NFS clients provide a separate extra segment for
catching XDR padding. If an odd-length data item fills the chunk, the
XDR pad is written into that extra segment.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-By: Devesh Sharma <devesh.sharma@broadcom.com>
Tested-By: Devesh Sharma <devesh.sharma@broadcom.com>
---
 net/sunrpc/xprtrdma/svc_rdma_sendto.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

J. Bruce Fields Feb. 8, 2016, 7:21 p.m. UTC | #1
On Mon, Feb 08, 2016 at 12:24:31PM -0500, Chuck Lever wrote:
> diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
> index df57f3c..8591314 100644
> --- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c
> +++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
> @@ -308,7 +308,7 @@ static int send_write_chunks(struct svcxprt_rdma *xprt,
>  			     struct svc_rqst *rqstp,
>  			     struct svc_rdma_req_map *vec)
>  {
> -	u32 xfer_len = rqstp->rq_res.page_len + rqstp->rq_res.tail[0].iov_len;
> +	u32 xfer_len = rqstp->rq_res.page_len;

Is this just unconditionally dropping the tail of the xdr_buf?

The tail isn't necessarily only padding.  For example, I believe that
the results of the GETATTR in a compound like:

	PUTFH
	READ
	GETATTR

will also go in the tail.

(But maybe you're sending the rest of the tail somewhere else, I'm not
very familiar with the RDMA code....)

--b.

>  	int write_len;
>  	u32 xdr_off;
>  	int chunk_off;
> 
Chuck Lever Feb. 8, 2016, 8:03 p.m. UTC | #2
> On Feb 8, 2016, at 2:21 PM, bfields@fieldses.org wrote:
> 
> On Mon, Feb 08, 2016 at 12:24:31PM -0500, Chuck Lever wrote:
>> diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
>> index df57f3c..8591314 100644
>> --- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c
>> +++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
>> @@ -308,7 +308,7 @@ static int send_write_chunks(struct svcxprt_rdma *xprt,
>> 			     struct svc_rqst *rqstp,
>> 			     struct svc_rdma_req_map *vec)
>> {
>> -	u32 xfer_len = rqstp->rq_res.page_len + rqstp->rq_res.tail[0].iov_len;
>> +	u32 xfer_len = rqstp->rq_res.page_len;
> 
> Is this just unconditionally dropping the tail of the xdr_buf?
> 
> The tail isn't necessarily only padding.  For example, I believe that
> the results of the GETATTR in a compound like:
> 
> 	PUTFH
> 	READ
> 	GETATTR
> 
> will also go in the tail.

nfsd4_encode_splice_read() does indeed put the remaining
compound operations in xdr_buf.tail, as it should.

However, the RDMA transport must not send these remaining
results as part of a write chunk. It has to send them
either inline or in a reply chunk, following the contents
of xdr_buf.head.

If the server is putting trailing compound operations in
Write chunks, that's wrong. I haven't seen any problems
that could be attributed to this in my testing, but
maybe clients just put up with it without complaint.


> (But maybe you're sending the rest of the tail somewhere else, I'm not
> very familiar with the RDMA code....)

We may have a little problem if a Write chunk has a pad
_and_ there are subsequent results. Putting the pad in
the tail moves those results to the wrong XDR alignment
in the reply.

So I think this hunk is correct, but I will need to verify
that the logic in svc_rdma_sendto.c does the right thing
with the xdr_buf's tail.

Thanks for the review.


> --b.
> 
>> 	int write_len;
>> 	u32 xdr_off;
>> 	int chunk_off;

--
Chuck Lever





Patch

diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
index df57f3c..8591314 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
@@ -308,7 +308,7 @@ static int send_write_chunks(struct svcxprt_rdma *xprt,
 			     struct svc_rqst *rqstp,
 			     struct svc_rdma_req_map *vec)
 {
-	u32 xfer_len = rqstp->rq_res.page_len + rqstp->rq_res.tail[0].iov_len;
+	u32 xfer_len = rqstp->rq_res.page_len;
 	int write_len;
 	u32 xdr_off;
 	int chunk_off;