[RFC] nfsd: Fix NFSv4 READ on RDMA when using readv
diff mbox series

Message ID 20200115202647.2172.666.stgit@bazille.1015granger.net
State New
Headers show
Series
  • [RFC] nfsd: Fix NFSv4 READ on RDMA when using readv
Related show

Commit Message

Chuck Lever Jan. 15, 2020, 8:37 p.m. UTC
svcrdma expects that the READ payload falls precisely into the
xdr_buf's page vector. Adding "xdr->iov = NULL" forces
xdr_reserve_space() to always use pages from xdr->buf->pages when
calling nfsd_readv.

Also, the XDR padding is problematic. For NFS/RDMA Write chunks,
the padding needs to be in xdr->buf->tail so that the transport can
skip over it. However for NFS/TCP and the NFS/RDMA Reply chunks,
the padding has to be retained. Not yet sure how to add this.

Fixes: b04209806384 ("nfsd4: allow exotic read compounds")
Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=198053
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
Howdy Bruce-

I'm struggling with nfsd4_encode_readv().

- for NFS/RDMA Write chunks, the READ payload has to be in
  buf->pages. I've fixed that.

- xdr_reserve_space() calls don't need to explicitly align the
  @nbytes argument: xdr_reserve_space() already does this?

- the while loop probably won't work if a later READ in the COMPOUND
  doesn't start on a page boundary. This isn't a problem until we
  run into a Solaris client in forcedirectio mode.

- the XDR padding doesn't work for NFS/RDMA Write chunks, which are
  supposed to skip padding altogether.

Do you have suggestions? Thanks in advance.


 fs/nfsd/nfs4xdr.c |   17 +++++++----------
 1 file changed, 7 insertions(+), 10 deletions(-)

Comments

Chuck Lever Jan. 17, 2020, 6:39 p.m. UTC | #1
> On Jan 15, 2020, at 3:37 PM, Chuck Lever <chuck.lever@oracle.com> wrote:
> 
> svcrdma expects that the READ payload falls precisely into the
> xdr_buf's page vector. Adding "xdr->iov = NULL" forces
> xdr_reserve_space() to always use pages from xdr->buf->pages when
> calling nfsd_readv.
> 
> Also, the XDR padding is problematic. For NFS/RDMA Write chunks,
> the padding needs to be in xdr->buf->tail so that the transport can
> skip over it. However for NFS/TCP and the NFS/RDMA Reply chunks,
> the padding has to be retained. Not yet sure how to add this.
> 
> Fixes: b04209806384 ("nfsd4: allow exotic read compounds")
> Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=198053
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
> Howdy Bruce-
> 
> I'm struggling with nfsd4_encode_readv().
> 
> - for NFS/RDMA Write chunks, the READ payload has to be in
>  buf->pages. I've fixed that.
> 
> - xdr_reserve_space() calls don't need to explicitly align the
>  @nbytes argument: xdr_reserve_space() already does this?
> 
> - the while loop probably won't work if a later READ in the COMPOUND
>  doesn't start on a page boundary. This isn't a problem until we
>  run into a Solaris client in forcedirectio mode.
> 
> - the XDR padding doesn't work for NFS/RDMA Write chunks, which are
>  supposed to skip padding altogether.
> 
> Do you have suggestions? Thanks in advance.

I'm experimenting with an idea I think has been mentioned on list
a few times:

Having the RPC layer and transports deal with the padding of the
xdr_buf->pages vector, and moving that responsibility out of the
NFSD Reply encoder functions. xdr_buf->tail[0] then always begins
on an XDR-aligned boundary.

This should be straightforward for NFSv3. The only two Reply
encoders that are updated are READ and READLINK. I'm starting
with that.

Not quite sure yet how krb5i/krb5p will deal with this. Obviously
the page list pad needs to be inserted before each Reply is
wrapped.

I'll post more as experimentation progresses.


> fs/nfsd/nfs4xdr.c |   17 +++++++----------
> 1 file changed, 7 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> index d2dc4c0e22e8..14c68a136b4e 100644
> --- a/fs/nfsd/nfs4xdr.c
> +++ b/fs/nfsd/nfs4xdr.c
> @@ -3519,17 +3519,14 @@ static __be32 nfsd4_encode_readv(struct nfsd4_compoundres *resp,
> 	u32 zzz = 0;
> 	int pad;
> 
> +	/* Ensure xdr_reserve_space behaves itself */
> +	if (xdr->iov == xdr->buf->head) {
> +		xdr->iov = NULL;
> +		xdr->end = xdr->p;
> +	}
> +
> 	len = maxcount;
> 	v = 0;
> -
> -	thislen = min_t(long, len, ((void *)xdr->end - (void *)xdr->p));
> -	p = xdr_reserve_space(xdr, (thislen+3)&~3);
> -	WARN_ON_ONCE(!p);
> -	resp->rqstp->rq_vec[v].iov_base = p;
> -	resp->rqstp->rq_vec[v].iov_len = thislen;
> -	v++;
> -	len -= thislen;
> -
> 	while (len) {
> 		thislen = min_t(long, len, PAGE_SIZE);
> 		p = xdr_reserve_space(xdr, (thislen+3)&~3);
> @@ -3548,7 +3545,7 @@ static __be32 nfsd4_encode_readv(struct nfsd4_compoundres *resp,
> 	read->rd_length = maxcount;
> 	if (nfserr)
> 		return nfserr;
> -	xdr_truncate_encode(xdr, starting_len + 8 + ((maxcount+3)&~3));
> +	xdr_truncate_encode(xdr, starting_len + 8 + maxcount);
> 
> 	tmp = htonl(eof);
> 	write_bytes_to_xdr_buf(xdr->buf, starting_len    , &tmp, 4);
> 

--
Chuck Lever
J. Bruce Fields Jan. 17, 2020, 9:46 p.m. UTC | #2
On Wed, Jan 15, 2020 at 03:37:33PM -0500, Chuck Lever wrote:
> svcrdma expects that the READ payload falls precisely into the
> xdr_buf's page vector. Adding "xdr->iov = NULL" forces
> xdr_reserve_space() to always use pages from xdr->buf->pages when
> calling nfsd_readv.
> 
> Also, the XDR padding is problematic. For NFS/RDMA Write chunks,
> the padding needs to be in xdr->buf->tail so that the transport can
> skip over it. However for NFS/TCP and the NFS/RDMA Reply chunks,
> the padding has to be retained. Not yet sure how to add this.
> 
> Fixes: b04209806384 ("nfsd4: allow exotic read compounds")
> Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=198053
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
> Howdy Bruce-
> 
> I'm struggling with nfsd4_encode_readv().
> 
> - for NFS/RDMA Write chunks, the READ payload has to be in
>   buf->pages. I've fixed that.
> 
> - xdr_reserve_space() calls don't need to explicitly align the
>   @nbytes argument: xdr_reserve_space() already does this?
> 
> - the while loop probably won't work if a later READ in the COMPOUND
>   doesn't start on a page boundary. This isn't a problem until we
>   run into a Solaris client in forcedirectio mode.

So the Solaris client sends multiple reads per compound in that case?

> - the XDR padding doesn't work for NFS/RDMA Write chunks, which are
>   supposed to skip padding altogether.

krb5i/p has to treat read data as padded regardless of the transport,
doesn't it?

--b.

> Do you have suggestions? Thanks in advance.
> 
> 
>  fs/nfsd/nfs4xdr.c |   17 +++++++----------
>  1 file changed, 7 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> index d2dc4c0e22e8..14c68a136b4e 100644
> --- a/fs/nfsd/nfs4xdr.c
> +++ b/fs/nfsd/nfs4xdr.c
> @@ -3519,17 +3519,14 @@ static __be32 nfsd4_encode_readv(struct nfsd4_compoundres *resp,
>  	u32 zzz = 0;
>  	int pad;
>  
> +	/* Ensure xdr_reserve_space behaves itself */
> +	if (xdr->iov == xdr->buf->head) {
> +		xdr->iov = NULL;
> +		xdr->end = xdr->p;
> +	}
> +
>  	len = maxcount;
>  	v = 0;
> -
> -	thislen = min_t(long, len, ((void *)xdr->end - (void *)xdr->p));
> -	p = xdr_reserve_space(xdr, (thislen+3)&~3);
> -	WARN_ON_ONCE(!p);
> -	resp->rqstp->rq_vec[v].iov_base = p;
> -	resp->rqstp->rq_vec[v].iov_len = thislen;
> -	v++;
> -	len -= thislen;
> -
>  	while (len) {
>  		thislen = min_t(long, len, PAGE_SIZE);
>  		p = xdr_reserve_space(xdr, (thislen+3)&~3);
> @@ -3548,7 +3545,7 @@ static __be32 nfsd4_encode_readv(struct nfsd4_compoundres *resp,
>  	read->rd_length = maxcount;
>  	if (nfserr)
>  		return nfserr;
> -	xdr_truncate_encode(xdr, starting_len + 8 + ((maxcount+3)&~3));
> +	xdr_truncate_encode(xdr, starting_len + 8 + maxcount);
>  
>  	tmp = htonl(eof);
>  	write_bytes_to_xdr_buf(xdr->buf, starting_len    , &tmp, 4);
Chuck Lever Jan. 17, 2020, 9:48 p.m. UTC | #3
> On Jan 17, 2020, at 4:46 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> 
> On Wed, Jan 15, 2020 at 03:37:33PM -0500, Chuck Lever wrote:
>> svcrdma expects that the READ payload falls precisely into the
>> xdr_buf's page vector. Adding "xdr->iov = NULL" forces
>> xdr_reserve_space() to always use pages from xdr->buf->pages when
>> calling nfsd_readv.
>> 
>> Also, the XDR padding is problematic. For NFS/RDMA Write chunks,
>> the padding needs to be in xdr->buf->tail so that the transport can
>> skip over it. However for NFS/TCP and the NFS/RDMA Reply chunks,
>> the padding has to be retained. Not yet sure how to add this.
>> 
>> Fixes: b04209806384 ("nfsd4: allow exotic read compounds")
>> Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=198053
>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>> ---
>> Howdy Bruce-
>> 
>> I'm struggling with nfsd4_encode_readv().
>> 
>> - for NFS/RDMA Write chunks, the READ payload has to be in
>>  buf->pages. I've fixed that.
>> 
>> - xdr_reserve_space() calls don't need to explicitly align the
>>  @nbytes argument: xdr_reserve_space() already does this?
>> 
>> - the while loop probably won't work if a later READ in the COMPOUND
>>  doesn't start on a page boundary. This isn't a problem until we
>>  run into a Solaris client in forcedirectio mode.
> 
> So the Solaris client sends multiple reads per compound in that case?

If the application uses readv() and the mount is in forcedirectio
mode, I'm told Solaris will send an NFSv4 COMPOUND with multiple
READ operations in it.


>> - the XDR padding doesn't work for NFS/RDMA Write chunks, which are
>>  supposed to skip padding altogether.
> 
> krb5i/p has to treat read data as padded regardless of the transport,
> doesn't it?

Yes. See my reply from earlier today: I think we can get away with
inserting the pad just before wrapping the Reply.


> --b.
> 
>> Do you have suggestions? Thanks in advance.
>> 
>> 
>> fs/nfsd/nfs4xdr.c |   17 +++++++----------
>> 1 file changed, 7 insertions(+), 10 deletions(-)
>> 
>> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
>> index d2dc4c0e22e8..14c68a136b4e 100644
>> --- a/fs/nfsd/nfs4xdr.c
>> +++ b/fs/nfsd/nfs4xdr.c
>> @@ -3519,17 +3519,14 @@ static __be32 nfsd4_encode_readv(struct nfsd4_compoundres *resp,
>> 	u32 zzz = 0;
>> 	int pad;
>> 
>> +	/* Ensure xdr_reserve_space behaves itself */
>> +	if (xdr->iov == xdr->buf->head) {
>> +		xdr->iov = NULL;
>> +		xdr->end = xdr->p;
>> +	}
>> +
>> 	len = maxcount;
>> 	v = 0;
>> -
>> -	thislen = min_t(long, len, ((void *)xdr->end - (void *)xdr->p));
>> -	p = xdr_reserve_space(xdr, (thislen+3)&~3);
>> -	WARN_ON_ONCE(!p);
>> -	resp->rqstp->rq_vec[v].iov_base = p;
>> -	resp->rqstp->rq_vec[v].iov_len = thislen;
>> -	v++;
>> -	len -= thislen;
>> -
>> 	while (len) {
>> 		thislen = min_t(long, len, PAGE_SIZE);
>> 		p = xdr_reserve_space(xdr, (thislen+3)&~3);
>> @@ -3548,7 +3545,7 @@ static __be32 nfsd4_encode_readv(struct nfsd4_compoundres *resp,
>> 	read->rd_length = maxcount;
>> 	if (nfserr)
>> 		return nfserr;
>> -	xdr_truncate_encode(xdr, starting_len + 8 + ((maxcount+3)&~3));
>> +	xdr_truncate_encode(xdr, starting_len + 8 + maxcount);
>> 
>> 	tmp = htonl(eof);
>> 	write_bytes_to_xdr_buf(xdr->buf, starting_len    , &tmp, 4);

--
Chuck Lever

Patch
diff mbox series

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index d2dc4c0e22e8..14c68a136b4e 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -3519,17 +3519,14 @@  static __be32 nfsd4_encode_readv(struct nfsd4_compoundres *resp,
 	u32 zzz = 0;
 	int pad;
 
+	/* Ensure xdr_reserve_space behaves itself */
+	if (xdr->iov == xdr->buf->head) {
+		xdr->iov = NULL;
+		xdr->end = xdr->p;
+	}
+
 	len = maxcount;
 	v = 0;
-
-	thislen = min_t(long, len, ((void *)xdr->end - (void *)xdr->p));
-	p = xdr_reserve_space(xdr, (thislen+3)&~3);
-	WARN_ON_ONCE(!p);
-	resp->rqstp->rq_vec[v].iov_base = p;
-	resp->rqstp->rq_vec[v].iov_len = thislen;
-	v++;
-	len -= thislen;
-
 	while (len) {
 		thislen = min_t(long, len, PAGE_SIZE);
 		p = xdr_reserve_space(xdr, (thislen+3)&~3);
@@ -3548,7 +3545,7 @@  static __be32 nfsd4_encode_readv(struct nfsd4_compoundres *resp,
 	read->rd_length = maxcount;
 	if (nfserr)
 		return nfserr;
-	xdr_truncate_encode(xdr, starting_len + 8 + ((maxcount+3)&~3));
+	xdr_truncate_encode(xdr, starting_len + 8 + maxcount);
 
 	tmp = htonl(eof);
 	write_bytes_to_xdr_buf(xdr->buf, starting_len    , &tmp, 4);