diff mbox series

[v2,1/2] NFSD: Encode COMPOUND operation status on page boundaries

Message ID 20241223180724.1804-5-cel@kernel.org (mailing list archive)
State Superseded
Headers show
Series Fix XDR encoding near page boundaries | expand

Commit Message

Chuck Lever Dec. 23, 2024, 6:07 p.m. UTC
From: Chuck Lever <chuck.lever@oracle.com>

J. David reports an odd corruption of a READDIR reply sent to a
FreeBSD client.

xdr_reserve_space() has to do a special trick when the @nbytes value
requests more space than there is in the current page of the XDR
buffer.

In that case, xdr_reserve_space() returns a pointer to the start of
the next page, and then the next call to xdr_reserve_space() invokes
__xdr_commit_encode() to copy enough of the data item back into the
previous page to make that data item contiguous across the page
boundary.

But we need to be careful in the case where buffer space is reserved
early for a data item that will be inserted into the buffer later.

One such caller, nfsd4_encode_operation(), reserves 8 bytes in the
encoding buffer for each COMPOUND operation. However, a READDIR
result can sometimes encode file names so that there are only 4
bytes left at the end of the current XDR buffer page (though plenty
of pages are left to handle the remaining encoding tasks).

If a COMPOUND operation follows the READDIR result (say, a GETATTR),
then nfsd4_encode_operation() will reserve 8 bytes for the op number
(9) and the op status (usually NFS4_OK). In this weird case,
xdr_reserve_space() returns a pointer to byte zero of the next buffer
page, as it assumes the data item will be copied back into place (in
the previous page) on the next call to xdr_reserve_space().

nfsd4_encode_operation() writes the op num into the buffer, then
saves the next 4-byte location for the op's status code. The next
xdr_reserve_space() call is part of GETATTR encoding, so the op num
gets copied back into the previous page, but the saved location for
the op status continues to point to the wrong spot in the current
XDR buffer page because __xdr_commit_encode() moved that data item.

After GETATTR encoding is complete, nfsd4_encode_operation() writes
the op status over the first XDR data item in the GETATTR result.
The NFS4_OK status code (0) makes it look like there are zero items
in the GETATTR's attribute bitmask.

The patch description of commit 2825a7f90753 ("nfsd4: allow encoding
across page boundaries") [2014] remarks that NFSD "can't handle a
new operation starting close to the end of a page." This bug appears
to be one reason for that remark.

Reported-by: J David <j.david.lists@gmail.com>
Closes: https://lore.kernel.org/linux-nfs/3998d739-c042-46b4-8166-dbd6c5f0e804@oracle.com/T/#t
X-Cc: stable@vger.kernel.org
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/nfs4xdr.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

Comments

Rick Macklem Dec. 23, 2024, 11:46 p.m. UTC | #1
On Mon, Dec 23, 2024 at 10:07 AM <cel@kernel.org> wrote:
>
> From: Chuck Lever <chuck.lever@oracle.com>
>
> J. David reports an odd corruption of a READDIR reply sent to a
> FreeBSD client.
>
> xdr_reserve_space() has to do a special trick when the @nbytes value
> requests more space than there is in the current page of the XDR
> buffer.
>
> In that case, xdr_reserve_space() returns a pointer to the start of
> the next page, and then the next call to xdr_reserve_space() invokes
> __xdr_commit_encode() to copy enough of the data item back into the
> previous page to make that data item contiguous across the page
> boundary.
>
> But we need to be careful in the case where buffer space is reserved
> early for a data item that will be inserted into the buffer later.
>
> One such caller, nfsd4_encode_operation(), reserves 8 bytes in the
> encoding buffer for each COMPOUND operation. However, a READDIR
> result can sometimes encode file names so that there are only 4
> bytes left at the end of the current XDR buffer page (though plenty
> of pages are left to handle the remaining encoding tasks).
>
> If a COMPOUND operation follows the READDIR result (say, a GETATTR),
> then nfsd4_encode_operation() will reserve 8 bytes for the op number
> (9) and the op status (usually NFS4_OK). In this weird case,
> xdr_reserve_space() returns a pointer to byte zero of the next buffer
> page, as it assumes the data item will be copied back into place (in
> the previous page) on the next call to xdr_reserve_space().
>
> nfsd4_encode_operation() writes the op num into the buffer, then
> saves the next 4-byte location for the op's status code. The next
> xdr_reserve_space() call is part of GETATTR encoding, so the op num
> gets copied back into the previous page, but the saved location for
> the op status continues to point to the wrong spot in the current
> XDR buffer page because __xdr_commit_encode() moved that data item.
>
> After GETATTR encoding is complete, nfsd4_encode_operation() writes
> the op status over the first XDR data item in the GETATTR result.
> The NFS4_OK status code (0) makes it look like there are zero items
> in the GETATTR's attribute bitmask.
>
> The patch description of commit 2825a7f90753 ("nfsd4: allow encoding
> across page boundaries") [2014] remarks that NFSD "can't handle a
> new operation starting close to the end of a page." This bug appears
> to be one reason for that remark.
>
> Reported-by: J David <j.david.lists@gmail.com>
> Closes: https://lore.kernel.org/linux-nfs/3998d739-c042-46b4-8166-dbd6c5f0e804@oracle.com/T/#t
> X-Cc: stable@vger.kernel.org
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>  fs/nfsd/nfs4xdr.c | 20 ++++++++++----------
>  1 file changed, 10 insertions(+), 10 deletions(-)
>
> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> index 53fac037611c..15cd716e9d91 100644
> --- a/fs/nfsd/nfs4xdr.c
> +++ b/fs/nfsd/nfs4xdr.c
> @@ -5760,15 +5760,14 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op)
>         struct nfs4_stateowner *so = resp->cstate.replay_owner;
>         struct svc_rqst *rqstp = resp->rqstp;
>         const struct nfsd4_operation *opdesc = op->opdesc;
> -       int post_err_offset;
> +       unsigned int op_status_offset;
>         nfsd4_enc encoder;
> -       __be32 *p;
>
> -       p = xdr_reserve_space(xdr, 8);
> -       if (!p)
> +       if (xdr_stream_encode_u32(xdr, op->opnum) != XDR_UNIT)
> +               goto release;
> +       op_status_offset = xdr_stream_pos(xdr);
> +       if (!xdr_reserve_space(xdr, 4))
Maybe XDR_UNIT instead of 4 for consistency, but I don't care which you choose.

>                 goto release;
> -       *p++ = cpu_to_be32(op->opnum);
> -       post_err_offset = xdr->buf->len;
>
>         if (op->opnum == OP_ILLEGAL)
>                 goto status;
> @@ -5809,20 +5808,21 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op)
>                  * bug if we had to do this on a non-idempotent op:
>                  */
>                 warn_on_nonidempotent_op(op);
> -               xdr_truncate_encode(xdr, post_err_offset);
> +               xdr_truncate_encode(xdr, op_status_offset + XDR_UNIT);
>         }
>         if (so) {
> -               int len = xdr->buf->len - post_err_offset;
> +               int len = xdr->buf->len - (op_status_offset + XDR_UNIT);
>
>                 so->so_replay.rp_status = op->status;
>                 so->so_replay.rp_buflen = len;
> -               read_bytes_from_xdr_buf(xdr->buf, post_err_offset,
> +               read_bytes_from_xdr_buf(xdr->buf, op_status_offset + XDR_UNIT,
>                                                 so->so_replay.rp_buf, len);
>         }
>  status:
>         op->status = nfsd4_map_status(op->status,
>                                       resp->cstate.minorversion);
> -       *p = op->status;
> +       write_bytes_to_xdr_buf(xdr->buf, op_status_offset,
> +                              &op->status, XDR_UNIT);
>  release:
>         if (opdesc && opdesc->op_release)
>                 opdesc->op_release(&op->u);
> --
> 2.47.0
>
NeilBrown Dec. 24, 2024, 11:19 p.m. UTC | #2
On Tue, 24 Dec 2024, cel@kernel.org wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
> 
> J. David reports an odd corruption of a READDIR reply sent to a
> FreeBSD client.
> 
> xdr_reserve_space() has to do a special trick when the @nbytes value
> requests more space than there is in the current page of the XDR
> buffer.
> 
> In that case, xdr_reserve_space() returns a pointer to the start of
> the next page, and then the next call to xdr_reserve_space() invokes
> __xdr_commit_encode() to copy enough of the data item back into the
> previous page to make that data item contiguous across the page
> boundary.
> 
> But we need to be careful in the case where buffer space is reserved
> early for a data item that will be inserted into the buffer later.
> 
> One such caller, nfsd4_encode_operation(), reserves 8 bytes in the
> encoding buffer for each COMPOUND operation. However, a READDIR
> result can sometimes encode file names so that there are only 4
> bytes left at the end of the current XDR buffer page (though plenty
> of pages are left to handle the remaining encoding tasks).
> 
> If a COMPOUND operation follows the READDIR result (say, a GETATTR),
> then nfsd4_encode_operation() will reserve 8 bytes for the op number
> (9) and the op status (usually NFS4_OK). In this weird case,
> xdr_reserve_space() returns a pointer to byte zero of the next buffer
> page, as it assumes the data item will be copied back into place (in
> the previous page) on the next call to xdr_reserve_space().
> 
> nfsd4_encode_operation() writes the op num into the buffer, then
> saves the next 4-byte location for the op's status code. The next
> xdr_reserve_space() call is part of GETATTR encoding, so the op num
> gets copied back into the previous page, but the saved location for
> the op status continues to point to the wrong spot in the current
> XDR buffer page because __xdr_commit_encode() moved that data item.
> 
> After GETATTR encoding is complete, nfsd4_encode_operation() writes
> the op status over the first XDR data item in the GETATTR result.
> The NFS4_OK status code (0) makes it look like there are zero items
> in the GETATTR's attribute bitmask.
> 
> The patch description of commit 2825a7f90753 ("nfsd4: allow encoding
> across page boundaries") [2014] remarks that NFSD "can't handle a
> new operation starting close to the end of a page." This bug appears
> to be one reason for that remark.
> 
> Reported-by: J David <j.david.lists@gmail.com>
> Closes: https://lore.kernel.org/linux-nfs/3998d739-c042-46b4-8166-dbd6c5f0e804@oracle.com/T/#t
> X-Cc: stable@vger.kernel.org
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>  fs/nfsd/nfs4xdr.c | 20 ++++++++++----------
>  1 file changed, 10 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> index 53fac037611c..15cd716e9d91 100644
> --- a/fs/nfsd/nfs4xdr.c
> +++ b/fs/nfsd/nfs4xdr.c
> @@ -5760,15 +5760,14 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op)
>  	struct nfs4_stateowner *so = resp->cstate.replay_owner;
>  	struct svc_rqst *rqstp = resp->rqstp;
>  	const struct nfsd4_operation *opdesc = op->opdesc;
> -	int post_err_offset;
> +	unsigned int op_status_offset;

As most uses of "op_status_offset" add XDR_UNIT I'd be incline to keep
the "post" offset.

   unsigned int op_status_offset, post_status_offset;

>  	nfsd4_enc encoder;
> -	__be32 *p;
>  
> -	p = xdr_reserve_space(xdr, 8);
> -	if (!p)
> +	if (xdr_stream_encode_u32(xdr, op->opnum) != XDR_UNIT)
> +		goto release;
> +	op_status_offset = xdr_stream_pos(xdr);
> +	if (!xdr_reserve_space(xdr, 4))

The underlying message of this bug seems to be that xdr_reserve_space()
is a low-level interface that probably shouldn't be used outside of xdr
code.
So I wonder if we could use
    op_status_offset = xdr_stream_pos(xdr);
    if (xdr_stream_encode_u32(xdr, NFS4ERR_SERVERFAULT) < 0) //will be over-written
           goto release;
    post_status_offset = xdr_stream_pos(xdr);

instead??

But these are minor thoughts - only use them if you like them.
Generally this is a definite improvement.

Reviewed-by: NeilBrown <neilb@suse.de>

Thanks,
NeilBrown
Chuck Lever Dec. 25, 2024, 3:53 p.m. UTC | #3
On 12/24/24 6:19 PM, NeilBrown wrote:
> On Tue, 24 Dec 2024, cel@kernel.org wrote:
>> From: Chuck Lever <chuck.lever@oracle.com>
>>
>> J. David reports an odd corruption of a READDIR reply sent to a
>> FreeBSD client.
>>
>> xdr_reserve_space() has to do a special trick when the @nbytes value
>> requests more space than there is in the current page of the XDR
>> buffer.
>>
>> In that case, xdr_reserve_space() returns a pointer to the start of
>> the next page, and then the next call to xdr_reserve_space() invokes
>> __xdr_commit_encode() to copy enough of the data item back into the
>> previous page to make that data item contiguous across the page
>> boundary.
>>
>> But we need to be careful in the case where buffer space is reserved
>> early for a data item that will be inserted into the buffer later.
>>
>> One such caller, nfsd4_encode_operation(), reserves 8 bytes in the
>> encoding buffer for each COMPOUND operation. However, a READDIR
>> result can sometimes encode file names so that there are only 4
>> bytes left at the end of the current XDR buffer page (though plenty
>> of pages are left to handle the remaining encoding tasks).
>>
>> If a COMPOUND operation follows the READDIR result (say, a GETATTR),
>> then nfsd4_encode_operation() will reserve 8 bytes for the op number
>> (9) and the op status (usually NFS4_OK). In this weird case,
>> xdr_reserve_space() returns a pointer to byte zero of the next buffer
>> page, as it assumes the data item will be copied back into place (in
>> the previous page) on the next call to xdr_reserve_space().
>>
>> nfsd4_encode_operation() writes the op num into the buffer, then
>> saves the next 4-byte location for the op's status code. The next
>> xdr_reserve_space() call is part of GETATTR encoding, so the op num
>> gets copied back into the previous page, but the saved location for
>> the op status continues to point to the wrong spot in the current
>> XDR buffer page because __xdr_commit_encode() moved that data item.
>>
>> After GETATTR encoding is complete, nfsd4_encode_operation() writes
>> the op status over the first XDR data item in the GETATTR result.
>> The NFS4_OK status code (0) makes it look like there are zero items
>> in the GETATTR's attribute bitmask.
>>
>> The patch description of commit 2825a7f90753 ("nfsd4: allow encoding
>> across page boundaries") [2014] remarks that NFSD "can't handle a
>> new operation starting close to the end of a page." This bug appears
>> to be one reason for that remark.
>>
>> Reported-by: J David <j.david.lists@gmail.com>
>> Closes: https://lore.kernel.org/linux-nfs/3998d739-c042-46b4-8166-dbd6c5f0e804@oracle.com/T/#t
>> X-Cc: stable@vger.kernel.org
>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>> ---
>>   fs/nfsd/nfs4xdr.c | 20 ++++++++++----------
>>   1 file changed, 10 insertions(+), 10 deletions(-)
>>
>> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
>> index 53fac037611c..15cd716e9d91 100644
>> --- a/fs/nfsd/nfs4xdr.c
>> +++ b/fs/nfsd/nfs4xdr.c
>> @@ -5760,15 +5760,14 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op)
>>   	struct nfs4_stateowner *so = resp->cstate.replay_owner;
>>   	struct svc_rqst *rqstp = resp->rqstp;
>>   	const struct nfsd4_operation *opdesc = op->opdesc;
>> -	int post_err_offset;
>> +	unsigned int op_status_offset;
> 
> As most uses of "op_status_offset" add XDR_UNIT I'd be incline to keep
> the "post" offset.
> 
>     unsigned int op_status_offset, post_status_offset;

My thinking is that the "op_status_offset" value will be used on every
call of this function, but the "post_status_offset" cases are actually
exceedingly rare. There's no good reason to maintain that value
separately during every call.


>>   	nfsd4_enc encoder;
>> -	__be32 *p;
>>   
>> -	p = xdr_reserve_space(xdr, 8);
>> -	if (!p)
>> +	if (xdr_stream_encode_u32(xdr, op->opnum) != XDR_UNIT)
>> +		goto release;
>> +	op_status_offset = xdr_stream_pos(xdr);
>> +	if (!xdr_reserve_space(xdr, 4))
> 
> The underlying message of this bug seems to be that xdr_reserve_space()
> is a low-level interface that probably shouldn't be used outside of xdr
> code.
> So I wonder if we could use
>      op_status_offset = xdr_stream_pos(xdr);
>      if (xdr_stream_encode_u32(xdr, NFS4ERR_SERVERFAULT) < 0) //will be over-written
>             goto release;
>      post_status_offset = xdr_stream_pos(xdr);
> 
> instead??
> 
> But these are minor thoughts - only use them if you like them.
> Generally this is a definite improvement.
> 
> Reviewed-by: NeilBrown <neilb@suse.de>

Thanks!
diff mbox series

Patch

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 53fac037611c..15cd716e9d91 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -5760,15 +5760,14 @@  nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op)
 	struct nfs4_stateowner *so = resp->cstate.replay_owner;
 	struct svc_rqst *rqstp = resp->rqstp;
 	const struct nfsd4_operation *opdesc = op->opdesc;
-	int post_err_offset;
+	unsigned int op_status_offset;
 	nfsd4_enc encoder;
-	__be32 *p;
 
-	p = xdr_reserve_space(xdr, 8);
-	if (!p)
+	if (xdr_stream_encode_u32(xdr, op->opnum) != XDR_UNIT)
+		goto release;
+	op_status_offset = xdr_stream_pos(xdr);
+	if (!xdr_reserve_space(xdr, 4))
 		goto release;
-	*p++ = cpu_to_be32(op->opnum);
-	post_err_offset = xdr->buf->len;
 
 	if (op->opnum == OP_ILLEGAL)
 		goto status;
@@ -5809,20 +5808,21 @@  nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op)
 		 * bug if we had to do this on a non-idempotent op:
 		 */
 		warn_on_nonidempotent_op(op);
-		xdr_truncate_encode(xdr, post_err_offset);
+		xdr_truncate_encode(xdr, op_status_offset + XDR_UNIT);
 	}
 	if (so) {
-		int len = xdr->buf->len - post_err_offset;
+		int len = xdr->buf->len - (op_status_offset + XDR_UNIT);
 
 		so->so_replay.rp_status = op->status;
 		so->so_replay.rp_buflen = len;
-		read_bytes_from_xdr_buf(xdr->buf, post_err_offset,
+		read_bytes_from_xdr_buf(xdr->buf, op_status_offset + XDR_UNIT,
 						so->so_replay.rp_buf, len);
 	}
 status:
 	op->status = nfsd4_map_status(op->status,
 				      resp->cstate.minorversion);
-	*p = op->status;
+	write_bytes_to_xdr_buf(xdr->buf, op_status_offset,
+			       &op->status, XDR_UNIT);
 release:
 	if (opdesc && opdesc->op_release)
 		opdesc->op_release(&op->u);