Message ID | 20241223180724.1804-5-cel@kernel.org (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | Fix XDR encoding near page boundaries | expand |
On Mon, Dec 23, 2024 at 10:07 AM <cel@kernel.org> wrote: > > From: Chuck Lever <chuck.lever@oracle.com> > > J. David reports an odd corruption of a READDIR reply sent to a > FreeBSD client. > > xdr_reserve_space() has to do a special trick when the @nbytes value > requests more space than there is in the current page of the XDR > buffer. > > In that case, xdr_reserve_space() returns a pointer to the start of > the next page, and then the next call to xdr_reserve_space() invokes > __xdr_commit_encode() to copy enough of the data item back into the > previous page to make that data item contiguous across the page > boundary. > > But we need to be careful in the case where buffer space is reserved > early for a data item that will be inserted into the buffer later. > > One such caller, nfsd4_encode_operation(), reserves 8 bytes in the > encoding buffer for each COMPOUND operation. However, a READDIR > result can sometimes encode file names so that there are only 4 > bytes left at the end of the current XDR buffer page (though plenty > of pages are left to handle the remaining encoding tasks). > > If a COMPOUND operation follows the READDIR result (say, a GETATTR), > then nfsd4_encode_operation() will reserve 8 bytes for the op number > (9) and the op status (usually NFS4_OK). In this weird case, > xdr_reserve_space() returns a pointer to byte zero of the next buffer > page, as it assumes the data item will be copied back into place (in > the previous page) on the next call to xdr_reserve_space(). > > nfsd4_encode_operation() writes the op num into the buffer, then > saves the next 4-byte location for the op's status code. The next > xdr_reserve_space() call is part of GETATTR encoding, so the op num > gets copied back into the previous page, but the saved location for > the op status continues to point to the wrong spot in the current > XDR buffer page because __xdr_commit_encode() moved that data item. > > After GETATTR encoding is complete, nfsd4_encode_operation() writes > the op status over the first XDR data item in the GETATTR result. > The NFS4_OK status code (0) makes it look like there are zero items > in the GETATTR's attribute bitmask. > > The patch description of commit 2825a7f90753 ("nfsd4: allow encoding > across page boundaries") [2014] remarks that NFSD "can't handle a > new operation starting close to the end of a page." This bug appears > to be one reason for that remark. > > Reported-by: J David <j.david.lists@gmail.com> > Closes: https://lore.kernel.org/linux-nfs/3998d739-c042-46b4-8166-dbd6c5f0e804@oracle.com/T/#t > X-Cc: stable@vger.kernel.org > Signed-off-by: Chuck Lever <chuck.lever@oracle.com> > --- > fs/nfsd/nfs4xdr.c | 20 ++++++++++---------- > 1 file changed, 10 insertions(+), 10 deletions(-) > > diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c > index 53fac037611c..15cd716e9d91 100644 > --- a/fs/nfsd/nfs4xdr.c > +++ b/fs/nfsd/nfs4xdr.c > @@ -5760,15 +5760,14 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op) > struct nfs4_stateowner *so = resp->cstate.replay_owner; > struct svc_rqst *rqstp = resp->rqstp; > const struct nfsd4_operation *opdesc = op->opdesc; > - int post_err_offset; > + unsigned int op_status_offset; > nfsd4_enc encoder; > - __be32 *p; > > - p = xdr_reserve_space(xdr, 8); > - if (!p) > + if (xdr_stream_encode_u32(xdr, op->opnum) != XDR_UNIT) > + goto release; > + op_status_offset = xdr_stream_pos(xdr); > + if (!xdr_reserve_space(xdr, 4)) Maybe XDR_UNIT instead of 4 for consistency, but I don't care which you choose. > goto release; > - *p++ = cpu_to_be32(op->opnum); > - post_err_offset = xdr->buf->len; > > if (op->opnum == OP_ILLEGAL) > goto status; > @@ -5809,20 +5808,21 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op) > * bug if we had to do this on a non-idempotent op: > */ > warn_on_nonidempotent_op(op); > - xdr_truncate_encode(xdr, post_err_offset); > + xdr_truncate_encode(xdr, op_status_offset + XDR_UNIT); > } > if (so) { > - int len = xdr->buf->len - post_err_offset; > + int len = xdr->buf->len - (op_status_offset + XDR_UNIT); > > so->so_replay.rp_status = op->status; > so->so_replay.rp_buflen = len; > - read_bytes_from_xdr_buf(xdr->buf, post_err_offset, > + read_bytes_from_xdr_buf(xdr->buf, op_status_offset + XDR_UNIT, > so->so_replay.rp_buf, len); > } > status: > op->status = nfsd4_map_status(op->status, > resp->cstate.minorversion); > - *p = op->status; > + write_bytes_to_xdr_buf(xdr->buf, op_status_offset, > + &op->status, XDR_UNIT); > release: > if (opdesc && opdesc->op_release) > opdesc->op_release(&op->u); > -- > 2.47.0 >
On Tue, 24 Dec 2024, cel@kernel.org wrote: > From: Chuck Lever <chuck.lever@oracle.com> > > J. David reports an odd corruption of a READDIR reply sent to a > FreeBSD client. > > xdr_reserve_space() has to do a special trick when the @nbytes value > requests more space than there is in the current page of the XDR > buffer. > > In that case, xdr_reserve_space() returns a pointer to the start of > the next page, and then the next call to xdr_reserve_space() invokes > __xdr_commit_encode() to copy enough of the data item back into the > previous page to make that data item contiguous across the page > boundary. > > But we need to be careful in the case where buffer space is reserved > early for a data item that will be inserted into the buffer later. > > One such caller, nfsd4_encode_operation(), reserves 8 bytes in the > encoding buffer for each COMPOUND operation. However, a READDIR > result can sometimes encode file names so that there are only 4 > bytes left at the end of the current XDR buffer page (though plenty > of pages are left to handle the remaining encoding tasks). > > If a COMPOUND operation follows the READDIR result (say, a GETATTR), > then nfsd4_encode_operation() will reserve 8 bytes for the op number > (9) and the op status (usually NFS4_OK). In this weird case, > xdr_reserve_space() returns a pointer to byte zero of the next buffer > page, as it assumes the data item will be copied back into place (in > the previous page) on the next call to xdr_reserve_space(). > > nfsd4_encode_operation() writes the op num into the buffer, then > saves the next 4-byte location for the op's status code. The next > xdr_reserve_space() call is part of GETATTR encoding, so the op num > gets copied back into the previous page, but the saved location for > the op status continues to point to the wrong spot in the current > XDR buffer page because __xdr_commit_encode() moved that data item. > > After GETATTR encoding is complete, nfsd4_encode_operation() writes > the op status over the first XDR data item in the GETATTR result. > The NFS4_OK status code (0) makes it look like there are zero items > in the GETATTR's attribute bitmask. > > The patch description of commit 2825a7f90753 ("nfsd4: allow encoding > across page boundaries") [2014] remarks that NFSD "can't handle a > new operation starting close to the end of a page." This bug appears > to be one reason for that remark. > > Reported-by: J David <j.david.lists@gmail.com> > Closes: https://lore.kernel.org/linux-nfs/3998d739-c042-46b4-8166-dbd6c5f0e804@oracle.com/T/#t > X-Cc: stable@vger.kernel.org > Signed-off-by: Chuck Lever <chuck.lever@oracle.com> > --- > fs/nfsd/nfs4xdr.c | 20 ++++++++++---------- > 1 file changed, 10 insertions(+), 10 deletions(-) > > diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c > index 53fac037611c..15cd716e9d91 100644 > --- a/fs/nfsd/nfs4xdr.c > +++ b/fs/nfsd/nfs4xdr.c > @@ -5760,15 +5760,14 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op) > struct nfs4_stateowner *so = resp->cstate.replay_owner; > struct svc_rqst *rqstp = resp->rqstp; > const struct nfsd4_operation *opdesc = op->opdesc; > - int post_err_offset; > + unsigned int op_status_offset; As most uses of "op_status_offset" add XDR_UNIT I'd be incline to keep the "post" offset. unsigned int op_status_offset, post_status_offset; > nfsd4_enc encoder; > - __be32 *p; > > - p = xdr_reserve_space(xdr, 8); > - if (!p) > + if (xdr_stream_encode_u32(xdr, op->opnum) != XDR_UNIT) > + goto release; > + op_status_offset = xdr_stream_pos(xdr); > + if (!xdr_reserve_space(xdr, 4)) The underlying message of this bug seems to be that xdr_reserve_space() is a low-level interface that probably shouldn't be used outside of xdr code. So I wonder if we could use op_status_offset = xdr_stream_pos(xdr); if (xdr_stream_encode_u32(xdr, NFS4ERR_SERVERFAULT) < 0) //will be over-written goto release; post_status_offset = xdr_stream_pos(xdr); instead?? But these are minor thoughts - only use them if you like them. Generally this is a definite improvement. Reviewed-by: NeilBrown <neilb@suse.de> Thanks, NeilBrown
On 12/24/24 6:19 PM, NeilBrown wrote: > On Tue, 24 Dec 2024, cel@kernel.org wrote: >> From: Chuck Lever <chuck.lever@oracle.com> >> >> J. David reports an odd corruption of a READDIR reply sent to a >> FreeBSD client. >> >> xdr_reserve_space() has to do a special trick when the @nbytes value >> requests more space than there is in the current page of the XDR >> buffer. >> >> In that case, xdr_reserve_space() returns a pointer to the start of >> the next page, and then the next call to xdr_reserve_space() invokes >> __xdr_commit_encode() to copy enough of the data item back into the >> previous page to make that data item contiguous across the page >> boundary. >> >> But we need to be careful in the case where buffer space is reserved >> early for a data item that will be inserted into the buffer later. >> >> One such caller, nfsd4_encode_operation(), reserves 8 bytes in the >> encoding buffer for each COMPOUND operation. However, a READDIR >> result can sometimes encode file names so that there are only 4 >> bytes left at the end of the current XDR buffer page (though plenty >> of pages are left to handle the remaining encoding tasks). >> >> If a COMPOUND operation follows the READDIR result (say, a GETATTR), >> then nfsd4_encode_operation() will reserve 8 bytes for the op number >> (9) and the op status (usually NFS4_OK). In this weird case, >> xdr_reserve_space() returns a pointer to byte zero of the next buffer >> page, as it assumes the data item will be copied back into place (in >> the previous page) on the next call to xdr_reserve_space(). >> >> nfsd4_encode_operation() writes the op num into the buffer, then >> saves the next 4-byte location for the op's status code. The next >> xdr_reserve_space() call is part of GETATTR encoding, so the op num >> gets copied back into the previous page, but the saved location for >> the op status continues to point to the wrong spot in the current >> XDR buffer page because __xdr_commit_encode() moved that data item. >> >> After GETATTR encoding is complete, nfsd4_encode_operation() writes >> the op status over the first XDR data item in the GETATTR result. >> The NFS4_OK status code (0) makes it look like there are zero items >> in the GETATTR's attribute bitmask. >> >> The patch description of commit 2825a7f90753 ("nfsd4: allow encoding >> across page boundaries") [2014] remarks that NFSD "can't handle a >> new operation starting close to the end of a page." This bug appears >> to be one reason for that remark. >> >> Reported-by: J David <j.david.lists@gmail.com> >> Closes: https://lore.kernel.org/linux-nfs/3998d739-c042-46b4-8166-dbd6c5f0e804@oracle.com/T/#t >> X-Cc: stable@vger.kernel.org >> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> >> --- >> fs/nfsd/nfs4xdr.c | 20 ++++++++++---------- >> 1 file changed, 10 insertions(+), 10 deletions(-) >> >> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c >> index 53fac037611c..15cd716e9d91 100644 >> --- a/fs/nfsd/nfs4xdr.c >> +++ b/fs/nfsd/nfs4xdr.c >> @@ -5760,15 +5760,14 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op) >> struct nfs4_stateowner *so = resp->cstate.replay_owner; >> struct svc_rqst *rqstp = resp->rqstp; >> const struct nfsd4_operation *opdesc = op->opdesc; >> - int post_err_offset; >> + unsigned int op_status_offset; > > As most uses of "op_status_offset" add XDR_UNIT I'd be incline to keep > the "post" offset. > > unsigned int op_status_offset, post_status_offset; My thinking is that the "op_status_offset" value will be used on every call of this function, but the "post_status_offset" cases are actually exceedingly rare. There's no good reason to maintain that value separately during every call. >> nfsd4_enc encoder; >> - __be32 *p; >> >> - p = xdr_reserve_space(xdr, 8); >> - if (!p) >> + if (xdr_stream_encode_u32(xdr, op->opnum) != XDR_UNIT) >> + goto release; >> + op_status_offset = xdr_stream_pos(xdr); >> + if (!xdr_reserve_space(xdr, 4)) > > The underlying message of this bug seems to be that xdr_reserve_space() > is a low-level interface that probably shouldn't be used outside of xdr > code. > So I wonder if we could use > op_status_offset = xdr_stream_pos(xdr); > if (xdr_stream_encode_u32(xdr, NFS4ERR_SERVERFAULT) < 0) //will be over-written > goto release; > post_status_offset = xdr_stream_pos(xdr); > > instead?? > > But these are minor thoughts - only use them if you like them. > Generally this is a definite improvement. > > Reviewed-by: NeilBrown <neilb@suse.de> Thanks!
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 53fac037611c..15cd716e9d91 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -5760,15 +5760,14 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op) struct nfs4_stateowner *so = resp->cstate.replay_owner; struct svc_rqst *rqstp = resp->rqstp; const struct nfsd4_operation *opdesc = op->opdesc; - int post_err_offset; + unsigned int op_status_offset; nfsd4_enc encoder; - __be32 *p; - p = xdr_reserve_space(xdr, 8); - if (!p) + if (xdr_stream_encode_u32(xdr, op->opnum) != XDR_UNIT) + goto release; + op_status_offset = xdr_stream_pos(xdr); + if (!xdr_reserve_space(xdr, 4)) goto release; - *p++ = cpu_to_be32(op->opnum); - post_err_offset = xdr->buf->len; if (op->opnum == OP_ILLEGAL) goto status; @@ -5809,20 +5808,21 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op) * bug if we had to do this on a non-idempotent op: */ warn_on_nonidempotent_op(op); - xdr_truncate_encode(xdr, post_err_offset); + xdr_truncate_encode(xdr, op_status_offset + XDR_UNIT); } if (so) { - int len = xdr->buf->len - post_err_offset; + int len = xdr->buf->len - (op_status_offset + XDR_UNIT); so->so_replay.rp_status = op->status; so->so_replay.rp_buflen = len; - read_bytes_from_xdr_buf(xdr->buf, post_err_offset, + read_bytes_from_xdr_buf(xdr->buf, op_status_offset + XDR_UNIT, so->so_replay.rp_buf, len); } status: op->status = nfsd4_map_status(op->status, resp->cstate.minorversion); - *p = op->status; + write_bytes_to_xdr_buf(xdr->buf, op_status_offset, + &op->status, XDR_UNIT); release: if (opdesc && opdesc->op_release) opdesc->op_release(&op->u);