Message ID | 173069566284.81717.2360317209010090007@noble.neil.brown.name (mailing list archive)
---|---
State | New
Series | nfsd: fallback to sync COPY if async not possible
On Mon, 2024-11-04 at 15:47 +1100, NeilBrown wrote: > An NFSv4.2 COPY request can explicitly request a synchronous copy. If > that is not requested then the server is free to perform the copy > synchronously or asynchronously. > > In the Linux implementation an async copy requires more resources than a > sync copy. If nfsd cannot allocate these resources, the best response > is to simply perform the copy (or the first 4MB of it) synchronously. > Where does the copy get clamped at 4MB? > This choice may be debatable if the unavailable resource was due to > memory allocation failure - when memalloc fails it might be best to > simply give up as the server is clearly under load. However in the case > that policy prevents another kthread being created there is no benefit > and much cost is failing with NFS4ERR_DELAY. In that case it seems > reasonable to avoid that error in all circumstances. > > So change the out_err case to retry as a sync copy. > > Fixes: aadc3bbea163 ("NFSD: Limit the number of concurrent async COPY operations") > Signed-off-by: NeilBrown <neilb@suse.de> > --- > fs/nfsd/nfs4proc.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c > index fea171ffed62..06e0d9153ca9 100644 > --- a/fs/nfsd/nfs4proc.c > +++ b/fs/nfsd/nfs4proc.c > @@ -1972,6 +1972,7 @@ nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, > wake_up_process(async_copy->copy_task); > status = nfs_ok; > } else { > + retry_sync: > status = nfsd4_do_copy(copy, copy->nf_src->nf_file, > copy->nf_dst->nf_file, true); > } > @@ -1990,8 +1991,7 @@ nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, > } > if (async_copy) > cleanup_async_copy(async_copy); > - status = nfserr_jukebox; > - goto out; > + goto retry_sync; > } > > static struct nfsd4_copy * > > base-commit: 26e6e693936986309c01e8bb80e318d63fda4a44 You're probably right that attempting to do this synchronously is probably best. 
That should avoid another RPC, though at the cost of having to shovel the data into the pagecache. If memory is very tight, I suppose the synchronous copy might also fail with NFS4ERR_DELAY, in which case we're no worse off. Reviewed-by: Jeff Layton <jlayton@kernel.org>
On Mon, Nov 04, 2024 at 08:05:27AM -0500, Jeff Layton wrote: > On Mon, 2024-11-04 at 15:47 +1100, NeilBrown wrote: > > An NFSv4.2 COPY request can explicitly request a synchronous copy. If > > that is not requested then the server is free to perform the copy > > synchronously or asynchronously. > > > > In the Linux implementation an async copy requires more resources than a > > sync copy. If nfsd cannot allocate these resources, the best response > > is to simply perform the copy (or the first 4MB of it) synchronously. > > > > Where does the copy get clamped at 4MB? fs/nfsd/vfs.c: nfsd_copy_file_range() > > This choice may be debatable if the unavailable resource was due to > > memory allocation failure - when memalloc fails it might be best to > > simply give up as the server is clearly under load. However in the case > > that policy prevents another kthread being created there is no benefit > > and much cost is failing with NFS4ERR_DELAY. In that case it seems > > reasonable to avoid that error in all circumstances. > > > > So change the out_err case to retry as a sync copy. 
> > > > Fixes: aadc3bbea163 ("NFSD: Limit the number of concurrent async COPY operations") > > Signed-off-by: NeilBrown <neilb@suse.de> > > --- > > fs/nfsd/nfs4proc.c | 4 ++-- > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c > > index fea171ffed62..06e0d9153ca9 100644 > > --- a/fs/nfsd/nfs4proc.c > > +++ b/fs/nfsd/nfs4proc.c > > @@ -1972,6 +1972,7 @@ nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, > > wake_up_process(async_copy->copy_task); > > status = nfs_ok; > > } else { > > + retry_sync: > > status = nfsd4_do_copy(copy, copy->nf_src->nf_file, > > copy->nf_dst->nf_file, true); > > } > > @@ -1990,8 +1991,7 @@ nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, > > } > > if (async_copy) > > cleanup_async_copy(async_copy); > > - status = nfserr_jukebox; > > - goto out; > > + goto retry_sync; > > } > > > > static struct nfsd4_copy * > > > > base-commit: 26e6e693936986309c01e8bb80e318d63fda4a44 > > > You're probably right that attempting to do this synchronously is > probably best. That should avoid another RPC, though at the cost of > having to shovel the data into the pagecache. > > If memory is very tight, I suppose the synchronous copy might also fail > with NFS4ERR_DELAY, in which case we're no worse off. > > Reviewed-by: Jeff Layton <jlayton@kernel.org>
On Mon, Nov 04, 2024 at 03:47:42PM +1100, NeilBrown wrote: > > An NFSv4.2 COPY request can explicitly request a synchronous copy. If > that is not requested then the server is free to perform the copy > synchronously or asynchronously. > > In the Linux implementation an async copy requires more resources than a > sync copy. If nfsd cannot allocate these resources, the best response > is to simply perform the copy (or the first 4MB of it) synchronously. > > This choice may be debatable if the unavailable resource was due to > memory allocation failure - when memalloc fails it might be best to > simply give up as the server is clearly under load. However in the case > that policy prevents another kthread being created there is no benefit > and much cost is failing with NFS4ERR_DELAY. In that case it seems > reasonable to avoid that error in all circumstances. > > So change the out_err case to retry as a sync copy. > > Fixes: aadc3bbea163 ("NFSD: Limit the number of concurrent async COPY operations") Hi Neil, Why is a Fixes: tag necessary? And why that commit? async copies can fail due to lack of resources on kernels that don't have aadc3bbea163, AFAICT. 
> Signed-off-by: NeilBrown <neilb@suse.de> > --- > fs/nfsd/nfs4proc.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c > index fea171ffed62..06e0d9153ca9 100644 > --- a/fs/nfsd/nfs4proc.c > +++ b/fs/nfsd/nfs4proc.c > @@ -1972,6 +1972,7 @@ nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, > wake_up_process(async_copy->copy_task); > status = nfs_ok; > } else { > + retry_sync: > status = nfsd4_do_copy(copy, copy->nf_src->nf_file, > copy->nf_dst->nf_file, true); > } > @@ -1990,8 +1991,7 @@ nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, > } > if (async_copy) > cleanup_async_copy(async_copy); > - status = nfserr_jukebox; > - goto out; > + goto retry_sync; > } > > static struct nfsd4_copy * > > base-commit: 26e6e693936986309c01e8bb80e318d63fda4a44 > -- > 2.47.0 >
On Tue, 05 Nov 2024, Jeff Layton wrote: > On Mon, 2024-11-04 at 15:47 +1100, NeilBrown wrote: > > An NFSv4.2 COPY request can explicitly request a synchronous copy. If > > that is not requested then the server is free to perform the copy > > synchronously or asynchronously. > > > > In the Linux implementation an async copy requires more resources than a > > sync copy. If nfsd cannot allocate these resources, the best response > > is to simply perform the copy (or the first 4MB of it) synchronously. > > > > Where does the copy get clamped at 4MB? > > > This choice may be debatable if the unavailable resource was due to > > memory allocation failure - when memalloc fails it might be best to > > simply give up as the server is clearly under load. However in the case > > that policy prevents another kthread being created there is no benefit > > and much cost is failing with NFS4ERR_DELAY. In that case it seems > > reasonable to avoid that error in all circumstances. > > > > So change the out_err case to retry as a sync copy. 
> > > > Fixes: aadc3bbea163 ("NFSD: Limit the number of concurrent async COPY operations") > > Signed-off-by: NeilBrown <neilb@suse.de> > > --- > > fs/nfsd/nfs4proc.c | 4 ++-- > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c > > index fea171ffed62..06e0d9153ca9 100644 > > --- a/fs/nfsd/nfs4proc.c > > +++ b/fs/nfsd/nfs4proc.c > > @@ -1972,6 +1972,7 @@ nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, > > wake_up_process(async_copy->copy_task); > > status = nfs_ok; > > } else { > > + retry_sync: > > status = nfsd4_do_copy(copy, copy->nf_src->nf_file, > > copy->nf_dst->nf_file, true); > > } > > @@ -1990,8 +1991,7 @@ nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, > > } > > if (async_copy) > > cleanup_async_copy(async_copy); > > - status = nfserr_jukebox; > > - goto out; > > + goto retry_sync; > > } > > > > static struct nfsd4_copy * > > > > base-commit: 26e6e693936986309c01e8bb80e318d63fda4a44 > > > You're probably right that attempting to do this synchronously is > probably best. That should avoid another RPC, though at the cost of > having to shovel the data into the pagecache. COPY always shovels data into the pagecache. The difference between sync and async is: - sync runs in nfsd thread, async runs in a workqueue - sync stops after 4MB, async loops around after each 4MB until it finishes (plus the obvious protocol differences). > > If memory is very tight, I suppose the synchronous copy might also fail > with NFS4ERR_DELAY, in which case we're no worse off. Memory being tight isn't the interesting case. More COPYs than nfsd threads is the interesting case. > > Reviewed-by: Jeff Layton <jlayton@kernel.org> Thanks, NeilBrown >
On Tue, 05 Nov 2024, Chuck Lever wrote: > On Mon, Nov 04, 2024 at 03:47:42PM +1100, NeilBrown wrote: > > > > An NFSv4.2 COPY request can explicitly request a synchronous copy. If > > that is not requested then the server is free to perform the copy > > synchronously or asynchronously. > > > > In the Linux implementation an async copy requires more resources than a > > sync copy. If nfsd cannot allocate these resources, the best response > > is to simply perform the copy (or the first 4MB of it) synchronously. > > > > This choice may be debatable if the unavailable resource was due to > > memory allocation failure - when memalloc fails it might be best to > > simply give up as the server is clearly under load. However in the case > > that policy prevents another kthread being created there is no benefit > > and much cost is failing with NFS4ERR_DELAY. In that case it seems > > reasonable to avoid that error in all circumstances. > > > > So change the out_err case to retry as a sync copy. > > > > Fixes: aadc3bbea163 ("NFSD: Limit the number of concurrent async COPY operations") > > Hi Neil, > > Why is a Fixes: tag necessary? > > And why that commit? async copies can fail due to lack of resources > > on kernels that don't have aadc3bbea163, AFAICT. I had hoped my commit message would have explained that, though I accept it was not as explicit as it could be. kmalloc(GFP_KERNEL) allocation failures aren't interesting. They never happen for smallish sizes, and if they do then the server is so borked that it hardly matters what we do. The fixed commit introduces a new failure mode that COULD easily be hit in practice. It causes the N+1st COPY to wait indefinitely until at least one other copy completes which, as you observed in that commit, could "run for a long time". I don't think that behaviour is necessary or appropriate. 
Changing the handling for kmalloc failure was just an irrelevant side-effect of changing the behaviour when the number of COPY requests exceeded the number of configured threads. This came up because CVE-2024-49974 was created so I had to do something about the theoretical DoS vector in SLE kernels. I didn't like the patch so I backported Commit 8d915bbf3926 ("NFSD: Force all NFSv4.2 COPY requests to be synchronous") instead (and wondered why it hadn't gone to stable). Thanks, NeilBrown > > > > Signed-off-by: NeilBrown <neilb@suse.de> > > --- > > fs/nfsd/nfs4proc.c | 4 ++-- > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c > > index fea171ffed62..06e0d9153ca9 100644 > > --- a/fs/nfsd/nfs4proc.c > > +++ b/fs/nfsd/nfs4proc.c > > @@ -1972,6 +1972,7 @@ nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, > > wake_up_process(async_copy->copy_task); > > status = nfs_ok; > > } else { > > + retry_sync: > > status = nfsd4_do_copy(copy, copy->nf_src->nf_file, > > copy->nf_dst->nf_file, true); > > } > > @@ -1990,8 +1991,7 @@ nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, > > } > > if (async_copy) > > cleanup_async_copy(async_copy); > > - status = nfserr_jukebox; > > - goto out; > > + goto retry_sync; > > } > > > > static struct nfsd4_copy * > > > > base-commit: 26e6e693936986309c01e8bb80e318d63fda4a44 > > -- > > 2.47.0 > > > > -- > Chuck Lever >
On Tue, Nov 05, 2024 at 07:30:03AM +1100, NeilBrown wrote: > On Tue, 05 Nov 2024, Chuck Lever wrote: > > On Mon, Nov 04, 2024 at 03:47:42PM +1100, NeilBrown wrote: > > > > > > An NFSv4.2 COPY request can explicitly request a synchronous copy. If > > > that is not requested then the server is free to perform the copy > > > synchronously or asynchronously. > > > > > > In the Linux implementation an async copy requires more resources than a > > > sync copy. If nfsd cannot allocate these resources, the best response > > > is to simply perform the copy (or the first 4MB of it) synchronously. > > > > > > This choice may be debatable if the unavailable resource was due to > > > memory allocation failure - when memalloc fails it might be best to > > > simply give up as the server is clearly under load. However in the case > > > that policy prevents another kthread being created there is no benefit > > > and much cost is failing with NFS4ERR_DELAY. In that case it seems > > > reasonable to avoid that error in all circumstances. > > > > > > So change the out_err case to retry as a sync copy. > > > > > > Fixes: aadc3bbea163 ("NFSD: Limit the number of concurrent async COPY operations") > > > > Hi Neil, > > > > Why is a Fixes: tag necessary? > > > > And why that commit? async copies can fail due to lack of resources > > on kernels that don't have aadc3bbea163, AFAICT. > > I had hoped my commit message would have explained that, though I accept > it was not as explicit as it could be. The problem might be that you and I have different understandings of what exactly aadc3bbea163 does. > kmalloc(GFP_KERNEL) allocation failures aren't interesting. They never > happen for smallish sizes, and if they do then the server is so borked > that it hardly matter what we do. > > The fixed commit introduces a new failure mode that COULD easily be hit > in practice. 
It causes the N+1st COPY to wait indefinitely until at > least one other copy completes which, as you observed in that commit, > could "run for a long time". I don't think that behaviour is necessary > or appropriate. The waiting happens on the client. An async COPY operation always completes quickly on the server, in this case with NFS4ERR_DELAY. It does not tie up an nfsd thread. By the way, there are two fixes in this area that now appear in v6.12-rc6 that you should check out. > Changing the handling for kmalloc failure was just an irrelevant > side-effect for changing the behaviour when then number of COPY requests > exceeded the number of configured threads. aadc3bbea163 checks the number of concurrent /async/ COPY requests, which do not tie up nfsd threads, and thus are not limited by the svc_thread count, as synchronous COPY operations are by definition. I'm still thinking about the ramifications of converting an async COPY to a sync COPY in this case. We want to reduce the server workload in this case, rather than accommodate an aggressive client. > This came up because CVE-2024-49974 was created so I had to do something > about the theoretical DoS vector in SLE kernels. I didn't like the > patch so I backported > > Commit 8d915bbf3926 ("NFSD: Force all NFSv4.2 COPY requests to be synchronous") > > instead (and wondered why it hadn't gone to stable). I was conservative about requesting a backport here. However, if a CVE has been filed, and if there is no automation behind that process, you can explicitly request aadc3bbea163 be backported. The problem, to me, was less about server resource depletion and more about client hangs. 
> Thanks, > NeilBrown > > > > > > > > > Signed-off-by: NeilBrown <neilb@suse.de> > > > --- > > > fs/nfsd/nfs4proc.c | 4 ++-- > > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > > > diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c > > > index fea171ffed62..06e0d9153ca9 100644 > > > --- a/fs/nfsd/nfs4proc.c > > > +++ b/fs/nfsd/nfs4proc.c > > > @@ -1972,6 +1972,7 @@ nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, > > > wake_up_process(async_copy->copy_task); > > > status = nfs_ok; > > > } else { > > > + retry_sync: > > > status = nfsd4_do_copy(copy, copy->nf_src->nf_file, > > > copy->nf_dst->nf_file, true); > > > } > > > @@ -1990,8 +1991,7 @@ nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, > > > } > > > if (async_copy) > > > cleanup_async_copy(async_copy); > > > - status = nfserr_jukebox; > > > - goto out; > > > + goto retry_sync; > > > } > > > > > > static struct nfsd4_copy * > > > > > > base-commit: 26e6e693936986309c01e8bb80e318d63fda4a44 > > > -- > > > 2.47.0 > > > > > > > -- > > Chuck Lever > > >
On Tue, 05 Nov 2024, Chuck Lever wrote: > On Tue, Nov 05, 2024 at 07:30:03AM +1100, NeilBrown wrote: > > On Tue, 05 Nov 2024, Chuck Lever wrote: > > > On Mon, Nov 04, 2024 at 03:47:42PM +1100, NeilBrown wrote: > > > > > > > > An NFSv4.2 COPY request can explicitly request a synchronous copy. If > > > > that is not requested then the server is free to perform the copy > > > > synchronously or asynchronously. > > > > > > > > In the Linux implementation an async copy requires more resources than a > > > > sync copy. If nfsd cannot allocate these resources, the best response > > > > is to simply perform the copy (or the first 4MB of it) synchronously. > > > > > > > > This choice may be debatable if the unavailable resource was due to > > > > memory allocation failure - when memalloc fails it might be best to > > > > simply give up as the server is clearly under load. However in the case > > > > that policy prevents another kthread being created there is no benefit > > > > and much cost is failing with NFS4ERR_DELAY. In that case it seems > > > > reasonable to avoid that error in all circumstances. > > > > > > > > So change the out_err case to retry as a sync copy. > > > > > > > > Fixes: aadc3bbea163 ("NFSD: Limit the number of concurrent async COPY operations") > > > > > > Hi Neil, > > > > > > Why is a Fixes: tag necessary? > > > > > > And why that commit? async copies can fail due to lack of resources > > > on kernels that don't have aadc3bbea163, AFAICT. > > > > I had hoped my commit message would have explained that, though I accept > > it was not as explicit as it could be. > > The problem might be that you and I have different understandings of > what exactly aadc3bbea163 does. It might be. My understanding is that it limits the number of concurrent async COPY requests to ->sp_nrthreads and once that limit is reached any further COPY requests that don't explicitly request "synchronous" are refused with NFS4ERR_DELAY. 
> > > > kmalloc(GFP_KERNEL) allocation failures aren't interesting. They never > > happen for smallish sizes, and if they do then the server is so borked > > that it hardly matter what we do. > > > > The fixed commit introduces a new failure mode that COULD easily be hit > > in practice. It causes the N+1st COPY to wait indefinitely until at > > least one other copy completes which, as you observed in that commit, > > could "run for a long time". I don't think that behaviour is necessary > > or appropriate. > > The waiting happens on the client. An async COPY operation always > completes quickly on the server, in this case with NFS4ERR_DELAY. It > does not tie up an nfsd thread. Agreed that it doesn't tie up an nfsd thread. It does tie up a separate kthread for which there is a limit matching the number of nfsd threads (in the pool). Agreed that the waiting happens on the client, but why should there be any waiting at all? The client doesn't know what it is waiting for, so will typically wait a few seconds. In that time many megabytes of sync COPY could have been processed. > > By the way, there are two fixes in this area that now appear in > v6.12-rc6 that you should check out. I'll try to schedule time to have a look - thanks. > > > > Changing the handling for kmalloc failure was just an irrelevant > > side-effect for changing the behaviour when then number of COPY requests > > exceeded the number of configured threads. > > aadc3bbea163 checks the number of concurrent /async/ COPY requests, > which do not tie up nfsd threads, and thus are not limited by the > svc_thread count, as synchronous COPY operations are by definition. They are PRECISELY limited by the svc_thread count. ->sp_nrthreads. + if (atomic_inc_return(&nn->pending_async_copies) > + (int)rqstp->rq_pool->sp_nrthreads) { > > I'm still thinking about the ramifications of converting an async > COPY to a sync COPY in this case. 
We want to reduce the server > workload in this case, rather than accommodate an aggressive client. We are not "converting" an async COPY to a sync COPY. There is no such thing as an "async COPY" in terms of what the client requests. The client can request "COPY" which the server may perform sync or async, or the client can request "COPY synchronous" which the server must perform synchronously, or refuse to perform. By tying up a thread temporarily with a sync COPY we do reduce server workload, while potentially increasing latency to the client. I don't think that "aggressive" is a fair description of the client. "opportunistic" might be reasonable. My current thinking is that we should not start extra threads for handling async copies. We should create a queue of pending copies and any nfsd thread can dequeue a copy and process 4MB each time through "The main request loop" just like it calls nfsd_file_net_dispose() to do a little bit of work. That isn't needed now but I'll need something like that before my dynamic thread pool work can land. > > > > This came up because CVE-2024-49974 was created so I had to do something > > about the theoretical DoS vector in SLE kernels. I didn't like the > > patch so I backported > > > > Commit 8d915bbf3926 ("NFSD: Force all NFSv4.2 COPY requests to be synchronous") > > > > instead (and wondered why it hadn't gone to stable). > > I was conservative about requesting a backport here. However, if a > CVE has been filed, and if there is no automation behind that > process, you can explicitly request aadc3bbea163 be backported. > > The problem, to me, was less about server resource depletion and > more about client hangs. And yet the patch that dealt with the less important server resource depletion was marked for stable, and the patch that dealt with client hangs wasn't?? The CVE was for that less important patch, probably because it contained the magic word "DoS". 
I think 8d915bbf3926 should go to stable but I would like to understand why you felt the need to be conservative. Thanks, NeilBrown > > > > Thanks, > > NeilBrown > > > > > > > > > > > > > > Signed-off-by: NeilBrown <neilb@suse.de> > > > > --- > > > > fs/nfsd/nfs4proc.c | 4 ++-- > > > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > > > > > diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c > > > > index fea171ffed62..06e0d9153ca9 100644 > > > > --- a/fs/nfsd/nfs4proc.c > > > > +++ b/fs/nfsd/nfs4proc.c > > > > @@ -1972,6 +1972,7 @@ nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, > > > > wake_up_process(async_copy->copy_task); > > > > status = nfs_ok; > > > > } else { > > > > + retry_sync: > > > > status = nfsd4_do_copy(copy, copy->nf_src->nf_file, > > > > copy->nf_dst->nf_file, true); > > > > } > > > > @@ -1990,8 +1991,7 @@ nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, > > > > } > > > > if (async_copy) > > > > cleanup_async_copy(async_copy); > > > > - status = nfserr_jukebox; > > > > - goto out; > > > > + goto retry_sync; > > > > } > > > > > > > > static struct nfsd4_copy * > > > > > > > > base-commit: 26e6e693936986309c01e8bb80e318d63fda4a44 > > > > -- > > > > 2.47.0 > > > > > > > > > > -- > > > Chuck Lever > > > > > > > -- > Chuck Lever >
On Tue, Nov 05, 2024 at 11:32:48AM +1100, NeilBrown wrote: > On Tue, 05 Nov 2024, Chuck Lever wrote: > > On Tue, Nov 05, 2024 at 07:30:03AM +1100, NeilBrown wrote: > > > On Tue, 05 Nov 2024, Chuck Lever wrote: > > > > On Mon, Nov 04, 2024 at 03:47:42PM +1100, NeilBrown wrote: > > > > > > > > > > An NFSv4.2 COPY request can explicitly request a synchronous copy. If > > > > > that is not requested then the server is free to perform the copy > > > > > synchronously or asynchronously. > > > > > > > > > > In the Linux implementation an async copy requires more resources than a > > > > > sync copy. If nfsd cannot allocate these resources, the best response > > > > > is to simply perform the copy (or the first 4MB of it) synchronously. > > > > > > > > > > This choice may be debatable if the unavailable resource was due to > > > > > memory allocation failure - when memalloc fails it might be best to > > > > > simply give up as the server is clearly under load. However in the case > > > > > that policy prevents another kthread being created there is no benefit > > > > > and much cost is failing with NFS4ERR_DELAY. In that case it seems > > > > > reasonable to avoid that error in all circumstances. > > > > > > > > > > So change the out_err case to retry as a sync copy. > > > > > > > > > > Fixes: aadc3bbea163 ("NFSD: Limit the number of concurrent async COPY operations") > > > > > > > > Hi Neil, > > > > > > > > Why is a Fixes: tag necessary? > > > > > > > > And why that commit? async copies can fail due to lack of resources > > > > on kernels that don't have aadc3bbea163, AFAICT. > > > > > > I had hoped my commit message would have explained that, though I accept > > > it was not as explicit as it could be. > > > > The problem might be that you and I have different understandings of > > what exactly aadc3bbea163 does. > > It might be. 
> My understanding is that it limits the number of concurrent async > COPY requests to ->sp_nrthreads and once that limit in reached > any further COPY requests that don't explicitly request "synchronous" > are refused with NFS4ERR_DELAY. > > > > kmalloc(GFP_KERNEL) allocation failures aren't interesting. They never > > > happen for smallish sizes, and if they do then the server is so borked > > > that it hardly matter what we do. > > > > > > The fixed commit introduces a new failure mode that COULD easily be hit > > > in practice. It causes the N+1st COPY to wait indefinitely until at > > > least one other copy completes which, as you observed in that commit, > > > could "run for a long time". I don't think that behaviour is necessary > > > or appropriate. > > > > The waiting happens on the client. An async COPY operation always > > completes quickly on the server, in this case with NFS4ERR_DELAY. It > > does not tie up an nfsd thread. > > Agreed that it doesn't tie up an nfsd thread. It does tie up a separate > kthread for which there is a limit matching the number of nfsd threads > (in the pool). > > Agreed that the waiting happens on the client, but why should there be > any waiting at all? The client doesn't know what it is waiting for, so > will typically wait a few seconds. In that time many megabytes of sync > COPY could have been processed. The Linux NFS client's usual delay is, IIRC, 100 msec with exponential backoff. It's possible that the number of async copies is large because they are running on a slow export. Adding more copy work is going to make the situation worse -- and by handling the work with a synchronous COPY, it will tie up threads that should be available for other work. The feedback loop here should result in reducing server workload, not increasing it. > > By the way, there are two fixes in this area that now appear in > > v6.12-rc6 that you should check out. > > I'll try to schedule time to have a look - thanks. 
> > > > Changing the handling for kmalloc failure was just an irrelevant > > > side-effect for changing the behaviour when then number of COPY requests > > > exceeded the number of configured threads. > > > > aadc3bbea163 checks the number of concurrent /async/ COPY requests, > > which do not tie up nfsd threads, and thus are not limited by the > > svc_thread count, as synchronous COPY operations are by definition. > > They are PRECISELY limited by the svc_thread count. ->sp_nrthreads. I was describing the situation /before/ aadc3bbea163 , when there was no limit at all. Note that is an arbitrary limit. We could pick something else if this limit interferes with the dynamic thread count changes. > My current thinking is that we should not start extra threads for > handling async copies. We should create a queue of pending copies and > any nfsd thread can dequeue a copy and process 4MB each time through > "The main request loop" just like it calls nfsd_file_net_dispose() to do > a little bit of work. Having nfsd threads handle this workload again invites a DoS vector. The 4MB chunk limit is there precisely to prevent synchronous COPY operations from tying up nfsd threads for too long. On a slow export, this is still not going to work, so I'd rather see a timer for this protection; say, 30ms, rather than a byte count. If more than 4MB can be handled quickly, that will be good for throughput. Note that we still want to limit the number of background copy operations going on. I don't want a mechanism where a client can start an unbounded amount of work on the server. > > > This came up because CVE-2024-49974 was created so I had to do something > > > about the theoretical DoS vector in SLE kernels. I didn't like the > > > patch so I backported > > > > > > Commit 8d915bbf3926 ("NFSD: Force all NFSv4.2 COPY requests to be synchronous") > > > > > > instead (and wondered why it hadn't gone to stable). > > > > I was conservative about requesting a backport here. 
However, if a > > CVE has been filed, and if there is no automation behind that > > process, you can explicitly request aadc3bbea163 be backported. > > > > The problem, to me, was less about server resource depletion and > > more about client hangs. > > And yet the patch that dealt with the less important server resource > depletion was marked for stable, and the patch that dealt with client > hangs wasn't?? > > The CVE was for that less important patch, probably because it contained > the magic word "DoS". Quite likely. I wasn't consulted before the CVE was opened, nor was I notified that it had been created. Note that distributions are encouraged to evaluate whether a CVE is serious enough to address, rather than simply backporting the fixes automatically. But I know some customers want every CVE handled, so that is sometimes difficult. > I think 8d915bbf3926 should go to stable but I would like to understand > why you felt the need to be conservative. First, I'm told that LTS generally avoids taking backports that overtly change user-visible behavior like disabling server-to-server copy (which requires async COPY to work). That was the main reason for my hesitance. But more importantly, the problem with the automatic backport mechanism is that marked patches are taken /immediately/ into stable. They don't get the kind of soak time that a normally-merged unmarked patch gets. The only way to ensure they get any real-world test experience at all is to not mark them, and then come back to them later and explicitly request a backport. And, generally, we want to know that a patch destined for LTS kernels has actually been applied to and tested on LTS first. Automatically backported patches don't get that verification at all. My overall preference is that Fixes: patches should be ignored by the automation, and that we have a designated NFSD LTS maintainer who will test patches on each LTS kernel and request their backport. 
I haven't found anyone to do that work, so we are limping along with the current situation. I recognize, however, that this needs to improve somehow with only the maintainer resources we have. > > > > > Signed-off-by: NeilBrown <neilb@suse.de> > > > > > --- > > > > > fs/nfsd/nfs4proc.c | 4 ++-- > > > > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > > > > > > > diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c > > > > > index fea171ffed62..06e0d9153ca9 100644 > > > > > --- a/fs/nfsd/nfs4proc.c > > > > > +++ b/fs/nfsd/nfs4proc.c > > > > > @@ -1972,6 +1972,7 @@ nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, > > > > > wake_up_process(async_copy->copy_task); > > > > > status = nfs_ok; > > > > > } else { > > > > > + retry_sync: > > > > > status = nfsd4_do_copy(copy, copy->nf_src->nf_file, > > > > > copy->nf_dst->nf_file, true); > > > > > } > > > > > @@ -1990,8 +1991,7 @@ nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, > > > > > } > > > > > if (async_copy) > > > > > cleanup_async_copy(async_copy); > > > > > - status = nfserr_jukebox; > > > > > - goto out; > > > > > + goto retry_sync; > > > > > } > > > > > > > > > > static struct nfsd4_copy * > > > > > > > > > > base-commit: 26e6e693936986309c01e8bb80e318d63fda4a44 > > > > > -- > > > > > 2.47.0 > > > > > > > > > > > > > -- > > > > Chuck Lever > > > > > > > > > > > -- > > Chuck Lever > > >
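The admission policy being debated above — cap the number of in-flight async COPY operations at the pool's thread count, and fall back to a bounded synchronous copy instead of returning NFS4ERR_DELAY when the cap is hit — can be modeled in a few lines of userspace C. This is an illustrative sketch, not nfsd source; the struct and function names are hypothetical, and the real kernel code uses per-pool atomics:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative model of the aadc3bbea163-style limit: an async COPY
 * is admitted only while fewer async copies are in flight than there
 * are nfsd threads in the pool (->sp_nrthreads in the thread). */
struct copy_pool {
	int sp_nrthreads;	/* threads configured in this pool */
	int nr_async_copies;	/* async copies currently in flight */
};

/* Returns true if the COPY may proceed asynchronously.  On false the
 * caller would fall back to a bounded synchronous copy (the patch
 * under discussion) rather than failing with NFS4ERR_DELAY. */
static bool admit_async_copy(struct copy_pool *p)
{
	if (p->nr_async_copies >= p->sp_nrthreads)
		return false;
	p->nr_async_copies++;
	return true;
}

static void complete_async_copy(struct copy_pool *p)
{
	p->nr_async_copies--;
}
```

With two threads configured, a third concurrent COPY is refused async admission and would run synchronously; once one completes, async admission succeeds again.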
On Wed, 06 Nov 2024, Chuck Lever wrote: > On Tue, Nov 05, 2024 at 11:32:48AM +1100, NeilBrown wrote: > > On Tue, 05 Nov 2024, Chuck Lever wrote: > > > On Tue, Nov 05, 2024 at 07:30:03AM +1100, NeilBrown wrote: > > > > On Tue, 05 Nov 2024, Chuck Lever wrote: > > > > > On Mon, Nov 04, 2024 at 03:47:42PM +1100, NeilBrown wrote: > > > > > > > > > > > > An NFSv4.2 COPY request can explicitly request a synchronous copy. If > > > > > > that is not requested then the server is free to perform the copy > > > > > > synchronously or asynchronously. > > > > > > > > > > > > In the Linux implementation an async copy requires more resources than a > > > > > > sync copy. If nfsd cannot allocate these resources, the best response > > > > > > is to simply perform the copy (or the first 4MB of it) synchronously. > > > > > > > > > > > > This choice may be debatable if the unavailable resource was due to > > > > > > memory allocation failure - when memalloc fails it might be best to > > > > > > simply give up as the server is clearly under load. However in the case > > > > > > that policy prevents another kthread being created there is no benefit > > > > > > and much cost is failing with NFS4ERR_DELAY. In that case it seems > > > > > > reasonable to avoid that error in all circumstances. > > > > > > > > > > > > So change the out_err case to retry as a sync copy. > > > > > > > > > > > > Fixes: aadc3bbea163 ("NFSD: Limit the number of concurrent async COPY operations") > > > > > > > > > > Hi Neil, > > > > > > > > > > Why is a Fixes: tag necessary? > > > > > > > > > > And why that commit? async copies can fail due to lack of resources > > > > > on kernels that don't have aadc3bbea163, AFAICT. > > > > > > > > I had hoped my commit message would have explained that, though I accept > > > > it was not as explicit as it could be. > > > > > > The problem might be that you and I have different understandings of > > > what exactly aadc3bbea163 does. > > > > It might be. 
> > My understanding is that it limits the number of concurrent async > > COPY requests to ->sp_nrthreads and once that limit is reached > > any further COPY requests that don't explicitly request "synchronous" > > are refused with NFS4ERR_DELAY. > > > > > > kmalloc(GFP_KERNEL) allocation failures aren't interesting. They never > > > > happen for smallish sizes, and if they do then the server is so borked > > > > that it hardly matters what we do. > > > > > > > > The fixed commit introduces a new failure mode that COULD easily be hit > > > > in practice. It causes the N+1st COPY to wait indefinitely until at > > > > least one other copy completes which, as you observed in that commit, > > > > could "run for a long time". I don't think that behaviour is necessary > > > > or appropriate. > > > > > > The waiting happens on the client. An async COPY operation always > > > completes quickly on the server, in this case with NFS4ERR_DELAY. It > > > does not tie up an nfsd thread. > > > > Agreed that it doesn't tie up an nfsd thread. It does tie up a separate > > kthread for which there is a limit matching the number of nfsd threads > > (in the pool). > > > > Agreed that the waiting happens on the client, but why should there be > > any waiting at all? The client doesn't know what it is waiting for, so > > will typically wait a few seconds. In that time many megabytes of sync > > COPY could have been processed. > > The Linux NFS client's usual delay is, IIRC, 100 msec with > exponential backoff. Yep, up to 15 seconds. > > It's possible that the number of async copies is large because they > are running on a slow export. Adding more copy work is going to make > the situation worse -- and by handling the work with a synchronous > COPY, it will tie up threads that should be available for other > work. Should be available for "work" yes, but why "other work"? Is COPY work not as important as READ or WRITE or GETATTR work? 
READ/WRITE are limited to 1MB, sync-COPY to 4MB so a small difference there, but it doesn't seem substantial. > > The feedback loop here should result in reducing server workload, > not increasing it. I agree with not increasing it. I don't see the rationale for reducing workload, only for limiting it. > > > > > > By the way, there are two fixes in this area that now appear in > > > > v6.12-rc6 that you should check out. > > > > I'll try to schedule time to have a look - thanks. > > > > > > Changing the handling for kmalloc failure was just an irrelevant > > > > side-effect for changing the behaviour when the number of COPY requests > > > > exceeded the number of configured threads. > > > > > > aadc3bbea163 checks the number of concurrent /async/ COPY requests, > > > which do not tie up nfsd threads, and thus are not limited by the > > > svc_thread count, as synchronous COPY operations are by definition. > > > > They are PRECISELY limited by the svc_thread count. ->sp_nrthreads. > > > > I was describing the situation /before/ aadc3bbea163, when there > > was no limit at all. > > > > Note that this is an arbitrary limit. We could pick something else if > > this limit interferes with the dynamic thread count changes. > > > > > > My current thinking is that we should not start extra threads for > handling async copies. We should create a queue of pending copies and > any nfsd thread can dequeue a copy and process 4MB each time through > "The main request loop" just like it calls nfsd_file_net_dispose() to do > a little bit of work. Having nfsd threads handle this workload again invites a DoS vector. Any more so than having nfsd threads handling a WRITE workload? > > The 4MB chunk limit is there precisely to prevent synchronous COPY > operations from tying up nfsd threads for too long. On a slow export, > this is still not going to work, so I'd rather see a timer for this > protection; say, 30ms, rather than a byte count. 
If more than 4MB > can be handled quickly, that will be good for throughput. That sounds like a good goal. Ideally we would need a way to negotiate a window with write-back throttling so that we don't bother reading until we know that writing to the page-cache won't block. Certainly worth exploring. > > Note that we still want to limit the number of background copy > operations going on. I don't want a mechanism where a client can > start an unbounded amount of work on the server. This isn't obvious to me. The server makes no promise concerning the throughput it will provide. Having a list of COPY requests that add up to many terabytes isn't intrinsically a problem. Having millions of COPY requests in the list *might* be a little bit of a burden. Using __GFP_NORETRY might help put a natural limit on that. > > > > > > This came up because CVE-2024-49974 was created so I had to do something > > > > about the theoretical DoS vector in SLE kernels. I didn't like the > > > > patch so I backported > > > > > > > > Commit 8d915bbf3926 ("NFSD: Force all NFSv4.2 COPY requests to be synchronous") > > > > > > > > instead (and wondered why it hadn't gone to stable). > > > > > > I was conservative about requesting a backport here. However, if a > > > CVE has been filed, and if there is no automation behind that > > > process, you can explicitly request aadc3bbea163 be backported. > > > > > > The problem, to me, was less about server resource depletion and > > > more about client hangs. > > > > And yet the patch that dealt with the less important server resource > > depletion was marked for stable, and the patch that dealt with client > > hangs wasn't?? > > > > The CVE was for that less important patch, probably because it contained > > the magic word "DoS". > > Quite likely. I wasn't consulted before the CVE was opened, nor was > I notified that it had been created. 
> > Note that distributions are encouraged to evaluate whether a CVE is > serious enough to address, rather than simply backporting the fixes > automatically. But I know some customers want every CVE handled, so > that is sometimes difficult. Yes, it needs to be handled. Declaring it invalid is certainly an option for handling it. I didn't quite feel I could justify that in this case. > > > > I think 8d915bbf3926 should go to stable but I would like to understand > > why you felt the need to be conservative. > > First, I'm told that LTS generally avoids taking backports that > overtly change user-visible behavior like disabling server-to-server > copy (which requires async COPY to work). That was the main reason > for my hesitance. Why does server-to-server require async COPY? RFC 7862, section 4.5 "Inter-Server Copy", says "The destination server may perform the copy synchronously or asynchronously." but I see that nfsd4_copy() returns nfs4err_notsupp if the inter-server copy_is_sync(), but it isn't immediately obvious to me why. The patch which landed this functionality doesn't explain the restriction. I guess that with multiple 4MB sync COPY requests the server would need to repeatedly mount and unmount the source server which could be unnecessary work - or would need to cache the mount and know when to unmount it.... On the other hand, changing the user-visible behaviour of the client unexpectedly hanging while waiting for a server-side copy completion notification that it will never get seems like a user-visible change that would be desirable. I'm starting to see why this is awkward. > > But more importantly, the problem with the automatic backport > mechanism is that marked patches are taken /immediately/ into > stable. They don't get the kind of soak time that a normally-merged > unmarked patch gets. The only way to ensure they get any real-world > test experience at all is to not mark them, and then come back to > them later and explicitly request a backport. 
> > And, generally, we want to know that a patch destined for LTS > kernels has actually been applied to and tested on LTS first. > Automatically backported patches don't get that verification at all. I thought it was possible to mark patches to tell the stable team exactly what you want. Greg certainly seems eager to give maintainers as much control as they ask for - without requiring them to do anything they don't want to do. If you have a clear idea of what you want, it might be good to spell that out and ask how to achieve it. > > My overall preference is that Fixes: patches should be ignored by > the automation, and that we have a designated NFSD LTS maintainer > who will test patches on each LTS kernel and request their backport. > I haven't found anyone to do that work, so we are limping along with > the current situation. I recognize, however, that this needs to > improve somehow with only the maintainer resources we have. :-) Thanks, NeilBrown 
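The client retry behaviour mentioned in this exchange — roughly 100 msec on the first NFS4ERR_DELAY, doubling with each retry, capped somewhere around 15 seconds — can be sketched as a tiny model. The constants and the function name here come from the thread's description, not from the client source:

```c
#include <assert.h>

/* Model of the NFS client's NFS4ERR_DELAY retry backoff as described
 * in the discussion: start near 100 msec, double on each retry, and
 * never exceed 15 seconds.  Constants are illustrative. */
#define RETRY_MIN_MS	100u
#define RETRY_MAX_MS	15000u

static unsigned int next_retry_delay_ms(unsigned int prev_ms)
{
	if (prev_ms == 0)
		return RETRY_MIN_MS;		/* first retry */
	if (prev_ms > RETRY_MAX_MS / 2)
		return RETRY_MAX_MS;		/* clamp at ~15 s */
	return prev_ms * 2;
}
```

This is why Neil's objection has teeth: a client that hits the cap sits idle for seconds per retry, time in which many megabytes of synchronous COPY could have been served.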
On Wed, Nov 06, 2024 at 08:06:40AM +1100, NeilBrown wrote: > On Wed, 06 Nov 2024, Chuck Lever wrote: > > But more importantly, the problem with the automatic backport > > mechanism is that marked patches are taken /immediately/ into > > stable. They don't get the kind of soak time that a normally-merged > > unmarked patch gets. The only way to ensure they get any real-world > > test experience at all is to not mark them, and then come back to > > them later and explicitly request a backport. > > > > And, generally, we want to know that a patch destined for LTS > > kernels has actually been applied to and tested on LTS first. > > Automatically backported patches don't get that verification at all. > > I thought it was possible to mark patches to tell the stable team > exactly what you want. Greg certainly seems eager to give maintainers as > much control as they ask for - without requiring them to do anything > they don't want to do. Yes, Greg and Sasha want to do something constructive. I see suggestions flying by from time to time. The last specific one, I think, was rejected by Linus. It appears to be an ongoing conversation. > If you have a clear idea of what you want, it > might be good to spell that out and ask how to achieve it. Perhaps having a separate review process, where each patch in nfsd-next is audited just before the merge window to see whether it should be marked for stable, would help. Keeping written notes on these decisions would also be helpful; when an explicit backport request comes by, Greg asks why it wasn't marked when it went into Linus' branch. Making it easier / more automated to test-apply such patches to LTS kernels is also something to think about. I have nightly CI for those branches, but it tests what is already in the LTS queue rather than testing NFSD-specific patches.
On Tue, Nov 5, 2024 at 4:06 PM NeilBrown <neilb@suse.de> wrote: > > On Wed, 06 Nov 2024, Chuck Lever wrote: > [...]
> > > > Having nfsd threads handle this workload again invites a DoS vector. > > Any more so than having nfsd threads handling a WRITE workload? Isn't a difference between a COPY and a WRITE the fact that the server has the ability to restrict the TCP window of the client sending the bytes? And for simplicity's sake, if we assume the client/server pair has a single TCP stream, then when the window size is limited other WRITEs are also prevented from sending more data. But a COPY operation doesn't require much to be sent, and preventing the client from sending another COPY can't be achieved through TCP window size. > > The 4MB chunk limit is there precisely to prevent synchronous COPY > > operations from tying up nfsd threads for too long. On a slow export, > > this is still not going to work, so I'd rather see a timer for this > > protection; say, 30ms, rather than a byte count. If more than 4MB > > can be handled quickly, that will be good for throughput. > > That sounds like a good goal. Ideally we would need a way to negotiate > a window with write-back throttling so that we don't bother reading > until we know that writing to the page-cache won't block. Certainly > worth exploring. > > > > Note that we still want to limit the number of background copy > > operations going on. I don't want a mechanism where a client can > > start an unbounded amount of work on the server. > > This isn't obvious to me. The server makes no promise concerning the > throughput it will provide. Having a list of COPY requests that add up > to many terabytes isn't intrinsically a problem. Having millions of > COPY requests in the list *might* be a little bit of a burden. Using > __GFP_NORETRY might help put a natural limit on that. > > > > > > > > > This came up because CVE-2024-49974 was created so I had to do something > > > > > about the theoretical DoS vector in SLE kernels. 
I didn't like the > > > > > patch so I backported > > > > > > > > > > Commit 8d915bbf3926 ("NFSD: Force all NFSv4.2 COPY requests to be synchronous") I'm doing the same for RHEL. But the fact that a CVE was created seems like a flaw of the CVE creation process in this case. It should have never been made a CVE. > > > > > > > > > > instead (and wondered why it hadn't gone to stable). > > > > > > > > I was conservative about requesting a backport here. However, if a > > > > CVE has been filed, and if there is no automation behind that > > > > process, you can explicitly request aadc3bbea163 be backported. > > > > > > > > The problem, to me, was less about server resource depletion and > > > > more about client hangs. > > > > > > And yet the patch that dealt with the less important server resource > > > depletion was marked for stable, and the patch that dealt with client > > > hangs wasn't?? > > > > > > The CVE was for that less important patch, probably because it contained > > > the magic word "DoS". > > > > Quite likely. I wasn't consulted before the CVE was opened, nor was > > I notified that it had been created. > > > > Note that distributions are encouraged to evaluate whether a CVE is > > serious enough to address, rather than simply backporting the fixes > > automatically. But I know some customers want every CVE handled, so > > that is sometimes difficult. > > Yes, it needs to be handled. Declaring it invalid is certainly an > option for handling it. I didn't quite feel I could justify that in > this case. > > > > > > > > I think 8d915bbf3926 should go to stable but I would like to understand > > > why you felt the need to be conservative. > > > > First, I'm told that LTS generally avoids taking backports that > > overtly change user-visible behavior like disabling server-to-server > > copy (which requires async COPY to work). That was the main reason > > for my hesitance. > > Why does server-to-server require async COPY? > > RFC 7862 section 4.5. 
Inter-Server Copy says > The destination server may perform the copy synchronously or > asynchronously. > but I see that nfsd4_copy() returns nfs4err_notsupp if the inter-server > copy_is_sync(), but it isn't immediately obvious to me why. The patch > which landed this functionality doesn't explain the restriction. The choice (my choice) for not implementing a synchronous inter-server copy was from my understanding that such an operation would take too much time and tie up a knfsd thread. > I guess that with multiple 4MB sync COPY requests the server would need > to repeatedly mount and unmount the source server which could be > unnecessary work - or would need to cache the mount and know when to > unmount it.... The implementation caches the (source server) mount for a period of time. But needing to issue multiple COPYs can eventually impact performance. > On the other hand, changing the user-visible behaviour of the client > unexpectedly hanging while waiting for a server-side copy completion > notification that it will never get seems like a user-visible change > that would be desirable. > I'm starting to see why this is awkward. > > > > But more importantly, the problem with the automatic backport > > mechanism is that marked patches are taken /immediately/ into > > stable. They don't get the kind of soak time that a normally-merged > > unmarked patch gets. The only way to ensure they get any real-world > > test experience at all is to not mark them, and then come back to > > them later and explicitly request a backport. > > > > And, generally, we want to know that a patch destined for LTS > > kernels has actually been applied to and tested on LTS first. > > Automatically backported patches don't get that verification at all. > > I thought it was possible to mark patches to tell the stable team > exactly what you want. Greg certainly seems eager to give maintainers as > much control as they ask for - without requiring them to do anything > they don't want to do. 
If you have a clear idea of what you want, it > might be good to spell that out and ask how to achieve it. > > > > My overall preference is that Fixes: patches should be ignored by > > the automation, and that we have a designated NFSD LTS maintainer > > who will test patches on each LTS kernel and request their backport. > > I haven't found anyone to do that work, so we are limping along with > > the current situation. I recognize, however, that this needs to > > improve somehow with only the maintainer resources we have. > > :-) > > Thanks, > NeilBrown
On Wed, 06 Nov 2024, Olga Kornievskaia wrote: > On Tue, Nov 5, 2024 at 4:06 PM NeilBrown <neilb@suse.de> wrote: > > > > On Wed, 06 Nov 2024, Chuck Lever wrote: > > > > > > Having nfsd threads handle this workload again invites a DoS vector. > > > > Any more so than having nfsd threads handling a WRITE workload? > > Isn't a difference between a COPY and a WRITE the fact that the server > has the ability to restrict the TCP window of the client sending the > bytes? And for simplicity's sake, if we assume the client/server pair has a > single TCP stream, then when the window size is limited other WRITEs > are also prevented from sending more data. But a COPY operation > doesn't require much to be sent, and preventing the client from sending > another COPY can't be achieved through TCP window size. I think you are saying that the WRITE requests are naturally throttled. However, that isn't necessarily the case. If the network is significantly faster than the storage path, and if a client (or several clients) send enough concurrent WRITE requests, then all nfsd threads could get stuck in writeback throttling code - which could be seen as a denial of service as no other requests could be serviced until congestion eases. So maybe some threads should be reserved for non-IO requests and so would avoid dequeuing WRITE and READ and COPY requests - and would not be involved in async COPY. So I'm still not convinced that COPY in any way introduced a new DoS problem - provided the limit on async-copy threads is in place. I wonder if we could use request deferral to delay IO requests until an IO thread is available... > > > > The 4MB chunk limit is there precisely to prevent synchronous COPY > > > operations from tying up nfsd threads for too long. On a slow export, > > > this is still not going to work, so I'd rather see a timer for this > > > protection; say, 30ms, rather than a byte count. If more than 4MB > > > can be handled quickly, that will be good for throughput. 
> > > > That sounds like a good goal. Ideally we would need a way to negotiate > > a window with write-back throttling so that we don't bother reading > > until we know that writing to the page-cache won't block. Certainly > > worth exploring. > > > > > > > > Note that we still want to limit the number of background copy > > > operations going on. I don't want a mechanism where a client can > > > start an unbounded amount of work on the server. > > > > This isn't obvious to me. The server makes no promise concerning the > > throughput it will provide. Having a list of COPY requests that add up > > to many terabytes isn't intrinsically a problem. Having millions of > > COPY requests in the list *might* be a little bit of a burden. Using > > __GFP_NORETRY might help put a natural limit on that. > > > > > > > > > > > > > > This came up because CVE-2024-49974 was created so I had to do something > > > > > > about the theoretical DoS vector in SLE kernels. I didn't like the > > > > > > patch so I backported > > > > > > > > > > > > Commit 8d915bbf3926 ("NFSD: Force all NFSv4.2 COPY requests to be synchronous") > > I'm doing the same for RHEL. But the fact that CVE was created seems > like a flaw of the CVE creation process in this case. It should have > never been made a CVE. I think everyone agrees that the CVE process is flawed. Fixing it is not so easy :-( I'm glad you are doing the same. I'm a little concerned that this disabled inter-server copies too. I guess the answer to that is to help resolve the problems with async copy so that it can be re-enabled. > > > > > > > > > > > > > instead (and wondered why it hadn't gone to stable). > > > > > > > > > > I was conservative about requesting a backport here. However, if a > > > > > CVE has been filed, and if there is no automation behind that > > > > > process, you can explicitly request aadc3bbea163 be backported. 
> > > > > The problem, to me, was less about server resource depletion and
> > > > > more about client hangs.
> > > >
> > > > And yet the patch that dealt with the less important server resource
> > > > depletion was marked for stable, and the patch that dealt with client
> > > > hangs wasn't??
> > > >
> > > > The CVE was for that less important patch, probably because it contained
> > > > the magic word "DoS".
> > >
> > > Quite likely.  I wasn't consulted before the CVE was opened, nor was
> > > I notified that it had been created.
> > >
> > > Note that distributions are encouraged to evaluate whether a CVE is
> > > serious enough to address, rather than simply backporting the fixes
> > > automatically.  But I know some customers want every CVE handled, so
> > > that is sometimes difficult.
> >
> > Yes, it needs to be handled.  Declaring it invalid is certainly an
> > option for handling it.  I didn't quite feel I could justify that in
> > this case.
> >
> > > > I think 8d915bbf3926 should go to stable but I would like to understand
> > > > why you felt the need to be conservative.
> > >
> > > First, I'm told that LTS generally avoids taking backports that
> > > overtly change user-visible behavior like disabling server-to-server
> > > copy (which requires async COPY to work).  That was the main reason
> > > for my hesitance.
> >
> > Why does server-to-server require async COPY?
> > RFC 7862 section 4.5, "Inter-Server Copy", says
> >     The destination server may perform the copy synchronously or
> >     asynchronously.
> > but I see that nfsd4_copy() returns nfs4err_notsupp if the inter-server
> > copy_is_sync(), but it isn't immediately obvious to me why.  The patch
> > which landed this functionality doesn't explain the restriction.
>
> The choice (my choice) for not implementing a synchronous inter-server
> copy was from my understanding that such operation would be taking too
> much time and tie up a knfsd thread.
I guess either the mount or the read could add indefinite delays...

Thanks.

> > I guess that with multiple 4MB sync COPY requests the server would need
> > to repeatedly mount and unmount the source server which could be
> > unnecessary work - or would need to cache the mount and know when to
> > unmount it....
>
> The implementation caches the (source server) mount for a period of
> time.  But needing to issue multiple COPYs can eventually impact
> performance.

True, but I think that is the goal.  If there are too many async COPYs,
force the rest to be sync so as to slow down the client and limit the
impact on the server.

Thanks,
NeilBrown
On Wed, Nov 06, 2024 at 03:30:54PM +1100, NeilBrown wrote:
> On Wed, 06 Nov 2024, Olga Kornievskaia wrote:
> > On Tue, Nov 5, 2024 at 4:06 PM NeilBrown <neilb@suse.de> wrote:
> > >
> > > On Wed, 06 Nov 2024, Chuck Lever wrote:
> > > >
> > > > Having nfsd threads handle this workload again invites a DoS vector.
> > >
> > > Any more so than having nfsd threads handle a WRITE workload?
> >
> > Isn't a difference between a COPY and a WRITE the fact that the server
> > has the ability to restrict the TCP window of the client sending the
> > bytes?  And for simplicity's sake, if we assume client/server has a
> > single TCP stream, then when the window size is limited other WRITEs
> > are also prevented from sending more data.  But a COPY operation
> > doesn't require much to be sent, and preventing the client from sending
> > another COPY can't be achieved through the TCP window size.
>
> I think you are saying that the WRITE requests are naturally throttled.
> However that isn't necessarily the case.  If the network is
> significantly faster than the storage path, and if a client (or several
> clients) send enough concurrent WRITE requests, then all nfsd threads
> could get stuck in writeback throttling code - which could be seen as a
> denial of service as no other requests could be serviced until
> congestion eases.

I agree that WRITE throttling needs attention too, but limiting
background COPY seems like the more urgent matter.  Unlike READ or
WRITE, COPY isn't restricted to a single 1MB chunk.

> So maybe some threads should be reserved for non-IO requests and so
> would avoid dequeuing WRITE and READ and COPY requests - and would not
> be involved in async COPY.

I'm not enthusiastic about "reserving threads for other purposes".  The
mechanism that services incoming network requests is simply not at the
right layer to sort I/O and non-I/O NFS requests.
Instead NFSD will need some kind of two-layer request handling mechanism
where any request that might take time or resources will need to be
moved out of the layer that handles ingress network requests.  I
hesitate to call it deferral, because I suspect it will be a hot path,
unlike the current svc_defer.

> So I'm still not convinced that COPY in any way introduced a new DoS
> problem - providing the limit of async-copy threads is in place.

Because the kernel doesn't do background COPY at all right now, I'm
going to defer this change until after the v6.13 merge window.  The
patch proposed at the beginning of this thread doesn't seem like a fix
to me, but rather a change in policy.

> > > > > > > This came up because CVE-2024-49974 was created so I had to do something
> > > > > > > about the theoretical DoS vector in SLE kernels.  I didn't like the
> > > > > > > patch so I backported
> > > > > > >
> > > > > > > Commit 8d915bbf3926 ("NFSD: Force all NFSv4.2 COPY requests to be synchronous")
> >
> > I'm doing the same for RHEL.  But the fact that a CVE was created seems
> > like a flaw of the CVE creation process in this case.  It should have
> > never been made a CVE.
>
> I think everyone agrees that the CVE process is flawed.  Fixing it is
> not so easy :-(
> I'm glad you are doing the same.  I'm a little concerned that this
> disabled inter-server copies too.  I guess the answer to that is to help
> resolve the problems with async copy so that it can be re-enabled.

We've been working on restoring background COPY all summer.  It is a
priority.  One issue is that a callback completion mechanism such as
CB_OFFLOAD is inherently unreliable.  The SCSI community has loads of
implementation experience that backs this statement up.  I've got an
implementation of OFFLOAD_STATUS ready for the Linux NFS client so that
it can poll for COPY completion if it thinks it has missed the
CB_OFFLOAD callback.
Another issue here is that RFC 7862 Section 4.8 wants each COPY stateid to remain until the server sees the CB_OFFLOAD response. NFSD needs to have a mechanism in place to ensure that a rogue client or poor network conditions don't allow the set of waiting COPY stateids to grow large over time.
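[Editor's note: the OFFLOAD_STATUS recovery Chuck describes - a client that queries the server for completion when it suspects a lost CB_OFFLOAD - is essentially polling with backoff.  A minimal userspace sketch, with a stand-in for the actual NFS round trip; the interval values and poll cap are assumptions, not anything from the proposed implementation.]

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the server-side state an OFFLOAD_STATUS query would
 * consult; a real client would send the operation to the server that
 * holds the COPY stateid. */
struct fake_copy {
	int polls_needed;	/* copy completes after this many queries */
	int polls_seen;
};

static bool offload_status(struct fake_copy *c)
{
	c->polls_seen++;
	return c->polls_seen >= c->polls_needed;
}

/* Poll until the offloaded copy completes, doubling the wait between
 * queries up to a cap.  Returns the number of queries issued, or -1 if
 * we give up after @max_polls attempts. */
static int poll_offload(struct fake_copy *c, int max_polls)
{
	int interval_ms = 10;	/* assumed initial interval */
	int polls;

	for (polls = 1; polls <= max_polls; polls++) {
		if (offload_status(c))
			return polls;
		/* a real client would sleep for interval_ms here */
		if (interval_ms < 1000)
			interval_ms *= 2;	/* exponential backoff */
	}
	return -1;
}
```

Giving up after a bounded number of polls also matters for the stateid concern above: a client that stops polling still lets the server reap the COPY stateid on lease expiry rather than holding it indefinitely.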
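[Editor's note: for reference, the control flow the proposed patch moves to - try to start an async copy, and when the concurrency cap is hit fall back to a synchronous copy instead of returning NFS4ERR_DELAY - can be sketched in plain C.  MAX_ASYNC_COPIES is an assumed stand-in for nfsd's per-client limit, not the real constant.]

```c
#include <assert.h>

#define MAX_ASYNC_COPIES 4	/* assumed cap, for illustration */

static int active_async_copies;

enum copy_mode { COPY_ASYNC, COPY_SYNC };

/* Decide how to service a COPY that did not request sync behavior. */
static enum copy_mode start_copy(void)
{
	if (active_async_copies < MAX_ASYNC_COPIES) {
		active_async_copies++;	/* slot available: go async */
		return COPY_ASYNC;
	}
	/* No slot: degrade to a synchronous copy rather than failing
	 * with an error, which also throttles the client naturally. */
	return COPY_SYNC;
}

static void finish_async_copy(void)
{
	active_async_copies--;	/* release the slot */
}
```

This is the policy question debated in the thread: whether exhausting the async slots should slow the client down (sync fallback) or push back with NFS4ERR_DELAY.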