Message ID | 20240904040444.56070-1-ebiggers@kernel.org (mailing list archive)
---|---
State | Rejected, archived |
Series | dm verity: don't use WQ_MEM_RECLAIM
On Tue, Sep 03, 2024 at 09:04:44PM -0700, Eric Biggers wrote:
> From: Eric Biggers <ebiggers@google.com>
> 
> Since dm-verity doesn't support writes, the kernel's memory reclaim code
> will never wait on dm-verity work. That makes the use of WQ_MEM_RECLAIM
> in dm-verity unnecessary. WQ_MEM_RECLAIM has been present from the
> beginning of dm-verity, but I could not find a justification for it;
> I suspect it was just copied from dm-crypt which does support writes.
> 
> Therefore, remove WQ_MEM_RECLAIM from dm-verity. This eliminates the
> creation of an unnecessary rescuer thread per dm-verity device.
> 
> Signed-off-by: Eric Biggers <ebiggers@google.com>

Thanks!

Reviewed-by: Mike Snitzer <snitzer@kernel.org>
On Tue, 3 Sep 2024, Eric Biggers wrote:

> From: Eric Biggers <ebiggers@google.com>
> 
> Since dm-verity doesn't support writes, the kernel's memory reclaim code
> will never wait on dm-verity work. That makes the use of WQ_MEM_RECLAIM
> in dm-verity unnecessary. WQ_MEM_RECLAIM has been present from the
> beginning of dm-verity, but I could not find a justification for it;
> I suspect it was just copied from dm-crypt which does support writes.
> 
> Therefore, remove WQ_MEM_RECLAIM from dm-verity. This eliminates the
> creation of an unnecessary rescuer thread per dm-verity device.
> 
> Signed-off-by: Eric Biggers <ebiggers@google.com>

Hmm. I can think about a case where you have read-only dm-verity device,
on the top of that you have dm-snapshot device and on the top of that you
have a writable filesystem.

When the filesystem needs to write data, it submits some write bios. When
dm-snapshot receives these write bios, it will read from the dm-verity
device and write to the snapshot's exception store device. So, dm-verity
needs WQ_MEM_RECLAIM in this case.

Mikulas

> ---
>  drivers/md/dm-verity-target.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/md/dm-verity-target.c b/drivers/md/dm-verity-target.c
> index cf659c8feb29f..051e84ca401dc 100644
> --- a/drivers/md/dm-verity-target.c
> +++ b/drivers/md/dm-verity-target.c
> @@ -1488,11 +1488,11 @@ static int verity_ctr(struct dm_target *ti, unsigned int argc, char **argv)
>  	 * Also as required for the "try_verify_in_tasklet" feature: WQ_HIGHPRI
>  	 * allows verify_wq to preempt softirq since verification in BH workqueue
>  	 * will fall-back to using it for error handling (or if the bufio cache
>  	 * doesn't have required hashes).
>  	 */
> -	v->verify_wq = alloc_workqueue("kverityd", WQ_MEM_RECLAIM | WQ_HIGHPRI, 0);
> +	v->verify_wq = alloc_workqueue("kverityd", WQ_HIGHPRI, 0);
>  	if (!v->verify_wq) {
>  		ti->error = "Cannot allocate workqueue";
>  		r = -ENOMEM;
>  		goto bad;
>  	}
> 
> base-commit: 88fac17500f4ea49c7bac136cf1b27e7b9980075
> -- 
> 2.46.0
> 
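The reason verify_wq sits in that write path at all is that every dm-verity read completion is handed off to the workqueue for hash verification before the bio can be ended. The following is a simplified sketch of that hand-off, trimmed from verity_end_io() in drivers/md/dm-verity-target.c; error handling and the BH/tasklet path of the real function are omitted:

```c
/*
 * Simplified sketch of drivers/md/dm-verity-target.c behaviour: when a
 * read of the underlying data device completes, verification is deferred
 * to the per-device workqueue.  The bio is not ended until verity_work()
 * has run, so anything waiting on that read - including a dm-snapshot
 * copy-on-write of the same block - waits on verify_wq making progress.
 */
static void verity_end_io(struct bio *bio)
{
	struct dm_verity_io *io = bio->bi_private;

	INIT_WORK(&io->work, verity_work);
	queue_work(io->v->verify_wq, &io->work);
}
```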
On Thu, Sep 05, 2024 at 08:21:46PM +0200, Mikulas Patocka wrote:
> 
> 
> On Tue, 3 Sep 2024, Eric Biggers wrote:
> 
> > From: Eric Biggers <ebiggers@google.com>
> > 
> > Since dm-verity doesn't support writes, the kernel's memory reclaim code
> > will never wait on dm-verity work. That makes the use of WQ_MEM_RECLAIM
> > in dm-verity unnecessary. WQ_MEM_RECLAIM has been present from the
> > beginning of dm-verity, but I could not find a justification for it;
> > I suspect it was just copied from dm-crypt which does support writes.
> > 
> > Therefore, remove WQ_MEM_RECLAIM from dm-verity. This eliminates the
> > creation of an unnecessary rescuer thread per dm-verity device.
> > 
> > Signed-off-by: Eric Biggers <ebiggers@google.com>
> 
> Hmm. I can think about a case where you have read-only dm-verity device,
> on the top of that you have dm-snapshot device and on the top of that you
> have a writable filesystem.
> 
> When the filesystem needs to write data, it submits some write bios. When
> dm-snapshot receives these write bios, it will read from the dm-verity
> device and write to the snapshot's exception store device. So, dm-verity
> needs WQ_MEM_RECLAIM in this case.
> 
> Mikulas
> 

Yes, unfortunately that sounds correct.

This means that any workqueue involved in fulfilling block device I/O,
regardless of whether that I/O is read or write, has to use WQ_MEM_RECLAIM.

I wonder if there's any way to safely share the rescuer threads.

- Eric
On Thu, Sep 05, 2024 at 03:35:55PM -0700, Eric Biggers wrote:
> On Thu, Sep 05, 2024 at 08:21:46PM +0200, Mikulas Patocka wrote:
> > 
> > 
> > On Tue, 3 Sep 2024, Eric Biggers wrote:
> > 
> > > From: Eric Biggers <ebiggers@google.com>
> > > 
> > > Since dm-verity doesn't support writes, the kernel's memory reclaim code
> > > will never wait on dm-verity work. That makes the use of WQ_MEM_RECLAIM
> > > in dm-verity unnecessary. WQ_MEM_RECLAIM has been present from the
> > > beginning of dm-verity, but I could not find a justification for it;
> > > I suspect it was just copied from dm-crypt which does support writes.
> > > 
> > > Therefore, remove WQ_MEM_RECLAIM from dm-verity. This eliminates the
> > > creation of an unnecessary rescuer thread per dm-verity device.
> > > 
> > > Signed-off-by: Eric Biggers <ebiggers@google.com>
> > 
> > Hmm. I can think about a case where you have read-only dm-verity device,
> > on the top of that you have dm-snapshot device and on the top of that you
> > have a writable filesystem.
> > 
> > When the filesystem needs to write data, it submits some write bios. When
> > dm-snapshot receives these write bios, it will read from the dm-verity
> > device and write to the snapshot's exception store device. So, dm-verity
> > needs WQ_MEM_RECLAIM in this case.
> > 
> > Mikulas
> > 
> 
> Yes, unfortunately that sounds correct.
> 
> This means that any workqueue involved in fulfilling block device I/O,
> regardless of whether that I/O is read or write, has to use WQ_MEM_RECLAIM.
> 
> I wonder if there's any way to safely share the rescuer threads.

Oh, I like that idea, yes please! (would be surprised if it exists,
but I love being surprised!). Like Mikulas pointed out, we have had
to deal with fundamental deadlocks due to resource sharing in DM.
Hence the need for guaranteed forward progress that only
WQ_MEM_RECLAIM can provide.

All said, I'd like the same for NFS LOCALIO, we unfortunately have to
enable WQ_MEM_RECLAIM for LOCALIO writes:
https://git.kernel.org/pub/scm/linux/kernel/git/snitzer/linux.git/commit/?h=nfs-localio-for-next&id=85cdb98067c1c784c2744a6624608efea2b561e7

But in general LOCALIO's write path is a prime candidate for further
optimization -- I look forward to continue that line of development
once LOCALIO lands upstream.

Mike
Hello,

On Thu, Sep 05, 2024 at 07:35:41PM -0400, Mike Snitzer wrote:
...
> > I wonder if there's any way to safely share the rescuer threads.
> 
> Oh, I like that idea, yes please! (would be surprised if it exists,
> but I love being surprised!). Like Mikulas pointed out, we have had
> to deal with fundamental deadlocks due to resource sharing in DM.
> Hence the need for guaranteed forward progress that only
> WQ_MEM_RECLAIM can provide.

The most straightforward way to do this would be simply sharing the
workqueue across the entities that wanna be in the same forward progress
guarantee domain. It shouldn't be that difficult to make workqueues share a
rescuer either but may be a bit of an overkill.

Taking a step back tho, how would you determine which ones can share a
rescuer? Things which stack on top of each other can't share the rescuer cuz
higher layer occupying the rescuer and stall lower layers and thus deadlock.
The rescuers can be shared across independent stacks of dm devices but that
sounds like that will probably involve some graph walking. Also, is this a
real problem?

Thanks.
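Tejun's first suggestion, one shared workqueue per forward-progress-guarantee domain, could look roughly like the sketch below. This is purely illustrative and not in-tree code: the names shared_verify_wq and verity_shared_wq_init() are invented, and it assumes all dm-verity targets sit at the same layer of the stack and may therefore share a queue (and hence a single rescuer).

```c
#include <linux/workqueue.h>

/*
 * Hypothetical sketch: one WQ_MEM_RECLAIM workqueue, and therefore one
 * rescuer thread, shared by every dm-verity target.  It would be created
 * once at module init instead of per device in verity_ctr().  This kind
 * of sharing is only safe between devices at the same layer; devices that
 * stack on top of each other must not share a rescuer, as noted above,
 * because the upper layer could occupy it and starve the lower one.
 */
static struct workqueue_struct *shared_verify_wq;

static int __init verity_shared_wq_init(void)
{
	shared_verify_wq = alloc_workqueue("kverityd", WQ_MEM_RECLAIM | WQ_HIGHPRI, 0);
	return shared_verify_wq ? 0 : -ENOMEM;
}

static void __exit verity_shared_wq_exit(void)
{
	destroy_workqueue(shared_verify_wq);
}
```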
On Thu, 5 Sep 2024, Eric Biggers wrote:

> On Thu, Sep 05, 2024 at 08:21:46PM +0200, Mikulas Patocka wrote:
> > 
> > 
> > On Tue, 3 Sep 2024, Eric Biggers wrote:
> > 
> > > From: Eric Biggers <ebiggers@google.com>
> > > 
> > > Since dm-verity doesn't support writes, the kernel's memory reclaim code
> > > will never wait on dm-verity work. That makes the use of WQ_MEM_RECLAIM
> > > in dm-verity unnecessary. WQ_MEM_RECLAIM has been present from the
> > > beginning of dm-verity, but I could not find a justification for it;
> > > I suspect it was just copied from dm-crypt which does support writes.
> > > 
> > > Therefore, remove WQ_MEM_RECLAIM from dm-verity. This eliminates the
> > > creation of an unnecessary rescuer thread per dm-verity device.
> > > 
> > > Signed-off-by: Eric Biggers <ebiggers@google.com>
> > 
> > Hmm. I can think about a case where you have read-only dm-verity device,
> > on the top of that you have dm-snapshot device and on the top of that you
> > have a writable filesystem.
> > 
> > When the filesystem needs to write data, it submits some write bios. When
> > dm-snapshot receives these write bios, it will read from the dm-verity
> > device and write to the snapshot's exception store device. So, dm-verity
> > needs WQ_MEM_RECLAIM in this case.
> > 
> > Mikulas
> > 
> 
> Yes, unfortunately that sounds correct.
> 
> This means that any workqueue involved in fulfilling block device I/O,
> regardless of whether that I/O is read or write, has to use WQ_MEM_RECLAIM.
> 
> I wonder if there's any way to safely share the rescuer threads.
> 
> - Eric

When I thought about it, I think that removing WQ_MEM_RECLAIM would be
incorrect even without snapshot and it could deadlock even with a
read-only filesystem directly on the top of dm-verity.

There is a limited number of workqueue kernel threads in the system. If
all the workqueue kernel threads are busy trying to read some data from a
filesystem that is on the top of dm-verity, then the system deadlocks.
Dm-verity would wait until one of the work items exits - and the work
items would wait for dm-verity to return the data.

The probability that this happens is low, but theoretically it is wrong.

Mikulas
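What breaks that cycle is the rescuer thread WQ_MEM_RECLAIM asks for at workqueue creation: even when the worker pool cannot grow, the rescuer can still process queued items, so the verification work, and the reads stalled behind it, eventually complete. Below is a minimal sketch of the allocation, using the same flag combination as the current verity_ctr(); the wrapper function name is invented for illustration:

```c
#include <linux/workqueue.h>

/*
 * Sketch, not the in-tree function: passing WQ_MEM_RECLAIM makes
 * alloc_workqueue() spawn a dedicated rescuer thread up front.  Under
 * memory pressure, when no new worker threads can be created, the
 * rescuer still executes queued work items, which is what provides the
 * forward-progress guarantee discussed in this thread.
 */
static struct workqueue_struct *example_alloc_verify_wq(void)
{
	return alloc_workqueue("kverityd", WQ_MEM_RECLAIM | WQ_HIGHPRI, 0);
}
```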
On Thu, 5 Sep 2024, Tejun Heo wrote:

> Hello,
> 
> On Thu, Sep 05, 2024 at 07:35:41PM -0400, Mike Snitzer wrote:
> ...
> > > I wonder if there's any way to safely share the rescuer threads.
> > 
> > Oh, I like that idea, yes please! (would be surprised if it exists,
> > but I love being surprised!). Like Mikulas pointed out, we have had
> > to deal with fundamental deadlocks due to resource sharing in DM.
> > Hence the need for guaranteed forward progress that only
> > WQ_MEM_RECLAIM can provide.

I remember that one of the first things that I did when I started at
Red Hat was to remove shared resources from device mapper :) There were
shared mempools and shared kernel threads.

You can see this piece of code in mm/mempool.c that was a workaround for
shared mempool bugs:

	/*
	 * FIXME: this should be io_schedule(). The timeout is there as a
	 * workaround for some DM problems in 2.6.18.
	 */
	io_schedule_timeout(5*HZ);

> The most straightforward way to do this would be simply sharing the
> workqueue across the entities that wanna be in the same forward progress
> guarantee domain. It shouldn't be that difficult to make workqueues share a
> rescuer either but may be a bit of an overkill.
> 
> Taking a step back tho, how would you determine which ones can share a
> rescuer? Things which stack on top of each other can't share the rescuer cuz
> higher layer occupying the rescuer and stall lower layers and thus deadlock.
> The rescuers can be shared across independent stacks of dm devices but that
> sounds like that will probably involve some graph walking. Also, is this a
> real problem?
> 
> Thanks.

It would be nice if we could know dependencies of every Linux driver. But
we are not quite there. We know the dependencies inside device mapper,
but when you use some non-dm device (like md, loop), we don't have a
dependency graph for that.

Mikulas
```diff
diff --git a/drivers/md/dm-verity-target.c b/drivers/md/dm-verity-target.c
index cf659c8feb29f..051e84ca401dc 100644
--- a/drivers/md/dm-verity-target.c
+++ b/drivers/md/dm-verity-target.c
@@ -1488,11 +1488,11 @@ static int verity_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 	 * Also as required for the "try_verify_in_tasklet" feature: WQ_HIGHPRI
 	 * allows verify_wq to preempt softirq since verification in BH workqueue
 	 * will fall-back to using it for error handling (or if the bufio cache
 	 * doesn't have required hashes).
 	 */
-	v->verify_wq = alloc_workqueue("kverityd", WQ_MEM_RECLAIM | WQ_HIGHPRI, 0);
+	v->verify_wq = alloc_workqueue("kverityd", WQ_HIGHPRI, 0);
 	if (!v->verify_wq) {
 		ti->error = "Cannot allocate workqueue";
 		r = -ENOMEM;
 		goto bad;
 	}
```