
dm verity: don't use WQ_MEM_RECLAIM

Message ID 20240904040444.56070-1-ebiggers@kernel.org (mailing list archive)
State Rejected, archived
Series dm verity: don't use WQ_MEM_RECLAIM

Commit Message

Eric Biggers Sept. 4, 2024, 4:04 a.m. UTC
From: Eric Biggers <ebiggers@google.com>

Since dm-verity doesn't support writes, the kernel's memory reclaim code
will never wait on dm-verity work.  That makes the use of WQ_MEM_RECLAIM
in dm-verity unnecessary.  WQ_MEM_RECLAIM has been present from the
beginning of dm-verity, but I could not find a justification for it;
I suspect it was just copied from dm-crypt which does support writes.

Therefore, remove WQ_MEM_RECLAIM from dm-verity.  This eliminates the
creation of an unnecessary rescuer thread per dm-verity device.

Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 drivers/md/dm-verity-target.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


base-commit: 88fac17500f4ea49c7bac136cf1b27e7b9980075

Comments

Mike Snitzer Sept. 5, 2024, 2:32 p.m. UTC | #1
On Tue, Sep 03, 2024 at 09:04:44PM -0700, Eric Biggers wrote:
> From: Eric Biggers <ebiggers@google.com>
> 
> Since dm-verity doesn't support writes, the kernel's memory reclaim code
> will never wait on dm-verity work.  That makes the use of WQ_MEM_RECLAIM
> in dm-verity unnecessary.  WQ_MEM_RECLAIM has been present from the
> beginning of dm-verity, but I could not find a justification for it;
> I suspect it was just copied from dm-crypt which does support writes.
> 
> Therefore, remove WQ_MEM_RECLAIM from dm-verity.  This eliminates the
> creation of an unnecessary rescuer thread per dm-verity device.
> 
> Signed-off-by: Eric Biggers <ebiggers@google.com>

Thanks!

Reviewed-by: Mike Snitzer <snitzer@kernel.org>
Mikulas Patocka Sept. 5, 2024, 6:21 p.m. UTC | #2
On Tue, 3 Sep 2024, Eric Biggers wrote:

> From: Eric Biggers <ebiggers@google.com>
> 
> Since dm-verity doesn't support writes, the kernel's memory reclaim code
> will never wait on dm-verity work.  That makes the use of WQ_MEM_RECLAIM
> in dm-verity unnecessary.  WQ_MEM_RECLAIM has been present from the
> beginning of dm-verity, but I could not find a justification for it;
> I suspect it was just copied from dm-crypt which does support writes.
> 
> Therefore, remove WQ_MEM_RECLAIM from dm-verity.  This eliminates the
> creation of an unnecessary rescuer thread per dm-verity device.
> 
> Signed-off-by: Eric Biggers <ebiggers@google.com>

Hmm. I can think of a case where you have a read-only dm-verity device,
on top of that you have a dm-snapshot device, and on top of that you
have a writable filesystem.

When the filesystem needs to write data, it submits some write bios. When 
dm-snapshot receives these write bios, it will read from the dm-verity 
device and write to the snapshot's exception store device. So, dm-verity 
needs WQ_MEM_RECLAIM in this case.

Mikulas

> ---
>  drivers/md/dm-verity-target.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/md/dm-verity-target.c b/drivers/md/dm-verity-target.c
> index cf659c8feb29f..051e84ca401dc 100644
> --- a/drivers/md/dm-verity-target.c
> +++ b/drivers/md/dm-verity-target.c
> @@ -1488,11 +1488,11 @@ static int verity_ctr(struct dm_target *ti, unsigned int argc, char **argv)
>  	 * Also as required for the "try_verify_in_tasklet" feature: WQ_HIGHPRI
>  	 * allows verify_wq to preempt softirq since verification in BH workqueue
>  	 * will fall-back to using it for error handling (or if the bufio cache
>  	 * doesn't have required hashes).
>  	 */
> -	v->verify_wq = alloc_workqueue("kverityd", WQ_MEM_RECLAIM | WQ_HIGHPRI, 0);
> +	v->verify_wq = alloc_workqueue("kverityd", WQ_HIGHPRI, 0);
>  	if (!v->verify_wq) {
>  		ti->error = "Cannot allocate workqueue";
>  		r = -ENOMEM;
>  		goto bad;
>  	}
> 
> base-commit: 88fac17500f4ea49c7bac136cf1b27e7b9980075
> -- 
> 2.46.0
>
Eric Biggers Sept. 5, 2024, 10:35 p.m. UTC | #3
On Thu, Sep 05, 2024 at 08:21:46PM +0200, Mikulas Patocka wrote:
> 
> 
> On Tue, 3 Sep 2024, Eric Biggers wrote:
> 
> > From: Eric Biggers <ebiggers@google.com>
> > 
> > Since dm-verity doesn't support writes, the kernel's memory reclaim code
> > will never wait on dm-verity work.  That makes the use of WQ_MEM_RECLAIM
> > in dm-verity unnecessary.  WQ_MEM_RECLAIM has been present from the
> > beginning of dm-verity, but I could not find a justification for it;
> > I suspect it was just copied from dm-crypt which does support writes.
> > 
> > Therefore, remove WQ_MEM_RECLAIM from dm-verity.  This eliminates the
> > creation of an unnecessary rescuer thread per dm-verity device.
> > 
> > Signed-off-by: Eric Biggers <ebiggers@google.com>
> 
> Hmm. I can think of a case where you have a read-only dm-verity device,
> on top of that you have a dm-snapshot device, and on top of that you
> have a writable filesystem.
> 
> When the filesystem needs to write data, it submits some write bios. When 
> dm-snapshot receives these write bios, it will read from the dm-verity 
> device and write to the snapshot's exception store device. So, dm-verity 
> needs WQ_MEM_RECLAIM in this case.
> 
> Mikulas
> 

Yes, unfortunately that sounds correct.

This means that any workqueue involved in fulfilling block device I/O,
regardless of whether that I/O is read or write, has to use WQ_MEM_RECLAIM.

I wonder if there's any way to safely share the rescuer threads.

- Eric
Mike Snitzer Sept. 5, 2024, 11:35 p.m. UTC | #4
On Thu, Sep 05, 2024 at 03:35:55PM -0700, Eric Biggers wrote:
> On Thu, Sep 05, 2024 at 08:21:46PM +0200, Mikulas Patocka wrote:
> > 
> > 
> > On Tue, 3 Sep 2024, Eric Biggers wrote:
> > 
> > > From: Eric Biggers <ebiggers@google.com>
> > > 
> > > Since dm-verity doesn't support writes, the kernel's memory reclaim code
> > > will never wait on dm-verity work.  That makes the use of WQ_MEM_RECLAIM
> > > in dm-verity unnecessary.  WQ_MEM_RECLAIM has been present from the
> > > beginning of dm-verity, but I could not find a justification for it;
> > > I suspect it was just copied from dm-crypt which does support writes.
> > > 
> > > Therefore, remove WQ_MEM_RECLAIM from dm-verity.  This eliminates the
> > > creation of an unnecessary rescuer thread per dm-verity device.
> > > 
> > > Signed-off-by: Eric Biggers <ebiggers@google.com>
> > 
> > Hmm. I can think of a case where you have a read-only dm-verity device,
> > on top of that you have a dm-snapshot device, and on top of that you
> > have a writable filesystem.
> > 
> > When the filesystem needs to write data, it submits some write bios. When 
> > dm-snapshot receives these write bios, it will read from the dm-verity 
> > device and write to the snapshot's exception store device. So, dm-verity 
> > needs WQ_MEM_RECLAIM in this case.
> > 
> > Mikulas
> > 
> 
> Yes, unfortunately that sounds correct.
> 
> This means that any workqueue involved in fulfilling block device I/O,
> regardless of whether that I/O is read or write, has to use WQ_MEM_RECLAIM.
> 
> I wonder if there's any way to safely share the rescuer threads.

Oh, I like that idea, yes please! (would be surprised if it exists,
but I love being surprised!).  Like Mikulas pointed out, we have had
to deal with fundamental deadlocks due to resource sharing in DM.
Hence the need for guaranteed forward progress that only
WQ_MEM_RECLAIM can provide.

All said, I'd like the same for NFS LOCALIO; we unfortunately have to
enable WQ_MEM_RECLAIM for LOCALIO writes:
https://git.kernel.org/pub/scm/linux/kernel/git/snitzer/linux.git/commit/?h=nfs-localio-for-next&id=85cdb98067c1c784c2744a6624608efea2b561e7

But in general LOCALIO's write path is a prime candidate for further
optimization -- I look forward to continuing that line of development
once LOCALIO lands upstream.

Mike
Tejun Heo Sept. 6, 2024, 1:34 a.m. UTC | #5
Hello,

On Thu, Sep 05, 2024 at 07:35:41PM -0400, Mike Snitzer wrote:
...
> > I wonder if there's any way to safely share the rescuer threads.
> 
> Oh, I like that idea, yes please! (would be surprised if it exists,
> but I love being surprised!).  Like Mikulas pointed out, we have had
> to deal with fundamental deadlocks due to resource sharing in DM.
> Hence the need for guaranteed forward progress that only
> WQ_MEM_RECLAIM can provide.

The most straightforward way to do this would be simply sharing the
workqueue across the entities that wanna be in the same forward progress
guarantee domain. It shouldn't be that difficult to make workqueues share a
rescuer either, but that may be a bit of overkill.
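
The first option could look roughly like this, as an untested sketch (the
shared_verify_wq name and the init/exit helpers are made up for
illustration, not existing dm-verity code):

#include <linux/module.h>
#include <linux/workqueue.h>

/*
 * One module-wide reclaim-safe workqueue: every dm-verity target in the
 * same forward-progress-guarantee domain reuses it, so the module pays
 * for a single rescuer thread instead of one per device.
 */
static struct workqueue_struct *shared_verify_wq;

static int __init verity_shared_wq_init(void)
{
	shared_verify_wq = alloc_workqueue("kverityd",
					   WQ_MEM_RECLAIM | WQ_HIGHPRI, 0);
	return shared_verify_wq ? 0 : -ENOMEM;
}

static void __exit verity_shared_wq_exit(void)
{
	destroy_workqueue(shared_verify_wq);
}

/*
 * verity_ctr() would then set v->verify_wq = shared_verify_wq instead of
 * allocating its own workqueue, and verity_dtr() would stop destroying it.
 */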

Taking a step back tho, how would you determine which ones can share a
rescuer? Things which stack on top of each other can't share the rescuer cuz
a higher layer occupying the rescuer would stall lower layers and thus deadlock.
The rescuers can be shared across independent stacks of dm devices but that
sounds like that will probably involve some graph walking. Also, is this a
real problem?

Thanks.
Mikulas Patocka Sept. 6, 2024, 10:59 a.m. UTC | #6
On Thu, 5 Sep 2024, Eric Biggers wrote:

> On Thu, Sep 05, 2024 at 08:21:46PM +0200, Mikulas Patocka wrote:
> > 
> > 
> > On Tue, 3 Sep 2024, Eric Biggers wrote:
> > 
> > > From: Eric Biggers <ebiggers@google.com>
> > > 
> > > Since dm-verity doesn't support writes, the kernel's memory reclaim code
> > > will never wait on dm-verity work.  That makes the use of WQ_MEM_RECLAIM
> > > in dm-verity unnecessary.  WQ_MEM_RECLAIM has been present from the
> > > beginning of dm-verity, but I could not find a justification for it;
> > > I suspect it was just copied from dm-crypt which does support writes.
> > > 
> > > Therefore, remove WQ_MEM_RECLAIM from dm-verity.  This eliminates the
> > > creation of an unnecessary rescuer thread per dm-verity device.
> > > 
> > > Signed-off-by: Eric Biggers <ebiggers@google.com>
> > 
> > Hmm. I can think of a case where you have a read-only dm-verity device,
> > on top of that you have a dm-snapshot device, and on top of that you
> > have a writable filesystem.
> > 
> > When the filesystem needs to write data, it submits some write bios. When 
> > dm-snapshot receives these write bios, it will read from the dm-verity 
> > device and write to the snapshot's exception store device. So, dm-verity 
> > needs WQ_MEM_RECLAIM in this case.
> > 
> > Mikulas
> > 
> 
> Yes, unfortunately that sounds correct.
> 
> This means that any workqueue involved in fulfilling block device I/O,
> regardless of whether that I/O is read or write, has to use WQ_MEM_RECLAIM.
> 
> I wonder if there's any way to safely share the rescuer threads.
> 
> - Eric

When I thought about it some more, I think that removing WQ_MEM_RECLAIM
would be incorrect even without a snapshot; it could deadlock even with a
read-only filesystem directly on top of dm-verity.

There is a limited number of workqueue kernel threads in the system. If 
all the workqueue kernel threads are busy trying to read some data from a 
filesystem that is on top of dm-verity, then the system deadlocks.
Dm-verity would wait until one of the work items exits - and the work 
items would wait for dm-verity to return the data.

The probability that this happens is low, but theoretically it is wrong.

Mikulas
Mikulas Patocka Sept. 6, 2024, 11:23 a.m. UTC | #7
On Thu, 5 Sep 2024, Tejun Heo wrote:

> Hello,
> 
> On Thu, Sep 05, 2024 at 07:35:41PM -0400, Mike Snitzer wrote:
> ...
> > > I wonder if there's any way to safely share the rescuer threads.
> > 
> > Oh, I like that idea, yes please! (would be surprised if it exists,
> > but I love being surprised!).  Like Mikulas pointed out, we have had
> > to deal with fundamental deadlocks due to resource sharing in DM.
> > Hence the need for guaranteed forward progress that only
> > WQ_MEM_RECLAIM can provide.

I remember that one of the first things I did when I started at Red
Hat was to remove shared resources from device mapper :) There were shared 
mempools and shared kernel threads.

You can see this piece of code in mm/mempool.c that was a workaround for 
shared mempool bugs:
        /*
         * FIXME: this should be io_schedule().  The timeout is there as a
         * workaround for some DM problems in 2.6.18.
         */
        io_schedule_timeout(5*HZ);

> The most straightforward way to do this would be simply sharing the
> workqueue across the entities that wanna be in the same forward progress
> guarantee domain. It shouldn't be that difficult to make workqueues share a
> rescuer either, but that may be a bit of overkill.
> 
> Taking a step back tho, how would you determine which ones can share a
> rescuer? Things which stack on top of each other can't share the rescuer cuz
> a higher layer occupying the rescuer would stall lower layers and thus deadlock.
> The rescuers can be shared across independent stacks of dm devices but that
> sounds like that will probably involve some graph walking. Also, is this a
> real problem?
> 
> Thanks.

It would be nice if we could know the dependencies of every Linux driver, but
we are not quite there. We know the dependencies inside device mapper, but
when you use some non-dm device (like md or loop), we don't have a dependency
graph for that.

Mikulas

Patch

diff --git a/drivers/md/dm-verity-target.c b/drivers/md/dm-verity-target.c
index cf659c8feb29f..051e84ca401dc 100644
--- a/drivers/md/dm-verity-target.c
+++ b/drivers/md/dm-verity-target.c
@@ -1488,11 +1488,11 @@  static int verity_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 	 * Also as required for the "try_verify_in_tasklet" feature: WQ_HIGHPRI
 	 * allows verify_wq to preempt softirq since verification in BH workqueue
 	 * will fall-back to using it for error handling (or if the bufio cache
 	 * doesn't have required hashes).
 	 */
-	v->verify_wq = alloc_workqueue("kverityd", WQ_MEM_RECLAIM | WQ_HIGHPRI, 0);
+	v->verify_wq = alloc_workqueue("kverityd", WQ_HIGHPRI, 0);
 	if (!v->verify_wq) {
 		ti->error = "Cannot allocate workqueue";
 		r = -ENOMEM;
 		goto bad;
 	}