diff mbox series

fsverity: Remove WQ_UNBOUND from fsverity read workqueue

Message ID 20230309213742.572091-1-nhuck@google.com (mailing list archive)
State Superseded
Headers show
Series fsverity: Remove WQ_UNBOUND from fsverity read workqueue | expand

Commit Message

Nathan Huckleberry March 9, 2023, 9:37 p.m. UTC
WQ_UNBOUND causes significant scheduler latency on ARM64/Android.  This
is problematic for latency sensitive workloads like I/O post-processing.

Removing WQ_UNBOUND gives a 96% reduction in fsverity workqueue related
scheduler latency and improves app cold startup times by ~30ms.

This code was tested by running Android app startup benchmarks and
measuring how long the fsverity workqueue spent in the ready queue.

Before
Total workqueue scheduler latency: 553800us
After
Total workqueue scheduler latency: 18962us

Change-Id: I693efee541757851ed6d229430111cd763d39067
Signed-off-by: Nathan Huckleberry <nhuck@google.com>
---
 fs/verity/verify.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Eric Biggers March 10, 2023, 5:11 a.m. UTC | #1
On Thu, Mar 09, 2023 at 01:37:41PM -0800, Nathan Huckleberry wrote:
> WQ_UNBOUND causes significant scheduler latency on ARM64/Android.  This
> is problematic for latency sensitive workloads like I/O post-processing.
> 
> Removing WQ_UNBOUND gives a 96% reduction in fsverity workqueue related
> scheduler latency and improves app cold startup times by ~30ms.

Maybe mention that WQ_UNBOUND was recently removed from the dm-verity workqueue
too, for the same reason?

I'm still amazed that it's such a big improvement!  I don't really need it to
apply this patch, but it would be very interesting to know exactly why the
latency is so bad with WQ_UNBOUND.

> 
> This code was tested by running Android app startup benchmarks and
> measuring how long the fsverity workqueue spent in the ready queue.
> 
> Before
> Total workqueue scheduler latency: 553800us
> After
> Total workqueue scheduler latency: 18962us
> 
> Change-Id: I693efee541757851ed6d229430111cd763d39067

No Change-Id in upstream patches, please.

> diff --git a/fs/verity/verify.c b/fs/verity/verify.c
> index f50e3b5b52c9..e8ec37774d63 100644
> --- a/fs/verity/verify.c
> +++ b/fs/verity/verify.c
> @@ -395,7 +395,7 @@ int __init fsverity_init_workqueue(void)
>  	 * which blocks reads from completing, over regular application tasks.
>  	 */
>  	fsverity_read_workqueue = alloc_workqueue("fsverity_read_queue",
> -						  WQ_UNBOUND | WQ_HIGHPRI,
> +						  WQ_HIGHPRI,
>  						  num_online_cpus());

There's a comment just above here that explains why WQ_UNBOUND is being used.
It needs to be updated to explain why WQ_UNBOUND is *not* being used.

- Eric
Hillf Danton March 10, 2023, 6:55 a.m. UTC | #2
On 9 Mar 2023 21:11:47 -0800 Eric Biggers <ebiggers@kernel.org>
> On Thu, Mar 09, 2023 at 01:37:41PM -0800, Nathan Huckleberry wrote:
> > WQ_UNBOUND causes significant scheduler latency on ARM64/Android.  This
> > is problematic for latency sensitive workloads like I/O post-processing.
> > 
> > Removing WQ_UNBOUND gives a 96% reduction in fsverity workqueue related
> > scheduler latency and improves app cold startup times by ~30ms.
> 
> Maybe mention that WQ_UNBOUND was recently removed from the dm-verity workqueue
> too, for the same reason?
> 
> I'm still amazed that it's such a big improvement!  I don't really need it to
> apply this patch, but it would be very interesting to know exactly why the
> latency is so bad with WQ_UNBOUND.
> 
> > This code was tested by running Android app startup benchmarks and
> > measuring how long the fsverity workqueue spent in the ready queue.
> > 
> > Before
> > Total workqueue scheduler latency: 553800us
> > After
> > Total workqueue scheduler latency: 18962us

Given the gap between data above and the 15253 us in diagram[1], and
the SHA instructions[2], could you specify a bit on your test?

[1] https://lore.kernel.org/linux-erofs/20230106073502.4017276-1-dhavale@google.com/
[2] https://lore.kernel.org/lkml/CAJkfWY490-m6wNubkxiTPsW59sfsQs37Wey279LmiRxKt7aQYg@mail.gmail.com/
Nathan Huckleberry March 10, 2023, 7:09 p.m. UTC | #3
On Fri, Mar 10, 2023 at 12:01 AM Hillf Danton <hdanton@sina.com> wrote:
>
> On 9 Mar 2023 21:11:47 -0800 Eric Biggers <ebiggers@kernel.org>
> > On Thu, Mar 09, 2023 at 01:37:41PM -0800, Nathan Huckleberry wrote:
> > > WQ_UNBOUND causes significant scheduler latency on ARM64/Android.  This
> > > is problematic for latency sensitive workloads like I/O post-processing.
> > >
> > > Removing WQ_UNBOUND gives a 96% reduction in fsverity workqueue related
> > > scheduler latency and improves app cold startup times by ~30ms.
> >
> > Maybe mention that WQ_UNBOUND was recently removed from the dm-verity workqueue
> > too, for the same reason?
> >
> > I'm still amazed that it's such a big improvement!  I don't really need it to
> > apply this patch, but it would be very interesting to know exactly why the
> > latency is so bad with WQ_UNBOUND.

My current guess for the root cause is excessing saving/restoring of
the FPSIMD state.

> >
> > > This code was tested by running Android app startup benchmarks and
> > > measuring how long the fsverity workqueue spent in the ready queue.
> > >
> > > Before
> > > Total workqueue scheduler latency: 553800us
> > > After
> > > Total workqueue scheduler latency: 18962us
>
> Given the gap between data above and the 15253 us in diagram[1], and
> the SHA instructions[2], could you specify a bit on your test?

The test I'm running opens the Android messaging APK which is
validated with fsverity. It opens the messaging app 25 times, dropping
caches each time. The benchmark produces a Perfetto trace which we use
to compute the scheduler latency. We sum up the amount of time that
each fsverity worker spent in the ready state. The test in [1] is
similar, but may be using a different APK. These tests are not in
AOSP, so I can't share a link to them, but I would expect that fio on
a ramdisk would produce similarly good results.

>
> [1] https://lore.kernel.org/linux-erofs/20230106073502.4017276-1-dhavale@google.com/
> [2] https://lore.kernel.org/lkml/CAJkfWY490-m6wNubkxiTPsW59sfsQs37Wey279LmiRxKt7aQYg@mail.gmail.com/
Hillf Danton March 11, 2023, 12:15 a.m. UTC | #4
On 10 Mar 2023 11:09:55 -0800 Nathan Huckleberry <nhuck@google.com>
> On Fri, Mar 10, 2023 at 12:01=E2=80=AFAM Hillf Danton <hdanton@sina.com> wr=
> ote:
> >
> > On 9 Mar 2023 21:11:47 -0800 Eric Biggers <ebiggers@kernel.org>
> > > On Thu, Mar 09, 2023 at 01:37:41PM -0800, Nathan Huckleberry wrote:
> > > > WQ_UNBOUND causes significant scheduler latency on ARM64/Android.  Th=
> is
> > > > is problematic for latency sensitive workloads like I/O post-processi=
> ng.
> > > >
> > > > Removing WQ_UNBOUND gives a 96% reduction in fsverity workqueue relat=
> ed
> > > > scheduler latency and improves app cold startup times by ~30ms.
> > >
> > > Maybe mention that WQ_UNBOUND was recently removed from the dm-verity w=
> orkqueue
> > > too, for the same reason?
> > >
> > > I'm still amazed that it's such a big improvement!  I don't really need=
>  it to
> > > apply this patch, but it would be very interesting to know exactly why =
> the
> > > latency is so bad with WQ_UNBOUND.
> 
> My current guess for the root cause is excessing saving/restoring of
> the FPSIMD state.
> 
> > >
> > > > This code was tested by running Android app startup benchmarks and
> > > > measuring how long the fsverity workqueue spent in the ready queue.
> > > >
> > > > Before
> > > > Total workqueue scheduler latency: 553800us
> > > > After
> > > > Total workqueue scheduler latency: 18962us
> >
> > Given the gap between data above and the 15253 us in diagram[1], and
> > the SHA instructions[2], could you specify a bit on your test?
> 
> The test I'm running opens the Android messaging APK which is
> validated with fsverity. It opens the messaging app 25 times, dropping
> caches each time. The benchmark produces a Perfetto trace which we use
> to compute the scheduler latency. We sum up the amount of time that
> each fsverity worker spent in the ready state. The test in [1] is
> similar, but may be using a different APK. These tests are not in
> AOSP, so I can't share a link to them, but I would expect that fio on
> a ramdisk would produce similarly good results.

Thanks for your introduction to the test.
Are the similar results ARM64/Android specific? Could this patch help X86?
> 
> >
> > [1] https://lore.kernel.org/linux-erofs/20230106073502.4017276-1-dhavale@=
> google.com/
> > [2] https://lore.kernel.org/lkml/CAJkfWY490-m6wNubkxiTPsW59sfsQs37Wey279L=
> miRxKt7aQYg@mail.gmail.com/
diff mbox series

Patch

diff --git a/fs/verity/verify.c b/fs/verity/verify.c
index f50e3b5b52c9..e8ec37774d63 100644
--- a/fs/verity/verify.c
+++ b/fs/verity/verify.c
@@ -395,7 +395,7 @@  int __init fsverity_init_workqueue(void)
 	 * which blocks reads from completing, over regular application tasks.
 	 */
 	fsverity_read_workqueue = alloc_workqueue("fsverity_read_queue",
-						  WQ_UNBOUND | WQ_HIGHPRI,
+						  WQ_HIGHPRI,
 						  num_online_cpus());
 	if (!fsverity_read_workqueue)
 		return -ENOMEM;