diff mbox series

[kvm-unit-tests] tscdeadline_latency: Check condition first before loop

Message ID 20190711071756.2784-1-peterx@redhat.com (mailing list archive)
State New, archived
Series [kvm-unit-tests] tscdeadline_latency: Check condition first before loop

Commit Message

Peter Xu July 11, 2019, 7:17 a.m. UTC
This patch fixes a tscdeadline_latency hang when specifying a very
small breakmax value.  It's easily reproduced on my host with
parameters like "200000 10000 10" (set breakmax to 10 TSC clocks).

The problem is that test_tsc_deadline_timer() can be very slow because
we've got printf() in there.  So by the time we reach the main loop we
might have already triggered the IRQ handler multiple times, and we might
have hit the hitmax condition, which turns the IRQ off.  Then, with no
IRQ, that first HLT instruction can last forever.

Fix this by simply checking the condition first in the loop.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 x86/tscdeadline_latency.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
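
To illustrate the hang described in the commit message, here is a minimal
user-space sketch. It is a toy model, not kvm-unit-tests code: `toy_guest`,
`toy_hlt()`, `toy_do_while()` and `toy_check_first()` are hypothetical names,
and the modelled HLT simply reports "would sleep forever" when no timer
interrupt is left to wake the CPU. It assumes the handler sets hitmax on the
last interrupt and stops re-arming the timer.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the guest state (assumption, not the real test's layout). */
struct toy_guest {
    bool hitmax;       /* set by the modelled IRQ handler on the last IRQ */
    int  pending_irqs; /* timer interrupts that will still be delivered */
};

/*
 * Modelled HLT: returns true if an interrupt woke the CPU, false if
 * nothing will ever wake it (the hang from the commit message).
 */
bool toy_hlt(struct toy_guest *g)
{
    assert(g->pending_irqs >= 0);
    if (g->pending_irqs == 0)
        return false;
    g->pending_irqs--;
    if (g->pending_irqs == 0)
        g->hitmax = true; /* handler hits breakmax and stops the timer */
    return true;
}

/* Original loop shape: HLT first, check afterwards. True if it terminates. */
bool toy_do_while(bool hitmax, int pending_irqs)
{
    struct toy_guest g = { hitmax, pending_irqs };

    do {
        if (!toy_hlt(&g))
            return false; /* halted with no IRQ left: hang */
    } while (!g.hitmax);
    return true;
}

/* Patched loop shape: check first, then HLT. True if it terminates. */
bool toy_check_first(bool hitmax, int pending_irqs)
{
    struct toy_guest g = { hitmax, pending_irqs };

    while (!g.hitmax) {
        if (!toy_hlt(&g))
            return false;
    }
    return true;
}
```

In this model, calling both functions with hitmax already set and no
interrupts left shows the difference: the do/while shape halts forever, while
the check-first shape returns immediately. The model has no window between
the check and the HLT, so it says nothing about races in that gap.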

Comments

Peter Xu July 11, 2019, 7:33 a.m. UTC | #1
On Thu, Jul 11, 2019 at 03:17:56PM +0800, Peter Xu wrote:
> This patch fixes a tscdeadline_latency hang when specifying a very
> small breakmax value.  It's easily reproduced on my host with
> parameters like "200000 10000 10" (set breakmax to 10 TSC clocks).
> 
> The problem is that test_tsc_deadline_timer() can be very slow because
> we've got printf() in there.  So by the time we reach the main loop we
> might have already triggered the IRQ handler multiple times, and we might
> have hit the hitmax condition, which turns the IRQ off.  Then, with no
> IRQ, that first HLT instruction can last forever.
> 
> Fix this by simply checking the condition first in the loop.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  x86/tscdeadline_latency.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/x86/tscdeadline_latency.c b/x86/tscdeadline_latency.c
> index 0617a1b..4ee5917 100644
> --- a/x86/tscdeadline_latency.c
> +++ b/x86/tscdeadline_latency.c
> @@ -118,9 +118,9 @@ int main(int argc, char **argv)
>      test_tsc_deadline_timer();
>      irq_enable();
>  
> -    do {
> +    /* The condition might have triggered already, so check before HLT. */
> +    while (!hitmax && table_idx < size)

Hmm... I think this is not ideal either, in that variables (e.g., hitmax)
could logically still change between the condition check and the HLT below
(though this patch already runs nicely here).  Maybe we can simply use
"nop" or "pause" instead of "hlt".

I tested that using pause fixes the problem too.

>          asm volatile("hlt");
> -    } while (!hitmax && table_idx < size);
>  
>      for (i = 0; i < table_idx; i++) {
>          if (hitmax && i == table_idx-1)
> -- 
> 2.21.0
> 

Regards,
Sean Christopherson July 11, 2019, 2:05 p.m. UTC | #2
On Thu, Jul 11, 2019 at 03:33:35PM +0800, Peter Xu wrote:
> On Thu, Jul 11, 2019 at 03:17:56PM +0800, Peter Xu wrote:
> > This patch fixes a tscdeadline_latency hang when specifying a very
> > small breakmax value.  It's easily reproduced on my host with
> > parameters like "200000 10000 10" (set breakmax to 10 TSC clocks).
> > 
> > The problem is that test_tsc_deadline_timer() can be very slow because
> > we've got printf() in there.  So by the time we reach the main loop we
> > might have already triggered the IRQ handler multiple times, and we might
> > have hit the hitmax condition, which turns the IRQ off.  Then, with no
> > IRQ, that first HLT instruction can last forever.
> > 
> > Fix this by simply checking the condition first in the loop.
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  x86/tscdeadline_latency.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/x86/tscdeadline_latency.c b/x86/tscdeadline_latency.c
> > index 0617a1b..4ee5917 100644
> > --- a/x86/tscdeadline_latency.c
> > +++ b/x86/tscdeadline_latency.c
> > @@ -118,9 +118,9 @@ int main(int argc, char **argv)
> >      test_tsc_deadline_timer();
> >      irq_enable();
> >  
> > -    do {
> > +    /* The condition might have triggered already, so check before HLT. */
> > +    while (!hitmax && table_idx < size)
> 
> Hmm... I think this is not ideal either, in that variables (e.g., hitmax)
> could logically still change between the condition check and the HLT below
> (though this patch already runs nicely here).  Maybe we can simply use
> "nop" or "pause" instead of "hlt".
> 
> I tested that using pause fixes the problem too.

Ensuring the first hlt lands in an interrupt shadow should prevent getting
into a halted state after the timer has been disabled, e.g.:

    irq_disable();
    test_tsc_deadline_timer();

    do {
        safe_halt();
    } while (!hitmax && table_idx < size);

> 
> >          asm volatile("hlt");
> > -    } while (!hitmax && table_idx < size);
> >  
> >      for (i = 0; i < table_idx; i++) {
> >          if (hitmax && i == table_idx-1)
> > -- 
> > 2.21.0
> > 
> 
> Regards,
> 
> -- 
> Peter Xu
Peter Xu July 11, 2019, 11:27 p.m. UTC | #3
On Thu, Jul 11, 2019 at 07:05:53AM -0700, Sean Christopherson wrote:
> Ensuring the first hlt lands in an interrupt shadow should prevent getting
> into a halted state after the timer has been disabled, e.g.:
> 
>     irq_disable();
>     test_tsc_deadline_timer();
> 
>     do {
>         safe_halt();
>     } while (!hitmax && table_idx < size);

Yes, that seems better; thanks for the suggestion (though I'll probably
also need to remove the hidden sti in start_tsc_deadline_timer).

Is safe_halt() really safe?  I mean, the IRQ handler could still run
right after STI and before HLT, right?  Either way, I think it's fine
for this test case because we'll skip the first IRQ after all.
Just curious.

Thanks,
Sean Christopherson July 11, 2019, 11:34 p.m. UTC | #4
On Fri, Jul 12, 2019 at 07:27:36AM +0800, Peter Xu wrote:
> On Thu, Jul 11, 2019 at 07:05:53AM -0700, Sean Christopherson wrote:
> > Ensuring the first hlt lands in an interrupt shadow should prevent getting
> > into a halted state after the timer has been disabled, e.g.:
> > 
> >     irq_disable();
> >     test_tsc_deadline_timer();
> > 
> >     do {
> >         safe_halt();
> >     } while (!hitmax && table_idx < size);
> 
> Yes, that seems better; thanks for the suggestion (though I'll probably
> also need to remove the hidden sti in start_tsc_deadline_timer).
> 
> Is safe_halt() really safe?  I mean, the IRQ handler could still run
> right after STI and before HLT, right?  Either way, I think it's fine
> for this test case because we'll skip the first IRQ after all.
> Just curious.

It's safe, at least on modern hardware.  Everything since P6, and I
think all AMD CPUs?, have an interrupt shadow where interrupts are
blocked for one additional instruction after being enabled by STI.
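
The interrupt-shadow argument above can be sketched with a small user-space
model. This is only a toy interpreter under stated assumptions, not real
hardware behaviour and not kvm-unit-tests code: the names `toy_insn`,
`toy_run()`, `toy_sti_hlt_wakes()` and `toy_sti_nop_hlt_wakes()` are
hypothetical, a single one-shot interrupt is modelled, and the shadow is
modelled as blocking delivery at exactly one instruction boundary after an
STI that turns interrupts on.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_insn { TOY_STI, TOY_NOP, TOY_HLT };

/*
 * Run a short instruction sequence with interrupts initially disabled.
 * Returns true if the sequence completes, false if the modelled CPU
 * halts with nothing left to wake it.
 */
bool toy_run(const enum toy_insn *prog, int n, bool irq_pending)
{
    bool iflag = false;  /* modelled EFLAGS.IF */
    bool shadow = false; /* modelled STI interrupt shadow */

    assert(n > 0);
    for (int i = 0; i < n; i++) {
        /* Instruction boundary: deliver the IRQ unless it is blocked. */
        if (iflag && !shadow && irq_pending)
            irq_pending = false; /* handler runs; the timer is one-shot */
        shadow = false;

        switch (prog[i]) {
        case TOY_STI:
            if (!iflag)
                shadow = true; /* block delivery before the next insn */
            iflag = true;
            break;
        case TOY_NOP:
            break;
        case TOY_HLT:
            /* A halted CPU only wakes on an enabled, pending interrupt. */
            if (!(iflag && irq_pending))
                return false;
            irq_pending = false; /* wake up and run the handler */
            break;
        }
    }
    return true;
}

/* "sti; hlt" back to back: HLT executes inside the shadow. */
bool toy_sti_hlt_wakes(void)
{
    const enum toy_insn prog[] = { TOY_STI, TOY_HLT };
    return toy_run(prog, 2, true);
}

/* An instruction between STI and HLT opens a window for the IRQ. */
bool toy_sti_nop_hlt_wakes(void)
{
    const enum toy_insn prog[] = { TOY_STI, TOY_NOP, TOY_HLT };
    return toy_run(prog, 3, true);
}
```

In this model the back-to-back sequence always reaches HLT with the interrupt
still pending, so the halt is woken, whereas the sequence with an extra
instruction lets the handler run first and then halts with nothing left to
wake it.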

Patch

diff --git a/x86/tscdeadline_latency.c b/x86/tscdeadline_latency.c
index 0617a1b..4ee5917 100644
--- a/x86/tscdeadline_latency.c
+++ b/x86/tscdeadline_latency.c
@@ -118,9 +118,9 @@  int main(int argc, char **argv)
     test_tsc_deadline_timer();
     irq_enable();
 
-    do {
+    /* The condition might have triggered already, so check before HLT. */
+    while (!hitmax && table_idx < size)
         asm volatile("hlt");
-    } while (!hitmax && table_idx < size);
 
     for (i = 0; i < table_idx; i++) {
         if (hitmax && i == table_idx-1)