diff mbox series

[v1] s390x/tcg: MVCL: Exit to main loop if there are pending interrupts

Message ID 20191001181655.25948-1-david@redhat.com (mailing list archive)
State New, archived
Headers show
Series [v1] s390x/tcg: MVCL: Exit to main loop if there are pending interrupts | expand

Commit Message

David Hildenbrand Oct. 1, 2019, 6:16 p.m. UTC
MVCL is interruptible and we should check for interrupts and process
them after writing back the variables to the registers.

I can see both checks triggering. Most of them pass the first check,
however, sometimes also the second check strikes and an interrupt gets
delivered. (I assume pending interrupts that were not deliverable when
injected but are now deliverable)

When booting Fedora 30, I can see a handful of these exits and it seems
to work reliable. (it never get's triggered via EXECUTE, though)

Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/mem_helper.c | 28 +++++++++++++++++++++++++++-
 1 file changed, 27 insertions(+), 1 deletion(-)

Comments

Richard Henderson Oct. 1, 2019, 7:17 p.m. UTC | #1
On 10/1/19 11:16 AM, David Hildenbrand wrote:
> +static inline bool should_interrupt_instruction(CPUState *cs)
> +{
> +    /*
> +     * Something asked us to stop executing chained TBs, e.g.,
> +     * cpu_interrupt() or cpu_exit().
> +     */
> +    if ((int32_t)atomic_read(&cpu_neg(cs)->icount_decr.u32) < 0) {
> +        return true;
> +    }
> +
> +    /* We have a deliverable interrupt pending. */
> +    if ((atomic_read(&cs->interrupt_request) & CPU_INTERRUPT_HARD) &&
> +        s390_cpu_has_int(S390_CPU(cs))) {
> +        return true;
> +    }
> +    return false;
> +}

The first condition should be true whenever the second condition is true.

In particular, tcg_handle_interrupt sets icount_decr.u16.high = -1 for
qemu_cpu_is_self; otherwise, qemu_cpu_kick calls cpu_exit which does the same
thing.

Think of it this way: we only test icount_decr.u32 at the start of each TB, and
that's the only thing we have that brings us back to the main loop for any
other kind of interrupt.


r~
David Hildenbrand Oct. 1, 2019, 7:47 p.m. UTC | #2
On 01.10.19 21:17, Richard Henderson wrote:
> On 10/1/19 11:16 AM, David Hildenbrand wrote:
>> +static inline bool should_interrupt_instruction(CPUState *cs)
>> +{
>> +    /*
>> +     * Something asked us to stop executing chained TBs, e.g.,
>> +     * cpu_interrupt() or cpu_exit().
>> +     */
>> +    if ((int32_t)atomic_read(&cpu_neg(cs)->icount_decr.u32) < 0) {
>> +        return true;
>> +    }
>> +
>> +    /* We have a deliverable interrupt pending. */
>> +    if ((atomic_read(&cs->interrupt_request) & CPU_INTERRUPT_HARD) &&
>> +        s390_cpu_has_int(S390_CPU(cs))) {
>> +        return true;
>> +    }
>> +    return false;
>> +}
> 
> The first condition should be true whenever the second condition is true.

@@ -1018,6 +1018,7 @@ static inline bool should_interrupt_instruction(CPUState *cs)
     /* We have a deliverable interrupt pending. */
     if ((atomic_read(&cs->interrupt_request) & CPU_INTERRUPT_HARD) &&
         s390_cpu_has_int(S390_CPU(cs))) {
+        g_assert((int32_t)atomic_read(&cpu_neg(cs)->icount_decr.u32) < 0);
         return true;
     }
     return false;


...


[   60.109761] systemd[1]: Set hostname to <rhel8>.
**
ERROR:/home/dhildenb/git/qemu/target/s390x/mem_helper.c:1021:should_interrupt_instruction: assertion failed: ((int32_t)atomic_read(&cpu_neg(cs)->icount_decr.u32) < 0)


A race? Roughly 20-30% pass the first but not the second check. And
in total, on a Fedora 30 boot, I can maybe see 30 calls of
should_interrupt_instruction() succeeding.

I thought these could be pending interrupts that were not deliverable
when injected but are now deliverable. For these,
icount_decr.u32.high would already have been set to 0.

OTOH, I guess we always exit the TB in case we change the "deliverable" state
of an IRQ, e.g., after LPSW or LCTL. E.g.,

static DisasJumpType op_lctlg(DisasContext *s, DisasOps *o)
{
...
    /* Exit to main loop to reevaluate s390_cpu_exec_interrupt.  */
    return DISAS_PC_STALE_NOCHAIN;
}

Maybe really a race then - or we are not properly exiting back to the
main loop in all scenarios.

> 
> In particular, tcg_handle_interrupt sets icount_decr.u16.high = -1 for
> qemu_cpu_is_self; otherwise, qemu_cpu_kick calls cpu_exit which does the same
> thing.
> 
> Think of it this way: we only test icount_decr.u32 at the start of each TB, and
> that's the only thing we have that brings us back to the main loop for any
> other kind of interrupt.
>
Richard Henderson Oct. 1, 2019, 9:59 p.m. UTC | #3
On 10/1/19 12:47 PM, David Hildenbrand wrote:
> On 01.10.19 21:17, Richard Henderson wrote:
>> On 10/1/19 11:16 AM, David Hildenbrand wrote:
>>> +static inline bool should_interrupt_instruction(CPUState *cs)
>>> +{
>>> +    /*
>>> +     * Something asked us to stop executing chained TBs, e.g.,
>>> +     * cpu_interrupt() or cpu_exit().
>>> +     */
>>> +    if ((int32_t)atomic_read(&cpu_neg(cs)->icount_decr.u32) < 0) {
>>> +        return true;
>>> +    }
>>> +
>>> +    /* We have a deliverable interrupt pending. */
>>> +    if ((atomic_read(&cs->interrupt_request) & CPU_INTERRUPT_HARD) &&
>>> +        s390_cpu_has_int(S390_CPU(cs))) {
>>> +        return true;
>>> +    }
>>> +    return false;
>>> +}
>>
>> The first condition should be true whenever the second condition is true.
> 
> @@ -1018,6 +1018,7 @@ static inline bool should_interrupt_instruction(CPUState *cs)
>      /* We have a deliverable interrupt pending. */
>      if ((atomic_read(&cs->interrupt_request) & CPU_INTERRUPT_HARD) &&
>          s390_cpu_has_int(S390_CPU(cs))) {
> +        g_assert((int32_t)atomic_read(&cpu_neg(cs)->icount_decr.u32) < 0);
>          return true;
>      }
>      return false;
> 
> 
> ...
> 
> 
> [   60.109761] systemd[1]: Set hostname to <rhel8>.
> **
> ERROR:/home/dhildenb/git/qemu/target/s390x/mem_helper.c:1021:should_interrupt_instruction: assertion failed: ((int32_t)atomic_read(&cpu_neg(cs)->icount_decr.u32) < 0)
> 
> 
> A race? Roughly 20-30% pass the first but not the second check. And
> in total, on a Fedora 30 boot, I can maybe see 30 calls of
> should_interrupt_instruction() succeeding.
> 
> I thought these could be pending interrupts that were not deliverable
> when injected but are now deliverable. For these,
> icount_decr.u32.high would already have been set to 0.
> 
> OTOH, I guess we always exit the TB in case we change the "deliverable" state
> of an IRQ, e.g., after LPSW or LCTL. E.g.,
> 
> static DisasJumpType op_lctlg(DisasContext *s, DisasOps *o)
> {
> ...
>     /* Exit to main loop to reevaluate s390_cpu_exec_interrupt.  */
>     return DISAS_PC_STALE_NOCHAIN;
> }
> 
> Maybe really a race then - or we are not properly exiting back to the
> main loop in all scenarios.

I think that it's a race right here in should_interrupt_instruction.

Notice, interrupt_request gets set before icount_decr.  Indeed, the barrier
happens immediately before the set of icount_decr in cpu_exit().

(It is briefly confusing that we have a barrier in cpu_exit and not in
tcg_handle_interrupt.  But that's explained by the cpu_is_self -- no need for a
barrier for the current cpu.  I also think we could usefully use
atomic_store_release instead of a separate smp_wmb.)

Therefore checking interrupt_request after checking icount_decr violates the
ordering rules.

This is confirmed, ish, by noticing putting a breakpoint at that second return
(or assert) and noticing that icount_decr.u16.hi == -1.  It did get set by one
of the other threads, and before gdb managed to stop the world.


r~
David Hildenbrand Oct. 2, 2019, 7:04 a.m. UTC | #4
On 01.10.19 23:59, Richard Henderson wrote:
> On 10/1/19 12:47 PM, David Hildenbrand wrote:
>> On 01.10.19 21:17, Richard Henderson wrote:
>>> On 10/1/19 11:16 AM, David Hildenbrand wrote:
>>>> +static inline bool should_interrupt_instruction(CPUState *cs)
>>>> +{
>>>> +    /*
>>>> +     * Something asked us to stop executing chained TBs, e.g.,
>>>> +     * cpu_interrupt() or cpu_exit().
>>>> +     */
>>>> +    if ((int32_t)atomic_read(&cpu_neg(cs)->icount_decr.u32) < 0) {
>>>> +        return true;
>>>> +    }
>>>> +
>>>> +    /* We have a deliverable interrupt pending. */
>>>> +    if ((atomic_read(&cs->interrupt_request) & CPU_INTERRUPT_HARD) &&
>>>> +        s390_cpu_has_int(S390_CPU(cs))) {
>>>> +        return true;
>>>> +    }
>>>> +    return false;
>>>> +}
>>>
>>> The first condition should be true whenever the second condition is true.
>>
>> @@ -1018,6 +1018,7 @@ static inline bool should_interrupt_instruction(CPUState *cs)
>>      /* We have a deliverable interrupt pending. */
>>      if ((atomic_read(&cs->interrupt_request) & CPU_INTERRUPT_HARD) &&
>>          s390_cpu_has_int(S390_CPU(cs))) {
>> +        g_assert((int32_t)atomic_read(&cpu_neg(cs)->icount_decr.u32) < 0);
>>          return true;
>>      }
>>      return false;
>>
>>
>> ...
>>
>>
>> [   60.109761] systemd[1]: Set hostname to <rhel8>.
>> **
>> ERROR:/home/dhildenb/git/qemu/target/s390x/mem_helper.c:1021:should_interrupt_instruction: assertion failed: ((int32_t)atomic_read(&cpu_neg(cs)->icount_decr.u32) < 0)
>>
>>
>> A race? Roughly 20-30% pass the first but not the second check. And
>> in total, on a Fedora 30 boot, I can maybe see 30 calls of
>> should_interrupt_instruction() succeeding.
>>
>> I thought these could be pending interrupts that were not deliverable
>> when injected but are now deliverable. For these,
>> icount_decr.u32.high would already have been set to 0.
>>
>> OTOH, I guess we always exit the TB in case we change the "deliverable" state
>> of an IRQ, e.g., after LPSW or LCTL. E.g.,
>>
>> static DisasJumpType op_lctlg(DisasContext *s, DisasOps *o)
>> {
>> ...
>>     /* Exit to main loop to reevaluate s390_cpu_exec_interrupt.  */
>>     return DISAS_PC_STALE_NOCHAIN;
>> }
>>
>> Maybe really a race then - or we are not properly exiting back to the
>> main loop in all scenarios.
> 
> I think that it's a race right here in should_interrupt_instruction.
> 
> Notice, interrupt_request gets set before icount_decr.  Indeed, the barrier
> happens immediately before the set of icount_decr in cpu_exit().
> 
> (It is briefly confusing that we have a barrier in cpu_exit and not in
> tcg_handle_interrupt.  But that's explained by the cpu_is_self -- no need for a
> barrier for the current cpu.  I also think we could usefully use
> atomic_store_release instead of a separate smp_wmb.)
> 
> Therefore checking interrupt_request after checking icount_decr violates the
> ordering rules.
> 
> This is confirmed, ish, by noticing putting a breakpoint at that second return
> (or assert) and noticing that icount_decr.u16.hi == -1.  It did get set by one
> of the other threads, and before gdb managed to stop the world.

I am still puzzled why I can trigger that many races. I'll do some more
investigation, however, if there would be a deliverable interrupt which
will not set icount_decr.u16.hi == -1, then we would have to fix
something that ionstead. So the second check in this patch can indeed
go, thanks for clarifying!
David Hildenbrand Oct. 2, 2019, 8:19 a.m. UTC | #5
On 02.10.19 09:04, David Hildenbrand wrote:
> On 01.10.19 23:59, Richard Henderson wrote:
>> On 10/1/19 12:47 PM, David Hildenbrand wrote:
>>> On 01.10.19 21:17, Richard Henderson wrote:
>>>> On 10/1/19 11:16 AM, David Hildenbrand wrote:
>>>>> +static inline bool should_interrupt_instruction(CPUState *cs)
>>>>> +{
>>>>> +    /*
>>>>> +     * Something asked us to stop executing chained TBs, e.g.,
>>>>> +     * cpu_interrupt() or cpu_exit().
>>>>> +     */
>>>>> +    if ((int32_t)atomic_read(&cpu_neg(cs)->icount_decr.u32) < 0) {
>>>>> +        return true;
>>>>> +    }
>>>>> +
>>>>> +    /* We have a deliverable interrupt pending. */
>>>>> +    if ((atomic_read(&cs->interrupt_request) & CPU_INTERRUPT_HARD) &&
>>>>> +        s390_cpu_has_int(S390_CPU(cs))) {
>>>>> +        return true;
>>>>> +    }
>>>>> +    return false;
>>>>> +}
>>>>
>>>> The first condition should be true whenever the second condition is true.
>>>
>>> @@ -1018,6 +1018,7 @@ static inline bool should_interrupt_instruction(CPUState *cs)
>>>      /* We have a deliverable interrupt pending. */
>>>      if ((atomic_read(&cs->interrupt_request) & CPU_INTERRUPT_HARD) &&
>>>          s390_cpu_has_int(S390_CPU(cs))) {
>>> +        g_assert((int32_t)atomic_read(&cpu_neg(cs)->icount_decr.u32) < 0);
>>>          return true;
>>>      }
>>>      return false;
>>>
>>>
>>> ...
>>>
>>>
>>> [   60.109761] systemd[1]: Set hostname to <rhel8>.
>>> **
>>> ERROR:/home/dhildenb/git/qemu/target/s390x/mem_helper.c:1021:should_interrupt_instruction: assertion failed: ((int32_t)atomic_read(&cpu_neg(cs)->icount_decr.u32) < 0)
>>>
>>>
>>> A race? Roughly 20-30% pass the first but not the second check. And
>>> in total, on a Fedora 30 boot, I can maybe see 30 calls of
>>> should_interrupt_instruction() succeeding.
>>>
>>> I thought these could be pending interrupts that were not deliverable
>>> when injected but are now deliverable. For these,
>>> icount_decr.u32.high would already have been set to 0.
>>>
>>> OTOH, I guess we always exit the TB in case we change the "deliverable" state
>>> of an IRQ, e.g., after LPSW or LCTL. E.g.,
>>>
>>> static DisasJumpType op_lctlg(DisasContext *s, DisasOps *o)
>>> {
>>> ...
>>>     /* Exit to main loop to reevaluate s390_cpu_exec_interrupt.  */
>>>     return DISAS_PC_STALE_NOCHAIN;
>>> }
>>>
>>> Maybe really a race then - or we are not properly exiting back to the
>>> main loop in all scenarios.
>>
>> I think that it's a race right here in should_interrupt_instruction.
>>
>> Notice, interrupt_request gets set before icount_decr.  Indeed, the barrier
>> happens immediately before the set of icount_decr in cpu_exit().
>>
>> (It is briefly confusing that we have a barrier in cpu_exit and not in
>> tcg_handle_interrupt.  But that's explained by the cpu_is_self -- no need for a
>> barrier for the current cpu.  I also think we could usefully use
>> atomic_store_release instead of a separate smp_wmb.)
>>
>> Therefore checking interrupt_request after checking icount_decr violates the
>> ordering rules.
>>
>> This is confirmed, ish, by noticing putting a breakpoint at that second return
>> (or assert) and noticing that icount_decr.u16.hi == -1.  It did get set by one
>> of the other threads, and before gdb managed to stop the world.
> 
> I am still puzzled why I can trigger that many races. I'll do some more
> investigation, however, if there would be a deliverable interrupt which
> will not set icount_decr.u16.hi == -1, then we would have to fix
> something that ionstead. So the second check in this patch can indeed
> go, thanks for clarifying!
> 

So there is one place I once introduced that is at least prone to a very
rare race when CPUs are waking up:
hw/intc/s390_flic.c:qemu_s390_flic_notify()

We set "cs->interrupt_request |= CPU_INTERRUPT_HARD" for all CPUs -
especially also STOPPED ones and trigger cpu_interrupt(cs,
CPU_INTERRUPT_HARD) only if
a) the CPU is running
b) the CPU is sleeping and could eventually consume the interrupt

There is a very small chance that a CPU would wake up and not get a
cpu_interrupt(cs, CPU_INTERRUPT_HARD). In the worst case, the I/O
interrupt would be delayed on that CPU, usually another CPU will pick it
up - totally acceptable.


If pending interrupts are deliverable is affected by the PSW mask and
control registers:
- lctl(g): We do a DISAS_PC_STALE_NOCHAIN
- lpsw(e): We do a DISAS_NORETURN
- stnosm: We do a DISAS_PC_STALE_NOCHAIN
- ssm: We do a DISAS_PC_STALE_NOCHAIN

Of course, the PSW will be changed when injecting exceptions, but that
path should also properly check for other deliverable interrupts (I once
added a deliver loop to s390_cpu_do_interrupt()).

So I think the only option really is that I was seeing a bunch of the
races you explained.
diff mbox series

Patch

diff --git a/target/s390x/mem_helper.c b/target/s390x/mem_helper.c
index 4254548935..96f0728cb7 100644
--- a/target/s390x/mem_helper.c
+++ b/target/s390x/mem_helper.c
@@ -1005,6 +1005,24 @@  static inline uint32_t do_mvcl(CPUS390XState *env,
     return *destlen ? 3 : cc;
 }
 
+static inline bool should_interrupt_instruction(CPUState *cs)
+{
+    /*
+     * Something asked us to stop executing chained TBs, e.g.,
+     * cpu_interrupt() or cpu_exit().
+     */
+    if ((int32_t)atomic_read(&cpu_neg(cs)->icount_decr.u32) < 0) {
+        return true;
+    }
+
+    /* We have a deliverable interrupt pending. */
+    if ((atomic_read(&cs->interrupt_request) & CPU_INTERRUPT_HARD) &&
+        s390_cpu_has_int(S390_CPU(cs))) {
+        return true;
+    }
+    return false;
+}
+
 /* move long */
 uint32_t HELPER(mvcl)(CPUS390XState *env, uint32_t r1, uint32_t r2)
 {
@@ -1015,6 +1033,7 @@  uint32_t HELPER(mvcl)(CPUS390XState *env, uint32_t r1, uint32_t r2)
     uint64_t srclen = env->regs[r2 + 1] & 0xffffff;
     uint64_t src = get_address(env, r2);
     uint8_t pad = env->regs[r2 + 1] >> 24;
+    CPUState *cs = env_cpu(env);
     S390Access srca, desta;
     uint32_t cc, cur_len;
 
@@ -1065,7 +1084,14 @@  uint32_t HELPER(mvcl)(CPUS390XState *env, uint32_t r1, uint32_t r2)
         env->regs[r1 + 1] = deposit64(env->regs[r1 + 1], 0, 24, destlen);
         set_address_zero(env, r1, dest);
 
-        /* TODO: Deliver interrupts. */
+        /*
+         * MVCL is interruptible. Check if there is any irq, and if so,
+         * return to the main loop where we can process it. In case we
+         * don't deliver an interrupt, we'll end up back in this handler.
+         */
+        if (unlikely(should_interrupt_instruction(cs))) {
+            cpu_loop_exit_restore(cs, ra);
+        }
     }
     return cc;
 }