diff mbox series

target/arm/helper: Fix timer interrupt masking when HCR_EL2.E2H == 0

Message ID 20240615185423.49474-1-florian.lugou@provenrun.com (mailing list archive)
State New, archived
Headers show
Series target/arm/helper: Fix timer interrupt masking when HCR_EL2.E2H == 0 | expand

Commit Message

Florian Lugou June 15, 2024, 6:54 p.m. UTC
CNTHCTL_EL2 based masking of timer interrupts was introduced in
f6fc36deef6abcee406211f3e2f11ff894b87fa4. This masking was however
effective no matter whether EL2 was enabled in the current security
state or not, contrary to arm specification.

Signed-off-by: Florian Lugou <florian.lugou@provenrun.com>
---
 target/arm/helper.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Peter Maydell June 20, 2024, 10:43 a.m. UTC | #1
On Sat, 15 Jun 2024 at 19:56, Florian Lugou <florian.lugou@provenrun.com> wrote:
>
> CNTHCTL_EL2 based masking of timer interrupts was introduced in
> f6fc36deef6abcee406211f3e2f11ff894b87fa4. This masking was however
> effective no matter whether EL2 was enabled in the current security
> state or not, contrary to arm specification.
>
> Signed-off-by: Florian Lugou <florian.lugou@provenrun.com>
> ---
>  target/arm/helper.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/target/arm/helper.c b/target/arm/helper.c
> index ce31957235..60e2344c68 100644
> --- a/target/arm/helper.c
> +++ b/target/arm/helper.c
> @@ -2684,7 +2684,8 @@ static void gt_update_irq(ARMCPU *cpu, int timeridx)
>       * If bit CNTHCTL_EL2.CNT[VP]MASK is set, it overrides IMASK.
>       * It is RES0 in Secure and NonSecure state.
>       */
> -    if ((ss == ARMSS_Root || ss == ARMSS_Realm) &&
> +    if ((arm_hcr_el2_eff(env) & HCR_E2H) &&
> +        (ss == ARMSS_Root || ss == ARMSS_Realm) &&

When the architecture says "is EL2 enabled in the current security state"
it doesn't mean "is HCR_EL2.E2H set?", it means "is this either NonSecure/Realm
or else is SCR_EL2.EEL2 set?". Compare the pseudocode EL2Enabled()
and QEMU's arm_is_el2_enabled() and arm_is_el2_enabled_secstate() functions.
This doesn't mean much in Root state, and for Realm state EL2 is always
enabled (assuming it is implemented).

For this timer check, we're doing I think the same thing as the
pseudocode AArch64.CheckTimerConditions(), which does:

  if (IsFeatureImplemented(FEAT_RME) && ss IN {SS_Root, SS_Realm} &&
      CNTHCTL_EL2.CNTPMASK == '1') then
     imask = '1';

so I'm inclined to say that our current implementation in QEMU is correct.

>          ((timeridx == GTIMER_VIRT && (cnthctl & R_CNTHCTL_CNTVMASK_MASK)) ||
>           (timeridx == GTIMER_PHYS && (cnthctl & R_CNTHCTL_CNTPMASK_MASK)))) {
>          irqstate = 0;
> --

thanks
-- PMM
Florian Lugou June 20, 2024, 1:56 p.m. UTC | #2
On Thu, Jun 20, 2024 at 11:43:17AM +0100, Peter Maydell wrote:
> On Sat, 15 Jun 2024 at 19:56, Florian Lugou <florian.lugou@provenrun.com> wrote:
> >
> > CNTHCTL_EL2 based masking of timer interrupts was introduced in
> > f6fc36deef6abcee406211f3e2f11ff894b87fa4. This masking was however
> > effective no matter whether EL2 was enabled in the current security
> > state or not, contrary to arm specification.
> >
> > Signed-off-by: Florian Lugou <florian.lugou@provenrun.com>
> > ---
> >  target/arm/helper.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/target/arm/helper.c b/target/arm/helper.c
> > index ce31957235..60e2344c68 100644
> > --- a/target/arm/helper.c
> > +++ b/target/arm/helper.c
> > @@ -2684,7 +2684,8 @@ static void gt_update_irq(ARMCPU *cpu, int timeridx)
> >       * If bit CNTHCTL_EL2.CNT[VP]MASK is set, it overrides IMASK.
> >       * It is RES0 in Secure and NonSecure state.
> >       */
> > -    if ((ss == ARMSS_Root || ss == ARMSS_Realm) &&
> > +    if ((arm_hcr_el2_eff(env) & HCR_E2H) &&
> > +        (ss == ARMSS_Root || ss == ARMSS_Realm) &&
> 
> When the architecture says "is EL2 enabled in the current security state"
> it doesn't mean "is HCR_EL2.E2H set?", it means "is this either NonSecure/Realm
> or else is SCR_EL2.EEL2 set?". Compare the pseudocode EL2Enabled()
> and QEMU's arm_is_el2_enabled() and arm_is_el2_enabled_secstate() functions.
> This doesn't mean much in Root state, and for Realm state EL2 is always
> enabled (assuming it is implemented).
> 
> For this timer check, we're doing I think the same thing as the
> pseudocode AArch64.CheckTimerConditions(), which does:
> 
>   if (IsFeatureImplemented(FEAT_RME) && ss IN {SS_Root, SS_Realm} &&
>       CNTHCTL_EL2.CNTPMASK == '1') then
>      imask = '1';
> 
> so I'm inclined to say that our current implementation in QEMU is correct.

Indeed. I got confused with the specification, my apologies.

I am facing an issue with QEMU freezing waiting for a timer interrupt when
running with -icount shift=0,sleep=off. Bissection has shown that the issue
appeared with f6fc36deef6abcee406211f3e2f11ff894b87fa4.

Further testing suggests that the issue may come from gt_recalc_timer. Calling
gt_update_irq before timer_mod (as it was done before f6fc36deef6a) rather than
at the end of the function solves the issue. Is it possible that timer_mod
relies on cpu->gt_timer_outputs, which has not been modified at this point to
reflect the timer triggering?

> 
> >          ((timeridx == GTIMER_VIRT && (cnthctl & R_CNTHCTL_CNTVMASK_MASK)) ||
> >           (timeridx == GTIMER_PHYS && (cnthctl & R_CNTHCTL_CNTPMASK_MASK)))) {
> >          irqstate = 0;
> > --
> 
> thanks
> -- PMM

Best,
Peter Maydell June 20, 2024, 7:01 p.m. UTC | #3
On Thu, 20 Jun 2024 at 14:56, Florian Lugou <florian.lugou@provenrun.com> wrote:
>
> On Thu, Jun 20, 2024 at 11:43:17AM +0100, Peter Maydell wrote:
> > For this timer check, we're doing I think the same thing as the
> > pseudocode AArch64.CheckTimerConditions(), which does:
> >
> >   if (IsFeatureImplemented(FEAT_RME) && ss IN {SS_Root, SS_Realm} &&
> >       CNTHCTL_EL2.CNTPMASK == '1') then
> >      imask = '1';
> >
> > so I'm inclined to say that our current implementation in QEMU is correct.
>
> Indeed. I got confused with the specification, my apologies.
>
> I am facing an issue with QEMU freezing waiting for a timer interrupt when
> running with -icount shift=0,sleep=off. Bissection has shown that the issue
> appeared with f6fc36deef6abcee406211f3e2f11ff894b87fa4.
>
> Further testing suggests that the issue may come from gt_recalc_timer. Calling
> gt_update_irq before timer_mod (as it was done before f6fc36deef6a) rather than
> at the end of the function solves the issue. Is it possible that timer_mod
> relies on cpu->gt_timer_outputs, which has not been modified at this point to
> reflect the timer triggering?

I don't *think* it ought to care -- timer_mod() tells QEMU's timer
infrastructure when to schedule the next timer callback for,
and the gt_timer_outputs qemu_irqs tell the interrupt controller
that the interrupt lines have changed state.

Do you have a reproduce case?

-- PMM
Florian Lugou June 21, 2024, 2:07 p.m. UTC | #4
On Thu, Jun 20, 2024 at 08:01:01PM +0100, Peter Maydell wrote:
> On Thu, 20 Jun 2024 at 14:56, Florian Lugou <florian.lugou@provenrun.com> wrote:
> >
> > On Thu, Jun 20, 2024 at 11:43:17AM +0100, Peter Maydell wrote:
> > > For this timer check, we're doing I think the same thing as the
> > > pseudocode AArch64.CheckTimerConditions(), which does:
> > >
> > >   if (IsFeatureImplemented(FEAT_RME) && ss IN {SS_Root, SS_Realm} &&
> > >       CNTHCTL_EL2.CNTPMASK == '1') then
> > >      imask = '1';
> > >
> > > so I'm inclined to say that our current implementation in QEMU is correct.
> >
> > Indeed. I got confused with the specification, my apologies.
> >
> > I am facing an issue with QEMU freezing waiting for a timer interrupt when
> > running with -icount shift=0,sleep=off. Bissection has shown that the issue
> > appeared with f6fc36deef6abcee406211f3e2f11ff894b87fa4.
> >
> > Further testing suggests that the issue may come from gt_recalc_timer. Calling
> > gt_update_irq before timer_mod (as it was done before f6fc36deef6a) rather than
> > at the end of the function solves the issue. Is it possible that timer_mod
> > relies on cpu->gt_timer_outputs, which has not been modified at this point to
> > reflect the timer triggering?
> 
> I don't *think* it ought to care -- timer_mod() tells QEMU's timer
> infrastructure when to schedule the next timer callback for,
> and the gt_timer_outputs qemu_irqs tell the interrupt controller
> that the interrupt lines have changed state.
> 
> Do you have a reproduce case?

I do:

$ cat test.S
.section .text
.global __start
__start:
    /* Setup exception table */
    ldr     x0, =exc_vector_table
    msr     vbar_el3, x0

    /* Enable and mask secure physical timer */
    mrs     x0, CNTPS_CTL_EL1
    orr     x0, x0, 3
    msr     CNTPS_CTL_EL1, x0

    mov     x0, 0x8000000 /* GIC base address */

    /* Enable group 0 */
    ldr     w1, [x0, 0] /* GICD_CTLR */
    orr     w1, w1, 0x1
    str     w1, [x0, 0] /* GICD_CTLR */

    /* Enable timer interrupt */
    add     x0, x0, 0xb0000 /* SGI_base */
    mov     w1, (1 << 29)
    str     w1, [x0, 0x100] /* GICR_ISENABLER0 */

    /* Enable all priorities */
    mov     x0, 0xff
    msr     ICC_PMR_EL1, x0
    mov     x0, 1
    msr     ICC_IGRPEN0_EL1, x0

    /* Set timer compare value ~0.8s in the future */
    mrs     x0, CNTPCT_EL0
    mov     x1, 0x3000000
    add     x0, x0, x1
    msr     CNTPS_CVAL_EL1, x0

    /* Unmask the timer */
    mrs     x0, CNTPS_CTL_EL1
    bic     x0, x0, 2
    msr     CNTPS_CTL_EL1, x0

    /* Enable interrupts */
    mrs     x0, SCR_EL3
    orr     x0, x0, 4
    msr     SCR_EL3, x0
    msr     daifclr, 0x1
    dsb     sy

    /* Loop on WFI */
0:
    wfi
    b       0b

.macro EXIT
    .p2align 7
    /* Issue a SYS_EXIT semihosting call */
    mov     x0, 0x18
    .word 0xD45E0000
    /* unreachable */
    b       .
.endm

.macro HOLE
    .p2align 7
    b       .
.endm

.p2align 11
exc_vector_table:
    HOLE /* Synchronous, from EL3, with SP_EL0 */
    HOLE /* IRQ,         from EL3, with SP_EL0 */
    HOLE /* FIQ,         from EL3, with SP_EL0 */
    HOLE /* SError,      from EL3, with SP_EL0 */

    HOLE /* Synchronous, from EL3, with SP_ELx */
    HOLE /* IRQ,         from EL3, with SP_ELx */
    EXIT /* FIQ,         from EL3, with SP_ELx */
    HOLE /* SError,      from EL3, with SP_ELx */

    HOLE /* Synchronous, from lower, with lvl n-1 aarch64 */
    HOLE /* IRQ,         from lower, with lvl n-1 aarch64 */
    HOLE /* FIQ,         from lower, with lvl n-1 aarch64 */
    HOLE /* SError,      from lower, with lvl n-1 aarch64 */

    HOLE /* Synchronous, from lower, with lvl n-1 aarch32 */
    HOLE /* IRQ,         from lower, with lvl n-1 aarch32 */
    HOLE /* FIQ,         from lower, with lvl n-1 aarch32 */
    HOLE /* SError,      from lower, with lvl n-1 aarch32 */

$ aarch64-none-elf-gcc -ffreestanding -nostdlib -T qemu/tests/tcg/aarch64/system/kernel.ld -o test test.S

$ qemu-system-aarch64 \
        -machine virt,secure=on,gic-version=3 \
        -cpu cortex-a57 \
        -kernel test \
        -display none \
        -semihosting

$ # Exits after ~1s

$ qemu-system-aarch64 \
        -machine virt,secure=on,gic-version=3 \
        -cpu cortex-a57 \
        -kernel test \
        -display none \
        -semihosting \
        -icount shift=0,sleep=off

... (hangs until QEMU is killed)

Best,
Peter Maydell Aug. 13, 2024, 12:13 p.m. UTC | #5
On Fri, 21 Jun 2024 at 15:07, Florian Lugou <florian.lugou@provenrun.com> wrote:
>
> On Thu, Jun 20, 2024 at 08:01:01PM +0100, Peter Maydell wrote:
> > On Thu, 20 Jun 2024 at 14:56, Florian Lugou <florian.lugou@provenrun.com> wrote:
> > >
> > > On Thu, Jun 20, 2024 at 11:43:17AM +0100, Peter Maydell wrote:
> > > > For this timer check, we're doing I think the same thing as the
> > > > pseudocode AArch64.CheckTimerConditions(), which does:
> > > >
> > > >   if (IsFeatureImplemented(FEAT_RME) && ss IN {SS_Root, SS_Realm} &&
> > > >       CNTHCTL_EL2.CNTPMASK == '1') then
> > > >      imask = '1';
> > > >
> > > > so I'm inclined to say that our current implementation in QEMU is correct.
> > >
> > > Indeed. I got confused with the specification, my apologies.
> > >
> > > I am facing an issue with QEMU freezing waiting for a timer interrupt when
> > > running with -icount shift=0,sleep=off. Bissection has shown that the issue
> > > appeared with f6fc36deef6abcee406211f3e2f11ff894b87fa4.
> > >
> > > Further testing suggests that the issue may come from gt_recalc_timer. Calling
> > > gt_update_irq before timer_mod (as it was done before f6fc36deef6a) rather than
> > > at the end of the function solves the issue. Is it possible that timer_mod
> > > relies on cpu->gt_timer_outputs, which has not been modified at this point to
> > > reflect the timer triggering?
> >
> > I don't *think* it ought to care -- timer_mod() tells QEMU's timer
> > infrastructure when to schedule the next timer callback for,
> > and the gt_timer_outputs qemu_irqs tell the interrupt controller
> > that the interrupt lines have changed state.
> >
> > Do you have a reproduce case?
>
> I do:

(Sorry I didn't get back to you on this earlier.)

> $ aarch64-none-elf-gcc -ffreestanding -nostdlib -T qemu/tests/tcg/aarch64/system/kernel.ld -o test test.S
>
> $ qemu-system-aarch64 \
>         -machine virt,secure=on,gic-version=3 \
>         -cpu cortex-a57 \
>         -kernel test \
>         -display none \
>         -semihosting
>
> $ # Exits after ~1s
>
> $ qemu-system-aarch64 \
>         -machine virt,secure=on,gic-version=3 \
>         -cpu cortex-a57 \
>         -kernel test \
>         -display none \
>         -semihosting \
>         -icount shift=0,sleep=off
>
> ... (hangs until QEMU is killed)

For me, with QEMU commit 9eb51530c12ae645b, this test case
exits (doesn't hang) with both these command lines. Do you
still see this bug? I guess it's possible we fixed it in
the last month or so, though I can't see anything obviously
relevant in the git logs.

thanks
-- PMM
Florian Lugou Aug. 20, 2024, 11:30 a.m. UTC | #6
> > $ aarch64-none-elf-gcc -ffreestanding -nostdlib -T qemu/tests/tcg/aarch64/system/kernel.ld -o test test.S
> >
> > $ qemu-system-aarch64 \
> >         -machine virt,secure=on,gic-version=3 \
> >         -cpu cortex-a57 \
> >         -kernel test \
> >         -display none \
> >         -semihosting
> >
> > $ # Exits after ~1s
> >
> > $ qemu-system-aarch64 \
> >         -machine virt,secure=on,gic-version=3 \
> >         -cpu cortex-a57 \
> >         -kernel test \
> >         -display none \
> >         -semihosting \
> >         -icount shift=0,sleep=off
> >
> > ... (hangs until QEMU is killed)
> 
> For me, with QEMU commit 9eb51530c12ae645b, this test case
> exits (doesn't hang) with both these command lines. Do you
> still see this bug? I guess it's possible we fixed it in
> the last month or so, though I can't see anything obviously
> relevant in the git logs.

Thank you for taking the time to test it.

On my machine (Ubuntu 22.04), with QEMU configuration options
"--target-list=aarch64-softmmu --enable-debug", running the provided test case
with "-icount shift=0,sleep=off" still makes QEMU hang forever on commit
9eb51530c12ae645b.

The issue was initially reported by a colleague of mine so I was hoping it would
be somehow reliably reproducible. But apparently it is not.

I will try to find some time to investigate a bit more.

Thank you,
diff mbox series

Patch

diff --git a/target/arm/helper.c b/target/arm/helper.c
index ce31957235..60e2344c68 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -2684,7 +2684,8 @@  static void gt_update_irq(ARMCPU *cpu, int timeridx)
      * If bit CNTHCTL_EL2.CNT[VP]MASK is set, it overrides IMASK.
      * It is RES0 in Secure and NonSecure state.
      */
-    if ((ss == ARMSS_Root || ss == ARMSS_Realm) &&
+    if ((arm_hcr_el2_eff(env) & HCR_E2H) &&
+        (ss == ARMSS_Root || ss == ARMSS_Realm) &&
         ((timeridx == GTIMER_VIRT && (cnthctl & R_CNTHCTL_CNTVMASK_MASK)) ||
          (timeridx == GTIMER_PHYS && (cnthctl & R_CNTHCTL_CNTPMASK_MASK)))) {
         irqstate = 0;