Message ID | 004401d3894c$b3fc90f0$1bf5b2d0$@ru (mailing list archive) |
---|---|
State | New, archived |
On 9 January 2018 at 13:21, Pavel Dovgalyuk <dovgaluk@ispras.ru> wrote:
> I tried to get some logs with the following code.
> It prints that there was an exception 5 and it was overwritten by the standard code.
> Fixed code prevents this overwrite.
>
> I guess that one of the following is true:
> - unfixed version misses some exceptions
> - fixed version processes some exceptions twice (e.g., when there is no clear exception)
>
> diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
> index 280200f..fa810f7 100644
> --- a/accel/tcg/cpu-exec.c
> +++ b/accel/tcg/cpu-exec.c
> @@ -605,6 +605,8 @@ static inline bool cpu_handle_interrupt(CPUState *cpu,
>      /* Finally, check if we need to exit to the main loop. */
>      if (unlikely(atomic_read(&cpu->exit_request)
>          || (use_icount && cpu->icount_decr.u16.low + cpu->icount_extra == 0)))
> +        if (cpu->exception_index != -1 && cpu->exception_index != EXCP_INTERRUP
> +            qemu_log("overwriting excp_index %x\n", cpu->exception_index);
>          atomic_set(&cpu->exit_request, 0);
>          cpu->exception_index = EXCP_INTERRUPT;
>          return true;

This looks like it's just working around whatever is going on
(why should EXCP_INTERRUPT be special?). What we need to do is
find out what's actually happening here...

thanks
-- PMM

> From: Peter Maydell [mailto:peter.maydell@linaro.org]
> On 9 January 2018 at 13:21, Pavel Dovgalyuk <dovgaluk@ispras.ru> wrote:
> > I tried to get some logs with the following code.
> > It prints that there was an exception 5 and it was overwritten by the standard code.
> > Fixed code prevents this overwrite.
> >
> > I guess that one of the following is true:
> > - unfixed version misses some exceptions
> > - fixed version processes some exceptions twice (e.g., when there is no clear exception)
> >
> > diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
> > index 280200f..fa810f7 100644
> > --- a/accel/tcg/cpu-exec.c
> > +++ b/accel/tcg/cpu-exec.c
> > @@ -605,6 +605,8 @@ static inline bool cpu_handle_interrupt(CPUState *cpu,
> >      /* Finally, check if we need to exit to the main loop. */
> >      if (unlikely(atomic_read(&cpu->exit_request)
> >          || (use_icount && cpu->icount_decr.u16.low + cpu->icount_extra == 0)))
> > +        if (cpu->exception_index != -1 && cpu->exception_index != EXCP_INTERRUP
> > +            qemu_log("overwriting excp_index %x\n", cpu->exception_index);
> >          atomic_set(&cpu->exit_request, 0);
> >          cpu->exception_index = EXCP_INTERRUPT;
> >          return true;
>
> This looks like it's just working around whatever is going on
> (why should EXCP_INTERRUPT be special?). What we need to do is
> find out what's actually happening here...

The failure is caused by incorrect interrupt processing.
When ARM processes a hardware interrupt in arm_cpu_exec_interrupt(),
it executes cs->exception_index = excp_idx;

This assumes that the exception will be processed later.
But it is processed immediately by calling cc->do_interrupt(cs);
instead of leaving this job to cpu_exec.

I guess these calls should be removed to match the cpu_exec execution pattern.

Pavel Dovgalyuk

On 10 January 2018 at 07:04, Pavel Dovgalyuk <dovgaluk@ispras.ru> wrote:
> The failure is caused by incorrect interrupt processing.
> When ARM processes a hardware interrupt in arm_cpu_exec_interrupt(),
> it executes cs->exception_index = excp_idx;
>
> This assumes that the exception will be processed later.
> But it is processed immediately by calling cc->do_interrupt(cs);
> instead of leaving this job to cpu_exec.

That seems fine to me. The code knows it needs to take
an interrupt, it has all the information it needs to do
it, it can just go ahead and call the code that takes
the interrupt. The comment in accel/tcg/cpu-exec.c says:

    /* The target hook has 3 exit conditions:
       False when the interrupt isn't processed,
       True when it is, and we should restart on a new TB,
       and via longjmp via cpu_loop_exit. */

and here we have processed the interrupt and returned true.
We set exception_index because that's how you tell the
do_interrupt hook which interrupt to deal with.

That is, the pattern that the arm target code assumes for
cs->exception_index is "you don't set this unless you're
about to call do_interrupt; if you do set it then definitely
call do_interrupt and don't do anything much in between".
In that view of the world there's no need to reset it or
check it because nothing is permitted to happen between
"set value" and "call do_interrupt". This is in line with
the way we handle other arm-specific bits of information
associated with the exception, like env->exception.target_el.
Having a long gap between "set value" and "do_interrupt"
is worrying because it means that maybe we might end
up doing something else in that gap that corrupts the
various bits of information associated with the exception,
or something that's not architecturally permitted to
happen at that point.

thanks
-- PMM

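Editorial sketch of the pattern described in the message above: the target hook sets exception_index and delivers the interrupt at once, then returns true. This is a toy model, not QEMU code. `ToyCPU`, `toy_do_interrupt`, `toy_cpu_exec_interrupt` and the `TOY_EXCP_*` constants are hypothetical names made up for illustration, and the value 5 merely echoes the "exception 5" seen in the log earlier in the thread. The longjmp exit path mentioned in the quoted comment is left out.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the real CPU state and exception numbers. */
enum { TOY_EXCP_NONE = -1, TOY_EXCP_IRQ = 5 };

typedef struct ToyCPU {
    int exception_index;   /* tells do_interrupt which exception to take */
    int taken_count;       /* number of exceptions delivered so far */
    int last_taken;        /* index of the most recently delivered one */
} ToyCPU;

/* Delivers whatever exception_index currently names. */
void toy_do_interrupt(ToyCPU *cpu)
{
    cpu->last_taken = cpu->exception_index;
    cpu->taken_count++;
}

/*
 * Toy target hook: returns false when no interrupt was processed and
 * true when one was. Nothing happens between "set value" and
 * "call do_interrupt".
 */
bool toy_cpu_exec_interrupt(ToyCPU *cpu, bool irq_pending)
{
    if (!irq_pending) {
        return false;
    }
    cpu->exception_index = TOY_EXCP_IRQ;  /* set value ... */
    toy_do_interrupt(cpu);                /* ... and deliver immediately */
    return true;                          /* exception_index is left as set */
}
```

In this model the hook never clears exception_index after delivery, so a caller that inspects the field afterwards still sees the already handled value.
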
> From: Peter Maydell [mailto:peter.maydell@linaro.org]
> On 10 January 2018 at 07:04, Pavel Dovgalyuk <dovgaluk@ispras.ru> wrote:
> > The failure is caused by incorrect interrupt processing.
> > When ARM processes a hardware interrupt in arm_cpu_exec_interrupt(),
> > it executes cs->exception_index = excp_idx;
> >
> > This assumes that the exception will be processed later.
> > But it is processed immediately by calling cc->do_interrupt(cs);
> > instead of leaving this job to cpu_exec.
>
> That seems fine to me. The code knows it needs to take
> an interrupt, it has all the information it needs to do
> it, it can just go ahead and call the code that takes
> the interrupt. The comment in accel/tcg/cpu-exec.c says:
>
>     /* The target hook has 3 exit conditions:
>        False when the interrupt isn't processed,
>        True when it is, and we should restart on a new TB,
>        and via longjmp via cpu_loop_exit. */
>
> and here we have processed the interrupt and returned true.
> We set exception_index because that's how you tell the
> do_interrupt hook which interrupt to deal with.
>
> That is, the pattern that the arm target code assumes for
> cs->exception_index is "you don't set this unless you're
> about to call do_interrupt; if you do set it then definitely
> call do_interrupt and don't do anything much in between".
> In that view of the world there's no need to reset it or
> check it because nothing is permitted to happen between
> "set value" and "call do_interrupt". This is in line with
> the way we handle other arm-specific bits of information
> associated with the exception, like env->exception.target_el.
> Having a long gap between "set value" and "do_interrupt"
> is worrying because it means that maybe we might end
> up doing something else in that gap that corrupts the
> various bits of information associated with the exception,
> or something that's not architecturally permitted to
> happen at that point.

I see. I found the same pattern in other targets.
But only MIPS resets exception_index after processing the interrupt.
Others do not bother.

Would the following change to cpu_exec be correct, then?

         if (cc->cpu_exec_interrupt(cpu, interrupt_request)) {
             replay_interrupt();
+            cpu->exception_index = -1;
             *last_tb = NULL;
         }

Pavel Dovgalyuk

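Editorial sketch of what the proposed reset would change, again as a toy model rather than QEMU code. It assumes, as the earlier logging diff suggests, that the "exit to the main loop" check runs after the target hook in the same function. `SimCPU`, `sim_target_hook`, `sim_handle_interrupt`, the `reset_after_hook` switch and the `SIM_EXCP_*` values are all hypothetical. The model only counts how often a leftover index gets clobbered; it says nothing about whether the change is right for every target.

```c
#include <stdbool.h>

/* Hypothetical constants; the numeric values are arbitrary toy choices. */
enum { SIM_EXCP_NONE = -1, SIM_EXCP_IRQ = 5, SIM_EXCP_INTERRUPT = 0x10000 };

typedef struct SimCPU {
    int exception_index;
    bool exit_request;
    int overwrites;        /* times a leftover index was clobbered */
} SimCPU;

/* Toy target hook: sets the index, "delivers" (elided), returns true. */
bool sim_target_hook(SimCPU *cpu)
{
    cpu->exception_index = SIM_EXCP_IRQ;
    return true;
}

/*
 * Toy interrupt handling step. With reset_after_hook set, the index is
 * cleared once the hook reports that the interrupt was processed, which
 * mirrors the one-line change proposed above.
 */
bool sim_handle_interrupt(SimCPU *cpu, bool irq_pending, bool reset_after_hook)
{
    if (irq_pending && sim_target_hook(cpu)) {
        if (reset_after_hook) {
            cpu->exception_index = SIM_EXCP_NONE;
        }
    }

    /* Finally, check whether we need to exit to the main loop. */
    if (cpu->exit_request) {
        if (cpu->exception_index != SIM_EXCP_NONE
            && cpu->exception_index != SIM_EXCP_INTERRUPT) {
            cpu->overwrites++;   /* what the logging patch would report */
        }
        cpu->exit_request = false;
        cpu->exception_index = SIM_EXCP_INTERRUPT;
        return true;
    }
    return false;
}
```

Without the reset, a pending exit request right after a handled interrupt trips the overwrite counter. With the reset, the exit path finds the index already cleared.
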
diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index 280200f..fa810f7 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -605,6 +605,8 @@ static inline bool cpu_handle_interrupt(CPUState *cpu,
     /* Finally, check if we need to exit to the main loop. */
     if (unlikely(atomic_read(&cpu->exit_request)
         || (use_icount && cpu->icount_decr.u16.low + cpu->icount_extra == 0)))
+        if (cpu->exception_index != -1 && cpu->exception_index != EXCP_INTERRUP
+            qemu_log("overwriting excp_index %x\n", cpu->exception_index);
         atomic_set(&cpu->exit_request, 0);
         cpu->exception_index = EXCP_INTERRUPT;
         return true;