diff mbox

[v7,03/14] replay: exception replay fix

Message ID 20170124071713.4572.36636.stgit@PASHA-ISP (mailing list archive)
State New, archived
Headers show

Commit Message

Pavel Dovgalyuk Jan. 24, 2017, 7:17 a.m. UTC
This patch fixes replaying the exception when TB cache is full.
It breaks cpu loop execution through setting exception_index
to process such queued work as TB flush.

Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
---
 cpu-exec.c |    4 ++++
 1 file changed, 4 insertions(+)

Comments

Paolo Bonzini Jan. 25, 2017, 10:50 a.m. UTC | #1
On 24/01/2017 08:17, Pavel Dovgalyuk wrote:
> @@ -451,6 +451,10 @@ static inline bool cpu_handle_exception(CPUState *cpu, int *ret)
>  #ifndef CONFIG_USER_ONLY
>      } else if (replay_has_exception()
>                 && cpu->icount_decr.u16.low + cpu->icount_extra == 0) {
> +        /* Break the execution loop in case of running out of TB cache.
> +           This is needed to make flushing of the TB cache, because
> +           real flush is queued to be executed outside the cpu loop. */
> +        cpu->exception_index = EXCP_INTERRUPT;
>          /* try to cause an exception pending in the log */
>          cpu_exec_nocache(cpu, 1, tb_find(cpu, NULL, 0), true);
>          *ret = -1;

Why is replay_has_exception() related to be running out of TB cache?

Paolo
Pavel Dovgalyuk Jan. 25, 2017, 11:12 a.m. UTC | #2
> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> On 24/01/2017 08:17, Pavel Dovgalyuk wrote:
> > @@ -451,6 +451,10 @@ static inline bool cpu_handle_exception(CPUState *cpu, int *ret)
> >  #ifndef CONFIG_USER_ONLY
> >      } else if (replay_has_exception()
> >                 && cpu->icount_decr.u16.low + cpu->icount_extra == 0) {
> > +        /* Break the execution loop in case of running out of TB cache.
> > +           This is needed to make flushing of the TB cache, because
> > +           real flush is queued to be executed outside the cpu loop. */
> > +        cpu->exception_index = EXCP_INTERRUPT;
> >          /* try to cause an exception pending in the log */
> >          cpu_exec_nocache(cpu, 1, tb_find(cpu, NULL, 0), true);
> >          *ret = -1;
> 
> Why is replay_has_exception() related to be running out of TB cache?

It doesn't.
Calling tb_find when there is not space in cache causes tb_flush and cpu_loop_exit.
But execution loop will continue, because there is no reason to break it
(like setting exception_index).

Pavel Dovgalyuk
Paolo Bonzini Jan. 25, 2017, 11:18 a.m. UTC | #3
On 25/01/2017 12:12, Pavel Dovgalyuk wrote:
>> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
>> On 24/01/2017 08:17, Pavel Dovgalyuk wrote:
>>> @@ -451,6 +451,10 @@ static inline bool cpu_handle_exception(CPUState *cpu, int *ret)
>>>  #ifndef CONFIG_USER_ONLY
>>>      } else if (replay_has_exception()
>>>                 && cpu->icount_decr.u16.low + cpu->icount_extra == 0) {
>>> +        /* Break the execution loop in case of running out of TB cache.
>>> +           This is needed to make flushing of the TB cache, because
>>> +           real flush is queued to be executed outside the cpu loop. */
>>> +        cpu->exception_index = EXCP_INTERRUPT;
>>>          /* try to cause an exception pending in the log */
>>>          cpu_exec_nocache(cpu, 1, tb_find(cpu, NULL, 0), true);
>>>          *ret = -1;
>>
>> Why is replay_has_exception() related to be running out of TB cache?
> 
> It doesn't.
> Calling tb_find when there is not space in cache causes tb_flush and cpu_loop_exit.
> But execution loop will continue, because there is no reason to break it
> (like setting exception_index).

What about setting cpu->exit_request?  queue_work_on_cpu calls
qemu_cpu_kick.

Paolo
Pavel Dovgalyuk Jan. 25, 2017, 11:33 a.m. UTC | #4
> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> On 25/01/2017 12:12, Pavel Dovgalyuk wrote:
> >> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> >> On 24/01/2017 08:17, Pavel Dovgalyuk wrote:
> >>> @@ -451,6 +451,10 @@ static inline bool cpu_handle_exception(CPUState *cpu, int *ret)
> >>>  #ifndef CONFIG_USER_ONLY
> >>>      } else if (replay_has_exception()
> >>>                 && cpu->icount_decr.u16.low + cpu->icount_extra == 0) {
> >>> +        /* Break the execution loop in case of running out of TB cache.
> >>> +           This is needed to make flushing of the TB cache, because
> >>> +           real flush is queued to be executed outside the cpu loop. */
> >>> +        cpu->exception_index = EXCP_INTERRUPT;
> >>>          /* try to cause an exception pending in the log */
> >>>          cpu_exec_nocache(cpu, 1, tb_find(cpu, NULL, 0), true);
> >>>          *ret = -1;
> >>
> >> Why is replay_has_exception() related to be running out of TB cache?
> >
> > It doesn't.
> > Calling tb_find when there is not space in cache causes tb_flush and cpu_loop_exit.
> > But execution loop will continue, because there is no reason to break it
> > (like setting exception_index).
> 
> What about setting cpu->exit_request?  queue_work_on_cpu calls
> qemu_cpu_kick.

cpu->exit_request does not checked in this loop.
We have to add this checking somewhere then?

Pavel Dovgalyuk
Paolo Bonzini Jan. 25, 2017, 11:56 a.m. UTC | #5
On 25/01/2017 12:33, Pavel Dovgalyuk wrote:
>> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
>> On 25/01/2017 12:12, Pavel Dovgalyuk wrote:
>>>> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
>>>> On 24/01/2017 08:17, Pavel Dovgalyuk wrote:
>>>>> @@ -451,6 +451,10 @@ static inline bool cpu_handle_exception(CPUState *cpu, int *ret)
>>>>>  #ifndef CONFIG_USER_ONLY
>>>>>      } else if (replay_has_exception()
>>>>>                 && cpu->icount_decr.u16.low + cpu->icount_extra == 0) {
>>>>> +        /* Break the execution loop in case of running out of TB cache.
>>>>> +           This is needed to make flushing of the TB cache, because
>>>>> +           real flush is queued to be executed outside the cpu loop. */
>>>>> +        cpu->exception_index = EXCP_INTERRUPT;
>>>>>          /* try to cause an exception pending in the log */
>>>>>          cpu_exec_nocache(cpu, 1, tb_find(cpu, NULL, 0), true);
>>>>>          *ret = -1;
>>>>
>>>> Why is replay_has_exception() related to be running out of TB cache?
>>>
>>> It doesn't.
>>> Calling tb_find when there is not space in cache causes tb_flush and cpu_loop_exit.
>>> But execution loop will continue, because there is no reason to break it
>>> (like setting exception_index).
>>
>> What about setting cpu->exit_request?  queue_work_on_cpu calls
>> qemu_cpu_kick.
> 
> cpu->exit_request does not checked in this loop.
> We have to add this checking somewhere then?

It's checked by cpu_handle_interrupt.  Are you not reaching
cpu_handle_interrupt then?  Why?

Or perhaps cpu_handle_interrupt should not be testing cpu->exit_request,
but cpu->exception_index != -1 (and cpu_exit can cmpxchg
cpu->exception_index from -1 to EXCP_INTERRUPT)?

Again, it's hard to follow without knowing the invariants. :(

Paolo
Pavel Dovgalyuk Jan. 25, 2017, 12:27 p.m. UTC | #6
> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> On 25/01/2017 12:33, Pavel Dovgalyuk wrote:
> >> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> >> On 25/01/2017 12:12, Pavel Dovgalyuk wrote:
> >>>> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> >>>> On 24/01/2017 08:17, Pavel Dovgalyuk wrote:
> >>>>> @@ -451,6 +451,10 @@ static inline bool cpu_handle_exception(CPUState *cpu, int *ret)
> >>>>>  #ifndef CONFIG_USER_ONLY
> >>>>>      } else if (replay_has_exception()
> >>>>>                 && cpu->icount_decr.u16.low + cpu->icount_extra == 0) {
> >>>>> +        /* Break the execution loop in case of running out of TB cache.
> >>>>> +           This is needed to make flushing of the TB cache, because
> >>>>> +           real flush is queued to be executed outside the cpu loop. */
> >>>>> +        cpu->exception_index = EXCP_INTERRUPT;
> >>>>>          /* try to cause an exception pending in the log */
> >>>>>          cpu_exec_nocache(cpu, 1, tb_find(cpu, NULL, 0), true);
> >>>>>          *ret = -1;
> >>>>
> >>>> Why is replay_has_exception() related to be running out of TB cache?
> >>>
> >>> It doesn't.
> >>> Calling tb_find when there is not space in cache causes tb_flush and cpu_loop_exit.
> >>> But execution loop will continue, because there is no reason to break it
> >>> (like setting exception_index).
> >>
> >> What about setting cpu->exit_request?  queue_work_on_cpu calls
> >> qemu_cpu_kick.
> >
> > cpu->exit_request does not checked in this loop.
> > We have to add this checking somewhere then?
> 
> It's checked by cpu_handle_interrupt.  Are you not reaching
> cpu_handle_interrupt then?  Why?

Execution doesn't reach cpu_handle_interrupt, because tb_find 
calls cpu_loop_exit. And the execution loop enters cpu_handle_exception again and again.

Pavel Dovgalyuk
Paolo Bonzini Jan. 25, 2017, 1:21 p.m. UTC | #7
On 25/01/2017 13:27, Pavel Dovgalyuk wrote:
>> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
>> On 25/01/2017 12:33, Pavel Dovgalyuk wrote:
>>>> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
>>>> On 25/01/2017 12:12, Pavel Dovgalyuk wrote:
>>>>>> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
>>>>>> On 24/01/2017 08:17, Pavel Dovgalyuk wrote:
>>>>>>> @@ -451,6 +451,10 @@ static inline bool cpu_handle_exception(CPUState *cpu, int *ret)
>>>>>>>  #ifndef CONFIG_USER_ONLY
>>>>>>>      } else if (replay_has_exception()
>>>>>>>                 && cpu->icount_decr.u16.low + cpu->icount_extra == 0) {
>>>>>>> +        /* Break the execution loop in case of running out of TB cache.
>>>>>>> +           This is needed to make flushing of the TB cache, because
>>>>>>> +           real flush is queued to be executed outside the cpu loop. */
>>>>>>> +        cpu->exception_index = EXCP_INTERRUPT;
>>>>>>>          /* try to cause an exception pending in the log */
>>>>>>>          cpu_exec_nocache(cpu, 1, tb_find(cpu, NULL, 0), true);
>>>>>>>          *ret = -1;
>>>>>>
>>>>>> Why is replay_has_exception() related to be running out of TB cache?
>>>>>
>>>>> It doesn't.
>>>>> Calling tb_find when there is not space in cache causes tb_flush and cpu_loop_exit.
>>>>> But execution loop will continue, because there is no reason to break it
>>>>> (like setting exception_index).
>>>>
>>>> What about setting cpu->exit_request?  queue_work_on_cpu calls
>>>> qemu_cpu_kick.
>>>
>>> cpu->exit_request does not checked in this loop.
>>> We have to add this checking somewhere then?
>>
>> It's checked by cpu_handle_interrupt.  Are you not reaching
>> cpu_handle_interrupt then?  Why?
> 
> Execution doesn't reach cpu_handle_interrupt, because tb_find 
> calls cpu_loop_exit. And the execution loop enters cpu_handle_exception again and again.

Perhaps tb_gen_code should set cpu->exception_index before calling
cpu_loop_exit.

Paolo

> Pavel Dovgalyuk
>
Pavel Dovgalyuk Jan. 25, 2017, 1:26 p.m. UTC | #8
> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> On 25/01/2017 13:27, Pavel Dovgalyuk wrote:
> >> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> >> On 25/01/2017 12:33, Pavel Dovgalyuk wrote:
> >>>> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> >>>> On 25/01/2017 12:12, Pavel Dovgalyuk wrote:
> >>>>>> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> >>>>>> On 24/01/2017 08:17, Pavel Dovgalyuk wrote:
> >>>>>>> @@ -451,6 +451,10 @@ static inline bool cpu_handle_exception(CPUState *cpu, int *ret)
> >>>>>>>  #ifndef CONFIG_USER_ONLY
> >>>>>>>      } else if (replay_has_exception()
> >>>>>>>                 && cpu->icount_decr.u16.low + cpu->icount_extra == 0) {
> >>>>>>> +        /* Break the execution loop in case of running out of TB cache.
> >>>>>>> +           This is needed to make flushing of the TB cache, because
> >>>>>>> +           real flush is queued to be executed outside the cpu loop. */
> >>>>>>> +        cpu->exception_index = EXCP_INTERRUPT;
> >>>>>>>          /* try to cause an exception pending in the log */
> >>>>>>>          cpu_exec_nocache(cpu, 1, tb_find(cpu, NULL, 0), true);
> >>>>>>>          *ret = -1;
> >>>>>>
> >>>>>> Why is replay_has_exception() related to be running out of TB cache?
> >>>>>
> >>>>> It doesn't.
> >>>>> Calling tb_find when there is not space in cache causes tb_flush and cpu_loop_exit.
> >>>>> But execution loop will continue, because there is no reason to break it
> >>>>> (like setting exception_index).
> >>>>
> >>>> What about setting cpu->exit_request?  queue_work_on_cpu calls
> >>>> qemu_cpu_kick.
> >>>
> >>> cpu->exit_request does not checked in this loop.
> >>> We have to add this checking somewhere then?
> >>
> >> It's checked by cpu_handle_interrupt.  Are you not reaching
> >> cpu_handle_interrupt then?  Why?
> >
> > Execution doesn't reach cpu_handle_interrupt, because tb_find
> > calls cpu_loop_exit. And the execution loop enters cpu_handle_exception again and again.
> 
> Perhaps tb_gen_code should set cpu->exception_index before calling
> cpu_loop_exit.

Maybe it would be better, because tb_gen_code is called in many other paths.
I'll fix the patch.

Pavel Dovgalyuk
diff mbox

Patch

diff --git a/cpu-exec.c b/cpu-exec.c
index fa08c73..79a2167 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -451,6 +451,10 @@  static inline bool cpu_handle_exception(CPUState *cpu, int *ret)
 #ifndef CONFIG_USER_ONLY
     } else if (replay_has_exception()
                && cpu->icount_decr.u16.low + cpu->icount_extra == 0) {
+        /* Break the execution loop in case of running out of TB cache.
+           This is needed to make flushing of the TB cache, because
+           real flush is queued to be executed outside the cpu loop. */
+        cpu->exception_index = EXCP_INTERRUPT;
         /* try to cause an exception pending in the log */
         cpu_exec_nocache(cpu, 1, tb_find(cpu, NULL, 0), true);
         *ret = -1;