tcg: handle EXCP_ATOMIC exception properly

Message ID 20170210014519.12413-1-bobby.prani@gmail.com (mailing list archive)
State New, archived

Commit Message

Pranith Kumar Feb. 10, 2017, 1:45 a.m. UTC
The current method of executing atomic code in a guest uses
cpu_exec_step_atomic() from the outermost loop. This causes an abort()
when single stepping over atomic code since debug exception longjmp
will point to the sigsetjmp in cpu_exec(). Another issue with
this mechanism is that the flags which were set in atomic execution
will be lost since we do not call cpu_exec_enter().

The following patch moves atomic exception handling to the exception
handler where all these issues are taken care of. The change in
start_exclusive() is necessary since now the cpu in atomic execution
will have its running flag set, but we do not want to count it as
pending.

Thanks to Alex for helping me debug the issue.

CC: Alex Bennée <alex.bennee@linaro.org>
CC: Richard Henderson <rth@twiddle.net>
CC: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
---
 cpu-exec.c    | 2 ++
 cpus-common.c | 2 +-
 cpus.c        | 4 ----
 3 files changed, 3 insertions(+), 5 deletions(-)
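
To make the failure mode described above concrete, here is a small standalone
illustration of the hazard (a sketch for illustration, not QEMU code: only the
sigsetjmp/siglongjmp pattern around cpu->jmp_env - set up in cpu_exec() and
jumped to by cpu_loop_exit() - is taken from the tree; everything else below is
a stand-in):

    #include <setjmp.h>
    #include <stdio.h>

    /* Stand-in for CPUState::jmp_env.  In QEMU, cpu_exec() initialises it
     * with sigsetjmp(cpu->jmp_env, 0) and cpu_loop_exit() returns to it
     * with siglongjmp(cpu->jmp_env, 1). */
    static sigjmp_buf jmp_env;

    static void fake_cpu_exec(void)
    {
        if (sigsetjmp(jmp_env, 0) != 0) {
            puts("exception handled inside cpu_exec()");
            return;
        }
        /* ... run translation blocks, then return to the outer loop ... */
    }

    static void fake_debug_exception(void)
    {
        siglongjmp(jmp_env, 1);   /* what cpu_loop_exit() does */
    }

    int main(void)
    {
        fake_cpu_exec();            /* jmp_env now names a returned frame */
        /* Old scheme: cpu_exec_step_atomic() runs here, outside cpu_exec().
         * A debug exception raised while single-stepping the atomic code
         * long-jumps to that dead frame: undefined behaviour, observed as
         * the abort() mentioned above. */
        fake_debug_exception();
        return 0;
    }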

Comments

Pranith Kumar Feb. 10, 2017, 4:54 a.m. UTC | #1
On Thu, Feb 9, 2017 at 8:45 PM, Pranith Kumar <bobby.prani@gmail.com> wrote:
>
> The current method of executing atomic code in a guest uses
> cpu_exec_step_atomic() from the outermost loop. This causes an abort()
> when single stepping over atomic code since debug exception longjmp
> will point to the sigsetjmp in cpu_exec(). Another issue with
> this mechanism is that the flags which were set in atomic execution
> will be lost since we do not call cpu_exec_enter().
>
> The following patch moves atomic exception handling to the exception
> handler where all these issues are taken care of. The change in
> start_exclusive() is necessary since now the cpu in atomic execution
> will have its running flag set, but we do not want to count it as
> pending.
>
> Thanks to Alex for helping me debug the issue.
>
> CC: Alex Bennée <alex.bennee@linaro.org>
> CC: Richard Henderson <rth@twiddle.net>
> CC: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
> ---
>  cpu-exec.c    | 2 ++
>  cpus-common.c | 2 +-
>  cpus.c        | 4 ----
>  3 files changed, 3 insertions(+), 5 deletions(-)
>
> diff --git a/cpu-exec.c b/cpu-exec.c
> index b0ddada8c1..dceacfc5dd 100644
> --- a/cpu-exec.c
> +++ b/cpu-exec.c
> @@ -427,6 +427,8 @@ static inline bool cpu_handle_exception(CPUState *cpu, int *ret)
>              *ret = cpu->exception_index;
>              if (*ret == EXCP_DEBUG) {
>                  cpu_handle_debug_exception(cpu);
> +            } else if (*ret == EXCP_ATOMIC) {
> +                cpu_exec_step_atomic(cpu);
>              }
>              cpu->exception_index = -1;
>              return true;


Looks like this is going to be a problem, since we should not call
start_exclusive() from cpu_exec() (d'oh, I just read the comment about
this :-/).

It would be great if we could make it callable from there. Thoughts?

Thanks,
Paolo Bonzini Feb. 10, 2017, 11:34 a.m. UTC | #2
On 10/02/2017 02:45, Pranith Kumar wrote:
> The current method of executing atomic code in a guest uses
> cpu_exec_step_atomic() from the outermost loop. This causes an abort()
> when single stepping over atomic code since debug exception longjmp
> will point to the sigsetjmp in cpu_exec(). Another issue with
> this mechanism is that the flags which were set in atomic execution
> will be lost since we do not call cpu_exec_enter().
> 
> The following patch moves atomic exception handling to the exception
> handler where all these issues are taken care of. The change in
> start_exclusive() is necessary since now the cpu in atomic execution
> will have its running flag set, but we do not want to count it as
> pending.
> 
> Thanks to Alex for helping me debug the issue.
> 
> CC: Alex Bennée <alex.bennee@linaro.org>
> CC: Richard Henderson <rth@twiddle.net>
> CC: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
> ---
>  cpu-exec.c    | 2 ++
>  cpus-common.c | 2 +-
>  cpus.c        | 4 ----
>  3 files changed, 3 insertions(+), 5 deletions(-)
> 
> diff --git a/cpu-exec.c b/cpu-exec.c
> index b0ddada8c1..dceacfc5dd 100644
> --- a/cpu-exec.c
> +++ b/cpu-exec.c
> @@ -427,6 +427,8 @@ static inline bool cpu_handle_exception(CPUState *cpu, int *ret)
>              *ret = cpu->exception_index;
>              if (*ret == EXCP_DEBUG) {
>                  cpu_handle_debug_exception(cpu);
> +            } else if (*ret == EXCP_ATOMIC) {
> +                cpu_exec_step_atomic(cpu);

I think you can unlock/lock the iothread here, and also call
cpu_exec_end/start to work around the limitation in start_exclusive.

Paolo

>              }
>              cpu->exception_index = -1;
>              return true;
> diff --git a/cpus-common.c b/cpus-common.c
> index 59f751ecf9..7b859752ea 100644
> --- a/cpus-common.c
> +++ b/cpus-common.c
> @@ -192,7 +192,7 @@ void start_exclusive(void)
>      smp_mb();
>      running_cpus = 0;
>      CPU_FOREACH(other_cpu) {
> -        if (atomic_read(&other_cpu->running)) {
> +        if (atomic_read(&other_cpu->running) && !qemu_cpu_is_self(other_cpu)) {
>              other_cpu->has_waiter = true;
>              running_cpus++;
>              qemu_cpu_kick(other_cpu);
> diff --git a/cpus.c b/cpus.c
> index e1b82bcd49..981f23d52b 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -1461,10 +1461,6 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
>                   */
>                  g_assert(cpu->halted);
>                  break;
> -            case EXCP_ATOMIC:
> -                qemu_mutex_unlock_iothread();
> -                cpu_exec_step_atomic(cpu);
> -                qemu_mutex_lock_iothread();
>              default:
>                  /* Ignore everything else? */
>                  break;
>
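
For illustration, a minimal sketch of the unlock/relock plus
cpu_exec_end()/cpu_exec_start() approach Paolo suggests above, written against
the cpu_handle_exception() hunk quoted in this thread. It assumes the BQL is
still held on this path (i.e. pre-MTTCG locking) and is not the actual v2 patch:

    } else if (*ret == EXCP_ATOMIC) {
        qemu_mutex_unlock_iothread();  /* drop the BQL around the step       */
        cpu_exec_end(cpu);             /* leave the cpu_exec "read side"     */
        cpu_exec_step_atomic(cpu);     /* takes start/end_exclusive() itself */
        cpu_exec_start(cpu);           /* re-enter before resuming the loop  */
        qemu_mutex_lock_iothread();
    }
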
Alex Bennée Feb. 10, 2017, 12:13 p.m. UTC | #3
Pranith Kumar <bobby.prani@gmail.com> writes:

> The current method of executing atomic code in a guest uses
> cpu_exec_step_atomic() from the outermost loop. This causes an abort()
> when single stepping over atomic code since debug exception longjmp
> will point to the sigsetjmp in cpu_exec(). Another issue with
> this mechanism is that the flags which were set in atomic execution
> will be lost since we do not call cpu_exec_enter().

I should note that the original patch (which is still in my tree, so I
guess I should squash it) says:

  The patch enables handling atomic code in the guest. This should
  preferably be done in cpu_handle_exception(), but the current assumptions
  regarding when we can execute atomic sections cause a deadlock.


> The following patch moves atomic exception handling to the exception
> handler where all these issues are taken care of. The change in
> start_exclusive() is necessary since now the cpu in atomic execution
> will have its running flag set, but we do not want to count it as
> pending.
>
> Thanks to Alex for helping me debug the issue.
>
> CC: Alex Bennée <alex.bennee@linaro.org>
> CC: Richard Henderson <rth@twiddle.net>
> CC: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
> ---
>  cpu-exec.c    | 2 ++
>  cpus-common.c | 2 +-
>  cpus.c        | 4 ----
>  3 files changed, 3 insertions(+), 5 deletions(-)
>
> diff --git a/cpu-exec.c b/cpu-exec.c
> index b0ddada8c1..dceacfc5dd 100644
> --- a/cpu-exec.c
> +++ b/cpu-exec.c
> @@ -427,6 +427,8 @@ static inline bool cpu_handle_exception(CPUState *cpu, int *ret)
>              *ret = cpu->exception_index;
>              if (*ret == EXCP_DEBUG) {
>                  cpu_handle_debug_exception(cpu);
> +            } else if (*ret == EXCP_ATOMIC) {
> +                cpu_exec_step_atomic(cpu);
>              }
>              cpu->exception_index = -1;
>              return true;
> diff --git a/cpus-common.c b/cpus-common.c
> index 59f751ecf9..7b859752ea 100644
> --- a/cpus-common.c
> +++ b/cpus-common.c
> @@ -192,7 +192,7 @@ void start_exclusive(void)
>      smp_mb();
>      running_cpus = 0;
>      CPU_FOREACH(other_cpu) {
> -        if (atomic_read(&other_cpu->running)) {
> +        if (atomic_read(&other_cpu->running) && !qemu_cpu_is_self(other_cpu)) {

The comment above reads:

  Must only be called from outside cpu_exec.

So we need to revise this comment. Is this really a limitation or was it
originally the design goal?

>              other_cpu->has_waiter = true;
>              running_cpus++;
>              qemu_cpu_kick(other_cpu);
> diff --git a/cpus.c b/cpus.c
> index e1b82bcd49..981f23d52b 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -1461,10 +1461,6 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
>                   */
>                  g_assert(cpu->halted);
>                  break;
> -            case EXCP_ATOMIC:
> -                qemu_mutex_unlock_iothread();
> -                cpu_exec_step_atomic(cpu);
> -                qemu_mutex_lock_iothread();
>              default:
>                  /* Ignore everything else? */
>                  break;


--
Alex Bennée
Paolo Bonzini Feb. 10, 2017, 12:15 p.m. UTC | #4
On 10/02/2017 13:13, Alex Bennée wrote:
>> +        if (atomic_read(&other_cpu->running) && !qemu_cpu_is_self(other_cpu)) {
> The comment above reads:
> 
>   Must only be called from outside cpu_exec.
> 
> So we need to revise this comment. Is this really a limitation or was it
> originally the design goal?

If you want to call it within cpu_exec, you can always use
cpu_exec_end/start around it.

I think we should first of all get rid of the iothread lock within
cpu_exec, and then look at this patch again.

Paolo
Alex Bennée Feb. 10, 2017, 12:18 p.m. UTC | #5
Paolo Bonzini <pbonzini@redhat.com> writes:

> On 10/02/2017 02:45, Pranith Kumar wrote:
>> The current method of executing atomic code in a guest uses
>> cpu_exec_step_atomic() from the outermost loop. This causes an abort()
>> when single stepping over atomic code since debug exception longjmp
>> will point to the sigsetjmp in cpu_exec(). Another issue with
>> this mechanism is that the flags which were set in atomic execution
>> will be lost since we do not call cpu_exec_enter().
>>
>> The following patch moves atomic exception handling to the exception
>> handler where all these issues are taken care of. The change in
>> start_exclusive() is necessary since now the cpu in atomic execution
>> will have its running flag set, but we do not want to count it as
>> pending.
>>
>> Thanks to Alex for helping me debug the issue.
>>
>> CC: Alex Bennée <alex.bennee@linaro.org>
>> CC: Richard Henderson <rth@twiddle.net>
>> CC: Paolo Bonzini <pbonzini@redhat.com>
>> Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
>> ---
>>  cpu-exec.c    | 2 ++
>>  cpus-common.c | 2 +-
>>  cpus.c        | 4 ----
>>  3 files changed, 3 insertions(+), 5 deletions(-)
>>
>> diff --git a/cpu-exec.c b/cpu-exec.c
>> index b0ddada8c1..dceacfc5dd 100644
>> --- a/cpu-exec.c
>> +++ b/cpu-exec.c
>> @@ -427,6 +427,8 @@ static inline bool cpu_handle_exception(CPUState *cpu, int *ret)
>>              *ret = cpu->exception_index;
>>              if (*ret == EXCP_DEBUG) {
>>                  cpu_handle_debug_exception(cpu);
>> +            } else if (*ret == EXCP_ATOMIC) {
>> +                cpu_exec_step_atomic(cpu);
>
> I think you can unlock/lock the iothread here, and also call

The iothread is already unlocked by this point (see tcg_cpu_exec).

> cpu_exec_end/start to work around the limitation in start_exclusive.

While that seems right, it also seems very messy, as it inverts the calls
made so far. I fear we may end up with confusing special-casing. Is there
a cleaner way we can unwind this?

>
> Paolo
>
>>              }
>>              cpu->exception_index = -1;
>>              return true;
>> diff --git a/cpus-common.c b/cpus-common.c
>> index 59f751ecf9..7b859752ea 100644
>> --- a/cpus-common.c
>> +++ b/cpus-common.c
>> @@ -192,7 +192,7 @@ void start_exclusive(void)
>>      smp_mb();
>>      running_cpus = 0;
>>      CPU_FOREACH(other_cpu) {
>> -        if (atomic_read(&other_cpu->running)) {
>> +        if (atomic_read(&other_cpu->running) && !qemu_cpu_is_self(other_cpu)) {
>>              other_cpu->has_waiter = true;
>>              running_cpus++;
>>              qemu_cpu_kick(other_cpu);
>> diff --git a/cpus.c b/cpus.c
>> index e1b82bcd49..981f23d52b 100644
>> --- a/cpus.c
>> +++ b/cpus.c
>> @@ -1461,10 +1461,6 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
>>                   */
>>                  g_assert(cpu->halted);
>>                  break;
>> -            case EXCP_ATOMIC:
>> -                qemu_mutex_unlock_iothread();
>> -                cpu_exec_step_atomic(cpu);
>> -                qemu_mutex_lock_iothread();
>>              default:
>>                  /* Ignore everything else? */
>>                  break;
>>


--
Alex Bennée
Paolo Bonzini Feb. 10, 2017, 12:29 p.m. UTC | #6
On 10/02/2017 13:18, Alex Bennée wrote:
>> I think you can unlock/lock the iothread here, and also call
>
> The iothread is already unlocked by this point (see tcg_cpu_exec).

Is this patch on top of the MTTCG branch?  If not, cpu_handle_exception
runs with the iothread lock taken, doesn't it?

>> cpu_exec_end/start to work around the limitation in start_exclusive.
>
> While that seems right it also seems very messy as it inverts the calls
> so far. I fear we may end up very confused in special casing. Is there a
> cleaner way we can unwind this?

I'm not sure what is messy...  cpu_exec_start/end and
start/end_exclusive are fundamentally a rwlock.

Doing

	cpu_exec_end
	start_exclusive
	...
	end_exclusive
	cpu_exec_start

simply means temporarily upgrading the lock from read to write.

Paolo
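
To spell out the analogy (the read/write framing is Paolo's; the function
names are from cpus-common.c; the snippet is only an illustrative sketch,
not a patch):

    /* "Read side": a vCPU entering and leaving its execution loop. */
    cpu_exec_start(cpu);    /* ~ read-lock   */
    /* ... execute translation blocks ... */
    cpu_exec_end(cpu);      /* ~ read-unlock */

    /* "Write side": an exclusive section that waits until all vCPUs
     * currently inside their loops have left them. */
    start_exclusive();      /* ~ write-lock   */
    /* ... e.g. run the EXCP_ATOMIC single step ... */
    end_exclusive();        /* ~ write-unlock */
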
Alex Bennée Feb. 10, 2017, 12:33 p.m. UTC | #7
Paolo Bonzini <pbonzini@redhat.com> writes:

> On 10/02/2017 13:13, Alex Bennée wrote:
>>> +        if (atomic_read(&other_cpu->running) && !qemu_cpu_is_self(other_cpu)) {
>> The comment above reads:
>>
>>   Must only be called from outside cpu_exec.
>>
>> So we need to revise this comment. Is this really a limitation or was it
>> originally the design goal?
>
> If you want to call it within cpu_exec, you can always use
> cpu_exec_end/start around it.
>
> I think we should first of all get rid of the iothread lock within
> cpu_exec, and then look at this patch again.

What to do with the MTTCG series in the meantime?

I'm hesitant to hold up the whole series for a potentially messy
refactor of the cpu loop code to push the BQL out, which could take some
time, although of course I don't want to merge something that makes that
harder.

That said, for TCG system emulation EXCP_ATOMIC is currently broken with
respect to debugging. However, for the initial guest/host combination of
ARMv7/8 on x86_64 we don't need the fallback on pretty much 99% of
deployed hosts. How about the following:

  - drop Pranith's patch for the current MTTCG series
  - replace with an error/abort on EXCP_ATOMIC
  - proceed with merge as planned
  - tackle the BQL lock next (along with more MTTCG guest/targets enablement)

Richard/Peter,

Any thoughts/opinions on this?

--
Alex Bennée
Alex Bennée Feb. 10, 2017, 12:57 p.m. UTC | #8
Paolo Bonzini <pbonzini@redhat.com> writes:

> On 10/02/2017 13:18, Alex Bennée wrote:
>>> I think you can unlock/lock the iothread here, and also call
>>
>> The iothread is already unlocked by this point (see tcg_cpu_exec).
>
> Is this patch on top of the MTTCG branch?  If not, cpu_handle_exception
> runs with the iothread lock taken, doesn't it?
>
>>> cpu_exec_end/start to work around the limitation in start_exclusive.
>>
>> While that seems right, it also seems very messy, as it inverts the calls
>> made so far. I fear we may end up with confusing special-casing. Is there
>> a cleaner way we can unwind this?
>
> I'm not sure what is messy...  cpu_exec_start/end and
> start/end_exclusive are fundamentally a rwlock.
>
> Doing
>
> 	cpu_exec_end
> 	start_exclusive
> 	...
> 	end_exclusive
> 	cpu_exec_start
>
> simply means temporarily upgrading the lock from read to write.

I mean messy from the point of view of reading the code, where we reverse
a sequence done higher up in the call chain. Maybe I'm being overly picky,
since we aren't exactly on a hot path here anyway.

--
Alex Bennée
Paolo Bonzini Feb. 10, 2017, 1:12 p.m. UTC | #9
On 10/02/2017 13:33, Alex Bennée wrote:
> That said, for TCG system emulation EXCP_ATOMIC is currently broken with
> respect to debugging. However, for the initial guest/host combination of
> ARMv7/8 on x86_64 we don't need the fallback on pretty much 99% of
> deployed hosts. How about the following:
> 
>   - drop Pranith's patch for the current MTTCG series
>   - replace with an error/abort on EXCP_ATOMIC

Don't even replace it? :)

Paolo

>   - proceed with merge as planned
>   - tackle the BQL lock next (along with more MTTCG guest/targets enablement)
Pranith Kumar Feb. 10, 2017, 1:59 p.m. UTC | #10
On Fri, Feb 10, 2017 at 7:29 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:

>
>
> On 10/02/2017 13:18, Alex Bennée wrote:
> >> I think you can unlock/lock the iothread here, and also call
> >
> > The iothread is already unlocked by this point (see tcg_cpu_exec).
>
> Is this patch on top of the MTTCG branch?  If not, cpu_handle_exception
> runs with the iothread lock taken, doesn't it?
>
> >> cpu_exec_end/start to work around the limitation in start_exclusive.
> >
> > While that seems right, it also seems very messy, as it inverts the calls
> > made so far. I fear we may end up with confusing special-casing. Is there
> > a cleaner way we can unwind this?
>
> I'm not sure what is messy...  cpu_exec_start/end and
> start/end_exclusive are fundamentally a rwlock.
>
> Doing
>
>         cpu_exec_end
>         start_exclusive
>         ...
>         end_exclusive
>         cpu_exec_start
>
> simply means temporarily upgrading the lock from read to write.
>

This seems to be the simplest thing to do for now. I'll send a v2.

Thanks,
Alex Bennée Feb. 10, 2017, 2:37 p.m. UTC | #11
Paolo Bonzini <pbonzini@redhat.com> writes:

> On 10/02/2017 13:33, Alex Bennée wrote:
>> That said, for TCG system emulation EXCP_ATOMIC is currently broken with
>> respect to debugging. However, for the initial guest/host combination of
>> ARMv7/8 on x86_64 we don't need the fallback on pretty much 99% of
>> deployed hosts. How about the following:
>>
>>   - drop Pranith's patch for the current MTTCG series
>>   - replace with an error/abort on EXCP_ATOMIC
>
> Don't even replace it? :)

I guess - the failure mode is only when we single-step? Should we not
try to emit a warning like we do for the mismatched MOs?

>
> Paolo
>
>>   - proceed with merge as planned
>>   - tackle the BQL lock next (along with more MTTCG guest/targets enablement)


--
Alex Bennée
Paolo Bonzini Feb. 10, 2017, 2:44 p.m. UTC | #12
On 10/02/2017 15:37, Alex Bennée wrote:
>>>   - drop Pranith's patch for the current MTTCG series
>>>   - replace with an error/abort on EXCP_ATOMIC
>> Don't even replace it? :)
> 
> I guess - the failure mode is only when we single-step? Should we not
> try to emit a warning like we do for the mismatched MOs?

If we're confident we can fix it before 2.9, we can afford the small bug
for the time being.

Paolo

Patch

diff --git a/cpu-exec.c b/cpu-exec.c
index b0ddada8c1..dceacfc5dd 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -427,6 +427,8 @@  static inline bool cpu_handle_exception(CPUState *cpu, int *ret)
             *ret = cpu->exception_index;
             if (*ret == EXCP_DEBUG) {
                 cpu_handle_debug_exception(cpu);
+            } else if (*ret == EXCP_ATOMIC) {
+                cpu_exec_step_atomic(cpu);
             }
             cpu->exception_index = -1;
             return true;
diff --git a/cpus-common.c b/cpus-common.c
index 59f751ecf9..7b859752ea 100644
--- a/cpus-common.c
+++ b/cpus-common.c
@@ -192,7 +192,7 @@  void start_exclusive(void)
     smp_mb();
     running_cpus = 0;
     CPU_FOREACH(other_cpu) {
-        if (atomic_read(&other_cpu->running)) {
+        if (atomic_read(&other_cpu->running) && !qemu_cpu_is_self(other_cpu)) {
             other_cpu->has_waiter = true;
             running_cpus++;
             qemu_cpu_kick(other_cpu);
diff --git a/cpus.c b/cpus.c
index e1b82bcd49..981f23d52b 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1461,10 +1461,6 @@  static void *qemu_tcg_cpu_thread_fn(void *arg)
                  */
                 g_assert(cpu->halted);
                 break;
-            case EXCP_ATOMIC:
-                qemu_mutex_unlock_iothread();
-                cpu_exec_step_atomic(cpu);
-                qemu_mutex_lock_iothread();
             default:
                 /* Ignore everything else? */
                 break;