Message ID | 20170210014519.12413-1-bobby.prani@gmail.com (mailing list archive) |
---|---|
State | New, archived |
On Thu, Feb 9, 2017 at 8:45 PM, Pranith Kumar <bobby.prani@gmail.com> wrote:
>
> The current method of executing atomic code in a guest uses
> cpu_exec_step_atomic() from the outermost loop. This causes an abort()
> when single stepping over atomic code since the debug exception longjmp
> will point to the setjmp in cpu_exec(). Another issue with
> this mechanism is that the flags which were set in atomic execution
> will be lost since we do not call cpu_exec_enter().
>
> The following patch moves atomic exception handling to the exception
> handler where all these issues are taken care of. The change in
> start_exclusive() is necessary since now the cpu in atomic execution
> will have its running flag set, but we do not want to count it as
> pending.
>
> Thanks to Alex for helping me debug the issue.
>
> CC: Alex Bennée <alex.bennee@linaro.org>
> CC: Richard Henderson <rth@twiddle.net>
> CC: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
> ---
>  cpu-exec.c    | 2 ++
>  cpus-common.c | 2 +-
>  cpus.c        | 4 ----
>  3 files changed, 3 insertions(+), 5 deletions(-)
>
> diff --git a/cpu-exec.c b/cpu-exec.c
> index b0ddada8c1..dceacfc5dd 100644
> --- a/cpu-exec.c
> +++ b/cpu-exec.c
> @@ -427,6 +427,8 @@ static inline bool cpu_handle_exception(CPUState *cpu, int *ret)
>          *ret = cpu->exception_index;
>          if (*ret == EXCP_DEBUG) {
>              cpu_handle_debug_exception(cpu);
> +        } else if (*ret == EXCP_ATOMIC) {
> +            cpu_exec_step_atomic(cpu);
>          }
>          cpu->exception_index = -1;
>          return true;

Looks like this is going to be a problem since we should not call
start_exclusive() from cpu_exec() (doh', I just read the comment for
this :-/). It'll be great if we can make it callable from there.

Thoughts?

Thanks,
On 10/02/2017 02:45, Pranith Kumar wrote:
> The current method of executing atomic code in a guest uses
> cpu_exec_step_atomic() from the outermost loop. This causes an abort()
> when single stepping over atomic code since the debug exception longjmp
> will point to the setjmp in cpu_exec(). Another issue with
> this mechanism is that the flags which were set in atomic execution
> will be lost since we do not call cpu_exec_enter().
>
> The following patch moves atomic exception handling to the exception
> handler where all these issues are taken care of. The change in
> start_exclusive() is necessary since now the cpu in atomic execution
> will have its running flag set, but we do not want to count it as
> pending.
>
> Thanks to Alex for helping me debug the issue.
>
> CC: Alex Bennée <alex.bennee@linaro.org>
> CC: Richard Henderson <rth@twiddle.net>
> CC: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
> ---
>  cpu-exec.c    | 2 ++
>  cpus-common.c | 2 +-
>  cpus.c        | 4 ----
>  3 files changed, 3 insertions(+), 5 deletions(-)
>
> diff --git a/cpu-exec.c b/cpu-exec.c
> index b0ddada8c1..dceacfc5dd 100644
> --- a/cpu-exec.c
> +++ b/cpu-exec.c
> @@ -427,6 +427,8 @@ static inline bool cpu_handle_exception(CPUState *cpu, int *ret)
>          *ret = cpu->exception_index;
>          if (*ret == EXCP_DEBUG) {
>              cpu_handle_debug_exception(cpu);
> +        } else if (*ret == EXCP_ATOMIC) {
> +            cpu_exec_step_atomic(cpu);

I think you can unlock/lock the iothread here, and also call
cpu_exec_end/start to work around the limitation in start_exclusive.

Paolo

>          }
>          cpu->exception_index = -1;
>          return true;
> diff --git a/cpus-common.c b/cpus-common.c
> index 59f751ecf9..7b859752ea 100644
> --- a/cpus-common.c
> +++ b/cpus-common.c
> @@ -192,7 +192,7 @@ void start_exclusive(void)
>      smp_mb();
>      running_cpus = 0;
>      CPU_FOREACH(other_cpu) {
> -        if (atomic_read(&other_cpu->running)) {
> +        if (atomic_read(&other_cpu->running) && !qemu_cpu_is_self(other_cpu)) {
>              other_cpu->has_waiter = true;
>              running_cpus++;
>              qemu_cpu_kick(other_cpu);
> diff --git a/cpus.c b/cpus.c
> index e1b82bcd49..981f23d52b 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -1461,10 +1461,6 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
>               */
>              g_assert(cpu->halted);
>              break;
> -        case EXCP_ATOMIC:
> -            qemu_mutex_unlock_iothread();
> -            cpu_exec_step_atomic(cpu);
> -            qemu_mutex_lock_iothread();
>          default:
>              /* Ignore everything else? */
>              break;
>
Pranith Kumar <bobby.prani@gmail.com> writes:

> The current method of executing atomic code in a guest uses
> cpu_exec_step_atomic() from the outermost loop. This causes an abort()
> when single stepping over atomic code since the debug exception longjmp
> will point to the setjmp in cpu_exec(). Another issue with
> this mechanism is that the flags which were set in atomic execution
> will be lost since we do not call cpu_exec_enter().

I should note the original patch (which is still in my tree so I guess I
should squash it) says:

  The patch enables handling atomic code in the guest. This should be
  preferably done in cpu_handle_exception(), but the current assumptions
  regarding when we can execute atomic sections cause a deadlock.

> The following patch moves atomic exception handling to the exception
> handler where all these issues are taken care of. The change in
> start_exclusive() is necessary since now the cpu in atomic execution
> will have its running flag set, but we do not want to count it as
> pending.
>
> Thanks to Alex for helping me debug the issue.
>
> CC: Alex Bennée <alex.bennee@linaro.org>
> CC: Richard Henderson <rth@twiddle.net>
> CC: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
> ---
>  cpu-exec.c    | 2 ++
>  cpus-common.c | 2 +-
>  cpus.c        | 4 ----
>  3 files changed, 3 insertions(+), 5 deletions(-)
>
> diff --git a/cpu-exec.c b/cpu-exec.c
> index b0ddada8c1..dceacfc5dd 100644
> --- a/cpu-exec.c
> +++ b/cpu-exec.c
> @@ -427,6 +427,8 @@ static inline bool cpu_handle_exception(CPUState *cpu, int *ret)
>          *ret = cpu->exception_index;
>          if (*ret == EXCP_DEBUG) {
>              cpu_handle_debug_exception(cpu);
> +        } else if (*ret == EXCP_ATOMIC) {
> +            cpu_exec_step_atomic(cpu);
>          }
>          cpu->exception_index = -1;
>          return true;
> diff --git a/cpus-common.c b/cpus-common.c
> index 59f751ecf9..7b859752ea 100644
> --- a/cpus-common.c
> +++ b/cpus-common.c
> @@ -192,7 +192,7 @@ void start_exclusive(void)
>      smp_mb();
>      running_cpus = 0;
>      CPU_FOREACH(other_cpu) {
> -        if (atomic_read(&other_cpu->running)) {
> +        if (atomic_read(&other_cpu->running) &&
> !qemu_cpu_is_self(other_cpu)) {

The comment above reads:

    Must only be called from outside cpu_exec.

So we need to revise this comment. Is this really a limitation or was it
originally the design goal?

>              other_cpu->has_waiter = true;
>              running_cpus++;
>              qemu_cpu_kick(other_cpu);
> diff --git a/cpus.c b/cpus.c
> index e1b82bcd49..981f23d52b 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -1461,10 +1461,6 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
>               */
>              g_assert(cpu->halted);
>              break;
> -        case EXCP_ATOMIC:
> -            qemu_mutex_unlock_iothread();
> -            cpu_exec_step_atomic(cpu);
> -            qemu_mutex_lock_iothread();
>          default:
>              /* Ignore everything else? */
>              break;

--
Alex Bennée
On 10/02/2017 13:13, Alex Bennée wrote:
>> +        if (atomic_read(&other_cpu->running) &&
>> !qemu_cpu_is_self(other_cpu)) {
> The comment above reads:
>
>     Must only be called from outside cpu_exec.
>
> So we need to revise this comment. Is this really a limitation or was it
> originally the design goal?

If you want to call it within cpu_exec, you can always use
cpu_exec_end/start around it.

I think we should first of all get rid of the iothread lock within
cpu_exec, and then look at this patch again.

Paolo
Paolo Bonzini <pbonzini@redhat.com> writes:

> On 10/02/2017 02:45, Pranith Kumar wrote:
>> The current method of executing atomic code in a guest uses
>> cpu_exec_step_atomic() from the outermost loop. This causes an abort()
>> when single stepping over atomic code since the debug exception longjmp
>> will point to the setjmp in cpu_exec(). Another issue with
>> this mechanism is that the flags which were set in atomic execution
>> will be lost since we do not call cpu_exec_enter().
>>
>> The following patch moves atomic exception handling to the exception
>> handler where all these issues are taken care of. The change in
>> start_exclusive() is necessary since now the cpu in atomic execution
>> will have its running flag set, but we do not want to count it as
>> pending.
>>
>> Thanks to Alex for helping me debug the issue.
>>
>> CC: Alex Bennée <alex.bennee@linaro.org>
>> CC: Richard Henderson <rth@twiddle.net>
>> CC: Paolo Bonzini <pbonzini@redhat.com>
>> Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
>> ---
>>  cpu-exec.c    | 2 ++
>>  cpus-common.c | 2 +-
>>  cpus.c        | 4 ----
>>  3 files changed, 3 insertions(+), 5 deletions(-)
>>
>> diff --git a/cpu-exec.c b/cpu-exec.c
>> index b0ddada8c1..dceacfc5dd 100644
>> --- a/cpu-exec.c
>> +++ b/cpu-exec.c
>> @@ -427,6 +427,8 @@ static inline bool cpu_handle_exception(CPUState *cpu, int *ret)
>>          *ret = cpu->exception_index;
>>          if (*ret == EXCP_DEBUG) {
>>              cpu_handle_debug_exception(cpu);
>> +        } else if (*ret == EXCP_ATOMIC) {
>> +            cpu_exec_step_atomic(cpu);
>
> I think you can unlock/lock the iothread here, and also call

The iothread is already unlocked by this point (see tcg_cpu_exec).

> cpu_exec_end/start to work around the limitation in start_exclusive.

While that seems right it also seems very messy as it inverts the calls
so far. I fear we may end up very confused in special casing. Is there a
cleaner way we can unwind this?

>
> Paolo
>
>>          }
>>          cpu->exception_index = -1;
>>          return true;
>> diff --git a/cpus-common.c b/cpus-common.c
>> index 59f751ecf9..7b859752ea 100644
>> --- a/cpus-common.c
>> +++ b/cpus-common.c
>> @@ -192,7 +192,7 @@ void start_exclusive(void)
>>      smp_mb();
>>      running_cpus = 0;
>>      CPU_FOREACH(other_cpu) {
>> -        if (atomic_read(&other_cpu->running)) {
>> +        if (atomic_read(&other_cpu->running) && !qemu_cpu_is_self(other_cpu)) {
>>              other_cpu->has_waiter = true;
>>              running_cpus++;
>>              qemu_cpu_kick(other_cpu);
>> diff --git a/cpus.c b/cpus.c
>> index e1b82bcd49..981f23d52b 100644
>> --- a/cpus.c
>> +++ b/cpus.c
>> @@ -1461,10 +1461,6 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
>>               */
>>              g_assert(cpu->halted);
>>              break;
>> -        case EXCP_ATOMIC:
>> -            qemu_mutex_unlock_iothread();
>> -            cpu_exec_step_atomic(cpu);
>> -            qemu_mutex_lock_iothread();
>>          default:
>>              /* Ignore everything else? */
>>              break;
>>

--
Alex Bennée
On 10/02/2017 13:18, Alex Bennée wrote:
>> I think you can unlock/lock the iothread here, and also call
>
> The iothread is already unlocked by this point (see tcg_cpu_exec).

Is this patch on top of the MTTCG branch? If not, cpu_handle_exception
runs with the iothread lock taken doesn't it?

>> cpu_exec_end/start to work around the limitation in start_exclusive.
>
> While that seems right it also seems very messy as it inverts the calls
> so far. I fear we may end up very confused in special casing. Is there a
> cleaner way we can unwind this?

I'm not sure what is messy... cpu_exec_start/end and
start/end_exclusive are fundamentally a rwlock.

Doing

    cpu_exec_end
    start_exclusive
    ...
    end_exclusive
    cpu_exec_start

simply means temporarily upgrading the lock from read to write.

Paolo
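A minimal, self-contained sketch of the reader/writer idea described above. This is an illustrative model only, not the actual cpus-common.c code (it omits kicking the running vCPUs and only supports a single writer); the model_* names are invented for the example.

```c
#include <pthread.h>
#include <stdbool.h>

/* Model only: cpu_exec_start/end behave like taking/releasing a read lock,
 * start/end_exclusive like a write lock that waits for readers to drain. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static int running_cpus;         /* readers: vCPUs currently inside cpu_exec() */
static bool exclusive_pending;   /* writer: an exclusive section is in progress */

static void model_cpu_exec_start(void)     /* analogue of cpu_exec_start() */
{
    pthread_mutex_lock(&lock);
    while (exclusive_pending) {
        pthread_cond_wait(&cond, &lock);   /* wait for the exclusive section */
    }
    running_cpus++;
    pthread_mutex_unlock(&lock);
}

static void model_cpu_exec_end(void)       /* analogue of cpu_exec_end() */
{
    pthread_mutex_lock(&lock);
    running_cpus--;
    pthread_cond_broadcast(&cond);         /* a waiting writer may now proceed */
    pthread_mutex_unlock(&lock);
}

static void model_start_exclusive(void)    /* analogue of start_exclusive() */
{
    pthread_mutex_lock(&lock);
    exclusive_pending = true;
    while (running_cpus > 0) {
        /* This wait can never finish if the caller is itself still counted
         * as a running vCPU, which is why start_exclusive must be called
         * from outside cpu_exec unless the read side is dropped first. */
        pthread_cond_wait(&cond, &lock);
    }
    pthread_mutex_unlock(&lock);
}

static void model_end_exclusive(void)      /* analogue of end_exclusive() */
{
    pthread_mutex_lock(&lock);
    exclusive_pending = false;
    pthread_cond_broadcast(&cond);         /* let blocked vCPUs re-enter */
    pthread_mutex_unlock(&lock);
}

int main(void)
{
    model_cpu_exec_start();      /* vCPU enters the execution loop (reader) */
    /* ... translated code runs; an atomic sequence needs exclusive access ... */
    model_cpu_exec_end();        /* drop the read side first ... */
    model_start_exclusive();     /* ... so this wait can actually complete */
    /* emulate the atomic instruction exclusively here */
    model_end_exclusive();
    model_cpu_exec_start();      /* re-take the read side and continue */
    model_cpu_exec_end();
    return 0;
}
```

In this model the sequence quoted above (end the read side, take the exclusive section, release it, re-enter) is exactly a temporary read-to-write upgrade.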
Paolo Bonzini <pbonzini@redhat.com> writes:

> On 10/02/2017 13:13, Alex Bennée wrote:
>>> +        if (atomic_read(&other_cpu->running) &&
>>> !qemu_cpu_is_self(other_cpu)) {
>> The comment above reads:
>>
>>     Must only be called from outside cpu_exec.
>>
>> So we need to revise this comment. Is this really a limitation or was it
>> originally the design goal?
>
> If you want to call it within cpu_exec, you can always use
> cpu_exec_end/start around it.
>
> I think we should first of all get rid of the iothread lock within
> cpu_exec, and then look at this patch again.

What to do with the MTTCG series in the meantime? I'm hesitant to hold up
the whole series for a potentially messy re-factor of the cpu loop code to
push out the BQL, which could take some time, although of course I don't
want to merge something that makes that harder.

That said for TCG system emulation EXCP_ATOMIC is currently broken with
respect to debugging. However for the initial guests/host combination of
ARMv7/8 on x86_64 we don't need the fallback with pretty much 99% of
deployed hosts. How about the following:

 - drop Pranith's patch for the current MTTCG series
 - replace with an error/abort on EXCP_ATOMIC
 - proceed with merge as planned
 - tackle the BQL lock next (along with more MTTCG guest/targets enablement)

Richard/Peter,

Any thoughts/opinions on this?

--
Alex Bennée
Paolo Bonzini <pbonzini@redhat.com> writes:

> On 10/02/2017 13:18, Alex Bennée wrote:
>>> I think you can unlock/lock the iothread here, and also call
>>
>> The iothread is already unlocked by this point (see tcg_cpu_exec).
>
> Is this patch on top of the MTTCG branch? If not, cpu_handle_exception
> runs with the iothread lock taken doesn't it?
>
>>> cpu_exec_end/start to work around the limitation in start_exclusive.
>>
>> While that seems right it also seems very messy as it inverts the calls
>> so far. I fear we may end up very confused in special casing. Is there a
>> cleaner way we can unwind this?
>
> I'm not sure what is messy... cpu_exec_start/end and
> start/end_exclusive are fundamentally a rwlock.
>
> Doing
>
>     cpu_exec_end
>     start_exclusive
>     ...
>     end_exclusive
>     cpu_exec_start
>
> simply means temporarily upgrading the lock from read to write.

I mean messy from the point of reading the code where we reverse a
sequence done from higher up in the call chain. Maybe I'm being overly
picky because we aren't exactly on a hot path here anyway.

--
Alex Bennée
On 10/02/2017 13:33, Alex Bennée wrote:
> That said for TCG system emulation EXCP_ATOMIC is currently broken with
> respect to debugging. However for the initial guests/host combination of
> ARMv7/8 on x86_64 we don't need the fallback with pretty much 99% of
> deployed hosts. How about the following:
>
> - drop Pranith's patch for the current MTTCG series
> - replace with an error/abort on EXCP_ATOMIC

Don't even replace it? :)

Paolo

> - proceed with merge as planned
> - tackle the BQL lock next (along with more MTTCG guest/targets enablement)
On Fri, Feb 10, 2017 at 7:29 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
>
> On 10/02/2017 13:18, Alex Bennée wrote:
>>> I think you can unlock/lock the iothread here, and also call
>>
>> The iothread is already unlocked by this point (see tcg_cpu_exec).
>
> Is this patch on top of the MTTCG branch? If not, cpu_handle_exception
> runs with the iothread lock taken doesn't it?
>
>>> cpu_exec_end/start to work around the limitation in start_exclusive.
>>
>> While that seems right it also seems very messy as it inverts the calls
>> so far. I fear we may end up very confused in special casing. Is there a
>> cleaner way we can unwind this?
>
> I'm not sure what is messy... cpu_exec_start/end and
> start/end_exclusive are fundamentally a rwlock.
>
> Doing
>
>     cpu_exec_end
>     start_exclusive
>     ...
>     end_exclusive
>     cpu_exec_start
>
> simply means temporarily upgrading the lock from read to write.
>

This seems to be the simplest thing to do for now. I'll send a v2.

Thanks,
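For concreteness, a rough sketch of the shape being discussed. It assumes cpu_exec_step_atomic() keeps taking the exclusive section internally, and that the BQL is still held at this point (as noted above, on the MTTCG branch it is already dropped, so the unlock/lock pair would be unnecessary there). The helper name is invented for the example; this is not necessarily what the v2 patch looks like.

```c
/* Hypothetical helper, e.g. called from the EXCP_ATOMIC branch of
 * cpu_handle_exception() in cpu-exec.c (which already has the needed
 * declarations); illustration only, not the committed code. */
static void step_atomic_outside_exec(CPUState *cpu)
{
    qemu_mutex_unlock_iothread();  /* don't hold the BQL across the exclusive work */
    cpu_exec_end(cpu);             /* stop counting as a "running" (reader) vCPU... */
    cpu_exec_step_atomic(cpu);     /* ...so the start_exclusive() inside can drain */
    cpu_exec_start(cpu);           /* become a reader again */
    qemu_mutex_lock_iothread();    /* re-take the BQL before resuming the loop */
}
```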
Paolo Bonzini <pbonzini@redhat.com> writes:

> On 10/02/2017 13:33, Alex Bennée wrote:
>> That said for TCG system emulation EXCP_ATOMIC is currently broken with
>> respect to debugging. However for the initial guests/host combination of
>> ARMv7/8 on x86_64 we don't need the fallback with pretty much 99% of
>> deployed hosts. How about the following:
>>
>> - drop Pranith's patch for the current MTTCG series
>> - replace with an error/abort on EXCP_ATOMIC
>
> Don't even replace it? :)

I guess - the failure mode is only when we single step? Should we not
try to emit a warning like we do for the mismatched MOs?

>
> Paolo
>
>> - proceed with merge as planned
>> - tackle the BQL lock next (along with more MTTCG guest/targets enablement)

--
Alex Bennée
On 10/02/2017 15:37, Alex Bennée wrote:
>>> - drop Pranith's patch for the current MTTCG series
>>> - replace with an error/abort on EXCP_ATOMIC
>> Don't even replace it? :)
>
> I guess - the failure mode is only when we single step? Should we not
> try to emit a warning like we do for the mismatched MOs?

If we're confident we can fix it before 2.9, we can afford the small bug
for the time being.

Paolo
diff --git a/cpu-exec.c b/cpu-exec.c
index b0ddada8c1..dceacfc5dd 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -427,6 +427,8 @@ static inline bool cpu_handle_exception(CPUState *cpu, int *ret)
         *ret = cpu->exception_index;
         if (*ret == EXCP_DEBUG) {
             cpu_handle_debug_exception(cpu);
+        } else if (*ret == EXCP_ATOMIC) {
+            cpu_exec_step_atomic(cpu);
         }
         cpu->exception_index = -1;
         return true;
diff --git a/cpus-common.c b/cpus-common.c
index 59f751ecf9..7b859752ea 100644
--- a/cpus-common.c
+++ b/cpus-common.c
@@ -192,7 +192,7 @@ void start_exclusive(void)
     smp_mb();
     running_cpus = 0;
     CPU_FOREACH(other_cpu) {
-        if (atomic_read(&other_cpu->running)) {
+        if (atomic_read(&other_cpu->running) && !qemu_cpu_is_self(other_cpu)) {
             other_cpu->has_waiter = true;
             running_cpus++;
             qemu_cpu_kick(other_cpu);
diff --git a/cpus.c b/cpus.c
index e1b82bcd49..981f23d52b 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1461,10 +1461,6 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
              */
             g_assert(cpu->halted);
             break;
-        case EXCP_ATOMIC:
-            qemu_mutex_unlock_iothread();
-            cpu_exec_step_atomic(cpu);
-            qemu_mutex_lock_iothread();
         default:
             /* Ignore everything else? */
             break;
The current method of executing atomic code in a guest uses
cpu_exec_step_atomic() from the outermost loop. This causes an abort()
when single stepping over atomic code since the debug exception longjmp
will point to the setjmp in cpu_exec(). Another issue with this
mechanism is that the flags which were set in atomic execution will be
lost since we do not call cpu_exec_enter().

The following patch moves atomic exception handling to the exception
handler where all these issues are taken care of. The change in
start_exclusive() is necessary since now the cpu in atomic execution
will have its running flag set, but we do not want to count it as
pending.

Thanks to Alex for helping me debug the issue.

CC: Alex Bennée <alex.bennee@linaro.org>
CC: Richard Henderson <rth@twiddle.net>
CC: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
---
 cpu-exec.c    | 2 ++
 cpus-common.c | 2 +-
 cpus.c        | 4 ----
 3 files changed, 3 insertions(+), 5 deletions(-)