Message ID | 161105999349.694343.16096128094758045254.stgit@pasha-ThinkPad-X280 (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | replay: fix replay of the interrupts | expand |
On 1/19/21 1:39 PM, Pavel Dovgalyuk wrote: > Sometimes interrupt event comes at the same time with > the virtual timers. In this case replay tries to proceed > the timers, because deadline for them is zero. > This patch allows processing interrupts and exceptions > by entering the vCPU execution loop, when deadline is zero, > but checkpoint associated with virtual timers is not ready > to be replayed. > > Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru> > --- > accel/tcg/tcg-cpus-icount.c | 8 +++++++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > diff --git a/accel/tcg/tcg-cpus-icount.c b/accel/tcg/tcg-cpus-icount.c > index 9f45432275..a6d2bb8a88 100644 > --- a/accel/tcg/tcg-cpus-icount.c > +++ b/accel/tcg/tcg-cpus-icount.c > @@ -81,7 +81,13 @@ void icount_handle_deadline(void) > int64_t deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL, > QEMU_TIMER_ATTR_ALL); > > - if (deadline == 0) { > + /* > + * Instructions, interrupts, and exceptions are processed in cpu-exec. > + * Don't interrupt cpu thread, when these events are waiting > + * (i.e., there is no checkpoint) > + */ > + if (deadline == 0 > + && (replay_mode == REPLAY_MODE_RECORD || replay_has_checkpoint())) { LGTM, but Cc'ing Peter/Alex just in case :) > icount_notify_aio_contexts(); > } > } > >
On 19/01/21 13:39, Pavel Dovgalyuk wrote: > Sometimes interrupt event comes at the same time with > the virtual timers. In this case replay tries to proceed > the timers, because deadline for them is zero. > This patch allows processing interrupts and exceptions > by entering the vCPU execution loop, when deadline is zero, > but checkpoint associated with virtual timers is not ready > to be replayed. > > Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru> > --- > accel/tcg/tcg-cpus-icount.c | 8 +++++++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > diff --git a/accel/tcg/tcg-cpus-icount.c b/accel/tcg/tcg-cpus-icount.c > index 9f45432275..a6d2bb8a88 100644 > --- a/accel/tcg/tcg-cpus-icount.c > +++ b/accel/tcg/tcg-cpus-icount.c > @@ -81,7 +81,13 @@ void icount_handle_deadline(void) > int64_t deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL, > QEMU_TIMER_ATTR_ALL); > > - if (deadline == 0) { > + /* > + * Instructions, interrupts, and exceptions are processed in cpu-exec. > + * Don't interrupt cpu thread, when these events are waiting > + * (i.e., there is no checkpoint) > + */ > + if (deadline == 0 > + && (replay_mode == REPLAY_MODE_RECORD || replay_has_checkpoint())) { Should this be replay_mode != REPLAY_MODE_PLAY || replay_has_checkpoint()? Paolo > icount_notify_aio_contexts(); > } > } >
On 23.01.2021 21:15, Paolo Bonzini wrote: > On 19/01/21 13:39, Pavel Dovgalyuk wrote: >> Sometimes interrupt event comes at the same time with >> the virtual timers. In this case replay tries to proceed >> the timers, because deadline for them is zero. >> This patch allows processing interrupts and exceptions >> by entering the vCPU execution loop, when deadline is zero, >> but checkpoint associated with virtual timers is not ready >> to be replayed. >> >> Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru> >> --- >> accel/tcg/tcg-cpus-icount.c | 8 +++++++- >> 1 file changed, 7 insertions(+), 1 deletion(-) >> >> diff --git a/accel/tcg/tcg-cpus-icount.c b/accel/tcg/tcg-cpus-icount.c >> index 9f45432275..a6d2bb8a88 100644 >> --- a/accel/tcg/tcg-cpus-icount.c >> +++ b/accel/tcg/tcg-cpus-icount.c >> @@ -81,7 +81,13 @@ void icount_handle_deadline(void) >> int64_t deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL, >> QEMU_TIMER_ATTR_ALL); >> - if (deadline == 0) { >> + /* >> + * Instructions, interrupts, and exceptions are processed in >> cpu-exec. >> + * Don't interrupt cpu thread, when these events are waiting >> + * (i.e., there is no checkpoint) >> + */ >> + if (deadline == 0 >> + && (replay_mode == REPLAY_MODE_RECORD || >> replay_has_checkpoint())) { > > Should this be replay_mode != REPLAY_MODE_PLAY || replay_has_checkpoint()? It was the first idea, but I thought, that == is more straightforward to understand than !=. Pavel Dovgalyuk
In general I agree, but != means that rr disabled returns true. In general it seems to me that rr disabled should work more or less the same as record mode, because there is no replay log to provide the checkpoints. Paolo Il lun 25 gen 2021, 06:38 Pavel Dovgalyuk <pavel.dovgalyuk@ispras.ru> ha scritto: > On 23.01.2021 21:15, Paolo Bonzini wrote: > > On 19/01/21 13:39, Pavel Dovgalyuk wrote: > >> Sometimes interrupt event comes at the same time with > >> the virtual timers. In this case replay tries to proceed > >> the timers, because deadline for them is zero. > >> This patch allows processing interrupts and exceptions > >> by entering the vCPU execution loop, when deadline is zero, > >> but checkpoint associated with virtual timers is not ready > >> to be replayed. > >> > >> Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru> > >> --- > >> accel/tcg/tcg-cpus-icount.c | 8 +++++++- > >> 1 file changed, 7 insertions(+), 1 deletion(-) > >> > >> diff --git a/accel/tcg/tcg-cpus-icount.c b/accel/tcg/tcg-cpus-icount.c > >> index 9f45432275..a6d2bb8a88 100644 > >> --- a/accel/tcg/tcg-cpus-icount.c > >> +++ b/accel/tcg/tcg-cpus-icount.c > >> @@ -81,7 +81,13 @@ void icount_handle_deadline(void) > >> int64_t deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL, > >> > QEMU_TIMER_ATTR_ALL); > >> - if (deadline == 0) { > >> + /* > >> + * Instructions, interrupts, and exceptions are processed in > >> cpu-exec. > >> + * Don't interrupt cpu thread, when these events are waiting > >> + * (i.e., there is no checkpoint) > >> + */ > >> + if (deadline == 0 > >> + && (replay_mode == REPLAY_MODE_RECORD || > >> replay_has_checkpoint())) { > > > > Should this be replay_mode != REPLAY_MODE_PLAY || > replay_has_checkpoint()? > > It was the first idea, but I thought, that == is more straightforward > to understand than !=. > > Pavel Dovgalyuk > >
Paolo Bonzini <pbonzini@redhat.com> writes: > In general I agree, but != means that rr disabled returns true. In general > it seems to me that rr disabled should work more or less the same as record > mode, because there is no replay log to provide the checkpoints. Is this not an argument to combine the mode and check into replay.h inline helpers with some clear semantic documentation and the call sites become self documenting? if (deadline == 0 && replay_recording_or_checkpoint()) which also makes things easier to compile away if replay isn't there? > > Paolo > > Il lun 25 gen 2021, 06:38 Pavel Dovgalyuk <pavel.dovgalyuk@ispras.ru> ha > scritto: > >> On 23.01.2021 21:15, Paolo Bonzini wrote: >> > On 19/01/21 13:39, Pavel Dovgalyuk wrote: >> >> Sometimes interrupt event comes at the same time with >> >> the virtual timers. In this case replay tries to proceed >> >> the timers, because deadline for them is zero. >> >> This patch allows processing interrupts and exceptions >> >> by entering the vCPU execution loop, when deadline is zero, >> >> but checkpoint associated with virtual timers is not ready >> >> to be replayed. >> >> >> >> Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru> >> >> --- >> >> accel/tcg/tcg-cpus-icount.c | 8 +++++++- >> >> 1 file changed, 7 insertions(+), 1 deletion(-) >> >> >> >> diff --git a/accel/tcg/tcg-cpus-icount.c b/accel/tcg/tcg-cpus-icount.c >> >> index 9f45432275..a6d2bb8a88 100644 >> >> --- a/accel/tcg/tcg-cpus-icount.c >> >> +++ b/accel/tcg/tcg-cpus-icount.c >> >> @@ -81,7 +81,13 @@ void icount_handle_deadline(void) >> >> int64_t deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL, >> >> >> QEMU_TIMER_ATTR_ALL); >> >> - if (deadline == 0) { >> >> + /* >> >> + * Instructions, interrupts, and exceptions are processed in >> >> cpu-exec. >> >> + * Don't interrupt cpu thread, when these events are waiting >> >> + * (i.e., there is no checkpoint) >> >> + */ >> >> + if (deadline == 0 >> >> + && (replay_mode == REPLAY_MODE_RECORD || >> >> replay_has_checkpoint())) { >> > >> > Should this be replay_mode != REPLAY_MODE_PLAY || >> replay_has_checkpoint()? >> >> It was the first idea, but I thought, that == is more straightforward >> to understand than !=. >> >> Pavel Dovgalyuk >> >>
On 1/25/21 1:12 PM, Alex Bennée wrote: > > Paolo Bonzini <pbonzini@redhat.com> writes: > >> In general I agree, but != means that rr disabled returns true. In general >> it seems to me that rr disabled should work more or less the same as record >> mode, because there is no replay log to provide the checkpoints. > > Is this not an argument to combine the mode and check into replay.h > inline helpers with some clear semantic documentation and the call sites > become self documenting? > > if (deadline == 0 && replay_recording_or_checkpoint()) > > which also makes things easier to compile away if replay isn't there? Seems that the TCG build faces a similar issue to the issue I was facing with the non-TCG build, before the non-TCG build got functional again (for x86). We solved the non-TCG build problem, by not compiling replay at all for non-TCG, plus closing our nose and stubbing away what couldn't be completely removed (yet). But the CONFIG_TCG build has the same legitimate requirement towards a non-CONFIG_REPLAY build. ie, like we have tcg_enabled(), should we have replay_enabled()? Maybe it could be reworked starting from replay_events_enabled()? And then when things are refactored properly for replay_enabled(), a non-REPLAY TCG build can basically ignore all the inner workings of replay. Ciao, Claudio > >> >> Paolo >> >> Il lun 25 gen 2021, 06:38 Pavel Dovgalyuk <pavel.dovgalyuk@ispras.ru> ha >> scritto: >> >>> On 23.01.2021 21:15, Paolo Bonzini wrote: >>>> On 19/01/21 13:39, Pavel Dovgalyuk wrote: >>>>> Sometimes interrupt event comes at the same time with >>>>> the virtual timers. In this case replay tries to proceed >>>>> the timers, because deadline for them is zero. >>>>> This patch allows processing interrupts and exceptions >>>>> by entering the vCPU execution loop, when deadline is zero, >>>>> but checkpoint associated with virtual timers is not ready >>>>> to be replayed. >>>>> >>>>> Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru> >>>>> --- >>>>> accel/tcg/tcg-cpus-icount.c | 8 +++++++- >>>>> 1 file changed, 7 insertions(+), 1 deletion(-) >>>>> >>>>> diff --git a/accel/tcg/tcg-cpus-icount.c b/accel/tcg/tcg-cpus-icount.c >>>>> index 9f45432275..a6d2bb8a88 100644 >>>>> --- a/accel/tcg/tcg-cpus-icount.c >>>>> +++ b/accel/tcg/tcg-cpus-icount.c >>>>> @@ -81,7 +81,13 @@ void icount_handle_deadline(void) >>>>> int64_t deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL, >>>>> >>> QEMU_TIMER_ATTR_ALL); >>>>> - if (deadline == 0) { >>>>> + /* >>>>> + * Instructions, interrupts, and exceptions are processed in >>>>> cpu-exec. >>>>> + * Don't interrupt cpu thread, when these events are waiting >>>>> + * (i.e., there is no checkpoint) >>>>> + */ >>>>> + if (deadline == 0 >>>>> + && (replay_mode == REPLAY_MODE_RECORD || >>>>> replay_has_checkpoint())) { >>>> >>>> Should this be replay_mode != REPLAY_MODE_PLAY || >>> replay_has_checkpoint()? >>> >>> It was the first idea, but I thought, that == is more straightforward >>> to understand than !=. >>> >>> Pavel Dovgalyuk >>> >>> > >
On 1/25/21 1:43 PM, Claudio Fontana wrote: > On 1/25/21 1:12 PM, Alex Bennée wrote: >> >> Paolo Bonzini <pbonzini@redhat.com> writes: >> >>> In general I agree, but != means that rr disabled returns true. In general >>> it seems to me that rr disabled should work more or less the same as record >>> mode, because there is no replay log to provide the checkpoints. >> >> Is this not an argument to combine the mode and check into replay.h >> inline helpers with some clear semantic documentation and the call sites >> become self documenting? >> >> if (deadline == 0 && replay_recording_or_checkpoint()) >> >> which also makes things easier to compile away if replay isn't there? > > > Seems that the TCG build faces a similar issue to the issue I was facing with the non-TCG build, > before the non-TCG build got functional again (for x86). > > We solved the non-TCG build problem, by not compiling replay at all for non-TCG, plus closing our nose and stubbing away what couldn't be completely removed (yet). > > But the CONFIG_TCG build has the same legitimate requirement towards a non-CONFIG_REPLAY build. > > ie, like we have tcg_enabled(), should we have replay_enabled()? Maybe it could be reworked starting from replay_events_enabled()? > > And then when things are refactored properly for replay_enabled(), a non-REPLAY TCG build can basically ignore all the inner workings of replay. > I guess to summarize the above, should there be a CONFIG_REPLAY, dependent on CONFIG_TCG, by default on, but which could be disabled with --disable-replay ? Thanks, Claudio > >> >>> >>> Paolo >>> >>> Il lun 25 gen 2021, 06:38 Pavel Dovgalyuk <pavel.dovgalyuk@ispras.ru> ha >>> scritto: >>> >>>> On 23.01.2021 21:15, Paolo Bonzini wrote: >>>>> On 19/01/21 13:39, Pavel Dovgalyuk wrote: >>>>>> Sometimes interrupt event comes at the same time with >>>>>> the virtual timers. In this case replay tries to proceed >>>>>> the timers, because deadline for them is zero. >>>>>> This patch allows processing interrupts and exceptions >>>>>> by entering the vCPU execution loop, when deadline is zero, >>>>>> but checkpoint associated with virtual timers is not ready >>>>>> to be replayed. >>>>>> >>>>>> Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru> >>>>>> --- >>>>>> accel/tcg/tcg-cpus-icount.c | 8 +++++++- >>>>>> 1 file changed, 7 insertions(+), 1 deletion(-) >>>>>> >>>>>> diff --git a/accel/tcg/tcg-cpus-icount.c b/accel/tcg/tcg-cpus-icount.c >>>>>> index 9f45432275..a6d2bb8a88 100644 >>>>>> --- a/accel/tcg/tcg-cpus-icount.c >>>>>> +++ b/accel/tcg/tcg-cpus-icount.c >>>>>> @@ -81,7 +81,13 @@ void icount_handle_deadline(void) >>>>>> int64_t deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL, >>>>>> >>>> QEMU_TIMER_ATTR_ALL); >>>>>> - if (deadline == 0) { >>>>>> + /* >>>>>> + * Instructions, interrupts, and exceptions are processed in >>>>>> cpu-exec. >>>>>> + * Don't interrupt cpu thread, when these events are waiting >>>>>> + * (i.e., there is no checkpoint) >>>>>> + */ >>>>>> + if (deadline == 0 >>>>>> + && (replay_mode == REPLAY_MODE_RECORD || >>>>>> replay_has_checkpoint())) { >>>>> >>>>> Should this be replay_mode != REPLAY_MODE_PLAY || >>>> replay_has_checkpoint()? >>>> >>>> It was the first idea, but I thought, that == is more straightforward >>>> to understand than !=. >>>> >>>> Pavel Dovgalyuk >>>> >>>> >> >> >
Claudio Fontana <cfontana@suse.de> writes: > On 1/25/21 1:43 PM, Claudio Fontana wrote: >> On 1/25/21 1:12 PM, Alex Bennée wrote: >>> >>> Paolo Bonzini <pbonzini@redhat.com> writes: >>> >>>> In general I agree, but != means that rr disabled returns true. In general >>>> it seems to me that rr disabled should work more or less the same as record >>>> mode, because there is no replay log to provide the checkpoints. >>> >>> Is this not an argument to combine the mode and check into replay.h >>> inline helpers with some clear semantic documentation and the call sites >>> become self documenting? >>> >>> if (deadline == 0 && replay_recording_or_checkpoint()) >>> >>> which also makes things easier to compile away if replay isn't there? >> >> >> Seems that the TCG build faces a similar issue to the issue I was facing with the non-TCG build, >> before the non-TCG build got functional again (for x86). >> >> We solved the non-TCG build problem, by not compiling replay at all for non-TCG, plus closing our nose and stubbing away what couldn't be completely removed (yet). >> >> But the CONFIG_TCG build has the same legitimate requirement towards a non-CONFIG_REPLAY build. >> >> ie, like we have tcg_enabled(), should we have replay_enabled()? Maybe it could be reworked starting from replay_events_enabled()? >> >> And then when things are refactored properly for replay_enabled(), a non-REPLAY TCG build can basically ignore all the inner workings of replay. >> > > I guess to summarize the above, should there be a CONFIG_REPLAY, dependent on CONFIG_TCG, by default on, > but which could be disabled with > > --disable-replay > > ? I'm not sure - fundamentally having replay is one of those cool things you can do when you have TCG and I suspect there is less pressure to have a finely tuned TCG enabled build compared to our HW accelerated brethren using hypervisors. TCG plugins are a configure option purely because there is a small but non-marginal cost in having it enabled. I doubt replay figures that much if you are not using it. That said if it makes it easier to make sure our abstractions are clean and the layering is good then maybe it is worth having a build that allows us to check that. But if it's going to be super fiddly and involve large amounts of code motion I doubt the "win" is big enough for the effort. Also I suspect the config option would be CONFIG_ICOUNT because replay is just one of the features on the spectrum of: deterministic icount -> record/replay -> reverse debugging which all require the base support for icount.
On 1/25/21 3:26 PM, Alex Bennée wrote: > > Claudio Fontana <cfontana@suse.de> writes: > >> On 1/25/21 1:43 PM, Claudio Fontana wrote: >>> On 1/25/21 1:12 PM, Alex Bennée wrote: >>>> >>>> Paolo Bonzini <pbonzini@redhat.com> writes: >>>> >>>>> In general I agree, but != means that rr disabled returns true. In general >>>>> it seems to me that rr disabled should work more or less the same as record >>>>> mode, because there is no replay log to provide the checkpoints. >>>> >>>> Is this not an argument to combine the mode and check into replay.h >>>> inline helpers with some clear semantic documentation and the call sites >>>> become self documenting? >>>> >>>> if (deadline == 0 && replay_recording_or_checkpoint()) >>>> >>>> which also makes things easier to compile away if replay isn't there? >>> >>> >>> Seems that the TCG build faces a similar issue to the issue I was facing with the non-TCG build, >>> before the non-TCG build got functional again (for x86). >>> >>> We solved the non-TCG build problem, by not compiling replay at all for non-TCG, plus closing our nose and stubbing away what couldn't be completely removed (yet). >>> >>> But the CONFIG_TCG build has the same legitimate requirement towards a non-CONFIG_REPLAY build. >>> >>> ie, like we have tcg_enabled(), should we have replay_enabled()? Maybe it could be reworked starting from replay_events_enabled()? >>> >>> And then when things are refactored properly for replay_enabled(), a non-REPLAY TCG build can basically ignore all the inner workings of replay. >>> >> >> I guess to summarize the above, should there be a CONFIG_REPLAY, dependent on CONFIG_TCG, by default on, >> but which could be disabled with >> >> --disable-replay >> >> ? > > I'm not sure - fundamentally having replay is one of those cool things > you can do when you have TCG and I suspect there is less pressure to > have a finely tuned TCG enabled build compared to our HW accelerated > brethren using hypervisors. TCG plugins are a configure option purely > because there is a small but non-marginal cost in having it enabled. I > doubt replay figures that much if you are not using it. > > That said if it makes it easier to make sure our abstractions are clean > and the layering is good then maybe it is worth having a build that I think that cleaning up these aspects would be beneficial in replay itself, but maybe this could be done without forcing the --disable-replay option. > allows us to check that. But if it's going to be super fiddly and > involve large amounts of code motion I doubt the "win" is big enough for > the effort. > > Also I suspect the config option would be CONFIG_ICOUNT because replay > is just one of the features on the spectrum of: > > deterministic icount -> record/replay -> reverse debugging > > which all require the base support for icount. > Right, icount_enabled() is already fundamentally able to check whether replay functionality is available or not. Probably there is already some cleanup possible by applying this consistently. Other clean up opportunities I see are in making consistent checks for whether the functionality is active, and consolidate all the spread state, including (but not limited to): replay_mode, events_enabled, replay_is_debugging, in_checkpoint, ... (into a single struct replay ?) ..and then I guess clean up the namespace from all the replay_ functions for which we need a gazillion stubs for non-TCG code.. Ciao, Claudio
diff --git a/accel/tcg/tcg-cpus-icount.c b/accel/tcg/tcg-cpus-icount.c index 9f45432275..a6d2bb8a88 100644 --- a/accel/tcg/tcg-cpus-icount.c +++ b/accel/tcg/tcg-cpus-icount.c @@ -81,7 +81,13 @@ void icount_handle_deadline(void) int64_t deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL, QEMU_TIMER_ATTR_ALL); - if (deadline == 0) { + /* + * Instructions, interrupts, and exceptions are processed in cpu-exec. + * Don't interrupt cpu thread, when these events are waiting + * (i.e., there is no checkpoint) + */ + if (deadline == 0 + && (replay_mode == REPLAY_MODE_RECORD || replay_has_checkpoint())) { icount_notify_aio_contexts(); } }
Sometimes interrupt event comes at the same time with the virtual timers. In this case replay tries to proceed the timers, because deadline for them is zero. This patch allows processing interrupts and exceptions by entering the vCPU execution loop, when deadline is zero, but checkpoint associated with virtual timers is not ready to be replayed. Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru> --- accel/tcg/tcg-cpus-icount.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)