Message ID | 20180912081950.3228.68987.stgit@pasha-VirtualBox (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Fixing record/replay and adding reverse debugging | expand |
On 09/12/2018 04:19 AM, Pavel Dovgalyuk wrote: > This patch makes IDE trim BH deterministic, because it affects > the device state. Therefore its invocation should be replayed > instead of running at the random moment. > > Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru> > Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> > --- > hw/ide/core.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/hw/ide/core.c b/hw/ide/core.c > index 2c62efc..04e22e7 100644 > --- a/hw/ide/core.c > +++ b/hw/ide/core.c > @@ -35,6 +35,7 @@ > #include "sysemu/block-backend.h" > #include "qapi/error.h" > #include "qemu/cutils.h" > +#include "sysemu/replay.h" > > #include "hw/ide/internal.h" > #include "trace.h" > @@ -479,7 +480,7 @@ static void ide_issue_trim_cb(void *opaque, int ret) > done: > iocb->aiocb = NULL; > if (iocb->bh) { > - qemu_bh_schedule(iocb->bh); > + replay_bh_schedule_event(iocb->bh); > } > } > > > Just passing by: Why do we need to change this call, but nothing else in IDE? I don't mind conceptually, but it's odd to me that of all the calls I make in this emulator that change state somewhere that this is the only one you need to hijack for the replay feature. Is this a necessarily complete change? --js
> From: John Snow [mailto:jsnow@redhat.com] > On 09/12/2018 04:19 AM, Pavel Dovgalyuk wrote: > > This patch makes IDE trim BH deterministic, because it affects > > the device state. Therefore its invocation should be replayed > > instead of running at the random moment. > > > > Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru> > > Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> > > --- > > hw/ide/core.c | 3 ++- > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > diff --git a/hw/ide/core.c b/hw/ide/core.c > > index 2c62efc..04e22e7 100644 > > --- a/hw/ide/core.c > > +++ b/hw/ide/core.c > > @@ -35,6 +35,7 @@ > > #include "sysemu/block-backend.h" > > #include "qapi/error.h" > > #include "qemu/cutils.h" > > +#include "sysemu/replay.h" > > > > #include "hw/ide/internal.h" > > #include "trace.h" > > @@ -479,7 +480,7 @@ static void ide_issue_trim_cb(void *opaque, int ret) > > done: > > iocb->aiocb = NULL; > > if (iocb->bh) { > > - qemu_bh_schedule(iocb->bh); > > + replay_bh_schedule_event(iocb->bh); > > } > > } > > > Just passing by: Why do we need to change this call, but nothing else in > IDE? This call is responsible for a bug that was reproducible. > I don't mind conceptually, but it's odd to me that of all the calls I > make in this emulator that change state somewhere that this is the only > one you need to hijack for the replay feature. > > Is this a necessarily complete change? Maybe not. We can hardly analyze all peripheral devices code and fix all the calls. But I think we can improve that patch and at least look through ide core to fix other calls. Pavel Dovgalyuk
> From: Pavel Dovgalyuk [mailto:dovgaluk@ispras.ru] > > From: John Snow [mailto:jsnow@redhat.com] > > On 09/12/2018 04:19 AM, Pavel Dovgalyuk wrote: > > > This patch makes IDE trim BH deterministic, because it affects > > > the device state. Therefore its invocation should be replayed > > > instead of running at the random moment. > > > > > > Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru> > > > Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> > > > --- > > > hw/ide/core.c | 3 ++- > > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > > > diff --git a/hw/ide/core.c b/hw/ide/core.c > > > index 2c62efc..04e22e7 100644 > > > --- a/hw/ide/core.c > > > +++ b/hw/ide/core.c > > > @@ -35,6 +35,7 @@ > > > #include "sysemu/block-backend.h" > > > #include "qapi/error.h" > > > #include "qemu/cutils.h" > > > +#include "sysemu/replay.h" > > > > > > #include "hw/ide/internal.h" > > > #include "trace.h" > > > @@ -479,7 +480,7 @@ static void ide_issue_trim_cb(void *opaque, int ret) > > > done: > > > iocb->aiocb = NULL; > > > if (iocb->bh) { > > > - qemu_bh_schedule(iocb->bh); > > > + replay_bh_schedule_event(iocb->bh); > > > } > > > } > > > > > Just passing by: Why do we need to change this call, but nothing else in > > IDE? > > This call is responsible for a bug that was reproducible. > > > I don't mind conceptually, but it's odd to me that of all the calls I > > make in this emulator that change state somewhere that this is the only > > one you need to hijack for the replay feature. > > > > Is this a necessarily complete change? I found one more BH in ide/core: static void ide_restart_cb(void *opaque, int running, RunState state) { IDEBus *bus = opaque; if (!running) return; if (!bus->bh) { bus->bh = qemu_bh_new(ide_restart_bh, bus); qemu_bh_schedule(bus->bh); } } void ide_register_restart_cb(IDEBus *bus) { if (bus->dma->ops->restart_dma) { bus->vmstate = qemu_add_vm_change_state_handler(ide_restart_cb, bus); } } As I understand, it is called when VM start/stop event happen. These events are not related to the guest state. Does this BH change the guest state somehow? Pavel Dovgalyuk
On 09/14/2018 01:48 AM, Pavel Dovgalyuk wrote: >> From: John Snow [mailto:jsnow@redhat.com] >> On 09/12/2018 04:19 AM, Pavel Dovgalyuk wrote: >>> This patch makes IDE trim BH deterministic, because it affects >>> the device state. Therefore its invocation should be replayed >>> instead of running at the random moment. >>> >>> Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru> >>> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> >>> --- >>> hw/ide/core.c | 3 ++- >>> 1 file changed, 2 insertions(+), 1 deletion(-) >>> >>> diff --git a/hw/ide/core.c b/hw/ide/core.c >>> index 2c62efc..04e22e7 100644 >>> --- a/hw/ide/core.c >>> +++ b/hw/ide/core.c >>> @@ -35,6 +35,7 @@ >>> #include "sysemu/block-backend.h" >>> #include "qapi/error.h" >>> #include "qemu/cutils.h" >>> +#include "sysemu/replay.h" >>> >>> #include "hw/ide/internal.h" >>> #include "trace.h" >>> @@ -479,7 +480,7 @@ static void ide_issue_trim_cb(void *opaque, int ret) >>> done: >>> iocb->aiocb = NULL; >>> if (iocb->bh) { >>> - qemu_bh_schedule(iocb->bh); >>> + replay_bh_schedule_event(iocb->bh); >>> } >>> } >>> >> Just passing by: Why do we need to change this call, but nothing else in >> IDE? > > This call is responsible for a bug that was reproducible. > >> I don't mind conceptually, but it's odd to me that of all the calls I >> make in this emulator that change state somewhere that this is the only >> one you need to hijack for the replay feature. >> >> Is this a necessarily complete change? > > Maybe not. We can hardly analyze all peripheral devices code and fix all the calls. > But I think we can improve that patch and at least look through ide core to fix other calls. > > Pavel Dovgalyuk > It just seems odd that if you're working on a replay mechanism that requires you to intercept my QEMU API calls that you're only changing a trim callback. I'd kind of expect that you don't need to intercept any, unless these are legacy calls that I shouldn't be making at all and you have a more generic intercept somewhere deeper in the codebase. In that case, I really ought to hustle off of my use of legacy calls. What are the criteria for things you need to intercept/wrap?
On 09/14/2018 03:27 AM, Pavel Dovgalyuk wrote: >> From: Pavel Dovgalyuk [mailto:dovgaluk@ispras.ru] >>> From: John Snow [mailto:jsnow@redhat.com] >>> On 09/12/2018 04:19 AM, Pavel Dovgalyuk wrote: >>>> This patch makes IDE trim BH deterministic, because it affects >>>> the device state. Therefore its invocation should be replayed >>>> instead of running at the random moment. >>>> >>>> Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru> >>>> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> >>>> --- >>>> hw/ide/core.c | 3 ++- >>>> 1 file changed, 2 insertions(+), 1 deletion(-) >>>> >>>> diff --git a/hw/ide/core.c b/hw/ide/core.c >>>> index 2c62efc..04e22e7 100644 >>>> --- a/hw/ide/core.c >>>> +++ b/hw/ide/core.c >>>> @@ -35,6 +35,7 @@ >>>> #include "sysemu/block-backend.h" >>>> #include "qapi/error.h" >>>> #include "qemu/cutils.h" >>>> +#include "sysemu/replay.h" >>>> >>>> #include "hw/ide/internal.h" >>>> #include "trace.h" >>>> @@ -479,7 +480,7 @@ static void ide_issue_trim_cb(void *opaque, int ret) >>>> done: >>>> iocb->aiocb = NULL; >>>> if (iocb->bh) { >>>> - qemu_bh_schedule(iocb->bh); >>>> + replay_bh_schedule_event(iocb->bh); >>>> } >>>> } >>>> >>> Just passing by: Why do we need to change this call, but nothing else in >>> IDE? >> >> This call is responsible for a bug that was reproducible. >> >>> I don't mind conceptually, but it's odd to me that of all the calls I >>> make in this emulator that change state somewhere that this is the only >>> one you need to hijack for the replay feature. >>> >>> Is this a necessarily complete change? > > > I found one more BH in ide/core: > > static void ide_restart_cb(void *opaque, int running, RunState state) > { > IDEBus *bus = opaque; > > if (!running) > return; > > if (!bus->bh) { > bus->bh = qemu_bh_new(ide_restart_bh, bus); > qemu_bh_schedule(bus->bh); > } > } > > void ide_register_restart_cb(IDEBus *bus) > { > if (bus->dma->ops->restart_dma) { > bus->vmstate = qemu_add_vm_change_state_handler(ide_restart_cb, bus); > } > } > > As I understand, it is called when VM start/stop event happen. > These events are not related to the guest state. > > Does this BH change the guest state somehow? > > Pavel Dovgalyuk > > Shouldn't change guest state all by itself. ide_restart_bh does, though. (Changes device registers, can cause block IO to occur, etc.) --js
On 14/09/2018 16:00, John Snow wrote: >> Maybe not. We can hardly analyze all peripheral devices code and fix all the calls. >> But I think we can improve that patch and at least look through ide core to fix other calls. >> >> Pavel Dovgalyuk >> > It just seems odd that if you're working on a replay mechanism that > requires you to intercept my QEMU API calls that you're only changing a > trim callback. You need it only here because the block layer is already calling replay_bh_schedule_event (actually replay_bh_schedule_oneshot_event after patch 22 of this series) on reads and writes. Paolo > I'd kind of expect that you don't need to intercept any, unless these > are legacy calls that I shouldn't be making at all and you have a more > generic intercept somewhere deeper in the codebase. > > In that case, I really ought to hustle off of my use of legacy calls. > > What are the criteria for things you need to intercept/wrap?
On 14 September 2018 at 16:19, Paolo Bonzini <pbonzini@redhat.com> wrote: > On 14/09/2018 16:00, John Snow wrote: >>> Maybe not. We can hardly analyze all peripheral devices code and fix all the calls. >>> But I think we can improve that patch and at least look through ide core to fix other calls. >>> >>> Pavel Dovgalyuk >>> >> It just seems odd that if you're working on a replay mechanism that >> requires you to intercept my QEMU API calls that you're only changing a >> trim callback. > > You need it only here because the block layer is already calling > replay_bh_schedule_event (actually replay_bh_schedule_oneshot_event > after patch 22 of this series) on reads and writes. Do we have documentation describing when a device model needs to care about record/replay ? thanks -- PMM
> From: Peter Maydell [mailto:peter.maydell@linaro.org] > On 14 September 2018 at 16:19, Paolo Bonzini <pbonzini@redhat.com> wrote: > > On 14/09/2018 16:00, John Snow wrote: > >>> Maybe not. We can hardly analyze all peripheral devices code and fix all the calls. > >>> But I think we can improve that patch and at least look through ide core to fix other > calls. > >>> > >>> Pavel Dovgalyuk > >>> > >> It just seems odd that if you're working on a replay mechanism that > >> requires you to intercept my QEMU API calls that you're only changing a > >> trim callback. > > > > You need it only here because the block layer is already calling > > replay_bh_schedule_event (actually replay_bh_schedule_oneshot_event > > after patch 22 of this series) on reads and writes. > > Do we have documentation describing when a device model needs to care > about record/replay ? Not yet. But how we are suppose this doc to be seen by everyone? Pavel Dovgalyuk
> From: John Snow [mailto:jsnow@redhat.com] > On 09/14/2018 03:27 AM, Pavel Dovgalyuk wrote: > >> From: Pavel Dovgalyuk [mailto:dovgaluk@ispras.ru] > >>> From: John Snow [mailto:jsnow@redhat.com] > >>> On 09/12/2018 04:19 AM, Pavel Dovgalyuk wrote: > >>>> This patch makes IDE trim BH deterministic, because it affects > >>>> the device state. Therefore its invocation should be replayed > >>>> instead of running at the random moment. > >>>> > >>>> Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru> > >>>> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> > >>>> --- > >>>> hw/ide/core.c | 3 ++- > >>>> 1 file changed, 2 insertions(+), 1 deletion(-) > >>>> > >>>> diff --git a/hw/ide/core.c b/hw/ide/core.c > >>>> index 2c62efc..04e22e7 100644 > >>>> --- a/hw/ide/core.c > >>>> +++ b/hw/ide/core.c > >>>> @@ -35,6 +35,7 @@ > >>>> #include "sysemu/block-backend.h" > >>>> #include "qapi/error.h" > >>>> #include "qemu/cutils.h" > >>>> +#include "sysemu/replay.h" > >>>> > >>>> #include "hw/ide/internal.h" > >>>> #include "trace.h" > >>>> @@ -479,7 +480,7 @@ static void ide_issue_trim_cb(void *opaque, int ret) > >>>> done: > >>>> iocb->aiocb = NULL; > >>>> if (iocb->bh) { > >>>> - qemu_bh_schedule(iocb->bh); > >>>> + replay_bh_schedule_event(iocb->bh); > >>>> } > >>>> } > >>>> > >>> Just passing by: Why do we need to change this call, but nothing else in > >>> IDE? > >> > >> This call is responsible for a bug that was reproducible. > >> > >>> I don't mind conceptually, but it's odd to me that of all the calls I > >>> make in this emulator that change state somewhere that this is the only > >>> one you need to hijack for the replay feature. > >>> > >>> Is this a necessarily complete change? > > > > > > I found one more BH in ide/core: > > > > static void ide_restart_cb(void *opaque, int running, RunState state) > > { > > IDEBus *bus = opaque; > > > > if (!running) > > return; > > > > if (!bus->bh) { > > bus->bh = qemu_bh_new(ide_restart_bh, bus); > > qemu_bh_schedule(bus->bh); > > } > > } > > > > void ide_register_restart_cb(IDEBus *bus) > > { > > if (bus->dma->ops->restart_dma) { > > bus->vmstate = qemu_add_vm_change_state_handler(ide_restart_cb, bus); > > } > > } > > > > As I understand, it is called when VM start/stop event happen. > > These events are not related to the guest state. > > > > Does this BH change the guest state somehow? > > Shouldn't change guest state all by itself. > > ide_restart_bh does, though. (Changes device registers, can cause block > IO to occur, etc.) Why is this needed? I mean that start/stop should not change the state of the guest. Why should we restart IDE controller operations? Pavel Dovgalyuk
On 17/09/2018 13:57, Pavel Dovgalyuk wrote: >> From: Peter Maydell [mailto:peter.maydell@linaro.org] >> On 14 September 2018 at 16:19, Paolo Bonzini <pbonzini@redhat.com> wrote: >>> On 14/09/2018 16:00, John Snow wrote: >>>>> Maybe not. We can hardly analyze all peripheral devices code and fix all the calls. >>>>> But I think we can improve that patch and at least look through ide core to fix other >> calls. >>>>> >>>>> Pavel Dovgalyuk >>>>> >>>> It just seems odd that if you're working on a replay mechanism that >>>> requires you to intercept my QEMU API calls that you're only changing a >>>> trim callback. >>> >>> You need it only here because the block layer is already calling >>> replay_bh_schedule_event (actually replay_bh_schedule_oneshot_event >>> after patch 22 of this series) on reads and writes. >> >> Do we have documentation describing when a device model needs to care >> about record/replay ? > > Not yet. > But how we are suppose this doc to be seen by everyone? Put it in docs/devel. Paolo
On 09/17/2018 08:00 AM, Pavel Dovgalyuk wrote: >> From: John Snow [mailto:jsnow@redhat.com] >> On 09/14/2018 03:27 AM, Pavel Dovgalyuk wrote: >>>> From: Pavel Dovgalyuk [mailto:dovgaluk@ispras.ru] >>>>> From: John Snow [mailto:jsnow@redhat.com] >>>>> On 09/12/2018 04:19 AM, Pavel Dovgalyuk wrote: >>>>>> This patch makes IDE trim BH deterministic, because it affects >>>>>> the device state. Therefore its invocation should be replayed >>>>>> instead of running at the random moment. >>>>>> >>>>>> Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru> >>>>>> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> >>>>>> --- >>>>>> hw/ide/core.c | 3 ++- >>>>>> 1 file changed, 2 insertions(+), 1 deletion(-) >>>>>> >>>>>> diff --git a/hw/ide/core.c b/hw/ide/core.c >>>>>> index 2c62efc..04e22e7 100644 >>>>>> --- a/hw/ide/core.c >>>>>> +++ b/hw/ide/core.c >>>>>> @@ -35,6 +35,7 @@ >>>>>> #include "sysemu/block-backend.h" >>>>>> #include "qapi/error.h" >>>>>> #include "qemu/cutils.h" >>>>>> +#include "sysemu/replay.h" >>>>>> >>>>>> #include "hw/ide/internal.h" >>>>>> #include "trace.h" >>>>>> @@ -479,7 +480,7 @@ static void ide_issue_trim_cb(void *opaque, int ret) >>>>>> done: >>>>>> iocb->aiocb = NULL; >>>>>> if (iocb->bh) { >>>>>> - qemu_bh_schedule(iocb->bh); >>>>>> + replay_bh_schedule_event(iocb->bh); >>>>>> } >>>>>> } >>>>>> >>>>> Just passing by: Why do we need to change this call, but nothing else in >>>>> IDE? >>>> >>>> This call is responsible for a bug that was reproducible. >>>> >>>>> I don't mind conceptually, but it's odd to me that of all the calls I >>>>> make in this emulator that change state somewhere that this is the only >>>>> one you need to hijack for the replay feature. >>>>> >>>>> Is this a necessarily complete change? >>> >>> >>> I found one more BH in ide/core: >>> >>> static void ide_restart_cb(void *opaque, int running, RunState state) >>> { >>> IDEBus *bus = opaque; >>> >>> if (!running) >>> return; >>> >>> if (!bus->bh) { >>> bus->bh = qemu_bh_new(ide_restart_bh, bus); >>> qemu_bh_schedule(bus->bh); >>> } >>> } >>> >>> void ide_register_restart_cb(IDEBus *bus) >>> { >>> if (bus->dma->ops->restart_dma) { >>> bus->vmstate = qemu_add_vm_change_state_handler(ide_restart_cb, bus); >>> } >>> } >>> >>> As I understand, it is called when VM start/stop event happen. >>> These events are not related to the guest state. >>> >>> Does this BH change the guest state somehow? >> >> Shouldn't change guest state all by itself. >> >> ide_restart_bh does, though. (Changes device registers, can cause block >> IO to occur, etc.) > > Why is this needed? > I mean that start/stop should not change the state of the guest. > Why should we restart IDE controller operations? > > Pavel Dovgalyuk > This is part of the rerror/werror=stop model where if we hit a host IO problem we pause the guest instead of reporting the error. When we re-engage the VM, the IO is re-submitted, which may change guest-visible data. BTW, I'm fine with the changes presented so far, I was just trying to understand them. Paolo, please go ahead. --js
diff --git a/hw/ide/core.c b/hw/ide/core.c index 2c62efc..04e22e7 100644 --- a/hw/ide/core.c +++ b/hw/ide/core.c @@ -35,6 +35,7 @@ #include "sysemu/block-backend.h" #include "qapi/error.h" #include "qemu/cutils.h" +#include "sysemu/replay.h" #include "hw/ide/internal.h" #include "trace.h" @@ -479,7 +480,7 @@ static void ide_issue_trim_cb(void *opaque, int ret) done: iocb->aiocb = NULL; if (iocb->bh) { - qemu_bh_schedule(iocb->bh); + replay_bh_schedule_event(iocb->bh); } }