Message ID | 20160301122554.GA3745@work-vm (mailing list archive)
State      | New, archived
On 2016/3/1 20:25, Dr. David Alan Gilbert wrote:
> * Hailiang Zhang (zhang.zhanghailiang@huawei.com) wrote:
>> On 2016/2/29 17:47, Dr. David Alan Gilbert wrote:
>>> * Hailiang Zhang (zhang.zhanghailiang@huawei.com) wrote:
>>>> On 2016/2/27 0:36, Dr. David Alan Gilbert wrote:
>>>>> * Dr. David Alan Gilbert (dgilbert@redhat.com) wrote:
>>>>>> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>>>>>>> From: root <root@localhost.localdomain>
>>>>>>>
>>>>>>> This is the 15th version of COLO (it still only supports periodic checkpoints).
>>>>>>>
>>>>>>> This is only the COLO framework part; you can get the whole code from github:
>>>>>>> https://github.com/coloft/qemu/commits/colo-v2.6-periodic-mode
>>>>>>>
>>>>>>> There are few changes in this series except for the network-related part.
>>>>>>
>>>>>> I was looking at the time the guest is paused during COLO and
>>>>>> was surprised to find that one of the larger chunks was the time to reset
>>>>>> the guest before loading each checkpoint; I've traced it part way, and the
>>>>>> biggest contributors for my test VM seem to be:
>>>>>>
>>>>>>     3.8ms  pcibus_reset: VGA
>>>>>>     1.8ms  pcibus_reset: virtio-net-pci
>>>>>>     1.5ms  pcibus_reset: virtio-blk-pci
>>>>>>     1.5ms  qemu_devices_reset: piix4_reset
>>>>>>     1.1ms  pcibus_reset: piix3-ide
>>>>>>     1.1ms  pcibus_reset: virtio-rng-pci
>>>>>>
>>>>>> I've not looked deeper yet, but some of these are very silly;
>>>>>> I'm running with -nographic, so why it takes 3.8ms to reset VGA is
>>>>>> going to be interesting.
>>>>>> Also, my only block device is the virtio-blk, so while I understand that the
>>>>>> standard PC machine has the IDE controller, why does it take over a ms
>>>>>> to reset an unused device?
>>>>>
>>>>> OK, so I've dug a bit deeper, and it appears that it's the changes in
>>>>> the PCI BARs that actually take the time; every time we do a reset we
>>>>> reset all the BARs, which causes a pci_update_mappings and
>>>>> ends up doing a memory_region_del_subregion.
>>>>> Then we load the config space of the PCI device as we do the vmstate_load,
>>>>> and this recreates all the mappings again.
>>>>>
>>>>> I'm not sure what the fix is, but it sounds like the checkpoints would
>>>>> speed up usefully if we could avoid the unmap/remap when the mappings
>>>>> are the same.
>>>>>
>>>>
>>>> Interesting, and thanks for your report.
>>>>
>>>> We already know that qemu_system_reset() is a time-consuming function and we shouldn't
>>>> call it here, but if we don't, there is a bug, which we reported
>>>> before in a previous COLO series; below is a copy of the related
>>>> patch comment:
>
> Paolo suggested one fix, see the patch below; I'm not sure if it's safe
> (in particular if the guest changed a BAR and the device code tried to access the memory
> while loading the state???) - but it does seem to work and shaves ~10ms off the reset/load
> times:
>

Nice work, I also tested it, and it is a good improvement. I'm wondering whether it is safe here;
it should be safe to apply to qemu_system_reset() independently (I tested that too,
and it shaves about 5ms off).

Hailiang

> Dave
>
> commit 7570b2984143860005ad9fe79f5394c75f294328
> Author: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Date:   Tue Mar 1 12:08:14 2016 +0000
>
>     COLO: Lock memory map around reset/load
>
>     Changing the memory map appears to be expensive; we see this
>     particularly when, on loading a checkpoint, we:
>       a) reset the devices
>          This causes the PCI BARs to be reset
>       b) load the device states
>          This causes the PCI BARs to be reloaded.
>
>     Turning this all into a single memory_region_transaction saves
>     ~10ms/checkpoint.
>
>     TBD: What happens if the device code accesses the RAM during loading
>     of the checkpoint?
>
>     Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>     Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
>
> diff --git a/migration/colo.c b/migration/colo.c
> index 45c3432..c44fb2a 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -22,6 +22,7 @@
>  #include "net/colo-proxy.h"
>  #include "net/net.h"
>  #include "block/block_int.h"
> +#include "exec/memory.h"
>
>  static bool vmstate_loading;
>
> @@ -934,6 +935,7 @@ void *colo_process_incoming_thread(void *opaque)
>
>      stage_time_start = qemu_clock_get_us(QEMU_CLOCK_HOST);
>      qemu_mutex_lock_iothread();
> +    memory_region_transaction_begin();
>      qemu_system_reset(VMRESET_SILENT);
>      stage_time_end = qemu_clock_get_us(QEMU_CLOCK_HOST);
>      timed_average_account(&mis->colo_state.time_reset,
> @@ -947,6 +949,7 @@ void *colo_process_incoming_thread(void *opaque)
>                            stage_time_end - stage_time_start);
>      stage_time_start = stage_time_end;
>      ret = qemu_load_device_state(fb);
> +    memory_region_transaction_commit();
>      if (ret < 0) {
>          error_report("COLO: load device state failed\n");
>          vmstate_loading = false;
>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>
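The saving in the patch above comes from deferring the memory-map rebuild. Below is a small self-contained model in plain C (not QEMU code; transaction_begin, map_update and map_rebuilds are invented names for illustration) of why batching the BAR clear/restore pairs from reset and vmstate load into one transaction turns many expensive rebuilds into a single one.

#include <stdio.h>

static int transaction_depth;
static int map_rebuilds;                /* stands in for the expensive memory-map rebuild */

static void transaction_begin(void)
{
    transaction_depth++;
}

static void map_update(void)            /* a BAR being unmapped or remapped */
{
    if (transaction_depth == 0) {
        map_rebuilds++;                 /* immediate rebuild: the slow path seen in the traces */
    }
    /* inside a transaction the rebuild is deferred until commit */
}

static void transaction_commit(void)
{
    if (--transaction_depth == 0) {
        map_rebuilds++;                 /* one rebuild for the whole batch */
    }
}

int main(void)
{
    /* reset + vmstate load touch each BAR twice: clear on reset, restore on load */
    const int bar_updates = 2 * 6;      /* six PCI devices, as in the trace above */
    int i;

    for (i = 0; i < bar_updates; i++) {
        map_update();
    }
    printf("without transaction: %d rebuilds\n", map_rebuilds);

    map_rebuilds = 0;
    transaction_begin();
    for (i = 0; i < bar_updates; i++) {
        map_update();
    }
    transaction_commit();
    printf("with transaction:    %d rebuilds\n", map_rebuilds);
    return 0;
}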
* Hailiang Zhang (zhang.zhanghailiang@huawei.com) wrote:
> On 2016/3/1 20:25, Dr. David Alan Gilbert wrote:
> > * Hailiang Zhang (zhang.zhanghailiang@huawei.com) wrote:
> > [...]
> >
> > Paolo suggested one fix, see the patch below; I'm not sure if it's safe
> > (in particular if the guest changed a BAR and the device code tried to access the memory
> > while loading the state???) - but it does seem to work and shaves ~10ms off the reset/load
> > times:
> >
>
> Nice work, I also tested it, and it is a good improvement. I'm wondering whether it is safe here;
> it should be safe to apply to qemu_system_reset() independently (I tested that too,
> and it shaves about 5ms off).

Yes, it seems quite nice.

I did find one VM today that won't boot with COLO with that change; it's an Ubuntu VM
that has a delay in Grub, and when it takes the first checkpoint while Grub is still being
displayed, it gets an error from the inbound migrate:

    VQ 0 size 0x80 Guest index 0x2444 inconsistent with Host index 0x119e: delta 0x12a6

from virtio-blk - so maybe virtio-blk is accessing the memory during loading.

Dave

> Hailiang
>
> > Dave
> >
> > commit 7570b2984143860005ad9fe79f5394c75f294328
> > Author: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > Date:   Tue Mar 1 12:08:14 2016 +0000
> >
> >     COLO: Lock memory map around reset/load
> > [...]
> >
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> >
> > .
> >
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
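For context on that error text, here is a simplified, self-contained sketch (plain C, not the actual hw/virtio code) of the kind of index consistency check that produces such a message: the avail index saved in the vmstate is compared against the used index the device reads back from the vring in guest RAM, and if the two are further apart than the ring size the load is rejected, which is what one would expect if the rings in RAM were still in flux while the device state was being loaded.

#include <stdint.h>
#include <stdio.h>

/* guest_avail_idx comes from the loaded vmstate; host_used_idx is what the
 * device reads back from the vring in guest RAM after the load. */
static int check_vq_indices(int vq, uint16_t vring_size,
                            uint16_t guest_avail_idx,
                            uint16_t host_used_idx)
{
    uint16_t delta = (uint16_t)(guest_avail_idx - host_used_idx);

    if (delta > vring_size) {
        fprintf(stderr,
                "VQ %d size 0x%x Guest index 0x%x "
                "inconsistent with Host index 0x%x: delta 0x%x\n",
                vq, vring_size, guest_avail_idx, host_used_idx, delta);
        return -1;
    }
    return 0;
}

int main(void)
{
    /* the values from the report above: 0x2444 - 0x119e = 0x12a6, far larger
     * than the ring size of 0x80, so the load is rejected */
    return check_vq_indices(0, 0x80, 0x2444, 0x119e) ? 1 : 0;
}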
diff --git a/migration/colo.c b/migration/colo.c
index 45c3432..c44fb2a 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -22,6 +22,7 @@
 #include "net/colo-proxy.h"
 #include "net/net.h"
 #include "block/block_int.h"
+#include "exec/memory.h"
 
 static bool vmstate_loading;
 
@@ -934,6 +935,7 @@ void *colo_process_incoming_thread(void *opaque)
 
     stage_time_start = qemu_clock_get_us(QEMU_CLOCK_HOST);
     qemu_mutex_lock_iothread();
+    memory_region_transaction_begin();
     qemu_system_reset(VMRESET_SILENT);
     stage_time_end = qemu_clock_get_us(QEMU_CLOCK_HOST);
     timed_average_account(&mis->colo_state.time_reset,
@@ -947,6 +949,7 @@ void *colo_process_incoming_thread(void *opaque)
                           stage_time_end - stage_time_start);
     stage_time_start = stage_time_end;
     ret = qemu_load_device_state(fb);
+    memory_region_transaction_commit();
     if (ret < 0) {
         error_report("COLO: load device state failed\n");
         vmstate_loading = false;
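Hailiang's alternative of batching only the reset is straightforward to express. The following is a hypothetical, untested sketch against the COLO tree used in this thread (colo_silent_reset is an invented name and the headers are approximate for the 2016-era source); it would collapse the BAR clears from the reset into one map rebuild but would not cover the remapping done while loading the device state, which matches the smaller ~5ms saving reported above.

/* Hypothetical, untested sketch; colo_silent_reset() is an invented name and
 * the headers are approximate for the 2016-era COLO tree discussed here. */
#include "qemu/osdep.h"
#include "exec/memory.h"        /* memory_region_transaction_begin/commit */
#include "sysemu/sysemu.h"      /* qemu_system_reset, VMRESET_SILENT */

static void colo_silent_reset(void)
{
    memory_region_transaction_begin();
    qemu_system_reset(VMRESET_SILENT);   /* BAR clears coalesce into ...    */
    memory_region_transaction_commit();  /* ... one memory-map rebuild here */
}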