Message ID | 1568049517-10261-1-git-send-email-andychiu@synology.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | ahci: enable pci bus master MemoryRegion before loading ahci engines | expand |
On 9/9/19 1:18 PM, andychiu via Qemu-devel wrote: > If Windows 10 guests have enabled 'turn off hard disk after idle' > option in power settings, and the guest has a SATA disk plugged in, > the SATA disk will be turned off after a specified idle time. > If the guest is live migrated or saved/loaded with its SATA disk > turned off, the following error will occur: > > qemu-system-x86_64: AHCI: Failed to start FIS receive engine: bad FIS receive buffer address > qemu-system-x86_64: Failed to load ich9_ahci:ahci > qemu-system-x86_64: error while loading state for instance 0x0 of device '0000:00:1a.0/ich9_ahci' > qemu-system-x86_64: load of migration failed: Operation not permitted > Oof. That can't have been fun to discover. > Observation from trace logs shows that a while after Windows 10 turns off > a SATA disk (IDE disks don't have the following behavior), > it will disable the PCI_COMMAND_MASTER flag of the pci device containing > the ahci device. When the the disk is turning back on, > the PCI_COMMAND_MASTER flag will be restored first. > But if the guest is migrated or saved/loaded while the disk is off, > the post_load callback of ahci device, ahci_state_post_load(), will fail > at ahci_cond_start_engines() if the MemoryRegion > pci_dev->bus_master_enable_region is not enabled, with pci_dev pointing > to the PCIDevice struct containing the ahci device. > > This patch enables pci_dev->bus_master_enable_region before calling > ahci_cond_start_engines() in ahci_state_post_load(), and restore the > MemoryRegion to its original state afterwards.> This looks good to me from an AHCI perspective, but I'm not as clear on the implications of toggling the MemoryRegion, so I have some doubts. MST, can you chime in and clear my confusion? I suppose when the PCI_COMMAND_MASTER bit is turned off, we disable the memory region, as a guest would be unable to establish a new mapping in this time, so it makes sense that the attempt to map it fails. What's less clear to me is what happens to existing mappings when a region is disabled. Are they invalidated? If so, does it make sense that we are trying to establish a mapping here at all? Maybe it's absolutely correct that this fails. (I suppose, though, that the simple toggling of the region won't be a guest-visible event, so it's probably safe to do. Right?) What I find weird for AHCI is this: We try to engage the CLB mapping before the FIS mapping, but we fail at the FIS mapping. So why is PORT_CMD_FIS_RX set while PORT_CMD_START is unset? It kind of looks like we only half-heartedly stopped the AHCI device. Maybe that's just what Windows does, but I wonder if there's a bug where we're erroneously leaving PORT_CMD_FIS_RX set when we've been disabled. It seems like the guest would need to re-set the mappings anyway, so maybe trying to restore a stale mapping is not the right thing to do. Andy, if you have traces left over: What AHCI registers does Windows touch when it disables the AHCI device? What registers does it touch when it re-engages it? I just want to make sure I'm not leaving something dangling by accident. --js > Signed-off-by: andychiu <andychiu@synology.com> > --- > hw/ide/ahci.c | 53 ++++++++++++++++++++++++++++++++++++----------------- > 1 file changed, 36 insertions(+), 17 deletions(-) > > diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c > index d45393c..83f8c30 100644 > --- a/hw/ide/ahci.c > +++ b/hw/ide/ahci.c > @@ -1649,33 +1649,52 @@ static const VMStateDescription vmstate_ahci_device = { > }, > }; > > +static int ahci_state_load_engines(AHCIState *s, AHCIDevice *ad) > +{ > + AHCIPortRegs *pr = &ad->port_regs; > + DeviceState *dev_state = s->container; > + PCIDevice *pci_dev = (PCIDevice *) object_dynamic_cast(OBJECT(dev_state), > + TYPE_PCI_DEVICE); > + bool pci_bus_master_enabled = pci_dev->bus_master_enable_region.enabled; > + > + if (!(pr->cmd & PORT_CMD_START) && (pr->cmd & PORT_CMD_LIST_ON)) { > + error_report("AHCI: DMA engine should be off, but status bit " > + "indicates it is still running."); > + return -1; > + } > + if (!(pr->cmd & PORT_CMD_FIS_RX) && (pr->cmd & PORT_CMD_FIS_ON)) { > + error_report("AHCI: FIS RX engine should be off, but status bit " > + "indicates it is still running."); > + return -1; > + } > + > + memory_region_set_enabled(&pci_dev->bus_master_enable_region, true); > + > + /* > + * After a migrate, the DMA/FIS engines are "off" and > + * need to be conditionally restarted > + */ > + pr->cmd &= ~(PORT_CMD_LIST_ON | PORT_CMD_FIS_ON); > + if (ahci_cond_start_engines(ad) != 0) { > + return -1; > + } > + memory_region_set_enabled(&pci_dev->bus_master_enable_region, > + pci_bus_master_enabled); > + > + return 0; > +} > + > static int ahci_state_post_load(void *opaque, int version_id) > { > int i, j; > struct AHCIDevice *ad; > NCQTransferState *ncq_tfs; > - AHCIPortRegs *pr; > AHCIState *s = opaque; > > for (i = 0; i < s->ports; i++) { > ad = &s->dev[i]; > - pr = &ad->port_regs; > - > - if (!(pr->cmd & PORT_CMD_START) && (pr->cmd & PORT_CMD_LIST_ON)) { > - error_report("AHCI: DMA engine should be off, but status bit " > - "indicates it is still running."); > - return -1; > - } > - if (!(pr->cmd & PORT_CMD_FIS_RX) && (pr->cmd & PORT_CMD_FIS_ON)) { > - error_report("AHCI: FIS RX engine should be off, but status bit " > - "indicates it is still running."); > - return -1; > - } > > - /* After a migrate, the DMA/FIS engines are "off" and > - * need to be conditionally restarted */ > - pr->cmd &= ~(PORT_CMD_LIST_ON | PORT_CMD_FIS_ON); > - if (ahci_cond_start_engines(ad) != 0) { > + if (ahci_state_load_engines(s, ad)) { > return -1; > } > >
Patchew URL: https://patchew.org/QEMU/1568049517-10261-1-git-send-email-andychiu@synology.com/ Hi, This series seems to have some coding style problems. See output below for more information: Subject: [Qemu-devel] [PATCH] ahci: enable pci bus master MemoryRegion before loading ahci engines Message-id: 1568049517-10261-1-git-send-email-andychiu@synology.com Type: series === TEST SCRIPT BEGIN === #!/bin/bash git rev-parse base > /dev/null || exit 0 git config --local diff.renamelimit 0 git config --local diff.renames True git config --local diff.algorithm histogram ./scripts/checkpatch.pl --mailback base.. === TEST SCRIPT END === Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384 From https://github.com/patchew-project/qemu - [tag update] patchew/20190909155813.27760-1-laurent@vivier.eu -> patchew/20190909155813.27760-1-laurent@vivier.eu Switched to a new branch 'test' 3073672 ahci: enable pci bus master MemoryRegion before loading ahci engines === OUTPUT BEGIN === ERROR: Author email address is mangled by the mailing list #2: Author: andychiu via Qemu-devel <qemu-devel@nongnu.org> total: 1 errors, 0 warnings, 69 lines checked Commit 307367238e8f (ahci: enable pci bus master MemoryRegion before loading ahci engines) has style problems, please review. If any of these errors are false positives report them to the maintainer, see CHECKPATCH in MAINTAINERS. === OUTPUT END === Test command exited with code: 1 The full log is available at http://patchew.org/logs/1568049517-10261-1-git-send-email-andychiu@synology.com/testing.checkpatch/?type=message. --- Email generated automatically by Patchew [https://patchew.org/]. Please send your feedback to patchew-devel@redhat.com
Hi John, This issue can only be reproduced on Windows 10. I've observed and compared the behavior of Windows 10 and Windows 7. It seems Windows 7 wouldn't disable the PCI_COMMAND_MASTER flag when disabling ahci devices. That's why this issue won't happen on Win7. Here's the trace log on both guest OS, on disabling and re-engaging SATA disk: Windows 10, disabling SATA disk: ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxCI] @ 0x38: 0x00040000 handle_cmd_fis_dump ahci(0x7f6da4eb1400)[0]: FIS: 0x00: 27 80 ea 00 00 00 00 a0 00 00 00 00 00 00 00 00 0x10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ahci_cmd_done ahci(0x7f6da4eb1400)[0]: cmd done ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxIS] @ 0x10: 0x00000001 ahci_mem_write_host ahci(0x7f6da4eb1400) write4 [reg:IS] @ 0x8: 0x0000000000000001 ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxCI] @ 0x38: 0x00080000 handle_cmd_fis_dump ahci(0x7f6da4eb1400)[0]: FIS: 0x00: 27 80 e0 00 00 00 00 a0 00 00 00 00 00 00 00 00 0x10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ahci_cmd_done ahci(0x7f6da4eb1400)[0]: cmd done ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxIS] @ 0x10: 0x00000001 ahci_mem_write_host ahci(0x7f6da4eb1400) write4 [reg:IS] @ 0x8: 0x0000000000000001 ahci_mem_write_host ahci(0x7f6da4eb1400) write4 [reg:GHC] @ 0x4: 0x0000000080000000 pci_cfg_write ich9-ahci 26:0 @0x4 <- 0x507 pci_cfg_write ich9-ahci 26:0 @0x82 <- 0x80 pci_cfg_write ich9-ahci 26:0 @0x4 <- 0x500 pci_update_mappings_del d=0x7f6da4eb0b20 00:1a.0 4,0xc0a0+0x20 pci_update_mappings_del d=0x7f6da4eb0b20 00:1a.0 5,0xfebf1000+0x1000 ----------------------------------------------------------------------------------------- Windows 10, re-engaging SATA disk: pci_cfg_write ich9-ahci 26:0 @0x14 <- 0x0 pci_cfg_write ich9-ahci 26:0 @0x18 <- 0x0 pci_cfg_write ich9-ahci 26:0 @0x1c <- 0x0 pci_cfg_write ich9-ahci 26:0 @0x20 <- 0xc0a0 pci_cfg_write ich9-ahci 26:0 @0x24 <- 0xfebf1000 pci_cfg_write ich9-ahci 26:0 @0x30 <- 0x0 pci_cfg_write ich9-ahci 26:0 @0x3c <- 0x0 pci_cfg_write ich9-ahci 26:0 @0xc <- 0x0 pci_cfg_write ich9-ahci 26:0 @0xd <- 0x0 pci_cfg_write ich9-ahci 26:0 @0x4 <- 0x500 pci_cfg_write ich9-ahci 26:0 @0x4 <- 0x507 pci_update_mappings_add d=0x7f6da4eb0b20 00:1a.0 4,0xc0a0+0x20 pci_update_mappings_add d=0x7f6da4eb0b20 00:1a.0 5,0xfebf1000+0x1000 pci_cfg_write ich9-ahci 26:0 @0x6 <- 0xf900 pci_cfg_write ich9-ahci 26:0 @0x4 <- 0x507 pci_cfg_write ich9-ahci 26:0 @0x82 <- 0x80 pci_cfg_write ich9-ahci 26:0 @0x84 <- 0xfee0100c pci_cfg_write ich9-ahci 26:0 @0x88 <- 0x0 pci_cfg_write ich9-ahci 26:0 @0x8c <- 0x49a1 pci_cfg_write ich9-ahci 26:0 @0x82 <- 0x81 ahci_mem_write_host ahci(0x7f6da4eb1400) write4 [reg:GHC] @ 0x4: 0x0000000080000002 ahci_port_write ahci(0x7f6da4eb1400)[1]: port write [reg:PxCLB] @ 0x0: 0x7fe9f000 ahci_port_write ahci(0x7f6da4eb1400)[1]: port write [reg:PxCLBU] @ 0x4: 0x00000000 ahci_port_write ahci(0x7f6da4eb1400)[1]: port write [reg:PxFB] @ 0x8: 0x7fe9f400 ahci_port_write ahci(0x7f6da4eb1400)[1]: port write [reg:PxFBU] @ 0xc: 0x00000000 ahci_port_write ahci(0x7f6da4eb1400)[1]: port write [reg:PxCMD] @ 0x18: 0x00000006 ahci_port_write ahci(0x7f6da4eb1400)[2]: port write [reg:PxCLB] @ 0x0: 0x7fea5000 ahci_port_write ahci(0x7f6da4eb1400)[2]: port write [reg:PxCLBU] @ 0x4: 0x00000000 ahci_port_write ahci(0x7f6da4eb1400)[2]: port write [reg:PxFB] @ 0x8: 0x7fea5400 ahci_port_write ahci(0x7f6da4eb1400)[2]: port write [reg:PxFBU] @ 0xc: 0x00000000 ahci_port_write ahci(0x7f6da4eb1400)[2]: port write [reg:PxCMD] @ 0x18: 0x00000006 ahci_port_write ahci(0x7f6da4eb1400)[3]: port write [reg:PxCLB] @ 0x0: 0x7feab000 ahci_port_write ahci(0x7f6da4eb1400)[3]: port write [reg:PxCLBU] @ 0x4: 0x00000000 ahci_port_write ahci(0x7f6da4eb1400)[3]: port write [reg:PxFB] @ 0x8: 0x7feab400 ahci_port_write ahci(0x7f6da4eb1400)[3]: port write [reg:PxFBU] @ 0xc: 0x00000000 ahci_port_write ahci(0x7f6da4eb1400)[3]: port write [reg:PxCMD] @ 0x18: 0x00000006 ahci_port_write ahci(0x7f6da4eb1400)[4]: port write [reg:PxCLB] @ 0x0: 0x7feb1000 ahci_port_write ahci(0x7f6da4eb1400)[4]: port write [reg:PxCLBU] @ 0x4: 0x00000000 ahci_port_write ahci(0x7f6da4eb1400)[4]: port write [reg:PxFB] @ 0x8: 0x7feb1400 ahci_port_write ahci(0x7f6da4eb1400)[4]: port write [reg:PxFBU] @ 0xc: 0x00000000 ahci_port_write ahci(0x7f6da4eb1400)[4]: port write [reg:PxCMD] @ 0x18: 0x00000006 ahci_port_write ahci(0x7f6da4eb1400)[5]: port write [reg:PxCLB] @ 0x0: 0x7feb7000 ahci_port_write ahci(0x7f6da4eb1400)[5]: port write [reg:PxCLBU] @ 0x4: 0x00000000 ahci_port_write ahci(0x7f6da4eb1400)[5]: port write [reg:PxFB] @ 0x8: 0x7feb7400 ahci_port_write ahci(0x7f6da4eb1400)[5]: port write [reg:PxFBU] @ 0xc: 0x00000000 ahci_port_write ahci(0x7f6da4eb1400)[5]: port write [reg:PxCMD] @ 0x18: 0x00000006 ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxCLB] @ 0x0: 0x7fe99000 ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxCLBU] @ 0x4: 0x00000000 ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxFB] @ 0x8: 0x7fe99400 ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxFBU] @ 0xc: 0x00000000 ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxCMD] @ 0x18: 0x0000c016 ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxCMD] @ 0x18: 0x00004006 ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PXIE] @ 0x14: 0x7d00000f ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxSCTL] @ 0x2c: 0x00000001 ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxSCTL] @ 0x2c: 0x00000000 ahci_reset_port ahci(0x7f6da4eb1400)[0]: reset port ide_reset IDEstate 0x7f6da4eb39f0 ide_reset IDEstate 0x7f6da4eb3dc0 ahci_set_signature ahci(0x7f6da4eb1400)[0]: set signature sector:0x01 nsector:0x01 lcyl:0x00 hcyl:0x00 (cumulatively: 0x00000101) ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PXIE] @ 0x14: 0x7d40004f ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxSERR] @ 0x30: 0xffffffff ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxCMD] @ 0x18: 0x00000006 ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxCMD] @ 0x18: 0x00000016 ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxIS] @ 0x10: 0x00000001 ahci_mem_write_host ahci(0x7f6da4eb1400) write4 [reg:IS] @ 0x8: 0x0000000000000001 ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxSERR] @ 0x30: 0xffffffff ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxIS] @ 0x10: 0xffffffff ahci_mem_write_host ahci(0x7f6da4eb1400) write4 [reg:IS] @ 0x8: 0x0000000000000001 ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PXIE] @ 0x14: 0x7d40004f ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxCMD] @ 0x18: 0x00004017 ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxCI] @ 0x38: 0x00000001 handle_cmd_fis_dump ahci(0x7f6da4eb1400)[0]: FIS: 0x00: 27 80 ef 66 00 00 00 a0 00 00 00 00 00 00 00 00 0x10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ahci_cmd_done ahci(0x7f6da4eb1400)[0]: cmd done ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxIS] @ 0x10: 0x00000001 ahci_mem_write_host ahci(0x7f6da4eb1400) write4 [reg:IS] @ 0x8: 0x0000000000000001 ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxCI] @ 0x38: 0x00000001 handle_cmd_fis_dump ahci(0x7f6da4eb1400)[0]: FIS: 0x00: 27 80 ef 66 00 00 00 a0 00 00 00 00 00 00 00 00 0x10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ahci_cmd_done ahci(0x7f6da4eb1400)[0]: cmd done ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxIS] @ 0x10: 0x00000001 ahci_mem_write_host ahci(0x7f6da4eb1400) write4 [reg:IS] @ 0x8: 0x0000000000000001 ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxCI] @ 0x38: 0x00000001 handle_cmd_fis_dump ahci(0x7f6da4eb1400)[0]: FIS: 0x00: 27 80 f5 00 00 00 00 a0 00 00 00 00 00 00 00 00 0x10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ahci_cmd_done ahci(0x7f6da4eb1400)[0]: cmd done ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxIS] @ 0x10: 0x00000001 ahci_mem_write_host ahci(0x7f6da4eb1400) write4 [reg:IS] @ 0x8: 0x0000000000000001 ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxCI] @ 0x38: 0x00000001 handle_cmd_fis_dump ahci(0x7f6da4eb1400)[0]: FIS: 0x00: 27 80 ef 02 00 00 00 a0 00 00 00 00 00 00 00 00 0x10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ahci_cmd_done ahci(0x7f6da4eb1400)[0]: cmd done ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxIS] @ 0x10: 0x00000001 ahci_mem_write_host ahci(0x7f6da4eb1400) write4 [reg:IS] @ 0x8: 0x0000000000000001 ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxCI] @ 0x38: 0x00100000 handle_cmd_fis_dump ahci(0x7f6da4eb1400)[0]: FIS: 0x00: 27 80 42 00 00 00 00 e0 00 00 00 00 01 00 00 00 0x10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ahci_cmd_done ahci(0x7f6da4eb1400)[0]: cmd done ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxIS] @ 0x10: 0x00000001 ahci_mem_write_host ahci(0x7f6da4eb1400) write4 [reg:IS] @ 0x8: 0x0000000000000001 ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxCI] @ 0x38: 0x00200000 handle_cmd_fis_dump ahci(0x7f6da4eb1400)[0]: FIS: 0x00: 27 80 b0 d8 01 4f c2 a0 00 00 00 00 01 00 00 00 0x10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ahci_cmd_done ahci(0x7f6da4eb1400)[0]: cmd done ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxIS] @ 0x10: 0x00000001 ahci_mem_write_host ahci(0x7f6da4eb1400) write4 [reg:IS] @ 0x8: 0x0000000000000001 ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxCI] @ 0x38: 0x00400000 handle_cmd_fis_dump ahci(0x7f6da4eb1400)[0]: FIS: 0x00: 27 80 ec 00 00 00 00 a0 00 00 00 00 00 00 00 00 0x10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ahci_populate_sglist ahci(0x7f6da4eb1400)[0] ahci_dma_prepare_buf ahci(0x7f6da4eb1400)[0]: prepare buf limit=512 prepared=512 ahci_start_transfer ahci(0x7f6da4eb1400)[0]: reading 512 bytes on ata w/ sglist ahci_cmd_done ahci(0x7f6da4eb1400)[0]: cmd done ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxIS] @ 0x10: 0x00000003 ahci_mem_write_host ahci(0x7f6da4eb1400) write4 [reg:IS] @ 0x8: 0x0000000000000001 ahci_port_write ahci(0x7f6da4eb1400)[5]: port write [reg:PxCMD] @ 0x18: 0x00000006 ahci_port_write ahci(0x7f6da4eb1400)[4]: port write [reg:PxCMD] @ 0x18: 0x00000006 ahci_port_write ahci(0x7f6da4eb1400)[3]: port write [reg:PxCMD] @ 0x18: 0x00000006 ahci_port_write ahci(0x7f6da4eb1400)[2]: port write [reg:PxCMD] @ 0x18: 0x00000006 ahci_port_write ahci(0x7f6da4eb1400)[1]: port write [reg:PxCMD] @ 0x18: 0x00000006 ahci_port_write ahci(0x7f6da4eb1400)[1]: port write [reg:PxCMD] @ 0x18: 0x00000006 ahci_port_write ahci(0x7f6da4eb1400)[2]: port write [reg:PxCMD] @ 0x18: 0x00000006 ahci_port_write ahci(0x7f6da4eb1400)[3]: port write [reg:PxCMD] @ 0x18: 0x00000006 ahci_port_write ahci(0x7f6da4eb1400)[4]: port write [reg:PxCMD] @ 0x18: 0x00000006 ahci_port_write ahci(0x7f6da4eb1400)[5]: port write [reg:PxCMD] @ 0x18: 0x00000006 ahci_port_write ahci(0x7f6da4eb1400)[5]: port write [reg:PxCMD] @ 0x18: 0x00000006 ahci_port_write ahci(0x7f6da4eb1400)[4]: port write [reg:PxCMD] @ 0x18: 0x00000006 ahci_port_write ahci(0x7f6da4eb1400)[3]: port write [reg:PxCMD] @ 0x18: 0x00000006 ahci_port_write ahci(0x7f6da4eb1400)[2]: port write [reg:PxCMD] @ 0x18: 0x00000006 ahci_port_write ahci(0x7f6da4eb1400)[1]: port write [reg:PxCMD] @ 0x18: 0x00000006 ahci_port_write ahci(0x7f6da4eb1400)[1]: port write [reg:PxCMD] @ 0x18: 0x00000006 ahci_port_write ahci(0x7f6da4eb1400)[1]: port write [reg:PxSERR] @ 0x30: 0xffffffff ahci_port_write ahci(0x7f6da4eb1400)[1]: port write [reg:PxIS] @ 0x10: 0xffffffff ahci_mem_write_host ahci(0x7f6da4eb1400) write4 [reg:IS] @ 0x8: 0x0000000000000002 ahci_port_write ahci(0x7f6da4eb1400)[1]: port write [reg:PXIE] @ 0x14: 0x7d40004f ahci_port_write ahci(0x7f6da4eb1400)[2]: port write [reg:PxCMD] @ 0x18: 0x00000006 ahci_port_write ahci(0x7f6da4eb1400)[2]: port write [reg:PxSERR] @ 0x30: 0xffffffff ahci_port_write ahci(0x7f6da4eb1400)[2]: port write [reg:PxIS] @ 0x10: 0xffffffff ahci_mem_write_host ahci(0x7f6da4eb1400) write4 [reg:IS] @ 0x8: 0x0000000000000004 ahci_port_write ahci(0x7f6da4eb1400)[2]: port write [reg:PXIE] @ 0x14: 0x7d40004f ahci_port_write ahci(0x7f6da4eb1400)[3]: port write [reg:PxCMD] @ 0x18: 0x00000006 ahci_port_write ahci(0x7f6da4eb1400)[3]: port write [reg:PxSERR] @ 0x30: 0xffffffff ahci_port_write ahci(0x7f6da4eb1400)[3]: port write [reg:PxIS] @ 0x10: 0xffffffff ahci_mem_write_host ahci(0x7f6da4eb1400) write4 [reg:IS] @ 0x8: 0x0000000000000008 ahci_port_write ahci(0x7f6da4eb1400)[3]: port write [reg:PXIE] @ 0x14: 0x7d40004f ahci_port_write ahci(0x7f6da4eb1400)[4]: port write [reg:PxCMD] @ 0x18: 0x00000006 ahci_port_write ahci(0x7f6da4eb1400)[4]: port write [reg:PxSERR] @ 0x30: 0xffffffff ahci_port_write ahci(0x7f6da4eb1400)[4]: port write [reg:PxIS] @ 0x10: 0xffffffff ahci_mem_write_host ahci(0x7f6da4eb1400) write4 [reg:IS] @ 0x8: 0x0000000000000010 ahci_port_write ahci(0x7f6da4eb1400)[4]: port write [reg:PXIE] @ 0x14: 0x7d40004f ahci_port_write ahci(0x7f6da4eb1400)[5]: port write [reg:PxCMD] @ 0x18: 0x00000006 ahci_port_write ahci(0x7f6da4eb1400)[5]: port write [reg:PxSERR] @ 0x30: 0xffffffff ahci_port_write ahci(0x7f6da4eb1400)[5]: port write [reg:PxIS] @ 0x10: 0xffffffff ahci_mem_write_host ahci(0x7f6da4eb1400) write4 [reg:IS] @ 0x8: 0x0000000000000020 ahci_port_write ahci(0x7f6da4eb1400)[5]: port write [reg:PXIE] @ 0x14: 0x7d40004f ----------------------------------------------------------------------------------------- Windows 7, disabling SATA disk: ahci_port_write ahci(0x7fcc4e19b4a0)[0]: port write [reg:PxCI] @ 0x38: 0x00020000 handle_cmd_fis_dump ahci(0x7fcc4e19b4a0)[0]: FIS: 0x00: 27 80 ea 00 00 00 00 a0 00 00 00 00 00 00 00 00 0x10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ahci_cmd_done ahci(0x7fcc4e19b4a0)[0]: cmd done ahci_port_write ahci(0x7fcc4e19b4a0)[0]: port write [reg:PxSERR] @ 0x30: 0xffffffff ahci_port_write ahci(0x7fcc4e19b4a0)[0]: port write [reg:PxIS] @ 0x10: 0x00000001 ahci_mem_write_host ahci(0x7fcc4e19b4a0) write4 [reg:IS] @ 0x8: 0x0000000000000001 ahci_port_write ahci(0x7fcc4e19b4a0)[0]: port write [reg:PxCI] @ 0x38: 0x00040000 handle_cmd_fis_dump ahci(0x7fcc4e19b4a0)[0]: FIS: 0x00: 27 80 e0 00 00 00 00 00 00 00 00 00 00 00 00 00 0x10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ahci_cmd_done ahci(0x7fcc4e19b4a0)[0]: cmd done ahci_port_write ahci(0x7fcc4e19b4a0)[0]: port write [reg:PxSERR] @ 0x30: 0xffffffff ahci_port_write ahci(0x7fcc4e19b4a0)[0]: port write [reg:PxIS] @ 0x10: 0x00000001 ahci_mem_write_host ahci(0x7fcc4e19b4a0) write4 [reg:IS] @ 0x8: 0x0000000000000001 ----------------------------------------------------------------------------------------- Windows 7, re-engaging SATA disk: ahci_port_write ahci(0x7fcc4e19b4a0)[0]: port write [reg:PxCI] @ 0x38: 0x00002000 handle_cmd_fis_dump ahci(0x7fcc4e19b4a0)[0]: FIS: 0x00: 27 80 ef 66 00 00 00 a0 00 00 00 00 00 00 00 00 0x10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ahci_cmd_done ahci(0x7fcc4e19b4a0)[0]: cmd done ahci_port_write ahci(0x7fcc4e19b4a0)[0]: port write [reg:PxSERR] @ 0x30: 0xffffffff ahci_port_write ahci(0x7fcc4e19b4a0)[0]: port write [reg:PxIS] @ 0x10: 0x00000001 ahci_mem_write_host ahci(0x7fcc4e19b4a0) write4 [reg:IS] @ 0x8: 0x0000000000000001 ahci_port_write ahci(0x7fcc4e19b4a0)[0]: port write [reg:PxCI] @ 0x38: 0x00004000 handle_cmd_fis_dump ahci(0x7fcc4e19b4a0)[0]: FIS: 0x00: 27 80 f5 00 00 00 00 a0 00 00 00 00 00 00 00 00 0x10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ahci_cmd_done ahci(0x7fcc4e19b4a0)[0]: cmd done ahci_port_write ahci(0x7fcc4e19b4a0)[0]: port write [reg:PxSERR] @ 0x30: 0xffffffff ahci_port_write ahci(0x7fcc4e19b4a0)[0]: port write [reg:PxIS] @ 0x10: 0x00000001 ahci_mem_write_host ahci(0x7fcc4e19b4a0) write4 [reg:IS] @ 0x8: 0x0000000000000001 ahci_port_write ahci(0x7fcc4e19b4a0)[0]: port write [reg:PxCI] @ 0x38: 0x00008000 handle_cmd_fis_dump ahci(0x7fcc4e19b4a0)[0]: FIS: 0x00: 27 80 ef 02 00 00 00 a0 00 00 00 00 00 00 00 00 0x10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ahci_cmd_done ahci(0x7fcc4e19b4a0)[0]: cmd done ahci_port_write ahci(0x7fcc4e19b4a0)[0]: port write [reg:PxSERR] @ 0x30: 0xffffffff ahci_port_write ahci(0x7fcc4e19b4a0)[0]: port write [reg:PxIS] @ 0x10: 0x00000001 ahci_mem_write_host ahci(0x7fcc4e19b4a0) write4 [reg:IS] @ 0x8: 0x0000000000000001 ahci_port_write ahci(0x7fcc4e19b4a0)[0]: port write [reg:PxCI] @ 0x38: 0x00010000 handle_cmd_fis_dump ahci(0x7fcc4e19b4a0)[0]: FIS: 0x00: 27 80 ec 00 00 00 00 a0 00 00 00 00 00 00 00 00 0x10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ahci_populate_sglist ahci(0x7fcc4e19b4a0)[0] ahci_dma_prepare_buf ahci(0x7fcc4e19b4a0)[0]: prepare buf limit=512 prepared=512 ahci_start_transfer ahci(0x7fcc4e19b4a0)[0]: reading 512 bytes on ata w/ sglist ahci_cmd_done ahci(0x7fcc4e19b4a0)[0]: cmd done ahci_port_write ahci(0x7fcc4e19b4a0)[0]: port write [reg:PxSERR] @ 0x30: 0xffffffff ahci_port_write ahci(0x7fcc4e19b4a0)[0]: port write [reg:PxIS] @ 0x10: 0x00000001 ahci_port_write ahci(0x7fcc4e19b4a0)[0]: port write [reg:PxIS] @ 0x10: 0x00000002 ahci_mem_write_host ahci(0x7fcc4e19b4a0) write4 [reg:IS] @ 0x8: 0x0000000000000001 ----------------------------------------------------------------------------------------- -- Best regards, Andy Chiu John Snow <jsnow@redhat.com> 於 2019-09-10 02:13 寫道: > > > > On 9/9/19 1:18 PM, andychiu via Qemu-devel wrote: > > If Windows 10 guests have enabled 'turn off hard disk after idle' > > option in power settings, and the guest has a SATA disk plugged in, > > the SATA disk will be turned off after a specified idle time. > > If the guest is live migrated or saved/loaded with its SATA disk > > turned off, the following error will occur: > > > > qemu-system-x86_64: AHCI: Failed to start FIS receive engine: bad FIS receive buffer address > > qemu-system-x86_64: Failed to load ich9_ahci:ahci > > qemu-system-x86_64: error while loading state for instance 0x0 of device '0000:00:1a.0/ich9_ahci' > > qemu-system-x86_64: load of migration failed: Operation not permitted > > > > Oof. That can't have been fun to discover. > > > Observation from trace logs shows that a while after Windows 10 turns off > > a SATA disk (IDE disks don't have the following behavior), > > it will disable the PCI_COMMAND_MASTER flag of the pci device containing > > the ahci device. When the the disk is turning back on, > > the PCI_COMMAND_MASTER flag will be restored first. > > But if the guest is migrated or saved/loaded while the disk is off, > > the post_load callback of ahci device, ahci_state_post_load(), will fail > > at ahci_cond_start_engines() if the MemoryRegion > > pci_dev->bus_master_enable_region is not enabled, with pci_dev pointing > > to the PCIDevice struct containing the ahci device. > > > > This patch enables pci_dev->bus_master_enable_region before calling > > ahci_cond_start_engines() in ahci_state_post_load(), and restore the > > MemoryRegion to its original state afterwards.> > > This looks good to me from an AHCI perspective, but I'm not as clear on > the implications of toggling the MemoryRegion, so I have some doubts. > > > MST, can you chime in and clear my confusion? > > I suppose when the PCI_COMMAND_MASTER bit is turned off, we disable the > memory region, as a guest would be unable to establish a new mapping in > this time, so it makes sense that the attempt to map it fails. > > What's less clear to me is what happens to existing mappings when a > region is disabled. Are they invalidated? If so, does it make sense that > we are trying to establish a mapping here at all? Maybe it's absolutely > correct that this fails. > > (I suppose, though, that the simple toggling of the region won't be a > guest-visible event, so it's probably safe to do. Right?) > > What I find weird for AHCI is this: We try to engage the CLB mapping > before the FIS mapping, but we fail at the FIS mapping. So why is > PORT_CMD_FIS_RX set while PORT_CMD_START is unset? > > It kind of looks like we only half-heartedly stopped the AHCI device. > Maybe that's just what Windows does, but I wonder if there's a bug where > we're erroneously leaving PORT_CMD_FIS_RX set when we've been disabled. > It seems like the guest would need to re-set the mappings anyway, so > maybe trying to restore a stale mapping is not the right thing to do. > > > > Andy, if you have traces left over: What AHCI registers does Windows > touch when it disables the AHCI device? What registers does it touch > when it re-engages it? > > I just want to make sure I'm not leaving something dangling by accident. > > --js > > > Signed-off-by: andychiu <andychiu@synology.com> > > --- > > hw/ide/ahci.c | 53 ++++++++++++++++++++++++++++++++++++----------------- > > 1 file changed, 36 insertions(+), 17 deletions(-) > > > > diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c > > index d45393c..83f8c30 100644 > > --- a/hw/ide/ahci.c > > +++ b/hw/ide/ahci.c > > @@ -1649,33 +1649,52 @@ static const VMStateDescription vmstate_ahci_device = { > > }, > > }; > > > > +static int ahci_state_load_engines(AHCIState *s, AHCIDevice *ad) > > +{ > > + AHCIPortRegs *pr = &ad->port_regs; > > + DeviceState *dev_state = s->container; > > + PCIDevice *pci_dev = (PCIDevice *) object_dynamic_cast(OBJECT(dev_state), > > + TYPE_PCI_DEVICE); > > + bool pci_bus_master_enabled = pci_dev->bus_master_enable_region.enabled; > > + > > + if (!(pr->cmd & PORT_CMD_START) && (pr->cmd & PORT_CMD_LIST_ON)) { > > + error_report("AHCI: DMA engine should be off, but status bit " > > + "indicates it is still running."); > > + return -1; > > + } > > + if (!(pr->cmd & PORT_CMD_FIS_RX) && (pr->cmd & PORT_CMD_FIS_ON)) { > > + error_report("AHCI: FIS RX engine should be off, but status bit " > > + "indicates it is still running."); > > + return -1; > > + } > > + > > + memory_region_set_enabled(&pci_dev->bus_master_enable_region, true); > > + > > + /* > > + * After a migrate, the DMA/FIS engines are "off" and > > + * need to be conditionally restarted > > + */ > > + pr->cmd &= ~(PORT_CMD_LIST_ON | PORT_CMD_FIS_ON); > > + if (ahci_cond_start_engines(ad) != 0) { > > + return -1; > > + } > > + memory_region_set_enabled(&pci_dev->bus_master_enable_region, > > + pci_bus_master_enabled); > > + > > + return 0; > > +} > > + > > static int ahci_state_post_load(void *opaque, int version_id) > > { > > int i, j; > > struct AHCIDevice *ad; > > NCQTransferState *ncq_tfs; > > - AHCIPortRegs *pr; > > AHCIState *s = opaque; > > > > for (i = 0; i < s->ports; i++) { > > ad = &s->dev[i]; > > - pr = &ad->port_regs; > > - > > - if (!(pr->cmd & PORT_CMD_START) && (pr->cmd & PORT_CMD_LIST_ON)) { > > - error_report("AHCI: DMA engine should be off, but status bit " > > - "indicates it is still running."); > > - return -1; > > - } > > - if (!(pr->cmd & PORT_CMD_FIS_RX) && (pr->cmd & PORT_CMD_FIS_ON)) { > > - error_report("AHCI: FIS RX engine should be off, but status bit " > > - "indicates it is still running."); > > - return -1; > > - } > > > > - /* After a migrate, the DMA/FIS engines are "off" and > > - * need to be conditionally restarted */ > > - pr->cmd &= ~(PORT_CMD_LIST_ON | PORT_CMD_FIS_ON); > > - if (ahci_cond_start_engines(ad) != 0) { > > + if (ahci_state_load_engines(s, ad)) { > > return -1; > > } > > > >
On Tue, Sep 10, 2019 at 01:18:37AM +0800, andychiu wrote: > If Windows 10 guests have enabled 'turn off hard disk after idle' > option in power settings, and the guest has a SATA disk plugged in, > the SATA disk will be turned off after a specified idle time. > If the guest is live migrated or saved/loaded with its SATA disk > turned off, the following error will occur: > > qemu-system-x86_64: AHCI: Failed to start FIS receive engine: bad FIS receive buffer address > qemu-system-x86_64: Failed to load ich9_ahci:ahci > qemu-system-x86_64: error while loading state for instance 0x0 of device '0000:00:1a.0/ich9_ahci' > qemu-system-x86_64: load of migration failed: Operation not permitted > > Observation from trace logs shows that a while after Windows 10 turns off > a SATA disk (IDE disks don't have the following behavior), > it will disable the PCI_COMMAND_MASTER flag of the pci device containing > the ahci device. When the the disk is turning back on, > the PCI_COMMAND_MASTER flag will be restored first. > But if the guest is migrated or saved/loaded while the disk is off, > the post_load callback of ahci device, ahci_state_post_load(), will fail > at ahci_cond_start_engines() if the MemoryRegion > pci_dev->bus_master_enable_region is not enabled, with pci_dev pointing > to the PCIDevice struct containing the ahci device. > > This patch enables pci_dev->bus_master_enable_region before calling > ahci_cond_start_engines() in ahci_state_post_load(), and restore the > MemoryRegion to its original state afterwards. > > Signed-off-by: andychiu <andychiu@synology.com> Poking at PCI device internals like this seems fragile. And force enabling bus master can lead to unpleasantness like corrupting guest memory, unhandled interrupts, etc. E.g. it's quite reasonable, spec-wise, for the guest to move thing in memory around while bus mastering is off. Can you teach ahci that region being disabled during migration is ok, and recover from it? > --- > hw/ide/ahci.c | 53 ++++++++++++++++++++++++++++++++++++----------------- > 1 file changed, 36 insertions(+), 17 deletions(-) > > diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c > index d45393c..83f8c30 100644 > --- a/hw/ide/ahci.c > +++ b/hw/ide/ahci.c > @@ -1649,33 +1649,52 @@ static const VMStateDescription vmstate_ahci_device = { > }, > }; > > +static int ahci_state_load_engines(AHCIState *s, AHCIDevice *ad) > +{ > + AHCIPortRegs *pr = &ad->port_regs; > + DeviceState *dev_state = s->container; > + PCIDevice *pci_dev = (PCIDevice *) object_dynamic_cast(OBJECT(dev_state), > + TYPE_PCI_DEVICE); > + bool pci_bus_master_enabled = pci_dev->bus_master_enable_region.enabled; > + > + if (!(pr->cmd & PORT_CMD_START) && (pr->cmd & PORT_CMD_LIST_ON)) { > + error_report("AHCI: DMA engine should be off, but status bit " > + "indicates it is still running."); > + return -1; > + } > + if (!(pr->cmd & PORT_CMD_FIS_RX) && (pr->cmd & PORT_CMD_FIS_ON)) { > + error_report("AHCI: FIS RX engine should be off, but status bit " > + "indicates it is still running."); > + return -1; > + } > + > + memory_region_set_enabled(&pci_dev->bus_master_enable_region, true); > + > + /* > + * After a migrate, the DMA/FIS engines are "off" and > + * need to be conditionally restarted > + */ > + pr->cmd &= ~(PORT_CMD_LIST_ON | PORT_CMD_FIS_ON); > + if (ahci_cond_start_engines(ad) != 0) { > + return -1; > + } > + memory_region_set_enabled(&pci_dev->bus_master_enable_region, > + pci_bus_master_enabled); > + > + return 0; > +} > + > static int ahci_state_post_load(void *opaque, int version_id) > { > int i, j; > struct AHCIDevice *ad; > NCQTransferState *ncq_tfs; > - AHCIPortRegs *pr; > AHCIState *s = opaque; > > for (i = 0; i < s->ports; i++) { > ad = &s->dev[i]; > - pr = &ad->port_regs; > - > - if (!(pr->cmd & PORT_CMD_START) && (pr->cmd & PORT_CMD_LIST_ON)) { > - error_report("AHCI: DMA engine should be off, but status bit " > - "indicates it is still running."); > - return -1; > - } > - if (!(pr->cmd & PORT_CMD_FIS_RX) && (pr->cmd & PORT_CMD_FIS_ON)) { > - error_report("AHCI: FIS RX engine should be off, but status bit " > - "indicates it is still running."); > - return -1; > - } > > - /* After a migrate, the DMA/FIS engines are "off" and > - * need to be conditionally restarted */ > - pr->cmd &= ~(PORT_CMD_LIST_ON | PORT_CMD_FIS_ON); > - if (ahci_cond_start_engines(ad) != 0) { > + if (ahci_state_load_engines(s, ad)) { > return -1; > } > > -- > 2.7.4
Hi John, Sorry I'm re-sending this mail due to format issue in the last one. This issue can only be reproduced on Windows 10. I've observed and compared the behavior of Windows 10 and Windows 7. It seems Windows 7 wouldn't disable the PCI_COMMAND_MASTER flag when disabling ahci devices. That's why this issue won't happen on Win7. Here's the trace log on both guest OS, on disabling and re-engaging SATA disk: Windows 10, disabling SATA disk: ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxCI] @ 0x38: 0x00040000 handle_cmd_fis_dump ahci(0x7f6da4eb1400)[0]: FIS: 0x00: 27 80 ea 00 00 00 00 a0 00 00 00 00 00 00 00 00 0x10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ahci_cmd_done ahci(0x7f6da4eb1400)[0]: cmd done ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxIS] @ 0x10: 0x00000001 ahci_mem_write_host ahci(0x7f6da4eb1400) write4 [reg:IS] @ 0x8: 0x0000000000000001 ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxCI] @ 0x38: 0x00080000 handle_cmd_fis_dump ahci(0x7f6da4eb1400)[0]: FIS: 0x00: 27 80 e0 00 00 00 00 a0 00 00 00 00 00 00 00 00 0x10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ahci_cmd_done ahci(0x7f6da4eb1400)[0]: cmd done ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxIS] @ 0x10: 0x00000001 ahci_mem_write_host ahci(0x7f6da4eb1400) write4 [reg:IS] @ 0x8: 0x0000000000000001 ahci_mem_write_host ahci(0x7f6da4eb1400) write4 [reg:GHC] @ 0x4: 0x0000000080000000 pci_cfg_write ich9-ahci 26:0 @0x4 <- 0x507 pci_cfg_write ich9-ahci 26:0 @0x82 <- 0x80 pci_cfg_write ich9-ahci 26:0 @0x4 <- 0x500 pci_update_mappings_del d=0x7f6da4eb0b20 00:1a.0 4,0xc0a0+0x20 pci_update_mappings_del d=0x7f6da4eb0b20 00:1a.0 5,0xfebf1000+0x1000 ------------------------------------------------------------------- Windows 10, re-engaging SATA disk: pci_cfg_write ich9-ahci 26:0 @0x14 <- 0x0 pci_cfg_write ich9-ahci 26:0 @0x18 <- 0x0 pci_cfg_write ich9-ahci 26:0 @0x1c <- 0x0 pci_cfg_write ich9-ahci 26:0 @0x20 <- 0xc0a0 pci_cfg_write ich9-ahci 26:0 @0x24 <- 0xfebf1000 pci_cfg_write ich9-ahci 26:0 @0x30 <- 0x0 pci_cfg_write ich9-ahci 26:0 @0x3c <- 0x0 pci_cfg_write ich9-ahci 26:0 @0xc <- 0x0 pci_cfg_write ich9-ahci 26:0 @0xd <- 0x0 pci_cfg_write ich9-ahci 26:0 @0x4 <- 0x500 pci_cfg_write ich9-ahci 26:0 @0x4 <- 0x507 pci_update_mappings_add d=0x7f6da4eb0b20 00:1a.0 4,0xc0a0+0x20 pci_update_mappings_add d=0x7f6da4eb0b20 00:1a.0 5,0xfebf1000+0x1000 pci_cfg_write ich9-ahci 26:0 @0x6 <- 0xf900 pci_cfg_write ich9-ahci 26:0 @0x4 <- 0x507 pci_cfg_write ich9-ahci 26:0 @0x82 <- 0x80 pci_cfg_write ich9-ahci 26:0 @0x84 <- 0xfee0100c pci_cfg_write ich9-ahci 26:0 @0x88 <- 0x0 pci_cfg_write ich9-ahci 26:0 @0x8c <- 0x49a1 pci_cfg_write ich9-ahci 26:0 @0x82 <- 0x81 ahci_mem_write_host ahci(0x7f6da4eb1400) write4 [reg:GHC] @ 0x4: 0x0000000080000002 ahci_port_write ahci(0x7f6da4eb1400)[1]: port write [reg:PxCLB] @ 0x0: 0x7fe9f000 ahci_port_write ahci(0x7f6da4eb1400)[1]: port write [reg:PxCLBU] @ 0x4: 0x00000000 ahci_port_write ahci(0x7f6da4eb1400)[1]: port write [reg:PxFB] @ 0x8: 0x7fe9f400 ahci_port_write ahci(0x7f6da4eb1400)[1]: port write [reg:PxFBU] @ 0xc: 0x00000000 ahci_port_write ahci(0x7f6da4eb1400)[1]: port write [reg:PxCMD] @ 0x18: 0x00000006 ahci_port_write ahci(0x7f6da4eb1400)[2]: port write [reg:PxCLB] @ 0x0: 0x7fea5000 ahci_port_write ahci(0x7f6da4eb1400)[2]: port write [reg:PxCLBU] @ 0x4: 0x00000000 ahci_port_write ahci(0x7f6da4eb1400)[2]: port write [reg:PxFB] @ 0x8: 0x7fea5400 ahci_port_write ahci(0x7f6da4eb1400)[2]: port write [reg:PxFBU] @ 0xc: 0x00000000 ahci_port_write ahci(0x7f6da4eb1400)[2]: port write [reg:PxCMD] @ 0x18: 0x00000006 ahci_port_write ahci(0x7f6da4eb1400)[3]: port write [reg:PxCLB] @ 0x0: 0x7feab000 ahci_port_write ahci(0x7f6da4eb1400)[3]: port write [reg:PxCLBU] @ 0x4: 0x00000000 ahci_port_write ahci(0x7f6da4eb1400)[3]: port write [reg:PxFB] @ 0x8: 0x7feab400 ahci_port_write ahci(0x7f6da4eb1400)[3]: port write [reg:PxFBU] @ 0xc: 0x00000000 ahci_port_write ahci(0x7f6da4eb1400)[3]: port write [reg:PxCMD] @ 0x18: 0x00000006 ahci_port_write ahci(0x7f6da4eb1400)[4]: port write [reg:PxCLB] @ 0x0: 0x7feb1000 ahci_port_write ahci(0x7f6da4eb1400)[4]: port write [reg:PxCLBU] @ 0x4: 0x00000000 ahci_port_write ahci(0x7f6da4eb1400)[4]: port write [reg:PxFB] @ 0x8: 0x7feb1400 ahci_port_write ahci(0x7f6da4eb1400)[4]: port write [reg:PxFBU] @ 0xc: 0x00000000 ahci_port_write ahci(0x7f6da4eb1400)[4]: port write [reg:PxCMD] @ 0x18: 0x00000006 ahci_port_write ahci(0x7f6da4eb1400)[5]: port write [reg:PxCLB] @ 0x0: 0x7feb7000 ahci_port_write ahci(0x7f6da4eb1400)[5]: port write [reg:PxCLBU] @ 0x4: 0x00000000 ahci_port_write ahci(0x7f6da4eb1400)[5]: port write [reg:PxFB] @ 0x8: 0x7feb7400 ahci_port_write ahci(0x7f6da4eb1400)[5]: port write [reg:PxFBU] @ 0xc: 0x00000000 ahci_port_write ahci(0x7f6da4eb1400)[5]: port write [reg:PxCMD] @ 0x18: 0x00000006 ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxCLB] @ 0x0: 0x7fe99000 ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxCLBU] @ 0x4: 0x00000000 ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxFB] @ 0x8: 0x7fe99400 ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxFBU] @ 0xc: 0x00000000 ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxCMD] @ 0x18: 0x0000c016 ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxCMD] @ 0x18: 0x00004006 ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PXIE] @ 0x14: 0x7d00000f ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxSCTL] @ 0x2c: 0x00000001 ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxSCTL] @ 0x2c: 0x00000000 ahci_reset_port ahci(0x7f6da4eb1400)[0]: reset port ide_reset IDEstate 0x7f6da4eb39f0 ide_reset IDEstate 0x7f6da4eb3dc0 ahci_set_signature ahci(0x7f6da4eb1400)[0]: set signature sector:0x01 nsector:0x01 lcyl:0x00 hcyl:0x00 (cumulatively: 0x00000101) ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PXIE] @ 0x14: 0x7d40004f ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxSERR] @ 0x30: 0xffffffff ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxCMD] @ 0x18: 0x00000006 ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxCMD] @ 0x18: 0x00000016 ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxIS] @ 0x10: 0x00000001 ahci_mem_write_host ahci(0x7f6da4eb1400) write4 [reg:IS] @ 0x8: 0x0000000000000001 ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxSERR] @ 0x30: 0xffffffff ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxIS] @ 0x10: 0xffffffff ahci_mem_write_host ahci(0x7f6da4eb1400) write4 [reg:IS] @ 0x8: 0x0000000000000001 ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PXIE] @ 0x14: 0x7d40004f ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxCMD] @ 0x18: 0x00004017 ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxCI] @ 0x38: 0x00000001 handle_cmd_fis_dump ahci(0x7f6da4eb1400)[0]: FIS: 0x00: 27 80 ef 66 00 00 00 a0 00 00 00 00 00 00 00 00 0x10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ahci_cmd_done ahci(0x7f6da4eb1400)[0]: cmd done ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxIS] @ 0x10: 0x00000001 ahci_mem_write_host ahci(0x7f6da4eb1400) write4 [reg:IS] @ 0x8: 0x0000000000000001 ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxCI] @ 0x38: 0x00000001 handle_cmd_fis_dump ahci(0x7f6da4eb1400)[0]: FIS: 0x00: 27 80 ef 66 00 00 00 a0 00 00 00 00 00 00 00 00 0x10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ahci_cmd_done ahci(0x7f6da4eb1400)[0]: cmd done ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxIS] @ 0x10: 0x00000001 ahci_mem_write_host ahci(0x7f6da4eb1400) write4 [reg:IS] @ 0x8: 0x0000000000000001 ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxCI] @ 0x38: 0x00000001 handle_cmd_fis_dump ahci(0x7f6da4eb1400)[0]: FIS: 0x00: 27 80 f5 00 00 00 00 a0 00 00 00 00 00 00 00 00 0x10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ahci_cmd_done ahci(0x7f6da4eb1400)[0]: cmd done ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxIS] @ 0x10: 0x00000001 ahci_mem_write_host ahci(0x7f6da4eb1400) write4 [reg:IS] @ 0x8: 0x0000000000000001 ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxCI] @ 0x38: 0x00000001 handle_cmd_fis_dump ahci(0x7f6da4eb1400)[0]: FIS: 0x00: 27 80 ef 02 00 00 00 a0 00 00 00 00 00 00 00 00 0x10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ahci_cmd_done ahci(0x7f6da4eb1400)[0]: cmd done ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxIS] @ 0x10: 0x00000001 ahci_mem_write_host ahci(0x7f6da4eb1400) write4 [reg:IS] @ 0x8: 0x0000000000000001 ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxCI] @ 0x38: 0x00100000 handle_cmd_fis_dump ahci(0x7f6da4eb1400)[0]: FIS: 0x00: 27 80 42 00 00 00 00 e0 00 00 00 00 01 00 00 00 0x10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ahci_cmd_done ahci(0x7f6da4eb1400)[0]: cmd done ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxIS] @ 0x10: 0x00000001 ahci_mem_write_host ahci(0x7f6da4eb1400) write4 [reg:IS] @ 0x8: 0x0000000000000001 ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxCI] @ 0x38: 0x00200000 handle_cmd_fis_dump ahci(0x7f6da4eb1400)[0]: FIS: 0x00: 27 80 b0 d8 01 4f c2 a0 00 00 00 00 01 00 00 00 0x10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ahci_cmd_done ahci(0x7f6da4eb1400)[0]: cmd done ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxIS] @ 0x10: 0x00000001 ahci_mem_write_host ahci(0x7f6da4eb1400) write4 [reg:IS] @ 0x8: 0x0000000000000001 ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxCI] @ 0x38: 0x00400000 handle_cmd_fis_dump ahci(0x7f6da4eb1400)[0]: FIS: 0x00: 27 80 ec 00 00 00 00 a0 00 00 00 00 00 00 00 00 0x10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ahci_populate_sglist ahci(0x7f6da4eb1400)[0] ahci_dma_prepare_buf ahci(0x7f6da4eb1400)[0]: prepare buf limit=512 prepared=512 ahci_start_transfer ahci(0x7f6da4eb1400)[0]: reading 512 bytes on ata w/ sglist ahci_cmd_done ahci(0x7f6da4eb1400)[0]: cmd done ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxIS] @ 0x10: 0x00000003 ahci_mem_write_host ahci(0x7f6da4eb1400) write4 [reg:IS] @ 0x8: 0x0000000000000001 ahci_port_write ahci(0x7f6da4eb1400)[5]: port write [reg:PxCMD] @ 0x18: 0x00000006 ahci_port_write ahci(0x7f6da4eb1400)[4]: port write [reg:PxCMD] @ 0x18: 0x00000006 ahci_port_write ahci(0x7f6da4eb1400)[3]: port write [reg:PxCMD] @ 0x18: 0x00000006 ahci_port_write ahci(0x7f6da4eb1400)[2]: port write [reg:PxCMD] @ 0x18: 0x00000006 ahci_port_write ahci(0x7f6da4eb1400)[1]: port write [reg:PxCMD] @ 0x18: 0x00000006 ahci_port_write ahci(0x7f6da4eb1400)[1]: port write [reg:PxCMD] @ 0x18: 0x00000006 ahci_port_write ahci(0x7f6da4eb1400)[2]: port write [reg:PxCMD] @ 0x18: 0x00000006 ahci_port_write ahci(0x7f6da4eb1400)[3]: port write [reg:PxCMD] @ 0x18: 0x00000006 ahci_port_write ahci(0x7f6da4eb1400)[4]: port write [reg:PxCMD] @ 0x18: 0x00000006 ahci_port_write ahci(0x7f6da4eb1400)[5]: port write [reg:PxCMD] @ 0x18: 0x00000006 ahci_port_write ahci(0x7f6da4eb1400)[5]: port write [reg:PxCMD] @ 0x18: 0x00000006 ahci_port_write ahci(0x7f6da4eb1400)[4]: port write [reg:PxCMD] @ 0x18: 0x00000006 ahci_port_write ahci(0x7f6da4eb1400)[3]: port write [reg:PxCMD] @ 0x18: 0x00000006 ahci_port_write ahci(0x7f6da4eb1400)[2]: port write [reg:PxCMD] @ 0x18: 0x00000006 ahci_port_write ahci(0x7f6da4eb1400)[1]: port write [reg:PxCMD] @ 0x18: 0x00000006 ahci_port_write ahci(0x7f6da4eb1400)[1]: port write [reg:PxCMD] @ 0x18: 0x00000006 ahci_port_write ahci(0x7f6da4eb1400)[1]: port write [reg:PxSERR] @ 0x30: 0xffffffff ahci_port_write ahci(0x7f6da4eb1400)[1]: port write [reg:PxIS] @ 0x10: 0xffffffff ahci_mem_write_host ahci(0x7f6da4eb1400) write4 [reg:IS] @ 0x8: 0x0000000000000002 ahci_port_write ahci(0x7f6da4eb1400)[1]: port write [reg:PXIE] @ 0x14: 0x7d40004f ahci_port_write ahci(0x7f6da4eb1400)[2]: port write [reg:PxCMD] @ 0x18: 0x00000006 ahci_port_write ahci(0x7f6da4eb1400)[2]: port write [reg:PxSERR] @ 0x30: 0xffffffff ahci_port_write ahci(0x7f6da4eb1400)[2]: port write [reg:PxIS] @ 0x10: 0xffffffff ahci_mem_write_host ahci(0x7f6da4eb1400) write4 [reg:IS] @ 0x8: 0x0000000000000004 ahci_port_write ahci(0x7f6da4eb1400)[2]: port write [reg:PXIE] @ 0x14: 0x7d40004f ahci_port_write ahci(0x7f6da4eb1400)[3]: port write [reg:PxCMD] @ 0x18: 0x00000006 ahci_port_write ahci(0x7f6da4eb1400)[3]: port write [reg:PxSERR] @ 0x30: 0xffffffff ahci_port_write ahci(0x7f6da4eb1400)[3]: port write [reg:PxIS] @ 0x10: 0xffffffff ahci_mem_write_host ahci(0x7f6da4eb1400) write4 [reg:IS] @ 0x8: 0x0000000000000008 ahci_port_write ahci(0x7f6da4eb1400)[3]: port write [reg:PXIE] @ 0x14: 0x7d40004f ahci_port_write ahci(0x7f6da4eb1400)[4]: port write [reg:PxCMD] @ 0x18: 0x00000006 ahci_port_write ahci(0x7f6da4eb1400)[4]: port write [reg:PxSERR] @ 0x30: 0xffffffff ahci_port_write ahci(0x7f6da4eb1400)[4]: port write [reg:PxIS] @ 0x10: 0xffffffff ahci_mem_write_host ahci(0x7f6da4eb1400) write4 [reg:IS] @ 0x8: 0x0000000000000010 ahci_port_write ahci(0x7f6da4eb1400)[4]: port write [reg:PXIE] @ 0x14: 0x7d40004f ahci_port_write ahci(0x7f6da4eb1400)[5]: port write [reg:PxCMD] @ 0x18: 0x00000006 ahci_port_write ahci(0x7f6da4eb1400)[5]: port write [reg:PxSERR] @ 0x30: 0xffffffff ahci_port_write ahci(0x7f6da4eb1400)[5]: port write [reg:PxIS] @ 0x10: 0xffffffff ahci_mem_write_host ahci(0x7f6da4eb1400) write4 [reg:IS] @ 0x8: 0x0000000000000020 ahci_port_write ahci(0x7f6da4eb1400)[5]: port write [reg:PXIE] @ 0x14: 0x7d40004f ------------------------------------------------------------------- Windows 7, disabling SATA disk: ahci_port_write ahci(0x7fcc4e19b4a0)[0]: port write [reg:PxCI] @ 0x38: 0x00020000 handle_cmd_fis_dump ahci(0x7fcc4e19b4a0)[0]: FIS: 0x00: 27 80 ea 00 00 00 00 a0 00 00 00 00 00 00 00 00 0x10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ahci_cmd_done ahci(0x7fcc4e19b4a0)[0]: cmd done ahci_port_write ahci(0x7fcc4e19b4a0)[0]: port write [reg:PxSERR] @ 0x30: 0xffffffff ahci_port_write ahci(0x7fcc4e19b4a0)[0]: port write [reg:PxIS] @ 0x10: 0x00000001 ahci_mem_write_host ahci(0x7fcc4e19b4a0) write4 [reg:IS] @ 0x8: 0x0000000000000001 ahci_port_write ahci(0x7fcc4e19b4a0)[0]: port write [reg:PxCI] @ 0x38: 0x00040000 handle_cmd_fis_dump ahci(0x7fcc4e19b4a0)[0]: FIS: 0x00: 27 80 e0 00 00 00 00 00 00 00 00 00 00 00 00 00 0x10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ahci_cmd_done ahci(0x7fcc4e19b4a0)[0]: cmd done ahci_port_write ahci(0x7fcc4e19b4a0)[0]: port write [reg:PxSERR] @ 0x30: 0xffffffff ahci_port_write ahci(0x7fcc4e19b4a0)[0]: port write [reg:PxIS] @ 0x10: 0x00000001 ahci_mem_write_host ahci(0x7fcc4e19b4a0) write4 [reg:IS] @ 0x8: 0x0000000000000001 ------------------------------------------------------------------- Windows 7, re-engaging SATA disk: ahci_port_write ahci(0x7fcc4e19b4a0)[0]: port write [reg:PxCI] @ 0x38: 0x00002000 handle_cmd_fis_dump ahci(0x7fcc4e19b4a0)[0]: FIS: 0x00: 27 80 ef 66 00 00 00 a0 00 00 00 00 00 00 00 00 0x10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ahci_cmd_done ahci(0x7fcc4e19b4a0)[0]: cmd done ahci_port_write ahci(0x7fcc4e19b4a0)[0]: port write [reg:PxSERR] @ 0x30: 0xffffffff ahci_port_write ahci(0x7fcc4e19b4a0)[0]: port write [reg:PxIS] @ 0x10: 0x00000001 ahci_mem_write_host ahci(0x7fcc4e19b4a0) write4 [reg:IS] @ 0x8: 0x0000000000000001 ahci_port_write ahci(0x7fcc4e19b4a0)[0]: port write [reg:PxCI] @ 0x38: 0x00004000 handle_cmd_fis_dump ahci(0x7fcc4e19b4a0)[0]: FIS: 0x00: 27 80 f5 00 00 00 00 a0 00 00 00 00 00 00 00 00 0x10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ahci_cmd_done ahci(0x7fcc4e19b4a0)[0]: cmd done ahci_port_write ahci(0x7fcc4e19b4a0)[0]: port write [reg:PxSERR] @ 0x30: 0xffffffff ahci_port_write ahci(0x7fcc4e19b4a0)[0]: port write [reg:PxIS] @ 0x10: 0x00000001 ahci_mem_write_host ahci(0x7fcc4e19b4a0) write4 [reg:IS] @ 0x8: 0x0000000000000001 ahci_port_write ahci(0x7fcc4e19b4a0)[0]: port write [reg:PxCI] @ 0x38: 0x00008000 handle_cmd_fis_dump ahci(0x7fcc4e19b4a0)[0]: FIS: 0x00: 27 80 ef 02 00 00 00 a0 00 00 00 00 00 00 00 00 0x10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ahci_cmd_done ahci(0x7fcc4e19b4a0)[0]: cmd done ahci_port_write ahci(0x7fcc4e19b4a0)[0]: port write [reg:PxSERR] @ 0x30: 0xffffffff ahci_port_write ahci(0x7fcc4e19b4a0)[0]: port write [reg:PxIS] @ 0x10: 0x00000001 ahci_mem_write_host ahci(0x7fcc4e19b4a0) write4 [reg:IS] @ 0x8: 0x0000000000000001 ahci_port_write ahci(0x7fcc4e19b4a0)[0]: port write [reg:PxCI] @ 0x38: 0x00010000 handle_cmd_fis_dump ahci(0x7fcc4e19b4a0)[0]: FIS: 0x00: 27 80 ec 00 00 00 00 a0 00 00 00 00 00 00 00 00 0x10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ahci_populate_sglist ahci(0x7fcc4e19b4a0)[0] ahci_dma_prepare_buf ahci(0x7fcc4e19b4a0)[0]: prepare buf limit=512 prepared=512 ahci_start_transfer ahci(0x7fcc4e19b4a0)[0]: reading 512 bytes on ata w/ sglist ahci_cmd_done ahci(0x7fcc4e19b4a0)[0]: cmd done ahci_port_write ahci(0x7fcc4e19b4a0)[0]: port write [reg:PxSERR] @ 0x30: 0xffffffff ahci_port_write ahci(0x7fcc4e19b4a0)[0]: port write [reg:PxIS] @ 0x10: 0x00000001 ahci_port_write ahci(0x7fcc4e19b4a0)[0]: port write [reg:PxIS] @ 0x10: 0x00000002 ahci_mem_write_host ahci(0x7fcc4e19b4a0) write4 [reg:IS] @ 0x8: 0x0000000000000001 ------------------------------------------------------------------- -- Best regards, Andy Chiu On 2019/9/10 上午2:13, John Snow wrote: > On 9/9/19 1:18 PM, andychiu via Qemu-devel wrote: >> If Windows 10 guests have enabled 'turn off hard disk after idle' >> option in power settings, and the guest has a SATA disk plugged in, >> the SATA disk will be turned off after a specified idle time. >> If the guest is live migrated or saved/loaded with its SATA disk >> turned off, the following error will occur: >> >> qemu-system-x86_64: AHCI: Failed to start FIS receive engine: bad FIS receive buffer address >> qemu-system-x86_64: Failed to load ich9_ahci:ahci >> qemu-system-x86_64: error while loading state for instance 0x0 of device '0000:00:1a.0/ich9_ahci' >> qemu-system-x86_64: load of migration failed: Operation not permitted >> > Oof. That can't have been fun to discover. > >> Observation from trace logs shows that a while after Windows 10 turns off >> a SATA disk (IDE disks don't have the following behavior), >> it will disable the PCI_COMMAND_MASTER flag of the pci device containing >> the ahci device. When the the disk is turning back on, >> the PCI_COMMAND_MASTER flag will be restored first. >> But if the guest is migrated or saved/loaded while the disk is off, >> the post_load callback of ahci device, ahci_state_post_load(), will fail >> at ahci_cond_start_engines() if the MemoryRegion >> pci_dev->bus_master_enable_region is not enabled, with pci_dev pointing >> to the PCIDevice struct containing the ahci device. >> >> This patch enables pci_dev->bus_master_enable_region before calling >> ahci_cond_start_engines() in ahci_state_post_load(), and restore the >> MemoryRegion to its original state afterwards.> > This looks good to me from an AHCI perspective, but I'm not as clear on > the implications of toggling the MemoryRegion, so I have some doubts. > > > MST, can you chime in and clear my confusion? > > I suppose when the PCI_COMMAND_MASTER bit is turned off, we disable the > memory region, as a guest would be unable to establish a new mapping in > this time, so it makes sense that the attempt to map it fails. > > What's less clear to me is what happens to existing mappings when a > region is disabled. Are they invalidated? If so, does it make sense that > we are trying to establish a mapping here at all? Maybe it's absolutely > correct that this fails. > > (I suppose, though, that the simple toggling of the region won't be a > guest-visible event, so it's probably safe to do. Right?) > > What I find weird for AHCI is this: We try to engage the CLB mapping > before the FIS mapping, but we fail at the FIS mapping. So why is > PORT_CMD_FIS_RX set while PORT_CMD_START is unset? > > It kind of looks like we only half-heartedly stopped the AHCI device. > Maybe that's just what Windows does, but I wonder if there's a bug where > we're erroneously leaving PORT_CMD_FIS_RX set when we've been disabled. > It seems like the guest would need to re-set the mappings anyway, so > maybe trying to restore a stale mapping is not the right thing to do. > > > > Andy, if you have traces left over: What AHCI registers does Windows > touch when it disables the AHCI device? What registers does it touch > when it re-engages it? > > I just want to make sure I'm not leaving something dangling by accident. > > --js > >> Signed-off-by: andychiu<andychiu@synology.com> >> --- >> hw/ide/ahci.c | 53 ++++++++++++++++++++++++++++++++++++----------------- >> 1 file changed, 36 insertions(+), 17 deletions(-) >> >> diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c >> index d45393c..83f8c30 100644 >> --- a/hw/ide/ahci.c >> +++ b/hw/ide/ahci.c >> @@ -1649,33 +1649,52 @@ static const VMStateDescription vmstate_ahci_device = { >> }, >> }; >> >> +static int ahci_state_load_engines(AHCIState *s, AHCIDevice *ad) >> +{ >> + AHCIPortRegs *pr = &ad->port_regs; >> + DeviceState *dev_state = s->container; >> + PCIDevice *pci_dev = (PCIDevice *) object_dynamic_cast(OBJECT(dev_state), >> + TYPE_PCI_DEVICE); >> + bool pci_bus_master_enabled = pci_dev->bus_master_enable_region.enabled; >> + >> + if (!(pr->cmd & PORT_CMD_START) && (pr->cmd & PORT_CMD_LIST_ON)) { >> + error_report("AHCI: DMA engine should be off, but status bit " >> + "indicates it is still running."); >> + return -1; >> + } >> + if (!(pr->cmd & PORT_CMD_FIS_RX) && (pr->cmd & PORT_CMD_FIS_ON)) { >> + error_report("AHCI: FIS RX engine should be off, but status bit " >> + "indicates it is still running."); >> + return -1; >> + } >> + >> + memory_region_set_enabled(&pci_dev->bus_master_enable_region, true); >> + >> + /* >> + * After a migrate, the DMA/FIS engines are "off" and >> + * need to be conditionally restarted >> + */ >> + pr->cmd &= ~(PORT_CMD_LIST_ON | PORT_CMD_FIS_ON); >> + if (ahci_cond_start_engines(ad) != 0) { >> + return -1; >> + } >> + memory_region_set_enabled(&pci_dev->bus_master_enable_region, >> + pci_bus_master_enabled); >> + >> + return 0; >> +} >> + >> static int ahci_state_post_load(void *opaque, int version_id) >> { >> int i, j; >> struct AHCIDevice *ad; >> NCQTransferState *ncq_tfs; >> - AHCIPortRegs *pr; >> AHCIState *s = opaque; >> >> for (i = 0; i < s->ports; i++) { >> ad = &s->dev[i]; >> - pr = &ad->port_regs; >> - >> - if (!(pr->cmd & PORT_CMD_START) && (pr->cmd & PORT_CMD_LIST_ON)) { >> - error_report("AHCI: DMA engine should be off, but status bit " >> - "indicates it is still running."); >> - return -1; >> - } >> - if (!(pr->cmd & PORT_CMD_FIS_RX) && (pr->cmd & PORT_CMD_FIS_ON)) { >> - error_report("AHCI: FIS RX engine should be off, but status bit " >> - "indicates it is still running."); >> - return -1; >> - } >> >> - /* After a migrate, the DMA/FIS engines are "off" and >> - * need to be conditionally restarted */ >> - pr->cmd &= ~(PORT_CMD_LIST_ON | PORT_CMD_FIS_ON); >> - if (ahci_cond_start_engines(ad) != 0) { >> + if (ahci_state_load_engines(s, ad)) { >> return -1; >> } >> >>
On 9/10/19 3:04 AM, Michael S. Tsirkin wrote: > On Tue, Sep 10, 2019 at 01:18:37AM +0800, andychiu wrote: >> If Windows 10 guests have enabled 'turn off hard disk after idle' >> option in power settings, and the guest has a SATA disk plugged in, >> the SATA disk will be turned off after a specified idle time. >> If the guest is live migrated or saved/loaded with its SATA disk >> turned off, the following error will occur: >> >> qemu-system-x86_64: AHCI: Failed to start FIS receive engine: bad FIS receive buffer address >> qemu-system-x86_64: Failed to load ich9_ahci:ahci >> qemu-system-x86_64: error while loading state for instance 0x0 of device '0000:00:1a.0/ich9_ahci' >> qemu-system-x86_64: load of migration failed: Operation not permitted >> >> Observation from trace logs shows that a while after Windows 10 turns off >> a SATA disk (IDE disks don't have the following behavior), >> it will disable the PCI_COMMAND_MASTER flag of the pci device containing >> the ahci device. When the the disk is turning back on, >> the PCI_COMMAND_MASTER flag will be restored first. >> But if the guest is migrated or saved/loaded while the disk is off, >> the post_load callback of ahci device, ahci_state_post_load(), will fail >> at ahci_cond_start_engines() if the MemoryRegion >> pci_dev->bus_master_enable_region is not enabled, with pci_dev pointing >> to the PCIDevice struct containing the ahci device. >> >> This patch enables pci_dev->bus_master_enable_region before calling >> ahci_cond_start_engines() in ahci_state_post_load(), and restore the >> MemoryRegion to its original state afterwards. >> >> Signed-off-by: andychiu <andychiu@synology.com> > > Poking at PCI device internals like this seems fragile. And force > enabling bus master can lead to unpleasantness like corrupting guest > memory, unhandled interrupts, etc. E.g. it's quite reasonable, > spec-wise, for the guest to move thing in memory around while bus > mastering is off. > > Can you teach ahci that region being disabled > during migration is ok, and recover from it? That's what I'm wondering. I could try to just disable the FIS RX engine if the mapping fails, but that will require a change to guest visible state. My hunch, though, is that when windows re-enables the device it will need to re-program the address registers anyway, so it might cope well with the FIS RX bit getting switched off. (I'm wondering if it isn't a mistake that QEMU is trying to re-map this address in the first place. Is it legal that the PCI device has pci bus master disabled but we've held on to a mapping? Should there be some callback where AHCI knows to invalidate mappings at that point...?)
On Tue, Sep 10, 2019 at 09:50:41AM -0400, John Snow wrote: > > > On 9/10/19 3:04 AM, Michael S. Tsirkin wrote: > > On Tue, Sep 10, 2019 at 01:18:37AM +0800, andychiu wrote: > >> If Windows 10 guests have enabled 'turn off hard disk after idle' > >> option in power settings, and the guest has a SATA disk plugged in, > >> the SATA disk will be turned off after a specified idle time. > >> If the guest is live migrated or saved/loaded with its SATA disk > >> turned off, the following error will occur: > >> > >> qemu-system-x86_64: AHCI: Failed to start FIS receive engine: bad FIS receive buffer address > >> qemu-system-x86_64: Failed to load ich9_ahci:ahci > >> qemu-system-x86_64: error while loading state for instance 0x0 of device '0000:00:1a.0/ich9_ahci' > >> qemu-system-x86_64: load of migration failed: Operation not permitted > >> > >> Observation from trace logs shows that a while after Windows 10 turns off > >> a SATA disk (IDE disks don't have the following behavior), > >> it will disable the PCI_COMMAND_MASTER flag of the pci device containing > >> the ahci device. When the the disk is turning back on, > >> the PCI_COMMAND_MASTER flag will be restored first. > >> But if the guest is migrated or saved/loaded while the disk is off, > >> the post_load callback of ahci device, ahci_state_post_load(), will fail > >> at ahci_cond_start_engines() if the MemoryRegion > >> pci_dev->bus_master_enable_region is not enabled, with pci_dev pointing > >> to the PCIDevice struct containing the ahci device. > >> > >> This patch enables pci_dev->bus_master_enable_region before calling > >> ahci_cond_start_engines() in ahci_state_post_load(), and restore the > >> MemoryRegion to its original state afterwards. > >> > >> Signed-off-by: andychiu <andychiu@synology.com> > > > > Poking at PCI device internals like this seems fragile. And force > > enabling bus master can lead to unpleasantness like corrupting guest > > memory, unhandled interrupts, etc. E.g. it's quite reasonable, > > spec-wise, for the guest to move thing in memory around while bus > > mastering is off. > > > > Can you teach ahci that region being disabled > > during migration is ok, and recover from it? > > That's what I'm wondering. > > I could try to just disable the FIS RX engine if the mapping fails, but > that will require a change to guest visible state. > > My hunch, though, is that when windows re-enables the device it will > need to re-program the address registers anyway, so it might cope well > with the FIS RX bit getting switched off. > > (I'm wondering if it isn't a mistake that QEMU is trying to re-map this > address in the first place. Is it legal that the PCI device has pci bus > master disabled but we've held on to a mapping? If you are poking at guest memory when bus master is off, then most likely yes. > Should there be some > callback where AHCI knows to invalidate mappings at that point...?) ATM the callback is the config write, you check proxy->pci_dev.config[PCI_COMMAND] & PCI_COMMAND_MASTER and if disabled invalidate the mapping. virtio at least has code that pokes at proxy->pci_dev.config[PCI_COMMAND] too, I'm quite open to a function along the lines of pci_is_bus_master_enabled() that will do this.
On 9/10/19 9:58 AM, Michael S. Tsirkin wrote: > On Tue, Sep 10, 2019 at 09:50:41AM -0400, John Snow wrote: >> >> >> On 9/10/19 3:04 AM, Michael S. Tsirkin wrote: >>> On Tue, Sep 10, 2019 at 01:18:37AM +0800, andychiu wrote: >>>> If Windows 10 guests have enabled 'turn off hard disk after idle' >>>> option in power settings, and the guest has a SATA disk plugged in, >>>> the SATA disk will be turned off after a specified idle time. >>>> If the guest is live migrated or saved/loaded with its SATA disk >>>> turned off, the following error will occur: >>>> >>>> qemu-system-x86_64: AHCI: Failed to start FIS receive engine: bad FIS receive buffer address >>>> qemu-system-x86_64: Failed to load ich9_ahci:ahci >>>> qemu-system-x86_64: error while loading state for instance 0x0 of device '0000:00:1a.0/ich9_ahci' >>>> qemu-system-x86_64: load of migration failed: Operation not permitted >>>> >>>> Observation from trace logs shows that a while after Windows 10 turns off >>>> a SATA disk (IDE disks don't have the following behavior), >>>> it will disable the PCI_COMMAND_MASTER flag of the pci device containing >>>> the ahci device. When the the disk is turning back on, >>>> the PCI_COMMAND_MASTER flag will be restored first. >>>> But if the guest is migrated or saved/loaded while the disk is off, >>>> the post_load callback of ahci device, ahci_state_post_load(), will fail >>>> at ahci_cond_start_engines() if the MemoryRegion >>>> pci_dev->bus_master_enable_region is not enabled, with pci_dev pointing >>>> to the PCIDevice struct containing the ahci device. >>>> >>>> This patch enables pci_dev->bus_master_enable_region before calling >>>> ahci_cond_start_engines() in ahci_state_post_load(), and restore the >>>> MemoryRegion to its original state afterwards. >>>> >>>> Signed-off-by: andychiu <andychiu@synology.com> >>> >>> Poking at PCI device internals like this seems fragile. And force >>> enabling bus master can lead to unpleasantness like corrupting guest >>> memory, unhandled interrupts, etc. E.g. it's quite reasonable, >>> spec-wise, for the guest to move thing in memory around while bus >>> mastering is off. >>> >>> Can you teach ahci that region being disabled >>> during migration is ok, and recover from it? >> >> That's what I'm wondering. >> >> I could try to just disable the FIS RX engine if the mapping fails, but >> that will require a change to guest visible state. >> >> My hunch, though, is that when windows re-enables the device it will >> need to re-program the address registers anyway, so it might cope well >> with the FIS RX bit getting switched off. >> >> (I'm wondering if it isn't a mistake that QEMU is trying to re-map this >> address in the first place. Is it legal that the PCI device has pci bus >> master disabled but we've held on to a mapping? > > If you are poking at guest memory when bus master is off, then most likely yes. > >> Should there be some >> callback where AHCI knows to invalidate mappings at that point...?) > > ATM the callback is the config write, you check > proxy->pci_dev.config[PCI_COMMAND] & PCI_COMMAND_MASTER > and if disabled invalidate the mapping. > > virtio at least has code that pokes at > proxy->pci_dev.config[PCI_COMMAND] too, I'm quite > open to a function along the lines of > pci_is_bus_master_enabled() > that will do this. > Well, that's not a callback. I don't think it's right to check the PCI_COMMAND register *every* time AHCI does anything at all to see if its mappings are still valid. AHCI makes a mapping *once* when FIS RX is turned on, and it unmaps it when it's turned off. It assumes it remains valid that whole time. When we migrate, it checks to see if it was running, and performs the mappings again to re-boot the state machine. What I'm asking is; what are the implications of a guest disabling PCI_COMMAND_MASTER? (I don't know PCI as well as you do.) What should that mean for the AHCI state machine? Does this *necessarily* invalidate the mappings? (In which case -- it's an error that AHCI held on to them after Windows disabled the card, even if AHCI isn't being engaged by the guest anymore. Essentially, we were turned off but didn't clean up a dangling pointer, but we need the event that tells us to clean the dangling mapping.)
On 9/10/19 3:20 AM, Andy wrote: > Hi John, > > Sorry I'm re-sending this mail due to format issue in the last one. > No problem at all. Thank you for the detailed logs, it's really helpful. > This issue can only be reproduced on Windows 10. > I've observed and compared the behavior of Windows 10 and Windows 7. > It seems Windows 7 wouldn't disable the PCI_COMMAND_MASTER flag > when disabling ahci devices. That's why this issue won't happen on Win7. > > Here's the trace log on both guest OS, on disabling and re-engaging SATA > disk: > > Windows 10, disabling SATA disk: > > ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxCI] @ 0x38: > 0x00040000 > handle_cmd_fis_dump ahci(0x7f6da4eb1400)[0]: FIS: > 0x00: 27 80 ea 00 00 00 00 a0 00 00 00 00 00 00 00 00 > 0x10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > ahci_cmd_done ahci(0x7f6da4eb1400)[0]: cmd done > ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxIS] @ 0x10: > 0x00000001 > ahci_mem_write_host ahci(0x7f6da4eb1400) write4 [reg:IS] @ 0x8: > 0x0000000000000001 > ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxCI] @ 0x38: > 0x00080000 > handle_cmd_fis_dump ahci(0x7f6da4eb1400)[0]: FIS: > 0x00: 27 80 e0 00 00 00 00 a0 00 00 00 00 00 00 00 00 > 0x10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > ahci_cmd_done ahci(0x7f6da4eb1400)[0]: cmd done > ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxIS] @ 0x10: > 0x00000001 > ahci_mem_write_host ahci(0x7f6da4eb1400) write4 [reg:IS] @ 0x8: > 0x0000000000000001 > ahci_mem_write_host ahci(0x7f6da4eb1400) write4 [reg:GHC] @ 0x4: > 0x0000000080000000 > pci_cfg_write ich9-ahci 26:0 @0x4 <- 0x507 > pci_cfg_write ich9-ahci 26:0 @0x82 <- 0x80 > pci_cfg_write ich9-ahci 26:0 @0x4 <- 0x500 > pci_update_mappings_del d=0x7f6da4eb0b20 00:1a.0 4,0xc0a0+0x20 > pci_update_mappings_del d=0x7f6da4eb0b20 00:1a.0 5,0xfebf1000+0x1000 ! It doesn't look like windows changes the PxCMD registers at all, it's trying to clear AHCI Enable (AE) instead. GHC's bits are: 0: HBA Reset (HR), RW1 1: Interrupt Enable (IE), RW 2: MSI Revert to Single Message (MSRM), Read Only 31: AHCI Enable (AE) It looks like it's trying to disable the AHCI device in this manner, but I'm not sure that makes sense; it's not really a spin-down or sleep command. Our implementation for AHCI sets HOST_CAP_AHCI, which is CAP.SAM in the spec -- "Supports AHCI-mode Only" -- which means that AHCI Enable is supposed to be RO and set to '1'. I'm not sure what Windows is trying to do here -- it might assume that this is a true-blue ICH9 and it can switch off the AHCI engine with a quirk. It's not immediately clear to me what QEMU should do when it sees this behavior. > ------------------------------------------------------------------- > > Windows 10, re-engaging SATA disk: > > pci_cfg_write ich9-ahci 26:0 @0x14 <- 0x0 > pci_cfg_write ich9-ahci 26:0 @0x18 <- 0x0 > pci_cfg_write ich9-ahci 26:0 @0x1c <- 0x0 > pci_cfg_write ich9-ahci 26:0 @0x20 <- 0xc0a0 > pci_cfg_write ich9-ahci 26:0 @0x24 <- 0xfebf1000 > pci_cfg_write ich9-ahci 26:0 @0x30 <- 0x0 > pci_cfg_write ich9-ahci 26:0 @0x3c <- 0x0 > pci_cfg_write ich9-ahci 26:0 @0xc <- 0x0 > pci_cfg_write ich9-ahci 26:0 @0xd <- 0x0 > pci_cfg_write ich9-ahci 26:0 @0x4 <- 0x500 > pci_cfg_write ich9-ahci 26:0 @0x4 <- 0x507 > pci_update_mappings_add d=0x7f6da4eb0b20 00:1a.0 4,0xc0a0+0x20 > pci_update_mappings_add d=0x7f6da4eb0b20 00:1a.0 5,0xfebf1000+0x1000 > pci_cfg_write ich9-ahci 26:0 @0x6 <- 0xf900 > pci_cfg_write ich9-ahci 26:0 @0x4 <- 0x507 > pci_cfg_write ich9-ahci 26:0 @0x82 <- 0x80 > pci_cfg_write ich9-ahci 26:0 @0x84 <- 0xfee0100c > pci_cfg_write ich9-ahci 26:0 @0x88 <- 0x0 > pci_cfg_write ich9-ahci 26:0 @0x8c <- 0x49a1 > pci_cfg_write ich9-ahci 26:0 @0x82 <- 0x81 > ahci_mem_write_host ahci(0x7f6da4eb1400) write4 [reg:GHC] @ 0x4: > 0x0000000080000002 Not sure what that 8 is. Some vendor-specific thing that the real ICH9 might implement. > ahci_port_write ahci(0x7f6da4eb1400)[1]: port write [reg:PxCLB] @ 0x0: > 0x7fe9f000 > ahci_port_write ahci(0x7f6da4eb1400)[1]: port write [reg:PxCLBU] @ 0x4: > 0x00000000 > ahci_port_write ahci(0x7f6da4eb1400)[1]: port write [reg:PxFB] @ 0x8: > 0x7fe9f400 > ahci_port_write ahci(0x7f6da4eb1400)[1]: port write [reg:PxFBU] @ 0xc: > 0x00000000 > ahci_port_write ahci(0x7f6da4eb1400)[1]: port write [reg:PxCMD] @ 0x18: > 0x00000006 > ahci_port_write ahci(0x7f6da4eb1400)[2]: port write [reg:PxCLB] @ 0x0: > 0x7fea5000 > ahci_port_write ahci(0x7f6da4eb1400)[2]: port write [reg:PxCLBU] @ 0x4: > 0x00000000 > ahci_port_write ahci(0x7f6da4eb1400)[2]: port write [reg:PxFB] @ 0x8: > 0x7fea5400 > ahci_port_write ahci(0x7f6da4eb1400)[2]: port write [reg:PxFBU] @ 0xc: > 0x00000000 > ahci_port_write ahci(0x7f6da4eb1400)[2]: port write [reg:PxCMD] @ 0x18: > 0x00000006 > ahci_port_write ahci(0x7f6da4eb1400)[3]: port write [reg:PxCLB] @ 0x0: > 0x7feab000 > ahci_port_write ahci(0x7f6da4eb1400)[3]: port write [reg:PxCLBU] @ 0x4: > 0x00000000 > ahci_port_write ahci(0x7f6da4eb1400)[3]: port write [reg:PxFB] @ 0x8: > 0x7feab400 > ahci_port_write ahci(0x7f6da4eb1400)[3]: port write [reg:PxFBU] @ 0xc: > 0x00000000 > ahci_port_write ahci(0x7f6da4eb1400)[3]: port write [reg:PxCMD] @ 0x18: > 0x00000006 > ahci_port_write ahci(0x7f6da4eb1400)[4]: port write [reg:PxCLB] @ 0x0: > 0x7feb1000 > ahci_port_write ahci(0x7f6da4eb1400)[4]: port write [reg:PxCLBU] @ 0x4: > 0x00000000 > ahci_port_write ahci(0x7f6da4eb1400)[4]: port write [reg:PxFB] @ 0x8: > 0x7feb1400 > ahci_port_write ahci(0x7f6da4eb1400)[4]: port write [reg:PxFBU] @ 0xc: > 0x00000000 > ahci_port_write ahci(0x7f6da4eb1400)[4]: port write [reg:PxCMD] @ 0x18: > 0x00000006 > ahci_port_write ahci(0x7f6da4eb1400)[5]: port write [reg:PxCLB] @ 0x0: > 0x7feb7000 > ahci_port_write ahci(0x7f6da4eb1400)[5]: port write [reg:PxCLBU] @ 0x4: > 0x00000000 > ahci_port_write ahci(0x7f6da4eb1400)[5]: port write [reg:PxFB] @ 0x8: > 0x7feb7400 > ahci_port_write ahci(0x7f6da4eb1400)[5]: port write [reg:PxFBU] @ 0xc: > 0x00000000 > ahci_port_write ahci(0x7f6da4eb1400)[5]: port write [reg:PxCMD] @ 0x18: > 0x00000006 > ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxCLB] @ 0x0: > 0x7fe99000 > ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxCLBU] @ 0x4: > 0x00000000 > ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxFB] @ 0x8: > 0x7fe99400 > ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxFBU] @ 0xc: > 0x00000000 So it does rewrite the mappings when it re-engages anyway. Might be safe to ignore the mapping failures and just turn off the engines and see what windows does. > ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxCMD] @ 0x18: > 0x0000c016 0xc016 ... 1100 0000 0001 0110 0: ~PORT_CMD_START 1: PORT_CMD_SPIN_UP 2: PORT_CMD_POWER_ON 3: ~PORT_CMD_CLO 4: PORT_CMD_FIS_RX ... 14: PORT_CMD_FIS_ON 15: PORT_CMD_LIST_ON Turning FIS RX on but not engaging the CLB yet. > ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxCMD] @ 0x18: > 0x00004006 0100 0000 0000 0110 1: PORT_CMD_SPIN_UP 2: PORT_CMD_POWER_ON 14: PORT_CMD_FIS_ON We're requesting FIS RX off. > ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PXIE] @ 0x14: (Just now noticing I typo'd PxIE as PXIE. Oops.) Well, I'd love to have a conclusion for you but I don't right now. Maybe it's best to just fudge it for now with something like this: post-migrate, if PCI_COMMAND_MASTER is off, skip trying to re-map FIS/CLB addresses. If we do so, toggle the related engage and status bits off. It's a hack, but I don't know what the right thing to do is and don't have the time presently to investigate more deeply. > 0x7d00000f > ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxSCTL] @ 0x2c: > 0x00000001 > ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxSCTL] @ 0x2c: > 0x00000000 > ahci_reset_port ahci(0x7f6da4eb1400)[0]: reset port > ide_reset IDEstate 0x7f6da4eb39f0 > ide_reset IDEstate 0x7f6da4eb3dc0 > ahci_set_signature ahci(0x7f6da4eb1400)[0]: set signature sector:0x01 > nsector:0x01 lcyl:0x00 hcyl:0x00 (cumulatively: 0x00000101) > ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PXIE] @ 0x14: > 0x7d40004f > ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxSERR] @ 0x30: > 0xffffffff > ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxCMD] @ 0x18: > 0x00000006 > ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxCMD] @ 0x18: > 0x00000016 > ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxIS] @ 0x10: > 0x00000001 > ahci_mem_write_host ahci(0x7f6da4eb1400) write4 [reg:IS] @ 0x8: > 0x0000000000000001 > ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxSERR] @ 0x30: > 0xffffffff > ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxIS] @ 0x10: > 0xffffffff > ahci_mem_write_host ahci(0x7f6da4eb1400) write4 [reg:IS] @ 0x8: > 0x0000000000000001 > ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PXIE] @ 0x14: > 0x7d40004f > ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxCMD] @ 0x18: > 0x00004017 > ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxCI] @ 0x38: > 0x00000001 > handle_cmd_fis_dump ahci(0x7f6da4eb1400)[0]: FIS: > 0x00: 27 80 ef 66 00 00 00 a0 00 00 00 00 00 00 00 00 > 0x10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > ahci_cmd_done ahci(0x7f6da4eb1400)[0]: cmd done > ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxIS] @ 0x10: > 0x00000001 > ahci_mem_write_host ahci(0x7f6da4eb1400) write4 [reg:IS] @ 0x8: > 0x0000000000000001 > ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxCI] @ 0x38: > 0x00000001 > handle_cmd_fis_dump ahci(0x7f6da4eb1400)[0]: FIS: > 0x00: 27 80 ef 66 00 00 00 a0 00 00 00 00 00 00 00 00 > 0x10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > ahci_cmd_done ahci(0x7f6da4eb1400)[0]: cmd done > ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxIS] @ 0x10: > 0x00000001 > ahci_mem_write_host ahci(0x7f6da4eb1400) write4 [reg:IS] @ 0x8: > 0x0000000000000001 > ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxCI] @ 0x38: > 0x00000001 > handle_cmd_fis_dump ahci(0x7f6da4eb1400)[0]: FIS: > 0x00: 27 80 f5 00 00 00 00 a0 00 00 00 00 00 00 00 00 > 0x10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > ahci_cmd_done ahci(0x7f6da4eb1400)[0]: cmd done > ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxIS] @ 0x10: > 0x00000001 > ahci_mem_write_host ahci(0x7f6da4eb1400) write4 [reg:IS] @ 0x8: > 0x0000000000000001 > ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxCI] @ 0x38: > 0x00000001 > handle_cmd_fis_dump ahci(0x7f6da4eb1400)[0]: FIS: > 0x00: 27 80 ef 02 00 00 00 a0 00 00 00 00 00 00 00 00 > 0x10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > ahci_cmd_done ahci(0x7f6da4eb1400)[0]: cmd done > ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxIS] @ 0x10: > 0x00000001 > ahci_mem_write_host ahci(0x7f6da4eb1400) write4 [reg:IS] @ 0x8: > 0x0000000000000001 > ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxCI] @ 0x38: > 0x00100000 > handle_cmd_fis_dump ahci(0x7f6da4eb1400)[0]: FIS: > 0x00: 27 80 42 00 00 00 00 e0 00 00 00 00 01 00 00 00 > 0x10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > ahci_cmd_done ahci(0x7f6da4eb1400)[0]: cmd done > ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxIS] @ 0x10: > 0x00000001 > ahci_mem_write_host ahci(0x7f6da4eb1400) write4 [reg:IS] @ 0x8: > 0x0000000000000001 > ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxCI] @ 0x38: > 0x00200000 > handle_cmd_fis_dump ahci(0x7f6da4eb1400)[0]: FIS: > 0x00: 27 80 b0 d8 01 4f c2 a0 00 00 00 00 01 00 00 00 > 0x10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > ahci_cmd_done ahci(0x7f6da4eb1400)[0]: cmd done > ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxIS] @ 0x10: > 0x00000001 > ahci_mem_write_host ahci(0x7f6da4eb1400) write4 [reg:IS] @ 0x8: > 0x0000000000000001 > ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxCI] @ 0x38: > 0x00400000 > handle_cmd_fis_dump ahci(0x7f6da4eb1400)[0]: FIS: > 0x00: 27 80 ec 00 00 00 00 a0 00 00 00 00 00 00 00 00 > 0x10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > ahci_populate_sglist ahci(0x7f6da4eb1400)[0] > ahci_dma_prepare_buf ahci(0x7f6da4eb1400)[0]: prepare buf limit=512 > prepared=512 > ahci_start_transfer ahci(0x7f6da4eb1400)[0]: reading 512 bytes on ata w/ > sglist > ahci_cmd_done ahci(0x7f6da4eb1400)[0]: cmd done > ahci_port_write ahci(0x7f6da4eb1400)[0]: port write [reg:PxIS] @ 0x10: > 0x00000003 > ahci_mem_write_host ahci(0x7f6da4eb1400) write4 [reg:IS] @ 0x8: > 0x0000000000000001 > ahci_port_write ahci(0x7f6da4eb1400)[5]: port write [reg:PxCMD] @ 0x18: > 0x00000006 > ahci_port_write ahci(0x7f6da4eb1400)[4]: port write [reg:PxCMD] @ 0x18: > 0x00000006 > ahci_port_write ahci(0x7f6da4eb1400)[3]: port write [reg:PxCMD] @ 0x18: > 0x00000006 > ahci_port_write ahci(0x7f6da4eb1400)[2]: port write [reg:PxCMD] @ 0x18: > 0x00000006 > ahci_port_write ahci(0x7f6da4eb1400)[1]: port write [reg:PxCMD] @ 0x18: > 0x00000006 > ahci_port_write ahci(0x7f6da4eb1400)[1]: port write [reg:PxCMD] @ 0x18: > 0x00000006 > ahci_port_write ahci(0x7f6da4eb1400)[2]: port write [reg:PxCMD] @ 0x18: > 0x00000006 > ahci_port_write ahci(0x7f6da4eb1400)[3]: port write [reg:PxCMD] @ 0x18: > 0x00000006 > ahci_port_write ahci(0x7f6da4eb1400)[4]: port write [reg:PxCMD] @ 0x18: > 0x00000006 > ahci_port_write ahci(0x7f6da4eb1400)[5]: port write [reg:PxCMD] @ 0x18: > 0x00000006 > ahci_port_write ahci(0x7f6da4eb1400)[5]: port write [reg:PxCMD] @ 0x18: > 0x00000006 > ahci_port_write ahci(0x7f6da4eb1400)[4]: port write [reg:PxCMD] @ 0x18: > 0x00000006 > ahci_port_write ahci(0x7f6da4eb1400)[3]: port write [reg:PxCMD] @ 0x18: > 0x00000006 > ahci_port_write ahci(0x7f6da4eb1400)[2]: port write [reg:PxCMD] @ 0x18: > 0x00000006 > ahci_port_write ahci(0x7f6da4eb1400)[1]: port write [reg:PxCMD] @ 0x18: > 0x00000006 > ahci_port_write ahci(0x7f6da4eb1400)[1]: port write [reg:PxCMD] @ 0x18: > 0x00000006 > ahci_port_write ahci(0x7f6da4eb1400)[1]: port write [reg:PxSERR] @ 0x30: > 0xffffffff > ahci_port_write ahci(0x7f6da4eb1400)[1]: port write [reg:PxIS] @ 0x10: > 0xffffffff > ahci_mem_write_host ahci(0x7f6da4eb1400) write4 [reg:IS] @ 0x8: > 0x0000000000000002 > ahci_port_write ahci(0x7f6da4eb1400)[1]: port write [reg:PXIE] @ 0x14: > 0x7d40004f > ahci_port_write ahci(0x7f6da4eb1400)[2]: port write [reg:PxCMD] @ 0x18: > 0x00000006 > ahci_port_write ahci(0x7f6da4eb1400)[2]: port write [reg:PxSERR] @ 0x30: > 0xffffffff > ahci_port_write ahci(0x7f6da4eb1400)[2]: port write [reg:PxIS] @ 0x10: > 0xffffffff > ahci_mem_write_host ahci(0x7f6da4eb1400) write4 [reg:IS] @ 0x8: > 0x0000000000000004 > ahci_port_write ahci(0x7f6da4eb1400)[2]: port write [reg:PXIE] @ 0x14: > 0x7d40004f > ahci_port_write ahci(0x7f6da4eb1400)[3]: port write [reg:PxCMD] @ 0x18: > 0x00000006 > ahci_port_write ahci(0x7f6da4eb1400)[3]: port write [reg:PxSERR] @ 0x30: > 0xffffffff > ahci_port_write ahci(0x7f6da4eb1400)[3]: port write [reg:PxIS] @ 0x10: > 0xffffffff > ahci_mem_write_host ahci(0x7f6da4eb1400) write4 [reg:IS] @ 0x8: > 0x0000000000000008 > ahci_port_write ahci(0x7f6da4eb1400)[3]: port write [reg:PXIE] @ 0x14: > 0x7d40004f > ahci_port_write ahci(0x7f6da4eb1400)[4]: port write [reg:PxCMD] @ 0x18: > 0x00000006 > ahci_port_write ahci(0x7f6da4eb1400)[4]: port write [reg:PxSERR] @ 0x30: > 0xffffffff > ahci_port_write ahci(0x7f6da4eb1400)[4]: port write [reg:PxIS] @ 0x10: > 0xffffffff > ahci_mem_write_host ahci(0x7f6da4eb1400) write4 [reg:IS] @ 0x8: > 0x0000000000000010 > ahci_port_write ahci(0x7f6da4eb1400)[4]: port write [reg:PXIE] @ 0x14: > 0x7d40004f > ahci_port_write ahci(0x7f6da4eb1400)[5]: port write [reg:PxCMD] @ 0x18: > 0x00000006 > ahci_port_write ahci(0x7f6da4eb1400)[5]: port write [reg:PxSERR] @ 0x30: > 0xffffffff > ahci_port_write ahci(0x7f6da4eb1400)[5]: port write [reg:PxIS] @ 0x10: > 0xffffffff > ahci_mem_write_host ahci(0x7f6da4eb1400) write4 [reg:IS] @ 0x8: > 0x0000000000000020 > ahci_port_write ahci(0x7f6da4eb1400)[5]: port write [reg:PXIE] @ 0x14: > 0x7d40004f > ------------------------------------------------------------------- > > Windows 7, disabling SATA disk: > > ahci_port_write ahci(0x7fcc4e19b4a0)[0]: port write [reg:PxCI] @ 0x38: > 0x00020000 > handle_cmd_fis_dump ahci(0x7fcc4e19b4a0)[0]: FIS: > 0x00: 27 80 ea 00 00 00 00 a0 00 00 00 00 00 00 00 00 > 0x10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > ahci_cmd_done ahci(0x7fcc4e19b4a0)[0]: cmd done > ahci_port_write ahci(0x7fcc4e19b4a0)[0]: port write [reg:PxSERR] @ 0x30: > 0xffffffff > ahci_port_write ahci(0x7fcc4e19b4a0)[0]: port write [reg:PxIS] @ 0x10: > 0x00000001 > ahci_mem_write_host ahci(0x7fcc4e19b4a0) write4 [reg:IS] @ 0x8: > 0x0000000000000001 > ahci_port_write ahci(0x7fcc4e19b4a0)[0]: port write [reg:PxCI] @ 0x38: > 0x00040000 > handle_cmd_fis_dump ahci(0x7fcc4e19b4a0)[0]: FIS: > 0x00: 27 80 e0 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > ahci_cmd_done ahci(0x7fcc4e19b4a0)[0]: cmd done > ahci_port_write ahci(0x7fcc4e19b4a0)[0]: port write [reg:PxSERR] @ 0x30: > 0xffffffff > ahci_port_write ahci(0x7fcc4e19b4a0)[0]: port write [reg:PxIS] @ 0x10: > 0x00000001 > ahci_mem_write_host ahci(0x7fcc4e19b4a0) write4 [reg:IS] @ 0x8: > 0x0000000000000001 > ------------------------------------------------------------------- > > Windows 7, re-engaging SATA disk: > > ahci_port_write ahci(0x7fcc4e19b4a0)[0]: port write [reg:PxCI] @ 0x38: > 0x00002000 > handle_cmd_fis_dump ahci(0x7fcc4e19b4a0)[0]: FIS: > 0x00: 27 80 ef 66 00 00 00 a0 00 00 00 00 00 00 00 00 > 0x10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > ahci_cmd_done ahci(0x7fcc4e19b4a0)[0]: cmd done > ahci_port_write ahci(0x7fcc4e19b4a0)[0]: port write [reg:PxSERR] @ 0x30: > 0xffffffff > ahci_port_write ahci(0x7fcc4e19b4a0)[0]: port write [reg:PxIS] @ 0x10: > 0x00000001 > ahci_mem_write_host ahci(0x7fcc4e19b4a0) write4 [reg:IS] @ 0x8: > 0x0000000000000001 > ahci_port_write ahci(0x7fcc4e19b4a0)[0]: port write [reg:PxCI] @ 0x38: > 0x00004000 > handle_cmd_fis_dump ahci(0x7fcc4e19b4a0)[0]: FIS: > 0x00: 27 80 f5 00 00 00 00 a0 00 00 00 00 00 00 00 00 > 0x10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > ahci_cmd_done ahci(0x7fcc4e19b4a0)[0]: cmd done > ahci_port_write ahci(0x7fcc4e19b4a0)[0]: port write [reg:PxSERR] @ 0x30: > 0xffffffff > ahci_port_write ahci(0x7fcc4e19b4a0)[0]: port write [reg:PxIS] @ 0x10: > 0x00000001 > ahci_mem_write_host ahci(0x7fcc4e19b4a0) write4 [reg:IS] @ 0x8: > 0x0000000000000001 > ahci_port_write ahci(0x7fcc4e19b4a0)[0]: port write [reg:PxCI] @ 0x38: > 0x00008000 > handle_cmd_fis_dump ahci(0x7fcc4e19b4a0)[0]: FIS: > 0x00: 27 80 ef 02 00 00 00 a0 00 00 00 00 00 00 00 00 > 0x10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > ahci_cmd_done ahci(0x7fcc4e19b4a0)[0]: cmd done > ahci_port_write ahci(0x7fcc4e19b4a0)[0]: port write [reg:PxSERR] @ 0x30: > 0xffffffff > ahci_port_write ahci(0x7fcc4e19b4a0)[0]: port write [reg:PxIS] @ 0x10: > 0x00000001 > ahci_mem_write_host ahci(0x7fcc4e19b4a0) write4 [reg:IS] @ 0x8: > 0x0000000000000001 > ahci_port_write ahci(0x7fcc4e19b4a0)[0]: port write [reg:PxCI] @ 0x38: > 0x00010000 > handle_cmd_fis_dump ahci(0x7fcc4e19b4a0)[0]: FIS: > 0x00: 27 80 ec 00 00 00 00 a0 00 00 00 00 00 00 00 00 > 0x10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > ahci_populate_sglist ahci(0x7fcc4e19b4a0)[0] > ahci_dma_prepare_buf ahci(0x7fcc4e19b4a0)[0]: prepare buf limit=512 > prepared=512 > ahci_start_transfer ahci(0x7fcc4e19b4a0)[0]: reading 512 bytes on ata w/ > sglist > ahci_cmd_done ahci(0x7fcc4e19b4a0)[0]: cmd done > ahci_port_write ahci(0x7fcc4e19b4a0)[0]: port write [reg:PxSERR] @ 0x30: > 0xffffffff > ahci_port_write ahci(0x7fcc4e19b4a0)[0]: port write [reg:PxIS] @ 0x10: > 0x00000001 > ahci_port_write ahci(0x7fcc4e19b4a0)[0]: port write [reg:PxIS] @ 0x10: > 0x00000002 > ahci_mem_write_host ahci(0x7fcc4e19b4a0) write4 [reg:IS] @ 0x8: > 0x0000000000000001 > ------------------------------------------------------------------- > > > -- > Best regards, > Andy Chiu > > On 2019/9/10 上午2:13, John Snow wrote: >> On 9/9/19 1:18 PM, andychiu via Qemu-devel wrote: >>> If Windows 10 guests have enabled 'turn off hard disk after idle' >>> option in power settings, and the guest has a SATA disk plugged in, >>> the SATA disk will be turned off after a specified idle time. >>> If the guest is live migrated or saved/loaded with its SATA disk >>> turned off, the following error will occur: >>> >>> qemu-system-x86_64: AHCI: Failed to start FIS receive engine: bad FIS >>> receive buffer address >>> qemu-system-x86_64: Failed to load ich9_ahci:ahci >>> qemu-system-x86_64: error while loading state for instance 0x0 of >>> device '0000:00:1a.0/ich9_ahci' >>> qemu-system-x86_64: load of migration failed: Operation not permitted >>> >> Oof. That can't have been fun to discover. >> >>> Observation from trace logs shows that a while after Windows 10 turns >>> off >>> a SATA disk (IDE disks don't have the following behavior), >>> it will disable the PCI_COMMAND_MASTER flag of the pci device containing >>> the ahci device. When the the disk is turning back on, >>> the PCI_COMMAND_MASTER flag will be restored first. >>> But if the guest is migrated or saved/loaded while the disk is off, >>> the post_load callback of ahci device, ahci_state_post_load(), will fail >>> at ahci_cond_start_engines() if the MemoryRegion >>> pci_dev->bus_master_enable_region is not enabled, with pci_dev pointing >>> to the PCIDevice struct containing the ahci device. >>> >>> This patch enables pci_dev->bus_master_enable_region before calling >>> ahci_cond_start_engines() in ahci_state_post_load(), and restore the >>> MemoryRegion to its original state afterwards.> >> This looks good to me from an AHCI perspective, but I'm not as clear on >> the implications of toggling the MemoryRegion, so I have some doubts. >> >> >> MST, can you chime in and clear my confusion? >> >> I suppose when the PCI_COMMAND_MASTER bit is turned off, we disable the >> memory region, as a guest would be unable to establish a new mapping in >> this time, so it makes sense that the attempt to map it fails. >> >> What's less clear to me is what happens to existing mappings when a >> region is disabled. Are they invalidated? If so, does it make sense that >> we are trying to establish a mapping here at all? Maybe it's absolutely >> correct that this fails. >> >> (I suppose, though, that the simple toggling of the region won't be a >> guest-visible event, so it's probably safe to do. Right?) >> >> What I find weird for AHCI is this: We try to engage the CLB mapping >> before the FIS mapping, but we fail at the FIS mapping. So why is >> PORT_CMD_FIS_RX set while PORT_CMD_START is unset? >> >> It kind of looks like we only half-heartedly stopped the AHCI device. >> Maybe that's just what Windows does, but I wonder if there's a bug where >> we're erroneously leaving PORT_CMD_FIS_RX set when we've been disabled. >> It seems like the guest would need to re-set the mappings anyway, so >> maybe trying to restore a stale mapping is not the right thing to do. >> >> >> >> Andy, if you have traces left over: What AHCI registers does Windows >> touch when it disables the AHCI device? What registers does it touch >> when it re-engages it? >> >> I just want to make sure I'm not leaving something dangling by accident. >> >> --js >> >>> Signed-off-by: andychiu<andychiu@synology.com> >>> --- >>> hw/ide/ahci.c | 53 >>> ++++++++++++++++++++++++++++++++++++----------------- >>> 1 file changed, 36 insertions(+), 17 deletions(-) >>> >>> diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c >>> index d45393c..83f8c30 100644 >>> --- a/hw/ide/ahci.c >>> +++ b/hw/ide/ahci.c >>> @@ -1649,33 +1649,52 @@ static const VMStateDescription >>> vmstate_ahci_device = { >>> }, >>> }; >>> +static int ahci_state_load_engines(AHCIState *s, AHCIDevice *ad) >>> +{ >>> + AHCIPortRegs *pr = &ad->port_regs; >>> + DeviceState *dev_state = s->container; >>> + PCIDevice *pci_dev = (PCIDevice *) >>> object_dynamic_cast(OBJECT(dev_state), >>> + >>> TYPE_PCI_DEVICE); >>> + bool pci_bus_master_enabled = >>> pci_dev->bus_master_enable_region.enabled; >>> + >>> + if (!(pr->cmd & PORT_CMD_START) && (pr->cmd & PORT_CMD_LIST_ON)) { >>> + error_report("AHCI: DMA engine should be off, but status bit " >>> + "indicates it is still running."); >>> + return -1; >>> + } >>> + if (!(pr->cmd & PORT_CMD_FIS_RX) && (pr->cmd & PORT_CMD_FIS_ON)) { >>> + error_report("AHCI: FIS RX engine should be off, but status >>> bit " >>> + "indicates it is still running."); >>> + return -1; >>> + } >>> + >>> + memory_region_set_enabled(&pci_dev->bus_master_enable_region, >>> true); >>> + >>> + /* >>> + * After a migrate, the DMA/FIS engines are "off" and >>> + * need to be conditionally restarted >>> + */ >>> + pr->cmd &= ~(PORT_CMD_LIST_ON | PORT_CMD_FIS_ON); >>> + if (ahci_cond_start_engines(ad) != 0) { >>> + return -1; >>> + } >>> + memory_region_set_enabled(&pci_dev->bus_master_enable_region, >>> + pci_bus_master_enabled); >>> + >>> + return 0; >>> +} >>> + >>> static int ahci_state_post_load(void *opaque, int version_id) >>> { >>> int i, j; >>> struct AHCIDevice *ad; >>> NCQTransferState *ncq_tfs; >>> - AHCIPortRegs *pr; >>> AHCIState *s = opaque; >>> for (i = 0; i < s->ports; i++) { >>> ad = &s->dev[i]; >>> - pr = &ad->port_regs; >>> - >>> - if (!(pr->cmd & PORT_CMD_START) && (pr->cmd & >>> PORT_CMD_LIST_ON)) { >>> - error_report("AHCI: DMA engine should be off, but status >>> bit " >>> - "indicates it is still running."); >>> - return -1; >>> - } >>> - if (!(pr->cmd & PORT_CMD_FIS_RX) && (pr->cmd & >>> PORT_CMD_FIS_ON)) { >>> - error_report("AHCI: FIS RX engine should be off, but >>> status bit " >>> - "indicates it is still running."); >>> - return -1; >>> - } >>> - /* After a migrate, the DMA/FIS engines are "off" and >>> - * need to be conditionally restarted */ >>> - pr->cmd &= ~(PORT_CMD_LIST_ON | PORT_CMD_FIS_ON); >>> - if (ahci_cond_start_engines(ad) != 0) { >>> + if (ahci_state_load_engines(s, ad)) { >>> return -1; >>> } >>>
Hi Everyone, We are facing this issue. I see this conversation was never conversed and discussed issue is still active on QEMU master. Just for summary, the solution mentioned in this thread "temporarily enable bus master memory region" was not taken with the following justification. "Poking at PCI device internals like this seems fragile. And force enabling bus master can lead to unpleasantness like corrupting guest memory, unhandled interrupts, etc. E.g. it's quite reasonable, spec-wise, for the guest to move thing in memory around while bus mastering is off." There was an alternate solution mentioned in thread, that is to mark PORT_CMD_FIS_RX and PORT_CMD_START disabled forcefully if the bus master for the device is disabled. But there are no further conclusive discussions on this. I wanted to start this conversation again to hopefully get a conclusion for this. Thanks Manish Mishra On 10/09/19 7:38 pm, John Snow wrote: > > On 9/10/19 9:58 AM, Michael S. Tsirkin wrote: >> On Tue, Sep 10, 2019 at 09:50:41AM -0400, John Snow wrote: >>> >>> On 9/10/19 3:04 AM, Michael S. Tsirkin wrote: >>>> On Tue, Sep 10, 2019 at 01:18:37AM +0800, andychiu wrote: >>>>> If Windows 10 guests have enabled 'turn off hard disk after idle' >>>>> option in power settings, and the guest has a SATA disk plugged in, >>>>> the SATA disk will be turned off after a specified idle time. >>>>> If the guest is live migrated or saved/loaded with its SATA disk >>>>> turned off, the following error will occur: >>>>> >>>>> qemu-system-x86_64: AHCI: Failed to start FIS receive engine: bad FIS receive buffer address >>>>> qemu-system-x86_64: Failed to load ich9_ahci:ahci >>>>> qemu-system-x86_64: error while loading state for instance 0x0 of device '0000:00:1a.0/ich9_ahci' >>>>> qemu-system-x86_64: load of migration failed: Operation not permitted >>>>> >>>>> Observation from trace logs shows that a while after Windows 10 turns off >>>>> a SATA disk (IDE disks don't have the following behavior), >>>>> it will disable the PCI_COMMAND_MASTER flag of the pci device containing >>>>> the ahci device. When the the disk is turning back on, >>>>> the PCI_COMMAND_MASTER flag will be restored first. >>>>> But if the guest is migrated or saved/loaded while the disk is off, >>>>> the post_load callback of ahci device, ahci_state_post_load(), will fail >>>>> at ahci_cond_start_engines() if the MemoryRegion >>>>> pci_dev->bus_master_enable_region is not enabled, with pci_dev pointing >>>>> to the PCIDevice struct containing the ahci device. >>>>> >>>>> This patch enables pci_dev->bus_master_enable_region before calling >>>>> ahci_cond_start_engines() in ahci_state_post_load(), and restore the >>>>> MemoryRegion to its original state afterwards. >>>>> >>>>> Signed-off-by: andychiu<andychiu@synology.com> >>>> Poking at PCI device internals like this seems fragile. And force >>>> enabling bus master can lead to unpleasantness like corrupting guest >>>> memory, unhandled interrupts, etc. E.g. it's quite reasonable, >>>> spec-wise, for the guest to move thing in memory around while bus >>>> mastering is off. >>>> >>>> Can you teach ahci that region being disabled >>>> during migration is ok, and recover from it? >>> That's what I'm wondering. >>> >>> I could try to just disable the FIS RX engine if the mapping fails, but >>> that will require a change to guest visible state. >>> >>> My hunch, though, is that when windows re-enables the device it will >>> need to re-program the address registers anyway, so it might cope well >>> with the FIS RX bit getting switched off. >>> >>> (I'm wondering if it isn't a mistake that QEMU is trying to re-map this >>> address in the first place. Is it legal that the PCI device has pci bus >>> master disabled but we've held on to a mapping? >> If you are poking at guest memory when bus master is off, then most likely yes. >> >>> Should there be some >>> callback where AHCI knows to invalidate mappings at that point...?) >> ATM the callback is the config write, you check >> proxy->pci_dev.config[PCI_COMMAND] & PCI_COMMAND_MASTER >> and if disabled invalidate the mapping. >> >> virtio at least has code that pokes at >> proxy->pci_dev.config[PCI_COMMAND] too, I'm quite >> open to a function along the lines of >> pci_is_bus_master_enabled() >> that will do this. >> > Well, that's not a callback. I don't think it's right to check the > PCI_COMMAND register *every* time AHCI does anything at all to see if > its mappings are still valid. > > AHCI makes a mapping *once* when FIS RX is turned on, and it unmaps it > when it's turned off. It assumes it remains valid that whole time. When > we migrate, it checks to see if it was running, and performs the > mappings again to re-boot the state machine. > > What I'm asking is; what are the implications of a guest disabling > PCI_COMMAND_MASTER? (I don't know PCI as well as you do.) > > What should that mean for the AHCI state machine? > > Does this *necessarily* invalidate the mappings? > (In which case -- it's an error that AHCI held on to them after Windows > disabled the card, even if AHCI isn't being engaged by the guest > anymore. Essentially, we were turned off but didn't clean up a dangling > pointer, but we need the event that tells us to clean the dangling mapping.) >
On Tue, Sep 10, 2019 at 10:08:20AM -0400, John Snow wrote: > > > On 9/10/19 9:58 AM, Michael S. Tsirkin wrote: > > On Tue, Sep 10, 2019 at 09:50:41AM -0400, John Snow wrote: > >> > >> > >> On 9/10/19 3:04 AM, Michael S. Tsirkin wrote: > >>> On Tue, Sep 10, 2019 at 01:18:37AM +0800, andychiu wrote: > >>>> If Windows 10 guests have enabled 'turn off hard disk after idle' > >>>> option in power settings, and the guest has a SATA disk plugged in, > >>>> the SATA disk will be turned off after a specified idle time. > >>>> If the guest is live migrated or saved/loaded with its SATA disk > >>>> turned off, the following error will occur: > >>>> > >>>> qemu-system-x86_64: AHCI: Failed to start FIS receive engine: bad FIS receive buffer address > >>>> qemu-system-x86_64: Failed to load ich9_ahci:ahci > >>>> qemu-system-x86_64: error while loading state for instance 0x0 of device '0000:00:1a.0/ich9_ahci' > >>>> qemu-system-x86_64: load of migration failed: Operation not permitted > >>>> > >>>> Observation from trace logs shows that a while after Windows 10 turns off > >>>> a SATA disk (IDE disks don't have the following behavior), > >>>> it will disable the PCI_COMMAND_MASTER flag of the pci device containing > >>>> the ahci device. When the the disk is turning back on, > >>>> the PCI_COMMAND_MASTER flag will be restored first. > >>>> But if the guest is migrated or saved/loaded while the disk is off, > >>>> the post_load callback of ahci device, ahci_state_post_load(), will fail > >>>> at ahci_cond_start_engines() if the MemoryRegion > >>>> pci_dev->bus_master_enable_region is not enabled, with pci_dev pointing > >>>> to the PCIDevice struct containing the ahci device. > >>>> > >>>> This patch enables pci_dev->bus_master_enable_region before calling > >>>> ahci_cond_start_engines() in ahci_state_post_load(), and restore the > >>>> MemoryRegion to its original state afterwards. > >>>> > >>>> Signed-off-by: andychiu <andychiu@synology.com> > >>> > >>> Poking at PCI device internals like this seems fragile. And force > >>> enabling bus master can lead to unpleasantness like corrupting guest > >>> memory, unhandled interrupts, etc. E.g. it's quite reasonable, > >>> spec-wise, for the guest to move thing in memory around while bus > >>> mastering is off. > >>> > >>> Can you teach ahci that region being disabled > >>> during migration is ok, and recover from it? > >> > >> That's what I'm wondering. > >> > >> I could try to just disable the FIS RX engine if the mapping fails, but > >> that will require a change to guest visible state. > >> > >> My hunch, though, is that when windows re-enables the device it will > >> need to re-program the address registers anyway, so it might cope well > >> with the FIS RX bit getting switched off. > >> > >> (I'm wondering if it isn't a mistake that QEMU is trying to re-map this > >> address in the first place. Is it legal that the PCI device has pci bus > >> master disabled but we've held on to a mapping? > > > > If you are poking at guest memory when bus master is off, then most likely yes. > > > >> Should there be some > >> callback where AHCI knows to invalidate mappings at that point...?) > > > > ATM the callback is the config write, you check > > proxy->pci_dev.config[PCI_COMMAND] & PCI_COMMAND_MASTER > > and if disabled invalidate the mapping. > > > > virtio at least has code that pokes at > > proxy->pci_dev.config[PCI_COMMAND] too, I'm quite > > open to a function along the lines of > > pci_is_bus_master_enabled() > > that will do this. > > > > Well, that's not a callback. I don't think it's right to check the > PCI_COMMAND register *every* time AHCI does anything at all to see if > its mappings are still valid. > > AHCI makes a mapping *once* when FIS RX is turned on, and it unmaps it > when it's turned off. It assumes it remains valid that whole time. When > we migrate, it checks to see if it was running, and performs the > mappings again to re-boot the state machine. > > What I'm asking is; what are the implications of a guest disabling > PCI_COMMAND_MASTER? (I don't know PCI as well as you do.) The implication is that no reads or writes must be initiated by device: either memory or IO reads, or sending MSI. INT#x is unaffected. writes into device memory are unaffected. whether reads from device memory are affected kind of depends, but maybe not. Whether device caches anything internally has nothing to do with PCI_COMMAND_MASTER and PCI spec says nothing about it. Windows uses PCI_COMMAND_MASTER to emulate surprise removal so there's that. > What should that mean for the AHCI state machine? > > Does this *necessarily* invalidate the mappings? > (In which case -- it's an error that AHCI held on to them after Windows > disabled the card, even if AHCI isn't being engaged by the guest > anymore. Essentially, we were turned off but didn't clean up a dangling > pointer, but we need the event that tells us to clean the dangling mapping.) It does not have to but it must stop memory accesses through the mappings.
On 21/08/23 5:31 pm, manish.mishra wrote: > > Hi Everyone, > > We are facing this issue. I see this conversation was never conversed and discussed issue is still active on QEMU master. Just for summary, the solution mentioned in this thread "temporarily enable bus master memory region" was not taken with the following justification. > > "Poking at PCI device internals like this seems fragile. And force > > enabling bus master can lead to unpleasantness like corrupting guest > > memory, unhandled interrupts, etc. E.g. it's quite reasonable, > > spec-wise, for the guest to move thing in memory around while bus > > mastering is off." > > > There was an alternate solution mentioned in thread, that is to mark PORT_CMD_FIS_RX and PORT_CMD_START disabled forcefully if the bus master for the device is disabled. But there are no further conclusive discussions on this. > > > I wanted to start this conversation again to hopefully get a conclusion for this. > > Thanks > > Manish Mishra > > > On 10/09/19 7:38 pm, John Snow wrote: >> On 9/10/19 9:58 AM, Michael S. Tsirkin wrote: >>> On Tue, Sep 10, 2019 at 09:50:41AM -0400, John Snow wrote: >>>> On 9/10/19 3:04 AM, Michael S. Tsirkin wrote: >>>>> On Tue, Sep 10, 2019 at 01:18:37AM +0800, andychiu wrote: >>>>>> If Windows 10 guests have enabled 'turn off hard disk after idle' >>>>>> option in power settings, and the guest has a SATA disk plugged in, >>>>>> the SATA disk will be turned off after a specified idle time. >>>>>> If the guest is live migrated or saved/loaded with its SATA disk >>>>>> turned off, the following error will occur: >>>>>> >>>>>> qemu-system-x86_64: AHCI: Failed to start FIS receive engine: bad FIS receive buffer address >>>>>> qemu-system-x86_64: Failed to load ich9_ahci:ahci >>>>>> qemu-system-x86_64: error while loading state for instance 0x0 of device '0000:00:1a.0/ich9_ahci' >>>>>> qemu-system-x86_64: load of migration failed: Operation not permitted >>>>>> >>>>>> Observation from trace logs shows that a while after Windows 10 turns off >>>>>> a SATA disk (IDE disks don't have the following behavior), >>>>>> it will disable the PCI_COMMAND_MASTER flag of the pci device containing >>>>>> the ahci device. When the the disk is turning back on, >>>>>> the PCI_COMMAND_MASTER flag will be restored first. >>>>>> But if the guest is migrated or saved/loaded while the disk is off, >>>>>> the post_load callback of ahci device, ahci_state_post_load(), will fail >>>>>> at ahci_cond_start_engines() if the MemoryRegion >>>>>> pci_dev->bus_master_enable_region is not enabled, with pci_dev pointing >>>>>> to the PCIDevice struct containing the ahci device. >>>>>> >>>>>> This patch enables pci_dev->bus_master_enable_region before calling >>>>>> ahci_cond_start_engines() in ahci_state_post_load(), and restore the >>>>>> MemoryRegion to its original state afterwards. >>>>>> >>>>>> Signed-off-by: andychiu<andychiu@synology.com> >>>>> Poking at PCI device internals like this seems fragile. And force >>>>> enabling bus master can lead to unpleasantness like corrupting guest >>>>> memory, unhandled interrupts, etc. E.g. it's quite reasonable, >>>>> spec-wise, for the guest to move thing in memory around while bus >>>>> mastering is off. >>>>> >>>>> Can you teach ahci that region being disabled >>>>> during migration is ok, and recover from it? >>>> That's what I'm wondering. >>>> >>>> I could try to just disable the FIS RX engine if the mapping fails, but >>>> that will require a change to guest visible state. >>>> >>>> My hunch, though, is that when windows re-enables the device it will >>>> need to re-program the address registers anyway, so it might cope well >>>> with the FIS RX bit getting switched off. >>>> >>>> (I'm wondering if it isn't a mistake that QEMU is trying to re-map this >>>> address in the first place. Is it legal that the PCI device has pci bus >>>> master disabled but we've held on to a mapping? >>> If you are poking at guest memory when bus master is off, then most likely yes. >>> >>>> Should there be some >>>> callback where AHCI knows to invalidate mappings at that point...?) >>> ATM the callback is the config write, you check >>> proxy->pci_dev.config[PCI_COMMAND] & PCI_COMMAND_MASTER >>> and if disabled invalidate the mapping. >>> >>> virtio at least has code that pokes at >>> proxy->pci_dev.config[PCI_COMMAND] too, I'm quite >>> open to a function along the lines of >>> pci_is_bus_master_enabled() >>> that will do this. >>> >> Well, that's not a callback. I don't think it's right to check the >> PCI_COMMAND register *every* time AHCI does anything at all to see if >> its mappings are still valid. >> >> AHCI makes a mapping *once* when FIS RX is turned on, and it unmaps it >> when it's turned off. It assumes it remains valid that whole time. When >> we migrate, it checks to see if it was running, and performs the >> mappings again to re-boot the state machine. >> >> What I'm asking is; what are the implications of a guest disabling >> PCI_COMMAND_MASTER? (I don't know PCI as well as you do.) >> >> What should that mean for the AHCI state machine? >> >> Does this *necessarily* invalidate the mappings? >> (In which case -- it's an error that AHCI held on to them after Windows >> disabled the card, even if AHCI isn't being engaged by the guest >> anymore. Essentially, we were turned off but didn't clean up a dangling >> pointer, but we need the event that tells us to clean the dangling mapping.) >> Hi Everyone, I would appreciate some response on this. :) Thanks Manish Mishra
diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c index d45393c..83f8c30 100644 --- a/hw/ide/ahci.c +++ b/hw/ide/ahci.c @@ -1649,33 +1649,52 @@ static const VMStateDescription vmstate_ahci_device = { }, }; +static int ahci_state_load_engines(AHCIState *s, AHCIDevice *ad) +{ + AHCIPortRegs *pr = &ad->port_regs; + DeviceState *dev_state = s->container; + PCIDevice *pci_dev = (PCIDevice *) object_dynamic_cast(OBJECT(dev_state), + TYPE_PCI_DEVICE); + bool pci_bus_master_enabled = pci_dev->bus_master_enable_region.enabled; + + if (!(pr->cmd & PORT_CMD_START) && (pr->cmd & PORT_CMD_LIST_ON)) { + error_report("AHCI: DMA engine should be off, but status bit " + "indicates it is still running."); + return -1; + } + if (!(pr->cmd & PORT_CMD_FIS_RX) && (pr->cmd & PORT_CMD_FIS_ON)) { + error_report("AHCI: FIS RX engine should be off, but status bit " + "indicates it is still running."); + return -1; + } + + memory_region_set_enabled(&pci_dev->bus_master_enable_region, true); + + /* + * After a migrate, the DMA/FIS engines are "off" and + * need to be conditionally restarted + */ + pr->cmd &= ~(PORT_CMD_LIST_ON | PORT_CMD_FIS_ON); + if (ahci_cond_start_engines(ad) != 0) { + return -1; + } + memory_region_set_enabled(&pci_dev->bus_master_enable_region, + pci_bus_master_enabled); + + return 0; +} + static int ahci_state_post_load(void *opaque, int version_id) { int i, j; struct AHCIDevice *ad; NCQTransferState *ncq_tfs; - AHCIPortRegs *pr; AHCIState *s = opaque; for (i = 0; i < s->ports; i++) { ad = &s->dev[i]; - pr = &ad->port_regs; - - if (!(pr->cmd & PORT_CMD_START) && (pr->cmd & PORT_CMD_LIST_ON)) { - error_report("AHCI: DMA engine should be off, but status bit " - "indicates it is still running."); - return -1; - } - if (!(pr->cmd & PORT_CMD_FIS_RX) && (pr->cmd & PORT_CMD_FIS_ON)) { - error_report("AHCI: FIS RX engine should be off, but status bit " - "indicates it is still running."); - return -1; - } - /* After a migrate, the DMA/FIS engines are "off" and - * need to be conditionally restarted */ - pr->cmd &= ~(PORT_CMD_LIST_ON | PORT_CMD_FIS_ON); - if (ahci_cond_start_engines(ad) != 0) { + if (ahci_state_load_engines(s, ad)) { return -1; }
If Windows 10 guests have enabled 'turn off hard disk after idle' option in power settings, and the guest has a SATA disk plugged in, the SATA disk will be turned off after a specified idle time. If the guest is live migrated or saved/loaded with its SATA disk turned off, the following error will occur: qemu-system-x86_64: AHCI: Failed to start FIS receive engine: bad FIS receive buffer address qemu-system-x86_64: Failed to load ich9_ahci:ahci qemu-system-x86_64: error while loading state for instance 0x0 of device '0000:00:1a.0/ich9_ahci' qemu-system-x86_64: load of migration failed: Operation not permitted Observation from trace logs shows that a while after Windows 10 turns off a SATA disk (IDE disks don't have the following behavior), it will disable the PCI_COMMAND_MASTER flag of the pci device containing the ahci device. When the the disk is turning back on, the PCI_COMMAND_MASTER flag will be restored first. But if the guest is migrated or saved/loaded while the disk is off, the post_load callback of ahci device, ahci_state_post_load(), will fail at ahci_cond_start_engines() if the MemoryRegion pci_dev->bus_master_enable_region is not enabled, with pci_dev pointing to the PCIDevice struct containing the ahci device. This patch enables pci_dev->bus_master_enable_region before calling ahci_cond_start_engines() in ahci_state_post_load(), and restore the MemoryRegion to its original state afterwards. Signed-off-by: andychiu <andychiu@synology.com> --- hw/ide/ahci.c | 53 ++++++++++++++++++++++++++++++++++++----------------- 1 file changed, 36 insertions(+), 17 deletions(-)