Message ID | 20240109215145.430207-11-stewart.hildebrand@amd.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | PCI devices passthrough on Arm, part 3 | expand |
On 09.01.2024 22:51, Stewart Hildebrand wrote: > --- a/xen/drivers/vpci/header.c > +++ b/xen/drivers/vpci/header.c > @@ -168,6 +168,9 @@ static void modify_decoding(const struct pci_dev *pdev, uint16_t cmd, > if ( !rom_only ) > { > pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd); > + /* Show DomU that we updated P2M */ > + header->guest_cmd &= ~PCI_COMMAND_MEMORY; > + header->guest_cmd |= cmd & PCI_COMMAND_MEMORY; > header->bars_mapped = map; > } I don't follow what the comment means to say. The bit in question has no real connection to the P2M, and the guest also may have no notion of the underlying hypervisor's internals. Likely connected to ... > @@ -524,9 +527,26 @@ static void cf_check cmd_write( > { > struct vpci_header *header = data; > > + if ( !is_hardware_domain(pdev->domain) ) > + { > + const struct vpci *vpci = pdev->vpci; > + > + if ( (vpci->msi && vpci->msi->enabled) || > + (vpci->msix && vpci->msix->enabled) ) > + cmd |= PCI_COMMAND_INTX_DISABLE; > + > + /* > + * Do not show change to PCI_COMMAND_MEMORY bit until we finish > + * modifying P2M mappings. > + */ > + header->guest_cmd = (cmd & ~PCI_COMMAND_MEMORY) | > + (header->guest_cmd & PCI_COMMAND_MEMORY); > + } ... the comment here, but then shouldn't it be that the guest can't even issue a 2nd cfg space access until the present write has been carried out? Otherwise I'd be inclined to claim that such a partial update is unlikely to be spec-conformant. > @@ -843,6 +885,15 @@ static int cf_check init_header(struct pci_dev *pdev) > if ( cmd & PCI_COMMAND_MEMORY ) > pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd & ~PCI_COMMAND_MEMORY); > > + /* > + * Clear PCI_COMMAND_MEMORY and PCI_COMMAND_IO for DomUs, so they will > + * always start with memory decoding disabled and to ensure that we will not > + * call modify_bars() at the end of this function. > + */ > + if ( !is_hwdom ) > + cmd &= ~(PCI_COMMAND_MEMORY | PCI_COMMAND_IO); > + header->guest_cmd = cmd; With PCI_COMMAND_MEMORY clear, the hw reg won't further be written on the success return path. Yet wouldn't we better clear PCI_COMMAND_IO also in hardware (until we properly support it)? I also think the insertion point for the new code isn't well chosen: The comment just out of context indicates that the code in context above is connected to the subsequent code. Whereas the addition is not. > --- a/xen/drivers/vpci/msi.c > +++ b/xen/drivers/vpci/msi.c > @@ -70,6 +70,15 @@ static void cf_check control_write( > > if ( vpci_msi_arch_enable(msi, pdev, vectors) ) > return; > + > + /* > + * Make sure domU doesn't enable INTx while enabling MSI. > + */ Nit: This ought to be a single line comment, just like ... > + if ( !is_hardware_domain(pdev->domain) ) > + { > + pci_intx(pdev, false); > + pdev->vpci->header.guest_cmd |= PCI_COMMAND_INTX_DISABLE; > + } > } > else > vpci_msi_arch_disable(msi, pdev); > --- a/xen/drivers/vpci/msix.c > +++ b/xen/drivers/vpci/msix.c > @@ -135,6 +135,13 @@ static void cf_check control_write( > } > } > > + /* Make sure domU doesn't enable INTx while enabling MSI-X. */ > + if ( new_enabled && !msix->enabled && !is_hardware_domain(pdev->domain) ) > + { > + pci_intx(pdev, false); > + pdev->vpci->header.guest_cmd |= PCI_COMMAND_INTX_DISABLE; > + } ... the similar code here has it. In both cases, is it really appropriate to set the bit in guest view? Jan
On 1/25/24 10:43, Jan Beulich wrote: > On 09.01.2024 22:51, Stewart Hildebrand wrote: >> --- a/xen/drivers/vpci/header.c >> +++ b/xen/drivers/vpci/header.c >> @@ -168,6 +168,9 @@ static void modify_decoding(const struct pci_dev *pdev, uint16_t cmd, >> if ( !rom_only ) >> { >> pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd); >> + /* Show DomU that we updated P2M */ >> + header->guest_cmd &= ~PCI_COMMAND_MEMORY; >> + header->guest_cmd |= cmd & PCI_COMMAND_MEMORY; >> header->bars_mapped = map; >> } > > I don't follow what the comment means to say. The bit in question has no > real connection to the P2M, and the guest also may have no notion of the > underlying hypervisor's internals. Likely connected to ... Indeed. If the comment survives to v13, I'll update it to: /* Now that we updated P2M, show DomU change to PCI_COMMAND_MEMORY */ > >> @@ -524,9 +527,26 @@ static void cf_check cmd_write( >> { >> struct vpci_header *header = data; >> >> + if ( !is_hardware_domain(pdev->domain) ) >> + { >> + const struct vpci *vpci = pdev->vpci; >> + >> + if ( (vpci->msi && vpci->msi->enabled) || >> + (vpci->msix && vpci->msix->enabled) ) >> + cmd |= PCI_COMMAND_INTX_DISABLE; >> + >> + /* >> + * Do not show change to PCI_COMMAND_MEMORY bit until we finish >> + * modifying P2M mappings. >> + */ >> + header->guest_cmd = (cmd & ~PCI_COMMAND_MEMORY) | >> + (header->guest_cmd & PCI_COMMAND_MEMORY); >> + } > > ... the comment here, but then shouldn't it be that the guest can't even > issue a 2nd cfg space access until the present write has been carried out? > Otherwise I'd be inclined to claim that such a partial update is unlikely > to be spec-conformant. Due to the raise_softirq() call added in 3e568fa9e19c ("vpci: fix deferral of long operations") my current understanding is: when the guest toggles memory decoding, the guest vcpu doesn't resume execution until vpci_process_pending() and modify_decoding() have finished. So I think the guest should see a consistent state of the register, unless it was trying to read from a different vcpu than the one doing the writing. Regardless, if the guest did have an opportunity to successfully read the partially updated state of the register, I'm not really spotting what part of the spec that would be a violation of. PCIe 6.1 has this description regarding the bit: "When this bit is Set" and "When this bit is Clear" the device will decode (or not) memory accesses. The spec doesn't seem to distinguish whether the host or the device itself is the one to set/clear the bit. One might even try to argue the opposite: allowing the bit to be toggled before the device reflects the change would be a violation of spec. Since the spec is ambiguous in this regard, I don't think either argument is particularly strong. Chesterton's fence: the logic for deferring the update of PCI_COMMAND_MEMORY in guest_cmd was added between v10 and v11 of this series. I went back to look at the review comments on v10 [1], but the rationale is still not entirely clear to me. At the end of the day, with the information I have at hand, I suspect it would be fine either way (whether updating guest_cmd is deferred or not). If no other info comes to light, I'm leaning toward not deferring because it would be simpler to update the bit right away in cmd_write(). [1] https://lore.kernel.org/xen-devel/ZVy73iJ3E8nJHvgf@macbook.local/ > >> @@ -843,6 +885,15 @@ static int cf_check init_header(struct pci_dev *pdev) >> if ( cmd & PCI_COMMAND_MEMORY ) >> pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd & ~PCI_COMMAND_MEMORY); >> >> + /* >> + * Clear PCI_COMMAND_MEMORY and PCI_COMMAND_IO for DomUs, so they will >> + * always start with memory decoding disabled and to ensure that we will not >> + * call modify_bars() at the end of this function. >> + */ >> + if ( !is_hwdom ) >> + cmd &= ~(PCI_COMMAND_MEMORY | PCI_COMMAND_IO); >> + header->guest_cmd = cmd; > > With PCI_COMMAND_MEMORY clear, the hw reg won't further be written on the > success return path. Yet wouldn't we better clear PCI_COMMAND_IO also in > hardware (until we properly support it)? Yes, I'll clear PCI_COMMAND_IO in hardware too > > I also think the insertion point for the new code isn't well chosen: The > comment just out of context indicates that the code in context above is > connected to the subsequent code. Whereas the addition is not. I'll rearrange it > >> --- a/xen/drivers/vpci/msi.c >> +++ b/xen/drivers/vpci/msi.c >> @@ -70,6 +70,15 @@ static void cf_check control_write( >> >> if ( vpci_msi_arch_enable(msi, pdev, vectors) ) >> return; >> + >> + /* >> + * Make sure domU doesn't enable INTx while enabling MSI. >> + */ > > Nit: This ought to be a single line comment, just like ... OK, I'll make it a single line > >> + if ( !is_hardware_domain(pdev->domain) ) >> + { >> + pci_intx(pdev, false); >> + pdev->vpci->header.guest_cmd |= PCI_COMMAND_INTX_DISABLE; >> + } >> } >> else >> vpci_msi_arch_disable(msi, pdev); >> --- a/xen/drivers/vpci/msix.c >> +++ b/xen/drivers/vpci/msix.c >> @@ -135,6 +135,13 @@ static void cf_check control_write( >> } >> } >> >> + /* Make sure domU doesn't enable INTx while enabling MSI-X. */ >> + if ( new_enabled && !msix->enabled && !is_hardware_domain(pdev->domain) ) >> + { >> + pci_intx(pdev, false); >> + pdev->vpci->header.guest_cmd |= PCI_COMMAND_INTX_DISABLE; >> + } > > ... the similar code here has it. > > In both cases, is it really appropriate to set the bit in guest view? I added this based on Roger's comment at [2]. Roger, what do you think? I don't believe QEMU updates the guest view in this manner. [2] https://lore.kernel.org/xen-devel/ZLqI65gmNj1XDBm4@MacBook-Air-de-Roger.local/
On 01.02.2024 05:50, Stewart Hildebrand wrote: > On 1/25/24 10:43, Jan Beulich wrote: >> On 09.01.2024 22:51, Stewart Hildebrand wrote: >>> --- a/xen/drivers/vpci/header.c >>> +++ b/xen/drivers/vpci/header.c >>> @@ -168,6 +168,9 @@ static void modify_decoding(const struct pci_dev *pdev, uint16_t cmd, >>> if ( !rom_only ) >>> { >>> pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd); >>> + /* Show DomU that we updated P2M */ >>> + header->guest_cmd &= ~PCI_COMMAND_MEMORY; >>> + header->guest_cmd |= cmd & PCI_COMMAND_MEMORY; >>> header->bars_mapped = map; >>> } >> >> I don't follow what the comment means to say. The bit in question has no >> real connection to the P2M, and the guest also may have no notion of the >> underlying hypervisor's internals. Likely connected to ... > > Indeed. If the comment survives to v13, I'll update it to: > > /* Now that we updated P2M, show DomU change to PCI_COMMAND_MEMORY */ > >> >>> @@ -524,9 +527,26 @@ static void cf_check cmd_write( >>> { >>> struct vpci_header *header = data; >>> >>> + if ( !is_hardware_domain(pdev->domain) ) >>> + { >>> + const struct vpci *vpci = pdev->vpci; >>> + >>> + if ( (vpci->msi && vpci->msi->enabled) || >>> + (vpci->msix && vpci->msix->enabled) ) >>> + cmd |= PCI_COMMAND_INTX_DISABLE; >>> + >>> + /* >>> + * Do not show change to PCI_COMMAND_MEMORY bit until we finish >>> + * modifying P2M mappings. >>> + */ >>> + header->guest_cmd = (cmd & ~PCI_COMMAND_MEMORY) | >>> + (header->guest_cmd & PCI_COMMAND_MEMORY); >>> + } >> >> ... the comment here, but then shouldn't it be that the guest can't even >> issue a 2nd cfg space access until the present write has been carried out? >> Otherwise I'd be inclined to claim that such a partial update is unlikely >> to be spec-conformant. > > Due to the raise_softirq() call added in > > 3e568fa9e19c ("vpci: fix deferral of long operations") > > my current understanding is: when the guest toggles memory decoding, the guest vcpu doesn't resume execution until vpci_process_pending() and modify_decoding() have finished. So I think the guest should see a consistent state of the register, unless it was trying to read from a different vcpu than the one doing the writing. > > Regardless, if the guest did have an opportunity to successfully read the partially updated state of the register, I'm not really spotting what part of the spec that would be a violation of. PCIe 6.1 has this description regarding the bit: "When this bit is Set" and "When this bit is Clear" the device will decode (or not) memory accesses. The spec doesn't seem to distinguish whether the host or the device itself is the one to set/clear the bit. One might even try to argue the opposite: allowing the bit to be toggled before the device reflects the change would be a violation of spec. Since the spec is ambiguous in this regard, I don't think either argument is particularly strong. > > Chesterton's fence: the logic for deferring the update of PCI_COMMAND_MEMORY in guest_cmd was added between v10 and v11 of this series. I went back to look at the review comments on v10 [1], but the rationale is still not entirely clear to me. Indeed. The only sentence possibly hinting in such a direction would imo have been "I'm kind of unsure whether we want to fake the guest view by returning what the guest writes." It's unclear to me whether it really was meant that way. > At the end of the day, with the information I have at hand, I suspect it would be fine either way (whether updating guest_cmd is deferred or not). If no other info comes to light, I'm leaning toward not deferring because it would be simpler to update the bit right away in cmd_write(). I'm not sure it would be fine either way. Config space writes aren't posted writes, so they complete synchronously. IOW whatever internal state updates are needed in the device, they ought to have finished by the time the write completes. > [1] https://lore.kernel.org/xen-devel/ZVy73iJ3E8nJHvgf@macbook.local/ > >>[...] >>> --- a/xen/drivers/vpci/msix.c >>> +++ b/xen/drivers/vpci/msix.c >>> @@ -135,6 +135,13 @@ static void cf_check control_write( >>> } >>> } >>> >>> + /* Make sure domU doesn't enable INTx while enabling MSI-X. */ >>> + if ( new_enabled && !msix->enabled && !is_hardware_domain(pdev->domain) ) >>> + { >>> + pci_intx(pdev, false); >>> + pdev->vpci->header.guest_cmd |= PCI_COMMAND_INTX_DISABLE; >>> + } >> >> ... the similar code here has it. >> >> In both cases, is it really appropriate to set the bit in guest view? > > I added this based on Roger's comment at [2]. Roger, what do you think? I don't believe QEMU updates the guest view in this manner. > > [2] https://lore.kernel.org/xen-devel/ZLqI65gmNj1XDBm4@MacBook-Air-de-Roger.local/ Leaving this for Roger to answer. Jan
diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c index f0b0b64b0929..374e8e119231 100644 --- a/xen/drivers/vpci/header.c +++ b/xen/drivers/vpci/header.c @@ -168,6 +168,9 @@ static void modify_decoding(const struct pci_dev *pdev, uint16_t cmd, if ( !rom_only ) { pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd); + /* Show DomU that we updated P2M */ + header->guest_cmd &= ~PCI_COMMAND_MEMORY; + header->guest_cmd |= cmd & PCI_COMMAND_MEMORY; header->bars_mapped = map; } else @@ -524,9 +527,26 @@ static void cf_check cmd_write( { struct vpci_header *header = data; + if ( !is_hardware_domain(pdev->domain) ) + { + const struct vpci *vpci = pdev->vpci; + + if ( (vpci->msi && vpci->msi->enabled) || + (vpci->msix && vpci->msix->enabled) ) + cmd |= PCI_COMMAND_INTX_DISABLE; + + /* + * Do not show change to PCI_COMMAND_MEMORY bit until we finish + * modifying P2M mappings. + */ + header->guest_cmd = (cmd & ~PCI_COMMAND_MEMORY) | + (header->guest_cmd & PCI_COMMAND_MEMORY); + } + /* * Let Dom0 play with all the bits directly except for the memory - * decoding one. + * decoding one. Bits that are not allowed for DomU are already + * handled above and by the rsvdp_mask. */ if ( header->bars_mapped != !!(cmd & PCI_COMMAND_MEMORY) ) /* @@ -540,6 +560,14 @@ static void cf_check cmd_write( pci_conf_write16(pdev->sbdf, reg, cmd); } +static uint32_t cf_check guest_cmd_read( + const struct pci_dev *pdev, unsigned int reg, void *data) +{ + const struct vpci_header *header = data; + + return header->guest_cmd; +} + static void cf_check bar_write( const struct pci_dev *pdev, unsigned int reg, uint32_t val, void *data) { @@ -756,9 +784,23 @@ static int cf_check init_header(struct pci_dev *pdev) return -EOPNOTSUPP; } - /* Setup a handler for the command register. */ - rc = vpci_add_register(pdev->vpci, vpci_hw_read16, cmd_write, PCI_COMMAND, - 2, header); + /* + * Setup a handler for the command register. + * + * TODO: If support for emulated bits is added, re-visit how to handle + * PCI_COMMAND_PARITY, PCI_COMMAND_SERR, and PCI_COMMAND_FAST_BACK. + */ + rc = vpci_add_register_mask(pdev->vpci, + is_hwdom ? vpci_hw_read16 : guest_cmd_read, + cmd_write, PCI_COMMAND, 2, header, 0, 0, + PCI_COMMAND_RSVDP_MASK | + (is_hwdom ? 0 + : PCI_COMMAND_IO | + PCI_COMMAND_PARITY | + PCI_COMMAND_WAIT | + PCI_COMMAND_SERR | + PCI_COMMAND_FAST_BACK), + 0); if ( rc ) return rc; @@ -843,6 +885,15 @@ static int cf_check init_header(struct pci_dev *pdev) if ( cmd & PCI_COMMAND_MEMORY ) pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd & ~PCI_COMMAND_MEMORY); + /* + * Clear PCI_COMMAND_MEMORY and PCI_COMMAND_IO for DomUs, so they will + * always start with memory decoding disabled and to ensure that we will not + * call modify_bars() at the end of this function. + */ + if ( !is_hwdom ) + cmd &= ~(PCI_COMMAND_MEMORY | PCI_COMMAND_IO); + header->guest_cmd = cmd; + for ( i = 0; i < num_bars; i++ ) { uint8_t reg = PCI_BASE_ADDRESS_0 + i * 4; diff --git a/xen/drivers/vpci/msi.c b/xen/drivers/vpci/msi.c index 6ff71e5f9ab7..aae8055ce92d 100644 --- a/xen/drivers/vpci/msi.c +++ b/xen/drivers/vpci/msi.c @@ -70,6 +70,15 @@ static void cf_check control_write( if ( vpci_msi_arch_enable(msi, pdev, vectors) ) return; + + /* + * Make sure domU doesn't enable INTx while enabling MSI. + */ + if ( !is_hardware_domain(pdev->domain) ) + { + pci_intx(pdev, false); + pdev->vpci->header.guest_cmd |= PCI_COMMAND_INTX_DISABLE; + } } else vpci_msi_arch_disable(msi, pdev); diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c index b6abab47efdd..d4f756ce287a 100644 --- a/xen/drivers/vpci/msix.c +++ b/xen/drivers/vpci/msix.c @@ -135,6 +135,13 @@ static void cf_check control_write( } } + /* Make sure domU doesn't enable INTx while enabling MSI-X. */ + if ( new_enabled && !msix->enabled && !is_hardware_domain(pdev->domain) ) + { + pci_intx(pdev, false); + pdev->vpci->header.guest_cmd |= PCI_COMMAND_INTX_DISABLE; + } + msix->masked = new_masked; msix->enabled = new_enabled; diff --git a/xen/include/xen/pci_regs.h b/xen/include/xen/pci_regs.h index 9909b27425a5..b2f2e43e864d 100644 --- a/xen/include/xen/pci_regs.h +++ b/xen/include/xen/pci_regs.h @@ -48,6 +48,7 @@ #define PCI_COMMAND_SERR 0x100 /* Enable SERR */ #define PCI_COMMAND_FAST_BACK 0x200 /* Enable back-to-back writes */ #define PCI_COMMAND_INTX_DISABLE 0x400 /* INTx Emulation Disable */ +#define PCI_COMMAND_RSVDP_MASK 0xf800 #define PCI_STATUS 0x06 /* 16 bits */ #define PCI_STATUS_IMM_READY 0x01 /* Immediate Readiness */ diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h index e89c571890b2..77320a667e55 100644 --- a/xen/include/xen/vpci.h +++ b/xen/include/xen/vpci.h @@ -107,6 +107,9 @@ struct vpci { } bars[PCI_HEADER_NORMAL_NR_BARS + 1]; /* At most 6 BARS + 1 expansion ROM BAR. */ + /* Guest (domU only) view of the PCI_COMMAND register. */ + uint16_t guest_cmd; + /* * Store whether the ROM enable bit is set (doesn't imply ROM BAR * is mapped into guest p2m) if there's a ROM BAR on the device.