Message ID | 20221114192011.1539233-2-marmarek@invisiblethingslab.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [1/2] hw/xen/xen_pt: Call default handler only if no custom one is set | expand |
On 14/11/2022 19:20, Marek Marczykowski-Górecki wrote: > The /dev/mem is used for two purposes: > - reading PCI_MSIX_ENTRY_CTRL_MASKBIT > - reading Pending Bit Array (PBA) > > The first one was originally done because when Xen did not send all > vector ctrl writes to the device model, so QEMU might have outdated old > register value. This has been changed in Xen, so QEMU can now use its > cached value of the register instead. > > The Pending Bit Array (PBA) handling is for the case where it lives on > the same page as the MSI-X table itself. Xen has been extended to handle > this case too (as well as other registers that may live on those pages), > so QEMU handling is not necessary anymore. > > Removing /dev/mem access is useful to work within stubdomain, and > necessary when dom0 kernel runs in lockdown mode. > > Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> The commit message ought to go further. Using /dev/mem like this is buggy anyway, because it is trapped and emulated by Xen in whatever context Qemu is running. Qemu cannot get the actual hardware value, and even if it could, it would be racy with transient operations needing to mask the vector. i.e. it's not just nice-to-remote - it's fixing real corner cases. ~Andrew
On Mon, Nov 14, 2022 at 07:39:48PM +0000, Andrew Cooper wrote: > On 14/11/2022 19:20, Marek Marczykowski-Górecki wrote: > > The /dev/mem is used for two purposes: > > - reading PCI_MSIX_ENTRY_CTRL_MASKBIT > > - reading Pending Bit Array (PBA) > > > > The first one was originally done because when Xen did not send all > > vector ctrl writes to the device model, so QEMU might have outdated old > > register value. This has been changed in Xen, so QEMU can now use its > > cached value of the register instead. I should have noted that "has been changed in Xen" isn't fully accurate (yet). It refers to this patch: https://lore.kernel.org/xen-devel/20221114192100.1539267-2-marmarek@invisiblethingslab.com/T/#u > > The Pending Bit Array (PBA) handling is for the case where it lives on > > the same page as the MSI-X table itself. Xen has been extended to handle > > this case too (as well as other registers that may live on those pages), > > so QEMU handling is not necessary anymore. > > > > Removing /dev/mem access is useful to work within stubdomain, and > > necessary when dom0 kernel runs in lockdown mode. > > > > Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> > > The commit message ought to go further. Using /dev/mem like this is > buggy anyway, because it is trapped and emulated by Xen in whatever > context Qemu is running. Qemu cannot get the actual hardware value, and > even if it could, it would be racy with transient operations needing to > mask the vector. > > i.e. it's not just nice-to-remote - it's fixing real corner cases. Good point, I'll extend it. But for this to work, the Xen patch needs to go in first (which won't happen for 4.17). And also, upstream QEMU probably needs some way to detect whether Xen has the change or not - to work with older versions too.
On 14.11.2022 20:20, Marek Marczykowski-Górecki wrote: > The /dev/mem is used for two purposes: > - reading PCI_MSIX_ENTRY_CTRL_MASKBIT > - reading Pending Bit Array (PBA) > > The first one was originally done because when Xen did not send all > vector ctrl writes to the device model, so QEMU might have outdated old > register value. This has been changed in Xen, so QEMU can now use its > cached value of the register instead. > > The Pending Bit Array (PBA) handling is for the case where it lives on > the same page as the MSI-X table itself. Xen has been extended to handle > this case too (as well as other registers that may live on those pages), > so QEMU handling is not necessary anymore. Don't you need to check for new enough Xen for both aspects? Jan
On Tue, Nov 15, 2022 at 09:14:07AM +0100, Jan Beulich wrote: > On 14.11.2022 20:20, Marek Marczykowski-Górecki wrote: > > The /dev/mem is used for two purposes: > > - reading PCI_MSIX_ENTRY_CTRL_MASKBIT > > - reading Pending Bit Array (PBA) > > > > The first one was originally done because when Xen did not send all > > vector ctrl writes to the device model, so QEMU might have outdated old > > register value. This has been changed in Xen, so QEMU can now use its > > cached value of the register instead. > > > > The Pending Bit Array (PBA) handling is for the case where it lives on > > the same page as the MSI-X table itself. Xen has been extended to handle > > this case too (as well as other registers that may live on those pages), > > so QEMU handling is not necessary anymore. > > Don't you need to check for new enough Xen for both aspects? Yes, see my response to Andrew in the thread. I'm open for suggestions what to check specifically (Xen version directly?).
On 15.11.2022 12:38, Marek Marczykowski-Górecki wrote: > On Tue, Nov 15, 2022 at 09:14:07AM +0100, Jan Beulich wrote: >> On 14.11.2022 20:20, Marek Marczykowski-Górecki wrote: >>> The /dev/mem is used for two purposes: >>> - reading PCI_MSIX_ENTRY_CTRL_MASKBIT >>> - reading Pending Bit Array (PBA) >>> >>> The first one was originally done because when Xen did not send all >>> vector ctrl writes to the device model, so QEMU might have outdated old >>> register value. This has been changed in Xen, so QEMU can now use its >>> cached value of the register instead. >>> >>> The Pending Bit Array (PBA) handling is for the case where it lives on >>> the same page as the MSI-X table itself. Xen has been extended to handle >>> this case too (as well as other registers that may live on those pages), >>> so QEMU handling is not necessary anymore. >> >> Don't you need to check for new enough Xen for both aspects? > > Yes, see my response to Andrew in the thread. I'm open for suggestions > what to check specifically (Xen version directly?). I guess we should first see what changes we (you) end up doing in Xen itself, and then decide whether to surface the new functionality in some kind of feature indicator. Generally I'd prefer to avoid version checks, because they don't fit very well with backports nor the (rare) need to revert something. But sometimes a feature can be probed easily without requiring an explicit feature indicator. Jan
On Mon, Nov 14, 2022 at 2:21 PM Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> wrote: > > The /dev/mem is used for two purposes: > - reading PCI_MSIX_ENTRY_CTRL_MASKBIT > - reading Pending Bit Array (PBA) > > The first one was originally done because when Xen did not send all > vector ctrl writes to the device model, so QEMU might have outdated old > register value. This has been changed in Xen, so QEMU can now use its > cached value of the register instead. > > The Pending Bit Array (PBA) handling is for the case where it lives on > the same page as the MSI-X table itself. Xen has been extended to handle > this case too (as well as other registers that may live on those pages), > so QEMU handling is not necessary anymore. > > Removing /dev/mem access is useful to work within stubdomain, and > necessary when dom0 kernel runs in lockdown mode. > > Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> I put the Xen, QEMU, and xen-pciback patches into OpenXT and gave a little test. When pci_permissive=0, iwlwifi fails to load its firmware. With pci_permissive=1, it looks like MSI-X is enabled. (I previously included your libxl allow_interrupt_control patch - that seemed to get regular MSIs working prior to the MSI-X patches.) I also removed the OpenXT equivalent of 0005-Disable-MSI-X-caps.patch. I am testing with Linux 5.4.y, so that could be another factor. One strange thing is the lspci output. Dom0 shows MSI-X enabled. Meanwhile NDVM (sys-net) does not show the MSI-X capability. If you `hexdump -C /sys/bus/pci/devices/$dev/config` you can see MSI-X enabled, but you also see that the MSI capability has 00 as the next pointer, so lspci stops parsing. MSI cap stubdom: 00000040 10 00 92 00 c0 0e 00 00 10 0c 10 00 00 00 00 00 |................| 0x41 -> next 0x00 MSI cap dom0: 00000040 10 80 92 00 c0 0e 00 10 10 0c 10 00 00 00 00 00 |................| 0x41 -> next 0x80 MSI-X: 00000080 11 00 0f 80 00 20 00 00 00 30 00 00 00 00 00 00 AFAIU, the value 0x80 at offset 0x83 is MSI-X Enabled. I had a boot where assignment failed with the hypervisor printing: d12: assign (0000:00:14.3) failed (-16) Rebooting the laptop seemed to clear that. Regards, Jason
On Wed, Nov 16, 2022 at 02:15:22PM -0500, Jason Andryuk wrote: > On Mon, Nov 14, 2022 at 2:21 PM Marek Marczykowski-Górecki > <marmarek@invisiblethingslab.com> wrote: > > > > The /dev/mem is used for two purposes: > > - reading PCI_MSIX_ENTRY_CTRL_MASKBIT > > - reading Pending Bit Array (PBA) > > > > The first one was originally done because when Xen did not send all > > vector ctrl writes to the device model, so QEMU might have outdated old > > register value. This has been changed in Xen, so QEMU can now use its > > cached value of the register instead. > > > > The Pending Bit Array (PBA) handling is for the case where it lives on > > the same page as the MSI-X table itself. Xen has been extended to handle > > this case too (as well as other registers that may live on those pages), > > so QEMU handling is not necessary anymore. > > > > Removing /dev/mem access is useful to work within stubdomain, and > > necessary when dom0 kernel runs in lockdown mode. > > > > Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> > > I put the Xen, QEMU, and xen-pciback patches into OpenXT and gave a > little test. When pci_permissive=0, iwlwifi fails to load its > firmware. With pci_permissive=1, it looks like MSI-X is enabled. (I > previously included your libxl allow_interrupt_control patch - that > seemed to get regular MSIs working prior to the MSI-X patches.) I > also removed the OpenXT equivalent of 0005-Disable-MSI-X-caps.patch. > I am testing with Linux 5.4.y, so that could be another factor. Can you confirm the allow_interrupt_control is set by libxl? Also, vanilla 5.4 doesn't have the allow_interrupt_control patch at all, and you may have an earlier version that had "allow_msi_enable" as the sysfs file name. > One strange thing is the lspci output. Dom0 shows MSI-X enabled. > Meanwhile NDVM (sys-net) does not show the MSI-X capability. If you > `hexdump -C /sys/bus/pci/devices/$dev/config` you can see MSI-X > enabled, but you also see that the MSI capability has 00 as the next > pointer, so lspci stops parsing. This 00 value is written by Linux[1] (sic!) and then qemu incorrectly allowing the write and happily emulating that zero. The other qemu patch in this series ought to fix that (as in: properly refuse the write), do you have it included? [1] https://github.com/torvalds/linux/blob/master/drivers/net/wireless/intel/iwlwifi/pcie/drv.c#L1721 > MSI cap stubdom: > 00000040 10 00 92 00 c0 0e 00 00 10 0c 10 00 00 00 00 00 |................| > 0x41 -> next 0x00 > MSI cap dom0: > 00000040 10 80 92 00 c0 0e 00 10 10 0c 10 00 00 00 00 00 |................| > 0x41 -> next 0x80 > > MSI-X: > 00000080 11 00 0f 80 00 20 00 00 00 30 00 00 00 00 00 00 > > AFAIU, the value 0x80 at offset 0x83 is MSI-X Enabled. > > I had a boot where assignment failed with the hypervisor printing: > d12: assign (0000:00:14.3) failed (-16) > Rebooting the laptop seemed to clear that. Zombie of previous domain? Not set as "assignable" first?
On Wed, Nov 16, 2022 at 10:40:02PM +0100, Marek Marczykowski-Górecki wrote: > On Wed, Nov 16, 2022 at 02:15:22PM -0500, Jason Andryuk wrote: > > On Mon, Nov 14, 2022 at 2:21 PM Marek Marczykowski-Górecki > > <marmarek@invisiblethingslab.com> wrote: > > > > > > The /dev/mem is used for two purposes: > > > - reading PCI_MSIX_ENTRY_CTRL_MASKBIT > > > - reading Pending Bit Array (PBA) > > > > > > The first one was originally done because when Xen did not send all > > > vector ctrl writes to the device model, so QEMU might have outdated old > > > register value. This has been changed in Xen, so QEMU can now use its > > > cached value of the register instead. > > > > > > The Pending Bit Array (PBA) handling is for the case where it lives on > > > the same page as the MSI-X table itself. Xen has been extended to handle > > > this case too (as well as other registers that may live on those pages), > > > so QEMU handling is not necessary anymore. > > > > > > Removing /dev/mem access is useful to work within stubdomain, and > > > necessary when dom0 kernel runs in lockdown mode. > > > > > > Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> > > > > I put the Xen, QEMU, and xen-pciback patches into OpenXT and gave a > > little test. When pci_permissive=0, iwlwifi fails to load its > > firmware. With pci_permissive=1, it looks like MSI-X is enabled. (I > > previously included your libxl allow_interrupt_control patch - that > > seemed to get regular MSIs working prior to the MSI-X patches.) I > > also removed the OpenXT equivalent of 0005-Disable-MSI-X-caps.patch. > > I am testing with Linux 5.4.y, so that could be another factor. > > Can you confirm the allow_interrupt_control is set by libxl? Also, > vanilla 5.4 doesn't have the allow_interrupt_control patch at all, and you > may have an earlier version that had "allow_msi_enable" as the sysfs > file name. Ok, I found what is wrong. Enabling MSI-X is refused, because INTx isn't disabled at this point yet. And apparently I was testing this with permissive=1... Linux does this: https://github.com/torvalds/linux/blob/master/drivers/pci/msi/msi.c#L611 In short: 1. Enable MSI-X with MASKALL=1 2. Setup MSI-X table 3. Disable INTx 4. Set MASKALL=0 This patch on top should fix this: ----8<---- diff --git a/drivers/xen/xen-pciback/conf_space_capability.c b/drivers/xen/xen-pciback/conf_space_capability.c index 097316a74126..f4c4381de76e 100644 --- a/drivers/xen/xen-pciback/conf_space_capability.c +++ b/drivers/xen/xen-pciback/conf_space_capability.c @@ -235,7 +235,7 @@ static int msi_msix_flags_write(struct pci_dev *dev, int offset, u16 new_value, (new_value ^ old_value) & ~field_config->allowed_bits) return PCIBIOS_SET_FAILED; - if (new_value & field_config->enable_bit) { + if ((new_value & field_config->allowed_bits) == field_config->enable_bit) { /* don't allow enabling together with other interrupt types */ int int_type = xen_pcibk_get_interrupt_type(dev); ----8<---- Jan, is the above safe? It should prevent clearing MASKALL if INTx isn't disabled, unless I missed something? But also, it will allow enabling MSI-X with MASKALL=1 together with MSI, which I'm not sure if should be allowed. Alternatively to the above patch, I could allow specifically setting MSIX_FLAGS_ENABLE + MSIX_FLAGS_MASKALL while INTx isn't disabled as a single exception, but at this point I'm not sure if some other driver or OS wouldn't approach this in yet another way.
On 17.11.2022 04:34, Marek Marczykowski-Górecki wrote: > Ok, I found what is wrong. Enabling MSI-X is refused, because INTx isn't > disabled at this point yet. And apparently I was testing this with > permissive=1... > > Linux does this: > https://github.com/torvalds/linux/blob/master/drivers/pci/msi/msi.c#L611 > In short: > 1. Enable MSI-X with MASKALL=1 > 2. Setup MSI-X table > 3. Disable INTx > 4. Set MASKALL=0 > > This patch on top should fix this: > ----8<---- > diff --git a/drivers/xen/xen-pciback/conf_space_capability.c b/drivers/xen/xen-pciback/conf_space_capability.c > index 097316a74126..f4c4381de76e 100644 > --- a/drivers/xen/xen-pciback/conf_space_capability.c > +++ b/drivers/xen/xen-pciback/conf_space_capability.c > @@ -235,7 +235,7 @@ static int msi_msix_flags_write(struct pci_dev *dev, int offset, u16 new_value, > (new_value ^ old_value) & ~field_config->allowed_bits) > return PCIBIOS_SET_FAILED; > > - if (new_value & field_config->enable_bit) { > + if ((new_value & field_config->allowed_bits) == field_config->enable_bit) { > /* don't allow enabling together with other interrupt types */ > int int_type = xen_pcibk_get_interrupt_type(dev); > > ----8<---- > > Jan, is the above safe? It should prevent clearing MASKALL if INTx isn't > disabled, unless I missed something? But also, it will allow enabling > MSI-X with MASKALL=1 together with MSI, which I'm not sure if should be > allowed. While it would probably be okay with what we have now (after your earlier patch introducing allowed_bits), it's likely not going to be correct once further bits would be added to allowed_bits (which clearly is going to be wanted sooner or later, e.g. for multi-vector MSI). Hence I think ... > Alternatively to the above patch, I could allow specifically setting > MSIX_FLAGS_ENABLE + MSIX_FLAGS_MASKALL while INTx isn't disabled as a > single exception, ... this is the way to go, and ... > but at this point I'm not sure if some other driver or > OS wouldn't approach this in yet another way. ... I guess we need to further add exceptions as needed - the one further approach I could see is to keep all entry's mask bits set until setting "INTx disable", without using MASKALL. I'd like to note though that the PCI spec has no such exception. It, however, also doesn't mandate setting "INTx disable" - that's merely a workaround for flawed hardware afaik. (In Xen we also only cross check MSI and MSI-X. IOW we rely on Dom0 anyway when it comes to driving "INTx disable".) Therefore in the end pciback looks to be going too far in enforcing such dependencies. Thinking of this - what about making the change in xen_pcibk_get_interrupt_type() instead, setting INTERRUPT_TYPE_MSIX only when MASKALL is clear (alongside ENABLE being set)? This would then also cover command_write(). Jan
On Thu, Nov 17, 2022 at 09:04:40AM +0100, Jan Beulich wrote: > On 17.11.2022 04:34, Marek Marczykowski-Górecki wrote: > > Ok, I found what is wrong. Enabling MSI-X is refused, because INTx isn't > > disabled at this point yet. And apparently I was testing this with > > permissive=1... > > > > Linux does this: > > https://github.com/torvalds/linux/blob/master/drivers/pci/msi/msi.c#L611 > > In short: > > 1. Enable MSI-X with MASKALL=1 > > 2. Setup MSI-X table > > 3. Disable INTx > > 4. Set MASKALL=0 > > > > This patch on top should fix this: > > ----8<---- > > diff --git a/drivers/xen/xen-pciback/conf_space_capability.c b/drivers/xen/xen-pciback/conf_space_capability.c > > index 097316a74126..f4c4381de76e 100644 > > --- a/drivers/xen/xen-pciback/conf_space_capability.c > > +++ b/drivers/xen/xen-pciback/conf_space_capability.c > > @@ -235,7 +235,7 @@ static int msi_msix_flags_write(struct pci_dev *dev, int offset, u16 new_value, > > (new_value ^ old_value) & ~field_config->allowed_bits) > > return PCIBIOS_SET_FAILED; > > > > - if (new_value & field_config->enable_bit) { > > + if ((new_value & field_config->allowed_bits) == field_config->enable_bit) { > > /* don't allow enabling together with other interrupt types */ > > int int_type = xen_pcibk_get_interrupt_type(dev); > > > > ----8<---- > > > > Jan, is the above safe? It should prevent clearing MASKALL if INTx isn't > > disabled, unless I missed something? But also, it will allow enabling > > MSI-X with MASKALL=1 together with MSI, which I'm not sure if should be > > allowed. > > While it would probably be okay with what we have now (after your earlier > patch introducing allowed_bits), it's likely not going to be correct once > further bits would be added to allowed_bits (which clearly is going to be > wanted sooner or later, e.g. for multi-vector MSI). Hence I think ... > > > Alternatively to the above patch, I could allow specifically setting > > MSIX_FLAGS_ENABLE + MSIX_FLAGS_MASKALL while INTx isn't disabled as a > > single exception, > > ... this is the way to go, and ... Ok, I'll go this way then. > > but at this point I'm not sure if some other driver or > > OS wouldn't approach this in yet another way. > > ... I guess we need to further add exceptions as needed - the one further > approach I could see is to keep all entry's mask bits set until setting > "INTx disable", without using MASKALL. > > I'd like to note though that the PCI spec has no such exception. It, > however, also doesn't mandate setting "INTx disable" - that's merely a > workaround for flawed hardware afaik. (In Xen we also only cross check > MSI and MSI-X. IOW we rely on Dom0 anyway when it comes to driving > "INTx disable".) Therefore in the end pciback looks to be going too far > in enforcing such dependencies. > > Thinking of this - what about making the change in > xen_pcibk_get_interrupt_type() instead, setting INTERRUPT_TYPE_MSIX only > when MASKALL is clear (alongside ENABLE being set)? This would then also > cover command_write(). That may be a good idea, but wouldn't cover the case here: xen_pcibk_get_interrupt_type() at this time returns INTERRUPT_TYPE_INTX only, and setting MSIX_FLAGS_ENABLE is prevented.
On Wed, Nov 16, 2022 at 10:34 PM Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> wrote: > > On Wed, Nov 16, 2022 at 10:40:02PM +0100, Marek Marczykowski-Górecki wrote: > > On Wed, Nov 16, 2022 at 02:15:22PM -0500, Jason Andryuk wrote: > > > On Mon, Nov 14, 2022 at 2:21 PM Marek Marczykowski-Górecki > > > <marmarek@invisiblethingslab.com> wrote: > > > > > > > > The /dev/mem is used for two purposes: > > > > - reading PCI_MSIX_ENTRY_CTRL_MASKBIT > > > > - reading Pending Bit Array (PBA) > > > > > > > > The first one was originally done because when Xen did not send all > > > > vector ctrl writes to the device model, so QEMU might have outdated old > > > > register value. This has been changed in Xen, so QEMU can now use its > > > > cached value of the register instead. > > > > > > > > The Pending Bit Array (PBA) handling is for the case where it lives on > > > > the same page as the MSI-X table itself. Xen has been extended to handle > > > > this case too (as well as other registers that may live on those pages), > > > > so QEMU handling is not necessary anymore. > > > > > > > > Removing /dev/mem access is useful to work within stubdomain, and > > > > necessary when dom0 kernel runs in lockdown mode. > > > > > > > > Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> > > > > > > I put the Xen, QEMU, and xen-pciback patches into OpenXT and gave a > > > little test. When pci_permissive=0, iwlwifi fails to load its > > > firmware. With pci_permissive=1, it looks like MSI-X is enabled. (I > > > previously included your libxl allow_interrupt_control patch - that > > > seemed to get regular MSIs working prior to the MSI-X patches.) I > > > also removed the OpenXT equivalent of 0005-Disable-MSI-X-caps.patch. > > > I am testing with Linux 5.4.y, so that could be another factor. > > > > Can you confirm the allow_interrupt_control is set by libxl? Also, > > vanilla 5.4 doesn't have the allow_interrupt_control patch at all, and you > > may have an earlier version that had "allow_msi_enable" as the sysfs > > file name. I backported allow_interrupt_control to 5.4 and that is set properly. > Ok, I found what is wrong. Enabling MSI-X is refused, because INTx isn't > disabled at this point yet. And apparently I was testing this with > permissive=1... > > Linux does this: > https://github.com/torvalds/linux/blob/master/drivers/pci/msi/msi.c#L611 > In short: > 1. Enable MSI-X with MASKALL=1 > 2. Setup MSI-X table > 3. Disable INTx > 4. Set MASKALL=0 > > This patch on top should fix this: > ----8<---- > diff --git a/drivers/xen/xen-pciback/conf_space_capability.c b/drivers/xen/xen-pciback/conf_space_capability.c > index 097316a74126..f4c4381de76e 100644 > --- a/drivers/xen/xen-pciback/conf_space_capability.c > +++ b/drivers/xen/xen-pciback/conf_space_capability.c > @@ -235,7 +235,7 @@ static int msi_msix_flags_write(struct pci_dev *dev, int offset, u16 new_value, > (new_value ^ old_value) & ~field_config->allowed_bits) > return PCIBIOS_SET_FAILED; > > - if (new_value & field_config->enable_bit) { > + if ((new_value & field_config->allowed_bits) == field_config->enable_bit) { > /* don't allow enabling together with other interrupt types */ > int int_type = xen_pcibk_get_interrupt_type(dev); > > ----8<---- FWIW, I can confirm this allows enabling MSI-X with permissive=0 for me. Thanks, Jason
diff --git a/hw/xen/xen_pt.h b/hw/xen/xen_pt.h index e7c4316a7d..de4094e7ec 100644 --- a/hw/xen/xen_pt.h +++ b/hw/xen/xen_pt.h @@ -206,7 +206,6 @@ typedef struct XenPTMSIX { uint32_t table_offset_adjust; /* page align mmap */ uint64_t mmio_base_addr; MemoryRegion mmio; - void *phys_iomem_base; XenPTMSIXEntry msix_entry[]; } XenPTMSIX; diff --git a/hw/xen/xen_pt_msi.c b/hw/xen/xen_pt_msi.c index b71563f98a..a8a75dff66 100644 --- a/hw/xen/xen_pt_msi.c +++ b/hw/xen/xen_pt_msi.c @@ -460,15 +460,7 @@ static void pci_msix_write(void *opaque, hwaddr addr, entry->updated = true; } else if (msix->enabled && entry->updated && !(val & PCI_MSIX_ENTRY_CTRL_MASKBIT)) { - const volatile uint32_t *vec_ctrl; - - /* - * If Xen intercepts the mask bit access, entry->vec_ctrl may not be - * up-to-date. Read from hardware directly. - */ - vec_ctrl = s->msix->phys_iomem_base + entry_nr * PCI_MSIX_ENTRY_SIZE - + PCI_MSIX_ENTRY_VECTOR_CTRL; - xen_pt_msix_update_one(s, entry_nr, *vec_ctrl); + xen_pt_msix_update_one(s, entry_nr, entry->latch(VECTOR_CTRL)); } set_entry_value(entry, offset, val); @@ -493,7 +485,9 @@ static uint64_t pci_msix_read(void *opaque, hwaddr addr, return get_entry_value(&msix->msix_entry[entry_nr], offset); } else { /* Pending Bit Array (PBA) */ - return *(uint32_t *)(msix->phys_iomem_base + addr); + XEN_PT_LOG(&s->dev, "reading PBA, addr %#lx, offset %#lx\n", + addr, addr - msix->total_entries * PCI_MSIX_ENTRY_SIZE); + return 0xFFFFFFFF; } } @@ -529,7 +523,6 @@ int xen_pt_msix_init(XenPCIPassthroughState *s, uint32_t base) int i, total_entries, bar_index; XenHostPCIDevice *hd = &s->real_device; PCIDevice *d = &s->dev; - int fd = -1; XenPTMSIX *msix = NULL; int rc = 0; @@ -576,34 +569,6 @@ int xen_pt_msix_init(XenPCIPassthroughState *s, uint32_t base) msix->table_base = s->real_device.io_regions[bar_index].base_addr; XEN_PT_LOG(d, "get MSI-X table BAR base 0x%"PRIx64"\n", msix->table_base); - fd = open("/dev/mem", O_RDWR); - if (fd == -1) { - rc = -errno; - XEN_PT_ERR(d, "Can't open /dev/mem: %s\n", strerror(errno)); - goto error_out; - } - XEN_PT_LOG(d, "table_off = 0x%x, total_entries = %d\n", - table_off, total_entries); - msix->table_offset_adjust = table_off & 0x0fff; - msix->phys_iomem_base = - mmap(NULL, - total_entries * PCI_MSIX_ENTRY_SIZE + msix->table_offset_adjust, - PROT_READ, - MAP_SHARED | MAP_LOCKED, - fd, - msix->table_base + table_off - msix->table_offset_adjust); - close(fd); - if (msix->phys_iomem_base == MAP_FAILED) { - rc = -errno; - XEN_PT_ERR(d, "Can't map physical MSI-X table: %s\n", strerror(errno)); - goto error_out; - } - msix->phys_iomem_base = (char *)msix->phys_iomem_base - + msix->table_offset_adjust; - - XEN_PT_LOG(d, "mapping physical MSI-X table to %p\n", - msix->phys_iomem_base); - memory_region_add_subregion_overlap(&s->bar[bar_index], table_off, &msix->mmio, 2); /* Priority: pci default + 1 */ @@ -624,14 +589,6 @@ void xen_pt_msix_unmap(XenPCIPassthroughState *s) return; } - /* unmap the MSI-X memory mapped register area */ - if (msix->phys_iomem_base) { - XEN_PT_LOG(&s->dev, "unmapping physical MSI-X table from %p\n", - msix->phys_iomem_base); - munmap(msix->phys_iomem_base, msix->total_entries * PCI_MSIX_ENTRY_SIZE - + msix->table_offset_adjust); - } - memory_region_del_subregion(&s->bar[msix->bar_index], &msix->mmio); }
The /dev/mem is used for two purposes: - reading PCI_MSIX_ENTRY_CTRL_MASKBIT - reading Pending Bit Array (PBA) The first one was originally done because when Xen did not send all vector ctrl writes to the device model, so QEMU might have outdated old register value. This has been changed in Xen, so QEMU can now use its cached value of the register instead. The Pending Bit Array (PBA) handling is for the case where it lives on the same page as the MSI-X table itself. Xen has been extended to handle this case too (as well as other registers that may live on those pages), so QEMU handling is not necessary anymore. Removing /dev/mem access is useful to work within stubdomain, and necessary when dom0 kernel runs in lockdown mode. Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> --- hw/xen/xen_pt.h | 1 - hw/xen/xen_pt_msi.c | 51 ++++----------------------------------------- 2 files changed, 4 insertions(+), 48 deletions(-)