Message ID | 6984a8571dac35d04c85117834d99b00fe1c4184.1680752649.git-series.marmarek@invisiblethingslab.com (mailing list archive) |
---|---|
State | New, archived |
Series | MSI-X support with qemu in stubdomain, and other related changes |
On 06.04.2023 05:57, Marek Marczykowski-Górecki wrote:
> Some firmware/devices are found to not reset MSI-X properly, leaving
> MASKALL set. Jason reports on his machine MASKALL persists through a
> warm reboot, but is cleared on cold boot. Xen relies on initial state
> being MASKALL clear. Especially, pci_reset_msix_state() assumes if
> MASKALL is set, it was Xen setting it due to msix->host_maskall or
> msix->guest_maskall. Clearing just MASKALL might be unsafe if ENABLE is
> set, so clear them both.
>
> Reported-by: Jason Andryuk <jandryuk@gmail.com>
> Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>
albeit with a couple of nits (which I'd be happy to address while
committing, so long as you agree). First one being on the last
sentence above: It's surely not just "might"; if resetting already
doesn't work right, nothing says that the individual mask bit all
end up set. Clearing ENABLE as well is only natural imo, if we
already need to fix up after firmware. So maybe "Even if so far not
observed to be left set, clear ENABLE as well"?

> --- a/xen/drivers/passthrough/msi.c
> +++ b/xen/drivers/passthrough/msi.c
> @@ -46,6 +46,23 @@ int pdev_msi_init(struct pci_dev *pdev)
>          spin_lock_init(&msix->table_lock);
>
>          ctrl = pci_conf_read16(pdev->sbdf, msix_control_reg(pos));
> +
> +        if ( ctrl & (PCI_MSIX_FLAGS_MASKALL|PCI_MSIX_FLAGS_ENABLE) )

Style (missing blanks around |; once more below).

> +        {
> +            /*
> +             * pci_reset_msix_state() relies on MASKALL not being set
> +             * initially, clear it (and ENABLE too - for safety), to meet that
> +             * expectation.
> +             */
> +            printk(XENLOG_WARNING
> +                   "%pp: unexpected initial MSI-X state (MASKALL=%d, ENABLE=%d), fixing\n",
> +                   &pdev->sbdf,
> +                   (ctrl & PCI_MSIX_FLAGS_MASKALL) ? 1 : 0,
> +                   (ctrl & PCI_MSIX_FLAGS_ENABLE) ? 1 : 0);

Our "canonical" way of dealing with this is !!(x & y).

> +            ctrl &= ~(PCI_MSIX_FLAGS_ENABLE|PCI_MSIX_FLAGS_MASKALL);
> +            pci_conf_write16(pdev->sbdf, msix_control_reg(pos), ctrl);
> +        }
> +
>          msix->nr_entries = msix_table_size(ctrl);
>
>          pdev->msix = msix;

Aiui there's no dependency here on the earlier patches in the series;
please confirm (or otherwise).

Jason - any chance of getting a Tested-by: from you?

Jan
On Mon, Apr 24, 2023 at 10:19 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 06.04.2023 05:57, Marek Marczykowski-Górecki wrote:
> > Some firmware/devices are found to not reset MSI-X properly, leaving
> > MASKALL set. Jason reports on his machine MASKALL persists through a
> > warm reboot, but is cleared on cold boot. Xen relies on initial state
> > being MASKALL clear. Especially, pci_reset_msix_state() assumes if
> > MASKALL is set, it was Xen setting it due to msix->host_maskall or
> > msix->guest_maskall. Clearing just MASKALL might be unsafe if ENABLE is
> > set, so clear them both.
> >
> > Reported-by: Jason Andryuk <jandryuk@gmail.com>
> > Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> albeit with a couple of nits (which I'd be happy to address while
> committing, so long as you agree). First one being on the last
> sentence above: It's surely not just "might"; if resetting already
> doesn't work right, nothing says that the individual mask bit all
> end up set. Clearing ENABLE as well is only natural imo, if we
> already need to fix up after firmware. So maybe "Even if so far not
> observed to be left set, clear ENABLE as well"?
>
> > --- a/xen/drivers/passthrough/msi.c
> > +++ b/xen/drivers/passthrough/msi.c
> > @@ -46,6 +46,23 @@ int pdev_msi_init(struct pci_dev *pdev)
> >          spin_lock_init(&msix->table_lock);
> >
> >          ctrl = pci_conf_read16(pdev->sbdf, msix_control_reg(pos));
> > +
> > +        if ( ctrl & (PCI_MSIX_FLAGS_MASKALL|PCI_MSIX_FLAGS_ENABLE) )
>
> Style (missing blanks around |; once more below).
>
> > +        {
> > +            /*
> > +             * pci_reset_msix_state() relies on MASKALL not being set
> > +             * initially, clear it (and ENABLE too - for safety), to meet that
> > +             * expectation.
> > +             */
> > +            printk(XENLOG_WARNING
> > +                   "%pp: unexpected initial MSI-X state (MASKALL=%d, ENABLE=%d), fixing\n",
> > +                   &pdev->sbdf,
> > +                   (ctrl & PCI_MSIX_FLAGS_MASKALL) ? 1 : 0,
> > +                   (ctrl & PCI_MSIX_FLAGS_ENABLE) ? 1 : 0);
>
> Our "canonical" way of dealing with this is !!(x & y).
>
> > +            ctrl &= ~(PCI_MSIX_FLAGS_ENABLE|PCI_MSIX_FLAGS_MASKALL);
> > +            pci_conf_write16(pdev->sbdf, msix_control_reg(pos), ctrl);
> > +        }
> > +
> >          msix->nr_entries = msix_table_size(ctrl);
> >
> >          pdev->msix = msix;
>
>
> Aiui there's no dependency here on the earlier patches in the series;
> please confirm (or otherwise).
>
> Jason - any chance of getting a Tested-by: from you?

I'm building v3 now. v2 worked for clearing MASKALL on initial boot.

I posted in these two messages - a summary is below.
https://lore.kernel.org/xen-devel/CAKf6xpto87QRSKT2qc1yApNfaw2SrLLxPoytYJv_jEbYTAbjCg@mail.gmail.com/
https://lore.kernel.org/xen-devel/CAKf6xptHALLR-Qjf=p5y0o9Ud2V7eFMJuB8Ap-PLjv-N7PAJVQ@mail.gmail.com/

OpenXT has a patch that performs an extra reset after domain shutdown,
and that causes Xen to set MASKALL. I confirmed by removing it. So
this patch helps with clearing MASKALL on host boot, but with the
OpenXT patch, rebooting a domain fails. MASKALL gets set on VM
shutdown and then the subsequent boot can't assign the device.

So this patch is helpful in some scenarios, but it was also an issue
caused by the OpenXT patch. Does that make it unsuitable for
inclusion? I assume the OpenXT patch wasn't an issue previously since
MSI-X was never enabled.

Regards,
Jason
On 24.04.2023 17:25, Jason Andryuk wrote:
> On Mon, Apr 24, 2023 at 10:19 AM Jan Beulich <jbeulich@suse.com> wrote:
>> Jason - any chance of getting a Tested-by: from you?
>
> I'm building v3 now. v2 worked for clearing MASKALL on initial boot.
>
> I posted in these two messages - a summary is below.
> https://lore.kernel.org/xen-devel/CAKf6xpto87QRSKT2qc1yApNfaw2SrLLxPoytYJv_jEbYTAbjCg@mail.gmail.com/
> https://lore.kernel.org/xen-devel/CAKf6xptHALLR-Qjf=p5y0o9Ud2V7eFMJuB8Ap-PLjv-N7PAJVQ@mail.gmail.com/
>
> OpenXT has a patch that performs an extra reset after domain shutdown,
> and that causes Xen to set MASKALL. I confirmed by removing it. So
> this patch helps with clearing MASKALL on host boot, but with the
> OpenXT patch, rebooting a domain fails. MASKALL gets set on VM
> shutdown and then the subsequent boot can't assign the device.
>
> So this patch is helpful in some scenarios, but it was also an issue
> caused by the OpenXT patch. Does that make it unsuitable for
> inclusion?

What is "it" here? If I get your reply right, there is a similar issue
left unaddressed by this version of the change (and as was said before,
a device reset changing state that Xen tracks or otherwise cares about
needs to be reported to Xen). Yet that doesn't really fit with the
question, at least the way I read it ...

Jan

> I assume the OpenXT patch wasn't an issue previously since
> MSI-X was never enabled.
>
> Regards,
> Jason
On Mon, Apr 24, 2023 at 11:25:01AM -0400, Jason Andryuk wrote:
> On Mon, Apr 24, 2023 at 10:19 AM Jan Beulich <jbeulich@suse.com> wrote:
> >
> > On 06.04.2023 05:57, Marek Marczykowski-Górecki wrote:
> > > Some firmware/devices are found to not reset MSI-X properly, leaving
> > > MASKALL set. Jason reports on his machine MASKALL persists through a
> > > warm reboot, but is cleared on cold boot. Xen relies on initial state
> > > being MASKALL clear. Especially, pci_reset_msix_state() assumes if
> > > MASKALL is set, it was Xen setting it due to msix->host_maskall or
> > > msix->guest_maskall. Clearing just MASKALL might be unsafe if ENABLE is
> > > set, so clear them both.
> > >
> > > Reported-by: Jason Andryuk <jandryuk@gmail.com>
> > > Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> >
> > Reviewed-by: Jan Beulich <jbeulich@suse.com>
> > albeit with a couple of nits (which I'd be happy to address while
> > committing, so long as you agree).

Yes, thanks!

> > First one being on the last
> > sentence above: It's surely not just "might"; if resetting already
> > doesn't work right, nothing says that the individual mask bit all
> > end up set. Clearing ENABLE as well is only natural imo, if we
> > already need to fix up after firmware. So maybe "Even if so far not
> > observed to be left set, clear ENABLE as well"?
> >
> > > --- a/xen/drivers/passthrough/msi.c
> > > +++ b/xen/drivers/passthrough/msi.c
> > > @@ -46,6 +46,23 @@ int pdev_msi_init(struct pci_dev *pdev)
> > >          spin_lock_init(&msix->table_lock);
> > >
> > >          ctrl = pci_conf_read16(pdev->sbdf, msix_control_reg(pos));
> > > +
> > > +        if ( ctrl & (PCI_MSIX_FLAGS_MASKALL|PCI_MSIX_FLAGS_ENABLE) )
> >
> > Style (missing blanks around |; once more below).
> >
> > > +        {
> > > +            /*
> > > +             * pci_reset_msix_state() relies on MASKALL not being set
> > > +             * initially, clear it (and ENABLE too - for safety), to meet that
> > > +             * expectation.
> > > +             */
> > > +            printk(XENLOG_WARNING
> > > +                   "%pp: unexpected initial MSI-X state (MASKALL=%d, ENABLE=%d), fixing\n",
> > > +                   &pdev->sbdf,
> > > +                   (ctrl & PCI_MSIX_FLAGS_MASKALL) ? 1 : 0,
> > > +                   (ctrl & PCI_MSIX_FLAGS_ENABLE) ? 1 : 0);
> >
> > Our "canonical" way of dealing with this is !!(x & y).
> >
> > > +            ctrl &= ~(PCI_MSIX_FLAGS_ENABLE|PCI_MSIX_FLAGS_MASKALL);
> > > +            pci_conf_write16(pdev->sbdf, msix_control_reg(pos), ctrl);
> > > +        }
> > > +
> > >          msix->nr_entries = msix_table_size(ctrl);
> > >
> > >          pdev->msix = msix;
> >
> >
> > Aiui there's no dependency here on the earlier patches in the series;
> > please confirm (or otherwise).

Indeed. An earlier patch uncovered a firmware (or such) issue on some
systems and this patch deals with it, but it doesn't depend on earlier
patches.

> > Jason - any chance of getting a Tested-by: from you?
>
> I'm building v3 now. v2 worked for clearing MASKALL on initial boot.
>
> I posted in these two messages - a summary is below.
> https://lore.kernel.org/xen-devel/CAKf6xpto87QRSKT2qc1yApNfaw2SrLLxPoytYJv_jEbYTAbjCg@mail.gmail.com/
> https://lore.kernel.org/xen-devel/CAKf6xptHALLR-Qjf=p5y0o9Ud2V7eFMJuB8Ap-PLjv-N7PAJVQ@mail.gmail.com/
>
> OpenXT has a patch that performs an extra reset after domain shutdown,
> and that causes Xen to set MASKALL. I confirmed by removing it. So
> this patch helps with clearing MASKALL on host boot, but with the
> OpenXT patch, rebooting a domain fails. MASKALL gets set on VM
> shutdown and then the subsequent boot can't assign the device.
>
> So this patch is helpful in some scenarios, but it was also an issue
> caused by the OpenXT patch. Does that make it unsuitable for
> inclusion? I assume the OpenXT patch wasn't an issue previously since
> MSI-X was never enabled.

Upstream Xen IMO should deal with the state it gets on boot, regardless
of what was running previously (the actual issue is likely in firmware
or device itself, that it doesn't clear that bit, but well...).
So, rebooting from OpenXT, into vanilla upstream Xen should result in
fully functional system. That's why I included this patch, but haven't
dealt with an issue caused by OpenXT patch on subsequent domain
startups (as it doesn't apply to the upstream code base).
On Mon, Apr 24, 2023 at 11:30 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 24.04.2023 17:25, Jason Andryuk wrote:
> > On Mon, Apr 24, 2023 at 10:19 AM Jan Beulich <jbeulich@suse.com> wrote:
> >> Jason - any chance of getting a Tested-by: from you?
> >
> > I'm building v3 now. v2 worked for clearing MASKALL on initial boot.
> >
> > I posted in these two messages - a summary is below.
> > https://lore.kernel.org/xen-devel/CAKf6xpto87QRSKT2qc1yApNfaw2SrLLxPoytYJv_jEbYTAbjCg@mail.gmail.com/
> > https://lore.kernel.org/xen-devel/CAKf6xptHALLR-Qjf=p5y0o9Ud2V7eFMJuB8Ap-PLjv-N7PAJVQ@mail.gmail.com/
> >
> > OpenXT has a patch that performs an extra reset after domain shutdown,
> > and that causes Xen to set MASKALL. I confirmed by removing it. So
> > this patch helps with clearing MASKALL on host boot, but with the
> > OpenXT patch, rebooting a domain fails. MASKALL gets set on VM
> > shutdown and then the subsequent boot can't assign the device.
> >
> > So this patch is helpful in some scenarios, but it was also an issue
> > caused by the OpenXT patch. Does that make it unsuitable for
> > inclusion?
>
> What is "it" here? If I get your reply right, there is a similar issue
> left unaddressed by this version of the change (and as was said before,
> a device reset changing state that Xen tracks or otherwise cares about
> needs to be reported to Xen). Yet that doesn't really fit with the
> question, at least the way I read it ...

"So this patch is helpful in some scenarios, but setting MASKALL in
the first place is an issue caused by the OpenXT patch. Does that
make this patch unsuitable for inclusion?"

I think Marek's response that "Xen IMO should deal with the state it
gets on boot, regardless of what was running previously" makes sense
and means this is worthy of inclusion.

And I tested it.

Without the OpenXT libxl-fix-flr.patch:
(XEN) 0000:00:14.3: unexpected initial MSI-X state (MASKALL=0, ENABLE=1), fixing

With the OpenXT patch:
(XEN) 0000:00:14.3: unexpected initial MSI-X state (MASKALL=1, ENABLE=1), fixing

Tested-by: Jason Andryuk <jandryuk@gmail.com>

The patch is here if anyone want to look:
https://github.com/OpenXT/xenclient-oe/blob/master/recipes-extended/xen/files/libxl-fix-flr.patch

It's calling libxl__device_pci_reset() from destroy_finish_check(), so
it's not trying to do anything behind Xen's back. It's just that Xen
sees memory decoding disabled, and then sets MASKALL.

Regards,
Jason
diff --git a/xen/drivers/passthrough/msi.c b/xen/drivers/passthrough/msi.c
index ce1a450f6f4a..c9f7eac29ebf 100644
--- a/xen/drivers/passthrough/msi.c
+++ b/xen/drivers/passthrough/msi.c
@@ -46,6 +46,23 @@ int pdev_msi_init(struct pci_dev *pdev)
         spin_lock_init(&msix->table_lock);
 
         ctrl = pci_conf_read16(pdev->sbdf, msix_control_reg(pos));
+
+        if ( ctrl & (PCI_MSIX_FLAGS_MASKALL|PCI_MSIX_FLAGS_ENABLE) )
+        {
+            /*
+             * pci_reset_msix_state() relies on MASKALL not being set
+             * initially, clear it (and ENABLE too - for safety), to meet that
+             * expectation.
+             */
+            printk(XENLOG_WARNING
+                   "%pp: unexpected initial MSI-X state (MASKALL=%d, ENABLE=%d), fixing\n",
+                   &pdev->sbdf,
+                   (ctrl & PCI_MSIX_FLAGS_MASKALL) ? 1 : 0,
+                   (ctrl & PCI_MSIX_FLAGS_ENABLE) ? 1 : 0);
+            ctrl &= ~(PCI_MSIX_FLAGS_ENABLE|PCI_MSIX_FLAGS_MASKALL);
+            pci_conf_write16(pdev->sbdf, msix_control_reg(pos), ctrl);
+        }
+
         msix->nr_entries = msix_table_size(ctrl);
 
         pdev->msix = msix;
Some firmware/devices are found to not reset MSI-X properly, leaving
MASKALL set. Jason reports on his machine MASKALL persists through a
warm reboot, but is cleared on cold boot. Xen relies on initial state
being MASKALL clear. Especially, pci_reset_msix_state() assumes if
MASKALL is set, it was Xen setting it due to msix->host_maskall or
msix->guest_maskall. Clearing just MASKALL might be unsafe if ENABLE is
set, so clear them both.

Reported-by: Jason Andryuk <jandryuk@gmail.com>
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
---
v3:
 - update comment
 - clear bits only when they were set
---
 xen/drivers/passthrough/msi.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)
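For illustration only, here is a minimal user-space sketch of the fixup logic with the two review nits applied (blanks around `|`, and `!!(x & y)` instead of `? 1 : 0`). It is not the code that was committed. The helper name `msix_fixup_ctrl()` is hypothetical, `printf()` stands in for Xen's `printk()`, and the two flag values are defined locally following the MSI-X Message Control layout in the PCI specification (bit 15 is MSI-X Enable, bit 14 is Function Mask).

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* MSI-X Message Control bits, defined locally for this sketch. */
#define PCI_MSIX_FLAGS_ENABLE  0x8000u /* bit 15: MSI-X Enable */
#define PCI_MSIX_FLAGS_MASKALL 0x4000u /* bit 14: Function Mask */

/*
 * Hypothetical helper, not part of Xen: takes the value read from the
 * MSI-X control register and returns the value that should be written
 * back. The bits are only touched when at least one of them is set,
 * matching the "clear bits only when they were set" note in v3.
 */
static uint16_t msix_fixup_ctrl(uint16_t ctrl)
{
    if ( ctrl & (PCI_MSIX_FLAGS_MASKALL | PCI_MSIX_FLAGS_ENABLE) )
    {
        /* !!(x & y) normalises any non-zero masked value to exactly 1. */
        printf("unexpected initial MSI-X state (MASKALL=%d, ENABLE=%d), fixing\n",
               !!(ctrl & PCI_MSIX_FLAGS_MASKALL),
               !!(ctrl & PCI_MSIX_FLAGS_ENABLE));
        ctrl &= ~(PCI_MSIX_FLAGS_ENABLE | PCI_MSIX_FLAGS_MASKALL);
    }

    /* Postcondition: both bits are clear, the other bits are untouched. */
    assert(!(ctrl & (PCI_MSIX_FLAGS_MASKALL | PCI_MSIX_FLAGS_ENABLE)));

    return ctrl;
}
```

Calling `msix_fixup_ctrl(0xC007)` models the case Jason observed with the OpenXT patch (both bits set) and returns `0x0007`. The low bits, which hold the table size, pass through unchanged, so computing the number of entries after the fixup is unaffected.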