diff mbox series

[v3,4/4] x86/msi: clear initial MSI-X state on boot

Message ID 6984a8571dac35d04c85117834d99b00fe1c4184.1680752649.git-series.marmarek@invisiblethingslab.com (mailing list archive)
State New, archived
Headers show
Series MSI-X support with qemu in stubdomain, and other related changes | expand

Commit Message

Marek Marczykowski-Górecki April 6, 2023, 3:57 a.m. UTC
Some firmware/devices are found to not reset MSI-X properly, leaving
MASKALL set. Jason reports on his machine MASKALL persists through a
warm reboot, but is cleared on cold boot. Xen relies on initial state
being MASKALL clear. Especially, pci_reset_msix_state() assumes if
MASKALL is set, it was Xen setting it due to msix->host_maskall or
msix->guest_maskall. Clearing just MASKALL might be unsafe if ENABLE is
set, so clear them both.

Reported-by: Jason Andryuk <jandryuk@gmail.com>
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
---
v3:
 - update comment
 - clear bits only when they were set
---
 xen/drivers/passthrough/msi.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

Comments

Jan Beulich April 24, 2023, 2:19 p.m. UTC | #1
On 06.04.2023 05:57, Marek Marczykowski-Górecki wrote:
> Some firmware/devices are found to not reset MSI-X properly, leaving
> MASKALL set. Jason reports on his machine MASKALL persists through a
> warm reboot, but is cleared on cold boot. Xen relies on initial state
> being MASKALL clear. Especially, pci_reset_msix_state() assumes if
> MASKALL is set, it was Xen setting it due to msix->host_maskall or
> msix->guest_maskall. Clearing just MASKALL might be unsafe if ENABLE is
> set, so clear them both.
> 
> Reported-by: Jason Andryuk <jandryuk@gmail.com>
> Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>
albeit with a couple of nits (which I'd be happy to address while
committing, so long as you agree). First one being on the last
sentence above: It's surely not just "might"; if resetting already
doesn't work right, nothing says that the individual mask bit all
end up set. Clearing ENABLE as well is only natural imo, if we
already need to fix up after firmware. So maybe "Even if so far not
observed to be left set, clear ENABLE as well"?

> --- a/xen/drivers/passthrough/msi.c
> +++ b/xen/drivers/passthrough/msi.c
> @@ -46,6 +46,23 @@ int pdev_msi_init(struct pci_dev *pdev)
>          spin_lock_init(&msix->table_lock);
>  
>          ctrl = pci_conf_read16(pdev->sbdf, msix_control_reg(pos));
> +
> +        if ( ctrl & (PCI_MSIX_FLAGS_MASKALL|PCI_MSIX_FLAGS_ENABLE) )

Style (missing blanks around |; once more below).

> +        {
> +            /*
> +             * pci_reset_msix_state() relies on MASKALL not being set
> +             * initially, clear it (and ENABLE too - for safety), to meet that
> +             * expectation.
> +             */
> +            printk(XENLOG_WARNING
> +                   "%pp: unexpected initial MSI-X state (MASKALL=%d, ENABLE=%d), fixing\n",
> +                   &pdev->sbdf,
> +                   (ctrl & PCI_MSIX_FLAGS_MASKALL) ? 1 : 0,
> +                   (ctrl & PCI_MSIX_FLAGS_ENABLE) ? 1 : 0);

Our "canonical" way of dealing with this is !!(x & y).

> +            ctrl &= ~(PCI_MSIX_FLAGS_ENABLE|PCI_MSIX_FLAGS_MASKALL);
> +            pci_conf_write16(pdev->sbdf, msix_control_reg(pos), ctrl);
> +        }
> +
>          msix->nr_entries = msix_table_size(ctrl);
>  
>          pdev->msix = msix;


Aiui there's no dependency here on the earlier patches in the series;
please confirm (or otherwise).

Jason - any chance of getting a Tested-by: from you?

Jan
Jason Andryuk April 24, 2023, 3:25 p.m. UTC | #2
On Mon, Apr 24, 2023 at 10:19 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 06.04.2023 05:57, Marek Marczykowski-Górecki wrote:
> > Some firmware/devices are found to not reset MSI-X properly, leaving
> > MASKALL set. Jason reports on his machine MASKALL persists through a
> > warm reboot, but is cleared on cold boot. Xen relies on initial state
> > being MASKALL clear. Especially, pci_reset_msix_state() assumes if
> > MASKALL is set, it was Xen setting it due to msix->host_maskall or
> > msix->guest_maskall. Clearing just MASKALL might be unsafe if ENABLE is
> > set, so clear them both.
> >
> > Reported-by: Jason Andryuk <jandryuk@gmail.com>
> > Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> albeit with a couple of nits (which I'd be happy to address while
> committing, so long as you agree). First one being on the last
> sentence above: It's surely not just "might"; if resetting already
> doesn't work right, nothing says that the individual mask bit all
> end up set. Clearing ENABLE as well is only natural imo, if we
> already need to fix up after firmware. So maybe "Even if so far not
> observed to be left set, clear ENABLE as well"?
>
> > --- a/xen/drivers/passthrough/msi.c
> > +++ b/xen/drivers/passthrough/msi.c
> > @@ -46,6 +46,23 @@ int pdev_msi_init(struct pci_dev *pdev)
> >          spin_lock_init(&msix->table_lock);
> >
> >          ctrl = pci_conf_read16(pdev->sbdf, msix_control_reg(pos));
> > +
> > +        if ( ctrl & (PCI_MSIX_FLAGS_MASKALL|PCI_MSIX_FLAGS_ENABLE) )
>
> Style (missing blanks around |; once more below).
>
> > +        {
> > +            /*
> > +             * pci_reset_msix_state() relies on MASKALL not being set
> > +             * initially, clear it (and ENABLE too - for safety), to meet that
> > +             * expectation.
> > +             */
> > +            printk(XENLOG_WARNING
> > +                   "%pp: unexpected initial MSI-X state (MASKALL=%d, ENABLE=%d), fixing\n",
> > +                   &pdev->sbdf,
> > +                   (ctrl & PCI_MSIX_FLAGS_MASKALL) ? 1 : 0,
> > +                   (ctrl & PCI_MSIX_FLAGS_ENABLE) ? 1 : 0);
>
> Our "canonical" way of dealing with this is !!(x & y).
>
> > +            ctrl &= ~(PCI_MSIX_FLAGS_ENABLE|PCI_MSIX_FLAGS_MASKALL);
> > +            pci_conf_write16(pdev->sbdf, msix_control_reg(pos), ctrl);
> > +        }
> > +
> >          msix->nr_entries = msix_table_size(ctrl);
> >
> >          pdev->msix = msix;
>
>
> Aiui there's no dependency here on the earlier patches in the series;
> please confirm (or otherwise).
>
> Jason - any chance of getting a Tested-by: from you?

I'm building v3 now.  v2  worked for clearing MASKALL on initial boot.

I posted in these two messages - a summary is below.
https://lore.kernel.org/xen-devel/CAKf6xpto87QRSKT2qc1yApNfaw2SrLLxPoytYJv_jEbYTAbjCg@mail.gmail.com/
https://lore.kernel.org/xen-devel/CAKf6xptHALLR-Qjf=p5y0o9Ud2V7eFMJuB8Ap-PLjv-N7PAJVQ@mail.gmail.com/

OpenXT has a patch that performs an extra reset after domain shutdown,
and that causes Xen to set MASKALL.  I confirmed by removing it.  So
this patch helps with clearing MASKALL on host boot, but with the
OpenXT patch, rebooting a domain fails.  MASKALL gets set on VM
shutdown and then the subsequent boot can't assign the device.

So this patch is helpful in some scenarios, but it was also an issue
caused by the OpenXT patch.  Does that make it unsuitable for
inclusion?  I assume the OpenXT patch wasn't an issue previously since
MSI-X was never enabled.

Regards,
Jason
Jan Beulich April 24, 2023, 3:30 p.m. UTC | #3
On 24.04.2023 17:25, Jason Andryuk wrote:
> On Mon, Apr 24, 2023 at 10:19 AM Jan Beulich <jbeulich@suse.com> wrote:
>> Jason - any chance of getting a Tested-by: from you?
> 
> I'm building v3 now.  v2  worked for clearing MASKALL on initial boot.
> 
> I posted in these two messages - a summary is below.
> https://lore.kernel.org/xen-devel/CAKf6xpto87QRSKT2qc1yApNfaw2SrLLxPoytYJv_jEbYTAbjCg@mail.gmail.com/
> https://lore.kernel.org/xen-devel/CAKf6xptHALLR-Qjf=p5y0o9Ud2V7eFMJuB8Ap-PLjv-N7PAJVQ@mail.gmail.com/
> 
> OpenXT has a patch that performs an extra reset after domain shutdown,
> and that causes Xen to set MASKALL.  I confirmed by removing it.  So
> this patch helps with clearing MASKALL on host boot, but with the
> OpenXT patch, rebooting a domain fails.  MASKALL gets set on VM
> shutdown and then the subsequent boot can't assign the device.
> 
> So this patch is helpful in some scenarios, but it was also an issue
> caused by the OpenXT patch.  Does that make it unsuitable for
> inclusion?

What is "it" here? If I get your reply right, there is a similar issue
left unaddressed by this version of the change (and as was said before,
a device reset changing state that Xen tracks or otherwise cares about
needs to be reported to Xen). Yet that doesn't really fit with the
question, at least the way I read it ...

Jan

>  I assume the OpenXT patch wasn't an issue previously since
> MSI-X was never enabled.
> 
> Regards,
> Jason
Marek Marczykowski-Górecki April 24, 2023, 3:34 p.m. UTC | #4
On Mon, Apr 24, 2023 at 11:25:01AM -0400, Jason Andryuk wrote:
> On Mon, Apr 24, 2023 at 10:19 AM Jan Beulich <jbeulich@suse.com> wrote:
> >
> > On 06.04.2023 05:57, Marek Marczykowski-Górecki wrote:
> > > Some firmware/devices are found to not reset MSI-X properly, leaving
> > > MASKALL set. Jason reports on his machine MASKALL persists through a
> > > warm reboot, but is cleared on cold boot. Xen relies on initial state
> > > being MASKALL clear. Especially, pci_reset_msix_state() assumes if
> > > MASKALL is set, it was Xen setting it due to msix->host_maskall or
> > > msix->guest_maskall. Clearing just MASKALL might be unsafe if ENABLE is
> > > set, so clear them both.
> > >
> > > Reported-by: Jason Andryuk <jandryuk@gmail.com>
> > > Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> >
> > Reviewed-by: Jan Beulich <jbeulich@suse.com>
> > albeit with a couple of nits (which I'd be happy to address while
> > committing, so long as you agree). 

Yes, thanks!

> > First one being on the last
> > sentence above: It's surely not just "might"; if resetting already
> > doesn't work right, nothing says that the individual mask bit all
> > end up set. Clearing ENABLE as well is only natural imo, if we
> > already need to fix up after firmware. So maybe "Even if so far not
> > observed to be left set, clear ENABLE as well"?
> >
> > > --- a/xen/drivers/passthrough/msi.c
> > > +++ b/xen/drivers/passthrough/msi.c
> > > @@ -46,6 +46,23 @@ int pdev_msi_init(struct pci_dev *pdev)
> > >          spin_lock_init(&msix->table_lock);
> > >
> > >          ctrl = pci_conf_read16(pdev->sbdf, msix_control_reg(pos));
> > > +
> > > +        if ( ctrl & (PCI_MSIX_FLAGS_MASKALL|PCI_MSIX_FLAGS_ENABLE) )
> >
> > Style (missing blanks around |; once more below).
> >
> > > +        {
> > > +            /*
> > > +             * pci_reset_msix_state() relies on MASKALL not being set
> > > +             * initially, clear it (and ENABLE too - for safety), to meet that
> > > +             * expectation.
> > > +             */
> > > +            printk(XENLOG_WARNING
> > > +                   "%pp: unexpected initial MSI-X state (MASKALL=%d, ENABLE=%d), fixing\n",
> > > +                   &pdev->sbdf,
> > > +                   (ctrl & PCI_MSIX_FLAGS_MASKALL) ? 1 : 0,
> > > +                   (ctrl & PCI_MSIX_FLAGS_ENABLE) ? 1 : 0);
> >
> > Our "canonical" way of dealing with this is !!(x & y).
> >
> > > +            ctrl &= ~(PCI_MSIX_FLAGS_ENABLE|PCI_MSIX_FLAGS_MASKALL);
> > > +            pci_conf_write16(pdev->sbdf, msix_control_reg(pos), ctrl);
> > > +        }
> > > +
> > >          msix->nr_entries = msix_table_size(ctrl);
> > >
> > >          pdev->msix = msix;
> >
> >
> > Aiui there's no dependency here on the earlier patches in the series;
> > please confirm (or otherwise).

Indeed. An earlier patch uncovered a firmware (or such) issue on some
systems and this patch deals with it, but it doesn't depend on earlier
patches.

> > Jason - any chance of getting a Tested-by: from you?
> 
> I'm building v3 now.  v2  worked for clearing MASKALL on initial boot.
> 
> I posted in these two messages - a summary is below.
> https://lore.kernel.org/xen-devel/CAKf6xpto87QRSKT2qc1yApNfaw2SrLLxPoytYJv_jEbYTAbjCg@mail.gmail.com/
> https://lore.kernel.org/xen-devel/CAKf6xptHALLR-Qjf=p5y0o9Ud2V7eFMJuB8Ap-PLjv-N7PAJVQ@mail.gmail.com/
> 
> OpenXT has a patch that performs an extra reset after domain shutdown,
> and that causes Xen to set MASKALL.  I confirmed by removing it.  So
> this patch helps with clearing MASKALL on host boot, but with the
> OpenXT patch, rebooting a domain fails.  MASKALL gets set on VM
> shutdown and then the subsequent boot can't assign the device.
> 
> So this patch is helpful in some scenarios, but it was also an issue
> caused by the OpenXT patch.  Does that make it unsuitable for
> inclusion?  I assume the OpenXT patch wasn't an issue previously since
> MSI-X was never enabled.

Upstream Xen IMO should deal with the state it gets on boot, regardless
of what was running previously (the actual issue is likely in firmware
or device itself, that it doesn't clear that bit, but well...). So,
rebooting from OpenXT, into vanilla upstream Xen should result in fully
functional system. That's why I included this patch, but haven't dealt
with an issue caused by OpenXT patch on subsequent domain startups (as
it doesn't apply to the upstream code base).
Jason Andryuk April 24, 2023, 4:42 p.m. UTC | #5
On Mon, Apr 24, 2023 at 11:30 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 24.04.2023 17:25, Jason Andryuk wrote:
> > On Mon, Apr 24, 2023 at 10:19 AM Jan Beulich <jbeulich@suse.com> wrote:
> >> Jason - any chance of getting a Tested-by: from you?
> >
> > I'm building v3 now.  v2  worked for clearing MASKALL on initial boot.
> >
> > I posted in these two messages - a summary is below.
> > https://lore.kernel.org/xen-devel/CAKf6xpto87QRSKT2qc1yApNfaw2SrLLxPoytYJv_jEbYTAbjCg@mail.gmail.com/
> > https://lore.kernel.org/xen-devel/CAKf6xptHALLR-Qjf=p5y0o9Ud2V7eFMJuB8Ap-PLjv-N7PAJVQ@mail.gmail.com/
> >
> > OpenXT has a patch that performs an extra reset after domain shutdown,
> > and that causes Xen to set MASKALL.  I confirmed by removing it.  So
> > this patch helps with clearing MASKALL on host boot, but with the
> > OpenXT patch, rebooting a domain fails.  MASKALL gets set on VM
> > shutdown and then the subsequent boot can't assign the device.
> >
> > So this patch is helpful in some scenarios, but it was also an issue
> > caused by the OpenXT patch.  Does that make it unsuitable for
> > inclusion?
>
> What is "it" here? If I get your reply right, there is a similar issue
> left unaddressed by this version of the change (and as was said before,
> a device reset changing state that Xen tracks or otherwise cares about
> needs to be reported to Xen). Yet that doesn't really fit with the
> question, at least the way I read it ...

"So this patch is helpful in some scenarios, but setting MASKALL in
the first place is an issue caused by the OpenXT patch.  Does that
make this patch unsuitable for inclusion?"

I think Marek's response that "Xen IMO should deal with the state it
gets on boot, regardless of what was running previously" makes sense
and means this is worthy of inclusion.

And I tested it.  Without the OpenXT libxl-fix-flr.patch:
(XEN) 0000:00:14.3: unexpected initial MSI-X state (MASKALL=0, ENABLE=1), fixing
With the OpenXT patch:
(XEN) 0000:00:14.3: unexpected initial MSI-X state (MASKALL=1, ENABLE=1), fixing

Tested-by: Jason Andryuk <jandryuk@gmail.com>

The patch is here if anyone want to look:
https://github.com/OpenXT/xenclient-oe/blob/master/recipes-extended/xen/files/libxl-fix-flr.patch

It's calling libxl__device_pci_reset() from destroy_finish_check(), so
it's not trying to do anything behind Xen's back.  It's just that Xen
sees memory decoding disabled, and then sets MASKALL.

Regards,
Jason
diff mbox series

Patch

diff --git a/xen/drivers/passthrough/msi.c b/xen/drivers/passthrough/msi.c
index ce1a450f6f4a..c9f7eac29ebf 100644
--- a/xen/drivers/passthrough/msi.c
+++ b/xen/drivers/passthrough/msi.c
@@ -46,6 +46,23 @@  int pdev_msi_init(struct pci_dev *pdev)
         spin_lock_init(&msix->table_lock);
 
         ctrl = pci_conf_read16(pdev->sbdf, msix_control_reg(pos));
+
+        if ( ctrl & (PCI_MSIX_FLAGS_MASKALL|PCI_MSIX_FLAGS_ENABLE) )
+        {
+            /*
+             * pci_reset_msix_state() relies on MASKALL not being set
+             * initially, clear it (and ENABLE too - for safety), to meet that
+             * expectation.
+             */
+            printk(XENLOG_WARNING
+                   "%pp: unexpected initial MSI-X state (MASKALL=%d, ENABLE=%d), fixing\n",
+                   &pdev->sbdf,
+                   (ctrl & PCI_MSIX_FLAGS_MASKALL) ? 1 : 0,
+                   (ctrl & PCI_MSIX_FLAGS_ENABLE) ? 1 : 0);
+            ctrl &= ~(PCI_MSIX_FLAGS_ENABLE|PCI_MSIX_FLAGS_MASKALL);
+            pci_conf_write16(pdev->sbdf, msix_control_reg(pos), ctrl);
+        }
+
         msix->nr_entries = msix_table_size(ctrl);
 
         pdev->msix = msix;