diff mbox series

[v2,3/3] x86/msi: clear initial MSI-X state on boot

Message ID 20230325024924.882883-3-marmarek@invisiblethingslab.com (mailing list archive)
State New, archived
Headers show
Series [v2,1/3] x86/msi: passthrough all MSI-X vector ctrl writes to device model | expand

Commit Message

Marek Marczykowski-Górecki March 25, 2023, 2:49 a.m. UTC
Some firmware/devices are found to not reset MSI-X properly, leaving
MASKALL set. Xen relies on initial state being both disabled.
Especially, pci_reset_msix_state() assumes if MASKALL is set, it was Xen
setting it due to msix->host_maskall or msix->guest_maskall. Clearing
just MASKALL might be unsafe if ENABLE is set, so clear them both.

Reported-by: Jason Andryuk <jandryuk@gmail.com>
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
---
v2:
 - new patch
---
 xen/drivers/passthrough/msi.c | 7 +++++++
 1 file changed, 7 insertions(+)

Comments

Roger Pau Monné March 28, 2023, 11:37 a.m. UTC | #1
On Sat, Mar 25, 2023 at 03:49:24AM +0100, Marek Marczykowski-Górecki wrote:
> Some firmware/devices are found to not reset MSI-X properly, leaving
> MASKALL set. Xen relies on initial state being both disabled.

The 'both' in this sentence seems out of context, as you just mention
MASKALL in the previous sentence.

> Especially, pci_reset_msix_state() assumes if MASKALL is set, it was Xen
> setting it due to msix->host_maskall or msix->guest_maskall. Clearing
> just MASKALL might be unsafe if ENABLE is set, so clear them both.
> 
> Reported-by: Jason Andryuk <jandryuk@gmail.com>
> Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> ---
> v2:
>  - new patch
> ---
>  xen/drivers/passthrough/msi.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/xen/drivers/passthrough/msi.c b/xen/drivers/passthrough/msi.c
> index ce1a450f6f4a..60adad47e379 100644
> --- a/xen/drivers/passthrough/msi.c
> +++ b/xen/drivers/passthrough/msi.c
> @@ -48,6 +48,13 @@ int pdev_msi_init(struct pci_dev *pdev)
>          ctrl = pci_conf_read16(pdev->sbdf, msix_control_reg(pos));
>          msix->nr_entries = msix_table_size(ctrl);
>  
> +        /*
> +         * Clear both ENABLE and MASKALL, pci_reset_msix_state() relies on this
> +         * initial state.
> +         */
> +        ctrl &= ~(PCI_MSIX_FLAGS_ENABLE|PCI_MSIX_FLAGS_MASKALL);
> +        pci_conf_write16(pdev->sbdf, msix_control_reg(pos), ctrl);

Should you make this conditional to them being set in the first place:

if ( ctrl & (PCI_MSIX_FLAGS_ENABLE | PCI_MSIX_FLAGS_MASKALL) )
{
    ...

We should avoid the extra access if possible (specially if it's to
write the same value that has been read).

I would even move this before the msix_table_size() call, so the
handling of ctrl is closer together.

Would it also be worth to print a debug message that the initial
device state seems inconsistent?

Thanks, Roger.
Marek Marczykowski-Górecki March 28, 2023, 12:07 p.m. UTC | #2
On Tue, Mar 28, 2023 at 01:37:14PM +0200, Roger Pau Monné wrote:
> On Sat, Mar 25, 2023 at 03:49:24AM +0100, Marek Marczykowski-Górecki wrote:
> > Some firmware/devices are found to not reset MSI-X properly, leaving
> > MASKALL set. Xen relies on initial state being both disabled.
> 
> The 'both' in this sentence seems out of context, as you just mention
> MASKALL in the previous sentence.
> 
> > Especially, pci_reset_msix_state() assumes if MASKALL is set, it was Xen
> > setting it due to msix->host_maskall or msix->guest_maskall. Clearing
> > just MASKALL might be unsafe if ENABLE is set, so clear them both.
> > 
> > Reported-by: Jason Andryuk <jandryuk@gmail.com>
> > Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> > ---
> > v2:
> >  - new patch
> > ---
> >  xen/drivers/passthrough/msi.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> > 
> > diff --git a/xen/drivers/passthrough/msi.c b/xen/drivers/passthrough/msi.c
> > index ce1a450f6f4a..60adad47e379 100644
> > --- a/xen/drivers/passthrough/msi.c
> > +++ b/xen/drivers/passthrough/msi.c
> > @@ -48,6 +48,13 @@ int pdev_msi_init(struct pci_dev *pdev)
> >          ctrl = pci_conf_read16(pdev->sbdf, msix_control_reg(pos));
> >          msix->nr_entries = msix_table_size(ctrl);
> >  
> > +        /*
> > +         * Clear both ENABLE and MASKALL, pci_reset_msix_state() relies on this
> > +         * initial state.
> > +         */
> > +        ctrl &= ~(PCI_MSIX_FLAGS_ENABLE|PCI_MSIX_FLAGS_MASKALL);
> > +        pci_conf_write16(pdev->sbdf, msix_control_reg(pos), ctrl);
> 
> Should you make this conditional to them being set in the first place:
> 
> if ( ctrl & (PCI_MSIX_FLAGS_ENABLE | PCI_MSIX_FLAGS_MASKALL) )
> {
>     ...
> 
> We should avoid the extra access if possible (specially if it's to
> write the same value that has been read).
> 
> I would even move this before the msix_table_size() call, so the
> handling of ctrl is closer together.
> 
> Would it also be worth to print a debug message that the initial
> device state seems inconsistent?

That may make sense. XENLOG_WARNING? XENLOG_DEBUG?
Jan Beulich March 28, 2023, 12:38 p.m. UTC | #3
On 28.03.2023 14:07, Marek Marczykowski-Górecki wrote:
> On Tue, Mar 28, 2023 at 01:37:14PM +0200, Roger Pau Monné wrote:
>> On Sat, Mar 25, 2023 at 03:49:24AM +0100, Marek Marczykowski-Górecki wrote:
>>> --- a/xen/drivers/passthrough/msi.c
>>> +++ b/xen/drivers/passthrough/msi.c
>>> @@ -48,6 +48,13 @@ int pdev_msi_init(struct pci_dev *pdev)
>>>          ctrl = pci_conf_read16(pdev->sbdf, msix_control_reg(pos));
>>>          msix->nr_entries = msix_table_size(ctrl);
>>>  
>>> +        /*
>>> +         * Clear both ENABLE and MASKALL, pci_reset_msix_state() relies on this
>>> +         * initial state.
>>> +         */
>>> +        ctrl &= ~(PCI_MSIX_FLAGS_ENABLE|PCI_MSIX_FLAGS_MASKALL);
>>> +        pci_conf_write16(pdev->sbdf, msix_control_reg(pos), ctrl);
>>
>> Should you make this conditional to them being set in the first place:
>>
>> if ( ctrl & (PCI_MSIX_FLAGS_ENABLE | PCI_MSIX_FLAGS_MASKALL) )
>> {
>>     ...
>>
>> We should avoid the extra access if possible (specially if it's to
>> write the same value that has been read).
>>
>> I would even move this before the msix_table_size() call, so the
>> handling of ctrl is closer together.
>>
>> Would it also be worth to print a debug message that the initial
>> device state seems inconsistent?
> 
> That may make sense. XENLOG_WARNING? XENLOG_DEBUG?

Warning I would say, as this isn't liable to repeat frequently for the
same device. And it's helpful to see even in release build runs using
default log levels.

Jan
Jan Beulich March 28, 2023, 12:54 p.m. UTC | #4
On 25.03.2023 03:49, Marek Marczykowski-Górecki wrote:
> Some firmware/devices are found to not reset MSI-X properly, leaving
> MASKALL set. Xen relies on initial state being both disabled.
> Especially, pci_reset_msix_state() assumes if MASKALL is set, it was Xen
> setting it due to msix->host_maskall or msix->guest_maskall. Clearing
> just MASKALL might be unsafe if ENABLE is set, so clear them both.

But pci_reset_msix_state() comes into play only when assigning a device
to a DomU. If the tool stack doing a reset doesn't properly clear the
bit, how would it be cleared the next time round (i.e. after the guest
stopped and then possibly was started again)? It feels like the issue
wants dealing with elsewhere, possibly in the tool stack.

> --- a/xen/drivers/passthrough/msi.c
> +++ b/xen/drivers/passthrough/msi.c
> @@ -48,6 +48,13 @@ int pdev_msi_init(struct pci_dev *pdev)
>          ctrl = pci_conf_read16(pdev->sbdf, msix_control_reg(pos));
>          msix->nr_entries = msix_table_size(ctrl);
>  
> +        /*
> +         * Clear both ENABLE and MASKALL, pci_reset_msix_state() relies on this
> +         * initial state.
> +         */

Please can comments be accurate at least at the time of writing?
pci_reset_msix_state() doesn't care about ENABLE at all.

Jan
Marek Marczykowski-Górecki March 28, 2023, 1:04 p.m. UTC | #5
On Tue, Mar 28, 2023 at 02:54:38PM +0200, Jan Beulich wrote:
> On 25.03.2023 03:49, Marek Marczykowski-Górecki wrote:
> > Some firmware/devices are found to not reset MSI-X properly, leaving
> > MASKALL set. Xen relies on initial state being both disabled.
> > Especially, pci_reset_msix_state() assumes if MASKALL is set, it was Xen
> > setting it due to msix->host_maskall or msix->guest_maskall. Clearing
> > just MASKALL might be unsafe if ENABLE is set, so clear them both.
> 
> But pci_reset_msix_state() comes into play only when assigning a device
> to a DomU. If the tool stack doing a reset doesn't properly clear the
> bit, how would it be cleared the next time round (i.e. after the guest
> stopped and then possibly was started again)? It feels like the issue
> wants dealing with elsewhere, possibly in the tool stack.

I may be misremembering some details, but AFAIR Xen intercepts
toolstack's (or more generally: accesses from dom0) attempt to clean
this up and once it enters an inconsistent state (or rather: starts with
such at the start of the day), there was no way to clean it up.

I have considered changing pci_reset_msix_state() to not choke on
MASKALL being set, but I'm a bit afraid of doing it, as there it seems
there is a lot of assumptions all over the place and I may miss some.

> 
> > --- a/xen/drivers/passthrough/msi.c
> > +++ b/xen/drivers/passthrough/msi.c
> > @@ -48,6 +48,13 @@ int pdev_msi_init(struct pci_dev *pdev)
> >          ctrl = pci_conf_read16(pdev->sbdf, msix_control_reg(pos));
> >          msix->nr_entries = msix_table_size(ctrl);
> >  
> > +        /*
> > +         * Clear both ENABLE and MASKALL, pci_reset_msix_state() relies on this
> > +         * initial state.
> > +         */
> 
> Please can comments be accurate at least at the time of writing?
> pci_reset_msix_state() doesn't care about ENABLE at all.
> 
> Jan
Jason Andryuk March 28, 2023, 1:17 p.m. UTC | #6
On Tue, Mar 28, 2023 at 9:04 AM Marek Marczykowski-Górecki
<marmarek@invisiblethingslab.com> wrote:
>
> On Tue, Mar 28, 2023 at 02:54:38PM +0200, Jan Beulich wrote:
> > On 25.03.2023 03:49, Marek Marczykowski-Górecki wrote:
> > > Some firmware/devices are found to not reset MSI-X properly, leaving
> > > MASKALL set. Xen relies on initial state being both disabled.
> > > Especially, pci_reset_msix_state() assumes if MASKALL is set, it was Xen
> > > setting it due to msix->host_maskall or msix->guest_maskall. Clearing
> > > just MASKALL might be unsafe if ENABLE is set, so clear them both.
> >
> > But pci_reset_msix_state() comes into play only when assigning a device
> > to a DomU. If the tool stack doing a reset doesn't properly clear the
> > bit, how would it be cleared the next time round (i.e. after the guest
> > stopped and then possibly was started again)? It feels like the issue
> > wants dealing with elsewhere, possibly in the tool stack.
>
> I may be misremembering some details, but AFAIR Xen intercepts
> toolstack's (or more generally: accesses from dom0) attempt to clean
> this up and once it enters an inconsistent state (or rather: starts with
> such at the start of the day), there was no way to clean it up.
>
> I have considered changing pci_reset_msix_state() to not choke on
> MASKALL being set, but I'm a bit afraid of doing it, as there it seems
> there is a lot of assumptions all over the place and I may miss some.

Hi, Marek and Jan,

Marek, thank you for working on MSI-X support.

As Jan says, the clearing here works during system boot.  However, I
have found that Xen itself is setting MASKALL in __pci_disable_msix()
when shutting down a domU.  When that is called, memory_decoded(dev)
returns false, and Xen prints "cannot disable IRQ 137: masking MSI-X
on 0000:00:14.3".  That makes the device unavailable for subsequent
domU assignment.  I haven't investigated where and why memory decoding
gets disabled for the device.

Testing was with this v2 patchset integrated into OpenXT w/ Xen 4.16.
We have some device reset changes, so I'll have to look at them again.
Hmmm, they move the libxl device reseting from pci_remove_detached()
to libxl__destroy_domid() to ensure all devices are de-assign after
the domain is destroyed.  A kernel patch implements a "more thorough
reset" which could do a slot or bus level reset, and the desire is to
have all devices deassigned before that.  Maybe the shift later is
throwing off Xen's expectations?

As Marek says, the MASKALL handling seems tricky.  Xen seems to use it
to ensure there are no interrupts, so clearing it makes one worry
about breaking something else.

Regards,
Jason
Jan Beulich March 28, 2023, 1:23 p.m. UTC | #7
On 28.03.2023 15:04, Marek Marczykowski-Górecki wrote:
> On Tue, Mar 28, 2023 at 02:54:38PM +0200, Jan Beulich wrote:
>> On 25.03.2023 03:49, Marek Marczykowski-Górecki wrote:
>>> Some firmware/devices are found to not reset MSI-X properly, leaving
>>> MASKALL set. Xen relies on initial state being both disabled.
>>> Especially, pci_reset_msix_state() assumes if MASKALL is set, it was Xen
>>> setting it due to msix->host_maskall or msix->guest_maskall. Clearing
>>> just MASKALL might be unsafe if ENABLE is set, so clear them both.
>>
>> But pci_reset_msix_state() comes into play only when assigning a device
>> to a DomU. If the tool stack doing a reset doesn't properly clear the
>> bit, how would it be cleared the next time round (i.e. after the guest
>> stopped and then possibly was started again)? It feels like the issue
>> wants dealing with elsewhere, possibly in the tool stack.
> 
> I may be misremembering some details, but AFAIR Xen intercepts
> toolstack's (or more generally: accesses from dom0) attempt to clean
> this up and once it enters an inconsistent state (or rather: starts with
> such at the start of the day), there was no way to clean it up.

Iirc Roger and you already discussed that there needs to be an
indication of device reset having happened, so that Xen can resync
from this "behind its back" operation. That would look to be the
point/place where such inconsistencies should be eliminated.

> I have considered changing pci_reset_msix_state() to not choke on
> MASKALL being set, but I'm a bit afraid of doing it, as there it seems
> there is a lot of assumptions all over the place and I may miss some.

Well, the comment there alone is enough justification to leave that
check as is, I would say. To drop it there, something equivalent
would likely want adding in its stead.

But you haven't really answered my earlier question: If reset leaves
MASKALL set, how would the same issue be taken care of the 2nd time
round? (If, otoh, firmware sets the bit for some reason, but reset
clears it along with ENABLE, I could see how a one-time action might
suffice. But the first sentence in the description doesn't make clear
which situation it is that we have to work around.)

Jan
Roger Pau Monné March 28, 2023, 1:27 p.m. UTC | #8
On Tue, Mar 28, 2023 at 03:23:56PM +0200, Jan Beulich wrote:
> On 28.03.2023 15:04, Marek Marczykowski-Górecki wrote:
> > On Tue, Mar 28, 2023 at 02:54:38PM +0200, Jan Beulich wrote:
> >> On 25.03.2023 03:49, Marek Marczykowski-Górecki wrote:
> >>> Some firmware/devices are found to not reset MSI-X properly, leaving
> >>> MASKALL set. Xen relies on initial state being both disabled.
> >>> Especially, pci_reset_msix_state() assumes if MASKALL is set, it was Xen
> >>> setting it due to msix->host_maskall or msix->guest_maskall. Clearing
> >>> just MASKALL might be unsafe if ENABLE is set, so clear them both.
> >>
> >> But pci_reset_msix_state() comes into play only when assigning a device
> >> to a DomU. If the tool stack doing a reset doesn't properly clear the
> >> bit, how would it be cleared the next time round (i.e. after the guest
> >> stopped and then possibly was started again)? It feels like the issue
> >> wants dealing with elsewhere, possibly in the tool stack.
> > 
> > I may be misremembering some details, but AFAIR Xen intercepts
> > toolstack's (or more generally: accesses from dom0) attempt to clean
> > this up and once it enters an inconsistent state (or rather: starts with
> > such at the start of the day), there was no way to clean it up.
> 
> Iirc Roger and you already discussed that there needs to be an
> indication of device reset having happened, so that Xen can resync
> from this "behind its back" operation. That would look to be the
> point/place where such inconsistencies should be eliminated.

I think that was a different conversation with Huang Rui related to
the AMD GPU work, see:

https://lore.kernel.org/xen-devel/ZBwtaceTNvCYksmR@Air-de-Roger/

I understood the problem Marek was trying to solve was that some
devices where initialized with the MASKALL bit set (likely by the
firmware?) and that prevented Xen from using them.  But now seeing the
further replies on this patch I'm unsure whether that's the case.

Thanks, Roger.
Jason Andryuk March 28, 2023, 1:32 p.m. UTC | #9
On Tue, Mar 28, 2023 at 9:28 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
>
> On Tue, Mar 28, 2023 at 03:23:56PM +0200, Jan Beulich wrote:
> > On 28.03.2023 15:04, Marek Marczykowski-Górecki wrote:
> > > On Tue, Mar 28, 2023 at 02:54:38PM +0200, Jan Beulich wrote:
> > >> On 25.03.2023 03:49, Marek Marczykowski-Górecki wrote:
> > >>> Some firmware/devices are found to not reset MSI-X properly, leaving
> > >>> MASKALL set. Xen relies on initial state being both disabled.
> > >>> Especially, pci_reset_msix_state() assumes if MASKALL is set, it was Xen
> > >>> setting it due to msix->host_maskall or msix->guest_maskall. Clearing
> > >>> just MASKALL might be unsafe if ENABLE is set, so clear them both.
> > >>
> > >> But pci_reset_msix_state() comes into play only when assigning a device
> > >> to a DomU. If the tool stack doing a reset doesn't properly clear the
> > >> bit, how would it be cleared the next time round (i.e. after the guest
> > >> stopped and then possibly was started again)? It feels like the issue
> > >> wants dealing with elsewhere, possibly in the tool stack.
> > >
> > > I may be misremembering some details, but AFAIR Xen intercepts
> > > toolstack's (or more generally: accesses from dom0) attempt to clean
> > > this up and once it enters an inconsistent state (or rather: starts with
> > > such at the start of the day), there was no way to clean it up.
> >
> > Iirc Roger and you already discussed that there needs to be an
> > indication of device reset having happened, so that Xen can resync
> > from this "behind its back" operation. That would look to be the
> > point/place where such inconsistencies should be eliminated.
>
> I think that was a different conversation with Huang Rui related to
> the AMD GPU work, see:
>
> https://lore.kernel.org/xen-devel/ZBwtaceTNvCYksmR@Air-de-Roger/
>
> I understood the problem Marek was trying to solve was that some
> devices where initialized with the MASKALL bit set (likely by the
> firmware?) and that prevented Xen from using them.  But now seeing the
> further replies on this patch I'm unsure whether that's the case.

In my case, Xen's setting of MASKALL persists through a warm reboot,
so Xen sees it set when booting.  On a cold boot, MASKALL is not set.

Regards,
Jason
Jan Beulich March 28, 2023, 1:35 p.m. UTC | #10
On 28.03.2023 15:32, Jason Andryuk wrote:
> On Tue, Mar 28, 2023 at 9:28 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
>> On Tue, Mar 28, 2023 at 03:23:56PM +0200, Jan Beulich wrote:
>>> On 28.03.2023 15:04, Marek Marczykowski-Górecki wrote:
>>>> On Tue, Mar 28, 2023 at 02:54:38PM +0200, Jan Beulich wrote:
>>>>> On 25.03.2023 03:49, Marek Marczykowski-Górecki wrote:
>>>>>> Some firmware/devices are found to not reset MSI-X properly, leaving
>>>>>> MASKALL set. Xen relies on initial state being both disabled.
>>>>>> Especially, pci_reset_msix_state() assumes if MASKALL is set, it was Xen
>>>>>> setting it due to msix->host_maskall or msix->guest_maskall. Clearing
>>>>>> just MASKALL might be unsafe if ENABLE is set, so clear them both.
>>>>>
>>>>> But pci_reset_msix_state() comes into play only when assigning a device
>>>>> to a DomU. If the tool stack doing a reset doesn't properly clear the
>>>>> bit, how would it be cleared the next time round (i.e. after the guest
>>>>> stopped and then possibly was started again)? It feels like the issue
>>>>> wants dealing with elsewhere, possibly in the tool stack.
>>>>
>>>> I may be misremembering some details, but AFAIR Xen intercepts
>>>> toolstack's (or more generally: accesses from dom0) attempt to clean
>>>> this up and once it enters an inconsistent state (or rather: starts with
>>>> such at the start of the day), there was no way to clean it up.
>>>
>>> Iirc Roger and you already discussed that there needs to be an
>>> indication of device reset having happened, so that Xen can resync
>>> from this "behind its back" operation. That would look to be the
>>> point/place where such inconsistencies should be eliminated.
>>
>> I think that was a different conversation with Huang Rui related to
>> the AMD GPU work, see:
>>
>> https://lore.kernel.org/xen-devel/ZBwtaceTNvCYksmR@Air-de-Roger/
>>
>> I understood the problem Marek was trying to solve was that some
>> devices where initialized with the MASKALL bit set (likely by the
>> firmware?) and that prevented Xen from using them.  But now seeing the
>> further replies on this patch I'm unsure whether that's the case.
> 
> In my case, Xen's setting of MASKALL persists through a warm reboot,

And does this get in the way of Dom0 using the device? (Before a DomU
gets to use it, things should be properly reset anyway.)

Jan

> so Xen sees it set when booting.  On a cold boot, MASKALL is not set.
> 
> Regards,
> Jason
Jason Andryuk March 28, 2023, 1:43 p.m. UTC | #11
On Tue, Mar 28, 2023 at 9:35 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 28.03.2023 15:32, Jason Andryuk wrote:
> > On Tue, Mar 28, 2023 at 9:28 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
> >> On Tue, Mar 28, 2023 at 03:23:56PM +0200, Jan Beulich wrote:
> >>> On 28.03.2023 15:04, Marek Marczykowski-Górecki wrote:
> >>>> On Tue, Mar 28, 2023 at 02:54:38PM +0200, Jan Beulich wrote:
> >>>>> On 25.03.2023 03:49, Marek Marczykowski-Górecki wrote:
> >>>>>> Some firmware/devices are found to not reset MSI-X properly, leaving
> >>>>>> MASKALL set. Xen relies on initial state being both disabled.
> >>>>>> Especially, pci_reset_msix_state() assumes if MASKALL is set, it was Xen
> >>>>>> setting it due to msix->host_maskall or msix->guest_maskall. Clearing
> >>>>>> just MASKALL might be unsafe if ENABLE is set, so clear them both.
> >>>>>
> >>>>> But pci_reset_msix_state() comes into play only when assigning a device
> >>>>> to a DomU. If the tool stack doing a reset doesn't properly clear the
> >>>>> bit, how would it be cleared the next time round (i.e. after the guest
> >>>>> stopped and then possibly was started again)? It feels like the issue
> >>>>> wants dealing with elsewhere, possibly in the tool stack.
> >>>>
> >>>> I may be misremembering some details, but AFAIR Xen intercepts
> >>>> toolstack's (or more generally: accesses from dom0) attempt to clean
> >>>> this up and once it enters an inconsistent state (or rather: starts with
> >>>> such at the start of the day), there was no way to clean it up.
> >>>
> >>> Iirc Roger and you already discussed that there needs to be an
> >>> indication of device reset having happened, so that Xen can resync
> >>> from this "behind its back" operation. That would look to be the
> >>> point/place where such inconsistencies should be eliminated.
> >>
> >> I think that was a different conversation with Huang Rui related to
> >> the AMD GPU work, see:
> >>
> >> https://lore.kernel.org/xen-devel/ZBwtaceTNvCYksmR@Air-de-Roger/
> >>
> >> I understood the problem Marek was trying to solve was that some
> >> devices where initialized with the MASKALL bit set (likely by the
> >> firmware?) and that prevented Xen from using them.  But now seeing the
> >> further replies on this patch I'm unsure whether that's the case.
> >
> > In my case, Xen's setting of MASKALL persists through a warm reboot,
>
> And does this get in the way of Dom0 using the device? (Before a DomU
> gets to use it, things should be properly reset anyway.)

Dom0 doesn't have drivers for the device, so I am not sure.  I don't
seem to have the logs around, but I believe when MASKALL is set, the
initial quarantine of the device fails.  Yes, some notes I have
mention:

It's getting -EBUSY from pdev_msix_assign() which means
pci_reset_msix_state() is failing:
    if ( pci_conf_read16(pdev->sbdf, msix_control_reg(pos)) &
         PCI_MSIX_FLAGS_MASKALL )
        return -EBUSY;

Regards,
Jason
Jan Beulich March 28, 2023, 1:54 p.m. UTC | #12
On 28.03.2023 15:43, Jason Andryuk wrote:
> On Tue, Mar 28, 2023 at 9:35 AM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 28.03.2023 15:32, Jason Andryuk wrote:
>>> On Tue, Mar 28, 2023 at 9:28 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
>>>> On Tue, Mar 28, 2023 at 03:23:56PM +0200, Jan Beulich wrote:
>>>>> On 28.03.2023 15:04, Marek Marczykowski-Górecki wrote:
>>>>>> On Tue, Mar 28, 2023 at 02:54:38PM +0200, Jan Beulich wrote:
>>>>>>> On 25.03.2023 03:49, Marek Marczykowski-Górecki wrote:
>>>>>>>> Some firmware/devices are found to not reset MSI-X properly, leaving
>>>>>>>> MASKALL set. Xen relies on initial state being both disabled.
>>>>>>>> Especially, pci_reset_msix_state() assumes if MASKALL is set, it was Xen
>>>>>>>> setting it due to msix->host_maskall or msix->guest_maskall. Clearing
>>>>>>>> just MASKALL might be unsafe if ENABLE is set, so clear them both.
>>>>>>>
>>>>>>> But pci_reset_msix_state() comes into play only when assigning a device
>>>>>>> to a DomU. If the tool stack doing a reset doesn't properly clear the
>>>>>>> bit, how would it be cleared the next time round (i.e. after the guest
>>>>>>> stopped and then possibly was started again)? It feels like the issue
>>>>>>> wants dealing with elsewhere, possibly in the tool stack.
>>>>>>
>>>>>> I may be misremembering some details, but AFAIR Xen intercepts
>>>>>> toolstack's (or more generally: accesses from dom0) attempt to clean
>>>>>> this up and once it enters an inconsistent state (or rather: starts with
>>>>>> such at the start of the day), there was no way to clean it up.
>>>>>
>>>>> Iirc Roger and you already discussed that there needs to be an
>>>>> indication of device reset having happened, so that Xen can resync
>>>>> from this "behind its back" operation. That would look to be the
>>>>> point/place where such inconsistencies should be eliminated.
>>>>
>>>> I think that was a different conversation with Huang Rui related to
>>>> the AMD GPU work, see:
>>>>
>>>> https://lore.kernel.org/xen-devel/ZBwtaceTNvCYksmR@Air-de-Roger/
>>>>
>>>> I understood the problem Marek was trying to solve was that some
>>>> devices where initialized with the MASKALL bit set (likely by the
>>>> firmware?) and that prevented Xen from using them.  But now seeing the
>>>> further replies on this patch I'm unsure whether that's the case.
>>>
>>> In my case, Xen's setting of MASKALL persists through a warm reboot,
>>
>> And does this get in the way of Dom0 using the device? (Before a DomU
>> gets to use it, things should be properly reset anyway.)
> 
> Dom0 doesn't have drivers for the device, so I am not sure.  I don't
> seem to have the logs around, but I believe when MASKALL is set, the
> initial quarantine of the device fails.  Yes, some notes I have
> mention:
> 
> It's getting -EBUSY from pdev_msix_assign() which means
> pci_reset_msix_state() is failing:
>     if ( pci_conf_read16(pdev->sbdf, msix_control_reg(pos)) &
>          PCI_MSIX_FLAGS_MASKALL )
>         return -EBUSY;

Arguably this check may want skipping when moving to quarantine. I'd
still be curious to know whether the device works in Dom0, and
confirmation of device reset's effect on the bit would also be helpful.

Jan
Jason Andryuk March 28, 2023, 3:08 p.m. UTC | #13
On Tue, Mar 28, 2023 at 9:54 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 28.03.2023 15:43, Jason Andryuk wrote:
> > On Tue, Mar 28, 2023 at 9:35 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>
> >> On 28.03.2023 15:32, Jason Andryuk wrote:
> >>> On Tue, Mar 28, 2023 at 9:28 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
> >>>> On Tue, Mar 28, 2023 at 03:23:56PM +0200, Jan Beulich wrote:
> >>>>> On 28.03.2023 15:04, Marek Marczykowski-Górecki wrote:
> >>>>>> On Tue, Mar 28, 2023 at 02:54:38PM +0200, Jan Beulich wrote:
> >>>>>>> On 25.03.2023 03:49, Marek Marczykowski-Górecki wrote:
> >>>>>>>> Some firmware/devices are found to not reset MSI-X properly, leaving
> >>>>>>>> MASKALL set. Xen relies on initial state being both disabled.
> >>>>>>>> Especially, pci_reset_msix_state() assumes if MASKALL is set, it was Xen
> >>>>>>>> setting it due to msix->host_maskall or msix->guest_maskall. Clearing
> >>>>>>>> just MASKALL might be unsafe if ENABLE is set, so clear them both.
> >>>>>>>
> >>>>>>> But pci_reset_msix_state() comes into play only when assigning a device
> >>>>>>> to a DomU. If the tool stack doing a reset doesn't properly clear the
> >>>>>>> bit, how would it be cleared the next time round (i.e. after the guest
> >>>>>>> stopped and then possibly was started again)? It feels like the issue
> >>>>>>> wants dealing with elsewhere, possibly in the tool stack.
> >>>>>>
> >>>>>> I may be misremembering some details, but AFAIR Xen intercepts
> >>>>>> toolstack's (or more generally: accesses from dom0) attempt to clean
> >>>>>> this up and once it enters an inconsistent state (or rather: starts with
> >>>>>> such at the start of the day), there was no way to clean it up.
> >>>>>
> >>>>> Iirc Roger and you already discussed that there needs to be an
> >>>>> indication of device reset having happened, so that Xen can resync
> >>>>> from this "behind its back" operation. That would look to be the
> >>>>> point/place where such inconsistencies should be eliminated.
> >>>>
> >>>> I think that was a different conversation with Huang Rui related to
> >>>> the AMD GPU work, see:
> >>>>
> >>>> https://lore.kernel.org/xen-devel/ZBwtaceTNvCYksmR@Air-de-Roger/
> >>>>
> >>>> I understood the problem Marek was trying to solve was that some
> >>>> devices where initialized with the MASKALL bit set (likely by the
> >>>> firmware?) and that prevented Xen from using them.  But now seeing the
> >>>> further replies on this patch I'm unsure whether that's the case.
> >>>
> >>> In my case, Xen's setting of MASKALL persists through a warm reboot,
> >>
> >> And does this get in the way of Dom0 using the device? (Before a DomU
> >> gets to use it, things should be properly reset anyway.)
> >
> > Dom0 doesn't have drivers for the device, so I am not sure.  I don't
> > seem to have the logs around, but I believe when MASKALL is set, the
> > initial quarantine of the device fails.  Yes, some notes I have
> > mention:
> >
> > It's getting -EBUSY from pdev_msix_assign() which means
> > pci_reset_msix_state() is failing:
> >     if ( pci_conf_read16(pdev->sbdf, msix_control_reg(pos)) &
> >          PCI_MSIX_FLAGS_MASKALL )
> >         return -EBUSY;
>
> Arguably this check may want skipping when moving to quarantine. I'd
> still be curious to know whether the device works in Dom0, and
> confirmation of device reset's effect on the bit would also be helpful.

echo 1 > /sys/.../reset does not clear MASKALL.

Regards,
Jason
Jan Beulich March 28, 2023, 3:39 p.m. UTC | #14
On 28.03.2023 17:08, Jason Andryuk wrote:
> On Tue, Mar 28, 2023 at 9:54 AM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 28.03.2023 15:43, Jason Andryuk wrote:
>>> On Tue, Mar 28, 2023 at 9:35 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>>
>>>> On 28.03.2023 15:32, Jason Andryuk wrote:
>>>>> On Tue, Mar 28, 2023 at 9:28 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
>>>>>> On Tue, Mar 28, 2023 at 03:23:56PM +0200, Jan Beulich wrote:
>>>>>>> On 28.03.2023 15:04, Marek Marczykowski-Górecki wrote:
>>>>>>>> On Tue, Mar 28, 2023 at 02:54:38PM +0200, Jan Beulich wrote:
>>>>>>>>> On 25.03.2023 03:49, Marek Marczykowski-Górecki wrote:
>>>>>>>>>> Some firmware/devices are found to not reset MSI-X properly, leaving
>>>>>>>>>> MASKALL set. Xen relies on initial state being both disabled.
>>>>>>>>>> Especially, pci_reset_msix_state() assumes if MASKALL is set, it was Xen
>>>>>>>>>> setting it due to msix->host_maskall or msix->guest_maskall. Clearing
>>>>>>>>>> just MASKALL might be unsafe if ENABLE is set, so clear them both.
>>>>>>>>>
>>>>>>>>> But pci_reset_msix_state() comes into play only when assigning a device
>>>>>>>>> to a DomU. If the tool stack doing a reset doesn't properly clear the
>>>>>>>>> bit, how would it be cleared the next time round (i.e. after the guest
>>>>>>>>> stopped and then possibly was started again)? It feels like the issue
>>>>>>>>> wants dealing with elsewhere, possibly in the tool stack.
>>>>>>>>
>>>>>>>> I may be misremembering some details, but AFAIR Xen intercepts
>>>>>>>> toolstack's (or more generally: accesses from dom0) attempt to clean
>>>>>>>> this up and once it enters an inconsistent state (or rather: starts with
>>>>>>>> such at the start of the day), there was no way to clean it up.
>>>>>>>
>>>>>>> Iirc Roger and you already discussed that there needs to be an
>>>>>>> indication of device reset having happened, so that Xen can resync
>>>>>>> from this "behind its back" operation. That would look to be the
>>>>>>> point/place where such inconsistencies should be eliminated.
>>>>>>
>>>>>> I think that was a different conversation with Huang Rui related to
>>>>>> the AMD GPU work, see:
>>>>>>
>>>>>> https://lore.kernel.org/xen-devel/ZBwtaceTNvCYksmR@Air-de-Roger/
>>>>>>
>>>>>> I understood the problem Marek was trying to solve was that some
>>>>>> devices where initialized with the MASKALL bit set (likely by the
>>>>>> firmware?) and that prevented Xen from using them.  But now seeing the
>>>>>> further replies on this patch I'm unsure whether that's the case.
>>>>>
>>>>> In my case, Xen's setting of MASKALL persists through a warm reboot,
>>>>
>>>> And does this get in the way of Dom0 using the device? (Before a DomU
>>>> gets to use it, things should be properly reset anyway.)
>>>
>>> Dom0 doesn't have drivers for the device, so I am not sure.  I don't
>>> seem to have the logs around, but I believe when MASKALL is set, the
>>> initial quarantine of the device fails.  Yes, some notes I have
>>> mention:
>>>
>>> It's getting -EBUSY from pdev_msix_assign() which means
>>> pci_reset_msix_state() is failing:
>>>     if ( pci_conf_read16(pdev->sbdf, msix_control_reg(pos)) &
>>>          PCI_MSIX_FLAGS_MASKALL )
>>>         return -EBUSY;
>>
>> Arguably this check may want skipping when moving to quarantine. I'd
>> still be curious to know whether the device works in Dom0, and
>> confirmation of device reset's effect on the bit would also be helpful.
> 
> echo 1 > /sys/.../reset does not clear MASKALL.

Well, I think that - as proposed elsewhere - we need the kernel to tell
us about resets, and then we could clear the bit from there. (I didn't
check whether the spec permits the bit to remain set, or whether this
is a device erratum.)

Jan
Jason Andryuk March 28, 2023, 5:38 p.m. UTC | #15
On Tue, Mar 28, 2023 at 9:17 AM Jason Andryuk <jandryuk@gmail.com> wrote:
>
> On Tue, Mar 28, 2023 at 9:04 AM Marek Marczykowski-Górecki
> <marmarek@invisiblethingslab.com> wrote:
> >
> > On Tue, Mar 28, 2023 at 02:54:38PM +0200, Jan Beulich wrote:
> > > On 25.03.2023 03:49, Marek Marczykowski-Górecki wrote:
> > > > Some firmware/devices are found to not reset MSI-X properly, leaving
> > > > MASKALL set. Xen relies on initial state being both disabled.
> > > > Especially, pci_reset_msix_state() assumes if MASKALL is set, it was Xen
> > > > setting it due to msix->host_maskall or msix->guest_maskall. Clearing
> > > > just MASKALL might be unsafe if ENABLE is set, so clear them both.
> > >
> > > But pci_reset_msix_state() comes into play only when assigning a device
> > > to a DomU. If the tool stack doing a reset doesn't properly clear the
> > > bit, how would it be cleared the next time round (i.e. after the guest
> > > stopped and then possibly was started again)? It feels like the issue
> > > wants dealing with elsewhere, possibly in the tool stack.
> >
> > I may be misremembering some details, but AFAIR Xen intercepts
> > toolstack's (or more generally: accesses from dom0) attempt to clean
> > this up and once it enters an inconsistent state (or rather: starts with
> > such at the start of the day), there was no way to clean it up.
> >
> > I have considered changing pci_reset_msix_state() to not choke on
> > MASKALL being set, but I'm a bit afraid of doing it, as there it seems
> > there is a lot of assumptions all over the place and I may miss some.
>
> Hi, Marek and Jan,
>
> Marek, thank you for working on MSI-X support.
>
> As Jan says, the clearing here works during system boot.  However, I
> have found that Xen itself is setting MASKALL in __pci_disable_msix()
> when shutting down a domU.  When that is called, memory_decoded(dev)
> returns false, and Xen prints "cannot disable IRQ 137: masking MSI-X
> on 0000:00:14.3".  That makes the device unavailable for subsequent
> domU assignment.  I haven't investigated where and why memory decoding
> gets disabled for the device.
>
> Testing was with this v2 patchset integrated into OpenXT w/ Xen 4.16.
> We have some device reset changes, so I'll have to look at them again.
> Hmmm, they move the libxl device reseting from pci_remove_detached()
> to libxl__destroy_domid() to ensure all devices are de-assign after
> the domain is destroyed.  A kernel patch implements a "more thorough
> reset" which could do a slot or bus level reset, and the desire is to
> have all devices deassigned before that.  Maybe the shift later is
> throwing off Xen's expectations?

I dropped the OpenXT libxl patch, and Xen is not setting MASKALL.

Regards,
Jason
diff mbox series

Patch

diff --git a/xen/drivers/passthrough/msi.c b/xen/drivers/passthrough/msi.c
index ce1a450f6f4a..60adad47e379 100644
--- a/xen/drivers/passthrough/msi.c
+++ b/xen/drivers/passthrough/msi.c
@@ -48,6 +48,13 @@  int pdev_msi_init(struct pci_dev *pdev)
         ctrl = pci_conf_read16(pdev->sbdf, msix_control_reg(pos));
         msix->nr_entries = msix_table_size(ctrl);
 
+        /*
+         * Clear both ENABLE and MASKALL, pci_reset_msix_state() relies on this
+         * initial state.
+         */
+        ctrl &= ~(PCI_MSIX_FLAGS_ENABLE|PCI_MSIX_FLAGS_MASKALL);
+        pci_conf_write16(pdev->sbdf, msix_control_reg(pos), ctrl);
+
         pdev->msix = msix;
     }