diff mbox series

[for-4.17,5/6] pci: do not disable memory decoding for devices

Message ID 20221020094649.28667-6-roger.pau@citrix.com (mailing list archive)
State Superseded
Headers show
Series (v)pci: fixes related to memory decoding handling | expand

Commit Message

Roger Pau Monne Oct. 20, 2022, 9:46 a.m. UTC
Commit 75cc460a1b added checks to ensure the position of the BARs from
PCI devices don't overlap with regions defined on the memory map.
When there's a collision memory decoding is left disabled for the
device, assuming that dom0 will reposition the BAR if necessary and
enable memory decoding.

While this would be the case for devices being used by dom0, devices
being used by the firmware itself that have no driver would usually be
left with memory decoding disabled by dom0 if that's the state dom0
found them in, and thus firmware trying to make use of them will not
function correctly.

The initial intent of 75cc460a1b was to prevent vPCI from creating
MMIO mappings on the dom0 p2m over regions that would otherwise
already have mappings established.  It's my view now that we likely
went too far with 75cc460a1b, and Xen disabling memory decoding of
devices (as buggy as they might be) is harmful, and reduces the set of
hardware on which Xen works.

This commits reverts most of 75cc460a1b, and instead adds checks to
vPCI in order to prevent misplaced BARs from being added to the
hardware domain p2m.  Signaling on whether BARs are mapped is tracked
in the vpci structure, so that misplaced BARs are not mapped, and thus
Xen won't attempt to unmap them when memory decoding is disabled.

This restores the behavior of Xen for PV dom0 to the state it was
previous to 75cc460a1b, while also introducing a more contained fix
for the vPCI BAR mapping issues.

Fixes: 75cc460a1b ('xen/pci: detect when BARs are not suitably positioned')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
AT Citrix we have a system with a device with the following BARs:

BAR [0xfe010, 0xfe010] -> in a EfiMemoryMappedIO region
BAR [0, 0x1fff] -> not positioned, outside host bridge window

And memory decoding enabled by the firmware.  With the current code
(or any of the previous fix proposals), Xen would still disable memory
decoding for the device, and the system will freeze when attempting to
set EFI vars.

I'm afraid the best solution to avoid regressions caused by 75cc460a1b
is the proposal here.
---
 xen/drivers/passthrough/pci.c | 69 -----------------------------------
 xen/drivers/vpci/header.c     | 22 ++++++++++-
 2 files changed, 20 insertions(+), 71 deletions(-)

Comments

Jan Beulich Oct. 24, 2022, 11:19 a.m. UTC | #1
On 20.10.2022 11:46, Roger Pau Monne wrote:
> Commit 75cc460a1b added checks to ensure the position of the BARs from
> PCI devices don't overlap with regions defined on the memory map.
> When there's a collision memory decoding is left disabled for the
> device, assuming that dom0 will reposition the BAR if necessary and
> enable memory decoding.
> 
> While this would be the case for devices being used by dom0, devices
> being used by the firmware itself that have no driver would usually be
> left with memory decoding disabled by dom0 if that's the state dom0
> found them in, and thus firmware trying to make use of them will not
> function correctly.
> 
> The initial intent of 75cc460a1b was to prevent vPCI from creating
> MMIO mappings on the dom0 p2m over regions that would otherwise
> already have mappings established.  It's my view now that we likely
> went too far with 75cc460a1b, and Xen disabling memory decoding of
> devices (as buggy as they might be) is harmful, and reduces the set of
> hardware on which Xen works.
> 
> This commits reverts most of 75cc460a1b, and instead adds checks to
> vPCI in order to prevent misplaced BARs from being added to the
> hardware domain p2m.

Which makes me wonder: How do things work then? Dom0 then still can't
access the BAR address range, can it? Plus with this adjustment, is
...

>  Signaling on whether BARs are mapped is tracked
> in the vpci structure, so that misplaced BARs are not mapped, and thus
> Xen won't attempt to unmap them when memory decoding is disabled.
> 
> This restores the behavior of Xen for PV dom0 to the state it was
> previous to 75cc460a1b, while also introducing a more contained fix
> for the vPCI BAR mapping issues.

... this (in particular "restores the behavior") a valid description
of this change?

> Fixes: 75cc460a1b ('xen/pci: detect when BARs are not suitably positioned')
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
> AT Citrix we have a system with a device with the following BARs:
> 
> BAR [0xfe010, 0xfe010] -> in a EfiMemoryMappedIO region
> BAR [0, 0x1fff] -> not positioned, outside host bridge window
> 
> And memory decoding enabled by the firmware.  With the current code
> (or any of the previous fix proposals), Xen would still disable memory
> decoding for the device, and the system will freeze when attempting to
> set EFI vars.

Isn't the latter (BAR at address 0) yet another problem? I have to admit
that I'm uncertain in how far it is a good idea to try to make Xen look
to work on such a system ...

Jan
Roger Pau Monne Oct. 24, 2022, 12:45 p.m. UTC | #2
On Mon, Oct 24, 2022 at 01:19:22PM +0200, Jan Beulich wrote:
> On 20.10.2022 11:46, Roger Pau Monne wrote:
> > Commit 75cc460a1b added checks to ensure the position of the BARs from
> > PCI devices don't overlap with regions defined on the memory map.
> > When there's a collision memory decoding is left disabled for the
> > device, assuming that dom0 will reposition the BAR if necessary and
> > enable memory decoding.
> > 
> > While this would be the case for devices being used by dom0, devices
> > being used by the firmware itself that have no driver would usually be
> > left with memory decoding disabled by dom0 if that's the state dom0
> > found them in, and thus firmware trying to make use of them will not
> > function correctly.
> > 
> > The initial intent of 75cc460a1b was to prevent vPCI from creating
> > MMIO mappings on the dom0 p2m over regions that would otherwise
> > already have mappings established.  It's my view now that we likely
> > went too far with 75cc460a1b, and Xen disabling memory decoding of
> > devices (as buggy as they might be) is harmful, and reduces the set of
> > hardware on which Xen works.
> > 
> > This commits reverts most of 75cc460a1b, and instead adds checks to
> > vPCI in order to prevent misplaced BARs from being added to the
> > hardware domain p2m.
> 
> Which makes me wonder: How do things work then? Dom0 then still can't
> access the BAR address range, can it?

It does allow access on some situations where the previous arrangement
didn't work because it wholesale disabled memory decoding for the
device.

So if it's only one BAR that's misplaced the rest will still get added
to the dom0 p2m and be accessible, because memory decoding won't be
turned off for the device.

> Plus with this adjustment, is
> ...
> 
> >  Signaling on whether BARs are mapped is tracked
> > in the vpci structure, so that misplaced BARs are not mapped, and thus
> > Xen won't attempt to unmap them when memory decoding is disabled.
> > 
> > This restores the behavior of Xen for PV dom0 to the state it was
> > previous to 75cc460a1b, while also introducing a more contained fix
> > for the vPCI BAR mapping issues.
> 
> ... this (in particular "restores the behavior") a valid description
> of this change?

Yes, it restores the previous behavior for PV dom0, as memory decoding
is no longer turned off for any devices regardless of where the BARs
are positioned.

> > Fixes: 75cc460a1b ('xen/pci: detect when BARs are not suitably positioned')
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> > ---
> > AT Citrix we have a system with a device with the following BARs:
> > 
> > BAR [0xfe010, 0xfe010] -> in a EfiMemoryMappedIO region
> > BAR [0, 0x1fff] -> not positioned, outside host bridge window
> > 
> > And memory decoding enabled by the firmware.  With the current code
> > (or any of the previous fix proposals), Xen would still disable memory
> > decoding for the device, and the system will freeze when attempting to
> > set EFI vars.
> 
> Isn't the latter (BAR at address 0) yet another problem?

It's a BAR that hasn't been positioned by the firmware AFAICT.  Which
is a bug in the firmware but shouldn't prevent Xen from booting.

In the above system address 0 is outside of the PCI host bridge
window, so even if we mapped the BAR and memory decoding for the
device was enabled accessing such BAR wouldn't work.

> I have to admit
> that I'm uncertain in how far it is a good idea to try to make Xen look
> to work on such a system ...

PV dom0 works on a system like the above prior to c/s 75cc460a1b, so I
would consider 75cc460a1b to be a regression for PV dom0 setups.

Thanks, Roger.
Jan Beulich Oct. 24, 2022, 1:59 p.m. UTC | #3
On 24.10.2022 14:45, Roger Pau Monné wrote:
> On Mon, Oct 24, 2022 at 01:19:22PM +0200, Jan Beulich wrote:
>> On 20.10.2022 11:46, Roger Pau Monne wrote:
>>> Commit 75cc460a1b added checks to ensure the position of the BARs from
>>> PCI devices don't overlap with regions defined on the memory map.
>>> When there's a collision memory decoding is left disabled for the
>>> device, assuming that dom0 will reposition the BAR if necessary and
>>> enable memory decoding.
>>>
>>> While this would be the case for devices being used by dom0, devices
>>> being used by the firmware itself that have no driver would usually be
>>> left with memory decoding disabled by dom0 if that's the state dom0
>>> found them in, and thus firmware trying to make use of them will not
>>> function correctly.
>>>
>>> The initial intent of 75cc460a1b was to prevent vPCI from creating
>>> MMIO mappings on the dom0 p2m over regions that would otherwise
>>> already have mappings established.  It's my view now that we likely
>>> went too far with 75cc460a1b, and Xen disabling memory decoding of
>>> devices (as buggy as they might be) is harmful, and reduces the set of
>>> hardware on which Xen works.
>>>
>>> This commits reverts most of 75cc460a1b, and instead adds checks to
>>> vPCI in order to prevent misplaced BARs from being added to the
>>> hardware domain p2m.
>>
>> Which makes me wonder: How do things work then? Dom0 then still can't
>> access the BAR address range, can it?
> 
> It does allow access on some situations where the previous arrangement
> didn't work because it wholesale disabled memory decoding for the
> device.
> 
> So if it's only one BAR that's misplaced the rest will still get added
> to the dom0 p2m and be accessible, because memory decoding won't be
> turned off for the device.

Right - without a per-BAR disable there can only be all or nothing. In
the end if things work with this adjustment, the problem BAR cannot
really be in use aiui. I wonder what you would propose we do if on
another system such a BAR is actually in use.

>> Plus with this adjustment, is
>> ...
>>
>>>  Signaling on whether BARs are mapped is tracked
>>> in the vpci structure, so that misplaced BARs are not mapped, and thus
>>> Xen won't attempt to unmap them when memory decoding is disabled.
>>>
>>> This restores the behavior of Xen for PV dom0 to the state it was
>>> previous to 75cc460a1b, while also introducing a more contained fix
>>> for the vPCI BAR mapping issues.
>>
>> ... this (in particular "restores the behavior") a valid description
>> of this change?
> 
> Yes, it restores the previous behavior for PV dom0, as memory decoding
> is no longer turned off for any devices regardless of where the BARs
> are positioned.

It restores one aspect of behavior but then puts in place another
restriction.

>>> Fixes: 75cc460a1b ('xen/pci: detect when BARs are not suitably positioned')
>>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>>> ---
>>> AT Citrix we have a system with a device with the following BARs:
>>>
>>> BAR [0xfe010, 0xfe010] -> in a EfiMemoryMappedIO region
>>> BAR [0, 0x1fff] -> not positioned, outside host bridge window
>>>
>>> And memory decoding enabled by the firmware.  With the current code
>>> (or any of the previous fix proposals), Xen would still disable memory
>>> decoding for the device, and the system will freeze when attempting to
>>> set EFI vars.
>>
>> Isn't the latter (BAR at address 0) yet another problem?
> 
> It's a BAR that hasn't been positioned by the firmware AFAICT.  Which
> is a bug in the firmware but shouldn't prevent Xen from booting.
> 
> In the above system address 0 is outside of the PCI host bridge
> window, so even if we mapped the BAR and memory decoding for the
> device was enabled accessing such BAR wouldn't work.

It's mere luck I would say that in this case the BAR is outside the
bridge's window. What if this was a device integrated in the root
complex?

>> I have to admit
>> that I'm uncertain in how far it is a good idea to try to make Xen look
>> to work on such a system ...
> 
> PV dom0 works on a system like the above prior to c/s 75cc460a1b, so I
> would consider 75cc460a1b to be a regression for PV dom0 setups.

Agreed, in a way it is a regression. In another way it is deliberate
behavior to not accept bogus configurations. The difficulty is to
find a reasonable balance between allowing Xen to work in such cases
and guarding Xen from suffering follow-on issues resulting from such
misconfiguration. After all if this system later was impacted by the
bad BAR(s), connecting the misbehavior to the root cause might end
up quite a bit more difficult.

Jan
Roger Pau Monne Oct. 24, 2022, 3:45 p.m. UTC | #4
On Mon, Oct 24, 2022 at 03:59:18PM +0200, Jan Beulich wrote:
> On 24.10.2022 14:45, Roger Pau Monné wrote:
> > On Mon, Oct 24, 2022 at 01:19:22PM +0200, Jan Beulich wrote:
> >> On 20.10.2022 11:46, Roger Pau Monne wrote:
> >>> Commit 75cc460a1b added checks to ensure the position of the BARs from
> >>> PCI devices don't overlap with regions defined on the memory map.
> >>> When there's a collision memory decoding is left disabled for the
> >>> device, assuming that dom0 will reposition the BAR if necessary and
> >>> enable memory decoding.
> >>>
> >>> While this would be the case for devices being used by dom0, devices
> >>> being used by the firmware itself that have no driver would usually be
> >>> left with memory decoding disabled by dom0 if that's the state dom0
> >>> found them in, and thus firmware trying to make use of them will not
> >>> function correctly.
> >>>
> >>> The initial intent of 75cc460a1b was to prevent vPCI from creating
> >>> MMIO mappings on the dom0 p2m over regions that would otherwise
> >>> already have mappings established.  It's my view now that we likely
> >>> went too far with 75cc460a1b, and Xen disabling memory decoding of
> >>> devices (as buggy as they might be) is harmful, and reduces the set of
> >>> hardware on which Xen works.
> >>>
> >>> This commits reverts most of 75cc460a1b, and instead adds checks to
> >>> vPCI in order to prevent misplaced BARs from being added to the
> >>> hardware domain p2m.
> >>
> >> Which makes me wonder: How do things work then? Dom0 then still can't
> >> access the BAR address range, can it?
> > 
> > It does allow access on some situations where the previous arrangement
> > didn't work because it wholesale disabled memory decoding for the
> > device.
> > 
> > So if it's only one BAR that's misplaced the rest will still get added
> > to the dom0 p2m and be accessible, because memory decoding won't be
> > turned off for the device.
> 
> Right - without a per-BAR disable there can only be all or nothing. In
> the end if things work with this adjustment, the problem BAR cannot
> really be in use aiui. I wonder what you would propose we do if on
> another system such a BAR is actually in use.

dom0 would have to change the position of the BAR to a suitable place
and then use it.  Linux dom0 does already reposition bogus BARs of
devices.

> >> Plus with this adjustment, is
> >> ...
> >>
> >>>  Signaling on whether BARs are mapped is tracked
> >>> in the vpci structure, so that misplaced BARs are not mapped, and thus
> >>> Xen won't attempt to unmap them when memory decoding is disabled.
> >>>
> >>> This restores the behavior of Xen for PV dom0 to the state it was
> >>> previous to 75cc460a1b, while also introducing a more contained fix
> >>> for the vPCI BAR mapping issues.
> >>
> >> ... this (in particular "restores the behavior") a valid description
> >> of this change?
> > 
> > Yes, it restores the previous behavior for PV dom0, as memory decoding
> > is no longer turned off for any devices regardless of where the BARs
> > are positioned.
> 
> It restores one aspect of behavior but then puts in place another
> restriction.

I assume the other restriction is about moving the check to vPCI code
rather than disabling memory decoding?

It restores the behavior for PV dom0, and keeps a more 'contained' fix
for PVH dom0.

> 
> >>> Fixes: 75cc460a1b ('xen/pci: detect when BARs are not suitably positioned')
> >>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> >>> ---
> >>> AT Citrix we have a system with a device with the following BARs:
> >>>
> >>> BAR [0xfe010, 0xfe010] -> in a EfiMemoryMappedIO region
> >>> BAR [0, 0x1fff] -> not positioned, outside host bridge window
> >>>
> >>> And memory decoding enabled by the firmware.  With the current code
> >>> (or any of the previous fix proposals), Xen would still disable memory
> >>> decoding for the device, and the system will freeze when attempting to
> >>> set EFI vars.
> >>
> >> Isn't the latter (BAR at address 0) yet another problem?
> > 
> > It's a BAR that hasn't been positioned by the firmware AFAICT.  Which
> > is a bug in the firmware but shouldn't prevent Xen from booting.
> > 
> > In the above system address 0 is outside of the PCI host bridge
> > window, so even if we mapped the BAR and memory decoding for the
> > device was enabled accessing such BAR wouldn't work.
> 
> It's mere luck I would say that in this case the BAR is outside the
> bridge's window. What if this was a device integrated in the root
> complex?

I would expect dom0 to reposition the BAR, but doesn't a root complex
also have a set of windows in decodes accesses from (as listed in ACPI
_CRS method for the device), and hence still need BARs to be
positioned at certain ranges in order to be accessible?

> >> I have to admit
> >> that I'm uncertain in how far it is a good idea to try to make Xen look
> >> to work on such a system ...
> > 
> > PV dom0 works on a system like the above prior to c/s 75cc460a1b, so I
> > would consider 75cc460a1b to be a regression for PV dom0 setups.
> 
> Agreed, in a way it is a regression. In another way it is deliberate
> behavior to not accept bogus configurations. The difficulty is to
> find a reasonable balance between allowing Xen to work in such cases
> and guarding Xen from suffering follow-on issues resulting from such
> misconfiguration. After all if this system later was impacted by the
> bad BAR(s), connecting the misbehavior to the root cause might end
> up quite a bit more difficult.

IMO we should strive to boot (almost?) everywhere Linux (or your
chosen dom0 OS) also boots, since that's what users expect.

I would assume if the system was impacted by the bad BARs, it would
also affect the dom0 OS when booting natively on such platform.

What we do right now with memory decoding already leads to a very
difficult to diagnose issue, as on the above example calling an UEFI
runtime method completely freezes the box (no debug keys, no watchdog
worked).

So I think leaving the system PCI devices as-is and letting dom0 deal
with the conflicts is likely a better option than playing with the
memory decoding bits.

Thanks, Roger.
Jan Beulich Oct. 24, 2022, 3:56 p.m. UTC | #5
On 24.10.2022 17:45, Roger Pau Monné wrote:
> On Mon, Oct 24, 2022 at 03:59:18PM +0200, Jan Beulich wrote:
>> On 24.10.2022 14:45, Roger Pau Monné wrote:
>>> On Mon, Oct 24, 2022 at 01:19:22PM +0200, Jan Beulich wrote:
>>>> On 20.10.2022 11:46, Roger Pau Monne wrote:
>>>>> Commit 75cc460a1b added checks to ensure the position of the BARs from
>>>>> PCI devices don't overlap with regions defined on the memory map.
>>>>> When there's a collision memory decoding is left disabled for the
>>>>> device, assuming that dom0 will reposition the BAR if necessary and
>>>>> enable memory decoding.
>>>>>
>>>>> While this would be the case for devices being used by dom0, devices
>>>>> being used by the firmware itself that have no driver would usually be
>>>>> left with memory decoding disabled by dom0 if that's the state dom0
>>>>> found them in, and thus firmware trying to make use of them will not
>>>>> function correctly.
>>>>>
>>>>> The initial intent of 75cc460a1b was to prevent vPCI from creating
>>>>> MMIO mappings on the dom0 p2m over regions that would otherwise
>>>>> already have mappings established.  It's my view now that we likely
>>>>> went too far with 75cc460a1b, and Xen disabling memory decoding of
>>>>> devices (as buggy as they might be) is harmful, and reduces the set of
>>>>> hardware on which Xen works.
>>>>>
>>>>> This commits reverts most of 75cc460a1b, and instead adds checks to
>>>>> vPCI in order to prevent misplaced BARs from being added to the
>>>>> hardware domain p2m.
>>>>
>>>> Which makes me wonder: How do things work then? Dom0 then still can't
>>>> access the BAR address range, can it?
>>>
>>> It does allow access on some situations where the previous arrangement
>>> didn't work because it wholesale disabled memory decoding for the
>>> device.
>>>
>>> So if it's only one BAR that's misplaced the rest will still get added
>>> to the dom0 p2m and be accessible, because memory decoding won't be
>>> turned off for the device.
>>
>> Right - without a per-BAR disable there can only be all or nothing. In
>> the end if things work with this adjustment, the problem BAR cannot
>> really be in use aiui. I wonder what you would propose we do if on
>> another system such a BAR is actually in use.
> 
> dom0 would have to change the position of the BAR to a suitable place
> and then use it.  Linux dom0 does already reposition bogus BARs of
> devices.

Yet that still can't realistically work if the firmware expects the
BAR at the address recorded in the EFI memory map entry.

>>>> Plus with this adjustment, is
>>>> ...
>>>>
>>>>>  Signaling on whether BARs are mapped is tracked
>>>>> in the vpci structure, so that misplaced BARs are not mapped, and thus
>>>>> Xen won't attempt to unmap them when memory decoding is disabled.
>>>>>
>>>>> This restores the behavior of Xen for PV dom0 to the state it was
>>>>> previous to 75cc460a1b, while also introducing a more contained fix
>>>>> for the vPCI BAR mapping issues.
>>>>
>>>> ... this (in particular "restores the behavior") a valid description
>>>> of this change?
>>>
>>> Yes, it restores the previous behavior for PV dom0, as memory decoding
>>> is no longer turned off for any devices regardless of where the BARs
>>> are positioned.
>>
>> It restores one aspect of behavior but then puts in place another
>> restriction.
> 
> I assume the other restriction is about moving the check to vPCI code
> rather than disabling memory decoding?
> 
> It restores the behavior for PV dom0, and keeps a more 'contained' fix
> for PVH dom0.

Ouch, I'm sorry: I didn't pay attention to the "restore" applying to PV
(the similarity of the acronyms made me read "PVH" despite it being "PV").

>>>>> Fixes: 75cc460a1b ('xen/pci: detect when BARs are not suitably positioned')
>>>>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>>>>> ---
>>>>> AT Citrix we have a system with a device with the following BARs:
>>>>>
>>>>> BAR [0xfe010, 0xfe010] -> in a EfiMemoryMappedIO region
>>>>> BAR [0, 0x1fff] -> not positioned, outside host bridge window
>>>>>
>>>>> And memory decoding enabled by the firmware.  With the current code
>>>>> (or any of the previous fix proposals), Xen would still disable memory
>>>>> decoding for the device, and the system will freeze when attempting to
>>>>> set EFI vars.
>>>>
>>>> Isn't the latter (BAR at address 0) yet another problem?
>>>
>>> It's a BAR that hasn't been positioned by the firmware AFAICT.  Which
>>> is a bug in the firmware but shouldn't prevent Xen from booting.
>>>
>>> In the above system address 0 is outside of the PCI host bridge
>>> window, so even if we mapped the BAR and memory decoding for the
>>> device was enabled accessing such BAR wouldn't work.
>>
>> It's mere luck I would say that in this case the BAR is outside the
>> bridge's window. What if this was a device integrated in the root
>> complex?
> 
> I would expect dom0 to reposition the BAR, but doesn't a root complex
> also have a set of windows in decodes accesses from (as listed in ACPI
> _CRS method for the device), and hence still need BARs to be
> positioned at certain ranges in order to be accessible?

Possibly; I guess I haven't learned enough of how this works at the
root complex. Yet still an unassigned BAR might end up overlapping a
valid window.

>>>> I have to admit
>>>> that I'm uncertain in how far it is a good idea to try to make Xen look
>>>> to work on such a system ...
>>>
>>> PV dom0 works on a system like the above prior to c/s 75cc460a1b, so I
>>> would consider 75cc460a1b to be a regression for PV dom0 setups.
>>
>> Agreed, in a way it is a regression. In another way it is deliberate
>> behavior to not accept bogus configurations. The difficulty is to
>> find a reasonable balance between allowing Xen to work in such cases
>> and guarding Xen from suffering follow-on issues resulting from such
>> misconfiguration. After all if this system later was impacted by the
>> bad BAR(s), connecting the misbehavior to the root cause might end
>> up quite a bit more difficult.
> 
> IMO we should strive to boot (almost?) everywhere Linux (or your
> chosen dom0 OS) also boots, since that's what users expect.
> 
> I would assume if the system was impacted by the bad BARs, it would
> also affect the dom0 OS when booting natively on such platform.
> 
> What we do right now with memory decoding already leads to a very
> difficult to diagnose issue, as on the above example calling an UEFI
> runtime method completely freezes the box (no debug keys, no watchdog
> worked).
> 
> So I think leaving the system PCI devices as-is and letting dom0 deal
> with the conflicts is likely a better option than playing with the
> memory decoding bits.

Maybe. None of the workarounds really feel very good.

Jan
Roger Pau Monne Oct. 24, 2022, 4:24 p.m. UTC | #6
On Mon, Oct 24, 2022 at 05:56:25PM +0200, Jan Beulich wrote:
> On 24.10.2022 17:45, Roger Pau Monné wrote:
> > On Mon, Oct 24, 2022 at 03:59:18PM +0200, Jan Beulich wrote:
> >> On 24.10.2022 14:45, Roger Pau Monné wrote:
> >>> On Mon, Oct 24, 2022 at 01:19:22PM +0200, Jan Beulich wrote:
> >>>> On 20.10.2022 11:46, Roger Pau Monne wrote:
> >>>>> Commit 75cc460a1b added checks to ensure the position of the BARs from
> >>>>> PCI devices don't overlap with regions defined on the memory map.
> >>>>> When there's a collision memory decoding is left disabled for the
> >>>>> device, assuming that dom0 will reposition the BAR if necessary and
> >>>>> enable memory decoding.
> >>>>>
> >>>>> While this would be the case for devices being used by dom0, devices
> >>>>> being used by the firmware itself that have no driver would usually be
> >>>>> left with memory decoding disabled by dom0 if that's the state dom0
> >>>>> found them in, and thus firmware trying to make use of them will not
> >>>>> function correctly.
> >>>>>
> >>>>> The initial intent of 75cc460a1b was to prevent vPCI from creating
> >>>>> MMIO mappings on the dom0 p2m over regions that would otherwise
> >>>>> already have mappings established.  It's my view now that we likely
> >>>>> went too far with 75cc460a1b, and Xen disabling memory decoding of
> >>>>> devices (as buggy as they might be) is harmful, and reduces the set of
> >>>>> hardware on which Xen works.
> >>>>>
> >>>>> This commits reverts most of 75cc460a1b, and instead adds checks to
> >>>>> vPCI in order to prevent misplaced BARs from being added to the
> >>>>> hardware domain p2m.
> >>>>
> >>>> Which makes me wonder: How do things work then? Dom0 then still can't
> >>>> access the BAR address range, can it?
> >>>
> >>> It does allow access on some situations where the previous arrangement
> >>> didn't work because it wholesale disabled memory decoding for the
> >>> device.
> >>>
> >>> So if it's only one BAR that's misplaced the rest will still get added
> >>> to the dom0 p2m and be accessible, because memory decoding won't be
> >>> turned off for the device.
> >>
> >> Right - without a per-BAR disable there can only be all or nothing. In
> >> the end if things work with this adjustment, the problem BAR cannot
> >> really be in use aiui. I wonder what you would propose we do if on
> >> another system such a BAR is actually in use.
> > 
> > dom0 would have to change the position of the BAR to a suitable place
> > and then use it.  Linux dom0 does already reposition bogus BARs of
> > devices.
> 
> Yet that still can't realistically work if the firmware expects the
> BAR at the address recorded in the EFI memory map entry.

I was thinking about the BAR at address 0, rather than the BAR in the
EfiMemoryMappedIO region.

dom0 OS would need to avoid moving it, but that would also apply when
running natively on the platform.  The behavior when running as a dom0
won't change vs the native behavior, which is what we should aim for.

> >>>>> Fixes: 75cc460a1b ('xen/pci: detect when BARs are not suitably positioned')
> >>>>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> >>>>> ---
> >>>>> AT Citrix we have a system with a device with the following BARs:
> >>>>>
> >>>>> BAR [0xfe010, 0xfe010] -> in a EfiMemoryMappedIO region
> >>>>> BAR [0, 0x1fff] -> not positioned, outside host bridge window
> >>>>>
> >>>>> And memory decoding enabled by the firmware.  With the current code
> >>>>> (or any of the previous fix proposals), Xen would still disable memory
> >>>>> decoding for the device, and the system will freeze when attempting to
> >>>>> set EFI vars.
> >>>>
> >>>> Isn't the latter (BAR at address 0) yet another problem?
> >>>
> >>> It's a BAR that hasn't been positioned by the firmware AFAICT.  Which
> >>> is a bug in the firmware but shouldn't prevent Xen from booting.
> >>>
> >>> In the above system address 0 is outside of the PCI host bridge
> >>> window, so even if we mapped the BAR and memory decoding for the
> >>> device was enabled accessing such BAR wouldn't work.
> >>
> >> It's mere luck I would say that in this case the BAR is outside the
> >> bridge's window. What if this was a device integrated in the root
> >> complex?
> > 
> > I would expect dom0 to reposition the BAR, but doesn't a root complex
> > also have a set of windows in decodes accesses from (as listed in ACPI
> > _CRS method for the device), and hence still need BARs to be
> > positioned at certain ranges in order to be accessible?
> 
> Possibly; I guess I haven't learned enough of how this works at the
> root complex. Yet still an unassigned BAR might end up overlapping a
> valid window.

Right, but if the BAR overlaps a valid window it could be seen as
correctly positioned, and in any case that would be for dom0 to deal
with.

What we care about is BARs no overlapping regions on the memory map,
so that we can setup a valid p2m for dom0.

> >>>> I have to admit
> >>>> that I'm uncertain in how far it is a good idea to try to make Xen look
> >>>> to work on such a system ...
> >>>
> >>> PV dom0 works on a system like the above prior to c/s 75cc460a1b, so I
> >>> would consider 75cc460a1b to be a regression for PV dom0 setups.
> >>
> >> Agreed, in a way it is a regression. In another way it is deliberate
> >> behavior to not accept bogus configurations. The difficulty is to
> >> find a reasonable balance between allowing Xen to work in such cases
> >> and guarding Xen from suffering follow-on issues resulting from such
> >> misconfiguration. After all if this system later was impacted by the
> >> bad BAR(s), connecting the misbehavior to the root cause might end
> >> up quite a bit more difficult.
> > 
> > IMO we should strive to boot (almost?) everywhere Linux (or your
> > chosen dom0 OS) also boots, since that's what users expect.
> > 
> > I would assume if the system was impacted by the bad BARs, it would
> > also affect the dom0 OS when booting natively on such platform.
> > 
> > What we do right now with memory decoding already leads to a very
> > difficult to diagnose issue, as on the above example calling an UEFI
> > runtime method completely freezes the box (no debug keys, no watchdog
> > worked).
> > 
> > So I think leaving the system PCI devices as-is and letting dom0 deal
> > with the conflicts is likely a better option than playing with the
> > memory decoding bits.
> 
> Maybe. None of the workarounds really feel very good.

Hence this last suggestion which limits the workarounds to PVH dom0
only, thus limiting the interaction of Xen with PCI devices as much as
possible.  I think it's an appropriate compromise between being able
to boot as PVH dom0 and not playing with the PCI device memory
decoding bits.

Thanks, Roger.
diff mbox series

Patch

diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index 149f68bb6e..b42acb8d7c 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -233,9 +233,6 @@  static void check_pdev(const struct pci_dev *pdev)
      PCI_STATUS_REC_TARGET_ABORT | PCI_STATUS_REC_MASTER_ABORT | \
      PCI_STATUS_SIG_SYSTEM_ERROR | PCI_STATUS_DETECTED_PARITY)
     u16 val;
-    unsigned int nbars = 0, rom_pos = 0, i;
-    static const char warn[] = XENLOG_WARNING
-        "%pp disabled: %sBAR [%#lx, %#lx] overlaps with memory map\n";
 
     if ( command_mask )
     {
@@ -254,8 +251,6 @@  static void check_pdev(const struct pci_dev *pdev)
     switch ( pci_conf_read8(pdev->sbdf, PCI_HEADER_TYPE) & 0x7f )
     {
     case PCI_HEADER_TYPE_BRIDGE:
-        nbars = PCI_HEADER_BRIDGE_NR_BARS;
-        rom_pos = PCI_ROM_ADDRESS1;
         if ( !bridge_ctl_mask )
             break;
         val = pci_conf_read16(pdev->sbdf, PCI_BRIDGE_CONTROL);
@@ -272,75 +267,11 @@  static void check_pdev(const struct pci_dev *pdev)
         }
         break;
 
-    case PCI_HEADER_TYPE_NORMAL:
-        nbars = PCI_HEADER_NORMAL_NR_BARS;
-        rom_pos = PCI_ROM_ADDRESS;
-        break;
-
     case PCI_HEADER_TYPE_CARDBUS:
         /* TODO */
         break;
     }
 #undef PCI_STATUS_CHECK
-
-    /* Check if BARs overlap with other memory regions. */
-    val = pci_conf_read16(pdev->sbdf, PCI_COMMAND);
-    if ( !(val & PCI_COMMAND_MEMORY) || pdev->ignore_bars )
-        return;
-
-    pci_conf_write16(pdev->sbdf, PCI_COMMAND, val & ~PCI_COMMAND_MEMORY);
-    for ( i = 0; i < nbars; )
-    {
-        uint64_t addr, size;
-        unsigned int reg = PCI_BASE_ADDRESS_0 + i * 4;
-        int rc = 1;
-
-        if ( (pci_conf_read32(pdev->sbdf, reg) & PCI_BASE_ADDRESS_SPACE) !=
-             PCI_BASE_ADDRESS_SPACE_MEMORY )
-            goto next;
-
-        rc = pci_size_mem_bar(pdev->sbdf, reg, &addr, &size,
-                              (i == nbars - 1) ? PCI_BAR_LAST : 0);
-        if ( rc < 0 )
-            /* Unable to size, better leave memory decoding disabled. */
-            return;
-        if ( size && !pci_check_bar(pdev, maddr_to_mfn(addr),
-                                    maddr_to_mfn(addr + size - 1)) )
-        {
-            /*
-             * Return without enabling memory decoding if BAR position is not
-             * in IO suitable memory. Let the hardware domain re-position the
-             * BAR.
-             */
-            printk(warn,
-                   &pdev->sbdf, "", PFN_DOWN(addr), PFN_DOWN(addr + size - 1));
-            return;
-        }
-
- next:
-        ASSERT(rc > 0);
-        i += rc;
-    }
-
-    if ( rom_pos &&
-         (pci_conf_read32(pdev->sbdf, rom_pos) & PCI_ROM_ADDRESS_ENABLE) )
-    {
-        uint64_t addr, size;
-        int rc = pci_size_mem_bar(pdev->sbdf, rom_pos, &addr, &size,
-                                  PCI_BAR_ROM);
-
-        if ( rc < 0 )
-            return;
-        if ( size && !pci_check_bar(pdev, maddr_to_mfn(addr),
-                                    maddr_to_mfn(addr + size - 1)) )
-        {
-            printk(warn, &pdev->sbdf, "ROM ", PFN_DOWN(addr),
-                   PFN_DOWN(addr + size - 1));
-            return;
-        }
-    }
-
-    pci_conf_write16(pdev->sbdf, PCI_COMMAND, val);
 }
 
 static void apply_quirks(struct pci_dev *pdev)
diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index eb9219a49a..4d7f8f4a30 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -121,7 +121,9 @@  static void modify_decoding(const struct pci_dev *pdev, uint16_t cmd,
         }
 
         if ( !rom_only &&
-             (bar->type != VPCI_BAR_ROM || header->rom_enabled) )
+             (bar->type != VPCI_BAR_ROM || header->rom_enabled) &&
+             pci_check_bar(pdev, _mfn(PFN_DOWN(bar->addr)),
+                           _mfn(PFN_DOWN(bar->addr + bar->size - 1))) )
             bar->enabled = map;
     }
 
@@ -234,9 +236,25 @@  static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
 
         if ( !MAPPABLE_BAR(bar) ||
              (rom_only ? bar->type != VPCI_BAR_ROM
-                       : (bar->type == VPCI_BAR_ROM && !header->rom_enabled)) )
+                       : (bar->type == VPCI_BAR_ROM && !header->rom_enabled)) ||
+             /* Skip BARs already in the requested state. */
+             bar->enabled == !!(cmd & PCI_COMMAND_MEMORY) )
             continue;
 
+        /*
+         * Only do BAR position checks for the hardware domain, for guests it's
+         * assumed that the hardware domain has changed the position of any
+         * problematic BARs.
+         */
+        if ( is_hardware_domain(pdev->domain) &&
+             !pci_check_bar(pdev, _mfn(start), _mfn(end)) )
+        {
+            printk(XENLOG_G_WARNING
+                   "%pp: not mapping BAR [%lx, %lx] invalid position\n",
+                   &pdev->sbdf, start, end);
+            continue;
+        }
+
         rc = rangeset_add_range(mem, start, end);
         if ( rc )
         {