[v2,0/8] PCI: Align small (<4k) BARs

Message ID 20240716193246.1909697-1-stewart.hildebrand@amd.com (mailing list archive)

Message

Stewart Hildebrand July 16, 2024, 7:32 p.m. UTC
This series sets the default minimum resource alignment to 4k for memory
BARs. In preparation, it makes an optimization and addresses some corner
cases observed when reallocating BARs. I consider the preparatory
patches to be prerequisites to changing the default BAR alignment.

I considered introducing checks for the specific scenarios described,
but chose not to pursue this. A check such as "if (xen_domain())" may be
pretty simple, but that doesn't account for other hypervisors. If other
hypervisors are to be considered, or if we try to dynamically reallocate
BARs for devices being marked for passthrough, such a check may quickly
grow unwieldy. Further, checking for the MSI-X tables residing in a
small (<4k) BAR is unlikely to be a one-liner. Making 4k alignment the
default seems more robust. Lastly, when using IORESOURCE_STARTALIGN, all
resources in the system need to be aligned.

I considered alternatively adding new functionality to the
pci=resource_alignment= option, but that approach was already attempted
and decided against [1].

[1] https://lore.kernel.org/linux-pci/1473757234-5284-4-git-send-email-xyjxie@linux.vnet.ibm.com/

v1->v2:
* rename ("PCI: don't clear already cleared bit") to
  ("PCI: Don't unnecessarily disable memory decoding")
* new patch: ("x86/PCI: Move some logic to new function")
* new patch: ("powerpc/pci: Preserve IORESOURCE_STARTALIGN alignment")

Stewart Hildebrand (8):
  x86/PCI: Move some logic to new function
  PCI: Don't unnecessarily disable memory decoding
  PCI: Restore resource alignment
  PCI: Restore memory decoding after reallocation
  x86/PCI: Preserve IORESOURCE_STARTALIGN alignment
  powerpc/pci: Preserve IORESOURCE_STARTALIGN alignment
  PCI: Don't reassign resources that are already aligned
  PCI: Align small (<4k) BARs

 arch/powerpc/kernel/pci-common.c |  6 +++--
 arch/x86/pci/i386.c              | 38 +++++++++++++++------------
 drivers/pci/pci.c                | 43 +++++++++++++++++++++++--------
 drivers/pci/setup-bus.c          | 44 ++++++++++++++++++++++++++++++++
 include/linux/pci.h              |  2 ++
 5 files changed, 103 insertions(+), 30 deletions(-)

Comments

David Laight July 17, 2024, 1:15 p.m. UTC | #1
From: Stewart Hildebrand
> Sent: 16 July 2024 20:33
> 
> This series sets the default minimum resource alignment to 4k for memory
> BARs. In preparation, it makes an optimization and addresses some corner
> cases observed when reallocating BARs. I consider the preparatory
> patches to be prerequisites to changing the default BAR alignment.

Should the BARs be page aligned on systems with large pages?
At least as an option for hypervisor pass-through and anything that can be mmap()ed
into userspace.

Does any hardware actually have 'silly numbers' of small memory BARs?

I have a vague memory of some ethernet controllers having lots of (?)
virtual devices that might have separate registers that can be mapped
out to a hypervisor.
Expanding those to a large page might be problematic - but needed for security.

For more normal hardware just ensuring that two separate targets don't share
a page while allowing (e.g.) two 1k BARs to reside in the same 64k page would
give some security.

Aligning a small MSI-X BAR is unlikely to have any effect on the address
space utilisation (for PCIe) since the bridge will assign a power-of-two
sized block - with a big pad (useful for generating PCIe errors!)

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
David Laight July 17, 2024, 1:21 p.m. UTC | #2
From: David Laight
> Sent: 17 July 2024 14:15
> 
> From: Stewart Hildebrand
> > Sent: 16 July 2024 20:33
> >
> > This series sets the default minimum resource alignment to 4k for memory
> > BARs. In preparation, it makes an optimization and addresses some corner
> > cases observed when reallocating BARs. I consider the preparatory
> > patches to be prerequisites to changing the default BAR alignment.
> 
> Should the BARs be page aligned on systems with large pages?
> At least as an option for hypervisor pass-through and anything that can be mmap()ed
> into userspace.

The actual patch says PAGE_SIZE ..

	David

Stewart Hildebrand July 17, 2024, 6:30 p.m. UTC | #3
On 7/17/24 09:15, David Laight wrote:
> From: Stewart Hildebrand
>> Sent: 16 July 2024 20:33
>>
>> This series sets the default minimum resource alignment to 4k for memory
>> BARs. In preparation, it makes an optimization and addresses some corner
>> cases observed when reallocating BARs. I consider the preparatory
>> patches to be prerequisites to changing the default BAR alignment.
> 
> Should the BARs be page aligned on systems with large pages?
> At least as an option for hypervisor pass-through and anything that can be mmap()ed
> into userspace.

It is sort of an option already using the pci=resource_alignment=
option, but you'd need to spell out every device and manually set the
alignment value, and you'd still end up with fake BAR sizes. I had
actually prepared locally a patch to make this less painful to do and
preserve the BAR size (introduce "pci=resource_alignment=all" option),
but I'd like Bjorn's opinion before sending since there has been some
previous reluctance to making changes to the pci=resource_alignment=
option [2].

[2] https://lore.kernel.org/linux-pci/20160929115422.GA31048@localhost/

Anyway, 4k is more defensible because that is what the PCIe 6.1 spec
calls out, and that is better than the current situation of no default
minimum alignment.

I feel PAGE_SIZE is also justified, and that is why the actual patch now
says max(SZ_4K, PAGE_SIZE) as you pointed out elsewhere. This is a
change from v1 that simply had 4k (sorry, I forgot to update the cover
letter). PowerNV has been using PAGE_SIZE since commits 0a701aa63784 and
382746376993, I think with 64k page size. I don't have a strong opinion
whether the common default should be max(SZ_4K, PAGE_SIZE) or simply
SZ_4K.

> Does any hardware actually have 'silly numbers' of small memory BARs?
> 
> I have a vague memory of some ethernet controllers having lots of (?)
> virtual devices that might have separate registers that can be mapped
> out to a hypervisor.
> Expanding those to a large page might be problematic - but needed for security.

This series does not change alignment of SRIOV / VF BARs. See commits
62d9a78f32d9, ea629d873f3e, and PCIe 6.1 spec section 9.2.1.1.1.

> For more normal hardware just ensuring that two separate targets don't share
> a page while allowing (e.g.) two 1k BARs to reside in the same 64k page would
> give some security.

Allow me to understand this better, with an example:

PCI Device A
    BAR 1 (1k)
    BAR 2 (1k)

PCI Device B
    BAR 1 (1k)
    BAR 2 (1k)

We align all BARs to 4k. Additionally, are you saying it would be ok to
let both device A BARs reside in the same 64k page, while device B
BARs would need to reside in a separate 64k page? I.e. having two levels
of alignment: PAGE_SIZE on a per-device basis, and 4k on a per-BAR
basis?

If I understand you correctly, there's currently no logic in the PCI
subsystem to easily support this, so that is a rather large ask. I'm
also not sure that it's necessary.

> Aligning a small MSI-X BAR is unlikely to have any effect on the address
> space utilisation (for PCIe) since the bridge will assign a power-of-two
> sized block - with a big pad (useful for generating PCIe errors!)
> 
> 	David
David Laight July 18, 2024, 10:01 a.m. UTC | #4
From: Stewart Hildebrand
> Sent: 17 July 2024 19:31
...
> > For more normal hardware just ensuring that two separate targets don't share
> > a page while allowing (e.g.) two 1k BARs to reside in the same 64k page would
> > give some security.
> 
> Allow me to understand this better, with an example:
> 
> PCI Device A
>     BAR 1 (1k)
>     BAR 2 (1k)
> 
> PCI Device B
>     BAR 1 (1k)
>     BAR 2 (1k)
> 
> We align all BARs to 4k. Additionally, are you saying it would be ok to
> let both device A BARs reside in the same 64k page, while device B
> BARs would need to reside in a separate 64k page? I.e. having two levels
> of alignment: PAGE_SIZE on a per-device basis, and 4k on a per-BAR
> basis?
> 
> If I understand you correctly, there's currently no logic in the PCI
> subsystem to easily support this, so that is a rather large ask. I'm
> also not sure that it's necessary.

That is what I was thinking, but it probably doesn't matter.
It would only be necessary if the system would otherwise run out
of PCI(e) address space.

Even after I reduced our FPGA's BARs from 32MB to 'only' 4MB (1MB + 1MB + 8k)
we still get issues with some PC BIOSes failing to allocate the resources
in some slots - but these are old x86-64 systems that might have been expected
to run 32-bit Windows.
The requirement to use a separate BAR for MSI-X pretty much doubles the
required address space.

As an aside, if a PCIe device asks for:
	BAR-0 (4k)
	BAR-1 (8k)
	BAR-2 (4k)
(which is a bit silly)
does it get packed into 16k with no padding by assigning BAR-2 between
BAR-0 and BAR-1, or is it all padded out to 32k?
I'd probably add a comment to say it isn't done :-)

	David

Stewart Hildebrand July 18, 2024, 1:48 p.m. UTC | #5
On 7/18/24 06:01, David Laight wrote:
> From: Stewart Hildebrand
>> Sent: 17 July 2024 19:31
> ...
>>> For more normal hardware just ensuring that two separate targets don't share
>>> a page while allowing (e.g.) two 1k BARs to reside in the same 64k page would
>>> give some security.
>>
>> Allow me to understand this better, with an example:
>>
>> PCI Device A
>>     BAR 1 (1k)
>>     BAR 2 (1k)
>>
>> PCI Device B
>>     BAR 1 (1k)
>>     BAR 2 (1k)
>>
>> We align all BARs to 4k. Additionally, are you saying it would be ok to
>> let both device A BARs reside in the same 64k page, while device B
>> BARs would need to reside in a separate 64k page? I.e. having two levels
>> of alignment: PAGE_SIZE on a per-device basis, and 4k on a per-BAR
>> basis?
>>
>> If I understand you correctly, there's currently no logic in the PCI
>> subsystem to easily support this, so that is a rather large ask. I'm
>> also not sure that it's necessary.
> 
> That is what I was thinking, but it probably doesn't matter.
> It would only be necessary if the system would otherwise run out
> of PCI(e) address space.
> 
> Even after I reduced our FPGA's BARs from 32MB to 'only' 4MB (1MB + 1MB + 8k)
> we still get issues with some PC BIOSes failing to allocate the resources
> in some slots - but these are old x86-64 systems that might have been expected
> to run 32-bit Windows.

I expect this series will not make any difference with that particular
scenario since the BARs are >4k (and PAGE_SIZE == 4k on x86).

> The requirement to use a separate BAR for MSI-X pretty much doubles the
> required address space.

4k region, not BAR.

> As an aside, if a PCIe device asks for:
> 	BAR-0 (4k)
> 	BAR-1 (8k)
> 	BAR-2 (4k)
> (which is a bit silly)
> does it get packed into 16k with no padding by assigning BAR-2 between
> BAR-0 and BAR-1, or is it all padded out to 32k?
> I'd probably add a comment to say it isn't done :-)

On a system with 4k page size, this series should not affect the example
you've provided since those BARs are all 4k or larger.

If you are testing with this series applied to your kernel and notice
any regression, please let me know.