Message ID | 20170323205342.GB23612@bhelgaas-glaptop.roam.corp.google.com (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
Headers | show |
On Thu, Mar 23, 2017 at 03:53:42PM -0500, Bjorn Helgaas wrote: > Hi Yongji, > > On Wed, Feb 15, 2017 at 02:45:05PM +0800, Yongji Xie wrote: > > When vfio passthroughs a PCI device of which MMIO BARs are > > smaller than PAGE_SIZE, guest will not handle the mmio > > accesses to the BARs which leads to mmio emulations in host. > > > > This is because vfio will not allow to passthrough one BAR's > > mmio page which may be shared with other BARs. Otherwise, > > there will be a backdoor that guest can use to access BARs > > of other guest. > > Please include a pointer to the VFIO code that enforces this. It's > not obvious to me how it would do this. This doesn't change the > *size* of the resource, only the alignment. So if VFIO sees a BAR > like [mem 0x80000000-0x800000ff], it knows the BAR is aligned enough > that it *could* be the only thing on a page, but I don't know how it > could know that there will never be another BAR at 0x80000100. Even > if there's no other BAR on that page *now*, it would have to know that > no hot-added device will ever be placed on that page. Never mind, I found it. I updated the changelog like this; please correct anything I got wrong: When VFIO passes through a PCI device to a guest, it does not allow the guest to mmap BARs that are smaller than PAGE_SIZE unless it can reserve the rest of the page (see vfio_pci_probe_mmaps()). This is because a page might contain several small BARs for unrelated devices and a guest should not be able to access all of them. VFIO emulates guest accesses to non-mappable BARs, which is functional but slow. On systems with large page sizes, e.g., PowerNV with 64K pages, BARs are more likely to share a page and performance is more likely to be a problem. Add a macro to set default alignment for all PCI devices. An arch can set this to PAGE_SIZE to force the PCI core to place memory BARs on their own pages.
On Thu, Mar 23, 2017 at 03:53:42PM -0500, Bjorn Helgaas wrote: > Hi Yongji, > > On Wed, Feb 15, 2017 at 02:45:05PM +0800, Yongji Xie wrote: >> When vfio passthroughs a PCI device of which MMIO BARs are >> smaller than PAGE_SIZE, guest will not handle the mmio >> accesses to the BARs which leads to mmio emulations in host. >> >> This is because vfio will not allow to passthrough one BAR's >> mmio page which may be shared with other BARs. Otherwise, >> there will be a backdoor that guest can use to access BARs >> of other guest. > > Please include a pointer to the VFIO code that enforces this. It's > not obvious to me how it would do this. This doesn't change the > *size* of the resource, only the alignment. So if VFIO sees a BAR > like [mem 0x80000000-0x800000ff], it knows the BAR is aligned enough > that it *could* be the only thing on a page, but I don't know how it > could know that there will never be another BAR at 0x80000100. Even > if there's no other BAR on that page *now*, it would have to know that > no hot-added device will ever be placed on that page. > >> This patch adds a macro to set default alignment for all >> PCI devices. Then we could solve this issue on some platforms >> which would easily hit this issue because of their 64K page >> such as PowerNV platform by defining this macro as PAGE_SIZE. >> >> Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com> >> --- >> arch/powerpc/include/asm/pci.h | 4 ++++ >> drivers/pci/pci.c | 3 +++ >> 2 files changed, 7 insertions(+), 0 deletions(-) >> >> diff --git a/arch/powerpc/include/asm/pci.h b/arch/powerpc/include/asm/pci.h >> index e9bd6cf..5e31bc2 100644 >> --- a/arch/powerpc/include/asm/pci.h >> +++ b/arch/powerpc/include/asm/pci.h >> @@ -28,6 +28,10 @@ >> #define PCIBIOS_MIN_IO 0x1000 >> #define PCIBIOS_MIN_MEM 0x10000000 >> >> +#ifdef CONFIG_PPC_POWERNV >> +#define PCIBIOS_DEFAULT_ALIGNMENT PAGE_SIZE >> +#endif >> + >> struct pci_dev; >> >> /* Values for the `which' argument to sys_pciconfig_iobase syscall. */ >> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c >> index a881c0d..2622e9b 100644 >> --- a/drivers/pci/pci.c >> +++ b/drivers/pci/pci.c >> @@ -4965,6 +4965,9 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev) >> resource_size_t align = 0; >> char *p; >> >> +#ifdef PCIBIOS_DEFAULT_ALIGNMENT >> + align = PCIBIOS_DEFAULT_ALIGNMENT; >> +#endif > > I'd prefer to move this #ifdef out of the code path, as in the patch > below. > > I don't know how powerpc kernels are built: can you build a kernel > that will run on PowerNV as well as on different powerpc systems? If > so, is it acceptable to force that kernel to user 64K alignment even > when it's running on non-PowerNV systems? Or do you need a function > call here instead of a #define? > That's a good point, the macro may be also defined on non-PowerNV systems. Maybe an arch-specific function is more reasonable here. Thanks, Yongji
Bjorn Helgaas <helgaas@kernel.org> writes: > > I don't know how powerpc kernels are built: can you build a kernel > that will run on PowerNV as well as on different powerpc systems? Yes you can. But there are some restrictions. You can only build a 64-bit OR a 32-bit kernel. And within 64-bit you can only build a *kernel* for the "server architecture (~= IBM CPUs)" or the "embedded architecture (Freescale/NXP)". So in practice you can build a 64-bit kernel that supports: - powernv (bare metal on IBM server chips) - pseries (virtualised on IBM server chips or qemu) - powermac (Apple G5s) - pasemi (long dead but still used by some Amiga folks?) - Cell/PS3 (also long dead) > If so, is it acceptable to force that kernel to user 64K alignment even > when it's running on non-PowerNV systems? Probably, but I'm not sure TBH. Benh will know, I'll try and get his attention. cheers
On Mon, 2017-03-27 at 21:17 +1100, Michael Ellerman wrote: > > If so, is it acceptable to force that kernel to user 64K alignment > > even > > when it's running on non-PowerNV systems? > > Probably, but I'm not sure TBH. Benh will know, I'll try and get his > attention. Can we make the alignment PAGE_SIZE ? I think that should be ok as long as it doesn't try to re-allocate BARs for devices that have already been properly allocated by firmware (ie, PowerMac, if you mess around with the MacIO chip on these, bad things will happen). Cheers, Ben.
On Mon, Mar 27, 2017 at 09:25:50PM +1100, Benjamin Herrenschmidt wrote: > On Mon, 2017-03-27 at 21:17 +1100, Michael Ellerman wrote: > > > If so, is it acceptable to force that kernel to user 64K alignment > > > even > > > when it's running on non-PowerNV systems? > > > > Probably, but I'm not sure TBH. Benh will know, I'll try and get his > > attention. > > Can we make the alignment PAGE_SIZE ? I think that should be ok as long > as it doesn't try to re-allocate BARs for devices that have already > been properly allocated by firmware (ie, PowerMac, if you mess around > with the MacIO chip on these, bad things will happen). I think this *will* change BARs even if they've already been allocated by firmware, because it affects this path that is used for every device: pci_device_add pci_reassigndev_resource_alignment pci_specified_resource_alignment align = PCIBIOS_DEFAULT_ALIGNMENT I'm looking for an update that fixes the minor comment issues and possibly addresses this PowerMac question. Bjorn
diff --git a/arch/powerpc/include/asm/pci.h b/arch/powerpc/include/asm/pci.h index 93eded8d3843..dc3c81ab4cc4 100644 --- a/arch/powerpc/include/asm/pci.h +++ b/arch/powerpc/include/asm/pci.h @@ -28,6 +28,10 @@ #define PCIBIOS_MIN_IO 0x1000 #define PCIBIOS_MIN_MEM 0x10000000 +#ifdef CONFIG_PPC_POWERNV +#define PCIBIOS_DEFAULT_ALIGNMENT PAGE_SIZE +#endif + struct pci_dev; /* Values for the `which' argument to sys_pciconfig_iobase syscall. */ diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 7904d02ffdb9..e9a079063fd0 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -4962,7 +4962,7 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev) { int seg, bus, slot, func, align_order, count; unsigned short vendor, device, subsystem_vendor, subsystem_device; - resource_size_t align = 0; + resource_size_t align = PCIBIOS_DEFAULT_ALIGNMENT; char *p; spin_lock(&resource_alignment_lock); diff --git a/include/linux/pci.h b/include/linux/pci.h index eb3da1a04e6c..618beb5809d1 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -1630,6 +1630,10 @@ static inline int pci_get_new_domain_nr(void) { return -ENOSYS; } #define pci_root_bus_fwnode(bus) NULL #endif +#ifndef PCIBIOS_DEFAULT_ALIGNMENT +#define PCIBIOS_DEFAULT_ALIGNMENT 0 +#endif + /* these helpers provide future and backwards compatibility * for accessing popular PCI BAR info */ #define pci_resource_start(dev, bar) ((dev)->resource[(bar)].start)