diff mbox

[RFC] pci: Identify Enhanced Allocation (EA) BAR Equivalent resources

Message ID 20160114172645.23429.9938.stgit@gimli.home (mailing list archive)
State New, archived
Delegated to: Bjorn Helgaas
Headers show

Commit Message

Alex Williamson Jan. 14, 2016, 5:26 p.m. UTC
We've done a pretty good job of abstracting EA from drivers, but there
are some properties of BAR Equivalent resources that don't really jive
with traditional PCI BARs.  In particular, natural alignment is only
encouraged, not required.

Why does this matter?  There are drivers like vfio-pci that will
happily gobble up the EA abstraction that's been implemented and
expose a device using EA to userspace as if those resources are
traditional BARs.  Pretty cool.  The vfio API is bus agnostic, so it
doesn't care about alignment.  The problem comes with PCI config space
emulation where we don't let userspace manipulate the BAR value, but
we do emulate BAR sizing.  The abstraction kind of falls apart if
userspace gets garbage when they try to size what appears to be a
traditional BAR, but is actually a BAR equivalent.

We could simply round up the size in vfio to make it naturally
aligned, but then we're imposing artificial sizes to the user and we
have the discontinuity that BAR size emulation and vfio region size
reporting don't agree on the size.  I think what we want to do is
expose EA to the user, reporting traditional BARs with BEIs as
zero-sized and providing additional regions for the user to access
each EA region, whether it has a BEI or not.

To facilitate that, a flag indicating whether a PCI resource is a
traditional BAR or BAR equivalent seems much nicer than attempting
to size the BAR ourselves or deducing it through the EA capability.

Thoughts?

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 drivers/pci/pci.c      |    2 +-
 include/linux/ioport.h |    2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)


--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

sostalle Jan. 14, 2016, 6:34 p.m. UTC | #1
On Thu, Jan 14, 2016 at 10:26:56AM -0700, Alex Williamson wrote:
> We've done a pretty good job of abstracting EA from drivers, but there
> are some properties of BAR Equivalent resources that don't really jive
> with traditional PCI BARs.  In particular, natural alignment is only
> encouraged, not required.
> 
> Why does this matter?  There are drivers like vfio-pci that will
> happily gobble up the EA abstraction that's been implemented and
> expose a device using EA to userspace as if those resources are
> traditional BARs.  Pretty cool.  The vfio API is bus agnostic, so it
> doesn't care about alignment.  The problem comes with PCI config space
> emulation where we don't let userspace manipulate the BAR value, but
> we do emulate BAR sizing.  The abstraction kind of falls apart if
> userspace gets garbage when they try to size what appears to be a
> traditional BAR, but is actually a BAR equivalent.
>
> We could simply round up the size in vfio to make it naturally
> aligned, but then we're imposing artificial sizes to the user and we
> have the discontinuity that BAR size emulation and vfio region size
> reporting don't agree on the size.  I think what we want to do is
> expose EA to the user, reporting traditional BARs with BEIs as
> zero-sized and providing additional regions for the user to access
> each EA region, whether it has a BEI or not.
> 
> To facilitate that, a flag indicating whether a PCI resource is a
> traditional BAR or BAR equivalent seems much nicer than attempting
> to size the BAR ourselves or deducing it through the EA capability.

If vfio does size the resource, EA entries that are aligned could
still be emulated as BARs, correct?

I would think that emulating a BAR would be preferred when possible,
for backwards-compatibility.

> Thoughts?

I like the idea of adding an EA flag.

There were some cases in the kernel where it would be nice to know if a
resource was fixed because it was EA or if something else was fixing it.
Adding that flag was discussed during the code review of the EA code,
but it was decided that we could get by without it.

IIRC, most of the cases that required the flag had to do with EA entries
for bridges. Since bridge support wasn't added, we didn't need the flag.

-Sean


> 
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> ---
>  drivers/pci/pci.c      |    2 +-
>  include/linux/ioport.h |    2 ++
>  2 files changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 314db8c..174c734 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -2229,7 +2229,7 @@ void pci_pm_init(struct pci_dev *dev)
>  
>  static unsigned long pci_ea_flags(struct pci_dev *dev, u8 prop)
>  {
> -	unsigned long flags = IORESOURCE_PCI_FIXED;
> +	unsigned long flags = IORESOURCE_PCI_FIXED | IORESOURCE_PCI_EA_BEI;
>  
>  	switch (prop) {
>  	case PCI_EA_P_MEM:
> diff --git a/include/linux/ioport.h b/include/linux/ioport.h
> index 24bea08..5acc194 100644
> --- a/include/linux/ioport.h
> +++ b/include/linux/ioport.h
> @@ -105,6 +105,8 @@ struct resource {
>  /* PCI control bits.  Shares IORESOURCE_BITS with above PCI ROM.  */
>  #define IORESOURCE_PCI_FIXED		(1<<4)	/* Do not move resource */
>  
> +/* PCI Enhanced Allocation defined BAR equivalent resource */
> +#define IORESOURCE_PCI_EA_BEI		(1<<5)
>  
>  /* helpers to define resources */
>  #define DEFINE_RES_NAMED(_start, _size, _name, _flags)			\
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Daney Jan. 14, 2016, 6:54 p.m. UTC | #2
On 01/14/2016 09:26 AM, Alex Williamson wrote:
> We've done a pretty good job of abstracting EA from drivers, but there
> are some properties of BAR Equivalent resources that don't really jive
> with traditional PCI BARs.  In particular, natural alignment is only
> encouraged, not required.
>
> Why does this matter?  There are drivers like vfio-pci that will
> happily gobble up the EA abstraction that's been implemented and
> expose a device using EA to userspace as if those resources are
> traditional BARs.  Pretty cool.  The vfio API is bus agnostic, so it
> doesn't care about alignment.  The problem comes with PCI config space
> emulation where we don't let userspace manipulate the BAR value, but
> we do emulate BAR sizing.  The abstraction kind of falls apart if
> userspace gets garbage when they try to size what appears to be a
> traditional BAR, but is actually a BAR equivalent.
>
> We could simply round up the size in vfio to make it naturally
> aligned, but then we're imposing artificial sizes to the user and we
> have the discontinuity that BAR size emulation and vfio region size
> reporting don't agree on the size.  I think what we want to do is
> expose EA to the user, reporting traditional BARs with BEIs as
> zero-sized and providing additional regions for the user to access
> each EA region, whether it has a BEI or not.
>
> To facilitate that, a flag indicating whether a PCI resource is a
> traditional BAR or BAR equivalent seems much nicer than attempting
> to size the BAR ourselves or deducing it through the EA capability.
>
> Thoughts?

Is the flag exposed to userspace in any way?

I haven't dug into what uses the flags.

One problem we have seen is with the lspci utility which cannot 
distinguish between SROIV BARs and EA provisioned BARs.

Would, or could, this be used there?

David Daney


>
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> ---
>   drivers/pci/pci.c      |    2 +-
>   include/linux/ioport.h |    2 ++
>   2 files changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 314db8c..174c734 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -2229,7 +2229,7 @@ void pci_pm_init(struct pci_dev *dev)
>
>   static unsigned long pci_ea_flags(struct pci_dev *dev, u8 prop)
>   {
> -	unsigned long flags = IORESOURCE_PCI_FIXED;
> +	unsigned long flags = IORESOURCE_PCI_FIXED | IORESOURCE_PCI_EA_BEI;
>
>   	switch (prop) {
>   	case PCI_EA_P_MEM:
> diff --git a/include/linux/ioport.h b/include/linux/ioport.h
> index 24bea08..5acc194 100644
> --- a/include/linux/ioport.h
> +++ b/include/linux/ioport.h
> @@ -105,6 +105,8 @@ struct resource {
>   /* PCI control bits.  Shares IORESOURCE_BITS with above PCI ROM.  */
>   #define IORESOURCE_PCI_FIXED		(1<<4)	/* Do not move resource */
>
> +/* PCI Enhanced Allocation defined BAR equivalent resource */
> +#define IORESOURCE_PCI_EA_BEI		(1<<5)
>
>   /* helpers to define resources */
>   #define DEFINE_RES_NAMED(_start, _size, _name, _flags)			\
>

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alex Williamson Jan. 14, 2016, 7:16 p.m. UTC | #3
On Thu, 2016-01-14 at 10:34 -0800, Sean O. Stalley wrote:
> On Thu, Jan 14, 2016 at 10:26:56AM -0700, Alex Williamson wrote:
> > We've done a pretty good job of abstracting EA from drivers, but
> > there
> > are some properties of BAR Equivalent resources that don't really
> > jive
> > with traditional PCI BARs.  In particular, natural alignment is
> > only
> > encouraged, not required.
> > 
> > Why does this matter?  There are drivers like vfio-pci that will
> > happily gobble up the EA abstraction that's been implemented and
> > expose a device using EA to userspace as if those resources are
> > traditional BARs.  Pretty cool.  The vfio API is bus agnostic, so
> > it
> > doesn't care about alignment.  The problem comes with PCI config
> > space
> > emulation where we don't let userspace manipulate the BAR value,
> > but
> > we do emulate BAR sizing.  The abstraction kind of falls apart if
> > userspace gets garbage when they try to size what appears to be a
> > traditional BAR, but is actually a BAR equivalent.
> > 
> > We could simply round up the size in vfio to make it naturally
> > aligned, but then we're imposing artificial sizes to the user and
> > we
> > have the discontinuity that BAR size emulation and vfio region size
> > reporting don't agree on the size.  I think what we want to do is
> > expose EA to the user, reporting traditional BARs with BEIs as
> > zero-sized and providing additional regions for the user to access
> > each EA region, whether it has a BEI or not.
> > 
> > To facilitate that, a flag indicating whether a PCI resource is a
> > traditional BAR or BAR equivalent seems much nicer than attempting
> > to size the BAR ourselves or deducing it through the EA capability.
> 
> If vfio does size the resource, EA entries that are aligned could
> still be emulated as BARs, correct?
> 
> I would think that emulating a BAR would be preferred when possible,
> for backwards-compatibility.

If a BEI is naturally aligned, I can't think of any problems with
exposing it as a traditional BAR to userspace.  I agree that there may
be some compatibility benefits there, so it may be useful to offer both
options.  I don't think we can combine them though, it would violate
the EA spec to expose the traditional BAR and and the matching BEI.
We'd either need to hide the fake BAR or hide the EA entry defining
that BEI.  A module option could define which is preferred or maybe an
ioctl.

> > Thoughts?
> 
> I like the idea of adding an EA flag.
> 
> There were some cases in the kernel where it would be nice to know if a
> resource was fixed because it was EA or if something else was fixing it.
> Adding that flag was discussed during the code review of the EA code,
> but it was decided that we could get by without it.
> 
> IIRC, most of the cases that required the flag had to do with EA entries
> for bridges. Since bridge support wasn't added, we didn't need the flag.

By my reading of the spec, not all BEIs need to be fixed, is this just
a simplification to avoid sizing and mapping a BAR that doesn't exist
in the traditional sense?  A flag on the resource seems like it would
be useful for that as well if we ever wanted to add the case where an
AE BAR equivalent could be remapped.  Thanks,

Alex

> > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > ---
> >  drivers/pci/pci.c      |    2 +-
> >  include/linux/ioport.h |    2 ++
> >  2 files changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index 314db8c..174c734 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -2229,7 +2229,7 @@ void pci_pm_init(struct pci_dev *dev)
> >  
> >  static unsigned long pci_ea_flags(struct pci_dev *dev, u8 prop)
> >  {
> > -	unsigned long flags = IORESOURCE_PCI_FIXED;
> > +	unsigned long flags = IORESOURCE_PCI_FIXED |
> > IORESOURCE_PCI_EA_BEI;
> >  
> >  	switch (prop) {
> >  	case PCI_EA_P_MEM:
> > diff --git a/include/linux/ioport.h b/include/linux/ioport.h
> > index 24bea08..5acc194 100644
> > --- a/include/linux/ioport.h
> > +++ b/include/linux/ioport.h
> > @@ -105,6 +105,8 @@ struct resource {
> >  /* PCI control bits.  Shares IORESOURCE_BITS with above PCI
> > ROM.  */
> >  #define IORESOURCE_PCI_FIXED		(1<<4)	/* Do
> > not move resource */
> >  
> > +/* PCI Enhanced Allocation defined BAR equivalent resource */
> > +#define IORESOURCE_PCI_EA_BEI		(1<<5)
> >  
> >  /* helpers to define resources */
> >  #define DEFINE_RES_NAMED(_start, _size, _name, _flags)		
> > 	\
> > 

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alex Williamson Jan. 14, 2016, 7:20 p.m. UTC | #4
On Thu, 2016-01-14 at 10:54 -0800, David Daney wrote:
> On 01/14/2016 09:26 AM, Alex Williamson wrote:
> > We've done a pretty good job of abstracting EA from drivers, but
> > there
> > are some properties of BAR Equivalent resources that don't really
> > jive
> > with traditional PCI BARs.  In particular, natural alignment is
> > only
> > encouraged, not required.
> > 
> > Why does this matter?  There are drivers like vfio-pci that will
> > happily gobble up the EA abstraction that's been implemented and
> > expose a device using EA to userspace as if those resources are
> > traditional BARs.  Pretty cool.  The vfio API is bus agnostic, so
> > it
> > doesn't care about alignment.  The problem comes with PCI config
> > space
> > emulation where we don't let userspace manipulate the BAR value,
> > but
> > we do emulate BAR sizing.  The abstraction kind of falls apart if
> > userspace gets garbage when they try to size what appears to be a
> > traditional BAR, but is actually a BAR equivalent.
> > 
> > We could simply round up the size in vfio to make it naturally
> > aligned, but then we're imposing artificial sizes to the user and
> > we
> > have the discontinuity that BAR size emulation and vfio region size
> > reporting don't agree on the size.  I think what we want to do is
> > expose EA to the user, reporting traditional BARs with BEIs as
> > zero-sized and providing additional regions for the user to access
> > each EA region, whether it has a BEI or not.
> > 
> > To facilitate that, a flag indicating whether a PCI resource is a
> > traditional BAR or BAR equivalent seems much nicer than attempting
> > to size the BAR ourselves or deducing it through the EA capability.
> > 
> > Thoughts?
> 
> Is the flag exposed to userspace in any way?
> 
> I haven't dug into what uses the flags.
> 
> One problem we have seen is with the lspci utility which cannot 
> distinguish between SROIV BARs and EA provisioned BARs.
> 
> Would, or could, this be used there?

Perhaps so, the flags would be exposed in sysfs in
/sys/bus/pci/devices/<device>/resource.  Three fields are printed for
each resource: start, end, and flags.  That's definitely something
lspci could consume.  Thanks,

Alex

> > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > ---
> >   drivers/pci/pci.c      |    2 +-
> >   include/linux/ioport.h |    2 ++
> >   2 files changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index 314db8c..174c734 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -2229,7 +2229,7 @@ void pci_pm_init(struct pci_dev *dev)
> > 
> >   static unsigned long pci_ea_flags(struct pci_dev *dev, u8 prop)
> >   {
> > -	unsigned long flags = IORESOURCE_PCI_FIXED;
> > +	unsigned long flags = IORESOURCE_PCI_FIXED |
> > IORESOURCE_PCI_EA_BEI;
> > 
> >   	switch (prop) {
> >   	case PCI_EA_P_MEM:
> > diff --git a/include/linux/ioport.h b/include/linux/ioport.h
> > index 24bea08..5acc194 100644
> > --- a/include/linux/ioport.h
> > +++ b/include/linux/ioport.h
> > @@ -105,6 +105,8 @@ struct resource {
> >   /* PCI control bits.  Shares IORESOURCE_BITS with above PCI
> > ROM.  */
> >   #define IORESOURCE_PCI_FIXED		(1<<4)	/* Do
> > not move resource */
> > 
> > +/* PCI Enhanced Allocation defined BAR equivalent resource */
> > +#define IORESOURCE_PCI_EA_BEI		(1<<5)
> > 
> >   /* helpers to define resources */
> >   #define DEFINE_RES_NAMED(_start, _size, _name, _flags)		
> > 	\
> > 
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
sostalle Jan. 14, 2016, 7:27 p.m. UTC | #5
On Thu, Jan 14, 2016 at 10:54:48AM -0800, David Daney wrote:
> On 01/14/2016 09:26 AM, Alex Williamson wrote:
> >We've done a pretty good job of abstracting EA from drivers, but there
> >are some properties of BAR Equivalent resources that don't really jive
> >with traditional PCI BARs.  In particular, natural alignment is only
> >encouraged, not required.
> >
> >Why does this matter?  There are drivers like vfio-pci that will
> >happily gobble up the EA abstraction that's been implemented and
> >expose a device using EA to userspace as if those resources are
> >traditional BARs.  Pretty cool.  The vfio API is bus agnostic, so it
> >doesn't care about alignment.  The problem comes with PCI config space
> >emulation where we don't let userspace manipulate the BAR value, but
> >we do emulate BAR sizing.  The abstraction kind of falls apart if
> >userspace gets garbage when they try to size what appears to be a
> >traditional BAR, but is actually a BAR equivalent.
> >
> >We could simply round up the size in vfio to make it naturally
> >aligned, but then we're imposing artificial sizes to the user and we
> >have the discontinuity that BAR size emulation and vfio region size
> >reporting don't agree on the size.  I think what we want to do is
> >expose EA to the user, reporting traditional BARs with BEIs as
> >zero-sized and providing additional regions for the user to access
> >each EA region, whether it has a BEI or not.
> >
> >To facilitate that, a flag indicating whether a PCI resource is a
> >traditional BAR or BAR equivalent seems much nicer than attempting
> >to size the BAR ourselves or deducing it through the EA capability.
> >
> >Thoughts?
> 
> Is the flag exposed to userspace in any way?

Yes. Resources are exposed through a sysfs file.
(/sys/bus/pci/devices/[Device Number]/resource)
This file contains the resource range, as well as the flags.

> I haven't dug into what uses the flags.
> 
> One problem we have seen is with the lspci utility which cannot
> distinguish between SROIV BARs and EA provisioned BARs.
> 
> Would, or could, this be used there?

This flag would let us distinguish between EA BARs & SRIOV BARs in lspci.

When lspci sees a resource in the sysfs file without a matching BAR
in configspace, it assumes that the BAR comes from an SRIOV entry.
This was a good assumption until EA showed up.

The only case in lspci that this flag wouldn't handle is when the
resource is provisioned through EA & SRIOV (ie: BEI 9-14).
lspci would only flag the resource as EA.

-Sean

> 
> David Daney
> 
> 
> >
> >Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> >---
> >  drivers/pci/pci.c      |    2 +-
> >  include/linux/ioport.h |    2 ++
> >  2 files changed, 3 insertions(+), 1 deletion(-)
> >
> >diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> >index 314db8c..174c734 100644
> >--- a/drivers/pci/pci.c
> >+++ b/drivers/pci/pci.c
> >@@ -2229,7 +2229,7 @@ void pci_pm_init(struct pci_dev *dev)
> >
> >  static unsigned long pci_ea_flags(struct pci_dev *dev, u8 prop)
> >  {
> >-	unsigned long flags = IORESOURCE_PCI_FIXED;
> >+	unsigned long flags = IORESOURCE_PCI_FIXED | IORESOURCE_PCI_EA_BEI;
> >
> >  	switch (prop) {
> >  	case PCI_EA_P_MEM:
> >diff --git a/include/linux/ioport.h b/include/linux/ioport.h
> >index 24bea08..5acc194 100644
> >--- a/include/linux/ioport.h
> >+++ b/include/linux/ioport.h
> >@@ -105,6 +105,8 @@ struct resource {
> >  /* PCI control bits.  Shares IORESOURCE_BITS with above PCI ROM.  */
> >  #define IORESOURCE_PCI_FIXED		(1<<4)	/* Do not move resource */
> >
> >+/* PCI Enhanced Allocation defined BAR equivalent resource */
> >+#define IORESOURCE_PCI_EA_BEI		(1<<5)
> >
> >  /* helpers to define resources */
> >  #define DEFINE_RES_NAMED(_start, _size, _name, _flags)			\
> >
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
sostalle Jan. 14, 2016, 8:23 p.m. UTC | #6
On Thu, Jan 14, 2016 at 12:16:02PM -0700, Alex Williamson wrote:
> On Thu, 2016-01-14 at 10:34 -0800, Sean O. Stalley wrote:
> > On Thu, Jan 14, 2016 at 10:26:56AM -0700, Alex Williamson wrote:
> > > We've done a pretty good job of abstracting EA from drivers, but
> > > there
> > > are some properties of BAR Equivalent resources that don't really
> > > jive
> > > with traditional PCI BARs.  In particular, natural alignment is
> > > only
> > > encouraged, not required.
> > > 
> > > Why does this matter?  There are drivers like vfio-pci that will
> > > happily gobble up the EA abstraction that's been implemented and
> > > expose a device using EA to userspace as if those resources are
> > > traditional BARs.  Pretty cool.  The vfio API is bus agnostic, so
> > > it
> > > doesn't care about alignment.  The problem comes with PCI config
> > > space
> > > emulation where we don't let userspace manipulate the BAR value,
> > > but
> > > we do emulate BAR sizing.  The abstraction kind of falls apart if
> > > userspace gets garbage when they try to size what appears to be a
> > > traditional BAR, but is actually a BAR equivalent.
> > > 
> > > We could simply round up the size in vfio to make it naturally
> > > aligned, but then we're imposing artificial sizes to the user and
> > > we
> > > have the discontinuity that BAR size emulation and vfio region size
> > > reporting don't agree on the size.  I think what we want to do is
> > > expose EA to the user, reporting traditional BARs with BEIs as
> > > zero-sized and providing additional regions for the user to access
> > > each EA region, whether it has a BEI or not.
> > > 
> > > To facilitate that, a flag indicating whether a PCI resource is a
> > > traditional BAR or BAR equivalent seems much nicer than attempting
> > > to size the BAR ourselves or deducing it through the EA capability.
> > 
> > If vfio does size the resource, EA entries that are aligned could
> > still be emulated as BARs, correct?
> > 
> > I would think that emulating a BAR would be preferred when possible,
> > for backwards-compatibility.
> 
> If a BEI is naturally aligned, I can't think of any problems with
> exposing it as a traditional BAR to userspace.  I agree that there may
> be some compatibility benefits there, so it may be useful to offer both
> options.  I don't think we can combine them though, it would violate
> the EA spec to expose the traditional BAR and and the matching BEI.
> We'd either need to hide the fake BAR or hide the EA entry defining
> that BEI.  A module option could define which is preferred or maybe an
> ioctl.

Would any functionality be lost if vfio:
	- emulates BARs & hide EA entry when EA resources are aligned.
	- exposes EA entries when the resources aren't aligned (no BAR emulation).
?

I'm just wondering if giving userspace the option to pick is necessary,
or if there is a setting that is always ideal.

> > > Thoughts?
> > 
> > I like the idea of adding an EA flag.
> > 
> > There were some cases in the kernel where it would be nice to know if a
> > resource was fixed because it was EA or if something else was fixing it.
> > Adding that flag was discussed during the code review of the EA code,
> > but it was decided that we could get by without it.
> > 
> > IIRC, most of the cases that required the flag had to do with EA entries
> > for bridges. Since bridge support wasn't added, we didn't need the flag.
> 
> By my reading of the spec, not all BEIs need to be fixed, is this just
> a simplification to avoid sizing and mapping a BAR that doesn't exist
> in the traditional sense?  A flag on the resource seems like it would
> be useful for that as well if we ever wanted to add the case where an
> AE BAR equivalent could be remapped.  Thanks,

All of the usable BEIs have a HwInit Base & MaxOffset, and therefore a
fixed range. The "unavailable for use" resources aren't explicitly HwInit,
but the spec doesn't define how/when you can move them.

The spec does define a writeable bit for resources,
but doesn't define how to use it either. I think the intention was to
be able to expand EA in the future to cover movable resources.

Anyway, I think having an explicit flag that says "This Resource is from EA"
that is independent of "This resource is fixed" is a good idea.


Acked-by: Sean O. Stalley <sean.stalley@intel.com>


Thanks,
Sean

> > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > > ---
> > >  drivers/pci/pci.c      |    2 +-
> > >  include/linux/ioport.h |    2 ++
> > >  2 files changed, 3 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > > index 314db8c..174c734 100644
> > > --- a/drivers/pci/pci.c
> > > +++ b/drivers/pci/pci.c
> > > @@ -2229,7 +2229,7 @@ void pci_pm_init(struct pci_dev *dev)
> > >  
> > >  static unsigned long pci_ea_flags(struct pci_dev *dev, u8 prop)
> > >  {
> > > -	unsigned long flags = IORESOURCE_PCI_FIXED;
> > > +	unsigned long flags = IORESOURCE_PCI_FIXED |
> > > IORESOURCE_PCI_EA_BEI;
> > >  
> > >  	switch (prop) {
> > >  	case PCI_EA_P_MEM:
> > > diff --git a/include/linux/ioport.h b/include/linux/ioport.h
> > > index 24bea08..5acc194 100644
> > > --- a/include/linux/ioport.h
> > > +++ b/include/linux/ioport.h
> > > @@ -105,6 +105,8 @@ struct resource {
> > >  /* PCI control bits.  Shares IORESOURCE_BITS with above PCI
> > > ROM.  */
> > >  #define IORESOURCE_PCI_FIXED		(1<<4)	/* Do
> > > not move resource */
> > >  
> > > +/* PCI Enhanced Allocation defined BAR equivalent resource *
> > > +#define IORESOURCE_PCI_EA_BEI		(1<<5)
> > >  
> > >  /* helpers to define resources */
> > >  #define DEFINE_RES_NAMED(_start, _size, _name, _flags)		
> > > 	\
> > > 
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alex Williamson Jan. 14, 2016, 9:14 p.m. UTC | #7
On Thu, 2016-01-14 at 12:23 -0800, Sean O. Stalley wrote:
> On Thu, Jan 14, 2016 at 12:16:02PM -0700, Alex Williamson wrote:
> > On Thu, 2016-01-14 at 10:34 -0800, Sean O. Stalley wrote:
> > > On Thu, Jan 14, 2016 at 10:26:56AM -0700, Alex Williamson wrote:
> > > > We've done a pretty good job of abstracting EA from drivers, but
> > > > there
> > > > are some properties of BAR Equivalent resources that don't really
> > > > jive
> > > > with traditional PCI BARs.  In particular, natural alignment is
> > > > only
> > > > encouraged, not required.
> > > > 
> > > > Why does this matter?  There are drivers like vfio-pci that will
> > > > happily gobble up the EA abstraction that's been implemented and
> > > > expose a device using EA to userspace as if those resources are
> > > > traditional BARs.  Pretty cool.  The vfio API is bus agnostic, so
> > > > it
> > > > doesn't care about alignment.  The problem comes with PCI config
> > > > space
> > > > emulation where we don't let userspace manipulate the BAR value,
> > > > but
> > > > we do emulate BAR sizing.  The abstraction kind of falls apart if
> > > > userspace gets garbage when they try to size what appears to be a
> > > > traditional BAR, but is actually a BAR equivalent.
> > > > 
> > > > We could simply round up the size in vfio to make it naturally
> > > > aligned, but then we're imposing artificial sizes to the user and
> > > > we
> > > > have the discontinuity that BAR size emulation and vfio region size
> > > > reporting don't agree on the size.  I think what we want to do is
> > > > expose EA to the user, reporting traditional BARs with BEIs as
> > > > zero-sized and providing additional regions for the user to access
> > > > each EA region, whether it has a BEI or not.
> > > > 
> > > > To facilitate that, a flag indicating whether a PCI resource is a
> > > > traditional BAR or BAR equivalent seems much nicer than attempting
> > > > to size the BAR ourselves or deducing it through the EA capability.
> > > 
> > > If vfio does size the resource, EA entries that are aligned could
> > > still be emulated as BARs, correct?
> > > 
> > > I would think that emulating a BAR would be preferred when possible,
> > > for backwards-compatibility.
> > 
> > If a BEI is naturally aligned, I can't think of any problems with
> > exposing it as a traditional BAR to userspace.  I agree that there may
> > be some compatibility benefits there, so it may be useful to offer both
> > options.  I don't think we can combine them though, it would violate
> > the EA spec to expose the traditional BAR and and the matching BEI.
> > We'd either need to hide the fake BAR or hide the EA entry defining
> > that BEI.  A module option could define which is preferred or maybe an
> > ioctl.
> 
> Would any functionality be lost if vfio:
> 	- emulates BARs & hide EA entry when EA resources are aligned.
> 	- exposes EA entries when the resources aren't aligned (no BAR emulation).
> ?
> 
> I'm just wondering if giving userspace the option to pick is necessary,
> or if there is a setting that is always ideal.

That certainly might be a good default and the only use case I can
think where it wouldn't be ideal is if we want to expose EA to a VM for
the purpose of doing EA testing and development in a guest.  A module
option would make more sense than defining a user interface for that
case though.

> > > > Thoughts?
> > > 
> > > I like the idea of adding an EA flag.
> > > 
> > > There were some cases in the kernel where it would be nice to know if a
> > > resource was fixed because it was EA or if something else was fixing it.
> > > Adding that flag was discussed during the code review of the EA code,
> > > but it was decided that we could get by without it.
> > > 
> > > IIRC, most of the cases that required the flag had to do with EA entries
> > > for bridges. Since bridge support wasn't added, we didn't need the flag.
> > 
> > By my reading of the spec, not all BEIs need to be fixed, is this just
> > a simplification to avoid sizing and mapping a BAR that doesn't exist
> > in the traditional sense?  A flag on the resource seems like it would
> > be useful for that as well if we ever wanted to add the case where an
> > AE BAR equivalent could be remapped.  Thanks,
> 
> All of the usable BEIs have a HwInit Base & MaxOffset, and therefore a
> fixed range. The "unavailable for use" resources aren't explicitly HwInit,
> but the spec doesn't define how/when you can move them.
> 
> The spec does define a writeable bit for resources,
> but doesn't define how to use it either. I think the intention was to
> be able to expand EA in the future to cover movable resources.

Wow, that's really confusing to have a writable bit per entry and then
define properties which forbid its use.  We'd need to go through the
SIG and define a whole new set of properties just to get movable
versions, crazy.  Thanks for the explanation.
> 
> Anyway, I think having an explicit flag that says "This Resource is from EA"
> that is independent of "This resource is fixed" is a good idea.
> 
> 
> Acked-by: Sean O. Stalley <sean.stalley@intel.com>

Cool, thanks

Alex

> > > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > > > ---
> > > >  drivers/pci/pci.c      |    2 +-
> > > >  include/linux/ioport.h |    2 ++
> > > >  2 files changed, 3 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > > > index 314db8c..174c734 100644
> > > > --- a/drivers/pci/pci.c
> > > > +++ b/drivers/pci/pci.c
> > > > @@ -2229,7 +2229,7 @@ void pci_pm_init(struct pci_dev *dev)
> > > >  
> > > >  static unsigned long pci_ea_flags(struct pci_dev *dev, u8
> > > > prop)
> > > >  {
> > > > -	unsigned long flags = IORESOURCE_PCI_FIXED;
> > > > +	unsigned long flags = IORESOURCE_PCI_FIXED |
> > > > IORESOURCE_PCI_EA_BEI;
> > > >  
> > > >  	switch (prop) {
> > > >  	case PCI_EA_P_MEM:
> > > > diff --git a/include/linux/ioport.h b/include/linux/ioport.h
> > > > index 24bea08..5acc194 100644
> > > > --- a/include/linux/ioport.h
> > > > +++ b/include/linux/ioport.h
> > > > @@ -105,6 +105,8 @@ struct resource {
> > > >  /* PCI control bits.  Shares IORESOURCE_BITS with above PCI
> > > > ROM.  */
> > > >  #define IORESOURCE_PCI_FIXED		(1<<4)	/*
> > > > Do
> > > > not move resource */
> > > >  
> > > > +/* PCI Enhanced Allocation defined BAR equivalent resource *
> > > > +#define IORESOURCE_PCI_EA_BEI		(1<<5)
> > > >  
> > > >  /* helpers to define resources */
> > > >  #define DEFINE_RES_NAMED(_start, _size, _name, _flags)		
> > > > 	\
> > > > 
> > 

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
sostalle Jan. 14, 2016, 11:02 p.m. UTC | #8
On Thu, Jan 14, 2016 at 02:14:52PM -0700, Alex Williamson wrote:
> On Thu, 2016-01-14 at 12:23 -0800, Sean O. Stalley wrote:
> > On Thu, Jan 14, 2016 at 12:16:02PM -0700, Alex Williamson wrote:
> > > On Thu, 2016-01-14 at 10:34 -0800, Sean O. Stalley wrote:
> > > > On Thu, Jan 14, 2016 at 10:26:56AM -0700, Alex Williamson wrote:
> > > > > We've done a pretty good job of abstracting EA from drivers, but
> > > > > there
> > > > > are some properties of BAR Equivalent resources that don't really
> > > > > jive
> > > > > with traditional PCI BARs.  In particular, natural alignment is
> > > > > only
> > > > > encouraged, not required.
> > > > > 
> > > > > Why does this matter?  There are drivers like vfio-pci that will
> > > > > happily gobble up the EA abstraction that's been implemented and
> > > > > expose a device using EA to userspace as if those resources are
> > > > > traditional BARs.  Pretty cool.  The vfio API is bus agnostic, so
> > > > > it
> > > > > doesn't care about alignment.  The problem comes with PCI config
> > > > > space
> > > > > emulation where we don't let userspace manipulate the BAR value,
> > > > > but
> > > > > we do emulate BAR sizing.  The abstraction kind of falls apart if
> > > > > userspace gets garbage when they try to size what appears to be a
> > > > > traditional BAR, but is actually a BAR equivalent.
> > > > > 
> > > > > We could simply round up the size in vfio to make it naturally
> > > > > aligned, but then we're imposing artificial sizes to the user and
> > > > > we
> > > > > have the discontinuity that BAR size emulation and vfio region size
> > > > > reporting don't agree on the size.  I think what we want to do is
> > > > > expose EA to the user, reporting traditional BARs with BEIs as
> > > > > zero-sized and providing additional regions for the user to access
> > > > > each EA region, whether it has a BEI or not.
> > > > > 
> > > > > To facilitate that, a flag indicating whether a PCI resource is a
> > > > > traditional BAR or BAR equivalent seems much nicer than attempting
> > > > > to size the BAR ourselves or deducing it through the EA capability.
> > > > 
> > > > If vfio does size the resource, EA entries that are aligned could
> > > > still be emulated as BARs, correct?
> > > > 
> > > > I would think that emulating a BAR would be preferred when possible,
> > > > for backwards-compatibility.
> > > 
> > > If a BEI is naturally aligned, I can't think of any problems with
> > > exposing it as a traditional BAR to userspace.  I agree that there may
> > > be some compatibility benefits there, so it may be useful to offer both
> > > options.  I don't think we can combine them though, it would violate
> > > the EA spec to expose the traditional BAR and and the matching BEI.
> > > We'd either need to hide the fake BAR or hide the EA entry defining
> > > that BEI.  A module option could define which is preferred or maybe an
> > > ioctl.
> > 
> > Would any functionality be lost if vfio:
> > 	- emulates BARs & hide EA entry when EA resources are aligned.
> > 	- exposes EA entries when the resources aren't aligned (no BAR emulation).
> > ?
> > 
> > I'm just wondering if giving userspace the option to pick is necessary,
> > or if there is a setting that is always ideal.
> 
> That certainly might be a good default and the only use case I can
> think where it wouldn't be ideal is if we want to expose EA to a VM for
> the purpose of doing EA testing and development in a guest.  A module
> option would make more sense than defining a user interface for that
> case though.

The testing I have done for EA has been in a VM.
I sent out some patches a while back for QEMU that allow the user to add an EA entry to the existing PCI models.

[https://lists.nongnu.org/archive/html/qemu-devel/2015-07/msg00348.html]

I don't know if they will be useful for what you are doing, but they were very useful to me when doing the initial EA development.

-Sean
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alex Williamson Jan. 20, 2016, 8:20 p.m. UTC | #9
On Thu, 2016-01-14 at 10:26 -0700, Alex Williamson wrote:
> We've done a pretty good job of abstracting EA from drivers, but there
> are some properties of BAR Equivalent resources that don't really jive
> with traditional PCI BARs.  In particular, natural alignment is only
> encouraged, not required.
> 
> Why does this matter?  There are drivers like vfio-pci that will
> happily gobble up the EA abstraction that's been implemented and
> expose a device using EA to userspace as if those resources are
> traditional BARs.  Pretty cool.  The vfio API is bus agnostic, so it
> doesn't care about alignment.  The problem comes with PCI config space
> emulation where we don't let userspace manipulate the BAR value, but
> we do emulate BAR sizing.  The abstraction kind of falls apart if
> userspace gets garbage when they try to size what appears to be a
> traditional BAR, but is actually a BAR equivalent.
> 
> We could simply round up the size in vfio to make it naturally
> aligned, but then we're imposing artificial sizes to the user and we
> have the discontinuity that BAR size emulation and vfio region size
> reporting don't agree on the size.  I think what we want to do is
> expose EA to the user, reporting traditional BARs with BEIs as
> zero-sized and providing additional regions for the user to access
> each EA region, whether it has a BEI or not.
> 
> To facilitate that, a flag indicating whether a PCI resource is a
> traditional BAR or BAR equivalent seems much nicer than attempting
> to size the BAR ourselves or deducing it through the EA capability.
> 
> Thoughts?
> 
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> ---


Just to loop back on this, it seems like we do have some support and
use cases beyond what I proposed.  Thanks for the discussion of that.
However, I'm reluctant to post this formally because the change is user
visible, it consumes a limited resource, and I don't know how quickly
vfio-pci is going to be able to make use of this flag.  The vfio-pci
work may not happen until a device appears with poorly sized resources
that has some use case with vfio-pci.  Even then, we may be able to
infer the BEI association without this flag.  So, while I'm not opposed
to this flag, I don't see a need to drive it right now and those that
do have a more immediate need are welcome to take over.  Thanks,

Alex


>  drivers/pci/pci.c      |    2 +-
>  include/linux/ioport.h |    2 ++
>  2 files changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 314db8c..174c734 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -2229,7 +2229,7 @@ void pci_pm_init(struct pci_dev *dev)
>  
>  static unsigned long pci_ea_flags(struct pci_dev *dev, u8 prop)
>  {
> -	unsigned long flags = IORESOURCE_PCI_FIXED;
> +	unsigned long flags = IORESOURCE_PCI_FIXED | IORESOURCE_PCI_EA_BEI;
>  
>  	switch (prop) {
>  	case PCI_EA_P_MEM:
> diff --git a/include/linux/ioport.h b/include/linux/ioport.h
> index 24bea08..5acc194 100644
> --- a/include/linux/ioport.h
> +++ b/include/linux/ioport.h
> @@ -105,6 +105,8 @@ struct resource {
>  /* PCI control bits.  Shares IORESOURCE_BITS with above PCI ROM.  */
>  #define IORESOURCE_PCI_FIXED		(1<<4)	/* Do not move resource */
>  
> +/* PCI Enhanced Allocation defined BAR equivalent resource */
> +#define IORESOURCE_PCI_EA_BEI		(1<<5)
>  
>  /* helpers to define resources */
>  #define DEFINE_RES_NAMED(_start, _size, _name, _flags)			\
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
sostalle Jan. 21, 2016, 5:48 p.m. UTC | #10
On Wed, Jan 20, 2016 at 01:20:27PM -0700, Alex Williamson wrote:
> On Thu, 2016-01-14 at 10:26 -0700, Alex Williamson wrote:
> > We've done a pretty good job of abstracting EA from drivers, but there
> > are some properties of BAR Equivalent resources that don't really jive
> > with traditional PCI BARs.  In particular, natural alignment is only
> > encouraged, not required.
> > 
> > Why does this matter?  There are drivers like vfio-pci that will
> > happily gobble up the EA abstraction that's been implemented and
> > expose a device using EA to userspace as if those resources are
> > traditional BARs.  Pretty cool.  The vfio API is bus agnostic, so it
> > doesn't care about alignment.  The problem comes with PCI config space
> > emulation where we don't let userspace manipulate the BAR value, but
> > we do emulate BAR sizing.  The abstraction kind of falls apart if
> > userspace gets garbage when they try to size what appears to be a
> > traditional BAR, but is actually a BAR equivalent.
> > 
> > We could simply round up the size in vfio to make it naturally
> > aligned, but then we're imposing artificial sizes to the user and we
> > have the discontinuity that BAR size emulation and vfio region size
> > reporting don't agree on the size.  I think what we want to do is
> > expose EA to the user, reporting traditional BARs with BEIs as
> > zero-sized and providing additional regions for the user to access
> > each EA region, whether it has a BEI or not.
> > 
> > To facilitate that, a flag indicating whether a PCI resource is a
> > traditional BAR or BAR equivalent seems much nicer than attempting
> > to size the BAR ourselves or deducing it through the EA capability.
> > 
> > Thoughts?
> > 
> > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > ---
> 
> 
> Just to loop back on this, it seems like we do have some support and
> use cases beyond what I proposed.  Thanks for the discussion of that.
> However, I'm reluctant to post this formally because the change is user
> visible, it consumes a limited resource, and I don't know how quickly
> vfio-pci is going to be able to make use of this flag.  The vfio-pci
> work may not happen until a device appears with poorly sized resources
> that has some use case with vfio-pci.  Even then, we may be able to
> infer the BEI association without this flag.  So, while I'm not opposed
> to this flag, I don't see a need to drive it right now and those that
> do have a more immediate need are welcome to take over.  Thanks,
> 
> Alex
> 
> 

Thanks Alex,

I'll pick it up and fix the lspci case.

-Sean

> >  drivers/pci/pci.c      |    2 +-
> >  include/linux/ioport.h |    2 ++
> >  2 files changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index 314db8c..174c734 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -2229,7 +2229,7 @@ void pci_pm_init(struct pci_dev *dev)
> >  
> >  static unsigned long pci_ea_flags(struct pci_dev *dev, u8 prop)
> >  {
> > -	unsigned long flags = IORESOURCE_PCI_FIXED;
> > +	unsigned long flags = IORESOURCE_PCI_FIXED | IORESOURCE_PCI_EA_BEI;
> >  
> >  	switch (prop) {
> >  	case PCI_EA_P_MEM:
> > diff --git a/include/linux/ioport.h b/include/linux/ioport.h
> > index 24bea08..5acc194 100644
> > --- a/include/linux/ioport.h
> > +++ b/include/linux/ioport.h
> > @@ -105,6 +105,8 @@ struct resource {
> >  /* PCI control bits.  Shares IORESOURCE_BITS with above PCI ROM.  */
> >  #define IORESOURCE_PCI_FIXED		(1<<4)	/* Do not move resource */
> >  
> > +/* PCI Enhanced Allocation defined BAR equivalent resource */
> > +#define IORESOURCE_PCI_EA_BEI		(1<<5)
> >  
> >  /* helpers to define resources */
> >  #define DEFINE_RES_NAMED(_start, _size, _name, _flags)			\
> > 
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 314db8c..174c734 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -2229,7 +2229,7 @@  void pci_pm_init(struct pci_dev *dev)
 
 static unsigned long pci_ea_flags(struct pci_dev *dev, u8 prop)
 {
-	unsigned long flags = IORESOURCE_PCI_FIXED;
+	unsigned long flags = IORESOURCE_PCI_FIXED | IORESOURCE_PCI_EA_BEI;
 
 	switch (prop) {
 	case PCI_EA_P_MEM:
diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index 24bea08..5acc194 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -105,6 +105,8 @@  struct resource {
 /* PCI control bits.  Shares IORESOURCE_BITS with above PCI ROM.  */
 #define IORESOURCE_PCI_FIXED		(1<<4)	/* Do not move resource */
 
+/* PCI Enhanced Allocation defined BAR equivalent resource */
+#define IORESOURCE_PCI_EA_BEI		(1<<5)
 
 /* helpers to define resources */
 #define DEFINE_RES_NAMED(_start, _size, _name, _flags)			\