
[v3] pci: correct pci config size default for cap version 2 endpoints

Message ID 20110722155338.43049.12587.stgit@dddsys0.bos.redhat.com
State New, archived

Commit Message

Donald Dutile July 22, 2011, 3:59 p.m. UTC
v3: Remove all boundary tests; just fix the obvious bug.
  : The boundary test is not necessary; get this fix in first and
    post the boundary test as a separate patch.

v2: Do a local boundary check with respect to the legacy PCI header length,
    and don't depend on it in pci_add_capability().
  : Fix compilation, and change "else if (version > 2)" to a plain else
    for all other cases.

v1: First patch: boundary check in pci_add_capability().

Doing device assignment using a PCIe device with its
PCI Express Capability structure at offset 0xcc showed a problem in
the default size mapped for this capability ID.

The default size made the structure extend past the end of the
256-byte legacy config space, causing a corruption that might
otherwise have gone unnoticed.

Fix assigned_device_pci_cap_init() to set the default
size of the PCIe Capability structure (cap ID 0x10) to 0x34 instead
of 0x3c. 0x34 is the default (minimum) size for an endpoint device
with a capability version of 2.

Signed-off-by: Donald Dutile <ddutile@redhat.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>

Tested-by: ebenes@redhat.com

---

 hw/device-assignment.c |    8 +++++---
 1 files changed, 5 insertions(+), 3 deletions(-)
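To make the size selection described in the commit message concrete, here is a minimal standalone sketch of the branch the v3 patch adds. It assumes `version` has already been masked with PCI_EXP_FLAGS_VERS. The helper name `pcie_cap_default_size` is hypothetical and does not exist in hw/device-assignment.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper mirroring the v3 patch: given the capability version
 * (the low 4 bits of the PCI Express Capabilities register, already masked
 * with PCI_EXP_FLAGS_VERS), return how many bytes of the PCI Express
 * Capability structure to map, or -1 if the version is unsupported. */
static int pcie_cap_default_size(uint8_t version)
{
    assert(version <= 0x0f);    /* the version field is only 4 bits wide */

    if (version == 1) {
        return 0x14;            /* v1: header through Link Status */
    } else if (version == 2) {
        return 0x34;            /* v2 endpoint: stop before Slot Capabilities 2 */
    }
    return -1;                  /* v0 and v3+ are rejected (-EINVAL in the patch) */
}
```

As in the patch, a version of 0 now falls into the error branch. Before the patch, it silently received the 0x3c default, because only `version > 2` was rejected.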


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Alex Williamson July 22, 2011, 5 p.m. UTC | #1
On Fri, 2011-07-22 at 11:59 -0400, Donald Dutile wrote:
> v3: remove all boundary tests. just fix the obvious bug.
>   : boundary test is not necessary; get this fix in &
>     post boundary test in another separate patch.
>     
> v2: do local boundary check with respect to legacy PCI header length,
>     and don't depend on it in pci_add_capability().
>   : fix compilation, and change else>2 to simple else for all other cases.
> 
> v1: first patch: boundary check in pci_add_capability().
> 
> Doing device assignment using a PCIe device with its
> PCI Cap structure at offset 0xcc showed a problem in
> the default size mapped for this cap-id.
> 
> The failure caused a corruption which might have gone unnoticed
> otherwise.
> 
> Fix assigned_device_pci_cap_init() to set the default
> size of PCIe Cap structure (cap-id 0x10) to 0x34 instead of 0x3c.
> 0x34 is default, min, for endpoint device with a cap version of 2.
> 
> Signed-off-by: Donald Dutile <ddutile@redhat.com>
> cc: Alex Williamson <alex.williamson@redhat.com>
> cc: Michael S. Tsirkin <mst@redhat.com>
> 
> tested-by: ebenes@redhat.com
> 
> ---
> 
>  hw/device-assignment.c |    8 +++++---
>  1 files changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/device-assignment.c b/hw/device-assignment.c
> index 36ad6b0..34db52e 100644
> --- a/hw/device-assignment.c
> +++ b/hw/device-assignment.c
> @@ -1419,16 +1419,18 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
>      }
>  
>      if ((pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_EXP, 0))) {
> -        uint8_t version;
> +        uint8_t version, size;
>          uint16_t type, devctl, lnkcap, lnksta;
>          uint32_t devcap;
> -        int size = 0x3c; /* version 2 size */
>  
>          version = pci_get_byte(pci_dev->config + pos + PCI_EXP_FLAGS);
>          version &= PCI_EXP_FLAGS_VERS;
>          if (version == 1) {
>              size = 0x14;
> -        } else if (version > 2) {
> +        } else if (version == 2) {
> +            /* don't include slot cap/stat/ctrl 2 regs; only support endpoints */
> +            size = 0x34;
> +        } else {
>              fprintf(stderr, "Unsupported PCI express capability version %d\n",
>                      version);
>              return -EINVAL;
> 

Acked-by: Alex Williamson <alex.williamson@redhat.com>

Chris Wright July 22, 2011, 9:24 p.m. UTC | #2
* Donald Dutile (ddutile@redhat.com) wrote:
> diff --git a/hw/device-assignment.c b/hw/device-assignment.c
> index 36ad6b0..34db52e 100644
> --- a/hw/device-assignment.c
> +++ b/hw/device-assignment.c
> @@ -1419,16 +1419,18 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
>      }
>  
>      if ((pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_EXP, 0))) {
> -        uint8_t version;
> +        uint8_t version, size;
>          uint16_t type, devctl, lnkcap, lnksta;
>          uint32_t devcap;
> -        int size = 0x3c; /* version 2 size */
>  
>          version = pci_get_byte(pci_dev->config + pos + PCI_EXP_FLAGS);
>          version &= PCI_EXP_FLAGS_VERS;
>          if (version == 1) {
>              size = 0x14;
> -        } else if (version > 2) {
> +        } else if (version == 2) {
> +            /* don't include slot cap/stat/ctrl 2 regs; only support endpoints */
> +            size = 0x34;

That doesn't look correct to me.  The size is fixed, just that some
registers are Reserved Zero when they do not apply (e.g. endpoint only).
Alex Williamson July 22, 2011, 9:30 p.m. UTC | #3
On Fri, 2011-07-22 at 14:24 -0700, Chris Wright wrote:
> * Donald Dutile (ddutile@redhat.com) wrote:
> > diff --git a/hw/device-assignment.c b/hw/device-assignment.c
> > index 36ad6b0..34db52e 100644
> > --- a/hw/device-assignment.c
> > +++ b/hw/device-assignment.c
> > @@ -1419,16 +1419,18 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
> >      }
> >  
> >      if ((pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_EXP, 0))) {
> > -        uint8_t version;
> > +        uint8_t version, size;
> >          uint16_t type, devctl, lnkcap, lnksta;
> >          uint32_t devcap;
> > -        int size = 0x3c; /* version 2 size */
> >  
> >          version = pci_get_byte(pci_dev->config + pos + PCI_EXP_FLAGS);
> >          version &= PCI_EXP_FLAGS_VERS;
> >          if (version == 1) {
> >              size = 0x14;
> > -        } else if (version > 2) {
> > +        } else if (version == 2) {
> > +            /* don't include slot cap/stat/ctrl 2 regs; only support endpoints */
> > +            size = 0x34;
> 
> That doesn't look correct to me.  The size is fixed, just that some
> registers are Reserved Zero when they do not apply (e.g. endpoint only).

Apparently it can be interpreted differently.  In this case, we've seen
a tg3 device expose a v2 PCI express capability at offset 0xcc.  Using
0x3c bytes, we extend 8 bytes past the legacy config space area :(

Alex
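The arithmetic behind this observation can be shown in a small sketch. The legacy (conventional) config space is 0x100 bytes, so a structure of `size` bytes at offset `pos` overruns it by `pos + size - 0x100` bytes whenever that value is positive. `legacy_config_overrun` is a hypothetical name used only for illustration.

```c
#include <assert.h>

#define LEGACY_CONFIG_SPACE_SIZE 0x100  /* 256-byte conventional PCI config space */

/* Hypothetical helper: number of bytes by which a capability structure of
 * `size` bytes starting at offset `pos` extends past the legacy config
 * space (0 if it fits). */
static unsigned int legacy_config_overrun(unsigned int pos, unsigned int size)
{
    assert(pos < LEGACY_CONFIG_SPACE_SIZE);  /* capability must start inside it */
    return (pos + size > LEGACY_CONFIG_SPACE_SIZE)
               ? pos + size - LEGACY_CONFIG_SPACE_SIZE
               : 0;
}
```

With the values from this thread, `legacy_config_overrun(0xcc, 0x3c)` is 8 (0xcc + 0x3c = 0x108). By contrast, `legacy_config_overrun(0xcc, 0x34)` is 0, because 0xcc + 0x34 lands exactly on 0x100.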

Chris Wright July 22, 2011, 9:35 p.m. UTC | #4
* Alex Williamson (alex.williamson@redhat.com) wrote:
> On Fri, 2011-07-22 at 14:24 -0700, Chris Wright wrote:
> > * Donald Dutile (ddutile@redhat.com) wrote:
> > > diff --git a/hw/device-assignment.c b/hw/device-assignment.c
> > > index 36ad6b0..34db52e 100644
> > > --- a/hw/device-assignment.c
> > > +++ b/hw/device-assignment.c
> > > @@ -1419,16 +1419,18 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
> > >      }
> > >  
> > >      if ((pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_EXP, 0))) {
> > > -        uint8_t version;
> > > +        uint8_t version, size;
> > >          uint16_t type, devctl, lnkcap, lnksta;
> > >          uint32_t devcap;
> > > -        int size = 0x3c; /* version 2 size */
> > >  
> > >          version = pci_get_byte(pci_dev->config + pos + PCI_EXP_FLAGS);
> > >          version &= PCI_EXP_FLAGS_VERS;
> > >          if (version == 1) {
> > >              size = 0x14;
> > > -        } else if (version > 2) {
> > > +        } else if (version == 2) {
> > > +            /* don't include slot cap/stat/ctrl 2 regs; only support endpoints */
> > > +            size = 0x34;
> > 
> > That doesn't look correct to me.  The size is fixed, just that some
> > registers are Reserved Zero when they do not apply (e.g. endpoint only).
> 
> Apparently it can be interpreted differently.  In this case, we've seen
> a tg3 device expose a v2 PCI express capability at offset 0xcc.  Using
> 0x3c bytes, we extend 8 bytes past the legacy config space area :(

Wow, that device sounds broken to me.  The spec is pretty clear.
Michael S. Tsirkin July 24, 2011, 8:12 a.m. UTC | #5
On Fri, Jul 22, 2011 at 02:35:47PM -0700, Chris Wright wrote:
> * Alex Williamson (alex.williamson@redhat.com) wrote:
> > On Fri, 2011-07-22 at 14:24 -0700, Chris Wright wrote:
> > > * Donald Dutile (ddutile@redhat.com) wrote:
> > > > diff --git a/hw/device-assignment.c b/hw/device-assignment.c
> > > > index 36ad6b0..34db52e 100644
> > > > --- a/hw/device-assignment.c
> > > > +++ b/hw/device-assignment.c
> > > > @@ -1419,16 +1419,18 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
> > > >      }
> > > >  
> > > >      if ((pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_EXP, 0))) {
> > > > -        uint8_t version;
> > > > +        uint8_t version, size;
> > > >          uint16_t type, devctl, lnkcap, lnksta;
> > > >          uint32_t devcap;
> > > > -        int size = 0x3c; /* version 2 size */
> > > >  
> > > >          version = pci_get_byte(pci_dev->config + pos + PCI_EXP_FLAGS);
> > > >          version &= PCI_EXP_FLAGS_VERS;
> > > >          if (version == 1) {
> > > >              size = 0x14;
> > > > -        } else if (version > 2) {
> > > > +        } else if (version == 2) {
> > > > +            /* don't include slot cap/stat/ctrl 2 regs; only support endpoints */
> > > > +            size = 0x34;
> > > 
> > > That doesn't look correct to me.  The size is fixed, just that some
> > > registers are Reserved Zero when they do not apply (e.g. endpoint only).
> > 
> > Apparently it can be interpreted differently.  In this case, we've seen
> > a tg3 device expose a v2 PCI express capability at offset 0xcc.  Using
> > 0x3c bytes, we extend 8 bytes past the legacy config space area :(
> 
> Wow, that device sounds broken to me.  The spec is pretty clear.

Yes, I agree it's broken. Looks like something that
happens when a device is designed in parallel with the spec.

What bothers me is that this patch seems to put devices that do behave
correctly out of spec (registers will be writable by default),
correct?

How about we check for overflow and only do the hacks
if it happens?

Also, the code to initialize slot and root control registers is still
there: it would seem that running it will corrupt memory beyond the
config array?
Michael S. Tsirkin July 24, 2011, 8:41 a.m. UTC | #6
On Sun, Jul 24, 2011 at 11:12:44AM +0300, Michael S. Tsirkin wrote:
> On Fri, Jul 22, 2011 at 02:35:47PM -0700, Chris Wright wrote:
> > * Alex Williamson (alex.williamson@redhat.com) wrote:
> > > On Fri, 2011-07-22 at 14:24 -0700, Chris Wright wrote:
> > > > * Donald Dutile (ddutile@redhat.com) wrote:
> > > > > diff --git a/hw/device-assignment.c b/hw/device-assignment.c
> > > > > index 36ad6b0..34db52e 100644
> > > > > --- a/hw/device-assignment.c
> > > > > +++ b/hw/device-assignment.c
> > > > > @@ -1419,16 +1419,18 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
> > > > >      }
> > > > >  
> > > > >      if ((pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_EXP, 0))) {
> > > > > -        uint8_t version;
> > > > > +        uint8_t version, size;
> > > > >          uint16_t type, devctl, lnkcap, lnksta;
> > > > >          uint32_t devcap;
> > > > > -        int size = 0x3c; /* version 2 size */
> > > > >  
> > > > >          version = pci_get_byte(pci_dev->config + pos + PCI_EXP_FLAGS);
> > > > >          version &= PCI_EXP_FLAGS_VERS;
> > > > >          if (version == 1) {
> > > > >              size = 0x14;
> > > > > -        } else if (version > 2) {
> > > > > +        } else if (version == 2) {
> > > > > +            /* don't include slot cap/stat/ctrl 2 regs; only support endpoints */
> > > > > +            size = 0x34;
> > > > 
> > > > That doesn't look correct to me.  The size is fixed, just that some
> > > > registers are Reserved Zero when they do not apply (e.g. endpoint only).
> > > 
> > > Apparently it can be interpreted differently.  In this case, we've seen
> > > a tg3 device expose a v2 PCI express capability at offset 0xcc.  Using
> > > 0x3c bytes, we extend 8 bytes past the legacy config space area :(
> > 
> > Wow, that device sounds broken to me.  The spec is pretty clear.
> 
> Yes, I agree it's broken. Looks like something that
> happens when a device is designed in parallel with the spec.
> 
> What bothers me is this patch seems to make devices that do behave
> correctly out of spec (registers will be writeable by default) -
> correct?
> 
> How about we check for overflow and only do the hacks
> if it happens?
> 
> Also, the code to initialize slot and root control registers is still
> there: it would seem that running it will corrupt memory beyond the
> config array?

I take this last bit back: registers we touch are at offset < 0x34.
Sorry about the noise. But the question about read-only registers
still stands.

> 
> -- 
> MST
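The retraction above can be checked mechanically. The sketch below assumes the listed registers are the ones assigned_device_pci_cap_init() reads or initializes. Devcap, devctl, lnkcap and lnksta appear in the diff context; slot and root control come from the earlier message. The table and helper are hypothetical, and the offsets are taken from the values in Linux's `<linux/pci_regs.h>`.

```c
#include <assert.h>
#include <stddef.h>

#define V2_ENDPOINT_MAPPED_SIZE 0x34  /* size chosen by the v3 patch */

/* A register inside the PCI Express Capability structure. Offsets match the
 * values in Linux's <linux/pci_regs.h> (PCI_EXP_FLAGS, PCI_EXP_DEVCAP, ...). */
struct pcie_reg {
    const char *name;
    unsigned int offset;  /* byte offset from the start of the capability */
    unsigned int width;   /* register width in bytes */
};

/* Assumed list of registers read or initialized by the init code. */
static const struct pcie_reg touched_regs[] = {
    { "PCI Express Capabilities", 0x02, 2 },
    { "Device Capabilities",      0x04, 4 },
    { "Device Control",           0x08, 2 },
    { "Link Capabilities",        0x0c, 4 },
    { "Link Status",              0x12, 2 },
    { "Slot Control",             0x18, 2 },
    { "Root Control",             0x1c, 2 },
};

/* Return the first byte offset past the highest register in the list. */
static unsigned int touched_regs_end(void)
{
    unsigned int end = 0;

    for (size_t i = 0; i < sizeof(touched_regs) / sizeof(touched_regs[0]); i++) {
        unsigned int reg_end = touched_regs[i].offset + touched_regs[i].width;
        if (reg_end > end) {
            end = reg_end;
        }
    }
    assert(end <= V2_ENDPOINT_MAPPED_SIZE);  /* all registers fit in 0x34 bytes */
    return end;
}
```

The highest register (Root Control) ends at 0x1e, well inside 0x34, which is consistent with the retraction. This check does not address the open question about the registers excluded by the 0x34 size (0x34 through 0x3b).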
Michael S. Tsirkin July 24, 2011, 10:58 a.m. UTC | #7
On Sun, Jul 24, 2011 at 11:41:10AM +0300, Michael S. Tsirkin wrote:
> On Sun, Jul 24, 2011 at 11:12:44AM +0300, Michael S. Tsirkin wrote:
> > On Fri, Jul 22, 2011 at 02:35:47PM -0700, Chris Wright wrote:
> > > * Alex Williamson (alex.williamson@redhat.com) wrote:
> > > > On Fri, 2011-07-22 at 14:24 -0700, Chris Wright wrote:
> > > > > * Donald Dutile (ddutile@redhat.com) wrote:
> > > > > > diff --git a/hw/device-assignment.c b/hw/device-assignment.c
> > > > > > index 36ad6b0..34db52e 100644
> > > > > > --- a/hw/device-assignment.c
> > > > > > +++ b/hw/device-assignment.c
> > > > > > @@ -1419,16 +1419,18 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
> > > > > >      }
> > > > > >  
> > > > > >      if ((pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_EXP, 0))) {
> > > > > > -        uint8_t version;
> > > > > > +        uint8_t version, size;
> > > > > >          uint16_t type, devctl, lnkcap, lnksta;
> > > > > >          uint32_t devcap;
> > > > > > -        int size = 0x3c; /* version 2 size */
> > > > > >  
> > > > > >          version = pci_get_byte(pci_dev->config + pos + PCI_EXP_FLAGS);
> > > > > >          version &= PCI_EXP_FLAGS_VERS;
> > > > > >          if (version == 1) {
> > > > > >              size = 0x14;
> > > > > > -        } else if (version > 2) {
> > > > > > +        } else if (version == 2) {
> > > > > > +            /* don't include slot cap/stat/ctrl 2 regs; only support endpoints */
> > > > > > +            size = 0x34;
> > > > > 
> > > > > That doesn't look correct to me.  The size is fixed, just that some
> > > > > registers are Reserved Zero when they do not apply (e.g. endpoint only).
> > > > 
> > > > Apparently it can be interpreted differently.  In this case, we've seen
> > > > a tg3 device expose a v2 PCI express capability at offset 0xcc.  Using
> > > > 0x3c bytes, we extend 8 bytes past the legacy config space area :(
> > > 
> > > Wow, that device sounds broken to me.  The spec is pretty clear.
> > 
> > Yes, I agree it's broken. Looks like something that
> > happens when a device is designed in parallel with the spec.
> > 
> > What bothers me is this patch seems to make devices that do behave
> > correctly out of spec (registers will be writeable by default) -
> > correct?
> > 
> > How about we check for overflow and only do the hacks
> > if it happens?
> > 
> > Also, the code to initialize slot and root control registers is still
> > there: it would seem that running it will corrupt memory beyond the
> > config array?
> 
> I take this last bit back: registers we touch are at offset < 0x34.
> Sorry about the noise. But the question about read-only registers
> still stands.

Also, where does the magic 0x34 come from? I'm guessing this is
simply what's left until the end of the config space.
So let's be conservative and as specific as possible with
this hack:

/* A version 2 device was observed to only have a partial
 * implementation for the capability structure. Apparently, it doesn't
 * implement the registers from slot capability 2 and on (offset 0x34),
 * with the capability at offset 0xCC = 256 - 0x34. This is out of spec,
 * but let's try to support this. */
if (version == 2 && pos == 0xCC) {
	size = 0x34;
}


> > 
> > -- 
> > MST
Donald Dutile July 25, 2011, 7:37 p.m. UTC | #8
On 07/24/2011 06:58 AM, Michael S. Tsirkin wrote:
> On Sun, Jul 24, 2011 at 11:41:10AM +0300, Michael S. Tsirkin wrote:
>> On Sun, Jul 24, 2011 at 11:12:44AM +0300, Michael S. Tsirkin wrote:
>>> On Fri, Jul 22, 2011 at 02:35:47PM -0700, Chris Wright wrote:
>>>> * Alex Williamson (alex.williamson@redhat.com) wrote:
>>>>> On Fri, 2011-07-22 at 14:24 -0700, Chris Wright wrote:
>>>>>> * Donald Dutile (ddutile@redhat.com) wrote:
>>>>>>> diff --git a/hw/device-assignment.c b/hw/device-assignment.c
>>>>>>> index 36ad6b0..34db52e 100644
>>>>>>> --- a/hw/device-assignment.c
>>>>>>> +++ b/hw/device-assignment.c
>>>>>>> @@ -1419,16 +1419,18 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
>>>>>>>       }
>>>>>>>
>>>>>>>       if ((pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_EXP, 0))) {
>>>>>>> -        uint8_t version;
>>>>>>> +        uint8_t version, size;
>>>>>>>           uint16_t type, devctl, lnkcap, lnksta;
>>>>>>>           uint32_t devcap;
>>>>>>> -        int size = 0x3c; /* version 2 size */
>>>>>>>
>>>>>>>           version = pci_get_byte(pci_dev->config + pos + PCI_EXP_FLAGS);
>>>>>>>           version &= PCI_EXP_FLAGS_VERS;
>>>>>>>           if (version == 1) {
>>>>>>>               size = 0x14;
>>>>>>> -        } else if (version > 2) {
>>>>>>> +        } else if (version == 2) {
>>>>>>> +            /* don't include slot cap/stat/ctrl 2 regs; only support endpoints */
>>>>>>> +            size = 0x34;
>>>>>>
>>>>>> That doesn't look correct to me.  The size is fixed, just that some
>>>>>> registers are Reserved Zero when they do not apply (e.g. endpoint only).
>>>>>
>>>>> Apparently it can be interpreted differently.  In this case, we've seen
>>>>> a tg3 device expose a v2 PCI express capability at offset 0xcc.  Using
>>>>> 0x3c bytes, we extend 8 bytes past the legacy config space area :(
>>>>
>>>> Wow, that device sounds broken to me.  The spec is pretty clear.
>>>
>>> Yes, I agree it's broken. Looks like something that
>>> happens when a device is designed in parallel with the spec.
>>>
>>> What bothers me is this patch seems to make devices that do behave
>>> correctly out of spec (registers will be writeable by default) -
>>> correct?
>>>
>>> How about we check for overflow and only do the hacks
>>> if it happens?
>>>
>>> Also, the code to initialize slot and root control registers is still
>>> there: it would seem that running it will corrupt memory beyond the
>>> config array?
>>
>> I take this last bit back: registers we touch are at offset < 0x34.
>> Sorry about the noise. But the question about read-only registers
>> still stands.
>
> Also, where does the magic 0x34 come from? I'm guessing this is
> simply what's left till the end of the config space.
> So let's be conservative and as specific as possible with
> this hack:
>

I believe the spec leaves room for interpretation, hence the
resulting 'broken' device.  As I read the spec, the size of the struct can be:

-- 0x2c: the minimum for all devices with capability version 2 or higher.
-- 0x34: devices with links, i.e., devices not integrated into the root
         port, or integrated devices that use a serial link anyhow
         (one that doesn't strip out the 8b/10b serial link between the
         device and the internal root port).
-- 0x3c: devices with slots, i.e., bridges with downstream slots:
         not endpoints, but PCI(e) bridges.
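This reading of the v2 layout can be written down as a lookup. The following is a minimal sketch; the function name and its two boolean inputs are hypothetical. Each boundary is the point where the last register of that class ends: Device Status 2 at 0x2a, Link Status 2 at 0x32, and Slot Status 2 at 0x3a, each 2 bytes wide.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper encoding the reading above of the v2 capability size:
 *   0x2c: through Device Status 2 (no link registers)
 *   0x34: through Link Status 2 (devices with a link)
 *   0x3c: through Slot Status 2 (ports with downstream slots) */
static unsigned int pcie_v2_min_size(bool has_link, bool has_slot)
{
    assert(has_link || !has_slot);  /* a slot implies a link */

    if (has_slot) {
        return 0x3c;
    }
    if (has_link) {
        return 0x34;
    }
    return 0x2c;
}
```

The v3 patch effectively hardcodes the `has_link = true, has_slot = false` case, on the grounds that bridges are not assigned.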

Thus, 0x34 was chosen: we don't support assigning PCI bridges
(not until MR-IOV shows up, at least), 0x34 fits the bug at hand, and the
device capabilities/status/control registers can still be used and modified.

So a 'hack' is not needed.  Admittedly, the 0x34 size is itself a bit of a
hack: the case for 0x2c could be detected by checking whether the device
is a root port device _and_ is not using a serial link.  However, on every
root port device I looked at across a number of systems, this structure was
larger than 0x2c, so I figured the simple heuristic of 0x34 was sufficient.

> /* A version 2 device was observed to only have a partial
>   * implementation for the capability structure. Apparently, it doesn't
>   * implement the registers from slot capability 2 and on (offset 0x34),
>   * with the capability at offset 0xCC = 256 - 0x34. This is out of spec,
>   * but let's try to support this. */
> if (version == 2 && pos == 0xCC) {
> 	size = 0x34;
> }
>
>
>>>
>>> --
>>> MST

Alex Williamson July 25, 2011, 8:20 p.m. UTC | #9
On Mon, 2011-07-25 at 15:37 -0400, Don Dutile wrote:
> On 07/24/2011 06:58 AM, Michael S. Tsirkin wrote:
> > On Sun, Jul 24, 2011 at 11:41:10AM +0300, Michael S. Tsirkin wrote:
> >> On Sun, Jul 24, 2011 at 11:12:44AM +0300, Michael S. Tsirkin wrote:
> >>> On Fri, Jul 22, 2011 at 02:35:47PM -0700, Chris Wright wrote:
> >>>> * Alex Williamson (alex.williamson@redhat.com) wrote:
> >>>>> On Fri, 2011-07-22 at 14:24 -0700, Chris Wright wrote:
> >>>>>> * Donald Dutile (ddutile@redhat.com) wrote:
> >>>>>>> diff --git a/hw/device-assignment.c b/hw/device-assignment.c
> >>>>>>> index 36ad6b0..34db52e 100644
> >>>>>>> --- a/hw/device-assignment.c
> >>>>>>> +++ b/hw/device-assignment.c
> >>>>>>> @@ -1419,16 +1419,18 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
> >>>>>>>       }
> >>>>>>>
> >>>>>>>       if ((pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_EXP, 0))) {
> >>>>>>> -        uint8_t version;
> >>>>>>> +        uint8_t version, size;
> >>>>>>>           uint16_t type, devctl, lnkcap, lnksta;
> >>>>>>>           uint32_t devcap;
> >>>>>>> -        int size = 0x3c; /* version 2 size */
> >>>>>>>
> >>>>>>>           version = pci_get_byte(pci_dev->config + pos + PCI_EXP_FLAGS);
> >>>>>>>           version &= PCI_EXP_FLAGS_VERS;
> >>>>>>>           if (version == 1) {
> >>>>>>>               size = 0x14;
> >>>>>>> -        } else if (version > 2) {
> >>>>>>> +        } else if (version == 2) {
> >>>>>>> +            /* don't include slot cap/stat/ctrl 2 regs; only support endpoints */
> >>>>>>> +            size = 0x34;
> >>>>>>
> >>>>>> That doesn't look correct to me.  The size is fixed, just that some
> >>>>>> registers are Reserved Zero when they do not apply (e.g. endpoint only).
> >>>>>
> >>>>> Apparently it can be interpreted differently.  In this case, we've seen
> >>>>> a tg3 device expose a v2 PCI express capability at offset 0xcc.  Using
> >>>>> 0x3c bytes, we extend 8 bytes past the legacy config space area :(
> >>>>
> >>>> Wow, that device sounds broken to me.  The spec is pretty clear.
> >>>
> >>> Yes, I agree it's broken. Looks like something that
> >>> happens when a device is designed in parallel with the spec.
> >>>
> >>> What bothers me is this patch seems to make devices that do behave
> >>> correctly out of spec (registers will be writeable by default) -
> >>> correct?
> >>>
> >>> How about we check for overflow and only do the hacks
> >>> if it happens?
> >>>
> >>> Also, the code to initialize slot and root control registers is still
> >>> there: it would seem that running it will corrupt memory beyond the
> >>> config array?
> >>
> >> I take this last bit back: registers we touch are at offset < 0x34.
> >> Sorry about the noise. But the question about read-only registers
> >> still stands.
> >
> > Also, where does the magic 0x34 come from? I'm guessing this is
> > simply what's left till the end of the config space.
> > So let's be conservative and as specific as possible with
> > this hack:
> >
> 
> I believe the spec leaves room for interpretation, and thus the
> resulting 'broken' device.  As I read the spec, the size of the struct can be:
> 
> -- 0x2c for all devices, min., that are cap version 2 or higher.
> -- 0x34 for devices with links, i.e., not a root-port-based device,
>                                , a device not integrated into the root port
>                                , or if it is, it uses a serial link anyhow
>                                  (doesn't strip out 8b/10b serial link btwn device
>                                   & internal root port)
> -- 0x3c for devices with slots, i.e., a bridge with downstream slots,
>                                  i.e., not an endpoint, i.e., a PCI(e) bridge.
> 
> Thus, 0x34 was chosen, since we don't support device assigning PCI bridges,
> (not until MRIOV shows up, at least), and 0x34 fits the bug at hand, and
> device cap/stat/control may be used/modified.
> 
> So, a 'hack' is not needed.  In fact, the 0x34 size is a bit of a hack,
> since the case to use 0x2c could be ascertained by checking if the device
> is a root port device, _and_ it's not using a serial link, but a perusal of
> root port devices on a number of systems I looked at always had this structure
> greater than 0x2c, so I figured the simple heuristic of 0x34 was sufficient.
> 
> > /* A version 2 device was observed to only have a partial
> >   * implementation for the capability structure. Apparently, it doesn't
> >   * implement the registers from slot capability 2 and on (offset 0x34),
> >   * with the capability at offset 0xCC = 256 - 0x34. This is out of spec,
> >   * but let's try to support this. */
> > if (version == 2 && pos == 0xCC) {
> > 	size = 0x34;
> > }

The more I look at it, the more I think that maybe this is an especially
broken device and we shouldn't change the default for it.  BTW, the
programming reference for this device is here:

http://www.broadcom.com/collateral/pg/5761-PG100-R.pdf

They've burned up most of the capability area for vendor-specific
registers, so there's not enough room for the full PCIe capability
structure.  I'd be fine with adding a test specifically for this.  As
you suggested on IRC, print a warning and cut the PCIe capability back
to 0x34 if it extends past the legacy config space, and reject the device
if the remaining space is still too small.  Leave the default at 0x3c,
since this is the only device we've found with this problem.  Thanks,

Alex
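Here is a sketch of the approach described above. It keeps 0x3c as the v2 default and falls back to 0x34, with a warning, only when the structure would run past the legacy config space. If even 0x34 does not fit, it rejects the device. The helper `pcie_v2_fit_size` and the warning text are hypothetical; returning -EINVAL mirrors the error convention already used in assigned_device_pci_cap_init().

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

#define LEGACY_CONFIG_SPACE_SIZE 0x100  /* 256-byte conventional PCI config space */

/* Hypothetical sketch: keep the 0x3c default for v2, but if the structure
 * would run past the legacy config space, warn and fall back to 0x34;
 * reject the device if even that does not fit. Returns the size to map,
 * or -EINVAL. */
static int pcie_v2_fit_size(unsigned int pos)
{
    unsigned int size = 0x3c;

    assert(pos < LEGACY_CONFIG_SPACE_SIZE);
    if (pos + size > LEGACY_CONFIG_SPACE_SIZE) {
        fprintf(stderr, "PCI express capability at 0x%x overruns config space, "
                "truncating to 0x34 bytes\n", pos);
        size = 0x34;
        if (pos + size > LEGACY_CONFIG_SPACE_SIZE) {
            return -EINVAL;             /* still does not fit: reject */
        }
    }
    return (int)size;
}
```

Under this sketch, a spec-conforming capability at 0x40 keeps the full 0x3c mapping. The tg3 case at 0xcc gets 0x34, and a capability at 0xd0 would be rejected.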



Donald Dutile July 25, 2011, 8:42 p.m. UTC | #10
On 07/25/2011 04:20 PM, Alex Williamson wrote:
> On Mon, 2011-07-25 at 15:37 -0400, Don Dutile wrote:
>> On 07/24/2011 06:58 AM, Michael S. Tsirkin wrote:
>>> On Sun, Jul 24, 2011 at 11:41:10AM +0300, Michael S. Tsirkin wrote:
>>>> On Sun, Jul 24, 2011 at 11:12:44AM +0300, Michael S. Tsirkin wrote:
>>>>> On Fri, Jul 22, 2011 at 02:35:47PM -0700, Chris Wright wrote:
>>>>>> * Alex Williamson (alex.williamson@redhat.com) wrote:
>>>>>>> On Fri, 2011-07-22 at 14:24 -0700, Chris Wright wrote:
>>>>>>>> * Donald Dutile (ddutile@redhat.com) wrote:
>>>>>>>>> diff --git a/hw/device-assignment.c b/hw/device-assignment.c
>>>>>>>>> index 36ad6b0..34db52e 100644
>>>>>>>>> --- a/hw/device-assignment.c
>>>>>>>>> +++ b/hw/device-assignment.c
>>>>>>>>> @@ -1419,16 +1419,18 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
>>>>>>>>>        }
>>>>>>>>>
>>>>>>>>>        if ((pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_EXP, 0))) {
>>>>>>>>> -        uint8_t version;
>>>>>>>>> +        uint8_t version, size;
>>>>>>>>>            uint16_t type, devctl, lnkcap, lnksta;
>>>>>>>>>            uint32_t devcap;
>>>>>>>>> -        int size = 0x3c; /* version 2 size */
>>>>>>>>>
>>>>>>>>>            version = pci_get_byte(pci_dev->config + pos + PCI_EXP_FLAGS);
>>>>>>>>>            version &= PCI_EXP_FLAGS_VERS;
>>>>>>>>>            if (version == 1) {
>>>>>>>>>                size = 0x14;
>>>>>>>>> -        } else if (version > 2) {
>>>>>>>>> +        } else if (version == 2) {
>>>>>>>>> +            /* don't include slot cap/stat/ctrl 2 regs; only support endpoints */
>>>>>>>>> +            size = 0x34;
>>>>>>>>
>>>>>>>> That doesn't look correct to me.  The size is fixed, just that some
>>>>>>>> registers are Reserved Zero when they do not apply (e.g. endpoint only).
>>>>>>>
>>>>>>> Apparently it can be interpreted differently.  In this case, we've seen
>>>>>>> a tg3 device expose a v2 PCI express capability at offset 0xcc.  Using
>>>>>>> 0x3c bytes, we extend 8 bytes past the legacy config space area :(
>>>>>>
>>>>>> Wow, that device sounds broken to me.  The spec is pretty clear.
>>>>>
>>>>> Yes, I agree it's broken. Looks like something that
>>>>> happens when a device is designed in parallel with the spec.
>>>>>
>>>>> What bothers me is this patch seems to make devices that do behave
>>>>> correctly out of spec (registers will be writeable by default) -
>>>>> correct?
>>>>>
>>>>> How about we check for overflow and only do the hacks
>>>>> if it happens?
>>>>>
>>>>> Also, the code to initialize slot and root control registers is still
>>>>> there: it would seem that running it will corrupt memory beyond the
>>>>> config array?
>>>>
>>>> I take this last bit back: registers we touch are at offset < 0x34.
>>>> Sorry about the noise. But the question about read-only registers
>>>> still stands.
>>>
>>> Also, where does the magic 0x34 come from? I'm guessing this is
>>> simply what's left till the end of the config space.
>>> So let's be conservative and as specific as possible with
>>> this hack:
>>>
>>
>> I believe the spec leaves room for interpretation, and thus the
>> resulting 'broken' device.  As I read the spec, the size of the struct can be:
>>
>> -- 0x2c for all devices, min., that are cap version 2 or higher.
>> -- 0x34 for devices with links, i.e., not a root-port-based device,
>>                                 , a device not integrated into the root port
>>                                 , or if it is, it uses a serial link anyhow
>>                                   (doesn't strip out 8b/10b serial link btwn device
>>                                    & internal root port)
>> -- 0x3c for devices with slots, i.e., a bridge with downstream slots,
>>                                   i.e., not an endpoint, i.e., a PCI(e) bridge.
>>
>> Thus, 0x34 was chosen, since we don't support device assigning PCI bridges,
>> (not until MRIOV shows up, at least), and 0x34 fits the bug at hand, and
>> device cap/stat/control may be used/modified.
>>
>> So, a 'hack' is not needed.  In fact, the 0x34 size is a bit of a hack,
>> since the case for 0x2c could be detected by checking whether the device
>> is a root port device _and_ isn't using a serial link, but the root port
>> devices I looked at on a number of systems always had this structure
>> larger than 0x2c, so I figured the simple 0x34 heuristic was sufficient.
>>
>>> /* A version 2 device was observed to only have a partial
>>>    * implementation for the capability structure. Apparently, it doesn't
>>>    * implement the registers from slot capability 2 and on (offset 0x34),
>>>    * with the capability at offset 0xCC = 256 - 0x34. This is out of spec,
>>>    * but let's try to support this. */
>>> if (version == 2 && pos == 0xCC) {
>>> 	size = 0x34;
>>> }
>
> The more I look at it, the more I think that maybe this is an especially
> broken device and we shouldn't change the default for it.  BTW, the
> programming reference for this device is here:
>
> http://www.broadcom.com/collateral/pg/5761-PG100-R.pdf
>
> They've burned up most of the capability area for vendor specific
> registers, so there's not enough room for the full pci-e capability
> structure.  I'd be fine with adding a test specifically for this.  As
> you suggested on IRC, print some warning and cut pci-e back to 0x34 if
> it extends past the legacy config space, and reject the device if it's
> still too small.  Leave the default 0x3c since this is the only device
> we've found with this problem.  Thanks,
>
> Alex
>
>
>

I have to admit, devices that are broken with respect to the spec aren't
uncommon, and this device seems to fit the bill.  Even though the diagram of
the PCIe Capability structure implies optional registers, the text in section
7.8 says registers that aren't supported should read as 0, implying that the
space for them should be reserved and read as 0s.

So, a check and workaround seems in order.

So, something like:
max_cap_size = 256 - base;
if (size > max_cap_size), then size = max_cap_size, and fprintf a warning
about the structure mapping-size reduction?
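Don's proposal can be sketched as a small standalone C helper. This is a hypothetical sketch, not code from the patch: the name `clamp_cap_size()` and the `LEGACY_CONFIG_SIZE` constant are assumptions for illustration, and it uses `fprintf()` as proposed here rather than QEMU's `error_report()`. For the tg3 capability at 0xcc, the 0x3c default gives 0xcc + 0x3c = 0x108, which is 8 bytes past the 0x100-byte legacy config space; clamping yields 0x100 - 0xcc = 0x34.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical constant: size of legacy (non-extended) PCI config space. */
#define LEGACY_CONFIG_SIZE 0x100

/*
 * Hypothetical helper: limit the mapped size of a capability that starts at
 * config offset `pos` so it never runs past the end of legacy config space.
 * Warns and returns the reduced size when clamping is needed; otherwise
 * returns `size` unchanged.
 */
static int clamp_cap_size(uint8_t pos, int size)
{
    int max_cap_size = LEGACY_CONFIG_SIZE - pos;

    if (size > max_cap_size) {
        fprintf(stderr,
                "PCIe capability at 0x%02x: reducing mapped size "
                "from 0x%x to 0x%x\n",
                (unsigned)pos, (unsigned)size, (unsigned)max_cap_size);
        size = max_cap_size;
    }
    return size;
}
```

For example, `clamp_cap_size(0xcc, 0x3c)` returns 0x34, while a capability at 0x40 keeps its full 0x3c bytes. Clamping the size alone does not guarantee that no other code path touches registers past the end of the config array.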
Michael S. Tsirkin July 26, 2011, 10:46 a.m. UTC | #11
On Mon, Jul 25, 2011 at 04:42:50PM -0400, Don Dutile wrote:
> On 07/25/2011 04:20 PM, Alex Williamson wrote:
> >On Mon, 2011-07-25 at 15:37 -0400, Don Dutile wrote:
> >>On 07/24/2011 06:58 AM, Michael S. Tsirkin wrote:
> >>>On Sun, Jul 24, 2011 at 11:41:10AM +0300, Michael S. Tsirkin wrote:
> >>>>On Sun, Jul 24, 2011 at 11:12:44AM +0300, Michael S. Tsirkin wrote:
> >>>>>On Fri, Jul 22, 2011 at 02:35:47PM -0700, Chris Wright wrote:
> >>>>>>* Alex Williamson (alex.williamson@redhat.com) wrote:
> >>>>>>>On Fri, 2011-07-22 at 14:24 -0700, Chris Wright wrote:
> >>>>>>>>* Donald Dutile (ddutile@redhat.com) wrote:
> >>>>>>>>>diff --git a/hw/device-assignment.c b/hw/device-assignment.c
> >>>>>>>>>index 36ad6b0..34db52e 100644
> >>>>>>>>>--- a/hw/device-assignment.c
> >>>>>>>>>+++ b/hw/device-assignment.c
> >>>>>>>>>@@ -1419,16 +1419,18 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
> >>>>>>>>>       }
> >>>>>>>>>
> >>>>>>>>>       if ((pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_EXP, 0))) {
> >>>>>>>>>-        uint8_t version;
> >>>>>>>>>+        uint8_t version, size;
> >>>>>>>>>           uint16_t type, devctl, lnkcap, lnksta;
> >>>>>>>>>           uint32_t devcap;
> >>>>>>>>>-        int size = 0x3c; /* version 2 size */
> >>>>>>>>>
> >>>>>>>>>           version = pci_get_byte(pci_dev->config + pos + PCI_EXP_FLAGS);
> >>>>>>>>>           version &= PCI_EXP_FLAGS_VERS;
> >>>>>>>>>           if (version == 1) {
> >>>>>>>>>               size = 0x14;
> >>>>>>>>>-        } else if (version > 2) {
> >>>>>>>>>+        } else if (version == 2) {
> >>>>>>>>>+            /* don't include slot cap/stat/ctrl 2 regs; only support endpoints */
> >>>>>>>>>+            size = 0x34;
> >>>>>>>>
> >>>>>>>>That doesn't look correct to me.  The size is fixed, just that some
> >>>>>>>>registers are Reserved Zero when they do not apply (e.g. endpoint only).
> >>>>>>>
> >>>>>>>Apparently it can be interpreted differently.  In this case, we've seen
> >>>>>>>a tg3 device expose a v2 PCI express capability at offset 0xcc.  Using
> >>>>>>>0x3c bytes, we extend 8 bytes past the legacy config space area :(
> >>>>>>
> >>>>>>Wow, that device sounds broken to me.  The spec is pretty clear.
> >>>>>
> >>>>>Yes, I agree it's broken. Looks like something that
> >>>>>happens when a device is designed in parallel with the spec.
> >>>>>
> >>>>>What bothers me is this patch seems to make devices that do behave
> >>>>>correctly out of spec (registers will be writeable by default) -
> >>>>>correct?
> >>>>>
> >>>>>How about we check for overflow and only do the hacks
> >>>>>if it happens?
> >>>>>
> >>>>>Also, the code to initialize slot and root control registers is still
> >>>>>there: it would seem that running it will corrupt memory beyond the
> >>>>>config array?
> >>>>
> >>>>I take this last bit back: registers we touch are at offset < 0x34.
> >>>>Sorry about the noise. But the question about read-only registers
> >>>>still stands.
> >>>
> >>>Also, where does the magic 0x34 come from? I'm guessing this is
> >>>simply what's left till the end of the config space.
> >>>So let's be as conservative and specific as possible with
> >>>this hack:
> >>>
> >>
> >>I believe the spec leaves room for interpretation, and thus the
> >>resulting 'broken' device.  As I read the spec, the size of the struct can be:
> >>
> >>-- 0x2c, the minimum for all devices with a capability version of 2 or higher.
> >>-- 0x34 for devices with links, i.e., a device not integrated into the root
> >>   port or, if integrated, one that still uses a serial link (doesn't strip
> >>   out the 8b/10b serial link between the device and the internal root port).
> >>-- 0x3c for devices with slots, i.e., a bridge with downstream slots:
> >>   not an endpoint, but a PCI(e) bridge.
> >>
> >>Thus, 0x34 was chosen, since we don't support device assignment of PCI
> >>bridges (not until MR-IOV shows up, at least), 0x34 fits the bug at hand,
> >>and device cap/stat/control may be used/modified.
> >>
> >>So, a 'hack' is not needed.  In fact, the 0x34 size is a bit of a hack,
> >>since the case for 0x2c could be detected by checking whether the device
> >>is a root port device _and_ isn't using a serial link, but the root port
> >>devices I looked at on a number of systems always had this structure
> >>larger than 0x2c, so I figured the simple 0x34 heuristic was sufficient.
> >>
> >>>/* A version 2 device was observed to only have a partial
> >>>   * implementation for the capability structure. Apparently, it doesn't
> >>>   * implement the registers from slot capability 2 and on (offset 0x34),
> >>>   * with the capability at offset 0xCC = 256 - 0x34. This is out of spec,
> >>>   * but let's try to support this. */
> >>>if (version == 2 && pos == 0xCC) {
> >>>	size = 0x34;
> >>>}
> >
> >The more I look at it, the more I think that maybe this is an especially
> >broken device and we shouldn't change the default for it.  BTW, the
> >programming reference for this device is here:
> >
> >http://www.broadcom.com/collateral/pg/5761-PG100-R.pdf
> >
> >They've burned up most of the capability area for vendor specific
> >registers, so there's not enough room for the full pci-e capability
> >structure.  I'd be fine with adding a test specifically for this.  As
> >you suggested on IRC, print some warning and cut pci-e back to 0x34 if
> >it extends past the legacy config space, and reject the device if it's
> >still too small.  Leave the default 0x3c since this is the only device
> >we've found with this problem.  Thanks,
> >
> >Alex
> >
> >
> >
> 
> I have to admit, devices that are broken with respect to the spec aren't
> uncommon, and this device seems to fit the bill.  Even though the diagram of
> the PCIe Capability structure implies optional registers, the text in section
> 7.8 says registers that aren't supported should read as 0, implying that the
> space for them should be reserved and read as 0s.
>
> So, a check and workaround seems in order.
>
> So, something like:
> max_cap_size = 256 - base;
> if (size > max_cap_size), then size = max_cap_size, and fprintf a warning
> about the structure mapping-size reduction?

s/fprintf/error_report/

Also need to make sure we don't access the config array
outside the allocated size anywhere. That's why I thought
special-casing just the specific offset is a better idea.
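Michael's more conservative alternative (keep the 0x3c default, special-case only the observed 0xcc offset, and refuse any layout that would still overrun the config array) might look like the hypothetical sketch below. The function name, the `LEGACY_CONFIG_SIZE` constant, and returning `-EINVAL` on overrun (following Alex's suggestion to reject a device that is still too small) are assumptions for illustration; this is not QEMU's `assigned_device_pci_cap_init()`.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical constant: size of legacy (non-extended) PCI config space. */
#define LEGACY_CONFIG_SIZE 0x100

/*
 * Hypothetical sketch: pick the mapped size of the PCIe capability.
 * Returns the size in bytes, or -EINVAL for an unsupported version or a
 * capability that would extend past the end of legacy config space.
 */
static int pcie_cap_size(uint8_t version, uint8_t pos)
{
    int size;

    if (version == 1) {
        size = 0x14;
    } else if (version == 2) {
        size = 0x3c;            /* full version 2 structure */
        if (pos == 0xcc) {
            /* Observed broken layout: nothing implemented from 0x34 on. */
            size = 0x34;
        }
    } else {
        return -EINVAL;         /* unsupported capability version */
    }

    /* Never map registers beyond the end of the config array. */
    if (pos + size > LEGACY_CONFIG_SIZE) {
        return -EINVAL;
    }
    return size;
}
```

Unlike clamping every device, this leaves compliant devices at the full 0x3c bytes, addressing Michael's concern that a smaller default would leave their later registers writable by default.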
Chris Wright Aug. 8, 2011, 6:44 p.m. UTC | #12
* Don Dutile (ddutile@redhat.com) wrote:
> On 07/24/2011 06:58 AM, Michael S. Tsirkin wrote:
> >On Sun, Jul 24, 2011 at 11:41:10AM +0300, Michael S. Tsirkin wrote:
> >>On Sun, Jul 24, 2011 at 11:12:44AM +0300, Michael S. Tsirkin wrote:
> >>>On Fri, Jul 22, 2011 at 02:35:47PM -0700, Chris Wright wrote:
> >>>>* Alex Williamson (alex.williamson@redhat.com) wrote:
> >>>>>On Fri, 2011-07-22 at 14:24 -0700, Chris Wright wrote:
> >>>>>>* Donald Dutile (ddutile@redhat.com) wrote:
> >>>>>>>+        } else if (version == 2) {
> >>>>>>>+            /* don't include slot cap/stat/ctrl 2 regs; only support endpoints */
> >>>>>>>+            size = 0x34;
> >>>>>>
> >>>>>>That doesn't look correct to me.  The size is fixed, just that some
> >>>>>>registers are Reserved Zero when they do not apply (e.g. endpoint only).
> >>>>>
> >>>>>Apparently it can be interpreted differently.  In this case, we've seen
> >>>>>a tg3 device expose a v2 PCI express capability at offset 0xcc.  Using
> >>>>>0x3c bytes, we extend 8 bytes past the legacy config space area :(
> >>>>
> >>>>Wow, that device sounds broken to me.  The spec is pretty clear.
> >>>
> >>>Yes, I agree it's broken. Looks like something that
> >>>happens when a device is designed in parallel with the spec.
> >>>
> >>>What bothers me is this patch seems to make devices that do behave
> >>>correctly out of spec (registers will be writeable by default) -
> >>>correct?
> >>>
> >>>How about we check for overflow and only do the hacks
> >>>if it happens?
> >>>
> >>>Also, the code to initialize slot and root control registers is still
> >>>there: it would seem that running it will corrupt memory beyond the
> >>>config array?
> >>
> >>I take this last bit back: registers we touch are at offset < 0x34.
> >>Sorry about the noise. But the question about read-only registers
> >>still stands.
> >
> >Also, where does the magic 0x34 come from? I'm guessing this is
> >simply what's left till the end of the config space.
> >So let's be as conservative and specific as possible with
> >this hack:
> 
> I believe the spec leaves room for interpretation, and thus the
> resulting 'broken' device.  As I read the spec, the size of the struct can be:

Yeah, I can see how it might be misinterpreted; however, it's made
really clear in the config space test spec.  This structure is meant to
be full size.  Perhaps something like what Michael suggested (and, if you're
really paranoid, also match the PCI vendor/device ID to quirk it).  I haven't
come across many devices that have this wrong.

thanks,
-chris
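Chris's suggestion of also matching the PCI vendor/device ID could be sketched as a small quirk table. In this hypothetical sketch the struct, table, and function names are made up for illustration. 0x14e4 is Broadcom's PCI vendor ID, but the device ID is deliberately a placeholder (0xffff) that would have to be replaced with the ID of the affected part. Unmatched devices keep the 0x3c size.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical quirk entry: a device known to truncate its PCIe capability. */
struct pcie_cap_quirk {
    uint16_t vendor_id;
    uint16_t device_id;
    uint8_t  cap_pos;   /* offset where the device places the capability */
    uint8_t  cap_size;  /* bytes of the capability actually implemented */
};

#define PCI_VENDOR_ID_BROADCOM 0x14e4
/* Placeholder only: substitute the real device ID of the affected part. */
#define QUIRK_DEVICE_ID_PLACEHOLDER 0xffff

static const struct pcie_cap_quirk pcie_cap_quirks[] = {
    { PCI_VENDOR_ID_BROADCOM, QUIRK_DEVICE_ID_PLACEHOLDER, 0xcc, 0x34 },
};

/*
 * Return the size to map for a version 2 PCIe capability: 0x3c unless the
 * vendor ID, device ID and capability offset all match a quirk entry.
 */
static int pcie_cap_size_quirked(uint16_t vendor, uint16_t device, uint8_t pos)
{
    size_t i;

    for (i = 0; i < sizeof(pcie_cap_quirks) / sizeof(pcie_cap_quirks[0]); i++) {
        const struct pcie_cap_quirk *q = &pcie_cap_quirks[i];

        if (q->vendor_id == vendor && q->device_id == device &&
            q->cap_pos == pos) {
            return q->cap_size;
        }
    }
    return 0x3c;
}
```

Matching vendor, device, and offset together keeps the workaround from applying to a compliant device that happens to place its capability at 0xcc.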

Patch

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 36ad6b0..34db52e 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -1419,16 +1419,18 @@  static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
     }
 
     if ((pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_EXP, 0))) {
-        uint8_t version;
+        uint8_t version, size;
         uint16_t type, devctl, lnkcap, lnksta;
         uint32_t devcap;
-        int size = 0x3c; /* version 2 size */
 
         version = pci_get_byte(pci_dev->config + pos + PCI_EXP_FLAGS);
         version &= PCI_EXP_FLAGS_VERS;
         if (version == 1) {
             size = 0x14;
-        } else if (version > 2) {
+        } else if (version == 2) {
+            /* don't include slot cap/stat/ctrl 2 regs; only support endpoints */
+            size = 0x34;
+        } else {
             fprintf(stderr, "Unsupported PCI express capability version %d\n",
                     version);
             return -EINVAL;