diff mbox

[RFC,v6] PCI: Set PCI-E Max Payload Size on fabric

Message ID CAC1AzdeKjxc=Nm9wkp7cSgUsVhuiA7YeomQsz9fH+nGTPFnLQw@mail.gmail.com (mailing list archive)
State Superseded, archived
Headers show

Commit Message

jordan hargrave July 14, 2011, 1:30 a.m. UTC
> On Wed, Jun 29, 2011 at 1:07 AM, Benjamin Herrenschmidt
> <benh@xxxxxxxxxxxxxxxxxxx> wrote:
>> On Wed, 2011-06-29 at 00:28 -0500, Jon Mason wrote:
>>> There is a sizable performance boost for having the largest possible
>>> maximum payload size on each PCI-E device.  However, the maximum payload
>>> size must be uniform on a given PCI-E fabric, and each device, bridge,
>>> and root port can have a different max size.  To find and configure the
>>> optimal MPS settings, one must walk the fabric and determine the largest
>>> MPS available on all the devices on the given fabric.
>>>

>>> -/* Program PCIE MaxPayload setting on device: ensure parent maxpayload <= device */
>>
>> It is interesting to see how busted that comment was in retrospect ;-)
>>
>>> -static int pci_set_payload(struct pci_dev *dev)
>>> -{
>>> -       int pos, ppos;
>>> -       u16 pctl, psz;
>>> -       u16 dctl, dsz, dcap, dmax;
>>> -       struct pci_dev *parent;
>>> -
>>> -       parent = dev->bus->self;
>>> -       pos = pci_find_capability(dev, PCI_CAP_ID_EXP);
>>> -       if (!pos)
>>> -               return 0;
>>> -
>>> -       /* Read Device MaxPayload capability and setting */
>>> -       pci_read_config_word(dev, pos + PCI_EXP_DEVCTL, &dctl);
>>> -       pci_read_config_word(dev, pos + PCI_EXP_DEVCAP, &dcap);
>>> -       dsz = (dctl & PCI_EXP_DEVCTL_PAYLOAD) >> 5;
>>> -       dmax = (dcap & PCI_EXP_DEVCAP_PAYLOAD);
>>> -
>>> -       /* Read Parent MaxPayload setting */
>>> -       ppos = pci_find_capability(parent, PCI_CAP_ID_EXP);
>>> -       if (!ppos)
>>> -               return 0;
>>> -       pci_read_config_word(parent, ppos + PCI_EXP_DEVCTL, &pctl);
>>> -       psz = (pctl &  PCI_EXP_DEVCTL_PAYLOAD) >> 5;
>>> -
>>> -       /* If parent payload > device max payload -> error
>>> -        * If parent payload > device payload -> set speed
>>> -        * If parent payload <= device payload -> do nothing
>>> -        */
>>
>> And this one too, it would be useful to know the rationale of whoever
>> came up with that code in the first place ... oh well, let's not dwelve
>> on it now...
>
> I believe Jordan (who has been Cc'ed on these e-mails) was the
> originator of this code.  Perhaps he can shed some light.
>

This was required for hotplugging a device whose parent MPS was set to
something other than the minimum 128.  When a PCIE device was inserted
into the hotplug bridge, the Linux pci subsystem would probe the
device, but it would only have a DEVCTL MPS=128 by default (DEVCAP MPS
= 512).  On this particular system, the parent bridge MPS was set to
DEVCTL MPS=256 (DEVCAP MPS=2048).  So whenever a transaction to the
new device was attempted, it would cause PCI errors as the transaction
was too large.

So what the old code did was this:
pctl=Read Parent DevCtl MPS
dcap=Read Device DevCap MPS
dctl=Read Device DevCtl MPS

If the parent MPS setting (pctl) was greater than the device
capability (dcap) -> error
else If the parent MPS setting (pctl) was greater than the device
setting (dctl) -> Set device MPS to that of the parent
else do nothing

>>> +/**
>>> + * pcie_set_mps - set PCI Express maximum payload size
>>> + * @dev: PCI device to query
>>> + * @rq: maximum payload size in bytes
>>> + *    valid values are 128, 256, 512, 1024, 2048, 4096
>>> + *
>>> + * If possible sets maximum payload size
>>> + */
>>> +int pcie_set_mps(struct pci_dev *dev, int mps)
>>> +{
>>> +     int cap, err = -EINVAL;
>>> +     u16 ctl, v;
>>> +
>>> +     if (mps < 128 || mps > 4096 || !is_power_of_2(mps))
>>> +             goto out;
>>> +
>>> +     v = (ffs(mps) - 8) << 5;
>>> +     if (v > dev->pcie_mpss)
>>> +             goto out;
>>> +
>>> +     cap = pci_pcie_cap(dev);
>>> +     if (!cap)
>>> +             goto out;
>>> +
>>> +     err = pci_read_config_word(dev, cap + PCI_EXP_DEVCTL, &ctl);
>>> +     if (err)
>>> +             goto out;
>>> +
>>> +     if ((ctl & PCI_EXP_DEVCTL_PAYLOAD) != v) {
>>> +             ctl &= ~PCI_EXP_DEVCTL_PAYLOAD;
>>> +             ctl |= v;
>>> +             err = pci_write_config_word(dev, cap + PCI_EXP_DEVCTL, ctl);
>>> +     }
>>> +out:
>>> +     return err;
>>> +}
>>> +
This code wasn't working on my system.. it would generate an error
when trying to set the device MPS.  the issue was that v = 128,256,...
while pcie->mpss is 0,1,2,...  The conditional would always be true
and the new MPS would not be set during hotplug.

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Jon Mason July 14, 2011, 3:44 p.m. UTC | #1
On Wed, Jul 13, 2011 at 8:30 PM, jordan hargrave <jharg93@gmail.com> wrote:
>> On Wed, Jun 29, 2011 at 1:07 AM, Benjamin Herrenschmidt
>> <benh@xxxxxxxxxxxxxxxxxxx> wrote:
>>> On Wed, 2011-06-29 at 00:28 -0500, Jon Mason wrote:
>>>> There is a sizable performance boost for having the largest possible
>>>> maximum payload size on each PCI-E device.  However, the maximum payload
>>>> size must be uniform on a given PCI-E fabric, and each device, bridge,
>>>> and root port can have a different max size.  To find and configure the
>>>> optimal MPS settings, one must walk the fabric and determine the largest
>>>> MPS available on all the devices on the given fabric.
>>>>
>
>>>> -/* Program PCIE MaxPayload setting on device: ensure parent maxpayload <= device */
>>>
>>> It is interesting to see how busted that comment was in retrospect ;-)
>>>
>>>> -static int pci_set_payload(struct pci_dev *dev)
>>>> -{
>>>> -       int pos, ppos;
>>>> -       u16 pctl, psz;
>>>> -       u16 dctl, dsz, dcap, dmax;
>>>> -       struct pci_dev *parent;
>>>> -
>>>> -       parent = dev->bus->self;
>>>> -       pos = pci_find_capability(dev, PCI_CAP_ID_EXP);
>>>> -       if (!pos)
>>>> -               return 0;
>>>> -
>>>> -       /* Read Device MaxPayload capability and setting */
>>>> -       pci_read_config_word(dev, pos + PCI_EXP_DEVCTL, &dctl);
>>>> -       pci_read_config_word(dev, pos + PCI_EXP_DEVCAP, &dcap);
>>>> -       dsz = (dctl & PCI_EXP_DEVCTL_PAYLOAD) >> 5;
>>>> -       dmax = (dcap & PCI_EXP_DEVCAP_PAYLOAD);
>>>> -
>>>> -       /* Read Parent MaxPayload setting */
>>>> -       ppos = pci_find_capability(parent, PCI_CAP_ID_EXP);
>>>> -       if (!ppos)
>>>> -               return 0;
>>>> -       pci_read_config_word(parent, ppos + PCI_EXP_DEVCTL, &pctl);
>>>> -       psz = (pctl &  PCI_EXP_DEVCTL_PAYLOAD) >> 5;
>>>> -
>>>> -       /* If parent payload > device max payload -> error
>>>> -        * If parent payload > device payload -> set speed
>>>> -        * If parent payload <= device payload -> do nothing
>>>> -        */
>>>
>>> And this one too, it would be useful to know the rationale of whoever
>>> came up with that code in the first place ... oh well, let's not dwelve
>>> on it now...
>>
>> I believe Jordan (who has been Cc'ed on these e-mails) was the
>> originator of this code.  Perhaps he can shed some light.
>>
>
> This was required for hotplugging a device whose parent MPS was set to
> something other than the minimum 128.  When a PCIE device was inserted
> into the hotplug bridge, the Linux pci subsystem would probe the
> device, but it would only have a DEVCTL MPS=128 by default (DEVCAP MPS
> = 512).  On this particular system, the parent bridge MPS was set to
> DEVCTL MPS=256 (DEVCAP MPS=2048).  So whenever a transaction to the
> new device was attempted, it would cause PCI errors as the transaction
> was too large.
>
> So what the old code did was this:
> pctl=Read Parent DevCtl MPS
> dcap=Read Device DevCap MPS
> dctl=Read Device DevCtl MPS
>
> If the parent MPS setting (pctl) was greater than the device
> capability (dcap) -> error
> else If the parent MPS setting (pctl) was greater than the device
> setting (dctl) -> Set device MPS to that of the parent
> else do nothing
>
>>>> +/**
>>>> + * pcie_set_mps - set PCI Express maximum payload size
>>>> + * @dev: PCI device to query
>>>> + * @rq: maximum payload size in bytes
>>>> + *    valid values are 128, 256, 512, 1024, 2048, 4096
>>>> + *
>>>> + * If possible sets maximum payload size
>>>> + */
>>>> +int pcie_set_mps(struct pci_dev *dev, int mps)
>>>> +{
>>>> +     int cap, err = -EINVAL;
>>>> +     u16 ctl, v;
>>>> +
>>>> +     if (mps < 128 || mps > 4096 || !is_power_of_2(mps))
>>>> +             goto out;
>>>> +
>>>> +     v = (ffs(mps) - 8) << 5;
>>>> +     if (v > dev->pcie_mpss)
>>>> +             goto out;
>>>> +
>>>> +     cap = pci_pcie_cap(dev);
>>>> +     if (!cap)
>>>> +             goto out;
>>>> +
>>>> +     err = pci_read_config_word(dev, cap + PCI_EXP_DEVCTL, &ctl);
>>>> +     if (err)
>>>> +             goto out;
>>>> +
>>>> +     if ((ctl & PCI_EXP_DEVCTL_PAYLOAD) != v) {
>>>> +             ctl &= ~PCI_EXP_DEVCTL_PAYLOAD;
>>>> +             ctl |= v;
>>>> +             err = pci_write_config_word(dev, cap + PCI_EXP_DEVCTL, ctl);
>>>> +     }
>>>> +out:
>>>> +     return err;
>>>> +}
>>>> +
> This code wasn't working on my system.. it would generate an error
> when trying to set the device MPS.  the issue was that v = 128,256,...
> while pcie->mpss is 0,1,2,...  The conditional would always be true
> and the new MPS would not be set during hotplug.

I see the bug.  I'll have a new version coming out shortly, as I want
to verify it first on the faulty HW we have internally.

>
> --- one Wed Jul 13 20:25:53 2011
> +++ two Wed Jul 13 20:25:34 2011
> @@ -7,7 +7,7 @@ int pcie_set_mps(struct pci_dev *dev, int mps)
>              goto out;
>
>      v = (ffs(mps) - 8) << 5;
> -     if (v > dev->pcie_mpss)
> +     if (v > (128 << dev->pcie_mpss))
>              goto out;
>
>      cap = pci_pcie_cap(dev);
>
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

--- one Wed Jul 13 20:25:53 2011
+++ two Wed Jul 13 20:25:34 2011
@@ -7,7 +7,7 @@  int pcie_set_mps(struct pci_dev *dev, int mps)
              goto out;

      v = (ffs(mps) - 8) << 5;
-     if (v > dev->pcie_mpss)
+     if (v > (128 << dev->pcie_mpss))
              goto out;

      cap = pci_pcie_cap(dev);