diff mbox

[-v3] PCI: update device mps when doing pci hotplug

Message ID 1375776540-23988-1-git-send-email-wangyijing@huawei.com (mailing list archive)
State New, archived
Delegated to: Bjorn Helgaas
Headers show

Commit Message

Yijing Wang Aug. 6, 2013, 8:09 a.m. UTC
v2->v3: Update CC stable tag suggested by Li Zefan.
v1->v2: Update patch log, remove Joe's reported-by, because his problem
        was mainly caused by BIOS incorrect setting. But this patch mainly
		to fix the bug caused by device hot add. Conservatively, this 
		version only update the mps problem when hot add. When the device
		mps < parent mps found, this patch try to update device mps.
		It seems unlikely device mps > parent mps after hot add device.
		So we don't care that situation.

Currently we don't update device's mps vaule when doing
pci device hot-add. The hot-added device's mps will be set
to default value (128B). But the upstream port device's mps
may be larger than 128B which was set by firmware during
system bootup. In this case the new added device may not
work normally. This patch try to update the hot added device
mps euqal to its parent mps, if device mpss < parent mps,
print warning.

References: https://bugzilla.kernel.org/show_bug.cgi?id=60671
Reported-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Cc: Jon Mason <jdmason@kudzu.us>
Cc: stable@vger.kernel.org # 3.4+
---
 drivers/pci/probe.c |   43 +++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 43 insertions(+), 0 deletions(-)

Comments

Jon Mason Aug. 6, 2013, 11:45 p.m. UTC | #1
On Tue, Aug 6, 2013 at 1:09 AM, Yijing Wang <wangyijing@huawei.com> wrote:
> v2->v3: Update CC stable tag suggested by Li Zefan.
> v1->v2: Update patch log, remove Joe's reported-by, because his problem
>         was mainly caused by BIOS incorrect setting. But this patch mainly
>                 to fix the bug caused by device hot add. Conservatively, this
>                 version only update the mps problem when hot add. When the device
>                 mps < parent mps found, this patch try to update device mps.
>                 It seems unlikely device mps > parent mps after hot add device.
>                 So we don't care that situation.
>
> Currently we don't update device's mps vaule when doing
> pci device hot-add. The hot-added device's mps will be set
> to default value (128B). But the upstream port device's mps
> may be larger than 128B which was set by firmware during
> system bootup. In this case the new added device may not
> work normally. This patch try to update the hot added device
> mps euqal to its parent mps, if device mpss < parent mps,
> print warning.
>
> References: https://bugzilla.kernel.org/show_bug.cgi?id=60671
> Reported-by: Yijing Wang <wangyijing@huawei.com>
> Signed-off-by: Yijing Wang <wangyijing@huawei.com>
> Cc: Jon Mason <jdmason@kudzu.us>
> Cc: stable@vger.kernel.org # 3.4+
> ---
>  drivers/pci/probe.c |   43 +++++++++++++++++++++++++++++++++++++++++++
>  1 files changed, 43 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index cf57fe7..9b0e634 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -1603,6 +1603,40 @@ static int pcie_bus_configure_set(struct pci_dev *dev, void *data)
>         return 0;
>  }
>
> +static int pcie_bus_update_set(struct pci_dev *dev, void *data)
> +{
> +       int mps, p_mps;
> +
> +       if (!pci_is_pcie(dev) || !dev->bus->self)
> +               return 0;
> +
> +       mps = pcie_get_mps(dev);
> +       p_mps = pcie_get_mps(dev->bus->self);
> +
> +       if (mps < p_mps)
> +               goto update;

It would be more clear to have it return 0 if mps >= p_mps, and not
have the goto statement.

> +
> +       return 0;
> +
> +update:
> +       /* If current mpss is lager than upstream, use upstream mps to update
> +        * current mps, otherwise print warning info.
> +        */
> +       if ((128 << dev->pcie_mpss) >= p_mps)
> +               pcie_write_mps(dev, p_mps);
> +       else
> +               dev_warn(&dev->dev, "MPS %d MPSS %d both smaller than upstream MPS %d\n"
> +                               "If necessary, use \"pci=pcie_bus_peer2peer\" boot parameter to avoid this problem\n",
> +                               mps, 128 << dev->pcie_mpss, p_mps);
> +       return 0;
> +}
> +
> +static void pcie_bus_update_setting(struct pci_bus *bus)
> +{
> +       if (bus->self->is_hotplug_bridge)
> +               pci_walk_bus(bus, pcie_bus_update_set, NULL);

Why not just call pcie_bus_configure_set(bus->self, &smpss); with smpss of 0?

> +}
> +
>  /* pcie_bus_configure_settings requires that pci_walk_bus work in a top-down,
>   * parents then children fashion.  If this changes, then this code will not
>   * work as designed.
> @@ -1614,6 +1648,15 @@ void pcie_bus_configure_settings(struct pci_bus *bus, u8 mpss)
>         if (!pci_is_pcie(bus->self))
>                 return;
>
> +       /* Sometimes we should update device mps here,
> +        * eg. after hot add, device mps value will be
> +        * set to default(128B), but the upstream port
> +        * mps value may be larger than 128B, if we do
> +        * not update the device mps, it maybe can not
> +        * work normally.

This is slightly confusing to me.  It would be more clear to say:
There are situations (i.e., hot add) where the upstream port might
have a larger MPS than the device.  In these situations, the port MPS
needs to be reconfigured to the lower value or the device will not
operate properly.

> +        */
> +       pcie_bus_update_setting(bus);

This only seems to be necessary in the "TUNE_OFF" case.  It would be
best to move it under that, just 2 lines down.

> +
>         if (pcie_bus_config == PCIE_BUS_TUNE_OFF)

Perhaps it is time to make "SAFE" the default option, per the
discussion at last years PCI mini-summit.
Bjorn, thoughts?


Thanks,
Jon

>                 return;
>
> --
> 1.7.1
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yijing Wang Aug. 7, 2013, 1:52 a.m. UTC | #2
>> +static int pcie_bus_update_set(struct pci_dev *dev, void *data)
>> +{
>> +       int mps, p_mps;
>> +
>> +       if (!pci_is_pcie(dev) || !dev->bus->self)
>> +               return 0;
>> +
>> +       mps = pcie_get_mps(dev);
>> +       p_mps = pcie_get_mps(dev->bus->self);
>> +
>> +       if (mps < p_mps)
>> +               goto update;

Hi Jon,
   Thanks for your review and comments!

> 
> It would be more clear to have it return 0 if mps >= p_mps, and not
> have the goto statement.

OK, will update.


> 
>> +
>> +       return 0;
>> +
>> +update:
>> +       /* If current mpss is lager than upstream, use upstream mps to update
>> +        * current mps, otherwise print warning info.
>> +        */
>> +       if ((128 << dev->pcie_mpss) >= p_mps)
>> +               pcie_write_mps(dev, p_mps);
>> +       else
>> +               dev_warn(&dev->dev, "MPS %d MPSS %d both smaller than upstream MPS %d\n"
>> +                               "If necessary, use \"pci=pcie_bus_peer2peer\" boot parameter to avoid this problem\n",
>> +                               mps, 128 << dev->pcie_mpss, p_mps);
>> +       return 0;
>> +}
>> +
>> +static void pcie_bus_update_setting(struct pci_bus *bus)
>> +{
>> +       if (bus->self->is_hotplug_bridge)
>> +               pci_walk_bus(bus, pcie_bus_update_set, NULL);
> 
> Why not just call pcie_bus_configure_set(bus->self, &smpss); with smpss of 0?

Because for safety, I won't touch the running pcie device(RP, UP/DP,EP) mps.
Consider the following case:
Root Port------>Upstream Port------>Downstream Port 1 ---->Endpoint device A (newly inserted device)
mps=256          256               |    256			128
                                   |Downstream Port 2 ---->Endpoint device B
					256			256

This patch try to update device A's mps equal to its parent DP 1 mps (256).
Because EP A device is not running, newly inserted, not configure yet. So configure its mps
to 256B is safe.

But if we try to configure DP 1 mps to 128B, it's not safe. its parent Upstream port still is 256.

> 
>> +}
>> +
>>  /* pcie_bus_configure_settings requires that pci_walk_bus work in a top-down,
>>   * parents then children fashion.  If this changes, then this code will not
>>   * work as designed.
>> @@ -1614,6 +1648,15 @@ void pcie_bus_configure_settings(struct pci_bus *bus, u8 mpss)
>>         if (!pci_is_pcie(bus->self))
>>                 return;
>>
>> +       /* Sometimes we should update device mps here,
>> +        * eg. after hot add, device mps value will be
>> +        * set to default(128B), but the upstream port
>> +        * mps value may be larger than 128B, if we do
>> +        * not update the device mps, it maybe can not
>> +        * work normally.
> 
> This is slightly confusing to me.  It would be more clear to say:
> There are situations (i.e., hot add) where the upstream port might
> have a larger MPS than the device.  In these situations, the port MPS
> needs to be reconfigured to the lower value or the device will not
> operate properly.

Sorry for my poor English, I mean the device  is the newly hot added device, not
the port device. So we will only reconfigure the newly hot added device mps.

> 
>> +        */
>> +       pcie_bus_update_setting(bus);
> 
> This only seems to be necessary in the "TUNE_OFF" case.  It would be
> best to move it under that, just 2 lines down.

Good idea, will update, thanks!


Thanks!
Yijing.
> 
>> +
>>         if (pcie_bus_config == PCIE_BUS_TUNE_OFF)
> 
> Perhaps it is time to make "SAFE" the default option, per the
> discussion at last years PCI mini-summit.
> Bjorn, thoughts?
> 
> 
> Thanks,
> Jon
> 
>>                 return;
>>
>> --
>> 1.7.1
>>
>>
> 
> .
>
Jon Mason Aug. 7, 2013, 4:56 p.m. UTC | #3
On Tue, Aug 6, 2013 at 6:52 PM, Yijing Wang <wangyijing@huawei.com> wrote:
>>> +static int pcie_bus_update_set(struct pci_dev *dev, void *data)
>>> +{
>>> +       int mps, p_mps;
>>> +
>>> +       if (!pci_is_pcie(dev) || !dev->bus->self)
>>> +               return 0;
>>> +
>>> +       mps = pcie_get_mps(dev);
>>> +       p_mps = pcie_get_mps(dev->bus->self);
>>> +
>>> +       if (mps < p_mps)
>>> +               goto update;
>
> Hi Jon,
>    Thanks for your review and comments!
>
>>
>> It would be more clear to have it return 0 if mps >= p_mps, and not
>> have the goto statement.
>
> OK, will update.
>
>
>>
>>> +
>>> +       return 0;
>>> +
>>> +update:
>>> +       /* If current mpss is lager than upstream, use upstream mps to update
>>> +        * current mps, otherwise print warning info.
>>> +        */
>>> +       if ((128 << dev->pcie_mpss) >= p_mps)
>>> +               pcie_write_mps(dev, p_mps);
>>> +       else
>>> +               dev_warn(&dev->dev, "MPS %d MPSS %d both smaller than upstream MPS %d\n"
>>> +                               "If necessary, use \"pci=pcie_bus_peer2peer\" boot parameter to avoid this problem\n",
>>> +                               mps, 128 << dev->pcie_mpss, p_mps);
>>> +       return 0;
>>> +}
>>> +
>>> +static void pcie_bus_update_setting(struct pci_bus *bus)
>>> +{
>>> +       if (bus->self->is_hotplug_bridge)
>>> +               pci_walk_bus(bus, pcie_bus_update_set, NULL);
>>
>> Why not just call pcie_bus_configure_set(bus->self, &smpss); with smpss of 0?
>
> Because for safety, I won't touch the running pcie device(RP, UP/DP,EP) mps.
> Consider the following case:
> Root Port------>Upstream Port------>Downstream Port 1 ---->Endpoint device A (newly inserted device)
> mps=256          256               |    256                     128
>                                    |Downstream Port 2 ---->Endpoint device B
>                                         256                     256
>
> This patch try to update device A's mps equal to its parent DP 1 mps (256).
> Because EP A device is not running, newly inserted, not configure yet. So configure its mps
> to 256B is safe.

I understand your logic now and agree.

> But if we try to configure DP 1 mps to 128B, it's not safe. its parent Upstream port still is 256.

This is exactly the case that the "PERFORMANCE" option is trying to
allow.  If the MRRS is set to the MPS of the device, it should work.
If not, then we should rip out all of the "PERFORMANCE" code.  Is this
something you can verify?

Thanks,
Jon

>>
>>> +}
>>> +
>>>  /* pcie_bus_configure_settings requires that pci_walk_bus work in a top-down,
>>>   * parents then children fashion.  If this changes, then this code will not
>>>   * work as designed.
>>> @@ -1614,6 +1648,15 @@ void pcie_bus_configure_settings(struct pci_bus *bus, u8 mpss)
>>>         if (!pci_is_pcie(bus->self))
>>>                 return;
>>>
>>> +       /* Sometimes we should update device mps here,
>>> +        * eg. after hot add, device mps value will be
>>> +        * set to default(128B), but the upstream port
>>> +        * mps value may be larger than 128B, if we do
>>> +        * not update the device mps, it maybe can not
>>> +        * work normally.
>>
>> This is slightly confusing to me.  It would be more clear to say:
>> There are situations (i.e., hot add) where the upstream port might
>> have a larger MPS than the device.  In these situations, the port MPS
>> needs to be reconfigured to the lower value or the device will not
>> operate properly.
>
> Sorry for my poor English, I mean the device  is the newly hot added device, not
> the port device. So we will only reconfigure the newly hot added device mps.
>
>>
>>> +        */
>>> +       pcie_bus_update_setting(bus);
>>
>> This only seems to be necessary in the "TUNE_OFF" case.  It would be
>> best to move it under that, just 2 lines down.
>
> Good idea, will update, thanks!
>
>
> Thanks!
> Yijing.
>>
>>> +
>>>         if (pcie_bus_config == PCIE_BUS_TUNE_OFF)
>>
>> Perhaps it is time to make "SAFE" the default option, per the
>> discussion at last years PCI mini-summit.
>> Bjorn, thoughts?
>>
>>
>> Thanks,
>> Jon
>>
>>>                 return;
>>>
>>> --
>>> 1.7.1
>>>
>>>
>>
>> .
>>
>
>
> --
> Thanks!
> Yijing
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yijing Wang Aug. 8, 2013, 2:48 a.m. UTC | #4
>> Because for safety, I won't touch the running pcie device(RP, UP/DP,EP) mps.
>> Consider the following case:
>> Root Port------>Upstream Port------>Downstream Port 1 ---->Endpoint device A (newly inserted device)
>> mps=256          256               |    256                     128
>>                                    |Downstream Port 2 ---->Endpoint device B
>>                                         256                     256
>>
>> This patch try to update device A's mps equal to its parent DP 1 mps (256).
>> Because EP A device is not running, newly inserted, not configure yet. So configure its mps
>> to 256B is safe.
> 
> I understand your logic now and agree.

Hi Jon,
   Thanks for your comments!
I think we can improve this logic a little . As your comments in pcie_find_smpss(), if the PCI hotplug
slot is directly connected to the root port and there are not other devices on the fabric, then this
is not an issue..  So I think in this case, we can first update the newly hot added device mps as above logic.
if newly hot added device mpss < root port mps, then we can modify both root port device and newly hot added device
mps to device mpss. What do you think?

eg.
Root port --------------> slot (mps is default 128,assume mpss is 256)
(mps 512)

Only in this case, I think we try to update the parent device is safe.

after update:
Root port --------------> slot (mps is default 128,assume mpss is 256)
(mps 512-->256)		mps 128--->256

> 
>> But if we try to configure DP 1 mps to 128B, it's not safe. its parent Upstream port still is 256.
> 
> This is exactly the case that the "PERFORMANCE" option is trying to
> allow.  If the MRRS is set to the MPS of the device, it should work.
> If not, then we should rip out all of the "PERFORMANCE" code.  Is this
> something you can verify?

Hi Jon,
   I am not very clear about what the role of MRRS. So if I understand wrong, please correct me, thanks.

eg.
DP1 ----------------> EP A
mps=256			mps=128

MRRS can control the read request TLP size when the Function as a Requester.
So if we set the EP A MRRS to mps value(128), EP A won't generate
TLP larger than 128, so Request stream from EP A to DP1 is safe.

But if EP A is as a receiver, DP 1 generate completion TLP (larger than 128) to EP A.
these TLPs will be discarded by EP A, right?

So in my idea, If we set both mps and mrrs of EP A to 128, we can insure TLPs stream from EP A is safe.
But How do we guarantee that DP1 won't generate TLPs to EP A is larger than 128?

Jon, PCIe Spec only involves mps and mrrs setting a little, Are there any other specs about this?


Thanks!
Yijing.


> 
> Thanks,
> Jon
> 
>>>
>>>> +}
>>>> +
>>>>  /* pcie_bus_configure_settings requires that pci_walk_bus work in a top-down,
>>>>   * parents then children fashion.  If this changes, then this code will not
>>>>   * work as designed.
>>>> @@ -1614,6 +1648,15 @@ void pcie_bus_configure_settings(struct pci_bus *bus, u8 mpss)
>>>>         if (!pci_is_pcie(bus->self))
>>>>                 return;
>>>>
>>>> +       /* Sometimes we should update device mps here,
>>>> +        * eg. after hot add, device mps value will be
>>>> +        * set to default(128B), but the upstream port
>>>> +        * mps value may be larger than 128B, if we do
>>>> +        * not update the device mps, it maybe can not
>>>> +        * work normally.
>>>
>>> This is slightly confusing to me.  It would be more clear to say:
>>> There are situations (i.e., hot add) where the upstream port might
>>> have a larger MPS than the device.  In these situations, the port MPS
>>> needs to be reconfigured to the lower value or the device will not
>>> operate properly.
>>
>> Sorry for my poor English, I mean the device  is the newly hot added device, not
>> the port device. So we will only reconfigure the newly hot added device mps.
>>
>>>
>>>> +        */
>>>> +       pcie_bus_update_setting(bus);
>>>
>>> This only seems to be necessary in the "TUNE_OFF" case.  It would be
>>> best to move it under that, just 2 lines down.
>>
>> Good idea, will update, thanks!
>>
>>
>> Thanks!
>> Yijing.
>>>
>>>> +
>>>>         if (pcie_bus_config == PCIE_BUS_TUNE_OFF)
>>>
>>> Perhaps it is time to make "SAFE" the default option, per the
>>> discussion at last years PCI mini-summit.
>>> Bjorn, thoughts?
>>>
>>>
>>> Thanks,
>>> Jon
>>>
>>>>                 return;
>>>>
>>>> --
>>>> 1.7.1
>>>>
>>>>
>>>
>>> .
>>>
>>
>>
>> --
>> Thanks!
>> Yijing
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> .
>
Jon Mason Aug. 9, 2013, 12:47 a.m. UTC | #5
On Wed, Aug 7, 2013 at 7:48 PM, Yijing Wang <wangyijing@huawei.com> wrote:
>>> Because for safety, I won't touch the running pcie device(RP, UP/DP,EP) mps.
>>> Consider the following case:
>>> Root Port------>Upstream Port------>Downstream Port 1 ---->Endpoint device A (newly inserted device)
>>> mps=256          256               |    256                     128
>>>                                    |Downstream Port 2 ---->Endpoint device B
>>>                                         256                     256
>>>
>>> This patch try to update device A's mps equal to its parent DP 1 mps (256).
>>> Because EP A device is not running, newly inserted, not configure yet. So configure its mps
>>> to 256B is safe.
>>
>> I understand your logic now and agree.
>
> Hi Jon,
>    Thanks for your comments!
> I think we can improve this logic a little . As your comments in pcie_find_smpss(), if the PCI hotplug
> slot is directly connected to the root port and there are not other devices on the fabric, then this
> is not an issue..  So I think in this case, we can first update the newly hot added device mps as above logic.
> if newly hot added device mpss < root port mps, then we can modify both root port device and newly hot added device
> mps to device mpss. What do you think?
>
> eg.
> Root port --------------> slot (mps is default 128,assume mpss is 256)
> (mps 512)
>
> Only in this case, I think we try to update the parent device is safe.
>
> after update:
> Root port --------------> slot (mps is default 128,assume mpss is 256)
> (mps 512-->256)         mps 128--->256


Yes, but this is where it get difficult.  You can only do it if the
parent bus is the root port.  Otherwise, the other devices on the bus
will already have their MPS configured and changing the root port MPS
will do bad things.  That is why I have the check in pcie_find_smpss()
of

        if (dev->is_hotplug_bridge && (!list_is_singular(&dev->bus->devices) ||
             (dev->bus->self &&
              pci_pcie_type(dev->bus->self) != PCI_EXP_TYPE_ROOT_PORT)))

So, you either need to mimic this code in your new function or make
PCIE_BUS_SAFE the default option.  I'm leaning towards the latter, but
if you need something for the stable tree then we should temporarily
have a patch which does it the former.

Thoughts?

>
>>
>>> But if we try to configure DP 1 mps to 128B, it's not safe. its parent Upstream port still is 256.
>>
>> This is exactly the case that the "PERFORMANCE" option is trying to
>> allow.  If the MRRS is set to the MPS of the device, it should work.
>> If not, then we should rip out all of the "PERFORMANCE" code.  Is this
>> something you can verify?
>
> Hi Jon,
>    I am not very clear about what the role of MRRS. So if I understand wrong, please correct me, thanks.
>
> eg.
> DP1 ----------------> EP A
> mps=256                 mps=128
>
> MRRS can control the read request TLP size when the Function as a Requester.
> So if we set the EP A MRRS to mps value(128), EP A won't generate
> TLP larger than 128, so Request stream from EP A to DP1 is safe.
>
> But if EP A is as a receiver, DP 1 generate completion TLP (larger than 128) to EP A.
> these TLPs will be discarded by EP A, right?

Yes

> So in my idea, If we set both mps and mrrs of EP A to 128, we can insure TLPs stream from EP A is safe.
> But How do we guarantee that DP1 won't generate TLPs to EP A is larger than 128?

The device will never do any reads of larger than the MRRS, but writes
to the device are an issue.  Assuming that all I/O is between CPU/RAM
and the device and there are no peer-to-peer transfers, we should be
safe (but is that a safe assumption?).  So my question to you is that
the setup you have that is failing due to MPS sizes being off, can you
set the only MRRS to 128 and get it working?

> Jon, PCIe Spec only involves mps and mrrs setting a little, Are there any other specs about this?

I have the Mindshare PCIE book, but it doesn't go very deep into this.
 Most of this is an idea from benh, which I attempted to implement.
If it works then all credit goes to him, if it breaks then it is my
fault :)

Thanks,
Jon

>
>
> Thanks!
> Yijing.
>
>
>>
>> Thanks,
>> Jon
>>
>>>>
>>>>> +}
>>>>> +
>>>>>  /* pcie_bus_configure_settings requires that pci_walk_bus work in a top-down,
>>>>>   * parents then children fashion.  If this changes, then this code will not
>>>>>   * work as designed.
>>>>> @@ -1614,6 +1648,15 @@ void pcie_bus_configure_settings(struct pci_bus *bus, u8 mpss)
>>>>>         if (!pci_is_pcie(bus->self))
>>>>>                 return;
>>>>>
>>>>> +       /* Sometimes we should update device mps here,
>>>>> +        * eg. after hot add, device mps value will be
>>>>> +        * set to default(128B), but the upstream port
>>>>> +        * mps value may be larger than 128B, if we do
>>>>> +        * not update the device mps, it maybe can not
>>>>> +        * work normally.
>>>>
>>>> This is slightly confusing to me.  It would be more clear to say:
>>>> There are situations (i.e., hot add) where the upstream port might
>>>> have a larger MPS than the device.  In these situations, the port MPS
>>>> needs to be reconfigured to the lower value or the device will not
>>>> operate properly.
>>>
>>> Sorry for my poor English, I mean the device  is the newly hot added device, not
>>> the port device. So we will only reconfigure the newly hot added device mps.
>>>
>>>>
>>>>> +        */
>>>>> +       pcie_bus_update_setting(bus);
>>>>
>>>> This only seems to be necessary in the "TUNE_OFF" case.  It would be
>>>> best to move it under that, just 2 lines down.
>>>
>>> Good idea, will update, thanks!
>>>
>>>
>>> Thanks!
>>> Yijing.
>>>>
>>>>> +
>>>>>         if (pcie_bus_config == PCIE_BUS_TUNE_OFF)
>>>>
>>>> Perhaps it is time to make "SAFE" the default option, per the
>>>> discussion at last years PCI mini-summit.
>>>> Bjorn, thoughts?
>>>>
>>>>
>>>> Thanks,
>>>> Jon
>>>>
>>>>>                 return;
>>>>>
>>>>> --
>>>>> 1.7.1
>>>>>
>>>>>
>>>>
>>>> .
>>>>
>>>
>>>
>>> --
>>> Thanks!
>>> Yijing
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>> .
>>
>
>
> --
> Thanks!
> Yijing
>
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yijing Wang Aug. 9, 2013, 2:39 a.m. UTC | #6
>> Hi Jon,
>>    Thanks for your comments!
>> I think we can improve this logic a little . As your comments in pcie_find_smpss(), if the PCI hotplug
>> slot is directly connected to the root port and there are not other devices on the fabric, then this
>> is not an issue..  So I think in this case, we can first update the newly hot added device mps as above logic.
>> if newly hot added device mpss < root port mps, then we can modify both root port device and newly hot added device
>> mps to device mpss. What do you think?
>>
>> eg.
>> Root port --------------> slot (mps is default 128,assume mpss is 256)
>> (mps 512)
>>
>> Only in this case, I think we try to update the parent device is safe.
>>
>> after update:
>> Root port --------------> slot (mps is default 128,assume mpss is 256)
>> (mps 512-->256)         mps 128--->256
> 
> 
> Yes, but this is where it get difficult.  You can only do it if the
> parent bus is the root port.  Otherwise, the other devices on the bus
> will already have their MPS configured and changing the root port MPS
> will do bad things.  That is why I have the check in pcie_find_smpss()
> of
> 
>         if (dev->is_hotplug_bridge && (!list_is_singular(&dev->bus->devices) ||
>              (dev->bus->self &&
>               pci_pcie_type(dev->bus->self) != PCI_EXP_TYPE_ROOT_PORT)))
> 
> So, you either need to mimic this code in your new function or make
> PCIE_BUS_SAFE the default option.  I'm leaning towards the latter, but
> if you need something for the stable tree then we should temporarily
> have a patch which does it the former.
> 
> Thoughts?

Yes, I need one patch fix this issue in the stable tree. Will you send out
the series patch recently? Bjorn now began to review this mps code, so
this is a good chance to rework mps related codes I think.

> 
>>
>>>
>>>> But if we try to configure DP 1 mps to 128B, it's not safe. its parent Upstream port still is 256.
>>>
>>> This is exactly the case that the "PERFORMANCE" option is trying to
>>> allow.  If the MRRS is set to the MPS of the device, it should work.
>>> If not, then we should rip out all of the "PERFORMANCE" code.  Is this
>>> something you can verify?
>>
>> Hi Jon,
>>    I am not very clear about what the role of MRRS. So if I understand wrong, please correct me, thanks.
>>
>> eg.
>> DP1 ----------------> EP A
>> mps=256                 mps=128
>>
>> MRRS can control the read request TLP size when the Function as a Requester.
>> So if we set the EP A MRRS to mps value(128), EP A won't generate
>> TLP larger than 128, so Request stream from EP A to DP1 is safe.
>>
>> But if EP A is as a receiver, DP 1 generate completion TLP (larger than 128) to EP A.
>> these TLPs will be discarded by EP A, right?
> 
> Yes
> 
>> So in my idea, If we set both mps and mrrs of EP A to 128, we can insure TLPs stream from EP A is safe.
>> But How do we guarantee that DP1 won't generate TLPs to EP A is larger than 128?
> 
> The device will never do any reads of larger than the MRRS, but writes
> to the device are an issue.  Assuming that all I/O is between CPU/RAM
> and the device and there are no peer-to-peer transfers, we should be
> safe (but is that a safe assumption?).  So my question to you is that
> the setup you have that is failing due to MPS sizes being off, can you
> set the only MRRS to 128 and get it working?

I don't have PCIe analyzer, so I can not catch TLP, I'm not sure the TLPs size in stream.
my test is using ping $dest_ip -s 65500

local machine				remote machine
IP: 128.5.160.31			IP: 128.5.160.28	


Root port ---------------->Endpoint device(bnx2 NIC)
00:01.0				01:00.1

default setting
	Root Port(00:01.0):
	Capabilities: [90] Express (v2) Root Port (Slot+), MSI 00
		DevCap:	MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
			ExtTag+ RBE+ FLReset-
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 128 bytes, MaxReadReq 128 bytes

	Endpoint device(01:00.1)
	Capabilities: [ac] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 512 bytes, PhantFunc 0, Latency L0s <4us, L1 <64us
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
		DevCtl:	Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 128 bytes, MaxReadReq 512 bytes

test1:
change root port mps to 256

Root port ---------------->EndPoint device(bnx2 NIC)
mps=256	(default 128)		mps=128
mrrs=128			mrrs=512(default)

result:
ping 128.5.160.28(remote ip) -s 65500  ----->fail
ping 128.5.160.31(remove machine ping local ip)------->fail

test2:
change root port mps to 256, EP device mrrs to 128


Root port ---------------->EndPoint device(bnx2 NIC)
mps=256	(default 128)		mps=128
mrrs=128			mrrs=128(default 512)

result:
ping 128.5.160.28(remote ip) -s 65500  ----->success
ping 128.5.160.31(remove machine ping local ip)------->success

It seems change mrrs to 128 take effect.

Thanks!
Yijing.

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index cf57fe7..9b0e634 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1603,6 +1603,40 @@  static int pcie_bus_configure_set(struct pci_dev *dev, void *data)
 	return 0;
 }
 
+static int pcie_bus_update_set(struct pci_dev *dev, void *data)
+{
+	int mps, p_mps;
+
+	if (!pci_is_pcie(dev) || !dev->bus->self)
+		return 0;
+
+	mps = pcie_get_mps(dev);
+	p_mps = pcie_get_mps(dev->bus->self);
+
+	if (mps < p_mps)
+		goto update;
+
+	return 0;
+
+update:
+	/* If current mpss is lager than upstream, use upstream mps to update
+	 * current mps, otherwise print warning info.
+	 */
+	if ((128 << dev->pcie_mpss) >= p_mps)
+		pcie_write_mps(dev, p_mps);
+	else
+		dev_warn(&dev->dev, "MPS %d MPSS %d both smaller than upstream MPS %d\n"
+				"If necessary, use \"pci=pcie_bus_peer2peer\" boot parameter to avoid this problem\n",
+				mps, 128 << dev->pcie_mpss, p_mps);
+	return 0;
+}
+
+static void pcie_bus_update_setting(struct pci_bus *bus)
+{
+	if (bus->self->is_hotplug_bridge)
+		pci_walk_bus(bus, pcie_bus_update_set, NULL);
+}
+
 /* pcie_bus_configure_settings requires that pci_walk_bus work in a top-down,
  * parents then children fashion.  If this changes, then this code will not
  * work as designed.
@@ -1614,6 +1648,15 @@  void pcie_bus_configure_settings(struct pci_bus *bus, u8 mpss)
 	if (!pci_is_pcie(bus->self))
 		return;
 
+	/* Sometimes we should update device mps here,
+	 * eg. after hot add, device mps value will be
+	 * set to default(128B), but the upstream port
+	 * mps value may be larger than 128B, if we do
+	 * not update the device mps, it maybe can not
+	 * work normally.
+	 */
+	pcie_bus_update_setting(bus);
+
 	if (pcie_bus_config == PCIE_BUS_TUNE_OFF)
 		return;