Message ID | 1375776540-23988-1-git-send-email-wangyijing@huawei.com (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
Headers | show |
On Tue, Aug 6, 2013 at 1:09 AM, Yijing Wang <wangyijing@huawei.com> wrote: > v2->v3: Update CC stable tag suggested by Li Zefan. > v1->v2: Update patch log, remove Joe's reported-by, because his problem > was mainly caused by BIOS incorrect setting. But this patch mainly > to fix the bug caused by device hot add. Conservatively, this > version only update the mps problem when hot add. When the device > mps < parent mps found, this patch try to update device mps. > It seems unlikely device mps > parent mps after hot add device. > So we don't care that situation. > > Currently we don't update device's mps vaule when doing > pci device hot-add. The hot-added device's mps will be set > to default value (128B). But the upstream port device's mps > may be larger than 128B which was set by firmware during > system bootup. In this case the new added device may not > work normally. This patch try to update the hot added device > mps euqal to its parent mps, if device mpss < parent mps, > print warning. > > References: https://bugzilla.kernel.org/show_bug.cgi?id=60671 > Reported-by: Yijing Wang <wangyijing@huawei.com> > Signed-off-by: Yijing Wang <wangyijing@huawei.com> > Cc: Jon Mason <jdmason@kudzu.us> > Cc: stable@vger.kernel.org # 3.4+ > --- > drivers/pci/probe.c | 43 +++++++++++++++++++++++++++++++++++++++++++ > 1 files changed, 43 insertions(+), 0 deletions(-) > > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c > index cf57fe7..9b0e634 100644 > --- a/drivers/pci/probe.c > +++ b/drivers/pci/probe.c > @@ -1603,6 +1603,40 @@ static int pcie_bus_configure_set(struct pci_dev *dev, void *data) > return 0; > } > > +static int pcie_bus_update_set(struct pci_dev *dev, void *data) > +{ > + int mps, p_mps; > + > + if (!pci_is_pcie(dev) || !dev->bus->self) > + return 0; > + > + mps = pcie_get_mps(dev); > + p_mps = pcie_get_mps(dev->bus->self); > + > + if (mps < p_mps) > + goto update; It would be more clear to have it return 0 if mps >= p_mps, and not have the goto statement. > + > + return 0; > + > +update: > + /* If current mpss is lager than upstream, use upstream mps to update > + * current mps, otherwise print warning info. > + */ > + if ((128 << dev->pcie_mpss) >= p_mps) > + pcie_write_mps(dev, p_mps); > + else > + dev_warn(&dev->dev, "MPS %d MPSS %d both smaller than upstream MPS %d\n" > + "If necessary, use \"pci=pcie_bus_peer2peer\" boot parameter to avoid this problem\n", > + mps, 128 << dev->pcie_mpss, p_mps); > + return 0; > +} > + > +static void pcie_bus_update_setting(struct pci_bus *bus) > +{ > + if (bus->self->is_hotplug_bridge) > + pci_walk_bus(bus, pcie_bus_update_set, NULL); Why not just call pcie_bus_configure_set(bus->self, &smpss); with smpss of 0? > +} > + > /* pcie_bus_configure_settings requires that pci_walk_bus work in a top-down, > * parents then children fashion. If this changes, then this code will not > * work as designed. > @@ -1614,6 +1648,15 @@ void pcie_bus_configure_settings(struct pci_bus *bus, u8 mpss) > if (!pci_is_pcie(bus->self)) > return; > > + /* Sometimes we should update device mps here, > + * eg. after hot add, device mps value will be > + * set to default(128B), but the upstream port > + * mps value may be larger than 128B, if we do > + * not update the device mps, it maybe can not > + * work normally. This is slightly confusing to me. It would be more clear to say: There are situations (i.e., hot add) where the upstream port might have a larger MPS than the device. In these situations, the port MPS needs to be reconfigured to the lower value or the device will not operate properly. > + */ > + pcie_bus_update_setting(bus); This only seems to be necessary in the "TUNE_OFF" case. It would be best to move it under that, just 2 lines down. > + > if (pcie_bus_config == PCIE_BUS_TUNE_OFF) Perhaps it is time to make "SAFE" the default option, per the discussion at last years PCI mini-summit. Bjorn, thoughts? Thanks, Jon > return; > > -- > 1.7.1 > > -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
>> +static int pcie_bus_update_set(struct pci_dev *dev, void *data) >> +{ >> + int mps, p_mps; >> + >> + if (!pci_is_pcie(dev) || !dev->bus->self) >> + return 0; >> + >> + mps = pcie_get_mps(dev); >> + p_mps = pcie_get_mps(dev->bus->self); >> + >> + if (mps < p_mps) >> + goto update; Hi Jon, Thanks for your review and comments! > > It would be more clear to have it return 0 if mps >= p_mps, and not > have the goto statement. OK, will update. > >> + >> + return 0; >> + >> +update: >> + /* If current mpss is lager than upstream, use upstream mps to update >> + * current mps, otherwise print warning info. >> + */ >> + if ((128 << dev->pcie_mpss) >= p_mps) >> + pcie_write_mps(dev, p_mps); >> + else >> + dev_warn(&dev->dev, "MPS %d MPSS %d both smaller than upstream MPS %d\n" >> + "If necessary, use \"pci=pcie_bus_peer2peer\" boot parameter to avoid this problem\n", >> + mps, 128 << dev->pcie_mpss, p_mps); >> + return 0; >> +} >> + >> +static void pcie_bus_update_setting(struct pci_bus *bus) >> +{ >> + if (bus->self->is_hotplug_bridge) >> + pci_walk_bus(bus, pcie_bus_update_set, NULL); > > Why not just call pcie_bus_configure_set(bus->self, &smpss); with smpss of 0? Because for safety, I won't touch the running pcie device(RP, UP/DP,EP) mps. Consider the following case: Root Port------>Upstream Port------>Downstream Port 1 ---->Endpoint device A (newly inserted device) mps=256 256 | 256 128 |Downstream Port 2 ---->Endpoint device B 256 256 This patch try to update device A's mps equal to its parent DP 1 mps (256). Because EP A device is not running, newly inserted, not configure yet. So configure its mps to 256B is safe. But if we try to configure DP 1 mps to 128B, it's not safe. its parent Upstream port still is 256. > >> +} >> + >> /* pcie_bus_configure_settings requires that pci_walk_bus work in a top-down, >> * parents then children fashion. If this changes, then this code will not >> * work as designed. >> @@ -1614,6 +1648,15 @@ void pcie_bus_configure_settings(struct pci_bus *bus, u8 mpss) >> if (!pci_is_pcie(bus->self)) >> return; >> >> + /* Sometimes we should update device mps here, >> + * eg. after hot add, device mps value will be >> + * set to default(128B), but the upstream port >> + * mps value may be larger than 128B, if we do >> + * not update the device mps, it maybe can not >> + * work normally. > > This is slightly confusing to me. It would be more clear to say: > There are situations (i.e., hot add) where the upstream port might > have a larger MPS than the device. In these situations, the port MPS > needs to be reconfigured to the lower value or the device will not > operate properly. Sorry for my poor English, I mean the device is the newly hot added device, not the port device. So we will only reconfigure the newly hot added device mps. > >> + */ >> + pcie_bus_update_setting(bus); > > This only seems to be necessary in the "TUNE_OFF" case. It would be > best to move it under that, just 2 lines down. Good idea, will update, thanks! Thanks! Yijing. > >> + >> if (pcie_bus_config == PCIE_BUS_TUNE_OFF) > > Perhaps it is time to make "SAFE" the default option, per the > discussion at last years PCI mini-summit. > Bjorn, thoughts? > > > Thanks, > Jon > >> return; >> >> -- >> 1.7.1 >> >> > > . >
On Tue, Aug 6, 2013 at 6:52 PM, Yijing Wang <wangyijing@huawei.com> wrote: >>> +static int pcie_bus_update_set(struct pci_dev *dev, void *data) >>> +{ >>> + int mps, p_mps; >>> + >>> + if (!pci_is_pcie(dev) || !dev->bus->self) >>> + return 0; >>> + >>> + mps = pcie_get_mps(dev); >>> + p_mps = pcie_get_mps(dev->bus->self); >>> + >>> + if (mps < p_mps) >>> + goto update; > > Hi Jon, > Thanks for your review and comments! > >> >> It would be more clear to have it return 0 if mps >= p_mps, and not >> have the goto statement. > > OK, will update. > > >> >>> + >>> + return 0; >>> + >>> +update: >>> + /* If current mpss is lager than upstream, use upstream mps to update >>> + * current mps, otherwise print warning info. >>> + */ >>> + if ((128 << dev->pcie_mpss) >= p_mps) >>> + pcie_write_mps(dev, p_mps); >>> + else >>> + dev_warn(&dev->dev, "MPS %d MPSS %d both smaller than upstream MPS %d\n" >>> + "If necessary, use \"pci=pcie_bus_peer2peer\" boot parameter to avoid this problem\n", >>> + mps, 128 << dev->pcie_mpss, p_mps); >>> + return 0; >>> +} >>> + >>> +static void pcie_bus_update_setting(struct pci_bus *bus) >>> +{ >>> + if (bus->self->is_hotplug_bridge) >>> + pci_walk_bus(bus, pcie_bus_update_set, NULL); >> >> Why not just call pcie_bus_configure_set(bus->self, &smpss); with smpss of 0? > > Because for safety, I won't touch the running pcie device(RP, UP/DP,EP) mps. > Consider the following case: > Root Port------>Upstream Port------>Downstream Port 1 ---->Endpoint device A (newly inserted device) > mps=256 256 | 256 128 > |Downstream Port 2 ---->Endpoint device B > 256 256 > > This patch try to update device A's mps equal to its parent DP 1 mps (256). > Because EP A device is not running, newly inserted, not configure yet. So configure its mps > to 256B is safe. I understand your logic now and agree. > But if we try to configure DP 1 mps to 128B, it's not safe. its parent Upstream port still is 256. This is exactly the case that the "PERFORMANCE" option is trying to allow. If the MRRS is set to the MPS of the device, it should work. If not, then we should rip out all of the "PERFORMANCE" code. Is this something you can verify? Thanks, Jon >> >>> +} >>> + >>> /* pcie_bus_configure_settings requires that pci_walk_bus work in a top-down, >>> * parents then children fashion. If this changes, then this code will not >>> * work as designed. >>> @@ -1614,6 +1648,15 @@ void pcie_bus_configure_settings(struct pci_bus *bus, u8 mpss) >>> if (!pci_is_pcie(bus->self)) >>> return; >>> >>> + /* Sometimes we should update device mps here, >>> + * eg. after hot add, device mps value will be >>> + * set to default(128B), but the upstream port >>> + * mps value may be larger than 128B, if we do >>> + * not update the device mps, it maybe can not >>> + * work normally. >> >> This is slightly confusing to me. It would be more clear to say: >> There are situations (i.e., hot add) where the upstream port might >> have a larger MPS than the device. In these situations, the port MPS >> needs to be reconfigured to the lower value or the device will not >> operate properly. > > Sorry for my poor English, I mean the device is the newly hot added device, not > the port device. So we will only reconfigure the newly hot added device mps. > >> >>> + */ >>> + pcie_bus_update_setting(bus); >> >> This only seems to be necessary in the "TUNE_OFF" case. It would be >> best to move it under that, just 2 lines down. > > Good idea, will update, thanks! > > > Thanks! > Yijing. >> >>> + >>> if (pcie_bus_config == PCIE_BUS_TUNE_OFF) >> >> Perhaps it is time to make "SAFE" the default option, per the >> discussion at last years PCI mini-summit. >> Bjorn, thoughts? >> >> >> Thanks, >> Jon >> >>> return; >>> >>> -- >>> 1.7.1 >>> >>> >> >> . >> > > > -- > Thanks! > Yijing > > -- > To unsubscribe from this list: send the line "unsubscribe linux-pci" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
>> Because for safety, I won't touch the running pcie device(RP, UP/DP,EP) mps. >> Consider the following case: >> Root Port------>Upstream Port------>Downstream Port 1 ---->Endpoint device A (newly inserted device) >> mps=256 256 | 256 128 >> |Downstream Port 2 ---->Endpoint device B >> 256 256 >> >> This patch try to update device A's mps equal to its parent DP 1 mps (256). >> Because EP A device is not running, newly inserted, not configure yet. So configure its mps >> to 256B is safe. > > I understand your logic now and agree. Hi Jon, Thanks for your comments! I think we can improve this logic a little . As your comments in pcie_find_smpss(), if the PCI hotplug slot is directly connected to the root port and there are not other devices on the fabric, then this is not an issue.. So I think in this case, we can first update the newly hot added device mps as above logic. if newly hot added device mpss < root port mps, then we can modify both root port device and newly hot added device mps to device mpss. What do you think? eg. Root port --------------> slot (mps is default 128,assume mpss is 256) (mps 512) Only in this case, I think we try to update the parent device is safe. after update: Root port --------------> slot (mps is default 128,assume mpss is 256) (mps 512-->256) mps 128--->256 > >> But if we try to configure DP 1 mps to 128B, it's not safe. its parent Upstream port still is 256. > > This is exactly the case that the "PERFORMANCE" option is trying to > allow. If the MRRS is set to the MPS of the device, it should work. > If not, then we should rip out all of the "PERFORMANCE" code. Is this > something you can verify? Hi Jon, I am not very clear about what the role of MRRS. So if I understand wrong, please correct me, thanks. eg. DP1 ----------------> EP A mps=256 mps=128 MRRS can control the read request TLP size when the Function as a Requester. So if we set the EP A MRRS to mps value(128), EP A won't generate TLP larger than 128, so Request stream from EP A to DP1 is safe. But if EP A is as a receiver, DP 1 generate completion TLP (larger than 128) to EP A. these TLPs will be discarded by EP A, right? So in my idea, If we set both mps and mrrs of EP A to 128, we can insure TLPs stream from EP A is safe. But How do we guarantee that DP1 won't generate TLPs to EP A is larger than 128? Jon, PCIe Spec only involves mps and mrrs setting a little, Are there any other specs about this? Thanks! Yijing. > > Thanks, > Jon > >>> >>>> +} >>>> + >>>> /* pcie_bus_configure_settings requires that pci_walk_bus work in a top-down, >>>> * parents then children fashion. If this changes, then this code will not >>>> * work as designed. >>>> @@ -1614,6 +1648,15 @@ void pcie_bus_configure_settings(struct pci_bus *bus, u8 mpss) >>>> if (!pci_is_pcie(bus->self)) >>>> return; >>>> >>>> + /* Sometimes we should update device mps here, >>>> + * eg. after hot add, device mps value will be >>>> + * set to default(128B), but the upstream port >>>> + * mps value may be larger than 128B, if we do >>>> + * not update the device mps, it maybe can not >>>> + * work normally. >>> >>> This is slightly confusing to me. It would be more clear to say: >>> There are situations (i.e., hot add) where the upstream port might >>> have a larger MPS than the device. In these situations, the port MPS >>> needs to be reconfigured to the lower value or the device will not >>> operate properly. >> >> Sorry for my poor English, I mean the device is the newly hot added device, not >> the port device. So we will only reconfigure the newly hot added device mps. >> >>> >>>> + */ >>>> + pcie_bus_update_setting(bus); >>> >>> This only seems to be necessary in the "TUNE_OFF" case. It would be >>> best to move it under that, just 2 lines down. >> >> Good idea, will update, thanks! >> >> >> Thanks! >> Yijing. >>> >>>> + >>>> if (pcie_bus_config == PCIE_BUS_TUNE_OFF) >>> >>> Perhaps it is time to make "SAFE" the default option, per the >>> discussion at last years PCI mini-summit. >>> Bjorn, thoughts? >>> >>> >>> Thanks, >>> Jon >>> >>>> return; >>>> >>>> -- >>>> 1.7.1 >>>> >>>> >>> >>> . >>> >> >> >> -- >> Thanks! >> Yijing >> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-pci" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html > > . >
On Wed, Aug 7, 2013 at 7:48 PM, Yijing Wang <wangyijing@huawei.com> wrote: >>> Because for safety, I won't touch the running pcie device(RP, UP/DP,EP) mps. >>> Consider the following case: >>> Root Port------>Upstream Port------>Downstream Port 1 ---->Endpoint device A (newly inserted device) >>> mps=256 256 | 256 128 >>> |Downstream Port 2 ---->Endpoint device B >>> 256 256 >>> >>> This patch try to update device A's mps equal to its parent DP 1 mps (256). >>> Because EP A device is not running, newly inserted, not configure yet. So configure its mps >>> to 256B is safe. >> >> I understand your logic now and agree. > > Hi Jon, > Thanks for your comments! > I think we can improve this logic a little . As your comments in pcie_find_smpss(), if the PCI hotplug > slot is directly connected to the root port and there are not other devices on the fabric, then this > is not an issue.. So I think in this case, we can first update the newly hot added device mps as above logic. > if newly hot added device mpss < root port mps, then we can modify both root port device and newly hot added device > mps to device mpss. What do you think? > > eg. > Root port --------------> slot (mps is default 128,assume mpss is 256) > (mps 512) > > Only in this case, I think we try to update the parent device is safe. > > after update: > Root port --------------> slot (mps is default 128,assume mpss is 256) > (mps 512-->256) mps 128--->256 Yes, but this is where it get difficult. You can only do it if the parent bus is the root port. Otherwise, the other devices on the bus will already have their MPS configured and changing the root port MPS will do bad things. That is why I have the check in pcie_find_smpss() of if (dev->is_hotplug_bridge && (!list_is_singular(&dev->bus->devices) || (dev->bus->self && pci_pcie_type(dev->bus->self) != PCI_EXP_TYPE_ROOT_PORT))) So, you either need to mimic this code in your new function or make PCIE_BUS_SAFE the default option. I'm leaning towards the latter, but if you need something for the stable tree then we should temporarily have a patch which does it the former. Thoughts? > >> >>> But if we try to configure DP 1 mps to 128B, it's not safe. its parent Upstream port still is 256. >> >> This is exactly the case that the "PERFORMANCE" option is trying to >> allow. If the MRRS is set to the MPS of the device, it should work. >> If not, then we should rip out all of the "PERFORMANCE" code. Is this >> something you can verify? > > Hi Jon, > I am not very clear about what the role of MRRS. So if I understand wrong, please correct me, thanks. > > eg. > DP1 ----------------> EP A > mps=256 mps=128 > > MRRS can control the read request TLP size when the Function as a Requester. > So if we set the EP A MRRS to mps value(128), EP A won't generate > TLP larger than 128, so Request stream from EP A to DP1 is safe. > > But if EP A is as a receiver, DP 1 generate completion TLP (larger than 128) to EP A. > these TLPs will be discarded by EP A, right? Yes > So in my idea, If we set both mps and mrrs of EP A to 128, we can insure TLPs stream from EP A is safe. > But How do we guarantee that DP1 won't generate TLPs to EP A is larger than 128? The device will never do any reads of larger than the MRRS, but writes to the device are an issue. Assuming that all I/O is between CPU/RAM and the device and there are no peer-to-peer transfers, we should be safe (but is that a safe assumption?). So my question to you is that the setup you have that is failing due to MPS sizes being off, can you set the only MRRS to 128 and get it working? > Jon, PCIe Spec only involves mps and mrrs setting a little, Are there any other specs about this? I have the Mindshare PCIE book, but it doesn't go very deep into this. Most of this is an idea from benh, which I attempted to implement. If it works then all credit goes to him, if it breaks then it is my fault :) Thanks, Jon > > > Thanks! > Yijing. > > >> >> Thanks, >> Jon >> >>>> >>>>> +} >>>>> + >>>>> /* pcie_bus_configure_settings requires that pci_walk_bus work in a top-down, >>>>> * parents then children fashion. If this changes, then this code will not >>>>> * work as designed. >>>>> @@ -1614,6 +1648,15 @@ void pcie_bus_configure_settings(struct pci_bus *bus, u8 mpss) >>>>> if (!pci_is_pcie(bus->self)) >>>>> return; >>>>> >>>>> + /* Sometimes we should update device mps here, >>>>> + * eg. after hot add, device mps value will be >>>>> + * set to default(128B), but the upstream port >>>>> + * mps value may be larger than 128B, if we do >>>>> + * not update the device mps, it maybe can not >>>>> + * work normally. >>>> >>>> This is slightly confusing to me. It would be more clear to say: >>>> There are situations (i.e., hot add) where the upstream port might >>>> have a larger MPS than the device. In these situations, the port MPS >>>> needs to be reconfigured to the lower value or the device will not >>>> operate properly. >>> >>> Sorry for my poor English, I mean the device is the newly hot added device, not >>> the port device. So we will only reconfigure the newly hot added device mps. >>> >>>> >>>>> + */ >>>>> + pcie_bus_update_setting(bus); >>>> >>>> This only seems to be necessary in the "TUNE_OFF" case. It would be >>>> best to move it under that, just 2 lines down. >>> >>> Good idea, will update, thanks! >>> >>> >>> Thanks! >>> Yijing. >>>> >>>>> + >>>>> if (pcie_bus_config == PCIE_BUS_TUNE_OFF) >>>> >>>> Perhaps it is time to make "SAFE" the default option, per the >>>> discussion at last years PCI mini-summit. >>>> Bjorn, thoughts? >>>> >>>> >>>> Thanks, >>>> Jon >>>> >>>>> return; >>>>> >>>>> -- >>>>> 1.7.1 >>>>> >>>>> >>>> >>>> . >>>> >>> >>> >>> -- >>> Thanks! >>> Yijing >>> >>> -- >>> To unsubscribe from this list: send the line "unsubscribe linux-pci" in >>> the body of a message to majordomo@vger.kernel.org >>> More majordomo info at http://vger.kernel.org/majordomo-info.html >> >> . >> > > > -- > Thanks! > Yijing > -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
>> Hi Jon, >> Thanks for your comments! >> I think we can improve this logic a little . As your comments in pcie_find_smpss(), if the PCI hotplug >> slot is directly connected to the root port and there are not other devices on the fabric, then this >> is not an issue.. So I think in this case, we can first update the newly hot added device mps as above logic. >> if newly hot added device mpss < root port mps, then we can modify both root port device and newly hot added device >> mps to device mpss. What do you think? >> >> eg. >> Root port --------------> slot (mps is default 128,assume mpss is 256) >> (mps 512) >> >> Only in this case, I think we try to update the parent device is safe. >> >> after update: >> Root port --------------> slot (mps is default 128,assume mpss is 256) >> (mps 512-->256) mps 128--->256 > > > Yes, but this is where it get difficult. You can only do it if the > parent bus is the root port. Otherwise, the other devices on the bus > will already have their MPS configured and changing the root port MPS > will do bad things. That is why I have the check in pcie_find_smpss() > of > > if (dev->is_hotplug_bridge && (!list_is_singular(&dev->bus->devices) || > (dev->bus->self && > pci_pcie_type(dev->bus->self) != PCI_EXP_TYPE_ROOT_PORT))) > > So, you either need to mimic this code in your new function or make > PCIE_BUS_SAFE the default option. I'm leaning towards the latter, but > if you need something for the stable tree then we should temporarily > have a patch which does it the former. > > Thoughts? Yes, I need one patch fix this issue in the stable tree. Will you send out the series patch recently? Bjorn now began to review this mps code, so this is a good chance to rework mps related codes I think. > >> >>> >>>> But if we try to configure DP 1 mps to 128B, it's not safe. its parent Upstream port still is 256. >>> >>> This is exactly the case that the "PERFORMANCE" option is trying to >>> allow. If the MRRS is set to the MPS of the device, it should work. >>> If not, then we should rip out all of the "PERFORMANCE" code. Is this >>> something you can verify? >> >> Hi Jon, >> I am not very clear about what the role of MRRS. So if I understand wrong, please correct me, thanks. >> >> eg. >> DP1 ----------------> EP A >> mps=256 mps=128 >> >> MRRS can control the read request TLP size when the Function as a Requester. >> So if we set the EP A MRRS to mps value(128), EP A won't generate >> TLP larger than 128, so Request stream from EP A to DP1 is safe. >> >> But if EP A is as a receiver, DP 1 generate completion TLP (larger than 128) to EP A. >> these TLPs will be discarded by EP A, right? > > Yes > >> So in my idea, If we set both mps and mrrs of EP A to 128, we can insure TLPs stream from EP A is safe. >> But How do we guarantee that DP1 won't generate TLPs to EP A is larger than 128? > > The device will never do any reads of larger than the MRRS, but writes > to the device are an issue. Assuming that all I/O is between CPU/RAM > and the device and there are no peer-to-peer transfers, we should be > safe (but is that a safe assumption?). So my question to you is that > the setup you have that is failing due to MPS sizes being off, can you > set the only MRRS to 128 and get it working? I don't have PCIe analyzer, so I can not catch TLP, I'm not sure the TLPs size in stream. my test is using ping $dest_ip -s 65500 local machine remote machine IP: 128.5.160.31 IP: 128.5.160.28 Root port ---------------->Endpoint device(bnx2 NIC) 00:01.0 01:00.1 default setting Root Port(00:01.0): Capabilities: [90] Express (v2) Root Port (Slot+), MSI 00 DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us ExtTag+ RBE+ FLReset- DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported- RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- MaxPayload 128 bytes, MaxReadReq 128 bytes Endpoint device(01:00.1) Capabilities: [ac] Express (v2) Endpoint, MSI 00 DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <4us, L1 <64us ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+ RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ MaxPayload 128 bytes, MaxReadReq 512 bytes test1: change root port mps to 256 Root port ---------------->EndPoint device(bnx2 NIC) mps=256 (default 128) mps=128 mrrs=128 mrrs=512(default) result: ping 128.5.160.28(remote ip) -s 65500 ----->fail ping 128.5.160.31(remove machine ping local ip)------->fail test2: change root port mps to 256, EP device mrrs to 128 Root port ---------------->EndPoint device(bnx2 NIC) mps=256 (default 128) mps=128 mrrs=128 mrrs=128(default 512) result: ping 128.5.160.28(remote ip) -s 65500 ----->success ping 128.5.160.31(remove machine ping local ip)------->success It seems change mrrs to 128 take effect. Thanks! Yijing. -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c index cf57fe7..9b0e634 100644 --- a/drivers/pci/probe.c +++ b/drivers/pci/probe.c @@ -1603,6 +1603,40 @@ static int pcie_bus_configure_set(struct pci_dev *dev, void *data) return 0; } +static int pcie_bus_update_set(struct pci_dev *dev, void *data) +{ + int mps, p_mps; + + if (!pci_is_pcie(dev) || !dev->bus->self) + return 0; + + mps = pcie_get_mps(dev); + p_mps = pcie_get_mps(dev->bus->self); + + if (mps < p_mps) + goto update; + + return 0; + +update: + /* If current mpss is lager than upstream, use upstream mps to update + * current mps, otherwise print warning info. + */ + if ((128 << dev->pcie_mpss) >= p_mps) + pcie_write_mps(dev, p_mps); + else + dev_warn(&dev->dev, "MPS %d MPSS %d both smaller than upstream MPS %d\n" + "If necessary, use \"pci=pcie_bus_peer2peer\" boot parameter to avoid this problem\n", + mps, 128 << dev->pcie_mpss, p_mps); + return 0; +} + +static void pcie_bus_update_setting(struct pci_bus *bus) +{ + if (bus->self->is_hotplug_bridge) + pci_walk_bus(bus, pcie_bus_update_set, NULL); +} + /* pcie_bus_configure_settings requires that pci_walk_bus work in a top-down, * parents then children fashion. If this changes, then this code will not * work as designed. @@ -1614,6 +1648,15 @@ void pcie_bus_configure_settings(struct pci_bus *bus, u8 mpss) if (!pci_is_pcie(bus->self)) return; + /* Sometimes we should update device mps here, + * eg. after hot add, device mps value will be + * set to default(128B), but the upstream port + * mps value may be larger than 128B, if we do + * not update the device mps, it maybe can not + * work normally. + */ + pcie_bus_update_setting(bus); + if (pcie_bus_config == PCIE_BUS_TUNE_OFF) return;
v2->v3: Update CC stable tag suggested by Li Zefan. v1->v2: Update patch log, remove Joe's reported-by, because his problem was mainly caused by BIOS incorrect setting. But this patch mainly to fix the bug caused by device hot add. Conservatively, this version only update the mps problem when hot add. When the device mps < parent mps found, this patch try to update device mps. It seems unlikely device mps > parent mps after hot add device. So we don't care that situation. Currently we don't update device's mps vaule when doing pci device hot-add. The hot-added device's mps will be set to default value (128B). But the upstream port device's mps may be larger than 128B which was set by firmware during system bootup. In this case the new added device may not work normally. This patch try to update the hot added device mps euqal to its parent mps, if device mpss < parent mps, print warning. References: https://bugzilla.kernel.org/show_bug.cgi?id=60671 Reported-by: Yijing Wang <wangyijing@huawei.com> Signed-off-by: Yijing Wang <wangyijing@huawei.com> Cc: Jon Mason <jdmason@kudzu.us> Cc: stable@vger.kernel.org # 3.4+ --- drivers/pci/probe.c | 43 +++++++++++++++++++++++++++++++++++++++++++ 1 files changed, 43 insertions(+), 0 deletions(-)