Message ID | 20230817121708.53213-1-ilpo.jarvinen@linux.intel.com (mailing list archive) |
---|---|
Series | Add PCIe Bandwidth Controller |
On 8/17/2023 5:46 PM, Ilpo Järvinen wrote:
> Hi all,
>
> This series adds PCIe bandwidth controller (bwctrl) and associated PCIe
> cooling driver to the thermal core side for limiting PCIe link speed
> due to thermal reasons. PCIe bandwidth controller is a PCI express bus
> port service driver. A cooling device is created for each port the
> service driver finds if they support changing speeds.

I see we had support for only link speed changes here but we need to add
support for link width change also as bandwidth notification from PCIe
supports both link speed and link width.

- KC

>
> bwctrl now is built on top of BW notifications revert. I'm just not
> sure what is the best practice when re-adding some old functionality in
> a modified form so please let me know if I need to somehow alter that
> patch.
>
> The series is based on top of the RMW changes in pci/pcie-rmw.
>
> Ilpo Järvinen (10):
>   PCI: Protect Link Control 2 Register with RMW locking
>   drm/radeon: Use RMW accessors for changing LNKCTL2
>   drm/amdgpu: Use RMW accessors for changing LNKCTL2
>   drm/IB/hfi1: Use RMW accessors for changing LNKCTL2
>   PCI: Store all PCIe Supported Link Speeds
>   PCI: Cache PCIe device's Supported Speed Vector
>   PCI/LINK: Re-add BW notification portdrv as PCIe BW controller
>   PCI/bwctrl: Add "controller" part into PCIe bwctrl
>   thermal: Add PCIe cooling driver
>   selftests/pcie_bwctrl: Create selftests
>
>  MAINTAINERS                                  |   8 +
>  drivers/gpu/drm/amd/amdgpu/cik.c             |  41 +--
>  drivers/gpu/drm/amd/amdgpu/si.c              |  41 +--
>  drivers/gpu/drm/radeon/cik.c                 |  40 +--
>  drivers/gpu/drm/radeon/si.c                  |  40 +--
>  drivers/infiniband/hw/hfi1/pcie.c            |  30 +-
>  drivers/pci/pcie/Kconfig                     |   9 +
>  drivers/pci/pcie/Makefile                    |   1 +
>  drivers/pci/pcie/bwctrl.c                    | 309 ++++++++++++++++++
>  drivers/pci/pcie/portdrv.c                   |   9 +-
>  drivers/pci/pcie/portdrv.h                   |  10 +-
>  drivers/pci/probe.c                          |  38 ++-
>  drivers/pci/remove.c                         |   2 +
>  drivers/thermal/Kconfig                      |  10 +
>  drivers/thermal/Makefile                     |   2 +
>  drivers/thermal/pcie_cooling.c               | 107 ++++++
>  include/linux/pci-bwctrl.h                   |  33 ++
>  include/linux/pci.h                          |   3 +
>  include/uapi/linux/pci_regs.h                |   1 +
>  tools/testing/selftests/Makefile             |   1 +
>  tools/testing/selftests/pcie_bwctrl/Makefile |   2 +
>  .../pcie_bwctrl/set_pcie_cooling_state.sh    | 122 +++++++
>  .../selftests/pcie_bwctrl/set_pcie_speed.sh  |  67 ++++
>  23 files changed, 795 insertions(+), 131 deletions(-)
>  create mode 100644 drivers/pci/pcie/bwctrl.c
>  create mode 100644 drivers/thermal/pcie_cooling.c
>  create mode 100644 include/linux/pci-bwctrl.h
>  create mode 100644 tools/testing/selftests/pcie_bwctrl/Makefile
>  create mode 100755 tools/testing/selftests/pcie_bwctrl/set_pcie_cooling_state.sh
>  create mode 100755 tools/testing/selftests/pcie_bwctrl/set_pcie_speed.sh
>
On Mon, 4 Sep 2023, Krishna Chaitanya Chundru wrote:
> On 8/17/2023 5:46 PM, Ilpo Järvinen wrote:
> > Hi all,
> >
> > This series adds PCIe bandwidth controller (bwctrl) and associated PCIe
> > cooling driver to the thermal core side for limiting PCIe link speed
> > due to thermal reasons. PCIe bandwidth controller is a PCI express bus
> > port service driver. A cooling device is created for each port the
> > service driver finds if they support changing speeds.
>
> I see we had support for only link speed changes here but we need to add
> support for link width change also as bandwidth notification from PCIe
> supports both link speed and link width.

Hi,

Thanks for the comment. In case you mean that the changes in Link Width
should be reported correctly, they already are since the sysfs interface
reads them directly from LNKSTA register.

Or did you perhaps mean that Bandwidth Controller should support also
changing Link Width? If this is the case I don't know how it can be
realized so a pointer on how it can be achieved would be appreciated.
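For readers unfamiliar with the register mentioned above: both the current link speed and the negotiated link width live in the same 16-bit Link Status (LNKSTA) register, which is why width changes are already visible without extra code. The following is only a minimal userspace sketch of the decoding, not code from the series. The mask values match include/uapi/linux/pci_regs.h; the helper names lnksta_speed_code() and lnksta_width() are made up for illustration.

```c
#include <stdint.h>

/* Field layout of the PCIe Link Status register (see pci_regs.h). */
#define PCI_EXP_LNKSTA_CLS        0x000f  /* Current Link Speed */
#define PCI_EXP_LNKSTA_NLW        0x03f0  /* Negotiated Link Width */
#define PCI_EXP_LNKSTA_NLW_SHIFT  4

/*
 * Hypothetical helper: return the encoded link speed
 * (1 = 2.5 GT/s, 2 = 5 GT/s, 3 = 8 GT/s, 4 = 16 GT/s, ...).
 */
unsigned int lnksta_speed_code(uint16_t lnksta)
{
	return lnksta & PCI_EXP_LNKSTA_CLS;
}

/* Hypothetical helper: return the negotiated lane count (x1, x2, x4, ...). */
unsigned int lnksta_width(uint16_t lnksta)
{
	return (lnksta & PCI_EXP_LNKSTA_NLW) >> PCI_EXP_LNKSTA_NLW_SHIFT;
}
```

A raw LNKSTA value of 0x0043, for example, decodes to speed code 3 (8 GT/s) on a x4 link.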
On 9/4/2023 4:46 PM, Ilpo Järvinen wrote:
> On Mon, 4 Sep 2023, Krishna Chaitanya Chundru wrote:
>
>> On 8/17/2023 5:46 PM, Ilpo Järvinen wrote:
>>> Hi all,
>>>
>>> This series adds PCIe bandwidth controller (bwctrl) and associated PCIe
>>> cooling driver to the thermal core side for limiting PCIe link speed
>>> due to thermal reasons. PCIe bandwidth controller is a PCI express bus
>>> port service driver. A cooling device is created for each port the
>>> service driver finds if they support changing speeds.
>> I see we had support for only link speed changes here but we need to add
>> support for link width change also as bandwidth notification from PCIe
>> supports both link speed and link width.
> Hi,
>
> Thanks for the comment. In case you mean that the changes in Link Width
> should be reported correctly, they already are since the sysfs interface
> reads them directly from LNKSTA register.
>
> Or did you perhaps mean that Bandwidth Controller should support also
> changing Link Width? If this is the case I don't know how it can be
> realized so a pointer on how it can be achieved would be appreciated.

Hi,

I didn't have any idea on how thermal framework works.

But as we are adding bandwidth controller support we need to add support for
width change also, may be we are not using this now, but we may need it in the
future.

We had similar use case based on the bandwidth requirement on devices like
WLAN, the client try to reduce or increase the link speed and link width.

So in the bandwidth controller driver we can add support for link width also.
So any client can easily use the driver to change link speed or width or both
to reduce the power consumption.

Adding link width support should be similar to how you added the link speed
supported.

Please correct me if I misunderstood something here.

Thanks & Regards,
Krishna Chaitanya.
+ thermal people.

On Mon, 11 Sep 2023, Krishna Chaitanya Chundru wrote:
> On 9/4/2023 4:46 PM, Ilpo Järvinen wrote:
> > On Mon, 4 Sep 2023, Krishna Chaitanya Chundru wrote:
> > > On 8/17/2023 5:46 PM, Ilpo Järvinen wrote:
> > > >
> > > > This series adds PCIe bandwidth controller (bwctrl) and associated PCIe
> > > > cooling driver to the thermal core side for limiting PCIe link speed
> > > > due to thermal reasons. PCIe bandwidth controller is a PCI express bus
> > > > port service driver. A cooling device is created for each port the
> > > > service driver finds if they support changing speeds.
> > > I see we had support for only link speed changes here but we need to add
> > > support for link width change also as bandwidth notification from PCIe
> > > supports both link speed and link width.
> > Hi,
> >
> > Thanks for the comment. In case you mean that the changes in Link Width
> > should be reported correctly, they already are since the sysfs interface
> > reads them directly from LNKSTA register.
> >
> > Or did you perhaps mean that Bandwidth Controller should support also
> > changing Link Width? If this is the case I don't know how it can be
> > realized so a pointer on how it can be achieved would be appreciated.
>
> I didn't have any idea on how thermal framework works.
>
> But as we are adding bandwidth controller support we need to add support for
> width change also, may be we are not using this now, but we may need it in the
> future.
>
> We had similar use case based on the bandwidth requirement on devices like
> WLAN, the client try to reduce or increase the link speed and link width.
>
> So in the bandwidth controller driver we can add support for link width also.
> So any client can easily use the driver to change link speed or width or both
> to reduce the power consumption.
>
> Adding link width support should be similar to how you added the link speed
> supported.
>
> Please correct me if I misunderstood something here.

Hi,

Okay, thanks for the clarification. So the point is to plan for adding
support for Link Width later and currently only support throttling Link
Speed. In any case, the Link Width control seems to be controlled using
a different approach (Link Width change does not require Link
Retraining).

I don't know either how such 2 dimensioned throttling (Link Speed and
Link Width) is supposed to be realized using the thermal/cooling device
interface which only provides a single integer as the current state. That
is, whether to provide a single cooling device (with a single integer
exposed to userspace) or separate cooling device for each dimension?

Perhaps thermal people could provide some insight on this? Is there some
precedent I could take a look at?
On Mon, 2023-09-11 at 18:47 +0300, Ilpo Järvinen wrote:
> + thermal people.
>
> ...
> Hi,
>
> Okay, thanks for the clarification. So the point is to plan for adding
> support for Link Width later and currently only support throttling Link
> Speed. In any case, the Link Width control seems to be controlled using
> a different approach (Link Width change does not require Link
> Retraining).
>
> I don't know either how such 2 dimensioned throttling (Link Speed and
> Link Width) is supposed to be realized using the thermal/cooling device
> interface which only provides a single integer as the current state. That
> is, whether to provide a single cooling device (with a single integer
> exposed to userspace) or separate cooling device for each dimension?
>
> Perhaps thermal people could provide some insight on this? Is there some
> precedent I could take a look at?

Yes. The processor cooling device does something similar: states 1-3 are
reserved for P-states and 4-7 for T-states.

But I don't suggest using such a method. It causes confusion and is
difficult to change. For example, if we increase the range of P-state
control, then there is no way to know what the start point of the
T-states is.

It is best to create two separate cooling devices for BW and link width.

Also there is a requirement that anything you add to thermal sysfs
should have some purpose for thermal control. I hope Link width control
is targeted at a similar use case as BW control.

Thanks,
Srinivas
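The objection to packing two controls into one cooling state can be made concrete with a small sketch. This is not the processor cooling driver's code; struct combined_layout, enum throttle_kind and combined_decode() are hypothetical names, and the layout (state 0 means no throttling, then the speed steps, then the width steps) is an assumption chosen to mirror the 1-3 / 4-7 split described above.

```c
/* Hypothetical layout: how many steps each dimension contributes. */
struct combined_layout {
	unsigned int speed_steps;
	unsigned int width_steps;
};

enum throttle_kind {
	THROTTLE_NONE,
	THROTTLE_SPEED,
	THROTTLE_WIDTH,
	THROTTLE_INVALID,
};

/*
 * Decode a single cooling state integer into a dimension and a step
 * within that dimension. State 0 is "no throttling", states
 * 1..speed_steps select a speed step, and the following width_steps
 * states select a width step.
 */
enum throttle_kind combined_decode(const struct combined_layout *layout,
				   unsigned long state, unsigned int *step)
{
	if (state == 0) {
		*step = 0;
		return THROTTLE_NONE;
	}
	if (state <= layout->speed_steps) {
		*step = (unsigned int)state;
		return THROTTLE_SPEED;
	}
	if (state <= layout->speed_steps + layout->width_steps) {
		*step = (unsigned int)(state - layout->speed_steps);
		return THROTTLE_WIDTH;
	}
	*step = 0;
	return THROTTLE_INVALID;
}
```

With 3 speed steps, state 4 means "width step 1". If the speed range later grows to 5 steps, the same state 4 silently becomes "speed step 4", and userspace that only sees the integer cannot tell where the boundary moved. Two separate cooling devices avoid that ambiguity because each integer has exactly one meaning.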
On Mon, 11 Sep 2023, srinivas pandruvada wrote:
> On Mon, 2023-09-11 at 18:47 +0300, Ilpo Järvinen wrote:
> >
> > Okay, thanks for the clarification. So the point is to plan for adding
> > support for Link Width later and currently only support throttling Link
> > Speed. In any case, the Link Width control seems to be controlled using
> > a different approach (Link Width change does not require Link
> > Retraining).
> >
> > I don't know either how such 2 dimensioned throttling (Link Speed and
> > Link Width) is supposed to be realized using the thermal/cooling device
> > interface which only provides a single integer as the current state. That
> > is, whether to provide a single cooling device (with a single integer
> > exposed to userspace) or separate cooling device for each dimension?
> >
> > Perhaps thermal people could provide some insight on this? Is there some
> > precedent I could take a look at?
>
> Yes. The processor cooling device does similar. 1-3 are reserved for P-
> state and 4-7 for T-states.
>
> But I don't suggest using such method. This causes confusion and
> difficult to change. For example if we increase range of P-state
> control, then there is no way to know what is the start point of T-
> states.

Yes. I understand it would be confusing.

> It is best to create two separate cooling devices for BW and link width.

Okay. If that's the case, then I see no reason to add the Link Width
cooling device now as it could do nothing besides reporting the current
link width.

The only question that then remains is how to take this into account in
the naming of the cooling devices, currently PCIe_Port_<pci_name()> is
used but perhaps it would be better to change that to
PCIe_Port_Link_Speed_... to allow PCIe_Port_Link_Width_... to be added
later beside it?

> Also there is a requirement that anything you add to thermal sysfs, it
> should have some purpose for thermal control. I hope Link width control
> is targeted to similar use case BW control.

Ability to control Link Width seems to be part of PCIe 6.0 L0p. AFAICT,
the reasons are to lower/control power consumption so it seems to be
within scope.
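The naming question above can be illustrated with a short sketch, assuming the PCIe_Port_Link_<dimension>_<pci_name> pattern proposed in this message. The helper pcie_cdev_name() is hypothetical and is not taken from the series; it only shows how a per-dimension prefix leaves room for a width device beside the speed device of the same port.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: build a cooling device name for one throttling
 * dimension ("Speed" or "Width") of the port identified by pci_name,
 * e.g. "0000:00:1c.0". Returns the snprintf() result, so a return value
 * that is negative or >= len signals an error or truncation.
 */
int pcie_cdev_name(char *buf, size_t len, const char *dimension,
		   const char *pci_name)
{
	return snprintf(buf, len, "PCIe_Port_Link_%s_%s", dimension, pci_name);
}
```

For a port named 0000:00:1c.0 this yields PCIe_Port_Link_Speed_0000:00:1c.0 now, and a later width device for the same port would get PCIe_Port_Link_Width_0000:00:1c.0 without renaming the existing one.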
On Tue, 2023-09-12 at 15:52 +0300, Ilpo Järvinen wrote:
> On Mon, 11 Sep 2023, srinivas pandruvada wrote:
> > On Mon, 2023-09-11 at 18:47 +0300, Ilpo Järvinen wrote:
> > >
> > [...]
> > But I don't suggest using such method. This causes confusion and
> > difficult to change. For example if we increase range of P-state
> > control, then there is no way to know what is the start point of T-
> > states.
>
> Yes. I understand it would be confusing.
>
> > It is best to create two separate cooling devices for BW and link
> > width.
>
> Okay. If that's the case, then I see no reason to add the Link Width
> cooling device now as it could do nothing besides reporting the current
> link width.
>
> The only question that then remains is how to take this into account in
> the naming of the cooling devices, currently PCIe_Port_<pci_name()> is
> used but perhaps it would be better to change that to
> PCIe_Port_Link_Speed_... to allow PCIe_Port_Link_Width_... to be added
> later beside it?

It is better in that way to add BW controller later.

Also adding separate cooling device will let thermal configuration,
choose different method at different thermal thresholds or all together.

Thanks,
Srinivas

> > Also there is a requirement that anything you add to thermal sysfs, it
> > should have some purpose for thermal control. I hope Link width control
> > is targeted to similar use case BW control.
>
> Ability to control Link Width seems to be part of PCIe 6.0 L0p. AFAICT,
> the reasons are to lower/control power consumption so it seems to be
> within scope.
On Tue, 2023-09-12 at 10:45 -0700, srinivas pandruvada wrote:
> On Tue, 2023-09-12 at 15:52 +0300, Ilpo Järvinen wrote:
> > On Mon, 11 Sep 2023, srinivas pandruvada wrote:
> > > On Mon, 2023-09-11 at 18:47 +0300, Ilpo Järvinen wrote:
> > > >
> > > [...]
> > > But I don't suggest using such method. This causes confusion and
> > > difficult to change. For example if we increase range of P-state
> > > control, then there is no way to know what is the start point of
> > > T-states.
> >
> > Yes. I understand it would be confusing.
> >
> > > It is best to create two separate cooling devices for BW and link
> > > width.
> >
> > Okay. If that's the case, then I see no reason to add the Link Width
> > cooling device now as it could do nothing besides reporting the
> > current link width.
> >
> > The only question that then remains is how to take this into account
> > in the naming of the cooling devices, currently PCIe_Port_<pci_name()>
> > is used but perhaps it would be better to change that to
> > PCIe_Port_Link_Speed_... to allow PCIe_Port_Link_Width_... to be
> > added later beside it?
> It is better in that way to add BW
sorry, link width controller
> controller later.
>
> Also adding separate cooling device will let thermal configuration,
> choose different method at different thermal thresholds or all
> together.
>
> Thanks,
> Srinivas
>
> > > Also there is a requirement that anything you add to thermal sysfs,
> > > it should have some purpose for thermal control. I hope Link width
> > > control is targeted to similar use case BW control.
> >
> > Ability to control Link Width seems to be part of PCIe 6.0 L0p.
> > AFAICT, the reasons are to lower/control power consumption so it
> > seems to be within scope.