
[00/10] Add PCIe Bandwidth Controller

Message ID 20230817121708.53213-1-ilpo.jarvinen@linux.intel.com (mailing list archive)

Message

Ilpo Järvinen Aug. 17, 2023, 12:16 p.m. UTC
Hi all,

This series adds PCIe bandwidth controller (bwctrl) and associated PCIe
cooling driver to the thermal core side for limiting PCIe link speed
due to thermal reasons. PCIe bandwidth controller is a PCI express bus
port service driver. A cooling device is created for each port the
service driver finds if they support changing speeds.
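
As a rough illustration of how a cooling device could turn its state into a
link speed limit, here is a minimal sketch. It is not the code from
drivers/thermal/pcie_cooling.c; the speed table, the function names, and the
"one state per step below the fastest speed" mapping are all assumptions made
for this example. It only relies on the thermal convention that state 0 means
no cooling and higher states mean more cooling.

```c
/*
 * Hypothetical table of supported link speeds, slowest first, stored as
 * GT/s multiplied by 10 (2.5 GT/s -> 25). A real port would derive this
 * from its Supported Link Speeds Vector.
 */
static const unsigned int supported_speeds[] = { 25, 50, 80, 160 };

#define NUM_SPEEDS (sizeof(supported_speeds) / sizeof(supported_speeds[0]))

/* Deepest cooling state: one step for each speed below the fastest one. */
static unsigned long cooling_max_state(void)
{
	return NUM_SPEEDS - 1;
}

/*
 * Map a cooling state to a target link speed (assumed mapping).
 * State 0 selects the fastest supported speed (no throttling); each
 * higher state steps one entry down the table. Out-of-range states
 * are clamped to the slowest speed.
 */
static unsigned int cooling_state_to_speed(unsigned long state)
{
	if (state > cooling_max_state())
		state = cooling_max_state();

	return supported_speeds[NUM_SPEEDS - 1 - state];
}
```

With this table, state 0 keeps the link at 16 GT/s and state 3 limits it to
2.5 GT/s. A port that supports only one speed would get a max state of 0,
which is consistent with creating cooling devices only for ports that can
change speeds.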

bwctrl now is built on top of BW notifications revert. I'm just not
sure what is the best practice when re-adding some old functionality in
a modified form so please let me know if I need to somehow alter that
patch.

The series is based on top of the RMW changes in pci/pcie-rmw.

Ilpo Järvinen (10):
  PCI: Protect Link Control 2 Register with RMW locking
  drm/radeon: Use RMW accessors for changing LNKCTL2
  drm/amdgpu: Use RMW accessors for changing LNKCTL2
  drm/IB/hfi1: Use RMW accessors for changing LNKCTL2
  PCI: Store all PCIe Supported Link Speeds
  PCI: Cache PCIe device's Supported Speed Vector
  PCI/LINK: Re-add BW notification portdrv as PCIe BW controller
  PCI/bwctrl: Add "controller" part into PCIe bwctrl
  thermal: Add PCIe cooling driver
  selftests/pcie_bwctrl: Create selftests

 MAINTAINERS                                   |   8 +
 drivers/gpu/drm/amd/amdgpu/cik.c              |  41 +--
 drivers/gpu/drm/amd/amdgpu/si.c               |  41 +--
 drivers/gpu/drm/radeon/cik.c                  |  40 +--
 drivers/gpu/drm/radeon/si.c                   |  40 +--
 drivers/infiniband/hw/hfi1/pcie.c             |  30 +-
 drivers/pci/pcie/Kconfig                      |   9 +
 drivers/pci/pcie/Makefile                     |   1 +
 drivers/pci/pcie/bwctrl.c                     | 309 ++++++++++++++++++
 drivers/pci/pcie/portdrv.c                    |   9 +-
 drivers/pci/pcie/portdrv.h                    |  10 +-
 drivers/pci/probe.c                           |  38 ++-
 drivers/pci/remove.c                          |   2 +
 drivers/thermal/Kconfig                       |  10 +
 drivers/thermal/Makefile                      |   2 +
 drivers/thermal/pcie_cooling.c                | 107 ++++++
 include/linux/pci-bwctrl.h                    |  33 ++
 include/linux/pci.h                           |   3 +
 include/uapi/linux/pci_regs.h                 |   1 +
 tools/testing/selftests/Makefile              |   1 +
 tools/testing/selftests/pcie_bwctrl/Makefile  |   2 +
 .../pcie_bwctrl/set_pcie_cooling_state.sh     | 122 +++++++
 .../selftests/pcie_bwctrl/set_pcie_speed.sh   |  67 ++++
 23 files changed, 795 insertions(+), 131 deletions(-)
 create mode 100644 drivers/pci/pcie/bwctrl.c
 create mode 100644 drivers/thermal/pcie_cooling.c
 create mode 100644 include/linux/pci-bwctrl.h
 create mode 100644 tools/testing/selftests/pcie_bwctrl/Makefile
 create mode 100755 tools/testing/selftests/pcie_bwctrl/set_pcie_cooling_state.sh
 create mode 100755 tools/testing/selftests/pcie_bwctrl/set_pcie_speed.sh

Comments

Krishna Chaitanya Chundru Sept. 4, 2023, 6:26 a.m. UTC | #1
On 8/17/2023 5:46 PM, Ilpo Järvinen wrote:
> Hi all,
>
> This series adds PCIe bandwidth controller (bwctrl) and associated PCIe
> cooling driver to the thermal core side for limiting PCIe link speed
> due to thermal reasons. PCIe bandwidth controller is a PCI express bus
> port service driver. A cooling device is created for each port the
> service driver finds if they support changing speeds.

I see this only supports link speed changes, but we also need to add
support for link width changes, since the bandwidth notification from
PCIe covers both link speed and link width.

- KC

Ilpo Järvinen Sept. 4, 2023, 11:16 a.m. UTC | #2
On Mon, 4 Sep 2023, Krishna Chaitanya Chundru wrote:

> 
> On 8/17/2023 5:46 PM, Ilpo Järvinen wrote:
> > Hi all,
> > 
> > This series adds PCIe bandwidth controller (bwctrl) and associated PCIe
> > cooling driver to the thermal core side for limiting PCIe link speed
> > due to thermal reasons. PCIe bandwidth controller is a PCI express bus
> > port service driver. A cooling device is created for each port the
> > service driver finds if they support changing speeds.
> 
> I see we had support for only link speed changes here but we need to add
> support for
> 
> link width change also as bandwidth notification from PCIe supports both link
> speed and link width.

Hi,

Thanks for the comment. In case you mean that the changes in Link Width 
should be reported correctly, they already are since the sysfs interface 
reads them directly from LNKSTA register.

Or did you perhaps mean that the Bandwidth Controller should also support
changing Link Width? If so, I don't know how that can be realized, so a
pointer on how it can be achieved would be appreciated.

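For readers unfamiliar with the register mentioned above, the sketch below
shows how the current speed and negotiated width can be extracted from a raw
Link Status (LNKSTA) value. The mask values match PCI_EXP_LNKSTA_CLS and
PCI_EXP_LNKSTA_NLW in include/uapi/linux/pci_regs.h; the helper names are
made up for this example and are not kernel functions.

```c
#include <stdint.h>

/* Link Status register fields, as laid out in the PCIe specification. */
#define LNKSTA_CLS		0x000f	/* Current Link Speed, bits 3:0 */
#define LNKSTA_NLW		0x03f0	/* Negotiated Link Width, bits 9:4 */
#define LNKSTA_NLW_SHIFT	4

/*
 * Return the encoded Current Link Speed (1 = 2.5 GT/s, 2 = 5 GT/s,
 * 3 = 8 GT/s, 4 = 16 GT/s, ...).
 */
static unsigned int lnksta_speed_code(uint16_t lnksta)
{
	return lnksta & LNKSTA_CLS;
}

/* Return the Negotiated Link Width as a lane count (x1, x4, x16, ...). */
static unsigned int lnksta_width(uint16_t lnksta)
{
	return (lnksta & LNKSTA_NLW) >> LNKSTA_NLW_SHIFT;
}
```

For example, a raw value of 0x1043 decodes to speed code 3 (8 GT/s) and a
width of x4. Because both fields come from the same register read, a width
change is visible without any extra bookkeeping in the driver.
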
Krishna Chaitanya Chundru Sept. 11, 2023, 1:21 p.m. UTC | #3
On 9/4/2023 4:46 PM, Ilpo Järvinen wrote:
> On Mon, 4 Sep 2023, Krishna Chaitanya Chundru wrote:
>
>> On 8/17/2023 5:46 PM, Ilpo Järvinen wrote:
>>> Hi all,
>>>
>>> This series adds PCIe bandwidth controller (bwctrl) and associated PCIe
>>> cooling driver to the thermal core side for limiting PCIe link speed
>>> due to thermal reasons. PCIe bandwidth controller is a PCI express bus
>>> port service driver. A cooling device is created for each port the
>>> service driver finds if they support changing speeds.
>> I see we had support for only link speed changes here but we need to add
>> support for
>>
>> link width change also as bandwidth notification from PCIe supports both link
>> speed and link width.
> Hi,
>
> Thanks for the comment. In case you mean that the changes in Link Width
> should be reported correctly, they already are since the sysfs interface
> reads them directly from LNKSTA register.
>
> Or did you perhaps mean that Bandwidth Controller should support also
> changing Link Width? If this is the case I don't know how it can be
> realized so a pointer on how it can be achieved would be appreciated.

Hi,

I don't know how the thermal framework works.

But since we are adding bandwidth controller support, we need to add
support for width changes as well. Maybe we are not using this now, but
we may need it in the future.

We have a similar use case: based on the bandwidth requirement of devices
like WLAN, the client tries to reduce or increase the link speed and link
width.

So we can add link width support to the bandwidth controller driver as
well. Then any client can easily use the driver to change link speed,
width, or both to reduce power consumption.

Adding link width support should be similar to how you added link speed
support.

Please correct me if I misunderstood something here.

Thanks & Regards,

Krishna Chaitanya.
Ilpo Järvinen Sept. 11, 2023, 3:47 p.m. UTC | #4
+ thermal people.

On Mon, 11 Sep 2023, Krishna Chaitanya Chundru wrote:
> On 9/4/2023 4:46 PM, Ilpo Järvinen wrote:
> > On Mon, 4 Sep 2023, Krishna Chaitanya Chundru wrote:
> > > On 8/17/2023 5:46 PM, Ilpo Järvinen wrote:
> > > > 
> > > > This series adds PCIe bandwidth controller (bwctrl) and associated PCIe
> > > > cooling driver to the thermal core side for limiting PCIe link speed
> > > > due to thermal reasons. PCIe bandwidth controller is a PCI express bus
> > > > port service driver. A cooling device is created for each port the
> > > > service driver finds if they support changing speeds.
> > > I see we had support for only link speed changes here but we need to add
> > > support for
> > > 
> > > link width change also as bandwidth notification from PCIe supports both
> > > link
> > > speed and link width.
> > Hi,
> > 
> > Thanks for the comment. In case you mean that the changes in Link Width
> > should be reported correctly, they already are since the sysfs interface
> > reads them directly from LNKSTA register.
> > 
> > Or did you perhaps mean that Bandwidth Controller should support also
> > changing Link Width? If this is the case I don't know how it can be
> > realized so a pointer on how it can be achieved would be appreciated.
> 
> I didn't have any idea on how thermal framework works.
> 
> But as we are adding bandwidth controller support we need to add support for
> width change also, may be we are not using this now, but we may need it in the
> future.
> 
> We had similar use case based on the bandwidth requirement on devices like
> WLAN, the client try to reduce or increase the link speed and link width.
> 
> So in the bandwidth controller driver we can add support for link width also.
> So any client can easily use the driver to change link speed or width or both
> to reduce the power consumption.
> 
> Adding link width support should be similar to how you added the link speed
> supported.
> 
> Please correct me if I misunderstood something here.

Hi,

Okay, thanks for the clarification. So the point is to plan for adding
support for Link Width later and to support only Link Speed throttling
for now. In any case, Link Width seems to be controlled using a
different approach (a Link Width change does not require Link
Retraining).

I also don't know how such two-dimensional throttling (Link Speed and
Link Width) is supposed to be realized using the thermal/cooling device
interface, which only provides a single integer as the current state.
That is, should there be a single cooling device (with a single integer
exposed to userspace) or a separate cooling device for each dimension?

Perhaps the thermal people could provide some insight on this? Is there
some precedent I could take a look at?

Srinivas Pandruvada Sept. 11, 2023, 4:14 p.m. UTC | #5
On Mon, 2023-09-11 at 18:47 +0300, Ilpo Järvinen wrote:
> + thermal people.
> 
> 

...

> Hi,
> 
> Okay, thanks for the clarification. So the point is to plan for
> adding 
> support for Link Width later and currently only support throttling
> Link 
> Speed. In any case, the Link Width control seems to be controlled
> using 
> a different approach (Link Width change does not require Link
> Retraining).
> 
> I don't know either how such 2 dimensioned throttling (Link Speed and
> Link Width) is supposed to be realized using the thermal/cooling
> device 
> interface which only provides a single integer as the current state.
> That 
> is, whether to provide a single cooling device (with a single integer
> exposed to userspace) or separate cooling device for each dimension?
> 
> Perhaps thermal people could provide some insight on this? Is there
> some 
> precedent I could take look at?

Yes. The processor cooling device does something similar: states 1-3 are
reserved for P-states and 4-7 for T-states.

But I don't suggest using that method. It causes confusion and is
difficult to change. For example, if we increase the range of P-state
control, then there is no way to know the starting point of the
T-states.

It is best to create two separate cooling devices for BW and link width.
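
The problem with packing two mechanisms into one state integer can be shown
with a small sketch. The 1-3 / 4-7 split comes from the message above;
everything else (the type names, the decode function, and passing the
boundary as a parameter) is hypothetical and is not how the processor
cooling device is actually implemented.

```c
enum throttle_kind { THROTTLE_NONE, THROTTLE_PSTATE, THROTTLE_TSTATE };

struct throttle {
	enum throttle_kind kind;
	unsigned int level;	/* 1-based level within its own mechanism */
};

/*
 * Decode a combined cooling state: 0 means no throttling, states
 * 1..num_pstates select a P-state level, and anything above that
 * selects a T-state level.
 */
static struct throttle decode_combined(unsigned int state,
				       unsigned int num_pstates)
{
	struct throttle t = { THROTTLE_NONE, 0 };

	if (state == 0)
		return t;

	if (state <= num_pstates) {
		t.kind = THROTTLE_PSTATE;
		t.level = state;
	} else {
		t.kind = THROTTLE_TSTATE;
		t.level = state - num_pstates;
	}
	return t;
}
```

With three P-state levels, state 4 decodes to the first T-state level. If
the P-state range later grows to five levels, the same state 4 silently
becomes a P-state level, and a consumer that only sees the integer cannot
tell where the boundary moved. Two separate cooling devices avoid this
because each one owns its whole state range.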

Also, there is a requirement that anything you add to thermal sysfs
should have some purpose for thermal control. I hope Link Width control
is targeted at a use case similar to BW control.

Thanks,
Srinivas


Ilpo Järvinen Sept. 12, 2023, 12:52 p.m. UTC | #6
On Mon, 11 Sep 2023, srinivas pandruvada wrote:
> On Mon, 2023-09-11 at 18:47 +0300, Ilpo Järvinen wrote:
> > 
> > Okay, thanks for the clarification. So the point is to plan for
> > adding 
> > support for Link Width later and currently only support throttling
> > Link 
> > Speed. In any case, the Link Width control seems to be controlled
> > using 
> > a different approach (Link Width change does not require Link
> > Retraining).
> > 
> > I don't know either how such 2 dimensioned throttling (Link Speed and
> > Link Width) is supposed to be realized using the thermal/cooling
> > device 
> > interface which only provides a single integer as the current state.
> > That 
> > is, whether to provide a single cooling device (with a single integer
> > exposed to userspace) or separate cooling device for each dimension?
> > 
> > Perhaps thermal people could provide some insight on this? Is there
> > some 
> > precedent I could take look at?
>
> Yes. The processor cooling device does similar. 1-3 are reserved for P-
> state and and 4-7 for T-states.
> 
> But I don't suggest using such method. This causes confusion and
> difficult to change. For example if we increase range of P-state
> control, then there is no way to know what is the start point of T-
> states.

Yes. I understand it would be confusing.

> It is best to create to separate cooling devices for BW and link width.

Okay. If that's the case, then I see no reason to add the Link Width 
cooling device now as it could do nothing besides reporting the current 
link width.

The only question that then remains is how to take this into account in 
the naming of the cooling devices, currently PCIe_Port_<pci_name()> is 
used but perhaps it would be better to change that to 
PCIe_Port_Link_Speed_... to allow PCI_Port_Link_Width_... to be added 
later beside it?

> Also there is a requirement that anything you add to thermal sysfs, it
> should have some purpose for thermal control. I hope Link width control
> is targeted to similar use case BW control.

The ability to control Link Width seems to be part of PCIe 6.0 L0p.
AFAICT, the reasons are to lower/control power consumption, so it seems
to be within scope.

Srinivas Pandruvada Sept. 12, 2023, 5:45 p.m. UTC | #7
On Tue, 2023-09-12 at 15:52 +0300, Ilpo Järvinen wrote:
> On Mon, 11 Sep 2023, srinivas pandruvada wrote:
> > On Mon, 2023-09-11 at 18:47 +0300, Ilpo Järvinen wrote:
> > 
> > 

[...]

> > But I don't suggest using such method. This causes confusion and
> > difficult to change. For example if we increase range of P-state
> > control, then there is no way to know what is the start point of T-
> > states.
> 
> Yes. I understand it would be confusing.
> 
> > It is best to create to separate cooling devices for BW and link
> > width.
> 
> Okay. If that's the case, then I see no reason to add the Link Width 
> cooling device now as it could do nothing besides reporting the
> current 
> link width.
> 
> The only question that then remains is how to take this into account
> in 
> the naming of the cooling devices, currently PCIe_Port_<pci_name()>
> is 
> used but perhaps it would be better to change that to 
> PCIe_Port_Link_Speed_... to allow PCI_Port_Link_Width_... to be added
> later beside it?

That way is better for adding a BW controller later.

Also, adding a separate cooling device lets the thermal configuration
choose a different method at different thermal thresholds, or use all of
them together.

Thanks,
Srinivas

Srinivas Pandruvada Sept. 12, 2023, 6:08 p.m. UTC | #8
On Tue, 2023-09-12 at 10:45 -0700, srinivas pandruvada wrote:
> On Tue, 2023-09-12 at 15:52 +0300, Ilpo Järvinen wrote:
> > On Mon, 11 Sep 2023, srinivas pandruvada wrote:
> > > On Mon, 2023-09-11 at 18:47 +0300, Ilpo Järvinen wrote:
> > > 
> > > 
> 
> [...]
> 
> > > But I don't suggest using such method. This causes confusion and
> > > difficult to change. For example if we increase range of P-state
> > > control, then there is no way to know what is the start point of
> > > T-
> > > states.
> > 
> > Yes. I understand it would be confusing.
> > 
> > > It is best to create to separate cooling devices for BW and link
> > > width.
> > 
> > Okay. If that's the case, then I see no reason to add the Link
> > Width 
> > cooling device now as it could do nothing besides reporting the
> > current 
> > link width.
> > 
> > The only question that then remains is how to take this into
> > account
> > in 
> > the naming of the cooling devices, currently PCIe_Port_<pci_name()>
> > is 
> > used but perhaps it would be better to change that to 
> > PCIe_Port_Link_Speed_... to allow PCI_Port_Link_Width_... to be
> > added
> > later beside it?
> It is better in that way to add BW 

Sorry, I meant the link width controller.

> controller later.