[RFC,0/2] GENL interface for ACPI _DSM methods

Message ID 20250106163045.508959-1-wathsala.vithanage@arm.com (mailing list archive)

Message

Wathsala Wathawana Vithanage Jan. 6, 2025, 4:30 p.m. UTC
Linux v6.13-rc1 added support for PCIe TPH and direct cache injection.
As already described in the patch set[1] that introduced this feature,
the cache injection in supported hardware allows optimal utilization of
platform resources for specific requests on the PCIe bus. However, the
patch set [1] implements the functionality only for use within the kernel.
Certain user space applications, especially those whose performance
is sensitive to the latency of inbound writes as seen by a CPU core, may
also benefit from using this information (e.g., the DPDK cache stashing
feature discussed in RFC [2]). This RFC is an attempt to obtain the PCIe
steering tag information from the kernel to be used by user mode
applications. We understand that there is more than one way to provide
this information. Please review and suggest alternatives if necessary.

The first of the two patches introduced in this RFC attempts to overcome
the kernel-only limitation by providing an API to kernel subsystems to
hook up relevant _DSM methods to a GENL interface. User space
applications can invoke a _DSM hooked up to this interface via the
"acpi-event" GENL family socket, provided they have the minimum
capabilities and use the message formats demanded by the kernel subsystem
that hooked up the _DSM method. This feature is added by extending the
"acpi-event" GENL family, which multicasts ACPI events to user-space
applications such as acpid.
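
A user-space client of such an interface would first have to resolve the
family's numeric ID through the generic netlink controller. The following
is a minimal sketch of building that lookup request with only the UAPI
headers. The function name build_getfamily_req is hypothetical, and the
family name is passed in by the caller because the exact string the kernel
registers may differ from the one written above. The _DSM command and
attributes added by this RFC are not shown, since their layout is defined
by the patches themselves.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/genetlink.h>

/*
 * Build a CTRL_CMD_GETFAMILY request asking the generic netlink
 * controller for the numeric ID of the family called `name`.
 * `buf` must be suitably aligned for struct nlmsghdr.
 * Returns the total message length, or 0 if the name is too long
 * or `buf` is too small.
 */
size_t build_getfamily_req(void *buf, size_t size, const char *name,
			   unsigned int seq)
{
	struct nlmsghdr *nlh = buf;
	struct genlmsghdr *gnlh;
	struct nlattr *nla;
	size_t name_len, attr_len, total;

	assert(buf != NULL && name != NULL);

	name_len = strlen(name) + 1;		/* include the NUL */
	attr_len = NLA_HDRLEN + name_len;
	total = NLMSG_HDRLEN + GENL_HDRLEN + NLA_ALIGN(attr_len);
	if (name_len > GENL_NAMSIZ || total > size)
		return 0;

	memset(buf, 0, total);

	/* Netlink header: addressed to the genl controller family. */
	nlh->nlmsg_len = total;
	nlh->nlmsg_type = GENL_ID_CTRL;
	nlh->nlmsg_flags = NLM_F_REQUEST;
	nlh->nlmsg_seq = seq;

	/* Generic netlink header: "get family" command. */
	gnlh = (struct genlmsghdr *)((char *)buf + NLMSG_HDRLEN);
	gnlh->cmd = CTRL_CMD_GETFAMILY;
	gnlh->version = 1;

	/* Single attribute carrying the family name. */
	nla = (struct nlattr *)((char *)gnlh + GENL_HDRLEN);
	nla->nla_type = CTRL_ATTR_FAMILY_NAME;
	nla->nla_len = attr_len;
	memcpy((char *)nla + NLA_HDRLEN, name, name_len);

	return total;
}
```

The resulting buffer would be sent on a socket(AF_NETLINK, SOCK_RAW,
NETLINK_GENERIC) socket, and CTRL_ATTR_FAMILY_ID would be parsed from the
reply before any family-specific command is issued.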

The second patch of this RFC hooks up the PCIe root-port TLP Processing
Hints (TPH) _DSM to the ACPI GENL interface. User space applications
like [2] can now request the kernel to execute the _DSM on their behalf
and return steering-tag information.
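
To illustrate what "steering-tag information" could look like to a
consumer, here is a sketch of decoding the volatile-memory half of the
64-bit value the TPH _DSM returns. The bit positions follow the st_info
layout in drivers/pci/tph.c as recalled from v6.13 and should be treated
as an assumption to be checked against the ECN. The struct and function
names (tph_vm_st, tph_decode_vm, tph_pick_tag) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Volatile-memory fields of the 64-bit _DSM return value (assumed layout). */
struct tph_vm_st {
	bool st_valid;		/* bit 0: 8-bit ST is valid */
	bool xst_valid;		/* bit 1: 16-bit extended ST is valid */
	bool ph_ignore;		/* bit 2: processing hint is ignored */
	uint8_t st;		/* bits 8..15: 8-bit steering tag */
	uint16_t xst;		/* bits 16..31: 16-bit extended steering tag */
};

/* Split the raw value into its volatile-memory fields. */
struct tph_vm_st tph_decode_vm(uint64_t raw)
{
	struct tph_vm_st vm;

	vm.st_valid = raw & 0x1;
	vm.xst_valid = (raw >> 1) & 0x1;
	vm.ph_ignore = (raw >> 2) & 0x1;
	vm.st = (raw >> 8) & 0xff;
	vm.xst = (raw >> 16) & 0xffff;
	return vm;
}

/*
 * Pick the 8-bit or 16-bit tag, depending on what the device supports.
 * Returns 0 and stores the tag, or -1 if firmware marked it invalid.
 */
int tph_pick_tag(const struct tph_vm_st *vm, bool want_ext, uint16_t *tag)
{
	assert(vm != NULL && tag != NULL);

	if (want_ext) {
		if (!vm->xst_valid)
			return -1;
		*tag = vm->xst;
		return 0;
	}
	if (!vm->st_valid)
		return -1;
	*tag = vm->st;
	return 0;
}
```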

[1] lore.kernel.org/linux-pci/20241002165954.128085-1-wei.huang2@amd.com
[2] inbox.dpdk.org/dev/20241021015246.304431-2-wathsala.vithanage@arm.com

Wathsala Vithanage (2):
  ACPI: Add support for invoking select _DSM methods from user space
  PCI: Add generic netlink interface to TPH _DSM

 drivers/acpi/Makefile                 |   3 +-
 drivers/acpi/{event.c => acpi_genl.c} | 110 ++++++++++++++++++++++-
 drivers/acpi/acpi_genl_dsm.c          |  76 ++++++++++++++++
 drivers/pci/tph.c                     | 121 ++++++++++++++++++++++++++
 include/acpi/acpi_genl.h              |  54 ++++++++++++
 include/linux/acpi.h                  |   1 +
 6 files changed, 360 insertions(+), 5 deletions(-)
 rename drivers/acpi/{event.c => acpi_genl.c} (63%)
 create mode 100644 drivers/acpi/acpi_genl_dsm.c
 create mode 100644 include/acpi/acpi_genl.h

Comments

Jonathan Cameron Jan. 6, 2025, 6:01 p.m. UTC | #1
On Mon, 6 Jan 2025 16:30:43 +0000
Wathsala Vithanage <wathsala.vithanage@arm.com> wrote:

> Linux v6.13-rc1 added support for PCIe TPH and direct cache injection.
> As already described in the patch set[1] that introduced this feature,
> the cache injection in supported hardware allows optimal utilization of
> platform resources for specific requests on the PCIe bus. However, the
> patch set [1] implements the functionality for usage within the kernel.
> But certain user space applications, especially those whose performance
> is sensitive to the latency of inbound writes as seen by a CPU core, may
> benefit from using this information (E.g., the DPDK cache stashing
> feature discussed in RFC [2]). This RFC is an attempt to obtain the PCIe
> steering tag information from the kernel to be used by user mode
> applications. We understand that there is more than one way to provide
> this information. Please review and suggest alternatives if necessary.
> 
> The first of the two patches introduced in this RFC attempts to overcome
> the kernel-only limitation by providing an API to kernel subsystems to
> hook up relevant _DSM methods to a GENL interface. User space
> applications can invoke a _DSM hooked up to this interface via the
> "acpi-event" GENL family socket, granted they have the minimum
> capabilities and message formats demanded by the kernel subsystem that
> hooked up the _DSM method. This feature is added by extending the
> "acpi-event" GENL family that multicasts ACPI events to the user-space
> applications such as acpid.
> 
> The second patch of this RFC hooks up the PCIe root-port TLP Processing
> Hints (TPH) _DSM to the ACPI GENL interface. User space applications
> like [2] can now request the kernel to execute the _DSM on their behalf
> and return steering-tag information.
> 
> [1] lore.kernel.org/linux-pci/20241002165954.128085-1-wei.huang2@amd.com
> [2] inbox.dpdk.org/dev/20241021015246.304431-2-wathsala.vithanage@arm.com

Hi Wathsala,

Superficially this feels like another potential interface that could be wrapped
up under appropriate fwctl. Jason, what do you think?

Mind you I'm not personally convinced that an interface that focuses on
exposing _DSM calls to userspace makes sense as opposed to subsystem specific
stuff.

Maybe consider associating the actual interface with the individual PCI functions
(which provides the first chunk of the message directly).

Also, _DSM is just one form of firmware interface used on PCI-supporting
systems. Tying the userspace interface to that feels unwise.  I can certainly
foresee a PSCI/SCMI or similar interface for this on ARM platforms
wrapped up in _DSM where ACPI is present but directly accessed when DT
is in use.

I'd also request that you break out what goes in ARG0,1,2 as that is all
stuff that the kernel is aware of and not all reviewers have access to the
ECN (I do though).  In particular the fact there are ACPI UIDs may
need a more generic solution.

Jonathan

Jeremy Linton Jan. 7, 2025, 5:37 p.m. UTC | #2
Hi,

On 1/6/25 12:01 PM, Jonathan Cameron wrote:
> On Mon, 6 Jan 2025 16:30:43 +0000
> Wathsala Vithanage <wathsala.vithanage@arm.com> wrote:
> 
>> Linux v6.13-rc1 added support for PCIe TPH and direct cache injection.
>> As already described in the patch set[1] that introduced this feature,
>> the cache injection in supported hardware allows optimal utilization of
>> platform resources for specific requests on the PCIe bus. However, the
>> patch set [1] implements the functionality for usage within the kernel.
>> But certain user space applications, especially those whose performance
>> is sensitive to the latency of inbound writes as seen by a CPU core, may
>> benefit from using this information (E.g., the DPDK cache stashing
>> feature discussed in RFC [2]). This RFC is an attempt to obtain the PCIe
>> steering tag information from the kernel to be used by user mode
>> applications. We understand that there is more than one way to provide
>> this information. Please review and suggest alternatives if necessary.
>>
>> The first of the two patches introduced in this RFC attempts to overcome
>> the kernel-only limitation by providing an API to kernel subsystems to
>> hook up relevant _DSM methods to a GENL interface. User space
>> applications can invoke a _DSM hooked up to this interface via the
>> "acpi-event" GENL family socket, granted they have the minimum
>> capabilities and message formats demanded by the kernel subsystem that
>> hooked up the _DSM method. This feature is added by extending the
>> "acpi-event" GENL family that multicasts ACPI events to the user-space
>> applications such as acpid.
>>
>> The second patch of this RFC hooks up the PCIe root-port TLP Processing
>> Hints (TPH) _DSM to the ACPI GENL interface. User space applications
>> like [2] can now request the kernel to execute the _DSM on their behalf
>> and return steering-tag information.
>>
>> [1] lore.kernel.org/linux-pci/20241002165954.128085-1-wei.huang2@amd.com
>> [2] inbox.dpdk.org/dev/20241021015246.304431-2-wathsala.vithanage@arm.com
> 
> Hi Wathsala,
> 
> Superficially this feels like another potential interface that could be wrapped
> up under appropriate fwctl. Jason, what do you think?
> 
> Mind you I'm not personally convinced that an interface that focuses on
> exposing _DSM calls to userspace makes sense as opposed to subsystem specific
> stuff.
> 
> Maybe consider associating the actual interface with the individual PCI functions
> (which provides the first chunk of the message directly).

Right,

I think this was similar to a conversation we had internally, which was 
basically to detect the PCIe extended capability and export a 'steering' 
entry in sysfs on each PCIe device which can take a logical cpu/cache 
value, translate those on write to the ACPI cpu/cache id's, make the 
firmware call, then directly update the PCIe device's capability with 
the result. This also leaves the door open for future 
cpu/cache->steering tag translation methods to transparently replace the 
_DSM call while leaving the userspace API the same.
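
The flow described above can be modelled in a few lines of plain C. This
is a user-space mock, not kernel code: the device struct, the CPU-to-UID
table, and every function name are hypothetical stand-ins. The firmware
query is a function pointer so that the translation backend (the _DSM
today) can be swapped without changing the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_ST_ENTRIES	8

/* Stand-in for a PCIe function with a steering tag table. */
struct fake_pci_dev {
	uint16_t st_table[NR_ST_ENTRIES];
};

/* Backend that turns an ACPI processor UID into a steering tag. */
typedef int (*st_query_fn)(uint32_t acpi_uid, uint16_t *tag);

/* Made-up logical CPU to ACPI UID mapping, for illustration only. */
static const uint32_t cpu_to_acpi_uid[] = { 0x100, 0x101, 0x102, 0x103 };

/*
 * Model of a write to the proposed 'steering' entry: translate the
 * logical CPU, ask the backend for a tag, then update the table entry.
 * Returns 0 on success, -1 on a bad index, bad CPU, or backend failure.
 */
int steering_store(struct fake_pci_dev *dev, unsigned int idx,
		   unsigned int cpu, st_query_fn query)
{
	uint16_t tag;

	assert(dev != NULL && query != NULL);

	if (idx >= NR_ST_ENTRIES)
		return -1;
	if (cpu >= sizeof(cpu_to_acpi_uid) / sizeof(cpu_to_acpi_uid[0]))
		return -1;
	if (query(cpu_to_acpi_uid[cpu], &tag) != 0)
		return -1;

	dev->st_table[idx] = tag;
	return 0;
}

/* Fake firmware backend: derives a tag from the low byte of the UID. */
int fake_dsm_query(uint32_t acpi_uid, uint16_t *tag)
{
	*tag = acpi_uid & 0xff;
	return 0;
}
```

Note that the model writes the table entry unconditionally; it does not
track whether a driver already owns that entry.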


> 
> Also, _DSM is just one form of firmware interface used on PCI-supporting
> systems. Tying the userspace interface to that feels unwise.  I can certainly
> foresee a PSCI/SCMI or similar interface for this on ARM platforms
> wrapped up in _DSM where ACPI is present but directly accessed when DT
> is in use.
> 
> I'd also request that you break out what goes in ARG0,1,2 as that is all
> stuff that the kernel is aware of and not all reviewers have access to the
> ECN (I do though).  In particular the fact there are ACPI UIDs may
> need a more generic solution.
> 
> Jonathan
Jason Gunthorpe Jan. 7, 2025, 5:48 p.m. UTC | #3
On Tue, Jan 07, 2025 at 11:37:01AM -0600, Jeremy Linton wrote:
> Hi,
> 
> On 1/6/25 12:01 PM, Jonathan Cameron wrote:
> > On Mon, 6 Jan 2025 16:30:43 +0000
> > Wathsala Vithanage <wathsala.vithanage@arm.com> wrote:
> > 
> > > Linux v6.13-rc1 added support for PCIe TPH and direct cache injection.
> > > As already described in the patch set[1] that introduced this feature,
> > > the cache injection in supported hardware allows optimal utilization of
> > > platform resources for specific requests on the PCIe bus. However, the
> > > patch set [1] implements the functionality for usage within the kernel.
> > > But certain user space applications, especially those whose performance
> > > is sensitive to the latency of inbound writes as seen by a CPU core, may
> > > benefit from using this information (E.g., the DPDK cache stashing
> > > feature discussed in RFC [2]). 

There is no way for userspace to program TPH information into a PCI
device without going through a kernel driver, and the kernel driver
must be the exclusive owner of the steering tag configuration or chaos
would ensue. Having a way for sysfs to override this seems very wrong
to me, and I think you should not go in this direction.

DPDK runs on VFIO or RDMA. It would be natural to have a VFIO-native API
to manipulate the steering tags, and we are already discussing what
RDMA support for steering tags would look like.

> > Superficially this feels like another potential interface that could be wrapped
> > up under appropriate fwctl. Jason, what do you think?

As above, I think this very squarely belongs under the appropriate
subsystems that are providing the kernel drivers for the device. There
is no reasonable way to share steering tags with unrelated userspace
through any mechanism. Basically it fails the independence test of
fwctl.

> I think this was similar to a conversation we had internally, which was
> basically to detect the PCIe extended capability and export a 'steering'
> entry in sysfs on each PCIe device which can take a logical cpu/cache value,
> translate those on write to the ACPI cpu/cache id's, make the firmware call,
> then directly update the PCIe device's capability with the result. 

Seems wrong, driver must do this. If the driver was already using that
entry for something else you've just wrecked it.

Jason
Jeremy Linton Jan. 8, 2025, 7:59 p.m. UTC | #4
Hi,

On 1/7/25 11:48 AM, Jason Gunthorpe wrote:
> On Tue, Jan 07, 2025 at 11:37:01AM -0600, Jeremy Linton wrote:
>> Hi,
>>
>> On 1/6/25 12:01 PM, Jonathan Cameron wrote:
>>> On Mon, 6 Jan 2025 16:30:43 +0000
>>> Wathsala Vithanage <wathsala.vithanage@arm.com> wrote:
>>>
>>>> Linux v6.13-rc1 added support for PCIe TPH and direct cache injection.
>>>> As already described in the patch set[1] that introduced this feature,
>>>> the cache injection in supported hardware allows optimal utilization of
>>>> platform resources for specific requests on the PCIe bus. However, the
>>>> patch set [1] implements the functionality for usage within the kernel.
>>>> But certain user space applications, especially those whose performance
>>>> is sensitive to the latency of inbound writes as seen by a CPU core, may
>>>> benefit from using this information (E.g., the DPDK cache stashing
>>>> feature discussed in RFC [2]).
> 
> There is no way for userspace to program TPH information into a PCI
> device without going through a kernel driver, and the kernel driver
> must be the exclusive owner of the steering tag configuration or chaos
> would ensue. Having a way for sysfs to override this seems very wrong
> to me, and I think you should not go in this direction.
> 
> DPDK runs on VFIO or RDMA. It would be natural to have a VFIO-native API
> to manipulate the steering tags, and we are already discussing what
> RDMA support for steering tags would look like.
> 
>>> Superficially this feels like another potential interface that could be wrapped
>>> up under appropriate fwctl. Jason, what do you think?
> 
> As above, I think this very squarely belongs under the appropriate
> subsystems that are providing the kernel drivers for the device. There
> is no reasonable way to share steering tags with unrelated userspace
> through any mechanism. Basically it fails the independence test of
> fwctl.
> 
>> I think this was similar to a conversation we had internally, which was
>> basically to detect the PCIe extended capability and export a 'steering'
>> entry in sysfs on each PCIe device which can take a logical cpu/cache value,
>> translate those on write to the ACPI cpu/cache id's, make the firmware call,
>> then directly update the PCIe device's capability with the result.
> 
> Seems wrong, driver must do this. If the driver was already using that
> entry for something else you've just wrecked it.

Can you clarify what you mean by 'wrecked'? AFAIK a valid, if poorly
chosen, steering tag is only going to result in sub-optimal performance.

I'm under the impression this is a similar problem to cpu/irq/numa 
affinity where the driver/subsystem should be making the choice, but the 
user is provided the opportunity to override the defaults if they think 
there is benefit in their environment. Again AFAIK, the whole 
OS/software stashing is already well down the 'I know better than the HW 
where to store this data' rabbit hole.


Thanks,
Jason Gunthorpe Jan. 8, 2025, 8:50 p.m. UTC | #5
On Wed, Jan 08, 2025 at 01:59:35PM -0600, Jeremy Linton wrote:

> I'm under the impression this is a similar problem to cpu/irq/numa affinity
> where the driver/subsystem should be making the choice, but the user is
> provided the opportunity to override the defaults if they think there is
> benefit in their environment. 

Which I think has been proven to have been a mistake. Instead of
overriding irq affinity through /proc/irq under the covers of the driver
and hoping for the best, the driver itself should have the opportunity
to set the affinity for its objects directly.

Let us not repeat this mistake with steering tags. The driver should
always be involved in this stuff; if you want it to work with DPDK
then go through the kernel driver that DPDK is running on top of (VFIO
or RDMA).

Jason
Wathsala Wathawana Vithanage Jan. 9, 2025, 12:34 a.m. UTC | #6
> 
> On Wed, Jan 08, 2025 at 01:59:35PM -0600, Jeremy Linton wrote:
> 
> > I'm under the impression this is a similar problem to cpu/irq/numa
> > affinity where the driver/subsystem should be making the choice, but
> > the user is provided the opportunity to override the defaults if they
> > think there is benefit in their environment.
> 
> Which I think has been proven to have been a mistake. Instead of overriding irq
> affinity through /proc/irq under the covers of the driver and hoping for the best, the
> driver itself should have the opportunity to set the affinity for its objects directly.
> 

Do you mean that the driver should handle affinity requests from the user directly
as per its policy?

> Let us not repeat this mistake with steering tags. The driver should always be
> involved in this stuff; if you want it to work with DPDK then go through the kernel
> driver that DPDK is running on top of (VFIO or RDMA).
> 

This RFC is only about acquiring the steering tag from the ACPI _DSM, which the DPDK
user space driver will then set in the queue context of the device it manages.
The steering tag itself is set by the DPDK device driver.
Are you suggesting that I should instead pass a CPU and a cache ID to VFIO and let VFIO
decide what's right for the application?


--wathsala