[RESEND,0/4] Add support for HiSilicon PCIe Tune and Trace device

Message ID 1618654631-42454-1-git-send-email-yangyicong@hisilicon.com (mailing list archive)

Message

Yicong Yang April 17, 2021, 10:17 a.m. UTC
[RESEND with perf and coresight folks Cc'ed]

HiSilicon PCIe tune and trace device (PTT) is a PCIe Root Complex
integrated Endpoint (RCiEP) device, providing the capability
to dynamically monitor and tune the PCIe traffic (tune),
and trace the TLP headers (trace).

PTT tune is designed for monitoring and adjusting PCIe link parameters.
We provide several parameters of the PCIe link. Through the driver,
users can adjust the value of a given parameter to affect the PCIe link,
with the aim of improving performance in certain situations.

PTT trace is designed for dumping the TLP headers to memory, where they
can be used to analyze the transactions and usage of the PCIe
link. Users can filter the traced headers either by requester
ID or by a set of Root Ports on the same core as the PTT device
(tracing everything downstream of them). Tracing only headers of a
certain type or direction is also supported.

We use debugfs to expose the interface. For tune, each parameter is a
debugfs file, and the user can get/set its value by reading/writing the
file. For trace, there are several control files for the user to
configure trace parameters such as filters, TLP type and format,
the desired trace data size and so on. There is one data file for
dumping the traced data to the user. The traced data may be hundreds
of megabytes, which sysfs cannot support. We chose debugfs rather
than a character device because we don't want to require additional
userspace tools. Operating through debugfs is easier and a bit
like ftrace.

We do not use perf because there is currently no support
for uncore tracing in the perf facilities. We have our own data
format and don't need perf to do the parsing. Configuring the device
through the perf tools doesn't seem to be user-friendly either. For
example, we cannot count on perf to decode a BDF number in the usual
format, <domain>:<bus>:<dev>.<fn>, which users can use to filter the
TLP headers through the PTT device.

A similar approach to implementing this function is ETM, which uses
sysfs for configuration and a character device for dumping data.

Greg had some comments on our implementation and doesn't advocate
building the driver on debugfs [1]. So I am resending this series to
collect more feedback on the implementation of this driver.

Hi perf and ETM experts, would you suggest adapting this driver
to perf? Or is the debugfs approach acceptable? Or should we use
sysfs + a character device like ETM and use the perf tools for decoding?
Any comments are welcome.

[1] https://lore.kernel.org/linux-pci/1617713154-35533-1-git-send-email-yangyicong@hisilicon.com/

Yicong Yang (4):
  hwtracing: Add trace function support for HiSilicon PCIe Tune and
    Trace device
  hwtracing: Add tune function support for HiSilicon PCIe Tune and Trace
    device
  docs: Add HiSilicon PTT device driver documentation
  MAINTAINERS: Add maintainer for HiSilicon PTT driver

 Documentation/trace/hisi-ptt.rst       |  326 +++++++
 MAINTAINERS                            |    7 +
 drivers/Makefile                       |    1 +
 drivers/hwtracing/Kconfig              |    2 +
 drivers/hwtracing/hisilicon/Kconfig    |   11 +
 drivers/hwtracing/hisilicon/Makefile   |    2 +
 drivers/hwtracing/hisilicon/hisi_ptt.c | 1636 ++++++++++++++++++++++++++++++++
 7 files changed, 1985 insertions(+)
 create mode 100644 Documentation/trace/hisi-ptt.rst
 create mode 100644 drivers/hwtracing/hisilicon/Kconfig
 create mode 100644 drivers/hwtracing/hisilicon/Makefile
 create mode 100644 drivers/hwtracing/hisilicon/hisi_ptt.c

Comments

Alexander Shishkin April 17, 2021, 1:56 p.m. UTC | #1
Yicong Yang <yangyicong@hisilicon.com> writes:

> The reason for not using perf is because there is no current support
> for uncore tracing in the perf facilities.

Not unless you count

$ perf list|grep -ic uncore
77

> We have our own format
> of data and don't need perf doing the parsing.

Perf has AUX buffers, which are used for all kinds of own formats.

> A similar approach for implementing this function is ETM, which use
> sysfs for configuring and a character device for dumping data.

And also perf. One reason ETM has a sysfs interface is because the
driver predates perf's AUX buffers. Can't say if it's the only
reason. I'm assuming you're talking about Coresight ETM.

> Greg has some comments on our implementation and doesn't advocate
> to build driver on debugfs [1]. So I resend this series to
> collect more feedbacks on the implementation of this driver.
>
> Hi perf and ETM related experts, is it suggested to adapt this driver
> to perf? Or is the debugfs approach acceptable? Otherwise use

Aside from the above, I don't think the use of debugfs for kernel ABIs
is ever encouraged.

Regards,
--
Ale
Suzuki K Poulose April 19, 2021, 11:17 a.m. UTC | #2
On 17/04/2021 11:17, Yicong Yang wrote:
> [RESEND with perf and coresight folks Cc'ed]
> 
> HiSilicon PCIe tune and trace device (PTT) is a PCIe Root Complex
> integrated Endpoint (RCiEP) device, providing the capability
> to dynamically monitor and tune the PCIe traffic (tune),
> and trace the TLP headers (trace).
> 
> PTT tune is designed for monitoring and adjusting PCIe link parameters.
> We provide several parameters of the PCIe link. Through the driver,
> user can adjust the value of certain parameter to affect the PCIe link
> for the purpose of enhancing the performance in certain situation.

...

> 
> The reason for not using perf is because there is no current support
> for uncore tracing in the perf facilities. We have our own format
> of data and don't need perf doing the parsing. The setting through
> perf tools doesn't seem to be friendly as well. For example,
> we cannot count on perf to decode the usual format BDF number like
> <domain>:<bus>:<dev>.<fn>, which user can use to filter the TLP
> headers through the PTT device.
> 
> A similar approach for implementing this function is ETM, which use
> sysfs for configuring and a character device for dumping data.
> 
> Greg has some comments on our implementation and doesn't advocate
> to build driver on debugfs [1]. So I resend this series to
> collect more feedbacks on the implementation of this driver.
> 
> Hi perf and ETM related experts, is it suggested to adapt this driver
> to perf? Or is the debugfs approach acceptable? Otherwise use
> sysfs + character device like ETM and use perf tools for decoding it?
> Any comments is welcomed.

Please use perf. Debugfs / sysfs is not the right place for these things.

Also, please move your driver to drivers/perf/

As Alex mentioned, the ETM drivers were initially developed when the AUX
buffer was not available. The sysfs interface is there only for
backward compatibility and for bring-up (due to the nature of the
connections between the CoreSight components and the sometimes missing
engineering spec).

Suzuki
Yicong Yang April 19, 2021, 1:03 p.m. UTC | #3
On 2021/4/17 21:56, Alexander Shishkin wrote:
> Yicong Yang <yangyicong@hisilicon.com> writes:
> 
>> The reason for not using perf is because there is no current support
>> for uncore tracing in the perf facilities.
> 
> Not unless you count
> 
> $ perf list|grep -ic uncore
> 77
> 

These uncore events probably do not support sampling.

I tried on x86:

# ./perf record -e uncore_imc_0/cas_count_read/
Error:
The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (uncore_imc_0/cas_count_read/).
/bin/dmesg | grep -i perf may provide additional information.

For HiSilicon uncore PMUs, we don't support uncore sampling:

'The current driver does not support sampling. So "perf record" is unsupported. ' [1]

and also in another PMU:

'PMU doesn't support process specific events and cannot be used in sampling mode.' [2]

[1] Documentation/admin-guide/perf/hisi-pmu.rst
[2] Documentation/admin-guide/perf/arm_dsu_pmu.rst

>> We have our own format
>> of data and don't need perf doing the parsing.
> 
> Perf has AUX buffers, which are used for all kinds of own formats.
> 

OK. We thought perf would break the data format, but it seems AUX buffers won't.
Do we need to add full support for both tracing and parsing, or is it OK
not to parse the data through perf?

>> A similar approach for implementing this function is ETM, which use
>> sysfs for configuring and a character device for dumping data.
> 
> And also perf. One reason ETM has a sysfs interface is because the
> driver predates perf's AUX buffers. Can't say if it's the only
> reason. I'm assuming you're talking about Coresight ETM.
> 

got it. thanks.

>> Greg has some comments on our implementation and doesn't advocate
>> to build driver on debugfs [1]. So I resend this series to
>> collect more feedbacks on the implementation of this driver.
>>
>> Hi perf and ETM related experts, is it suggested to adapt this driver
>> to perf? Or is the debugfs approach acceptable? Otherwise use
> 
> Aside from the above, I don't think the use of debugfs for kernel ABIs
> is ever encouraged.
> 

ok. thanks for the suggestions.

Regards,
Yicong

> Regards,
> --
> Ale
Yicong Yang April 19, 2021, 1:21 p.m. UTC | #4
On 2021/4/19 19:17, Suzuki K Poulose wrote:
> On 17/04/2021 11:17, Yicong Yang wrote:
>> [RESEND with perf and coresight folks Cc'ed]
>>
>> HiSilicon PCIe tune and trace device (PTT) is a PCIe Root Complex
>> integrated Endpoint (RCiEP) device, providing the capability
>> to dynamically monitor and tune the PCIe traffic (tune),
>> and trace the TLP headers (trace).
>>
>> PTT tune is designed for monitoring and adjusting PCIe link parameters.
>> We provide several parameters of the PCIe link. Through the driver,
>> user can adjust the value of certain parameter to affect the PCIe link
>> for the purpose of enhancing the performance in certain situation.
> 
> ...
> 
>>
>> The reason for not using perf is because there is no current support
>> for uncore tracing in the perf facilities. We have our own format
>> of data and don't need perf doing the parsing. The setting through
>> perf tools doesn't seem to be friendly as well. For example,
>> we cannot count on perf to decode the usual format BDF number like
>> <domain>:<bus>:<dev>.<fn>, which user can use to filter the TLP
>> headers through the PTT device.
>>
>> A similar approach for implementing this function is ETM, which use
>> sysfs for configuring and a character device for dumping data.
>>
>> Greg has some comments on our implementation and doesn't advocate
>> to build driver on debugfs [1]. So I resend this series to
>> collect more feedbacks on the implementation of this driver.
>>
>> Hi perf and ETM related experts, is it suggested to adapt this driver
>> to perf? Or is the debugfs approach acceptable? Otherwise use
>> sysfs + character device like ETM and use perf tools for decoding it?
>> Any comments is welcomed.
> 
> Please use perf. Debugfs / sysfs is not the right place for these things.
> 

ok.

> Also, please move your driver to drivers/perf/
> 

Does that make sense, given that it's a tuning and tracing device that has no
counters and does no sampling like the usual PMU devices under drivers/perf/?

> As Alex mentioned, the ETM drivers were initially developed when the AUX
> buffer was not available. The sysfs interface is there only for the
> backward compatibility and for bring up ( due to the nature of the
> connections between the CoreSight components and sometimes the missing
> engineering spec).
> 

got it. thanks for the explanation.

Regards,
Yicong

> Suzuki
Suzuki K Poulose April 19, 2021, 4:11 p.m. UTC | #5
On 19/04/2021 14:21, Yicong Yang wrote:
> On 2021/4/19 19:17, Suzuki K Poulose wrote:
>> On 17/04/2021 11:17, Yicong Yang wrote:
>>> [RESEND with perf and coresight folks Cc'ed]
>>>
>>> HiSilicon PCIe tune and trace device (PTT) is a PCIe Root Complex
>>> integrated Endpoint (RCiEP) device, providing the capability
>>> to dynamically monitor and tune the PCIe traffic (tune),
>>> and trace the TLP headers (trace).
>>>
>>> PTT tune is designed for monitoring and adjusting PCIe link parameters.
>>> We provide several parameters of the PCIe link. Through the driver,
>>> user can adjust the value of certain parameter to affect the PCIe link
>>> for the purpose of enhancing the performance in certain situation.
>>
>> ...
>>
>>>
>>> The reason for not using perf is because there is no current support
>>> for uncore tracing in the perf facilities. We have our own format
>>> of data and don't need perf doing the parsing. The setting through
>>> perf tools doesn't seem to be friendly as well. For example,
>>> we cannot count on perf to decode the usual format BDF number like
>>> <domain>:<bus>:<dev>.<fn>, which user can use to filter the TLP
>>> headers through the PTT device.
>>>
>>> A similar approach for implementing this function is ETM, which use
>>> sysfs for configuring and a character device for dumping data.
>>>
>>> Greg has some comments on our implementation and doesn't advocate
>>> to build driver on debugfs [1]. So I resend this series to
>>> collect more feedbacks on the implementation of this driver.
>>>
>>> Hi perf and ETM related experts, is it suggested to adapt this driver
>>> to perf? Or is the debugfs approach acceptable? Otherwise use
>>> sysfs + character device like ETM and use perf tools for decoding it?
>>> Any comments is welcomed.
>>
>> Please use perf. Debugfs / sysfs is not the right place for these things.
>>
> 
> ok.
> 
>> Also, please move your driver to drivers/perf/
>>
> 
> Does it make sense as it's a tuning and tracing device, and doesn't have counters
> nor do the sampling like usual PMU device under drivers/perf/.

It doesn't matter. As long as you can drive it via the perf interface,
it can live there. CoreSight was added well before there was a
suitable place like the above. You can find other uncore PMUs
under drivers/perf.

Suzuki
Leo Yan April 22, 2021, 3:49 a.m. UTC | #6
On Mon, Apr 19, 2021 at 09:03:18PM +0800, Yicong Yang wrote:
> On 2021/4/17 21:56, Alexander Shishkin wrote:
> > Yicong Yang <yangyicong@hisilicon.com> writes:
> > 
> >> The reason for not using perf is because there is no current support
> >> for uncore tracing in the perf facilities.
> > 
> > Not unless you count
> > 
> > $ perf list|grep -ic uncore
> > 77
> > 
> 
> these are uncore events probably do not support sampling.
> 
> I tried on x86:
> 
> # ./perf record -e uncore_imc_0/cas_count_read/
> Error:
> The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (uncore_imc_0/cas_count_read/).
> /bin/dmesg | grep -i perf may provide additional information.
> 
> For HiSilicon uncore PMUs, we don't support uncore sampling:
> 
> 'The current driver does not support sampling. So "perf record" is unsupported. ' [1]
> 
> and also in another PMU:
> 
> 'PMU doesn't support process specific events and cannot be used in sampling mode.' [2]
> 
> [1] Documentation/admin-guide/perf/hisi-pmu.rst
> [2] Documentation/admin-guide/perf/arm_dsu_pmu.rst

I did some debugging on this, and yes, the failure is because these
x86 uncore events don't support sampling.

So I can use the commands below for the uncore event
'uncore_imc/data_reads/' in my experiment:

  # perf record -e 'uncore_imc/data_reads/' --no-samples -- ls
  # perf stat -e 'uncore_imc/data_reads/' -- ls

For your case, I think you need to write the pmu::event_init()
callback so that it does not reject tracing even when sampling is
set, just like other perf event drivers that support AUX tracing.
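
To make the counting versus sampling distinction concrete, here is a
minimal userspace sketch, under stated assumptions, of how an event's
attributes differ between the two modes. The helper names
init_counting_attr() and attr_requests_sampling() are hypothetical. The
second helper mirrors the kind of check a counting-only PMU driver makes
in its event_init() callback before rejecting an event, which would
explain the EINVAL (22) seen with a plain "perf record" above.

```c
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>

/*
 * Fill in attributes for a counting-only event (no sampling).
 * pmu_type is the dynamic PMU type, normally read from
 * /sys/bus/event_source/devices/<pmu>/type; config is PMU specific.
 */
static void init_counting_attr(struct perf_event_attr *attr,
			       uint32_t pmu_type, uint64_t config)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = pmu_type;
	attr->config = config;
	attr->sample_period = 0;	/* zero period: counting mode */
	attr->disabled = 1;		/* caller enables the event later */
}

/*
 * Nonzero if the attributes ask for sampling. sample_period and
 * sample_freq share storage, so a frequency request counts as well.
 */
static int attr_requests_sampling(const struct perf_event_attr *attr)
{
	return attr->sample_period != 0;
}
```

An attr prepared this way would then be passed to the perf_event_open()
system call. A driver that supports AUX tracing would accept the event
in event_init() instead of rejecting it on such a check.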

> >> We have our own format
> >> of data and don't need perf doing the parsing.
> > 
> > Perf has AUX buffers, which are used for all kinds of own formats.
> > 
> 
> ok. we thought perf will break the data format but AUX buffers seems won't.
> do we need to add full support for tracing as well as parsing or it's ok for
> not parsing it through perf?

IMHO, this can be divided into two parts.  The first part is to enable
the perf driver with AUX tracing support, so that the perf tool can
capture the trace data.  The second part is to add a decoder to the perf
tool so that developers can *consume* the trace data; for the decoder,
you could refer to this code:

  tools/perf/util/intel-pt-decoder/
  tools/perf/util/cs-etm-decoder/

Or the Arm SPE case:

  tools/perf/util/arm-spe-decoder/

> >> A similar approach for implementing this function is ETM, which use
> >> sysfs for configuring and a character device for dumping data.
> > 
> > And also perf. One reason ETM has a sysfs interface is because the
> > driver predates perf's AUX buffers. Can't say if it's the only
> > reason. I'm assuming you're talking about Coresight ETM.

I am not the best person to give the background on this; Mathieu or Mike
could give more info.  From my understanding, sysfs nodes can be used
as knobs for configuration, but they are difficult to use for profiling.

Think about profiling: if one developer uses sysfs for the settings
and for reading out the trace data, these pieces of information are
scattered.  If another developer wants to review the profiling result,
all of this info needs to be shared together.

So we benefit a lot from using the perf tool, since all the
profiling context is gathered (DSOs, and hardware configuration, which
can be saved into metadata), so the final profiling file can be easily
shared and is friendlier for reviewing.

Thanks,
Leo
Yicong Yang April 22, 2021, 12:54 p.m. UTC | #7
On 2021/4/22 11:49, Leo Yan wrote:
> On Mon, Apr 19, 2021 at 09:03:18PM +0800, Yicong Yang wrote:
>> On 2021/4/17 21:56, Alexander Shishkin wrote:
>>> Yicong Yang <yangyicong@hisilicon.com> writes:
>>>
>>>> The reason for not using perf is because there is no current support
>>>> for uncore tracing in the perf facilities.
>>>
>>> Not unless you count
>>>
>>> $ perf list|grep -ic uncore
>>> 77
>>>
>>
>> these are uncore events probably do not support sampling.
>>
>> I tried on x86:
>>
>> # ./perf record -e uncore_imc_0/cas_count_read/
>> Error:
>> The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (uncore_imc_0/cas_count_read/).
>> /bin/dmesg | grep -i perf may provide additional information.
>>
>> For HiSilicon uncore PMUs, we don't support uncore sampling:
>>
>> 'The current driver does not support sampling. So "perf record" is unsupported. ' [1]
>>
>> and also in another PMU:
>>
>> 'PMU doesn't support process specific events and cannot be used in sampling mode.' [2]
>>
>> [1] Documentation/admin-guide/perf/hisi-pmu.rst
>> [2] Documentation/admin-guide/perf/arm_dsu_pmu.rst
> 
> I did some debugging for this, and yes, it's related with the event
> doesn't support sampling for these x86 uncore events.
> 
> So I can use below commands for the uncore event
> 'uncore_imc/data_reads/' in my experiment:
> 
>   # perf record -e 'uncore_imc/data_reads/' --no-samples -- ls
>   # perf stat -e 'uncore_imc/data_reads/' -- ls
> 
> For your case, I think you need to write the callback
> pmu::event_init(), it should not forbid any tracing even if set
> sampling, just like other perf event drive for support AUX tracing.
> 

Thanks for the hint! I don't know much about perf, so I only did
a basic test. I will investigate this further.

>>>> We have our own format
>>>> of data and don't need perf doing the parsing.
>>>
>>> Perf has AUX buffers, which are used for all kinds of own formats.
>>>
>>
>> ok. we thought perf will break the data format but AUX buffers seems won't.
>> do we need to add full support for tracing as well as parsing or it's ok for
>> not parsing it through perf?
> 
> IMHO, this could divide into two parts.  The first part is to enable
> perf drive with support AUX tracing, and perf tool can capture the trace
> data.  The second part is to add the decoder in the perf tool so that
> the developers can *consume* the trace data; for the decoder, you
> could refer the codes:
> 
>   tools/perf/util/intel-pt-decoder/
>   tools/perf/util/cs-etm-decoder/
> 
> Or Arm SPE case:
> 
>   tools/perf/util/arm-spe-decoder/
> 

I will refer to these implementations to see how to add a decoder for our
trace data. Very detailed guidance!

>>>> A similar approach for implementing this function is ETM, which use
>>>> sysfs for configuring and a character device for dumping data.
>>>
>>> And also perf. One reason ETM has a sysfs interface is because the
>>> driver predates perf's AUX buffers. Can't say if it's the only
>>> reason. I'm assuming you're talking about Coresight ETM.
> 
> I am not the best person to give background for this.  Mathieu or Mike
> could give more info for this.  From my understanding, Sysfs nodes can
> be used as knobs for configuration, but it's difficult for profiling.
> 

As explained by the maintainers, there are historical reasons for
ETM having sysfs interfaces, as there were no perf AUX buffers at the
beginning. I considered the sysfs interface as an option, but the perf
AUX buffer is better, as suggested.

> Let's think about for the profiling, if one developer uses the Sysfs
> for the setting and read out the trace data, these informations are
> discrete.  If another developer wants to review the profiling result,
> then all these info need to be shared together.
> 

OK. Makes sense to me.

> So we can benefit much from the perf tool for the usage, since all the
> profiling context will be gathered (DSOs, hardware configuration which
> can be saved into metadata), so the final profiling file can be easily
> shared and more friendly for reviewing.
> 

OK. It will be beneficial to use perf for both tracing and decoding,
as we'll also get additional information attached to the trace data.

Considering that we have two functions, tracing and tuning: for tracing
we can make use of the perf AUX buffer, but for tuning I still cannot see
how to make use of perf. So perhaps we can make tuning go through sysfs?
Daniel suggested that as well.

I appreciate the suggestions and guidance!

Regards,
Yicong

> Thanks,
> Leo