mbox series

[0/4] riscv: Allow userspace to directly access perf counters

Message ID 20230413161725.195417-1-alexghiti@rivosinc.com (mailing list archive)
Headers show
Series riscv: Allow userspace to directly access perf counters | expand

Message

Alexandre Ghiti April 13, 2023, 4:17 p.m. UTC
riscv used to allow direct access to cycle/time/instret counters,
bypassing the perf framework, this patchset intends to allow the user to
mmap any counter when accessed through perf. But we can't break the
existing behaviour so we introduce a sysctl perf_user_access like arm64
does, which defaults to the legacy mode described above.

The core of this patchset lies in patch 4, the first 3 patches are
simple fixes.

base-commit-tag: v6.3-rc1

Alexandre Ghiti (4):
  perf: Fix wrong comment about default event_idx
  include: riscv: Fix wrong include guard in riscv_pmu.h
  riscv: Make legacy counter enum match the HW numbering
  riscv: Enable perf counters user access only through perf

 Documentation/admin-guide/sysctl/kernel.rst |  23 +++-
 arch/riscv/include/asm/perf_event.h         |   3 +
 arch/riscv/kernel/Makefile                  |   2 +-
 arch/riscv/kernel/perf_event.c              |  65 +++++++++++
 drivers/perf/riscv_pmu.c                    |  42 ++++++++
 drivers/perf/riscv_pmu_legacy.c             |  24 ++++-
 drivers/perf/riscv_pmu_sbi.c                | 113 ++++++++++++++++++--
 include/linux/perf/riscv_pmu.h              |   9 +-
 include/linux/perf_event.h                  |   3 +-
 tools/lib/perf/mmap.c                       |  65 +++++++++++
 10 files changed, 332 insertions(+), 17 deletions(-)
 create mode 100644 arch/riscv/kernel/perf_event.c

Comments

Ian Rogers April 13, 2023, 4:36 p.m. UTC | #1
On Thu, Apr 13, 2023 at 9:17 AM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
>
> riscv used to allow direct access to cycle/time/instret counters,
> bypassing the perf framework, this patchset intends to allow the user to
> mmap any counter when accessed through perf. But we can't break the
> existing behaviour so we introduce a sysctl perf_user_access like arm64
> does, which defaults to the legacy mode described above.
>
> The core of this patchset lies in patch 4, the first 3 patches are
> simple fixes.
>
> base-commit-tag: v6.3-rc1
>
> Alexandre Ghiti (4):
>   perf: Fix wrong comment about default event_idx
>   include: riscv: Fix wrong include guard in riscv_pmu.h
>   riscv: Make legacy counter enum match the HW numbering
>   riscv: Enable perf counters user access only through perf

Presumably the test also needs patching:
https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/tree/tools/perf/tests/mmap-basic.c?h=perf-tools-next#n287

Thanks,
Ian


>  Documentation/admin-guide/sysctl/kernel.rst |  23 +++-
>  arch/riscv/include/asm/perf_event.h         |   3 +
>  arch/riscv/kernel/Makefile                  |   2 +-
>  arch/riscv/kernel/perf_event.c              |  65 +++++++++++
>  drivers/perf/riscv_pmu.c                    |  42 ++++++++
>  drivers/perf/riscv_pmu_legacy.c             |  24 ++++-
>  drivers/perf/riscv_pmu_sbi.c                | 113 ++++++++++++++++++--
>  include/linux/perf/riscv_pmu.h              |   9 +-
>  include/linux/perf_event.h                  |   3 +-
>  tools/lib/perf/mmap.c                       |  65 +++++++++++
>  10 files changed, 332 insertions(+), 17 deletions(-)
>  create mode 100644 arch/riscv/kernel/perf_event.c
>
> --
> 2.37.2
>
Atish Patra April 13, 2023, 7:17 p.m. UTC | #2
On Thu, Apr 13, 2023 at 9:47 PM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
>
> riscv used to allow direct access to cycle/time/instret counters,
> bypassing the perf framework, this patchset intends to allow the user to
> mmap any counter when accessed through perf. But we can't break the
> existing behaviour so we introduce a sysctl perf_user_access like arm64
> does, which defaults to the legacy mode described above.
>

It would be good provide additional direction for user space packages:

The legacy behavior is supported for now in order to avoid breaking
existing software.
However, reading counters directly without perf interaction may
provide incorrect values which
the userspace software must avoid. We are hoping that the user space
packages which
read the cycle/instret directly, will move to the proper interface
eventually if they actually need it.
Most of the users are supposed to read "time" instead of "cycle" if
they intend to read timestamps.

The legacy sysctl option will be removed in the future. The plan is
that the distros will
set the default option to SYSCTL_USER_ACCESS which enables user
counters only through perf
sooner (as soon as they make sure the packages built for that distro
don't read cycle/instret) directly.

> The core of this patchset lies in patch 4, the first 3 patches are
> simple fixes.
>
> base-commit-tag: v6.3-rc1
>
> Alexandre Ghiti (4):
>   perf: Fix wrong comment about default event_idx
>   include: riscv: Fix wrong include guard in riscv_pmu.h
>   riscv: Make legacy counter enum match the HW numbering
>   riscv: Enable perf counters user access only through perf
>
>  Documentation/admin-guide/sysctl/kernel.rst |  23 +++-
>  arch/riscv/include/asm/perf_event.h         |   3 +
>  arch/riscv/kernel/Makefile                  |   2 +-
>  arch/riscv/kernel/perf_event.c              |  65 +++++++++++
>  drivers/perf/riscv_pmu.c                    |  42 ++++++++
>  drivers/perf/riscv_pmu_legacy.c             |  24 ++++-
>  drivers/perf/riscv_pmu_sbi.c                | 113 ++++++++++++++++++--
>  include/linux/perf/riscv_pmu.h              |   9 +-
>  include/linux/perf_event.h                  |   3 +-
>  tools/lib/perf/mmap.c                       |  65 +++++++++++
>  10 files changed, 332 insertions(+), 17 deletions(-)
>  create mode 100644 arch/riscv/kernel/perf_event.c
>
> --
> 2.37.2
>
David Laight April 13, 2023, 9:10 p.m. UTC | #3
From: Atish Patra
> Sent: 13 April 2023 20:18
> 
> On Thu, Apr 13, 2023 at 9:47 PM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
> >
> > riscv used to allow direct access to cycle/time/instret counters,
> > bypassing the perf framework, this patchset intends to allow the user to
> > mmap any counter when accessed through perf. But we can't break the
> > existing behaviour so we introduce a sysctl perf_user_access like arm64
> > does, which defaults to the legacy mode described above.
> >
> 
> It would be good provide additional direction for user space packages:
> 
> The legacy behavior is supported for now in order to avoid breaking
> existing software.
> However, reading counters directly without perf interaction may
> provide incorrect values which
> the userspace software must avoid. We are hoping that the user space
> packages which
> read the cycle/instret directly, will move to the proper interface
> eventually if they actually need it.
> Most of the users are supposed to read "time" instead of "cycle" if
> they intend to read timestamps.

If you are trying to measure the performance of short code
fragments then you need pretty much raw access directly to
the cycle/clock count register.

I've done this on x86 to compare the actual cycle times
of different implementations of the IP checksum loop
(and compare them to the theoretical limit).
The perf framework just added far too much latency,
only directly reading the cpu registers gave anything
like reliable (and consistent) answers.

Clearly process switches (especially cpu migrations) cause
problems, but they are obviously invalid values and can
be ignored.

So while a lot of uses may be 'happy' with the values the
perf framework gives, sometimes you do need to directly
read the relevant registers.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Atish Patra April 18, 2023, 4:43 p.m. UTC | #4
On Fri, Apr 14, 2023 at 2:40 AM David Laight <David.Laight@aculab.com> wrote:
>
> From: Atish Patra
> > Sent: 13 April 2023 20:18
> >
> > On Thu, Apr 13, 2023 at 9:47 PM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
> > >
> > > riscv used to allow direct access to cycle/time/instret counters,
> > > bypassing the perf framework, this patchset intends to allow the user to
> > > mmap any counter when accessed through perf. But we can't break the
> > > existing behaviour so we introduce a sysctl perf_user_access like arm64
> > > does, which defaults to the legacy mode described above.
> > >
> >
> > It would be good provide additional direction for user space packages:
> >
> > The legacy behavior is supported for now in order to avoid breaking
> > existing software.
> > However, reading counters directly without perf interaction may
> > provide incorrect values which
> > the userspace software must avoid. We are hoping that the user space
> > packages which
> > read the cycle/instret directly, will move to the proper interface
> > eventually if they actually need it.
> > Most of the users are supposed to read "time" instead of "cycle" if
> > they intend to read timestamps.
>
> If you are trying to measure the performance of short code
> fragments then you need pretty much raw access directly to
> the cycle/clock count register.
>
> I've done this on x86 to compare the actual cycle times
> of different implementations of the IP checksum loop
> (and compare them to the theoretical limit).
> The perf framework just added far too much latency,
> only directly reading the cpu registers gave anything
> like reliable (and consistent) answers.
>

This series allows direct access to the counters once configured
through the perf.
Earlier the cycle/instret counters are directly exposed to the
userspace without kernel/perf frameworking knowing
when/which user space application is reading it. That has security implications.

With this series applied, the user space application just needs to
configure the event (cycle/instret) through perf syscall.
Once configured, the userspace application can find out the counter
information from the mmap & directly
read the counter. There is no latency while reading the counters.

This mechanism allows stop/clear the counters when the requesting task
is not running. It also takes care of context switching
which may result in invalid values as you mentioned below. This is
nothing new and all other arch (x86, ARM64) allow user space
counter read through the same mechanism.

Here is the relevant upstream discussion:
https://lore.kernel.org/lkml/Y7wLa7I2hlz3rKw%2F@hirez.programming.kicks-ass.net/T/

ARM64:
https://docs.kernel.org/arm64/perf.html?highlight=perf_user_access#perf-userspace-pmu-hardware-counter-access

example usage in x86:
https://github.com/andikleen/pmu-tools/blob/master/jevents/rdpmc.c

> Clearly process switches (especially cpu migrations) cause
> problems, but they are obviously invalid values and can
> be ignored.
>
> So while a lot of uses may be 'happy' with the values the
> perf framework gives, sometimes you do need to directly
> read the relevant registers.
>
>         David
>
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)
Ian Rogers April 18, 2023, 6:15 p.m. UTC | #5
On Tue, Apr 18, 2023 at 9:43 AM Atish Patra <atishp@atishpatra.org> wrote:
>
> On Fri, Apr 14, 2023 at 2:40 AM David Laight <David.Laight@aculab.com> wrote:
> >
> > From: Atish Patra
> > > Sent: 13 April 2023 20:18
> > >
> > > On Thu, Apr 13, 2023 at 9:47 PM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
> > > >
> > > > riscv used to allow direct access to cycle/time/instret counters,
> > > > bypassing the perf framework, this patchset intends to allow the user to
> > > > mmap any counter when accessed through perf. But we can't break the
> > > > existing behaviour so we introduce a sysctl perf_user_access like arm64
> > > > does, which defaults to the legacy mode described above.
> > > >
> > >
> > > It would be good provide additional direction for user space packages:
> > >
> > > The legacy behavior is supported for now in order to avoid breaking
> > > existing software.
> > > However, reading counters directly without perf interaction may
> > > provide incorrect values which
> > > the userspace software must avoid. We are hoping that the user space
> > > packages which
> > > read the cycle/instret directly, will move to the proper interface
> > > eventually if they actually need it.
> > > Most of the users are supposed to read "time" instead of "cycle" if
> > > they intend to read timestamps.
> >
> > If you are trying to measure the performance of short code
> > fragments then you need pretty much raw access directly to
> > the cycle/clock count register.
> >
> > I've done this on x86 to compare the actual cycle times
> > of different implementations of the IP checksum loop
> > (and compare them to the theoretical limit).
> > The perf framework just added far too much latency,
> > only directly reading the cpu registers gave anything
> > like reliable (and consistent) answers.
> >
>
> This series allows direct access to the counters once configured
> through the perf.
> Earlier the cycle/instret counters are directly exposed to the
> userspace without kernel/perf frameworking knowing
> when/which user space application is reading it. That has security implications.
>
> With this series applied, the user space application just needs to
> configure the event (cycle/instret) through perf syscall.
> Once configured, the userspace application can find out the counter
> information from the mmap & directly
> read the counter. There is no latency while reading the counters.
>
> This mechanism allows stop/clear the counters when the requesting task
> is not running. It also takes care of context switching
> which may result in invalid values as you mentioned below. This is
> nothing new and all other arch (x86, ARM64) allow user space
> counter read through the same mechanism.
>
> Here is the relevant upstream discussion:
> https://lore.kernel.org/lkml/Y7wLa7I2hlz3rKw%2F@hirez.programming.kicks-ass.net/T/
>
> ARM64:
> https://docs.kernel.org/arm64/perf.html?highlight=perf_user_access#perf-userspace-pmu-hardware-counter-access
>
> example usage in x86:
> https://github.com/andikleen/pmu-tools/blob/master/jevents/rdpmc.c

The canonical implementation of this should be:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/lib/perf/mmap.c#n400
which is updated in these patches but the tests are not:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/tests/mmap-basic.c#n287
Which appears to be an oversight. The tests display some differences
between x86 and aarch64 that have assumed userspace hardware counter
access, and everything else that it is assumed don't.

Thanks,
Ian

> > Clearly process switches (especially cpu migrations) cause
> > problems, but they are obviously invalid values and can
> > be ignored.
> >
> > So while a lot of uses may be 'happy' with the values the
> > perf framework gives, sometimes you do need to directly
> > read the relevant registers.
> >
> >         David
> >
> > -
> > Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> > Registration No: 1397386 (Wales)
>
>
>
> --
> Regards,
> Atish
Atish Patra April 18, 2023, 8:30 p.m. UTC | #6
On Tue, Apr 18, 2023 at 11:46 PM Ian Rogers <irogers@google.com> wrote:
>
> On Tue, Apr 18, 2023 at 9:43 AM Atish Patra <atishp@atishpatra.org> wrote:
> >
> > On Fri, Apr 14, 2023 at 2:40 AM David Laight <David.Laight@aculab.com> wrote:
> > >
> > > From: Atish Patra
> > > > Sent: 13 April 2023 20:18
> > > >
> > > > On Thu, Apr 13, 2023 at 9:47 PM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
> > > > >
> > > > > riscv used to allow direct access to cycle/time/instret counters,
> > > > > bypassing the perf framework, this patchset intends to allow the user to
> > > > > mmap any counter when accessed through perf. But we can't break the
> > > > > existing behaviour so we introduce a sysctl perf_user_access like arm64
> > > > > does, which defaults to the legacy mode described above.
> > > > >
> > > >
> > > > It would be good provide additional direction for user space packages:
> > > >
> > > > The legacy behavior is supported for now in order to avoid breaking
> > > > existing software.
> > > > However, reading counters directly without perf interaction may
> > > > provide incorrect values which
> > > > the userspace software must avoid. We are hoping that the user space
> > > > packages which
> > > > read the cycle/instret directly, will move to the proper interface
> > > > eventually if they actually need it.
> > > > Most of the users are supposed to read "time" instead of "cycle" if
> > > > they intend to read timestamps.
> > >
> > > If you are trying to measure the performance of short code
> > > fragments then you need pretty much raw access directly to
> > > the cycle/clock count register.
> > >
> > > I've done this on x86 to compare the actual cycle times
> > > of different implementations of the IP checksum loop
> > > (and compare them to the theoretical limit).
> > > The perf framework just added far too much latency,
> > > only directly reading the cpu registers gave anything
> > > like reliable (and consistent) answers.
> > >
> >
> > This series allows direct access to the counters once configured
> > through the perf.
> > Earlier the cycle/instret counters are directly exposed to the
> > userspace without kernel/perf frameworking knowing
> > when/which user space application is reading it. That has security implications.
> >
> > With this series applied, the user space application just needs to
> > configure the event (cycle/instret) through perf syscall.
> > Once configured, the userspace application can find out the counter
> > information from the mmap & directly
> > read the counter. There is no latency while reading the counters.
> >
> > This mechanism allows stop/clear the counters when the requesting task
> > is not running. It also takes care of context switching
> > which may result in invalid values as you mentioned below. This is
> > nothing new and all other arch (x86, ARM64) allow user space
> > counter read through the same mechanism.
> >
> > Here is the relevant upstream discussion:
> > https://lore.kernel.org/lkml/Y7wLa7I2hlz3rKw%2F@hirez.programming.kicks-ass.net/T/
> >
> > ARM64:
> > https://docs.kernel.org/arm64/perf.html?highlight=perf_user_access#perf-userspace-pmu-hardware-counter-access
> >
> > example usage in x86:
> > https://github.com/andikleen/pmu-tools/blob/master/jevents/rdpmc.c
>
> The canonical implementation of this should be:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/lib/perf/mmap.c#n400

Thanks for sharing the libperf implementation.

> which is updated in these patches but the tests are not:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/tests/mmap-basic.c#n287
> Which appears to be an oversight. The tests display some differences

Yes. It's an oversight. We should make sure that perf mmap tests pass
for RISC-V as well.


> between x86 and aarch64 that have assumed userspace hardware counter
> access, and everything else that it is assumed don't.
>
> Thanks,
> Ian
>
> > > Clearly process switches (especially cpu migrations) cause
> > > problems, but they are obviously invalid values and can
> > > be ignored.
> > >
> > > So while a lot of uses may be 'happy' with the values the
> > > perf framework gives, sometimes you do need to directly
> > > read the relevant registers.
> > >
> > >         David
> > >
> > > -
> > > Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> > > Registration No: 1397386 (Wales)
> >
> >
> >
> > --
> > Regards,
> > Atish



--
Regards,
Atish
Alexandre Ghiti April 19, 2023, 9:21 a.m. UTC | #7
Hi Ian,

On Tue, Apr 18, 2023 at 10:30 PM Atish Patra <atishp@atishpatra.org> wrote:
>
> On Tue, Apr 18, 2023 at 11:46 PM Ian Rogers <irogers@google.com> wrote:
> >
> > On Tue, Apr 18, 2023 at 9:43 AM Atish Patra <atishp@atishpatra.org> wrote:
> > >
> > > On Fri, Apr 14, 2023 at 2:40 AM David Laight <David.Laight@aculab.com> wrote:
> > > >
> > > > From: Atish Patra
> > > > > Sent: 13 April 2023 20:18
> > > > >
> > > > > On Thu, Apr 13, 2023 at 9:47 PM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
> > > > > >
> > > > > > riscv used to allow direct access to cycle/time/instret counters,
> > > > > > bypassing the perf framework, this patchset intends to allow the user to
> > > > > > mmap any counter when accessed through perf. But we can't break the
> > > > > > existing behaviour so we introduce a sysctl perf_user_access like arm64
> > > > > > does, which defaults to the legacy mode described above.
> > > > > >
> > > > >
> > > > > It would be good provide additional direction for user space packages:
> > > > >
> > > > > The legacy behavior is supported for now in order to avoid breaking
> > > > > existing software.
> > > > > However, reading counters directly without perf interaction may
> > > > > provide incorrect values which
> > > > > the userspace software must avoid. We are hoping that the user space
> > > > > packages which
> > > > > read the cycle/instret directly, will move to the proper interface
> > > > > eventually if they actually need it.
> > > > > Most of the users are supposed to read "time" instead of "cycle" if
> > > > > they intend to read timestamps.
> > > >
> > > > If you are trying to measure the performance of short code
> > > > fragments then you need pretty much raw access directly to
> > > > the cycle/clock count register.
> > > >
> > > > I've done this on x86 to compare the actual cycle times
> > > > of different implementations of the IP checksum loop
> > > > (and compare them to the theoretical limit).
> > > > The perf framework just added far too much latency,
> > > > only directly reading the cpu registers gave anything
> > > > like reliable (and consistent) answers.
> > > >
> > >
> > > This series allows direct access to the counters once configured
> > > through the perf.
> > > Earlier the cycle/instret counters are directly exposed to the
> > > userspace without kernel/perf frameworking knowing
> > > when/which user space application is reading it. That has security implications.
> > >
> > > With this series applied, the user space application just needs to
> > > configure the event (cycle/instret) through perf syscall.
> > > Once configured, the userspace application can find out the counter
> > > information from the mmap & directly
> > > read the counter. There is no latency while reading the counters.
> > >
> > > This mechanism allows stop/clear the counters when the requesting task
> > > is not running. It also takes care of context switching
> > > which may result in invalid values as you mentioned below. This is
> > > nothing new and all other arch (x86, ARM64) allow user space
> > > counter read through the same mechanism.
> > >
> > > Here is the relevant upstream discussion:
> > > https://lore.kernel.org/lkml/Y7wLa7I2hlz3rKw%2F@hirez.programming.kicks-ass.net/T/
> > >
> > > ARM64:
> > > https://docs.kernel.org/arm64/perf.html?highlight=perf_user_access#perf-userspace-pmu-hardware-counter-access
> > >
> > > example usage in x86:
> > > https://github.com/andikleen/pmu-tools/blob/master/jevents/rdpmc.c
> >
> > The canonical implementation of this should be:
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/lib/perf/mmap.c#n400
>
> Thanks for sharing the libperf implementation.
>
> > which is updated in these patches but the tests are not:
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/tests/mmap-basic.c#n287
> > Which appears to be an oversight. The tests display some differences
>
> Yes. It's an oversight. We should make sure that perf mmap tests pass
> for RISC-V as well.

Yes, that's an oversight, I had a local test adapted from this one but
forgot to update it afterwards, I'll do that in the next version.

Thanks for your quick feedbacks and sorry for being late,

Alex


>
>
> > between x86 and aarch64 that have assumed userspace hardware counter
> > access, and everything else that it is assumed don't.
> >
> > Thanks,
> > Ian
> >
> > > > Clearly process switches (especially cpu migrations) cause
> > > > problems, but they are obviously invalid values and can
> > > > be ignored.
> > > >
> > > > So while a lot of uses may be 'happy' with the values the
> > > > perf framework gives, sometimes you do need to directly
> > > > read the relevant registers.
> > > >
> > > >         David
> > > >
> > > > -
> > > > Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> > > > Registration No: 1397386 (Wales)
> > >
> > >
> > >
> > > --
> > > Regards,
> > > Atish
>
>
>
> --
> Regards,
> Atish
Ian Rogers April 19, 2023, 5:42 p.m. UTC | #8
On Wed, Apr 19, 2023 at 2:21 AM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
>
> Hi Ian,
>
> On Tue, Apr 18, 2023 at 10:30 PM Atish Patra <atishp@atishpatra.org> wrote:
> >
> > On Tue, Apr 18, 2023 at 11:46 PM Ian Rogers <irogers@google.com> wrote:
> > >
> > > On Tue, Apr 18, 2023 at 9:43 AM Atish Patra <atishp@atishpatra.org> wrote:
> > > >
> > > > On Fri, Apr 14, 2023 at 2:40 AM David Laight <David.Laight@aculab.com> wrote:
> > > > >
> > > > > From: Atish Patra
> > > > > > Sent: 13 April 2023 20:18
> > > > > >
> > > > > > On Thu, Apr 13, 2023 at 9:47 PM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
> > > > > > >
> > > > > > > riscv used to allow direct access to cycle/time/instret counters,
> > > > > > > bypassing the perf framework, this patchset intends to allow the user to
> > > > > > > mmap any counter when accessed through perf. But we can't break the
> > > > > > > existing behaviour so we introduce a sysctl perf_user_access like arm64
> > > > > > > does, which defaults to the legacy mode described above.
> > > > > > >
> > > > > >
> > > > > > It would be good provide additional direction for user space packages:
> > > > > >
> > > > > > The legacy behavior is supported for now in order to avoid breaking
> > > > > > existing software.
> > > > > > However, reading counters directly without perf interaction may
> > > > > > provide incorrect values which
> > > > > > the userspace software must avoid. We are hoping that the user space
> > > > > > packages which
> > > > > > read the cycle/instret directly, will move to the proper interface
> > > > > > eventually if they actually need it.
> > > > > > Most of the users are supposed to read "time" instead of "cycle" if
> > > > > > they intend to read timestamps.
> > > > >
> > > > > If you are trying to measure the performance of short code
> > > > > fragments then you need pretty much raw access directly to
> > > > > the cycle/clock count register.
> > > > >
> > > > > I've done this on x86 to compare the actual cycle times
> > > > > of different implementations of the IP checksum loop
> > > > > (and compare them to the theoretical limit).
> > > > > The perf framework just added far too much latency,
> > > > > only directly reading the cpu registers gave anything
> > > > > like reliable (and consistent) answers.
> > > > >
> > > >
> > > > This series allows direct access to the counters once configured
> > > > through the perf.
> > > > Earlier the cycle/instret counters are directly exposed to the
> > > > userspace without kernel/perf frameworking knowing
> > > > when/which user space application is reading it. That has security implications.
> > > >
> > > > With this series applied, the user space application just needs to
> > > > configure the event (cycle/instret) through perf syscall.
> > > > Once configured, the userspace application can find out the counter
> > > > information from the mmap & directly
> > > > read the counter. There is no latency while reading the counters.
> > > >
> > > > This mechanism allows stop/clear the counters when the requesting task
> > > > is not running. It also takes care of context switching
> > > > which may result in invalid values as you mentioned below. This is
> > > > nothing new and all other arch (x86, ARM64) allow user space
> > > > counter read through the same mechanism.
> > > >
> > > > Here is the relevant upstream discussion:
> > > > https://lore.kernel.org/lkml/Y7wLa7I2hlz3rKw%2F@hirez.programming.kicks-ass.net/T/
> > > >
> > > > ARM64:
> > > > https://docs.kernel.org/arm64/perf.html?highlight=perf_user_access#perf-userspace-pmu-hardware-counter-access
> > > >
> > > > example usage in x86:
> > > > https://github.com/andikleen/pmu-tools/blob/master/jevents/rdpmc.c
> > >
> > > The canonical implementation of this should be:
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/lib/perf/mmap.c#n400
> >
> > Thanks for sharing the libperf implementation.
> >
> > > which is updated in these patches but the tests are not:
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/tests/mmap-basic.c#n287
> > > Which appears to be an oversight. The tests display some differences
> >
> > Yes. It's an oversight. We should make sure that perf mmap tests pass
> > for RISC-V as well.
>
> Yes, that's an oversight, I had a local test adapted from this one but
> forgot to update it afterwards, I'll do that in the next version.
>
> Thanks for your quick feedbacks and sorry for being late,
>
> Alex

Thanks Alex, there was an equally likely chance that I wasn't
understanding things :-) Is there any information on RISC-V PMU
testing? I know ParanLee is interested. It'd be awesome to have
something say on:
https://perf.wiki.kernel.org/index.php/Main_Page
on how to run tests, perhaps on QEMU or known to work boards. We can
also just drop a link on there if there is information. We can also
add the RISC-V PMU information to the links here:
https://perf.wiki.kernel.org/index.php/Useful_Links

Thanks,
Ian

>
> >
> >
> > > between x86 and aarch64 that have assumed userspace hardware counter
> > > access, and everything else that it is assumed don't.
> > >
> > > Thanks,
> > > Ian
> > >
> > > > > Clearly process switches (especially cpu migrations) cause
> > > > > problems, but they are obviously invalid values and can
> > > > > be ignored.
> > > > >
> > > > > So while a lot of uses may be 'happy' with the values the
> > > > > perf framework gives, sometimes you do need to directly
> > > > > read the relevant registers.
> > > > >
> > > > >         David
> > > > >
> > > > > -
> > > > > Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> > > > > Registration No: 1397386 (Wales)
> > > >
> > > >
> > > >
> > > > --
> > > > Regards,
> > > > Atish
> >
> >
> >
> > --
> > Regards,
> > Atish
Atish Patra April 19, 2023, 11:21 p.m. UTC | #9
On Wed, Apr 19, 2023 at 11:13 PM Ian Rogers <irogers@google.com> wrote:
>
> On Wed, Apr 19, 2023 at 2:21 AM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
> >
> > Hi Ian,
> >
> > On Tue, Apr 18, 2023 at 10:30 PM Atish Patra <atishp@atishpatra.org> wrote:
> > >
> > > On Tue, Apr 18, 2023 at 11:46 PM Ian Rogers <irogers@google.com> wrote:
> > > >
> > > > On Tue, Apr 18, 2023 at 9:43 AM Atish Patra <atishp@atishpatra.org> wrote:
> > > > >
> > > > > On Fri, Apr 14, 2023 at 2:40 AM David Laight <David.Laight@aculab.com> wrote:
> > > > > >
> > > > > > From: Atish Patra
> > > > > > > Sent: 13 April 2023 20:18
> > > > > > >
> > > > > > > On Thu, Apr 13, 2023 at 9:47 PM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
> > > > > > > >
> > > > > > > > riscv used to allow direct access to cycle/time/instret counters,
> > > > > > > > bypassing the perf framework, this patchset intends to allow the user to
> > > > > > > > mmap any counter when accessed through perf. But we can't break the
> > > > > > > > existing behaviour so we introduce a sysctl perf_user_access like arm64
> > > > > > > > does, which defaults to the legacy mode described above.
> > > > > > > >
> > > > > > >
> > > > > > > It would be good provide additional direction for user space packages:
> > > > > > >
> > > > > > > The legacy behavior is supported for now in order to avoid breaking
> > > > > > > existing software.
> > > > > > > However, reading counters directly without perf interaction may
> > > > > > > provide incorrect values which
> > > > > > > the userspace software must avoid. We are hoping that the user space
> > > > > > > packages which
> > > > > > > read the cycle/instret directly, will move to the proper interface
> > > > > > > eventually if they actually need it.
> > > > > > > Most of the users are supposed to read "time" instead of "cycle" if
> > > > > > > they intend to read timestamps.
> > > > > >
> > > > > > If you are trying to measure the performance of short code
> > > > > > fragments then you need pretty much raw access directly to
> > > > > > the cycle/clock count register.
> > > > > >
> > > > > > I've done this on x86 to compare the actual cycle times
> > > > > > of different implementations of the IP checksum loop
> > > > > > (and compare them to the theoretical limit).
> > > > > > The perf framework just added far too much latency,
> > > > > > only directly reading the cpu registers gave anything
> > > > > > like reliable (and consistent) answers.
> > > > > >
> > > > >
> > > > > This series allows direct access to the counters once configured
> > > > > through the perf.
> > > > > Earlier the cycle/instret counters are directly exposed to the
> > > > > userspace without kernel/perf frameworking knowing
> > > > > when/which user space application is reading it. That has security implications.
> > > > >
> > > > > With this series applied, the user space application just needs to
> > > > > configure the event (cycle/instret) through perf syscall.
> > > > > Once configured, the userspace application can find out the counter
> > > > > information from the mmap & directly
> > > > > read the counter. There is no latency while reading the counters.
> > > > >
> > > > > This mechanism allows stop/clear the counters when the requesting task
> > > > > is not running. It also takes care of context switching
> > > > > which may result in invalid values as you mentioned below. This is
> > > > > nothing new and all other arch (x86, ARM64) allow user space
> > > > > counter read through the same mechanism.
> > > > >
> > > > > Here is the relevant upstream discussion:
> > > > > https://lore.kernel.org/lkml/Y7wLa7I2hlz3rKw%2F@hirez.programming.kicks-ass.net/T/
> > > > >
> > > > > ARM64:
> > > > > https://docs.kernel.org/arm64/perf.html?highlight=perf_user_access#perf-userspace-pmu-hardware-counter-access
> > > > >
> > > > > example usage in x86:
> > > > > https://github.com/andikleen/pmu-tools/blob/master/jevents/rdpmc.c
> > > >
> > > > The canonical implementation of this should be:
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/lib/perf/mmap.c#n400
> > >
> > > Thanks for sharing the libperf implementation.
> > >
> > > > which is updated in these patches but the tests are not:
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/tests/mmap-basic.c#n287
> > > > Which appears to be an oversight. The tests display some differences
> > >
> > > Yes. It's an oversight. We should make sure that perf mmap tests pass
> > > for RISC-V as well.
> >
> > Yes, that's an oversight, I had a local test adapted from this one but
> > forgot to update it afterwards, I'll do that in the next version.
> >
> > Thanks for your quick feedbacks and sorry for being late,
> >
> > Alex
>
> Thanks Alex, there was an equally likely chance that I wasn't
> understanding things :-) Is there any information on RISC-V PMU
> testing? I know ParanLee is interested. It'd be awesome to have

Are you looking for something specific to RISC-V general or perf on RISC-V?
All the RISC-V PMU patches have been upstream for a while (both in the
Qemu & Linux kernel).
Perf should work out-of-the box when you boot the latest kernel in the
latest version of the Qemu.

Initial KVM[1] patches support got merged during the last merge
window. It doesn't support
event sampling yet. We are working on that.

[1] https://lore.kernel.org/lkml/20230207095529.1787260-1-atishp@rivosinc.com/

> something say on:
> https://perf.wiki.kernel.org/index.php/Main_Page
> on how to run tests, perhaps on QEMU or known to work boards. We can
> also just drop a link on there if there is information. We can also
> add the RISC-V PMU information to the links here:
> https://perf.wiki.kernel.org/index.php/Useful_Links
>

I did not see any arch specific information there. Let us know what
would be good to
add there and we would be happy to add.

> Thanks,
> Ian
>
> >
> > >
> > >
> > > > between x86 and aarch64 that have assumed userspace hardware counter
> > > > access, and everything else that it is assumed don't.
> > > >
> > > > Thanks,
> > > > Ian
> > > >
> > > > > > Clearly process switches (especially cpu migrations) cause
> > > > > > problems, but they are obviously invalid values and can
> > > > > > be ignored.
> > > > > >
> > > > > > So while a lot of uses may be 'happy' with the values the
> > > > > > perf framework gives, sometimes you do need to directly
> > > > > > read the relevant registers.
> > > > > >
> > > > > >         David
> > > > > >
> > > > > > -
> > > > > > Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> > > > > > Registration No: 1397386 (Wales)
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Regards,
> > > > > Atish
> > >
> > >
> > >
> > > --
> > > Regards,
> > > Atish
Ian Rogers April 20, 2023, 12:31 a.m. UTC | #10
On Wed, Apr 19, 2023 at 4:22 PM Atish Patra <atishp@atishpatra.org> wrote:
>
> On Wed, Apr 19, 2023 at 11:13 PM Ian Rogers <irogers@google.com> wrote:
> >
> > On Wed, Apr 19, 2023 at 2:21 AM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
> > >
> > > Hi Ian,
> > >
> > > On Tue, Apr 18, 2023 at 10:30 PM Atish Patra <atishp@atishpatra.org> wrote:
> > > >
> > > > On Tue, Apr 18, 2023 at 11:46 PM Ian Rogers <irogers@google.com> wrote:
> > > > >
> > > > > On Tue, Apr 18, 2023 at 9:43 AM Atish Patra <atishp@atishpatra.org> wrote:
> > > > > >
> > > > > > On Fri, Apr 14, 2023 at 2:40 AM David Laight <David.Laight@aculab.com> wrote:
> > > > > > >
> > > > > > > From: Atish Patra
> > > > > > > > Sent: 13 April 2023 20:18
> > > > > > > >
> > > > > > > > On Thu, Apr 13, 2023 at 9:47 PM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
> > > > > > > > >
> > > > > > > > > riscv used to allow direct access to cycle/time/instret counters,
> > > > > > > > > bypassing the perf framework, this patchset intends to allow the user to
> > > > > > > > > mmap any counter when accessed through perf. But we can't break the
> > > > > > > > > existing behaviour so we introduce a sysctl perf_user_access like arm64
> > > > > > > > > does, which defaults to the legacy mode described above.
> > > > > > > > >
> > > > > > > >
> > > > > > > > It would be good provide additional direction for user space packages:
> > > > > > > >
> > > > > > > > The legacy behavior is supported for now in order to avoid breaking
> > > > > > > > existing software.
> > > > > > > > However, reading counters directly without perf interaction may
> > > > > > > > provide incorrect values which
> > > > > > > > the userspace software must avoid. We are hoping that the user space
> > > > > > > > packages which
> > > > > > > > read the cycle/instret directly, will move to the proper interface
> > > > > > > > eventually if they actually need it.
> > > > > > > > Most of the users are supposed to read "time" instead of "cycle" if
> > > > > > > > they intend to read timestamps.
> > > > > > >
> > > > > > > If you are trying to measure the performance of short code
> > > > > > > fragments then you need pretty much raw access directly to
> > > > > > > the cycle/clock count register.
> > > > > > >
> > > > > > > I've done this on x86 to compare the actual cycle times
> > > > > > > of different implementations of the IP checksum loop
> > > > > > > (and compare them to the theoretical limit).
> > > > > > > The perf framework just added far too much latency,
> > > > > > > only directly reading the cpu registers gave anything
> > > > > > > like reliable (and consistent) answers.
> > > > > > >
> > > > > >
> > > > > > This series allows direct access to the counters once configured
> > > > > > through the perf.
> > > > > > Earlier the cycle/instret counters are directly exposed to the
> > > > > > userspace without kernel/perf frameworking knowing
> > > > > > when/which user space application is reading it. That has security implications.
> > > > > >
> > > > > > With this series applied, the user space application just needs to
> > > > > > configure the event (cycle/instret) through perf syscall.
> > > > > > Once configured, the userspace application can find out the counter
> > > > > > information from the mmap & directly
> > > > > > read the counter. There is no latency while reading the counters.
> > > > > >
> > > > > > This mechanism allows stop/clear the counters when the requesting task
> > > > > > is not running. It also takes care of context switching
> > > > > > which may result in invalid values as you mentioned below. This is
> > > > > > nothing new and all other arch (x86, ARM64) allow user space
> > > > > > counter read through the same mechanism.
> > > > > >
> > > > > > Here is the relevant upstream discussion:
> > > > > > https://lore.kernel.org/lkml/Y7wLa7I2hlz3rKw%2F@hirez.programming.kicks-ass.net/T/
> > > > > >
> > > > > > ARM64:
> > > > > > https://docs.kernel.org/arm64/perf.html?highlight=perf_user_access#perf-userspace-pmu-hardware-counter-access
> > > > > >
> > > > > > example usage in x86:
> > > > > > https://github.com/andikleen/pmu-tools/blob/master/jevents/rdpmc.c
> > > > >
> > > > > The canonical implementation of this should be:
> > > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/lib/perf/mmap.c#n400
> > > >
> > > > Thanks for sharing the libperf implementation.
> > > >
> > > > > which is updated in these patches but the tests are not:
> > > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/tests/mmap-basic.c#n287
> > > > > Which appears to be an oversight. The tests display some differences
> > > >
> > > > Yes. It's an oversight. We should make sure that perf mmap tests pass
> > > > for RISC-V as well.
> > >
> > > Yes, that's an oversight, I had a local test adapted from this one but
> > > forgot to update it afterwards, I'll do that in the next version.
> > >
> > > Thanks for your quick feedbacks and sorry for being late,
> > >
> > > Alex
> >
> > Thanks Alex, there was an equally likely chance that I wasn't
> > understanding things :-) Is there any information on RISC-V PMU
> > testing? I know ParanLee is interested. It'd be awesome to have
>
> Are you looking for something specific to RISC-V general or perf on RISC-V?
> All the RISC-V PMU patches have been upstream for a while (both in the
> Qemu & Linux kernel).
> Perf should work out-of-the box when you boot the latest kernel in the
> latest version of the Qemu.
>
> Initial KVM[1] patches support got merged during the last merge
> window. It doesn't support
> event sampling yet. We are working on that.
>
> [1] https://lore.kernel.org/lkml/20230207095529.1787260-1-atishp@rivosinc.com/

Cool, it'd be nice to have a recipe for this from x86 Linux :-)

> > something say on:
> > https://perf.wiki.kernel.org/index.php/Main_Page
> > on how to run tests, perhaps on QEMU or known to work boards. We can
> > also just drop a link on there if there is information. We can also
> > add the RISC-V PMU information to the links here:
> > https://perf.wiki.kernel.org/index.php/Useful_Links
> >
>
> I did not see any arch specific information there. Let us know what
> would be good to
> add there and we would be happy to add.

I was specifically thinking under Manuals where the Intel, AMD and ARM
manuals are, links to the RISC-V documentation could be added.

Thanks,
Ian

> > Thanks,
> > Ian
> >
> > >
> > > >
> > > >
> > > > > between x86 and aarch64 that have assumed userspace hardware counter
> > > > > access, and everything else that it is assumed don't.
> > > > >
> > > > > Thanks,
> > > > > Ian
> > > > >
> > > > > > > Clearly process switches (especially cpu migrations) cause
> > > > > > > problems, but they are obviously invalid values and can
> > > > > > > be ignored.
> > > > > > >
> > > > > > > So while a lot of uses may be 'happy' with the values the
> > > > > > > perf framework gives, sometimes you do need to directly
> > > > > > > read the relevant registers.
> > > > > > >
> > > > > > >         David
> > > > > > >
> > > > > > > -
> > > > > > > Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> > > > > > > Registration No: 1397386 (Wales)
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Regards,
> > > > > > Atish
> > > >
> > > >
> > > >
> > > > --
> > > > Regards,
> > > > Atish
>
>
>
> --
> Regards,
> Atish