drivers/perf: arm-ccn: stop spamming dmesg in event_init

Message ID	20180504104117.8086-1-mark.rutland@arm.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org> From: Mark Rutland <mark.rutland@arm.com> To: linux-arm-kernel@lists.infradead.org, will.deacon@arm.com Subject: [PATCH] drivers/perf: arm-ccn: stop spamming dmesg in event_init Date: Fri, 4 May 2018 11:41:17 +0100 Message-Id: <20180504104117.8086-1-mark.rutland@arm.com> Precedence: list Cc: Mark Rutland <mark.rutland@arm.com> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "linux-arm-kernel" <linux-arm-kernel-bounces@lists.infradead.org> Errors-To: linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org

Mark Rutland May 4, 2018, 10:41 a.m. UTC

The ARM CCN PMU driver uses dev_warn() to complain about parameters in
the user-provided perf_event_attr. This means that under normal
operation (e.g. a single invocation of the perf tool), dmesg may be
spammed with multiple messages.

Tools may issue multiple syscalls to probe for feature support, and
multiple applications (from multiple users) can attempt to open events
simultaneously, so this is not very helpful, even if a user happens to
have access to dmesg. Worse, this can push important information out of
the dmesg ring buffer, and can significantly slow down syscall fuzzers,
vastly increasing the time it takes to find critical bugs.

Demote the dev_warn() instances to dev_dbg(), as is the case for all
other PMU drivers under drivers/perf/. Users who wish to debug PMU event
initialisation can enable dynamic debug to receive these messages.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
---
 drivers/perf/arm-ccn.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

Kim Phillips May 8, 2018, 11:40 p.m. UTC | #1

On Fri,  4 May 2018 11:41:17 +0100
Mark Rutland <mark.rutland@arm.com> wrote:

[adding Pawel, arm-ccn driver author]

> The ARM CCN PMU driver uses dev_warn() to complain about parameters in
> the user-provided perf_event_attr. This means that under normal
> operation (e.g. a single invocation of the perf tool), dmesg may be
> spammed with multiple messages.

Nothing new there:  perf does that all the time: E.g., a single, basic
x86 invocation causes all this:

[77360.630571] perf: interrupt took too long (2549 > 2500), lowering kernel.perf_event_max_sample_rate to 78250
[77360.630800] perf: interrupt took too long (3492 > 3186), lowering kernel.perf_event_max_sample_rate to 57250
[77360.631045] perf: interrupt took too long (4379 > 4365), lowering kernel.perf_event_max_sample_rate to 45500
[77360.631437] perf: interrupt took too long (5845 > 5473), lowering kernel.perf_event_max_sample_rate to 34000
[77360.631967] perf: interrupt took too long (7736 > 7306), lowering kernel.perf_event_max_sample_rate to 25750
[77360.632520] perf: interrupt took too long (9921 > 9670), lowering kernel.perf_event_max_sample_rate to 20000
[77360.633344] perf: interrupt took too long (12778 > 12401), lowering kernel.perf_event_max_sample_rate to 15500
[77360.634294] perf: interrupt took too long (16030 > 15972), lowering kernel.perf_event_max_sample_rate to 12250
[77360.635587] perf: interrupt took too long (20105 > 20037), lowering kernel.perf_event_max_sample_rate to 9750
[77360.637301] perf: interrupt took too long (25319 > 25131), lowering kernel.perf_event_max_sample_rate to 7750

> Tools may issue multiple syscalls to probe for feature support, and

Why isn't it helpful?  A user can see the tool is trying different
options, and as they are tried, to see which ones are they can do
something about.  Arm-ccn has plenty of h/w specific errors, and users
need to know things like node specific details.

> multiple applications (from multiple users) can attempt to open events
> simultaneously, so this is not very helpful, 

Does running perf on the same PMU h/w against other users even work?
Is it even practical, if they get it to?  In any case, for Arm-ccn,
since it's system-wide PMU h/w, this use-case is completely unrealistic.

> even if a user happens to have access to dmesg.

Again, unrealistic:  If they have access to run arm-ccn, kptrs are
unrestricted, so they have way more access than to just dmesg.

> Worse, this can push important information out of
> the dmesg ring buffer,

Like what, specifically?  People about to run perf do so on a stable
system, for measurement of performance accuracy reasons.   Anyway,
there are logging daemons to help with that, and arm-ccn doesn't emit
*that* many messages such as to even come close to filling the buffer.

> and can significantly slow down syscall fuzzers,
> vastly increasing the time it takes to find critical bugs.

Facilitating first user experience  - esp. for Arm perf - trumps fuzzer
runner convenience any day, in my book.  Fuzzer runners can patch their
kernels prior to running the fuzzers.

> Demote the dev_warn() instances to dev_dbg(), as is the case for all
> other PMU drivers under drivers/perf/. Users who wish to debug PMU event
> initialisation can enable dynamic debug to receive these messages.

After having already debugged things to the point where they find the
problem is the driver's event_init() function?  Then they find they
have to recompile their kernels yet again, with DYNAMIC_DEBUG set?

No, please, this is a definite usability regression for Arm-ccn users.

In fact, are you sure it shouldn't be the other way around?:
{kernel,PMU driver} developers that wish to run fuzzers should disable
warnings using /proc/sys/kernel/printk, or boot with a different
loglevel on the boot command line.  I think our users would appreciate
that more than this patch.

Thanks,

Kim

Pawel Moll May 9, 2018, 4:06 p.m. UTC | #2

On Wed, 2018-05-09 at 00:40 +0100, Kim Phillips wrote:
> On Fri,  4 May 2018 11:41:17 +0100 Mark Rutland <mark.rutland@arm.com> wrote:
> 
> [adding Pawel, arm-ccn driver author]

We had this discussion way too many times for my liking. As I said
before - *I* will be fine with the debug messages in the CCN driver.
Now, if there ever turns out to be other user of this thing and gets
into problems with event configuration, I'd hope that he/she can count
on support from the knowledgable people here... (just checked and both
RHEL 7.5 and Ubuntu 16.04.3 kernels on AArch64 come with
CONFIG_DYNAMIC_DEBUG=y, so it's a matter of explaining what has to be
enabled where; sadly this information did not find its way into the
commit description)

> > The ARM CCN PMU driver uses dev_warn() to complain about parameters
> in
> > the user-provided perf_event_attr. This means that under normal
> > operation (e.g. a single invocation of the perf tool), dmesg may be
> > spammed with multiple messages.

Surely Mark, in his role as maintainer of drivers/perf/ (and a few
other locations), meant to use much more technical and emotion-free
subject, along the lines of "reduce a number of dmesg warnings at event
init".

Regards

Paweł

Mark Rutland May 9, 2018, 4:12 p.m. UTC | #3

On Wed, May 09, 2018 at 05:06:43PM +0100, Pawel Moll wrote:
> On Wed, 2018-05-09 at 00:40 +0100, Kim Phillips wrote:
> > On Fri,  4 May 2018 11:41:17 +0100 Mark Rutland <mark.rutland@arm.com> wrote:
> > 
> > [adding Pawel, arm-ccn driver author]
> 
> We had this discussion way too many times for my liking. As I said
> before - *I* will be fine with the debug messages in the CCN driver.

> > > The ARM CCN PMU driver uses dev_warn() to complain about parameters
> > in
> > > the user-provided perf_event_attr. This means that under normal
> > > operation (e.g. a single invocation of the perf tool), dmesg may be
> > > spammed with multiple messages.
> 
> Surely Mark, in his role as maintainer of drivers/perf/ (and a few
> other locations), meant to use much more technical and emotion-free
> subject, along the lines of "reduce a number of dmesg warnings at event
> init".

True. I'll reword the above to:

  The ARM CCN PMU driver uses dev_warn() to complain about parameters in
  the user-provided perf_event_attr. This means that under normal
  operation (e.g. a single invocation of the perf tool), a number of
  messages warnings may be logged to dmesg.

... with that, can I take your ack?

Thanks,
Mark.

Kim Phillips May 9, 2018, 11:41 p.m. UTC | #4

On Wed, 9 May 2018 17:06:43 +0100
Pawel Moll <pawel.moll@arm.com> wrote:

> On Wed, 2018-05-09 at 00:40 +0100, Kim Phillips wrote:
> > On Fri,  4 May 2018 11:41:17 +0100 Mark Rutland <mark.rutland@arm.com> wrote:
> > 
> > [adding Pawel, arm-ccn driver author]
> 
> We had this discussion way too many times for my liking. As I said
> before - *I* will be fine with the debug messages in the CCN driver.
> Now, if there ever turns out to be other user of this thing and gets

Sure, this isn't just about Pawel using the driver he wrote - we know
you know how to use it because you wrote it.  No, it's about all the
other potential users out there, esp. first time users, as I once was.

> into problems with event configuration, I'd hope that he/she can count
> on support from the knowledgable people here... (just checked and both

I abhor having to suggest our users email this list in order to find out
how to use the PMU drivers.  First time users are going to tend to
steer completely away if they don't have the patience to debug a silent
PMU, rather than email this mailing list - sorry, but that's just
adding a huge usage barrier - for what - fuzzer runner's convenience?

> RHEL 7.5 and Ubuntu 16.04.3 kernels on AArch64 come with
> CONFIG_DYNAMIC_DEBUG=y, so it's a matter of explaining what has to be
> enabled where; sadly this information did not find its way into the
> commit description)

I can't think of a more difficult to find, more far-away, alien
place for end users to find help with perf, sorry.

> > > The ARM CCN PMU driver uses dev_warn() to complain about parameters
> > in
> > > the user-provided perf_event_attr. This means that under normal
> > > operation (e.g. a single invocation of the perf tool), dmesg may be
> > > spammed with multiple messages.
> 
> Surely Mark, in his role as maintainer of drivers/perf/ (and a few
> other locations), meant to use much more technical and emotion-free
> subject, along the lines of "reduce a number of dmesg warnings at event
> init".

'reduce a number' is the wrong word:  warnings are completely
eliminated. Debug-level messages occur at exactly the same
frequency/amount.

But I still object to the rationale overall - it seems this is about
running fuzzers?  I even offered an alternative for fuzzer runners: is
modifying the loglevel prior to fuzzing somehow unacceptable?

Kim

Kim Phillips May 10, 2018, 12:02 a.m. UTC | #5

On Wed, 9 May 2018 17:12:42 +0100
Mark Rutland <mark.rutland@arm.com> wrote:

> On Wed, May 09, 2018 at 05:06:43PM +0100, Pawel Moll wrote:
> > On Wed, 2018-05-09 at 00:40 +0100, Kim Phillips wrote:
> > > On Fri,  4 May 2018 11:41:17 +0100 Mark Rutland <mark.rutland@arm.com> wrote:
> > > 
> > > [adding Pawel, arm-ccn driver author]
> > 
> > We had this discussion way too many times for my liking. As I said
> > before - *I* will be fine with the debug messages in the CCN driver.
> 
> > > > The ARM CCN PMU driver uses dev_warn() to complain about parameters
> > > in
> > > > the user-provided perf_event_attr. This means that under normal
> > > > operation (e.g. a single invocation of the perf tool), dmesg may be
> > > > spammed with multiple messages.
> > 
> > Surely Mark, in his role as maintainer of drivers/perf/ (and a few
> > other locations), meant to use much more technical and emotion-free
> > subject, along the lines of "reduce a number of dmesg warnings at event
> > init".
> 
> True. I'll reword the above to:
> 
>   The ARM CCN PMU driver uses dev_warn() to complain about parameters in
>   the user-provided perf_event_attr. This means that under normal
>   operation (e.g. a single invocation of the perf tool), a number of
>   messages warnings may be logged to dmesg.

So from:

"...dmesg may be spammed with multiple messages."

to:

"...a number of messages warnings may be logged to dmesg."

Not sure what's *that* different between the two, but is the assumption
that "normal operation" - i.e., a single *valid* perf invocation -
still emits messages?  In which case, could the misunderstanding here
possibly be the following?:

Assuming I know what node is valid, I can still get messages:

$ dmesg -w &
$ sudo ./perf stat -e ccn/hnf_cache_miss,node=8/ find /usr > /dev/null
[626610.173759] arm-ccn e8000000.ccn: Can't exclude execution levels!
[626610.173790] arm-ccn e8000000.ccn: Can't exclude execution levels!

 Performance counter stats for 'system wide':

         1,096,602      ccn/hnf_cache_miss,node=8/                                   

       3.708314082 seconds time elapsed

But the driver is still technically correct in that it WARNs (not
invisibly DEBUGs) it's users about the EL problem.  In order to not
exclude the default execution levels, we postpend 'hku' to the event,
so now the EL messages are gone:

$ sudo ./perf stat -e ccn/hnf_cache_miss,node=8/hku find /usr > /dev/null

 Performance counter stats for 'system wide':

         1,086,521      ccn/hnf_cache_miss,node=8/hku                                   

       3.699726648 seconds time elapsed

What's important to keep the user warned about with arm-ccn is these
types of messages:

$ sudo ./perf stat -e ccn/hnf_cache_miss,node=1/hku find /usr > /dev/null
[626675.574828] arm-ccn e8000000.ccn: Invalid type 0x4 for node 1!
[626675.574841] arm-ccn e8000000.ccn: Invalid type 0x4 for node 1!

 Performance counter stats for 'system wide':

   <not supported>      ccn/hnf_cache_miss,node=1/                                   

       3.686169559 seconds time elapsed

See?  Now the user is pointed directly at the problem.

I've even enhanced the tool to help identify to the user the event with
an unsupported sampling error (commit 114bc191c370 "perf evsel: Say
which PMU Hardware event doesn't support sampling/overflow-interrupts").

BTW, the dmesg output above was only two lines per effective message:
That should be totally acceptable - it's only one line more than one
line per message.  Are you seeing much more than that?  If so, can we
see an example of an invocation that's completely unacceptable,
dmesg-volume-wise?

Thanks,

Kim

Florian Fainelli May 10, 2018, 12:22 a.m. UTC | #6

On 05/09/2018 04:41 PM, Kim Phillips wrote:
> On Wed, 9 May 2018 17:06:43 +0100
> Pawel Moll <pawel.moll@arm.com> wrote:
> 
>> On Wed, 2018-05-09 at 00:40 +0100, Kim Phillips wrote:
>>> On Fri,  4 May 2018 11:41:17 +0100 Mark Rutland <mark.rutland@arm.com> wrote:
>>>
>>> [adding Pawel, arm-ccn driver author]
>>
>> We had this discussion way too many times for my liking. As I said
>> before - *I* will be fine with the debug messages in the CCN driver.
>> Now, if there ever turns out to be other user of this thing and gets
> 
> Sure, this isn't just about Pawel using the driver he wrote - we know
> you know how to use it because you wrote it.  No, it's about all the
> other potential users out there, esp. first time users, as I once was.
> 
>> into problems with event configuration, I'd hope that he/she can count
>> on support from the knowledgable people here... (just checked and both
> 
> I abhor having to suggest our users email this list in order to find out
> how to use the PMU drivers.  First time users are going to tend to
> steer completely away if they don't have the patience to debug a silent
> PMU, rather than email this mailing list - sorry, but that's just
> adding a huge usage barrier - for what - fuzzer runner's convenience?
> 
>> RHEL 7.5 and Ubuntu 16.04.3 kernels on AArch64 come with
>> CONFIG_DYNAMIC_DEBUG=y, so it's a matter of explaining what has to be
>> enabled where; sadly this information did not find its way into the
>> commit description)
> 
> I can't think of a more difficult to find, more far-away, alien
> place for end users to find help with perf, sorry.
> 
>>>> The ARM CCN PMU driver uses dev_warn() to complain about parameters
>>> in
>>>> the user-provided perf_event_attr. This means that under normal
>>>> operation (e.g. a single invocation of the perf tool), dmesg may be
>>>> spammed with multiple messages.
>>
>> Surely Mark, in his role as maintainer of drivers/perf/ (and a few
>> other locations), meant to use much more technical and emotion-free
>> subject, along the lines of "reduce a number of dmesg warnings at event
>> init".
> 
> 'reduce a number' is the wrong word:  warnings are completely
> eliminated. Debug-level messages occur at exactly the same
> frequency/amount.
> 
> But I still object to the rationale overall - it seems this is about
> running fuzzers?  I even offered an alternative for fuzzer runners: is
> modifying the loglevel prior to fuzzing somehow unacceptable?

I don't have any dog in this, but maybe if providing information to the
users is so essential to having a pleasant user experience, then
rethinking the whole way these messages are funneled is necessary
because the kernel log + dmesg is by no means appropriate. Take a look
at what the networking maintainers recently did with netlink extended
ack. You used to just get an "unsupported" error, and now you know
exactly what is wrong when extack is available. It seems to me like
something like this is what you might want here since you want to have
perf be as user friendly as possible.

My 2 cents.

Pawel Moll May 11, 2018, 1:12 p.m. UTC | #7

Apologies about delay - I was away yesterday. And advance apologies
about - most likely - not participating in discussions in the next 4
weeks, but I'll be on holiday in rather remote areas.

On Wed, 2018-05-09 at 17:12 +0100, Mark Rutland wrote:
> True. I'll reword the above to:
> 
>   The ARM CCN PMU driver uses dev_warn() to complain about parameters
> in
>   the user-provided perf_event_attr. This means that under normal
>   operation (e.g. a single invocation of the perf tool), a number of
>   messages warnings may be logged to dmesg.
> 
> ... with that, can I take your ack?

I'm by no means enthusiastic nor happy about this change, so I'd rather
say "you won't get my NAK" (although even this wouldn't matter much, I
believe...)

Paweł

Pawel Moll May 11, 2018, 1:16 p.m. UTC | #8

On Thu, 2018-05-10 at 01:22 +0100, Florian Fainelli wrote:
> I don't have any dog in this, but maybe if providing information to
> the
> users is so essential to having a pleasant user experience, then
> rethinking the whole way these messages are funneled is necessary
> because the kernel log + dmesg is by no means appropriate. Take a look
> at what the networking maintainers recently did with netlink extended
> ack. You used to just get an "unsupported" error, and now you know
> exactly what is wrong when extack is available. It seems to me like
> something like this is what you might want here since you want to have
> perf be as user friendly as possible.
> 
> My 2 cents.

I couldn't agree more. I find the advice "check the dmesg, you may find
something there" that perf tool uses currently... well, pretty lame
(although this was exactly the reason I started dmesg-ing error details
in the first place).

One could play the "you're breaking userspace promise" card here, but
considering that the number of people actually using CCN PMU is...
small, I don't want to do this.

But so far no one was determined enough to get better error reporting
in place. I have ran out of time when I tried it 2 or 3 years ago, I
believe Kim has discussed some approaches and received pushback. Kim
would be better suited to summarize what has happened so far...

Regards

Paweł

Kim Phillips May 17, 2018, 12:25 a.m. UTC | #9

[adding acme, LKML, linux-perf-users, and other potentially interested]

On Wed, 9 May 2018 17:22:42 -0700
Florian Fainelli <f.fainelli@gmail.com> wrote:

> On 05/09/2018 04:41 PM, Kim Phillips wrote:
> > On Wed, 9 May 2018 17:06:43 +0100
> > Pawel Moll <pawel.moll@arm.com> wrote:
> > 
> >> On Wed, 2018-05-09 at 00:40 +0100, Kim Phillips wrote:
> >>> On Fri,  4 May 2018 11:41:17 +0100 Mark Rutland <mark.rutland@arm.com> wrote:
> >>>
> >>> [adding Pawel, arm-ccn driver author]
> >>
> >> We had this discussion way too many times for my liking. As I said
> >> before - *I* will be fine with the debug messages in the CCN driver.
> >> Now, if there ever turns out to be other user of this thing and gets
> > 
> > Sure, this isn't just about Pawel using the driver he wrote - we know
> > you know how to use it because you wrote it.  No, it's about all the
> > other potential users out there, esp. first time users, as I once was.
> > 
> >> into problems with event configuration, I'd hope that he/she can count
> >> on support from the knowledgable people here... (just checked and both
> > 
> > I abhor having to suggest our users email this list in order to find out
> > how to use the PMU drivers.  First time users are going to tend to
> > steer completely away if they don't have the patience to debug a silent
> > PMU, rather than email this mailing list - sorry, but that's just
> > adding a huge usage barrier - for what - fuzzer runner's convenience?
> > 
> >> RHEL 7.5 and Ubuntu 16.04.3 kernels on AArch64 come with
> >> CONFIG_DYNAMIC_DEBUG=y, so it's a matter of explaining what has to be
> >> enabled where; sadly this information did not find its way into the
> >> commit description)
> > 
> > I can't think of a more difficult to find, more far-away, alien
> > place for end users to find help with perf, sorry.
> > 
> >>>> The ARM CCN PMU driver uses dev_warn() to complain about parameters
> >>> in
> >>>> the user-provided perf_event_attr. This means that under normal
> >>>> operation (e.g. a single invocation of the perf tool), dmesg may be
> >>>> spammed with multiple messages.
> >>
> >> Surely Mark, in his role as maintainer of drivers/perf/ (and a few
> >> other locations), meant to use much more technical and emotion-free
> >> subject, along the lines of "reduce a number of dmesg warnings at event
> >> init".
> > 
> > 'reduce a number' is the wrong word:  warnings are completely
> > eliminated. Debug-level messages occur at exactly the same
> > frequency/amount.
> > 
> > But I still object to the rationale overall - it seems this is about
> > running fuzzers?  I even offered an alternative for fuzzer runners: is
> > modifying the loglevel prior to fuzzing somehow unacceptable?
> 
> I don't have any dog in this, but maybe if providing information to the
> users is so essential to having a pleasant user experience, then
> rethinking the whole way these messages are funneled is necessary
> because the kernel log + dmesg is by no means appropriate. Take a look
> at what the networking maintainers recently did with netlink extended
> ack. You used to just get an "unsupported" error, and now you know
> exactly what is wrong when extack is available. It seems to me like
> something like this is what you might want here since you want to have
> perf be as user friendly as possible.

Thanks, Florian.

Acme & other perf people, do you foresee a problem adding netlink
extended ack support to the perf subsystem for extended error message
communication?

If not, is struct perf_event_attr amenable to having error reporting
bit(s) added?  Recall my last attempt failed because it couldn't
discriminate between the perf core and the PMU driver returning the
-Evalue:

https://patchwork.kernel.org/patch/10025555/

Thanks,

Kim

Matt Sealey May 17, 2018, 3:07 p.m. UTC | #10

Hi all,

> > I don't have any dog in this, but maybe if providing information to the
> > users is so essential to having a pleasant user experience, then
> > rethinking the whole way these messages are funneled is necessary
> > because the kernel log + dmesg is by no means appropriate. Take a look
> > at what the networking maintainers recently did with netlink extended
> > ack. You used to just get an "unsupported" error, and now you know
> > exactly what is wrong when extack is available. It seems to me like
> > something like this is what you might want here since you want to have
> > perf be as user friendly as possible.
>
> Thanks, Florian.

Florian, I'd love to know if you mean "implement netlink extended ack in
perf" or "do something idiomatic for perf"?

Let us assume we are semantically challenged over here. I'm going to
proceed from the latter.

> Acme & other perf people, do you foresee a problem adding netlink
> extended ack support to the perf subsystem for extended error message
> communication?
>
> If not, is struct perf_event_attr amenable to having error reporting
> bit(s) added?

I did have a think about this when Kim mentioned it in passing, and my
reasoning was that a serialized error record similar to ACPI APEI BERT/ERST
tables (without the NVRAM abstraction) would fit the need.

As soon as I wrote down my thoughts I realized it scratch more itches than
just something useful for perf.

It would conceptually be a buffer passed from userspace with the syscall,
which would be populated in the kernel with an identifying header, each
record denoting its own length. The syscall fills the buffer with records
of particular format for the syscall, and the errno returned to the
application is then the state of the error recording buffer - for instance,
if the kernel ran out of space in the buffer before reporting all the errors
it could (-ENOBUFS or -EAGAIN), if it stopped recording errors on something
that requires being contained and returned at that point (-EFAULT), and so
on (anyone who's got RAS on the brain will see where I'm coming from).

I have a distinct dislike of filling the kernel with const strings, so the
records would be strictly machine-readable and contain information about
the error for the record and the source. If userspace needs to print a
string then it can look up unique identifiers (UUIDv1, give or take, to
remove the need for any authority on numbering) in a database - be that
plaintext, gettext.po, bdb, sqlite, xml, json, C structure embedded in the
tool - one for each error source. That keeps strings, translations, string
formatting entirely outside the kernel, and keeps records from being freeform
typo-laden strings.

That'd give some generic Producer code in the kernel, and imply a companion
Consumer library in userspace (with said database backend), which could also
be responsible for logging the binary records somewhere for future reference
(perhaps bounded by capabilities or container privileges).

Pretty much every syscall that has problems returning 'just an errno'
could benefit from such a system, the only impediment I can see to it is
that it's adding a new subsystem to the kernel to produce these records,
and any syscall that needs it would have to gain either an extra parameter,
or attribute setting addition (like setsockopts) or to shoehorn the buffer
pointer into an existing parameter structure like perf_event_attr. That
would have to be locked down to a consistent method that would be
recommended (like adding a new syscall interface, not everyone has an
attribute interface or parameter structure) although there's no stopping
kernel devs adding in a way for legacy applications to easily receive
the same information through existing ABI.

In the case of perf_event_attr there is space enough to mark a bit to say
that errors could be reported in a buffer, but not enough in the reserved
space that exists to store a pointer for 64-bit systems (or 128-bit ones..)
without increasing its size. But, besides that, it would have the benefit
of simply being serialized with the syscall, and not a supplemental,
potentially non-thread-safe errno/strerror-like kernel-side implementation,
nor extra syscalls to retrieve information or arbitrary formatted or
unformatted strings.

Netlink extended ack could benefit from it simply by having been passed
an nlattr pointing to the buffer and recording extended error information
in it - the extended ack structure can report the status of the buffer
and use the cookie field to reproduce some information if necessary (which
extends the RAS error record concept further when needed).

Thoughts? Does anyone have any objections to a RAS-like error reporting
system for system calls, or any ideas on things that would benefit from
it above and beyond perf? We could always audit all the system calls and
their behaviors so we could do some worked examples but anyone who's got
a good candidate outside perf is welcome to suggest it.

Ta,
Matt




IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.

drivers/perf: arm-ccn: stop spamming dmesg in event_init

Commit Message

Comments

Patch