diff mbox series

[RFC,1/2] devlink: add simple fw crash helpers

Message ID 20200519211531.3702593-1-kuba@kernel.org (mailing list archive)
State New, archived
Headers show
Series [RFC,1/2] devlink: add simple fw crash helpers | expand

Commit Message

Jakub Kicinski May 19, 2020, 9:15 p.m. UTC
Add infra for creating devlink instances for a device to report
fw crashes. This patch expects the devlink instance to be registered
at probe time. I belive to be the cleanest. We can also add a devm
version of the helpers, so that we don't have to do the clean up.
Or we can go even further and register the devlink instance only
once error has happened (for the first time, then we can just
find out if already registered by traversing the list like we
do here).

With the patch applied and a sample driver converted we get:

$ devlink dev
pci/0000:07:00.0

Then monitor for errors:

$ devlink mon health
[health,status] pci/0000:07:00.0:
  reporter fw
    state error error 1 recover 0
[health,status] pci/0000:07:00.0:
  reporter fw
    state error error 2 recover 0

These are the events I triggered on purpose. One can also inspect
the health of all devices capable of reporting fw errors:

$ devlink health
pci/0000:07:00.0:
  reporter fw
    state error error 7 recover 0

Obviously drivers may upgrade to the full devlink health API
which includes state dump, state dump auto-collect and automatic
error recovery control.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
 include/linux/devlink.h               |  11 +++
 net/core/Makefile                     |   2 +-
 net/core/devlink_simple_fw_reporter.c | 101 ++++++++++++++++++++++++++
 3 files changed, 113 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/devlink.h
 create mode 100644 net/core/devlink_simple_fw_reporter.c

Comments

Luis Chamberlain May 22, 2020, 5:20 a.m. UTC | #1
On Tue, May 19, 2020 at 02:15:30PM -0700, Jakub Kicinski wrote:
> Add infra for creating devlink instances for a device to report

Thanks for doing this series as a PoC, counter to the module_firmware_crash()
which I proposed to taint the kernel with a firmware crash flag to the kernel
and module.

For those not famliar about devlink:

https://lwn.net/Articles/677967/
https://www.kernel.org/doc/html/latest/networking/devlink/index.html

The github page also is now 404 as Jiri merged that stuff into iproute2:

git://git.kernel.org/pub/scm/network/iproute2/iproute2.git

> fw crashes. This patch expects the devlink instance to be registered
> at probe time. I belive to be the cleanest. We can also add a devm
> version of the helpers, so that we don't have to do the clean up.
> Or we can go even further and register the devlink instance only
> once error has happened (for the first time, then we can just
> find out if already registered by traversing the list like we
> do here).
> 
> With the patch applied and a sample driver converted we get:
> 
> $ devlink dev
> pci/0000:07:00.0
> 
> Then monitor for errors:
> 
> $ devlink mon health
> [health,status] pci/0000:07:00.0:
>   reporter fw
>     state error error 1 recover 0
> [health,status] pci/0000:07:00.0:
>   reporter fw
>     state error error 2 recover 0
> 
> These are the events I triggered on purpose. One can also inspect
> the health of all devices capable of reporting fw errors:
> 
> $ devlink health
> pci/0000:07:00.0:
>   reporter fw
>     state error error 7 recover 0
> 
> Obviously drivers may upgrade to the full devlink health API
> which includes state dump, state dump auto-collect and automatic
> error recovery control.
> 
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
>  include/linux/devlink.h               |  11 +++
>  net/core/Makefile                     |   2 +-
>  net/core/devlink_simple_fw_reporter.c | 101 ++++++++++++++++++++++++++
>  3 files changed, 113 insertions(+), 1 deletion(-)
>  create mode 100644 include/linux/devlink.h
>  create mode 100644 net/core/devlink_simple_fw_reporter.c
> 
> diff --git a/include/linux/devlink.h b/include/linux/devlink.h
> new file mode 100644
> index 000000000000..2b73987eefca
> --- /dev/null
> +++ b/include/linux/devlink.h
> @@ -0,0 +1,11 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> +#ifndef _LINUX_DEVLINK_H_
> +#define _LINUX_DEVLINK_H_
> +
> +struct device;
> +
> +void devlink_simple_fw_reporter_prepare(struct device *dev);
> +void devlink_simple_fw_reporter_cleanup(struct device *dev);
> +void devlink_simple_fw_reporter_report_crash(struct device *dev);
> +
> +#endif
> diff --git a/net/core/Makefile b/net/core/Makefile
> index 3e2c378e5f31..6f1513781c17 100644
> --- a/net/core/Makefile
> +++ b/net/core/Makefile
> @@ -31,7 +31,7 @@ obj-$(CONFIG_LWTUNNEL_BPF) += lwt_bpf.o
>  obj-$(CONFIG_BPF_STREAM_PARSER) += sock_map.o
>  obj-$(CONFIG_DST_CACHE) += dst_cache.o
>  obj-$(CONFIG_HWBM) += hwbm.o
> -obj-$(CONFIG_NET_DEVLINK) += devlink.o
> +obj-$(CONFIG_NET_DEVLINK) += devlink.o devlink_simple_fw_reporter.o

This was looking super sexy up to here. This is networking specific.
We want something generic for *anything* that requests firmware.

I'm afraid this won't work for something generic. I don't think its
throw-away work though, the idea to provide a generic interface to
dump firmware through netlink might be nice for networking, or other
things.

But I have a feeling we'll want something still more generic than this.

So networking may want to be aware that a firmware crash happened as
part of this network device health thing, but firmware crashing is a
generic thing.

I have now extended my patch set to include uvents and I am more set on
that we need the taint now more than ever.

  Luis

>  obj-$(CONFIG_GRO_CELLS) += gro_cells.o
>  obj-$(CONFIG_FAILOVER) += failover.o
>  obj-$(CONFIG_BPF_SYSCALL) += bpf_sk_storage.o
> diff --git a/net/core/devlink_simple_fw_reporter.c b/net/core/devlink_simple_fw_reporter.c
> new file mode 100644
> index 000000000000..48dde9123c3c
> --- /dev/null
> +++ b/net/core/devlink_simple_fw_reporter.c
> @@ -0,0 +1,101 @@
> +#include <linux/devlink.h>
> +#include <linux/list.h>
> +#include <linux/mutex.h>
> +#include <net/devlink.h>
> +
> +struct devlink_simple_fw_reporter {
> +	struct list_head list;
> +	struct devlink_health_reporter *reporter;
> +};
> +
> +
> +static LIST_HEAD(devlink_simple_fw_reporters);
> +static DEFINE_MUTEX(devlink_simple_fw_reporters_mutex);
> +
> +static const struct devlink_health_reporter_ops simple_devlink_health = {
> +	.name = "fw",
> +};
> +
> +static const struct devlink_ops simple_devlink_ops = {
> +};
> +
> +static struct devlink_simple_fw_reporter *
> +devlink_simple_fw_reporter_find_for_dev(struct device *dev)
> +{
> +	struct devlink_simple_fw_reporter *simple_devlink, *ret = NULL;
> +	struct devlink *devlink;
> +
> +	mutex_lock(&devlink_simple_fw_reporters_mutex);
> +	list_for_each_entry(simple_devlink, &devlink_simple_fw_reporters,
> +			    list) {
> +		devlink = priv_to_devlink(simple_devlink);
> +		if (devlink->dev == dev) {
> +			ret = simple_devlink;
> +			break;
> +		}
> +	}
> +	mutex_unlock(&devlink_simple_fw_reporters_mutex);
> +
> +	return ret;
> +}
> +
> +void devlink_simple_fw_reporter_report_crash(struct device *dev)
> +{
> +	struct devlink_simple_fw_reporter *simple_devlink;
> +
> +	simple_devlink = devlink_simple_fw_reporter_find_for_dev(dev);
> +	if (!simple_devlink)
> +		return;
> +
> +	devlink_health_report(simple_devlink->reporter, "firmware crash", NULL);
> +}
> +EXPORT_SYMBOL_GPL(devlink_simple_fw_reporter_report_crash);
> +
> +void devlink_simple_fw_reporter_prepare(struct device *dev)
> +{
> +	struct devlink_simple_fw_reporter *simple_devlink;
> +	struct devlink *devlink;
> +
> +	devlink = devlink_alloc(&simple_devlink_ops,
> +				sizeof(struct devlink_simple_fw_reporter));
> +	if (!devlink)
> +		return;
> +
> +	if (devlink_register(devlink, dev))
> +		goto err_free;
> +
> +	simple_devlink = devlink_priv(devlink);
> +	simple_devlink->reporter =
> +		devlink_health_reporter_create(devlink, &simple_devlink_health,
> +					       0, NULL);
> +	if (IS_ERR(simple_devlink->reporter))
> +		goto err_unregister;
> +
> +	mutex_lock(&devlink_simple_fw_reporters_mutex);
> +	list_add_tail(&simple_devlink->list, &devlink_simple_fw_reporters);
> +	mutex_unlock(&devlink_simple_fw_reporters_mutex);
> +
> +	return;
> +
> +err_unregister:
> +	devlink_unregister(devlink);
> +err_free:
> +	devlink_free(devlink);
> +}
> +EXPORT_SYMBOL_GPL(devlink_simple_fw_reporter_prepare);
> +
> +void devlink_simple_fw_reporter_cleanup(struct device *dev)
> +{
> +	struct devlink_simple_fw_reporter *simple_devlink;
> +	struct devlink *devlink;
> +
> +	simple_devlink = devlink_simple_fw_reporter_find_for_dev(dev);
> +	if (!simple_devlink)
> +		return;
> +
> +	devlink = priv_to_devlink(simple_devlink);
> +	devlink_health_reporter_destroy(simple_devlink->reporter);
> +	devlink_unregister(devlink);
> +	devlink_free(devlink);
> +}
> +EXPORT_SYMBOL_GPL(devlink_simple_fw_reporter_cleanup);
> -- 
> 2.25.4
>
Jakub Kicinski May 22, 2020, 5:17 p.m. UTC | #2
On Fri, 22 May 2020 05:20:46 +0000 Luis Chamberlain wrote:
> > diff --git a/net/core/Makefile b/net/core/Makefile
> > index 3e2c378e5f31..6f1513781c17 100644
> > --- a/net/core/Makefile
> > +++ b/net/core/Makefile
> > @@ -31,7 +31,7 @@ obj-$(CONFIG_LWTUNNEL_BPF) += lwt_bpf.o
> >  obj-$(CONFIG_BPF_STREAM_PARSER) += sock_map.o
> >  obj-$(CONFIG_DST_CACHE) += dst_cache.o
> >  obj-$(CONFIG_HWBM) += hwbm.o
> > -obj-$(CONFIG_NET_DEVLINK) += devlink.o
> > +obj-$(CONFIG_NET_DEVLINK) += devlink.o devlink_simple_fw_reporter.o  
> 
> This was looking super sexy up to here. This is networking specific.
> We want something generic for *anything* that requests firmware.

You can't be serious. It's network specific because of how the Kconfig
is named?

Working for a company operating large data centers I would strongly
prefer if we didn't have ten different ways of reporting firmware
problems in the fleet.

> I'm afraid this won't work for something generic. I don't think its
> throw-away work though, the idea to provide a generic interface to
> dump firmware through netlink might be nice for networking, or other
> things.
> 
> But I have a feeling we'll want something still more generic than this.

Please be specific. Saying generic a lot is not helpful. The code (as
you can see in this patch) is in no way network specific. Or are you
saying there are machines out there running without netlink sockets?

> So networking may want to be aware that a firmware crash happened as
> part of this network device health thing, but firmware crashing is a
> generic thing.
> 
> I have now extended my patch set to include uvents and I am more set on
> that we need the taint now more than ever.

Please expect my nack if you're trying to add this to networking
drivers.

The irony is you have a problem with a networking device and all the
devices your initial set touched are networking. Two of the drivers 
you touched either have or will soon have devlink health reporters
implemented.
Johannes Berg May 22, 2020, 8:46 p.m. UTC | #3
On Fri, 2020-05-22 at 10:17 -0700, Jakub Kicinski wrote:

> > > --- a/net/core/Makefile
> > > +++ b/net/core/Makefile
> > > @@ -31,7 +31,7 @@ obj-$(CONFIG_LWTUNNEL_BPF) += lwt_bpf.o
> > >  obj-$(CONFIG_BPF_STREAM_PARSER) += sock_map.o
> > >  obj-$(CONFIG_DST_CACHE) += dst_cache.o
> > >  obj-$(CONFIG_HWBM) += hwbm.o
> > > -obj-$(CONFIG_NET_DEVLINK) += devlink.o
> > > +obj-$(CONFIG_NET_DEVLINK) += devlink.o devlink_simple_fw_reporter.o  
> > 
> > This was looking super sexy up to here. This is networking specific.
> > We want something generic for *anything* that requests firmware.
> 
> You can't be serious. It's network specific because of how the Kconfig
> is named?

Wait, yeah, what?

> Working for a company operating large data centers I would strongly
> prefer if we didn't have ten different ways of reporting firmware
> problems in the fleet.

Agree. I don't actually operate anything, but still ...

Thinking about this - maybe there's a way to still combine devcoredump
and devlink somehow?

Or (optionally) make devlink trigger devcoredump while userspace
migrates?

> > So networking may want to be aware that a firmware crash happened as
> > part of this network device health thing, but firmware crashing is a
> > generic thing.
> > 
> > I have now extended my patch set to include uvents and I am more set on
> > that we need the taint now more than ever.

FWIW, I still completely disagree on that taint. You (Luis) obviously
have been running into a bug in that driver, I doubt the firmware
actually managed to wedge the hardware.

But even if it did, that's still not really a kernel taint. The kernel
itself isn't in any way affected by this.

Yes, the system is in a weird state now. But that's *not* equivalent to
"kernel tainted".

> The irony is you have a problem with a networking device and all the
> devices your initial set touched are networking. Two of the drivers 
> you touched either have or will soon have devlink health reporters
> implemented.

Like I said above, do you think it'd be feasible to make a devcoredump
out of devlink health reports? And can the report be in a way that we
control the file format, or are there limits? I guess I should read the
code to find out, but I figure you probably just know. But feel free to
tell me to read it :)

The reason I'm asking is that it's starting to sound like we really
ought to be implementing devlink, but we've got a bunch of
infrastructure that uses the devcoredump, and it'll take time
(significantly so) to change all that...

johannes
Luis Chamberlain May 22, 2020, 9:49 p.m. UTC | #4
On Fri, May 22, 2020 at 10:17:38AM -0700, Jakub Kicinski wrote:
> On Fri, 22 May 2020 05:20:46 +0000 Luis Chamberlain wrote:
> > > diff --git a/net/core/Makefile b/net/core/Makefile
> > > index 3e2c378e5f31..6f1513781c17 100644
> > > --- a/net/core/Makefile
> > > +++ b/net/core/Makefile
> > > @@ -31,7 +31,7 @@ obj-$(CONFIG_LWTUNNEL_BPF) += lwt_bpf.o
> > >  obj-$(CONFIG_BPF_STREAM_PARSER) += sock_map.o
> > >  obj-$(CONFIG_DST_CACHE) += dst_cache.o
> > >  obj-$(CONFIG_HWBM) += hwbm.o
> > > -obj-$(CONFIG_NET_DEVLINK) += devlink.o
> > > +obj-$(CONFIG_NET_DEVLINK) += devlink.o devlink_simple_fw_reporter.o  
> > 
> > This was looking super sexy up to here. This is networking specific.
> > We want something generic for *anything* that requests firmware.
> 
> You can't be serious. It's network specific because of how the Kconfig
> is named?

Kconfig? What has that to do with anything? The issue I have is that the
solution I am looking for is for it to be agnostic to the subsystem. I
have found similar firmware crashes on gpu, media, scsci.

> Working for a company operating large data centers I would strongly
> prefer if we didn't have ten different ways of reporting firmware
> problems in the fleet.

Indeed.

> > I'm afraid this won't work for something generic. I don't think its
> > throw-away work though, the idea to provide a generic interface to
> > dump firmware through netlink might be nice for networking, or other
> > things.
> > 
> > But I have a feeling we'll want something still more generic than this.
> 
> Please be specific. Saying generic a lot is not helpful. The code (as
> you can see in this patch) is in no way network specific. Or are you
> saying there are machines out there running without netlink sockets?

No, I am saying I want something to work with any struct device.

> > So networking may want to be aware that a firmware crash happened as
> > part of this network device health thing, but firmware crashing is a
> > generic thing.
> > 
> > I have now extended my patch set to include uvents and I am more set on
> > that we need the taint now more than ever.
> 
> Please expect my nack if you're trying to add this to networking
> drivers.

The uevent mechanism is not for networking.

The taint however is, and I'd like to undertand how it is you do not see
that an undesirable requirement for a reboot is a clear case for a taint.

> The irony is you have a problem with a networking device and all the
> devices your initial set touched are networking. Two of the drivers 
> you touched either have or will soon have devlink health reporters
> implemented.

That is all great, and I don't think its a bad idea to add
infrastructure / extend it to get more information about a firmware
crash dump. However, suggesting that devlink is the only solution we
need in the kernel without considering other subsystems is what I am
suggesting doesn't suit my needs. Networking was just the first
subsystem I am taclking now but I have patches where similar situations
happen across the kernel.

  Luis
Luis Chamberlain May 22, 2020, 9:51 p.m. UTC | #5
On Fri, May 22, 2020 at 10:46:07PM +0200, Johannes Berg wrote:
> FWIW, I still completely disagree on that taint. You (Luis) obviously
> have been running into a bug in that driver, I doubt the firmware
> actually managed to wedge the hardware.

This hasn't happened just once, its happed many times sporadically now,
once a week or two weeks I'd say. And the system isn't being moved
around.

> But even if it did, that's still not really a kernel taint. The kernel
> itself isn't in any way affected by this.

Of course it is, a full reboot is required.

> Yes, the system is in a weird state now. But that's *not* equivalent to
> "kernel tainted".

Requiring a full reboot is a dire situation to be in, and loosing
connectivity to the point this is not recoverable likewise.

You guys are making out a taint to be the end of the world. We have a
taint even for a kernel warning, and as others have mentioned mac80211
already produces these.

What exactly is the opposition to a taint to clarify that a device
firmware has crashed and your system requires a full reboot?

  Luis
Steve deRosier May 22, 2020, 11:23 p.m. UTC | #6
On Fri, May 22, 2020 at 2:51 PM Luis Chamberlain <mcgrof@kernel.org> wrote:
>
> On Fri, May 22, 2020 at 10:46:07PM +0200, Johannes Berg wrote:
> > FWIW, I still completely disagree on that taint. You (Luis) obviously
> > have been running into a bug in that driver, I doubt the firmware
> > actually managed to wedge the hardware.
>
> This hasn't happened just once, its happed many times sporadically now,
> once a week or two weeks I'd say. And the system isn't being moved
> around.
>
> > But even if it did, that's still not really a kernel taint. The kernel
> > itself isn't in any way affected by this.
>
> Of course it is, a full reboot is required.
>
> > Yes, the system is in a weird state now. But that's *not* equivalent to
> > "kernel tainted".
>
> Requiring a full reboot is a dire situation to be in, and loosing
> connectivity to the point this is not recoverable likewise.
>
> You guys are making out a taint to be the end of the world. We have a
> taint even for a kernel warning, and as others have mentioned mac80211
> already produces these.
>

I had to go RTFM re: kernel taints because it has been a very long
time since I looked at them. It had always seemed to me that most were
caused by "kernel-unfriendly" user actions.  The most famous of course
is loading proprietary modules, out-of-tree modules, forced module
loads, etc...  Honestly, I had forgotten the large variety of uses of
the taint flags. For anyone who hasn't looked at taints recently, I
recommend: https://www.kernel.org/doc/html/latest/admin-guide/tainted-kernels.html

In light of this I don't object to setting a taint on this anymore.
I'm a little uneasy, but I've softened on it now, and now I feel it
depends on implementation.

Specifically, I don't think we should set a taint flag when a driver
easily handles a routine firmware crash and is confident that things
have come up just fine again. In other words, triggering the taint in
every driver module where it spits out a log comment that it had a
firmware crash and had to recover seems too much. Sure, firmware
shouldn't crash, sure it should be open source so we can fix it,
whatever... those sort of wishful comments simply ignore reality and
our ability to affect effective change. A lot of WiFi firmware crashes
and for well-known cases the drivers handle them well. And in some
cases, not so well and that should be a place the driver should detect
and thus raise a red flag.  If a WiFi firmware crash can bring down
the kernel, there's either a major driver bug or some very funky
hardware crap going on. That sort of thing we should be able to
detect, mark with a taint (or something), and fix if within our sphere
of influence. I guess what it comes down to me is how aggressive we
are about setting the flag.

I would like there to be a single solution, or a minimized set
depending on what makes sense for the requirements. I haven't had time
to look into the alternatives mentioned here so I don't have an
informed opinion about the solution. I do think Luis is trying to
solve a real problem though. Can we look at this from the point of
view of what are the requirements?  What is it we're trying to solve?

I _think_ that the goal of Luis's original proposal is to report up to
the user, at some future point when the user is interested (because
something super drastic just occured, but long after the fw crash),
that there was a firmware crash without the user having to grep
through all logs on the machine. And then if the user sees that flag
and suspects it, then they can bother to find it in the logs or do
more drastic debugging steps like finding the fw crash in the log and
pulling firmware crash dumps, etc.

I think the various alternate solutions are great but perhaps solving
a superset of features (like adding in user-space notifications etc)?
Perhaps different people on these related threads are trying to solve
different problems?


- Steve
Luis Chamberlain May 22, 2020, 11:44 p.m. UTC | #7
On Fri, May 22, 2020 at 04:23:55PM -0700, Steve deRosier wrote:
> Specifically, I don't think we should set a taint flag when a driver
> easily handles a routine firmware crash and is confident that things
> have come up just fine again. In other words, triggering the taint in
> every driver module where it spits out a log comment that it had a
> firmware crash and had to recover seems too much. Sure, firmware
> shouldn't crash, sure it should be open source so we can fix it,
> whatever... those sort of wishful comments simply ignore reality and
> our ability to affect effective change. A lot of WiFi firmware crashes
> and for well-known cases the drivers handle them well. And in some
> cases, not so well and that should be a place the driver should detect
> and thus raise a red flag.  If a WiFi firmware crash can bring down
> the kernel, there's either a major driver bug or some very funky
> hardware crap going on. That sort of thing we should be able to
> detect, mark with a taint (or something), and fix if within our sphere
> of influence. I guess what it comes down to me is how aggressive we
> are about setting the flag.

Exactly the crux of the issue.

I hope that by now we should all be in agreement that at least a
firmware crash requiring a reboot is something we should record and
inform the user of. A taint seems like a reasonable standard practice
for these sorts of things.

> I would like there to be a single solution, or a minimized set
> depending on what makes sense for the requirements. I haven't had time
> to look into the alternatives mentioned here so I don't have an
> informed opinion about the solution. I do think Luis is trying to
> solve a real problem though. Can we look at this from the point of
> view of what are the requirements?  What is it we're trying to solve?
> 
> I _think_ that the goal of Luis's original proposal is to report up to
> the user, at some future point when the user is interested (because
> something super drastic just occured, but long after the fw crash),
> that there was a firmware crash without the user having to grep
> through all logs on the machine. And then if the user sees that flag
> and suspects it, then they can bother to find it in the logs or do
> more drastic debugging steps like finding the fw crash in the log and
> pulling firmware crash dumps, etc.

Yes, that's exactly it. Not all users are clueful to inspect logs. I now
have a generic uevent mechanism drafted which sends a uevent for *any*
taint. So that is, it does not even depend on this series. But it
accomplishes the goal of informing the user of taints.

> I think the various alternate solutions are great but perhaps solving
> a superset of features (like adding in user-space notifications etc)?
> Perhaps different people on these related threads are trying to solve
> different problems?

The uevent mechanism I implemented (but not yet posted for review) at
least sends out a smoke signal. I think that if each subsystem wants
to expand on this with dumping facilities that is great too!

  Luis
Andy Shevchenko May 25, 2020, 9:07 a.m. UTC | #8
On Fri, May 22, 2020 at 04:23:55PM -0700, Steve deRosier wrote:
> On Fri, May 22, 2020 at 2:51 PM Luis Chamberlain <mcgrof@kernel.org> wrote:

> I had to go RTFM re: kernel taints because it has been a very long
> time since I looked at them. It had always seemed to me that most were
> caused by "kernel-unfriendly" user actions.  The most famous of course
> is loading proprietary modules, out-of-tree modules, forced module
> loads, etc...  Honestly, I had forgotten the large variety of uses of
> the taint flags. For anyone who hasn't looked at taints recently, I
> recommend: https://www.kernel.org/doc/html/latest/admin-guide/tainted-kernels.html
> 
> In light of this I don't object to setting a taint on this anymore.
> I'm a little uneasy, but I've softened on it now, and now I feel it
> depends on implementation.
> 
> Specifically, I don't think we should set a taint flag when a driver
> easily handles a routine firmware crash and is confident that things
> have come up just fine again. In other words, triggering the taint in
> every driver module where it spits out a log comment that it had a
> firmware crash and had to recover seems too much. Sure, firmware
> shouldn't crash, sure it should be open source so we can fix it,
> whatever...

While it may sound idealistic the firmware for the end-user, and even for mere
kernel developer like me, is a complete blackbox which has more access than
root user in the kernel. We have tons of firmwares and each of them potentially
dangerous beast. As a user I really care about my data and privacy (hacker can
oops a firmware in order to set a specific vector attack). So, tainting kernel
is _a least_ we can do there, the strict rules would be to reboot immediately.

> those sort of wishful comments simply ignore reality and
> our ability to affect effective change.

We can encourage users not to buy cheap crap for the starter.

> A lot of WiFi firmware crashes
> and for well-known cases the drivers handle them well. And in some
> cases, not so well and that should be a place the driver should detect
> and thus raise a red flag.  If a WiFi firmware crash can bring down
> the kernel, there's either a major driver bug or some very funky
> hardware crap going on. That sort of thing we should be able to
> detect, mark with a taint (or something), and fix if within our sphere
> of influence. I guess what it comes down to me is how aggressive we
> are about setting the flag.
Ben Greear May 25, 2020, 5:08 p.m. UTC | #9
On 05/25/2020 02:07 AM, Andy Shevchenko wrote:
> On Fri, May 22, 2020 at 04:23:55PM -0700, Steve deRosier wrote:
>> On Fri, May 22, 2020 at 2:51 PM Luis Chamberlain <mcgrof@kernel.org> wrote:
>
>> I had to go RTFM re: kernel taints because it has been a very long
>> time since I looked at them. It had always seemed to me that most were
>> caused by "kernel-unfriendly" user actions.  The most famous of course
>> is loading proprietary modules, out-of-tree modules, forced module
>> loads, etc...  Honestly, I had forgotten the large variety of uses of
>> the taint flags. For anyone who hasn't looked at taints recently, I
>> recommend: https://www.kernel.org/doc/html/latest/admin-guide/tainted-kernels.html
>>
>> In light of this I don't object to setting a taint on this anymore.
>> I'm a little uneasy, but I've softened on it now, and now I feel it
>> depends on implementation.
>>
>> Specifically, I don't think we should set a taint flag when a driver
>> easily handles a routine firmware crash and is confident that things
>> have come up just fine again. In other words, triggering the taint in
>> every driver module where it spits out a log comment that it had a
>> firmware crash and had to recover seems too much. Sure, firmware
>> shouldn't crash, sure it should be open source so we can fix it,
>> whatever...
>
> While it may sound idealistic the firmware for the end-user, and even for mere
> kernel developer like me, is a complete blackbox which has more access than
> root user in the kernel. We have tons of firmwares and each of them potentially
> dangerous beast. As a user I really care about my data and privacy (hacker can
> oops a firmware in order to set a specific vector attack). So, tainting kernel
> is _a least_ we can do there, the strict rules would be to reboot immediately.
>
>> those sort of wishful comments simply ignore reality and
>> our ability to affect effective change.
>
> We can encourage users not to buy cheap crap for the starter.

There is no stable wifi firmware for any price.

There is also no obvious feedback from even name-brand NICs like ath10k or AX200
when you report a crash.

That said, at least in my experience with ath10k-ct, the OS normally recovers fine
from firmware crashes.  ath10k already reports full crash reports on udev, so
easy for user-space to notice and report bug reports upstream if it cares to.  Probably
other NICs do the same, and if not, they certainly could.

Thanks,
Ben
Jakub Kicinski May 25, 2020, 8:57 p.m. UTC | #10
On Fri, 22 May 2020 22:46:07 +0200 Johannes Berg wrote:
> > The irony is you have a problem with a networking device and all the
> > devices your initial set touched are networking. Two of the drivers 
> > you touched either have or will soon have devlink health reporters
> > implemented.  
> 
> Like I said above, do you think it'd be feasible to make a devcoredump
> out of devlink health reports? And can the report be in a way that we
> control the file format, or are there limits? I guess I should read the
> code to find out, but I figure you probably just know. But feel free to
> tell me to read it :)
> 
> The reason I'm asking is that it's starting to sound like we really
> ought to be implementing devlink, but we've got a bunch of
> infrastructure that uses the devcoredump, and it'll take time
> (significantly so) to change all that...

In devlink world pure FW core dumps are exposed by devlink regions.
An API allowing reading device memory, registers, etc., but also 
creating dumps of memory regions when things go wrong. It should be
a fairly straightforward migration.

Devlink health is more targeted, the dump is supposed to contain only
relevant information, selected and formatted by the driver. When device
misbehaves driver reads the relevant registers and FW state and creates
a formatted state dump. I believe each element of the dump must fit
into a netlink message (but there may be multiple elements, see
devlink_fmsg_prepare_skb()).

We should be able to convert dl-regions dumps and dl-health dumps into
devcoredumps, but since health reporters are supposed to be more
targeted there's usually multiple of them per device.

Conversely devcoredumps can be trivially exposed as dl-region dumps,
but I believe dl-health would require driver changes to format the
information appropriately.
Johannes Berg July 30, 2020, 1:56 p.m. UTC | #11
Hi,

Sorry ... long delay.

> > The reason I'm asking is that it's starting to sound like we really
> > ought to be implementing devlink, but we've got a bunch of
> > infrastructure that uses the devcoredump, and it'll take time
> > (significantly so) to change all that...
> 
> In devlink world pure FW core dumps are exposed by devlink regions.
> An API allowing reading device memory, registers, etc., but also 
> creating dumps of memory regions when things go wrong. It should be
> a fairly straightforward migration.

Right. We also have regions (various memory pieces, registers, ...).

One issue might be that for devlink we wouldn't want to expose these as
a single blob, I guess, but for devcoredump we already have a custom
format to glue all the things together. Since it seems unlikely that
anyone else would want to use the *iwlwifi* format to glue things
together, we'd have to do something there.

But perhaps that could be a matter of providing a "glue things into a
devcoredump" function that would have a reasonable default but could be
overridden by the driver for those migration cases.

> Devlink health is more targeted, the dump is supposed to contain only
> relevant information, selected and formatted by the driver. When device
> misbehaves driver reads the relevant registers and FW state and creates
> a formatted state dump. I believe each element of the dump must fit
> into a netlink message (but there may be multiple elements, see
> devlink_fmsg_prepare_skb()).

That wouldn't help for our big memory dumps, but OK.

> We should be able to convert dl-regions dumps and dl-health dumps into
> devcoredumps, but since health reporters are supposed to be more
> targeted there's usually multiple of them per device.

Right.

> Conversely devcoredumps can be trivially exposed as dl-region dumps,
> but I believe dl-health would require driver changes to format the
> information appropriately.

Agree.

Anyway, thanks. I'll put it on my list of things to look at ... not too
hopeful that will be soon, given how long it even took me to get back to
this email :)

johannes
diff mbox series

Patch

diff --git a/include/linux/devlink.h b/include/linux/devlink.h
new file mode 100644
index 000000000000..2b73987eefca
--- /dev/null
+++ b/include/linux/devlink.h
@@ -0,0 +1,11 @@ 
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+#ifndef _LINUX_DEVLINK_H_
+#define _LINUX_DEVLINK_H_
+
+struct device;
+
+void devlink_simple_fw_reporter_prepare(struct device *dev);
+void devlink_simple_fw_reporter_cleanup(struct device *dev);
+void devlink_simple_fw_reporter_report_crash(struct device *dev);
+
+#endif
diff --git a/net/core/Makefile b/net/core/Makefile
index 3e2c378e5f31..6f1513781c17 100644
--- a/net/core/Makefile
+++ b/net/core/Makefile
@@ -31,7 +31,7 @@  obj-$(CONFIG_LWTUNNEL_BPF) += lwt_bpf.o
 obj-$(CONFIG_BPF_STREAM_PARSER) += sock_map.o
 obj-$(CONFIG_DST_CACHE) += dst_cache.o
 obj-$(CONFIG_HWBM) += hwbm.o
-obj-$(CONFIG_NET_DEVLINK) += devlink.o
+obj-$(CONFIG_NET_DEVLINK) += devlink.o devlink_simple_fw_reporter.o
 obj-$(CONFIG_GRO_CELLS) += gro_cells.o
 obj-$(CONFIG_FAILOVER) += failover.o
 obj-$(CONFIG_BPF_SYSCALL) += bpf_sk_storage.o
diff --git a/net/core/devlink_simple_fw_reporter.c b/net/core/devlink_simple_fw_reporter.c
new file mode 100644
index 000000000000..48dde9123c3c
--- /dev/null
+++ b/net/core/devlink_simple_fw_reporter.c
@@ -0,0 +1,101 @@ 
+#include <linux/devlink.h>
+#include <linux/list.h>
+#include <linux/mutex.h>
+#include <net/devlink.h>
+
+struct devlink_simple_fw_reporter {
+	struct list_head list;
+	struct devlink_health_reporter *reporter;
+};
+
+
+static LIST_HEAD(devlink_simple_fw_reporters);
+static DEFINE_MUTEX(devlink_simple_fw_reporters_mutex);
+
+static const struct devlink_health_reporter_ops simple_devlink_health = {
+	.name = "fw",
+};
+
+static const struct devlink_ops simple_devlink_ops = {
+};
+
+static struct devlink_simple_fw_reporter *
+devlink_simple_fw_reporter_find_for_dev(struct device *dev)
+{
+	struct devlink_simple_fw_reporter *simple_devlink, *ret = NULL;
+	struct devlink *devlink;
+
+	mutex_lock(&devlink_simple_fw_reporters_mutex);
+	list_for_each_entry(simple_devlink, &devlink_simple_fw_reporters,
+			    list) {
+		devlink = priv_to_devlink(simple_devlink);
+		if (devlink->dev == dev) {
+			ret = simple_devlink;
+			break;
+		}
+	}
+	mutex_unlock(&devlink_simple_fw_reporters_mutex);
+
+	return ret;
+}
+
+void devlink_simple_fw_reporter_report_crash(struct device *dev)
+{
+	struct devlink_simple_fw_reporter *simple_devlink;
+
+	simple_devlink = devlink_simple_fw_reporter_find_for_dev(dev);
+	if (!simple_devlink)
+		return;
+
+	devlink_health_report(simple_devlink->reporter, "firmware crash", NULL);
+}
+EXPORT_SYMBOL_GPL(devlink_simple_fw_reporter_report_crash);
+
+void devlink_simple_fw_reporter_prepare(struct device *dev)
+{
+	struct devlink_simple_fw_reporter *simple_devlink;
+	struct devlink *devlink;
+
+	devlink = devlink_alloc(&simple_devlink_ops,
+				sizeof(struct devlink_simple_fw_reporter));
+	if (!devlink)
+		return;
+
+	if (devlink_register(devlink, dev))
+		goto err_free;
+
+	simple_devlink = devlink_priv(devlink);
+	simple_devlink->reporter =
+		devlink_health_reporter_create(devlink, &simple_devlink_health,
+					       0, NULL);
+	if (IS_ERR(simple_devlink->reporter))
+		goto err_unregister;
+
+	mutex_lock(&devlink_simple_fw_reporters_mutex);
+	list_add_tail(&simple_devlink->list, &devlink_simple_fw_reporters);
+	mutex_unlock(&devlink_simple_fw_reporters_mutex);
+
+	return;
+
+err_unregister:
+	devlink_unregister(devlink);
+err_free:
+	devlink_free(devlink);
+}
+EXPORT_SYMBOL_GPL(devlink_simple_fw_reporter_prepare);
+
+void devlink_simple_fw_reporter_cleanup(struct device *dev)
+{
+	struct devlink_simple_fw_reporter *simple_devlink;
+	struct devlink *devlink;
+
+	simple_devlink = devlink_simple_fw_reporter_find_for_dev(dev);
+	if (!simple_devlink)
+		return;
+
+	devlink = priv_to_devlink(simple_devlink);
+	devlink_health_reporter_destroy(simple_devlink->reporter);
+	devlink_unregister(devlink);
+	devlink_free(devlink);
+}
+EXPORT_SYMBOL_GPL(devlink_simple_fw_reporter_cleanup);