Message ID | 20230406234143.11318-5-shannon.nelson@amd.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Series | pds_core driver |
On Thu, Apr 06, 2023 at 04:41:33PM -0700, Shannon Nelson wrote:
> Add devlink health reporting on top of our fw watchdog.
>
> Example:
>   # devlink health show pci/0000:2b:00.0 reporter fw
>   pci/0000:2b:00.0:
>     reporter fw
>       state healthy error 0 recover 0
>   # devlink health diagnose pci/0000:2b:00.0 reporter fw
>    Status: healthy State: 1 Generation: 0 Recoveries: 0
>
> Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
> ---
>  .../device_drivers/ethernet/amd/pds_core.rst  | 12 ++++++
>  drivers/net/ethernet/amd/pds_core/Makefile    |  1 +
>  drivers/net/ethernet/amd/pds_core/core.c      |  6 +++
>  drivers/net/ethernet/amd/pds_core/core.h      |  6 +++
>  drivers/net/ethernet/amd/pds_core/devlink.c   | 37 +++++++++++++++++++
>  drivers/net/ethernet/amd/pds_core/main.c      | 22 +++++++++++
>  6 files changed, 84 insertions(+)
>  create mode 100644 drivers/net/ethernet/amd/pds_core/devlink.c

<...>

> +int pdsc_fw_reporter_diagnose(struct devlink_health_reporter *reporter,
> +			      struct devlink_fmsg *fmsg,
> +			      struct netlink_ext_ack *extack)
> +{
> +	struct pdsc *pdsc = devlink_health_reporter_priv(reporter);
> +	int err = 0;
> +
> +	if (test_bit(PDSC_S_FW_DEAD, &pdsc->state))

How is this check protected from race with your health workqueue added
in previous patch?

> +		err = devlink_fmsg_string_pair_put(fmsg, "Status", "dead");
> +	else if (!pdsc_is_fw_good(pdsc))

Same question.

> +		err = devlink_fmsg_string_pair_put(fmsg, "Status", "unhealthy");
> +	else
> +		err = devlink_fmsg_string_pair_put(fmsg, "Status", "healthy");
> +	if (err)
> +		return err;

Thanks
On 4/9/23 4:54 AM, Leon Romanovsky wrote:
>
> On Thu, Apr 06, 2023 at 04:41:33PM -0700, Shannon Nelson wrote:
>> Add devlink health reporting on top of our fw watchdog.
>>
>> Example:
>>   # devlink health show pci/0000:2b:00.0 reporter fw
>>   pci/0000:2b:00.0:
>>     reporter fw
>>       state healthy error 0 recover 0
>>   # devlink health diagnose pci/0000:2b:00.0 reporter fw
>>    Status: healthy State: 1 Generation: 0 Recoveries: 0
>>
>> Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
>> ---
>>  .../device_drivers/ethernet/amd/pds_core.rst  | 12 ++++++
>>  drivers/net/ethernet/amd/pds_core/Makefile    |  1 +
>>  drivers/net/ethernet/amd/pds_core/core.c      |  6 +++
>>  drivers/net/ethernet/amd/pds_core/core.h      |  6 +++
>>  drivers/net/ethernet/amd/pds_core/devlink.c   | 37 +++++++++++++++++++
>>  drivers/net/ethernet/amd/pds_core/main.c      | 22 +++++++++++
>>  6 files changed, 84 insertions(+)
>>  create mode 100644 drivers/net/ethernet/amd/pds_core/devlink.c
>
> <...>
>
>> +int pdsc_fw_reporter_diagnose(struct devlink_health_reporter *reporter,
>> +			      struct devlink_fmsg *fmsg,
>> +			      struct netlink_ext_ack *extack)
>> +{
>> +	struct pdsc *pdsc = devlink_health_reporter_priv(reporter);
>> +	int err = 0;
>> +
>> +	if (test_bit(PDSC_S_FW_DEAD, &pdsc->state))
>
> How is this check protected from race with your health workqueue added
> in previous patch?
>
>> +		err = devlink_fmsg_string_pair_put(fmsg, "Status", "dead");
>> +	else if (!pdsc_is_fw_good(pdsc))
>
> Same question.

Yes, it would be good to wrap these in the config_lock.

>
>> +		err = devlink_fmsg_string_pair_put(fmsg, "Status", "unhealthy");
>> +	else
>> +		err = devlink_fmsg_string_pair_put(fmsg, "Status", "healthy");
>> +	if (err)
>> +		return err;
>
> Thanks
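The locking suggested in the reply above can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the driver's code: `struct fw_model`, `fw_model_update()` and `fw_model_status()` are hypothetical names, a pthread mutex stands in for the driver's `config_lock`, and it is assumed that the health worker changes the firmware state while holding that same lock. The point is that the reader takes the lock once around both checks, so it cannot observe a half-updated state.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the relevant parts of struct pdsc. */
struct fw_model {
	pthread_mutex_t config_lock;	/* models pdsc->config_lock */
	bool fw_dead;			/* models the PDSC_S_FW_DEAD state bit */
	bool fw_good;			/* models the result of pdsc_is_fw_good() */
};

/* Writer side: what the modelled health worker does when FW state changes. */
static void fw_model_update(struct fw_model *m, bool dead, bool good)
{
	assert(m != NULL);

	pthread_mutex_lock(&m->config_lock);
	m->fw_dead = dead;
	m->fw_good = good;
	pthread_mutex_unlock(&m->config_lock);
}

/* Reader side: both checks happen under one lock hold, as in the reply. */
static const char *fw_model_status(struct fw_model *m)
{
	const char *status;

	assert(m != NULL);

	pthread_mutex_lock(&m->config_lock);
	if (m->fw_dead)
		status = "dead";
	else if (!m->fw_good)
		status = "unhealthy";
	else
		status = "healthy";
	pthread_mutex_unlock(&m->config_lock);

	return status;
}
```

In the driver itself the equivalent change would presumably be a `mutex_lock()`/`mutex_unlock()` pair on `pdsc->config_lock` around the status checks in `pdsc_fw_reporter_diagnose()`; whether the later fields should be read under the same lock is left open here.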
diff --git a/Documentation/networking/device_drivers/ethernet/amd/pds_core.rst b/Documentation/networking/device_drivers/ethernet/amd/pds_core.rst
index 58a28b255d37..90b473559bac 100644
--- a/Documentation/networking/device_drivers/ethernet/amd/pds_core.rst
+++ b/Documentation/networking/device_drivers/ethernet/amd/pds_core.rst
@@ -26,6 +26,18 @@ messages such as these::
   pds_core 0000:b6:00.0: 252.048 Gb/s available PCIe bandwidth (16.0 GT/s PCIe x16 link)
   pds_core 0000:b6:00.0: FW: 1.60.0-73
 
+Health Reporters
+================
+
+The driver supports a devlink health reporter for FW status::
+
+  # devlink health show pci/0000:2b:00.0 reporter fw
+  pci/0000:2b:00.0:
+    reporter fw
+      state healthy error 0 recover 0
+  # devlink health diagnose pci/0000:2b:00.0 reporter fw
+   Status: healthy State: 1 Generation: 0 Recoveries: 0
+
 Support
 =======
 
diff --git a/drivers/net/ethernet/amd/pds_core/Makefile b/drivers/net/ethernet/amd/pds_core/Makefile
index 95a6c31e92d2..eaca8557ba66 100644
--- a/drivers/net/ethernet/amd/pds_core/Makefile
+++ b/drivers/net/ethernet/amd/pds_core/Makefile
@@ -4,6 +4,7 @@
 obj-$(CONFIG_PDS_CORE) := pds_core.o
 
 pds_core-y := main.o \
+	      devlink.o \
 	      dev.o \
 	      core.o
 
diff --git a/drivers/net/ethernet/amd/pds_core/core.c b/drivers/net/ethernet/amd/pds_core/core.c
index 701d27471858..52236af6b0e0 100644
--- a/drivers/net/ethernet/amd/pds_core/core.c
+++ b/drivers/net/ethernet/amd/pds_core/core.c
@@ -42,6 +42,8 @@ static void pdsc_fw_down(struct pdsc *pdsc)
 		return;
 	}
 
+	devlink_health_report(pdsc->fw_reporter, "FW down reported", pdsc);
+
 	pdsc_teardown(pdsc, PDSC_TEARDOWN_RECOVERY);
 }
 
@@ -58,6 +60,10 @@ static void pdsc_fw_up(struct pdsc *pdsc)
 	if (err)
 		goto err_out;
 
+	pdsc->fw_recoveries++;
+	devlink_health_reporter_state_update(pdsc->fw_reporter,
+					     DEVLINK_HEALTH_REPORTER_STATE_HEALTHY);
+
 	return;
 
 err_out:
diff --git a/drivers/net/ethernet/amd/pds_core/core.h b/drivers/net/ethernet/amd/pds_core/core.h
index ffc9e01dec31..3758071c94da 100644
--- a/drivers/net/ethernet/amd/pds_core/core.h
+++ b/drivers/net/ethernet/amd/pds_core/core.h
@@ -68,6 +68,8 @@ struct pdsc {
 	struct timer_list wdtimer;
 	unsigned int wdtimer_period;
 	struct work_struct health_work;
+	struct devlink_health_reporter *fw_reporter;
+	u32 fw_recoveries;
 
 	struct pdsc_devinfo dev_info;
 	struct pds_core_dev_identity dev_ident;
@@ -88,6 +90,10 @@ struct pdsc {
 	u64 __iomem *kern_dbpage;
 };
 
+int pdsc_fw_reporter_diagnose(struct devlink_health_reporter *reporter,
+			      struct devlink_fmsg *fmsg,
+			      struct netlink_ext_ack *extack);
+
 #ifdef CONFIG_DEBUG_FS
 void pdsc_debugfs_create(void);
 void pdsc_debugfs_destroy(void);
diff --git a/drivers/net/ethernet/amd/pds_core/devlink.c b/drivers/net/ethernet/amd/pds_core/devlink.c
new file mode 100644
index 000000000000..717fcbf91aee
--- /dev/null
+++ b/drivers/net/ethernet/amd/pds_core/devlink.c
@@ -0,0 +1,37 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 2023 Advanced Micro Devices, Inc */
+
+#include "core.h"
+
+int pdsc_fw_reporter_diagnose(struct devlink_health_reporter *reporter,
+			      struct devlink_fmsg *fmsg,
+			      struct netlink_ext_ack *extack)
+{
+	struct pdsc *pdsc = devlink_health_reporter_priv(reporter);
+	int err = 0;
+
+	if (test_bit(PDSC_S_FW_DEAD, &pdsc->state))
+		err = devlink_fmsg_string_pair_put(fmsg, "Status", "dead");
+	else if (!pdsc_is_fw_good(pdsc))
+		err = devlink_fmsg_string_pair_put(fmsg, "Status", "unhealthy");
+	else
+		err = devlink_fmsg_string_pair_put(fmsg, "Status", "healthy");
+	if (err)
+		return err;
+
+	err = devlink_fmsg_u32_pair_put(fmsg, "State",
+					pdsc->fw_status &
+						~PDS_CORE_FW_STS_F_GENERATION);
+	if (err)
+		return err;
+	err = devlink_fmsg_u32_pair_put(fmsg, "Generation",
+					pdsc->fw_generation >> 4);
+	if (err)
+		return err;
+	err = devlink_fmsg_u32_pair_put(fmsg, "Recoveries",
+					pdsc->fw_recoveries);
+	if (err)
+		return err;
+
+	return 0;
+}
diff --git a/drivers/net/ethernet/amd/pds_core/main.c b/drivers/net/ethernet/amd/pds_core/main.c
index 5032fc199603..82ce180d7b48 100644
--- a/drivers/net/ethernet/amd/pds_core/main.c
+++ b/drivers/net/ethernet/amd/pds_core/main.c
@@ -141,10 +141,16 @@ static int pdsc_init_vf(struct pdsc *vf)
 	return -1;
 }
 
+static const struct devlink_health_reporter_ops pdsc_fw_reporter_ops = {
+	.name = "fw",
+	.diagnose = pdsc_fw_reporter_diagnose,
+};
+
 #define PDSC_WQ_NAME_LEN 24
 
 static int pdsc_init_pf(struct pdsc *pdsc)
 {
+	struct devlink_health_reporter *hr;
 	char wq_name[PDSC_WQ_NAME_LEN];
 	struct devlink *dl;
 	int err;
@@ -183,6 +189,16 @@ static int pdsc_init_pf(struct pdsc *pdsc)
 
 	dl = priv_to_devlink(pdsc);
 	devl_lock(dl);
+
+	hr = devl_health_reporter_create(dl, &pdsc_fw_reporter_ops, 0, pdsc);
+	if (IS_ERR(hr)) {
+		dev_warn(pdsc->dev, "Failed to create fw reporter: %pe\n", hr);
+		err = PTR_ERR(hr);
+		devl_unlock(dl);
+		goto err_out_teardown;
+	}
+	pdsc->fw_reporter = hr;
+
 	devl_register(dl);
 	devl_unlock(dl);
 
@@ -191,6 +207,8 @@ static int pdsc_init_pf(struct pdsc *pdsc)
 
 	return 0;
 
+err_out_teardown:
+	pdsc_teardown(pdsc, PDSC_TEARDOWN_REMOVING);
 err_out_unmap_bars:
 	mutex_unlock(&pdsc->config_lock);
 	del_timer_sync(&pdsc->wdtimer);
@@ -297,6 +315,10 @@ static void pdsc_remove(struct pci_dev *pdev)
 	dl = priv_to_devlink(pdsc);
 	devl_lock(dl);
 	devl_unregister(dl);
+	if (pdsc->fw_reporter) {
+		devl_health_reporter_destroy(pdsc->fw_reporter);
+		pdsc->fw_reporter = NULL;
+	}
 	devl_unlock(dl);
 
 	if (!pdev->is_virtfn) {
Add devlink health reporting on top of our fw watchdog.

Example:
  # devlink health show pci/0000:2b:00.0 reporter fw
  pci/0000:2b:00.0:
    reporter fw
      state healthy error 0 recover 0
  # devlink health diagnose pci/0000:2b:00.0 reporter fw
   Status: healthy State: 1 Generation: 0 Recoveries: 0

Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
---
 .../device_drivers/ethernet/amd/pds_core.rst  | 12 ++++++
 drivers/net/ethernet/amd/pds_core/Makefile    |  1 +
 drivers/net/ethernet/amd/pds_core/core.c      |  6 +++
 drivers/net/ethernet/amd/pds_core/core.h      |  6 +++
 drivers/net/ethernet/amd/pds_core/devlink.c   | 37 +++++++++++++++++++
 drivers/net/ethernet/amd/pds_core/main.c      | 22 +++++++++++
 6 files changed, 84 insertions(+)
 create mode 100644 drivers/net/ethernet/amd/pds_core/devlink.c
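
For readers wondering how the "State" and "Generation" values in the example output relate to the firmware status, here is a small standalone sketch of the masking and shifting that the diagnose callback performs. It rests on assumptions not stated in this patch: that the generation occupies the upper nibble of an 8-bit status value (mask `0xF0`, which is what the `>> 4` in the patch suggests), and that the driver's stored `fw_generation` is the status already masked with that value. `FW_STS_F_GENERATION`, `struct fw_diag` and `fw_decode_status()` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed mask: the upper nibble of the status byte carries the generation. */
#define FW_STS_F_GENERATION 0xF0u

/* Hypothetical holder for the two values the diagnose output reports. */
struct fw_diag {
	uint32_t state;		/* status with the generation bits cleared */
	uint32_t generation;	/* generation bits shifted down to 0..15 */
};

/* Split a raw status byte the same way the diagnose callback reports it. */
static struct fw_diag fw_decode_status(uint8_t fw_status)
{
	struct fw_diag d;

	d.state = fw_status & ~FW_STS_F_GENERATION;
	d.generation = (fw_status & FW_STS_F_GENERATION) >> 4;

	return d;
}
```

Under these assumptions a raw status of `0x01` decodes to State 1 and Generation 0, which matches the example output in the commit message, and a raw status of `0x31` would decode to State 1 and Generation 3.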