Message ID | 20231212005122.2401-14-michael.chan@broadcom.com (mailing list archive) |
---|---|
State | Accepted |
Commit | 056bce63c469ca397e30d16bdbd4408489f089a9 |
Delegated to: | Netdev Maintainers |
Series | bnxt_en: Update for net-next |
Hello Michael, Pavan,

On Mon, Dec 11, 2023 at 04:51:22PM -0800, Michael Chan wrote:
> From: Pavan Chebbi <pavan.chebbi@broadcom.com>
>
> In a busy network, especially with flow control enabled, we may
> experience timestamp query failures fairly regularly. After a while,
> dmesg may be flooded with timestamp query failure error messages.
>
> Silence the error message from the low level hwrm function that
> sends the firmware message. Change netdev_err() to netdev_WARN_ONCE()
> if this FW call ever fails.

This is starting to cause a warning now, which is not ideal, because
this error-now-warning happens quite frequently in Meta's fleet.

At the same time, we want to have our kernels running warninglessly.
Moreover, the call stack displayed by the warning doesn't seem to be
quite useful and does not help to investigate "the problem", I _think_.

Is it OK to move it back to error, something as:

-		netdev_WARN_ONCE(bp->dev,
+		netdev_err_once(bp->dev,
 				"TS query for TX timer failed rc = %x\n", rc);

Thank you

On Wed, Jan 24, 2024 at 3:48 PM Breno Leitao <leitao@debian.org> wrote:
>
> Hello Michael, Pavan,
>
> On Mon, Dec 11, 2023 at 04:51:22PM -0800, Michael Chan wrote:
> > From: Pavan Chebbi <pavan.chebbi@broadcom.com>
> >
> > In a busy network, especially with flow control enabled, we may
> > experience timestamp query failures fairly regularly. After a while,
> > dmesg may be flooded with timestamp query failure error messages.
> >
> > Silence the error message from the low level hwrm function that
> > sends the firmware message. Change netdev_err() to netdev_WARN_ONCE()
> > if this FW call ever fails.
>
> This is starting to cause a warning now, which is not ideal, because
> this error-now-warning happens quite frequently in Meta's fleet.
>
> At the same time, we want to have our kernels running warninglessly.
> Moreover, the call stack displayed by the warning doesn't seem to be
> quite useful and does not help to investigate "the problem", I _think_.
>
> Is it OK to move it back to error, something as:
>
> -		netdev_WARN_ONCE(bp->dev,
> +		netdev_err_once(bp->dev,
>  				"TS query for TX timer failed rc = %x\n", rc);

Hi Breno, I think it is OK to change.
Would you be submitting a patch for this?

>
> Thank you

On Wed, Jan 24, 2024 at 7:35 PM Pavan Chebbi <pavan.chebbi@broadcom.com> wrote:
>
> On Wed, Jan 24, 2024 at 3:48 PM Breno Leitao <leitao@debian.org> wrote:
> >
> > Hello Michael, Pavan,
> >
> > On Mon, Dec 11, 2023 at 04:51:22PM -0800, Michael Chan wrote:
> > > From: Pavan Chebbi <pavan.chebbi@broadcom.com>
> > >
> > > In a busy network, especially with flow control enabled, we may
> > > experience timestamp query failures fairly regularly. After a while,
> > > dmesg may be flooded with timestamp query failure error messages.
> > >
> > > Silence the error message from the low level hwrm function that
> > > sends the firmware message. Change netdev_err() to netdev_WARN_ONCE()
> > > if this FW call ever fails.
> >
> > This is starting to cause a warning now, which is not ideal, because
> > this error-now-warning happens quite frequently in Meta's fleet.
> >
> > At the same time, we want to have our kernels running warninglessly.
> > Moreover, the call stack displayed by the warning doesn't seem to be
> > quite useful and does not help to investigate "the problem", I _think_.
> >
> > Is it OK to move it back to error, something as:
> >
> > -		netdev_WARN_ONCE(bp->dev,
> > +		netdev_err_once(bp->dev,
> >  				"TS query for TX timer failed rc = %x\n", rc);
>
> Hi Breno, I think it is OK to change.
> Would you be submitting a patch for this?
>

Why not netdev_warn_once()? It will just print a message at the
warning level without the stack trace. I think we consider this
condition to be just a warning and not an error. Thanks.

On Thu, Jan 25, 2024 at 09:05:39AM +0530, Pavan Chebbi wrote:
> On Wed, Jan 24, 2024 at 3:48 PM Breno Leitao <leitao@debian.org> wrote:
> >
> > Hello Michael, Pavan,
> >
> > On Mon, Dec 11, 2023 at 04:51:22PM -0800, Michael Chan wrote:
> > > From: Pavan Chebbi <pavan.chebbi@broadcom.com>
> > >
> > > In a busy network, especially with flow control enabled, we may
> > > experience timestamp query failures fairly regularly. After a while,
> > > dmesg may be flooded with timestamp query failure error messages.
> > >
> > > Silence the error message from the low level hwrm function that
> > > sends the firmware message. Change netdev_err() to netdev_WARN_ONCE()
> > > if this FW call ever fails.
> >
> > This is starting to cause a warning now, which is not ideal, because
> > this error-now-warning happens quite frequently in Meta's fleet.
> >
> > At the same time, we want to have our kernels running warninglessly.
> > Moreover, the call stack displayed by the warning doesn't seem to be
> > quite useful and does not help to investigate "the problem", I _think_.
> >
> > Is it OK to move it back to error, something as:
> >
> > -		netdev_WARN_ONCE(bp->dev,
> > +		netdev_err_once(bp->dev,
> >  				"TS query for TX timer failed rc = %x\n", rc);
>
> Hi Breno, I think it is OK to change.
> Would you be submitting a patch for this?

Yes, let me send a patch. I will follow Michael's suggestion and use
netdev_warn_once().

Thanks!

On Wed, Jan 24, 2024 at 08:47:03PM -0800, Michael Chan wrote:
> On Wed, Jan 24, 2024 at 7:35 PM Pavan Chebbi <pavan.chebbi@broadcom.com> wrote:
> >
> > On Wed, Jan 24, 2024 at 3:48 PM Breno Leitao <leitao@debian.org> wrote:
> > >
> > > Hello Michael, Pavan,
> > >
> > > On Mon, Dec 11, 2023 at 04:51:22PM -0800, Michael Chan wrote:
> > > > From: Pavan Chebbi <pavan.chebbi@broadcom.com>
> > > >
> > > > In a busy network, especially with flow control enabled, we may
> > > > experience timestamp query failures fairly regularly. After a while,
> > > > dmesg may be flooded with timestamp query failure error messages.
> > > >
> > > > Silence the error message from the low level hwrm function that
> > > > sends the firmware message. Change netdev_err() to netdev_WARN_ONCE()
> > > > if this FW call ever fails.
> > >
> > > This is starting to cause a warning now, which is not ideal, because
> > > this error-now-warning happens quite frequently in Meta's fleet.
> > >
> > > At the same time, we want to have our kernels running warninglessly.
> > > Moreover, the call stack displayed by the warning doesn't seem to be
> > > quite useful and does not help to investigate "the problem", I _think_.
> > >
> > > Is it OK to move it back to error, something as:
> > >
> > > -		netdev_WARN_ONCE(bp->dev,
> > > +		netdev_err_once(bp->dev,
> > >  				"TS query for TX timer failed rc = %x\n", rc);
> >
> > Hi Breno, I think it is OK to change.
> > Would you be submitting a patch for this?
> >
>
> Why not netdev_warn_once()? It will just print a message at the
> warning level without the stack trace. I think we consider this
> condition to be just a warning and not an error. Thanks.

This is even better. I will send a patch shortly.

Thanks

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.c
index 3d1c36d384c2..adad188e38b8 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.c
@@ -129,7 +129,7 @@ static int bnxt_hwrm_port_ts_query(struct bnxt *bp, u32 flags, u64 *ts)
 	}
 
 	resp = hwrm_req_hold(bp, req);
-	rc = hwrm_req_send(bp, req);
+	rc = hwrm_req_send_silent(bp, req);
 	if (!rc)
 		*ts = le64_to_cpu(resp->ptp_msg_ts);
 	hwrm_req_drop(bp, req);
@@ -684,8 +684,8 @@ static void bnxt_stamp_tx_skb(struct bnxt *bp, struct sk_buff *skb)
 		timestamp.hwtstamp = ns_to_ktime(ns);
 		skb_tstamp_tx(ptp->tx_skb, &timestamp);
 	} else {
-		netdev_err(bp->dev, "TS query for TX timer failed rc = %x\n",
-			   rc);
+		netdev_WARN_ONCE(bp->dev,
+				 "TS query for TX timer failed rc = %x\n", rc);
 	}
 
 	dev_kfree_skb_any(ptp->tx_skb);
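
For readers unfamiliar with the "once" logging variants discussed in this thread, here is a minimal userspace sketch of the idea: a per-call-site static flag lets the first failure print a message and suppresses every later one. It is not the kernel implementation. The names sketch_log(), sketch_warn_once(), sketch_emitted and simulate_ts_query_failures() are hypothetical and exist only for illustration. As noted in the thread, netdev_WARN_ONCE() additionally dumps a stack trace, which this sketch deliberately omits.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical counter so callers can observe how many messages were emitted. */
int sketch_emitted;

/* Hypothetical log sink writing to stderr; a kernel driver would use printk-based helpers. */
void sketch_log(const char *level, const char *fmt, ...)
{
	va_list ap;

	assert(level != NULL && fmt != NULL);
	fprintf(stderr, "[%s] ", level);
	va_start(ap, fmt);
	vfprintf(stderr, fmt, ap);
	va_end(ap);
	sketch_emitted++;
}

/*
 * Print-once helper (illustrative): each expansion owns one static flag,
 * so only the first hit at a given call site emits a message.
 * No stack trace is produced.
 */
#define sketch_warn_once(...)					\
	do {							\
		static bool warned_;				\
		if (!warned_) {					\
			warned_ = true;				\
			sketch_log("warning", __VA_ARGS__);	\
		}						\
	} while (0)

/*
 * Simulate repeated timestamp query failures with return code rc.
 * Returns the number of messages actually emitted during this call.
 */
int simulate_ts_query_failures(int failures, int rc)
{
	int before = sketch_emitted;

	for (int i = 0; i < failures; i++)
		sketch_warn_once("TS query for TX timer failed rc = %x\n", rc);

	return sketch_emitted - before;
}
```

Calling simulate_ts_query_failures() repeatedly emits at most one message in total, because the flag belongs to the call site rather than to the individual call. This is why a "once" variant avoids flooding dmesg on a busy network.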