Message ID | 1722901088-12115-1-git-send-email-longli@linuxonhyperv.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | [net] net: mana: Fix doorbell out of order violation and avoid unnecessary doorbell rings | expand |
> -----Original Message----- > From: longli@linuxonhyperv.com <longli@linuxonhyperv.com> > Sent: Monday, August 5, 2024 7:38 PM > To: KY Srinivasan <kys@microsoft.com>; Haiyang Zhang > <haiyangz@microsoft.com>; Wei Liu <wei.liu@kernel.org>; Dexuan Cui > <decui@microsoft.com>; David S. Miller <davem@davemloft.net>; Eric Dumazet > <edumazet@google.com>; Jakub Kicinski <kuba@kernel.org>; Paolo Abeni > <pabeni@redhat.com>; Shradha Gupta <shradhagupta@linux.microsoft.com>; > Simon Horman <horms@kernel.org>; Konstantin Taranov > <kotaranov@microsoft.com>; Souradeep Chakrabarti > <schakrabarti@linux.microsoft.com>; Erick Archer > <erick.archer@outlook.com>; linux-hyperv@vger.kernel.org; > netdev@vger.kernel.org; linux-kernel@vger.kernel.org; linux- > rdma@vger.kernel.org > Cc: Long Li <longli@microsoft.com>; stable@vger.kernel.org > Subject: [PATCH net] net: mana: Fix doorbell out of order violation and > avoid unnecessary doorbell rings > > From: Long Li <longli@microsoft.com> > > After napi_complete_done() is called, another NAPI may be running on > another CPU and ring the doorbell before the current CPU does. When > combined with unnecessary rings when there is no need to ARM the CQ, this > triggers error paths in the hardware. > > Fix this by always ring the doorbell in sequence and avoid unnecessary > rings. > > Cc: stable@vger.kernel.org > Fixes: e1b5683ff62e ("net: mana: Move NAPI from EQ to CQ") > Signed-off-by: Long Li <longli@microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Thank you.
在 2024/8/6 7:38, longli@linuxonhyperv.com 写道: > From: Long Li <longli@microsoft.com> > > After napi_complete_done() is called, another NAPI may be running on > another CPU and ring the doorbell before the current CPU does. When > combined with unnecessary rings when there is no need to ARM the CQ, this > triggers error paths in the hardware. > > Fix this by always ring the doorbell in sequence and avoid unnecessary > rings. Trivial problem^_^ s/ring/ringing ? Zhu Yanjun > > Cc: stable@vger.kernel.org > Fixes: e1b5683ff62e ("net: mana: Move NAPI from EQ to CQ") > Signed-off-by: Long Li <longli@microsoft.com> > --- > drivers/net/ethernet/microsoft/mana/mana_en.c | 24 ++++++++++++------- > include/net/mana/mana.h | 1 + > 2 files changed, 16 insertions(+), 9 deletions(-) > > diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c > index d2f07e179e86..7d08e23c6749 100644 > --- a/drivers/net/ethernet/microsoft/mana/mana_en.c > +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c > @@ -1788,7 +1788,6 @@ static void mana_poll_rx_cq(struct mana_cq *cq) > static int mana_cq_handler(void *context, struct gdma_queue *gdma_queue) > { > struct mana_cq *cq = context; > - u8 arm_bit; > int w; > > WARN_ON_ONCE(cq->gdma_cq != gdma_queue); > @@ -1799,16 +1798,23 @@ static int mana_cq_handler(void *context, struct gdma_queue *gdma_queue) > mana_poll_tx_cq(cq); > > w = cq->work_done; > - > - if (w < cq->budget && > - napi_complete_done(&cq->napi, w)) { > - arm_bit = SET_ARM_BIT; > - } else { > - arm_bit = 0; > + cq->work_done_since_doorbell += w; > + > + if (w < cq->budget) { > + mana_gd_ring_cq(gdma_queue, SET_ARM_BIT); > + cq->work_done_since_doorbell = 0; > + napi_complete_done(&cq->napi, w); > + } else if (cq->work_done_since_doorbell > > + cq->gdma_cq->queue_size / COMP_ENTRY_SIZE * 4) { > + /* MANA hardware requires at least one doorbell ring every 8 > + * wraparounds of CQ even there is no need to ARM. This driver > + * rings the doorbell as soon as we have execceded 4 > + * wraparounds. > + */ > + mana_gd_ring_cq(gdma_queue, 0); > + cq->work_done_since_doorbell = 0; > } > > - mana_gd_ring_cq(gdma_queue, arm_bit); > - > return w; > } > > diff --git a/include/net/mana/mana.h b/include/net/mana/mana.h > index 6439fd8b437b..7caa334f4888 100644 > --- a/include/net/mana/mana.h > +++ b/include/net/mana/mana.h > @@ -275,6 +275,7 @@ struct mana_cq { > /* NAPI data */ > struct napi_struct napi; > int work_done; > + int work_done_since_doorbell; > int budget; > }; >
> From: longli@linuxonhyperv.com <longli@linuxonhyperv.com> > Sent: Monday, August 5, 2024 4:38 PM > [...] > After napi_complete_done() is called, another NAPI may be running on > another CPU and ring the doorbell before the current CPU does. When Can you please share more details about "another NAPI"? Is it about busy_poll? > combined with unnecessary rings when there is no need to ARM the CQ, this > triggers error paths in the hardware. > > Fix this by always ring the doorbell in sequence and avoid unnecessary > rings. I'm not sure what "error paths in the hardware" means. It's better to describe the user-visible consequence. Maybe this is clearer: When there is no need to arm the CQ from NAPI's perspective, the driver must not combine "too many" arming operations due to a MANA hardware requirement: the driver must ring the doorbell at least once within every 8 wraparounds of the CQ, otherwise "XXX" would happen. //Dexuan: I don't know what the "XXX" is Add a per-CQ counter cq->work_done_since_doorbell, and make sure the CQ is armed within 4 wraparounds of the CQ. //Dexuan: why not 8 or 7? > + if (w < cq->budget) { > + mana_gd_ring_cq(gdma_queue, SET_ARM_BIT); > + cq->work_done_since_doorbell = 0; > + napi_complete_done(&cq->napi, w); > + } else if (cq->work_done_since_doorbell > > + cq->gdma_cq->queue_size / COMP_ENTRY_SIZE * 4) { > + /* MANA hardware requires at least one doorbell ring every 8 s/ring every 8/arming within every 8/ ? > + * wraparounds of CQ even there is no need to ARM. This > driver s/ARM/arming/ ? s/even/even if/ ? Thanks, Dexuan
> Subject: Re: [PATCH net] net: mana: Fix doorbell out of order violation and avoid > unnecessary doorbell rings > > 在 2024/8/6 7:38, longli@linuxonhyperv.com 写道: > > From: Long Li <longli@microsoft.com> > > > > After napi_complete_done() is called, another NAPI may be running on > > another CPU and ring the doorbell before the current CPU does. When > > combined with unnecessary rings when there is no need to ARM the CQ, > > this triggers error paths in the hardware. > > > > Fix this by always ring the doorbell in sequence and avoid unnecessary > > rings. > > Trivial problem^_^ > > s/ring/ringing ? > > Zhu Yanjun I'm sending v2 to fix this. Thanks, Long
> Subject: RE: [PATCH net] net: mana: Fix doorbell out of order violation and avoid > unnecessary doorbell rings > > > From: longli@linuxonhyperv.com <longli@linuxonhyperv.com> > > Sent: Monday, August 5, 2024 4:38 PM > > [...] > > After napi_complete_done() is called, another NAPI may be running on > > another CPU and ring the doorbell before the current CPU does. When > > Can you please share more details about "another NAPI"? Is it about busy_poll? > > > combined with unnecessary rings when there is no need to ARM the CQ, > > this triggers error paths in the hardware. > > > > Fix this by always ring the doorbell in sequence and avoid unnecessary > > rings. > > I'm not sure what "error paths in the hardware" means. It's better to describe the > user-visible consequence. > > Maybe this is clearer: > > When there is no need to arm the CQ from NAPI's perspective, the driver must > not combine "too many" arming operations due to a MANA hardware > requirement: > the driver must ring the doorbell at least once within every 8 wraparounds of the > CQ, otherwise "XXX" would happen. //Dexuan: I don't know what the "XXX" is > > Add a per-CQ counter cq->work_done_since_doorbell, and make sure the CQ is > armed within 4 wraparounds of the CQ. //Dexuan: why not 8 or 7? I'm sending v2 to address the details in the comments. > > > > + if (w < cq->budget) { > > + mana_gd_ring_cq(gdma_queue, SET_ARM_BIT); > > + cq->work_done_since_doorbell = 0; > > + napi_complete_done(&cq->napi, w); > > + } else if (cq->work_done_since_doorbell > > > + cq->gdma_cq->queue_size / COMP_ENTRY_SIZE * 4) { > > + /* MANA hardware requires at least one doorbell ring every 8 > s/ring every 8/arming within every 8/ ? > > > + * wraparounds of CQ even there is no need to ARM. This > > driver > > s/ARM/arming/ ? > s/even/even if/ ? Will fix this in v2. Thanks, Long
diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c index d2f07e179e86..7d08e23c6749 100644 --- a/drivers/net/ethernet/microsoft/mana/mana_en.c +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c @@ -1788,7 +1788,6 @@ static void mana_poll_rx_cq(struct mana_cq *cq) static int mana_cq_handler(void *context, struct gdma_queue *gdma_queue) { struct mana_cq *cq = context; - u8 arm_bit; int w; WARN_ON_ONCE(cq->gdma_cq != gdma_queue); @@ -1799,16 +1798,23 @@ static int mana_cq_handler(void *context, struct gdma_queue *gdma_queue) mana_poll_tx_cq(cq); w = cq->work_done; - - if (w < cq->budget && - napi_complete_done(&cq->napi, w)) { - arm_bit = SET_ARM_BIT; - } else { - arm_bit = 0; + cq->work_done_since_doorbell += w; + + if (w < cq->budget) { + mana_gd_ring_cq(gdma_queue, SET_ARM_BIT); + cq->work_done_since_doorbell = 0; + napi_complete_done(&cq->napi, w); + } else if (cq->work_done_since_doorbell > + cq->gdma_cq->queue_size / COMP_ENTRY_SIZE * 4) { + /* MANA hardware requires at least one doorbell ring every 8 + * wraparounds of CQ even there is no need to ARM. This driver + * rings the doorbell as soon as we have execceded 4 + * wraparounds. + */ + mana_gd_ring_cq(gdma_queue, 0); + cq->work_done_since_doorbell = 0; } - mana_gd_ring_cq(gdma_queue, arm_bit); - return w; } diff --git a/include/net/mana/mana.h b/include/net/mana/mana.h index 6439fd8b437b..7caa334f4888 100644 --- a/include/net/mana/mana.h +++ b/include/net/mana/mana.h @@ -275,6 +275,7 @@ struct mana_cq { /* NAPI data */ struct napi_struct napi; int work_done; + int work_done_since_doorbell; int budget; };