Message ID | d941bb44798b938705ca6944d8ee02c97af7ddd5.1664017836.git.leonro@nvidia.com
---|---
State | Deferred
Delegated to: | Netdev Maintainers
Series | [net-next] net/mlx5: Make ASO poll CQ usable in atomic context
On 24 Sep 14:17, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
>
> Poll CQ functions shouldn't sleep as they are called in atomic context.
> The following splat appears once mlx5_aso_poll_cq() is used in such a
> flow.
>
>   BUG: scheduling while atomic: swapper/17/0/0x00000100
>
> ...
>
> 	/* With newer FW, the wait for the first ASO WQE is more than 2us, put the wait 10ms. */
> -	err = mlx5_aso_poll_cq(aso, true, 10);
> +	expires = jiffies + msecs_to_jiffies(10);
> +	do {
> +		err = mlx5_aso_poll_cq(aso, true);
> +	} while (err && time_is_after_jiffies(expires));
> 	mutex_unlock(&flow_meters->aso_lock);
  	^^^^
A busy poll won't work: this mutex is held, so we can sleep anyway.
Let's discuss internally and solve this by design.
On Sat, Sep 24, 2022 at 10:24:25AM -0700, Saeed Mahameed wrote:
> On 24 Sep 14:17, Leon Romanovsky wrote:
> > From: Leon Romanovsky <leonro@nvidia.com>
> >
> > Poll CQ functions shouldn't sleep as they are called in atomic context.
> > The following splat appears once mlx5_aso_poll_cq() is used in such a
> > flow.
> >
> >   BUG: scheduling while atomic: swapper/17/0/0x00000100
>
> ...
>
> > 	/* With newer FW, the wait for the first ASO WQE is more than 2us, put the wait 10ms. */
> > -	err = mlx5_aso_poll_cq(aso, true, 10);
> > +	expires = jiffies + msecs_to_jiffies(10);
> > +	do {
> > +		err = mlx5_aso_poll_cq(aso, true);
> > +	} while (err && time_is_after_jiffies(expires));
> > 	mutex_unlock(&flow_meters->aso_lock);
>   	^^^^
> A busy poll won't work: this mutex is held, so we can sleep anyway.
> Let's discuss internally and solve this by design.

This is TC code; it doesn't need atomic context and has had the mutex +
sleep from the beginning.

My change removes the busy loop from mlx5_aso_poll_cq() for the IPsec
path, so why do you plan to change the design?

Thanks
On 24 Sep 21:51, Leon Romanovsky wrote:
> On Sat, Sep 24, 2022 at 10:24:25AM -0700, Saeed Mahameed wrote:
> > On 24 Sep 14:17, Leon Romanovsky wrote:
> > > From: Leon Romanovsky <leonro@nvidia.com>
> > >
> > > Poll CQ functions shouldn't sleep as they are called in atomic context.
> > > The following splat appears once mlx5_aso_poll_cq() is used in such a
> > > flow.
> > >
> > >   BUG: scheduling while atomic: swapper/17/0/0x00000100
> >
> > ...
> >
> > > 	/* With newer FW, the wait for the first ASO WQE is more than 2us, put the wait 10ms. */
> > > -	err = mlx5_aso_poll_cq(aso, true, 10);
> > > +	expires = jiffies + msecs_to_jiffies(10);
> > > +	do {
> > > +		err = mlx5_aso_poll_cq(aso, true);
> > > +	} while (err && time_is_after_jiffies(expires));
> > > 	mutex_unlock(&flow_meters->aso_lock);
> >   	^^^^
> > A busy poll won't work: this mutex is held, so we can sleep anyway.
> > Let's discuss internally and solve this by design.
>
> This is TC code; it doesn't need atomic context and has had the mutex +
> sleep from the beginning.

Then let's bring back the sleep for the TC use case; we don't need to
burn the CPU! But the change still has a bigger effect on the other ASO
use cases, see below.

> My change removes the busy loop from mlx5_aso_poll_cq() for the IPsec
> path, so why do you plan to change the design?

A typical use case for ASO is:

	post_aso_wqe();
	poll_aso_cqe();

The HW needs time to consume and execute the WQE and then generate the
CQE. This is why the driver needs to wait a bit when polling for the
CQE; your change will break the other users. Let's discuss internally.

> Thanks
On Sat, Sep 24, 2022 at 01:11:31PM -0700, Saeed Mahameed wrote:
> On 24 Sep 21:51, Leon Romanovsky wrote:
> > On Sat, Sep 24, 2022 at 10:24:25AM -0700, Saeed Mahameed wrote:
> > > On 24 Sep 14:17, Leon Romanovsky wrote:
> > > > From: Leon Romanovsky <leonro@nvidia.com>
> > > >
> > > > Poll CQ functions shouldn't sleep as they are called in atomic context.
> > > > The following splat appears once mlx5_aso_poll_cq() is used in such a
> > > > flow.
> > > >
> > > >   BUG: scheduling while atomic: swapper/17/0/0x00000100
> > >
> > > ...
> > >
> > > > 	/* With newer FW, the wait for the first ASO WQE is more than 2us, put the wait 10ms. */
> > > > -	err = mlx5_aso_poll_cq(aso, true, 10);
> > > > +	expires = jiffies + msecs_to_jiffies(10);
> > > > +	do {
> > > > +		err = mlx5_aso_poll_cq(aso, true);
> > > > +	} while (err && time_is_after_jiffies(expires));
> > > > 	mutex_unlock(&flow_meters->aso_lock);
> > >   	^^^^
> > > A busy poll won't work: this mutex is held, so we can sleep anyway.
> > > Let's discuss internally and solve this by design.
> >
> > This is TC code; it doesn't need atomic context and has had the mutex +
> > sleep from the beginning.
>
> Then let's bring back the sleep for the TC use case; we don't need to
> burn the CPU!

Again, this is exactly how TC was implemented. The difference is that
before it burnt cycles in poll_cq and now it does so inside the TC code.

> But the change still has a bigger effect on the other ASO use cases,
> see below.

I have serious doubts about that.

> > My change removes the busy loop from mlx5_aso_poll_cq() for the IPsec
> > path, so why do you plan to change the design?
>
> A typical use case for ASO is:
>
> 	post_aso_wqe();
> 	poll_aso_cqe();
>
> The HW needs time to consume and execute the WQE and then generate the
> CQE. This is why the driver needs to wait a bit when polling for the
> CQE; your change will break the other users. Let's discuss internally.
IPsec was always implemented without any time delays, and I'm sure that
MACsec doesn't need them either; it is probably copy/paste.

More generally, post_wqe -> poll_cq is a very basic paradigm in RDMA,
and delays were never needed there. But if you insist, let's discuss
internally.

Thanks

> > Thanks
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc/meter.c b/drivers/net/ethernet/mellanox/mlx5/core/en/tc/meter.c
index a53e205f4a89..b9011356299e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc/meter.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc/meter.c
@@ -115,6 +115,7 @@ mlx5e_tc_meter_modify(struct mlx5_core_dev *mdev,
 	struct mlx5e_flow_meters *flow_meters;
 	u8 cir_man, cir_exp, cbs_man, cbs_exp;
 	struct mlx5_aso_wqe *aso_wqe;
+	unsigned long expires;
 	struct mlx5_aso *aso;
 	u64 rate, burst;
 	u8 ds_cnt;
@@ -187,7 +188,10 @@ mlx5e_tc_meter_modify(struct mlx5_core_dev *mdev,
 	mlx5_aso_post_wqe(aso, true, &aso_wqe->ctrl);
 
 	/* With newer FW, the wait for the first ASO WQE is more than 2us, put the wait 10ms. */
-	err = mlx5_aso_poll_cq(aso, true, 10);
+	expires = jiffies + msecs_to_jiffies(10);
+	do {
+		err = mlx5_aso_poll_cq(aso, true);
+	} while (err && time_is_after_jiffies(expires));
 	mutex_unlock(&flow_meters->aso_lock);
 
 	return err;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec.c
index a13169723153..31e1973c3740 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec.c
@@ -1441,7 +1441,7 @@ static int macsec_aso_set_arm_event(struct mlx5_core_dev *mdev, struct mlx5e_mac
 			   MLX5_ACCESS_ASO_OPC_MOD_MACSEC);
 	macsec_aso_build_ctrl(aso, &aso_wqe->aso_ctrl, in);
 	mlx5_aso_post_wqe(maso, false, &aso_wqe->ctrl);
-	err = mlx5_aso_poll_cq(maso, false, 10);
+	err = mlx5_aso_poll_cq(maso, false);
 	mutex_unlock(&aso->aso_lock);
 
 	return err;
@@ -1466,7 +1466,7 @@ static int macsec_aso_query(struct mlx5_core_dev *mdev, struct mlx5e_macsec *mac
 	macsec_aso_build_wqe_ctrl_seg(aso, &aso_wqe->aso_ctrl, NULL);
 	mlx5_aso_post_wqe(maso, false, &aso_wqe->ctrl);
-	err = mlx5_aso_poll_cq(maso, false, 10);
+	err = mlx5_aso_poll_cq(maso, false);
 	if (err)
 		goto err_out;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/aso.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/aso.c
index 21e14507ff5c..baa8092f335e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/aso.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/aso.c
@@ -381,20 +381,12 @@ void mlx5_aso_post_wqe(struct mlx5_aso *aso, bool with_data,
 	WRITE_ONCE(doorbell_cseg, NULL);
 }
 
-int mlx5_aso_poll_cq(struct mlx5_aso *aso, bool with_data, u32 interval_ms)
+int mlx5_aso_poll_cq(struct mlx5_aso *aso, bool with_data)
 {
 	struct mlx5_aso_cq *cq = &aso->cq;
 	struct mlx5_cqe64 *cqe;
-	unsigned long expires;
 
 	cqe = mlx5_cqwq_get_cqe(&cq->wq);
-
-	expires = jiffies + msecs_to_jiffies(interval_ms);
-	while (!cqe && time_is_after_jiffies(expires)) {
-		usleep_range(2, 10);
-		cqe = mlx5_cqwq_get_cqe(&cq->wq);
-	}
-
 	if (!cqe)
 		return -ETIMEDOUT;
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/aso.h b/drivers/net/ethernet/mellanox/mlx5/core/lib/aso.h
index d854e01d7fc5..2d40dcf9d42e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/aso.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/aso.h
@@ -83,7 +83,7 @@ void mlx5_aso_build_wqe(struct mlx5_aso *aso, u8 ds_cnt, u32 obj_id,
 			u32 opc_mode);
 void mlx5_aso_post_wqe(struct mlx5_aso *aso, bool with_data,
 		       struct mlx5_wqe_ctrl_seg *doorbell_cseg);
-int mlx5_aso_poll_cq(struct mlx5_aso *aso, bool with_data, u32 interval_ms);
+int mlx5_aso_poll_cq(struct mlx5_aso *aso, bool with_data);
 struct mlx5_aso *mlx5_aso_create(struct mlx5_core_dev *mdev, u32 pdn);
 void mlx5_aso_destroy(struct mlx5_aso *aso);