diff mbox series

[mlx5-vhost,v2,01/10] net/mlx5: Support throttled commands from async API

Message ID 20240816090159.1967650-2-dtatulea@nvidia.com (mailing list archive)
State New, archived
Headers show
Series vdpa/mlx5: Parallelize device suspend/resume | expand

Commit Message

Dragos Tatulea Aug. 16, 2024, 9:01 a.m. UTC
Currently, commands that qualify as throttled can't be used via the
async API. That's due to the fact that the throttle semaphore can sleep
but the async API can't.

This patch allows throttling in the async API by using the tentative
variant of the semaphore and upon failure (semaphore at 0) returns EBUSY
to signal to the caller that they need to wait for the completion of
previously issued commands.

Furthermore, make sure that the semaphore is released in the callback.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Cc: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 21 ++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

Comments

Dragos Tatulea Sept. 9, 2024, 9:32 a.m. UTC | #1
On 16.08.24 11:01, Dragos Tatulea wrote:
> Currently, commands that qualify as throttled can't be used via the
> async API. That's due to the fact that the throttle semaphore can sleep
> but the async API can't.
> 
> This patch allows throttling in the async API by using the tentative
> variant of the semaphore and upon failure (semaphore at 0) returns EBUSY
> to signal to the caller that they need to wait for the completion of
> previously issued commands.
> 
> Furthermore, make sure that the semaphore is released in the callback.
> 
> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> Cc: Leon Romanovsky <leonro@nvidia.com>
> Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Same reminder as in v1: Tariq is the maintainer for mlx5 so his review
also counts as Acked-by.

Thanks,
Dragos

> ---
>  drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 21 ++++++++++++++-----
>  1 file changed, 16 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
> index 20768ef2e9d2..f69c977c1569 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
> @@ -1882,10 +1882,12 @@ static int cmd_exec(struct mlx5_core_dev *dev, void *in, int in_size, void *out,
>  
>  	throttle_op = mlx5_cmd_is_throttle_opcode(opcode);
>  	if (throttle_op) {
> -		/* atomic context may not sleep */
> -		if (callback)
> -			return -EINVAL;
> -		down(&dev->cmd.vars.throttle_sem);
> +		if (callback) {
> +			if (down_trylock(&dev->cmd.vars.throttle_sem))
> +				return -EBUSY;
> +		} else {
> +			down(&dev->cmd.vars.throttle_sem);
> +		}
>  	}
>  
>  	pages_queue = is_manage_pages(in);
> @@ -2091,10 +2093,19 @@ static void mlx5_cmd_exec_cb_handler(int status, void *_work)
>  {
>  	struct mlx5_async_work *work = _work;
>  	struct mlx5_async_ctx *ctx;
> +	struct mlx5_core_dev *dev;
> +	u16 opcode;
>  
>  	ctx = work->ctx;
> -	status = cmd_status_err(ctx->dev, status, work->opcode, work->op_mod, work->out);
> +	dev = ctx->dev;
> +	opcode = work->opcode;
> +	status = cmd_status_err(dev, status, work->opcode, work->op_mod, work->out);
>  	work->user_callback(status, work);
> +	/* Can't access "work" from this point on. It could have been freed in
> +	 * the callback.
> +	 */
> +	if (mlx5_cmd_is_throttle_opcode(opcode))
> +		up(&dev->cmd.vars.throttle_sem);
>  	if (atomic_dec_and_test(&ctx->num_inflight))
>  		complete(&ctx->inflight_done);
>  }
Eugenio Pérez Sept. 11, 2024, 8 a.m. UTC | #2
On Mon, Sep 9, 2024 at 11:33 AM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>
>
>
> On 16.08.24 11:01, Dragos Tatulea wrote:
> > Currently, commands that qualify as throttled can't be used via the
> > async API. That's due to the fact that the throttle semaphore can sleep
> > but the async API can't.
> >
> > This patch allows throttling in the async API by using the tentative
> > variant of the semaphore and upon failure (semaphore at 0) returns EBUSY
> > to signal to the caller that they need to wait for the completion of
> > previously issued commands.
> >
> > Furthermore, make sure that the semaphore is released in the callback.
> >
> > Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> > Cc: Leon Romanovsky <leonro@nvidia.com>
> > Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
> Same reminder as in v1: Tariq is the maintainer for mlx5 so his review
> also counts as Acked-by.
>

Not sure if it was the case when you send the mail, but this series is
already in the maintainer's branch:
* https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git/
* https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git/commit/?h=vhost&id=691fd851e1bc8ec043798e1ab337305e6291cd6b

Thanks!
Dragos Tatulea Sept. 11, 2024, 5:05 p.m. UTC | #3
On 11.09.24 10:00, Eugenio Perez Martin wrote:
> On Mon, Sep 9, 2024 at 11:33 AM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>>
>>
>>
>> On 16.08.24 11:01, Dragos Tatulea wrote:
>>> Currently, commands that qualify as throttled can't be used via the
>>> async API. That's due to the fact that the throttle semaphore can sleep
>>> but the async API can't.
>>>
>>> This patch allows throttling in the async API by using the tentative
>>> variant of the semaphore and upon failure (semaphore at 0) returns EBUSY
>>> to signal to the caller that they need to wait for the completion of
>>> previously issued commands.
>>>
>>> Furthermore, make sure that the semaphore is released in the callback.
>>>
>>> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
>>> Cc: Leon Romanovsky <leonro@nvidia.com>
>>> Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
>> Same reminder as in v1: Tariq is the maintainer for mlx5 so his review
>> also counts as Acked-by.
>>
> 
> Not sure if it was the case when you send the mail, but this series is
> already in the maintainer's branch:
> * https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git/
> * https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git/commit/?h=vhost&id=691fd851e1bc8ec043798e1ab337305e6291cd6b
It wasn't. Thanks for the notice!

Thanks,
Dragos
diff mbox series

Patch

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
index 20768ef2e9d2..f69c977c1569 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
@@ -1882,10 +1882,12 @@  static int cmd_exec(struct mlx5_core_dev *dev, void *in, int in_size, void *out,
 
 	throttle_op = mlx5_cmd_is_throttle_opcode(opcode);
 	if (throttle_op) {
-		/* atomic context may not sleep */
-		if (callback)
-			return -EINVAL;
-		down(&dev->cmd.vars.throttle_sem);
+		if (callback) {
+			if (down_trylock(&dev->cmd.vars.throttle_sem))
+				return -EBUSY;
+		} else {
+			down(&dev->cmd.vars.throttle_sem);
+		}
 	}
 
 	pages_queue = is_manage_pages(in);
@@ -2091,10 +2093,19 @@  static void mlx5_cmd_exec_cb_handler(int status, void *_work)
 {
 	struct mlx5_async_work *work = _work;
 	struct mlx5_async_ctx *ctx;
+	struct mlx5_core_dev *dev;
+	u16 opcode;
 
 	ctx = work->ctx;
-	status = cmd_status_err(ctx->dev, status, work->opcode, work->op_mod, work->out);
+	dev = ctx->dev;
+	opcode = work->opcode;
+	status = cmd_status_err(dev, status, work->opcode, work->op_mod, work->out);
 	work->user_callback(status, work);
+	/* Can't access "work" from this point on. It could have been freed in
+	 * the callback.
+	 */
+	if (mlx5_cmd_is_throttle_opcode(opcode))
+		up(&dev->cmd.vars.throttle_sem);
 	if (atomic_dec_and_test(&ctx->num_inflight))
 		complete(&ctx->inflight_done);
 }