[vhost,0/7] vdpa/mlx5: Parallelize device suspend/resume

Message ID 20240802072039.267446-1-dtatulea@nvidia.com (mailing list archive)

Message

Dragos Tatulea Aug. 2, 2024, 7:20 a.m. UTC
This series parallelizes the mlx5_vdpa device suspend and resume
operations through the firmware async API. The purpose is to reduce live
migration downtime.

The series starts with changing the VQ suspend and resume commands
to the async API. After that, the switch is made to issue multiple
commands of the same type in parallel.
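
To illustrate why issuing the commands in parallel helps, here is a
minimal user-space model. It is not driver code. It assumes each command
has a fixed firmware latency and that at most max_in_flight commands may
be outstanding at once (a stand-in for the throttling that patch 1 deals
with). The function names and any latency values fed to it are
hypothetical.

```python
import heapq


def serial_time(latencies_ms):
    """Total time when each command waits for the previous one to complete."""
    return sum(latencies_ms)


def parallel_time(latencies_ms, max_in_flight):
    """Total time when up to max_in_flight commands are outstanding at once."""
    if max_in_flight < 1:
        raise ValueError("max_in_flight must be at least 1")
    in_flight = []  # min-heap of completion times of outstanding commands
    now = 0.0
    for latency in latencies_ms:
        if len(in_flight) == max_in_flight:
            # All slots are busy: wait for the earliest completion.
            now = heapq.heappop(in_flight)
        heapq.heappush(in_flight, now + latency)
    # The batch is done when the last outstanding command completes.
    return max(in_flight, default=0.0)
```

With 32 commands of 1 ms each and 8 slots, the model gives 32 ms when
issued serially and 4 ms when issued in parallel. These inputs are
illustrative only, not measurements from the device.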

Finally, a bonus improvement is thrown in: keep the notifier enabled
during suspend but make it a NOP. Upon resume, make sure that the link
state is forwarded. This shaves off a constant ~30 ms per device.
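
The notifier idea can be sketched as follows. This is a hypothetical
user-space model, not the mlx5_vdpa code: the class name, the read_state
and forward callbacks, and the method names are all assumptions made for
illustration. The handler stays registered across suspend, ignores
events while suspended, and forwards the current link state once on
resume.

```python
class LinkNotifier:
    """Hypothetical event handler that stays registered across suspend."""

    def __init__(self, read_state, forward):
        self._read_state = read_state  # returns the current link state
        self._forward = forward        # delivers a link state to the consumer
        self._suspended = False

    def on_event(self):
        if self._suspended:
            return  # NOP while suspended: nothing is forwarded
        self._forward(self._read_state())

    def suspend(self):
        # No unregistration here; the handler just goes quiet.
        self._suspended = True

    def resume(self):
        self._suspended = False
        # Forward the current state so a change missed during suspend is seen.
        self._forward(self._read_state())
```

The point of the design in this sketch is that suspend and resume only
flip a flag, so neither path pays for tearing down and re-registering
the handler.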

For 1 vDPA device x 32 VQs (16 VQPs), on a large VM (256 GB RAM, 32 CPUs
x 2 threads per core), the improvements are:

+-------------------+--------+--------+-----------+
| operation         | Before | After  | Reduction |
|-------------------+--------+--------+-----------|
| mlx5_vdpa_suspend | 37 ms  | 2.5 ms |     14x   |
| mlx5_vdpa_resume  | 16 ms  | 5 ms   |      3x   |
+-------------------+--------+--------+-----------+
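
For reference, the reduction factors can be recomputed from the Before
and After columns. The snippet below is only arithmetic on the numbers
in the table; the variable and function names are made up for this
example.

```python
# (before_ms, after_ms) taken from the table above.
measurements_ms = {
    "mlx5_vdpa_suspend": (37.0, 2.5),
    "mlx5_vdpa_resume": (16.0, 5.0),
}


def reduction(before_ms, after_ms):
    """Ratio of the old duration to the new one."""
    return before_ms / after_ms


for op, (before, after) in measurements_ms.items():
    factor = reduction(before, after)
    print(f"{op}: {factor:.1f}x, saves {before - after:.1f} ms")
```

This gives 14.8x for suspend and 3.2x for resume, so the table's 14x and
3x appear to be these ratios rounded down.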

Note for the maintainers:
The first patch contains changes for mlx5_core. This must be applied
into the mlx5-vhost tree [0] first. Once this patch is applied on
mlx5-vhost, the change has to be pulled from mlx5-vdpa into the vhost
tree and only then the remaining patches can be applied.

[0] https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git/log/?h=mlx5-vhost

Dragos Tatulea (7):
  net/mlx5: Support throttled commands from async API
  vdpa/mlx5: Introduce error logging function
  vdpa/mlx5: Use async API for vq query command
  vdpa/mlx5: Use async API for vq modify commands
  vdpa/mlx5: Parallelize device suspend
  vdpa/mlx5: Parallelize device resume
  vdpa/mlx5: Keep notifiers during suspend but ignore

 drivers/net/ethernet/mellanox/mlx5/core/cmd.c |  21 +-
 drivers/vdpa/mlx5/core/mlx5_vdpa.h            |   7 +
 drivers/vdpa/mlx5/net/mlx5_vnet.c             | 435 +++++++++++++-----
 3 files changed, 333 insertions(+), 130 deletions(-)

Comments

Michael S. Tsirkin Aug. 2, 2024, 1:14 p.m. UTC | #1
On Fri, Aug 02, 2024 at 10:20:17AM +0300, Dragos Tatulea wrote:
> Note for the maintainers:
> The first patch contains changes for mlx5_core. This must be applied
> into the mlx5-vhost tree [0] first. Once this patch is applied on
> mlx5-vhost, the change has to be pulled from mlx5-vdpa into the vhost
> tree and only then the remaining patches can be applied.

Or the maintainer just acks it and I apply it directly.

Let me know when all this can happen.

Leon Romanovsky Aug. 4, 2024, 8:48 a.m. UTC | #2
On Fri, Aug 02, 2024 at 09:14:28AM -0400, Michael S. Tsirkin wrote:
> On Fri, Aug 02, 2024 at 10:20:17AM +0300, Dragos Tatulea wrote:
> > Note for the maintainers:
> > The first patch contains changes for mlx5_core. This must be applied
> > into the mlx5-vhost tree [0] first. Once this patch is applied on
> > mlx5-vhost, the change has to be pulled from mlx5-vdpa into the vhost
> > tree and only then the remaining patches can be applied.
> 
> Or maintainer just acks it and I apply directly.

We can do it, but there is a potential to create a conflict between your tree
and netdev for the whole cycle, which will be a bit annoying. The easiest way to
avoid this is to have a shared branch, but in August everyone is on vacation, so
it will probably be fine to apply such a patch directly.

Thanks

Michael S. Tsirkin Aug. 4, 2024, 1:39 p.m. UTC | #3
On Sun, Aug 04, 2024 at 11:48:39AM +0300, Leon Romanovsky wrote:
> On Fri, Aug 02, 2024 at 09:14:28AM -0400, Michael S. Tsirkin wrote:
> > On Fri, Aug 02, 2024 at 10:20:17AM +0300, Dragos Tatulea wrote:
> > > Note for the maintainers:
> > > The first patch contains changes for mlx5_core. This must be applied
> > > into the mlx5-vhost tree [0] first. Once this patch is applied on
> > > mlx5-vhost, the change has to be pulled from mlx5-vdpa into the vhost
> > > tree and only then the remaining patches can be applied.
> > 
> > Or maintainer just acks it and I apply directly.
> 
> We can do it, but there is a potential to create a conflict between your tree
> and netdev for whole cycle, which will be a bit annoying. Easiest way to avoid
> this is to have a shared branch, but in august everyone is on vacation, so it
> will be probably fine to apply such patch directly.
> 
> Thanks

We can let Linus do something, it's ok ;)

Leon Romanovsky Aug. 4, 2024, 2:52 p.m. UTC | #4
On Sun, Aug 04, 2024 at 09:39:29AM -0400, Michael S. Tsirkin wrote:
> On Sun, Aug 04, 2024 at 11:48:39AM +0300, Leon Romanovsky wrote:
> > On Fri, Aug 02, 2024 at 09:14:28AM -0400, Michael S. Tsirkin wrote:
> > > On Fri, Aug 02, 2024 at 10:20:17AM +0300, Dragos Tatulea wrote:
> > > > Note for the maintainers:
> > > > The first patch contains changes for mlx5_core. This must be applied
> > > > into the mlx5-vhost tree [0] first. Once this patch is applied on
> > > > mlx5-vhost, the change has to be pulled from mlx5-vdpa into the vhost
> > > > tree and only then the remaining patches can be applied.
> > > 
> > > Or maintainer just acks it and I apply directly.
> > 
> > We can do it, but there is a potential to create a conflict between your tree
> > and netdev for whole cycle, which will be a bit annoying. Easiest way to avoid
> > this is to have a shared branch, but in august everyone is on vacation, so it
> > will be probably fine to apply such patch directly.
> > 
> > Thanks
> 
> We can let Linus do something, it's ok ;)

Right and this is how it was for years - Linus dealt with the conflicts
between RDMA and netdev, until he pushed us to have a shared branch :).

However, in this specific cycle and for this specific change, we probably won't
get any conflicts between various trees.

Thanks

Eugenio Perez Martin Aug. 7, 2024, 1:25 p.m. UTC | #5
On Fri, Aug 2, 2024 at 9:24 AM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>
> This series parallelizes the mlx5_vdpa device suspend and resume
> operations through the firmware async API. The purpose is to reduce live
> migration downtime.
>
> The series starts with changing the VQ suspend and resume commands
> to the async API. After that, the switch is made to issue multiple
> commands of the same type in parallel.
>

There is a missed opportunity to parallelize the CVQ MQ command processing
here, isn't there? It can be applied on top in another series for sure.

> Finally, a bonus improvement is thrown in: keep the notifier enabled
> during suspend but make it a NOP. Upon resume make sure that the link
> state is forwarded. This shaves around 30ms per device constant time.
>
> For 1 vDPA device x 32 VQs (16 VQPs), on a large VM (256 GB RAM, 32 CPUs
> x 2 threads per core), the improvements are:
>
> +-------------------+--------+--------+-----------+
> | operation         | Before | After  | Reduction |
> |-------------------+--------+--------+-----------|
> | mlx5_vdpa_suspend | 37 ms  | 2.5 ms |     14x   |
> | mlx5_vdpa_resume  | 16 ms  | 5 ms   |      3x   |
> +-------------------+--------+--------+-----------+
>

Looks great :).

Apart from the nitpick,

Acked-by: Eugenio Pérez <eperezma@redhat.com>

For the vhost part.

Thanks!

Dragos Tatulea Aug. 7, 2024, 2:54 p.m. UTC | #6
On 07.08.24 15:25, Eugenio Perez Martin wrote:
> On Fri, Aug 2, 2024 at 9:24 AM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>>
>> This series parallelizes the mlx5_vdpa device suspend and resume
>> operations through the firmware async API. The purpose is to reduce live
>> migration downtime.
>>
>> The series starts with changing the VQ suspend and resume commands
>> to the async API. After that, the switch is made to issue multiple
>> commands of the same type in parallel.
>>
> 
> There is a missed opportunity processing the CVQ MQ command here,
> isn't it? It can be applied on top in another series for sure.
> 
Initially I considered that it would complicate the code too much in
change_num_qps(). But in the current state of the patches it's doable.

Will send a V2 with an extra patch for this.

>> Finally, a bonus improvement is thrown in: keep the notifier enabled
>> during suspend but make it a NOP. Upon resume make sure that the link
>> state is forwarded. This shaves around 30ms per device constant time.
>>
>> For 1 vDPA device x 32 VQs (16 VQPs), on a large VM (256 GB RAM, 32 CPUs
>> x 2 threads per core), the improvements are:
>>
>> +-------------------+--------+--------+-----------+
>> | operation         | Before | After  | Reduction |
>> |-------------------+--------+--------+-----------|
>> | mlx5_vdpa_suspend | 37 ms  | 2.5 ms |     14x   |
>> | mlx5_vdpa_resume  | 16 ms  | 5 ms   |      3x   |
>> +-------------------+--------+--------+-----------+
>>
> 
> Looks great :).
> 
> Apart from the nitpick,
>
> Acked-by: Eugenio Pérez <eperezma@redhat.com>
> 
> For the vhost part.
Thanks!

Dragos Tatulea Aug. 16, 2024, 9:13 a.m. UTC | #7
On 02.08.24 15:14, Michael S. Tsirkin wrote:
> On Fri, Aug 02, 2024 at 10:20:17AM +0300, Dragos Tatulea wrote:
>> Note for the maintainers:
>> The first patch contains changes for mlx5_core. This must be applied
>> into the mlx5-vhost tree [0] first. Once this patch is applied on
>> mlx5-vhost, the change has to be pulled from mlx5-vdpa into the vhost
>> tree and only then the remaining patches can be applied.
> 
> Or maintainer just acks it and I apply directly.
> 
Tariq, who is a mlx5_core maintainer, reviewed the patch, so consider it acked.
I just sent the v2 with the same note in the cover letter.

Thanks,
Dragos
