[mlx5-next,0/5] Improve mlx5 live migration driver

Message ID 20220427093120.161402-1-yishaih@nvidia.com (mailing list archive)
Series Improve mlx5 live migration driver

Message

Yishai Hadas April 27, 2022, 9:31 a.m. UTC
This series improves the mlx5 live migration driver in a few aspects,
as described below.

Refactor to enable running migration commands in parallel over the PF
command interface.

To achieve that, we exposed an API from mlx5_core that lets the VF be
notified before the PF command interface goes down/up (e.g. PF reload
upon health recovery).

With the above functionality in place, mlx5 vfio no longer needs to
obtain the global PF lock when using the command interface; it can rely
on the above mechanism to stay in sync with the PF.

This can enable parallel VF migration over the PF command interface
from the kernel driver's point of view.
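
As a rough illustration of the idea above (not the code from this
series), the following userspace sketch models a PF that notifies
registered VFs before its command interface goes down and after it is
back up, so that each VF checks only its own state instead of taking a
global PF lock. All names here (pf_dev, vf_mig, pf_notifier_register,
the event codes) are hypothetical and are not the mlx5_core API. The
sketch is single threaded; a real driver would presumably still need
per-VF locking around the flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event codes; the real mlx5_core values are not shown here. */
enum pf_event { PF_EVENT_VF_ATTACH, PF_EVENT_VF_DETACH };

typedef void (*vf_notify_fn)(enum pf_event ev, void *priv);

#define MAX_VFS 4

/* Hypothetical PF object: one notifier slot per VF, no global lock exposed. */
struct pf_dev {
	vf_notify_fn cb[MAX_VFS];
	void *priv[MAX_VFS];
};

/* Hypothetical per-VF migration state. */
struct vf_mig {
	bool pf_detached;	/* true while the PF command interface is down */
	int cmds_issued;
};

/* Register a per-VF callback; fails if the slot is invalid or taken. */
int pf_notifier_register(struct pf_dev *pf, int vf_id,
			 vf_notify_fn fn, void *priv)
{
	if (vf_id < 0 || vf_id >= MAX_VFS || pf->cb[vf_id])
		return -1;
	pf->cb[vf_id] = fn;
	pf->priv[vf_id] = priv;
	return 0;
}

void pf_notifier_unregister(struct pf_dev *pf, int vf_id)
{
	assert(vf_id >= 0 && vf_id < MAX_VFS);
	pf->cb[vf_id] = NULL;
	pf->priv[vf_id] = NULL;
}

/* Called by the PF before its command interface goes down, or once it is up. */
void pf_notify_all(struct pf_dev *pf, enum pf_event ev)
{
	for (int i = 0; i < MAX_VFS; i++)
		if (pf->cb[i])
			pf->cb[i](ev, pf->priv[i]);
}

/* VF-side callback: record whether the PF is currently usable. */
void vf_on_pf_event(enum pf_event ev, void *priv)
{
	struct vf_mig *vf = priv;

	vf->pf_detached = (ev == PF_EVENT_VF_DETACH);
}

/* A migration command consults only per-VF state, so VFs do not serialize. */
int vf_issue_cmd(struct vf_mig *vf)
{
	if (vf->pf_detached)
		return -1;
	vf->cmds_issued++;
	return 0;
}
```

In this model two VFs registered on the same PF can issue commands
independently, and both start failing fast once the PF announces a
detach.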

In addition, the SAVE state command now uses the PF async command mode.
This allows returning to user space earlier, as soon as the command has
been issued successfully, and improves latency by letting things run in
parallel.
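
The async pattern described above can be sketched in userspace as
follows. This is only a model under stated assumptions: the submit call
returns as soon as the command is queued, and a completion callback
later marks the saved state as readable. The names (pf_cmd_queue,
pf_cmd_exec_async, save_file, and so on) are hypothetical, the queue is
one entry deep, and completion is triggered by hand rather than by a
device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*cmd_done_fn)(int status, size_t len, void *ctx);

/* Hypothetical one-deep PF command queue. */
struct pf_cmd_queue {
	bool pending;
	size_t len;
	cmd_done_fn done;
	void *ctx;
};

/* Hypothetical migration file that will hold the saved device state. */
struct save_file {
	bool ready;		/* set by the completion callback */
	int status;
	size_t total_length;
};

/* Queue the command and return immediately; -1 if the queue is busy. */
int pf_cmd_exec_async(struct pf_cmd_queue *q, size_t len,
		      cmd_done_fn done, void *ctx)
{
	assert(done != NULL);
	if (q->pending)
		return -1;
	q->pending = true;
	q->len = len;
	q->done = done;
	q->ctx = ctx;
	return 0;
}

/* Stands in for the device finishing the command some time later. */
void pf_cmd_complete(struct pf_cmd_queue *q, int status)
{
	if (!q->pending)
		return;
	q->pending = false;
	q->done(status, status ? 0 : q->len, q->ctx);
}

/* Completion callback: publish the result to readers of the file. */
void save_done(int status, size_t len, void *ctx)
{
	struct save_file *f = ctx;

	f->status = status;
	f->total_length = len;
	f->ready = true;
}

/* Start SAVE and return to the caller without waiting for the device. */
int save_state_start(struct pf_cmd_queue *q, struct save_file *f,
		     size_t state_len)
{
	f->ready = false;
	return pf_cmd_exec_async(q, state_len, save_done, f);
}

/* Reader side: -1 means not ready yet, -2 means the command failed. */
long save_file_read_len(const struct save_file *f)
{
	if (!f->ready)
		return -1;
	if (f->status)
		return -2;
	return (long)f->total_length;
}
```

The caller regains control right after save_state_start(), and the
reader observes data only after the completion callback has run, which
is the latency benefit the async mode is after.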

Alex, as this series touches mlx5_core we may need to send this in a
pull request format to VFIO to avoid conflicts before acceptance.

Yishai

Yishai Hadas (5):
  vfio/mlx5: Reorganize the VF is migratable code
  net/mlx5: Expose mlx5_sriov_blocking_notifier_register /  unregister
    APIs
  vfio/mlx5: Manage the VF attach/detach callback from the PF
  vfio/mlx5: Refactor to enable VFs migration in parallel
  vfio/mlx5: Run the SAVE state command in an async mode

 .../net/ethernet/mellanox/mlx5/core/sriov.c   |  65 ++++-
 drivers/vfio/pci/mlx5/cmd.c                   | 229 +++++++++++++-----
 drivers/vfio/pci/mlx5/cmd.h                   |  50 +++-
 drivers/vfio/pci/mlx5/main.c                  | 133 +++++-----
 include/linux/mlx5/driver.h                   |  12 +
 5 files changed, 358 insertions(+), 131 deletions(-)

Comments

Yishai Hadas May 4, 2022, 1:29 p.m. UTC | #1
Hi Alex,

Did you have a chance to look at the series? It touches mlx5 code
(vfio, net), with no core changes.

It can apparently go via your tree as a PR from mlx5-next once you are
fine with it.

Thanks,
Yishai
Alex Williamson May 4, 2022, 8:19 p.m. UTC | #2
On Wed, 4 May 2022 16:29:37 +0300
Yishai Hadas <yishaih@nvidia.com> wrote:

> Hi Alex,
> 
> Did you have a chance to look at the series? It touches mlx5 code
> (vfio, net), with no core changes.
> 
> It can apparently go via your tree as a PR from mlx5-next once you are
> fine with it.

As Jason noted, the net/mlx5 changes seem confined to the 2nd patch,
which has no other dependencies in this series.  Is there something
else blocking committing that via the mlx tree and providing a branch
for the remainder to go in through the vfio tree?  Thanks,

Alex
Jason Gunthorpe May 4, 2022, 9:33 p.m. UTC | #3
On Wed, May 04, 2022 at 02:19:19PM -0600, Alex Williamson wrote:

> > It can apparently go via your tree as a PR from mlx5-next once you are
> > fine with it.
> 
> As Jason noted, the net/mlx5 changes seem confined to the 2nd patch,
> which has no other dependencies in this series.  Is there something
> else blocking committing that via the mlx tree and providing a branch
> for the remainder to go in through the vfio tree?  Thanks,

Our process is to not add dead code to our non-rebasing branches until
we have an ack on the consumer patches.

So you can get a PR from Leon with everything sorted out including the
VFIO bits, or you can get a PR from Leon with just the shared branch,
after you say OK.

Jason
Alex Williamson May 4, 2022, 10:48 p.m. UTC | #4
On Wed, 4 May 2022 18:33:09 -0300
Jason Gunthorpe <jgg@nvidia.com> wrote:

> On Wed, May 04, 2022 at 02:19:19PM -0600, Alex Williamson wrote:
> 
> > > It can apparently go via your tree as a PR from mlx5-next once you are
> > > fine with it.
> > 
> > As Jason noted, the net/mlx5 changes seem confined to the 2nd patch,
> > which has no other dependencies in this series.  Is there something
> > else blocking committing that via the mlx tree and providing a branch
> > for the remainder to go in through the vfio tree?  Thanks,  
> 
> Our process is to not add dead code to our non-rebasing branches until
> we have an ack on the consumer patches.
> 
> So you can get a PR from Leon with everything sorted out including the
> VFIO bits, or you can get a PR from Leon with just the shared branch,
> after you say OK.

As long as Leon wants to wait for some acks in the former case, I'm fine
with either, but I don't expect to be able to shoot down the premise of
the series.  You folks are the experts on how your device works and there
are no API changes on the vfio side for me to critique here.  Thanks,

Alex
Leon Romanovsky May 5, 2022, 5:38 a.m. UTC | #5
On Wed, May 04, 2022 at 04:48:17PM -0600, Alex Williamson wrote:
> On Wed, 4 May 2022 18:33:09 -0300
> Jason Gunthorpe <jgg@nvidia.com> wrote:
> 
> > On Wed, May 04, 2022 at 02:19:19PM -0600, Alex Williamson wrote:
> > 
> > > > It can apparently go via your tree as a PR from mlx5-next once you are
> > > > fine with it.
> > > 
> > > As Jason noted, the net/mlx5 changes seem confined to the 2nd patch,
> > > which has no other dependencies in this series.  Is there something
> > > else blocking committing that via the mlx tree and providing a branch
> > > for the remainder to go in through the vfio tree?  Thanks,  
> > 
> > Our process is to not add dead code to our non-rebasing branches until
> > we have an ack on the consumer patches.
> > 
> > So you can get a PR from Leon with everything sorted out including the
> > VFIO bits, or you can get a PR from Leon with just the shared branch,
> > after you say OK.
> 
> As long as Leon wants to wait for some acks in the former case, I'm fine
> with either, but I don't expect to be able to shoot down the premise of
> the series.  You folks are the experts on how your device works and there
> are no API changes on the vfio side for me to critique here.  Thanks,

I will prepare a PR on Sunday/Monday.

Thanks

> 
> Alex
>