[rdma-next,0/4] Allow relaxed ordering read in VFs and VMs

Message ID cover.1681131553.git.leon@kernel.org (mailing list archive)

Message

Leon Romanovsky April 10, 2023, 1:07 p.m. UTC
From: Leon Romanovsky <leonro@nvidia.com>

From Avihai,

Currently, Relaxed Ordering (RO) can't be used in VFs directly and in
VFs assigned to QEMU, even if the PF supports RO. This is due to issues
in reporting/emulation of PCI config space RO bit and due to current
HCA capability behavior.

This series fixes it by using a new HCA capability and by relying on FW
to do the "right thing" according to the PF's PCI config space RO value.

Allowing RO in VFs and VMs is valuable since it can greatly improve
performance on some setups. For example, testing throughput of a VF on
an AMD EPYC 7763 and ConnectX-6 Dx setup showed roughly 60% performance
improvement.

Thanks

Avihai Horon (4):
  RDMA/mlx5: Remove pcie_relaxed_ordering_enabled() check for RO write
  RDMA/mlx5: Check pcie_relaxed_ordering_enabled() in UMR
  net/mlx5: Update relaxed ordering read HCA capabilities
  RDMA/mlx5: Allow relaxed ordering read in VFs and VMs

 drivers/infiniband/hw/mlx5/mr.c                     | 12 ++++++++----
 drivers/infiniband/hw/mlx5/umr.c                    |  7 +++++--
 drivers/infiniband/hw/mlx5/umr.h                    |  3 ++-
 drivers/net/ethernet/mellanox/mlx5/core/en/params.c |  3 +--
 drivers/net/ethernet/mellanox/mlx5/core/en_common.c |  9 +++++----
 include/linux/mlx5/mlx5_ifc.h                       |  5 +++--
 6 files changed, 24 insertions(+), 15 deletions(-)

Comments

Jason Gunthorpe April 11, 2023, 2:01 p.m. UTC | #1
On Mon, Apr 10, 2023 at 04:07:49PM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
> 
> From Avihai,
> 
> Currently, Relaxed Ordering (RO) can't be used in VFs directly and in
> VFs assigned to QEMU, even if the PF supports RO. This is due to issues
> in reporting/emulation of PCI config space RO bit and due to current
> HCA capability behavior.
> 
> This series fixes it by using a new HCA capability and by relying on FW
> to do the "right thing" according to the PF's PCI config space RO value.
> 
> Allowing RO in VFs and VMs is valuable since it can greatly improve
> performance on some setups. For example, testing throughput of a VF on
> an AMD EPYC 7763 and ConnectX-6 Dx setup showed roughly 60% performance
> improvement.
> 
> Thanks
> 
> Avihai Horon (4):
>   RDMA/mlx5: Remove pcie_relaxed_ordering_enabled() check for RO write
>   RDMA/mlx5: Check pcie_relaxed_ordering_enabled() in UMR
>   net/mlx5: Update relaxed ordering read HCA capabilities
>   RDMA/mlx5: Allow relaxed ordering read in VFs and VMs

This looks OK, but the patch structure is pretty confusing.

It seems to me there are really only two patches here, the first is to
add some static inline

'mlx5 supports read ro'

which supports both the cap bits described in
the PRM, with a little comment to explain that old devices only set
the old cap.

And a second patch to call it in all the places we need to check before
setting the mkc ro read bit.

Maybe a final third patch to sort out that mistake in the write side.

But this really doesn't have anything to do with VFs and VMs, this is
adjusting the code to follow the current PRM because the old one was
mis-designed.

Jason
Leon Romanovsky April 11, 2023, 2:09 p.m. UTC | #2
On Tue, Apr 11, 2023 at 11:01:03AM -0300, Jason Gunthorpe wrote:
> On Mon, Apr 10, 2023 at 04:07:49PM +0300, Leon Romanovsky wrote:
> > From: Leon Romanovsky <leonro@nvidia.com>
> > 
> > From Avihai,
> > 
> > Currently, Relaxed Ordering (RO) can't be used in VFs directly and in
> > VFs assigned to QEMU, even if the PF supports RO. This is due to issues
> > in reporting/emulation of PCI config space RO bit and due to current
> > HCA capability behavior.
> > 
> > This series fixes it by using a new HCA capability and by relying on FW
> > to do the "right thing" according to the PF's PCI config space RO value.
> > 
> > Allowing RO in VFs and VMs is valuable since it can greatly improve
> > performance on some setups. For example, testing throughput of a VF on
> > an AMD EPYC 7763 and ConnectX-6 Dx setup showed roughly 60% performance
> > improvement.
> > 
> > Thanks
> > 
> > Avihai Horon (4):
> >   RDMA/mlx5: Remove pcie_relaxed_ordering_enabled() check for RO write
> >   RDMA/mlx5: Check pcie_relaxed_ordering_enabled() in UMR
> >   net/mlx5: Update relaxed ordering read HCA capabilities
> >   RDMA/mlx5: Allow relaxed ordering read in VFs and VMs
> 
> This looks OK, but the patch structure is pretty confusing.
> 
> It seems to me there are really only two patches here, the first is to
> add some static inline

I asked Avihai to align all pcie_relaxed_ordering_enabled() calls so
that they are relevant for RO only. That is how we arrived at the first
two patches.

Thanks

> 
> 'mlx5 supports read ro'
> 
> which supports both the cap bits described in
> the PRM, with a little comment to explain that old devices only set
> the old cap.
> 
> And a second patch to call it in all the places we need to check before
> setting the mkc ro read bit.
> 
> Maybe a final third patch to sort out that mistake in the write side.
> 
> But this really doesn't have anything to do with VFs and VMs, this is
> adjusting the code to follow the current PRM because the old one was
> mis-designed.
> 
> Jason
Jacob Keller April 11, 2023, 11:21 p.m. UTC | #3
On 4/11/2023 7:01 AM, Jason Gunthorpe wrote:
> On Mon, Apr 10, 2023 at 04:07:49PM +0300, Leon Romanovsky wrote:
>> From: Leon Romanovsky <leonro@nvidia.com>
>>
>> From Avihai,
>>
>> Currently, Relaxed Ordering (RO) can't be used in VFs directly and in
>> VFs assigned to QEMU, even if the PF supports RO. This is due to issues
>> in reporting/emulation of PCI config space RO bit and due to current
>> HCA capability behavior.
>>
>> This series fixes it by using a new HCA capability and by relying on FW
>> to do the "right thing" according to the PF's PCI config space RO value.
>>
>> Allowing RO in VFs and VMs is valuable since it can greatly improve
>> performance on some setups. For example, testing throughput of a VF on
>> an AMD EPYC 7763 and ConnectX-6 Dx setup showed roughly 60% performance
>> improvement.
>>
>> Thanks
>>
>> Avihai Horon (4):
>>   RDMA/mlx5: Remove pcie_relaxed_ordering_enabled() check for RO write
>>   RDMA/mlx5: Check pcie_relaxed_ordering_enabled() in UMR
>>   net/mlx5: Update relaxed ordering read HCA capabilities
>>   RDMA/mlx5: Allow relaxed ordering read in VFs and VMs
> 
> This looks OK, but the patch structure is pretty confusing.
> 
> It seems to me there are really only two patches here, the first is to
> add some static inline
> 
> 'mlx5 supports read ro'
> 
> which supports both the cap bits described in
> the PRM, with a little comment to explain that old devices only set
> the old cap.
> 
> And a second patch to call it in all the places we need to check before
> setting the mkc ro read bit.
> 
> Maybe a final third patch to sort out that mistake in the write side.
> 
> But this really doesn't have anything to do with VFs and VMs, this is
> adjusting the code to follow the current PRM because the old one was
> mis-designed.
> 
> Jason

FWIW I think Jason's outline here makes sense too and might be slightly
better. However, reading through the series I was reasonably able to
understand things enough that I think it's fine as-is.

In some sense it's not about VF or VM, but fixing this has the result
that it fixes a setup with VF and VM, so I think that's an ok thing to
call out as the goal.
Leon Romanovsky April 13, 2023, 12:49 p.m. UTC | #4
On Tue, Apr 11, 2023 at 04:21:09PM -0700, Jacob Keller wrote:
> 
> 
> On 4/11/2023 7:01 AM, Jason Gunthorpe wrote:
> > On Mon, Apr 10, 2023 at 04:07:49PM +0300, Leon Romanovsky wrote:
> >> From: Leon Romanovsky <leonro@nvidia.com>
> >>
> >> From Avihai,
> >>
> >> Currently, Relaxed Ordering (RO) can't be used in VFs directly and in
> >> VFs assigned to QEMU, even if the PF supports RO. This is due to issues
> >> in reporting/emulation of PCI config space RO bit and due to current
> >> HCA capability behavior.
> >>
> >> This series fixes it by using a new HCA capability and by relying on FW
> >> to do the "right thing" according to the PF's PCI config space RO value.
> >>
> >> Allowing RO in VFs and VMs is valuable since it can greatly improve
> >> performance on some setups. For example, testing throughput of a VF on
> >> an AMD EPYC 7763 and ConnectX-6 Dx setup showed roughly 60% performance
> >> improvement.
> >>
> >> Thanks
> >>
> >> Avihai Horon (4):
> >>   RDMA/mlx5: Remove pcie_relaxed_ordering_enabled() check for RO write
> >>   RDMA/mlx5: Check pcie_relaxed_ordering_enabled() in UMR
> >>   net/mlx5: Update relaxed ordering read HCA capabilities
> >>   RDMA/mlx5: Allow relaxed ordering read in VFs and VMs
> > 
> > This looks OK, but the patch structure is pretty confusing.
> > 
> > It seems to me there are really only two patches here, the first is to
> > add some static inline
> > 
> > 'mlx5 supports read ro'
> > 
> > which supports both the cap bits described in
> > the PRM, with a little comment to explain that old devices only set
> > the old cap.
> > 
> > And a second patch to call it in all the places we need to check before
> > setting the mkc ro read bit.
> > 
> > Maybe a final third patch to sort out that mistake in the write side.
> > 
> > But this really doesn't have anything to do with VFs and VMs, this is
> > adjusting the code to follow the current PRM because the old one was
> > > mis-designed.
> > 
> > Jason
> 
> FWIW I think Jason's outline here makes sense too and might be slightly
> better. However, reading through the series I was reasonably able to
> understand things enough that I think it's fine as-is.
> 
> In some sense it's not about VF or VM, but fixing this has the result
> that it fixes a setup with VF and VM, so I think that's an ok thing to
> call out as the goal.

The VF/VM framing comes from the user's perspective, which is where this
behavior shows up as incorrect. Avihai saw this in QEMU, so he described
it in terms that are clearer to the end user.

Thanks
Jason Gunthorpe April 13, 2023, 2:46 p.m. UTC | #5
On Thu, Apr 13, 2023 at 03:49:29PM +0300, Leon Romanovsky wrote:

> > that it fixes a setup with VF and VM, so I think that's an ok thing to
> > call out as the goal.
> 
> VF or VM came from user perspective of where this behavior is not
> correct. Avihai saw this in QEMU, so he described it in terms which
> are more clear to the end user.

Except it is not clear; the VF/VM issue is more properly solved by
showing the real relaxed ordering cap to the VM.

This series really is about fixing the FW mistake that had a dynamic
cap bit for relaxed ordering. The driver does not support cap bits
that change during runtime.

mlx5 racily bodged around the broken cap by protecting the feature
with the same test the FW was using to make the cap dynamic, but this
is all just wrong.

The new cap bit is static, doesn't change like a cap bit should, and
so we don't need the bodge anymore.

That the bodge didn't work in VMs because of a QEMU/vfio issue is
another bad side effect, but it isn't really the point of this series.

This is why I'd like it if the code was more closely organized to make
it clear that the old cap is OLD and that the bodge that goes along
with it is part of making the cap bit work. It kind of gets lost in
the way things are organized what is old/new.

Jason
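
A minimal sketch of the 'mlx5 supports read ro' helper suggested earlier
in this thread, written as a standalone model rather than driver code. It
is not taken from the series: the struct, its field names, and the
function name are hypothetical, and the pcie_ro_enabled field only stands
in for the result of pcie_relaxed_ordering_enabled(). It assumes, as
described above, that the new cap is static and that the old cap needs
the PCI config space check alongside it.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the inputs discussed in the thread. The field
 * names are illustrative and are not the mlx5_ifc.h capability names.
 */
struct ro_read_inputs {
	bool cap_static;      /* new cap bit: does not change at runtime */
	bool cap_legacy;      /* old cap bit: FW may change it at runtime */
	bool pcie_ro_enabled; /* stand-in for pcie_relaxed_ordering_enabled() */
};

/*
 * Sketch of an 'mlx5 supports read ro' style helper.
 * Old devices only set the old cap, so fall back to it and keep the
 * PCI config space check that goes along with the old cap.
 */
static inline bool read_ro_supported(const struct ro_read_inputs *in)
{
	if (in->cap_static)
		return true;

	return in->cap_legacy && in->pcie_ro_enabled;
}
```

Every place that sets the mkc ro read bit would then call the one helper,
which keeps the old cap and its workaround visibly grouped together.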
Leon Romanovsky April 16, 2023, 10:28 a.m. UTC | #6
On Thu, Apr 13, 2023 at 11:46:16AM -0300, Jason Gunthorpe wrote:
> On Thu, Apr 13, 2023 at 03:49:29PM +0300, Leon Romanovsky wrote:
> 
> > > that it fixes a setup with VF and VM, so I think that's an ok thing to
> > > call out as the goal.
> > 
> > VF or VM came from user perspective of where this behavior is not
> > correct. Avihai saw this in QEMU, so he described it in terms which
> > are more clear to the end user.
> 
> Except it is not clear; the VF/VM issue is more properly solved by
> showing the real relaxed ordering cap to the VM.

I'm not convinced that restructuring the patches is really needed for
something as minor as a fix for problematic FW. I'm applying the series
as is; a curious reader can find this discussion through the Link tag in
the patches.

Thanks
Leon Romanovsky April 16, 2023, 10:30 a.m. UTC | #7
On Mon, 10 Apr 2023 16:07:49 +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
> 
> From Avihai,
> 
> Currently, Relaxed Ordering (RO) can't be used in VFs directly and in
> VFs assigned to QEMU, even if the PF supports RO. This is due to issues
> in reporting/emulation of PCI config space RO bit and due to current
> HCA capability behavior.
> 
> [...]

Applied, thanks!

[1/4] RDMA/mlx5: Remove pcie_relaxed_ordering_enabled() check for RO write
      https://git.kernel.org/rdma/rdma/c/ed4b0661cce119
[2/4] RDMA/mlx5: Check pcie_relaxed_ordering_enabled() in UMR
      https://git.kernel.org/rdma/rdma/c/d43b020b0f82c0
[3/4] net/mlx5: Update relaxed ordering read HCA capabilities
      https://git.kernel.org/rdma/rdma/c/ccbbfe0682f2ff
[4/4] RDMA/mlx5: Allow relaxed ordering read in VFs and VMs
      https://git.kernel.org/rdma/rdma/c/bd4ba605c4a92b

Best regards,