Message ID | cover.1681131553.git.leon@kernel.org
---|---
Series | Allow relaxed ordering read in VFs and VMs
On Mon, Apr 10, 2023 at 04:07:49PM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
>
> From Avihai,
>
> Currently, Relaxed Ordering (RO) can't be used in VFs directly and in
> VFs assigned to QEMU, even if the PF supports RO. This is due to issues
> in reporting/emulation of PCI config space RO bit and due to current
> HCA capability behavior.
>
> This series fixes it by using a new HCA capability and by relying on FW
> to do the "right thing" according to the PF's PCI config space RO value.
>
> Allowing RO in VFs and VMs is valuable since it can greatly improve
> performance on some setups. For example, testing throughput of a VF on
> an AMD EPYC 7763 and ConnectX-6 Dx setup showed roughly 60% performance
> improvement.
>
> Thanks
>
> Avihai Horon (4):
>   RDMA/mlx5: Remove pcie_relaxed_ordering_enabled() check for RO write
>   RDMA/mlx5: Check pcie_relaxed_ordering_enabled() in UMR
>   net/mlx5: Update relaxed ordering read HCA capabilities
>   RDMA/mlx5: Allow relaxed ordering read in VFs and VMs

This looks OK, but the patch structure is pretty confusing.

It seems to me there are really only two patches here, the first is to
add some static inline

	'mlx5 supports read ro'

which supports both the cap bits described in the PRM, with a little
comment to explain that old devices only set the old cap.

And a second patch to call it in all the places we need to check before
setting the mkc ro read bit.

Maybe a final third patch to sort out that mistake in the write side.

But this really doesn't have anything to do with VFs and VMs, this is
adjusting the code to follow the current PRM because the old one was
mis-designed.

Jason
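For readers following along, a rough sketch of the kind of helper Jason is
describing (illustrative only, not the series' code; the helper name is made
up, and which mlx5_ifc field is the old cap versus the new one is an
assumption here):

    static inline bool mlx5_ro_read_supported(struct mlx5_core_dev *mdev)
    {
            /* New FW: static cap; FW itself follows the PF's PCI config
             * space RO setting, so the cap alone is enough. */
            if (MLX5_CAP_GEN(mdev, relaxed_ordering_read))
                    return true;
            /* Old FW: the cap tracked the PCI RO bit, so it is only
             * meaningful together with pcie_relaxed_ordering_enabled(). */
            return MLX5_CAP_GEN(mdev, relaxed_ordering_read_pci_enabled) &&
                   pcie_relaxed_ordering_enabled(mdev->pdev);
    }

Callers would then test such a helper before setting the mkc relaxed
ordering read bit, instead of open-coding the cap checks in each place.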
On Tue, Apr 11, 2023 at 11:01:03AM -0300, Jason Gunthorpe wrote:
> On Mon, Apr 10, 2023 at 04:07:49PM +0300, Leon Romanovsky wrote:
> > From: Leon Romanovsky <leonro@nvidia.com>
> >
> > [...]
> >
> > Avihai Horon (4):
> >   RDMA/mlx5: Remove pcie_relaxed_ordering_enabled() check for RO write
> >   RDMA/mlx5: Check pcie_relaxed_ordering_enabled() in UMR
> >   net/mlx5: Update relaxed ordering read HCA capabilities
> >   RDMA/mlx5: Allow relaxed ordering read in VFs and VMs
>
> This looks OK, but the patch structure is pretty confusing.
>
> It seems to me there are really only two patches here, the first is to
> add some static inline

I asked Avihai to align all pcie_relaxed_ordering_enabled() calls to be
relevant for RO only. This is how we came to the first two patches.

Thanks

> 	'mlx5 supports read ro'
>
> which supports both the cap bits described in the PRM, with a little
> comment to explain that old devices only set the old cap.
>
> And a second patch to call it in all the places we need to check before
> setting the mkc ro read bit.
>
> Maybe a final third patch to sort out that mistake in the write side.
>
> But this really doesn't have anything to do with VFs and VMs, this is
> adjusting the code to follow the current PRM because the old one was
> mis-designed.
>
> Jason
On 4/11/2023 7:01 AM, Jason Gunthorpe wrote:
> On Mon, Apr 10, 2023 at 04:07:49PM +0300, Leon Romanovsky wrote:
>> From: Leon Romanovsky <leonro@nvidia.com>
>>
>> [...]
>>
>> Avihai Horon (4):
>>   RDMA/mlx5: Remove pcie_relaxed_ordering_enabled() check for RO write
>>   RDMA/mlx5: Check pcie_relaxed_ordering_enabled() in UMR
>>   net/mlx5: Update relaxed ordering read HCA capabilities
>>   RDMA/mlx5: Allow relaxed ordering read in VFs and VMs
>
> This looks OK, but the patch structure is pretty confusing.
>
> It seems to me there are really only two patches here, the first is to
> add some static inline
>
> 	'mlx5 supports read ro'
>
> which supports both the cap bits described in the PRM, with a little
> comment to explain that old devices only set the old cap.
>
> And a second patch to call it in all the places we need to check before
> setting the mkc ro read bit.
>
> Maybe a final third patch to sort out that mistake in the write side.
>
> But this really doesn't have anything to do with VFs and VMs, this is
> adjusting the code to follow the current PRM because the old one was
> mis-designed.
>
> Jason

FWIW I think Jason's outline here makes sense too and might be slightly
better. However, reading through the series I was reasonably able to
understand things enough that I think it's fine as-is.

In some sense it's not about VF or VM, but fixing this has the result
that it fixes a setup with VF and VM, so I think that's an OK thing to
call out as the goal.
On Tue, Apr 11, 2023 at 04:21:09PM -0700, Jacob Keller wrote:
>
> On 4/11/2023 7:01 AM, Jason Gunthorpe wrote:
> > On Mon, Apr 10, 2023 at 04:07:49PM +0300, Leon Romanovsky wrote:
> >> [...]
> >
> > [...]
> >
> > But this really doesn't have anything to do with VFs and VMs, this is
> > adjusting the code to follow the current PRM because the old one was
> > mis-designed.
> >
> > Jason
>
> FWIW I think Jason's outline here makes sense too and might be slightly
> better. However, reading through the series I was reasonably able to
> understand things enough that I think it's fine as-is.
>
> In some sense it's not about VF or VM, but fixing this has the result
> that it fixes a setup with VF and VM, so I think that's an OK thing to
> call out as the goal.

VF or VM came from the user perspective of where this behavior is not
correct. Avihai saw this in QEMU, so he described it in terms which are
more clear to the end user.

Thanks
On Thu, Apr 13, 2023 at 03:49:29PM +0300, Leon Romanovsky wrote:
> > that it fixes a setup with VF and VM, so I think that's an OK thing to
> > call out as the goal.
>
> VF or VM came from the user perspective of where this behavior is not
> correct. Avihai saw this in QEMU, so he described it in terms which
> are more clear to the end user.

Except it is not clear, the VF/VM issue is more properly solved by
showing the real relaxed ordering cap to the VM.

This series really is about fixing the FW mistake that had a dynamic
cap bit for relaxed ordering. The driver does not support cap bits that
change during runtime.

mlx5 racily bodged around the broken cap by protecting the feature with
the same test the FW was using to make the cap dynamic, but this is all
just wrong. The new cap bit is static, doesn't change like a cap bit
should, and so we don't need the bodge anymore.

That the bodge didn't work in VMs because of a QEMU/vfio issue is
another bad side effect, but it isn't really the point of this series.

This is why I'd like it if the code was more closely organized to make
it clear that the old cap is OLD and that the bodge that goes along
with it is part of making the cap bit work. It kind of gets lost in the
way things are organized what is old/new.

Jason
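To make the "bodge" concrete, a simplified before/after comparison (same
naming assumptions as the helper sketch above, i.e. placeholders rather than
the actual driver diff):

    /* Before: the RO-read cap could change at runtime with the PF's
     * PCI config space RO bit, so the driver mirrored the FW's own
     * test and re-checked the PCI bit itself (racily, as noted above). */
    bool ro_read_old = pcie_relaxed_ordering_enabled(mdev->pdev) &&
                       MLX5_CAP_GEN(mdev, relaxed_ordering_read_pci_enabled);

    /* After: the new cap is static; FW already accounts for the PF's
     * PCI RO setting, so no PCI config space check is needed here. */
    bool ro_read_new = MLX5_CAP_GEN(mdev, relaxed_ordering_read);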
On Thu, Apr 13, 2023 at 11:46:16AM -0300, Jason Gunthorpe wrote:
> On Thu, Apr 13, 2023 at 03:49:29PM +0300, Leon Romanovsky wrote:
> > > that it fixes a setup with VF and VM, so I think that's an OK thing to
> > > call out as the goal.
> >
> > VF or VM came from the user perspective of where this behavior is not
> > correct. Avihai saw this in QEMU, so he described it in terms which
> > are more clear to the end user.
>
> Except it is not clear, the VF/VM issue is more properly solved by
> showing the real relaxed ordering cap to the VM.

I'm not convinced that a patch restructure is really needed for
something as low-level as a fix for problematic FW. I'm applying the
series as is, and the curious reader will find this discussion through
the Link tag in the patches.

Thanks
On Mon, 10 Apr 2023 16:07:49 +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
>
> From Avihai,
>
> Currently, Relaxed Ordering (RO) can't be used in VFs directly and in
> VFs assigned to QEMU, even if the PF supports RO. This is due to issues
> in reporting/emulation of PCI config space RO bit and due to current
> HCA capability behavior.
>
> [...]

Applied, thanks!

[1/4] RDMA/mlx5: Remove pcie_relaxed_ordering_enabled() check for RO write
      https://git.kernel.org/rdma/rdma/c/ed4b0661cce119
[2/4] RDMA/mlx5: Check pcie_relaxed_ordering_enabled() in UMR
      https://git.kernel.org/rdma/rdma/c/d43b020b0f82c0
[3/4] net/mlx5: Update relaxed ordering read HCA capabilities
      https://git.kernel.org/rdma/rdma/c/ccbbfe0682f2ff
[4/4] RDMA/mlx5: Allow relaxed ordering read in VFs and VMs
      https://git.kernel.org/rdma/rdma/c/bd4ba605c4a92b

Best regards,
From: Leon Romanovsky <leonro@nvidia.com>

From Avihai,

Currently, Relaxed Ordering (RO) can't be used in VFs directly and in
VFs assigned to QEMU, even if the PF supports RO. This is due to issues
in reporting/emulation of PCI config space RO bit and due to current
HCA capability behavior.

This series fixes it by using a new HCA capability and by relying on FW
to do the "right thing" according to the PF's PCI config space RO value.

Allowing RO in VFs and VMs is valuable since it can greatly improve
performance on some setups. For example, testing throughput of a VF on
an AMD EPYC 7763 and ConnectX-6 Dx setup showed roughly 60% performance
improvement.

Thanks

Avihai Horon (4):
  RDMA/mlx5: Remove pcie_relaxed_ordering_enabled() check for RO write
  RDMA/mlx5: Check pcie_relaxed_ordering_enabled() in UMR
  net/mlx5: Update relaxed ordering read HCA capabilities
  RDMA/mlx5: Allow relaxed ordering read in VFs and VMs

 drivers/infiniband/hw/mlx5/mr.c                      | 12 ++++++++----
 drivers/infiniband/hw/mlx5/umr.c                     |  7 +++++--
 drivers/infiniband/hw/mlx5/umr.h                     |  3 ++-
 drivers/net/ethernet/mellanox/mlx5/core/en/params.c  |  3 +--
 drivers/net/ethernet/mellanox/mlx5/core/en_common.c  |  9 +++++----
 include/linux/mlx5/mlx5_ifc.h                        |  5 +++--
 6 files changed, 24 insertions(+), 15 deletions(-)
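As background for the patch titles above, the decision ultimately lands where
the mkey context (mkc) relaxed ordering bits are programmed. A simplified
sketch (not the series' diff; the helper name is invented for illustration):

    /* Sketch: program an mkey's RO bits from access flags + HCA caps. */
    static void set_mkc_relaxed_ordering(struct mlx5_core_dev *mdev,
                                         void *mkc, int access_flags)
    {
            bool ro = access_flags & IB_ACCESS_RELAXED_ORDERING;

            MLX5_SET(mkc, mkc, relaxed_ordering_write,
                     ro && MLX5_CAP_GEN(mdev, relaxed_ordering_write));
            MLX5_SET(mkc, mkc, relaxed_ordering_read,
                     ro && MLX5_CAP_GEN(mdev, relaxed_ordering_read));
    }

As the thread above discusses, the read-side gate comes from the HCA caps
(old or new bit) rather than from the VF's own PCI config space RO bit, which
per the cover letter is not reported/emulated correctly for VFs and VMs.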