Message ID | 1447152255-28231-2-git-send-email-sagig@mellanox.com (mailing list archive) |
---|---|
State | Not Applicable |
Headers | show |
On Tue, Nov 10, 2015 at 12:44:13PM +0200, Sagi Grimberg wrote: > Different devices may require different access permissions > for rdma reads e.g. for Infiniband devices, local write access > suffices while iWARP devices require remote write permissions as > well. > > This situation generates extra logic for ULPs that need to be aware > of the underlying device to determine rdma read access. Instead, > add a device attribute for ULPs to use without being aware of the > underlying device. > > Also, set rdma_read_access_flags in the relevant device drivers: > mlx4, mlx5, qib, ocrdma, nes: IB_ACCESS_LOCAL_WRITE > cxgb3, cxgb4, hfi: IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE From all I can tell nes also is a iWarp driver. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> From all I can tell nes also is a iWarp driver.
It is.. I don't know why I treated it as IB :)
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi, Le mardi 10 novembre 2015 à 12:44 +0200, Sagi Grimberg a écrit : > Different devices may require different access permissions > for rdma reads e.g. for Infiniband devices, local write access > suffices while iWARP devices require remote write permissions as > well. > > This situation generates extra logic for ULPs that need to be aware > of the underlying device to determine rdma read access. Instead, > add a device attribute for ULPs to use without being aware of the > underlying device. > > Also, set rdma_read_access_flags in the relevant device drivers: > mlx4, mlx5, qib, ocrdma, nes: IB_ACCESS_LOCAL_WRITE > cxgb3, cxgb4, hfi: IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE > Why were those hw providers not modified to enforce IB_ACCESS_REMOTE_WRITE when needed, instead of asking users to set it for them ? Regards. -- Yann Droneaud OPTEYA -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Yann, > > Why were those hw providers not modified to > enforce IB_ACCESS_REMOTE_WRITE when needed, instead of asking users to > set it for them ? Do you mean that ULPs will set IB_ACCESS_LOCAL_WRITE and iWARP providers executing the memory registration will add IB_ACCESS_REMOTE_WRITE? That's even better I think! Given that changing the access rights under the user legs is something we'd want to do. is it? Given that for IB memory registration is not even needed for rdma_read, I'd like to move away from code like: if (rdma_protocol_iwarp()) register memory for rdma_read else go ahead with ib_sge for rdma_read I've maid some initial experimentation on trying to abstract this aspect from ULPs but it's far from being complete and needs more work. Sagi. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 11/10/2015 6:51 AM, Yann Droneaud wrote: > Le mardi 10 novembre 2015 à 12:44 +0200, Sagi Grimberg a écrit : >> Also, set rdma_read_access_flags in the relevant device drivers: >> mlx4, mlx5, qib, ocrdma, nes: IB_ACCESS_LOCAL_WRITE >> cxgb3, cxgb4, hfi: IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE >> > > Why were those hw providers not modified to > enforce IB_ACCESS_REMOTE_WRITE when needed, instead of asking users to > set it for them ? Because the Linux verbs don't tell the driver whether the registration is for an RDMA Read sink buffer. Sagi, the Windows NDKPI has an NDK_MR_FLAG_RDMA_READ_SINK attribute which the upper layer can use to convey this information, I've mentioned it here before. https://msdn.microsoft.com/en-us/library/windows/hardware/hh439908(v=vs.85).aspx When this approach is used, the upper layer doesn't have to be aware at all of the lower layer's details. Tom. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 10/11/2015 14:28, Sagi Grimberg wrote: > Hi Yann, > >> >> Why were those hw providers not modified to >> enforce IB_ACCESS_REMOTE_WRITE when needed, instead of asking users to >> set it for them ? > > Do you mean that ULPs will set IB_ACCESS_LOCAL_WRITE and > iWARP providers executing the memory registration will add > IB_ACCESS_REMOTE_WRITE? That's even better I think! Given that > changing the access rights under the user legs is something we'd want > to do. is it? Oh, forgot that the providers do not know how this key will be used. Thanks Tom for reminding me. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> Sagi, the Windows NDKPI has an NDK_MR_FLAG_RDMA_READ_SINK attribute > which the upper layer can use to convey this information, I've mentioned > it here before. > https://msdn.microsoft.com/en-us/library/windows/hardware/hh439908(v=vs.85).aspx > Thanks for the tip Tom. > > When this approach is used, the upper layer doesn't have to be aware > at all of the lower layer's details. Yea, we could do that too. Any preferences from other people? I'm pretty indifferent on which way to go... Sagi. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Nov 10, 2015 at 02:42:44PM +0200, Sagi Grimberg wrote: > >When this approach is used, the upper layer doesn't have to be aware > >at all of the lower layer's details. > > Yea, we could do that too. Any preferences from other people? > I'm pretty indifferent on which way to go... Yes, that's the way to go. I though we had agree on doing that as one of the next step of the MR API cleanups, but decided to postpone it past the first iteration for some reason that made it non-trivial. I have to say I still don't like the Windows API either, though as it still keeps the criptic {local,remote}_{read,write} flags. Basically what we want is two-dimension selection: /* * If %true this is the target of RDMA READ/WRITE operations * from the remote system. If %false this is the sink / source * for RDMA READ/WRITE operations issued by the local system. */ enum ib_mr_scope { IB_MR_REMOTE, IB_MR_LOCAL }; /* * Operation we're registering for. Can be OR'ed together * when allowing RDMA READs and WRITEs on a single MR. */ enum ib_mr_dir { IB_MR_RDMA_READ = 1 << 0, IB_MR_RDMA_WRITE = 1 << 1 }; -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Nov 10, 2015 at 12:44:13PM +0200, Sagi Grimberg wrote: > Different devices may require different access permissions > for rdma reads e.g. for Infiniband devices, local write access > suffices while iWARP devices require remote write permissions as > well. > > This situation generates extra logic for ULPs that need to be aware > of the underlying device to determine rdma read access. Instead, > add a device attribute for ULPs to use without being aware of the > underlying device. > > Also, set rdma_read_access_flags in the relevant device drivers: > mlx4, mlx5, qib, ocrdma, nes: IB_ACCESS_LOCAL_WRITE > cxgb3, cxgb4, hfi: IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE IB devices *should* be using a local_dma_lkey instead of a memory registration for the RDMA READ target memory. iWarp devices *must* use a memory registration instead. Having an API that encourages ULPs to do useless MR work on IB does not seem like a great idea, but if the ULP needs to create a MR anyhow (SG limits or whatever), then it makes some sense.. But not in absence of the 'need a mr' general check.. Finally, please don't add rdma_read_access_flags - we went to a lot of trouble to add the cap stuff, and this stuff is already covered by it. No need to change every driver. I'd suggest something like unsigned int rdma_cap_rdma_read_mr_flags(const struct ib_pd *pd) { if (rdma_protocol_iwarp(pd->device, rdma_start_port(pd->device))) return IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE; return IB_ACCESS_LOCAL_WRITE; } bool rdma_cap_need_rdma_read_mr(const struct ib_pd *pd) ... Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Nov 10, 2015 at 12:51:10PM +0100, Yann Droneaud wrote: > Why were those hw providers not modified to > enforce IB_ACCESS_REMOTE_WRITE when needed, instead of asking users to > set it for them ? iWarp hardware simply cannot do it it all. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Nov 10, 2015 at 11:01:56AM -0700, Jason Gunthorpe wrote: > No need to change every driver. > > I'd suggest something like > > unsigned int rdma_cap_rdma_read_mr_flags(const struct ib_pd *pd) > { > if (rdma_protocol_iwarp(pd->device, rdma_start_port(pd->device))) > return IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE; > return IB_ACCESS_LOCAL_WRITE; > } Yes, this looks good! -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 11/11/2015 10:08, Christoph Hellwig wrote: > On Tue, Nov 10, 2015 at 11:01:56AM -0700, Jason Gunthorpe wrote: >> No need to change every driver. >> >> I'd suggest something like >> >> unsigned int rdma_cap_rdma_read_mr_flags(const struct ib_pd *pd) >> { >> if (rdma_protocol_iwarp(pd->device, rdma_start_port(pd->device))) >> return IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE; >> return IB_ACCESS_LOCAL_WRITE; >> } > > Yes, this looks good! It does, but it doesn't look like something we'd want to check for each IO... -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Nov 11, 2015 at 11:09:03AM +0200, Sagi Grimberg wrote: > It does, but it doesn't look like something we'd want to check for each > IO... Both potential callers already have a access flags variable in object that's assigned to at setup time so I don't see a problem here. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c index db4c08c53ae3..483ae3c8a13b 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_provider.c +++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c @@ -1467,6 +1467,8 @@ int iwch_register_device(struct iwch_dev *dev) dev->ibdev.max_pd = dev->attr.max_pds; dev->ibdev.local_ca_ack_delay = 0; dev->ibdev.max_fast_reg_page_list_len = T3_MAX_FASTREG_DEPTH; + ibdev->rdma_read_access_flags = IB_ACCESS_LOCAL_WRITE | + IB_ACCESS_REMOTE_WRITE; ret = ib_register_device(&dev->ibdev, NULL); if (ret) diff --git a/drivers/infiniband/hw/cxgb4/provider.c b/drivers/infiniband/hw/cxgb4/provider.c index 84e9b5db0341..001db1cc737c 100644 --- a/drivers/infiniband/hw/cxgb4/provider.c +++ b/drivers/infiniband/hw/cxgb4/provider.c @@ -564,6 +564,8 @@ int c4iw_register_device(struct c4iw_dev *dev) dev->ibdev.max_pd = T4_MAX_NUM_PD; dev->ibdev.local_ca_ack_delay = 0; dev->ibdev.max_fast_reg_page_list_len = t4_max_fr_depth(use_dsgl); + dev->ibdev.rdma_read_access_flags = IB_ACCESS_LOCAL_WRITE | + IB_ACCESS_REMOTE_WRITE; ret = ib_register_device(&dev->ibdev, NULL); if (ret) diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c index 46305dc27f9f..82af524dfb4e 100644 --- a/drivers/infiniband/hw/mlx4/main.c +++ b/drivers/infiniband/hw/mlx4/main.c @@ -524,6 +524,7 @@ static int mlx4_ib_init_device_flags(struct ib_device *ibdev) ibdev->max_map_per_fmr = dev->dev->caps.max_fmr_maps; ibdev->hca_core_clock = dev->dev->caps.hca_core_clock * 1000UL; ibdev->timestamp_mask = 0xFFFFFFFFFFFFULL; + ibdev->rdma_read_access_flags = IB_ACCESS_LOCAL_WRITE; if (!mlx4_is_slave(dev->dev)) err = mlx4_get_internal_clock_params(dev->dev, &clock_params); diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c index 5b733221ca5f..b29f79a03494 100644 --- a/drivers/infiniband/hw/mlx5/main.c +++ b/drivers/infiniband/hw/mlx5/main.c @@ -287,6 +287,7 @@ static int mlx5_ib_init_device_flags(struct ib_device *ibdev) ibdev->max_total_mcast_qp_attach = ibdev->max_mcast_qp_attach * ibdev->max_mcast_grp; ibdev->max_map_per_fmr = INT_MAX; /* no limit in ConnectIB */ + ibdev->rdma_read_access_flags = IB_ACCESS_LOCAL_WRITE; #ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING if (MLX5_CAP_GEN(mdev, pg)) diff --git a/drivers/infiniband/hw/mthca/mthca_provider.c b/drivers/infiniband/hw/mthca/mthca_provider.c index 28d7a8bee3f7..ea1882bbe7a5 100644 --- a/drivers/infiniband/hw/mthca/mthca_provider.c +++ b/drivers/infiniband/hw/mthca/mthca_provider.c @@ -110,6 +110,7 @@ static int mthca_init_device_flags(struct ib_device *ibdev) ibdev->max_mcast_qp_attach = MTHCA_QP_PER_MGM; ibdev->max_total_mcast_qp_attach = ibdev->max_mcast_qp_attach * ibdev->max_mcast_grp; + ibdev->rdma_read_access_flags = IB_ACCESS_LOCAL_WRITE; /* * If Sinai memory key optimization is being used, then only * the 8-bit key portion will change. For other HCAs, the diff --git a/drivers/infiniband/hw/nes/nes_verbs.c b/drivers/infiniband/hw/nes/nes_verbs.c index 2ef911953e38..adcea173f8dc 100644 --- a/drivers/infiniband/hw/nes/nes_verbs.c +++ b/drivers/infiniband/hw/nes/nes_verbs.c @@ -3868,6 +3868,8 @@ struct nes_ib_device *nes_init_ofa_device(struct net_device *netdev) nesibdev->ibdev.max_mw = nesibdev->max_mr; nesibdev->ibdev.max_pd = nesibdev->max_pd; nesibdev->ibdev.max_sge_rd = 1; + nesibdev->ibdev.rdma_read_access_flags = IB_ACCESS_LOCAL_WRITE; + switch (nesdev->nesadapter->max_irrq_wr) { case 0: nesibdev->ibdev.max_qp_rd_atom = 2; diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_main.c b/drivers/infiniband/hw/ocrdma/ocrdma_main.c index 00ddcf9294e4..235bc90bf2c6 100644 --- a/drivers/infiniband/hw/ocrdma/ocrdma_main.c +++ b/drivers/infiniband/hw/ocrdma/ocrdma_main.c @@ -244,6 +244,7 @@ static int ocrdma_register_device(struct ocrdma_dev *dev) dev->ibdev.local_ca_ack_delay = dev->attr.local_ca_ack_delay; dev->ibdev.max_fast_reg_page_list_len = dev->attr.max_pages_per_frmr; dev->ibdev.max_pkeys = 1; + dev->ibdev.rdma_read_access_flags = IB_ACCESS_LOCAL_WRITE; return ib_register_device(&dev->ibdev, NULL); } diff --git a/drivers/infiniband/hw/qib/qib_verbs.c b/drivers/infiniband/hw/qib/qib_verbs.c index 9e1af0b9857a..be84307611c4 100644 --- a/drivers/infiniband/hw/qib/qib_verbs.c +++ b/drivers/infiniband/hw/qib/qib_verbs.c @@ -2257,6 +2257,7 @@ int qib_register_ib_device(struct qib_devdata *dd) ibdev->max_mcast_qp_attach = ib_qib_max_mcast_qp_attached; ibdev->max_total_mcast_qp_attach = ibdev->max_mcast_qp_attach * ibdev->max_mcast_grp; + ibdev->rdma_read_access_flags = IB_ACCESS_LOCAL_WRITE; snprintf(ibdev->node_desc, sizeof(ibdev->node_desc), "Intel Infiniband HCA %s", init_utsname()->nodename); diff --git a/drivers/staging/rdma/hfi1/verbs.c b/drivers/staging/rdma/hfi1/verbs.c index 664548a62e4d..d3e0dd27a8e2 100644 --- a/drivers/staging/rdma/hfi1/verbs.c +++ b/drivers/staging/rdma/hfi1/verbs.c @@ -2062,6 +2062,8 @@ int hfi1_register_ib_device(struct hfi1_devdata *dd) ibdev->max_mcast_qp_attach = hfi1_max_mcast_qp_attached; ibdev->max_total_mcast_qp_attach = ibdev->max_mcast_qp_attach * ibdev->max_mcast_grp; + ibdev->rdma_read_access_flags = IB_ACCESS_LOCAL_WRITE | + IB_ACCESS_REMOTE_WRITE; strncpy(ibdev->node_desc, init_utsname()->nodename, sizeof(ibdev->node_desc)); diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index 861d92f1a0b0..2abbdd976c18 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -1851,6 +1851,7 @@ struct ib_device { struct ib_odp_caps odp_caps; uint64_t timestamp_mask; uint64_t hca_core_clock; /* in KHZ */ + int rdma_read_access_flags; /** * The following mandatory functions are used only at device
Different devices may require different access permissions for rdma reads e.g. for Infiniband devices, local write access suffices while iWARP devices require remote write permissions as well. This situation generates extra logic for ULPs that need to be aware of the underlying device to determine rdma read access. Instead, add a device attribute for ULPs to use without being aware of the underlying device. Also, set rdma_read_access_flags in the relevant device drivers: mlx4, mlx5, qib, ocrdma, nes: IB_ACCESS_LOCAL_WRITE cxgb3, cxgb4, hfi: IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE Signed-off-by: Sagi Grimberg <sagig@mellanox.com> --- drivers/infiniband/hw/cxgb3/iwch_provider.c | 2 ++ drivers/infiniband/hw/cxgb4/provider.c | 2 ++ drivers/infiniband/hw/mlx4/main.c | 1 + drivers/infiniband/hw/mlx5/main.c | 1 + drivers/infiniband/hw/mthca/mthca_provider.c | 1 + drivers/infiniband/hw/nes/nes_verbs.c | 2 ++ drivers/infiniband/hw/ocrdma/ocrdma_main.c | 1 + drivers/infiniband/hw/qib/qib_verbs.c | 1 + drivers/staging/rdma/hfi1/verbs.c | 2 ++ include/rdma/ib_verbs.h | 1 + 10 files changed, 14 insertions(+)