Message ID | 1505648922-21346-1-git-send-email-yishaih@mellanox.com (mailing list archive) |
---|---|
State | RFC |
On Sun, Sep 17, 2017 at 02:48:42PM +0300, Yishai Hadas wrote:

> Resource domain represents a collection of IB resources (such as QPs,
> CQs, etc.) which are accessed by an application from the same context
> (e.g. the same thread) and as such can share hardware and software
> resources, to allow smarter management of resources by the provider
> library and better performance.

This sounds exactly like a PD to me. Why do we need another construct?

What is wrong with a 'thread-unsafe' flag during PD creation, and then
containing the shared resources in the PD?

Jason
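For context, a minimal sketch against today's verbs API showing why the
PD looks like the natural anchor: QPs, SRQs and MRs are already created
against a PD, while CQ creation takes no PD at all. The function name
and buffer arguments are placeholders, and error handling is omitted.

#include <infiniband/verbs.h>

/* Sketch with existing verbs calls: QPs and MRs already hang off a PD
 * at creation, which is why a per-PD threading flag seems natural.
 * CQs are the exception: ibv_create_cq() takes only the context. */
static void pd_groups_resources(struct ibv_context *ctx, void *buf, size_t len)
{
	struct ibv_pd *pd = ibv_alloc_pd(ctx);
	struct ibv_mr *mr = ibv_reg_mr(pd, buf, len, IBV_ACCESS_LOCAL_WRITE);
	struct ibv_cq *cq = ibv_create_cq(ctx, 256, NULL, NULL, 0); /* no PD */
	struct ibv_qp_init_attr attr = {
		.send_cq = cq, .recv_cq = cq,
		.cap = { .max_send_wr = 64, .max_recv_wr = 64,
			 .max_send_sge = 1, .max_recv_sge = 1 },
		.qp_type = IBV_QPT_RC,
	};
	struct ibv_qp *qp = ibv_create_qp(pd, &attr);	/* PD-scoped */

	ibv_destroy_qp(qp);
	ibv_destroy_cq(cq);
	ibv_dereg_mr(mr);
	ibv_dealloc_pd(pd);
}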
On Mon, Sep 18, 2017 at 11:54 PM, Jason Gunthorpe
<jgunthorpe@obsidianresearch.com> wrote:
> On Sun, Sep 17, 2017 at 02:48:42PM +0300, Yishai Hadas wrote:
>
>> Resource domain represents a collection of IB resources (such as QPs,
>> CQs, etc.) which are accessed by an application from the same context
>> (e.g. the same thread) and as such can share hardware and software
>> resources, to allow smarter management of resources by the provider
>> library and better performance.
>
> This sounds exactly like a PD to me. Why do we need another construct?
> What is wrong with a 'thread-unsafe' flag during PD creation, and then
> containing the shared resources in the PD?

PD does not fit well enough. It does not cover CQs in the allocation
flow, but is limited to QPs, SRQs and WQs.
Also, using multiple PDs would require a dedicated MR per PD, instead
of sharing a single MR for the entire memory used over multiple
threads / QPs.

>
>> uint32_t raw_packet_caps; /* Use ibv_raw_packet_caps */
>> + uint32_t max_resource_domains;
>
> Even with this approach, not sure a max makes much sense, this value should
> just be hashed into whatever range the provider has on a resource by
> resource basis.

By exposing max_resource_domains we let the application decide which
element gets hurt if it reaches that limit. If we leave the logic
hidden inside the libraries, the application loses control over the
trade-off between best performance for some resources and average
performance for all.

Alex
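To make the MR objection concrete, a short sketch using existing verbs
calls (the function and buffer names are placeholders, error handling
omitted): an MR belongs to exactly one PD, so per-thread PDs force the
same buffer to be registered once per PD.

#include <infiniband/verbs.h>

static void mr_per_pd(struct ibv_context *ctx, void *buf, size_t len)
{
	struct ibv_pd *pd_a = ibv_alloc_pd(ctx);	/* thread A */
	struct ibv_pd *pd_b = ibv_alloc_pd(ctx);	/* thread B */

	/* The same buffer must be registered once per PD; each call pins
	 * the pages again and consumes another MR. With one shared PD a
	 * single ibv_reg_mr() would serve both threads. */
	struct ibv_mr *mr_a = ibv_reg_mr(pd_a, buf, len, IBV_ACCESS_LOCAL_WRITE);
	struct ibv_mr *mr_b = ibv_reg_mr(pd_b, buf, len, IBV_ACCESS_LOCAL_WRITE);

	ibv_dereg_mr(mr_b);
	ibv_dereg_mr(mr_a);
	ibv_dealloc_pd(pd_b);
	ibv_dealloc_pd(pd_a);
}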
On Tue, Sep 19, 2017 at 05:05:11PM +0300, Alex Rosenbaum wrote:
> On Mon, Sep 18, 2017 at 11:54 PM, Jason Gunthorpe
> <jgunthorpe@obsidianresearch.com> wrote:
> > This sounds exactly like a PD to me. Why do we need another construct?
> > What is wrong with a 'thread-unsafe' flag during PD creation, and then
> > containing the shared resources in the PD?
>
> PD does not fit well enough. It does not cover CQs in the allocation
> flow, but is limited to QPs, SRQs and WQs.

Well, CQ could gain a PD parameter, so that isn't a big deal..

> Also, using multiple PDs would require a dedicated MR per PD, instead
> of sharing a single MR for the entire memory used over multiple
> threads / QPs.

This is a bigger problem, but it could be solved, if a bit messy..

Your concept patch requires adding a new input to every single object
creation function, which is a pretty big API change.. On the other
hand, the PD is already an input to almost all of the required
creation functions, so it is much less API churn.

But, as you say, you'd have to solve the MR sharing problem at least
to make PD workable. Perhaps some kind of 'child' PD that was only for
threading..

> >> uint32_t raw_packet_caps; /* Use ibv_raw_packet_caps */
> >> + uint32_t max_resource_domains;
> >
> > Even with this approach, not sure a max makes much sense, this value should
> > just be hashed into whatever range the provider has on a resource by
> > resource basis.
>
> By exposing max_resource_domains we let the application decide which
> element gets hurt if it reaches that limit. If we leave the logic
> hidden inside the libraries, the application loses control over the
> trade-off between best performance for some resources and average
> performance for all.

That goes both ways: a single 'max' does not make sense if there are
multiple different sorts of objects being controlled (eg UAR and
something else). It only works well in your context of having a single
constrained UAR.

Probably the better option is to introduce something like a 'struct
verbs_locking_domain *' input which is a proper object and can have a
range of operators.

I would say each created locking domain object is just strided across
the various limited resources in the device.

'dv' accessors, eg mlxdv_set_locking_uar(), can be provided to directly
assign specific limited resources under the control of the
application. 'dv' query functions can be used to directly learn the
specific limits for each specific thing. This way we don't have to
revisit things.

This would also be a good place to centralize all the crazy no
threading flags we have accumulated in various places and ways
(including the stupid environment variables). A 'struct
verbs_locking_domain' option that switches off all internal locking
would make sense too.

Jason
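One possible shape for that object is sketched below. Every name here
is hypothetical, illustrating the proposal only; none of it exists in
libibverbs.

/* Hypothetical API sketch for the proposed locking domain. */
struct verbs_locking_domain;	/* opaque, provider-allocated */

enum verbs_locking_domain_flags {
	/* switch off all internal locking for objects in this domain */
	VERBS_LD_NO_INTERNAL_LOCKING = 1 << 0,
};

struct verbs_locking_domain *
verbs_alloc_locking_domain(struct ibv_context *ctx, uint32_t flags);
int verbs_free_locking_domain(struct verbs_locking_domain *ld);

/* Provider 'dv' accessor pinning a specific limited resource, per the
 * mlxdv_set_locking_uar() suggestion above (illustrative signature): */
int mlxdv_set_locking_uar(struct verbs_locking_domain *ld, uint32_t uar_index);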
On Thu, Sep 21, 2017 at 4:41 AM, Jason Gunthorpe
<jgunthorpe@obsidianresearch.com> wrote:
> But, as you say, you'd have to solve the MR sharing problem at least
> to make PD workable. Perhaps some kind of 'child' PD that was only for
> threading..

OK, this definitely sounds like an interesting approach: alloc a child
ibv_pd, which inherits the parent properties and adds some limitations
(e.g. thread domain, memory domain).

Two options I'm thinking of in this spirit:

1) Extend the original alloc PD API (stay with a single PD alloc entry
point):

struct ibv_pd_init_attr {
	struct ibv_pd *parent_pd;	/* NULL to alloc a parent PD, or a
					 * parent PD to alloc a child PD */
	int flags;			/* e.g. IBV_PD_PARENT |
					 * IBV_PD_INIT_ATTR_THREAD_DOMAIN */
};

struct ibv_pd *ibv_alloc_pd_ex(struct ibv_context *context,
			       struct ibv_pd_init_attr *attr);

2) New dedicated alloc Child PD API:

struct ibv_pd *ibv_alloc_child_pd(struct ibv_pd *pd, int flags);
	/* flags: e.g. IBV_PD_INIT_ATTR_THREAD_DOMAIN */

> That goes both ways: a single 'max' does not make sense if there are
> multiple different sorts of objects being controlled (eg UAR and
> something else). It only works well in your context of having a single
> constrained UAR.

I agree.

> Probably the better option is to introduce something like a 'struct
> verbs_locking_domain *' input which is a proper object and can have a
> range of operators.

I did not understand how the 'struct verbs_locking_domain' comes into
play here.
In regards to the child PD, I'm assuming the 'struct ibv_pd' will be
encapsulated in a 'struct verbs_pd', and/or maybe in a 'struct
provider_pd'. Any linkage and dedicated resource should be stored in
the internal structures.

> I would say each created locking domain object is just strided across
> the various limited resources in the device.

With the above verbs API for a child PD on a dedicated thread, the
user gets a best-effort allocation of thread resources. I agree that
striding across resources sounds good.

> 'dv' accessors, eg mlxdv_set_locking_uar(), can be provided to directly
> assign specific limited resources under the control of the
> application. 'dv' query functions can be used to directly learn the
> specific limits for each specific thing.

int mlx5dv_set_pd_uar_index(struct ibv_pd *pd, int uar_index);

So with a DV API, the application can selectively choose a UAR index
on Mellanox HW for a child (or parent) PD. All objects created with
this PD will use the chosen UAR index or other thread options.
max_uar_index will be exposed by the DV device caps via
'mlx5dv_query_device()'. If the user exceeds max_uar_index, the above
function call will fail.

Alex
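Putting the pieces together, an application flow might look like the
sketch below. ibv_alloc_child_pd(), IBV_PD_INIT_ATTR_THREAD_DOMAIN and
mlx5dv_set_pd_uar_index() are proposed, not existing, API; the UAR
index value is arbitrary and error handling is omitted.

/* Parent PD owns the MRs; one child PD per thread carries the
 * thread-domain property and, via DV, an explicit UAR choice. */
struct ibv_pd *parent = ibv_alloc_pd(ctx);
struct ibv_mr *mr = ibv_reg_mr(parent, buf, len, IBV_ACCESS_LOCAL_WRITE);

struct ibv_pd *child = ibv_alloc_child_pd(parent,
					  IBV_PD_INIT_ATTR_THREAD_DOMAIN);
mlx5dv_set_pd_uar_index(child, 3);	/* fails if 3 > max_uar_index */

struct ibv_qp_init_attr_ex attr = {
	.pd = child,			/* MRs still resolve via the parent */
	.comp_mask = IBV_QP_INIT_ATTR_PD,
	/* remaining QP attributes as usual */
};
struct ibv_qp *qp = ibv_create_qp_ex(ctx, &attr);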
On Tue, Sep 26, 2017 at 12:09:12PM +0300, Alex Rosenbaum wrote:
> 2) New dedicated alloc Child PD API:
> struct ibv_pd *ibv_alloc_child_pd(struct ibv_pd *pd, int flags);

I think I prefer this one.

This sounds like a nice idea, but I wonder what a provider
implementation will look like now that there are two types of PD
objects and so forth.

The providers will need helpers: 'get_pd()' to return the top level pd
struct that has the driver data, and 'get_locking()' that returns the
thread properties associated with that PD. All accesses to the PD will
have to go through those accessors.

> > Probably the better option is to introduce something like a 'struct
> > verbs_locking_domain *' input which is a proper object and can have a
> > range of operators.
>
> I did not understand how the 'struct verbs_locking_domain' comes into
> play here.
> In regards to the child PD, I'm assuming the 'struct ibv_pd' will be
> encapsulated in a 'struct verbs_pd', and/or maybe in a 'struct
> provider_pd'. Any linkage and dedicated resource should be stored in
> the internal structures.

This was intended as an alternative. If the PD approach is no good,
then use this instead of just an integer.

In the PD case this object exists but is hidden from the uapi.
'get_locking()' returns it. eg a child pd would have a struct
verbs_locking_domain and the driver specific version as the driver
data of a pd object.

> int mlx5dv_set_pd_uar_index(struct ibv_pd *pd, int uar_index);
> So with a DV API, the application can selectively choose a UAR index
> on Mellanox HW for a child (or parent) PD.
> All objects created with this PD will use the chosen UAR index or
> other thread options.
> max_uar_index will be exposed by the DV device caps via
> 'mlx5dv_query_device()'. If the user exceeds max_uar_index, the above
> function call will fail.

Yes.

Jason
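A rough sketch of what those helpers could look like on the provider
side; the struct layout and names are hypothetical, and container_of()
is assumed available as in rdma-core's util headers.

/* Hypothetical provider-side layout: a child PD records its parent and
 * its locking properties; the helpers normalize access for the driver. */
struct verbs_pd {
	struct ibv_pd ibv_pd;
	struct verbs_pd *parent;		/* NULL for a top-level PD */
	struct verbs_locking_domain *locking;	/* NULL: default locking */
};

/* Top-level PD that carries the driver data (HW PDN etc.). */
static inline struct verbs_pd *get_pd(struct ibv_pd *pd)
{
	struct verbs_pd *vpd = container_of(pd, struct verbs_pd, ibv_pd);

	return vpd->parent ? vpd->parent : vpd;
}

/* Thread/locking properties to honor for objects created on this PD. */
static inline struct verbs_locking_domain *get_locking(struct ibv_pd *pd)
{
	return container_of(pd, struct verbs_pd, ibv_pd)->locking;
}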
diff --git a/libibverbs/verbs.h b/libibverbs/verbs.h
index 8cdf8ab..b2b51c1 100644
--- a/libibverbs/verbs.h
+++ b/libibverbs/verbs.h
@@ -272,6 +272,7 @@ struct ibv_device_attr_ex {
 	uint32_t		max_wq_type_rq;
 	struct ibv_packet_pacing_caps packet_pacing_caps;
 	uint32_t		raw_packet_caps; /* Use ibv_raw_packet_caps */
+	uint32_t		max_resource_domains;
 };
 
 enum ibv_mtu {
@@ -785,7 +786,8 @@ enum ibv_qp_init_attr_mask {
 	IBV_QP_INIT_ATTR_MAX_TSO_HEADER	= 1 << 3,
 	IBV_QP_INIT_ATTR_IND_TABLE	= 1 << 4,
 	IBV_QP_INIT_ATTR_RX_HASH	= 1 << 5,
-	IBV_QP_INIT_ATTR_RESERVED	= 1 << 6
+	IBV_QP_INIT_ATTR_RES_DOMAIN	= 1 << 6,
+	IBV_QP_INIT_ATTR_RESERVED	= 1 << 7
 };
 
 enum ibv_qp_create_flags {
@@ -821,6 +823,7 @@ struct ibv_qp_init_attr_ex {
 	struct ibv_rwq_ind_table       *rwq_ind_tbl;
 	struct ibv_rx_hash_conf	rx_hash_conf;
 	uint32_t		source_qpn;
+	uint32_t		resource_domain;
 };
 
 enum ibv_qp_open_attr_mask {
Resource domain represents a collection of IB resources (such as QPs,
CQs, etc.) which are accessed by an application from the same context
(e.g. the same thread) and as such can share hardware and software
resources, to allow smarter management of resources by the provider
library and better performance.

Today, without resource domains, provider libraries ensure thread
safety for all resources and implement some arbitrary logic regarding
hardware resource management (e.g. pre-allocated limit pools, dynamic
allocation, round-robin vs. LRU, etc.). We would like to allow the
application to explicitly specify which resources are going to be used
by which resource domains, and by that allow the libraries to improve
HW and SW resource utilization.

When an application creates several resources on the same resource
domain index, the provider library can share the related hardware
resources between them and as such better utilize its available
resources. On the other hand, if each resource is created by the
application on a dedicated resource domain index, it will use a
dedicated hardware resource.

For example, if a few QPs are used by the application from the same
thread, the hardware doorbell can be shared between them. This drops
the need to allocate a dedicated doorbell per QP or to share doorbells
which are handled by different threads. On the other hand, if a few
QPs are not logically related, an application can enjoy the option to
get a dedicated resource per QP and avoid the need to take a lock in
its data path flow to protect doorbell update operations.

This RFC patch introduces the following verbs extensions to support
that:

1) Extend ibv_query_device_ex to return max_resource_domains. This
attribute represents the range of valid [0..max_resource_domains]
device domains that resources can be part of. Each driver which
supports this feature will return a positive value, declaring its max
unique resource domains that can be used.

2) Extend ibv_create_qp_ex to get a resource_domain, to enable
associating a QP with a specific resource domain. Each library will
use the input value to manage the related hardware resources and
reduce the amount of locks required. The application is required to
use a single context when accessing any of the IB objects per each
resource domain.

Specifically, we plan to use this API in the mlx5 driver to better
manage the doorbell (UAR), which is a limited hardware resource, by
exposing an option from the kernel to allocate it on demand and share
it among QPs, or attach it to a specific QP, based on application
needs as represented by the resource domain property.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
---
 libibverbs/verbs.h | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
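For reference, a sketch of how an application would consume the two
extensions together, using the fields added by the diff above (other
setup abbreviated, error handling minimal):

struct ibv_device_attr_ex dev_attr;

if (ibv_query_device_ex(ctx, NULL, &dev_attr))
	return;
/* dev_attr.max_resource_domains == 0: no provider support */

struct ibv_qp_init_attr_ex attr = {
	.pd = pd,
	.send_cq = cq,
	.recv_cq = cq,
	.qp_type = IBV_QPT_RC,
	.cap = { .max_send_wr = 64, .max_recv_wr = 64,
		 .max_send_sge = 1, .max_recv_sge = 1 },
	.comp_mask = IBV_QP_INIT_ATTR_PD | IBV_QP_INIT_ATTR_RES_DOMAIN,
	/* QPs accessed from the same thread pass the same index so the
	 * provider can share one doorbell (UAR) between them */
	.resource_domain = 0,
};
struct ibv_qp *qp = ibv_create_qp_ex(ctx, &attr);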