diff mbox

[RFC,03/16] ib/mad: Add check for jumbo MADs support on a device

Message ID 1415908465-24392-4-git-send-email-ira.weiny@intel.com (mailing list archive)
State Superseded
Headers show

Commit Message

Ira Weiny Nov. 13, 2014, 7:54 p.m. UTC
From: Ira Weiny <ira.weiny@intel.com>

Check for IB_DEVICE_JUMBO_MAD_SUPPORT in the device capabilities and if
supported mark the special QPs created.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>
---
 drivers/infiniband/core/mad.c      | 26 +++++++++++++++++++++++---
 drivers/infiniband/core/mad_priv.h |  1 +
 2 files changed, 24 insertions(+), 3 deletions(-)

Comments

Or Gerlitz Nov. 27, 2014, 11:47 a.m. UTC | #1
On 11/13/2014 9:54 PM, ira.weiny@intel.com wrote:

> Add a device capability flag for OPA devices to signal their support of "jumbo"
> MADs.
>
> Check for IB_DEVICE_JUMBO_MAD_SUPPORT in the device capabilities and if
> supported mark the special QPs created.

Ira,

Few comments here, the device capability is OK, specifically if it helps 
for the mad
layer logic to be simpler and/or more robust.

You should add device attribute telling what is the size of MADs 
supported by the
device, I think the IBTA mandated 256Band in the OPA case, the driver 
will fill there 2k.

You can't just state that (jumbo == 2k) and go to complete design/changes
which use this hard-coded assumption. You need to see how to re-spin 
this series
w.o this assumption.

I find it very annoying that upper level drivers replicate in different 
ways elements
from the IB device attributes returned by ib_query_device. I met that in 
multiple
drivers and upcoming designs for which I do code review. Are you up to 
come up
with a patch that caches the device attributes on the device structure? 
if not,
I can do that.. and have your code to see it.

Also, I would go and unify (squash) this patch and the preceding one.


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sagi Grimberg Nov. 27, 2014, 1:51 p.m. UTC | #2
On 11/27/2014 1:47 PM, Or Gerlitz wrote:
> On 11/13/2014 9:54 PM, ira.weiny@intel.com wrote:
>
>> Add a device capability flag for OPA devices to signal their support
>> of "jumbo"
>> MADs.
>>
>> Check for IB_DEVICE_JUMBO_MAD_SUPPORT in the device capabilities and if
>> supported mark the special QPs created.
>
> Ira,
>
> Few comments here, the device capability is OK, specifically if it helps
> for the mad
> layer logic to be simpler and/or more robust.
>
> You should add device attribute telling what is the size of MADs
> supported by the
> device, I think the IBTA mandated 256Band in the OPA case, the driver
> will fill there 2k.
>
> You can't just state that (jumbo == 2k) and go to complete design/changes
> which use this hard-coded assumption. You need to see how to re-spin
> this series
> w.o this assumption.

Why do we need both? can't we just rely on just the size?

We're already short on bits in device_cap_flags so maybe we can do
without?

Sagi.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Or Gerlitz Nov. 27, 2014, 2:59 p.m. UTC | #3
On 11/27/2014 3:51 PM, Sagi Grimberg wrote:
> We're already short on bits in device_cap_flags

no shortage @ the kernel... we can add more 32 bits if/when we need it
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ira Weiny Dec. 8, 2014, 12:23 a.m. UTC | #4
> 
> On 11/13/2014 9:54 PM, ira.weiny@intel.com wrote:
> 
> > Add a device capability flag for OPA devices to signal their support of "jumbo"
> > MADs.
> >
> > Check for IB_DEVICE_JUMBO_MAD_SUPPORT in the device capabilities and
> > if supported mark the special QPs created.
> 
> Ira,
> 
> Few comments here, the device capability is OK, specifically if it helps for the
> mad layer logic to be simpler and/or more robust.
> 
> You should add device attribute telling what is the size of MADs supported by
> the device, I think the IBTA mandated 256Band in the OPA case, the driver will
> fill there 2k.

I don't think this is a bad idea, however, the complication is determining the kmem_cache object size in that case.  How about we limit this value to < 2K until such time as there is a need for larger support?

> 
> You can't just state that (jumbo == 2k) and go to complete design/changes
> which use this hard-coded assumption. You need to see how to re-spin this
> series w.o this assumption.

I'm looking into it.  I believe I will still need to have a capability bit, but it may end up having a different meaning.

> 
> I find it very annoying that upper level drivers replicate in different ways
> elements from the IB device attributes returned by ib_query_device. I met that
> in multiple drivers and upcoming designs for which I do code review. Are you
> up to come up with a patch that caches the device attributes on the device
> structure?

I don't follow what you are asking for.  Could you give more details?

> if not,
> I can do that.. and have your code to see it.
> 
> Also, I would go and unify (squash) this patch and the preceding one.
> 

I'll keep this in mind as I rework the patches.

Ira

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Roland Dreier Dec. 8, 2014, 12:47 a.m. UTC | #5
On Sun, Dec 7, 2014 at 4:23 PM, Weiny, Ira <ira.weiny@intel.com> wrote:
> I don't think this is a bad idea, however, the complication is determining the kmem_cache object size in that case.  How about we limit this value to < 2K until such time as there is a need for larger support?

Can we just get rid of all the kmem_caches used in the MAD code?  I
don't think any of this is so performance-critical that we couldn't
just use kmalloc...

 - R.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Or Gerlitz Dec. 8, 2014, 11:29 a.m. UTC | #6
On Mon, Dec 8, 2014 at 2:23 AM, Weiny, Ira <ira.weiny@intel.com> wrote:

>> I find it very annoying that upper level drivers replicate in different ways
>> elements from the IB device attributes returned by ib_query_device. I met that
>> in multiple drivers and upcoming designs for which I do code review. Are you
>> up to come up with a patch that caches the device attributes on the device
>> structure?

> I don't follow what you are asking for.  Could you give more details?

1. add a struct ib_device_attr field to struct ib_device

2. when the device registers itself with the IB core, go and run the
query_device verb with the param being pointer to that field
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ira Weiny Dec. 9, 2014, 11:23 p.m. UTC | #7
PiANCj4gT24gU3VuLCBEZWMgNywgMjAxNCBhdCA0OjIzIFBNLCBXZWlueSwgSXJhIDxpcmEud2Vp
bnlAaW50ZWwuY29tPiB3cm90ZToNCj4gPiBJIGRvbid0IHRoaW5rIHRoaXMgaXMgYSBiYWQgaWRl
YSwgaG93ZXZlciwgdGhlIGNvbXBsaWNhdGlvbiBpcyBkZXRlcm1pbmluZyB0aGUNCj4ga21lbV9j
YWNoZSBvYmplY3Qgc2l6ZSBpbiB0aGF0IGNhc2UuICBIb3cgYWJvdXQgd2UgbGltaXQgdGhpcyB2
YWx1ZSB0byA8IDJLDQo+IHVudGlsIHN1Y2ggdGltZSBhcyB0aGVyZSBpcyBhIG5lZWQgZm9yIGxh
cmdlciBzdXBwb3J0Pw0KPiANCj4gQ2FuIHdlIGp1c3QgZ2V0IHJpZCBvZiBhbGwgdGhlIGttZW1f
Y2FjaGVzIHVzZWQgaW4gdGhlIE1BRCBjb2RlPyAgSSBkb24ndA0KPiB0aGluayBhbnkgb2YgdGhp
cyBpcyBzbyBwZXJmb3JtYW5jZS1jcml0aWNhbCB0aGF0IHdlIGNvdWxkbid0IGp1c3QgdXNlIGtt
YWxsb2MuLi4NCj4gDQoNCk9uIGFuIFNNIG5vZGUgdGhlIE1BRCBjb2RlIGlzIF92ZXJ5XyBwZXJm
b3JtYW5jZS1jcml0aWNhbC4gIEkgZG9u4oCZdCB0aGluayB3ZSBzaG91bGQgZ2l2ZSB0aGF0IHVw
Lg0KDQpBbm90aGVyIGFsdGVybmF0aXZlIGlzIHRvIGhhdmUgYSBjYWNoZSBwZXIgZGV2aWNlIGJh
c2VkIG9uIHRoZSBzaXplIHJlcG9ydGVkLg0KDQotLSBJcmENCg0K
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ira Weiny Jan. 7, 2015, 4:32 p.m. UTC | #8
> 
> On 11/27/2014 3:51 PM, Sagi Grimberg wrote:
> > We're already short on bits in device_cap_flags
> 
> no shortage @ the kernel... we can add more 32 bits if/when we need it

Why is there a gap in the bits?

enum ib_device_cap_flags {
...
        IB_DEVICE_MEM_WINDOW_TYPE_2B    = (1<<24), 
        IB_DEVICE_MANAGED_FLOW_STEERING = (1<<29),
...
};

25-28 are not used?  Is there some legacy software out there which uses these?

Ira

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Or Gerlitz Jan. 8, 2015, 11:41 a.m. UTC | #9
On 1/7/2015 6:32 PM, Weiny, Ira wrote:
>> >On 11/27/2014 3:51 PM, Sagi Grimberg wrote:
>>> > >We're already short on bits in device_cap_flags
>> >
>> >no shortage @ the kernel... we can add more 32 bits if/when we need it
> Why is there a gap in the bits?
>
> enum ib_device_cap_flags {
> ...
>          IB_DEVICE_MEM_WINDOW_TYPE_2B    = (1<<24),
>          IB_DEVICE_MANAGED_FLOW_STEERING = (1<<29),
> ...
> };
>
> 25-28 are not used?  Is there some legacy software out there which uses these?

sort of... the verbs RSS design which wasn't accepted upstream uses bits 
25/26/27 and Mellanox Core-Direct (whose verbs model called 
"Cross-Channel") which wasn't submitted upstream yet uses bit 28. Since 
in the kernel we can trivially use 64 bits, I would opt to keep these 
entries free for now.

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index 4673262..9f5641d 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -2855,7 +2855,8 @@  static void init_mad_queue(struct ib_mad_qp_info *qp_info,
 }
 
 static void init_mad_qp(struct ib_mad_port_private *port_priv,
-			struct ib_mad_qp_info *qp_info)
+			struct ib_mad_qp_info *qp_info,
+			int supports_jumbo_mads)
 {
 	qp_info->port_priv = port_priv;
 	init_mad_queue(qp_info, &qp_info->send_queue);
@@ -2865,6 +2866,7 @@  static void init_mad_qp(struct ib_mad_port_private *port_priv,
 	qp_info->snoop_table = NULL;
 	qp_info->snoop_table_size = 0;
 	atomic_set(&qp_info->snoop_count, 0);
+	qp_info->supports_jumbo_mads = supports_jumbo_mads;
 }
 
 static int create_mad_qp(struct ib_mad_qp_info *qp_info,
@@ -2911,6 +2913,17 @@  static void destroy_mad_qp(struct ib_mad_qp_info *qp_info)
 	kfree(qp_info->snoop_table);
 }
 
+static int
+mad_device_supports_jumbo_mads(struct ib_device *device)
+{
+	struct ib_device_attr attr;
+
+	if (!ib_query_device(device, &attr))
+		return (attr.device_cap_flags
+			& IB_DEVICE_JUMBO_MAD_SUPPORT);
+	return 0;
+}
+
 /*
  * Open the port
  * Create the QP, PD, MR, and CQ if needed
@@ -2923,6 +2936,7 @@  static int ib_mad_port_open(struct ib_device *device,
 	unsigned long flags;
 	char name[sizeof "ib_mad123"];
 	int has_smi;
+	int supports_jumbo_mads;
 
 	/* Create new device info */
 	port_priv = kzalloc(sizeof *port_priv, GFP_KERNEL);
@@ -2935,8 +2949,14 @@  static int ib_mad_port_open(struct ib_device *device,
 	port_priv->port_num = port_num;
 	spin_lock_init(&port_priv->reg_lock);
 	INIT_LIST_HEAD(&port_priv->agent_list);
-	init_mad_qp(port_priv, &port_priv->qp_info[0]);
-	init_mad_qp(port_priv, &port_priv->qp_info[1]);
+
+	supports_jumbo_mads = mad_device_supports_jumbo_mads(device);
+	if (supports_jumbo_mads)
+		pr_info("Jumbo MAD support enabled for %s:%d\n",
+				device->name, port_num);
+
+	init_mad_qp(port_priv, &port_priv->qp_info[0], supports_jumbo_mads);
+	init_mad_qp(port_priv, &port_priv->qp_info[1], supports_jumbo_mads);
 
 	cq_size = mad_sendq_size + mad_recvq_size;
 	has_smi = rdma_port_get_link_layer(device, port_num) == IB_LINK_LAYER_INFINIBAND;
diff --git a/drivers/infiniband/core/mad_priv.h b/drivers/infiniband/core/mad_priv.h
index d1a0b0e..4b4110d 100644
--- a/drivers/infiniband/core/mad_priv.h
+++ b/drivers/infiniband/core/mad_priv.h
@@ -192,6 +192,7 @@  struct ib_mad_qp_info {
 	struct ib_mad_snoop_private **snoop_table;
 	int snoop_table_size;
 	atomic_t snoop_count;
+	int supports_jumbo_mads;
 };
 
 struct ib_mad_port_private {