| Message ID | 1412728888-13100-1-git-send-email-jkallickal@emulex.com (mailing list archive) |
|---|---|
| State | Rejected |
On 10/8/2014 3:41 AM, Jay Kallickal wrote:
> From: Jayamohan Kallickal <jayamohank@gmail.com>
>
> This patch allows the underlying hardware to choose
> values other than hard coded max values for cqe and send_wr
> while preventing them from exceeding max supported values.

Hi Minh and Jayamohan,

So I agree that we would want to take device capabilities into account
here, but we need to be able to adjust scsi_cmds_max (can_queue) in case
the max wqe supported is lower than scsi_cmds_max * num_posts_per_cmd.

So generally I agree with this approach, but we need to take care of
stuff later when the session is created.

One more thing, this is not rebased on the latest iser patches. Please
send v1 on top of:
http://marc.info/?l=linux-rdma&m=141216135013146&w=2

P.S.
What device did you test with (that supports less than iSER needs)?

Thanks!
Sagi.

> Signed-off-by: Minh Tran <minhduc.tran@emulex.com>
> Signed-off-by: Jayamohan Kallickal <jayamohan.kallickal@emulex.com>
> ---
>  drivers/infiniband/ulp/iser/iser_verbs.c | 18 ++++++++++++++----
>  1 file changed, 14 insertions(+), 4 deletions(-)
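The adjustment Sagi describes could look roughly like the following
minimal sketch, assuming kernel context. The helper name and the
posts_per_cmd factor are illustrative, not actual iser code: the idea is
to clamp the session queue depth so that cmds_max * posts_per_cmd never
exceeds the device's maximum WQEs per QP.

#include <linux/kernel.h>
#include <rdma/ib_verbs.h>

/* Hypothetical helper: return a cmds_max value the device can actually back. */
static unsigned int iser_clamp_cmds_max(struct ib_device_attr *dev_attr,
					unsigned int cmds_max,
					unsigned int posts_per_cmd)
{
	unsigned int hw_limit = dev_attr->max_qp_wr / posts_per_cmd;

	if (cmds_max > hw_limit) {
		pr_warn("iser: lowering cmds_max %u -> %u (device max_qp_wr %d)\n",
			cmds_max, hw_limit, dev_attr->max_qp_wr);
		cmds_max = hw_limit;
	}
	return cmds_max;
}

A caller at session creation time would then feed the clamped value into
can_queue instead of the hard coded maximum.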
On 10/9/2014 8:14 AM, Jayamohan.K wrote:

<SNIP>

>> Hi Minh and Jayamohan,
>>
>> So I agree that we would want to take device capabilities into account
>> here, but we need to be able to adjust scsi_cmds_max (can_queue) in case
>> the max wqe supported is lower than scsi_cmds_max * num_posts_per_cmd.
>
> I feel we should be fine as long as we support the max_cmds of 128
> supported by open-iscsi layer.

The cmds_max can be modified by the user, so we should be fine in this
case as well.

> I see the iser layer uses a value of 512 though it only serves as a
> limit check.

So if iser supports less than 512 (due to device capability) the
boundary check should be modified as well.

>> So generally I agree with this approach, but we need to take care of
>> stuff later when the session is created.
>>
>> One more thing, this is not rebased on the latest iser patches. Please
>> send v1 on top of:
>> http://marc.info/?l=linux-rdma&m=141216135013146&w=2
>
> Yes, I will recreate the patch and send on top of this

Great!

>> P.S.
>> What device did you test with (that supports less than iSER needs)?

This question still holds.

Sagi.
On 10/8/2014 2:41 AM, Jay Kallickal wrote:
> This patch allows the underlying hardware to choose
> values other than hard coded max values for cqe and send_wr
> while preventing them from exceeding max supported values.

These values are not "just" hard coded. If your HW driver (ocrdma?)
can't support them, the way to fix that is different, e.g. use a smaller
number of commands per session.

I don't see the point in rebasing the patch before we have a small
design discussion here.
Hi Or Gerlitz,

I am new to IB/iser so don't know much about the history of all these max
settings being #define instead of taking the real numbers from querying
the HW. Yes, our HW driver is the ocrdma, which distributes its number of
CQEs per CQ across up to 32 CQs. There is a missing adjustable knob in
iser to fine tune according to the underlying HW. To give you some idea
of how these values are defined now, here are the numbers and my added
comments:

ISER_MAX_RX_CQ_LEN    4096   /* This number should be calculated during create_session */
ISER_QP_MAX_RECV_DTOS 512    /* Why can't we use ISCSI_DEF_XMIT_CMDS_MAX here? */
ISER_MAX_TX_CQ_LEN    36944  /* the mlx4 hw supports up to 3 CQ, but the ocrdma hw
                                supports up to 32 CQ with a lower number of CQEs per CQ */
ISER_QP_MAX_REQ_DTOS  4618
ISCSI_ISER_MAX_CONN   8      /* I am not sure what this 8 connections per CQ is.
                                Open-iscsi supports 1 connection per session, so this
                                implies one of two things:
                                1- mlx4 is limited to 8 sessions per CQ
                                2- mlx4 is doing something proprietary in the hw to have
                                   multiple QPs per session */
ISER_MAX_CQ           4      /* Should this number be much higher, or based on the number
                                of CPU cores on the system to distribute CQ processing
                                per core? */

We are open for suggestions.

-Minh

-----Original Message-----
From: Or Gerlitz [mailto:ogerlitz@mellanox.com]
Sent: Tuesday, October 14, 2014 12:51 AM
To: Jay Kallickal; Minh Duc Tran
Cc: michaelc@cs.wisc.edu; linux-rdma@vger.kernel.org; Jayamohan Kallickal
Subject: Re: [PATCH 1/1] IB/iser: Remove hard coded values for cqe and send_wr

On 10/8/2014 2:41 AM, Jay Kallickal wrote:
> This patch allows the underlying hardware to choose values other than
> hard coded max values for cqe and send_wr while preventing them from
> exceeding max supported values.

These values are not "just" hard coded. If your HW driver (ocrdma?) can't
support them, the way to fix that is different, e.g. use a smaller number
of commands per session.

I don't see the point in rebasing the patch before we have a small design
discussion here.
How many CQEs per CQ does the ocrdma driver support?
With the HW and fw profile we are running with the ocrdma currently, it's
8k per CQ. This number could change if we run on a different hw or fw
profile.

-----Original Message-----
From: Or Gerlitz [mailto:gerlitz.or@gmail.com]
Sent: Wednesday, October 15, 2014 3:32 PM
To: Minh Duc Tran
Cc: Or Gerlitz; Jay Kallickal; michaelc@cs.wisc.edu; linux-rdma@vger.kernel.org; Jayamohan Kallickal
Subject: Re: [PATCH 1/1] IB/iser: Remove hard coded values for cqe and send_wr

How many CQEs per CQ does the ocrdma driver support?
On Thu, Oct 16, 2014 at 1:41 AM, Minh Duc Tran <MinhDuc.Tran@emulex.com> wrote:
> With the HW and fw profile we are running with the ocrdma currently,
> it's 8k per CQ. This number could change if we run on a different hw or
> fw profile.

OK. So CQEs per CQ wise, there's nothing in the ocrdma (hw/sw/fw) which is
extremely different. The more major difference is the relatively small
number of CQs per device your driver can support.

Sorry for being a bit short and not explaining everything, I'm at LPC 2014
so a bit busy... but as far as I see this, here's the list of TODO items:

1. Change the number of CQs to be min(num_cpus, 1/2 of what the device
   can support).
2. Add the # of SCSI commands per session, and whether immediate data is
   supported for the session, to ep_connect_with_params.

Sagi, agree?

#1 is pretty easy and we actually have it ready for 3.19.

#2 should be easy too. Max, please add it to your TODO for the ep connect
changes.

Also please use TEXT based replies applied in bottom posting manner [1].

Or.

[1] http://en.wikipedia.org/wiki/Posting_style#Bottom-posting

> -----Original Message-----
> From: Or Gerlitz [mailto:gerlitz.or@gmail.com]
> Sent: Wednesday, October 15, 2014 3:32 PM
> To: Minh Duc Tran
> Cc: Or Gerlitz; Jay Kallickal; michaelc@cs.wisc.edu; linux-rdma@vger.kernel.org; Jayamohan Kallickal
> Subject: Re: [PATCH 1/1] IB/iser: Remove hard coded values for cqe and send_wr
>
> How many CQEs per CQ does the ocrdma driver support?
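TODO item #1 could be sketched as below, assuming kernel context; the
helper name is illustrative, not code proposed in this thread. It simply
takes the smaller of the CPU count and half the number of CQs the device
reports it can create.

#include <linux/kernel.h>
#include <linux/cpumask.h>
#include <rdma/ib_verbs.h>

/* Hypothetical helper: how many CQs (completion contexts) to allocate. */
static int iser_num_cqs(struct ib_device_attr *dev_attr)
{
	return min_t(int, num_online_cpus(), dev_attr->max_cq / 2);
}

On a device like ocrdma with 32 CQs this yields min(num_cpus, 16), while
devices exposing many CQs are capped by the CPU count, which matches the
per-core CQ processing Minh asked about.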
On 10/15/2014 12:53 AM, Minh Duc Tran wrote:
> Hi Or Gerlitz,
> I am new to IB/iser so don't know much about the history of all these
> max settings being #define instead of taking the real numbers from
> querying the HW. Yes, our HW driver is the ocrdma, which distributes its
> number of CQEs per CQ across up to 32 CQs. There is a missing adjustable
> knob in iser to fine tune according to the underlying HW. To give you
> some idea of how these values are defined now, here are the numbers and
> my added comments:

Hey Minh,

> ISER_MAX_RX_CQ_LEN 4096 /* This number should be calculated during create_session */

So in iSER, CQs are shared across the device - so this number should
satisfy the maximum number of connections per CQ (which is currently 8).

> ISER_QP_MAX_RECV_DTOS 512 /* Why can't we use ISCSI_DEF_XMIT_CMDS_MAX here? */

iSER creates the connection QP before the session is created - so it
doesn't know what the user will set as cmds_max (which is potentially
larger than ISCSI_DEF_XMIT_CMDS_MAX). So we allow 512 at the moment and
adjust the session cmds_max accordingly. I agree that this is a
work-around for the moment, as we don't know at QP creation time what the
user setting of cmds_max will be.

> ISER_MAX_TX_CQ_LEN 36944 /* the mlx4 hw supports up to 3 CQ, but the
> ocrdma hw supports up to 32 CQ with a lower number of CQEs per CQ */

What led you to conclude that "the mlx4 hw supports up to 3 CQ"? TX CQ
length should be

> ISER_QP_MAX_REQ_DTOS 4618
> ISCSI_ISER_MAX_CONN 8 /* I am not sure what this 8 connections per CQ
> is. Open-iscsi supports 1 connection per session, so this implies one of
> two things:
> 1- mlx4 is limited to 8 sessions per CQ
> 2- mlx4 is doing something proprietary in the hw to have multiple QPs
>    per session */

As I said, CQs are per device and shared across iscsi connections. So each
CQ will support up to 8 connections. I agree we should allocate more CQs
in case more connections are opened - but we never got any CQ to overrun
(even under high stress), so this is still on my todo list...

> ISER_MAX_CQ 4 /* Should this number be much higher, or based on the
> number of cpu cores on the system to distribute CQ processing per core? */

I completely agree, this is a legacy MAX ceiling. I have a patch for that
pending at Or's table. I am all for getting it into 3.18/3.19.

> We are open for suggestions.

OK, I'll respond to Or's reply on the TODOs.

Sagi.
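For reference, the sizing scheme Sagi describes - each shared CQ sized for
the full queue depth of 8 connections - is consistent with the numbers
Minh quoted. Roughly, and only as an illustration of the arithmetic rather
than a copy of the driver header:

#define ISCSI_ISER_MAX_CONN	8	/* connections sharing one CQ */
#define ISER_QP_MAX_RECV_DTOS	512	/* recv WRs per connection QP */
#define ISER_QP_MAX_REQ_DTOS	4618	/* send WRs per connection QP */

/* CQ lengths cover the worst case of every connection posting its full depth */
#define ISER_MAX_RX_CQ_LEN	(ISER_QP_MAX_RECV_DTOS * ISCSI_ISER_MAX_CONN)	/* 512 * 8 = 4096  */
#define ISER_MAX_TX_CQ_LEN	(ISER_QP_MAX_REQ_DTOS * ISCSI_ISER_MAX_CONN)	/* 4618 * 8 = 36944 */

So 36944 is not an mlx4-specific property; it is the per-QP send queue
depth multiplied by the number of connections a CQ is expected to serve.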
On 10/16/2014 8:31 AM, Or Gerlitz wrote:
> On Thu, Oct 16, 2014 at 1:41 AM, Minh Duc Tran <MinhDuc.Tran@emulex.com> wrote:
>> With the HW and fw profile we are running with the ocrdma currently,
>> it's 8k per CQ. This number could change if we run on a different hw or
>> fw profile.
>
> OK. So CQEs per CQ wise, there's nothing in the ocrdma (hw/sw/fw) which
> is extremely different. The more major difference is the relatively
> small number of CQs per device your driver can support.
>
> Sorry for being a bit short and not explaining everything, I'm at LPC
> 2014 so a bit busy... but as far as I see this, here's the list of TODO
> items:
>
> 1. Change the number of CQs to be min(num_cpus, 1/2 of what the device
>    can support).
> 2. Add the # of SCSI commands per session, and whether immediate data is
>    supported for the session, to ep_connect_with_params.
>
> Sagi, agree?
>
> #1 is pretty easy and we actually have it ready for 3.19.

Maybe even 3.18?

> #2 should be easy too. Max, please add it to your TODO for the ep
> connect changes.

I don't think we need it in ep_connect.
We can create CQs/QPs with min(desired, device_support), just keep the
sizes, and adjust the session cmds_max at session creation time, as well
as the max QPs per CQ.

What I am concerned about is that we don't enforce max QPs per CQ. We have
never seen this overrun - but there is no reason why it couldn't happen. I
have it on my todo list, but we need to take care of it soon. This issue
would be somewhat relaxed if we got rid of the artificial ISER_MAX_CQ.

Sagi.
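The missing enforcement Sagi mentions could be sketched along these lines,
with assumed names (active_qps[] and the per-CQ budget constant are
illustrative): pick the least-loaded CQ, as iser already does, but refuse
to go past the connection count the CQ length was sized for.

#include <linux/errno.h>

#define SKETCH_MAX_CONN_PER_CQ	8	/* the budget each CQ's length assumes */

/* Hypothetical helper: choose a CQ index for a new QP, or fail if all are full. */
static int sketch_pick_cq(int *active_qps, int num_cqs)
{
	int i, min_index = 0;

	for (i = 0; i < num_cqs; i++)
		if (active_qps[i] < active_qps[min_index])
			min_index = i;

	if (active_qps[min_index] >= SKETCH_MAX_CONN_PER_CQ)
		return -ENOSPC;	/* would risk overrunning the CQ */

	active_qps[min_index]++;
	return min_index;
}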
On Sun, Oct 19, 2014 at 6:50 PM, Sagi Grimberg <sagig@dev.mellanox.co.il> wrote:
> On 10/16/2014 8:31 AM, Or Gerlitz wrote:
>> On Thu, Oct 16, 2014 at 1:41 AM, Minh Duc Tran <MinhDuc.Tran@emulex.com> wrote:
>>> With the HW and fw profile we are running with the ocrdma currently,
>>> it's 8k per CQ. This number could change if we run on a different hw
>>> or fw profile.
>>
>> OK. So CQEs per CQ wise, there's nothing in the ocrdma (hw/sw/fw) which
>> is extremely different. The more major difference is the relatively
>> small number of CQs per device your driver can support.
>>
>> Sorry for being a bit short and not explaining everything, I'm at LPC
>> 2014 so a bit busy... but as far as I see this, here's the list of TODO
>> items:
>>
>> 1. Change the number of CQs to be min(num_cpus, 1/2 of what the device
>>    can support).
>> 2. Add the # of SCSI commands per session, and whether immediate data
>>    is supported for the session, to ep_connect_with_params.
>>
>> Sagi, agree?
>>
>> #1 is pretty easy and we actually have it ready for 3.19.
>
> Maybe even 3.18?

It doesn't fix anything, I don't really see the point.

>> #2 should be easy too. Max, please add it to your TODO for the ep
>> connect changes.
>
> I don't think we need it in ep_connect.
> We can create CQs/QPs with min(desired, device_support), just keep the
> sizes, and adjust the session cmds_max at session creation time, as well
> as the max QPs per CQ.

By "desired" you mean a hard coded maximum on the session cmds_max as we
have today (512)?

> What I am concerned about is that we don't enforce max QPs per CQ. We
> have never seen this overrun - but there is no reason why it couldn't
> happen. I have it on my todo list, but we need to take care of it soon.
> This issue would be somewhat relaxed if we got rid of the artificial
> ISER_MAX_CQ.

I haven't seen this come into play either. But I tend to agree that once
we go for the larger # of CQs, we should be in a more robust state.
diff --git a/drivers/infiniband/ulp/iser/iser_verbs.c b/drivers/infiniband/ulp/iser/iser_verbs.c
index 32849f2..7cdb297 100644
--- a/drivers/infiniband/ulp/iser/iser_verbs.c
+++ b/drivers/infiniband/ulp/iser/iser_verbs.c
@@ -73,7 +73,7 @@ static int iser_create_device_ib_res(struct iser_device *device)
 {
 	struct iser_cq_desc *cq_desc;
 	struct ib_device_attr *dev_attr = &device->dev_attr;
-	int ret, i, j;
+	int ret, i, j, max_cqe;
 
 	ret = ib_query_device(device->ib_device, dev_attr);
 	if (ret) {
@@ -120,18 +120,24 @@ static int iser_create_device_ib_res(struct iser_device *device)
 		cq_desc[i].device   = device;
 		cq_desc[i].cq_index = i;
 
+		max_cqe = (dev_attr->max_cqe < ISER_MAX_RX_CQ_LEN) ?
+			   dev_attr->max_cqe : ISER_MAX_RX_CQ_LEN;
+
 		device->rx_cq[i] = ib_create_cq(device->ib_device,
 					  iser_cq_callback,
 					  iser_cq_event_callback,
 					  (void *)&cq_desc[i],
-					  ISER_MAX_RX_CQ_LEN, i);
+					  max_cqe, i);
 		if (IS_ERR(device->rx_cq[i]))
 			goto cq_err;
 
+		max_cqe = (dev_attr->max_cqe < ISER_MAX_TX_CQ_LEN) ?
+			   dev_attr->max_cqe : ISER_MAX_TX_CQ_LEN;
+
 		device->tx_cq[i] = ib_create_cq(device->ib_device,
 					  NULL, iser_cq_event_callback,
 					  (void *)&cq_desc[i],
-					  ISER_MAX_TX_CQ_LEN, i);
+					  max_cqe, i);
 
 		if (IS_ERR(device->tx_cq[i]))
 			goto cq_err;
@@ -439,6 +445,7 @@ void iser_free_fastreg_pool(struct iser_conn *ib_conn)
 static int iser_create_ib_conn_res(struct iser_conn *ib_conn)
 {
 	struct iser_device	*device;
+	struct ib_device_attr	*dev_attr;
 	struct ib_qp_init_attr	init_attr;
 	int			ret = -ENOMEM;
 	int index, min_index = 0;
@@ -459,6 +466,7 @@ static int iser_create_ib_conn_res(struct iser_conn *ib_conn)
 	mutex_unlock(&ig.connlist_mutex);
 	iser_info("cq index %d used for ib_conn %p\n", min_index, ib_conn);
 
+	dev_attr = &device->dev_attr;
 	init_attr.event_handler = iser_qp_event_callback;
 	init_attr.qp_context	= (void *)ib_conn;
 	init_attr.send_cq	= device->tx_cq[min_index];
@@ -472,7 +480,9 @@ static int iser_create_ib_conn_res(struct iser_conn *ib_conn)
 		init_attr.cap.max_send_wr  = ISER_QP_SIG_MAX_REQ_DTOS;
 		init_attr.create_flags |= IB_QP_CREATE_SIGNATURE_EN;
 	} else {
-		init_attr.cap.max_send_wr  = ISER_QP_MAX_REQ_DTOS;
+		init_attr.cap.max_send_wr  =
+			(dev_attr->max_qp_wr < ISER_QP_MAX_REQ_DTOS) ?
+			 dev_attr->max_qp_wr : ISER_QP_MAX_REQ_DTOS;
 	}
 
 	ret = rdma_create_qp(ib_conn->cma_id, device->pd, &init_attr);