Message ID | 5706E548.3080002@sandisk.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
On Thu, 2016-04-07 at 15:55 -0700, Bart Van Assche wrote: > That patch causes the ib_srpt driver to crash as soon as the first > SCSI command is received. This means that that patch was untested. > Hence revert it. The shortcomings of that patch are as follows: > - It makes the ib_srpt driver use I/O contexts allocated by > transport_alloc_session_tags() but it does not initialize these > I/O contexts properly. All the initializations performed by > srpt_alloc_ioctx() are skipped. > - It swaps the order of the send ioctx allocation and the transition > to RTR mode which is wrong. > - The amount of memory that is needed for I/O contexts is doubled. > - srpt_rdma_ch.free_list is no longer used but is not removed. > > Revert commit 0fd10721fe36 and thereby fix the following kernel crash: > > kernel BUG at drivers/infiniband/ulp/srpt/ib_srpt.c:1439! > invalid opcode: 0000 [#1] SMP > Workqueue: target_completion target_complete_ok_work [target_core_mod] > RIP: 0010:[<ffffffffa052ef37>] [<ffffffffa052ef37>] srpt_queue_response+0x437/0x4a0 [ib_srpt] > Call Trace: > [<ffffffffa052f009>] srpt_queue_data_in+0x9/0x10 [ib_srpt] > [<ffffffffa04f1ee2>] target_complete_ok_work+0x152/0x2b0 [target_core_mod] > [<ffffffff81071ea7>] process_one_work+0x197/0x480 > [<ffffffff810721d9>] worker_thread+0x49/0x490 > [<ffffffff8107878a>] kthread+0xea/0x100 > [<ffffffff8159b172>] ret_from_fork+0x22/0x40 > > Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> > Cc: Nicholas Bellinger <nab@linux-iscsi.org> Please stop trying to bypass target-pending for your target related patches. I've already asked you not to revert the patch, because I'm working on a patch to address the v4.6-rc regression here: http://www.spinics.net/lists/target-devel/msg12535.html If you've found a bug in that patch, please comment in that thread, or even better send an incremental diff for what you've found. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 04/07/2016 04:37 PM, Nicholas A. Bellinger wrote: > On Thu, 2016-04-07 at 15:55 -0700, Bart Van Assche wrote: >> That patch causes the ib_srpt driver to crash as soon as the first >> SCSI command is received. This means that that patch was untested. >> Hence revert it. The shortcomings of that patch are as follows: >> - It makes the ib_srpt driver use I/O contexts allocated by >> transport_alloc_session_tags() but it does not initialize these >> I/O contexts properly. All the initializations performed by >> srpt_alloc_ioctx() are skipped. >> - It swaps the order of the send ioctx allocation and the transition >> to RTR mode which is wrong. >> - The amount of memory that is needed for I/O contexts is doubled. >> - srpt_rdma_ch.free_list is no longer used but is not removed. >> >> Revert commit 0fd10721fe36 and thereby fix the following kernel crash: >> >> kernel BUG at drivers/infiniband/ulp/srpt/ib_srpt.c:1439! >> invalid opcode: 0000 [#1] SMP >> Workqueue: target_completion target_complete_ok_work [target_core_mod] >> RIP: 0010:[<ffffffffa052ef37>] [<ffffffffa052ef37>] srpt_queue_response+0x437/0x4a0 [ib_srpt] >> Call Trace: >> [<ffffffffa052f009>] srpt_queue_data_in+0x9/0x10 [ib_srpt] >> [<ffffffffa04f1ee2>] target_complete_ok_work+0x152/0x2b0 [target_core_mod] >> [<ffffffff81071ea7>] process_one_work+0x197/0x480 >> [<ffffffff810721d9>] worker_thread+0x49/0x490 >> [<ffffffff8107878a>] kthread+0xea/0x100 >> [<ffffffff8159b172>] ret_from_fork+0x22/0x40 >> >> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> >> Cc: Nicholas Bellinger <nab@linux-iscsi.org> > > I've already asked you not to revert the patch [ ... ] But why not to revert that patch? Your patch was completely untested and makes the ib_srpt driver unusable. I think the regular approach for such patches is to revert them and to rework these in a later kernel version. Linus, please let me know if you disagree. Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Apr 7, 2016 16:37, "Nicholas A. Bellinger" <nab@linux-iscsi.org> wrote: > > Please stop trying to bypass target-pending for your target related > patches. Nicholas, if you send me totally broken and untested crap, you have lost *all* rights to then complain about "bypass you". So get off your high horse. Bart did the right thing, and *you* fucked up. Don't try to shift him be the bad guy. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 2016-04-07 at 16:44 -0700, Bart Van Assche wrote: > On 04/07/2016 04:37 PM, Nicholas A. Bellinger wrote: > > On Thu, 2016-04-07 at 15:55 -0700, Bart Van Assche wrote: > >> That patch causes the ib_srpt driver to crash as soon as the first > >> SCSI command is received. This means that that patch was untested. > >> Hence revert it. The shortcomings of that patch are as follows: > >> - It makes the ib_srpt driver use I/O contexts allocated by > >> transport_alloc_session_tags() but it does not initialize these > >> I/O contexts properly. All the initializations performed by > >> srpt_alloc_ioctx() are skipped. > >> - It swaps the order of the send ioctx allocation and the transition > >> to RTR mode which is wrong. > >> - The amount of memory that is needed for I/O contexts is doubled. > >> - srpt_rdma_ch.free_list is no longer used but is not removed. > >> > >> Revert commit 0fd10721fe36 and thereby fix the following kernel crash: > >> > >> kernel BUG at drivers/infiniband/ulp/srpt/ib_srpt.c:1439! > >> invalid opcode: 0000 [#1] SMP > >> Workqueue: target_completion target_complete_ok_work [target_core_mod] > >> RIP: 0010:[<ffffffffa052ef37>] [<ffffffffa052ef37>] srpt_queue_response+0x437/0x4a0 [ib_srpt] > >> Call Trace: > >> [<ffffffffa052f009>] srpt_queue_data_in+0x9/0x10 [ib_srpt] > >> [<ffffffffa04f1ee2>] target_complete_ok_work+0x152/0x2b0 [target_core_mod] > >> [<ffffffff81071ea7>] process_one_work+0x197/0x480 > >> [<ffffffff810721d9>] worker_thread+0x49/0x490 > >> [<ffffffff8107878a>] kthread+0xea/0x100 > >> [<ffffffff8159b172>] ret_from_fork+0x22/0x40 > >> > >> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> > >> Cc: Nicholas Bellinger <nab@linux-iscsi.org> > > > > I've already asked you not to revert the patch [ ... ] > > But why not to revert that patch? Your patch was completely untested and > makes the ib_srpt driver unusable. I think the regular approach for such > patches is to revert them and to rework these in a later kernel version. > Linus, please let me know if you disagree. Because I'm actively working on a bug-fix for the regression, and it's -rc2 in a target driver that hardly anyone cares about. If the regression is not addressed before v4.6-rc6, then by all means revert the original patch. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Apr 07, 2016 at 04:52:03PM -0700, Nicholas A. Bellinger wrote: > Because I'm actively working on a bug-fix for the regression, and it's > -rc2 in a target driver that hardly anyone cares about. > > If the regression is not addressed before v4.6-rc6, then by all means > revert the original patch. Stop it. People do care. At least I do for sure, and it's blocking a mayor feature I want to land in 4.7. Bart has a proper fix, and it's perfectly valid that you ask him to resend with various tweaks towards how your prefer various details to be done, but there is no reason to be agressive and personal. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 2016-04-07 at 16:49 -0700, Linus Torvalds wrote: > On Apr 7, 2016 16:37, "Nicholas A. Bellinger" <nab@linux-iscsi.org> wrote: > > > > Please stop trying to bypass target-pending for your target related > > patches. > > Nicholas, if you send me totally broken and untested crap, you have > lost *all* rights to then complain about "bypass you". > > So get off your high horse. Bart did the right thing, and *you* fucked > up. Don't try to shift him be the bad guy. > Yes, I fucked up and am actively fixing the regression. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 2016-04-07 at 16:54 -0700, Christoph Hellwig wrote: > On Thu, Apr 07, 2016 at 04:52:03PM -0700, Nicholas A. Bellinger wrote: > > Because I'm actively working on a bug-fix for the regression, and it's > > -rc2 in a target driver that hardly anyone cares about. > > > > If the regression is not addressed before v4.6-rc6, then by all means > > revert the original patch. > > Stop it. People do care. At least I do for sure, and it's blocking a > mayor feature I want to land in 4.7. Bart has a proper fix, and it's > perfectly valid that you ask him to resend with various tweaks towards > how your prefer various details to be done, but there is no reason to > be agressive and personal. As mentioned, I'm actively working on a -v2 for this regression. You'll certainly be the first to hear about it as it's resolved over the next days. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 4/7/16 4:52 PM, Nicholas A. Bellinger wrote: > On Thu, 2016-04-07 at 16:44 -0700, Bart Van Assche wrote: >> On 04/07/2016 04:37 PM, Nicholas A. Bellinger wrote: >>> On Thu, 2016-04-07 at 15:55 -0700, Bart Van Assche wrote: >>>> That patch causes the ib_srpt driver to crash as soon as the first >>>> SCSI command is received. This means that that patch was untested. >>>> Hence revert it. The shortcomings of that patch are as follows: >>>> - It makes the ib_srpt driver use I/O contexts allocated by >>>> transport_alloc_session_tags() but it does not initialize these >>>> I/O contexts properly. All the initializations performed by >>>> srpt_alloc_ioctx() are skipped. >>>> - It swaps the order of the send ioctx allocation and the transition >>>> to RTR mode which is wrong. >>>> - The amount of memory that is needed for I/O contexts is doubled. >>>> - srpt_rdma_ch.free_list is no longer used but is not removed. >>>> >>>> Revert commit 0fd10721fe36 and thereby fix the following kernel crash: >>>> >>>> kernel BUG at drivers/infiniband/ulp/srpt/ib_srpt.c:1439! >>>> invalid opcode: 0000 [#1] SMP >>>> Workqueue: target_completion target_complete_ok_work [target_core_mod] >>>> RIP: 0010:[<ffffffffa052ef37>] [<ffffffffa052ef37>] srpt_queue_response+0x437/0x4a0 [ib_srpt] >>>> Call Trace: >>>> [<ffffffffa052f009>] srpt_queue_data_in+0x9/0x10 [ib_srpt] >>>> [<ffffffffa04f1ee2>] target_complete_ok_work+0x152/0x2b0 [target_core_mod] >>>> [<ffffffff81071ea7>] process_one_work+0x197/0x480 >>>> [<ffffffff810721d9>] worker_thread+0x49/0x490 >>>> [<ffffffff8107878a>] kthread+0xea/0x100 >>>> [<ffffffff8159b172>] ret_from_fork+0x22/0x40 >>>> >>>> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> >>>> Cc: Nicholas Bellinger <nab@linux-iscsi.org> >>> >>> I've already asked you not to revert the patch [ ... ] >> >> But why not to revert that patch? Your patch was completely untested and >> makes the ib_srpt driver unusable. I think the regular approach for such >> patches is to revert them and to rework these in a later kernel version. >> Linus, please let me know if you disagree. > > Because I'm actively working on a bug-fix for the regression, and it's > -rc2 in a target driver that hardly anyone cares about. I care about this driver. It's what we use internally for all of our SRP testing as we don't have and of the big DDN arrays that provide an SRP targets, so we built our own so we could test our clients. > If the regression is not addressed before v4.6-rc6, then by all means > revert the original patch. > > >
On Thu, Apr 7, 2016 at 4:56 PM, Nicholas A. Bellinger <nab@linux-iscsi.org> wrote: > > Yes, I fucked up and am actively fixing the regression. .. and then you complained when somebody who relies on that driver wants it *working*. Really. I don't know why you and Bart butt heads, but I don't _need_ to know. If people find major bugs that keep their machines from working and want them reverted, that's quite understandable. So I'm going to apply that revert, and I will *not* pull a fix from you until I hear that the fix actually got tested and has a Tested-by: Bart Van Assche <bart.vanassche@sandisk.com> or similar from people who actually have the hardware. Because bugs happen, and that's life. But it's _not_ acceptable to complain when a used points out that the change had no testing what-so-ever, and asks to have it reverted. If you have a patch to fix it, reply to a revert request with "Here, try this instead": That's a good approach, and allows the user of that hardware to continue testing. But if the change really had zero testing, and broke a driver that Bart needs to work, then I certainly see why he'd want to have it reverted. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c index 13594db..3b425af 100644 --- a/drivers/infiniband/ulp/srpt/ib_srpt.c +++ b/drivers/infiniband/ulp/srpt/ib_srpt.c @@ -1264,26 +1264,40 @@ free_mem: */ static struct srpt_send_ioctx *srpt_get_send_ioctx(struct srpt_rdma_ch *ch) { - struct se_session *se_sess; struct srpt_send_ioctx *ioctx; - int tag; + unsigned long flags; BUG_ON(!ch); - se_sess = ch->sess; - tag = percpu_ida_alloc(&se_sess->sess_tag_pool, TASK_RUNNING); - if (tag < 0) { - pr_err("Unable to obtain tag for srpt_send_ioctx\n"); - return NULL; + ioctx = NULL; + spin_lock_irqsave(&ch->spinlock, flags); + if (!list_empty(&ch->free_list)) { + ioctx = list_first_entry(&ch->free_list, + struct srpt_send_ioctx, free_list); + list_del(&ioctx->free_list); } - ioctx = &((struct srpt_send_ioctx *)se_sess->sess_cmd_map)[tag]; - memset(ioctx, 0, sizeof(struct srpt_send_ioctx)); - ioctx->ch = ch; + spin_unlock_irqrestore(&ch->spinlock, flags); + + if (!ioctx) + return ioctx; + + BUG_ON(ioctx->ch != ch); spin_lock_init(&ioctx->spinlock); ioctx->state = SRPT_STATE_NEW; + ioctx->n_rbuf = 0; + ioctx->rbufs = NULL; + ioctx->n_rdma = 0; + ioctx->n_rdma_wrs = 0; + ioctx->rdma_wrs = NULL; + ioctx->mapped_sg_count = 0; init_completion(&ioctx->tx_done); - - ioctx->cmd.map_tag = tag; + ioctx->queue_status_only = false; + /* + * transport_init_se_cmd() does not initialize all fields, so do it + * here. + */ + memset(&ioctx->cmd, 0, sizeof(ioctx->cmd)); + memset(&ioctx->sense_data, 0, sizeof(ioctx->sense_data)); return ioctx; } @@ -2013,7 +2027,7 @@ static int srpt_cm_req_recv(struct ib_cm_id *cm_id, struct ib_cm_rep_param *rep_param; struct srpt_rdma_ch *ch, *tmp_ch; u32 it_iu_len; - int ret = 0; + int i, ret = 0; unsigned char *p; WARN_ON_ONCE(irqs_disabled()); @@ -2135,6 +2149,12 @@ static int srpt_cm_req_recv(struct ib_cm_id *cm_id, if (!ch->ioctx_ring) goto free_ch; + INIT_LIST_HEAD(&ch->free_list); + for (i = 0; i < ch->rq_size; i++) { + ch->ioctx_ring[i]->ch = ch; + list_add_tail(&ch->ioctx_ring[i]->free_list, &ch->free_list); + } + ret = srpt_create_ch_ib(ch); if (ret) { rej->reason = cpu_to_be32( @@ -2165,8 +2185,7 @@ static int srpt_cm_req_recv(struct ib_cm_id *cm_id, p = &ch->sess_name[0]; try_again: - ch->sess = target_alloc_session(&sport->port_tpg_1, ch->rq_size, - sizeof(struct srpt_send_ioctx), + ch->sess = target_alloc_session(&sport->port_tpg_1, 0, 0, TARGET_PROT_NORMAL, p, ch, NULL); if (IS_ERR(ch->sess)) { pr_info("Rejected login because no ACL has been" @@ -2873,7 +2892,7 @@ static void srpt_release_cmd(struct se_cmd *se_cmd) struct srpt_send_ioctx *ioctx = container_of(se_cmd, struct srpt_send_ioctx, cmd); struct srpt_rdma_ch *ch = ioctx->ch; - struct se_session *se_sess = ch->sess; + unsigned long flags; WARN_ON(ioctx->state != SRPT_STATE_DONE); WARN_ON(ioctx->mapped_sg_count != 0); @@ -2884,7 +2903,9 @@ static void srpt_release_cmd(struct se_cmd *se_cmd) ioctx->n_rbuf = 0; } - percpu_ida_free(&se_sess->sess_tag_pool, se_cmd->map_tag); + spin_lock_irqsave(&ch->spinlock, flags); + list_add(&ioctx->free_list, &ch->free_list); + spin_unlock_irqrestore(&ch->spinlock, flags); } /** diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.h b/drivers/infiniband/ulp/srpt/ib_srpt.h index ca288f0..af9b8b5 100644 --- a/drivers/infiniband/ulp/srpt/ib_srpt.h +++ b/drivers/infiniband/ulp/srpt/ib_srpt.h @@ -179,6 +179,7 @@ struct srpt_recv_ioctx { * struct srpt_send_ioctx - SRPT send I/O context. * @ioctx: See above. * @ch: Channel pointer. + * @free_list: Node in srpt_rdma_ch.free_list. * @n_rbuf: Number of data buffers in the received SRP command. * @rbufs: Pointer to SRP data buffer array. * @single_rbuf: SRP data buffer if the command has only a single buffer. @@ -201,6 +202,7 @@ struct srpt_send_ioctx { struct srp_direct_buf *rbufs; struct srp_direct_buf single_rbuf; struct scatterlist *sg; + struct list_head free_list; spinlock_t spinlock; enum srpt_command_state state; struct se_cmd cmd;
That patch causes the ib_srpt driver to crash as soon as the first SCSI command is received. This means that that patch was untested. Hence revert it. The shortcomings of that patch are as follows: - It makes the ib_srpt driver use I/O contexts allocated by transport_alloc_session_tags() but it does not initialize these I/O contexts properly. All the initializations performed by srpt_alloc_ioctx() are skipped. - It swaps the order of the send ioctx allocation and the transition to RTR mode which is wrong. - The amount of memory that is needed for I/O contexts is doubled. - srpt_rdma_ch.free_list is no longer used but is not removed. Revert commit 0fd10721fe36 and thereby fix the following kernel crash: kernel BUG at drivers/infiniband/ulp/srpt/ib_srpt.c:1439! invalid opcode: 0000 [#1] SMP Workqueue: target_completion target_complete_ok_work [target_core_mod] RIP: 0010:[<ffffffffa052ef37>] [<ffffffffa052ef37>] srpt_queue_response+0x437/0x4a0 [ib_srpt] Call Trace: [<ffffffffa052f009>] srpt_queue_data_in+0x9/0x10 [ib_srpt] [<ffffffffa04f1ee2>] target_complete_ok_work+0x152/0x2b0 [target_core_mod] [<ffffffff81071ea7>] process_one_work+0x197/0x480 [<ffffffff810721d9>] worker_thread+0x49/0x490 [<ffffffff8107878a>] kthread+0xea/0x100 [<ffffffff8159b172>] ret_from_fork+0x22/0x40 Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Nicholas Bellinger <nab@linux-iscsi.org> --- drivers/infiniband/ulp/srpt/ib_srpt.c | 55 ++++++++++++++++++++++++----------- drivers/infiniband/ulp/srpt/ib_srpt.h | 2 ++ 2 files changed, 40 insertions(+), 17 deletions(-)