diff mbox

IB/srpt: Revert "Convert to percpu_ida tag allocation"

Message ID 5706E548.3080002@sandisk.com (mailing list archive)
State Superseded
Headers show

Commit Message

Bart Van Assche April 7, 2016, 10:55 p.m. UTC
That patch causes the ib_srpt driver to crash as soon as the first
SCSI command is received. This means that that patch was untested.
Hence revert it. The shortcomings of that patch are as follows:
- It makes the ib_srpt driver use I/O contexts allocated by
  transport_alloc_session_tags() but it does not initialize these
  I/O contexts properly. All the initializations performed by
  srpt_alloc_ioctx() are skipped.
- It swaps the order of the send ioctx allocation and the transition
  to RTR mode which is wrong.
- The amount of memory that is needed for I/O contexts is doubled.
- srpt_rdma_ch.free_list is no longer used but is not removed.

Revert commit 0fd10721fe36 and thereby fix the following kernel crash:

kernel BUG at drivers/infiniband/ulp/srpt/ib_srpt.c:1439!
invalid opcode: 0000 [#1] SMP
Workqueue: target_completion target_complete_ok_work [target_core_mod]
RIP: 0010:[<ffffffffa052ef37>]  [<ffffffffa052ef37>] srpt_queue_response+0x437/0x4a0 [ib_srpt]
Call Trace:
 [<ffffffffa052f009>] srpt_queue_data_in+0x9/0x10 [ib_srpt]
 [<ffffffffa04f1ee2>] target_complete_ok_work+0x152/0x2b0 [target_core_mod]
 [<ffffffff81071ea7>] process_one_work+0x197/0x480
 [<ffffffff810721d9>] worker_thread+0x49/0x490
 [<ffffffff8107878a>] kthread+0xea/0x100
 [<ffffffff8159b172>] ret_from_fork+0x22/0x40

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Nicholas Bellinger <nab@linux-iscsi.org>
---
 drivers/infiniband/ulp/srpt/ib_srpt.c | 55 ++++++++++++++++++++++++-----------
 drivers/infiniband/ulp/srpt/ib_srpt.h |  2 ++
 2 files changed, 40 insertions(+), 17 deletions(-)

Comments

Nicholas A. Bellinger April 7, 2016, 11:37 p.m. UTC | #1
On Thu, 2016-04-07 at 15:55 -0700, Bart Van Assche wrote:
> That patch causes the ib_srpt driver to crash as soon as the first
> SCSI command is received. This means that that patch was untested.
> Hence revert it. The shortcomings of that patch are as follows:
> - It makes the ib_srpt driver use I/O contexts allocated by
>   transport_alloc_session_tags() but it does not initialize these
>   I/O contexts properly. All the initializations performed by
>   srpt_alloc_ioctx() are skipped.
> - It swaps the order of the send ioctx allocation and the transition
>   to RTR mode which is wrong.
> - The amount of memory that is needed for I/O contexts is doubled.
> - srpt_rdma_ch.free_list is no longer used but is not removed.
> 
> Revert commit 0fd10721fe36 and thereby fix the following kernel crash:
> 
> kernel BUG at drivers/infiniband/ulp/srpt/ib_srpt.c:1439!
> invalid opcode: 0000 [#1] SMP
> Workqueue: target_completion target_complete_ok_work [target_core_mod]
> RIP: 0010:[<ffffffffa052ef37>]  [<ffffffffa052ef37>] srpt_queue_response+0x437/0x4a0 [ib_srpt]
> Call Trace:
>  [<ffffffffa052f009>] srpt_queue_data_in+0x9/0x10 [ib_srpt]
>  [<ffffffffa04f1ee2>] target_complete_ok_work+0x152/0x2b0 [target_core_mod]
>  [<ffffffff81071ea7>] process_one_work+0x197/0x480
>  [<ffffffff810721d9>] worker_thread+0x49/0x490
>  [<ffffffff8107878a>] kthread+0xea/0x100
>  [<ffffffff8159b172>] ret_from_fork+0x22/0x40
> 
> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
> Cc: Nicholas Bellinger <nab@linux-iscsi.org>

Please stop trying to bypass target-pending for your target related
patches.

I've already asked you not to revert the patch, because I'm working on a
patch to address the v4.6-rc regression here:

http://www.spinics.net/lists/target-devel/msg12535.html

If you've found a bug in that patch, please comment in that thread, or
even better send an incremental diff for what you've found.





--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bart Van Assche April 7, 2016, 11:44 p.m. UTC | #2
On 04/07/2016 04:37 PM, Nicholas A. Bellinger wrote:
> On Thu, 2016-04-07 at 15:55 -0700, Bart Van Assche wrote:
>> That patch causes the ib_srpt driver to crash as soon as the first
>> SCSI command is received. This means that that patch was untested.
>> Hence revert it. The shortcomings of that patch are as follows:
>> - It makes the ib_srpt driver use I/O contexts allocated by
>>    transport_alloc_session_tags() but it does not initialize these
>>    I/O contexts properly. All the initializations performed by
>>    srpt_alloc_ioctx() are skipped.
>> - It swaps the order of the send ioctx allocation and the transition
>>    to RTR mode which is wrong.
>> - The amount of memory that is needed for I/O contexts is doubled.
>> - srpt_rdma_ch.free_list is no longer used but is not removed.
>>
>> Revert commit 0fd10721fe36 and thereby fix the following kernel crash:
>>
>> kernel BUG at drivers/infiniband/ulp/srpt/ib_srpt.c:1439!
>> invalid opcode: 0000 [#1] SMP
>> Workqueue: target_completion target_complete_ok_work [target_core_mod]
>> RIP: 0010:[<ffffffffa052ef37>]  [<ffffffffa052ef37>] srpt_queue_response+0x437/0x4a0 [ib_srpt]
>> Call Trace:
>>   [<ffffffffa052f009>] srpt_queue_data_in+0x9/0x10 [ib_srpt]
>>   [<ffffffffa04f1ee2>] target_complete_ok_work+0x152/0x2b0 [target_core_mod]
>>   [<ffffffff81071ea7>] process_one_work+0x197/0x480
>>   [<ffffffff810721d9>] worker_thread+0x49/0x490
>>   [<ffffffff8107878a>] kthread+0xea/0x100
>>   [<ffffffff8159b172>] ret_from_fork+0x22/0x40
>>
>> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
>> Cc: Nicholas Bellinger <nab@linux-iscsi.org>
>
> I've already asked you not to revert the patch [ ... ]

But why not to revert that patch? Your patch was completely untested and 
makes the ib_srpt driver unusable. I think the regular approach for such 
patches is to revert them and to rework these in a later kernel version. 
Linus, please let me know if you disagree.

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Linus Torvalds April 7, 2016, 11:49 p.m. UTC | #3
On Apr 7, 2016 16:37, "Nicholas A. Bellinger" <nab@linux-iscsi.org> wrote:
>
> Please stop trying to bypass target-pending for your target related
> patches.

Nicholas, if you send me totally broken and untested crap, you have
lost *all* rights to then complain about "bypass you".

So get off your high horse. Bart did the right thing, and *you* fucked
up. Don't try to shift him be the bad guy.

                Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nicholas A. Bellinger April 7, 2016, 11:52 p.m. UTC | #4
On Thu, 2016-04-07 at 16:44 -0700, Bart Van Assche wrote:
> On 04/07/2016 04:37 PM, Nicholas A. Bellinger wrote:
> > On Thu, 2016-04-07 at 15:55 -0700, Bart Van Assche wrote:
> >> That patch causes the ib_srpt driver to crash as soon as the first
> >> SCSI command is received. This means that that patch was untested.
> >> Hence revert it. The shortcomings of that patch are as follows:
> >> - It makes the ib_srpt driver use I/O contexts allocated by
> >>    transport_alloc_session_tags() but it does not initialize these
> >>    I/O contexts properly. All the initializations performed by
> >>    srpt_alloc_ioctx() are skipped.
> >> - It swaps the order of the send ioctx allocation and the transition
> >>    to RTR mode which is wrong.
> >> - The amount of memory that is needed for I/O contexts is doubled.
> >> - srpt_rdma_ch.free_list is no longer used but is not removed.
> >>
> >> Revert commit 0fd10721fe36 and thereby fix the following kernel crash:
> >>
> >> kernel BUG at drivers/infiniband/ulp/srpt/ib_srpt.c:1439!
> >> invalid opcode: 0000 [#1] SMP
> >> Workqueue: target_completion target_complete_ok_work [target_core_mod]
> >> RIP: 0010:[<ffffffffa052ef37>]  [<ffffffffa052ef37>] srpt_queue_response+0x437/0x4a0 [ib_srpt]
> >> Call Trace:
> >>   [<ffffffffa052f009>] srpt_queue_data_in+0x9/0x10 [ib_srpt]
> >>   [<ffffffffa04f1ee2>] target_complete_ok_work+0x152/0x2b0 [target_core_mod]
> >>   [<ffffffff81071ea7>] process_one_work+0x197/0x480
> >>   [<ffffffff810721d9>] worker_thread+0x49/0x490
> >>   [<ffffffff8107878a>] kthread+0xea/0x100
> >>   [<ffffffff8159b172>] ret_from_fork+0x22/0x40
> >>
> >> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
> >> Cc: Nicholas Bellinger <nab@linux-iscsi.org>
> >
> > I've already asked you not to revert the patch [ ... ]
> 
> But why not to revert that patch? Your patch was completely untested and 
> makes the ib_srpt driver unusable. I think the regular approach for such 
> patches is to revert them and to rework these in a later kernel version. 
> Linus, please let me know if you disagree.

Because I'm actively working on a bug-fix for the regression, and it's
-rc2 in a target driver that hardly anyone cares about.

If the regression is not addressed before v4.6-rc6, then by all means
revert the original patch.



--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Hellwig April 7, 2016, 11:54 p.m. UTC | #5
On Thu, Apr 07, 2016 at 04:52:03PM -0700, Nicholas A. Bellinger wrote:
> Because I'm actively working on a bug-fix for the regression, and it's
> -rc2 in a target driver that hardly anyone cares about.
> 
> If the regression is not addressed before v4.6-rc6, then by all means
> revert the original patch.

Stop it.  People do care.  At least I do for sure, and it's blocking a
mayor feature I want to land in 4.7.  Bart has a proper fix, and it's
perfectly valid that you ask him to resend with various tweaks towards
how your prefer various details to be done, but there is no reason to
be agressive and personal.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nicholas A. Bellinger April 7, 2016, 11:56 p.m. UTC | #6
On Thu, 2016-04-07 at 16:49 -0700, Linus Torvalds wrote:
> On Apr 7, 2016 16:37, "Nicholas A. Bellinger" <nab@linux-iscsi.org> wrote:
> >
> > Please stop trying to bypass target-pending for your target related
> > patches.
> 
> Nicholas, if you send me totally broken and untested crap, you have
> lost *all* rights to then complain about "bypass you".
> 
> So get off your high horse. Bart did the right thing, and *you* fucked
> up. Don't try to shift him be the bad guy.
> 

Yes, I fucked up and am actively fixing the regression.


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nicholas A. Bellinger April 8, 2016, midnight UTC | #7
On Thu, 2016-04-07 at 16:54 -0700, Christoph Hellwig wrote:
> On Thu, Apr 07, 2016 at 04:52:03PM -0700, Nicholas A. Bellinger wrote:
> > Because I'm actively working on a bug-fix for the regression, and it's
> > -rc2 in a target driver that hardly anyone cares about.
> > 
> > If the regression is not addressed before v4.6-rc6, then by all means
> > revert the original patch.
> 
> Stop it.  People do care.  At least I do for sure, and it's blocking a
> mayor feature I want to land in 4.7.  Bart has a proper fix, and it's
> perfectly valid that you ask him to resend with various tweaks towards
> how your prefer various details to be done, but there is no reason to
> be agressive and personal.

As mentioned, I'm actively working on a -v2 for this regression.

You'll certainly be the first to hear about it as it's resolved over the
next days.










--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Doug Ledford April 8, 2016, midnight UTC | #8
On 4/7/16 4:52 PM, Nicholas A. Bellinger wrote:
> On Thu, 2016-04-07 at 16:44 -0700, Bart Van Assche wrote:
>> On 04/07/2016 04:37 PM, Nicholas A. Bellinger wrote:
>>> On Thu, 2016-04-07 at 15:55 -0700, Bart Van Assche wrote:
>>>> That patch causes the ib_srpt driver to crash as soon as the first
>>>> SCSI command is received. This means that that patch was untested.
>>>> Hence revert it. The shortcomings of that patch are as follows:
>>>> - It makes the ib_srpt driver use I/O contexts allocated by
>>>>    transport_alloc_session_tags() but it does not initialize these
>>>>    I/O contexts properly. All the initializations performed by
>>>>    srpt_alloc_ioctx() are skipped.
>>>> - It swaps the order of the send ioctx allocation and the transition
>>>>    to RTR mode which is wrong.
>>>> - The amount of memory that is needed for I/O contexts is doubled.
>>>> - srpt_rdma_ch.free_list is no longer used but is not removed.
>>>>
>>>> Revert commit 0fd10721fe36 and thereby fix the following kernel crash:
>>>>
>>>> kernel BUG at drivers/infiniband/ulp/srpt/ib_srpt.c:1439!
>>>> invalid opcode: 0000 [#1] SMP
>>>> Workqueue: target_completion target_complete_ok_work [target_core_mod]
>>>> RIP: 0010:[<ffffffffa052ef37>]  [<ffffffffa052ef37>] srpt_queue_response+0x437/0x4a0 [ib_srpt]
>>>> Call Trace:
>>>>   [<ffffffffa052f009>] srpt_queue_data_in+0x9/0x10 [ib_srpt]
>>>>   [<ffffffffa04f1ee2>] target_complete_ok_work+0x152/0x2b0 [target_core_mod]
>>>>   [<ffffffff81071ea7>] process_one_work+0x197/0x480
>>>>   [<ffffffff810721d9>] worker_thread+0x49/0x490
>>>>   [<ffffffff8107878a>] kthread+0xea/0x100
>>>>   [<ffffffff8159b172>] ret_from_fork+0x22/0x40
>>>>
>>>> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
>>>> Cc: Nicholas Bellinger <nab@linux-iscsi.org>
>>>
>>> I've already asked you not to revert the patch [ ... ]
>>
>> But why not to revert that patch? Your patch was completely untested and 
>> makes the ib_srpt driver unusable. I think the regular approach for such 
>> patches is to revert them and to rework these in a later kernel version. 
>> Linus, please let me know if you disagree.
> 
> Because I'm actively working on a bug-fix for the regression, and it's
> -rc2 in a target driver that hardly anyone cares about.

I care about this driver.  It's what we use internally for all of our
SRP testing as we don't have and of the big DDN arrays that provide an
SRP targets, so we built our own so we could test our clients.

> If the regression is not addressed before v4.6-rc6, then by all means
> revert the original patch.
> 
> 
>
Linus Torvalds April 8, 2016, 12:13 a.m. UTC | #9
On Thu, Apr 7, 2016 at 4:56 PM, Nicholas A. Bellinger
<nab@linux-iscsi.org> wrote:
>
> Yes, I fucked up and am actively fixing the regression.

.. and then you complained when somebody who relies on that driver
wants it *working*.

Really.

I don't know why you and Bart butt heads, but I don't _need_ to know.
If people find major bugs that keep their machines from working and
want them reverted, that's quite understandable.

So I'm going to apply that revert, and I will *not* pull a fix from
you until I hear that the fix actually got tested and has a

   Tested-by: Bart Van Assche <bart.vanassche@sandisk.com>

or similar from people who actually have the hardware.

Because bugs happen, and that's life.

But it's _not_ acceptable to complain when a used points out that the
change had no testing what-so-ever, and asks to have it reverted.

If you have a patch to fix it, reply to a revert request with "Here,
try this instead": That's a good approach, and allows the user of that
hardware to continue testing.

But if the change really had zero testing, and broke a driver that
Bart needs to work, then I certainly see why he'd want to have it
reverted.

                    Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c
index 13594db..3b425af 100644
--- a/drivers/infiniband/ulp/srpt/ib_srpt.c
+++ b/drivers/infiniband/ulp/srpt/ib_srpt.c
@@ -1264,26 +1264,40 @@  free_mem:
  */
 static struct srpt_send_ioctx *srpt_get_send_ioctx(struct srpt_rdma_ch *ch)
 {
-	struct se_session *se_sess;
 	struct srpt_send_ioctx *ioctx;
-	int tag;
+	unsigned long flags;
 
 	BUG_ON(!ch);
-	se_sess = ch->sess;
 
-	tag = percpu_ida_alloc(&se_sess->sess_tag_pool, TASK_RUNNING);
-	if (tag < 0) {
-		pr_err("Unable to obtain tag for srpt_send_ioctx\n");
-		return NULL;
+	ioctx = NULL;
+	spin_lock_irqsave(&ch->spinlock, flags);
+	if (!list_empty(&ch->free_list)) {
+		ioctx = list_first_entry(&ch->free_list,
+					 struct srpt_send_ioctx, free_list);
+		list_del(&ioctx->free_list);
 	}
-	ioctx = &((struct srpt_send_ioctx *)se_sess->sess_cmd_map)[tag];
-	memset(ioctx, 0, sizeof(struct srpt_send_ioctx));
-	ioctx->ch = ch;
+	spin_unlock_irqrestore(&ch->spinlock, flags);
+
+	if (!ioctx)
+		return ioctx;
+
+	BUG_ON(ioctx->ch != ch);
 	spin_lock_init(&ioctx->spinlock);
 	ioctx->state = SRPT_STATE_NEW;
+	ioctx->n_rbuf = 0;
+	ioctx->rbufs = NULL;
+	ioctx->n_rdma = 0;
+	ioctx->n_rdma_wrs = 0;
+	ioctx->rdma_wrs = NULL;
+	ioctx->mapped_sg_count = 0;
 	init_completion(&ioctx->tx_done);
-
-	ioctx->cmd.map_tag = tag;
+	ioctx->queue_status_only = false;
+	/*
+	 * transport_init_se_cmd() does not initialize all fields, so do it
+	 * here.
+	 */
+	memset(&ioctx->cmd, 0, sizeof(ioctx->cmd));
+	memset(&ioctx->sense_data, 0, sizeof(ioctx->sense_data));
 
 	return ioctx;
 }
@@ -2013,7 +2027,7 @@  static int srpt_cm_req_recv(struct ib_cm_id *cm_id,
 	struct ib_cm_rep_param *rep_param;
 	struct srpt_rdma_ch *ch, *tmp_ch;
 	u32 it_iu_len;
-	int ret = 0;
+	int i, ret = 0;
 	unsigned char *p;
 
 	WARN_ON_ONCE(irqs_disabled());
@@ -2135,6 +2149,12 @@  static int srpt_cm_req_recv(struct ib_cm_id *cm_id,
 	if (!ch->ioctx_ring)
 		goto free_ch;
 
+	INIT_LIST_HEAD(&ch->free_list);
+	for (i = 0; i < ch->rq_size; i++) {
+		ch->ioctx_ring[i]->ch = ch;
+		list_add_tail(&ch->ioctx_ring[i]->free_list, &ch->free_list);
+	}
+
 	ret = srpt_create_ch_ib(ch);
 	if (ret) {
 		rej->reason = cpu_to_be32(
@@ -2165,8 +2185,7 @@  static int srpt_cm_req_recv(struct ib_cm_id *cm_id,
 	p = &ch->sess_name[0];
 
 try_again:
-	ch->sess = target_alloc_session(&sport->port_tpg_1, ch->rq_size,
-					sizeof(struct srpt_send_ioctx),
+	ch->sess = target_alloc_session(&sport->port_tpg_1, 0, 0,
 					TARGET_PROT_NORMAL, p, ch, NULL);
 	if (IS_ERR(ch->sess)) {
 		pr_info("Rejected login because no ACL has been"
@@ -2873,7 +2892,7 @@  static void srpt_release_cmd(struct se_cmd *se_cmd)
 	struct srpt_send_ioctx *ioctx = container_of(se_cmd,
 				struct srpt_send_ioctx, cmd);
 	struct srpt_rdma_ch *ch = ioctx->ch;
-	struct se_session *se_sess = ch->sess;
+	unsigned long flags;
 
 	WARN_ON(ioctx->state != SRPT_STATE_DONE);
 	WARN_ON(ioctx->mapped_sg_count != 0);
@@ -2884,7 +2903,9 @@  static void srpt_release_cmd(struct se_cmd *se_cmd)
 		ioctx->n_rbuf = 0;
 	}
 
-	percpu_ida_free(&se_sess->sess_tag_pool, se_cmd->map_tag);
+	spin_lock_irqsave(&ch->spinlock, flags);
+	list_add(&ioctx->free_list, &ch->free_list);
+	spin_unlock_irqrestore(&ch->spinlock, flags);
 }
 
 /**
diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.h b/drivers/infiniband/ulp/srpt/ib_srpt.h
index ca288f0..af9b8b5 100644
--- a/drivers/infiniband/ulp/srpt/ib_srpt.h
+++ b/drivers/infiniband/ulp/srpt/ib_srpt.h
@@ -179,6 +179,7 @@  struct srpt_recv_ioctx {
  * struct srpt_send_ioctx - SRPT send I/O context.
  * @ioctx:       See above.
  * @ch:          Channel pointer.
+ * @free_list:   Node in srpt_rdma_ch.free_list.
  * @n_rbuf:      Number of data buffers in the received SRP command.
  * @rbufs:       Pointer to SRP data buffer array.
  * @single_rbuf: SRP data buffer if the command has only a single buffer.
@@ -201,6 +202,7 @@  struct srpt_send_ioctx {
 	struct srp_direct_buf	*rbufs;
 	struct srp_direct_buf	single_rbuf;
 	struct scatterlist	*sg;
+	struct list_head	free_list;
 	spinlock_t		spinlock;
 	enum srpt_command_state	state;
 	struct se_cmd		cmd;