[0/5] stop normal completion path entering a timeout req

Message ID	a68ad043-26a1-d3d8-2009-504ba4230e0f@oracle.com (mailing list archive)
State	New, archived

Headers

Subject: Re: [PATCH 0/5]stop normal completion path entering a timeout req
To: Christoph Hellwig <hch@lst.de>
Cc: Keith Busch <keith.busch@intel.com>, axboe@kernel.dk,
	martin.petersen@oracle.com, josef@toxicpanda.com,
	ulf.hansson@linaro.org, linux-block@vger.kernel.org,
	linux-scsi@vger.kernel.org, linux-kernel@vger.kernel.org
References: <1529500964-28429-1-git-send-email-jianchao.w.wang@oracle.com>
	<20180620181601.GA24145@localhost.localdomain>
	<e87f5946-bb21-942e-2dcc-b24cee1ad23c@oracle.com>
	<20180621081900.GA5183@lst.de>
From: "jianchao.wang" <jianchao.w.wang@oracle.com>
Message-ID: <a68ad043-26a1-d3d8-2009-504ba4230e0f@oracle.com>
Date: Thu, 21 Jun 2018 16:22:22 +0800
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
	Thunderbird/52.7.0
MIME-Version: 1.0
In-Reply-To: <20180621081900.GA5183@lst.de>
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Sender: linux-block-owner@vger.kernel.org
Precedence: bulk

Commit Message

jianchao.wang June 21, 2018, 8:22 a.m. UTC

Hi Christoph

Thanks for your kind response.

On 06/21/2018 04:19 PM, Christoph Hellwig wrote:
> On Thu, Jun 21, 2018 at 09:43:26AM +0800, jianchao.wang wrote:
>> So we have to preserve the block layer's ability to prevent the
>> IO completion path from entering a timeout request.
>>
>> With the scsi-debug module, I tried to simulate a scenario where the timeout
>> and IO completion paths run concurrently, and the system crashed easily.
> 
> Trace, please.  With the latest kernel.  I'm not saying that there
> is nothing to fix, but the mode of never completing requests once
> they have timed out, as currently done in SCSI, is clearly broken.
> 

I didn't find an existing method to simulate this,
so I modified scsi-debug with the following patch and loaded it as follows:
modprobe scsi-debug delay=-1 ndelay=-1
Both 4.17-rc1 and 4.18-rc1 survive the test with this patch set applied.

Comments

Christoph Hellwig June 22, 2018, 3:10 p.m. UTC | #1

On Thu, Jun 21, 2018 at 04:22:22PM +0800, jianchao.wang wrote:
> > Trace, please.  With the latest kernel.  I'm not saying that there
> > is nothing to fix, but the mode of never completing requests once
> > they have timed out, as currently done in SCSI, is clearly broken.
> > 
> 
> I didn't find an existing method to simulate this,
> so I modified scsi-debug with the following patch and loaded it as follows:
> modprobe scsi-debug delay=-1 ndelay=-1
> Both 4.17-rc1 and 4.18-rc1 survive the test with this patch set applied.

What tree is this against?  I can't apply it to either current Linus'
tree or 4.17 for that matter.

Also I'm not sure this blk_abort_request call is representative
of the real world.  Drivers do drain their queues before calling
it in general, e.g. take a look at ata_eh_set_pending for the
probably most common user.

jianchao.wang June 25, 2018, 1:29 a.m. UTC | #2

Hi Christoph

Sorry for the delayed response.

On 06/22/2018 11:10 PM, Christoph Hellwig wrote:
> On Thu, Jun 21, 2018 at 04:22:22PM +0800, jianchao.wang wrote:
>>> Trace, please.  With the latest kernel.  I'm not saying that there
>>> is nothing to fix, but the mode of never completing requests once
>>> they have timed out, as currently done in SCSI, is clearly broken.
>>>
>>
>> I didn't find an existing method to simulate this,
>> so I modified scsi-debug with the following patch and loaded it as follows:
>> modprobe scsi-debug delay=-1 ndelay=-1
>> Both 4.17-rc1 and 4.18-rc1 survive the test with this patch set applied.
> 
> What tree is this against?  I can't apply it to either current Linus'
> tree or 4.17 for that matter.

I made the patch against 4.18-rc1.

> Also I'm not sure this blk_abort_request call is representative
> of the real world.  Drivers do drain their queues before calling
> it in general, e.g. take a look at ata_eh_set_pending for the
> probably most common user.
> 

The blk_abort_request call here forces the request to time out, simulating the
scenario where the timeout path and the IO completion path run concurrently.
It is hard for me to trigger this scenario in the real world, so I made this patch,
which may look bad. All I want is to trigger the IO completion and timeout paths
concurrently.


Thanks
Jianchao
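
To make the race under discussion concrete, here is a minimal userspace sketch in
C11 of one way to let only one of the two paths (completion or timeout) take
ownership of a request. It is not the kernel's implementation and not what this
patch set does; the names fake_request, fake_complete and fake_timeout and the
three states are hypothetical, chosen only for illustration. The assumption is
that whichever path wins a compare-and-swap away from the in-flight state owns
the request, and the loser must back off.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request states; not the kernel's actual state machine. */
enum fake_req_state { REQ_IN_FLIGHT, REQ_COMPLETED, REQ_TIMED_OUT };

struct fake_request {
	_Atomic int state;
};

/* Mark the request as issued to the (imaginary) device. */
static void fake_request_start(struct fake_request *rq)
{
	assert(rq != NULL);
	atomic_store(&rq->state, REQ_IN_FLIGHT);
}

/*
 * Try to move the request from REQ_IN_FLIGHT to new_state.
 * The compare-and-swap succeeds for exactly one caller, so the
 * completion path and the timeout path can never both own it.
 */
static bool fake_request_claim(struct fake_request *rq, int new_state)
{
	int expected = REQ_IN_FLIGHT;

	assert(rq != NULL);
	return atomic_compare_exchange_strong(&rq->state, &expected, new_state);
}

/* Normal completion path: returns false if the timeout path got there first. */
static bool fake_complete(struct fake_request *rq)
{
	return fake_request_claim(rq, REQ_COMPLETED);
}

/* Timeout path: returns false if the request already completed. */
static bool fake_timeout(struct fake_request *rq)
{
	return fake_request_claim(rq, REQ_TIMED_OUT);
}
```

In this sketch a completion that arrives after the timeout path has claimed the
request simply gets false back and must not touch the request any further.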

diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c
index 24d7496..f278e6c 100644
--- a/drivers/scsi/scsi_debug.c
+++ b/drivers/scsi/scsi_debug.c
@@ -4323,6 +4323,8 @@  static void setup_inject(struct sdebug_queue *sqp,
        sqcp->inj_host_busy = !!(SDEBUG_OPT_HOST_BUSY & sdebug_opts);
 }
 
+static atomic_t g_abort_counter;
+
 /* Complete the processing of the thread that queued a SCSI command to this
  * driver. It either completes the command by calling cmnd_done() or
  * schedules a hr timer or work queue then returns 0. Returns
@@ -4459,6 +4461,11 @@  static int schedule_resp(struct scsi_cmnd *cmnd, struct sdebug_dev_info *devip,
                        sd_dp->issuing_cpu = raw_smp_processor_id();
                sd_dp->defer_t = SDEB_DEFER_WQ;
                schedule_work(&sd_dp->ew.work);
+               atomic_inc(&g_abort_counter);
+               if (atomic_read(&g_abort_counter)%2000 == 0) {
+                       blk_abort_request(cmnd->request);
+                       trace_printk("abort request tag %d\n", cmnd->request->tag);
+               }
        }
        if (unlikely((SDEBUG_OPT_Q_NOISE & sdebug_opts) &&
                     (scsi_result == device_qfull_result)))
@@ -5844,6 +5851,7 @@  static int sdebug_driver_probe(struct device *dev)
        struct Scsi_Host *hpnt;
        int hprot;
 
+       atomic_set(&g_abort_counter, 0);
        sdbg_host = to_sdebug_host(dev);
 
        sdebug_driver_template.can_queue = sdebug_max_queue;
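
The injection logic in the patch above (abort every 2000th deferred command) can
be sketched in userspace C11 as follows. This is an illustration only, not a
proposed change to the patch: should_inject_abort and ABORT_INTERVAL are
hypothetical names. One difference is deliberate. The patch increments the
counter and then reads it in two separate steps, so concurrent submitters could
in principle observe the same value or skip a multiple of 2000. The sketch uses a
single fetch-and-add so each caller sees its own count. In the kernel,
atomic_inc_return() would give the same effect; for a test hack the difference
probably does not matter.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define ABORT_INTERVAL 2000u	/* same period as in the patch above */

/* File-scope atomics are zero-initialized, like a static atomic_t. */
static atomic_uint abort_counter;

/* Returns true on every ABORT_INTERVAL-th call, false otherwise. */
static bool should_inject_abort(void)
{
	/* fetch_add returns the previous value; +1 is this caller's own count. */
	unsigned int n = atomic_fetch_add(&abort_counter, 1) + 1;

	return n % ABORT_INTERVAL == 0;
}
```

A caller would check should_inject_abort() after queuing the deferred work and
force the timeout only when it returns true.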
