diff mbox

[0/5] stop normal completion path entering a timeout req

Message ID a68ad043-26a1-d3d8-2009-504ba4230e0f@oracle.com (mailing list archive)
State New, archived
Headers show

Commit Message

jianchao.wang June 21, 2018, 8:22 a.m. UTC
Hi Christoph

Thanks for your kindly response.

On 06/21/2018 04:19 PM, Christoph Hellwig wrote:
> On Thu, Jun 21, 2018 at 09:43:26AM +0800, jianchao.wang wrote:
>> So we have to preserve the ability of block layer that it could prevent
>> IO completion path from entering a timeout request.
>>
>> With scsi-debug module, I tried to simulate a scenario where timeout and IO
>> completion path could occur concurrently, the system ran into crash easily.
> 
> Trace, please.  With the latest kernel.  I'm not saying that there
> is nothing to fix, but the mode of never completing once timeout
> requests as currently done is SCSI is clearly broken.
> 

I didn't find the existing method to simulate this.
So I modified the scsi-debug as following patch as install it as following:
modprobe scsi-debug delay=-1 ndelay=-1
Both 4.17-rc1 and 4.18-rc1 with this patch set could survive from the test.

Comments

Christoph Hellwig June 22, 2018, 3:10 p.m. UTC | #1
On Thu, Jun 21, 2018 at 04:22:22PM +0800, jianchao.wang wrote:
> > Trace, please.  With the latest kernel.  I'm not saying that there
> > is nothing to fix, but the mode of never completing once timeout
> > requests as currently done is SCSI is clearly broken.
> > 
> 
> I didn't find the existing method to simulate this.
> So I modified the scsi-debug as following patch as install it as following:
> modprobe scsi-debug delay=-1 ndelay=-1
> Both 4.17-rc1 and 4.18-rc1 with this patch set could survive from the test.

What tree is this against?  I can't apply it to either current Linus'
tree or 4.17 for that matter.

Also I'm not sure this blk_abort_request call is representative
of the real world.  Drivers do drain their queues before calling
it in general, e.g. take a look at ata_eh_set_pending for the
probably most common user.
jianchao.wang June 25, 2018, 1:29 a.m. UTC | #2
Hi Christoph

Sorry for delayed response.

On 06/22/2018 11:10 PM, Christoph Hellwig wrote:
> On Thu, Jun 21, 2018 at 04:22:22PM +0800, jianchao.wang wrote:
>>> Trace, please.  With the latest kernel.  I'm not saying that there
>>> is nothing to fix, but the mode of never completing once timeout
>>> requests as currently done is SCSI is clearly broken.
>>>
>>
>> I didn't find the existing method to simulate this.
>> So I modified the scsi-debug as following patch as install it as following:
>> modprobe scsi-debug delay=-1 ndelay=-1
>> Both 4.17-rc1 and 4.18-rc1 with this patch set could survive from the test.
> 
> What tree is this against?  I can't apply it to either current Linus'
> tree or 4.17 for that matter.

I made the patch against 4.18.rc1.

> Also I'm not sure this blk_abort_request call is representative
> of the real world.  Drivers do drain their queues before calling
> it in general, e.g. take a look at ata_eh_set_pending for the
> probably most common user.
> 

This blk_abort_request here is to force request timed out and simulate the
scenario where timeout path and io completion path could occur concurrently.
It is hard for me to trigger this scenario in real world, so I made  this patch
which may looks bad. What I want is to trigger the io completion and timeout path
concurrently.


Thanks
Jianchao
diff mbox

Patch

diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c
index 24d7496..f278e6c 100644
--- a/drivers/scsi/scsi_debug.c
+++ b/drivers/scsi/scsi_debug.c
@@ -4323,6 +4323,8 @@  static void setup_inject(struct sdebug_queue *sqp,
        sqcp->inj_host_busy = !!(SDEBUG_OPT_HOST_BUSY & sdebug_opts);
 }
 
+static atomic_t g_abort_counter;
+
 /* Complete the processing of the thread that queued a SCSI command to this
  * driver. It either completes the command by calling cmnd_done() or
  * schedules a hr timer or work queue then returns 0. Returns
@@ -4459,6 +4461,11 @@  static int schedule_resp(struct scsi_cmnd *cmnd, struct sdebug_dev_info *devip,
                        sd_dp->issuing_cpu = raw_smp_processor_id();
                sd_dp->defer_t = SDEB_DEFER_WQ;
                schedule_work(&sd_dp->ew.work);
+               atomic_inc(&g_abort_counter);
+               if (atomic_read(&g_abort_counter)%2000 == 0) {
+                       blk_abort_request(cmnd->request);
+                       trace_printk("abort request tag %d\n", cmnd->request->tag);
+               }
        }
        if (unlikely((SDEBUG_OPT_Q_NOISE & sdebug_opts) &&
                     (scsi_result == device_qfull_result)))
@@ -5844,6 +5851,7 @@  static int sdebug_driver_probe(struct device *dev)
        struct Scsi_Host *hpnt;
        int hprot;
 
+       atomic_set(&g_abort_counter, 0);
        sdbg_host = to_sdebug_host(dev);
 
        sdebug_driver_template.can_queue = sdebug_max_queue;