
[v1] mpt3sas: Fix calltrace observed while running IO & host reset

Message ID 1528971921-15216-1-git-send-email-chaitra.basappa@broadcom.com (mailing list archive)
State Changes Requested

Commit Message

Chaitra P B June 14, 2018, 10:25 a.m. UTC
The following kernel BUG was observed while running IOs together with
host resets (issued from an application):

mpt3sas_cm0: diag reset: SUCCESS
------------[ cut here ]------------
WARNING: CPU: 12 PID: 4336 at drivers/scsi/mpt3sas/mpt3sas_base.c:3282 mpt3sas_base_clear_st+0x3d/0x40 [mpt3sas]
Modules linked in: macsec tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag binfmt_misc fuse xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun devlink ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter sunrpc vfat fat sb_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt iTCO_vendor_support
 dcdbas pcspkr joydev ipmi_ssif ses enclosure sg ipmi_devintf acpi_pad ipmi_msghandler acpi_power_meter mei_me lpc_ich wmi mei shpchp ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic ata_generic pata_acpi uas usb_storage mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix mpt3sas libata crct10dif_pclmul crct10dif_common tg3 crc32c_intel i2c_core raid_class ptp scsi_transport_sas pps_core dm_mirror dm_region_hash dm_log dm_mod
CPU: 12 PID: 4336 Comm: python Kdump: loaded Tainted: G        W      ------------   3.10.0-875.el7.brdc.x86_64 #1
Hardware name: Dell Inc. PowerEdge R820/0YWR73, BIOS 1.5.0 03/08/2013
Call Trace:
 [<ffffffff9cf16583>] dump_stack+0x19/0x1b
 [<ffffffff9c891698>] __warn+0xd8/0x100
 [<ffffffff9c8917dd>] warn_slowpath_null+0x1d/0x20
 [<ffffffffc04f3f4d>] mpt3sas_base_clear_st+0x3d/0x40 [mpt3sas]
 [<ffffffffc05047d2>] _scsih_flush_running_cmds+0x92/0xe0 [mpt3sas]
 [<ffffffffc05095db>] mpt3sas_scsih_reset_handler+0x43b/0xaf0 [mpt3sas]
 [<ffffffff9c894829>] ? vprintk_default+0x29/0x40
 [<ffffffff9cf10531>] ? printk+0x60/0x77
 [<ffffffffc04f06c8>] ? _base_diag_reset+0x238/0x340 [mpt3sas]
 [<ffffffffc04f794d>] mpt3sas_base_hard_reset_handler+0x1ad/0x420 [mpt3sas]
 [<ffffffffc05132b9>] _ctl_ioctl_main.isra.12+0x11b9/0x1200 [mpt3sas]
 [<ffffffffc068d585>] ? xfs_file_aio_write+0x155/0x1b0 [xfs]
 [<ffffffff9ca1a4e3>] ? do_sync_write+0x93/0xe0
 [<ffffffffc051337a>] _ctl_ioctl+0x1a/0x20 [mpt3sas]
 [<ffffffff9ca2fe90>] do_vfs_ioctl+0x350/0x560
 [<ffffffff9ca1dec1>] ? __sb_end_write+0x31/0x60
 [<ffffffff9ca30141>] SyS_ioctl+0xa1/0xc0
 [<ffffffff9cf28715>] ? system_call_after_swapgs+0xa2/0x146
 [<ffffffff9cf287d5>] system_call_fastpath+0x1c/0x21
 [<ffffffff9cf28721>] ? system_call_after_swapgs+0xae/0x146
---[ end trace 5dac5b98d89aaa3c ]---
------------[ cut here ]------------
kernel BUG at block/blk-core.c:1476!
invalid opcode: 0000 [#1] SMP
Modules linked in: macsec tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag binfmt_misc fuse xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun devlink ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter sunrpc vfat fat sb_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt iTCO_vendor_support
 dcdbas pcspkr joydev ipmi_ssif ses enclosure sg ipmi_devintf acpi_pad ipmi_msghandler acpi_power_meter mei_me lpc_ich wmi mei shpchp ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic ata_generic pata_acpi uas usb_storage mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix mpt3sas libata crct10dif_pclmul crct10dif_common tg3 crc32c_intel i2c_core raid_class ptp scsi_transport_sas pps_core dm_mirror dm_region_hash dm_log dm_mod
CPU: 12 PID: 4336 Comm: python Kdump: loaded Tainted: G        W      ------------   3.10.0-875.el7.brdc.x86_64 #1
Hardware name: Dell Inc. PowerEdge R820/0YWR73, BIOS 1.5.0 03/08/2013
task: ffff903fc96e0fd0 ti: ffff903fb1eec000 task.ti: ffff903fb1eec000
RIP: 0010:[<ffffffff9cb19ec0>]  [<ffffffff9cb19ec0>] blk_requeue_request+0x90/0xa0
RSP: 0018:ffff903c6b783dc0  EFLAGS: 00010087
RAX: ffff903bb67026d0 RBX: ffff903b7d6a6140 RCX: dead000000000200
RDX: ffff903bb67026d0 RSI: ffff903bb6702580 RDI: ffff903bb67026d0
RBP: ffff903c6b783dd8 R08: ffff903bb67026d0 R09: ffffd97e80000000
R10: ffff903c658bac00 R11: 0000000000000000 R12: ffff903bb6702580
R13: ffff903fa9a292f0 R14: 0000000000000246 R15: 0000000000001057
FS:  00007f7026f5b740(0000) GS:ffff903c6b780000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f298877c004 CR3: 00000000caf36000 CR4: 00000000000607e0
Call Trace:
 <IRQ>
 [<ffffffff9cca68ff>] __scsi_queue_insert+0xbf/0x110
 [<ffffffff9cca79ca>] scsi_io_completion+0x5da/0x6a0
 [<ffffffff9cc9ca3c>] scsi_finish_command+0xdc/0x140
 [<ffffffff9cca6aa2>] scsi_softirq_done+0x132/0x160
 [<ffffffff9cb240c6>] blk_done_softirq+0x96/0xc0
 [<ffffffff9c89a905>] __do_softirq+0xf5/0x280
 [<ffffffff9cf2bd2c>] call_softirq+0x1c/0x30
 [<ffffffff9c82d625>] do_softirq+0x65/0xa0
 [<ffffffff9c89ac85>] irq_exit+0x105/0x110
 [<ffffffff9cf2d0a8>] smp_apic_timer_interrupt+0x48/0x60
 [<ffffffff9cf297f2>] apic_timer_interrupt+0x162/0x170
 <EOI>
 [<ffffffff9cca5f41>] ? scsi_done+0x21/0x60
 [<ffffffff9cb5ac18>] ? delay_tsc+0x38/0x60
 [<ffffffff9cb5ab5d>] __const_udelay+0x2d/0x30
 [<ffffffffc04effde>] _base_handshake_req_reply_wait+0x8e/0x4a0 [mpt3sas]
 [<ffffffffc04f0b13>] _base_get_ioc_facts+0x123/0x590 [mpt3sas]
 [<ffffffffc04f06c8>] ? _base_diag_reset+0x238/0x340 [mpt3sas]
 [<ffffffffc04f7993>] mpt3sas_base_hard_reset_handler+0x1f3/0x420 [mpt3sas]
 [<ffffffffc05132b9>] _ctl_ioctl_main.isra.12+0x11b9/0x1200 [mpt3sas]
 [<ffffffffc068d585>] ? xfs_file_aio_write+0x155/0x1b0 [xfs]
 [<ffffffff9ca1a4e3>] ? do_sync_write+0x93/0xe0
 [<ffffffffc051337a>] _ctl_ioctl+0x1a/0x20 [mpt3sas]
 [<ffffffff9ca2fe90>] do_vfs_ioctl+0x350/0x560
 [<ffffffff9ca1dec1>] ? __sb_end_write+0x31/0x60
 [<ffffffff9ca30141>] SyS_ioctl+0xa1/0xc0
 [<ffffffff9cf28715>] ? system_call_after_swapgs+0xa2/0x146
 [<ffffffff9cf287d5>] system_call_fastpath+0x1c/0x21
 [<ffffffff9cf28721>] ? system_call_after_swapgs+0xae/0x146
Code: 83 c3 10 4c 89 e2 4c 89 ee e8 8d 21 04 00 48 8b 03 48 85 c0 75 e5 41 f6 44 24 4a 10 74 ad 4c 89 e6 4c 89 ef e8 b2 42 00 00 eb a0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90
RIP  [<ffffffff9cb19ec0>] blk_requeue_request+0x90/0xa0
 RSP <ffff903c6b783dc0>

As part of a host reset operation, the driver flushes out all IOs
outstanding at the driver level with the "DID_RESET" result.
To find which commands are outstanding at the driver level, the
driver loops over smid from one to the HBA queue depth and calls
mpt3sas_scsih_scsi_lookup_get() to get the scmd, as shown below:

 for (smid = 1; smid <= ioc->scsiio_depth; smid++) {
                scmd = mpt3sas_scsih_scsi_lookup_get(ioc, smid);
                if (!scmd)
                        continue;

But mpt3sas_scsih_scsi_lookup_get() returns some scsi cmnds which are
not outstanding at the driver level (possibly the request was only just
constructed at the block layer, since QUEUE_FLAG_QUIESCED is not set;
even if the driver uses scsi_block_requests()/scsi_unblock_requests(),
the issue persists, as those only block further IO from the scsi layer,
not from the block layer), and these commands are flushed with the
DID_RESET host byte, resulting in the above kernel BUG.

To fix this issue, mpt3sas_scsih_scsi_lookup_get() is modified to check
whether smid equals zero (note: whenever a scsi cmnd is outstanding at
the driver level, its smid is never zero; smids always start from one)
before returning the scmd pointer to the caller. If smid is zero, the
function returns a NULL scmd pointer, so the driver does not flush out
those scsi cmnds with the DID_RESET host byte and the issue is no
longer observed.

Signed-off-by: Chaitra P B <chaitra.basappa@broadcom.com>
Signed-off-by: Sreekanth Reddy <Sreekanth.Reddy@broadcom.com>
---
 drivers/scsi/mpt3sas/mpt3sas_base.c  | 1 +
 drivers/scsi/mpt3sas/mpt3sas_scsih.c | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

Comments

Bart Van Assche June 14, 2018, 3:45 p.m. UTC | #1
On Thu, 2018-06-14 at 06:25 -0400, Chaitra P B wrote:
> As a part of host reset operation, driver will flushout all IOs
> outstanding at driver level with "DID_RESET" result.
> To find which are all commands outstanding at the driver level,
> driver loops with smid starting from one to HBA queue depth and
> calls mpt3sas_scsih_scsi_lookup_get() to get scmd as shown below
>
>  for (smid = 1; smid <= ioc->scsiio_depth; smid++) {
>                 scmd = mpt3sas_scsih_scsi_lookup_get(ioc, smid);
>                 if (!scmd)
>                         continue;
>
> But in mpt3sas_scsih_scsi_lookup_get() function, driver returns some
> scsi cmnds which are not outstanding at the driver level (possibly request
> is constructed at block layer since QUEUE_FLAG_QUIESCED is not set. Even if
> driver uses scsi_block_requests and scsi_unblock_requests, issue still
> persists as they will be just blocking further IO from scsi layer and
> not from block layer) and these commands are flushed with
> DID_RESET host bytes thus resulting into above kernel BUG.


Have you considered calling scsi_target_block() before flushing out pending
I/O and calling scsi_target_unblock() after flushing out pending I/O has
finished? I think that would be a much cleaner solution than the proposed
patch. The effect of scsi_target_block() is explained above
scsi_internal_device_block():

/**
 * scsi_internal_device_block - try to transition to the SDEV_BLOCK state
 * @sdev: device to block
 *
 * Pause SCSI command processing on the specified device and wait until all
 * ongoing scsi_request_fn() / scsi_queue_rq() calls have finished. May sleep.
 *
 * Returns zero if successful or a negative error code upon failure.
 *
 * Note:
 * This routine transitions the device to the SDEV_BLOCK state (which must be
 * a legal transition). When the device is in this state, command processing
 * is paused until the device leaves the SDEV_BLOCK state. See also
 * scsi_internal_device_unblock().
 *
 * To do: avoid that scsi_send_eh_cmnd() calls queuecommand() after
 * scsi_internal_device_block() has blocked a SCSI device and also
 * remove the rport mutex lock and unlock calls from srp_queuecommand().
 */

Thanks,

Bart.
Bart Van Assche June 20, 2018, 6:13 p.m. UTC | #2
On Wed, 2018-06-20 at 09:18 +0530, Chaitra Basappa wrote:
> We have tried with calling scsi_internal_device_block_nowait() API before
> doing IOC reset (i.e. host reset) and called
> scsi_internal_device_unblock_nowait() after performing IOC reset.
> We have tested this code change with various test cases such as
> adding/removing target drives or expanders during diag reset with and
> without IOs and at high level we see all are working but we observe below
> error messages while performing hibernation operation,
>
> sd 1:0:0:0: device_block, handle(0x0028)
> BRCM Debug: sdev->sdev_state: 5 before device_block_nowait
> BRCM Debug: sdev->sdev_state: 5 after_device_block_nowait
> sd 1:0:0:0: device_block failed with return(-22) for handle(0x0028)
> .
> .
> sd 0:0:0:0: device_unblock and setting to running, handle(0x0028)
> sd 0:0:0:0: device_unblock failed with return(-22) for handle(0x0028)
> performing a block followed by an unblock
> sd 0:0:0:0: retried device_block failed with return(-22) for handle(0x0028)
> sd 0:0:0:0: retried device_unblock failed with return(-22) for
> handle(0x0028)
>
> We are observing these messages during of system resume time, during which
> driver issues IOC reset operation in the .resume() callback function.
> In the above error messages we see that drives are in SDEV_QUIESCE state.
> When drives are SDEV_QUIESCE state then moving these drives to
> SDEV_BLOCK state is not allowed and hence we observe above error messages.
>
> SDEV_QUIESCE state means that Device quiescent. No block commands will be
> accepted, only specials (which originate in the midlayer).


Neither scsi_internal_device_block_nowait() nor
scsi_internal_device_unblock_nowait() should ever have been changed from
static into exported functions. But that's another discussion. Regarding the
adverse interaction of scsi_internal_device_block_nowait() and
scsi_internal_device_unblock_nowait() with the power management code, have
you considered surrounding the code that blocks and unblocks SCSI devices with
lock_system_sleep() / unlock_system_sleep() so that these functions do not
fail with error code -22?

Thanks,

Bart.
Sreekanth Reddy June 21, 2018, 10:11 a.m. UTC | #3
On Wed, Jun 20, 2018 at 11:43 PM, Bart Van Assche
<Bart.VanAssche@wdc.com> wrote:
> On Wed, 2018-06-20 at 09:18 +0530, Chaitra Basappa wrote:
>> We have tried with calling scsi_internal_device_block_nowait() API before
>> doing IOC reset (i.e. host reset) and called
>> scsi_internal_device_unblock_nowait() after performing IOC reset.
>> We have tested this code change with various test cases such as
>> adding/removing target drives or expanders during diag reset with and
>> without IOs and at high level we see all are working but we observe below
>> error messages while performing hibernation operation,
>>
>> sd 1:0:0:0: device_block, handle(0x0028)
>> BRCM Debug: sdev->sdev_state: 5 before device_block_nowait
>> BRCM Debug: sdev->sdev_state: 5 after_device_block_nowait
>> sd 1:0:0:0: device_block failed with return(-22) for handle(0x0028)
>> .
>> .
>> sd 0:0:0:0: device_unblock and setting to running, handle(0x0028)
>> sd 0:0:0:0: device_unblock failed with return(-22) for handle(0x0028)
>> performing a block followed by an unblock
>> sd 0:0:0:0: retried device_block failed with return(-22) for handle(0x0028)
>> sd 0:0:0:0: retried device_unblock failed with return(-22) for
>> handle(0x0028)
>>
>> We are observing these messages during of system resume time, during which
>> driver issues IOC reset operation in the .resume() callback function.
>> In the above error messages we see that drives are in SDEV_QUIESCE state.
>> When drives are SDEV_QUIESCE state then moving these drives to
>> SDEV_BLOCK state is not allowed and hence we observe above error messages.
>>
>> SDEV_QUIESCE state means that Device quiescent. No block commands will be
>> accepted, only specials (which originate in the midlayer).
>
> Neither scsi_internal_device_block_nowait() nor
> scsi_internal_device_unblock_nowait() should ever have been changed from
> static into exported functions. But that's another discussion. Regarding the
> adverse interaction of scsi_internal_device_block_nowait() and
> scsi_internal_device_unblock_nowait() with the power management code, have
> you considered to surround code that blocks and unblocks SCSI devices with
> lock_system_sleep() / unlock_system_sleep() to avoid that these functions
> fail with error code -22?
>

Bart, we tried using lock_system_sleep() before the IOC reset
operation in the .resume() callback function and unlock_system_sleep()
after the IOC reset. With this code change the system hangs during
hibernation and we only see the messages below:

[  625.788598] PM: hibernation entry
Jun 21 05:37:33 localhost kernel: PM: hibernation entry
[  627.428159] PM: Syncing filesystems ...
Jun 21 05:37:34 localhost kernel: PM: Syncing filesystems ...
[  628.756119] PM: done.
[  628.758707] Freezing user space processes ... (elapsed 0.001 seconds) done.
[  628.768340] OOM killer disabled.
[  628.772010] PM: Preallocating image memory... done (allocated 197704 pages)
[  632.554470] PM: Allocated 790816 kbytes in 3.77 seconds (209.76 MB/s)
[  632.561664] Freezing remaining freezable tasks ... (elapsed 0.002
seconds) done.
[  632.572269] Suspending console(s) (use no_console_suspend to debug)


The fix we have posted is simple and we do not see any side
effects from it.
We have done complete regression testing on the fix and found no
issues with it, so please consider the fix we have posted.

Thanks,
Sreekanth


> Thanks,
>
> Bart.
>
>
>
Bart Van Assche June 21, 2018, 2:19 p.m. UTC | #4
On Thu, 2018-06-21 at 15:41 +0530, Sreekanth Reddy wrote:
> Bart, we tried using lock_system_sleep() before calling IOC reset
> operation in .resume() callback function and unlock_system_sleep()
> after the IOC reset. With this code change we see system is going to
> hang state during hibernation and we just see below messages,
>
> [  625.788598] PM: hibernation entry
> Jun 21 05:37:33 localhost kernel: PM: hibernation entry
> [  627.428159] PM: Syncing filesystems ...
> Jun 21 05:37:34 localhost kernel: PM: Syncing filesystems ...
> [  628.756119] PM: done.
> [  628.758707] Freezing user space processes ... (elapsed 0.001 seconds) done.
> [  628.768340] OOM killer disabled.
> [  628.772010] PM: Preallocating image memory... done (allocated 197704 pages)
> [  632.554470] PM: Allocated 790816 kbytes in 3.77 seconds (209.76 MB/s)
> [  632.561664] Freezing remaining freezable tasks ... (elapsed 0.002
> seconds) done.
> [  632.572269] Suspending console(s) (use no_console_suspend to debug)
>
> The fix which we have posted looks simple and we don't see any side
> effects of it.
> We have done complete regression testing on our fix and we don't see
> any issue with it. So please consider our fix which have posted.


scsi_internal_device_block_nowait() can be called by the mpt3sas driver from
interrupt context. lock_system_sleep() locks a mutex and hence must not be
called from interrupt context. Does the above mean that (a) a call to
lock_system_sleep() was inserted in an interrupt handler and (b) that the
above test was performed with a kernel in which lockdep was disabled?

Bart.
Sreekanth Reddy June 22, 2018, 5:35 a.m. UTC | #5
On Thu, Jun 21, 2018 at 7:49 PM, Bart Van Assche <Bart.VanAssche@wdc.com> wrote:
> On Thu, 2018-06-21 at 15:41 +0530, Sreekanth Reddy wrote:
>> Bart, we tried using lock_system_sleep() before calling IOC reset
>> operation in .resume() callback function and unlock_system_sleep()
>> after the IOC reset. With this code change we see system is going to
>> hang state during hibernation and we just see below messages,
>>
>> [  625.788598] PM: hibernation entry
>> Jun 21 05:37:33 localhost kernel: PM: hibernation entry
>> [  627.428159] PM: Syncing filesystems ...
>> Jun 21 05:37:34 localhost kernel: PM: Syncing filesystems ...
>> [  628.756119] PM: done.
>> [  628.758707] Freezing user space processes ... (elapsed 0.001 seconds) done.
>> [  628.768340] OOM killer disabled.
>> [  628.772010] PM: Preallocating image memory... done (allocated 197704 pages)
>> [  632.554470] PM: Allocated 790816 kbytes in 3.77 seconds (209.76 MB/s)
>> [  632.561664] Freezing remaining freezable tasks ... (elapsed 0.002
>> seconds) done.
>> [  632.572269] Suspending console(s) (use no_console_suspend to debug)
>>
>>
>> The fix which we have posted looks simple and we don't see any side
>> effects of it.
>> We have done complete regression testing on our fix and we don't see
>> any issue with it. So please consider our fix which have posted.
>
> scsi_internal_device_block_nowait() can be called by the mpt3sas driver from
> interrupt context. lock_system_sleep() locks a mutex and hence must not be
> called from interrupt context. Does the above mean that (a) a call to
> lock_system_sleep() was inserted in an interrupt handler and

No, lock_system_sleep() is not inserted in interrupt context. We
have inserted it in the .resume() callback function just before issuing
the IOC reset.

> (b) that the
> above test was performed with a kernel in which lockdep was disabled?

No, lockdep is enabled in the kernel.


~Sreekanth
>
> Bart.
>
>
>
Bart Van Assche June 22, 2018, 2:52 p.m. UTC | #6
On 06/21/18 22:35, Sreekanth Reddy wrote:
> No, lock_system_sleep() is not inserted in the interrupt context. we
> have inserted it in .resume() call back function just before issuing
> the IOC reset.

That's the wrong place to insert a lock_system_sleep() call. Please have 
a look at drivers/scsi/scsi_transport_spi.c for an example of how to use 
that function.

Thanks,

Bart.
Sreekanth Reddy June 22, 2018, 4:38 p.m. UTC | #7
On Fri, Jun 22, 2018 at 8:22 PM, Bart Van Assche <bart.vanassche@wdc.com> wrote:
> On 06/21/18 22:35, Sreekanth Reddy wrote:
>>
>> No, lock_system_sleep() is not inserted in the interrupt context. we
>> have inserted it in .resume() call back function just before issuing
>> the IOC reset.
>
>
> That's the wrong place to insert a lock_system_sleep() call. Please have a
> look at drivers/scsi/scsi_transport_spi.c for an example of how to use that
> function.
>

I am totally confused.

In the driver's .resume() callback function, the driver performs an IOC
reset operation. As per your suggestion we tried using
scsi_internal_device_block_nowait() to block all the devices
attached to the HBA before going for the IOC reset operation. During
system resume the drives are in the quiesce state, so calling
scsi_internal_device_block_nowait() returns error status -22.

You then suggested trying the lock_system_sleep() API before
calling scsi_internal_device_block_nowait(), so that we don't get a -22
return status when we call the scsi_internal_device_block_nowait() API
during resume time. We tried that and the system hangs
during hibernation. Please correct me if I am wrong or if I have
misunderstood your suggestions.

I feel that the fix we have posted is a better fix than using the
scsi_internal_device_block_nowait()/scsi_internal_device_unblock_nowait()
APIs.

Thanks,
Sreekanth



> Thanks,
>
> Bart.
Bart Van Assche June 22, 2018, 9:26 p.m. UTC | #8
On 06/22/18 09:38, Sreekanth Reddy wrote:
> In driver's .resume() callback function, driver is doing IOC reset
> operation. And as per your suggestion we tried using
> scsi_internal_device_block_nowait() to block the all the devices
> attached to the HBA before going for IOC reset operation. During
> system resume time as drives will be in quiesce state and calling
> scsi_internal_device_block_nowait() return with error status -22.
> 
> So you suggested us to try using lock_system_sleep() API before
> calling scsi_internal_device_block_nowait(), so that we don't get -22
> return status when we call scsi_internal_device_block_nowait() API
> during resume time. We tried the same and we see system get hang
> during hibernation. Please correct me if I am wrong or if I have
> wrongly understood your suggestions.
> 
> I feel that the fix which have posted is the better fix then using
> scsi_internal_device_block_nowait()/scsi_internal_device_unblock_nowait()
> APIs.

Hello Sreekanth,

It seems like there is a misunderstanding: what I proposed was to use 
lock_system_sleep() to delay hibernation until unlock_system_sleep() has 
been called. I had not realized that the mpt3sas driver can call 
scsi_internal_device_block_nowait() during system resume.

There is another issue with the scsi_internal_device_block_nowait() 
calls by the mpt3sas driver: callers like 
_scsih_sas_broadcast_primitive_event() seem to assume that all 
scsih_qcmd() calls have finished as soon as 
scsi_internal_device_block_nowait() has returned. However, that 
assumption is not correct, especially when using scsi-mq.

If the patch "mpt3sas: Fix calltrace observed while running IO & host 
reset" would be allowed upstream, will the issues mentioned above ever 
be addressed?

Bart.
Sreekanth Reddy June 25, 2018, 6:10 a.m. UTC | #9
On Sat, Jun 23, 2018 at 2:56 AM, Bart Van Assche <bart.vanassche@wdc.com> wrote:
> On 06/22/18 09:38, Sreekanth Reddy wrote:
>>
>> In driver's .resume() callback function, driver is doing IOC reset
>> operation. And as per your suggestion we tried using
>> scsi_internal_device_block_nowait() to block the all the devices
>> attached to the HBA before going for IOC reset operation. During
>> system resume time as drives will be in quiesce state and calling
>> scsi_internal_device_block_nowait() return with error status -22.
>>
>> So you suggested us to try using lock_system_sleep() API before
>> calling scsi_internal_device_block_nowait(), so that we don't get -22
>> return status when we call scsi_internal_device_block_nowait() API
>> during resume time. We tried the same and we see system get hang
>> during hibernation. Please correct me if I am wrong or if I have
>> wrongly understood your suggestions.
>>
>> I feel that the fix which have posted is the better fix then using
>> scsi_internal_device_block_nowait()/scsi_internal_device_unblock_nowait()
>> APIs.
>
>
> Hello Sreekanth,
>
> It seems like there is a misunderstanding: what I proposed was to use
> lock_system_sleep() to delay hibernation until unlock_system_sleep() has
> been called. I had not realized that the mpt3sas driver can call
> scsi_internal_device_block_nowait() during system resume.
>
> There is another issue with the scsi_internal_device_block_nowait() calls by
> the mpt3sas driver: callers like _scsih_sas_broadcast_primitive_event() seem
> to assume that all scsih_qcmd() calls have finished as soon as
> scsi_internal_device_block_nowait() has returned. However, that assumption
> is not correct, especially when using scsi-mq.
>

Hello Bart,

Before calling the scsi_internal_device_block_nowait() API, the driver
sets the sas_device_priv_data->block flag to one. In scsih_qcmd() the
driver checks this flag as shown below and returns commands
with a host busy status.

} else if (sas_target_priv_data->tm_busy ||
            sas_device_priv_data->block)
                return SCSI_MLQUEUE_DEVICE_BUSY;

Also, as part of processing the broadcast primitive event, the driver
issues a task query TM to determine where each IO command is and only
aborts those IOs (by issuing a task abort TM) which are queued in
the target devices. I hope this clarifies the issue you
pointed out.

> If the patch "mpt3sas: Fix calltrace observed while running IO & host reset"
> would be allowed upstream, will the issues mentioned above ever be
> addressed?
If there are any issues we are always available to address them.


Thanks,
Sreekanth

>
> Bart.
Bart Van Assche June 27, 2018, 10:18 p.m. UTC | #10
On 06/24/18 23:10, Sreekanth Reddy wrote:
> Before calling scsi_internal_device_block_nowait() API; driver sets
> sas_device_priv_data->block flag as one. And in the scsih_qcmd()
> driver checks for this flag as shown below and return the commands
> with host busy status.
> 
> } else if (sas_target_priv_data->tm_busy ||
>              sas_device_priv_data->block)
>                  return SCSI_MLQUEUE_DEVICE_BUSY;

That's exactly the kind of construct that should occur in the SCSI core 
or block layer core and not in a SCSI LLD. Additionally, as explained 
before, the construct you described above is racy.

Bart.
Sreekanth Reddy June 28, 2018, 5:26 a.m. UTC | #11
On Thu, Jun 28, 2018 at 3:48 AM, Bart Van Assche <bart.vanassche@wdc.com> wrote:
> On 06/24/18 23:10, Sreekanth Reddy wrote:
>>
>> Before calling scsi_internal_device_block_nowait() API; driver sets
>> sas_device_priv_data->block flag as one. And in the scsih_qcmd()
>> driver checks for this flag as shown below and return the commands
>> with host busy status.
>>
>> } else if (sas_target_priv_data->tm_busy ||
>>              sas_device_priv_data->block)
>>                  return SCSI_MLQUEUE_DEVICE_BUSY;
>
>
> That's exactly the kind of construct that should occur in the SCSI core or
> block layer core and not in a SCSI LLD. Additionally, as explained before,
> the construct you described above is racy.

Can you please elaborate in more detail on the race condition
you have in mind?
If there is any race condition here, we are ready to
work on it separately,
so please consider the fix we have posted.

Thanks,
Sreekanth




>
> Bart.
Sreekanth Reddy July 9, 2018, 11:44 a.m. UTC | #12
On Thu, Jun 28, 2018 at 10:56 AM, Sreekanth Reddy
<sreekanth.reddy@broadcom.com> wrote:
> On Thu, Jun 28, 2018 at 3:48 AM, Bart Van Assche <bart.vanassche@wdc.com> wrote:
>> On 06/24/18 23:10, Sreekanth Reddy wrote:
>>>
>>> Before calling scsi_internal_device_block_nowait() API; driver sets
>>> sas_device_priv_data->block flag as one. And in the scsih_qcmd()
>>> driver checks for this flag as shown below and return the commands
>>> with host busy status.
>>>
>>> } else if (sas_target_priv_data->tm_busy ||
>>>              sas_device_priv_data->block)
>>>                  return SCSI_MLQUEUE_DEVICE_BUSY;
>>
>>
>> That's exactly the kind of construct that should occur in the SCSI core or
>> block layer core and not in a SCSI LLD. Additionally, as explained before,
>> the construct you described above is racy.
>
> Can you please elaborate more in details about the racy condition
> which you think?
> Also if at all is their any racy condition here then we are ready to
> work on it separately,
> So please consider the fix which we have posted.
>
> Thanks,
> Sreekanth
>
>
>
>
>>
>> Bart.


Any update regarding this patch?

Thanks,
Sreekanth
Sreekanth Reddy July 10, 2018, 5:28 p.m. UTC | #13
On Mon, Jul 9, 2018 at 5:14 PM, Sreekanth Reddy
<sreekanth.reddy@broadcom.com> wrote:
> On Thu, Jun 28, 2018 at 10:56 AM, Sreekanth Reddy
> <sreekanth.reddy@broadcom.com> wrote:
>> On Thu, Jun 28, 2018 at 3:48 AM, Bart Van Assche <bart.vanassche@wdc.com> wrote:
>>> On 06/24/18 23:10, Sreekanth Reddy wrote:
>>>>
>>>> Before calling scsi_internal_device_block_nowait() API; driver sets
>>>> sas_device_priv_data->block flag as one. And in the scsih_qcmd()
>>>> driver checks for this flag as shown below and return the commands
>>>> with host busy status.
>>>>
>>>> } else if (sas_target_priv_data->tm_busy ||
>>>>              sas_device_priv_data->block)
>>>>                  return SCSI_MLQUEUE_DEVICE_BUSY;
>>>
>>>
>>> That's exactly the kind of construct that should occur in the SCSI core or
>>> block layer core and not in a SCSI LLD. Additionally, as explained before,
>>> the construct you described above is racy.
>>
>> Can you please elaborate more in details about the racy condition
>> which you think?
>> Also if at all is their any racy condition here then we are ready to
>> work on it separately,
>> So please consider the fix which we have posted.
>>
>> Thanks,
>> Sreekanth
>>
>>
>>
>>
>>>
>>> Bart.
>
>
> Any update regarding this patch?

Pinging once again! Please consider this patch fix. This is a very
simple fix for the reported issue and doesn't have any side effects.

Regards,
Sreekanth

>
> Thanks,
> Sreekanth
Sreekanth Reddy July 12, 2018, 12:35 p.m. UTC | #14
On Tue, Jul 10, 2018 at 10:58 PM, Sreekanth Reddy
<sreekanth.reddy@broadcom.com> wrote:
> On Mon, Jul 9, 2018 at 5:14 PM, Sreekanth Reddy
> <sreekanth.reddy@broadcom.com> wrote:
>> On Thu, Jun 28, 2018 at 10:56 AM, Sreekanth Reddy
>> <sreekanth.reddy@broadcom.com> wrote:
>>> On Thu, Jun 28, 2018 at 3:48 AM, Bart Van Assche <bart.vanassche@wdc.com> wrote:
>>>> On 06/24/18 23:10, Sreekanth Reddy wrote:
>>>>>
>>>>> Before calling scsi_internal_device_block_nowait() API; driver sets
>>>>> sas_device_priv_data->block flag as one. And in the scsih_qcmd()
>>>>> driver checks for this flag as shown below and return the commands
>>>>> with host busy status.
>>>>>
>>>>> } else if (sas_target_priv_data->tm_busy ||
>>>>>              sas_device_priv_data->block)
>>>>>                  return SCSI_MLQUEUE_DEVICE_BUSY;
>>>>
>>>>
>>>> That's exactly the kind of construct that should occur in the SCSI core or
>>>> block layer core and not in a SCSI LLD. Additionally, as explained before,
>>>> the construct you described above is racy.
>>>
>>> Can you please elaborate more in details about the racy condition
>>> which you think?
>>> Also if at all is their any racy condition here then we are ready to
>>> work on it separately,
>>> So please consider the fix which we have posted.
>>>
>>> Thanks,
>>> Sreekanth
>>>
>>>
>>>
>>>
>>>>
>>>> Bart.
>>
>>
>> Any update regarding this patch?
>
> Pinging once again! please consider this patch fix. this is a very
> simple fix for the issue reported and doesn't have any side effort.


Any update on acceptance of this patch?

This is a very simple fix for the current issue, which is described in
the patch header; please consider this patch.

We will also review the complete driver code path once again and are
ready to fix any race conditions found while reviewing in the coming
days.

Thanks,
Sreekanth

Patch

diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c
index 668f350..49d92d0 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
@@ -3309,6 +3309,7 @@  void mpt3sas_base_clear_st(struct MPT3SAS_ADAPTER *ioc,
 		return;
 	st->cb_idx = 0xFF;
 	st->direct_io = 0;
 	atomic_set(&ioc->chain_lookup[st->smid - 1].chain_offset, 0);
+	st->smid = 0;
 }
 
diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index 23902ad..96e523a 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -1489,7 +1489,7 @@  struct scsi_cmnd *
 		scmd = scsi_host_find_tag(ioc->shost, unique_tag);
 		if (scmd) {
 			st = scsi_cmd_priv(scmd);
-			if (st->cb_idx == 0xFF)
+			if (st->cb_idx == 0xFF || st->smid == 0)
 				scmd = NULL;
 		}
 	}