diff mbox

mpt3sas: Fix calltrace observed while running IO & host reset

Message ID 1528809478-44645-1-git-send-email-chaitra.basappa@broadcom.com (mailing list archive)
State Changes Requested
Headers show

Commit Message

Chaitra P B June 12, 2018, 1:17 p.m. UTC
Below kernel BUG was observed while running IOs with host reset
(issued from application),

mpt3sas_cm0: diag reset: SUCCESS
------------[ cut here ]------------
WARNING: CPU: 12 PID: 4336 at drivers/scsi/mpt3sas/mpt3sas_base.c:3282 mpt3sas_base_clear_st+0x3d/0x40 [mpt3sas]
Modules linked in: macsec tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag binfmt_misc fuse xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun devlink ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter sunrpc vfat fat sb_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt iTCO_vendor_support
 dcdbas pcspkr joydev ipmi_ssif ses enclosure sg ipmi_devintf acpi_pad ipmi_msghandler acpi_power_meter mei_me lpc_ich wmi mei shpchp ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic ata_generic pata_acpi uas usb_storage mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix mpt3sas libata crct10dif_pclmul crct10dif_common tg3 crc32c_intel i2c_core raid_class ptp scsi_transport_sas pps_core dm_mirror dm_region_hash dm_log dm_mod
CPU: 12 PID: 4336 Comm: python Kdump: loaded Tainted: G        W      ------------   3.10.0-875.el7.brdc.x86_64 #1
Hardware name: Dell Inc. PowerEdge R820/0YWR73, BIOS 1.5.0 03/08/2013
Call Trace:
 [<ffffffff9cf16583>] dump_stack+0x19/0x1b
 [<ffffffff9c891698>] __warn+0xd8/0x100
 [<ffffffff9c8917dd>] warn_slowpath_null+0x1d/0x20
 [<ffffffffc04f3f4d>] mpt3sas_base_clear_st+0x3d/0x40 [mpt3sas]
 [<ffffffffc05047d2>] _scsih_flush_running_cmds+0x92/0xe0 [mpt3sas]
 [<ffffffffc05095db>] mpt3sas_scsih_reset_handler+0x43b/0xaf0 [mpt3sas]
 [<ffffffff9c894829>] ? vprintk_default+0x29/0x40
 [<ffffffff9cf10531>] ? printk+0x60/0x77
 [<ffffffffc04f06c8>] ? _base_diag_reset+0x238/0x340 [mpt3sas]
 [<ffffffffc04f794d>] mpt3sas_base_hard_reset_handler+0x1ad/0x420 [mpt3sas]
 [<ffffffffc05132b9>] _ctl_ioctl_main.isra.12+0x11b9/0x1200 [mpt3sas]
 [<ffffffffc068d585>] ? xfs_file_aio_write+0x155/0x1b0 [xfs]
 [<ffffffff9ca1a4e3>] ? do_sync_write+0x93/0xe0
 [<ffffffffc051337a>] _ctl_ioctl+0x1a/0x20 [mpt3sas]
 [<ffffffff9ca2fe90>] do_vfs_ioctl+0x350/0x560
 [<ffffffff9ca1dec1>] ? __sb_end_write+0x31/0x60
 [<ffffffff9ca30141>] SyS_ioctl+0xa1/0xc0
 [<ffffffff9cf28715>] ? system_call_after_swapgs+0xa2/0x146
 [<ffffffff9cf287d5>] system_call_fastpath+0x1c/0x21
 [<ffffffff9cf28721>] ? system_call_after_swapgs+0xae/0x146
---[ end trace 5dac5b98d89aaa3c ]---
------------[ cut here ]------------
kernel BUG at block/blk-core.c:1476!
invalid opcode: 0000 [#1] SMP
Modules linked in: macsec tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag binfmt_misc fuse xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun devlink ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter sunrpc vfat fat sb_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt iTCO_vendor_support
 dcdbas pcspkr joydev ipmi_ssif ses enclosure sg ipmi_devintf acpi_pad ipmi_msghandler acpi_power_meter mei_me lpc_ich wmi mei shpchp ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic ata_generic pata_acpi uas usb_storage mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix mpt3sas libata crct10dif_pclmul crct10dif_common tg3 crc32c_intel i2c_core raid_class ptp scsi_transport_sas pps_core dm_mirror dm_region_hash dm_log dm_mod
CPU: 12 PID: 4336 Comm: python Kdump: loaded Tainted: G        W      ------------   3.10.0-875.el7.brdc.x86_64 #1
Hardware name: Dell Inc. PowerEdge R820/0YWR73, BIOS 1.5.0 03/08/2013
task: ffff903fc96e0fd0 ti: ffff903fb1eec000 task.ti: ffff903fb1eec000
RIP: 0010:[<ffffffff9cb19ec0>]  [<ffffffff9cb19ec0>] blk_requeue_request+0x90/0xa0
RSP: 0018:ffff903c6b783dc0  EFLAGS: 00010087
RAX: ffff903bb67026d0 RBX: ffff903b7d6a6140 RCX: dead000000000200
RDX: ffff903bb67026d0 RSI: ffff903bb6702580 RDI: ffff903bb67026d0
RBP: ffff903c6b783dd8 R08: ffff903bb67026d0 R09: ffffd97e80000000
R10: ffff903c658bac00 R11: 0000000000000000 R12: ffff903bb6702580
R13: ffff903fa9a292f0 R14: 0000000000000246 R15: 0000000000001057
FS:  00007f7026f5b740(0000) GS:ffff903c6b780000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f298877c004 CR3: 00000000caf36000 CR4: 00000000000607e0
Call Trace:
 <IRQ>
 [<ffffffff9cca68ff>] __scsi_queue_insert+0xbf/0x110
 [<ffffffff9cca79ca>] scsi_io_completion+0x5da/0x6a0
 [<ffffffff9cc9ca3c>] scsi_finish_command+0xdc/0x140
 [<ffffffff9cca6aa2>] scsi_softirq_done+0x132/0x160
 [<ffffffff9cb240c6>] blk_done_softirq+0x96/0xc0
 [<ffffffff9c89a905>] __do_softirq+0xf5/0x280
 [<ffffffff9cf2bd2c>] call_softirq+0x1c/0x30
 [<ffffffff9c82d625>] do_softirq+0x65/0xa0
 [<ffffffff9c89ac85>] irq_exit+0x105/0x110
 [<ffffffff9cf2d0a8>] smp_apic_timer_interrupt+0x48/0x60
 [<ffffffff9cf297f2>] apic_timer_interrupt+0x162/0x170
 <EOI>
 [<ffffffff9cca5f41>] ? scsi_done+0x21/0x60
 [<ffffffff9cb5ac18>] ? delay_tsc+0x38/0x60
 [<ffffffff9cb5ab5d>] __const_udelay+0x2d/0x30
 [<ffffffffc04effde>] _base_handshake_req_reply_wait+0x8e/0x4a0 [mpt3sas]
 [<ffffffffc04f0b13>] _base_get_ioc_facts+0x123/0x590 [mpt3sas]
 [<ffffffffc04f06c8>] ? _base_diag_reset+0x238/0x340 [mpt3sas]
 [<ffffffffc04f7993>] mpt3sas_base_hard_reset_handler+0x1f3/0x420 [mpt3sas]
 [<ffffffffc05132b9>] _ctl_ioctl_main.isra.12+0x11b9/0x1200 [mpt3sas]
 [<ffffffffc068d585>] ? xfs_file_aio_write+0x155/0x1b0 [xfs]
 [<ffffffff9ca1a4e3>] ? do_sync_write+0x93/0xe0
 [<ffffffffc051337a>] _ctl_ioctl+0x1a/0x20 [mpt3sas]
 [<ffffffff9ca2fe90>] do_vfs_ioctl+0x350/0x560
 [<ffffffff9ca1dec1>] ? __sb_end_write+0x31/0x60
 [<ffffffff9ca30141>] SyS_ioctl+0xa1/0xc0
 [<ffffffff9cf28715>] ? system_call_after_swapgs+0xa2/0x146
 [<ffffffff9cf287d5>] system_call_fastpath+0x1c/0x21
 [<ffffffff9cf28721>] ? system_call_after_swapgs+0xae/0x146
Code: 83 c3 10 4c 89 e2 4c 89 ee e8 8d 21 04 00 48 8b 03 48 85 c0 75 e5 41 f6 44 24 4a 10 74 ad 4c 89 e6 4c 89 ef e8 b2 42 00 00 eb a0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90
RIP  [<ffffffff9cb19ec0>] blk_requeue_request+0x90/0xa0
 RSP <ffff903c6b783dc0>

As a part of host reset operation, driver will flushout all IOs
outstanding at driver level with "DID_RESET" result.
To find which are all commands outstanding at the driver level,
driver loops with smid starting from one to HBA queue depth and
calls mpt3sas_scsih_scsi_lookup_get() to get scmd as shown below

 for (smid = 1; smid <= ioc->scsiio_depth; smid++) {
                scmd = mpt3sas_scsih_scsi_lookup_get(ioc, smid);
                if (!scmd)
                        continue;

But in mpt3sas_scsih_scsi_lookup_get() function, driver returns some
scsi cmnds which are not outstanding at the driver level (possibly request
is constructed at block layer since QUEUE_FLAG_QUIESCED is not set. Even if
driver uses scsi_block_requests and scsi_unblock_requests, issue still
persists as they will be just blocking further IO from scsi layer and
not from block layer) and these commands are flushed with
DID_RESET host bytes thus resulting into above kernel BUG.

To fix this issue, we have modified the mpt3sas_scsih_scsi_lookup_get()
to check for smid equals to zero (note: whenever any scsi cmnd is
outstanding at the driver level then smid for that scsi cmnd won't
be zero, always it starts from one) before it returns the scmd pointer to
the caller. If smid is zero then this function returns scmd pointer as
NULL and driver won't flushout those scsi cmnds at driver level with
DID_RESET host byte thus this issue will not be observed.

Signed-off-by: Chaitra P B <chaitra.basappa@broadcom.com>
Signed-off-by: Sreekanth Reddy <Sreekanth.Reddy@broadcom.com> 
---
 drivers/scsi/mpt3sas/mpt3sas_scsih.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Bart Van Assche June 12, 2018, 3:24 p.m. UTC | #1
On Tue, 2018-06-12 at 09:17 -0400, Chaitra P B wrote:
> diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c

> index 23902ad..96e523a 100644

> --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c

> +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c

> @@ -1489,7 +1489,7 @@ struct scsi_cmnd *

>  		scmd = scsi_host_find_tag(ioc->shost, unique_tag);

>  		if (scmd) {

>  			st = scsi_cmd_priv(scmd);

> -			if (st->cb_idx == 0xFF)

> +			if (st->cb_idx == 0xFF || st->smid == 0)

>  				scmd = NULL;

>  		}

>  	}


What guarantees that st->smid won't change after it has been checked and
before scmd is used?

Thanks,

Bart.
Chaitra P B June 13, 2018, 10:16 a.m. UTC | #2
Bart,
 When host reset is issued from application, through ioctl reset handler
_ctl_do_reset() -> mpt3sas_base_hard_reset_handler() sets
“ioc->shost_recovery” flag.
If “ioc->shost_recovery” flag is set then driver will return all the
incoming SCSI cmds with “SCSI_MLQUEUE_HOST_BUSY” in the scsih_qcmd(). And
hence no new request gets processed by the driver until the reset completes,
which guarantees that the smid won't change.

Thanks,
 Chaitra

-----Original Message-----
From: Bart Van Assche [mailto:Bart.VanAssche@wdc.com]
Sent: Tuesday, June 12, 2018 8:54 PM
To: chaitra.basappa@broadcom.com; linux-scsi@vger.kernel.org
Cc: sathya.prakash@broadcom.com; suganath-prabu.subramani@broadcom.com;
Sreekanth.Reddy@broadcom.com
Subject: Re: [PATCH] mpt3sas: Fix calltrace observed while running IO & host
reset

On Tue, 2018-06-12 at 09:17 -0400, Chaitra P B wrote:
> diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> index 23902ad..96e523a 100644
> --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> @@ -1489,7 +1489,7 @@ struct scsi_cmnd *
>  		scmd = scsi_host_find_tag(ioc->shost, unique_tag);
>  		if (scmd) {
>  			st = scsi_cmd_priv(scmd);
> -			if (st->cb_idx == 0xFF)
> +			if (st->cb_idx == 0xFF || st->smid == 0)
>  				scmd = NULL;
>  		}
>  	}

What guarantees that st->smid won't change after it has been checked and
before scmd is used?

Thanks,

Bart.
Bart Van Assche June 13, 2018, 3:52 p.m. UTC | #3
On Wed, 2018-06-13 at 15:46 +0530, Chaitra Basappa wrote:
>  When host reset is issued from application, through ioctl reset handler

> _ctl_do_reset() -> mpt3sas_base_hard_reset_handler() sets

> “ioc->shost_recovery” flag.

> If “ioc->shost_recovery” flag is set then driver will return all the

> incoming SCSI cmds with “SCSI_MLQUEUE_HOST_BUSY” in the scsih_qcmd(). And

> hence no new request gets processed by the driver until the reset completes,

> which guarantees that the smid won't change.


Hello Chaitra,

The patch at the start of this e-mail thread checks whether st->smid is zero.
That check could only be useful if there would be code in the mpt3sas driver
that clears that field upon command completion. However, I haven't found any
such code in the mpt3sas driver.

Another concern is that setting ioc->shost_recovery prevents new calls of
scsih_qcmd() to submit any commands. But I don't think that setting that flag
prevents any scsih_qcmd() calls that had already been started to submit a new
command.

In other words, I don't think that checking whether or not st->smid == 0 is
sufficient to fix the reported race.

Bart.
Chaitra P B June 14, 2018, 10:26 a.m. UTC | #4
Bart,
 Please see my replies inline.

Thanks,
 Chaitra

-----Original Message-----
From: Bart Van Assche [mailto:Bart.VanAssche@wdc.com]
Sent: Wednesday, June 13, 2018 9:22 PM
To: chaitra.basappa@broadcom.com; linux-scsi@vger.kernel.org
Cc: sathya.prakash@broadcom.com; suganath-prabu.subramani@broadcom.com;
sreekanth.reddy@broadcom.com
Subject: Re: [PATCH] mpt3sas: Fix calltrace observed while running IO & host
reset

On Wed, 2018-06-13 at 15:46 +0530, Chaitra Basappa wrote:
>  When host reset is issued from application, through ioctl reset handler
> _ctl_do_reset() -> mpt3sas_base_hard_reset_handler() sets
> “ioc->shost_recovery” flag.
> If “ioc->shost_recovery” flag is set then driver will return all the
> incoming SCSI cmds with “SCSI_MLQUEUE_HOST_BUSY” in the scsih_qcmd(). And
> hence no new request gets processed by the driver until the reset
> completes,
> which guarantees that the smid won't change.

Hello Chaitra,

The patch at the start of this e-mail thread checks whether st->smid is
zero.
That check could only be useful if there would be code in the mpt3sas driver
that clears that field upon command completion. However, I haven't found any
such code in the mpt3sas driver.

[Chaitra]
Before starting the host reset operation, driver will set
"ioc->shost_recovery" flag to one, so during host reset time if driver
receives any IO commands then below check in scsih_qcmd() returns these scsi
commands with host busy status and hence these commands are not issued to
the HBA FW. So these scsi commands will not be outstanding at the driver
level, hence smid for these scsi commands will be zero and no need to flush
out these commands during host reset time.

        /* host recovery or link resets sent via IOCTLs */
        if (ioc->shost_recovery || ioc->ioc_link_reset_in_progress)
                return SCSI_MLQUEUE_HOST_BUSY;

As a part of host reset operation, driver will flush out all the scsi
commands which are outstanding at the driver level with "DID_RESET" result.

To determine whether scsi cmnds are outstanding at the driver level while
looping from 'tag' value zero to hba queue depth, driver will check for
below two fields from the scsiio_tracker

1. cb_idx == 0xFF : this means that scsi cmnd has completed from the driver,
so this command is not outstanding at the driver level. So this check itself
is enough to determine that scsi cmnd is completed            from the
driver and no need reset smid to zero.
But any way it is better to reset the smid field also to zero along with
cb_idx setting to 0xff. And hence we will re-post this patch with setting of
smid field in scsiio_tracker to zero upon completion of the scsi cmnd by the
driver.

2. smid == 0 (zero): this means that scsi cmnd has not issued to the HBA
firmware, so this command is not outstanding at the driver level. (current
driver was not checking this case and hence we are observing this issue. In
this patch we have added this check to fix this issue)

If cd_idx != 0xff && smid != 0 , this means that scsi cmnd is outstanding at
the driver level and Driver will flush this scsi cmnd with "DID_RESET"
during diag reset time.



Another concern is that setting ioc->shost_recovery prevents new calls of
scsih_qcmd() to submit any commands. But I don't think that setting that
flag
prevents any scsih_qcmd() calls that had already been started to submit a
new
command.

[Chaitra]
If scsi cmnd has already crossed the check for "ioc->shost_recovery" flag
(it means that scmd has been issued just before starting of host reset
operation) then such commands will be processed by driver , which assigns
valid 'smid' whose value b/w 1 and <= ioc->scsiio_depth (i.e. scsi cmnd's
tag value + 1)  thus these commands will be outstanding at driver level and
hence will be flushed out with "DID_RESET" during reset operation.

In other words, I don't think that checking whether or not st->smid == 0 is
sufficient to fix the reported race.

Bart.
Bart Van Assche June 14, 2018, 3:47 p.m. UTC | #5
On Thu, 2018-06-14 at 15:56 +0530, Chaitra Basappa wrote:
> Bart,

>  Please see my replies inline.


Hello Chaitra,

Please don't use inline replies. That makes it very hard to follow the
conversation. BTW, in the headers of your e-mail I found the following:
"X-Mailer: Microsoft Outlook 14.0". Please use another e-mail client that is
better suited for collaboration on open source mailing lists. If outgoing
e-mails have to pass through an Exchange server, the following e-mail clients
support Exchange servers and are better suited for open source collaboration:
* Evolution (Linux only).
* Thunderbird + ExQuilla plugin.
* If IMAP has been enabled on the Exchange server that you are using then
  that means that you can choose from the many open source e-mail clients
  that support IMAP.

Thanks,

Bart.
diff mbox

Patch

diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index 23902ad..96e523a 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -1489,7 +1489,7 @@  struct scsi_cmnd *
 		scmd = scsi_host_find_tag(ioc->shost, unique_tag);
 		if (scmd) {
 			st = scsi_cmd_priv(scmd);
-			if (st->cb_idx == 0xFF)
+			if (st->cb_idx == 0xFF || st->smid == 0)
 				scmd = NULL;
 		}
 	}