diff mbox

[REPOST] scsi: ses: don't ask for diagnostic pages repeatedly during probe

Message ID 20171113234804.4675-1-dongyang.li@anu.edu.au (mailing list archive)
State Accepted
Headers show

Commit Message

Li Dongyang Nov. 13, 2017, 11:48 p.m. UTC
We are testing if there is a match with the ses device in a loop
by calling ses_match_to_enclosure(), which will issue scsi receive
diagnostics commands to the ses device for every device on the
same host.
On one of our boxes with 840 disks, it takes a long time to load
the driver:

[root@g1b-oss06 ~]# time modprobe ses

real	40m48.247s
user	0m0.001s
sys	0m0.196s

With the patch:

[root@g1b-oss06 ~]# time modprobe ses

real	0m17.915s
user	0m0.008s
sys	0m0.053s

Note that we still need to refresh page 10 when we see a new disk
to create the link.

Signed-off-by: Li Dongyang <dongyang.li@anu.edu.au>
---
 drivers/scsi/ses.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

Comments

Jason Ozolins Nov. 21, 2017, 5:45 p.m. UTC | #1
Supporting info for the use case this patch addresses:
On 14/11/2017 10:48, Li Dongyang wrote:
> We are testing if there is a match with the ses device in a loop
> by calling ses_match_to_enclosure(), which will issue scsi receive
> diagnostics commands to the ses device for every device on the
> same host.
> On one of our boxes with 840 disks, it takes a long time to load
> the driver:
> 
> [root@g1b-oss06 ~]# time modprobe ses
> 
> real	40m48.247s
> user	0m0.001s
> sys	0m0.196s

The use case the patch addresses is Linux HA storage servers natively
handling block storage at scales that until now have been handled
by proprietary modular storage arrays.

Config tested has 840 targets across 26 SES devices in chained JBODS:
host#  #targets #enclosures
1       40       2
2      400      12
3      400      12

Without the patch, time taken in ses_init goes up as
#disks * #SES enclosures; so ~41 minutes is spent within
class_interface_register(), holding the scsi_device class mutex.

Stack from kworker during modprobe ses:
[<ffffffff812f440b>] blk_execute_rq+0xab/0x150
[<ffffffff81457653>] scsi_execute+0xd3/0x170
[<ffffffff81458ffe>] scsi_execute_req_flags+0xee/0x100
[<ffffffffa028e19c>] ses_recv_diag+0x7c/0xd0 [ses]
[<ffffffffa028e9fc>] ses_enclosure_data_process+0x7c/0x3e0 [ses]
[<ffffffffa028edac>] ses_match_to_enclosure+0x4c/0xb0 [ses]
[<ffffffffa028f279>] ses_intf_add+0x469/0x4d0 [ses]
[<ffffffff8142dbe9>] class_interface_register+0xb9/0x110
[<ffffffff8145fcb6>] scsi_register_interface+0x16/0x20
[<ffffffffa0037013>] ses_init+0x13/0x1000 [ses]
[<ffffffff810020e8>] do_one_initcall+0xb8/0x230
[<ffffffff81100288>] load_module+0x22c8/0x2930
[<ffffffff81100aa6>] SyS_finit_module+0xa6/0xd0
[<ffffffff816964c9>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

During this time, if e.g. failed disks are replaced, the kworker
handling that event in the hpsa driver will block trying to get
the scsi_device class mutex, and hung task messages will be printed.

Task dump excerpt after disk insertion:
[  961.323141] kworker/u56:2   D ffff88407ee98680     0   676      2 0x00000000
[  961.323429] Workqueue: rescan_3_hpsa hpsa_rescan_ctlr_worker [hpsa]
[  961.323727]  ffff881ff456f890 0000000000000046 ffff881ffa454f10 ffff881ff456ffd8
[  961.324034]  ffff881ff456ffd8 ffff881ff456ffd8 ffff881ffa454f10 ffff88407ee98678
[  961.324332]  ffff88407ee9867c ffff881ffa454f10 00000000ffffffff ffff88407ee98680
[  961.324646] Call Trace:
[  961.324951]  [<ffffffff816aa409>] schedule_preempt_disabled+0x29/0x70
[  961.325252]  [<ffffffff816a8337>] __mutex_lock_slowpath+0xc7/0x1d0
[  961.325605]  [<ffffffff816a774f>] mutex_lock+0x1f/0x2f
[  961.325939]  [<ffffffff8143c7f5>] device_add+0x535/0x7c0
[  961.326266]  [<ffffffff81473d27>] scsi_sysfs_add_sdev+0xb7/0x280
[  961.326633]  [<ffffffff81470d05>] scsi_probe_and_add_lun+0xc35/0xe30
[  961.326971]  [<ffffffff8144acfc>] ? __pm_runtime_resume+0x5c/0x80
[  961.327327]  [<ffffffff8147189d>] __scsi_scan_target+0xad/0x260
[  961.327662]  [<ffffffff81471b68>] scsi_scan_target+0x118/0x130
[  961.327981]  [<ffffffffc00a9df6>] sas_rphy_add+0x106/0x170 [scsi_transport_sas]
[  961.328334]  [<ffffffffc00f2c3c>] adjust_hpsa_scsi_table+0xefc/0x10b0 [hpsa]
[  961.328705]  [<ffffffffc00f4120>] ? hpsa_update_scsi_devices+0x1330/0x18f0 [hpsa]
[  961.329055]  [<ffffffffc00f37d7>] hpsa_update_scsi_devices+0x9e7/0x18f0 [hpsa]
[  961.329442]  [<ffffffff812fa413>] ? blk_peek_request+0x83/0x280
[  961.329841]  [<ffffffffc00f4890>] hpsa_scan_start+0x1b0/0x1e0 [hpsa]
[  961.330216]  [<ffffffffc00f4fd1>] hpsa_rescan_ctlr_worker+0x181/0x657 [hpsa]
[  961.330655]  [<ffffffff810a881a>] process_one_work+0x17a/0x440
[  961.331089]  [<ffffffff810a94e6>] worker_thread+0x126/0x3c0
[  961.331519]  [<ffffffff810a93c0>] ? manage_workers.isra.24+0x2a0/0x2a0
[  961.331951]  [<ffffffff810b098f>] kthread+0xcf/0xe0
[  961.332379]  [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40
[  961.332830]  [<ffffffff816b4f58>] ret_from_fork+0x58/0x90
[  961.333249]  [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40

The delay processing disk changes causes problems in maintenance and
hardware failure situations where the machine has recently rebooted.

The HPSA HBA firmware presents disks before enclosures when it detects
a change to the SAS fabric; the patch speeds up initialization in
this case:

- When the module loads at boot, in the ses_intf_add() for each
disk device, there are not yet any enclosures registered for that host;
so ses_match_to_enclosure() in the enclosure_find() loop body is not
called.

- When ses_intf_add() is called for an enclosure device, it directly
calls ses_enclosure_data_process(), which populates the enclosure_device
with results from one page 10/page 7 query.  The subsequent
call to ses_match_to_enclosure() within the shost_for_each_device()
loop fixes up all the disks for that enclosure, with refresh=0
preventing redundant ses_enclosure_data_process() calls on all
registered enclosure_devices for each device processed within the loop.

- Once enclosure_devices have been registered for a host, ses_intf_add()
calls for disk devices in that host will refresh registered
enclosure_devices through the ses_match_to_enclosure() call within the
enclosure_find() loop. For disks within enclosures not yet processed
by ses_intf_add(), they get handled by the cases already given above.

The detection ordering of disks then enclosures at boot holds for HPSA
driver HBAs, but may not hold in general.  This case is also produced
by the HPSA driver if a cable to a chain of enclosures is replaced,
causing a whole chain of disks and enclosures to be re-detected, while 
other disks and enclosures remain on the host.

We confirmed that enclosure device sysfs entries are correctly created
for that case, and correctly re-created if the module is removed and
reprobed, just with extra time taken during ses_init() for the
disks that follow already created enclosure devices. For any ordering
of devices on a host where disks precede enclosure devices, this patch
will reduce time taken in ses_init(). 

So the patch helps for this use case, and does not hurt in general.

Tested-by: Jason Ozolins <jason.ozolins@hpe.com>

> With the patch:
> 
> [root@g1b-oss06 ~]# time modprobe ses
> 
> real	0m17.915s
> user	0m0.008s
> sys	0m0.053s
> 
> Note that we still need to refresh page 10 when we see a new disk
> to create the link.
> 
> Signed-off-by: Li Dongyang <dongyang.li@anu.edu.au>
> ---
>  drivers/scsi/ses.c | 11 +++++++----
>  1 file changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/scsi/ses.c b/drivers/scsi/ses.c
> index 11826c5c2dd4..62f04c0511cf 100644
> --- a/drivers/scsi/ses.c
> +++ b/drivers/scsi/ses.c
> @@ -615,13 +615,16 @@ static void ses_enclosure_data_process(struct enclosure_device *edev,
>  }
>  
>  static void ses_match_to_enclosure(struct enclosure_device *edev,
> -				   struct scsi_device *sdev)
> +				   struct scsi_device *sdev,
> +				   int refresh)
>  {
> +	struct scsi_device *edev_sdev = to_scsi_device(edev->edev.parent);
>  	struct efd efd = {
>  		.addr = 0,
>  	};
>  
> -	ses_enclosure_data_process(edev, to_scsi_device(edev->edev.parent), 0);
> +	if (refresh)
> +		ses_enclosure_data_process(edev, edev_sdev, 0);
>  
>  	if (scsi_is_sas_rphy(sdev->sdev_target->dev.parent))
>  		efd.addr = sas_get_address(sdev);
> @@ -652,7 +655,7 @@ static int ses_intf_add(struct device *cdev,
>  		struct enclosure_device *prev = NULL;
>  
>  		while ((edev = enclosure_find(&sdev->host->shost_gendev, prev)) != NULL) {
> -			ses_match_to_enclosure(edev, sdev);
> +			ses_match_to_enclosure(edev, sdev, 1);
>  			prev = edev;
>  		}
>  		return -ENODEV;
> @@ -768,7 +771,7 @@ static int ses_intf_add(struct device *cdev,
>  	shost_for_each_device(tmp_sdev, sdev->host) {
>  		if (tmp_sdev->lun != 0 || scsi_device_enclosure(tmp_sdev))
>  			continue;
> -		ses_match_to_enclosure(edev, tmp_sdev);
> +		ses_match_to_enclosure(edev, tmp_sdev, 0);
>  	}
>  
>  	return 0;
>
diff mbox

Patch

diff --git a/drivers/scsi/ses.c b/drivers/scsi/ses.c
index 11826c5c2dd4..62f04c0511cf 100644
--- a/drivers/scsi/ses.c
+++ b/drivers/scsi/ses.c
@@ -615,13 +615,16 @@  static void ses_enclosure_data_process(struct enclosure_device *edev,
 }
 
 static void ses_match_to_enclosure(struct enclosure_device *edev,
-				   struct scsi_device *sdev)
+				   struct scsi_device *sdev,
+				   int refresh)
 {
+	struct scsi_device *edev_sdev = to_scsi_device(edev->edev.parent);
 	struct efd efd = {
 		.addr = 0,
 	};
 
-	ses_enclosure_data_process(edev, to_scsi_device(edev->edev.parent), 0);
+	if (refresh)
+		ses_enclosure_data_process(edev, edev_sdev, 0);
 
 	if (scsi_is_sas_rphy(sdev->sdev_target->dev.parent))
 		efd.addr = sas_get_address(sdev);
@@ -652,7 +655,7 @@  static int ses_intf_add(struct device *cdev,
 		struct enclosure_device *prev = NULL;
 
 		while ((edev = enclosure_find(&sdev->host->shost_gendev, prev)) != NULL) {
-			ses_match_to_enclosure(edev, sdev);
+			ses_match_to_enclosure(edev, sdev, 1);
 			prev = edev;
 		}
 		return -ENODEV;
@@ -768,7 +771,7 @@  static int ses_intf_add(struct device *cdev,
 	shost_for_each_device(tmp_sdev, sdev->host) {
 		if (tmp_sdev->lun != 0 || scsi_device_enclosure(tmp_sdev))
 			continue;
-		ses_match_to_enclosure(edev, tmp_sdev);
+		ses_match_to_enclosure(edev, tmp_sdev, 0);
 	}
 
 	return 0;