From patchwork Mon Jan 30 13:52:03 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kashyap Desai
X-Patchwork-Id: 9545295
From: Kashyap Desai
References: <2d656e9c9fbde7206e40a635c61a6084@mail.gmail.com>
 <298b6ff6-9feb-4b70-ec4c-d1295a0e1f41@kernel.dk>
 7a9b012d8c7c456e9ec87d1ba5866a9d@mail.gmail.com
In-Reply-To: 7a9b012d8c7c456e9ec87d1ba5866a9d@mail.gmail.com
Date: Mon, 30 Jan 2017 19:22:03 +0530
Subject: RE: Device or HBA level QD throttling creates randomness in sequetial workload
To: Jens Axboe, Omar Sandoval
Cc: linux-scsi@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-block@vger.kernel.org, Christoph Hellwig, paolo.valente@linaro.org
X-Mailing-List: linux-block@vger.kernel.org

Hi Jens/Omar,

I used the blk-mq-sched branch of git.kernel.dk/linux-block
(commit 0efe27068ecf37ece2728a99b863763286049ab5) and can confirm that the
issue reported in this thread is resolved.

Both MQ and SQ modes now produce a sequential IO pattern, even while IO is
being re-queued in the block layer.

To get similar performance without the blk-mq-sched feature, is it a good
idea to pause IO for a few usec in the LLD?
I mean, I want to avoid the driver asking the SML/block layer to re-queue the
IO (if it is sequential IO on rotational media).

Explaining w.r.t. the megaraid_sas driver: this driver exposes can_queue, but
it internally consumes extra commands for RAID 1 fast path. In the worst case,
can_queue/2 IOs will consume all firmware resources, and the driver will
re-queue further IOs to the SML as below -

	if (atomic_inc_return(&instance->fw_outstanding) >
			instance->host->can_queue) {
		atomic_dec(&instance->fw_outstanding);
		return SCSI_MLQUEUE_HOST_BUSY;
	}

I want to avoid the above SCSI_MLQUEUE_HOST_BUSY.

I need your suggestions on the changes below -

@@ -2584,11 +2593,15 @@ void megasas_prepare_secondRaid1_IO(struct megasas_instance *instance,
 		return SCSI_MLQUEUE_DEVICE_BUSY;
 	}
 
-	if (atomic_inc_return(&instance->fw_outstanding) >
-			instance->host->can_queue) {
-		atomic_dec(&instance->fw_outstanding);
-		return SCSI_MLQUEUE_HOST_BUSY;
-	}
+	if (atomic_inc_return(&instance->fw_outstanding) > safe_can_queue) {
+		is_nonrot = blk_queue_nonrot(scmd->device->request_queue);
+		/* For rotational device wait for sometime to get fusion command from pool.
+		 * This is just to reduce proactive re-queue at mid layer which is not
+		 * sending sorted IO in SCSI.MQ mode.
+		 */
+		if (!is_nonrot)
+			udelay(100);
+	}
 
 	cmd = megasas_get_cmd_fusion(instance, scmd->request->tag);

` Kashyap

> -----Original Message-----
> From: Kashyap Desai [mailto:kashyap.desai@broadcom.com]
> Sent: Tuesday, November 01, 2016 11:11 AM
> To: 'Jens Axboe'; 'Omar Sandoval'
> Cc: 'linux-scsi@vger.kernel.org'; 'linux-kernel@vger.kernel.org';
> 'linux-block@vger.kernel.org'; 'Christoph Hellwig'; 'paolo.valente@linaro.org'
> Subject: RE: Device or HBA level QD throttling creates randomness in
> sequetial workload
>
> Jens- Replied inline.
>
>
> Omar - I tested your WIP repo and figured out that the system hangs only if I
> pass "scsi_mod.use_blk_mq=Y".
> Without this, your WIP branch works fine, but I am looking for
> scsi_mod.use_blk_mq=Y.
>
> Also below is snippet of blktrace. In case of higher per device QD, I see
> Requeue request in blktrace.
>
> 65,128 10     6268     2.432404509 18594  P   N [fio]
> 65,128 10     6269     2.432405013 18594  U   N [fio] 1
> 65,128 10     6270     2.432405143 18594  I  WS 148800 + 8 [fio]
> 65,128 10     6271     2.432405740 18594  R  WS 148800 + 8 [0]
> 65,128 10     6272     2.432409794 18594  Q  WS 148808 + 8 [fio]
> 65,128 10     6273     2.432410234 18594  G  WS 148808 + 8 [fio]
> 65,128 10     6274     2.432410424 18594  S  WS 148808 + 8 [fio]
> 65,128 23     3626     2.432432595 16232  D  WS 148800 + 8 [kworker/23:1H]
> 65,128 22     3279     2.432973482     0  C  WS 147432 + 8 [0]
> 65,128  7     6126     2.433032637 18594  P   N [fio]
> 65,128  7     6127     2.433033204 18594  U   N [fio] 1
> 65,128  7     6128     2.433033346 18594  I  WS 148808 + 8 [fio]
> 65,128  7     6129     2.433033871 18594  D  WS 148808 + 8 [fio]
> 65,128  7     6130     2.433034559 18594  R  WS 148808 + 8 [0]
> 65,128  7     6131     2.433039796 18594  Q  WS 148816 + 8 [fio]
> 65,128  7     6132     2.433040206 18594  G  WS 148816 + 8 [fio]
> 65,128  7     6133     2.433040351 18594  S  WS 148816 + 8 [fio]
> 65,128  9     6392     2.433133729     0  C  WS 147240 + 8 [0]
> 65,128  9     6393     2.433138166   905  D  WS 148808 + 8 [kworker/9:1H]
> 65,128  7     6134     2.433167450 18594  P   N [fio]
> 65,128  7     6135     2.433167911 18594  U   N [fio] 1
> 65,128  7     6136     2.433168074 18594  I  WS 148816 + 8 [fio]
> 65,128  7     6137     2.433168492 18594  D  WS 148816 + 8 [fio]
> 65,128  7     6138     2.433174016 18594  Q  WS 148824 + 8 [fio]
> 65,128  7     6139     2.433174282 18594  G  WS 148824 + 8 [fio]
> 65,128  7     6140     2.433174613 18594  S  WS 148824 + 8 [fio]
> CPU0 (sdy):
>  Reads Queued:           0,        0KiB   Writes Queued:          79,      316KiB
>  Read Dispatches:        0,        0KiB   Write Dispatches:       67, 18,446,744,073PiB
>  Reads Requeued:         0                Writes Requeued:        86
>  Reads Completed:        0,        0KiB   Writes Completed:       98,      392KiB
>  Read Merges:            0,        0KiB   Write Merges:            0,        0KiB
>  Read depth:             0                Write depth:             5
>  IO unplugs:            79                Timer unplugs:           0
>
>
>
> ` Kashyap
>
> > -----Original Message-----
> > From: Jens Axboe [mailto:axboe@kernel.dk]
> > Sent: Monday, October 31, 2016 10:54 PM
> > To: Kashyap Desai; Omar Sandoval
> > Cc: linux-scsi@vger.kernel.org; linux-kernel@vger.kernel.org;
> > linux-block@vger.kernel.org; Christoph Hellwig; paolo.valente@linaro.org
> > Subject: Re: Device or HBA level QD throttling creates randomness in
> > sequetial workload
> >
> > Hi,
> >
> > One guess would be that this isn't around a requeue condition, but
> > rather the fact that we don't really guarantee any sort of hard FIFO
> > behavior between the software queues. Can you try this test patch to
> > see if it changes the behavior for you? Warning: untested...
>
> Jens - I tested the patch, but I still see random IO pattern for expected
> Sequential Run. I am intentionally running case of Re-queue and seeing
> issue at the time of Re-queue.
> If there is no Requeue, I see no issue at LLD.
>
>
> >
> > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > index f3d27a6dee09..5404ca9c71b2 100644
> > --- a/block/blk-mq.c
> > +++ b/block/blk-mq.c
> > @@ -772,6 +772,14 @@ static inline unsigned int queued_to_index(unsigned int queued)
> >  	return min(BLK_MQ_MAX_DISPATCH_ORDER - 1, ilog2(queued) + 1);
> >  }
> >
> > +static int rq_pos_cmp(void *priv, struct list_head *a, struct list_head *b)
> > +{
> > +	struct request *rqa = container_of(a, struct request, queuelist);
> > +	struct request *rqb = container_of(b, struct request, queuelist);
> > +
> > +	return blk_rq_pos(rqa) < blk_rq_pos(rqb);
> > +}
> > +
> >  /*
> >   * Run this hardware queue, pulling any software queues mapped to it in.
> >   * Note that this function currently has various problems around ordering
> > @@ -812,6 +820,14 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
> >  	}
> >
> >  	/*
> > +	 * If the device is rotational, sort the list sanely to avoid
> > +	 * unecessary seeks. The software queues are roughly FIFO, but
> > +	 * only roughly, there are no hard guarantees.
> > +	 */
> > +	if (!blk_queue_nonrot(q))
> > +		list_sort(NULL, &rq_list, rq_pos_cmp);
> > +
> > +	/*
> >  	 * Start off with dptr being NULL, so we start the first request
> >  	 * immediately, even if we have more pending.
> >  	 */
> >
> > --
> > Jens Axboe
---

diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c b/drivers/scsi/megaraid/megaraid_sas_fusion.c
index 9a9c84f..a683eb0 100644
--- a/drivers/scsi/megaraid/megaraid_sas_fusion.c
+++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c
@@ -54,6 +54,7 @@
 #include
 #include
 #include
+#include
 #include "megaraid_sas_fusion.h"
 #include "megaraid_sas.h"
 
@@ -2572,7 +2573,15 @@ void megasas_prepare_secondRaid1_IO(struct megasas_instance *instance,
 	struct megasas_cmd_fusion *cmd, *r1_cmd = NULL;
 	union MEGASAS_REQUEST_DESCRIPTOR_UNION *req_desc;
 	u32 index;
-	struct fusion_context *fusion;
+	bool is_nonrot;
+	u32 safe_can_queue;
+	u32 num_cpus;
+	struct fusion_context *fusion;
+
+	fusion = instance->ctrl_context;
+
+	num_cpus = num_online_cpus();
+	safe_can_queue = instance->cur_can_queue - num_cpus;
 
 	fusion = instance->ctrl_context;