From patchwork Tue Jul 26 05:19:37 2016
X-Patchwork-Submitter: Bob Liu <bob.liu@oracle.com>
X-Patchwork-Id: 9247625
From: Bob Liu <bob.liu@oracle.com>
To: linux-kernel@vger.kernel.org
Date: Tue, 26 Jul 2016 13:19:37 +0800
Message-Id: <1469510377-15131-3-git-send-email-bob.liu@oracle.com>
In-Reply-To: <1469510377-15131-1-git-send-email-bob.liu@oracle.com>
References: <1469510377-15131-1-git-send-email-bob.liu@oracle.com>
Cc: xen-devel@lists.xenproject.org, Bob Liu <bob.liu@oracle.com>, roger.pau@citrix.com
Subject: [Xen-devel] [PATCH v2 3/3] xen-blkfront: dynamic configuration of per-vbd resources

The current VBD layer reserves buffer space for each attached device
based on three statically configured settings which are read at boot
time.
 * max_indirect_segs: Maximum number of indirect segments.
 * max_ring_page_order: Maximum order of pages to be used for the
   shared ring.
 * max_queues: Maximum number of queues (rings) to be used.

But the appropriate values depend on the storage backend, the
workload, and the guest memory, which leads to very different tuning
requirements. Since application characteristics can't be predicted
centrally, it's best to allow these settings to be adjusted
dynamically from inside the guest, based on the workload.

Usage:
Show current values:
 cat /sys/devices/vbd-xxx/max_indirect_segs
 cat /sys/devices/vbd-xxx/max_ring_page_order
 cat /sys/devices/vbd-xxx/max_queues

Write new values:
 echo <new value> > /sys/devices/vbd-xxx/max_indirect_segs
 echo <new value> > /sys/devices/vbd-xxx/max_ring_page_order
 echo <new value> > /sys/devices/vbd-xxx/max_queues
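For example (a sketch: vbd-51712 is a hypothetical node name, the
blkfront device for xvda; the actual name depends on the virtual
device id):

 cat /sys/devices/vbd-51712/max_queues          # e.g. prints 1
 echo 4 > /sys/devices/vbd-51712/max_queues     # triggers a reconnect
 cat /sys/devices/vbd-51712/max_queues          # 4, once reconnected

A write to a busy device (requests still in flight) fails with EBUSY
and can simply be retried; a value above the backend's advertised
limit, or above the number of online CPUs for max_queues, fails with
EINVAL.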
Signed-off-by: Bob Liu <bob.liu@oracle.com>
---
v2: Rename to max_ring_page_order and remove the waiting code, as
suggested by Roger.
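
(Not part of the patch, just a usage note: the store handlers below
reject values above the backend's advertised limits, so a guest-side
script may want to check those limits first. A sketch, assuming the
standard blkfront xenstore layout and the hypothetical device id
51712:

 backend=$(xenstore-read device/vbd/51712/backend)
 xenstore-read "$backend/feature-max-indirect-segments"
 xenstore-read "$backend/max-ring-page-order"
 xenstore-read "$backend/multi-queue-max-queues"
)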
---
 drivers/block/xen-blkfront.c | 275 +++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 269 insertions(+), 6 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 1b4c380..ff5ebe5 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -212,6 +212,11 @@ struct blkfront_info
 	/* Save uncomplete reqs and bios for migration. */
 	struct list_head requests;
 	struct bio_list bio_list;
+	/* For dynamic configuration. */
+	unsigned int reconfiguring:1;
+	int new_max_indirect_segments;
+	int max_ring_page_order;
+	int max_queues;
 };
 
 static unsigned int nr_minors;
@@ -1350,6 +1355,31 @@ static void blkif_free(struct blkfront_info *info, int suspend)
 	for (i = 0; i < info->nr_rings; i++)
 		blkif_free_ring(&info->rinfo[i]);
 
+	/* Remove old xenstore nodes. */
+	if (info->nr_ring_pages > 1)
+		xenbus_rm(XBT_NIL, info->xbdev->nodename, "ring-page-order");
+
+	if (info->nr_rings == 1) {
+		if (info->nr_ring_pages == 1) {
+			xenbus_rm(XBT_NIL, info->xbdev->nodename, "ring-ref");
+		} else {
+			for (i = 0; i < info->nr_ring_pages; i++) {
+				char ring_ref_name[RINGREF_NAME_LEN];
+
+				snprintf(ring_ref_name, RINGREF_NAME_LEN,
+					 "ring-ref%u", i);
+				xenbus_rm(XBT_NIL, info->xbdev->nodename,
+					  ring_ref_name);
+			}
+		}
+	} else {
+		xenbus_rm(XBT_NIL, info->xbdev->nodename,
+			  "multi-queue-num-queues");
+
+		for (i = 0; i < info->nr_rings; i++) {
+			char queuename[QUEUE_NAME_LEN];
+
+			snprintf(queuename, QUEUE_NAME_LEN, "queue-%u", i);
+			xenbus_rm(XBT_NIL, info->xbdev->nodename, queuename);
+		}
+	}
 	kfree(info->rinfo);
 	info->rinfo = NULL;
 	info->nr_rings = 0;
@@ -1763,15 +1793,21 @@ static int talk_to_blkback(struct xenbus_device *dev,
 	const char *message = NULL;
 	struct xenbus_transaction xbt;
 	int err;
-	unsigned int i, max_page_order = 0;
+	unsigned int i, backend_max_order = 0;
 	unsigned int ring_page_order = 0;
 
 	err = xenbus_scanf(XBT_NIL, info->xbdev->otherend,
-			   "max-ring-page-order", "%u", &max_page_order);
+			   "max-ring-page-order", "%u", &backend_max_order);
 	if (err != 1)
 		info->nr_ring_pages = 1;
 	else {
-		ring_page_order = min(xen_blkif_max_ring_order, max_page_order);
+		if (info->max_ring_page_order) {
+			/* Dynamically configured through /sys. */
+			BUG_ON(info->max_ring_page_order > backend_max_order);
+			ring_page_order = info->max_ring_page_order;
+		} else
+			/* Default. */
+			ring_page_order = min(xen_blkif_max_ring_order,
+					      backend_max_order);
 		info->nr_ring_pages = 1 << ring_page_order;
 	}
 
@@ -1894,7 +1930,14 @@ static int negotiate_mq(struct blkfront_info *info)
 	if (err < 0)
 		backend_max_queues = 1;
 
-	info->nr_rings = min(backend_max_queues, xen_blkif_max_queues);
+	if (info->max_queues) {
+		/* Dynamically configured through /sys. */
+		BUG_ON(info->max_queues > backend_max_queues);
+		info->nr_rings = info->max_queues;
+	} else
+		/* Default. */
+		info->nr_rings = min(backend_max_queues, xen_blkif_max_queues);
+
 	/* We need at least one ring. */
 	if (!info->nr_rings)
 		info->nr_rings = 1;
@@ -2352,11 +2395,197 @@ static void blkfront_gather_backend_features(struct blkfront_info *info)
 			    NULL);
 	if (err)
 		info->max_indirect_segments = 0;
-	else
+	else {
 		info->max_indirect_segments = min(indirect_segments,
 						  xen_blkif_max_segments);
+		if (info->new_max_indirect_segments) {
+			BUG_ON(info->new_max_indirect_segments > indirect_segments);
+			info->max_indirect_segments = info->new_max_indirect_segments;
+		}
+	}
+}
+
+static ssize_t max_ring_page_order_show(struct device *dev,
+					struct device_attribute *attr, char *page)
+{
+	struct blkfront_info *info = dev_get_drvdata(dev);
+
+	return sprintf(page, "%u\n", get_order(info->nr_ring_pages * XEN_PAGE_SIZE));
 }
+
+static ssize_t max_indirect_segs_show(struct device *dev,
+				      struct device_attribute *attr, char *page)
+{
+	struct blkfront_info *info = dev_get_drvdata(dev);
+
+	return sprintf(page, "%u\n", info->max_indirect_segments);
+}
+
+static ssize_t max_queues_show(struct device *dev,
+			       struct device_attribute *attr, char *page)
+{
+	struct blkfront_info *info = dev_get_drvdata(dev);
+
+	return sprintf(page, "%u\n", info->nr_rings);
+}
+
+static ssize_t dynamic_reconfig_device(struct blkfront_info *info, ssize_t count)
+{
+	/*
+	 * Prevent new requests even to the software request queue.
+	 */
+	blk_mq_freeze_queue(info->rq);
+
+	/*
+	 * Guarantee there are no uncompleted requests.
+	 */
+	if (part_in_flight(&info->gd->part0) || info->reconfiguring) {
+		blk_mq_unfreeze_queue(info->rq);
+		pr_err("Dev:%s busy, please retry later.\n",
+		       dev_name(&info->xbdev->dev));
+		return -EBUSY;
+	}
+
+	/*
+	 * Frontend:				Backend:
+	 * Freeze_queue()
+	 * Switch to XenbusStateClosed
+	 *					frontend_changed(StateClosed)
+	 *						> xen_blkif_disconnect()
+	 *						> Switch to XenbusStateClosed
+	 * blkback_changed(StateClosed)
+	 *	> blkfront_resume()
+	 *	> Switch to StateInitialised
+	 *					frontend_changed(StateInitialised):
+	 *						> reconnect
+	 *						> Switch to StateConnected
+	 * blkback_changed(StateConnected)
+	 *	> blkif_recover()
+	 *	> Also switch to StateConnected
+	 *	> Unfreeze_queue()
+	 */
+	info->reconfiguring = true;
+	xenbus_switch_state(info->xbdev, XenbusStateClosed);
+
+	return count;
+}
+
+static ssize_t max_indirect_segs_store(struct device *dev,
+				       struct device_attribute *attr,
+				       const char *buf, size_t count)
+{
+	ssize_t ret;
+	unsigned int max_segs = 0, backend_max_segs = 0;
+	struct blkfront_info *info = dev_get_drvdata(dev);
+	int err;
+
+	ret = kstrtouint(buf, 10, &max_segs);
+	if (ret < 0)
+		return ret;
+
+	if (max_segs == info->max_indirect_segments)
+		return count;
+
+	err = xenbus_gather(XBT_NIL, info->xbdev->otherend,
+			    "feature-max-indirect-segments", "%u",
+			    &backend_max_segs, NULL);
+	if (err) {
+		pr_err("Backend %s doesn't support feature-max-indirect-segments.\n",
+		       info->xbdev->otherend);
+		return -EOPNOTSUPP;
+	}
+
+	if (max_segs > backend_max_segs) {
+		pr_err("Invalid max indirect segments (%u), backend max: %u.\n",
+		       max_segs, backend_max_segs);
+		return -EINVAL;
+	}
+
+	info->new_max_indirect_segments = max_segs;
+
+	return dynamic_reconfig_device(info, count);
+}
+
+static ssize_t max_ring_page_order_store(struct device *dev,
+					 struct device_attribute *attr,
+					 const char *buf, size_t count)
+{
+	ssize_t ret;
+	unsigned int max_order = 0, backend_max_order = 0;
+	struct blkfront_info *info = dev_get_drvdata(dev);
+	int err;
+
+	ret = kstrtouint(buf, 10, &max_order);
+	if (ret < 0)
+		return ret;
+
+	if ((1 << max_order) == info->nr_ring_pages)
+		return count;
+
+	if (max_order > XENBUS_MAX_RING_GRANT_ORDER) {
+		pr_err("Invalid max_ring_page_order (%u), max: %u.\n",
+		       max_order, XENBUS_MAX_RING_GRANT_ORDER);
+		return -EINVAL;
+	}
+
+	err = xenbus_scanf(XBT_NIL, info->xbdev->otherend,
+			   "max-ring-page-order", "%u", &backend_max_order);
+	if (err != 1) {
+		pr_err("Backend %s doesn't support the multi-page-ring feature.\n",
+		       info->xbdev->otherend);
+		return -EOPNOTSUPP;
+	}
+	if (max_order > backend_max_order) {
+		pr_err("Invalid max_ring_page_order (%u), backend supports max: %u.\n",
+		       max_order, backend_max_order);
+		return -EINVAL;
+	}
+	info->max_ring_page_order = max_order;
+
+	return dynamic_reconfig_device(info, count);
+}
+
+static ssize_t max_queues_store(struct device *dev,
+				struct device_attribute *attr,
+				const char *buf, size_t count)
+{
+	ssize_t ret;
+	unsigned int max_queues = 0, backend_max_queues = 0;
+	struct blkfront_info *info = dev_get_drvdata(dev);
+	int err;
+
+	ret = kstrtouint(buf, 10, &max_queues);
+	if (ret < 0)
+		return ret;
+
+	if (max_queues == info->nr_rings)
+		return count;
+
+	if (max_queues > num_online_cpus()) {
+		pr_err("Invalid max_queues (%u), can't be bigger than the number of online CPUs: %u.\n",
+		       max_queues, num_online_cpus());
+		return -EINVAL;
+	}
+
+	err = xenbus_scanf(XBT_NIL, info->xbdev->otherend,
+			   "multi-queue-max-queues", "%u", &backend_max_queues);
+	if (err != 1) {
+		pr_err("Backend %s doesn't support block multi-queue.\n",
+		       info->xbdev->otherend);
+		return -EOPNOTSUPP;
+	}
+	if (max_queues > backend_max_queues) {
+		pr_err("Invalid max_queues (%u), backend supports max: %u.\n",
+		       max_queues, backend_max_queues);
+		return -EINVAL;
+	}
+	info->max_queues = max_queues;
+
+	return dynamic_reconfig_device(info, count);
+}
+
+static DEVICE_ATTR_RW(max_queues);
+static DEVICE_ATTR_RW(max_ring_page_order);
+static DEVICE_ATTR_RW(max_indirect_segs);
+
 /*
  * Invoked when the backend is finally 'ready' (and has told produced
  * the details about the physical device - #sectors, size, etc).
@@ -2393,6 +2622,10 @@ static void blkfront_connect(struct blkfront_info *info)
 		 * supports indirect descriptors, and how many.
 		 */
 		blkif_recover(info);
+		if (info->reconfiguring) {
+			blk_mq_unfreeze_queue(info->rq);
+			info->reconfiguring = false;
+		}
 		return;
 
 	default:
@@ -2443,6 +2676,22 @@ static void blkfront_connect(struct blkfront_info *info)
 		return;
 	}
 
+	err = device_create_file(&info->xbdev->dev, &dev_attr_max_ring_page_order);
+	if (err)
+		goto fail;
+
+	err = device_create_file(&info->xbdev->dev, &dev_attr_max_indirect_segs);
+	if (err) {
+		device_remove_file(&info->xbdev->dev, &dev_attr_max_ring_page_order);
+		goto fail;
+	}
+
+	err = device_create_file(&info->xbdev->dev, &dev_attr_max_queues);
+	if (err) {
+		device_remove_file(&info->xbdev->dev, &dev_attr_max_ring_page_order);
+		device_remove_file(&info->xbdev->dev, &dev_attr_max_indirect_segs);
+		goto fail;
+	}
 	xenbus_switch_state(info->xbdev, XenbusStateConnected);
 
 	/* Kick pending requests. */
@@ -2453,6 +2702,12 @@ static void blkfront_connect(struct blkfront_info *info)
 	add_disk(info->gd);
 
 	info->is_ready = 1;
+	return;
+
+fail:
+	blkif_free(info, 0);
+	xlvbd_release_gendisk(info);
+	return;
 }
 
 /**
@@ -2500,8 +2755,16 @@ static void blkback_changed(struct xenbus_device *dev,
 		break;
 
 	case XenbusStateClosed:
-		if (dev->state == XenbusStateClosed)
+		if (dev->state == XenbusStateClosed) {
+			if (info->reconfiguring)
+				if (blkfront_resume(info->xbdev)) {
+					/* Resume failed. */
+					info->reconfiguring = false;
+					xenbus_switch_state(info->xbdev,
+							    XenbusStateClosed);
+					pr_err("Resume from dynamic configuration failed\n");
+				}
 			break;
+		}
 		/* Missed the backend's Closing state -- fallthrough */
 	case XenbusStateClosing:
 		if (info)
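
(Again not part of the patch: an illustrative sketch of how a
guest-side tool could drive this interface from C. Since the store
handlers return -EBUSY while requests are in flight, the natural
pattern is to retry the write. write_attr is a hypothetical helper and
vbd-51712 a hypothetical node name; everything else is standard POSIX.)

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write a sysfs attribute, retrying a few times on EBUSY. */
static int write_attr(const char *path, const char *val)
{
	for (int tries = 0; tries < 5; tries++) {
		int fd = open(path, O_WRONLY);
		if (fd < 0)
			return -errno;

		ssize_t n = write(fd, val, strlen(val));
		int err = errno;

		close(fd);
		if (n >= 0)
			return 0;
		if (err != EBUSY)	/* EINVAL/EOPNOTSUPP: give up */
			return -err;
		sleep(1);		/* device busy: let in-flight I/O drain */
	}
	return -EBUSY;
}

int main(void)
{
	int r = write_attr("/sys/devices/vbd-51712/max_queues", "4");

	if (r)
		fprintf(stderr, "reconfig failed: %s\n", strerror(-r));
	return r ? 1 : 0;
}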