From patchwork Tue Jan 29 15:47:29 2019
X-Patchwork-Submitter: Ilya Dryomov
X-Patchwork-Id: 10786521
From: Ilya Dryomov
To: ceph-devel@vger.kernel.org
Subject: [PATCH 3/3] rbd: round off and ignore discards that are too small
Date: Tue, 29 Jan 2019 16:47:29 +0100
Message-Id: <20190129154729.1031-4-idryomov@gmail.com>
X-Mailer: git-send-email 2.14.4
In-Reply-To: <20190129154729.1031-1-idryomov@gmail.com>
References: <20190129154729.1031-1-idryomov@gmail.com>
X-Mailing-List: ceph-devel@vger.kernel.org

If, after rounding off, the discard request is smaller than alloc_size,
drop it on the floor in __rbd_img_fill_request().
Default alloc_size to 64k.  This should cover both HDD and SSD based
bluestore OSDs and somewhat improve things for filestore.  For OSDs on
filestore with filestore_punch_hole = false, alloc_size is best set to
object size in order to allow deletes and truncates and disallow zero
op.

Signed-off-by: Ilya Dryomov
---
 drivers/block/rbd.c | 56 +++++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 50 insertions(+), 6 deletions(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 87f30bf49f1f..1f5517355af3 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -733,6 +733,7 @@ static struct rbd_client *rbd_client_find(struct ceph_options *ceph_opts)
  */
 enum {
 	Opt_queue_depth,
+	Opt_alloc_size,
 	Opt_lock_timeout,
 	Opt_last_int,
 	/* int args above */
@@ -749,6 +750,7 @@ enum {
 
 static match_table_t rbd_opts_tokens = {
 	{Opt_queue_depth, "queue_depth=%d"},
+	{Opt_alloc_size, "alloc_size=%d"},
 	{Opt_lock_timeout, "lock_timeout=%d"},
 	/* int args above */
 	{Opt_pool_ns, "_pool_ns=%s"},
@@ -765,6 +767,7 @@ static match_table_t rbd_opts_tokens = {
 
 struct rbd_options {
 	int	queue_depth;
+	int	alloc_size;
 	unsigned long	lock_timeout;
 	bool	read_only;
 	bool	lock_on_read;
@@ -773,6 +776,7 @@ struct rbd_options {
 };
 
 #define RBD_QUEUE_DEPTH_DEFAULT	BLKDEV_MAX_RQ
+#define RBD_ALLOC_SIZE_DEFAULT	(64 * 1024)
 #define RBD_LOCK_TIMEOUT_DEFAULT	0  /* no timeout */
 #define RBD_READ_ONLY_DEFAULT	false
 #define RBD_LOCK_ON_READ_DEFAULT	false
@@ -812,6 +816,17 @@ static int parse_rbd_opts_token(char *c, void *private)
 		}
 		pctx->opts->queue_depth = intval;
 		break;
+	case Opt_alloc_size:
+		if (intval < 1) {
+			pr_err("alloc_size out of range\n");
+			return -EINVAL;
+		}
+		if (!is_power_of_2(intval)) {
+			pr_err("alloc_size must be a power of 2\n");
+			return -EINVAL;
+		}
+		pctx->opts->alloc_size = intval;
+		break;
 	case Opt_lock_timeout:
 		/* 0 is "wait forever" (i.e. infinite timeout) */
 		if (intval < 0 || intval > INT_MAX / 1000) {
@@ -1853,6 +1868,26 @@ static u16 truncate_or_zero_opcode(struct rbd_obj_request *obj_req)
 
 static int rbd_obj_setup_discard(struct rbd_obj_request *obj_req)
 {
+	struct rbd_device *rbd_dev = obj_req->img_request->rbd_dev;
+	u64 off = obj_req->ex.oe_off;
+	u64 next_off = obj_req->ex.oe_off + obj_req->ex.oe_len;
+
+	/*
+	 * Align the range to alloc_size boundary and punt on discards
+	 * that are too small to free up any space.
+	 *
+	 * alloc_size == object_size && is_tail() is a special case for
+	 * filestore with filestore_punch_hole = false, needed to allow
+	 * truncate (in addition to delete).
+	 */
+	if (rbd_dev->opts->alloc_size != rbd_dev->layout.object_size ||
+	    !rbd_obj_is_tail(obj_req)) {
+		off = round_up(off, rbd_dev->opts->alloc_size);
+		next_off = round_down(next_off, rbd_dev->opts->alloc_size);
+		if (off >= next_off)
+			return 1;
+	}
+
 	obj_req->osd_req = rbd_osd_req_create(obj_req, 1);
 	if (!obj_req->osd_req)
 		return -ENOMEM;
@@ -1860,10 +1895,12 @@ static int rbd_obj_setup_discard(struct rbd_obj_request *obj_req)
 	if (rbd_obj_is_entire(obj_req)) {
 		osd_req_op_init(obj_req->osd_req, 0, CEPH_OSD_OP_DELETE, 0);
 	} else {
+		dout("%s %p %llu~%llu -> %llu~%llu\n", __func__,
+		     obj_req, obj_req->ex.oe_off, obj_req->ex.oe_len,
+		     off, next_off - off);
 		osd_req_op_extent_init(obj_req->osd_req, 0,
 				       truncate_or_zero_opcode(obj_req),
-				       obj_req->ex.oe_off, obj_req->ex.oe_len,
-				       0, 0);
+				       off, next_off - off, 0, 0);
 	}
 
 	obj_req->write_state = RBD_OBJ_WRITE_FLAT;
@@ -1946,10 +1983,10 @@ static int rbd_obj_setup_zeroout(struct rbd_obj_request *obj_req)
  */
 static int __rbd_img_fill_request(struct rbd_img_request *img_req)
 {
-	struct rbd_obj_request *obj_req;
+	struct rbd_obj_request *obj_req, *next_obj_req;
 	int ret;
 
-	for_each_obj_request(img_req, obj_req) {
+	for_each_obj_request_safe(img_req, obj_req, next_obj_req) {
 		switch (img_req->op_type) {
 		case OBJ_OP_READ:
 			ret = rbd_obj_setup_read(obj_req);
@@ -1966,8 +2003,14 @@ static int __rbd_img_fill_request(struct rbd_img_request *img_req)
 		default:
 			rbd_assert(0);
 		}
-		if (ret)
+		if (ret < 0)
 			return ret;
+		if (ret > 0) {
+			img_req->xferred += obj_req->ex.oe_len;
+			img_req->pending_count--;
+			rbd_img_obj_request_del(img_req, obj_req);
+			continue;
+		}
 
 		ret = ceph_osdc_alloc_messages(obj_req->osd_req, GFP_NOIO);
 		if (ret)
@@ -3757,7 +3800,7 @@ static void rbd_queue_workfn(struct work_struct *work)
 	else
 		result = rbd_img_fill_from_bio(img_request, offset, length,
 					       rq->bio);
-	if (result)
+	if (result || !img_request->pending_count)
 		goto err_img_request;
 
 	rbd_img_request_submit(img_request);
@@ -5418,6 +5461,7 @@ static int rbd_add_parse_args(const char *buf,
 
 	pctx.opts->read_only = RBD_READ_ONLY_DEFAULT;
 	pctx.opts->queue_depth = RBD_QUEUE_DEPTH_DEFAULT;
+	pctx.opts->alloc_size = RBD_ALLOC_SIZE_DEFAULT;
 	pctx.opts->lock_timeout = RBD_LOCK_TIMEOUT_DEFAULT;
 	pctx.opts->lock_on_read = RBD_LOCK_ON_READ_DEFAULT;
 	pctx.opts->exclusive = RBD_EXCLUSIVE_DEFAULT;