From patchwork Tue Jun 25 14:40:52 2019
X-Patchwork-Submitter: Ilya Dryomov
X-Patchwork-Id: 11015755
From: Ilya Dryomov
To: ceph-devel@vger.kernel.org
Cc: Dongsheng Yang
Subject: [PATCH 01/20] rbd: get rid of obj_req->xferred, obj_req->result and img_req->xferred
Date: Tue, 25 Jun 2019 16:40:52 +0200
Message-Id: <20190625144111.11270-2-idryomov@gmail.com>
In-Reply-To: <20190625144111.11270-1-idryomov@gmail.com>
References: <20190625144111.11270-1-idryomov@gmail.com>

obj_req->xferred and img_req->xferred don't bring any value. The former
is used for short reads and has to be set to obj_req->ex.oe_len after
that and elsewhere. The latter is just an aggregate.

Use result for short reads (>=0 - number of bytes read, <0 - error) and
pass it around explicitly. No need to store it in obj_req.

Signed-off-by: Ilya Dryomov
Reviewed-by: Dongsheng Yang
---
 drivers/block/rbd.c | 149 +++++++++++++++++---------------------------
 1 file changed, 58 insertions(+), 91 deletions(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index e5009a34f9c2..a9b0b23148f9 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -276,9 +276,6 @@ struct rbd_obj_request {
 
 	struct ceph_osd_request *osd_req;
 
-	u64			xferred;	/* bytes transferred */
-	int			result;
-
 	struct kref		kref;
 };
 
@@ -301,7 +298,6 @@ struct rbd_img_request {
 		struct rbd_obj_request *obj_request;	/* obj req initiator */
 	};
 	spinlock_t		completion_lock;
-	u64			xferred;/* aggregate bytes transferred */
 	int			result;	/* first nonzero obj_request result */
 
 	struct list_head	object_extents;	/* obj_req.ex structs */
@@ -584,6 +580,8 @@ static int _rbd_dev_v2_snap_size(struct rbd_device *rbd_dev, u64 snap_id,
 static int _rbd_dev_v2_snap_features(struct rbd_device *rbd_dev, u64 snap_id,
 				     u64 *snap_features);
 
+static void rbd_obj_handle_request(struct rbd_obj_request *obj_req, int result);
+
 static int rbd_open(struct block_device *bdev, fmode_t mode)
 {
 	struct rbd_device *rbd_dev = bdev->bd_disk->private_data;
@@ -1317,6 +1315,8 @@ static void zero_bvecs(struct ceph_bvec_iter *bvec_pos, u32 off, u32 bytes)
 static void rbd_obj_zero_range(struct rbd_obj_request *obj_req, u32 off,
 			       u32 bytes)
 {
+	dout("%s %p data buf %u~%u\n", __func__, obj_req, off, bytes);
+
 	switch (obj_req->img_request->data_type) {
 	case OBJ_REQUEST_BIO:
 		zero_bios(&obj_req->bio_pos, off, bytes);
@@ -1457,28 +1457,26 @@ static bool rbd_img_is_write(struct rbd_img_request *img_req)
 	}
 }
 
-static void rbd_obj_handle_request(struct rbd_obj_request *obj_req);
-
 static void rbd_osd_req_callback(struct ceph_osd_request *osd_req)
 {
 	struct rbd_obj_request *obj_req = osd_req->r_priv;
+	int result;
 
 	dout("%s osd_req %p result %d for obj_req %p\n", __func__, osd_req,
 	     osd_req->r_result, obj_req);
 	rbd_assert(osd_req == obj_req->osd_req);
 
-	obj_req->result = osd_req->r_result < 0 ? osd_req->r_result : 0;
-	if (!obj_req->result && !rbd_img_is_write(obj_req->img_request))
-		obj_req->xferred = osd_req->r_result;
+	/*
+	 * Writes aren't allowed to return a data payload.  In some
+	 * guarded write cases (e.g. stat + zero on an empty object)
+	 * a stat response makes it through, but we don't care.
+	 */
+	if (osd_req->r_result > 0 && rbd_img_is_write(obj_req->img_request))
+		result = 0;
 	else
-		/*
-		 * Writes aren't allowed to return a data payload.  In some
-		 * guarded write cases (e.g. stat + zero on an empty object)
-		 * a stat response makes it through, but we don't care.
-		 */
-		obj_req->xferred = 0;
+		result = osd_req->r_result;
 
-	rbd_obj_handle_request(obj_req);
+	rbd_obj_handle_request(obj_req, result);
 }
 
 static void rbd_osd_req_format_read(struct rbd_obj_request *obj_request)
@@ -2041,7 +2039,6 @@ static int __rbd_img_fill_request(struct rbd_img_request *img_req)
 		if (ret < 0)
 			return ret;
 		if (ret > 0) {
-			img_req->xferred += obj_req->ex.oe_len;
 			img_req->pending_count--;
 			rbd_img_obj_request_del(img_req, obj_req);
 			continue;
@@ -2400,17 +2397,17 @@ static int rbd_obj_read_from_parent(struct rbd_obj_request *obj_req)
 	return 0;
 }
 
-static bool rbd_obj_handle_read(struct rbd_obj_request *obj_req)
+static bool rbd_obj_handle_read(struct rbd_obj_request *obj_req, int *result)
 {
 	struct rbd_device *rbd_dev = obj_req->img_request->rbd_dev;
 	int ret;
 
-	if (obj_req->result == -ENOENT &&
+	if (*result == -ENOENT &&
 	    rbd_dev->parent_overlap && !obj_req->tried_parent) {
 		/* reverse map this object extent onto the parent */
 		ret = rbd_obj_calc_img_extents(obj_req, false);
 		if (ret) {
-			obj_req->result = ret;
+			*result = ret;
 			return true;
 		}
 
@@ -2418,7 +2415,7 @@ static bool rbd_obj_handle_read(struct rbd_obj_request *obj_req)
 			obj_req->tried_parent = true;
 			ret = rbd_obj_read_from_parent(obj_req);
 			if (ret) {
-				obj_req->result = ret;
+				*result = ret;
 				return true;
 			}
 			return false;
@@ -2428,16 +2425,18 @@ static bool rbd_obj_handle_read(struct rbd_obj_request *obj_req)
 	/*
 	 * -ENOENT means a hole in the image -- zero-fill the entire
 	 * length of the request.  A short read also implies zero-fill
-	 * to the end of the request.  In both cases we update xferred
-	 * count to indicate the whole request was satisfied.
+	 * to the end of the request.
 	 */
-	if (obj_req->result == -ENOENT ||
-	    (!obj_req->result && obj_req->xferred < obj_req->ex.oe_len)) {
-		rbd_assert(!obj_req->xferred || !obj_req->result);
-		rbd_obj_zero_range(obj_req, obj_req->xferred,
-				   obj_req->ex.oe_len - obj_req->xferred);
-		obj_req->result = 0;
-		obj_req->xferred = obj_req->ex.oe_len;
+	if (*result == -ENOENT) {
+		rbd_obj_zero_range(obj_req, 0, obj_req->ex.oe_len);
+		*result = 0;
+	} else if (*result >= 0) {
+		if (*result < obj_req->ex.oe_len)
+			rbd_obj_zero_range(obj_req, *result,
+					   obj_req->ex.oe_len - *result);
+		else
+			rbd_assert(*result == obj_req->ex.oe_len);
+		*result = 0;
 	}
 
 	return true;
@@ -2635,14 +2634,13 @@ static int rbd_obj_handle_write_guard(struct rbd_obj_request *obj_req)
 	return rbd_obj_read_from_parent(obj_req);
 }
 
-static bool rbd_obj_handle_write(struct rbd_obj_request *obj_req)
+static bool rbd_obj_handle_write(struct rbd_obj_request *obj_req, int *result)
 {
 	int ret;
 
 	switch (obj_req->write_state) {
 	case RBD_OBJ_WRITE_GUARD:
-		rbd_assert(!obj_req->xferred);
-		if (obj_req->result == -ENOENT) {
+		if (*result == -ENOENT) {
 			/*
 			 * The target object doesn't exist.  Read the data for
 			 * the entire target object up to the overlap point (if
@@ -2650,7 +2648,7 @@ static bool rbd_obj_handle_write(struct rbd_obj_request *obj_req)
 			 */
 			ret = rbd_obj_handle_write_guard(obj_req);
 			if (ret) {
-				obj_req->result = ret;
+				*result = ret;
 				return true;
 			}
 			return false;
@@ -2658,33 +2656,26 @@ static bool rbd_obj_handle_write(struct rbd_obj_request *obj_req)
 		/* fall through */
 	case RBD_OBJ_WRITE_FLAT:
 	case RBD_OBJ_WRITE_COPYUP_OPS:
-		if (!obj_req->result)
-			/*
-			 * There is no such thing as a successful short
-			 * write -- indicate the whole request was satisfied.
-			 */
-			obj_req->xferred = obj_req->ex.oe_len;
 		return true;
 	case RBD_OBJ_WRITE_READ_FROM_PARENT:
-		if (obj_req->result)
+		if (*result < 0)
 			return true;
 
-		rbd_assert(obj_req->xferred);
-		ret = rbd_obj_issue_copyup(obj_req, obj_req->xferred);
+		rbd_assert(*result);
+		ret = rbd_obj_issue_copyup(obj_req, *result);
 		if (ret) {
-			obj_req->result = ret;
-			obj_req->xferred = 0;
+			*result = ret;
 			return true;
 		}
 		return false;
 	case RBD_OBJ_WRITE_COPYUP_EMPTY_SNAPC:
-		if (obj_req->result)
+		if (*result)
 			return true;
 
 		obj_req->write_state = RBD_OBJ_WRITE_COPYUP_OPS;
 		ret = rbd_obj_issue_copyup_ops(obj_req, MODS_ONLY);
 		if (ret) {
-			obj_req->result = ret;
+			*result = ret;
 			return true;
 		}
 		return false;
@@ -2696,24 +2687,23 @@ static bool rbd_obj_handle_write(struct rbd_obj_request *obj_req)
 /*
  * Returns true if @obj_req is completed, or false otherwise.
  */
-static bool __rbd_obj_handle_request(struct rbd_obj_request *obj_req)
+static bool __rbd_obj_handle_request(struct rbd_obj_request *obj_req,
+				     int *result)
 {
 	switch (obj_req->img_request->op_type) {
 	case OBJ_OP_READ:
-		return rbd_obj_handle_read(obj_req);
+		return rbd_obj_handle_read(obj_req, result);
 	case OBJ_OP_WRITE:
-		return rbd_obj_handle_write(obj_req);
+		return rbd_obj_handle_write(obj_req, result);
 	case OBJ_OP_DISCARD:
 	case OBJ_OP_ZEROOUT:
-		if (rbd_obj_handle_write(obj_req)) {
+		if (rbd_obj_handle_write(obj_req, result)) {
 			/*
 			 * Hide -ENOENT from delete/truncate/zero -- discarding
 			 * a non-existent object is not a problem.
 			 */
-			if (obj_req->result == -ENOENT) {
-				obj_req->result = 0;
-				obj_req->xferred = obj_req->ex.oe_len;
-			}
+			if (*result == -ENOENT)
+				*result = 0;
 			return true;
 		}
 		return false;
@@ -2722,66 +2712,41 @@ static bool __rbd_obj_handle_request(struct rbd_obj_request *obj_req)
 	}
 }
 
-static void rbd_obj_end_request(struct rbd_obj_request *obj_req)
+static void rbd_obj_end_request(struct rbd_obj_request *obj_req, int result)
 {
 	struct rbd_img_request *img_req = obj_req->img_request;
 
-	rbd_assert((!obj_req->result &&
-		    obj_req->xferred == obj_req->ex.oe_len) ||
-		   (obj_req->result < 0 && !obj_req->xferred));
-	if (!obj_req->result) {
-		img_req->xferred += obj_req->xferred;
+	rbd_assert(result <= 0);
+	if (!result)
 		return;
-	}
 
-	rbd_warn(img_req->rbd_dev,
-		 "%s at objno %llu %llu~%llu result %d xferred %llu",
+	rbd_warn(img_req->rbd_dev, "%s at objno %llu %llu~%llu result %d",
 		 obj_op_name(img_req->op_type), obj_req->ex.oe_objno,
-		 obj_req->ex.oe_off, obj_req->ex.oe_len, obj_req->result,
-		 obj_req->xferred);
-	if (!img_req->result) {
-		img_req->result = obj_req->result;
-		img_req->xferred = 0;
-	}
-}
-
-static void rbd_img_end_child_request(struct rbd_img_request *img_req)
-{
-	struct rbd_obj_request *obj_req = img_req->obj_request;
-
-	rbd_assert(test_bit(IMG_REQ_CHILD, &img_req->flags));
-	rbd_assert((!img_req->result &&
-		    img_req->xferred == rbd_obj_img_extents_bytes(obj_req)) ||
-		   (img_req->result < 0 && !img_req->xferred));
-
-	obj_req->result = img_req->result;
-	obj_req->xferred = img_req->xferred;
-	rbd_img_request_put(img_req);
+		 obj_req->ex.oe_off, obj_req->ex.oe_len, result);
+	if (!img_req->result)
+		img_req->result = result;
 }
 
 static void rbd_img_end_request(struct rbd_img_request *img_req)
 {
 	rbd_assert(!test_bit(IMG_REQ_CHILD, &img_req->flags));
-	rbd_assert((!img_req->result &&
-		    img_req->xferred == blk_rq_bytes(img_req->rq)) ||
-		   (img_req->result < 0 && !img_req->xferred));
 
 	blk_mq_end_request(img_req->rq,
 			   errno_to_blk_status(img_req->result));
 	rbd_img_request_put(img_req);
 }
 
-static void rbd_obj_handle_request(struct rbd_obj_request *obj_req)
+static void rbd_obj_handle_request(struct rbd_obj_request *obj_req, int result)
 {
 	struct rbd_img_request *img_req;
 
 again:
-	if (!__rbd_obj_handle_request(obj_req))
+	if (!__rbd_obj_handle_request(obj_req, &result))
 		return;
 
 	img_req = obj_req->img_request;
 	spin_lock(&img_req->completion_lock);
-	rbd_obj_end_request(obj_req);
+	rbd_obj_end_request(obj_req, result);
 	rbd_assert(img_req->pending_count);
 	if (--img_req->pending_count) {
 		spin_unlock(&img_req->completion_lock);
@@ -2789,9 +2754,11 @@ static void rbd_obj_handle_request(struct rbd_obj_request *obj_req)
 	}
 
 	spin_unlock(&img_req->completion_lock);
+	rbd_assert(img_req->result <= 0);
 	if (test_bit(IMG_REQ_CHILD, &img_req->flags)) {
 		obj_req = img_req->obj_request;
-		rbd_img_end_child_request(img_req);
+		result = img_req->result ?: rbd_obj_img_extents_bytes(obj_req);
+		rbd_img_request_put(img_req);
 		goto again;
 	}
 	rbd_img_end_request(img_req);
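
As an aside on the convention this patch settles on: the sketch below is a userspace model, not kernel code -- handle_read_result() and OBJ_LEN are invented stand-ins for rbd_obj_handle_read() and obj_req->ex.oe_len. It shows how a single int carries both outcomes: >= 0 is the number of bytes read (holes and short reads get zero-filled up to the request length and reported as success), < 0 is an error passed up unchanged.

    #include <assert.h>
    #include <errno.h>
    #include <stdio.h>
    #include <string.h>

    #define OBJ_LEN 8    /* stands in for obj_req->ex.oe_len */

    /* result >= 0 - number of bytes read, result < 0 - error.  A hole
     * (-ENOENT) or a short read is zero-filled to the end of the request
     * and reported as success, mirroring the patch. */
    static int handle_read_result(int result, unsigned char *buf)
    {
        if (result == -ENOENT) {
            memset(buf, 0, OBJ_LEN);
            return 0;
        }
        if (result >= 0) {
            if (result < OBJ_LEN)
                memset(buf + result, 0, OBJ_LEN - result);
            else
                assert(result == OBJ_LEN);
            return 0;
        }
        return result;    /* a real error, passed up unchanged */
    }

    int main(void)
    {
        unsigned char buf[OBJ_LEN] = { 0 };

        printf("short read -> %d\n", handle_read_result(3, buf));
        printf("hole       -> %d\n", handle_read_result(-ENOENT, buf));
        printf("I/O error  -> %d\n", handle_read_result(-EIO, buf));
        return 0;
    }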
From patchwork Tue Jun 25 14:40:53 2019
X-Patchwork-Submitter: Ilya Dryomov
X-Patchwork-Id: 11015757
From: Ilya Dryomov
To: ceph-devel@vger.kernel.org
Cc: Dongsheng Yang
Subject: [PATCH 02/20] rbd: replace obj_req->tried_parent with obj_req->read_state
Date: Tue, 25 Jun 2019 16:40:53 +0200
Message-Id: <20190625144111.11270-3-idryomov@gmail.com>
In-Reply-To: <20190625144111.11270-1-idryomov@gmail.com>
References: <20190625144111.11270-1-idryomov@gmail.com>

Make rbd_obj_handle_read() look like a state machine and get rid of
the necessity to patch result in rbd_obj_handle_request(), completing
the removal of obj_req->xferred and img_req->xferred.

Signed-off-by: Ilya Dryomov
---
 drivers/block/rbd.c | 82 +++++++++++++++++++++++++--------------------
 1 file changed, 46 insertions(+), 36 deletions(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index a9b0b23148f9..7925b2fdde79 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -219,6 +219,11 @@ enum obj_operation_type {
 	OBJ_OP_ZEROOUT,
 };
 
+enum rbd_obj_read_state {
+	RBD_OBJ_READ_OBJECT = 1,
+	RBD_OBJ_READ_PARENT,
+};
+
 /*
  * Writes go through the following state machine to deal with
  * layering:
@@ -255,7 +260,7 @@
 struct rbd_obj_request {
 	struct ceph_object_extent ex;
 	union {
-		bool			tried_parent;	/* for reads */
+		enum rbd_obj_read_state	 read_state;	/* for reads */
 		enum rbd_obj_write_state write_state;	/* for writes */
 	};
 
@@ -1794,6 +1799,7 @@ static int rbd_obj_setup_read(struct rbd_obj_request *obj_req)
 	rbd_osd_req_setup_data(obj_req, 0);
 	rbd_osd_req_format_read(obj_req);
 
+	obj_req->read_state = RBD_OBJ_READ_OBJECT;
 	return 0;
 }
 
@@ -2402,44 +2408,48 @@ static bool rbd_obj_handle_read(struct rbd_obj_request *obj_req, int *result)
 	struct rbd_device *rbd_dev = obj_req->img_request->rbd_dev;
 	int ret;
 
-	if (*result == -ENOENT &&
-	    rbd_dev->parent_overlap && !obj_req->tried_parent) {
-		/* reverse map this object extent onto the parent */
-		ret = rbd_obj_calc_img_extents(obj_req, false);
-		if (ret) {
-			*result = ret;
-			return true;
-		}
-
-		if (obj_req->num_img_extents) {
-			obj_req->tried_parent = true;
-			ret = rbd_obj_read_from_parent(obj_req);
+	switch (obj_req->read_state) {
+	case RBD_OBJ_READ_OBJECT:
+		if (*result == -ENOENT && rbd_dev->parent_overlap) {
+			/* reverse map this object extent onto the parent */
+			ret = rbd_obj_calc_img_extents(obj_req, false);
 			if (ret) {
 				*result = ret;
 				return true;
 			}
-			return false;
+			if (obj_req->num_img_extents) {
+				ret = rbd_obj_read_from_parent(obj_req);
+				if (ret) {
+					*result = ret;
+					return true;
+				}
+				obj_req->read_state = RBD_OBJ_READ_PARENT;
+				return false;
+			}
 		}
-	}
 
-	/*
-	 * -ENOENT means a hole in the image -- zero-fill the entire
-	 * length of the request.  A short read also implies zero-fill
-	 * to the end of the request.
-	 */
-	if (*result == -ENOENT) {
-		rbd_obj_zero_range(obj_req, 0, obj_req->ex.oe_len);
-		*result = 0;
-	} else if (*result >= 0) {
-		if (*result < obj_req->ex.oe_len)
-			rbd_obj_zero_range(obj_req, *result,
-					   obj_req->ex.oe_len - *result);
-		else
-			rbd_assert(*result == obj_req->ex.oe_len);
-		*result = 0;
+		/*
+		 * -ENOENT means a hole in the image -- zero-fill the entire
+		 * length of the request.  A short read also implies zero-fill
+		 * to the end of the request.
+		 */
+		if (*result == -ENOENT) {
+			rbd_obj_zero_range(obj_req, 0, obj_req->ex.oe_len);
+			*result = 0;
+		} else if (*result >= 0) {
+			if (*result < obj_req->ex.oe_len)
+				rbd_obj_zero_range(obj_req, *result,
+						   obj_req->ex.oe_len - *result);
+			else
+				rbd_assert(*result == obj_req->ex.oe_len);
+			*result = 0;
+		}
+		return true;
+	case RBD_OBJ_READ_PARENT:
+		return true;
+	default:
+		BUG();
 	}
-
-	return true;
 }
 
 /*
@@ -2658,11 +2668,11 @@ static bool rbd_obj_handle_write(struct rbd_obj_request *obj_req, int *result)
 	case RBD_OBJ_WRITE_COPYUP_OPS:
 		return true;
 	case RBD_OBJ_WRITE_READ_FROM_PARENT:
-		if (*result < 0)
+		if (*result)
 			return true;
 
-		rbd_assert(*result);
-		ret = rbd_obj_issue_copyup(obj_req, *result);
+		ret = rbd_obj_issue_copyup(obj_req,
+					   rbd_obj_img_extents_bytes(obj_req));
 		if (ret) {
 			*result = ret;
 			return true;
@@ -2757,7 +2767,7 @@ static void rbd_obj_handle_request(struct rbd_obj_request *obj_req, int result)
 	rbd_assert(img_req->result <= 0);
 	if (test_bit(IMG_REQ_CHILD, &img_req->flags)) {
 		obj_req = img_req->obj_request;
-		result = img_req->result ?: rbd_obj_img_extents_bytes(obj_req);
+		result = img_req->result;
 		rbd_img_request_put(img_req);
 		goto again;
 	}
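
The shape of the new rbd_obj_handle_read() can be modelled compactly. The following is a userspace sketch with invented names (advance_read(), has_parent_overlap); it is not the driver code, but it shows why an explicit state enum beats the tried_parent bool: each completion is dispatched on where the request currently is, and "already tried the parent" becomes a state instead of a patched-up result.

    #include <errno.h>
    #include <stdbool.h>

    /* Explicit read states replacing the tried_parent flag. */
    enum read_state {
        READ_OBJECT = 1,    /* OSD read of the object itself */
        READ_PARENT,        /* fallback read from the parent image */
    };

    struct obj_req {
        enum read_state state;
        bool has_parent_overlap;
    };

    /* Returns true when the request is complete; called once per OSD
     * completion with *result in the ">= 0 bytes / < 0 error" convention. */
    static bool advance_read(struct obj_req *req, int *result)
    {
        switch (req->state) {
        case READ_OBJECT:
            if (*result == -ENOENT && req->has_parent_overlap) {
                /* resubmit against the parent and wait again */
                req->state = READ_PARENT;
                return false;
            }
            if (*result >= 0)
                *result = 0;    /* short-read zero-fill handled elsewhere */
            return true;
        case READ_PARENT:
            return true;        /* the parent's result is final */
        }
        return true;
    }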
From patchwork Tue Jun 25 14:40:54 2019
X-Patchwork-Submitter: Ilya Dryomov
X-Patchwork-Id: 11015761
From: Ilya Dryomov
To: ceph-devel@vger.kernel.org
Cc: Dongsheng Yang
Subject: [PATCH 03/20] rbd: get rid of RBD_OBJ_WRITE_{FLAT,GUARD}
Date: Tue, 25 Jun 2019 16:40:54 +0200
Message-Id: <20190625144111.11270-4-idryomov@gmail.com>
In-Reply-To: <20190625144111.11270-1-idryomov@gmail.com>
References: <20190625144111.11270-1-idryomov@gmail.com>

In preparation for moving OSD request allocation and submission into
object request state machines, get rid of RBD_OBJ_WRITE_{FLAT,GUARD}.
We would need to start in a new state, whether the request is guarded
or not. Unify them into RBD_OBJ_WRITE_OBJECT and pass guard info
through obj_req->flags.

While at it, make our ENOENT handling a little more precise: only hide
ENOENT when it is actually expected, that is on delete.

Signed-off-by: Ilya Dryomov
Reviewed-by: Dongsheng Yang
---
 drivers/block/rbd.c | 112 ++++++++++++++++++++++++--------------------
 1 file changed, 60 insertions(+), 52 deletions(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 7925b2fdde79..488da877a2bb 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -219,6 +219,9 @@ enum obj_operation_type {
 	OBJ_OP_ZEROOUT,
 };
 
+#define RBD_OBJ_FLAG_DELETION			(1U << 0)
+#define RBD_OBJ_FLAG_COPYUP_ENABLED		(1U << 1)
+
 enum rbd_obj_read_state {
 	RBD_OBJ_READ_OBJECT = 1,
 	RBD_OBJ_READ_PARENT,
@@ -250,8 +253,7 @@ enum rbd_obj_read_state {
  * even if there is a parent).
 */
 enum rbd_obj_write_state {
-	RBD_OBJ_WRITE_FLAT = 1,
-	RBD_OBJ_WRITE_GUARD,
+	RBD_OBJ_WRITE_OBJECT = 1,
 	RBD_OBJ_WRITE_READ_FROM_PARENT,
 	RBD_OBJ_WRITE_COPYUP_EMPTY_SNAPC,
 	RBD_OBJ_WRITE_COPYUP_OPS,
@@ -259,6 +261,7 @@ enum rbd_obj_write_state {
 
 struct rbd_obj_request {
 	struct ceph_object_extent ex;
+	unsigned int		flags;	/* RBD_OBJ_FLAG_* */
 	union {
 		enum rbd_obj_read_state	 read_state;	/* for reads */
 		enum rbd_obj_write_state write_state;	/* for writes */
@@ -1858,7 +1861,6 @@ static void __rbd_obj_setup_write(struct rbd_obj_request *obj_req,
 static int rbd_obj_setup_write(struct rbd_obj_request *obj_req)
 {
 	unsigned int num_osd_ops, which = 0;
-	bool need_guard;
 	int ret;
 
 	/* reverse map the entire object onto the parent */
@@ -1866,23 +1868,24 @@ static int rbd_obj_setup_write(struct rbd_obj_request *obj_req)
 	if (ret)
 		return ret;
 
-	need_guard = rbd_obj_copyup_enabled(obj_req);
-	num_osd_ops = need_guard + count_write_ops(obj_req);
+	if (rbd_obj_copyup_enabled(obj_req))
+		obj_req->flags |= RBD_OBJ_FLAG_COPYUP_ENABLED;
+
+	num_osd_ops = count_write_ops(obj_req);
+	if (obj_req->flags & RBD_OBJ_FLAG_COPYUP_ENABLED)
+		num_osd_ops++; /* stat */
 
 	obj_req->osd_req = rbd_osd_req_create(obj_req, num_osd_ops);
 	if (!obj_req->osd_req)
 		return -ENOMEM;
 
-	if (need_guard) {
+	if (obj_req->flags & RBD_OBJ_FLAG_COPYUP_ENABLED) {
 		ret = __rbd_obj_setup_stat(obj_req, which++);
 		if (ret)
 			return ret;
-
-		obj_req->write_state = RBD_OBJ_WRITE_GUARD;
-	} else {
-		obj_req->write_state = RBD_OBJ_WRITE_FLAT;
 	}
 
+	obj_req->write_state = RBD_OBJ_WRITE_OBJECT;
 	__rbd_obj_setup_write(obj_req, which);
 	return 0;
 }
@@ -1921,11 +1924,15 @@ static int rbd_obj_setup_discard(struct rbd_obj_request *obj_req)
 	if (ret)
 		return ret;
 
+	if (rbd_obj_is_entire(obj_req) && !obj_req->num_img_extents)
+		obj_req->flags |= RBD_OBJ_FLAG_DELETION;
+
 	obj_req->osd_req = rbd_osd_req_create(obj_req, 1);
 	if (!obj_req->osd_req)
 		return -ENOMEM;
 
 	if (rbd_obj_is_entire(obj_req) && !obj_req->num_img_extents) {
+		rbd_assert(obj_req->flags & RBD_OBJ_FLAG_DELETION);
 		osd_req_op_init(obj_req->osd_req, 0, CEPH_OSD_OP_DELETE, 0);
 	} else {
 		dout("%s %p %llu~%llu -> %llu~%llu\n", __func__,
@@ -1936,7 +1943,7 @@ static int rbd_obj_setup_discard(struct rbd_obj_request *obj_req)
 				       off, next_off - off, 0, 0);
 	}
 
-	obj_req->write_state = RBD_OBJ_WRITE_FLAT;
+	obj_req->write_state = RBD_OBJ_WRITE_OBJECT;
 	rbd_osd_req_format_write(obj_req);
 	return 0;
 }
@@ -1961,11 +1968,12 @@ static void __rbd_obj_setup_zeroout(struct rbd_obj_request *obj_req,
 
 	if (rbd_obj_is_entire(obj_req)) {
 		if (obj_req->num_img_extents) {
-			if (!rbd_obj_copyup_enabled(obj_req))
+			if (!(obj_req->flags & RBD_OBJ_FLAG_COPYUP_ENABLED))
 				osd_req_op_init(obj_req->osd_req, which++,
 						CEPH_OSD_OP_CREATE, 0);
 			opcode = CEPH_OSD_OP_TRUNCATE;
 		} else {
+			rbd_assert(obj_req->flags & RBD_OBJ_FLAG_DELETION);
 			osd_req_op_init(obj_req->osd_req, which++,
 					CEPH_OSD_OP_DELETE, 0);
 			opcode = 0;
@@ -1986,7 +1994,6 @@ static void __rbd_obj_setup_zeroout(struct rbd_obj_request *obj_req,
 static int rbd_obj_setup_zeroout(struct rbd_obj_request *obj_req)
 {
 	unsigned int num_osd_ops, which = 0;
-	bool need_guard;
 	int ret;
 
 	/* reverse map the entire object onto the parent */
@@ -1994,23 +2001,28 @@ static int rbd_obj_setup_zeroout(struct rbd_obj_request *obj_req)
 	if (ret)
 		return ret;
 
-	need_guard = rbd_obj_copyup_enabled(obj_req);
-	num_osd_ops = need_guard + count_zeroout_ops(obj_req);
+	if (rbd_obj_copyup_enabled(obj_req))
+		obj_req->flags |= RBD_OBJ_FLAG_COPYUP_ENABLED;
+	if (!obj_req->num_img_extents) {
+		if (rbd_obj_is_entire(obj_req))
+			obj_req->flags |= RBD_OBJ_FLAG_DELETION;
+	}
+
+	num_osd_ops = count_zeroout_ops(obj_req);
+	if (obj_req->flags & RBD_OBJ_FLAG_COPYUP_ENABLED)
+		num_osd_ops++; /* stat */
 
 	obj_req->osd_req = rbd_osd_req_create(obj_req, num_osd_ops);
 	if (!obj_req->osd_req)
 		return -ENOMEM;
 
-	if (need_guard) {
+	if (obj_req->flags & RBD_OBJ_FLAG_COPYUP_ENABLED) {
 		ret = __rbd_obj_setup_stat(obj_req, which++);
 		if (ret)
 			return ret;
-
-		obj_req->write_state = RBD_OBJ_WRITE_GUARD;
-	} else {
-		obj_req->write_state = RBD_OBJ_WRITE_FLAT;
 	}
 
+	obj_req->write_state = RBD_OBJ_WRITE_OBJECT;
 	__rbd_obj_setup_zeroout(obj_req, which);
 	return 0;
 }
@@ -2617,6 +2629,11 @@ static int setup_copyup_bvecs(struct rbd_obj_request *obj_req, u64 obj_overlap)
 	return 0;
 }
 
+/*
+ * The target object doesn't exist.  Read the data for the entire
+ * target object up to the overlap point (if any) from the parent,
+ * so we can use it for a copyup.
+ */
 static int rbd_obj_handle_write_guard(struct rbd_obj_request *obj_req)
 {
 	struct rbd_device *rbd_dev = obj_req->img_request->rbd_dev;
@@ -2649,22 +2666,24 @@ static bool rbd_obj_handle_write(struct rbd_obj_request *obj_req, int *result)
 	int ret;
 
 	switch (obj_req->write_state) {
-	case RBD_OBJ_WRITE_GUARD:
+	case RBD_OBJ_WRITE_OBJECT:
 		if (*result == -ENOENT) {
+			if (obj_req->flags & RBD_OBJ_FLAG_COPYUP_ENABLED) {
+				ret = rbd_obj_handle_write_guard(obj_req);
+				if (ret) {
+					*result = ret;
+					return true;
+				}
+				return false;
+			}
 			/*
-			 * The target object doesn't exist.  Read the data for
-			 * the entire target object up to the overlap point (if
-			 * any) from the parent, so we can use it for a copyup.
+			 * On a non-existent object:
+			 *   delete - -ENOENT, truncate/zero - 0
 			 */
-			ret = rbd_obj_handle_write_guard(obj_req);
-			if (ret) {
-				*result = ret;
-				return true;
-			}
-			return false;
+			if (obj_req->flags & RBD_OBJ_FLAG_DELETION)
+				*result = 0;
 		}
 		/* fall through */
-	case RBD_OBJ_WRITE_FLAT:
 	case RBD_OBJ_WRITE_COPYUP_OPS:
 		return true;
 	case RBD_OBJ_WRITE_READ_FROM_PARENT:
@@ -2695,31 +2714,20 @@ static bool rbd_obj_handle_write(struct rbd_obj_request *obj_req, int *result)
 }
 
 /*
- * Returns true if @obj_req is completed, or false otherwise.
+ * Return true if @obj_req is completed.
 */
 static bool __rbd_obj_handle_request(struct rbd_obj_request *obj_req,
 				     int *result)
 {
-	switch (obj_req->img_request->op_type) {
-	case OBJ_OP_READ:
-		return rbd_obj_handle_read(obj_req, result);
-	case OBJ_OP_WRITE:
-		return rbd_obj_handle_write(obj_req, result);
-	case OBJ_OP_DISCARD:
-	case OBJ_OP_ZEROOUT:
-		if (rbd_obj_handle_write(obj_req, result)) {
-			/*
-			 * Hide -ENOENT from delete/truncate/zero -- discarding
-			 * a non-existent object is not a problem.
-			 */
-			if (*result == -ENOENT)
-				*result = 0;
-			return true;
-		}
-		return false;
-	default:
-		BUG();
-	}
+	struct rbd_img_request *img_req = obj_req->img_request;
+	bool done;
+
+	if (!rbd_img_is_write(img_req))
+		done = rbd_obj_handle_read(obj_req, result);
+	else
+		done = rbd_obj_handle_write(obj_req, result);
+
+	return done;
 }
 
 static void rbd_obj_end_request(struct rbd_obj_request *obj_req, int result)
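
A minimal model of the flags approach introduced here, with invented names (setup_discard(), filter_result()); the point is that attributes such as "this discard degenerates to a delete" are computed once at setup and merely tested at completion, which is what lets ENOENT be hidden only where it is actually expected:

    #include <errno.h>
    #include <stdbool.h>

    /* Per-request attribute bits, set once at setup time and only tested
     * afterwards, modelled on RBD_OBJ_FLAG_DELETION / _COPYUP_ENABLED. */
    #define FLAG_DELETION           (1U << 0)
    #define FLAG_COPYUP_ENABLED     (1U << 1)

    struct obj_req {
        unsigned int flags;
    };

    static void setup_discard(struct obj_req *req, bool whole_object)
    {
        if (whole_object)
            req->flags |= FLAG_DELETION;    /* becomes a delete op */
    }

    /* -ENOENT is hidden only where it is expected: on delete. */
    static int filter_result(const struct obj_req *req, int result)
    {
        if (result == -ENOENT && (req->flags & FLAG_DELETION))
            return 0;
        return result;
    }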
From patchwork Tue Jun 25 14:40:55 2019
X-Patchwork-Submitter: Ilya Dryomov
X-Patchwork-Id: 11015759
From: Ilya Dryomov
To: ceph-devel@vger.kernel.org
Cc: Dongsheng Yang
Subject: [PATCH 04/20] rbd: move OSD request submission into object request state machines
Date: Tue, 25 Jun 2019 16:40:55 +0200
Message-Id: <20190625144111.11270-5-idryomov@gmail.com>
In-Reply-To: <20190625144111.11270-1-idryomov@gmail.com>
References: <20190625144111.11270-1-idryomov@gmail.com>

Start eliminating asymmetry where the initial OSD request is allocated
and submitted from outside the state machine, making error handling and
restarts harder than they could be. This commit deals with submission,
a commit that deals with allocation will follow.

Note that this commit adds parent chain recursion on the submission
side:

  rbd_img_request_submit
    rbd_obj_handle_request
      __rbd_obj_handle_request
        rbd_obj_handle_read
          rbd_obj_handle_write_guard
            rbd_obj_read_from_parent
              rbd_img_request_submit

This will be fixed in the next commit.

Signed-off-by: Ilya Dryomov
---
 drivers/block/rbd.c | 60 ++++++++++++++++++++++++++++++++++++---------
 1 file changed, 49 insertions(+), 11 deletions(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 488da877a2bb..9c6be82353c0 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -223,7 +223,8 @@ enum obj_operation_type {
 #define RBD_OBJ_FLAG_COPYUP_ENABLED		(1U << 1)
 
 enum rbd_obj_read_state {
-	RBD_OBJ_READ_OBJECT = 1,
+	RBD_OBJ_READ_START = 1,
+	RBD_OBJ_READ_OBJECT,
 	RBD_OBJ_READ_PARENT,
 };
 
@@ -253,7 +254,8 @@ enum rbd_obj_read_state {
  * even if there is a parent).
 */
 enum rbd_obj_write_state {
-	RBD_OBJ_WRITE_OBJECT = 1,
+	RBD_OBJ_WRITE_START = 1,
+	RBD_OBJ_WRITE_OBJECT,
 	RBD_OBJ_WRITE_READ_FROM_PARENT,
 	RBD_OBJ_WRITE_COPYUP_EMPTY_SNAPC,
 	RBD_OBJ_WRITE_COPYUP_OPS,
@@ -284,6 +286,7 @@ struct rbd_obj_request {
 
 	struct ceph_osd_request *osd_req;
 
+	struct mutex		state_mutex;
 	struct kref		kref;
 };
 
@@ -1560,6 +1563,7 @@ static struct rbd_obj_request *rbd_obj_request_create(void)
 		return NULL;
 
 	ceph_object_extent_init(&obj_request->ex);
+	mutex_init(&obj_request->state_mutex);
 	kref_init(&obj_request->kref);
 
 	dout("%s %p\n", __func__, obj_request);
@@ -1802,7 +1806,7 @@ static int rbd_obj_setup_read(struct rbd_obj_request *obj_req)
 	rbd_osd_req_setup_data(obj_req, 0);
 	rbd_osd_req_format_read(obj_req);
 
-	obj_req->read_state = RBD_OBJ_READ_OBJECT;
+	obj_req->read_state = RBD_OBJ_READ_START;
 	return 0;
 }
 
@@ -1885,7 +1889,7 @@ static int rbd_obj_setup_write(struct rbd_obj_request *obj_req)
 			return ret;
 	}
 
-	obj_req->write_state = RBD_OBJ_WRITE_OBJECT;
+	obj_req->write_state = RBD_OBJ_WRITE_START;
 	__rbd_obj_setup_write(obj_req, which);
 	return 0;
 }
@@ -1943,7 +1947,7 @@ static int rbd_obj_setup_discard(struct rbd_obj_request *obj_req)
 				       off, next_off - off, 0, 0);
 	}
 
-	obj_req->write_state = RBD_OBJ_WRITE_OBJECT;
+	obj_req->write_state = RBD_OBJ_WRITE_START;
 	rbd_osd_req_format_write(obj_req);
 	return 0;
 }
@@ -2022,7 +2026,7 @@ static int rbd_obj_setup_zeroout(struct rbd_obj_request *obj_req)
 			return ret;
 	}
 
-	obj_req->write_state = RBD_OBJ_WRITE_OBJECT;
+	obj_req->write_state = RBD_OBJ_WRITE_START;
 	__rbd_obj_setup_zeroout(obj_req, which);
 	return 0;
 }
@@ -2363,11 +2367,17 @@ static void rbd_img_request_submit(struct rbd_img_request *img_request)
 	rbd_img_request_get(img_request);
 	for_each_obj_request(img_request, obj_request)
-		rbd_obj_request_submit(obj_request);
+		rbd_obj_handle_request(obj_request, 0);
 
 	rbd_img_request_put(img_request);
 }
 
+static int rbd_obj_read_object(struct rbd_obj_request *obj_req)
+{
+	rbd_obj_request_submit(obj_req);
+	return 0;
+}
+
 static int rbd_obj_read_from_parent(struct rbd_obj_request *obj_req)
 {
 	struct rbd_img_request *img_req = obj_req->img_request;
@@ -2415,12 +2425,22 @@ static int rbd_obj_read_from_parent(struct rbd_obj_request *obj_req)
 	return 0;
 }
 
-static bool rbd_obj_handle_read(struct rbd_obj_request *obj_req, int *result)
+static bool rbd_obj_advance_read(struct rbd_obj_request *obj_req, int *result)
 {
 	struct rbd_device *rbd_dev = obj_req->img_request->rbd_dev;
 	int ret;
 
 	switch (obj_req->read_state) {
+	case RBD_OBJ_READ_START:
+		rbd_assert(!*result);
+
+		ret = rbd_obj_read_object(obj_req);
+		if (ret) {
+			*result = ret;
+			return true;
+		}
+		obj_req->read_state = RBD_OBJ_READ_OBJECT;
+		return false;
 	case RBD_OBJ_READ_OBJECT:
 		if (*result == -ENOENT && rbd_dev->parent_overlap) {
 			/* reverse map this object extent onto the parent */
@@ -2464,6 +2484,12 @@ static bool rbd_obj_advance_read(struct rbd_obj_request *obj_req, int *result)
 	}
 }
 
+static int rbd_obj_write_object(struct rbd_obj_request *obj_req)
+{
+	rbd_obj_request_submit(obj_req);
+	return 0;
+}
+
 /*
  * copyup_bvecs pages are never highmem pages
 */
@@ -2661,11 +2687,21 @@ static int rbd_obj_handle_write_guard(struct rbd_obj_request *obj_req)
 	return rbd_obj_read_from_parent(obj_req);
 }
 
-static bool rbd_obj_handle_write(struct rbd_obj_request *obj_req, int *result)
+static bool rbd_obj_advance_write(struct rbd_obj_request *obj_req, int *result)
 {
 	int ret;
 
 	switch (obj_req->write_state) {
+	case RBD_OBJ_WRITE_START:
+		rbd_assert(!*result);
+
+		ret = rbd_obj_write_object(obj_req);
+		if (ret) {
+			*result = ret;
+			return true;
+		}
+		obj_req->write_state = RBD_OBJ_WRITE_OBJECT;
+		return false;
 	case RBD_OBJ_WRITE_OBJECT:
 		if (*result == -ENOENT) {
 			if (obj_req->flags & RBD_OBJ_FLAG_COPYUP_ENABLED) {
@@ -2722,10 +2758,12 @@ static bool __rbd_obj_handle_request(struct rbd_obj_request *obj_req,
 	struct rbd_img_request *img_req = obj_req->img_request;
 	bool done;
 
+	mutex_lock(&obj_req->state_mutex);
 	if (!rbd_img_is_write(img_req))
-		done = rbd_obj_handle_read(obj_req, result);
+		done = rbd_obj_advance_read(obj_req, result);
 	else
-		done = rbd_obj_handle_write(obj_req, result);
+		done = rbd_obj_advance_write(obj_req, result);
+	mutex_unlock(&obj_req->state_mutex);
 
 	return done;
 }
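
A userspace sketch of the START-state idea (invented names, not the driver code): once submission itself is a state transition, a submission failure is reported through exactly the same completion path as an OSD reply, which is what makes later restart handling tractable:

    #include <stdbool.h>

    enum write_state {
        WRITE_START = 1,    /* not submitted yet */
        WRITE_OBJECT,       /* OSD request in flight */
    };

    struct obj_req {
        enum write_state state;
    };

    static int submit_osd_request(struct obj_req *req)
    {
        /* queue the request; 0 on success, -errno on failure */
        return 0;
    }

    /* Submission happens inside the state machine, so a submission error
     * completes the request through the same path as an OSD reply. */
    static bool advance_write(struct obj_req *req, int *result)
    {
        int ret;

        switch (req->state) {
        case WRITE_START:
            ret = submit_osd_request(req);
            if (ret) {
                *result = ret;
                return true;    /* failed without ever going out */
            }
            req->state = WRITE_OBJECT;
            return false;       /* wait for the completion callback */
        case WRITE_OBJECT:
            return true;
        }
        return true;
    }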
From patchwork Tue Jun 25 14:40:56 2019
X-Patchwork-Submitter: Ilya Dryomov
X-Patchwork-Id: 11015763
From: Ilya Dryomov
To: ceph-devel@vger.kernel.org
Cc: Dongsheng Yang
Subject: [PATCH 05/20] rbd: introduce image request state machine
Date: Tue, 25 Jun 2019 16:40:56 +0200
Message-Id: <20190625144111.11270-6-idryomov@gmail.com>
In-Reply-To: <20190625144111.11270-1-idryomov@gmail.com>
References: <20190625144111.11270-1-idryomov@gmail.com>

Make it possible to schedule image requests on a workqueue. This
fixes parent chain recursion added in the previous commit and lays
the ground for exclusive lock wait/wake improvements.

The "wait for pending subrequests and report first nonzero result"
code is generalized to be used by the object request state machine.

Signed-off-by: Ilya Dryomov
Reviewed-by: Dongsheng Yang
---
 drivers/block/rbd.c | 194 +++++++++++++++++++++++++++++++-------------
 1 file changed, 137 insertions(+), 57 deletions(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 9c6be82353c0..51dd1b99c242 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -203,6 +203,11 @@ struct rbd_client {
 	struct list_head	node;
 };
 
+struct pending_result {
+	int			result;		/* first nonzero result */
+	int			num_pending;
+};
+
 struct rbd_img_request;
 
 enum obj_request_type {
@@ -295,11 +300,18 @@ enum img_req_flags {
 	IMG_REQ_LAYERED,	/* ENOENT handling: normal = 0, layered = 1 */
 };
 
+enum rbd_img_state {
+	RBD_IMG_START = 1,
+	__RBD_IMG_OBJECT_REQUESTS,
+	RBD_IMG_OBJECT_REQUESTS,
+};
+
 struct rbd_img_request {
 	struct rbd_device	*rbd_dev;
 	enum obj_operation_type	op_type;
 	enum obj_request_type	data_type;
 	unsigned long		flags;
+	enum rbd_img_state	state;
 	union {
 		u64			snap_id;	/* for reads */
 		struct ceph_snap_context *snapc;	/* for writes */
@@ -308,12 +320,13 @@ struct rbd_img_request {
 		struct request		*rq;		/* block request */
 		struct rbd_obj_request	*obj_request;	/* obj req initiator */
 	};
-	spinlock_t		completion_lock;
-	int			result;	/* first nonzero obj_request result */
 
 	struct list_head	object_extents;	/* obj_req.ex structs */
-	u32			pending_count;
 
+	struct mutex		state_mutex;
+	struct pending_result	pending;
+	struct work_struct	work;
+	int			work_result;
 	struct kref		kref;
 };
 
@@ -592,6 +605,23 @@ static int _rbd_dev_v2_snap_features(struct rbd_device *rbd_dev, u64 snap_id,
 				     u64 *snap_features);
 
 static void rbd_obj_handle_request(struct rbd_obj_request *obj_req, int result);
+static void rbd_img_handle_request(struct rbd_img_request *img_req, int result);
+
+/*
+ * Return true if nothing else is pending.
+ */
+static bool pending_result_dec(struct pending_result *pending, int *result)
+{
+	rbd_assert(pending->num_pending > 0);
+
+	if (*result && !pending->result)
+		pending->result = *result;
+	if (--pending->num_pending)
+		return false;
+
+	*result = pending->result;
+	return true;
+}
 
 static int rbd_open(struct block_device *bdev, fmode_t mode)
 {
@@ -1350,13 +1380,6 @@ static void rbd_obj_request_put(struct rbd_obj_request *obj_request)
 	kref_put(&obj_request->kref, rbd_obj_request_destroy);
 }
 
-static void rbd_img_request_get(struct rbd_img_request *img_request)
-{
-	dout("%s: img %p (was %d)\n", __func__, img_request,
-	     kref_read(&img_request->kref));
-	kref_get(&img_request->kref);
-}
-
 static void rbd_img_request_destroy(struct kref *kref);
 static void rbd_img_request_put(struct rbd_img_request *img_request)
 {
@@ -1373,7 +1396,6 @@ static inline void rbd_img_obj_request_add(struct rbd_img_request *img_request,
 
 	/* Image request now owns object's original reference */
 	obj_request->img_request = img_request;
-	img_request->pending_count++;
 	dout("%s: img %p obj %p\n", __func__, img_request, obj_request);
 }
 
@@ -1694,8 +1716,8 @@ static struct rbd_img_request *rbd_img_request_create(
 	if (rbd_dev_parent_get(rbd_dev))
 		img_request_layered_set(img_request);
 
-	spin_lock_init(&img_request->completion_lock);
 	INIT_LIST_HEAD(&img_request->object_extents);
+	mutex_init(&img_request->state_mutex);
 	kref_init(&img_request->kref);
 
 	dout("%s: rbd_dev %p %s -> img %p\n", __func__, rbd_dev,
@@ -2061,7 +2083,6 @@ static int __rbd_img_fill_request(struct rbd_img_request *img_req)
 		if (ret < 0)
 			return ret;
 		if (ret > 0) {
-			img_req->pending_count--;
 			rbd_img_obj_request_del(img_req, obj_req);
 			continue;
 		}
@@ -2071,6 +2092,7 @@ static int __rbd_img_fill_request(struct rbd_img_request *img_req)
 			return ret;
 	}
 
+	img_req->state = RBD_IMG_START;
 	return 0;
 }
 
@@ -2359,17 +2381,19 @@ static int rbd_img_fill_from_bvecs(struct rbd_img_request *img_req,
 					   &it);
 }
 
-static void rbd_img_request_submit(struct rbd_img_request *img_request)
+static void rbd_img_handle_request_work(struct work_struct *work)
 {
-	struct rbd_obj_request *obj_request;
+	struct rbd_img_request *img_req =
+	    container_of(work, struct rbd_img_request, work);
 
-	dout("%s: img %p\n", __func__, img_request);
-
-	rbd_img_request_get(img_request);
-	for_each_obj_request(img_request, obj_request)
-		rbd_obj_handle_request(obj_request, 0);
+	rbd_img_handle_request(img_req, img_req->work_result);
+}
 
-	rbd_img_request_put(img_request);
+static void rbd_img_schedule(struct rbd_img_request *img_req, int result)
+{
+	INIT_WORK(&img_req->work, rbd_img_handle_request_work);
+	img_req->work_result = result;
+	queue_work(rbd_wq, &img_req->work);
 }
 
 static int rbd_obj_read_object(struct rbd_obj_request *obj_req)
@@ -2421,7 +2445,8 @@ static int rbd_obj_read_from_parent(struct rbd_obj_request *obj_req)
 		return ret;
 	}
 
-	rbd_img_request_submit(child_img_req);
+	/* avoid parent chain recursion */
+	rbd_img_schedule(child_img_req, 0);
 	return 0;
 }
 
@@ -2756,6 +2781,7 @@ static bool __rbd_obj_handle_request(struct rbd_obj_request *obj_req,
 				     int *result)
 {
 	struct rbd_img_request *img_req = obj_req->img_request;
+	struct rbd_device *rbd_dev = img_req->rbd_dev;
 	bool done;
 
 	mutex_lock(&obj_req->state_mutex);
@@ -2765,59 +2791,113 @@ static bool __rbd_obj_handle_request(struct rbd_obj_request *obj_req,
 		done = rbd_obj_advance_write(obj_req, result);
 	mutex_unlock(&obj_req->state_mutex);
 
+	if (done && *result) {
+		rbd_assert(*result < 0);
+		rbd_warn(rbd_dev, "%s at objno %llu %llu~%llu result %d",
+			 obj_op_name(img_req->op_type), obj_req->ex.oe_objno,
+			 obj_req->ex.oe_off, obj_req->ex.oe_len, *result);
+	}
 	return done;
 }
 
-static void rbd_obj_end_request(struct rbd_obj_request *obj_req, int result)
+/*
+ * This is open-coded in rbd_img_handle_request() to avoid parent chain
+ * recursion.
+ */
+static void rbd_obj_handle_request(struct rbd_obj_request *obj_req, int result)
+{
+	if (__rbd_obj_handle_request(obj_req, &result))
+		rbd_img_handle_request(obj_req->img_request, result);
+}
+
+static void rbd_img_object_requests(struct rbd_img_request *img_req)
 {
-	struct rbd_img_request *img_req = obj_req->img_request;
+	struct rbd_obj_request *obj_req;
 
-	rbd_assert(result <= 0);
-	if (!result)
-		return;
+	rbd_assert(!img_req->pending.result && !img_req->pending.num_pending);
+
+	for_each_obj_request(img_req, obj_req) {
+		int result = 0;
 
-	rbd_warn(img_req->rbd_dev, "%s at objno %llu %llu~%llu result %d",
-		 obj_op_name(img_req->op_type), obj_req->ex.oe_objno,
-		 obj_req->ex.oe_off, obj_req->ex.oe_len, result);
-	if (!img_req->result)
-		img_req->result = result;
+		if (__rbd_obj_handle_request(obj_req, &result)) {
+			if (result) {
+				img_req->pending.result = result;
+				return;
+			}
+		} else {
+			img_req->pending.num_pending++;
+		}
+	}
 }
 
-static void rbd_img_end_request(struct rbd_img_request *img_req)
+static bool rbd_img_advance(struct rbd_img_request *img_req, int *result)
 {
-	rbd_assert(!test_bit(IMG_REQ_CHILD, &img_req->flags));
+again:
+	switch (img_req->state) {
+	case RBD_IMG_START:
+		rbd_assert(!*result);
 
-	blk_mq_end_request(img_req->rq,
-			   errno_to_blk_status(img_req->result));
-	rbd_img_request_put(img_req);
+		rbd_img_object_requests(img_req);
+		if (!img_req->pending.num_pending) {
+			*result = img_req->pending.result;
+			img_req->state = RBD_IMG_OBJECT_REQUESTS;
+			goto again;
+		}
+		img_req->state = __RBD_IMG_OBJECT_REQUESTS;
+		return false;
+	case __RBD_IMG_OBJECT_REQUESTS:
+		if (!pending_result_dec(&img_req->pending, result))
+			return false;
+		/* fall through */
+	case RBD_IMG_OBJECT_REQUESTS:
+		return true;
+	default:
+		BUG();
+	}
 }
 
-static void rbd_obj_handle_request(struct rbd_obj_request *obj_req, int result)
+/*
+ * Return true if @img_req is completed.
+ */
+static bool __rbd_img_handle_request(struct rbd_img_request *img_req,
+				     int *result)
 {
-	struct rbd_img_request *img_req;
+	struct rbd_device *rbd_dev = img_req->rbd_dev;
+	bool done;
 
-again:
-	if (!__rbd_obj_handle_request(obj_req, &result))
-		return;
+	mutex_lock(&img_req->state_mutex);
+	done = rbd_img_advance(img_req, result);
+	mutex_unlock(&img_req->state_mutex);
 
-	img_req = obj_req->img_request;
-	spin_lock(&img_req->completion_lock);
-	rbd_obj_end_request(obj_req, result);
-	rbd_assert(img_req->pending_count);
-	if (--img_req->pending_count) {
-		spin_unlock(&img_req->completion_lock);
-		return;
+	if (done && *result) {
+		rbd_assert(*result < 0);
+		rbd_warn(rbd_dev, "%s%s result %d",
+			 test_bit(IMG_REQ_CHILD, &img_req->flags) ? "child " : "",
+			 obj_op_name(img_req->op_type), *result);
 	}
+	return done;
+}
+
+static void rbd_img_handle_request(struct rbd_img_request *img_req, int result)
+{
+again:
+	if (!__rbd_img_handle_request(img_req, &result))
+		return;
 
-	spin_unlock(&img_req->completion_lock);
-	rbd_assert(img_req->result <= 0);
 	if (test_bit(IMG_REQ_CHILD, &img_req->flags)) {
-		obj_req = img_req->obj_request;
-		result = img_req->result;
+		struct rbd_obj_request *obj_req = img_req->obj_request;
+
 		rbd_img_request_put(img_req);
-		goto again;
+		if (__rbd_obj_handle_request(obj_req, &result)) {
+			img_req = obj_req->img_request;
+			goto again;
+		}
+	} else {
+		struct request *rq = img_req->rq;
+
+		rbd_img_request_put(img_req);
+		blk_mq_end_request(rq, errno_to_blk_status(result));
 	}
-	rbd_img_end_request(img_req);
 }
 
 static const struct rbd_client_id rbd_empty_cid;
@@ -3933,10 +4013,10 @@ static void rbd_queue_workfn(struct work_struct *work)
 	else
 		result = rbd_img_fill_from_bio(img_request, offset, length,
 					       rq->bio);
-	if (result || !img_request->pending_count)
+	if (result)
 		goto err_img_request;
 
-	rbd_img_request_submit(img_request);
+	rbd_img_handle_request(img_request, 0);
 	if (must_be_locked)
 		up_read(&rbd_dev->lock_rwsem);
 	return;
From patchwork Tue Jun 25 14:40:57 2019
X-Patchwork-Submitter: Ilya Dryomov
X-Patchwork-Id: 11015769
From: Ilya Dryomov
To: ceph-devel@vger.kernel.org
Cc: Dongsheng Yang
Subject: [PATCH 06/20] rbd: introduce obj_req->osd_reqs list
Date: Tue, 25 Jun 2019 16:40:57 +0200
Message-Id: <20190625144111.11270-7-idryomov@gmail.com>
In-Reply-To: <20190625144111.11270-1-idryomov@gmail.com>
References: <20190625144111.11270-1-idryomov@gmail.com>

Since the dawn of time it had been assumed that a single object request
spawns a single OSD request.  This is already impacting copyup: instead
of sending the empty and current snapc copyups together, we wait for the
empty snapc OSD request to complete in order to reassign obj_req->osd_req
to the current snapc OSD request.  Looking further, updating potentially
hundreds of snapshot object maps serially is a non-starter.

Replace the obj_req->osd_req pointer with an obj_req->osd_reqs list.
Use osd_req->r_unsafe_item for linkage -- it's used by the filesystem
for a similar purpose.

Signed-off-by: Ilya Dryomov
Reviewed-by: Dongsheng Yang
---
 drivers/block/rbd.c | 191 +++++++++++++++++++++++---------------------
 1 file changed, 100 insertions(+), 91 deletions(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 51dd1b99c242..5c34fe215c63 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -289,7 +289,7 @@ struct rbd_obj_request {
 	struct bio_vec *copyup_bvecs;
 	u32 copyup_bvec_count;
 
-	struct ceph_osd_request *osd_req;
+	struct list_head osd_reqs;	/* w/ r_unsafe_item */
 
 	struct mutex state_mutex;
 	struct kref kref;
@@ -1410,7 +1410,9 @@ static inline void rbd_img_obj_request_del(struct rbd_img_request *img_request,
 
 static void rbd_obj_request_submit(struct rbd_obj_request *obj_request)
 {
-	struct ceph_osd_request *osd_req = obj_request->osd_req;
+	struct ceph_osd_request *osd_req =
+	    list_last_entry(&obj_request->osd_reqs, struct ceph_osd_request,
+			    r_unsafe_item);
 
 	dout("%s %p object_no %016llx %llu~%llu osd_req %p\n", __func__,
 	     obj_request, obj_request->ex.oe_objno, obj_request->ex.oe_off,
@@ -1497,7 +1499,6 @@ static void rbd_osd_req_callback(struct ceph_osd_request *osd_req)
 
 	dout("%s osd_req %p result %d for obj_req %p\n", __func__, osd_req,
 	     osd_req->r_result, obj_req);
-	rbd_assert(osd_req == obj_req->osd_req);
 
 	/*
 	 * Writes aren't allowed to return a data payload.  In some
@@ -1512,17 +1513,17 @@ static void rbd_osd_req_callback(struct ceph_osd_request *osd_req)
 	rbd_obj_handle_request(obj_req, result);
 }
 
-static void rbd_osd_req_format_read(struct rbd_obj_request *obj_request)
+static void rbd_osd_format_read(struct ceph_osd_request *osd_req)
 {
-	struct ceph_osd_request *osd_req = obj_request->osd_req;
+	struct rbd_obj_request *obj_request = osd_req->r_priv;
 
 	osd_req->r_flags = CEPH_OSD_FLAG_READ;
 	osd_req->r_snapid = obj_request->img_request->snap_id;
 }
 
-static void rbd_osd_req_format_write(struct rbd_obj_request *obj_request)
+static void rbd_osd_format_write(struct ceph_osd_request *osd_req)
 {
-	struct ceph_osd_request *osd_req = obj_request->osd_req;
+	struct rbd_obj_request *obj_request = osd_req->r_priv;
 
 	osd_req->r_flags = CEPH_OSD_FLAG_WRITE;
 	ktime_get_real_ts64(&osd_req->r_mtime);
@@ -1530,19 +1531,21 @@ static void rbd_osd_req_format_write(struct rbd_obj_request *obj_request)
 }
 
 static struct ceph_osd_request *
-__rbd_osd_req_create(struct rbd_obj_request *obj_req,
-		     struct ceph_snap_context *snapc, unsigned int num_ops)
+__rbd_obj_add_osd_request(struct rbd_obj_request *obj_req,
+			  struct ceph_snap_context *snapc, int num_ops)
 {
 	struct rbd_device *rbd_dev = obj_req->img_request->rbd_dev;
 	struct ceph_osd_client *osdc = &rbd_dev->rbd_client->client->osdc;
 	struct ceph_osd_request *req;
 	const char *name_format = rbd_dev->image_format == 1 ?
				      RBD_V1_DATA_FORMAT : RBD_V2_DATA_FORMAT;
+	int ret;
 
 	req = ceph_osdc_alloc_request(osdc, snapc, num_ops, false, GFP_NOIO);
 	if (!req)
-		return NULL;
+		return ERR_PTR(-ENOMEM);
 
+	list_add_tail(&req->r_unsafe_item, &obj_req->osd_reqs);
 	req->r_callback = rbd_osd_req_callback;
 	req->r_priv = obj_req;
@@ -1553,27 +1556,20 @@ __rbd_osd_req_create(struct rbd_obj_request *obj_req,
 	ceph_oloc_copy(&req->r_base_oloc, &rbd_dev->header_oloc);
 	req->r_base_oloc.pool = rbd_dev->layout.pool_id;
 
-	if (ceph_oid_aprintf(&req->r_base_oid, GFP_NOIO, name_format,
-			     rbd_dev->header.object_prefix, obj_req->ex.oe_objno))
-		goto err_req;
+	ret = ceph_oid_aprintf(&req->r_base_oid, GFP_NOIO, name_format,
+			       rbd_dev->header.object_prefix,
+			       obj_req->ex.oe_objno);
+	if (ret)
+		return ERR_PTR(ret);
 
 	return req;
-
-err_req:
-	ceph_osdc_put_request(req);
-	return NULL;
 }
 
 static struct ceph_osd_request *
-rbd_osd_req_create(struct rbd_obj_request *obj_req, unsigned int num_ops)
+rbd_obj_add_osd_request(struct rbd_obj_request *obj_req, int num_ops)
 {
-	return __rbd_osd_req_create(obj_req, obj_req->img_request->snapc,
-				    num_ops);
-}
-
-static void rbd_osd_req_destroy(struct ceph_osd_request *osd_req)
-{
-	ceph_osdc_put_request(osd_req);
+	return __rbd_obj_add_osd_request(obj_req, obj_req->img_request->snapc,
+					 num_ops);
 }
 
 static struct rbd_obj_request *rbd_obj_request_create(void)
@@ -1585,6 +1581,7 @@ static struct rbd_obj_request *rbd_obj_request_create(void)
 		return NULL;
 
 	ceph_object_extent_init(&obj_request->ex);
+	INIT_LIST_HEAD(&obj_request->osd_reqs);
 	mutex_init(&obj_request->state_mutex);
 	kref_init(&obj_request->kref);
 
@@ -1595,14 +1592,19 @@ static struct rbd_obj_request *rbd_obj_request_create(void)
 static void rbd_obj_request_destroy(struct kref *kref)
 {
 	struct rbd_obj_request *obj_request;
+	struct ceph_osd_request *osd_req;
 	u32 i;
 
 	obj_request = container_of(kref, struct rbd_obj_request, kref);
 
 	dout("%s: obj %p\n", __func__, obj_request);
 
-	if (obj_request->osd_req)
-		rbd_osd_req_destroy(obj_request->osd_req);
+	while (!list_empty(&obj_request->osd_reqs)) {
+		osd_req = list_first_entry(&obj_request->osd_reqs,
+					   struct ceph_osd_request,
+					   r_unsafe_item);
+		list_del_init(&osd_req->r_unsafe_item);
+		ceph_osdc_put_request(osd_req);
+	}
 
 	switch (obj_request->img_request->data_type) {
 	case OBJ_REQUEST_NODATA:
@@ -1796,11 +1798,13 @@ static int rbd_obj_calc_img_extents(struct rbd_obj_request *obj_req,
 	return 0;
 }
 
-static void rbd_osd_req_setup_data(struct rbd_obj_request *obj_req, u32 which)
+static void rbd_osd_setup_data(struct ceph_osd_request *osd_req, int which)
 {
+	struct rbd_obj_request *obj_req = osd_req->r_priv;
+
 	switch (obj_req->img_request->data_type) {
 	case OBJ_REQUEST_BIO:
-		osd_req_op_extent_osd_data_bio(obj_req->osd_req, which,
+		osd_req_op_extent_osd_data_bio(osd_req, which,
 					       &obj_req->bio_pos,
 					       obj_req->ex.oe_len);
 		break;
@@ -1809,7 +1813,7 @@ static void rbd_osd_req_setup_data(struct rbd_obj_request *obj_req, u32 which)
 		rbd_assert(obj_req->bvec_pos.iter.bi_size ==
 			   obj_req->ex.oe_len);
 		rbd_assert(obj_req->bvec_idx == obj_req->bvec_count);
-		osd_req_op_extent_osd_data_bvec_pos(obj_req->osd_req, which,
+		osd_req_op_extent_osd_data_bvec_pos(osd_req, which,
 						    &obj_req->bvec_pos);
 		break;
 	default:
@@ -1819,21 +1823,22 @@ static void rbd_osd_req_setup_data(struct rbd_obj_request *obj_req, u32 which)
 
 static int rbd_obj_setup_read(struct rbd_obj_request *obj_req)
 {
-	obj_req->osd_req = __rbd_osd_req_create(obj_req, NULL, 1);
-	if (!obj_req->osd_req)
-		return -ENOMEM;
+	struct ceph_osd_request *osd_req;
 
-	osd_req_op_extent_init(obj_req->osd_req, 0, CEPH_OSD_OP_READ,
+	osd_req = __rbd_obj_add_osd_request(obj_req, NULL, 1);
+	if (IS_ERR(osd_req))
+		return PTR_ERR(osd_req);
+
+	osd_req_op_extent_init(osd_req, 0, CEPH_OSD_OP_READ,
 			       obj_req->ex.oe_off, obj_req->ex.oe_len, 0, 0);
-	rbd_osd_req_setup_data(obj_req, 0);
+	rbd_osd_setup_data(osd_req, 0);
 
-	rbd_osd_req_format_read(obj_req);
+	rbd_osd_format_read(osd_req);
 	obj_req->read_state = RBD_OBJ_READ_START;
 	return 0;
 }
 
-static int __rbd_obj_setup_stat(struct rbd_obj_request *obj_req,
-				unsigned int which)
+static int rbd_osd_setup_stat(struct ceph_osd_request *osd_req, int which)
 {
 	struct page **pages;
@@ -1849,8 +1854,8 @@ static int __rbd_obj_setup_stat(struct rbd_obj_request *obj_req,
 	if (IS_ERR(pages))
 		return PTR_ERR(pages);
 
-	osd_req_op_init(obj_req->osd_req, which, CEPH_OSD_OP_STAT, 0);
-	osd_req_op_raw_data_in_pages(obj_req->osd_req, which, pages,
+	osd_req_op_init(osd_req, which, CEPH_OSD_OP_STAT, 0);
+	osd_req_op_raw_data_in_pages(osd_req, which, pages,
 				     8 + sizeof(struct ceph_timespec),
 				     0, false, true);
 	return 0;
@@ -1861,13 +1866,14 @@ static int count_write_ops(struct rbd_obj_request *obj_req)
 	return 2; /* setallochint + write/writefull */
 }
 
-static void __rbd_obj_setup_write(struct rbd_obj_request *obj_req,
-				  unsigned int which)
+static void __rbd_osd_setup_write_ops(struct ceph_osd_request *osd_req,
+				      int which)
 {
+	struct rbd_obj_request *obj_req = osd_req->r_priv;
 	struct rbd_device *rbd_dev = obj_req->img_request->rbd_dev;
 	u16 opcode;
 
-	osd_req_op_alloc_hint_init(obj_req->osd_req, which++,
+	osd_req_op_alloc_hint_init(osd_req, which++,
 				   rbd_dev->layout.object_size,
 				   rbd_dev->layout.object_size);
 
@@ -1876,16 +1882,16 @@ static void __rbd_obj_setup_write(struct rbd_obj_request *obj_req,
 	else
 		opcode = CEPH_OSD_OP_WRITE;
 
-	osd_req_op_extent_init(obj_req->osd_req, which, opcode,
+	osd_req_op_extent_init(osd_req, which, opcode,
 			       obj_req->ex.oe_off, obj_req->ex.oe_len, 0, 0);
-	rbd_osd_req_setup_data(obj_req, which++);
+	rbd_osd_setup_data(osd_req, which);
 
-	rbd_assert(which == obj_req->osd_req->r_num_ops);
-	rbd_osd_req_format_write(obj_req);
+	rbd_osd_format_write(osd_req);
 }
 
 static int rbd_obj_setup_write(struct rbd_obj_request *obj_req)
 {
+	struct ceph_osd_request *osd_req;
 	unsigned int num_osd_ops, which = 0;
 	int ret;
 
@@ -1901,18 +1907,18 @@ static int rbd_obj_setup_write(struct rbd_obj_request *obj_req)
 	if (obj_req->flags & RBD_OBJ_FLAG_COPYUP_ENABLED)
 		num_osd_ops++; /* stat */
 
-	obj_req->osd_req = rbd_osd_req_create(obj_req, num_osd_ops);
-	if (!obj_req->osd_req)
-		return -ENOMEM;
+	osd_req = rbd_obj_add_osd_request(obj_req, num_osd_ops);
+	if (IS_ERR(osd_req))
+		return PTR_ERR(osd_req);
 
 	if (obj_req->flags & RBD_OBJ_FLAG_COPYUP_ENABLED) {
-		ret = __rbd_obj_setup_stat(obj_req, which++);
+		ret = rbd_osd_setup_stat(osd_req, which++);
 		if (ret)
 			return ret;
 	}
 
 	obj_req->write_state = RBD_OBJ_WRITE_START;
-	__rbd_obj_setup_write(obj_req, which);
+	__rbd_osd_setup_write_ops(osd_req, which);
 	return 0;
 }
 
@@ -1925,6 +1931,7 @@ static u16 truncate_or_zero_opcode(struct rbd_obj_request *obj_req)
 static int rbd_obj_setup_discard(struct rbd_obj_request *obj_req)
 {
 	struct rbd_device *rbd_dev = obj_req->img_request->rbd_dev;
+	struct ceph_osd_request *osd_req;
 	u64 off = obj_req->ex.oe_off;
 	u64 next_off = obj_req->ex.oe_off + obj_req->ex.oe_len;
 	int ret;
@@ -1953,24 +1960,24 @@ static int rbd_obj_setup_discard(struct rbd_obj_request *obj_req)
 	if (rbd_obj_is_entire(obj_req) && !obj_req->num_img_extents)
 		obj_req->flags |= RBD_OBJ_FLAG_DELETION;
 
-	obj_req->osd_req = rbd_osd_req_create(obj_req, 1);
-	if (!obj_req->osd_req)
-		return -ENOMEM;
+	osd_req = rbd_obj_add_osd_request(obj_req, 1);
+	if (IS_ERR(osd_req))
+		return PTR_ERR(osd_req);
 
 	if (rbd_obj_is_entire(obj_req) && !obj_req->num_img_extents) {
 		rbd_assert(obj_req->flags & RBD_OBJ_FLAG_DELETION);
-		osd_req_op_init(obj_req->osd_req, 0, CEPH_OSD_OP_DELETE, 0);
+		osd_req_op_init(osd_req, 0, CEPH_OSD_OP_DELETE, 0);
 	} else {
 		dout("%s %p %llu~%llu -> %llu~%llu\n", __func__,
 		     obj_req, obj_req->ex.oe_off, obj_req->ex.oe_len,
 		     off, next_off - off);
-		osd_req_op_extent_init(obj_req->osd_req, 0,
+		osd_req_op_extent_init(osd_req, 0,
 				       truncate_or_zero_opcode(obj_req),
 				       off, next_off - off, 0, 0);
 	}
 
 	obj_req->write_state = RBD_OBJ_WRITE_START;
-	rbd_osd_req_format_write(obj_req);
+	rbd_osd_format_write(osd_req);
 	return 0;
 }
 
@@ -1987,20 +1994,21 @@ static int count_zeroout_ops(struct rbd_obj_request *obj_req)
 	return num_osd_ops;
 }
 
-static void __rbd_obj_setup_zeroout(struct rbd_obj_request *obj_req,
-				    unsigned int which)
+static void __rbd_osd_setup_zeroout_ops(struct ceph_osd_request *osd_req,
+					int which)
 {
+	struct rbd_obj_request *obj_req = osd_req->r_priv;
 	u16 opcode;
 
 	if (rbd_obj_is_entire(obj_req)) {
 		if (obj_req->num_img_extents) {
 			if (!(obj_req->flags & RBD_OBJ_FLAG_COPYUP_ENABLED))
-				osd_req_op_init(obj_req->osd_req, which++,
+				osd_req_op_init(osd_req, which++,
 						CEPH_OSD_OP_CREATE, 0);
 			opcode = CEPH_OSD_OP_TRUNCATE;
 		} else {
 			rbd_assert(obj_req->flags & RBD_OBJ_FLAG_DELETION);
-			osd_req_op_init(obj_req->osd_req, which++,
+			osd_req_op_init(osd_req, which++,
 					CEPH_OSD_OP_DELETE, 0);
 			opcode = 0;
 		}
@@ -2009,16 +2017,16 @@ static void __rbd_obj_setup_zeroout(struct rbd_obj_request *obj_req,
 	}
 
 	if (opcode)
-		osd_req_op_extent_init(obj_req->osd_req, which++, opcode,
+		osd_req_op_extent_init(osd_req, which, opcode,
 				       obj_req->ex.oe_off, obj_req->ex.oe_len,
 				       0, 0);
 
-	rbd_assert(which == obj_req->osd_req->r_num_ops);
-	rbd_osd_req_format_write(obj_req);
+	rbd_osd_format_write(osd_req);
 }
 
 static int rbd_obj_setup_zeroout(struct rbd_obj_request *obj_req)
 {
+	struct ceph_osd_request *osd_req;
 	unsigned int num_osd_ops, which = 0;
 	int ret;
 
@@ -2038,18 +2046,18 @@ static int rbd_obj_setup_zeroout(struct rbd_obj_request *obj_req)
 	if (obj_req->flags & RBD_OBJ_FLAG_COPYUP_ENABLED)
 		num_osd_ops++; /* stat */
 
-	obj_req->osd_req = rbd_osd_req_create(obj_req, num_osd_ops);
-	if (!obj_req->osd_req)
-		return -ENOMEM;
+	osd_req = rbd_obj_add_osd_request(obj_req, num_osd_ops);
+	if (IS_ERR(osd_req))
+		return PTR_ERR(osd_req);
 
 	if (obj_req->flags & RBD_OBJ_FLAG_COPYUP_ENABLED) {
-		ret = __rbd_obj_setup_stat(obj_req, which++);
+		ret = rbd_osd_setup_stat(osd_req, which++);
 		if (ret)
 			return ret;
 	}
 
 	obj_req->write_state = RBD_OBJ_WRITE_START;
-	__rbd_obj_setup_zeroout(obj_req, which);
+	__rbd_osd_setup_zeroout_ops(osd_req, which);
 	return 0;
 }
 
@@ -2061,6 +2069,7 @@ static int rbd_obj_setup_zeroout(struct rbd_obj_request *obj_req)
 static int __rbd_img_fill_request(struct rbd_img_request *img_req)
 {
 	struct rbd_obj_request *obj_req, *next_obj_req;
+	struct ceph_osd_request *osd_req;
 	int ret;
 
 	for_each_obj_request_safe(img_req, obj_req, next_obj_req) {
@@ -2087,7 +2096,10 @@ static int __rbd_img_fill_request(struct rbd_img_request *img_req)
 			continue;
 		}
 
-		ret = ceph_osdc_alloc_messages(obj_req->osd_req, GFP_NOIO);
+		osd_req = list_last_entry(&obj_req->osd_reqs,
+					  struct ceph_osd_request,
+					  r_unsafe_item);
+		ret = ceph_osdc_alloc_messages(osd_req, GFP_NOIO);
 		if (ret)
 			return ret;
 	}
@@ -2538,28 +2550,27 @@ static bool is_zero_bvecs(struct bio_vec *bvecs, u32 bytes)
 static int rbd_obj_issue_copyup_empty_snapc(struct rbd_obj_request *obj_req,
 					    u32 bytes)
 {
+	struct ceph_osd_request *osd_req;
 	int ret;
 
 	dout("%s obj_req %p bytes %u\n", __func__, obj_req, bytes);
-	rbd_assert(obj_req->osd_req->r_ops[0].op == CEPH_OSD_OP_STAT);
 	rbd_assert(bytes > 0 && bytes != MODS_ONLY);
-	rbd_osd_req_destroy(obj_req->osd_req);
 
-	obj_req->osd_req = __rbd_osd_req_create(obj_req, &rbd_empty_snapc, 1);
-	if (!obj_req->osd_req)
-		return -ENOMEM;
+	osd_req = __rbd_obj_add_osd_request(obj_req, &rbd_empty_snapc, 1);
+	if (IS_ERR(osd_req))
+		return PTR_ERR(osd_req);
 
-	ret = osd_req_op_cls_init(obj_req->osd_req, 0, "rbd", "copyup");
+	ret = osd_req_op_cls_init(osd_req, 0, "rbd", "copyup");
 	if (ret)
 		return ret;
 
-	osd_req_op_cls_request_data_bvecs(obj_req->osd_req, 0,
+	osd_req_op_cls_request_data_bvecs(osd_req, 0,
 					  obj_req->copyup_bvecs,
 					  obj_req->copyup_bvec_count,
 					  bytes);
-	rbd_osd_req_format_write(obj_req);
+	rbd_osd_format_write(osd_req);
 
-	ret = ceph_osdc_alloc_messages(obj_req->osd_req, GFP_NOIO);
+	ret = ceph_osdc_alloc_messages(osd_req, GFP_NOIO);
 	if (ret)
 		return ret;
 
@@ -2570,14 +2581,12 @@ static int rbd_obj_issue_copyup_empty_snapc(struct rbd_obj_request *obj_req,
 static int rbd_obj_issue_copyup_ops(struct rbd_obj_request *obj_req, u32 bytes)
 {
 	struct rbd_img_request *img_req = obj_req->img_request;
+	struct ceph_osd_request *osd_req;
 	unsigned int num_osd_ops = (bytes != MODS_ONLY);
 	unsigned int which = 0;
 	int ret;
 
 	dout("%s obj_req %p bytes %u\n", __func__, obj_req, bytes);
-	rbd_assert(obj_req->osd_req->r_ops[0].op == CEPH_OSD_OP_STAT ||
-		   obj_req->osd_req->r_ops[0].op == CEPH_OSD_OP_CALL);
-	rbd_osd_req_destroy(obj_req->osd_req);
 
 	switch (img_req->op_type) {
 	case OBJ_OP_WRITE:
@@ -2590,17 +2599,17 @@ static int rbd_obj_issue_copyup_ops(struct rbd_obj_request *obj_req, u32 bytes)
 		BUG();
 	}
 
-	obj_req->osd_req = rbd_osd_req_create(obj_req, num_osd_ops);
-	if (!obj_req->osd_req)
-		return -ENOMEM;
+	osd_req = rbd_obj_add_osd_request(obj_req, num_osd_ops);
+	if (IS_ERR(osd_req))
+		return PTR_ERR(osd_req);
 
 	if (bytes != MODS_ONLY) {
-		ret = osd_req_op_cls_init(obj_req->osd_req, which, "rbd",
+		ret = osd_req_op_cls_init(osd_req, which, "rbd",
 					  "copyup");
 		if (ret)
 			return ret;
 
-		osd_req_op_cls_request_data_bvecs(obj_req->osd_req, which++,
+		osd_req_op_cls_request_data_bvecs(osd_req, which++,
 						  obj_req->copyup_bvecs,
 						  obj_req->copyup_bvec_count,
 						  bytes);
@@ -2608,16 +2617,16 @@ static int rbd_obj_issue_copyup_ops(struct rbd_obj_request *obj_req, u32 bytes)
 
 	switch (img_req->op_type) {
 	case OBJ_OP_WRITE:
-		__rbd_obj_setup_write(obj_req, which);
+		__rbd_osd_setup_write_ops(osd_req, which);
 		break;
 	case OBJ_OP_ZEROOUT:
-		__rbd_obj_setup_zeroout(obj_req, which);
+		__rbd_osd_setup_zeroout_ops(osd_req, which);
 		break;
 	default:
 		BUG();
 	}
 
-	ret = ceph_osdc_alloc_messages(obj_req->osd_req, GFP_NOIO);
+	ret = ceph_osdc_alloc_messages(osd_req, GFP_NOIO);
 	if (ret)
 		return ret;
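The pattern introduced here is plain list ownership: every OSD request is
linked into the object request as it is allocated, and a single teardown
loop puts whatever ended up on the list.  A freestanding sketch using the
generic list API (illustrative only -- the real structs are struct
rbd_obj_request and struct ceph_osd_request, and the real put is
ceph_osdc_put_request()):

#include <linux/list.h>
#include <linux/slab.h>

struct my_osd_request {
	struct list_head r_item;	/* stands in for r_unsafe_item */
};

struct my_obj_request {
	struct list_head osd_reqs;	/* all requests ever allocated */
};

/* Link each request at allocation time... */
static void my_obj_add_osd_request(struct my_obj_request *obj_req,
				   struct my_osd_request *osd_req)
{
	list_add_tail(&osd_req->r_item, &obj_req->osd_reqs);
}

/* ...so destroy can put them all, no matter how many were issued. */
static void my_obj_request_destroy(struct my_obj_request *obj_req)
{
	struct my_osd_request *osd_req, *next;

	list_for_each_entry_safe(osd_req, next, &obj_req->osd_reqs, r_item) {
		list_del_init(&osd_req->r_item);
		kfree(osd_req);		/* rbd calls ceph_osdc_put_request() */
	}
}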
From patchwork Tue Jun 25 14:40:58 2019
X-Patchwork-Submitter: Ilya Dryomov
X-Patchwork-Id: 11015765
From: Ilya Dryomov
To: ceph-devel@vger.kernel.org
Cc: Dongsheng Yang
Subject: [PATCH 07/20] rbd: factor out rbd_osd_setup_copyup()
Date: Tue, 25 Jun 2019 16:40:58 +0200
Message-Id: <20190625144111.11270-8-idryomov@gmail.com>
In-Reply-To: <20190625144111.11270-1-idryomov@gmail.com>
References: <20190625144111.11270-1-idryomov@gmail.com>

Signed-off-by: Ilya Dryomov
Reviewed-by: Dongsheng Yang
---
 drivers/block/rbd.c | 29 +++++++++++++++++------------
 1 file changed, 17 insertions(+), 12 deletions(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 5c34fe215c63..e059a8139e4f 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -1861,6 +1861,21 @@ static int rbd_osd_setup_stat(struct ceph_osd_request *osd_req, int which)
 	return 0;
 }
 
+static int rbd_osd_setup_copyup(struct ceph_osd_request *osd_req, int which,
+				u32 bytes)
+{
+	struct rbd_obj_request *obj_req = osd_req->r_priv;
+	int ret;
+
+	ret = osd_req_op_cls_init(osd_req, which, "rbd", "copyup");
+	if (ret)
+		return ret;
+
+	osd_req_op_cls_request_data_bvecs(osd_req, which, obj_req->copyup_bvecs,
+					  obj_req->copyup_bvec_count, bytes);
+	return 0;
+}
+
 static int count_write_ops(struct rbd_obj_request *obj_req)
 {
 	return 2; /* setallochint + write/writefull */
@@ -2560,14 +2575,10 @@ static int rbd_obj_issue_copyup_empty_snapc(struct rbd_obj_request *obj_req,
 	if (IS_ERR(osd_req))
 		return PTR_ERR(osd_req);
 
-	ret = osd_req_op_cls_init(osd_req, 0, "rbd", "copyup");
+	ret = rbd_osd_setup_copyup(osd_req, 0, bytes);
 	if (ret)
 		return ret;
 
-	osd_req_op_cls_request_data_bvecs(osd_req, 0,
-					  obj_req->copyup_bvecs,
-					  obj_req->copyup_bvec_count,
-					  bytes);
 	rbd_osd_format_write(osd_req);
 
 	ret = ceph_osdc_alloc_messages(osd_req, GFP_NOIO);
@@ -2604,15 +2615,9 @@ static int rbd_obj_issue_copyup_ops(struct rbd_obj_request *obj_req, u32 bytes)
 		return PTR_ERR(osd_req);
 
 	if (bytes != MODS_ONLY) {
-		ret = osd_req_op_cls_init(osd_req, which, "rbd",
-					  "copyup");
+		ret = rbd_osd_setup_copyup(osd_req, which++, bytes);
 		if (ret)
 			return ret;
-
-		osd_req_op_cls_request_data_bvecs(osd_req, which++,
-						  obj_req->copyup_bvecs,
-						  obj_req->copyup_bvec_count,
-						  bytes);
 	}
 
 	switch (img_req->op_type) {
From patchwork Tue Jun 25 14:40:59 2019
X-Patchwork-Submitter: Ilya Dryomov
X-Patchwork-Id: 11015767
From: Ilya Dryomov
To: ceph-devel@vger.kernel.org
Cc: Dongsheng Yang
Subject: [PATCH 08/20] rbd: factor out __rbd_osd_setup_discard_ops()
Date: Tue, 25 Jun 2019 16:40:59 +0200
Message-Id: <20190625144111.11270-9-idryomov@gmail.com>
In-Reply-To: <20190625144111.11270-1-idryomov@gmail.com>
References: <20190625144111.11270-1-idryomov@gmail.com>

With obj_req->xferred removed, obj_req->ex.oe_off and obj_req->ex.oe_len
can be updated if required for alignment.
Previously, the new offset and length were not stored anywhere beyond
rbd_obj_setup_discard().

Signed-off-by: Ilya Dryomov
Reviewed-by: Dongsheng Yang
---
 drivers/block/rbd.c | 43 +++++++++++++++++++++++++++----------------
 1 file changed, 27 insertions(+), 16 deletions(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index e059a8139e4f..acc9017034d7 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -1943,12 +1943,27 @@ static u16 truncate_or_zero_opcode(struct rbd_obj_request *obj_req)
 			CEPH_OSD_OP_ZERO;
 }
 
+static void __rbd_osd_setup_discard_ops(struct ceph_osd_request *osd_req,
+					int which)
+{
+	struct rbd_obj_request *obj_req = osd_req->r_priv;
+
+	if (rbd_obj_is_entire(obj_req) && !obj_req->num_img_extents) {
+		rbd_assert(obj_req->flags & RBD_OBJ_FLAG_DELETION);
+		osd_req_op_init(osd_req, which, CEPH_OSD_OP_DELETE, 0);
+	} else {
+		osd_req_op_extent_init(osd_req, which,
+				       truncate_or_zero_opcode(obj_req),
+				       obj_req->ex.oe_off, obj_req->ex.oe_len,
+				       0, 0);
+	}
+}
+
 static int rbd_obj_setup_discard(struct rbd_obj_request *obj_req)
 {
 	struct rbd_device *rbd_dev = obj_req->img_request->rbd_dev;
 	struct ceph_osd_request *osd_req;
-	u64 off = obj_req->ex.oe_off;
-	u64 next_off = obj_req->ex.oe_off + obj_req->ex.oe_len;
+	u64 off, next_off;
 	int ret;
 
 	/*
@@ -1961,10 +1976,17 @@ static int rbd_obj_setup_discard(struct rbd_obj_request *obj_req)
 	 */
 	if (rbd_dev->opts->alloc_size != rbd_dev->layout.object_size ||
 	    !rbd_obj_is_tail(obj_req)) {
-		off = round_up(off, rbd_dev->opts->alloc_size);
-		next_off = round_down(next_off, rbd_dev->opts->alloc_size);
+		off = round_up(obj_req->ex.oe_off, rbd_dev->opts->alloc_size);
+		next_off = round_down(obj_req->ex.oe_off + obj_req->ex.oe_len,
+				      rbd_dev->opts->alloc_size);
 		if (off >= next_off)
 			return 1;
+
+		dout("%s %p %llu~%llu -> %llu~%llu\n", __func__,
+		     obj_req, obj_req->ex.oe_off, obj_req->ex.oe_len,
+		     off, next_off - off);
+		obj_req->ex.oe_off = off;
+		obj_req->ex.oe_len = next_off - off;
 	}
 
 	/* reverse map the entire object onto the parent */
@@ -1979,19 +2001,8 @@ static int rbd_obj_setup_discard(struct rbd_obj_request *obj_req)
 	if (IS_ERR(osd_req))
 		return PTR_ERR(osd_req);
 
-	if (rbd_obj_is_entire(obj_req) && !obj_req->num_img_extents) {
-		rbd_assert(obj_req->flags & RBD_OBJ_FLAG_DELETION);
-		osd_req_op_init(osd_req, 0, CEPH_OSD_OP_DELETE, 0);
-	} else {
-		dout("%s %p %llu~%llu -> %llu~%llu\n", __func__,
-		     obj_req, obj_req->ex.oe_off, obj_req->ex.oe_len,
-		     off, next_off - off);
-		osd_req_op_extent_init(osd_req, 0,
-				       truncate_or_zero_opcode(obj_req),
-				       off, next_off - off, 0, 0);
-	}
-
 	obj_req->write_state = RBD_OBJ_WRITE_START;
+	__rbd_osd_setup_discard_ops(osd_req, 0);
 	rbd_osd_format_write(osd_req);
 	return 0;
 }
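To see what the round_up()/round_down() clamping in rbd_obj_setup_discard()
does, take alloc_size = 64 KiB (it must be a power of two).  A worked
example in plain C (hypothetical numbers, not driver code; the macros below
merely imitate the kernel's power-of-two round_up/round_down):

/* power-of-two rounding, as done by the kernel's round_up/round_down */
#define rdown(x, y)	((x) & ~((u64)(y) - 1))
#define rup(x, y)	rdown((x) + (y) - 1, (y))

/*
 * alloc_size = 65536:
 *
 *   discard 10240~102400 -> next_off = 112640
 *     rup(10240, 65536)    == 65536
 *     rdown(112640, 65536) == 65536   -> off >= next_off, request dropped
 *
 *   discard 10240~194560 -> next_off = 204800
 *     rup(10240, 65536)    == 65536
 *     rdown(204800, 65536) == 196608  -> becomes discard 65536~131072,
 *                                        now stored back in obj_req->ex
 */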
From patchwork Tue Jun 25 14:41:00 2019
X-Patchwork-Submitter: Ilya Dryomov
X-Patchwork-Id: 11015773
From: Ilya Dryomov
To: ceph-devel@vger.kernel.org
Cc: Dongsheng Yang
Subject: [PATCH 09/20] rbd: move OSD request allocation into object request state machines
Date: Tue, 25 Jun 2019 16:41:00 +0200
Message-Id: <20190625144111.11270-10-idryomov@gmail.com>
In-Reply-To: <20190625144111.11270-1-idryomov@gmail.com>
References: <20190625144111.11270-1-idryomov@gmail.com>

Following submission, move initial OSD request allocation into the object
request state machines.  Everything that has to do with OSD requests is
now handled inside the state machine; all that __rbd_img_fill_request()
is left with is initialization.
Signed-off-by: Ilya Dryomov
Reviewed-by: Dongsheng Yang
---
 drivers/block/rbd.c | 214 ++++++++++++++++++++------------------------
 1 file changed, 96 insertions(+), 118 deletions(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index acc9017034d7..61bc20cf1c29 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -1408,15 +1408,13 @@ static inline void rbd_img_obj_request_del(struct rbd_img_request *img_request,
 	rbd_obj_request_put(obj_request);
 }
 
-static void rbd_obj_request_submit(struct rbd_obj_request *obj_request)
+static void rbd_osd_submit(struct ceph_osd_request *osd_req)
 {
-	struct ceph_osd_request *osd_req =
-	    list_last_entry(&obj_request->osd_reqs, struct ceph_osd_request,
-			    r_unsafe_item);
+	struct rbd_obj_request *obj_req = osd_req->r_priv;
 
-	dout("%s %p object_no %016llx %llu~%llu osd_req %p\n", __func__,
-	     obj_request, obj_request->ex.oe_objno, obj_request->ex.oe_off,
-	     obj_request->ex.oe_len, osd_req);
+	dout("%s osd_req %p for obj_req %p objno %llu %llu~%llu\n",
+	     __func__, osd_req, obj_req, obj_req->ex.oe_objno,
+	     obj_req->ex.oe_off, obj_req->ex.oe_len);
 	ceph_osdc_start_request(osd_req->r_osdc, osd_req, false);
 }
 
@@ -1823,17 +1821,6 @@ static void rbd_osd_setup_data(struct ceph_osd_request *osd_req, int which)
 
 static int rbd_obj_setup_read(struct rbd_obj_request *obj_req)
 {
-	struct ceph_osd_request *osd_req;
-
-	osd_req = __rbd_obj_add_osd_request(obj_req, NULL, 1);
-	if (IS_ERR(osd_req))
-		return PTR_ERR(osd_req);
-
-	osd_req_op_extent_init(osd_req, 0, CEPH_OSD_OP_READ,
-			       obj_req->ex.oe_off, obj_req->ex.oe_len, 0, 0);
-	rbd_osd_setup_data(osd_req, 0);
-
-	rbd_osd_format_read(osd_req);
 	obj_req->read_state = RBD_OBJ_READ_START;
 	return 0;
 }
@@ -1876,11 +1863,6 @@ static int rbd_osd_setup_copyup(struct ceph_osd_request *osd_req, int which,
 	return 0;
 }
 
-static int count_write_ops(struct rbd_obj_request *obj_req)
-{
-	return 2; /* setallochint + write/writefull */
-}
-
 static void __rbd_osd_setup_write_ops(struct ceph_osd_request *osd_req,
 				      int which)
 {
@@ -1900,14 +1882,10 @@ static void __rbd_osd_setup_write_ops(struct ceph_osd_request *osd_req,
 	osd_req_op_extent_init(osd_req, which, opcode,
 			       obj_req->ex.oe_off, obj_req->ex.oe_len, 0, 0);
 	rbd_osd_setup_data(osd_req, which);
-
-	rbd_osd_format_write(osd_req);
 }
 
 static int rbd_obj_setup_write(struct rbd_obj_request *obj_req)
 {
-	struct ceph_osd_request *osd_req;
-	unsigned int num_osd_ops, which = 0;
 	int ret;
 
 	/* reverse map the entire object onto the parent */
@@ -1918,22 +1896,7 @@ static int rbd_obj_setup_write(struct rbd_obj_request *obj_req)
 	if (rbd_obj_copyup_enabled(obj_req))
 		obj_req->flags |= RBD_OBJ_FLAG_COPYUP_ENABLED;
 
-	num_osd_ops = count_write_ops(obj_req);
-	if (obj_req->flags & RBD_OBJ_FLAG_COPYUP_ENABLED)
-		num_osd_ops++; /* stat */
-
-	osd_req = rbd_obj_add_osd_request(obj_req, num_osd_ops);
-	if (IS_ERR(osd_req))
-		return PTR_ERR(osd_req);
-
-	if (obj_req->flags & RBD_OBJ_FLAG_COPYUP_ENABLED) {
-		ret = rbd_osd_setup_stat(osd_req, which++);
-		if (ret)
-			return ret;
-	}
-
-	obj_req->write_state = RBD_OBJ_WRITE_START;
-	__rbd_osd_setup_write_ops(osd_req, which);
+	obj_req->write_state = RBD_OBJ_WRITE_START;
 	return 0;
 }
 
@@ -1962,7 +1925,6 @@ static void __rbd_osd_setup_discard_ops(struct ceph_osd_request *osd_req,
 static int rbd_obj_setup_discard(struct rbd_obj_request *obj_req)
 {
 	struct rbd_device *rbd_dev = obj_req->img_request->rbd_dev;
-	struct ceph_osd_request *osd_req;
 	u64 off, next_off;
 	int ret;
 
@@ -1997,29 +1959,10 @@ static int rbd_obj_setup_discard(struct rbd_obj_request *obj_req)
 	if (rbd_obj_is_entire(obj_req) && !obj_req->num_img_extents)
 		obj_req->flags |= RBD_OBJ_FLAG_DELETION;
 
-	osd_req = rbd_obj_add_osd_request(obj_req, 1);
-	if (IS_ERR(osd_req))
-		return PTR_ERR(osd_req);
-
 	obj_req->write_state = RBD_OBJ_WRITE_START;
-	__rbd_osd_setup_discard_ops(osd_req, 0);
-	rbd_osd_format_write(osd_req);
 	return 0;
 }
 
-static int count_zeroout_ops(struct rbd_obj_request *obj_req)
-{
-	int num_osd_ops;
-
-	if (rbd_obj_is_entire(obj_req) && obj_req->num_img_extents &&
-	    !rbd_obj_copyup_enabled(obj_req))
-		num_osd_ops = 2; /* create + truncate */
-	else
-		num_osd_ops = 1; /* delete/truncate/zero */
-
-	return num_osd_ops;
-}
-
 static void __rbd_osd_setup_zeroout_ops(struct ceph_osd_request *osd_req,
 					int which)
 {
@@ -2046,14 +1989,10 @@ static void __rbd_osd_setup_zeroout_ops(struct ceph_osd_request *osd_req,
 
 		osd_req_op_extent_init(osd_req, which, opcode,
 				       obj_req->ex.oe_off, obj_req->ex.oe_len,
 				       0, 0);
-
-	rbd_osd_format_write(osd_req);
 }
 
 static int rbd_obj_setup_zeroout(struct rbd_obj_request *obj_req)
 {
-	struct ceph_osd_request *osd_req;
-	unsigned int num_osd_ops, which = 0;
 	int ret;
 
 	/* reverse map the entire object onto the parent */
@@ -2068,34 +2007,56 @@ static int rbd_obj_setup_zeroout(struct rbd_obj_request *obj_req)
 		obj_req->flags |= RBD_OBJ_FLAG_DELETION;
 	}
 
-	num_osd_ops = count_zeroout_ops(obj_req);
-	if (obj_req->flags & RBD_OBJ_FLAG_COPYUP_ENABLED)
-		num_osd_ops++; /* stat */
+	obj_req->write_state = RBD_OBJ_WRITE_START;
+	return 0;
+}
 
-	osd_req = rbd_obj_add_osd_request(obj_req, num_osd_ops);
-	if (IS_ERR(osd_req))
-		return PTR_ERR(osd_req);
+static int count_write_ops(struct rbd_obj_request *obj_req)
+{
+	switch (obj_req->img_request->op_type) {
+	case OBJ_OP_WRITE:
+		return 2; /* setallochint + write/writefull */
+	case OBJ_OP_DISCARD:
+		return 1; /* delete/truncate/zero */
+	case OBJ_OP_ZEROOUT:
+		if (rbd_obj_is_entire(obj_req) && obj_req->num_img_extents &&
+		    !(obj_req->flags & RBD_OBJ_FLAG_COPYUP_ENABLED))
+			return 2; /* create + truncate */
 
-	if (obj_req->flags & RBD_OBJ_FLAG_COPYUP_ENABLED) {
-		ret = rbd_osd_setup_stat(osd_req, which++);
-		if (ret)
-			return ret;
+		return 1; /* delete/truncate/zero */
+	default:
+		BUG();
 	}
+}
 
-	obj_req->write_state = RBD_OBJ_WRITE_START;
-	__rbd_osd_setup_zeroout_ops(osd_req, which);
-	return 0;
+static void rbd_osd_setup_write_ops(struct ceph_osd_request *osd_req,
+				    int which)
+{
+	struct rbd_obj_request *obj_req = osd_req->r_priv;
+
+	switch (obj_req->img_request->op_type) {
+	case OBJ_OP_WRITE:
+		__rbd_osd_setup_write_ops(osd_req, which);
+		break;
+	case OBJ_OP_DISCARD:
+		__rbd_osd_setup_discard_ops(osd_req, which);
+		break;
+	case OBJ_OP_ZEROOUT:
+		__rbd_osd_setup_zeroout_ops(osd_req, which);
+		break;
+	default:
+		BUG();
+	}
 }
 
 /*
- * For each object request in @img_req, allocate an OSD request, add
- * individual OSD ops and prepare them for submission.  The number of
- * OSD ops depends on op_type and the overlap point (if any).
+ * Prune the list of object requests (adjust offset and/or length, drop
+ * redundant requests).  Prepare object request state machines and image
+ * request state machine for execution.
  */
 static int __rbd_img_fill_request(struct rbd_img_request *img_req)
 {
 	struct rbd_obj_request *obj_req, *next_obj_req;
-	struct ceph_osd_request *osd_req;
 	int ret;
 
 	for_each_obj_request_safe(img_req, obj_req, next_obj_req) {
@@ -2121,13 +2082,6 @@ static int __rbd_img_fill_request(struct rbd_img_request *img_req)
 			rbd_img_obj_request_del(img_req, obj_req);
 			continue;
 		}
-
-		osd_req = list_last_entry(&obj_req->osd_reqs,
-					  struct ceph_osd_request,
-					  r_unsafe_item);
-		ret = ceph_osdc_alloc_messages(osd_req, GFP_NOIO);
-		if (ret)
-			return ret;
 	}
 
 	img_req->state = RBD_IMG_START;
@@ -2436,7 +2390,23 @@ static void rbd_img_schedule(struct rbd_img_request *img_req, int result)
 
 static int rbd_obj_read_object(struct rbd_obj_request *obj_req)
 {
-	rbd_obj_request_submit(obj_req);
+	struct ceph_osd_request *osd_req;
+	int ret;
+
+	osd_req = __rbd_obj_add_osd_request(obj_req, NULL, 1);
+	if (IS_ERR(osd_req))
+		return PTR_ERR(osd_req);
+
+	osd_req_op_extent_init(osd_req, 0, CEPH_OSD_OP_READ,
+			       obj_req->ex.oe_off, obj_req->ex.oe_len, 0, 0);
+	rbd_osd_setup_data(osd_req, 0);
+	rbd_osd_format_read(osd_req);
+
+	ret = ceph_osdc_alloc_messages(osd_req, GFP_NOIO);
+	if (ret)
+		return ret;
+
+	rbd_osd_submit(osd_req);
 	return 0;
 }
 
@@ -2549,7 +2519,32 @@ static bool rbd_obj_advance_read(struct rbd_obj_request *obj_req, int *result)
 
 static int rbd_obj_write_object(struct rbd_obj_request *obj_req)
 {
-	rbd_obj_request_submit(obj_req);
+	struct ceph_osd_request *osd_req;
+	int num_ops = count_write_ops(obj_req);
+	int which = 0;
+	int ret;
+
+	if (obj_req->flags & RBD_OBJ_FLAG_COPYUP_ENABLED)
+		num_ops++; /* stat */
+
+	osd_req = rbd_obj_add_osd_request(obj_req, num_ops);
+	if (IS_ERR(osd_req))
+		return PTR_ERR(osd_req);
+
+	if (obj_req->flags & RBD_OBJ_FLAG_COPYUP_ENABLED) {
+		ret = rbd_osd_setup_stat(osd_req, which++);
+		if (ret)
+			return ret;
+	}
+
+	rbd_osd_setup_write_ops(osd_req, which);
+	rbd_osd_format_write(osd_req);
+
+	ret = ceph_osdc_alloc_messages(osd_req, GFP_NOIO);
+	if (ret)
+		return ret;
+
+	rbd_osd_submit(osd_req);
 	return 0;
 }
 
@@ -2596,32 +2591,23 @@ static int rbd_obj_issue_copyup_empty_snapc(struct rbd_obj_request *obj_req,
 	if (ret)
 		return ret;
 
-	rbd_obj_request_submit(obj_req);
+	rbd_osd_submit(osd_req);
 	return 0;
 }
 
 static int rbd_obj_issue_copyup_ops(struct rbd_obj_request *obj_req, u32 bytes)
 {
-	struct rbd_img_request *img_req = obj_req->img_request;
 	struct ceph_osd_request *osd_req;
-	unsigned int num_osd_ops = (bytes != MODS_ONLY);
-	unsigned int which = 0;
+	int num_ops = count_write_ops(obj_req);
+	int which = 0;
 	int ret;
 
 	dout("%s obj_req %p bytes %u\n", __func__, obj_req, bytes);
 
-	switch (img_req->op_type) {
-	case OBJ_OP_WRITE:
-		num_osd_ops += count_write_ops(obj_req);
-		break;
-	case OBJ_OP_ZEROOUT:
-		num_osd_ops += count_zeroout_ops(obj_req);
-		break;
-	default:
-		BUG();
-	}
+	if (bytes != MODS_ONLY)
+		num_ops++; /* copyup */
 
-	osd_req = rbd_obj_add_osd_request(obj_req, num_osd_ops);
+	osd_req = rbd_obj_add_osd_request(obj_req, num_ops);
 	if (IS_ERR(osd_req))
 		return PTR_ERR(osd_req);
 
@@ -2631,22 +2617,14 @@ static int rbd_obj_issue_copyup_ops(struct rbd_obj_request *obj_req, u32 bytes)
 			return ret;
 	}
 
-	switch (img_req->op_type) {
-	case OBJ_OP_WRITE:
-		__rbd_osd_setup_write_ops(osd_req, which);
-		break;
-	case OBJ_OP_ZEROOUT:
-		__rbd_osd_setup_zeroout_ops(osd_req, which);
-		break;
-	default:
-		BUG();
-	}
+	rbd_osd_setup_write_ops(osd_req, which);
+	rbd_osd_format_write(osd_req);
 
 	ret = ceph_osdc_alloc_messages(osd_req, GFP_NOIO);
 	if (ret)
 		return ret;
 
-	rbd_obj_request_submit(obj_req);
+	rbd_osd_submit(osd_req);
 	return 0;
 }
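With this patch every OSD request goes through the same lifecycle at
submission time, entirely inside the state machine: allocate and link, add
ops, format, allocate messages, submit.  The following condensed
restatement of rbd_obj_write_object() above (error paths shortened, the
guarded-write stat op omitted) shows the shape:

static int rbd_obj_write_object_sketch(struct rbd_obj_request *obj_req)
{
	struct ceph_osd_request *osd_req;
	int ret;

	/* 1. allocate the OSD request and link it into obj_req->osd_reqs */
	osd_req = rbd_obj_add_osd_request(obj_req, count_write_ops(obj_req));
	if (IS_ERR(osd_req))
		return PTR_ERR(osd_req);

	/* 2. add the individual ops; 3. set flags, snapc and mtime */
	rbd_osd_setup_write_ops(osd_req, 0);
	rbd_osd_format_write(osd_req);

	/* 4. allocate the wire messages */
	ret = ceph_osdc_alloc_messages(osd_req, GFP_NOIO);
	if (ret)
		return ret;

	/* 5. send it off */
	rbd_osd_submit(osd_req);
	return 0;
}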
From patchwork Tue Jun 25 14:41:01 2019
X-Patchwork-Submitter: Ilya Dryomov
X-Patchwork-Id: 11015771
From: Ilya Dryomov
To: ceph-devel@vger.kernel.org
Cc: Dongsheng Yang
Subject: [PATCH 10/20] rbd: rename rbd_obj_setup_*() to rbd_obj_init_*()
Date: Tue, 25 Jun 2019 16:41:01 +0200
Message-Id: <20190625144111.11270-11-idryomov@gmail.com>
In-Reply-To: <20190625144111.11270-1-idryomov@gmail.com>
References: <20190625144111.11270-1-idryomov@gmail.com>

These functions don't allocate and set up OSD requests anymore.

Signed-off-by: Ilya Dryomov
Reviewed-by: Dongsheng Yang
---
 drivers/block/rbd.c | 26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 61bc20cf1c29..2bafdee61dbd 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -1819,12 +1819,6 @@ static void rbd_osd_setup_data(struct ceph_osd_request *osd_req, int which)
 	}
 }
 
-static int rbd_obj_setup_read(struct rbd_obj_request *obj_req)
-{
-	obj_req->read_state = RBD_OBJ_READ_START;
-	return 0;
-}
-
 static int rbd_osd_setup_stat(struct ceph_osd_request *osd_req, int which)
 {
 	struct page **pages;
@@ -1863,6 +1857,12 @@ static int rbd_osd_setup_copyup(struct ceph_osd_request *osd_req, int which,
 	return 0;
 }
 
+static int rbd_obj_init_read(struct rbd_obj_request *obj_req)
+{
+	obj_req->read_state = RBD_OBJ_READ_START;
+	return 0;
+}
+
 static void __rbd_osd_setup_write_ops(struct ceph_osd_request *osd_req,
 				      int which)
 {
@@ -1884,7 +1884,7 @@ static void __rbd_osd_setup_write_ops(struct ceph_osd_request *osd_req,
 	rbd_osd_setup_data(osd_req, which);
 }
 
-static int rbd_obj_setup_write(struct rbd_obj_request *obj_req)
+static int rbd_obj_init_write(struct rbd_obj_request *obj_req)
 {
 	int ret;
 
@@ -1922,7 +1922,7 @@ static void __rbd_osd_setup_discard_ops(struct ceph_osd_request *osd_req,
 	}
 }
 
-static int rbd_obj_setup_discard(struct rbd_obj_request *obj_req)
+static int rbd_obj_init_discard(struct rbd_obj_request *obj_req)
 {
 	struct rbd_device *rbd_dev = obj_req->img_request->rbd_dev;
 	u64 off, next_off;
@@ -1991,7 +1991,7 @@ static void __rbd_osd_setup_zeroout_ops(struct ceph_osd_request *osd_req,
 			       0, 0);
 }
 
-static int rbd_obj_setup_zeroout(struct rbd_obj_request *obj_req)
+static int rbd_obj_init_zeroout(struct rbd_obj_request *obj_req)
 {
 	int ret;
 
@@ -2062,16 +2062,16 @@ static int __rbd_img_fill_request(struct rbd_img_request *img_req)
 	for_each_obj_request_safe(img_req, obj_req, next_obj_req) {
 		switch (img_req->op_type) {
 		case OBJ_OP_READ:
-			ret = rbd_obj_setup_read(obj_req);
+			ret = rbd_obj_init_read(obj_req);
 			break;
 		case OBJ_OP_WRITE:
-			ret = rbd_obj_setup_write(obj_req);
+			ret = rbd_obj_init_write(obj_req);
 			break;
 		case OBJ_OP_DISCARD:
-			ret = rbd_obj_setup_discard(obj_req);
+			ret = rbd_obj_init_discard(obj_req);
 			break;
 		case OBJ_OP_ZEROOUT:
-			ret = rbd_obj_setup_zeroout(obj_req);
+			ret = rbd_obj_init_zeroout(obj_req);
 			break;
 		default:
 			BUG();
(Postfix) with ESMTP id 60DB01575 for ; Tue, 25 Jun 2019 14:41:24 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 52B4128A1D for ; Tue, 25 Jun 2019 14:41:24 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 4722328C10; Tue, 25 Jun 2019 14:41:24 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.0 required=2.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FROM,MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 9458E28C01 for ; Tue, 25 Jun 2019 14:41:23 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731613AbfFYOlV (ORCPT ); Tue, 25 Jun 2019 10:41:21 -0400 Received: from mail-wr1-f67.google.com ([209.85.221.67]:36354 "EHLO mail-wr1-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730925AbfFYOlV (ORCPT ); Tue, 25 Jun 2019 10:41:21 -0400 Received: by mail-wr1-f67.google.com with SMTP id n4so16989786wrs.3 for ; Tue, 25 Jun 2019 07:41:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=PY1iG4ZApMiK2oqW/oz2XZSAfCRbFDK3sQtPxnkSvLo=; b=AIdQjAlA6wWjS3adHK9YRO74svLgsHUX21GqEHQtrogi4K6JlVYl+f79gK3OURSfPh oqOQs+qL8Pr9sSQz1wKXmoG3DCeYXrKX4y46f3MVUoKnVZfHqJoUL+nOw8U7Ys3SRhCg UL3wQ27yYyQ7lCWe67OgPdmzMapyQUFfbs9uHE2BVLsrRXBUXQisqNtS4t+TpPfP8KJm 75CIsz/imTYpF8sq7DI8P9V12ZW9dYmU+lTtFzmxsosN/Mxu4l2PBcubv148YdC3SHXb szsnbq12GWKt5c7EtGs5Onuj+hRC+7NRUxJ/B5h/KwMOa66yJIlvAOSoR/UBMnrjDhOl JA+Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=PY1iG4ZApMiK2oqW/oz2XZSAfCRbFDK3sQtPxnkSvLo=; b=ZcAaPong+Rli1O04nhPNPSoqd66drKKSBMm+jjiNWrw0XdD1FqJLBmF5bd2pUFU6g9 XeCamwvj0EnzC6/8QcividdLNIAxFqlMU+3KFlg1tQN49c63d0lhhSV7wiS2mPt8KDOk ZqQ3PcOa8kRbNT0mVoXMfOLJCN23sqkQ538yVSd2ZZJ5Feqm/HgMcoT/lQjV8ly1rzQ6 ZhsMyHyaX4pvNgy9feKj54JKVZxxxXCCkKOF1LMOJJg/Zl9iOTKGZ5VVF6SzksySQYyh TF27gH6bYR/5lsgxKPglsjoGuyhscG2WYfY1uSD+Iy+FdHTvSBFMWzVflBP/WI7L2e76 OZGg== X-Gm-Message-State: APjAAAWa4jCHk9Ls/rZzqC20p0bOV01Emss4bToXc8DsCYS55YbWBLED iNErzpyM3XTinySQyIh2SG9NML9tiH4= X-Google-Smtp-Source: APXvYqxpiW/cJar2ar/WFpLaxL1cWtdTKQcoyU5wrLlgYUtl+ADmH7stkwqMnvjJmJairZEU8qf/Mg== X-Received: by 2002:a5d:46ce:: with SMTP id g14mr1647063wrs.203.1561473679018; Tue, 25 Jun 2019 07:41:19 -0700 (PDT) Received: from kwango.redhat.com (ovpn-brq.redhat.com. 
From patchwork Tue Jun 25 14:41:02 2019
X-Patchwork-Submitter: Ilya Dryomov
X-Patchwork-Id: 11015775
From: Ilya Dryomov
To: ceph-devel@vger.kernel.org
Cc: Dongsheng Yang
Subject: [PATCH 11/20] rbd: introduce copyup state machine
Date: Tue, 25 Jun 2019 16:41:02 +0200
Message-Id: <20190625144111.11270-12-idryomov@gmail.com>
In-Reply-To: <20190625144111.11270-1-idryomov@gmail.com>
References: <20190625144111.11270-1-idryomov@gmail.com>

Both the write and copyup paths will get more complex with object map
support.  Factor the copyup code out into a separate state machine.
While at it, take advantage of the obj_req->osd_reqs list and issue the
empty and current snapc OSD requests together, one after another.

Signed-off-by: Ilya Dryomov
Reviewed-by: Dongsheng Yang
---
 drivers/block/rbd.c | 187 +++++++++++++++++++++++++++++---------------
 1 file changed, 123 insertions(+), 64 deletions(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 2bafdee61dbd..34bd45d336e6 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -226,6 +226,7 @@ enum obj_operation_type {
 
 #define RBD_OBJ_FLAG_DELETION		(1U << 0)
 #define RBD_OBJ_FLAG_COPYUP_ENABLED	(1U << 1)
+#define RBD_OBJ_FLAG_COPYUP_ZEROS	(1U << 2)
 
 enum rbd_obj_read_state {
 	RBD_OBJ_READ_START = 1,
@@ -261,9 +262,15 @@ enum rbd_obj_read_state {
 enum rbd_obj_write_state {
 	RBD_OBJ_WRITE_START = 1,
 	RBD_OBJ_WRITE_OBJECT,
-	RBD_OBJ_WRITE_READ_FROM_PARENT,
-	RBD_OBJ_WRITE_COPYUP_EMPTY_SNAPC,
-	RBD_OBJ_WRITE_COPYUP_OPS,
+	__RBD_OBJ_WRITE_COPYUP,
+	RBD_OBJ_WRITE_COPYUP,
+};
+
+enum rbd_obj_copyup_state {
+	RBD_OBJ_COPYUP_START = 1,
+	RBD_OBJ_COPYUP_READ_PARENT,
+	__RBD_OBJ_COPYUP_WRITE_OBJECT,
+	RBD_OBJ_COPYUP_WRITE_OBJECT,
 };
 
 struct rbd_obj_request {
@@ -286,12 +293,15 @@ struct rbd_obj_request {
 			u32 bvec_idx;
 		};
 	};
+
+	enum rbd_obj_copyup_state copyup_state;
 	struct bio_vec *copyup_bvecs;
 	u32 copyup_bvec_count;
 
 	struct list_head osd_reqs;	/* w/ r_unsafe_item */
 
 	struct mutex state_mutex;
+	struct pending_result pending;
 	struct kref kref;
 };
 
@@ -2568,8 +2578,8 @@ static bool is_zero_bvecs(struct bio_vec *bvecs, u32 bytes)
 
 #define MODS_ONLY	U32_MAX
 
-static int rbd_obj_issue_copyup_empty_snapc(struct rbd_obj_request *obj_req,
-					    u32 bytes)
+static int rbd_obj_copyup_empty_snapc(struct rbd_obj_request *obj_req,
+				      u32 bytes)
 {
 	struct ceph_osd_request *osd_req;
 	int ret;
@@ -2595,7 +2605,8 @@ static int rbd_obj_issue_copyup_empty_snapc(struct rbd_obj_request *obj_req,
 	return 0;
 }
 
-static int rbd_obj_issue_copyup_ops(struct rbd_obj_request *obj_req, u32 bytes)
+static int rbd_obj_copyup_current_snapc(struct rbd_obj_request *obj_req,
+					u32 bytes)
 {
 	struct ceph_osd_request *osd_req;
 	int num_ops = count_write_ops(obj_req);
@@ -2628,33 +2639,6 @@ static int rbd_obj_issue_copyup_ops(struct rbd_obj_request *obj_req, u32 bytes)
 	return 0;
 }
 
-static int rbd_obj_issue_copyup(struct rbd_obj_request *obj_req, u32 bytes)
-{
-	/*
-	 * Only send non-zero copyup data to save some I/O and network
-	 * bandwidth -- zero copyup data is equivalent to the object not
-	 * existing.
- */ - if (is_zero_bvecs(obj_req->copyup_bvecs, bytes)) { - dout("%s obj_req %p detected zeroes\n", __func__, obj_req); - bytes = 0; - } - - if (obj_req->img_request->snapc->num_snaps && bytes > 0) { - /* - * Send a copyup request with an empty snapshot context to - * deep-copyup the object through all existing snapshots. - * A second request with the current snapshot context will be - * sent for the actual modification. - */ - obj_req->write_state = RBD_OBJ_WRITE_COPYUP_EMPTY_SNAPC; - return rbd_obj_issue_copyup_empty_snapc(obj_req, bytes); - } - - obj_req->write_state = RBD_OBJ_WRITE_COPYUP_OPS; - return rbd_obj_issue_copyup_ops(obj_req, bytes); -} - static int setup_copyup_bvecs(struct rbd_obj_request *obj_req, u64 obj_overlap) { u32 i; @@ -2688,7 +2672,7 @@ static int setup_copyup_bvecs(struct rbd_obj_request *obj_req, u64 obj_overlap) * target object up to the overlap point (if any) from the parent, * so we can use it for a copyup. */ -static int rbd_obj_handle_write_guard(struct rbd_obj_request *obj_req) +static int rbd_obj_copyup_read_parent(struct rbd_obj_request *obj_req) { struct rbd_device *rbd_dev = obj_req->img_request->rbd_dev; int ret; @@ -2703,22 +2687,111 @@ static int rbd_obj_handle_write_guard(struct rbd_obj_request *obj_req) * request -- pass MODS_ONLY since the copyup isn't needed * anymore. */ - obj_req->write_state = RBD_OBJ_WRITE_COPYUP_OPS; - return rbd_obj_issue_copyup_ops(obj_req, MODS_ONLY); + return rbd_obj_copyup_current_snapc(obj_req, MODS_ONLY); } ret = setup_copyup_bvecs(obj_req, rbd_obj_img_extents_bytes(obj_req)); if (ret) return ret; - obj_req->write_state = RBD_OBJ_WRITE_READ_FROM_PARENT; return rbd_obj_read_from_parent(obj_req); } +static void rbd_obj_copyup_write_object(struct rbd_obj_request *obj_req) +{ + u32 bytes = rbd_obj_img_extents_bytes(obj_req); + int ret; + + rbd_assert(!obj_req->pending.result && !obj_req->pending.num_pending); + + /* + * Only send non-zero copyup data to save some I/O and network + * bandwidth -- zero copyup data is equivalent to the object not + * existing. + */ + if (obj_req->flags & RBD_OBJ_FLAG_COPYUP_ZEROS) + bytes = 0; + + if (obj_req->img_request->snapc->num_snaps && bytes > 0) { + /* + * Send a copyup request with an empty snapshot context to + * deep-copyup the object through all existing snapshots. + * A second request with the current snapshot context will be + * sent for the actual modification. 
+ */ + ret = rbd_obj_copyup_empty_snapc(obj_req, bytes); + if (ret) { + obj_req->pending.result = ret; + return; + } + + obj_req->pending.num_pending++; + bytes = MODS_ONLY; + } + + ret = rbd_obj_copyup_current_snapc(obj_req, bytes); + if (ret) { + obj_req->pending.result = ret; + return; + } + + obj_req->pending.num_pending++; +} + +static bool rbd_obj_advance_copyup(struct rbd_obj_request *obj_req, int *result) +{ + int ret; + +again: + switch (obj_req->copyup_state) { + case RBD_OBJ_COPYUP_START: + rbd_assert(!*result); + + ret = rbd_obj_copyup_read_parent(obj_req); + if (ret) { + *result = ret; + return true; + } + if (obj_req->num_img_extents) + obj_req->copyup_state = RBD_OBJ_COPYUP_READ_PARENT; + else + obj_req->copyup_state = RBD_OBJ_COPYUP_WRITE_OBJECT; + return false; + case RBD_OBJ_COPYUP_READ_PARENT: + if (*result) + return true; + + if (is_zero_bvecs(obj_req->copyup_bvecs, + rbd_obj_img_extents_bytes(obj_req))) { + dout("%s %p detected zeros\n", __func__, obj_req); + obj_req->flags |= RBD_OBJ_FLAG_COPYUP_ZEROS; + } + + rbd_obj_copyup_write_object(obj_req); + if (!obj_req->pending.num_pending) { + *result = obj_req->pending.result; + obj_req->copyup_state = RBD_OBJ_COPYUP_WRITE_OBJECT; + goto again; + } + obj_req->copyup_state = __RBD_OBJ_COPYUP_WRITE_OBJECT; + return false; + case __RBD_OBJ_COPYUP_WRITE_OBJECT: + if (!pending_result_dec(&obj_req->pending, result)) + return false; + /* fall through */ + case RBD_OBJ_COPYUP_WRITE_OBJECT: + return true; + default: + BUG(); + } +} + static bool rbd_obj_advance_write(struct rbd_obj_request *obj_req, int *result) { + struct rbd_device *rbd_dev = obj_req->img_request->rbd_dev; int ret; +again: switch (obj_req->write_state) { case RBD_OBJ_WRITE_START: rbd_assert(!*result); @@ -2733,12 +2806,10 @@ static bool rbd_obj_advance_write(struct rbd_obj_request *obj_req, int *result) case RBD_OBJ_WRITE_OBJECT: if (*result == -ENOENT) { if (obj_req->flags & RBD_OBJ_FLAG_COPYUP_ENABLED) { - ret = rbd_obj_handle_write_guard(obj_req); - if (ret) { - *result = ret; - return true; - } - return false; + *result = 0; + obj_req->copyup_state = RBD_OBJ_COPYUP_START; + obj_req->write_state = __RBD_OBJ_WRITE_COPYUP; + goto again; } /* * On a non-existent object: @@ -2747,31 +2818,19 @@ static bool rbd_obj_advance_write(struct rbd_obj_request *obj_req, int *result) if (obj_req->flags & RBD_OBJ_FLAG_DELETION) *result = 0; } - /* fall through */ - case RBD_OBJ_WRITE_COPYUP_OPS: - return true; - case RBD_OBJ_WRITE_READ_FROM_PARENT: if (*result) return true; - ret = rbd_obj_issue_copyup(obj_req, - rbd_obj_img_extents_bytes(obj_req)); - if (ret) { - *result = ret; - return true; - } - return false; - case RBD_OBJ_WRITE_COPYUP_EMPTY_SNAPC: + obj_req->write_state = RBD_OBJ_WRITE_COPYUP; + goto again; + case __RBD_OBJ_WRITE_COPYUP: + if (!rbd_obj_advance_copyup(obj_req, result)) + return false; + /* fall through */ + case RBD_OBJ_WRITE_COPYUP: if (*result) - return true; - - obj_req->write_state = RBD_OBJ_WRITE_COPYUP_OPS; - ret = rbd_obj_issue_copyup_ops(obj_req, MODS_ONLY); - if (ret) { - *result = ret; - return true; - } - return false; + rbd_warn(rbd_dev, "copyup failed: %d", *result); + return true; default: BUG(); } From patchwork Tue Jun 25 14:41:03 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ilya Dryomov X-Patchwork-Id: 11015777 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org 
From: Ilya Dryomov To: ceph-devel@vger.kernel.org Cc: Dongsheng Yang Subject: [PATCH 12/20] rbd: lock should be quiesced on reacquire Date: Tue, 25 Jun 2019 16:41:03 +0200 Message-Id: <20190625144111.11270-13-idryomov@gmail.com> In-Reply-To: <20190625144111.11270-1-idryomov@gmail.com> References: <20190625144111.11270-1-idryomov@gmail.com>

Quiesce the exclusive lock at the top of rbd_reacquire_lock() instead of only when ceph_cls_set_cookie() fails. This avoids a deadlock on rbd_dev->lock_rwsem: if rbd_dev->lock_rwsem is needed for I/O completion, set_cookie can hang the ceph-msgr worker thread when the set_cookie reply ends up behind an I/O reply, because, like lock and unlock requests, set_cookie is sent and waited upon with rbd_dev->lock_rwsem held for write.

Signed-off-by: Ilya Dryomov Reviewed-by: Dongsheng Yang --- drivers/block/rbd.c | 35 +++++++++++++++++++++-------------- 1 file changed, 21 insertions(+), 14 deletions(-) diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c index 34bd45d336e6..5fcb4ebd981a 100644 --- a/drivers/block/rbd.c +++ b/drivers/block/rbd.c @@ -3004,6 +3004,7 @@ static void __rbd_lock(struct rbd_device *rbd_dev, const char *cookie) { struct rbd_client_id cid = rbd_get_cid(rbd_dev); + rbd_dev->lock_state = RBD_LOCK_STATE_LOCKED; strcpy(rbd_dev->lock_cookie, cookie); rbd_set_owner_cid(rbd_dev, &cid); queue_work(rbd_dev->task_wq, &rbd_dev->acquired_lock_work); @@ -3028,7 +3029,6 @@ static int rbd_lock(struct rbd_device *rbd_dev) if (ret) return ret; - rbd_dev->lock_state = RBD_LOCK_STATE_LOCKED; __rbd_lock(rbd_dev, cookie); return 0; } @@ -3411,13 +3411,11 @@ static void rbd_acquire_lock(struct work_struct *work) } } -/* - * lock_rwsem must be held for write - */ -static bool rbd_release_lock(struct rbd_device *rbd_dev) +static bool rbd_quiesce_lock(struct rbd_device *rbd_dev) { - dout("%s rbd_dev %p read lock_state %d\n", __func__, rbd_dev, - rbd_dev->lock_state); + dout("%s rbd_dev %p\n", __func__, rbd_dev); + lockdep_assert_held_exclusive(&rbd_dev->lock_rwsem); + if (rbd_dev->lock_state != RBD_LOCK_STATE_LOCKED) return false; @@ -3433,12 +3431,22 @@ static bool rbd_release_lock(struct rbd_device *rbd_dev) up_read(&rbd_dev->lock_rwsem); down_write(&rbd_dev->lock_rwsem); - dout("%s rbd_dev %p write lock_state %d\n", __func__, rbd_dev, - rbd_dev->lock_state); if (rbd_dev->lock_state != RBD_LOCK_STATE_RELEASING) return false; + return true; +} + +/* + * lock_rwsem must be held for write + */ +static void rbd_release_lock(struct rbd_device *rbd_dev) +{ + if (!rbd_quiesce_lock(rbd_dev)) + return; + rbd_unlock(rbd_dev); + /* * Give others a chance to grab the lock - we would re-acquire * almost immediately if we got new IO during ceph_osdc_sync() * otherwise. We need to ack our own notifications, so this * lock_dwork will be requeued from rbd_wait_state_locked() * after wake_requests() in rbd_handle_released_lock().
*/ cancel_delayed_work(&rbd_dev->lock_dwork); - return true; } static void rbd_release_lock_work(struct work_struct *work) @@ -3795,7 +3802,8 @@ static void rbd_reacquire_lock(struct rbd_device *rbd_dev) char cookie[32]; int ret; - WARN_ON(rbd_dev->lock_state != RBD_LOCK_STATE_LOCKED); + if (!rbd_quiesce_lock(rbd_dev)) + return; format_lock_cookie(rbd_dev, cookie); ret = ceph_cls_set_cookie(osdc, &rbd_dev->header_oid, @@ -3811,9 +3819,8 @@ static void rbd_reacquire_lock(struct rbd_device *rbd_dev) * Lock cookie cannot be updated on older OSDs, so do * a manual release and queue an acquire. */ - if (rbd_release_lock(rbd_dev)) - queue_delayed_work(rbd_dev->task_wq, - &rbd_dev->lock_dwork, 0); + rbd_unlock(rbd_dev); + queue_delayed_work(rbd_dev->task_wq, &rbd_dev->lock_dwork, 0); } else { __rbd_lock(rbd_dev, cookie); }
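To make the deadlock concrete, the ordering being avoided is roughly the following (a schematic of the scenario from the commit message above, not code from the patch):

	rbd (reacquire path)                 ceph-msgr worker
	--------------------                 ----------------
	down_write(&rbd_dev->lock_rwsem)
	send set_cookie, wait for reply      dequeues an I/O reply
	                                     completion needs
	                                     down_read(&rbd_dev->lock_rwsem)
	                                     -> blocks on the writer

The set_cookie reply sits behind the blocked I/O reply on the same messenger, so the wait never finishes and the worker never unblocks. Quiescing first guarantees that no I/O replies can still be pending by the time set_cookie is sent with lock_rwsem held for write.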
From patchwork Tue Jun 25 14:41:04 2019 From: Ilya Dryomov To: ceph-devel@vger.kernel.org Cc: Dongsheng Yang Subject: [PATCH 13/20] rbd: quiescing lock should wait for image requests Date: Tue, 25 Jun 2019 16:41:04 +0200 Message-Id: <20190625144111.11270-14-idryomov@gmail.com> In-Reply-To: <20190625144111.11270-1-idryomov@gmail.com> References: <20190625144111.11270-1-idryomov@gmail.com>

Syncing OSD requests doesn't really work. A single image request may consist of multiple object requests, each of which can go through a series of OSD requests (original, copyups, etc). On top of that, the OSD client may be shared with other rbd devices. What we want is to ensure that all in-flight image requests complete. Introduce rbd_dev->running_list and block in RBD_LOCK_STATE_RELEASING until that happens. New OSD requests may be started during this time. Note that __rbd_img_handle_request() acquires rbd_dev->lock_rwsem only if need_exclusive_lock() returns true. This avoids a deadlock similar to the one outlined in the previous commit between unlock and I/O that doesn't require the lock, such as a read with the object-map feature disabled.
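The core of the change is a small track-and-drain pattern. A minimal sketch of the idea (an illustration pieced together from the diff below, not the literal driver code; declarations and locking context are simplified):

	/* I/O side: every started image request is tracked */
	spin_lock(&rbd_dev->lock_lists_lock);
	list_add_tail(&img_req->lock_item, &rbd_dev->running_list);
	spin_unlock(&rbd_dev->lock_lists_lock);

	/* I/O side: the last request out wakes the releaser */
	spin_lock(&rbd_dev->lock_lists_lock);
	list_del_init(&img_req->lock_item);
	need_wakeup = rbd_dev->lock_state == RBD_LOCK_STATE_RELEASING &&
		      list_empty(&rbd_dev->running_list);
	spin_unlock(&rbd_dev->lock_lists_lock);
	if (need_wakeup)
		complete(&rbd_dev->releasing_wait);

	/* release side: flip to RELEASING, then drain the running list */
	rbd_dev->lock_state = RBD_LOCK_STATE_RELEASING;
	if (!list_empty(&rbd_dev->running_list))
		wait_for_completion(&rbd_dev->releasing_wait);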
Signed-off-by: Ilya Dryomov Reviewed-by: Dongsheng Yang --- drivers/block/rbd.c | 104 ++++++++++++++++++++++++++++++++++++++------ 1 file changed, 90 insertions(+), 14 deletions(-) diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c index 5fcb4ebd981a..59d1fef35663 100644 --- a/drivers/block/rbd.c +++ b/drivers/block/rbd.c @@ -331,6 +331,7 @@ struct rbd_img_request { struct rbd_obj_request *obj_request; /* obj req initiator */ }; + struct list_head lock_item; struct list_head object_extents; /* obj_req.ex structs */ struct mutex state_mutex; @@ -410,6 +411,9 @@ struct rbd_device { struct work_struct released_lock_work; struct delayed_work lock_dwork; struct work_struct unlock_work; + spinlock_t lock_lists_lock; + struct list_head running_list; + struct completion releasing_wait; wait_queue_head_t lock_waitq; struct workqueue_struct *task_wq; @@ -1726,6 +1730,7 @@ static struct rbd_img_request *rbd_img_request_create( if (rbd_dev_parent_get(rbd_dev)) img_request_layered_set(img_request); + INIT_LIST_HEAD(&img_request->lock_item); INIT_LIST_HEAD(&img_request->object_extents); mutex_init(&img_request->state_mutex); kref_init(&img_request->kref); @@ -1745,6 +1750,7 @@ static void rbd_img_request_destroy(struct kref *kref) dout("%s: img %p\n", __func__, img_request); + WARN_ON(!list_empty(&img_request->lock_item)); for_each_obj_request_safe(img_request, obj_request, next_obj_request) rbd_img_obj_request_del(img_request, obj_request); @@ -2872,6 +2878,50 @@ static void rbd_obj_handle_request(struct rbd_obj_request *obj_req, int result) rbd_img_handle_request(obj_req->img_request, result); } +static bool need_exclusive_lock(struct rbd_img_request *img_req) +{ + struct rbd_device *rbd_dev = img_req->rbd_dev; + + if (!(rbd_dev->header.features & RBD_FEATURE_EXCLUSIVE_LOCK)) + return false; + + if (rbd_dev->spec->snap_id != CEPH_NOSNAP) + return false; + + rbd_assert(!test_bit(IMG_REQ_CHILD, &img_req->flags)); + if (rbd_dev->opts->lock_on_read) + return true; + + return rbd_img_is_write(img_req); +} + +static void rbd_lock_add_request(struct rbd_img_request *img_req) +{ + struct rbd_device *rbd_dev = img_req->rbd_dev; + + lockdep_assert_held(&rbd_dev->lock_rwsem); + spin_lock(&rbd_dev->lock_lists_lock); + rbd_assert(list_empty(&img_req->lock_item)); + list_add_tail(&img_req->lock_item, &rbd_dev->running_list); + spin_unlock(&rbd_dev->lock_lists_lock); +} + +static void rbd_lock_del_request(struct rbd_img_request *img_req) +{ + struct rbd_device *rbd_dev = img_req->rbd_dev; + bool need_wakeup; + + lockdep_assert_held(&rbd_dev->lock_rwsem); + spin_lock(&rbd_dev->lock_lists_lock); + rbd_assert(!list_empty(&img_req->lock_item)); + list_del_init(&img_req->lock_item); + need_wakeup = (rbd_dev->lock_state == RBD_LOCK_STATE_RELEASING && + list_empty(&rbd_dev->running_list)); + spin_unlock(&rbd_dev->lock_lists_lock); + if (need_wakeup) + complete(&rbd_dev->releasing_wait); +} + static void rbd_img_object_requests(struct rbd_img_request *img_req) { struct rbd_obj_request *obj_req; @@ -2927,9 +2977,19 @@ static bool __rbd_img_handle_request(struct rbd_img_request *img_req, struct rbd_device *rbd_dev = img_req->rbd_dev; bool done; - mutex_lock(&img_req->state_mutex); - done = rbd_img_advance(img_req, result); - mutex_unlock(&img_req->state_mutex); + if (need_exclusive_lock(img_req)) { + down_read(&rbd_dev->lock_rwsem); + mutex_lock(&img_req->state_mutex); + done = rbd_img_advance(img_req, result); + if (done) + rbd_lock_del_request(img_req); + mutex_unlock(&img_req->state_mutex); + 
up_read(&rbd_dev->lock_rwsem); + } else { + mutex_lock(&img_req->state_mutex); + done = rbd_img_advance(img_req, result); + mutex_unlock(&img_req->state_mutex); + } if (done && *result) { rbd_assert(*result < 0); @@ -3413,30 +3473,40 @@ static void rbd_acquire_lock(struct work_struct *work) static bool rbd_quiesce_lock(struct rbd_device *rbd_dev) { + bool need_wait; + dout("%s rbd_dev %p\n", __func__, rbd_dev); lockdep_assert_held_exclusive(&rbd_dev->lock_rwsem); if (rbd_dev->lock_state != RBD_LOCK_STATE_LOCKED) return false; - rbd_dev->lock_state = RBD_LOCK_STATE_RELEASING; - downgrade_write(&rbd_dev->lock_rwsem); /* * Ensure that all in-flight IO is flushed. - * - * FIXME: ceph_osdc_sync() flushes the entire OSD client, which - * may be shared with other devices. */ - ceph_osdc_sync(&rbd_dev->rbd_client->client->osdc); + rbd_dev->lock_state = RBD_LOCK_STATE_RELEASING; + rbd_assert(!completion_done(&rbd_dev->releasing_wait)); + need_wait = !list_empty(&rbd_dev->running_list); + downgrade_write(&rbd_dev->lock_rwsem); + if (need_wait) + wait_for_completion(&rbd_dev->releasing_wait); up_read(&rbd_dev->lock_rwsem); down_write(&rbd_dev->lock_rwsem); if (rbd_dev->lock_state != RBD_LOCK_STATE_RELEASING) return false; + rbd_assert(list_empty(&rbd_dev->running_list)); return true; } +static void __rbd_release_lock(struct rbd_device *rbd_dev) +{ + rbd_assert(list_empty(&rbd_dev->running_list)); + + rbd_unlock(rbd_dev); +} + /* * lock_rwsem must be held for write */ @@ -3445,7 +3515,7 @@ static void rbd_release_lock(struct rbd_device *rbd_dev) if (!rbd_quiesce_lock(rbd_dev)) return; - rbd_unlock(rbd_dev); + __rbd_release_lock(rbd_dev); /* * Give others a chance to grab the lock - we would re-acquire @@ -3819,7 +3889,7 @@ static void rbd_reacquire_lock(struct rbd_device *rbd_dev) * Lock cookie cannot be updated on older OSDs, so do * a manual release and queue an acquire. 
*/ - rbd_unlock(rbd_dev); + __rbd_release_lock(rbd_dev); queue_delayed_work(rbd_dev->task_wq, &rbd_dev->lock_dwork, 0); } else { __rbd_lock(rbd_dev, cookie); @@ -4085,9 +4155,12 @@ static void rbd_queue_workfn(struct work_struct *work) if (result) goto err_img_request; - rbd_img_handle_request(img_request, 0); - if (must_be_locked) + if (must_be_locked) { + rbd_lock_add_request(img_request); up_read(&rbd_dev->lock_rwsem); + } + + rbd_img_handle_request(img_request, 0); return; err_img_request: @@ -4761,6 +4834,9 @@ static struct rbd_device *__rbd_dev_create(struct rbd_client *rbdc, INIT_WORK(&rbd_dev->released_lock_work, rbd_notify_released_lock); INIT_DELAYED_WORK(&rbd_dev->lock_dwork, rbd_acquire_lock); INIT_WORK(&rbd_dev->unlock_work, rbd_release_lock_work); + spin_lock_init(&rbd_dev->lock_lists_lock); + INIT_LIST_HEAD(&rbd_dev->running_list); + init_completion(&rbd_dev->releasing_wait); init_waitqueue_head(&rbd_dev->lock_waitq); rbd_dev->dev.bus = &rbd_bus_type; @@ -5777,7 +5853,7 @@ static void rbd_dev_image_unlock(struct rbd_device *rbd_dev) { down_write(&rbd_dev->lock_rwsem); if (__rbd_is_lock_owner(rbd_dev)) - rbd_unlock(rbd_dev); + __rbd_release_lock(rbd_dev); up_write(&rbd_dev->lock_rwsem); } From patchwork Tue Jun 25 14:41:05 2019
From: Ilya Dryomov To: ceph-devel@vger.kernel.org Cc: Dongsheng Yang Subject: [PATCH 14/20] rbd: new exclusive lock wait/wake code Date: Tue, 25 Jun 2019 16:41:05 +0200 Message-Id: <20190625144111.11270-15-idryomov@gmail.com> In-Reply-To: <20190625144111.11270-1-idryomov@gmail.com> References: <20190625144111.11270-1-idryomov@gmail.com>

rbd_wait_state_locked() is built around rbd_dev->lock_waitq and blocks rbd worker threads while waiting for the lock, potentially impacting other rbd devices. There is no good way to pass an error code into image request state machines when acquisition fails, hence the blanket use of RBD_DEV_FLAG_BLACKLISTED, along with various other issues. Introduce rbd_dev->acquiring_list and move acquisition into the image request state machine. Use rbd_img_schedule() for kicking the state machine and for passing error codes. No blocking occurs while waiting for the lock, but rbd_dev->lock_rwsem is still held across lock, unlock and set_cookie calls. Always acquire the lock on "rbd map" to avoid associating the latency of acquiring the lock with the first I/O request. A slight regression is that lock_timeout is now respected only if lock acquisition is triggered by "rbd map" and not by I/O. This is somewhat compensated by the fact that we no longer block if the peer refuses to release the lock -- I/O fails with EROFS right away.
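In outline, lock acquisition becomes a regular step of the image request state machine: a request that needs the lock either proceeds immediately or parks itself and is later kicked with the acquisition result. A condensed flow summary (paraphrasing the diff below, not additional code):

	/*
	 * RBD_IMG_START -> rbd_img_exclusive_lock():
	 *   lock already held -> join running_list, advance       (returns 1)
	 *   lock not held     -> join acquiring_list, queue
	 *                        lock_dwork, park state machine    (returns 0)
	 *   opts->exclusive and lock lost -> fail with -EROFS
	 *
	 * On lock acquisition (or failure), wake_requests() hands the
	 * result to each parked request via rbd_img_schedule() and
	 * splices acquiring_list onto running_list.
	 */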
Signed-off-by: Ilya Dryomov Reviewed-by: Dongsheng Yang --- drivers/block/rbd.c | 325 +++++++++++++++++++++++++------------------- 1 file changed, 182 insertions(+), 143 deletions(-) diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c index 59d1fef35663..fd3f248ba9c2 100644 --- a/drivers/block/rbd.c +++ b/drivers/block/rbd.c @@ -312,6 +312,7 @@ enum img_req_flags { enum rbd_img_state { RBD_IMG_START = 1, + RBD_IMG_EXCLUSIVE_LOCK, __RBD_IMG_OBJECT_REQUESTS, RBD_IMG_OBJECT_REQUESTS, }; @@ -412,9 +413,11 @@ struct rbd_device { struct delayed_work lock_dwork; struct work_struct unlock_work; spinlock_t lock_lists_lock; + struct list_head acquiring_list; struct list_head running_list; + struct completion acquire_wait; + int acquire_err; struct completion releasing_wait; - wait_queue_head_t lock_waitq; struct workqueue_struct *task_wq; @@ -442,12 +445,10 @@ struct rbd_device { * Flag bits for rbd_dev->flags: * - REMOVING (which is coupled with rbd_dev->open_count) is protected * by rbd_dev->lock - * - BLACKLISTED is protected by rbd_dev->lock_rwsem */ enum rbd_dev_flags { RBD_DEV_FLAG_EXISTS, /* mapped snapshot has not been deleted */ RBD_DEV_FLAG_REMOVING, /* this mapping is being removed */ - RBD_DEV_FLAG_BLACKLISTED, /* our ceph_client is blacklisted */ }; static DEFINE_MUTEX(client_mutex); /* Serialize client creation */ @@ -500,6 +501,8 @@ static int minor_to_rbd_dev_id(int minor) static bool __rbd_is_lock_owner(struct rbd_device *rbd_dev) { + lockdep_assert_held(&rbd_dev->lock_rwsem); + return rbd_dev->lock_state == RBD_LOCK_STATE_LOCKED || rbd_dev->lock_state == RBD_LOCK_STATE_RELEASING; } @@ -2895,15 +2898,21 @@ static bool need_exclusive_lock(struct rbd_img_request *img_req) return rbd_img_is_write(img_req); } -static void rbd_lock_add_request(struct rbd_img_request *img_req) +static bool rbd_lock_add_request(struct rbd_img_request *img_req) { struct rbd_device *rbd_dev = img_req->rbd_dev; + bool locked; lockdep_assert_held(&rbd_dev->lock_rwsem); + locked = rbd_dev->lock_state == RBD_LOCK_STATE_LOCKED; spin_lock(&rbd_dev->lock_lists_lock); rbd_assert(list_empty(&img_req->lock_item)); - list_add_tail(&img_req->lock_item, &rbd_dev->running_list); + if (!locked) + list_add_tail(&img_req->lock_item, &rbd_dev->acquiring_list); + else + list_add_tail(&img_req->lock_item, &rbd_dev->running_list); spin_unlock(&rbd_dev->lock_lists_lock); + return locked; } static void rbd_lock_del_request(struct rbd_img_request *img_req) @@ -2922,6 +2931,30 @@ static void rbd_lock_del_request(struct rbd_img_request *img_req) complete(&rbd_dev->releasing_wait); } +static int rbd_img_exclusive_lock(struct rbd_img_request *img_req) +{ + struct rbd_device *rbd_dev = img_req->rbd_dev; + + if (!need_exclusive_lock(img_req)) + return 1; + + if (rbd_lock_add_request(img_req)) + return 1; + + if (rbd_dev->opts->exclusive) { + WARN_ON(1); /* lock got released? */ + return -EROFS; + } + + /* + * Note the use of mod_delayed_work() in rbd_acquire_lock() + * and cancel_delayed_work() in wake_requests(). 
+ */ + dout("%s rbd_dev %p queueing lock_dwork\n", __func__, rbd_dev); + queue_delayed_work(rbd_dev->task_wq, &rbd_dev->lock_dwork, 0); + return 0; +} + static void rbd_img_object_requests(struct rbd_img_request *img_req) { struct rbd_obj_request *obj_req; @@ -2944,11 +2977,30 @@ static void rbd_img_object_requests(struct rbd_img_request *img_req) static bool rbd_img_advance(struct rbd_img_request *img_req, int *result) { + struct rbd_device *rbd_dev = img_req->rbd_dev; + int ret; + again: switch (img_req->state) { case RBD_IMG_START: rbd_assert(!*result); + ret = rbd_img_exclusive_lock(img_req); + if (ret < 0) { + *result = ret; + return true; + } + img_req->state = RBD_IMG_EXCLUSIVE_LOCK; + if (ret > 0) + goto again; + return false; + case RBD_IMG_EXCLUSIVE_LOCK: + if (*result) + return true; + + rbd_assert(!need_exclusive_lock(img_req) || + __rbd_is_lock_owner(rbd_dev)); + rbd_img_object_requests(img_req); if (!img_req->pending.num_pending) { *result = img_req->pending.result; @@ -3107,7 +3159,7 @@ static void rbd_unlock(struct rbd_device *rbd_dev) ret = ceph_cls_unlock(osdc, &rbd_dev->header_oid, &rbd_dev->header_oloc, RBD_LOCK_NAME, rbd_dev->lock_cookie); if (ret && ret != -ENOENT) - rbd_warn(rbd_dev, "failed to unlock: %d", ret); + rbd_warn(rbd_dev, "failed to unlock header: %d", ret); /* treat errors as the image is unlocked */ rbd_dev->lock_state = RBD_LOCK_STATE_UNLOCKED; @@ -3234,15 +3286,30 @@ static int rbd_request_lock(struct rbd_device *rbd_dev) goto out; } -static void wake_requests(struct rbd_device *rbd_dev, bool wake_all) +static void wake_requests(struct rbd_device *rbd_dev, int result) { - dout("%s rbd_dev %p wake_all %d\n", __func__, rbd_dev, wake_all); + struct rbd_img_request *img_req; + + dout("%s rbd_dev %p result %d\n", __func__, rbd_dev, result); + lockdep_assert_held_exclusive(&rbd_dev->lock_rwsem); cancel_delayed_work(&rbd_dev->lock_dwork); - if (wake_all) - wake_up_all(&rbd_dev->lock_waitq); - else - wake_up(&rbd_dev->lock_waitq); + if (!completion_done(&rbd_dev->acquire_wait)) { + rbd_assert(list_empty(&rbd_dev->acquiring_list) && + list_empty(&rbd_dev->running_list)); + rbd_dev->acquire_err = result; + complete_all(&rbd_dev->acquire_wait); + return; + } + + list_for_each_entry(img_req, &rbd_dev->acquiring_list, lock_item) { + mutex_lock(&img_req->state_mutex); + rbd_assert(img_req->state == RBD_IMG_EXCLUSIVE_LOCK); + rbd_img_schedule(img_req, result); + mutex_unlock(&img_req->state_mutex); + } + + list_splice_tail_init(&rbd_dev->acquiring_list, &rbd_dev->running_list); } static int get_lock_owner_info(struct rbd_device *rbd_dev, @@ -3357,11 +3424,8 @@ static int rbd_try_lock(struct rbd_device *rbd_dev) goto again; ret = find_watcher(rbd_dev, lockers); - if (ret) { - if (ret > 0) - ret = 0; /* have to request lock */ - goto out; - } + if (ret) + goto out; /* request lock or error */ rbd_warn(rbd_dev, "%s%llu seems dead, breaking lock", ENTITY_NAME(lockers[0].id.name)); @@ -3391,52 +3455,65 @@ static int rbd_try_lock(struct rbd_device *rbd_dev) } /* - * ret is set only if lock_state is RBD_LOCK_STATE_UNLOCKED + * Return: + * 0 - lock acquired + * 1 - caller should call rbd_request_lock() + * <0 - error */ -static enum rbd_lock_state rbd_try_acquire_lock(struct rbd_device *rbd_dev, - int *pret) +static int rbd_try_acquire_lock(struct rbd_device *rbd_dev) { - enum rbd_lock_state lock_state; + int ret; down_read(&rbd_dev->lock_rwsem); dout("%s rbd_dev %p read lock_state %d\n", __func__, rbd_dev, rbd_dev->lock_state); if (__rbd_is_lock_owner(rbd_dev)) { - 
lock_state = rbd_dev->lock_state; up_read(&rbd_dev->lock_rwsem); - return lock_state; + return 0; } up_read(&rbd_dev->lock_rwsem); down_write(&rbd_dev->lock_rwsem); dout("%s rbd_dev %p write lock_state %d\n", __func__, rbd_dev, rbd_dev->lock_state); - if (!__rbd_is_lock_owner(rbd_dev)) { - *pret = rbd_try_lock(rbd_dev); - if (*pret) - rbd_warn(rbd_dev, "failed to acquire lock: %d", *pret); + if (__rbd_is_lock_owner(rbd_dev)) { + up_write(&rbd_dev->lock_rwsem); + return 0; } - lock_state = rbd_dev->lock_state; + ret = rbd_try_lock(rbd_dev); + if (ret < 0) { + rbd_warn(rbd_dev, "failed to lock header: %d", ret); + if (ret == -EBLACKLISTED) + goto out; + + ret = 1; /* request lock anyway */ + } + if (ret > 0) { + up_write(&rbd_dev->lock_rwsem); + return ret; + } + + rbd_assert(rbd_dev->lock_state == RBD_LOCK_STATE_LOCKED); + rbd_assert(list_empty(&rbd_dev->running_list)); + +out: + wake_requests(rbd_dev, ret); up_write(&rbd_dev->lock_rwsem); - return lock_state; + return ret; } static void rbd_acquire_lock(struct work_struct *work) { struct rbd_device *rbd_dev = container_of(to_delayed_work(work), struct rbd_device, lock_dwork); - enum rbd_lock_state lock_state; - int ret = 0; + int ret; dout("%s rbd_dev %p\n", __func__, rbd_dev); again: - lock_state = rbd_try_acquire_lock(rbd_dev, &ret); - if (lock_state != RBD_LOCK_STATE_UNLOCKED || ret == -EBLACKLISTED) { - if (lock_state == RBD_LOCK_STATE_LOCKED) - wake_requests(rbd_dev, true); - dout("%s rbd_dev %p lock_state %d ret %d - done\n", __func__, - rbd_dev, lock_state, ret); + ret = rbd_try_acquire_lock(rbd_dev); + if (ret <= 0) { + dout("%s rbd_dev %p ret %d - done\n", __func__, rbd_dev, ret); return; } @@ -3445,16 +3522,9 @@ static void rbd_acquire_lock(struct work_struct *work) goto again; /* treat this as a dead client */ } else if (ret == -EROFS) { rbd_warn(rbd_dev, "peer will not release lock"); - /* - * If this is rbd_add_acquire_lock(), we want to fail - * immediately -- reuse BLACKLISTED flag. Otherwise we - * want to block. - */ - if (!(rbd_dev->disk->flags & GENHD_FL_UP)) { - set_bit(RBD_DEV_FLAG_BLACKLISTED, &rbd_dev->flags); - /* wake "rbd map --exclusive" process */ - wake_requests(rbd_dev, false); - } + down_write(&rbd_dev->lock_rwsem); + wake_requests(rbd_dev, ret); + up_write(&rbd_dev->lock_rwsem); } else if (ret < 0) { rbd_warn(rbd_dev, "error requesting lock: %d", ret); mod_delayed_work(rbd_dev->task_wq, &rbd_dev->lock_dwork, @@ -3519,10 +3589,10 @@ static void rbd_release_lock(struct rbd_device *rbd_dev) /* * Give others a chance to grab the lock - we would re-acquire - * almost immediately if we got new IO during ceph_osdc_sync() - * otherwise. We need to ack our own notifications, so this - * lock_dwork will be requeued from rbd_wait_state_locked() - * after wake_requests() in rbd_handle_released_lock(). + * almost immediately if we got new IO while draining the running + * list otherwise. We need to ack our own notifications, so this + * lock_dwork will be requeued from rbd_handle_released_lock() by + * way of maybe_kick_acquire(). 
*/ cancel_delayed_work(&rbd_dev->lock_dwork); } @@ -3537,6 +3607,23 @@ static void rbd_release_lock_work(struct work_struct *work) up_write(&rbd_dev->lock_rwsem); } +static void maybe_kick_acquire(struct rbd_device *rbd_dev) +{ + bool have_requests; + + dout("%s rbd_dev %p\n", __func__, rbd_dev); + if (__rbd_is_lock_owner(rbd_dev)) + return; + + spin_lock(&rbd_dev->lock_lists_lock); + have_requests = !list_empty(&rbd_dev->acquiring_list); + spin_unlock(&rbd_dev->lock_lists_lock); + if (have_requests || delayed_work_pending(&rbd_dev->lock_dwork)) { + dout("%s rbd_dev %p kicking lock_dwork\n", __func__, rbd_dev); + mod_delayed_work(rbd_dev->task_wq, &rbd_dev->lock_dwork, 0); + } +} + static void rbd_handle_acquired_lock(struct rbd_device *rbd_dev, u8 struct_v, void **p) { @@ -3566,8 +3653,7 @@ static void rbd_handle_acquired_lock(struct rbd_device *rbd_dev, u8 struct_v, down_read(&rbd_dev->lock_rwsem); } - if (!__rbd_is_lock_owner(rbd_dev)) - wake_requests(rbd_dev, false); + maybe_kick_acquire(rbd_dev); up_read(&rbd_dev->lock_rwsem); } @@ -3599,8 +3685,7 @@ static void rbd_handle_released_lock(struct rbd_device *rbd_dev, u8 struct_v, down_read(&rbd_dev->lock_rwsem); } - if (!__rbd_is_lock_owner(rbd_dev)) - wake_requests(rbd_dev, false); + maybe_kick_acquire(rbd_dev); up_read(&rbd_dev->lock_rwsem); } @@ -3850,7 +3935,6 @@ static void cancel_tasks_sync(struct rbd_device *rbd_dev) static void rbd_unregister_watch(struct rbd_device *rbd_dev) { - WARN_ON(waitqueue_active(&rbd_dev->lock_waitq)); cancel_tasks_sync(rbd_dev); mutex_lock(&rbd_dev->watch_mutex); @@ -3893,6 +3977,7 @@ static void rbd_reacquire_lock(struct rbd_device *rbd_dev) queue_delayed_work(rbd_dev->task_wq, &rbd_dev->lock_dwork, 0); } else { __rbd_lock(rbd_dev, cookie); + wake_requests(rbd_dev, 0); } } @@ -3913,15 +3998,18 @@ static void rbd_reregister_watch(struct work_struct *work) ret = __rbd_register_watch(rbd_dev); if (ret) { rbd_warn(rbd_dev, "failed to reregister watch: %d", ret); - if (ret == -EBLACKLISTED || ret == -ENOENT) { - set_bit(RBD_DEV_FLAG_BLACKLISTED, &rbd_dev->flags); - wake_requests(rbd_dev, true); - } else { + if (ret != -EBLACKLISTED && ret != -ENOENT) { queue_delayed_work(rbd_dev->task_wq, &rbd_dev->watch_dwork, RBD_RETRY_DELAY); + mutex_unlock(&rbd_dev->watch_mutex); + return; } + mutex_unlock(&rbd_dev->watch_mutex); + down_write(&rbd_dev->lock_rwsem); + wake_requests(rbd_dev, ret); + up_write(&rbd_dev->lock_rwsem); return; } @@ -3996,54 +4084,6 @@ static int rbd_obj_method_sync(struct rbd_device *rbd_dev, return ret; } -/* - * lock_rwsem must be held for read - */ -static int rbd_wait_state_locked(struct rbd_device *rbd_dev, bool may_acquire) -{ - DEFINE_WAIT(wait); - unsigned long timeout; - int ret = 0; - - if (test_bit(RBD_DEV_FLAG_BLACKLISTED, &rbd_dev->flags)) - return -EBLACKLISTED; - - if (rbd_dev->lock_state == RBD_LOCK_STATE_LOCKED) - return 0; - - if (!may_acquire) { - rbd_warn(rbd_dev, "exclusive lock required"); - return -EROFS; - } - - do { - /* - * Note the use of mod_delayed_work() in rbd_acquire_lock() - * and cancel_delayed_work() in wake_requests(). 
- */ - dout("%s rbd_dev %p queueing lock_dwork\n", __func__, rbd_dev); - queue_delayed_work(rbd_dev->task_wq, &rbd_dev->lock_dwork, 0); - prepare_to_wait_exclusive(&rbd_dev->lock_waitq, &wait, - TASK_UNINTERRUPTIBLE); - up_read(&rbd_dev->lock_rwsem); - timeout = schedule_timeout(ceph_timeout_jiffies( - rbd_dev->opts->lock_timeout)); - down_read(&rbd_dev->lock_rwsem); - if (test_bit(RBD_DEV_FLAG_BLACKLISTED, &rbd_dev->flags)) { - ret = -EBLACKLISTED; - break; - } - if (!timeout) { - rbd_warn(rbd_dev, "timed out waiting for lock"); - ret = -ETIMEDOUT; - break; - } - } while (rbd_dev->lock_state != RBD_LOCK_STATE_LOCKED); - - finish_wait(&rbd_dev->lock_waitq, &wait); - return ret; -} - static void rbd_queue_workfn(struct work_struct *work) { struct request *rq = blk_mq_rq_from_pdu(work); @@ -4054,7 +4094,6 @@ static void rbd_queue_workfn(struct work_struct *work) u64 length = blk_rq_bytes(rq); enum obj_operation_type op_type; u64 mapping_size; - bool must_be_locked; int result; switch (req_op(rq)) { @@ -4128,21 +4167,10 @@ static void rbd_queue_workfn(struct work_struct *work) goto err_rq; } - must_be_locked = - (rbd_dev->header.features & RBD_FEATURE_EXCLUSIVE_LOCK) && - (op_type != OBJ_OP_READ || rbd_dev->opts->lock_on_read); - if (must_be_locked) { - down_read(&rbd_dev->lock_rwsem); - result = rbd_wait_state_locked(rbd_dev, - !rbd_dev->opts->exclusive); - if (result) - goto err_unlock; - } - img_request = rbd_img_request_create(rbd_dev, op_type, snapc); if (!img_request) { result = -ENOMEM; - goto err_unlock; + goto err_rq; } img_request->rq = rq; snapc = NULL; /* img_request consumes a ref */ @@ -4155,19 +4183,11 @@ static void rbd_queue_workfn(struct work_struct *work) if (result) goto err_img_request; - if (must_be_locked) { - rbd_lock_add_request(img_request); - up_read(&rbd_dev->lock_rwsem); - } - rbd_img_handle_request(img_request, 0); return; err_img_request: rbd_img_request_put(img_request); -err_unlock: - if (must_be_locked) - up_read(&rbd_dev->lock_rwsem); err_rq: if (result) rbd_warn(rbd_dev, "%s %llx at %llx result %d", @@ -4835,9 +4855,10 @@ static struct rbd_device *__rbd_dev_create(struct rbd_client *rbdc, INIT_DELAYED_WORK(&rbd_dev->lock_dwork, rbd_acquire_lock); INIT_WORK(&rbd_dev->unlock_work, rbd_release_lock_work); spin_lock_init(&rbd_dev->lock_lists_lock); + INIT_LIST_HEAD(&rbd_dev->acquiring_list); INIT_LIST_HEAD(&rbd_dev->running_list); + init_completion(&rbd_dev->acquire_wait); init_completion(&rbd_dev->releasing_wait); - init_waitqueue_head(&rbd_dev->lock_waitq); rbd_dev->dev.bus = &rbd_bus_type; rbd_dev->dev.type = &rbd_device_type; @@ -5857,24 +5878,45 @@ static void rbd_dev_image_unlock(struct rbd_device *rbd_dev) up_write(&rbd_dev->lock_rwsem); } +/* + * If the wait is interrupted, an error is returned even if the lock + * was successfully acquired. rbd_dev_image_unlock() will release it + * if needed. 
+ */ static int rbd_add_acquire_lock(struct rbd_device *rbd_dev) { - int ret; + long ret; if (!(rbd_dev->header.features & RBD_FEATURE_EXCLUSIVE_LOCK)) { + if (!rbd_dev->opts->exclusive && !rbd_dev->opts->lock_on_read) + return 0; + rbd_warn(rbd_dev, "exclusive-lock feature is not enabled"); return -EINVAL; } - /* FIXME: "rbd map --exclusive" should be in interruptible */ - down_read(&rbd_dev->lock_rwsem); - ret = rbd_wait_state_locked(rbd_dev, true); - up_read(&rbd_dev->lock_rwsem); + if (rbd_dev->spec->snap_id != CEPH_NOSNAP) + return 0; + + rbd_assert(!rbd_is_lock_owner(rbd_dev)); + queue_delayed_work(rbd_dev->task_wq, &rbd_dev->lock_dwork, 0); + ret = wait_for_completion_killable_timeout(&rbd_dev->acquire_wait, + ceph_timeout_jiffies(rbd_dev->opts->lock_timeout)); + if (ret > 0) + ret = rbd_dev->acquire_err; + else if (!ret) + ret = -ETIMEDOUT; + if (ret) { - rbd_warn(rbd_dev, "failed to acquire exclusive lock"); - return -EROFS; + rbd_warn(rbd_dev, "failed to acquire exclusive lock: %ld", ret); + return ret; } + /* + * The lock may have been released by now, unless automatic lock + * transitions are disabled. + */ + rbd_assert(!rbd_dev->opts->exclusive || rbd_is_lock_owner(rbd_dev)); return 0; } @@ -6319,11 +6361,9 @@ static ssize_t do_rbd_add(struct bus_type *bus, if (rc) goto err_out_image_probe; - if (rbd_dev->opts->exclusive) { - rc = rbd_add_acquire_lock(rbd_dev); - if (rc) - goto err_out_device_setup; - } + rc = rbd_add_acquire_lock(rbd_dev); + if (rc) + goto err_out_image_lock; /* Everything's ready. Announce the disk to the world. */ @@ -6349,7 +6389,6 @@ static ssize_t do_rbd_add(struct bus_type *bus, err_out_image_lock: rbd_dev_image_unlock(rbd_dev); -err_out_device_setup: rbd_dev_device_release(rbd_dev); err_out_image_probe: rbd_dev_image_release(rbd_dev); From patchwork Tue Jun 25 14:41:06 2019
From: Ilya Dryomov To: ceph-devel@vger.kernel.org Cc: Dongsheng Yang Subject: [PATCH 15/20] libceph: bump CEPH_MSG_MAX_DATA_LEN (again) Date: Tue, 25 Jun 2019 16:41:06 +0200 Message-Id: <20190625144111.11270-16-idryomov@gmail.com> In-Reply-To: <20190625144111.11270-1-idryomov@gmail.com> References: <20190625144111.11270-1-idryomov@gmail.com>

This time for the rbd object map. Object maps are limited in size to 256000000 objects at two bits per object, i.e. 256000000 * 2 / 8 = 64000000 bytes, which fits under the new 64M (67108864-byte) cap.

Signed-off-by: Ilya Dryomov Reviewed-by: Dongsheng Yang --- include/linux/ceph/libceph.h | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/include/linux/ceph/libceph.h b/include/linux/ceph/libceph.h index 337d5049ff93..0ae60b55e55a 100644 --- a/include/linux/ceph/libceph.h +++ b/include/linux/ceph/libceph.h @@ -84,11 +84,13 @@ struct ceph_options { #define CEPH_MSG_MAX_MIDDLE_LEN (16*1024*1024) /* - * Handle the largest possible rbd object in one message. + * The largest possible rbd data object is 32M. + * The largest possible rbd object map object is 64M. + * * There is no limit on the size of cephfs objects, but it has to obey * rsize and wsize mount options anyway.
*/ -#define CEPH_MSG_MAX_DATA_LEN (32*1024*1024) +#define CEPH_MSG_MAX_DATA_LEN (64*1024*1024) #define CEPH_AUTH_NAME_DEFAULT "guest"
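As a quick sanity check on that arithmetic (an editorial illustration, not part of the patch; the object count constant is made up for the check, only the 64M value comes from the header above):

	#define RBD_OBJ_MAP_MAX_OBJECTS 256000000ULL /* per the commit message */

	/* 2 bits per object, 8 bits per byte: 64000000 <= 67108864 */
	_Static_assert(RBD_OBJ_MAP_MAX_OBJECTS * 2 / 8 <= 64 * 1024 * 1024,
		       "largest object map must fit in one OSD message");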
From patchwork Tue Jun 25 14:41:07 2019 From: Ilya Dryomov To: ceph-devel@vger.kernel.org Cc: Dongsheng Yang Subject: [PATCH 16/20] libceph: change ceph_osdc_call() to take page vector for response Date: Tue, 25 Jun 2019 16:41:07 +0200 Message-Id: <20190625144111.11270-17-idryomov@gmail.com> In-Reply-To: <20190625144111.11270-1-idryomov@gmail.com> References: <20190625144111.11270-1-idryomov@gmail.com>

This will be used for loading the object map. rbd_obj_read_sync() isn't suitable because the object map must be accessed through class methods.

Signed-off-by: Ilya Dryomov Reviewed-by: Dongsheng Yang Reviewed-by: Jeff Layton --- drivers/block/rbd.c | 8 ++++---- include/linux/ceph/osd_client.h | 2 +- net/ceph/cls_lock_client.c | 2 +- net/ceph/osd_client.c | 10 +++++----- 4 files changed, 11 insertions(+), 11 deletions(-) diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c index fd3f248ba9c2..c9f88b0cb730 100644 --- a/drivers/block/rbd.c +++ b/drivers/block/rbd.c @@ -4072,7 +4072,7 @@ static int rbd_obj_method_sync(struct rbd_device *rbd_dev, ret = ceph_osdc_call(osdc, oid, oloc, RBD_DRV_NAME, method_name, CEPH_OSD_FLAG_READ, req_page, outbound_size, - reply_page, &inbound_size); + &reply_page, &inbound_size); if (!ret) { memcpy(inbound, page_address(reply_page), inbound_size); ret = inbound_size; @@ -5098,7 +5098,7 @@ static int __get_parent_info(struct rbd_device *rbd_dev, ret = ceph_osdc_call(osdc, &rbd_dev->header_oid, &rbd_dev->header_oloc, "rbd", "parent_get", CEPH_OSD_FLAG_READ, - req_page, sizeof(u64), reply_page, &reply_len); + req_page, sizeof(u64), &reply_page, &reply_len); if (ret) return ret == -EOPNOTSUPP ?
1 : ret; @@ -5110,7 +5110,7 @@ static int __get_parent_info(struct rbd_device *rbd_dev, ret = ceph_osdc_call(osdc, &rbd_dev->header_oid, &rbd_dev->header_oloc, "rbd", "parent_overlap_get", CEPH_OSD_FLAG_READ, - req_page, sizeof(u64), reply_page, &reply_len); + req_page, sizeof(u64), &reply_page, &reply_len); if (ret) return ret; @@ -5141,7 +5141,7 @@ static int __get_parent_info_legacy(struct rbd_device *rbd_dev, ret = ceph_osdc_call(osdc, &rbd_dev->header_oid, &rbd_dev->header_oloc, "rbd", "get_parent", CEPH_OSD_FLAG_READ, - req_page, sizeof(u64), reply_page, &reply_len); + req_page, sizeof(u64), &reply_page, &reply_len); if (ret) return ret; diff --git a/include/linux/ceph/osd_client.h b/include/linux/ceph/osd_client.h index 2294f963dab7..edb191c40a5c 100644 --- a/include/linux/ceph/osd_client.h +++ b/include/linux/ceph/osd_client.h @@ -497,7 +497,7 @@ int ceph_osdc_call(struct ceph_osd_client *osdc, const char *class, const char *method, unsigned int flags, struct page *req_page, size_t req_len, - struct page *resp_page, size_t *resp_len); + struct page **resp_pages, size_t *resp_len); extern int ceph_osdc_readpages(struct ceph_osd_client *osdc, struct ceph_vino vino, diff --git a/net/ceph/cls_lock_client.c b/net/ceph/cls_lock_client.c index 4cc28541281b..56bbfe01e3ac 100644 --- a/net/ceph/cls_lock_client.c +++ b/net/ceph/cls_lock_client.c @@ -360,7 +360,7 @@ int ceph_cls_lock_info(struct ceph_osd_client *osdc, dout("%s lock_name %s\n", __func__, lock_name); ret = ceph_osdc_call(osdc, oid, oloc, "lock", "get_info", CEPH_OSD_FLAG_READ, get_info_op_page, - get_info_op_buf_size, reply_page, &reply_len); + get_info_op_buf_size, &reply_page, &reply_len); dout("%s: status %d\n", __func__, ret); if (ret >= 0) { diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c index 9a8eca5eda65..cc2bf296583d 100644 --- a/net/ceph/osd_client.c +++ b/net/ceph/osd_client.c @@ -5044,12 +5044,12 @@ int ceph_osdc_call(struct ceph_osd_client *osdc, const char *class, const char *method, unsigned int flags, struct page *req_page, size_t req_len, - struct page *resp_page, size_t *resp_len) + struct page **resp_pages, size_t *resp_len) { struct ceph_osd_request *req; int ret; - if (req_len > PAGE_SIZE || (resp_page && *resp_len > PAGE_SIZE)) + if (req_len > PAGE_SIZE) return -E2BIG; req = ceph_osdc_alloc_request(osdc, NULL, 1, false, GFP_NOIO); @@ -5067,8 +5067,8 @@ int ceph_osdc_call(struct ceph_osd_client *osdc, if (req_page) osd_req_op_cls_request_data_pages(req, 0, &req_page, req_len, 0, false, false); - if (resp_page) - osd_req_op_cls_response_data_pages(req, 0, &resp_page, + if (resp_pages) + osd_req_op_cls_response_data_pages(req, 0, resp_pages, *resp_len, 0, false, false); ret = ceph_osdc_alloc_messages(req, GFP_NOIO); @@ -5079,7 +5079,7 @@ int ceph_osdc_call(struct ceph_osd_client *osdc, ret = ceph_osdc_wait_request(osdc, req); if (ret >= 0) { ret = req->r_ops[0].rval; - if (resp_page) + if (resp_pages) *resp_len = req->r_ops[0].outdata_len; }
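With the new signature, a caller hands ceph_osdc_call() the address of its reply page (or a longer page vector) rather than a single page. A typical single-page call now looks roughly like this (a sketch modeled on the call sites updated above; allocation and error handling elided):

	struct page *reply_page = alloc_page(GFP_NOIO);
	size_t reply_len = PAGE_SIZE;
	int ret;

	ret = ceph_osdc_call(osdc, oid, oloc, "rbd", "parent_get",
			     CEPH_OSD_FLAG_READ, req_page, sizeof(u64),
			     &reply_page, &reply_len);
	/* on success, reply_len is set to r_ops[0].outdata_len */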
From: Ilya Dryomov
To: ceph-devel@vger.kernel.org
Cc: Dongsheng Yang
Subject: [PATCH 17/20] libceph: export osd_req_op_data() macro
Date: Tue, 25 Jun 2019 16:41:08 +0200
Message-Id: <20190625144111.11270-18-idryomov@gmail.com>
In-Reply-To: <20190625144111.11270-1-idryomov@gmail.com>
References: <20190625144111.11270-1-idryomov@gmail.com>

We already have one exported wrapper around it for extent.osd_data and
rbd_object_map_update_finish() needs another one for cls.request_data.

Signed-off-by: Ilya Dryomov
Reviewed-by: Dongsheng Yang
Reviewed-by: Jeff Layton
---
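For context, here is a sketch of how a module would use the newly
exported macro; it mirrors the cls.request_data access that patch 19's
rbd_object_map_update_finish() performs, with the callback plumbing
elided and the decode step only hinted at.

/*
 * Illustrative use of osd_req_op_data() from module code; not part of
 * this patch.  peek_request_data() is invented for the example.
 */
static void peek_request_data(struct ceph_osd_request *osd_req)
{
	struct ceph_osd_data *osd_data;
	void *p;

	/* op 1 is the object_map_update cls op in the rbd caller */
	osd_data = osd_req_op_data(osd_req, 1, cls, request_data);
	if (osd_data->type != CEPH_OSD_DATA_TYPE_PAGES)
		return;

	/* the payload was encoded into the first page by the sender */
	p = page_address(osd_data->pages[0]);
	pr_debug("cls payload at %p\n", p);
	/* ... ceph_decode_*() calls would walk the buffer here ... */
}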
 include/linux/ceph/osd_client.h | 8 ++++++++
 net/ceph/osd_client.c           | 8 --------
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/include/linux/ceph/osd_client.h b/include/linux/ceph/osd_client.h
index edb191c40a5c..44100a4f0808 100644
--- a/include/linux/ceph/osd_client.h
+++ b/include/linux/ceph/osd_client.h
@@ -389,6 +389,14 @@ extern void ceph_osdc_handle_map(struct ceph_osd_client *osdc,
 void ceph_osdc_update_epoch_barrier(struct ceph_osd_client *osdc, u32 eb);
 void ceph_osdc_abort_requests(struct ceph_osd_client *osdc, int err);

+#define osd_req_op_data(oreq, whch, typ, fld)			\
+({								\
+	struct ceph_osd_request *__oreq = (oreq);		\
+	unsigned int __whch = (whch);				\
+	BUG_ON(__whch >= __oreq->r_num_ops);			\
+	&__oreq->r_ops[__whch].typ.fld;				\
+})
+
 extern void osd_req_op_init(struct ceph_osd_request *osd_req,
 			    unsigned int which, u16 opcode, u32 flags);

diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index cc2bf296583d..22e8ccc1f975 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -171,14 +171,6 @@ static void ceph_osd_data_bvecs_init(struct ceph_osd_data *osd_data,
 	osd_data->num_bvecs = num_bvecs;
 }

-#define osd_req_op_data(oreq, whch, typ, fld)			\
-({								\
-	struct ceph_osd_request *__oreq = (oreq);		\
-	unsigned int __whch = (whch);				\
-	BUG_ON(__whch >= __oreq->r_num_ops);			\
-	&__oreq->r_ops[__whch].typ.fld;				\
-})
-
 static struct ceph_osd_data *
 osd_req_op_raw_data_in(struct ceph_osd_request *osd_req, unsigned int which)
 {

From patchwork Tue Jun 25 14:41:09 2019
From: Ilya Dryomov
To: ceph-devel@vger.kernel.org
Cc: Dongsheng Yang
Subject: [PATCH 18/20] rbd: call rbd_dev_mapping_set() from rbd_dev_image_probe()
Date: Tue, 25 Jun 2019 16:41:09 +0200
Message-Id: <20190625144111.11270-19-idryomov@gmail.com>
In-Reply-To: <20190625144111.11270-1-idryomov@gmail.com>
References: <20190625144111.11270-1-idryomov@gmail.com>

The snapshot object map will be loaded in rbd_dev_image_probe(), so we
need to know the snapshot's size (as opposed to HEAD's size) sooner.
Signed-off-by: Ilya Dryomov
Reviewed-by: Dongsheng Yang
---
 drivers/block/rbd.c | 14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index c9f88b0cb730..671041b67957 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -6014,6 +6014,7 @@ static void rbd_dev_unprobe(struct rbd_device *rbd_dev)
 	struct rbd_image_header *header;

 	rbd_dev_parent_put(rbd_dev);
+	rbd_dev_mapping_clear(rbd_dev);

 	/* Free dynamic fields from the header, then zero it out */

@@ -6114,7 +6115,6 @@ static int rbd_dev_probe_parent(struct rbd_device *rbd_dev, int depth)
 static void rbd_dev_device_release(struct rbd_device *rbd_dev)
 {
 	clear_bit(RBD_DEV_FLAG_EXISTS, &rbd_dev->flags);
-	rbd_dev_mapping_clear(rbd_dev);
 	rbd_free_disk(rbd_dev);
 	if (!single_major)
 		unregister_blkdev(rbd_dev->major, rbd_dev->name);
@@ -6148,23 +6148,17 @@ static int rbd_dev_device_setup(struct rbd_device *rbd_dev)
 	if (ret)
 		goto err_out_blkdev;

-	ret = rbd_dev_mapping_set(rbd_dev);
-	if (ret)
-		goto err_out_disk;
-
 	set_capacity(rbd_dev->disk, rbd_dev->mapping.size / SECTOR_SIZE);
 	set_disk_ro(rbd_dev->disk, rbd_dev->opts->read_only);

 	ret = dev_set_name(&rbd_dev->dev, "%d", rbd_dev->dev_id);
 	if (ret)
-		goto err_out_mapping;
+		goto err_out_disk;

 	set_bit(RBD_DEV_FLAG_EXISTS, &rbd_dev->flags);
 	up_write(&rbd_dev->header_rwsem);
 	return 0;

-err_out_mapping:
-	rbd_dev_mapping_clear(rbd_dev);
 err_out_disk:
 	rbd_free_disk(rbd_dev);
 err_out_blkdev:
@@ -6265,6 +6259,10 @@ static int rbd_dev_image_probe(struct rbd_device *rbd_dev, int depth)
 			goto err_out_probe;
 	}

+	ret = rbd_dev_mapping_set(rbd_dev);
+	if (ret)
+		goto err_out_probe;
+
 	if (rbd_dev->header.features & RBD_FEATURE_LAYERING) {
 		ret = rbd_dev_v2_parent_info(rbd_dev);
 		if (ret)
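The reason the size must be known sooner becomes visible in the next
patch: __rbd_object_map_load() sizes its object map buffer from
rbd_dev->mapping.size. A condensed, hypothetical sketch of the probe
ordering this patch establishes (not the literal function body):

/*
 * Sketch of the ordering only.  After this patch the mapping is set
 * during image probe, so patch 19 can load a snapshot object map that
 * depends on the mapped (snapshot) size.
 */
static int probe_order_sketch(struct rbd_device *rbd_dev)
{
	int ret;

	ret = rbd_dev_mapping_set(rbd_dev);	/* mapping.size valid from here */
	if (ret)
		return ret;

	/*
	 * Patch 19 inserts the object map load right after this point:
	 *   num_objects = ceph_get_num_objects(&rbd_dev->layout,
	 *                                      rbd_dev->mapping.size);
	 */
	return 0;
}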
From patchwork Tue Jun 25 14:41:10 2019

From: Ilya Dryomov
To: ceph-devel@vger.kernel.org
Cc: Dongsheng Yang
Subject: [PATCH 19/20] rbd: support for object-map and fast-diff
Date: Tue, 25 Jun 2019 16:41:10 +0200
Message-Id: <20190625144111.11270-20-idryomov@gmail.com>
In-Reply-To: <20190625144111.11270-1-idryomov@gmail.com>
References: <20190625144111.11270-1-idryomov@gmail.com>

Speed up reads, discards and zeroouts through RBD_OBJ_FLAG_MAY_EXIST
and RBD_OBJ_FLAG_NOOP_FOR_NONEXISTENT based on the object map.

Invalid object maps are not trusted, but still updated.  Note that we
never iterate, resize or invalidate object maps.  If the object-map
feature is enabled but the object map fails to load, we just fail the
requester (either "rbd map" or I/O, by way of the post-acquire action).
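The object map encoding that the code below manipulates is compact
enough to demo in isolation: two bits per object, packed
most-significant pair first within each byte. Here is a standalone
userspace sketch of the same index/shift arithmetic as
__rbd_object_map_index() and friends; the state values follow
rbd_types.h (0 nonexistent, 1 exists, 2 pending, 3 exists-clean). It is
illustrative only, not kernel code.

/* Standalone rendition of the 2-bit-per-object map accessors. */
#include <stdint.h>
#include <stdio.h>

#define BITS_PER_BYTE	8
#define BITS_PER_OBJ	2
#define OBJS_PER_BYTE	(BITS_PER_BYTE / BITS_PER_OBJ)	/* 4 */
#define OBJ_MASK	((1 << BITS_PER_OBJ) - 1)	/* 0x3 */

static uint8_t map_get(const uint8_t *map, uint64_t objno)
{
	uint64_t index = objno / OBJS_PER_BYTE;
	uint8_t shift = (OBJS_PER_BYTE - objno % OBJS_PER_BYTE - 1) *
			BITS_PER_OBJ;

	return (map[index] >> shift) & OBJ_MASK;
}

static void map_set(uint8_t *map, uint64_t objno, uint8_t val)
{
	uint64_t index = objno / OBJS_PER_BYTE;
	uint8_t shift = (OBJS_PER_BYTE - objno % OBJS_PER_BYTE - 1) *
			BITS_PER_OBJ;

	map[index] = (map[index] & ~(OBJ_MASK << shift)) | (val << shift);
}

int main(void)
{
	uint8_t map[2] = { 0 };	/* room for 8 objects */

	map_set(map, 0, 1);	/* OBJECT_EXISTS */
	map_set(map, 5, 3);	/* OBJECT_EXISTS_CLEAN */
	printf("obj0=%u obj5=%u obj7=%u\n",
	       map_get(map, 0), map_get(map, 5), map_get(map, 7));
	/* prints: obj0=1 obj5=3 obj7=0 (OBJECT_NONEXISTENT) */
	return 0;
}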
Signed-off-by: Ilya Dryomov --- drivers/block/rbd.c | 721 ++++++++++++++++++++++++++- drivers/block/rbd_types.h | 10 + include/linux/ceph/cls_lock_client.h | 3 + include/linux/ceph/striper.h | 2 + net/ceph/cls_lock_client.c | 45 ++ net/ceph/striper.c | 17 + 6 files changed, 795 insertions(+), 3 deletions(-) diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c index 671041b67957..756595f5fbc9 100644 --- a/drivers/block/rbd.c +++ b/drivers/block/rbd.c @@ -115,6 +115,8 @@ static int atomic_dec_return_safe(atomic_t *v) #define RBD_FEATURE_LAYERING (1ULL<<0) #define RBD_FEATURE_STRIPINGV2 (1ULL<<1) #define RBD_FEATURE_EXCLUSIVE_LOCK (1ULL<<2) +#define RBD_FEATURE_OBJECT_MAP (1ULL<<3) +#define RBD_FEATURE_FAST_DIFF (1ULL<<4) #define RBD_FEATURE_DEEP_FLATTEN (1ULL<<5) #define RBD_FEATURE_DATA_POOL (1ULL<<7) #define RBD_FEATURE_OPERATIONS (1ULL<<8) @@ -122,6 +124,8 @@ static int atomic_dec_return_safe(atomic_t *v) #define RBD_FEATURES_ALL (RBD_FEATURE_LAYERING | \ RBD_FEATURE_STRIPINGV2 | \ RBD_FEATURE_EXCLUSIVE_LOCK | \ + RBD_FEATURE_OBJECT_MAP | \ + RBD_FEATURE_FAST_DIFF | \ RBD_FEATURE_DEEP_FLATTEN | \ RBD_FEATURE_DATA_POOL | \ RBD_FEATURE_OPERATIONS) @@ -227,6 +231,8 @@ enum obj_operation_type { #define RBD_OBJ_FLAG_DELETION (1U << 0) #define RBD_OBJ_FLAG_COPYUP_ENABLED (1U << 1) #define RBD_OBJ_FLAG_COPYUP_ZEROS (1U << 2) +#define RBD_OBJ_FLAG_MAY_EXIST (1U << 3) +#define RBD_OBJ_FLAG_NOOP_FOR_NONEXISTENT (1U << 4) enum rbd_obj_read_state { RBD_OBJ_READ_START = 1, @@ -261,14 +267,18 @@ enum rbd_obj_read_state { */ enum rbd_obj_write_state { RBD_OBJ_WRITE_START = 1, + RBD_OBJ_WRITE_PRE_OBJECT_MAP, RBD_OBJ_WRITE_OBJECT, __RBD_OBJ_WRITE_COPYUP, RBD_OBJ_WRITE_COPYUP, + RBD_OBJ_WRITE_POST_OBJECT_MAP, }; enum rbd_obj_copyup_state { RBD_OBJ_COPYUP_START = 1, RBD_OBJ_COPYUP_READ_PARENT, + __RBD_OBJ_COPYUP_OBJECT_MAPS, + RBD_OBJ_COPYUP_OBJECT_MAPS, __RBD_OBJ_COPYUP_WRITE_OBJECT, RBD_OBJ_COPYUP_WRITE_OBJECT, }; @@ -419,6 +429,11 @@ struct rbd_device { int acquire_err; struct completion releasing_wait; + spinlock_t object_map_lock; + u8 *object_map; + u64 object_map_size; /* in objects */ + u64 object_map_flags; + struct workqueue_struct *task_wq; struct rbd_spec *parent_spec; @@ -620,6 +635,7 @@ static int _rbd_dev_v2_snap_size(struct rbd_device *rbd_dev, u64 snap_id, u8 *order, u64 *snap_size); static int _rbd_dev_v2_snap_features(struct rbd_device *rbd_dev, u64 snap_id, u64 *snap_features); +static int rbd_dev_v2_get_flags(struct rbd_device *rbd_dev); static void rbd_obj_handle_request(struct rbd_obj_request *obj_req, int result); static void rbd_img_handle_request(struct rbd_img_request *img_req, int result); @@ -1768,6 +1784,467 @@ static void rbd_img_request_destroy(struct kref *kref) kmem_cache_free(rbd_img_request_cache, img_request); } +#define BITS_PER_OBJ 2 +#define OBJS_PER_BYTE (BITS_PER_BYTE / BITS_PER_OBJ) +#define OBJ_MASK ((1 << BITS_PER_OBJ) - 1) + +static void __rbd_object_map_index(struct rbd_device *rbd_dev, u64 objno, + u64 *index, u8 *shift) +{ + u32 off; + + rbd_assert(objno < rbd_dev->object_map_size); + *index = div_u64_rem(objno, OBJS_PER_BYTE, &off); + *shift = (OBJS_PER_BYTE - off - 1) * BITS_PER_OBJ; +} + +static u8 __rbd_object_map_get(struct rbd_device *rbd_dev, u64 objno) +{ + u64 index; + u8 shift; + + lockdep_assert_held(&rbd_dev->object_map_lock); + __rbd_object_map_index(rbd_dev, objno, &index, &shift); + return (rbd_dev->object_map[index] >> shift) & OBJ_MASK; +} + +static void __rbd_object_map_set(struct rbd_device *rbd_dev, u64 objno, u8 val) +{ + u64 index; + 
u8 shift; + u8 *p; + + lockdep_assert_held(&rbd_dev->object_map_lock); + rbd_assert(!(val & ~OBJ_MASK)); + + __rbd_object_map_index(rbd_dev, objno, &index, &shift); + p = &rbd_dev->object_map[index]; + *p = (*p & ~(OBJ_MASK << shift)) | (val << shift); +} + +static u8 rbd_object_map_get(struct rbd_device *rbd_dev, u64 objno) +{ + u8 state; + + spin_lock(&rbd_dev->object_map_lock); + state = __rbd_object_map_get(rbd_dev, objno); + spin_unlock(&rbd_dev->object_map_lock); + return state; +} + +static bool use_object_map(struct rbd_device *rbd_dev) +{ + return ((rbd_dev->header.features & RBD_FEATURE_OBJECT_MAP) && + !(rbd_dev->object_map_flags & RBD_FLAG_OBJECT_MAP_INVALID)); +} + +static bool rbd_object_map_may_exist(struct rbd_device *rbd_dev, u64 objno) +{ + u8 state; + + /* fall back to default logic if object map is disabled or invalid */ + if (!use_object_map(rbd_dev)) + return true; + + state = rbd_object_map_get(rbd_dev, objno); + return state != OBJECT_NONEXISTENT; +} + +static void rbd_object_map_name(struct rbd_device *rbd_dev, u64 snap_id, + struct ceph_object_id *oid) +{ + if (snap_id == CEPH_NOSNAP) + ceph_oid_printf(oid, "%s%s", RBD_OBJECT_MAP_PREFIX, + rbd_dev->spec->image_id); + else + ceph_oid_printf(oid, "%s%s.%016llx", RBD_OBJECT_MAP_PREFIX, + rbd_dev->spec->image_id, snap_id); +} + +static int rbd_object_map_lock(struct rbd_device *rbd_dev) +{ + struct ceph_osd_client *osdc = &rbd_dev->rbd_client->client->osdc; + CEPH_DEFINE_OID_ONSTACK(oid); + u8 lock_type; + char *lock_tag; + struct ceph_locker *lockers; + u32 num_lockers; + bool broke_lock = false; + int ret; + + rbd_object_map_name(rbd_dev, CEPH_NOSNAP, &oid); + +again: + ret = ceph_cls_lock(osdc, &oid, &rbd_dev->header_oloc, RBD_LOCK_NAME, + CEPH_CLS_LOCK_EXCLUSIVE, "", "", "", 0); + if (ret != -EBUSY || broke_lock) { + if (ret == -EEXIST) + ret = 0; /* already locked by myself */ + if (ret) + rbd_warn(rbd_dev, "failed to lock object map: %d", ret); + return ret; + + } + + ret = ceph_cls_lock_info(osdc, &oid, &rbd_dev->header_oloc, + RBD_LOCK_NAME, &lock_type, &lock_tag, + &lockers, &num_lockers); + if (ret) { + if (ret == -ENOENT) + goto again; + + rbd_warn(rbd_dev, "failed to get object map lockers: %d", ret); + return ret; + } + + kfree(lock_tag); + if (num_lockers == 0) + goto again; + + rbd_warn(rbd_dev, "breaking object map lock owned by %s%llu", + ENTITY_NAME(lockers[0].id.name)); + + ret = ceph_cls_break_lock(osdc, &oid, &rbd_dev->header_oloc, + RBD_LOCK_NAME, lockers[0].id.cookie, + &lockers[0].id.name); + ceph_free_lockers(lockers, num_lockers); + if (ret) { + if (ret == -ENOENT) + goto again; + + rbd_warn(rbd_dev, "failed to break object map lock: %d", ret); + return ret; + } + + broke_lock = true; + goto again; +} + +static void rbd_object_map_unlock(struct rbd_device *rbd_dev) +{ + struct ceph_osd_client *osdc = &rbd_dev->rbd_client->client->osdc; + CEPH_DEFINE_OID_ONSTACK(oid); + int ret; + + rbd_object_map_name(rbd_dev, CEPH_NOSNAP, &oid); + + ret = ceph_cls_unlock(osdc, &oid, &rbd_dev->header_oloc, RBD_LOCK_NAME, + ""); + if (ret && ret != -ENOENT) + rbd_warn(rbd_dev, "failed to unlock object map: %d", ret); +} + +static int decode_object_map_header(void **p, void *end, u64 *object_map_size) +{ + u8 struct_v; + u32 struct_len; + u32 header_len; + void *header_end; + int ret; + + ceph_decode_32_safe(p, end, header_len, e_inval); + header_end = *p + header_len; + + ret = ceph_start_decoding(p, end, 1, "BitVector header", &struct_v, + &struct_len); + if (ret) + return ret; + + ceph_decode_64_safe(p, end, 
*object_map_size, e_inval); + + *p = header_end; + return 0; + +e_inval: + return -EINVAL; +} + +static int __rbd_object_map_load(struct rbd_device *rbd_dev) +{ + struct ceph_osd_client *osdc = &rbd_dev->rbd_client->client->osdc; + CEPH_DEFINE_OID_ONSTACK(oid); + struct page **pages; + void *p, *end; + size_t reply_len; + u64 num_objects; + u64 object_map_bytes; + u64 object_map_size; + int num_pages; + int ret; + + rbd_assert(!rbd_dev->object_map && !rbd_dev->object_map_size); + + num_objects = ceph_get_num_objects(&rbd_dev->layout, + rbd_dev->mapping.size); + object_map_bytes = DIV_ROUND_UP_ULL(num_objects * BITS_PER_OBJ, + BITS_PER_BYTE); + num_pages = calc_pages_for(0, object_map_bytes) + 1; + pages = ceph_alloc_page_vector(num_pages, GFP_KERNEL); + if (IS_ERR(pages)) + return PTR_ERR(pages); + + reply_len = num_pages * PAGE_SIZE; + rbd_object_map_name(rbd_dev, rbd_dev->spec->snap_id, &oid); + ret = ceph_osdc_call(osdc, &oid, &rbd_dev->header_oloc, + "rbd", "object_map_load", CEPH_OSD_FLAG_READ, + NULL, 0, pages, &reply_len); + if (ret) + goto out; + + p = page_address(pages[0]); + end = p + min(reply_len, (size_t)PAGE_SIZE); + ret = decode_object_map_header(&p, end, &object_map_size); + if (ret) + goto out; + + if (object_map_size != num_objects) { + rbd_warn(rbd_dev, "object map size mismatch: %llu vs %llu", + object_map_size, num_objects); + ret = -EINVAL; + goto out; + } + + if (offset_in_page(p) + object_map_bytes > reply_len) { + ret = -EINVAL; + goto out; + } + + rbd_dev->object_map = kvmalloc(object_map_bytes, GFP_KERNEL); + if (!rbd_dev->object_map) { + ret = -ENOMEM; + goto out; + } + + rbd_dev->object_map_size = object_map_size; + ceph_copy_from_page_vector(pages, rbd_dev->object_map, + offset_in_page(p), object_map_bytes); + +out: + ceph_release_page_vector(pages, num_pages); + return ret; +} + +static void rbd_object_map_free(struct rbd_device *rbd_dev) +{ + kvfree(rbd_dev->object_map); + rbd_dev->object_map = NULL; + rbd_dev->object_map_size = 0; +} + +static int rbd_object_map_load(struct rbd_device *rbd_dev) +{ + int ret; + + ret = __rbd_object_map_load(rbd_dev); + if (ret) + return ret; + + ret = rbd_dev_v2_get_flags(rbd_dev); + if (ret) { + rbd_object_map_free(rbd_dev); + return ret; + } + + if (rbd_dev->object_map_flags & RBD_FLAG_OBJECT_MAP_INVALID) + rbd_warn(rbd_dev, "object map is invalid"); + + return 0; +} + +static int rbd_object_map_open(struct rbd_device *rbd_dev) +{ + int ret; + + ret = rbd_object_map_lock(rbd_dev); + if (ret) + return ret; + + ret = rbd_object_map_load(rbd_dev); + if (ret) { + rbd_object_map_unlock(rbd_dev); + return ret; + } + + return 0; +} + +static void rbd_object_map_close(struct rbd_device *rbd_dev) +{ + rbd_object_map_free(rbd_dev); + rbd_object_map_unlock(rbd_dev); +} + +/* + * This function needs snap_id (or more precisely just something to + * distinguish between HEAD and snapshot object maps), new_state and + * current_state that were passed to rbd_object_map_update(). + * + * To avoid allocating and stashing a context we piggyback on the OSD + * request. A HEAD update has two ops (assert_locked). For new_state + * and current_state we decode our own object_map_update op, encoded in + * rbd_cls_object_map_update(). 
+ */ +static int rbd_object_map_update_finish(struct rbd_obj_request *obj_req, + struct ceph_osd_request *osd_req) +{ + struct rbd_device *rbd_dev = obj_req->img_request->rbd_dev; + struct ceph_osd_data *osd_data; + u64 objno; + u8 state, new_state, current_state; + bool has_current_state; + void *p; + + if (osd_req->r_result) + return osd_req->r_result; + + /* + * Nothing to do for a snapshot object map. + */ + if (osd_req->r_num_ops == 1) + return 0; + + /* + * Update in-memory HEAD object map. + */ + rbd_assert(osd_req->r_num_ops == 2); + osd_data = osd_req_op_data(osd_req, 1, cls, request_data); + rbd_assert(osd_data->type == CEPH_OSD_DATA_TYPE_PAGES); + + p = page_address(osd_data->pages[0]); + objno = ceph_decode_64(&p); + rbd_assert(objno == obj_req->ex.oe_objno); + rbd_assert(ceph_decode_64(&p) == objno + 1); + new_state = ceph_decode_8(&p); + has_current_state = ceph_decode_8(&p); + if (has_current_state) + current_state = ceph_decode_8(&p); + + spin_lock(&rbd_dev->object_map_lock); + state = __rbd_object_map_get(rbd_dev, objno); + if (!has_current_state || current_state == state || + (current_state == OBJECT_EXISTS && state == OBJECT_EXISTS_CLEAN)) + __rbd_object_map_set(rbd_dev, objno, new_state); + spin_unlock(&rbd_dev->object_map_lock); + + return 0; +} + +static void rbd_object_map_callback(struct ceph_osd_request *osd_req) +{ + struct rbd_obj_request *obj_req = osd_req->r_priv; + int result; + + dout("%s osd_req %p result %d for obj_req %p\n", __func__, osd_req, + osd_req->r_result, obj_req); + + result = rbd_object_map_update_finish(obj_req, osd_req); + rbd_obj_handle_request(obj_req, result); +} + +static bool update_needed(struct rbd_device *rbd_dev, u64 objno, u8 new_state) +{ + u8 state = rbd_object_map_get(rbd_dev, objno); + + if (state == new_state || + (new_state == OBJECT_PENDING && state == OBJECT_NONEXISTENT) || + (new_state == OBJECT_NONEXISTENT && state != OBJECT_PENDING)) + return false; + + return true; +} + +static int rbd_cls_object_map_update(struct ceph_osd_request *req, + int which, u64 objno, u8 new_state, + const u8 *current_state) +{ + struct page **pages; + void *p, *start; + int ret; + + ret = osd_req_op_cls_init(req, which, "rbd", "object_map_update"); + if (ret) + return ret; + + pages = ceph_alloc_page_vector(1, GFP_NOIO); + if (IS_ERR(pages)) + return PTR_ERR(pages); + + p = start = page_address(pages[0]); + ceph_encode_64(&p, objno); + ceph_encode_64(&p, objno + 1); + ceph_encode_8(&p, new_state); + if (current_state) { + ceph_encode_8(&p, 1); + ceph_encode_8(&p, *current_state); + } else { + ceph_encode_8(&p, 0); + } + + osd_req_op_cls_request_data_pages(req, which, pages, p - start, 0, + false, true); + return 0; +} + +/* + * Return: + * 0 - object map update sent + * 1 - object map update isn't needed + * <0 - error + */ +static int rbd_object_map_update(struct rbd_obj_request *obj_req, u64 snap_id, + u8 new_state, const u8 *current_state) +{ + struct rbd_device *rbd_dev = obj_req->img_request->rbd_dev; + struct ceph_osd_client *osdc = &rbd_dev->rbd_client->client->osdc; + struct ceph_osd_request *req; + int num_ops = 1; + int which = 0; + int ret; + + if (snap_id == CEPH_NOSNAP) { + if (!update_needed(rbd_dev, obj_req->ex.oe_objno, new_state)) + return 1; + + num_ops++; /* assert_locked */ + } + + req = ceph_osdc_alloc_request(osdc, NULL, num_ops, false, GFP_NOIO); + if (!req) + return -ENOMEM; + + list_add_tail(&req->r_unsafe_item, &obj_req->osd_reqs); + req->r_callback = rbd_object_map_callback; + req->r_priv = obj_req; + + 
rbd_object_map_name(rbd_dev, snap_id, &req->r_base_oid); + ceph_oloc_copy(&req->r_base_oloc, &rbd_dev->header_oloc); + req->r_flags = CEPH_OSD_FLAG_WRITE; + ktime_get_real_ts64(&req->r_mtime); + + if (snap_id == CEPH_NOSNAP) { + /* + * Protect against possible race conditions during lock + * ownership transitions. + */ + ret = ceph_cls_assert_locked(req, which++, RBD_LOCK_NAME, + CEPH_CLS_LOCK_EXCLUSIVE, "", ""); + if (ret) + return ret; + } + + ret = rbd_cls_object_map_update(req, which, obj_req->ex.oe_objno, + new_state, current_state); + if (ret) + return ret; + + ret = ceph_osdc_alloc_messages(req, GFP_NOIO); + if (ret) + return ret; + + ceph_osdc_start_request(osdc, req, false); + return 0; +} + static void prune_extents(struct ceph_file_extent *img_extents, u32 *num_img_extents, u64 overlap) { @@ -1975,6 +2452,7 @@ static int rbd_obj_init_discard(struct rbd_obj_request *obj_req) if (ret) return ret; + obj_req->flags |= RBD_OBJ_FLAG_NOOP_FOR_NONEXISTENT; if (rbd_obj_is_entire(obj_req) && !obj_req->num_img_extents) obj_req->flags |= RBD_OBJ_FLAG_DELETION; @@ -2022,6 +2500,7 @@ static int rbd_obj_init_zeroout(struct rbd_obj_request *obj_req) if (rbd_obj_copyup_enabled(obj_req)) obj_req->flags |= RBD_OBJ_FLAG_COPYUP_ENABLED; if (!obj_req->num_img_extents) { + obj_req->flags |= RBD_OBJ_FLAG_NOOP_FOR_NONEXISTENT; if (rbd_obj_is_entire(obj_req)) obj_req->flags |= RBD_OBJ_FLAG_DELETION; } @@ -2407,6 +2886,20 @@ static void rbd_img_schedule(struct rbd_img_request *img_req, int result) queue_work(rbd_wq, &img_req->work); } +static bool rbd_obj_may_exist(struct rbd_obj_request *obj_req) +{ + struct rbd_device *rbd_dev = obj_req->img_request->rbd_dev; + + if (rbd_object_map_may_exist(rbd_dev, obj_req->ex.oe_objno)) { + obj_req->flags |= RBD_OBJ_FLAG_MAY_EXIST; + return true; + } + + dout("%s %p objno %llu assuming dne\n", __func__, obj_req, + obj_req->ex.oe_objno); + return false; +} + static int rbd_obj_read_object(struct rbd_obj_request *obj_req) { struct ceph_osd_request *osd_req; @@ -2482,10 +2975,17 @@ static bool rbd_obj_advance_read(struct rbd_obj_request *obj_req, int *result) struct rbd_device *rbd_dev = obj_req->img_request->rbd_dev; int ret; +again: switch (obj_req->read_state) { case RBD_OBJ_READ_START: rbd_assert(!*result); + if (!rbd_obj_may_exist(obj_req)) { + *result = -ENOENT; + obj_req->read_state = RBD_OBJ_READ_OBJECT; + goto again; + } + ret = rbd_obj_read_object(obj_req); if (ret) { *result = ret; @@ -2536,6 +3036,44 @@ static bool rbd_obj_advance_read(struct rbd_obj_request *obj_req, int *result) } } +static bool rbd_obj_write_is_noop(struct rbd_obj_request *obj_req) +{ + struct rbd_device *rbd_dev = obj_req->img_request->rbd_dev; + + if (rbd_object_map_may_exist(rbd_dev, obj_req->ex.oe_objno)) + obj_req->flags |= RBD_OBJ_FLAG_MAY_EXIST; + + if (!(obj_req->flags & RBD_OBJ_FLAG_MAY_EXIST) && + (obj_req->flags & RBD_OBJ_FLAG_NOOP_FOR_NONEXISTENT)) { + dout("%s %p noop for nonexistent\n", __func__, obj_req); + return true; + } + + return false; +} + +/* + * Return: + * 0 - object map update sent + * 1 - object map update isn't needed + * <0 - error + */ +static int rbd_obj_write_pre_object_map(struct rbd_obj_request *obj_req) +{ + struct rbd_device *rbd_dev = obj_req->img_request->rbd_dev; + u8 new_state; + + if (!(rbd_dev->header.features & RBD_FEATURE_OBJECT_MAP)) + return 1; + + if (obj_req->flags & RBD_OBJ_FLAG_DELETION) + new_state = OBJECT_PENDING; + else + new_state = OBJECT_EXISTS; + + return rbd_object_map_update(obj_req, CEPH_NOSNAP, new_state, NULL); +} + static int 
rbd_obj_write_object(struct rbd_obj_request *obj_req) { struct ceph_osd_request *osd_req; @@ -2706,6 +3244,41 @@ static int rbd_obj_copyup_read_parent(struct rbd_obj_request *obj_req) return rbd_obj_read_from_parent(obj_req); } +static void rbd_obj_copyup_object_maps(struct rbd_obj_request *obj_req) +{ + struct rbd_device *rbd_dev = obj_req->img_request->rbd_dev; + struct ceph_snap_context *snapc = obj_req->img_request->snapc; + u8 new_state; + u32 i; + int ret; + + rbd_assert(!obj_req->pending.result && !obj_req->pending.num_pending); + + if (!(rbd_dev->header.features & RBD_FEATURE_OBJECT_MAP)) + return; + + if (obj_req->flags & RBD_OBJ_FLAG_COPYUP_ZEROS) + return; + + for (i = 0; i < snapc->num_snaps; i++) { + if ((rbd_dev->header.features & RBD_FEATURE_FAST_DIFF) && + i + 1 < snapc->num_snaps) + new_state = OBJECT_EXISTS_CLEAN; + else + new_state = OBJECT_EXISTS; + + ret = rbd_object_map_update(obj_req, snapc->snaps[i], + new_state, NULL); + if (ret < 0) { + obj_req->pending.result = ret; + return; + } + + rbd_assert(!ret); + obj_req->pending.num_pending++; + } +} + static void rbd_obj_copyup_write_object(struct rbd_obj_request *obj_req) { u32 bytes = rbd_obj_img_extents_bytes(obj_req); @@ -2749,6 +3322,7 @@ static void rbd_obj_copyup_write_object(struct rbd_obj_request *obj_req) static bool rbd_obj_advance_copyup(struct rbd_obj_request *obj_req, int *result) { + struct rbd_device *rbd_dev = obj_req->img_request->rbd_dev; int ret; again: @@ -2776,6 +3350,25 @@ static bool rbd_obj_advance_copyup(struct rbd_obj_request *obj_req, int *result) obj_req->flags |= RBD_OBJ_FLAG_COPYUP_ZEROS; } + rbd_obj_copyup_object_maps(obj_req); + if (!obj_req->pending.num_pending) { + *result = obj_req->pending.result; + obj_req->copyup_state = RBD_OBJ_COPYUP_OBJECT_MAPS; + goto again; + } + obj_req->copyup_state = __RBD_OBJ_COPYUP_OBJECT_MAPS; + return false; + case __RBD_OBJ_COPYUP_OBJECT_MAPS: + if (!pending_result_dec(&obj_req->pending, result)) + return false; + /* fall through */ + case RBD_OBJ_COPYUP_OBJECT_MAPS: + if (*result) { + rbd_warn(rbd_dev, "snap object map update failed: %d", + *result); + return true; + } + rbd_obj_copyup_write_object(obj_req); if (!obj_req->pending.num_pending) { *result = obj_req->pending.result; @@ -2795,6 +3388,27 @@ static bool rbd_obj_advance_copyup(struct rbd_obj_request *obj_req, int *result) } } +/* + * Return: + * 0 - object map update sent + * 1 - object map update isn't needed + * <0 - error + */ +static int rbd_obj_write_post_object_map(struct rbd_obj_request *obj_req) +{ + struct rbd_device *rbd_dev = obj_req->img_request->rbd_dev; + u8 current_state = OBJECT_PENDING; + + if (!(rbd_dev->header.features & RBD_FEATURE_OBJECT_MAP)) + return 1; + + if (!(obj_req->flags & RBD_OBJ_FLAG_DELETION)) + return 1; + + return rbd_object_map_update(obj_req, CEPH_NOSNAP, OBJECT_NONEXISTENT, + ¤t_state); +} + static bool rbd_obj_advance_write(struct rbd_obj_request *obj_req, int *result) { struct rbd_device *rbd_dev = obj_req->img_request->rbd_dev; @@ -2805,6 +3419,24 @@ static bool rbd_obj_advance_write(struct rbd_obj_request *obj_req, int *result) case RBD_OBJ_WRITE_START: rbd_assert(!*result); + if (rbd_obj_write_is_noop(obj_req)) + return true; + + ret = rbd_obj_write_pre_object_map(obj_req); + if (ret < 0) { + *result = ret; + return true; + } + obj_req->write_state = RBD_OBJ_WRITE_PRE_OBJECT_MAP; + if (ret > 0) + goto again; + return false; + case RBD_OBJ_WRITE_PRE_OBJECT_MAP: + if (*result) { + rbd_warn(rbd_dev, "pre object map update failed: %d", + *result); + 
return true; + } ret = rbd_obj_write_object(obj_req); if (ret) { *result = ret; @@ -2837,8 +3469,23 @@ static bool rbd_obj_advance_write(struct rbd_obj_request *obj_req, int *result) return false; /* fall through */ case RBD_OBJ_WRITE_COPYUP: - if (*result) + if (*result) { rbd_warn(rbd_dev, "copyup failed: %d", *result); + return true; + } + ret = rbd_obj_write_post_object_map(obj_req); + if (ret < 0) { + *result = ret; + return true; + } + obj_req->write_state = RBD_OBJ_WRITE_POST_OBJECT_MAP; + if (ret > 0) + goto again; + return false; + case RBD_OBJ_WRITE_POST_OBJECT_MAP: + if (*result) + rbd_warn(rbd_dev, "post object map update failed: %d", + *result); return true; default: BUG(); @@ -2892,7 +3539,8 @@ static bool need_exclusive_lock(struct rbd_img_request *img_req) return false; rbd_assert(!test_bit(IMG_REQ_CHILD, &img_req->flags)); - if (rbd_dev->opts->lock_on_read) + if (rbd_dev->opts->lock_on_read || + (rbd_dev->header.features & RBD_FEATURE_OBJECT_MAP)) return true; return rbd_img_is_write(img_req); @@ -3427,7 +4075,7 @@ static int rbd_try_lock(struct rbd_device *rbd_dev) if (ret) goto out; /* request lock or error */ - rbd_warn(rbd_dev, "%s%llu seems dead, breaking lock", + rbd_warn(rbd_dev, "breaking header lock owned by %s%llu", ENTITY_NAME(lockers[0].id.name)); ret = ceph_monc_blacklist_add(&client->monc, @@ -3454,6 +4102,19 @@ static int rbd_try_lock(struct rbd_device *rbd_dev) return ret; } +static int rbd_post_acquire_action(struct rbd_device *rbd_dev) +{ + int ret; + + if (rbd_dev->header.features & RBD_FEATURE_OBJECT_MAP) { + ret = rbd_object_map_open(rbd_dev); + if (ret) + return ret; + } + + return 0; +} + /* * Return: * 0 - lock acquired @@ -3497,6 +4158,17 @@ static int rbd_try_acquire_lock(struct rbd_device *rbd_dev) rbd_assert(rbd_dev->lock_state == RBD_LOCK_STATE_LOCKED); rbd_assert(list_empty(&rbd_dev->running_list)); + ret = rbd_post_acquire_action(rbd_dev); + if (ret) { + rbd_warn(rbd_dev, "post-acquire action failed: %d", ret); + /* + * Can't stay in RBD_LOCK_STATE_LOCKED because + * rbd_lock_add_request() would let the request through, + * assuming that e.g. object map is locked and loaded. + */ + rbd_unlock(rbd_dev); + } + out: wake_requests(rbd_dev, ret); up_write(&rbd_dev->lock_rwsem); @@ -3570,10 +4242,17 @@ static bool rbd_quiesce_lock(struct rbd_device *rbd_dev) return true; } +static void rbd_pre_release_action(struct rbd_device *rbd_dev) +{ + if (rbd_dev->header.features & RBD_FEATURE_OBJECT_MAP) + rbd_object_map_close(rbd_dev); +} + static void __rbd_release_lock(struct rbd_device *rbd_dev) { rbd_assert(list_empty(&rbd_dev->running_list)); + rbd_pre_release_action(rbd_dev); rbd_unlock(rbd_dev); } @@ -4860,6 +5539,8 @@ static struct rbd_device *__rbd_dev_create(struct rbd_client *rbdc, init_completion(&rbd_dev->acquire_wait); init_completion(&rbd_dev->releasing_wait); + spin_lock_init(&rbd_dev->object_map_lock); + rbd_dev->dev.bus = &rbd_bus_type; rbd_dev->dev.type = &rbd_device_type; rbd_dev->dev.parent = &rbd_root_dev; @@ -5041,6 +5722,32 @@ static int rbd_dev_v2_features(struct rbd_device *rbd_dev) &rbd_dev->header.features); } +/* + * These are generic image flags, but since they are used only for + * object map, store them in rbd_dev->object_map_flags. + * + * For the same reason, this function is called only on object map + * (re)load and not on header refresh. 
+ */ +static int rbd_dev_v2_get_flags(struct rbd_device *rbd_dev) +{ + __le64 snapid = cpu_to_le64(rbd_dev->spec->snap_id); + __le64 flags; + int ret; + + ret = rbd_obj_method_sync(rbd_dev, &rbd_dev->header_oid, + &rbd_dev->header_oloc, "get_flags", + &snapid, sizeof(snapid), + &flags, sizeof(flags)); + if (ret < 0) + return ret; + if (ret < sizeof(flags)) + return -EBADMSG; + + rbd_dev->object_map_flags = le64_to_cpu(flags); + return 0; +} + struct parent_image_info { u64 pool_id; const char *pool_ns; @@ -6014,6 +6721,7 @@ static void rbd_dev_unprobe(struct rbd_device *rbd_dev) struct rbd_image_header *header; rbd_dev_parent_put(rbd_dev); + rbd_object_map_free(rbd_dev); rbd_dev_mapping_clear(rbd_dev); /* Free dynamic fields from the header, then zero it out */ @@ -6263,6 +6971,13 @@ static int rbd_dev_image_probe(struct rbd_device *rbd_dev, int depth) if (ret) goto err_out_probe; + if (rbd_dev->spec->snap_id != CEPH_NOSNAP && + (rbd_dev->header.features & RBD_FEATURE_OBJECT_MAP)) { + ret = rbd_object_map_load(rbd_dev); + if (ret) + goto err_out_probe; + } + if (rbd_dev->header.features & RBD_FEATURE_LAYERING) { ret = rbd_dev_v2_parent_info(rbd_dev); if (ret) diff --git a/drivers/block/rbd_types.h b/drivers/block/rbd_types.h index 62ff50d3e7a6..ac98ab6ccd3b 100644 --- a/drivers/block/rbd_types.h +++ b/drivers/block/rbd_types.h @@ -18,6 +18,7 @@ /* For format version 2, rbd image 'foo' consists of objects * rbd_id.foo - id of image * rbd_header. - image metadata + * rbd_object_map. - optional image object map * rbd_data..0000000000000000 * rbd_data..0000000000000001 * ... - data @@ -25,6 +26,7 @@ */ #define RBD_HEADER_PREFIX "rbd_header." +#define RBD_OBJECT_MAP_PREFIX "rbd_object_map." #define RBD_ID_PREFIX "rbd_id." #define RBD_V2_DATA_FORMAT "%s.%016llx" @@ -39,6 +41,14 @@ enum rbd_notify_op { RBD_NOTIFY_OP_HEADER_UPDATE = 3, }; +#define OBJECT_NONEXISTENT 0 +#define OBJECT_EXISTS 1 +#define OBJECT_PENDING 2 +#define OBJECT_EXISTS_CLEAN 3 + +#define RBD_FLAG_OBJECT_MAP_INVALID (1ULL << 0) +#define RBD_FLAG_FAST_DIFF_INVALID (1ULL << 1) + /* * For format version 1, rbd image 'foo' consists of objects * foo.rbd - image metadata diff --git a/include/linux/ceph/cls_lock_client.h b/include/linux/ceph/cls_lock_client.h index bea6c77d2093..17bc7584d1fe 100644 --- a/include/linux/ceph/cls_lock_client.h +++ b/include/linux/ceph/cls_lock_client.h @@ -52,4 +52,7 @@ int ceph_cls_lock_info(struct ceph_osd_client *osdc, char *lock_name, u8 *type, char **tag, struct ceph_locker **lockers, u32 *num_lockers); +int ceph_cls_assert_locked(struct ceph_osd_request *req, int which, + char *lock_name, u8 type, char *cookie, char *tag); + #endif diff --git a/include/linux/ceph/striper.h b/include/linux/ceph/striper.h index cbd0d24b7148..3486636c0e6e 100644 --- a/include/linux/ceph/striper.h +++ b/include/linux/ceph/striper.h @@ -66,4 +66,6 @@ int ceph_extent_to_file(struct ceph_file_layout *l, struct ceph_file_extent **file_extents, u32 *num_file_extents); +u64 ceph_get_num_objects(struct ceph_file_layout *l, u64 size); + #endif diff --git a/net/ceph/cls_lock_client.c b/net/ceph/cls_lock_client.c index 56bbfe01e3ac..99cce6f3ec48 100644 --- a/net/ceph/cls_lock_client.c +++ b/net/ceph/cls_lock_client.c @@ -6,6 +6,7 @@ #include #include +#include /** * ceph_cls_lock - grab rados lock for object @@ -375,3 +376,47 @@ int ceph_cls_lock_info(struct ceph_osd_client *osdc, return ret; } EXPORT_SYMBOL(ceph_cls_lock_info); + +int ceph_cls_assert_locked(struct ceph_osd_request *req, int which, + char *lock_name, u8 type, 
char *cookie, char *tag)
+{
+	int assert_op_buf_size;
+	int name_len = strlen(lock_name);
+	int cookie_len = strlen(cookie);
+	int tag_len = strlen(tag);
+	struct page **pages;
+	void *p, *end;
+	int ret;
+
+	assert_op_buf_size = name_len + sizeof(__le32) +
+			     cookie_len + sizeof(__le32) +
+			     tag_len + sizeof(__le32) +
+			     sizeof(u8) + CEPH_ENCODING_START_BLK_LEN;
+	if (assert_op_buf_size > PAGE_SIZE)
+		return -E2BIG;
+
+	ret = osd_req_op_cls_init(req, which, "lock", "assert_locked");
+	if (ret)
+		return ret;
+
+	pages = ceph_alloc_page_vector(1, GFP_NOIO);
+	if (IS_ERR(pages))
+		return PTR_ERR(pages);
+
+	p = page_address(pages[0]);
+	end = p + assert_op_buf_size;
+
+	/* encode cls_lock_assert_op struct */
+	ceph_start_encoding(&p, 1, 1,
+			    assert_op_buf_size - CEPH_ENCODING_START_BLK_LEN);
+	ceph_encode_string(&p, end, lock_name, name_len);
+	ceph_encode_8(&p, type);
+	ceph_encode_string(&p, end, cookie, cookie_len);
+	ceph_encode_string(&p, end, tag, tag_len);
+	WARN_ON(p != end);
+
+	osd_req_op_cls_request_data_pages(req, which, pages, assert_op_buf_size,
+					  0, false, true);
+	return 0;
+}
+EXPORT_SYMBOL(ceph_cls_assert_locked);
diff --git a/net/ceph/striper.c b/net/ceph/striper.c
index c36462dc86b7..3b3fa75d1189 100644
--- a/net/ceph/striper.c
+++ b/net/ceph/striper.c
@@ -259,3 +259,20 @@ int ceph_extent_to_file(struct ceph_file_layout *l,

 	return 0;
 }
 EXPORT_SYMBOL(ceph_extent_to_file);
+
+u64 ceph_get_num_objects(struct ceph_file_layout *l, u64 size)
+{
+	u64 period = (u64)l->stripe_count * l->object_size;
+	u64 num_periods = DIV64_U64_ROUND_UP(size, period);
+	u64 remainder_bytes;
+	u64 remainder_objs = 0;
+
+	div64_u64_rem(size, period, &remainder_bytes);
+	if (remainder_bytes > 0 &&
+	    remainder_bytes < (u64)l->stripe_count * l->stripe_unit)
+		remainder_objs = l->stripe_count -
+			     DIV_ROUND_UP_ULL(remainder_bytes, l->stripe_unit);
+
+	return num_periods * l->stripe_count - remainder_objs;
+}
+EXPORT_SYMBOL(ceph_get_num_objects);
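The fancy-striping arithmetic in ceph_get_num_objects() above deserves a
worked example. Below is a standalone userspace rendition of the same
formula, with one concrete layout traced in the comment; it is
illustrative only.

/*
 * With stripe_unit=1M, stripe_count=2, object_size=4M, a period is
 * 8M wide.  A 9M image gives num_periods=2 and remainder_bytes=1M,
 * which is below stripe_count*stripe_unit=2M, so one trailing object
 * is never written: 2*2 - 1 = 3 objects.
 */
#include <stdint.h>
#include <stdio.h>

#define DIV_ROUND_UP_U64(n, d)	(((n) + (d) - 1) / (d))

static uint64_t get_num_objects(uint64_t stripe_unit, uint64_t stripe_count,
				uint64_t object_size, uint64_t size)
{
	uint64_t period = stripe_count * object_size;
	uint64_t num_periods = DIV_ROUND_UP_U64(size, period);
	uint64_t remainder_bytes = size % period;
	uint64_t remainder_objs = 0;

	if (remainder_bytes > 0 &&
	    remainder_bytes < stripe_count * stripe_unit)
		remainder_objs = stripe_count -
				 DIV_ROUND_UP_U64(remainder_bytes, stripe_unit);

	return num_periods * stripe_count - remainder_objs;
}

int main(void)
{
	uint64_t m = 1024 * 1024;

	printf("%llu\n", (unsigned long long)
	       get_num_objects(1 * m, 2, 4 * m, 9 * m));	/* prints 3 */
	return 0;
}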
From patchwork Tue Jun 25 14:41:11 2019

From: Ilya Dryomov
To: ceph-devel@vger.kernel.org
Cc: Dongsheng Yang
Subject: [PATCH 20/20] rbd: setallochint only if object doesn't exist
Date: Tue, 25 Jun 2019 16:41:11 +0200
Message-Id: <20190625144111.11270-21-idryomov@gmail.com>
In-Reply-To: <20190625144111.11270-1-idryomov@gmail.com>
References: <20190625144111.11270-1-idryomov@gmail.com>

setallochint is really only useful on object creation.  Continue
hinting unconditionally if the object map cannot be used.
Signed-off-by: Ilya Dryomov
Reviewed-by: Dongsheng Yang
---
 drivers/block/rbd.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 756595f5fbc9..5dc217530f0f 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -2366,9 +2366,12 @@ static void __rbd_osd_setup_write_ops(struct ceph_osd_request *osd_req,
 	struct rbd_device *rbd_dev = obj_req->img_request->rbd_dev;
 	u16 opcode;

-	osd_req_op_alloc_hint_init(osd_req, which++,
-				   rbd_dev->layout.object_size,
-				   rbd_dev->layout.object_size);
+	if (!use_object_map(rbd_dev) ||
+	    !(obj_req->flags & RBD_OBJ_FLAG_MAY_EXIST)) {
+		osd_req_op_alloc_hint_init(osd_req, which++,
+					   rbd_dev->layout.object_size,
+					   rbd_dev->layout.object_size);
+	}

 	if (rbd_obj_is_entire(obj_req))
 		opcode = CEPH_OSD_OP_WRITEFULL;
@@ -2511,9 +2514,15 @@ static int rbd_obj_init_zeroout(struct rbd_obj_request *obj_req)

 static int count_write_ops(struct rbd_obj_request *obj_req)
 {
-	switch (obj_req->img_request->op_type) {
+	struct rbd_img_request *img_req = obj_req->img_request;
+
+	switch (img_req->op_type) {
 	case OBJ_OP_WRITE:
-		return 2; /* setallochint + write/writefull */
+		if (!use_object_map(img_req->rbd_dev) ||
+		    !(obj_req->flags & RBD_OBJ_FLAG_MAY_EXIST))
+			return 2; /* setallochint + write/writefull */
+
+		return 1; /* write/writefull */
 	case OBJ_OP_DISCARD:
 		return 1; /* delete/truncate/zero */
 	case OBJ_OP_ZEROOUT: