From patchwork Tue Sep 19 18:02:27 2017
X-Patchwork-Submitter: Ilya Dryomov
X-Patchwork-Id: 9959903
From: Ilya Dryomov
To: ceph-devel@vger.kernel.org
Subject: [PATCH] libceph: don't allow bidirectional swap of pg-upmap-items
Date: Tue, 19 Sep 2017 20:02:27 +0200
Message-Id: <1505844147-17221-1-git-send-email-idryomov@gmail.com>
X-Mailer: git-send-email 2.4.3

This reverts most of commit f53b7665c8ce ("libceph: upmap semantic
changes").

We need to prevent duplicates in the final result.
For example, we can currently take

  [1,2,3] and apply [(1,2)] and get [2,2,3]

or

  [1,2,3] and apply [(3,2)] and get [1,2,2]

The rest of the system is not prepared to handle duplicates in the
result set like this.

The reverted piece was intended to allow

  [1,2,3] and [(1,2),(2,1)] to get [2,1,3]

to reorder primaries.  First, this bidirectional swap is hard to
implement in a way that also prevents dups.  For example, [1,2,3] and
[(1,4),(2,3),(3,4)] would give [4,3,4], but if we just dropped the last
step we'd have [4,3,3], which is also invalid, etc.  It is simpler to
just not handle bidirectional swaps.  Second, in practice they are not
needed: if you just want to choose a different primary then use
primary_affinity, or pg_upmap (not pg_upmap_items).

Cc: stable@vger.kernel.org # 4.13
Link: http://tracker.ceph.com/issues/21410
Signed-off-by: Ilya Dryomov
Reviewed-by: Sage Weil
---
 net/ceph/osdmap.c | 35 +++++++++++++++++++++++++----------
 1 file changed, 25 insertions(+), 10 deletions(-)

diff --git a/net/ceph/osdmap.c b/net/ceph/osdmap.c
index f358d0bfa76b..79d14d70b7ea 100644
--- a/net/ceph/osdmap.c
+++ b/net/ceph/osdmap.c
@@ -2445,19 +2445,34 @@ static void apply_upmap(struct ceph_osdmap *osdmap,
 
 	pg = lookup_pg_mapping(&osdmap->pg_upmap_items, pgid);
 	if (pg) {
-		for (i = 0; i < raw->size; i++) {
-			for (j = 0; j < pg->pg_upmap_items.len; j++) {
-				int from = pg->pg_upmap_items.from_to[j][0];
-				int to = pg->pg_upmap_items.from_to[j][1];
-
-				if (from == raw->osds[i]) {
-					if (!(to != CRUSH_ITEM_NONE &&
-					      to < osdmap->max_osd &&
-					      osdmap->osd_weight[to] == 0))
-						raw->osds[i] = to;
+		/*
+		 * Note: this approach does not allow a bidirectional swap,
+		 * e.g., [[1,2],[2,1]] applied to [0,1,2] -> [0,2,1].
+		 */
+		for (i = 0; i < pg->pg_upmap_items.len; i++) {
+			int from = pg->pg_upmap_items.from_to[i][0];
+			int to = pg->pg_upmap_items.from_to[i][1];
+			int pos = -1;
+			bool exists = false;
+
+			/* make sure replacement doesn't already appear */
+			for (j = 0; j < raw->size; j++) {
+				int osd = raw->osds[j];
+
+				if (osd == to) {
+					exists = true;
 					break;
 				}
+				/* ignore mapping if target is marked out */
+				if (osd == from && pos < 0 &&
+				    !(to != CRUSH_ITEM_NONE &&
+				      to < osdmap->max_osd &&
+				      osdmap->osd_weight[to] == 0)) {
+					pos = j;
+				}
 			}
+			if (!exists && pos >= 0)
+				raw->osds[pos] = to;
 		}
 	}
 }
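
For reviewers who want to try the new loop outside the kernel, here is a
minimal userspace sketch of the same per-pair logic.  It is not part of
the patch.  The function name apply_upmap_items_sketch() and its flat
array parameters are hypothetical, and the "target is marked out" check
(osd_weight[to] == 0) is deliberately left out, so every target is
assumed to be usable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Illustrative stand-in for the pg_upmap_items loop in apply_upmap().
 * Hypothetical simplification: the osd_weight[to] == 0 ("marked out")
 * check is omitted, so every target counts as usable.
 */
static void apply_upmap_items_sketch(int *osds, int size,
				     const int (*from_to)[2], int len)
{
	int i, j;

	assert(size == 0 || osds != NULL);

	for (i = 0; i < len; i++) {
		int from = from_to[i][0];
		int to = from_to[i][1];
		int pos = -1;
		bool exists = false;

		for (j = 0; j < size; j++) {
			if (osds[j] == to) {
				/* replacement already present: skip this pair */
				exists = true;
				break;
			}
			/* remember the first position holding "from" */
			if (osds[j] == from && pos < 0)
				pos = j;
		}
		if (!exists && pos >= 0)
			osds[pos] = to;
	}
}
```

Under these assumptions, [1,2,3] with [(1,2)], with [(3,2)], or with
[(1,2),(2,1)] is left unchanged, because the target of each pair is
already in the set.  [1,2,3] with [(1,4),(2,3),(3,4)] becomes [4,2,3]:
only the first pair is applied and the other two are skipped, so no
duplicate can appear.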