From patchwork Thu Aug 9 16:22:57 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ilya Dryomov X-Patchwork-Id: 10562473 X-Patchwork-Delegate: snitzer@redhat.com Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B2A021057 for ; Fri, 10 Aug 2018 08:51:47 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id A067A2B8A5 for ; Fri, 10 Aug 2018 08:51:47 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 9E0E92B8B2; Fri, 10 Aug 2018 08:51:47 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.8 required=2.0 tests=BAYES_00,DKIM_ADSP_CUSTOM_MED, DKIM_SIGNED,FREEMAIL_FROM,MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI,T_DKIM_INVALID autolearn=ham version=3.3.1 Received: from mx1.redhat.com (mx1.redhat.com [209.132.183.28]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 183632B7B6 for ; Fri, 10 Aug 2018 08:51:47 +0000 (UTC) Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id D81FFC04B928; Fri, 10 Aug 2018 08:51:45 +0000 (UTC) Received: from colo-mx.corp.redhat.com (colo-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.20]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 988D45914A; Fri, 10 Aug 2018 08:51:45 +0000 (UTC) Received: from lists01.pubmisc.prod.ext.phx2.redhat.com (lists01.pubmisc.prod.ext.phx2.redhat.com [10.5.19.33]) by colo-mx.corp.redhat.com (Postfix) with ESMTP id 558DE18037ED; Fri, 10 Aug 2018 08:51:45 +0000 (UTC) Received: from smtp.corp.redhat.com (int-mx10.intmail.prod.int.phx2.redhat.com [10.5.11.25]) by lists01.pubmisc.prod.ext.phx2.redhat.com (8.13.8/8.13.8) with ESMTP id w79GNfCx007286 for ; Thu, 9 Aug 2018 12:23:41 -0400 Received: by smtp.corp.redhat.com (Postfix) id 2D3992010CC7; Thu, 9 Aug 2018 16:23:41 +0000 (UTC) Delivered-To: dm-devel@redhat.com Received: from mx1.redhat.com (ext-mx08.extmail.prod.ext.phx2.redhat.com [10.5.110.32]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 58ED02010CD0; Thu, 9 Aug 2018 16:23:32 +0000 (UTC) Received: from mail-wm0-f67.google.com (mail-wm0-f67.google.com [74.125.82.67]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id CD4C1C057F87; Thu, 9 Aug 2018 16:23:30 +0000 (UTC) Received: by mail-wm0-f67.google.com with SMTP id q8-v6so872928wmq.4; Thu, 09 Aug 2018 09:23:30 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id; bh=Il/MhC8TfNCOJ5TwxrBMfQbgnE3LjB6EfJ9iLALWR2M=; b=lzjDY80NDcCbE8XBqxVf3sGnrcTSFXLbu+IMQ8IAuIVJuqDMqSWj85VlR+c0rdTBGG gjp5Yj+/SGKCazwNGE0JncW4TVMr/zvwzO/iQ+jSknpX8Y2Es/qBLg0QiysA5C3pb62T k78woxzThFUtYrq9Vsx0l2gG1RiG40qOcer0b/SHwkSPTmEgIEILVdoIDflKXY5j2nmt 0t2xbJpyYxdDTremVJYmyG0h82xciIZoYPMrQ3wx/bNFykptbkyu+CMbBMfL8Pu5Pa62 8b3aDYfTLD6GTSHPgpsuXfKXyWs8L0d/M8EYt7c4cJd9ssVXUA8oDtFFHSlaa44S6FK0 Nykg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id; bh=Il/MhC8TfNCOJ5TwxrBMfQbgnE3LjB6EfJ9iLALWR2M=; b=BizgAS7ECIM/UpDMs8QI0vJlmS6uEUlfQaglIOqLWzv94y8WtXj1JUoAxI26vnSCVB bAuNVtpDMqFgxQVeepLCs7kw406SzXBQ4SO6kB1+nzIGKgD3NDdLTdcbq/Wg13Vk4JPr oGqGMIo/izL6Dea8WvlbzVQmNWvtu2kXHwv7d/FKCZtw7yA0eAmxm3HdqJdvntGBC9da 43ZxtAaayGxOTWK7ZFFAUY5lSW2B4XmUZtII6R6u4RuD/xoxZJotFMlgJGgbX8QzhFQ1 v0iiUYtOQcNf/KJvsO+6Rf1Z05/JqBvTjdjgp/THCekwlVVqiO+QOxRzrDyx4N557V45 7+pg== X-Gm-Message-State: AOUpUlFa6Pk8WLVnJ2VwV1nAcAcVwk57CzKuQ2X3/xFbq+8Ab5fhoUuY PeN7lVgAfDuDGgAEC1PFaqCN0/g1 X-Google-Smtp-Source: AA+uWPziGpiKrBCa1NNZNxbYGgfUb+XFYRnIUnt7CAiPOlUWK6dSha6N7SpeBRlhjfoqcvHr10r4Nw== X-Received: by 2002:a1c:c64f:: with SMTP id w76-v6mr2139161wmf.3.1533831809261; Thu, 09 Aug 2018 09:23:29 -0700 (PDT) Received: from orange.redhat.com (ip-86-49-115-215.net.upcbroadband.cz. [86.49.115.215]) by smtp.gmail.com with ESMTPSA id j8-v6sm7463582wrp.11.2018.08.09.09.23.28 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Thu, 09 Aug 2018 09:23:28 -0700 (PDT) From: Ilya Dryomov To: dm-devel@redhat.com Date: Thu, 9 Aug 2018 18:22:57 +0200 Message-Id: <20180809162257.30800-1-idryomov@gmail.com> X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.32]); Thu, 09 Aug 2018 16:23:31 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.32]); Thu, 09 Aug 2018 16:23:31 +0000 (UTC) for IP:'74.125.82.67' DOMAIN:'mail-wm0-f67.google.com' HELO:'mail-wm0-f67.google.com' FROM:'idryomov@gmail.com' RCPT:'' X-RedHat-Spam-Score: -0.13 (DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, FREEMAIL_FROM, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL, SPF_PASS) 74.125.82.67 mail-wm0-f67.google.com 74.125.82.67 mail-wm0-f67.google.com X-Scanned-By: MIMEDefang 2.78 on 10.5.110.32 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.25 X-loop: dm-devel@redhat.com X-Mailman-Approved-At: Fri, 10 Aug 2018 04:51:18 -0400 Cc: Joe Thornber , Mike Snitzer Subject: [dm-devel] [PATCH] dm cache metadata: set dirty on all cache blocks after a crash X-BeenThere: dm-devel@redhat.com X-Mailman-Version: 2.1.12 Precedence: junk List-Id: device-mapper development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Sender: dm-devel-bounces@redhat.com Errors-To: dm-devel-bounces@redhat.com X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.31]); Fri, 10 Aug 2018 08:51:46 +0000 (UTC) X-Virus-Scanned: ClamAV using ClamSMTP Quoting Documentation/device-mapper/cache.txt: The 'dirty' state for a cache block changes far too frequently for us to keep updating it on the fly. So we treat it as a hint. In normal operation it will be written when the dm device is suspended. If the system crashes all cache blocks will be assumed dirty when restarted. This got broken in commit f177940a8091 ("dm cache metadata: switch to using the new cursor api for loading metadata") in 4.9, which removed the code that consulted cmd->clean_when_opened (CLEAN_SHUTDOWN on-disk flag) when loading cache blocks. This results in data corruption on an unclean shutdown with dirty cache blocks on the fast device. After the crash those blocks are considired clean and may get evicted from the cache at any time. This can be demonstrated by doing a lot of reads to trigger individual evictions, but uncache is more predictable: ### Disable auto-activation in lvm.conf to be able to do uncache in ### time (i.e. see uncache doing flushing) when the fix is applied. # xfs_io -d -c 'pwrite -b 4M -S 0xaa 0 1G' /dev/vdb # vgcreate vg_cache /dev/vdb /dev/vdc # lvcreate -L 1G -n lv_slowdev vg_cache /dev/vdb # lvcreate -L 512M -n lv_cachedev vg_cache /dev/vdc # lvcreate -L 256M -n lv_metadev vg_cache /dev/vdc # lvconvert --type cache-pool --cachemode writeback vg_cache/lv_cachedev --poolmetadata vg_cache/lv_metadev # lvconvert --type cache vg_cache/lv_slowdev --cachepool vg_cache/lv_cachedev # xfs_io -d -c 'pwrite -b 4M -S 0xbb 0 512M' /dev/mapper/vg_cache-lv_slowdev # xfs_io -d -c 'pread -v 254M 512' /dev/mapper/vg_cache-lv_slowdev | head -n 2 0fe00000: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................ 0fe00010: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................ # dmsetup status vg_cache-lv_slowdev 0 2097152 cache 8 27/65536 128 8192/8192 1 100 0 0 0 8192 7065 2 metadata2 writeback 2 migration_threshold 2048 smq 0 rw - ^^^^ 7065 * 64k = 441M yet to be written to the slow device # echo b >/proc/sysrq-trigger # vgchange -ay vg_cache # xfs_io -d -c 'pread -v 254M 512' /dev/mapper/vg_cache-lv_slowdev | head -n 2 0fe00000: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................ 0fe00010: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................ # lvconvert --uncache vg_cache/lv_slowdev Flushing 0 blocks for cache vg_cache/lv_slowdev. Logical volume "lv_cachedev" successfully removed Logical volume vg_cache/lv_slowdev is not cached. # xfs_io -d -c 'pread -v 254M 512' /dev/mapper/vg_cache-lv_slowdev | head -n 2 0fe00000: aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa ................ 0fe00010: aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa ................ This is the case with both v1 and v2 cache pool metatata formats. After applying this patch: # vgchange -ay vg_cache # xfs_io -d -c 'pread -v 254M 512' /dev/mapper/vg_cache-lv_slowdev | head -n 2 0fe00000: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................ 0fe00010: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................ # lvconvert --uncache vg_cache/lv_slowdev Flushing 3724 blocks for cache vg_cache/lv_slowdev. ... Flushing 71 blocks for cache vg_cache/lv_slowdev. Logical volume "lv_cachedev" successfully removed Logical volume vg_cache/lv_slowdev is not cached. # xfs_io -d -c 'pread -v 254M 512' /dev/mapper/vg_cache-lv_slowdev | head -n 2 0fe00000: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................ 0fe00010: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................ Cc: stable@vger.kernel.org Fixes: f177940a8091 ("dm cache metadata: switch to using the new cursor api for loading metadata") Signed-off-by: Ilya Dryomov --- drivers/md/dm-cache-metadata.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/drivers/md/dm-cache-metadata.c b/drivers/md/dm-cache-metadata.c index 0d7212410e21..8294a780b579 100644 --- a/drivers/md/dm-cache-metadata.c +++ b/drivers/md/dm-cache-metadata.c @@ -1322,6 +1322,7 @@ static int __load_mapping_v1(struct dm_cache_metadata *cmd, dm_oblock_t oblock; unsigned flags; + bool dirty = true; dm_array_cursor_get_value(mapping_cursor, (void **) &mapping_value_le); memcpy(&mapping, mapping_value_le, sizeof(mapping)); @@ -1332,8 +1333,10 @@ static int __load_mapping_v1(struct dm_cache_metadata *cmd, dm_array_cursor_get_value(hint_cursor, (void **) &hint_value_le); memcpy(&hint, hint_value_le, sizeof(hint)); } + if (cmd->clean_when_opened) + dirty = flags & M_DIRTY; - r = fn(context, oblock, to_cblock(cb), flags & M_DIRTY, + r = fn(context, oblock, to_cblock(cb), dirty, le32_to_cpu(hint), hints_valid); if (r) { DMERR("policy couldn't load cache block %llu", @@ -1361,7 +1364,7 @@ static int __load_mapping_v2(struct dm_cache_metadata *cmd, dm_oblock_t oblock; unsigned flags; - bool dirty; + bool dirty = true; dm_array_cursor_get_value(mapping_cursor, (void **) &mapping_value_le); memcpy(&mapping, mapping_value_le, sizeof(mapping)); @@ -1372,8 +1375,9 @@ static int __load_mapping_v2(struct dm_cache_metadata *cmd, dm_array_cursor_get_value(hint_cursor, (void **) &hint_value_le); memcpy(&hint, hint_value_le, sizeof(hint)); } + if (cmd->clean_when_opened) + dirty = dm_bitset_cursor_get_value(dirty_cursor); - dirty = dm_bitset_cursor_get_value(dirty_cursor); r = fn(context, oblock, to_cblock(cb), dirty, le32_to_cpu(hint), hints_valid); if (r) {