From patchwork Fri Nov 27 09:20:55 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sargun Dhillon X-Patchwork-Id: 11935209 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4A3FAC64E75 for ; Fri, 27 Nov 2020 09:21:16 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 01F822065E for ; Fri, 27 Nov 2020 09:21:15 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=sargun.me header.i=@sargun.me header.b="HDolisLK" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728023AbgK0JVJ (ORCPT ); Fri, 27 Nov 2020 04:21:09 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42034 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726014AbgK0JVI (ORCPT ); Fri, 27 Nov 2020 04:21:08 -0500 Received: from mail-pg1-x541.google.com (mail-pg1-x541.google.com [IPv6:2607:f8b0:4864:20::541]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 98577C0613D1 for ; Fri, 27 Nov 2020 01:21:08 -0800 (PST) Received: by mail-pg1-x541.google.com with SMTP id k11so3893262pgq.2 for ; Fri, 27 Nov 2020 01:21:08 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sargun.me; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=363/J7FI+YxpDHZBVwKbULZE4SMrK80CjI/t3VX+JKo=; 
b=HDolisLKeNAEQQTAJLzgVwk1K3d89nsQAXqrAwBs/8B5KxgdeZmaDrkUzA4aXOruDe LsnoAJfJxmngvDz/I0XpUrYkmLeVNpBa5iOU+UyoeG3Ex+MSlZsroy/T4Lca+9Z+eFaG MGT+lFob3K4ZK1fsisjG4sk2mVzNgNJSy/s90= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=363/J7FI+YxpDHZBVwKbULZE4SMrK80CjI/t3VX+JKo=; b=fwPfrQKh5E0XzDT36SZW9slgE8lSuT0f2NwZtFEOiVizqb/2A8eO+B6HiEKQsyjKeS agfYuFrEgiaZHiHX6Y8MTBXD3v+PkXmkEJLqLm5pPCsfcflW5KB1l/IOwBDs/zrHaywg yPF5DoPD0eqwn3uCrv0uw088cR+aIl9C9/o35EdFDY2C/ITDRqfnM74Ar5CVMgH9QZQR faUSS/udIGvwy8SFczf9k68XTGKQDwIHvwlksoLc0AZmdyWNlQ0PAoOZk3k6j/TwuZa7 dwX1bSr7rPsSfmNRXppHvGg68MA3qUMV5fkYtPnhRRo7Nkaov7iEFWGr3VoDnGcIHwnv CWXw== X-Gm-Message-State: AOAM530PI9difY453LCY8tTm7CLMZcjeRcDUWfgd1VbOXe2jpoGr/vUw ZY6NW2PD9m4n0apHLmh7sm/kpQ== X-Google-Smtp-Source: ABdhPJw0CK18w9KERQ13VRm/od39VBC87CyOcbdF7TfrPfunyw6nFTO727BRKDp9c7TgTCStElzIjQ== X-Received: by 2002:a17:90a:4283:: with SMTP id p3mr8846906pjg.174.1606468868120; Fri, 27 Nov 2020 01:21:08 -0800 (PST) Received: from ubuntu.netflix.com (203.20.25.136.in-addr.arpa. 
[136.25.20.203]) by smtp.gmail.com with ESMTPSA id t9sm9938944pjq.46.2020.11.27.01.21.06 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 27 Nov 2020 01:21:07 -0800 (PST) From: Sargun Dhillon To: linux-unionfs@vger.kernel.org, miklos@szeredi.hu, Alexander Viro , Amir Goldstein Cc: Sargun Dhillon , Giuseppe Scrivano , Vivek Goyal , Daniel J Walsh , linux-fsdevel@vger.kernel.org, David Howells Subject: [PATCH v2 1/4] fs: Add s_instance_id field to superblock for unique identification Date: Fri, 27 Nov 2020 01:20:55 -0800 Message-Id: <20201127092058.15117-2-sargun@sargun.me> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20201127092058.15117-1-sargun@sargun.me> References: <20201127092058.15117-1-sargun@sargun.me> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org This assigns a per-boot unique number to each superblock. This allows other components to know whether a filesystem has been remounted since they last interacted with it. At every boot it is reset to 0. There is no specific reason it is set to 0, other than repeatability versus using some random starting number. Because of this, you must store it alongside some other piece of data which is initialized at boot time. This doesn't have any of the overhead of idr, and a u64 won't wrap any time soon. There is no forward lookup requirement, so an idr is not needed. In the future, we may want to expose this to userspace. Userspace programs can benefit from this if they have large chunks of dirty or mmapped memory that they're interacting with, and they want to see if that volume has been unmounted and remounted. With this, and a mechanism to inspect the superblock's errseq, a user can determine whether they need to throw away their cache or similar. This is another benefit in comparison to just using a pointer to the superblock to uniquely identify it. 
Although this doesn't expose an ioctl or similar yet, in the future we could add an ioctl that allows for fetching the s_instance_id for a given cache, and inspection of the errseq associated with that. Signed-off-by: Sargun Dhillon Cc: David Howells Cc: Al Viro Cc: linux-fsdevel@vger.kernel.org Cc: linux-unionfs@vger.kernel.org --- fs/super.c | 3 +++ include/linux/fs.h | 7 +++++++ 2 files changed, 10 insertions(+) diff --git a/fs/super.c b/fs/super.c index 904459b35119..e47ace7f8c3d 100644 --- a/fs/super.c +++ b/fs/super.c @@ -42,6 +42,7 @@ static int thaw_super_locked(struct super_block *sb); +static u64 s_instance_id_counter; static LIST_HEAD(super_blocks); static DEFINE_SPINLOCK(sb_lock); @@ -546,6 +547,7 @@ struct super_block *sget_fc(struct fs_context *fc, s->s_iflags |= fc->s_iflags; strlcpy(s->s_id, s->s_type->name, sizeof(s->s_id)); list_add_tail(&s->s_list, &super_blocks); + s->s_instance_id = s_instance_id_counter++; hlist_add_head(&s->s_instances, &s->s_type->fs_supers); spin_unlock(&sb_lock); get_filesystem(s->s_type); @@ -625,6 +627,7 @@ struct super_block *sget(struct file_system_type *type, s->s_type = type; strlcpy(s->s_id, type->name, sizeof(s->s_id)); list_add_tail(&s->s_list, &super_blocks); + s->s_instance_id = s_instance_id_counter++; hlist_add_head(&s->s_instances, &type->fs_supers); spin_unlock(&sb_lock); get_filesystem(type); diff --git a/include/linux/fs.h b/include/linux/fs.h index 7519ae003a08..09bf54382a54 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1472,6 +1472,13 @@ struct super_block { char s_id[32]; /* Informational name */ uuid_t s_uuid; /* UUID */ + /* + * ID identifying this particular instance of the superblock. It can + * be used to determine if a particular filesystem has been remounted. + * It may be exposed to userspace. 
+ */ + u64 s_instance_id; + unsigned int s_max_links; fmode_t s_mode; From patchwork Fri Nov 27 09:20:56 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sargun Dhillon X-Patchwork-Id: 11935213 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id DC97FC64E7D for ; Fri, 27 Nov 2020 09:21:16 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 85516206D8 for ; Fri, 27 Nov 2020 09:21:16 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=sargun.me header.i=@sargun.me header.b="XCbkKPcW" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728047AbgK0JVK (ORCPT ); Fri, 27 Nov 2020 04:21:10 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42042 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726014AbgK0JVK (ORCPT ); Fri, 27 Nov 2020 04:21:10 -0500 Received: from mail-pf1-x443.google.com (mail-pf1-x443.google.com [IPv6:2607:f8b0:4864:20::443]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 69DAFC0613D1 for ; Fri, 27 Nov 2020 01:21:10 -0800 (PST) Received: by mail-pf1-x443.google.com with SMTP id y7so3991442pfq.11 for ; Fri, 27 Nov 2020 01:21:10 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sargun.me; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; 
bh=N3znUkPKtqbryPODPnE7qItfA1c9fsYLlSw8ARlE2iU=; b=XCbkKPcW90nk8ZdKWXtOH5i9ZE+g53qlXZ8cLPT4L1dCcgEvMy8gG5O6KOjkCiRTS6 T1sT0tOswqbzJT3gDHM0Pr4X0oa1KJkNX6xjd2KuxpINgXtXWLgF9ffxKbDKP1rjpMFz nxxOy0yG5pnW+FbC2hlo0yE1i2S7eA9V7xfbA= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=N3znUkPKtqbryPODPnE7qItfA1c9fsYLlSw8ARlE2iU=; b=Jm4P3++k1+037349M/tFc2MWDHZH6svXZ9qUD3b/ZrRDnSZytB0tt78j1EnqzMeG58 A0yPyPOWVl6Lo9TmKYHW9QFef10bS8ojTQU6hxMN8KJ19dvohJVxrBH6VZ8uIiX3ofNF hQYU5Qd0LXL9j62exRUba2AGocu5Rcz5rH38kIoC0wituPxXISc9UC+UdmAd7vIo7sYp DIliVBAvBKlXxwrwTPYKT8Ert5aPX04FuOL/BuUrBgo8zbUJVNgjdneyD91wLmYaZCQP EmLk0OlwB88QwBERi7yl7szR3ZyTHbo/f82ahv5Pk71OQrBlZHtYp4PrdlXegHt8f87X ksuA== X-Gm-Message-State: AOAM5329LwOURXjZjCX4K9gwYgyx5MgpddrDpGqLWmkqHLM9RPx5ou3c M/IxnUaNllQMF7do/XFmxDWFMw== X-Google-Smtp-Source: ABdhPJy+Nm0BqxYRvhxozEhyB0RB1iAA9/rcKtYRc0oHpzl11BL3ij3EUW9jZDE3KklG4mSjjnCSJA== X-Received: by 2002:a17:90a:2a83:: with SMTP id j3mr9035135pjd.84.1606468869856; Fri, 27 Nov 2020 01:21:09 -0800 (PST) Received: from ubuntu.netflix.com (203.20.25.136.in-addr.arpa. 
[136.25.20.203]) by smtp.gmail.com with ESMTPSA id t9sm9938944pjq.46.2020.11.27.01.21.08 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 27 Nov 2020 01:21:09 -0800 (PST) From: Sargun Dhillon To: linux-unionfs@vger.kernel.org, miklos@szeredi.hu, Alexander Viro , Amir Goldstein Cc: Sargun Dhillon , Giuseppe Scrivano , Vivek Goyal , Daniel J Walsh , linux-fsdevel@vger.kernel.org, David Howells Subject: [PATCH v2 2/4] overlay: Document current outstanding shortcoming of volatile Date: Fri, 27 Nov 2020 01:20:56 -0800 Message-Id: <20201127092058.15117-3-sargun@sargun.me> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20201127092058.15117-1-sargun@sargun.me> References: <20201127092058.15117-1-sargun@sargun.me> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org This documents behaviour that was discussed in a thread about the volatile feature: specifically, how failures of asynchronous writes (such as writes through mmap, or writes that are not immediately flushed to the filesystem) can go unnoticed. Although we pass through calls like msync, fallocate, and write, and still return errors from those, there is no guarantee that every error surfaces at those times, so some errors may stay hidden. In the future, we can add error checking to all interactions with the upperdir, and pass through errseq_t from the upperdir on mappings and other interactions with the filesystem[1]. 
[1]: https://lore.kernel.org/linux-unionfs/20201116045758.21774-1-sargun@sargun.me/T/#m7d501f375e031056efad626e471a1392dd3aad33 Signed-off-by: Sargun Dhillon Cc: linux-fsdevel@vger.kernel.org Cc: linux-unionfs@vger.kernel.org Cc: Miklos Szeredi Cc: Amir Goldstein Cc: Vivek Goyal --- Documentation/filesystems/overlayfs.rst | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/Documentation/filesystems/overlayfs.rst b/Documentation/filesystems/overlayfs.rst index 580ab9a0fe31..c6e30c1bc2f2 100644 --- a/Documentation/filesystems/overlayfs.rst +++ b/Documentation/filesystems/overlayfs.rst @@ -570,7 +570,11 @@ Volatile mount This is enabled with the "volatile" mount option. Volatile mounts are not guaranteed to survive a crash. It is strongly recommended that volatile mounts are only used if data written to the overlay can be recreated -without significant effort. +without significant effort. In addition to this, the sync family of syscalls +are not sufficient to determine whether a write failed as sync calls are +omitted. For this reason, it is important that the filesystem used by the +upperdir handles failure in a fashion that's suitable for the user. For +example, upon detecting a fault, ext4 can be configured to panic. The advantage of mounting with the "volatile" option is that all forms of sync calls to the upper filesystem are omitted. 
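For contrast with the behaviour documented above, here is a minimal userspace sketch of how a writer usually learns about a failed write on a regular (non-volatile) mount: write, then fsync, and treat a failure from either call as data that may not have reached the disk. The helper name write_and_sync_sketch and whatever path is passed to it are hypothetical and not part of this series. Since the documentation says sync calls to the upper filesystem are omitted on a volatile mount, a clean return from a helper like this would not show that the data is durable there.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only: write len bytes to path and
 * then fsync the file. Returns 0 on success or a negative errno value.
 */
static int write_and_sync_sketch(const char *path, const void *buf, size_t len)
{
	const char *p = buf;
	int err = 0;
	int fd;

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -errno;

	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, retry the write */
			err = -errno;
			break;
		}
		p += n;
		len -= (size_t)n;
	}

	/* On a regular mount, this is where writeback errors get reported. */
	if (!err && fsync(fd) < 0)
		err = -errno;

	/* Keep the first error, but do not ignore a failing close. */
	if (close(fd) < 0 && !err)
		err = -errno;

	return err;
}
```

A caller would check the result with something like "if (write_and_sync_sketch(path, data, len) < 0)" and regenerate the data on failure. On a volatile overlay that check is not enough on its own, which is why the text above recommends an upperdir filesystem whose failure handling suits the user.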
From patchwork Fri Nov 27 09:20:57 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sargun Dhillon X-Patchwork-Id: 11935215 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 86E71C63777 for ; Fri, 27 Nov 2020 09:21:17 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 0E4C6206D8 for ; Fri, 27 Nov 2020 09:21:17 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=sargun.me header.i=@sargun.me header.b="yK1PbGT9" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728077AbgK0JVN (ORCPT ); Fri, 27 Nov 2020 04:21:13 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42052 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728056AbgK0JVM (ORCPT ); Fri, 27 Nov 2020 04:21:12 -0500 Received: from mail-pf1-x443.google.com (mail-pf1-x443.google.com [IPv6:2607:f8b0:4864:20::443]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6F824C0613D1 for ; Fri, 27 Nov 2020 01:21:12 -0800 (PST) Received: by mail-pf1-x443.google.com with SMTP id n137so4023635pfd.3 for ; Fri, 27 Nov 2020 01:21:12 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sargun.me; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=CtvcgZK9iYPDW2XGOFNWnwy2h9J2gIbXQeuOPZlJjW0=; 
b=yK1PbGT9kHNND4Ld+pGseZLGkvfFlYz7M8+NVdsqpuwYBbFPyWdacCsXIOzISCodi7 Y1pAYcEMZPPvCSoVqVGsp8hxa1VCy9wN3DXnxMPm6NLS9n3IZao3O3JcfAEuRlFpEr3c JgokqPECKGmBi7gPeK4i2w7b4qGBvo389gDBo= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=CtvcgZK9iYPDW2XGOFNWnwy2h9J2gIbXQeuOPZlJjW0=; b=PRHRvuXqiperLD76+m7eRVVija0QiJxhl2q1VFSBg2vZErIy24vQwyebjJt4qiSbLe 6KwsZS43T3yt4codWeNjrtHT387Qf5iVmDNAQk/ncv5ViAwggE9S61dkLX2nlvAkdO24 mgr3gqNv/m9U86eKY8T7utup4ITXOn+lEs50/7BndOirUXTVjlXbcHEH+W3AEGokWkdD VhLyqIYY0MjLATSZpC0IP0ScLUsBnQMy6nEyUnEO/8ImWRgE8gEA1lUo1pKbRrHiSM8I 7fRM/kTuTq9UMeAGu0b/37FLWx2gaYABQ1c6YclPtOLkXbpIKv8JO7fbeO6i6YcwTBUB VVrA== X-Gm-Message-State: AOAM531OORMvjWYkMNOuyGapsCRzAXQjfGpIlNLCD2zzaeUUrw3f0mZ6 2ubwz8ig3uGxMZ2G9da2syZxcQ== X-Google-Smtp-Source: ABdhPJxPrOjTKbpDK5Tsejg5kFvDpC1doqjiJ3Nm3sMmUS0iPNPo6h4eO7X/g9KRl3j0ddnWF8G8jg== X-Received: by 2002:a17:90a:8402:: with SMTP id j2mr8837033pjn.120.1606468871882; Fri, 27 Nov 2020 01:21:11 -0800 (PST) Received: from ubuntu.netflix.com (203.20.25.136.in-addr.arpa. 
[136.25.20.203]) by smtp.gmail.com with ESMTPSA id t9sm9938944pjq.46.2020.11.27.01.21.10 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 27 Nov 2020 01:21:11 -0800 (PST) From: Sargun Dhillon To: linux-unionfs@vger.kernel.org, miklos@szeredi.hu, Alexander Viro , Amir Goldstein Cc: Sargun Dhillon , Giuseppe Scrivano , Vivek Goyal , Daniel J Walsh , linux-fsdevel@vger.kernel.org, David Howells Subject: [PATCH v2 3/4] overlay: Add the ability to remount volatile directories when safe Date: Fri, 27 Nov 2020 01:20:57 -0800 Message-Id: <20201127092058.15117-4-sargun@sargun.me> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20201127092058.15117-1-sargun@sargun.me> References: <20201127092058.15117-1-sargun@sargun.me> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Overlayfs added the ability to set up mounts where all syncs could be short-circuited in commit 2a99ddacee43 (ovl: provide a mount option "volatile"). A user might want to remount this fs, but we do not allow it because of the "incompat" detection feature. In the case of volatile, it is safe to do something like[1]:

  $ sync -f /root/upperdir
  $ rm -rf /root/workdir/incompat/volatile

There are two ways to go about this. You can call sync on the underlying filesystem, check the error code, and delete the dirty file if everything is clean. If you're running lots of containers on the same filesystem, or you want to avoid all unnecessary I/O, this may be suboptimal. Alternatively, you can blindly delete the dirty file, and "hope for the best". This patch introduces transparent functionality to check if it is (relatively) safe to reuse the upperdir. It ensures that the filesystem hasn't been remounted, the system hasn't been rebooted, and the overlayfs code hasn't changed. Since the structure is explicitly not meant to be used between different versions of the code, its stability does not matter so much. 
[1]: https://lore.kernel.org/linux-unionfs/CAOQ4uxhKr+j5jFyEC2gJX8E8M19mQ3CqdTYaPZOvDQ9c0qLEzw@mail.gmail.com/T/#m6abe713e4318202ad57f301bf28a414e1d824f9c Signed-off-by: Sargun Dhillon Cc: linux-fsdevel@vger.kernel.org Cc: linux-unionfs@vger.kernel.org Cc: Miklos Szeredi Cc: Amir Goldstein Cc: Vivek Goyal Reported-by: kernel test robot Reviewed-by: Amir Goldstein Reported-by: kernel test robot Reported-by: Dan Carpenter --- Documentation/filesystems/overlayfs.rst | 18 +++-- fs/overlayfs/overlayfs.h | 37 +++++++++- fs/overlayfs/readdir.c | 98 ++++++++++++++++++++++--- fs/overlayfs/super.c | 73 +++++++++++++----- fs/overlayfs/util.c | 2 + 5 files changed, 190 insertions(+), 38 deletions(-) diff --git a/Documentation/filesystems/overlayfs.rst b/Documentation/filesystems/overlayfs.rst index c6e30c1bc2f2..b485fdb65b85 100644 --- a/Documentation/filesystems/overlayfs.rst +++ b/Documentation/filesystems/overlayfs.rst @@ -579,13 +579,17 @@ example, upon detecting a fault, ext4 can be configured to panic. The advantage of mounting with the "volatile" option is that all forms of sync calls to the upper filesystem are omitted. -When overlay is mounted with "volatile" option, the directory -"$workdir/work/incompat/volatile" is created. During next mount, overlay -checks for this directory and refuses to mount if present. This is a strong -indicator that user should throw away upper and work directories and create -fresh one. In very limited cases where the user knows that the system has -not crashed and contents of upperdir are intact, The "volatile" directory -can be removed. +When overlay is mounted with the "volatile" option, the directory +"$workdir/work/incompat/volatile" is created. This acts as an indicator +that the user should throw away upper and work directories and create fresh +ones. In some cases, the overlayfs can detect if the upperdir can be +reused safely in a subsequent volatile mount, and mounting will proceed as +normal. 
If the filesystem is unable to determine if this is safe (due to a +reboot, upgraded kernel code, or loss of checkpoint, etc...), the user may +bypass these safety checks and remove the "volatile" directory if they know +the system did not encounter a fault and the contents of the upperdir are +intact. Then, the user can remount the filesystem as normal. + Testsuite --------- diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h index f8880aa2ba0e..de694ee99d7c 100644 --- a/fs/overlayfs/overlayfs.h +++ b/fs/overlayfs/overlayfs.h @@ -32,8 +32,13 @@ enum ovl_xattr { OVL_XATTR_NLINK, OVL_XATTR_UPPER, OVL_XATTR_METACOPY, + OVL_XATTR_VOLATILE, }; +#define OVL_INCOMPATDIR_NAME "incompat" +#define OVL_VOLATILEDIR_NAME "volatile" +#define OVL_VOLATILE_DIRTY_NAME "dirty" + enum ovl_inode_flag { /* Pure upper dir that may contain non pure upper entries */ OVL_IMPURE, @@ -57,6 +62,31 @@ enum { OVL_XINO_ON, }; +/* + * This is copied into the volatile xattr, and the user does not interact with + * it. There is no stability requirement, as a reboot explicitly invalidates + * a volatile workdir. It is explicitly meant not to be a stable api. + * + * Although this structure isn't meant to be stable it is exposed to potentially + * unprivileged users. We don't do any kind of cryptographic operations with + * the structure, so it could be tampered with, or inspected. Don't put + * kernel memory pointers in it, or anything else that could cause problems, + * or information disclosure. + */ +struct ovl_volatile_info { + /* + * This uniquely identifies a boot, and is reset if overlayfs itself + * is reloaded. Therefore we check our current / known boot_id + * against this before looking at any other fields to validate: + * 1. Is this datastructure laid out in the way we expect? (Overlayfs + * module, reboot, etc...) + * 2. 
Could something have changed (like the s_instance_id counter + * resetting) + */ + uuid_t ovl_boot_id; /* Must stay first member */ + u64 s_instance_id; +} __packed; + /* * The tuple (fh,uuid) is a universal unique identifier for a copy up origin, * where: @@ -422,8 +452,8 @@ void ovl_cleanup_whiteouts(struct dentry *upper, struct list_head *list); void ovl_cache_free(struct list_head *list); void ovl_dir_cache_free(struct inode *inode); int ovl_check_d_type_supported(struct path *realpath); -int ovl_workdir_cleanup(struct inode *dir, struct vfsmount *mnt, - struct dentry *dentry, int level); +int ovl_workdir_cleanup(struct ovl_fs *ofs, struct inode *dir, + struct vfsmount *mnt, struct dentry *dentry, int level); int ovl_indexdir_cleanup(struct ovl_fs *ofs); /* inode.c */ @@ -520,3 +550,6 @@ int ovl_set_origin(struct dentry *dentry, struct dentry *lower, /* export.c */ extern const struct export_operations ovl_export_operations; + +/* super.c */ +extern uuid_t ovl_boot_id; diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c index 01620ebae1bd..7b66fbb20261 100644 --- a/fs/overlayfs/readdir.c +++ b/fs/overlayfs/readdir.c @@ -1080,10 +1080,78 @@ int ovl_check_d_type_supported(struct path *realpath) return rdd.d_type_supported; } +static int ovl_verify_volatile_info(struct ovl_fs *ofs, + struct dentry *volatiledir) +{ + int err; + struct ovl_volatile_info info; + + if (!volatiledir->d_inode) + return 0; + + if (!ofs->config.ovl_volatile) { + pr_debug("Mount is not volatile; upperdir is marked volatile\n"); + return -EINVAL; + } + + err = ovl_do_getxattr(ofs, volatiledir, OVL_XATTR_VOLATILE, &info, + sizeof(info)); + if (err < 0) { + pr_debug("Unable to read volatile xattr: %d\n", err); + return -EINVAL; + } + + if (err != sizeof(info)) { + pr_debug("%s xattr on-disk size is %d expected to read %zd\n", + ovl_xattr(ofs, OVL_XATTR_VOLATILE), err, sizeof(info)); + return -EINVAL; + } + + if (!uuid_equal(&ovl_boot_id, &info.ovl_boot_id)) { + pr_debug("boot id has 
changed (reboot or module reloaded)\n"); + return -EINVAL; + } + + if (volatiledir->d_sb->s_instance_id != info.s_instance_id) { + pr_debug("workdir has been unmounted and remounted\n"); + return -EINVAL; + } + + return 1; +} -#define OVL_INCOMPATDIR_NAME "incompat" +/* + * ovl_check_incompat checks this specific incompat entry for incompatibility. + * If it is found to be incompatible -EINVAL will be returned. + * + * If the directory should be preserved, then this function returns 1. + */ +static int ovl_check_incompat(struct ovl_fs *ofs, struct ovl_cache_entry *p, + struct path *path) +{ + int err = -EINVAL; + struct dentry *d; + + if (!strcmp(p->name, OVL_VOLATILEDIR_NAME)) { + d = lookup_one_len(p->name, path->dentry, p->len); + if (IS_ERR(d)) + return PTR_ERR(d); + + err = ovl_verify_volatile_info(ofs, d); + dput(d); + } + + if (err == -EINVAL) + pr_err("incompat feature '%s' cannot be mounted\n", p->name); + else + pr_debug("incompat '%s' handled: %d\n", p->name, err); + + /* d, if looked up, was already released above */ + return err; +} -static int ovl_workdir_cleanup_recurse(struct path *path, int level) +static int ovl_workdir_cleanup_recurse(struct ovl_fs *ofs, struct path *path, + int level) { int err; struct inode *dir = path->dentry->d_inode; @@ -1125,16 +1193,19 @@ static int ovl_workdir_cleanup_recurse(struct path *path, int level) if (p->len == 2 && p->name[1] == '.') continue; } else if (incompat) { - pr_err("overlay with incompat feature '%s' cannot be mounted\n", - p->name); - err = -EINVAL; - break; + err = ovl_check_incompat(ofs, p, path); + if (err < 0) + break; + /* Skip cleaning this */ + if (err == 1) + continue; } dentry = lookup_one_len(p->name, path->dentry, p->len); if (IS_ERR(dentry)) continue; if (dentry->d_inode) - err = ovl_workdir_cleanup(dir, path->mnt, dentry, level); + err = ovl_workdir_cleanup(ofs, dir, path->mnt, dentry, + level); dput(dentry); if (err) break; @@ -1142,11 +1213,13 @@ static int ovl_workdir_cleanup_recurse(struct path *path, int level) 
inode_unlock(dir); out: ovl_cache_free(&list); + if (incompat && err >= 0) + return 1; return err; } -int ovl_workdir_cleanup(struct inode *dir, struct vfsmount *mnt, - struct dentry *dentry, int level) +int ovl_workdir_cleanup(struct ovl_fs *ofs, struct inode *dir, + struct vfsmount *mnt, struct dentry *dentry, int level) { int err; @@ -1159,7 +1232,7 @@ int ovl_workdir_cleanup(struct inode *dir, struct vfsmount *mnt, struct path path = { .mnt = mnt, .dentry = dentry }; inode_unlock(dir); - err = ovl_workdir_cleanup_recurse(&path, level + 1); + err = ovl_workdir_cleanup_recurse(ofs, &path, level + 1); inode_lock_nested(dir, I_MUTEX_PARENT); if (!err) err = ovl_cleanup(dir, dentry); @@ -1206,9 +1279,10 @@ int ovl_indexdir_cleanup(struct ovl_fs *ofs) } /* Cleanup leftover from index create/cleanup attempt */ if (index->d_name.name[0] == '#') { - err = ovl_workdir_cleanup(dir, path.mnt, index, 1); - if (err) + err = ovl_workdir_cleanup(ofs, dir, path.mnt, index, 1); + if (err < 0) break; + err = 0; goto next; } err = ovl_verify_index(ofs, index); diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c index 290983bcfbb3..a8ee3ba4ebbd 100644 --- a/fs/overlayfs/super.c +++ b/fs/overlayfs/super.c @@ -15,6 +15,7 @@ #include #include #include +#include #include "overlayfs.h" MODULE_AUTHOR("Miklos Szeredi "); @@ -23,6 +24,7 @@ MODULE_LICENSE("GPL"); struct ovl_dir_cache; +uuid_t ovl_boot_id; #define OVL_MAX_STACK 500 @@ -722,20 +724,24 @@ static struct dentry *ovl_workdir_create(struct ovl_fs *ofs, goto out_unlock; retried = true; - err = ovl_workdir_cleanup(dir, mnt, work, 0); - dput(work); - if (err == -EINVAL) { - work = ERR_PTR(err); - goto out_unlock; + err = ovl_workdir_cleanup(ofs, dir, mnt, work, 0); + /* check if we should reuse the workdir */ + if (err != 1) { + dput(work); + if (err == -EINVAL) { + work = ERR_PTR(err); + goto out_unlock; + } + goto retry; } - goto retry; + } else { + work = ovl_create_real(dir, work, + OVL_CATTR(attr.ia_mode)); + err = 
PTR_ERR(work); + if (IS_ERR(work)) + goto out_err; } - work = ovl_create_real(dir, work, OVL_CATTR(attr.ia_mode)); - err = PTR_ERR(work); - if (IS_ERR(work)) - goto out_err; - /* * Try to remove POSIX ACL xattrs from workdir. We are good if: * @@ -1237,26 +1243,58 @@ static struct dentry *ovl_lookup_or_create(struct dentry *parent, return child; } +static int ovl_set_volatile_info(struct ovl_fs *ofs, struct dentry *volatiledir) +{ + int err; + struct ovl_volatile_info info = { + .s_instance_id = volatiledir->d_sb->s_instance_id, + }; + + uuid_copy(&info.ovl_boot_id, &ovl_boot_id); + err = ovl_do_setxattr(ofs, volatiledir, OVL_XATTR_VOLATILE, &info, + sizeof(info)); + + if (err == -EOPNOTSUPP) + return 0; + + return err; +} + /* * Creates $workdir/work/incompat/volatile/dirty file if it is not already * present. */ static int ovl_create_volatile_dirty(struct ovl_fs *ofs) { + int err; unsigned int ctr; - struct dentry *d = dget(ofs->workbasedir); + struct dentry *volatiledir, *d = dget(ofs->workbasedir); static const char *const volatile_path[] = { - OVL_WORKDIR_NAME, "incompat", "volatile", "dirty" + OVL_WORKDIR_NAME, + OVL_INCOMPATDIR_NAME, + OVL_VOLATILEDIR_NAME, }; const char *const *name = volatile_path; - for (ctr = ARRAY_SIZE(volatile_path); ctr; ctr--, name++) { - d = ovl_lookup_or_create(d, *name, ctr > 1 ? 
S_IFDIR : S_IFREG); + /* Create the volatile subdirectory that we put the xattr on */ + for (ctr = 0; ctr < ARRAY_SIZE(volatile_path); ctr++, name++) { + d = ovl_lookup_or_create(d, *name, S_IFDIR); if (IS_ERR(d)) return PTR_ERR(d); } - dput(d); - return 0; + volatiledir = dget(d); + + /* Ensure the dirty file exists before we set the xattr */ + d = ovl_lookup_or_create(d, OVL_VOLATILE_DIRTY_NAME, S_IFREG); + if (!IS_ERR(d)) { + dput(d); + err = ovl_set_volatile_info(ofs, volatiledir); + } else { + err = PTR_ERR(d); + } + + dput(volatiledir); + return err; } static int ovl_make_workdir(struct super_block *sb, struct ovl_fs *ofs, @@ -2044,6 +2082,7 @@ static int __init ovl_init(void) { int err; + uuid_gen(&ovl_boot_id); ovl_inode_cachep = kmem_cache_create("ovl_inode", sizeof(struct ovl_inode), 0, (SLAB_RECLAIM_ACCOUNT| diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c index 23f475627d07..87c9f5a063ed 100644 --- a/fs/overlayfs/util.c +++ b/fs/overlayfs/util.c @@ -580,6 +580,7 @@ bool ovl_check_dir_xattr(struct super_block *sb, struct dentry *dentry, #define OVL_XATTR_NLINK_POSTFIX "nlink" #define OVL_XATTR_UPPER_POSTFIX "upper" #define OVL_XATTR_METACOPY_POSTFIX "metacopy" +#define OVL_XATTR_VOLATILE_POSTFIX "volatile" #define OVL_XATTR_TAB_ENTRY(x) \ [x] = OVL_XATTR_PREFIX x ## _POSTFIX @@ -592,6 +593,7 @@ const char *ovl_xattr_table[] = { OVL_XATTR_TAB_ENTRY(OVL_XATTR_NLINK), OVL_XATTR_TAB_ENTRY(OVL_XATTR_UPPER), OVL_XATTR_TAB_ENTRY(OVL_XATTR_METACOPY), + OVL_XATTR_TAB_ENTRY(OVL_XATTR_VOLATILE), }; int ovl_check_setxattr(struct dentry *dentry, struct dentry *upperdentry, From patchwork Fri Nov 27 09:20:58 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sargun Dhillon X-Patchwork-Id: 11935217 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.7 required=3.0 
tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3D8DFC71155 for ; Fri, 27 Nov 2020 09:21:18 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id D56662065E for ; Fri, 27 Nov 2020 09:21:17 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=sargun.me header.i=@sargun.me header.b="BvVHKFHm" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728089AbgK0JVP (ORCPT ); Fri, 27 Nov 2020 04:21:15 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42062 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726014AbgK0JVO (ORCPT ); Fri, 27 Nov 2020 04:21:14 -0500 Received: from mail-pl1-x641.google.com (mail-pl1-x641.google.com [IPv6:2607:f8b0:4864:20::641]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 88885C0613D1 for ; Fri, 27 Nov 2020 01:21:14 -0800 (PST) Received: by mail-pl1-x641.google.com with SMTP id v21so2389207plo.12 for ; Fri, 27 Nov 2020 01:21:14 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sargun.me; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=okmQhfPLGgBjgjZXbAm+aIoWMwXVsIOZ9BcK51UFRVI=; b=BvVHKFHm+5yavD2kii5ezOno5mhVTxOxRqvPhwq2ZnepF4QcmilndtgKa4E4XKjivE QKdejxSbJlfZ/cy5wfV7pZTIzrVmELPefTzuRmRzPrquEAGqCyBYaDRE9+vLPHnPRa5h SCjMvFlBWRuy4HEG7ld/MUhxYm0Ai0wz7nBok= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; 
 bh=okmQhfPLGgBjgjZXbAm+aIoWMwXVsIOZ9BcK51UFRVI=;
 b=gXb+BefQSLZ2fTsEWFzOw7b2lrPRdPrc9o4siBNVAoB7bZ26KRh/OXZX86imrfPHxi
 jJ55no0mv5iKA8XW3jqwgrs7/9VZnzyi6oiHXxNKGMAhupce007kgG/PSzEU2z34nuEX
 Lgo5+FLKwW4GTpVKncTZ1AHeMx+QXCYiyuPq1ZuFhXVOrb8kqoKLCncV4XgfPa0nT4CO
 MtDahjB1KWIVxRy6d5MhPhRBp+48TwMOgTxcuK0b9vYxs7NtNyPc+tvVThiwcZKf1grS
 2LwLKXPvxP2h0VpgoP1cZDpfaFTW409BtdWlLOaXbf8GycqGtulXh6L/gF/mBP9d9tes
 3C2g==
X-Gm-Message-State: AOAM530S76GSRmIx9+2bUL7M0t06WNk39VcaOeAgMiPIGm2OvQsRm9uP
 RDnwsozDpYBSo51eMwmTLfm8lA==
X-Google-Smtp-Source: 
 ABdhPJxQVST1I6ABrDukfZAWfDWDJtHcCtYIHhwxiksn57cMXHQkRbzCSL6dkDHbM45XS3PnXYrN7A==
X-Received: by 2002:a17:902:8f82:b029:da:23e0:17d7 with SMTP id
 z2-20020a1709028f82b02900da23e017d7mr6225058plo.37.1606468873951;
 Fri, 27 Nov 2020 01:21:13 -0800 (PST)
Received: from ubuntu.netflix.com (203.20.25.136.in-addr.arpa.
 [136.25.20.203])
 by smtp.gmail.com with ESMTPSA id t9sm9938944pjq.46.2020.11.27.01.21.12
 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
 Fri, 27 Nov 2020 01:21:13 -0800 (PST)
From: Sargun Dhillon
To: linux-unionfs@vger.kernel.org, miklos@szeredi.hu,
 Alexander Viro, Amir Goldstein
Cc: Sargun Dhillon, Giuseppe Scrivano, Vivek Goyal, Daniel J Walsh,
 linux-fsdevel@vger.kernel.org, David Howells
Subject: [PATCH v2 4/4] overlay: Add rudimentary checking of writeback errseq
 on volatile remount
Date: Fri, 27 Nov 2020 01:20:58 -0800
Message-Id: <20201127092058.15117-5-sargun@sargun.me>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20201127092058.15117-1-sargun@sargun.me>
References: <20201127092058.15117-1-sargun@sargun.me>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-fsdevel@vger.kernel.org

Volatile remounts validate the following at the moment:
 * Has the module been reloaded / the system rebooted
 * Has the workdir been remounted

This adds a new check for errors detected via the superblock's
errseq_t. At mount time, the errseq_t is snapshotted to disk, and
upon remount it's re-verified.
This allows kernel-level detection of errors without forcing userspace
to perform a sync, and it catches writeback errors that would otherwise
stay hidden.

Signed-off-by: Sargun Dhillon
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-unionfs@vger.kernel.org
Cc: Miklos Szeredi
Cc: Amir Goldstein
Cc: Vivek Goyal
---
 fs/overlayfs/overlayfs.h | 1 +
 fs/overlayfs/readdir.c   | 6 ++++++
 fs/overlayfs/super.c     | 1 +
 3 files changed, 8 insertions(+)

diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index de694ee99d7c..e8a711953b64 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -85,6 +85,7 @@ struct ovl_volatile_info {
 	 */
 	uuid_t ovl_boot_id; /* Must stay first member */
 	u64 s_instance_id;
+	errseq_t errseq; /* Implemented as a u32 */
 } __packed;
 
 /*
diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c
index 7b66fbb20261..5795b28bb4cf 100644
--- a/fs/overlayfs/readdir.c
+++ b/fs/overlayfs/readdir.c
@@ -1117,6 +1117,12 @@ static int ovl_verify_volatile_info(struct ovl_fs *ofs,
 		return -EINVAL;
 	}
 
+	err = errseq_check(&volatiledir->d_sb->s_wb_err, info.errseq);
+	if (err) {
+		pr_debug("Workdir filesystem reports errors: %d\n", err);
+		return -EINVAL;
+	}
+
 	return 1;
 }
 
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index a8ee3ba4ebbd..2e473f8c75dd 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -1248,6 +1248,7 @@ static int ovl_set_volatile_info(struct ovl_fs *ofs, struct dentry *volatiledir)
 	int err;
 	struct ovl_volatile_info info = {
 		.s_instance_id = volatiledir->d_sb->s_instance_id,
+		.errseq = errseq_sample(&volatiledir->d_sb->s_wb_err),
 	};
 
 	uuid_copy(&info.ovl_boot_id, &ovl_boot_id);
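
For readers unfamiliar with the sample-then-check pattern this patch relies on, here is a rough userspace sketch of the idea. It is not kernel code: every sim_* name is hypothetical, and the model is deliberately simplified (as far as I understand, the real errseq_t also carries a "seen" flag that changes what errseq_sample() returns for errors nobody has observed yet, and that is left out here). The snapshot and verify helpers only mirror the shape of ovl_set_volatile_info() and ovl_verify_volatile_info() from the diff above.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for errseq_t: errno in the low bits, counter above. */
typedef uint32_t sim_errseq_t;

#define SIM_ERRNO_BITS 12u
#define SIM_ERRNO_MASK ((1u << SIM_ERRNO_BITS) - 1u)

/* Record a writeback error: store the errno and bump the counter. */
void sim_errseq_set(sim_errseq_t *eseq, int err)
{
	assert(err < 0 && (uint32_t)(-err) <= SIM_ERRNO_MASK);
	uint32_t counter = (*eseq >> SIM_ERRNO_BITS) + 1u;
	*eseq = (counter << SIM_ERRNO_BITS) | (uint32_t)(-err);
}

/* Take a snapshot of the current value (simplified: no "seen" handling). */
sim_errseq_t sim_errseq_sample(const sim_errseq_t *eseq)
{
	return *eseq;
}

/* Return 0 if nothing changed since the snapshot, else the latest errno. */
int sim_errseq_check(const sim_errseq_t *eseq, sim_errseq_t since)
{
	if (*eseq == since)
		return 0;
	return -(int)(*eseq & SIM_ERRNO_MASK);
}

/* Hypothetical miniature of the superblock fields the patch reads. */
struct sim_super_block {
	uint64_t s_instance_id;
	sim_errseq_t s_wb_err;
};

/* Hypothetical miniature of the on-disk ovl_volatile_info record. */
struct sim_volatile_info {
	uint64_t s_instance_id;
	sim_errseq_t errseq;
};

/* At mount time: snapshot the instance id and the writeback error state. */
struct sim_volatile_info sim_set_volatile_info(const struct sim_super_block *sb)
{
	struct sim_volatile_info info = {
		.s_instance_id = sb->s_instance_id,
		.errseq = sim_errseq_sample(&sb->s_wb_err),
	};
	return info;
}

/* At remount: 1 if the snapshot still matches, -EINVAL otherwise. */
int sim_verify_volatile_info(const struct sim_super_block *sb,
			     const struct sim_volatile_info *info)
{
	if (info->s_instance_id != sb->s_instance_id)
		return -EINVAL;
	if (sim_errseq_check(&sb->s_wb_err, info->errseq))
		return -EINVAL;
	return 1;
}
```

In this model, verifying right after taking the snapshot succeeds, while any sim_errseq_set() call between snapshot and verify makes the remount check fail with -EINVAL. Because the counter advances on every recorded error, a repeat of the same errno is still detected.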