From: Josef Bacik
To: kernel-team@fb.com, linux-fsdevel@vger.kernel.org, jack@suse.cz,
    amir73il@gmail.com, brauner@kernel.org, linux-xfs@vger.kernel.org,
    linux-bcachefs@vger.kernel.org, linux-btrfs@vger.kernel.org,
    linux-mm@kvack.org
Subject: [PATCH v5 00/18] fanotify: add pre-content hooks
Date: Wed, 4 Sep 2024 16:27:50 -0400
X-Patchwork-Id: 13791354

v4: https://lore.kernel.org/linux-fsdevel/cover.1723670362.git.josef@toxicpanda.com/
v3: https://lore.kernel.org/linux-fsdevel/cover.1723228772.git.josef@toxicpanda.com/
v2: https://lore.kernel.org/linux-fsdevel/cover.1723144881.git.josef@toxicpanda.com/
v1: https://lore.kernel.org/linux-fsdevel/cover.1721931241.git.josef@toxicpanda.com/

v4->v5:
- Cleaned up the various "I'll fix it on commit" notes that Jan made, since I
  had to respin the series anyway.
- Renamed the filemap pagefault helper for fsnotify per Christian's suggestion.
- Added a FS_ALLOW_HSM flag per Jan's comments, based on Amir's rough sketch.
- Added a patch to disable btrfs defrag on pre-content watched files.
- Added a patch to turn on FS_ALLOW_HSM for all the file systems that I tested.
- Added two fstests (which will be posted separately) to validate everything,
  and re-validated the series with btrfs, xfs, ext4, and bcachefs to make sure
  I didn't break anything.

v3->v4:
- Trying to send a final version Friday at 5pm before you go on vacation is a
  recipe for silly mistakes.  Fixed the xfs handling yet again, per Christoph's
  review.
- Reworked the file system helper so its handling of fpin was a little less
  silly, per Chinner's suggestion.
- Updated the return values to not OR in VM_FAULT_RETRY, as we have a comment
  in filemap_fault that says if VM_FAULT_ERROR is set we won't have
  VM_FAULT_RETRY set.

v2->v3:
- Fixed the pagefault path to do MAY_ACCESS instead, and updated the perm
  handler to emit PRE_ACCESS in this case, so we can avoid the extraneous perm
  event, as per Amir's suggestion.
- Reworked the exported helper so the per-filesystem changes are much smaller,
  per Amir's suggestion.
- Fixed the screwup for DAX writes per Chinner's suggestion.
- Added Christian's reviewed-by's where appropriate.

v1->v2:
- Reworked the page fault logic based on Jan's suggestion and turned it into a
  helper.
- Added 3 per-fs patches where we need to call the fsnotify helper from their
  ->fault handlers.
- Disabled readahead in the case that there's a pre-content watch in place.
- Disabled huge faults when there's a pre-content watch in place (entirely
  because it's untested, theoretically it should be straightforward to do).
- Updated the command numbers.
- Addressed the random spelling/grammar mistakes that Jan pointed out.
- Addressed the other random nits from Jan.
--- Original email ---

Hello,

These are the patches for the bare bones pre-content fanotify support.  The
majority of this work is Amir's; my contribution to this has solely been around
adding the page fault hooks, testing and validating everything.  I'm sending it
because Amir is traveling a bunch, and I touched it last so I'm going to take
all the hate and he can take all the credit.

There is a PoC that I've been using to validate this work, you can find the git
repo here

https://github.com/josefbacik/remote-fetch

This consists of 3 different tools.

1. populate.  This just creates all the stub files in the directory from the
   source directory.  Just run ./populate ~/linux ~/hsm-linux and it'll
   recursively create all of the stub files and directories.
2. remote-fetch.  This is the actual PoC, you just point it at the source and
   destination directory and then you can do whatever.  ./remote-fetch ~/linux
   ~/hsm-linux.
3. mmap-validate.  This was to validate the pagefault thing, this is likely what
   will be turned into the selftest with remote-fetch.  It creates a file and
   then you can validate the file matches the right pattern with both normal
   reads and mmap.  Normally I do something like

   ./mmap-validate create ~/src/foo
   ./populate ~/src ~/dst
   ./remote-fetch ~/src ~/dst
   ./mmap-validate validate ~/dst/foo

I did a bunch of testing, I also got some performance numbers.  I copied a
kernel tree, and then did remote-fetch, and then make -j4

Normal
real    9m49.709s
user    28m11.372s
sys     4m57.304s

HSM
real    10m6.454s
user    29m10.517s
sys     5m2.617s

So ~17 seconds more to build with HSM.  I then did a make mrproper on both trees
to see the size

[root@fedora ~]# du -hs /src/linux
1.6G    /src/linux

[root@fedora ~]# du -hs dst
125M    dst

This mirrors the sort of savings we've seen in production.

Meta has had these patches (minus the page fault patch) deployed in production
for almost a year with our own utility for doing on-demand package fetching.
The savings from this have been pretty significant.
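For anyone who wants to put the numbers above in relative terms, here is a
small sketch that redoes the arithmetic from the quoted timings and du output.
The helper names are made up for illustration, and the size figure is only
approximate because du -h rounds its output (the sketch assumes the G and M
suffixes are powers of 1024, which is what GNU du -h prints).

```python
def to_seconds(stamp: str) -> float:
    """Convert a `time`-style stamp such as '9m49.709s' into seconds."""
    minutes, seconds = stamp.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


# Timings copied from the cover letter (make -j4 on a kernel tree).
NORMAL = {"real": "9m49.709s", "user": "28m11.372s", "sys": "4m57.304s"}
HSM = {"real": "10m6.454s", "user": "29m10.517s", "sys": "5m2.617s"}


def overhead(kind: str) -> tuple[float, float]:
    """Return (extra seconds, extra percent) of the HSM run over the normal run."""
    base = to_seconds(NORMAL[kind])
    with_hsm = to_seconds(HSM[kind])
    delta = with_hsm - base
    return delta, 100.0 * delta / base


def space_saved_percent(full_mib: float, stub_mib: float) -> float:
    """Percentage of disk space saved by the partially populated tree."""
    return 100.0 * (1.0 - stub_mib / full_mib)


if __name__ == "__main__":
    for kind in ("real", "user", "sys"):
        delta, percent = overhead(kind)
        print(f"{kind}: +{delta:.3f}s ({percent:.2f}%)")

    # du -hs reported 1.6G for the full tree and 125M for the HSM tree.
    saved = space_saved_percent(1.6 * 1024, 125)
    print(f"space saved: roughly {saved:.0f}%")
```

By this reckoning the wall-clock cost of the build is under 3%, while the
HSM tree takes up less than a tenth of the space of the full one.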

The page-fault hooks are necessary for the last thing we need, which is
on-demand range fetching of executables.  Some of our binaries are several gigs
large, so having the ability to remote fetch them on demand is a huge win for
us, not only in space savings but also in startup time of containers.

There will be tests for this going into LTP once we're satisfied with the
patches and they're on their way upstream.  Thanks,

Josef

Amir Goldstein (8):
  fsnotify: introduce pre-content permission event
  fsnotify: generate pre-content permission event on open
  fanotify: introduce FAN_PRE_ACCESS permission event
  fanotify: introduce FAN_PRE_MODIFY permission event
  fanotify: pass optional file access range in pre-content event
  fanotify: rename a misnamed constant
  fanotify: report file range info with pre-content events
  fanotify: allow to set errno in FAN_DENY permission response

Josef Bacik (10):
  fanotify: don't skip extra event info if no info_mode is set
  fs: add a flag to indicate the fs supports pre-content events
  fanotify: add a helper to check for pre content events
  fanotify: disable readahead if we have pre-content watches
  mm: don't allow huge faults for files with pre content watches
  fsnotify: generate pre-content permission event on page fault
  bcachefs: add pre-content fsnotify hook to fault
  xfs: add pre-content fsnotify hook for write faults
  btrfs: disable defrag on pre-content watched files
  fs: enable pre-content events on supported file systems

 fs/bcachefs/fs-io-pagecache.c      |   4 +
 fs/bcachefs/fs.c                   |   2 +-
 fs/btrfs/ioctl.c                   |   9 ++
 fs/btrfs/super.c                   |   3 +-
 fs/ext4/super.c                    |   6 +-
 fs/namei.c                         |   9 ++
 fs/notify/fanotify/fanotify.c      |  33 ++++++--
 fs/notify/fanotify/fanotify.h      |  15 ++++
 fs/notify/fanotify/fanotify_user.c | 119 ++++++++++++++++++++++-----
 fs/notify/fsnotify.c               |  17 +++-
 fs/xfs/xfs_file.c                  |   4 +
 fs/xfs/xfs_super.c                 |   2 +-
 include/linux/fanotify.h           |  20 +++--
 include/linux/fs.h                 |   1 +
 include/linux/fsnotify.h           |  58 +++++++++++--
 include/linux/fsnotify_backend.h   |  59 ++++++++++++-
 include/linux/mm.h                 |   1 +
 include/uapi/linux/fanotify.h      |  18 ++++
 mm/filemap.c                       | 128 +++++++++++++++++++++++++++--
 mm/memory.c                        |  22 +++++
 mm/readahead.c                     |  13 +++
 security/selinux/hooks.c           |   3 +-
 22 files changed, 489 insertions(+), 57 deletions(-)