[0/5] Volatile fanotify marks

Message ID 20220307155741.1352405-1-amir73il@gmail.com (mailing list archive)

Message

Amir Goldstein March 7, 2022, 3:57 p.m. UTC
Jan,

Following the RFC discussion [1], here are the volatile mark patches.

Tested both manually and with this LTP test [2].
I was struggling with the test for a while because dropping caches
did not get rid of the unpinned inode when the test was run with
ext2 or ext4 on my test VM. With xfs, the test works fine for me,
but it may not work for everyone.

Perhaps you have a suggestion for a better way to test inode eviction.

Thanks,
Amir.

[1] https://lore.kernel.org/linux-fsdevel/CAOQ4uxiRDpuS=2uA6+ZUM7yG9vVU-u212tkunBmSnP_u=mkv=Q@mail.gmail.com/
[2] https://github.com/amir73il/ltp/commits/fan_volatile

Amir Goldstein (5):
  fsnotify: move inotify control flags to mark flags
  fsnotify: pass flags argument to fsnotify_add_mark()
  fsnotify: allow adding an inode mark without pinning inode
  fanotify: add support for exclusive create of mark
  fanotify: add support for "volatile" inode marks

 fs/notify/fanotify/fanotify_user.c   | 32 +++++++++--
 fs/notify/fsnotify.c                 |  4 +-
 fs/notify/inotify/inotify_fsnotify.c |  2 +-
 fs/notify/inotify/inotify_user.c     | 40 +++++++++-----
 fs/notify/mark.c                     | 83 +++++++++++++++++++++++-----
 include/linux/fanotify.h             |  9 ++-
 include/linux/fsnotify_backend.h     | 32 ++++++-----
 include/uapi/linux/fanotify.h        |  2 +
 kernel/audit_fsnotify.c              |  3 +-
 9 files changed, 151 insertions(+), 56 deletions(-)

Comments

Jan Kara March 17, 2022, 2:12 p.m. UTC | #1
On Mon 07-03-22 17:57:36, Amir Goldstein wrote:
> [...]
>
> Perhaps you have a suggestion for a better way to test inode eviction.

Drop caches does not evict dirty inodes. The inode is likely dirty because
you have chmodded it just before drop caches. So I think calling sync or
syncfs before dropping caches should fix your problems with ext2 / ext4.  I
suspect this has worked for XFS only because it does its private inode
dirtiness tracking and keeps the inode behind VFS's back.

								Honza

Amir Goldstein March 17, 2022, 3:14 p.m. UTC | #2
On Thu, Mar 17, 2022 at 4:12 PM Jan Kara <jack@suse.cz> wrote:
>
> [...]
>
> Drop caches does not evict dirty inodes. The inode is likely dirty because
> you have chmodded it just before drop caches. So I think calling sync or
> syncfs before dropping caches should fix your problems with ext2 / ext4.  I
> suspect this has worked for XFS only because it does its private inode
> dirtiness tracking and keeps the inode behind VFS's back.

I did think of that and tried fsync, which did not help, but maybe
I messed it up somehow.

Thanks,
Amir.
Amir Goldstein March 20, 2022, 12:54 p.m. UTC | #3
On Thu, Mar 17, 2022 at 5:14 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Thu, Mar 17, 2022 at 4:12 PM Jan Kara <jack@suse.cz> wrote:
> >
> > [...]
> >
> > Drop caches does not evict dirty inodes. The inode is likely dirty because
> > you have chmodded it just before drop caches. So I think calling sync or
> > syncfs before dropping caches should fix your problems with ext2 / ext4.  I
> > suspect this has worked for XFS only because it does its private inode
> > dirtiness tracking and keeps the inode behind VFS's back.
>
> I did think of that and tried to fsync which did not help, but maybe
> I messed it up somehow.
>

You were right. fsync did fix the test.

Thanks,
Amir.
Amir Goldstein June 13, 2022, 5:40 a.m. UTC | #4
On Sun, Mar 20, 2022 at 2:54 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Thu, Mar 17, 2022 at 5:14 PM Amir Goldstein <amir73il@gmail.com> wrote:
> > [...]
>
> You were right. fsync did fix the test.

Hi Jan,

I was preparing to post the LTP test for FAN_MARK_EVICTABLE [1]
and I realized the issue we discussed above was not really resolved.
fsync() + drop_caches is not enough to guarantee reliable inode eviction.

It "kind of" works for ext2 and xfs, but not for ext4, ext3, btrfs.
"Kind of", because even for ext2 and xfs, dropping only the inode cache
(writing 2 to /proc/sys/vm/drop_caches) doesn't evict the inode/mark, and
dropping inode + page cache (writing 3) does work most of the time,
although I did occasionally see failures.
I suspect those failures were related to running the test on a system with
very low page cache usage.
The fact that I had to tweak vfs_cache_pressure to increase test reliability
also suggests that there are heuristics at play.

I guess I could fill the page cache with pages to rig the game.
Do you have other suggestions on how to increase the reliability of the test?
That is, how to reliably evict a non-dirty inode.

Thanks,
Amir.

[1] https://github.com/amir73il/ltp/blob/fan_evictable/testcases/kernel/syscalls/fanotify/fanotify23.c
Jan Kara June 13, 2022, 11:59 a.m. UTC | #5
On Mon 13-06-22 08:40:37, Amir Goldstein wrote:
> [...]
> 
> Hi Jan,
> 
> I was preparing to post the LTP test for FAN_MARK_EVICTABLE [1]
> and I realized the issue we discussed above was not really resolved.
> fsync() + drop_caches is not enough to guarantee reliable inode eviction.
> 
> It "kind of" works for ext2 and xfs, but not for ext4, ext3, btrfs.
> "kind of" because even for ext2 and xfs, dropping only inode cache (2)
> doesn't evict the inode/mark and dropping inode+page cache (3) does work
> most of the time, although I did occasionally see failures.
> I suspect those failures were related to running the test on a system
> with very low page cache usage.
> The fact that I had to tweak vfs_cache_pressure to increase test reliability
> also suggests that there are heuristics at play.

Well, yes, there's no guaranteed way to force an inode out of cache. It is
all best-effort stuff. When we needed to make sure an inode goes out of
cache at the nearest occasion, we introduced d_mark_dontcache(), but there
is no fs-common way to set this flag on a dentry and I don't think we want
to expose one.

I was thinking about whether we have some more reliable way to test this
functionality and I didn't find one. One other obvious approach to the test
is to create a memcg with a low memory limit, tag a large tree with
evictable marks, and see whether the memory gets exhausted. This is the
kind of use this functionality is aimed at. But there are also variables in
this testing scheme that may be difficult to tame, and the test will likely
take a rather long time to perform.

								Honza
Amir Goldstein June 13, 2022, 2:16 p.m. UTC | #6
On Mon, Jun 13, 2022 at 2:59 PM Jan Kara <jack@suse.cz> wrote:
>
> On Mon 13-06-22 08:40:37, Amir Goldstein wrote:
> > [...]
> >
> > Hi Jan,
> >
> > I was preparing to post the LTP test for FAN_MARK_EVICTABLE [1]
> > and I realized the issue we discussed above was not really resolved.
> > fsync() + drop_caches is not enough to guarantee reliable inode eviction.
> >
> > It "kind of" works for ext2 and xfs, but not for ext4, ext3, btrfs.
> > "kind of" because even for ext2 and xfs, dropping only inode cache (2)
> > doesn't evict the inode/mark and dropping inode+page cache (3) does work
> > most of the time, although I did occasionally see failures.
> > I suspect those failures were related to running the test on a system
> > with very low page cache usage.
> > The fact that I had to tweak vfs_cache_pressure to increase test reliability
> > also suggests that there are heuristics at play.
>
> Well, yes, there's no guaranteed way to force inode out of cache. It is all
> best-effort stuff. When we needed to make sure inode goes out of cache on
> nearest occasion, we have introduced d_mark_dontcache() but there's no
> fs common way to set this flag on dentry and I don't think we want to
> expose one.
>
> I was thinking whether we have some more reliable way to test this
> functionality and I didn't find one. One other obvious approach to the test
> is to create memcgroup with low memory limit, tag large tree with evictable
> mark, and see whether the memory gets exhausted. This is kind of where this
> functionality is aimed. But there are also variables in this testing scheme
> that may be difficult to tame and the test will likely take rather long
> time to perform.

That's why I had the debugging FAN_IOC_SET_MARK_PAGE_ORDER
ioctl, which I used for a reliable and fast direct reclaim test.
Anyway, I am going to submit the test with the current kludges to run
on ext2 and see if anyone complains.

Thanks,
Amir.