Message ID | 20250216165008.6671-3-luis@igalia.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | fuse: allow notify_inval for all inodes | expand |
On 2/16/25 17:50, Luis Henriques wrote: > Currently userspace is able to notify the kernel to invalidate the cache > for an inode. This means that, if all the inodes in a filesystem need to > be invalidated, then userspace needs to iterate through all of them and do > this kernel notification separately. > > This patch adds a new option that allows userspace to invalidate all the > inodes with a single notification operation. In addition to invalidate > all the inodes, it also shrinks the sb dcache. > > Signed-off-by: Luis Henriques <luis@igalia.com> > --- > fs/fuse/inode.c | 33 +++++++++++++++++++++++++++++++++ > include/uapi/linux/fuse.h | 3 +++ > 2 files changed, 36 insertions(+) > > diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c > index e9db2cb8c150..01a4dc5677ae 100644 > --- a/fs/fuse/inode.c > +++ b/fs/fuse/inode.c > @@ -547,6 +547,36 @@ struct inode *fuse_ilookup(struct fuse_conn *fc, u64 nodeid, > return NULL; > } > > +static int fuse_reverse_inval_all(struct fuse_conn *fc) > +{ > + struct fuse_mount *fm; > + struct inode *inode; > + > + inode = fuse_ilookup(fc, FUSE_ROOT_ID, &fm); > + if (!inode || !fm) > + return -ENOENT; > + > + /* Remove all possible active references to cached inodes */ > + shrink_dcache_sb(fm->sb); > + > + /* Remove all unreferenced inodes from cache */ > + invalidate_inodes(fm->sb); > + > + return 0; > +} > + > +/* > + * Notify to invalidate inodes cache. It can be called with @nodeid set to > + * either: > + * > + * - An inode number - Any pending writebacks within the rage [@offset @len] > + * will be triggered and the inode will be validated. To invalidate the whole > + * cache @offset has to be set to '0' and @len needs to be <= '0'; if @offset > + * is negative, only the inode attributes are invalidated. > + * > + * - FUSE_INVAL_ALL_INODES - All the inodes in the superblock are invalidated > + * and the whole dcache is shrinked. > + */ > int fuse_reverse_inval_inode(struct fuse_conn *fc, u64 nodeid, > loff_t offset, loff_t len) > { > @@ -555,6 +585,9 @@ int fuse_reverse_inval_inode(struct fuse_conn *fc, u64 nodeid, > pgoff_t pg_start; > pgoff_t pg_end; > > + if (nodeid == FUSE_INVAL_ALL_INODES) > + return fuse_reverse_inval_all(fc); > + > inode = fuse_ilookup(fc, nodeid, NULL); > if (!inode) > return -ENOENT; > diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h > index 5e0eb41d967e..e5852b63f99f 100644 > --- a/include/uapi/linux/fuse.h > +++ b/include/uapi/linux/fuse.h > @@ -669,6 +669,9 @@ enum fuse_notify_code { > FUSE_NOTIFY_CODE_MAX, > }; > > +/* The nodeid to request to invalidate all inodes */ > +#define FUSE_INVAL_ALL_INODES 0 > + > /* The read buffer is required to be at least 8k, but may be much larger */ > #define FUSE_MIN_READ_BUFFER 8192 > I think this version might end up in static void fuse_evict_inode(struct inode *inode) { struct fuse_inode *fi = get_fuse_inode(inode); /* Will write inode on close/munmap and in all other dirtiers */ WARN_ON(inode->i_state & I_DIRTY_INODE); if the fuse connection has writeback cache enabled. Without having it tested, reproducer would probably be to run something like passthrough_hp (without --direct-io), opening and writing to a file and then sending FUSE_INVAL_ALL_INODES. Thanks, Bernd
On Mon, Feb 17 2025, Bernd Schubert wrote: > On 2/16/25 17:50, Luis Henriques wrote: >> Currently userspace is able to notify the kernel to invalidate the cache >> for an inode. This means that, if all the inodes in a filesystem need to >> be invalidated, then userspace needs to iterate through all of them and do >> this kernel notification separately. >> >> This patch adds a new option that allows userspace to invalidate all the >> inodes with a single notification operation. In addition to invalidate >> all the inodes, it also shrinks the sb dcache. >> >> Signed-off-by: Luis Henriques <luis@igalia.com> >> --- >> fs/fuse/inode.c | 33 +++++++++++++++++++++++++++++++++ >> include/uapi/linux/fuse.h | 3 +++ >> 2 files changed, 36 insertions(+) >> >> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c >> index e9db2cb8c150..01a4dc5677ae 100644 >> --- a/fs/fuse/inode.c >> +++ b/fs/fuse/inode.c >> @@ -547,6 +547,36 @@ struct inode *fuse_ilookup(struct fuse_conn *fc, u64 nodeid, >> return NULL; >> } >> >> +static int fuse_reverse_inval_all(struct fuse_conn *fc) >> +{ >> + struct fuse_mount *fm; >> + struct inode *inode; >> + >> + inode = fuse_ilookup(fc, FUSE_ROOT_ID, &fm); >> + if (!inode || !fm) >> + return -ENOENT; >> + >> + /* Remove all possible active references to cached inodes */ >> + shrink_dcache_sb(fm->sb); >> + >> + /* Remove all unreferenced inodes from cache */ >> + invalidate_inodes(fm->sb); >> + >> + return 0; >> +} >> + >> +/* >> + * Notify to invalidate inodes cache. It can be called with @nodeid set to >> + * either: >> + * >> + * - An inode number - Any pending writebacks within the rage [@offset @len] >> + * will be triggered and the inode will be validated. To invalidate the whole >> + * cache @offset has to be set to '0' and @len needs to be <= '0'; if @offset >> + * is negative, only the inode attributes are invalidated. >> + * >> + * - FUSE_INVAL_ALL_INODES - All the inodes in the superblock are invalidated >> + * and the whole dcache is shrinked. >> + */ >> int fuse_reverse_inval_inode(struct fuse_conn *fc, u64 nodeid, >> loff_t offset, loff_t len) >> { >> @@ -555,6 +585,9 @@ int fuse_reverse_inval_inode(struct fuse_conn *fc, u64 nodeid, >> pgoff_t pg_start; >> pgoff_t pg_end; >> >> + if (nodeid == FUSE_INVAL_ALL_INODES) >> + return fuse_reverse_inval_all(fc); >> + >> inode = fuse_ilookup(fc, nodeid, NULL); >> if (!inode) >> return -ENOENT; >> diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h >> index 5e0eb41d967e..e5852b63f99f 100644 >> --- a/include/uapi/linux/fuse.h >> +++ b/include/uapi/linux/fuse.h >> @@ -669,6 +669,9 @@ enum fuse_notify_code { >> FUSE_NOTIFY_CODE_MAX, >> }; >> >> +/* The nodeid to request to invalidate all inodes */ >> +#define FUSE_INVAL_ALL_INODES 0 >> + >> /* The read buffer is required to be at least 8k, but may be much larger */ >> #define FUSE_MIN_READ_BUFFER 8192 >> > > > I think this version might end up in > > static void fuse_evict_inode(struct inode *inode) > { > struct fuse_inode *fi = get_fuse_inode(inode); > > /* Will write inode on close/munmap and in all other dirtiers */ > WARN_ON(inode->i_state & I_DIRTY_INODE); > > > if the fuse connection has writeback cache enabled. > > > Without having it tested, reproducer would probably be to run > something like passthrough_hp (without --direct-io), opening > and writing to a file and then sending FUSE_INVAL_ALL_INODES. Thanks, Bernd. So far I couldn't trigger this warning. But I just found that there's a stupid bug in the code: a missing iput() after doing the fuse_ilookup(). I'll spend some more time trying to understand how (or if) the warning you mentioned can triggered before sending a new revision. Cheers,
On 2/17/25 11:07, Luis Henriques wrote: > On Mon, Feb 17 2025, Bernd Schubert wrote: > >> On 2/16/25 17:50, Luis Henriques wrote: >>> Currently userspace is able to notify the kernel to invalidate the cache >>> for an inode. This means that, if all the inodes in a filesystem need to >>> be invalidated, then userspace needs to iterate through all of them and do >>> this kernel notification separately. >>> >>> This patch adds a new option that allows userspace to invalidate all the >>> inodes with a single notification operation. In addition to invalidate >>> all the inodes, it also shrinks the sb dcache. >>> >>> Signed-off-by: Luis Henriques <luis@igalia.com> >>> --- >>> fs/fuse/inode.c | 33 +++++++++++++++++++++++++++++++++ >>> include/uapi/linux/fuse.h | 3 +++ >>> 2 files changed, 36 insertions(+) >>> >>> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c >>> index e9db2cb8c150..01a4dc5677ae 100644 >>> --- a/fs/fuse/inode.c >>> +++ b/fs/fuse/inode.c >>> @@ -547,6 +547,36 @@ struct inode *fuse_ilookup(struct fuse_conn *fc, u64 nodeid, >>> return NULL; >>> } >>> >>> +static int fuse_reverse_inval_all(struct fuse_conn *fc) >>> +{ >>> + struct fuse_mount *fm; >>> + struct inode *inode; >>> + >>> + inode = fuse_ilookup(fc, FUSE_ROOT_ID, &fm); >>> + if (!inode || !fm) >>> + return -ENOENT; >>> + >>> + /* Remove all possible active references to cached inodes */ >>> + shrink_dcache_sb(fm->sb); >>> + >>> + /* Remove all unreferenced inodes from cache */ >>> + invalidate_inodes(fm->sb); >>> + >>> + return 0; >>> +} >>> + >>> +/* >>> + * Notify to invalidate inodes cache. It can be called with @nodeid set to >>> + * either: >>> + * >>> + * - An inode number - Any pending writebacks within the rage [@offset @len] >>> + * will be triggered and the inode will be validated. To invalidate the whole >>> + * cache @offset has to be set to '0' and @len needs to be <= '0'; if @offset >>> + * is negative, only the inode attributes are invalidated. >>> + * >>> + * - FUSE_INVAL_ALL_INODES - All the inodes in the superblock are invalidated >>> + * and the whole dcache is shrinked. >>> + */ >>> int fuse_reverse_inval_inode(struct fuse_conn *fc, u64 nodeid, >>> loff_t offset, loff_t len) >>> { >>> @@ -555,6 +585,9 @@ int fuse_reverse_inval_inode(struct fuse_conn *fc, u64 nodeid, >>> pgoff_t pg_start; >>> pgoff_t pg_end; >>> >>> + if (nodeid == FUSE_INVAL_ALL_INODES) >>> + return fuse_reverse_inval_all(fc); >>> + >>> inode = fuse_ilookup(fc, nodeid, NULL); >>> if (!inode) >>> return -ENOENT; >>> diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h >>> index 5e0eb41d967e..e5852b63f99f 100644 >>> --- a/include/uapi/linux/fuse.h >>> +++ b/include/uapi/linux/fuse.h >>> @@ -669,6 +669,9 @@ enum fuse_notify_code { >>> FUSE_NOTIFY_CODE_MAX, >>> }; >>> >>> +/* The nodeid to request to invalidate all inodes */ >>> +#define FUSE_INVAL_ALL_INODES 0 >>> + >>> /* The read buffer is required to be at least 8k, but may be much larger */ >>> #define FUSE_MIN_READ_BUFFER 8192 >>> >> >> >> I think this version might end up in >> >> static void fuse_evict_inode(struct inode *inode) >> { >> struct fuse_inode *fi = get_fuse_inode(inode); >> >> /* Will write inode on close/munmap and in all other dirtiers */ >> WARN_ON(inode->i_state & I_DIRTY_INODE); >> >> >> if the fuse connection has writeback cache enabled. >> >> >> Without having it tested, reproducer would probably be to run >> something like passthrough_hp (without --direct-io), opening >> and writing to a file and then sending FUSE_INVAL_ALL_INODES. > > Thanks, Bernd. So far I couldn't trigger this warning. But I just found > that there's a stupid bug in the code: a missing iput() after doing the > fuse_ilookup(). > > I'll spend some more time trying to understand how (or if) the warning you > mentioned can triggered before sending a new revision. > Maybe I'm wrong, but it calls invalidate_inodes() dispose_list() evict(inode) fuse_evict_inode() and if at the same time something writes to inode page cache, the warning would be triggered? There are some conditions in evict, like inode_wait_for_writeback() that might protect us, but what is if it waited and then just in the right time the another write comes and dirties the inode again? Thanks, Bernd
On Mon, Feb 17 2025, Bernd Schubert wrote: > On 2/17/25 11:07, Luis Henriques wrote: >> On Mon, Feb 17 2025, Bernd Schubert wrote: >> >>> On 2/16/25 17:50, Luis Henriques wrote: >>>> Currently userspace is able to notify the kernel to invalidate the cache >>>> for an inode. This means that, if all the inodes in a filesystem need to >>>> be invalidated, then userspace needs to iterate through all of them and do >>>> this kernel notification separately. >>>> >>>> This patch adds a new option that allows userspace to invalidate all the >>>> inodes with a single notification operation. In addition to invalidate >>>> all the inodes, it also shrinks the sb dcache. >>>> >>>> Signed-off-by: Luis Henriques <luis@igalia.com> >>>> --- >>>> fs/fuse/inode.c | 33 +++++++++++++++++++++++++++++++++ >>>> include/uapi/linux/fuse.h | 3 +++ >>>> 2 files changed, 36 insertions(+) >>>> >>>> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c >>>> index e9db2cb8c150..01a4dc5677ae 100644 >>>> --- a/fs/fuse/inode.c >>>> +++ b/fs/fuse/inode.c >>>> @@ -547,6 +547,36 @@ struct inode *fuse_ilookup(struct fuse_conn *fc, u64 nodeid, >>>> return NULL; >>>> } >>>> >>>> +static int fuse_reverse_inval_all(struct fuse_conn *fc) >>>> +{ >>>> + struct fuse_mount *fm; >>>> + struct inode *inode; >>>> + >>>> + inode = fuse_ilookup(fc, FUSE_ROOT_ID, &fm); >>>> + if (!inode || !fm) >>>> + return -ENOENT; >>>> + >>>> + /* Remove all possible active references to cached inodes */ >>>> + shrink_dcache_sb(fm->sb); >>>> + >>>> + /* Remove all unreferenced inodes from cache */ >>>> + invalidate_inodes(fm->sb); >>>> + >>>> + return 0; >>>> +} >>>> + >>>> +/* >>>> + * Notify to invalidate inodes cache. It can be called with @nodeid set to >>>> + * either: >>>> + * >>>> + * - An inode number - Any pending writebacks within the rage [@offset @len] >>>> + * will be triggered and the inode will be validated. To invalidate the whole >>>> + * cache @offset has to be set to '0' and @len needs to be <= '0'; if @offset >>>> + * is negative, only the inode attributes are invalidated. >>>> + * >>>> + * - FUSE_INVAL_ALL_INODES - All the inodes in the superblock are invalidated >>>> + * and the whole dcache is shrinked. >>>> + */ >>>> int fuse_reverse_inval_inode(struct fuse_conn *fc, u64 nodeid, >>>> loff_t offset, loff_t len) >>>> { >>>> @@ -555,6 +585,9 @@ int fuse_reverse_inval_inode(struct fuse_conn *fc, u64 nodeid, >>>> pgoff_t pg_start; >>>> pgoff_t pg_end; >>>> >>>> + if (nodeid == FUSE_INVAL_ALL_INODES) >>>> + return fuse_reverse_inval_all(fc); >>>> + >>>> inode = fuse_ilookup(fc, nodeid, NULL); >>>> if (!inode) >>>> return -ENOENT; >>>> diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h >>>> index 5e0eb41d967e..e5852b63f99f 100644 >>>> --- a/include/uapi/linux/fuse.h >>>> +++ b/include/uapi/linux/fuse.h >>>> @@ -669,6 +669,9 @@ enum fuse_notify_code { >>>> FUSE_NOTIFY_CODE_MAX, >>>> }; >>>> >>>> +/* The nodeid to request to invalidate all inodes */ >>>> +#define FUSE_INVAL_ALL_INODES 0 >>>> + >>>> /* The read buffer is required to be at least 8k, but may be much larger */ >>>> #define FUSE_MIN_READ_BUFFER 8192 >>>> >>> >>> >>> I think this version might end up in >>> >>> static void fuse_evict_inode(struct inode *inode) >>> { >>> struct fuse_inode *fi = get_fuse_inode(inode); >>> >>> /* Will write inode on close/munmap and in all other dirtiers */ >>> WARN_ON(inode->i_state & I_DIRTY_INODE); >>> >>> >>> if the fuse connection has writeback cache enabled. >>> >>> >>> Without having it tested, reproducer would probably be to run >>> something like passthrough_hp (without --direct-io), opening >>> and writing to a file and then sending FUSE_INVAL_ALL_INODES. >> >> Thanks, Bernd. So far I couldn't trigger this warning. But I just found >> that there's a stupid bug in the code: a missing iput() after doing the >> fuse_ilookup(). >> >> I'll spend some more time trying to understand how (or if) the warning you >> mentioned can triggered before sending a new revision. >> > > Maybe I'm wrong, but it calls > > invalidate_inodes() > dispose_list() > evict(inode) > fuse_evict_inode() > > and if at the same time something writes to inode page cache, the > warning would be triggered? > There are some conditions in evict, like inode_wait_for_writeback() > that might protect us, but what is if it waited and then just > in the right time the another write comes and dirties the inode > again? Right, I have looked into that too but my understanding is that this can not happen because, before doing that wait, the code does: inode_sb_list_del(inode); and the inode state will include I_FREEING. Thus, before writing to it again, the inode will need to get added back to the sb list. Also, reading the comments on evict(), if something writes into the inode at that point that's likely a bug. But this is just my understanding, and I may be missing something. Cheers,
On Mon 17-02-25 11:47:09, Luis Henriques wrote: > On Mon, Feb 17 2025, Bernd Schubert wrote: > > On 2/17/25 11:07, Luis Henriques wrote: > >> On Mon, Feb 17 2025, Bernd Schubert wrote: > >> > >>> On 2/16/25 17:50, Luis Henriques wrote: > >>>> Currently userspace is able to notify the kernel to invalidate the cache > >>>> for an inode. This means that, if all the inodes in a filesystem need to > >>>> be invalidated, then userspace needs to iterate through all of them and do > >>>> this kernel notification separately. > >>>> > >>>> This patch adds a new option that allows userspace to invalidate all the > >>>> inodes with a single notification operation. In addition to invalidate > >>>> all the inodes, it also shrinks the sb dcache. > >>>> > >>>> Signed-off-by: Luis Henriques <luis@igalia.com> > >>>> --- > >>>> fs/fuse/inode.c | 33 +++++++++++++++++++++++++++++++++ > >>>> include/uapi/linux/fuse.h | 3 +++ > >>>> 2 files changed, 36 insertions(+) > >>>> > >>>> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c > >>>> index e9db2cb8c150..01a4dc5677ae 100644 > >>>> --- a/fs/fuse/inode.c > >>>> +++ b/fs/fuse/inode.c > >>>> @@ -547,6 +547,36 @@ struct inode *fuse_ilookup(struct fuse_conn *fc, u64 nodeid, > >>>> return NULL; > >>>> } > >>>> > >>>> +static int fuse_reverse_inval_all(struct fuse_conn *fc) > >>>> +{ > >>>> + struct fuse_mount *fm; > >>>> + struct inode *inode; > >>>> + > >>>> + inode = fuse_ilookup(fc, FUSE_ROOT_ID, &fm); > >>>> + if (!inode || !fm) > >>>> + return -ENOENT; > >>>> + > >>>> + /* Remove all possible active references to cached inodes */ > >>>> + shrink_dcache_sb(fm->sb); > >>>> + > >>>> + /* Remove all unreferenced inodes from cache */ > >>>> + invalidate_inodes(fm->sb); > >>>> + > >>>> + return 0; > >>>> +} > >>>> + > >>>> +/* > >>>> + * Notify to invalidate inodes cache. It can be called with @nodeid set to > >>>> + * either: > >>>> + * > >>>> + * - An inode number - Any pending writebacks within the rage [@offset @len] > >>>> + * will be triggered and the inode will be validated. To invalidate the whole > >>>> + * cache @offset has to be set to '0' and @len needs to be <= '0'; if @offset > >>>> + * is negative, only the inode attributes are invalidated. > >>>> + * > >>>> + * - FUSE_INVAL_ALL_INODES - All the inodes in the superblock are invalidated > >>>> + * and the whole dcache is shrinked. > >>>> + */ > >>>> int fuse_reverse_inval_inode(struct fuse_conn *fc, u64 nodeid, > >>>> loff_t offset, loff_t len) > >>>> { > >>>> @@ -555,6 +585,9 @@ int fuse_reverse_inval_inode(struct fuse_conn *fc, u64 nodeid, > >>>> pgoff_t pg_start; > >>>> pgoff_t pg_end; > >>>> > >>>> + if (nodeid == FUSE_INVAL_ALL_INODES) > >>>> + return fuse_reverse_inval_all(fc); > >>>> + > >>>> inode = fuse_ilookup(fc, nodeid, NULL); > >>>> if (!inode) > >>>> return -ENOENT; > >>>> diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h > >>>> index 5e0eb41d967e..e5852b63f99f 100644 > >>>> --- a/include/uapi/linux/fuse.h > >>>> +++ b/include/uapi/linux/fuse.h > >>>> @@ -669,6 +669,9 @@ enum fuse_notify_code { > >>>> FUSE_NOTIFY_CODE_MAX, > >>>> }; > >>>> > >>>> +/* The nodeid to request to invalidate all inodes */ > >>>> +#define FUSE_INVAL_ALL_INODES 0 > >>>> + > >>>> /* The read buffer is required to be at least 8k, but may be much larger */ > >>>> #define FUSE_MIN_READ_BUFFER 8192 > >>>> > >>> > >>> > >>> I think this version might end up in > >>> > >>> static void fuse_evict_inode(struct inode *inode) > >>> { > >>> struct fuse_inode *fi = get_fuse_inode(inode); > >>> > >>> /* Will write inode on close/munmap and in all other dirtiers */ > >>> WARN_ON(inode->i_state & I_DIRTY_INODE); > >>> > >>> > >>> if the fuse connection has writeback cache enabled. > >>> > >>> > >>> Without having it tested, reproducer would probably be to run > >>> something like passthrough_hp (without --direct-io), opening > >>> and writing to a file and then sending FUSE_INVAL_ALL_INODES. > >> > >> Thanks, Bernd. So far I couldn't trigger this warning. But I just found > >> that there's a stupid bug in the code: a missing iput() after doing the > >> fuse_ilookup(). > >> > >> I'll spend some more time trying to understand how (or if) the warning you > >> mentioned can triggered before sending a new revision. > >> > > > > Maybe I'm wrong, but it calls > > > > invalidate_inodes() > > dispose_list() > > evict(inode) > > fuse_evict_inode() > > > > and if at the same time something writes to inode page cache, the > > warning would be triggered? > > There are some conditions in evict, like inode_wait_for_writeback() > > that might protect us, but what is if it waited and then just > > in the right time the another write comes and dirties the inode > > again? > > Right, I have looked into that too but my understanding is that this can > not happen because, before doing that wait, the code does: > > inode_sb_list_del(inode); > > and the inode state will include I_FREEING. > > Thus, before writing to it again, the inode will need to get added back to > the sb list. Also, reading the comments on evict(), if something writes > into the inode at that point that's likely a bug. But this is just my > understanding, and I may be missing something. Yes. invalidate_inodes() checks i_count == 0 and sets I_FREEING. Once I_FREEING is set nobody can acquire inode reference until the inode is fully destroyed. So nobody should be writing to the inode or anything like that. Honza
On Tue, Feb 18 2025, Jan Kara wrote: > On Mon 17-02-25 11:47:09, Luis Henriques wrote: >> On Mon, Feb 17 2025, Bernd Schubert wrote: >> > On 2/17/25 11:07, Luis Henriques wrote: >> >> On Mon, Feb 17 2025, Bernd Schubert wrote: >> >> >> >>> On 2/16/25 17:50, Luis Henriques wrote: >> >>>> Currently userspace is able to notify the kernel to invalidate the cache >> >>>> for an inode. This means that, if all the inodes in a filesystem need to >> >>>> be invalidated, then userspace needs to iterate through all of them and do >> >>>> this kernel notification separately. >> >>>> >> >>>> This patch adds a new option that allows userspace to invalidate all the >> >>>> inodes with a single notification operation. In addition to invalidate >> >>>> all the inodes, it also shrinks the sb dcache. >> >>>> >> >>>> Signed-off-by: Luis Henriques <luis@igalia.com> >> >>>> --- >> >>>> fs/fuse/inode.c | 33 +++++++++++++++++++++++++++++++++ >> >>>> include/uapi/linux/fuse.h | 3 +++ >> >>>> 2 files changed, 36 insertions(+) >> >>>> >> >>>> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c >> >>>> index e9db2cb8c150..01a4dc5677ae 100644 >> >>>> --- a/fs/fuse/inode.c >> >>>> +++ b/fs/fuse/inode.c >> >>>> @@ -547,6 +547,36 @@ struct inode *fuse_ilookup(struct fuse_conn *fc, u64 nodeid, >> >>>> return NULL; >> >>>> } >> >>>> >> >>>> +static int fuse_reverse_inval_all(struct fuse_conn *fc) >> >>>> +{ >> >>>> + struct fuse_mount *fm; >> >>>> + struct inode *inode; >> >>>> + >> >>>> + inode = fuse_ilookup(fc, FUSE_ROOT_ID, &fm); >> >>>> + if (!inode || !fm) >> >>>> + return -ENOENT; >> >>>> + >> >>>> + /* Remove all possible active references to cached inodes */ >> >>>> + shrink_dcache_sb(fm->sb); >> >>>> + >> >>>> + /* Remove all unreferenced inodes from cache */ >> >>>> + invalidate_inodes(fm->sb); >> >>>> + >> >>>> + return 0; >> >>>> +} >> >>>> + >> >>>> +/* >> >>>> + * Notify to invalidate inodes cache. It can be called with @nodeid set to >> >>>> + * either: >> >>>> + * >> >>>> + * - An inode number - Any pending writebacks within the rage [@offset @len] >> >>>> + * will be triggered and the inode will be validated. To invalidate the whole >> >>>> + * cache @offset has to be set to '0' and @len needs to be <= '0'; if @offset >> >>>> + * is negative, only the inode attributes are invalidated. >> >>>> + * >> >>>> + * - FUSE_INVAL_ALL_INODES - All the inodes in the superblock are invalidated >> >>>> + * and the whole dcache is shrinked. >> >>>> + */ >> >>>> int fuse_reverse_inval_inode(struct fuse_conn *fc, u64 nodeid, >> >>>> loff_t offset, loff_t len) >> >>>> { >> >>>> @@ -555,6 +585,9 @@ int fuse_reverse_inval_inode(struct fuse_conn *fc, u64 nodeid, >> >>>> pgoff_t pg_start; >> >>>> pgoff_t pg_end; >> >>>> >> >>>> + if (nodeid == FUSE_INVAL_ALL_INODES) >> >>>> + return fuse_reverse_inval_all(fc); >> >>>> + >> >>>> inode = fuse_ilookup(fc, nodeid, NULL); >> >>>> if (!inode) >> >>>> return -ENOENT; >> >>>> diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h >> >>>> index 5e0eb41d967e..e5852b63f99f 100644 >> >>>> --- a/include/uapi/linux/fuse.h >> >>>> +++ b/include/uapi/linux/fuse.h >> >>>> @@ -669,6 +669,9 @@ enum fuse_notify_code { >> >>>> FUSE_NOTIFY_CODE_MAX, >> >>>> }; >> >>>> >> >>>> +/* The nodeid to request to invalidate all inodes */ >> >>>> +#define FUSE_INVAL_ALL_INODES 0 >> >>>> + >> >>>> /* The read buffer is required to be at least 8k, but may be much larger */ >> >>>> #define FUSE_MIN_READ_BUFFER 8192 >> >>>> >> >>> >> >>> >> >>> I think this version might end up in >> >>> >> >>> static void fuse_evict_inode(struct inode *inode) >> >>> { >> >>> struct fuse_inode *fi = get_fuse_inode(inode); >> >>> >> >>> /* Will write inode on close/munmap and in all other dirtiers */ >> >>> WARN_ON(inode->i_state & I_DIRTY_INODE); >> >>> >> >>> >> >>> if the fuse connection has writeback cache enabled. >> >>> >> >>> >> >>> Without having it tested, reproducer would probably be to run >> >>> something like passthrough_hp (without --direct-io), opening >> >>> and writing to a file and then sending FUSE_INVAL_ALL_INODES. >> >> >> >> Thanks, Bernd. So far I couldn't trigger this warning. But I just found >> >> that there's a stupid bug in the code: a missing iput() after doing the >> >> fuse_ilookup(). >> >> >> >> I'll spend some more time trying to understand how (or if) the warning you >> >> mentioned can triggered before sending a new revision. >> >> >> > >> > Maybe I'm wrong, but it calls >> > >> > invalidate_inodes() >> > dispose_list() >> > evict(inode) >> > fuse_evict_inode() >> > >> > and if at the same time something writes to inode page cache, the >> > warning would be triggered? >> > There are some conditions in evict, like inode_wait_for_writeback() >> > that might protect us, but what is if it waited and then just >> > in the right time the another write comes and dirties the inode >> > again? >> >> Right, I have looked into that too but my understanding is that this can >> not happen because, before doing that wait, the code does: >> >> inode_sb_list_del(inode); >> >> and the inode state will include I_FREEING. >> >> Thus, before writing to it again, the inode will need to get added back to >> the sb list. Also, reading the comments on evict(), if something writes >> into the inode at that point that's likely a bug. But this is just my >> understanding, and I may be missing something. > > Yes. invalidate_inodes() checks i_count == 0 and sets I_FREEING. Once > I_FREEING is set nobody can acquire inode reference until the inode is > fully destroyed. So nobody should be writing to the inode or anything like > that. Awesome, it's good to have that confirmed. Thank you, Jan! Cheers,
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index e9db2cb8c150..01a4dc5677ae 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -547,6 +547,36 @@ struct inode *fuse_ilookup(struct fuse_conn *fc, u64 nodeid, return NULL; } +static int fuse_reverse_inval_all(struct fuse_conn *fc) +{ + struct fuse_mount *fm; + struct inode *inode; + + inode = fuse_ilookup(fc, FUSE_ROOT_ID, &fm); + if (!inode || !fm) + return -ENOENT; + + /* Remove all possible active references to cached inodes */ + shrink_dcache_sb(fm->sb); + + /* Remove all unreferenced inodes from cache */ + invalidate_inodes(fm->sb); + + return 0; +} + +/* + * Notify to invalidate inodes cache. It can be called with @nodeid set to + * either: + * + * - An inode number - Any pending writebacks within the rage [@offset @len] + * will be triggered and the inode will be validated. To invalidate the whole + * cache @offset has to be set to '0' and @len needs to be <= '0'; if @offset + * is negative, only the inode attributes are invalidated. + * + * - FUSE_INVAL_ALL_INODES - All the inodes in the superblock are invalidated + * and the whole dcache is shrinked. + */ int fuse_reverse_inval_inode(struct fuse_conn *fc, u64 nodeid, loff_t offset, loff_t len) { @@ -555,6 +585,9 @@ int fuse_reverse_inval_inode(struct fuse_conn *fc, u64 nodeid, pgoff_t pg_start; pgoff_t pg_end; + if (nodeid == FUSE_INVAL_ALL_INODES) + return fuse_reverse_inval_all(fc); + inode = fuse_ilookup(fc, nodeid, NULL); if (!inode) return -ENOENT; diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h index 5e0eb41d967e..e5852b63f99f 100644 --- a/include/uapi/linux/fuse.h +++ b/include/uapi/linux/fuse.h @@ -669,6 +669,9 @@ enum fuse_notify_code { FUSE_NOTIFY_CODE_MAX, }; +/* The nodeid to request to invalidate all inodes */ +#define FUSE_INVAL_ALL_INODES 0 + /* The read buffer is required to be at least 8k, but may be much larger */ #define FUSE_MIN_READ_BUFFER 8192
Currently userspace is able to notify the kernel to invalidate the cache for an inode. This means that, if all the inodes in a filesystem need to be invalidated, then userspace needs to iterate through all of them and do this kernel notification separately. This patch adds a new option that allows userspace to invalidate all the inodes with a single notification operation. In addition to invalidate all the inodes, it also shrinks the sb dcache. Signed-off-by: Luis Henriques <luis@igalia.com> --- fs/fuse/inode.c | 33 +++++++++++++++++++++++++++++++++ include/uapi/linux/fuse.h | 3 +++ 2 files changed, 36 insertions(+)