Message ID | 20161222091538.28702-14-jack@suse.cz (mailing list archive) |
---|---|
State | New, archived |
On Thu, Dec 22, 2016 at 11:15 AM, Jan Kara <jack@suse.cz> wrote:
> fanotify wants to drop fsnotify_mark_srcu lock when waiting for response
> from userspace so that the whole notification subsystem is not blocked
> during that time. This patch provides a framework for safely getting
> mark reference for a mark found in the object list which pins the mark
> in that list. We can then drop fsnotify_mark_srcu, wait for userspace
> response and then safely continue iteration of the object list once we
> reacquire fsnotify_mark_srcu.
>
> Signed-off-by: Jan Kara <jack@suse.cz>

Looks good

Reviewed-by: Amir Goldstein <amir73il@gmail.com>
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Dec 22, 2016 at 11:15 AM, Jan Kara <jack@suse.cz> wrote:
> fanotify wants to drop fsnotify_mark_srcu lock when waiting for response
> from userspace so that the whole notification subsystem is not blocked
> during that time. This patch provides a framework for safely getting
> mark reference for a mark found in the object list which pins the mark
> in that list. We can then drop fsnotify_mark_srcu, wait for userspace
> response and then safely continue iteration of the object list once we
> reacquire fsnotify_mark_srcu.
>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
...
> +	/*
> +	 * Now that both marks are pinned by refcount we can drop SRCU lock.
> +	 * Marks can still be removed from the list but because of refcount
> +	 * they cannot be destroyed and we can safely resume the list iteration
> +	 * once userspace returns.
> +	 */

Sorry, forgot to comment on this.
"Marks can still be removed from the list ...
... and we can safely resume the list iteration"

I suppose you are planning to get the mechanics right by replacing
hlist_del_init() with just __hlist_del()?? But this sentence is confusing.
Usually, it wouldn't be safe to resume iteration if items may have been removed,
so perhaps rephrase or clarify.
> +bool fsnotify_prepare_user_wait(struct fsnotify_mark *inode_mark,
> +				struct fsnotify_mark *vfsmount_mark,
> +				int *srcu_idx)
> +{
> +	struct fsnotify_group *group;
> +
> +	if (WARN_ON_ONCE(!inode_mark && !vfsmount_mark))
> +		return false;
> +
> +	if (inode_mark)
> +		group = inode_mark->group;
> +	else
> +		group = vfsmount_mark->group;
> +
> +	/*
> +	 * Since acquisition of mark reference is an atomic op as well, we can
> +	 * be sure this inc is seen before any effect of refcount increment.
> +	 */
> +	atomic_inc(&group->user_waits);
> +
> +	if (inode_mark) {
> +		/* This can fail if mark is being removed */
> +		if (!fsnotify_get_mark_safe(inode_mark))
> +			goto out_wait;
> +	}
> +	if (vfsmount_mark) {
> +		if (!fsnotify_get_mark_safe(vfsmount_mark))
> +			goto out_inode;
> +	}
> +
> +	/*
> +	 * Now that both marks are pinned by refcount we can drop SRCU lock.
> +	 * Marks can still be removed from the list but because of refcount
> +	 * they cannot be destroyed and we can safely resume the list iteration
> +	 * once userspace returns.
> +	 */

Jan,

Forgive me for hijacking this review for yet another cleanup proposal.
When I first looked at this function I thought: "<sigh> again with those
inode_mark, vfsmount_mark args.. oh well", but then I took another look and
it suddenly seems quite simple to get rid of all the places that get passed
these 2 args, simplify all of them, and most of all simplify send_to_group().

The plan is:
1. Return 1 from handle_event() => send_to_group() if the event was
   "dropped" by the group.
2. Backends may return "dropped" for several reasons (e.g. non-dir inode in
   dnotify), but the only interesting case to return "dropped" is in fanotify
   if (!fanotify_should_send_event()), because only fanotify supports
   vfsmount_mark.
3. In fsnotify(), if (inode_group == vfsmount_group), pass only vfsmount_mark
   to send_to_group() and check for a "dropped" event. If the event was
   dropped, set inode_group = NULL, so inode_mark is iterated again. If the
   event wasn't dropped, there is no reason to call send_to_group() again
   with inode_mark.

This logic change incurs a behavior change, because
fanotify_should_send_event() for some reason combines the inode and vfsmount
mark ignore masks into a single unified ignore mask, so today an ignore bit
in just one of them causes the event not to be sent to the group.
However, from reading the man page, this seems like a bug rather than
desired behavior: the ignore mask should be relative to the object in
question and it should be cleared when that object is modified. In that
sense a mount is a completely different object than an inode, so their
ignore masks should be independent as well and not unified.

I will send out a POC patch for you to consider along with the rest of your
very neat cleanups.

Amir.
On Mon 26-12-16 17:11:29, Amir Goldstein wrote:
> On Thu, Dec 22, 2016 at 11:15 AM, Jan Kara <jack@suse.cz> wrote:
> > fanotify wants to drop fsnotify_mark_srcu lock when waiting for response
> > from userspace so that the whole notification subsystem is not blocked
> > during that time. This patch provides a framework for safely getting
> > mark reference for a mark found in the object list which pins the mark
> > in that list. We can then drop fsnotify_mark_srcu, wait for userspace
> > response and then safely continue iteration of the object list once we
> > reacquire fsnotify_mark_srcu.
> >
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> ...
> > +	/*
> > +	 * Now that both marks are pinned by refcount we can drop SRCU lock.
> > +	 * Marks can still be removed from the list but because of refcount
> > +	 * they cannot be destroyed and we can safely resume the list iteration
> > +	 * once userspace returns.
> > +	 */
>
> Sorry, forgot to comment on this.
> "Marks can still be removed from the list ...
> ... and we can safely resume the list iteration"
>
> I suppose you are planning to get the mechanics right by replacing
> hlist_del_init() with just __hlist_del()?? But this sentence is confusing.
> Usually, it wouldn't be safe to resume iteration if items may have been removed,
> so perhaps rephrase or clarify.

The point is that marks that have refcount elevated cannot be even removed
from the list we iterate (that happens only once the last reference is
dropped). That is the reason why we are safe to resume the iteration...

								Honza
On Wed, Jan 4, 2017 at 11:03 AM, Jan Kara <jack@suse.cz> wrote:
> On Mon 26-12-16 17:11:29, Amir Goldstein wrote:
>> On Thu, Dec 22, 2016 at 11:15 AM, Jan Kara <jack@suse.cz> wrote:
>> > fanotify wants to drop fsnotify_mark_srcu lock when waiting for response
>> > from userspace so that the whole notification subsystem is not blocked
>> > during that time. This patch provides a framework for safely getting
>> > mark reference for a mark found in the object list which pins the mark
>> > in that list. We can then drop fsnotify_mark_srcu, wait for userspace
>> > response and then safely continue iteration of the object list once we
>> > reacquire fsnotify_mark_srcu.
>> >
>> > Signed-off-by: Jan Kara <jack@suse.cz>
>> > ---
>> ...
>> > +	/*
>> > +	 * Now that both marks are pinned by refcount we can drop SRCU lock.
>> > +	 * Marks can still be removed from the list but because of refcount
>> > +	 * they cannot be destroyed and we can safely resume the list iteration
>> > +	 * once userspace returns.
>> > +	 */
>>
>> Sorry, forgot to comment on this.
>> "Marks can still be removed from the list ...
>> ... and we can safely resume the list iteration"
>>
>> I suppose you are planning to get the mechanics right by replacing
>> hlist_del_init() with just __hlist_del()?? But this sentence is confusing.
>> Usually, it wouldn't be safe to resume iteration if items may have been removed,
>> so perhaps rephrase or clarify.
>
> The point is that marks that have refcount elevated cannot be even removed
> from the list we iterate (that happens only once the last reference is
> dropped). That is the reason why we are safe to resume the iteration...
>

Well, if they "cannot be even removed" then the comment above that says
"Marks can still be removed... but cannot be destroyed" is inaccurate or
at the very least confusing.

Amir.
On Wed 04-01-17 12:50:48, Amir Goldstein wrote:
> On Wed, Jan 4, 2017 at 11:03 AM, Jan Kara <jack@suse.cz> wrote:
> > On Mon 26-12-16 17:11:29, Amir Goldstein wrote:
> >> On Thu, Dec 22, 2016 at 11:15 AM, Jan Kara <jack@suse.cz> wrote:
> >> > fanotify wants to drop fsnotify_mark_srcu lock when waiting for response
> >> > from userspace so that the whole notification subsystem is not blocked
> >> > during that time. This patch provides a framework for safely getting
> >> > mark reference for a mark found in the object list which pins the mark
> >> > in that list. We can then drop fsnotify_mark_srcu, wait for userspace
> >> > response and then safely continue iteration of the object list once we
> >> > reacquire fsnotify_mark_srcu.
> >> >
> >> > Signed-off-by: Jan Kara <jack@suse.cz>
> >> > ---
> >> ...
> >> > +	/*
> >> > +	 * Now that both marks are pinned by refcount we can drop SRCU lock.
> >> > +	 * Marks can still be removed from the list but because of refcount
> >> > +	 * they cannot be destroyed and we can safely resume the list iteration
> >> > +	 * once userspace returns.
> >> > +	 */
> >>
> >> Sorry, forgot to comment on this.
> >> "Marks can still be removed from the list ...
> >> ... and we can safely resume the list iteration"
> >>
> >> I suppose you are planning to get the mechanics right by replacing
> >> hlist_del_init() with just __hlist_del()?? But this sentence is confusing.
> >> Usually, it wouldn't be safe to resume iteration if items may have been removed,
> >> so perhaps rephrase or clarify.
> >
> > The point is that marks that have refcount elevated cannot be even removed
> > from the list we iterate (that happens only once the last reference is
> > dropped). That is the reason why we are safe to resume the iteration...
> >
>
> Well, if they "cannot be even removed" then the comment above that says
> "Marks can still be removed... but cannot be destroyed" is inaccurate or
> at the very least confusing.

Good spotting. The comment is just wrong (it used to be like that in a
previous version of the patch set but not anymore). I'll fix it.

								Honza
diff --git a/fs/notify/group.c b/fs/notify/group.c
index 0fb4aadcc19f..79439cdf16e0 100644
--- a/fs/notify/group.c
+++ b/fs/notify/group.c
@@ -126,6 +126,7 @@ struct fsnotify_group *fsnotify_alloc_group(const struct fsnotify_ops *ops)
 	/* set to 0 when there a no external references to this group */
 	atomic_set(&group->refcnt, 1);
 	atomic_set(&group->num_marks, 0);
+	atomic_set(&group->user_waits, 0);
 
 	spin_lock_init(&group->notification_lock);
 	INIT_LIST_HEAD(&group->notification_list);
diff --git a/fs/notify/mark.c b/fs/notify/mark.c
index fee4255e9227..c5c1dcc8fa00 100644
--- a/fs/notify/mark.c
+++ b/fs/notify/mark.c
@@ -109,6 +109,16 @@ void fsnotify_get_mark(struct fsnotify_mark *mark)
 	atomic_inc(&mark->refcnt);
 }
 
+/*
+ * Get mark reference when we found the mark via lockless traversal of object
+ * list. Mark can be already removed from the list by now and on its way to be
+ * destroyed once SRCU period ends.
+ */
+static bool fsnotify_get_mark_safe(struct fsnotify_mark *mark)
+{
+	return atomic_inc_not_zero(&mark->refcnt);
+}
+
 static void __fsnotify_recalc_mask(struct fsnotify_mark_list *list)
 {
 	u32 new_mask = 0;
@@ -248,6 +258,77 @@ void fsnotify_put_mark(struct fsnotify_mark *mark)
 	}
 }
 
+bool fsnotify_prepare_user_wait(struct fsnotify_mark *inode_mark,
+				struct fsnotify_mark *vfsmount_mark,
+				int *srcu_idx)
+{
+	struct fsnotify_group *group;
+
+	if (WARN_ON_ONCE(!inode_mark && !vfsmount_mark))
+		return false;
+
+	if (inode_mark)
+		group = inode_mark->group;
+	else
+		group = vfsmount_mark->group;
+
+	/*
+	 * Since acquisition of mark reference is an atomic op as well, we can
+	 * be sure this inc is seen before any effect of refcount increment.
+	 */
+	atomic_inc(&group->user_waits);
+
+	if (inode_mark) {
+		/* This can fail if mark is being removed */
+		if (!fsnotify_get_mark_safe(inode_mark))
+			goto out_wait;
+	}
+	if (vfsmount_mark) {
+		if (!fsnotify_get_mark_safe(vfsmount_mark))
+			goto out_inode;
+	}
+
+	/*
+	 * Now that both marks are pinned by refcount we can drop SRCU lock.
+	 * Marks can still be removed from the list but because of refcount
+	 * they cannot be destroyed and we can safely resume the list iteration
+	 * once userspace returns.
+	 */
+	srcu_read_unlock(&fsnotify_mark_srcu, *srcu_idx);
+
+	return true;
+out_inode:
+	if (inode_mark)
+		fsnotify_put_mark(inode_mark);
+out_wait:
+	if (atomic_dec_and_test(&group->user_waits) && group->shutdown)
+		wake_up(&group->notification_waitq);
+	return false;
+}
+
+void fsnotify_finish_user_wait(struct fsnotify_mark *inode_mark,
+			       struct fsnotify_mark *vfsmount_mark,
+			       int *srcu_idx)
+{
+	struct fsnotify_group *group = NULL;
+
+	*srcu_idx = srcu_read_lock(&fsnotify_mark_srcu);
+	if (inode_mark) {
+		group = inode_mark->group;
+		fsnotify_put_mark(inode_mark);
+	}
+	if (vfsmount_mark) {
+		group = vfsmount_mark->group;
+		fsnotify_put_mark(vfsmount_mark);
+	}
+	/*
+	 * We abuse notification_waitq on group shutdown for waiting for all
+	 * marks pinned when waiting for userspace.
+	 */
+	if (atomic_dec_and_test(&group->user_waits) && group->shutdown)
+		wake_up(&group->notification_waitq);
+}
+
 /*
  * Mark mark as dead, remove it from group list. Mark still stays in object
  * list until its last reference is dropped. Note that we rely on mark being
@@ -636,6 +717,12 @@ void fsnotify_detach_group_marks(struct fsnotify_group *group)
 		fsnotify_free_mark(mark);
 		fsnotify_put_mark(mark);
 	}
+	/*
+	 * Some marks can still be pinned when waiting for response from
+	 * userspace. Wait for those now. fsnotify_prepare_user_wait() will
+	 * not succeed now so this wait is race-free.
+	 */
+	wait_event(group->notification_waitq, !atomic_read(&group->user_waits));
 }
 
 /* Destroy all marks attached to inode / vfsmount */
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index 76b3c34172c7..27223e254e00 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -162,6 +162,8 @@ struct fsnotify_group {
 	struct fsnotify_event *overflow_event;	/* Event we queue when the
 						 * notification list is too
 						 * full */
+	atomic_t user_waits;		/* Number of tasks waiting for user
+					 * response */
 
 	/* groups can define private fields here or use the void *private */
 	union {
@@ -367,6 +369,12 @@ extern void fsnotify_clear_marks_by_group_flags(struct fsnotify_group *group, un
 extern void fsnotify_get_mark(struct fsnotify_mark *mark);
 extern void fsnotify_put_mark(struct fsnotify_mark *mark);
 extern void fsnotify_unmount_inodes(struct super_block *sb);
+extern void fsnotify_finish_user_wait(struct fsnotify_mark *inode_mark,
+				      struct fsnotify_mark *vfsmount_mark,
+				      int *srcu_idx);
+extern bool fsnotify_prepare_user_wait(struct fsnotify_mark *inode_mark,
+				       struct fsnotify_mark *vfsmount_mark,
+				       int *srcu_idx);
 
 /* put here because inotify does some weird stuff when destroying watches */
 extern void fsnotify_init_event(struct fsnotify_event *event,
fanotify wants to drop fsnotify_mark_srcu lock when waiting for response
from userspace so that the whole notification subsystem is not blocked
during that time. This patch provides a framework for safely getting
mark reference for a mark found in the object list which pins the mark
in that list. We can then drop fsnotify_mark_srcu, wait for userspace
response and then safely continue iteration of the object list once we
reacquire fsnotify_mark_srcu.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/notify/group.c                |  1 +
 fs/notify/mark.c                 | 87 ++++++++++++++++++++++++++++++++++++++++
 include/linux/fsnotify_backend.h |  8 ++++
 3 files changed, 96 insertions(+)