Re: Re: [PATCH] fsnotify: fix a crash due to invalid virtual address
diff mbox

Message ID 166776534.176181435140830045.JavaMail.weblogic@ep2mlwas04a
State New
Headers show

Commit Message

Ashish Sangwan June 24, 2015, 10:13 a.m. UTC
Hi Jan,

> On Tue 23-06-15 12:05:51, Ashish Sangwan wrote:

> > > Looking into this in more detail, it might be worthwhile to revisit how

> > > mark_mutex is used since at least fanotify and dnotify use it for more than

> > > just a protection of list of group marks and untangling this would simplify

> > > things. But that's a longer term goal.

> > > 

> > > A relatively simple fix for your issue is to split group list of marks into

> > > a list of inode marks and a list of mount marks. Then destroying becomes

> > > much simpler because we always discard the whole list (or both of them) and

> > > we can easily avoid problems with list corruption when dropping the

> > > mark_mutex. I can write the patch later or you can do that if you are

> > Sorry I could not understand why the group's list of marks needs to be split.

> > I was browsing through the old code, from the days mark_mutex was not present

> > and it looked like below:

> > void fsnotify_clear_marks_by_group_flags(struct fsnotify_group *group,

> > 					 unsigned int flags)

> > {

> > 	struct fsnotify_mark *lmark, *mark;

> > 	LIST_HEAD(free_list);

> > 

> > 	spin_lock(&group->mark_lock);

> > 	list_for_each_entry_safe(mark, lmark, &group->marks_list, g_list) {

> > 		if (mark->flags & flags) {

> > 			list_add(&mark->free_g_list, &free_list);

> > 			list_del_init(&mark->g_list);

> > 			fsnotify_get_mark(mark);

> > 		}

> > 	}

> > 	spin_unlock(&group->mark_lock);

> > 

> > 	list_for_each_entry_safe(mark, lmark, &free_list, free_g_list) {

> > 		fsnotify_destroy_mark(mark);

> > 		fsnotify_put_mark(mark);

> > 	}

> > }

> > How about using a temporary onstack list_head like above?

> 

> So we can use a temporary list_head for entries selected for destruction as

> well. *But* in either case you have to be careful because even the

> temporary free_list can be modified by concurrent mark destruction e.g. from

> fsnotify_free_marks_by_inode() call as each mark is in two lists - one

> hanging from inode / mount and one hanging from the notification group.

True.
> 

> Actually, looking into it even fsnotify_destroy_marks() doesn't seem to be

> properly protected from the race. Sigh.

True.

So anything which is calling fsnotify_destroy_mark is not protected against
concurrent deletion from fsnotify_clear_marks_by_group_flags() and vice versa.
Plus, as you rightly pointed, we have both the inode mark and vfsmount sharing the same list.
So even if fsnotify_clear_marks_by_group_flags is for removing inode mark, a
parallel fsnotify_destroy_mark for vfsmount can cause crash as they are sharing
same list.
Can you check below patch? It is untested, just want to know if the approach is
correct or not. If it seems ok, I can send a tested patch later.

Comments

Jan Kara June 24, 2015, 11:22 a.m. UTC | #1
Hello,

On Wed 24-06-15 10:13:50, Ashish Sangwan wrote:
> > On Tue 23-06-15 12:05:51, Ashish Sangwan wrote:
> > > > Looking into this in more detail, it might be worthwhile to revisit how
> > > > mark_mutex is used since at least fanotify and dnotify use it for more than
> > > > just a protection of list of group marks and untangling this would simplify
> > > > things. But that's a longer term goal.
> > > > 
> > > > A relatively simple fix for your issue is to split group list of marks into
> > > > a list of inode marks and a list of mount marks. Then destroying becomes
> > > > much simpler because we always discard the whole list (or both of them) and
> > > > we can easily avoid problems with list corruption when dropping the
> > > > mark_mutex. I can write the patch later or you can do that if you are
> > > Sorry I could not understand why the group's list of marks needs to be split.
> > > I was browsing through the old code, from the days mark_mutex was not present
> > > and it looked like below:
> > > void fsnotify_clear_marks_by_group_flags(struct fsnotify_group *group,
> > > 					 unsigned int flags)
> > > {
> > > 	struct fsnotify_mark *lmark, *mark;
> > > 	LIST_HEAD(free_list);
> > > 
> > > 	spin_lock(&group->mark_lock);
> > > 	list_for_each_entry_safe(mark, lmark, &group->marks_list, g_list) {
> > > 		if (mark->flags & flags) {
> > > 			list_add(&mark->free_g_list, &free_list);
> > > 			list_del_init(&mark->g_list);
> > > 			fsnotify_get_mark(mark);
> > > 		}
> > > 	}
> > > 	spin_unlock(&group->mark_lock);
> > > 
> > > 	list_for_each_entry_safe(mark, lmark, &free_list, free_g_list) {
> > > 		fsnotify_destroy_mark(mark);
> > > 		fsnotify_put_mark(mark);
> > > 	}
> > > }
> > > How about using a temporary onstack list_head like above?
> > 
> > So we can use a temporary list_head for entries selected for destruction as
> > well. *But* in either case you have to be careful because even the
> > temporary free_list can be modified by concurrent mark destruction e.g. from
> > fsnotify_free_marks_by_inode() call as each mark is in two lists - one
> > hanging from inode / mount and one hanging from the notification group.
> True.
> > 
> > Actually, looking into it even fsnotify_destroy_marks() doesn't seem to be
> > properly protected from the race. Sigh.
> True.
> 
> So anything which is calling fsnotify_destroy_mark is not protected against
> concurrent deletion from fsnotify_clear_marks_by_group_flags() and vice versa.
> Plus, as you rightly pointed, we have both the inode mark and vfsmount sharing the same list.
> So even if fsnotify_clear_marks_by_group_flags is for removing inode mark, a
> parallel fsnotify_destroy_mark for vfsmount can cause crash as they are sharing
> same list.

So I was looking more into the issue today and fsnotify_destroy_marks() is
fine in the end since it holds a reference to each mark (so it cannot be
freed) and it also uses a special free_list list_head within mark for its
temporary list so nobody removes any mark from that list.

> diff --git a/fs/notify/mark.c b/fs/notify/mark.c
> index d90deaa..d83ec7d 100644
> --- a/fs/notify/mark.c
> +++ b/fs/notify/mark.c
> @@ -124,14 +124,6 @@ void fsnotify_destroy_mark_locked(struct fsnotify_mark *mark,
> 
>         spin_lock(&mark->lock);
> 
> -       /* something else already called this function on this mark */
> -       if (!(mark->flags & FSNOTIFY_MARK_FLAG_ALIVE)) {
> -               spin_unlock(&mark->lock);
> -               return;
> -       }
> -
> -       mark->flags &= ~FSNOTIFY_MARK_FLAG_ALIVE;
> -
>         if (mark->flags & FSNOTIFY_MARK_FLAG_INODE) {
>                 inode = mark->i.inode;
>                 fsnotify_destroy_inode_mark(mark);
> @@ -188,7 +180,10 @@ void fsnotify_destroy_mark(struct fsnotify_mark *mark,
>                            struct fsnotify_group *group)
>  {
>         mutex_lock_nested(&group->mark_mutex, SINGLE_DEPTH_NESTING);
> -       fsnotify_destroy_mark_locked(mark, group);
> +       if (mark->flags & FSNOTIFY_MARK_FLAG_ALIVE) {
> +               mark->flags &= ~FSNOTIFY_MARK_FLAG_ALIVE;
> +               fsnotify_destroy_mark_locked(mark, group);
> +       }
>         mutex_unlock(&group->mark_mutex);
>  }
> 
> @@ -293,14 +288,27 @@ void fsnotify_clear_marks_by_group_flags(struct fsnotify_group *group,
>                                          unsigned int flags)
>  {
>         struct fsnotify_mark *lmark, *mark;
> -
> +       LIST_HEAD(free_list);
> +
>         mutex_lock_nested(&group->mark_mutex, SINGLE_DEPTH_NESTING);
> -       list_for_each_entry_safe(mark, lmark, &group->marks_list, g_list) {
> +       /* Pass 1 : clear the alive flag and move to free list */
> +       list_for_each_entry_safe(mark, lmark, &group->marks_list, g_list)
>                 if (mark->flags & flags) {
> +                       /*
> +                        * If the mark is present on group's mark list
> +                        * it has to be alive.
> +                        */
> +                       WARN_ON(!(mark->flags & FSNOTIFY_MARK_FLAG_ALIVE));
> +                       list_del_init(&mark->g_list);
> +                       list_add(&mark->g_list, &free_list);
> +                       mark->flags &= ~FSNOTIFY_MARK_FLAG_ALIVE;
> +               }
> +
> +       /* Pass 2: remove mark */
> +       list_for_each_entry_safe(mark, lmark, &free_list, g_list) {
>                         fsnotify_get_mark(mark);
>                         fsnotify_destroy_mark_locked(mark, group);
>                         fsnotify_put_mark(mark);
> -               }
>         }
>         mutex_unlock(&group->mark_mutex);
>  }

So this should fix the problem as well. However the lifetime rules are so
convoluted that I'm thinking how to make things simpler (especially to
unify how clearing via inode/mount and via group works). When investigating
the code I also found out there are possible issues with mark->inode
getting cleared while we are delivering some event. So I'm looking into
fixing all the problems.

									Honza

Patch
diff mbox

diff --git a/fs/notify/mark.c b/fs/notify/mark.c
index d90deaa..d83ec7d 100644
--- a/fs/notify/mark.c
+++ b/fs/notify/mark.c
@@ -124,14 +124,6 @@  void fsnotify_destroy_mark_locked(struct fsnotify_mark *mark,

        spin_lock(&mark->lock);

-       /* something else already called this function on this mark */
-       if (!(mark->flags & FSNOTIFY_MARK_FLAG_ALIVE)) {
-               spin_unlock(&mark->lock);
-               return;
-       }
-
-       mark->flags &= ~FSNOTIFY_MARK_FLAG_ALIVE;
-
        if (mark->flags & FSNOTIFY_MARK_FLAG_INODE) {
                inode = mark->i.inode;
                fsnotify_destroy_inode_mark(mark);
@@ -188,7 +180,10 @@  void fsnotify_destroy_mark(struct fsnotify_mark *mark,
                           struct fsnotify_group *group)
 {
        mutex_lock_nested(&group->mark_mutex, SINGLE_DEPTH_NESTING);
-       fsnotify_destroy_mark_locked(mark, group);
+       if (mark->flags & FSNOTIFY_MARK_FLAG_ALIVE) {
+               mark->flags &= ~FSNOTIFY_MARK_FLAG_ALIVE;
+               fsnotify_destroy_mark_locked(mark, group);
+       }
        mutex_unlock(&group->mark_mutex);
 }

@@ -293,14 +288,27 @@  void fsnotify_clear_marks_by_group_flags(struct fsnotify_group *group,
                                         unsigned int flags)
 {
        struct fsnotify_mark *lmark, *mark;
-
+       LIST_HEAD(free_list);
+
        mutex_lock_nested(&group->mark_mutex, SINGLE_DEPTH_NESTING);
-       list_for_each_entry_safe(mark, lmark, &group->marks_list, g_list) {
+       /* Pass 1 : clear the alive flag and move to free list */
+       list_for_each_entry_safe(mark, lmark, &group->marks_list, g_list)
                if (mark->flags & flags) {
+                       /*
+                        * If the mark is present on group's mark list
+                        * it has to be alive.
+                        */
+                       WARN_ON(!(mark->flags & FSNOTIFY_MARK_FLAG_ALIVE));
+                       list_del_init(&mark->g_list);
+                       list_add(&mark->g_list, &free_list);
+                       mark->flags &= ~FSNOTIFY_MARK_FLAG_ALIVE;
+               }
+
+       /* Pass 2: remove mark */
+       list_for_each_entry_safe(mark, lmark, &free_list, g_list) {
                        fsnotify_get_mark(mark);
                        fsnotify_destroy_mark_locked(mark, group);
                        fsnotify_put_mark(mark);
-               }
        }
        mutex_unlock(&group->mark_mutex);
 }