diff mbox

[1/3] fsnotify: Fix oops in fsnotify_clear_marks_by_group_flags()

Message ID 1436966481-12517-2-git-send-email-jack@suse.com (mailing list archive)
State New, archived
Headers show

Commit Message

Jan Kara July 15, 2015, 1:21 p.m. UTC
From: Jan Kara <jack@suse.cz>

fsnotify_clear_marks_by_group_flags() can race with
fsnotify_destroy_marks() so when fsnotify_destroy_mark_locked() drops
mark_mutex, a mark from the list iterated by
fsnotify_clear_marks_by_group_flags() can be freed and we dereference
free memory in the loop there.

Fix the problem by keeping mark_mutex held in
fsnotify_destroy_mark_locked(). The reason why we drop that mutex is
that we need to call a ->freeing_mark() callback which may acquire
mark_mutex again. To avoid this and similar lock inversion issues, we
move the call to ->freeing_mark() callback to the kthread destroying the
mark.

Reported-by: Ashish Sangwan <a.sangwan@samsung.com>
Suggested-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/notify/mark.c | 34 ++++++++++++++--------------------
 1 file changed, 14 insertions(+), 20 deletions(-)

Comments

Andrew Morton July 15, 2015, 8:41 p.m. UTC | #1
On Wed, 15 Jul 2015 15:21:19 +0200 Jan Kara <jack@suse.com> wrote:

> fsnotify_clear_marks_by_group_flags() can race with
> fsnotify_destroy_marks() so when fsnotify_destroy_mark_locked() drops
> mark_mutex, a mark from the list iterated by
> fsnotify_clear_marks_by_group_flags() can be freed and we dereference
> free memory in the loop there.
> 
> Fix the problem by keeping mark_mutex held in
> fsnotify_destroy_mark_locked(). The reason why we drop that mutex is
> that we need to call a ->freeing_mark() callback which may acquire
> mark_mutex again. To avoid this and similar lock inversion issues, we
> move the call to ->freeing_mark() callback to the kthread destroying the
> mark.
> 
> Reported-by: Ashish Sangwan <a.sangwan@samsung.com>
> Suggested-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
> Signed-off-by: Jan Kara <jack@suse.cz>

I added a cc:stable.  Please let me know if you omitted this
deliberately.

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jan Kara July 16, 2015, 6:50 a.m. UTC | #2
On Wed 15-07-15 13:41:15, Andrew Morton wrote:
> On Wed, 15 Jul 2015 15:21:19 +0200 Jan Kara <jack@suse.com> wrote:
> 
> > fsnotify_clear_marks_by_group_flags() can race with
> > fsnotify_destroy_marks() so when fsnotify_destroy_mark_locked() drops
> > mark_mutex, a mark from the list iterated by
> > fsnotify_clear_marks_by_group_flags() can be freed and we dereference
> > free memory in the loop there.
> > 
> > Fix the problem by keeping mark_mutex held in
> > fsnotify_destroy_mark_locked(). The reason why we drop that mutex is
> > that we need to call a ->freeing_mark() callback which may acquire
> > mark_mutex again. To avoid this and similar lock inversion issues, we
> > move the call to ->freeing_mark() callback to the kthread destroying the
> > mark.
> > 
> > Reported-by: Ashish Sangwan <a.sangwan@samsung.com>
> > Suggested-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
> > Signed-off-by: Jan Kara <jack@suse.cz>
> 
> I added a cc:stable.  Please let me know if you omitted this
> deliberately.

No, that was a mistake on my side. Thanks for adding it.

								Honza
Kinglong Mee July 19, 2015, 10:21 a.m. UTC | #3
On 7/15/2015 21:21, Jan Kara wrote:
> From: Jan Kara <jack@suse.cz>
> 
> fsnotify_clear_marks_by_group_flags() can race with
> fsnotify_destroy_marks() so when fsnotify_destroy_mark_locked() drops
> mark_mutex, a mark from the list iterated by
> fsnotify_clear_marks_by_group_flags() can be freed and we dereference
> free memory in the loop there.
> 
> Fix the problem by keeping mark_mutex held in
> fsnotify_destroy_mark_locked(). The reason why we drop that mutex is
> that we need to call a ->freeing_mark() callback which may acquire
> mark_mutex again. To avoid this and similar lock inversion issues, we
> move the call to ->freeing_mark() callback to the kthread destroying the
> mark.
> 
> Reported-by: Ashish Sangwan <a.sangwan@samsung.com>
> Suggested-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  fs/notify/mark.c | 34 ++++++++++++++--------------------
>  1 file changed, 14 insertions(+), 20 deletions(-)
> 
> diff --git a/fs/notify/mark.c b/fs/notify/mark.c
> index 92e48c70f0f0..3e594ce41010 100644
> --- a/fs/notify/mark.c
> +++ b/fs/notify/mark.c
> @@ -152,31 +152,15 @@ void fsnotify_destroy_mark_locked(struct fsnotify_mark *mark,
>  		BUG();
>  
>  	list_del_init(&mark->g_list);
> -
>  	spin_unlock(&mark->lock);
>  
>  	if (inode && (mark->flags & FSNOTIFY_MARK_FLAG_OBJECT_PINNED))
>  		iput(inode);
> -	/* release lock temporarily */
> -	mutex_unlock(&group->mark_mutex);
>  
>  	spin_lock(&destroy_lock);
>  	list_add(&mark->g_list, &destroy_list);
>  	spin_unlock(&destroy_lock);
>  	wake_up(&destroy_waitq);
> -	/*
> -	 * We don't necessarily have a ref on mark from caller so the above destroy
> -	 * may have actually freed it, unless this group provides a 'freeing_mark'
> -	 * function which must be holding a reference.
> -	 */
> -
> -	/*
> -	 * Some groups like to know that marks are being freed.  This is a
> -	 * callback to the group function to let it know that this mark
> -	 * is being freed.
> -	 */
> -	if (group->ops->freeing_mark)
> -		group->ops->freeing_mark(mark, group);
>  
>  	/*
>  	 * __fsnotify_update_child_dentry_flags(inode);
> @@ -191,8 +175,6 @@ void fsnotify_destroy_mark_locked(struct fsnotify_mark *mark,
>  	 */
>  
>  	atomic_dec(&group->num_marks);
> -
> -	mutex_lock_nested(&group->mark_mutex, SINGLE_DEPTH_NESTING);
>  }
>  
>  void fsnotify_destroy_mark(struct fsnotify_mark *mark,
> @@ -205,7 +187,10 @@ void fsnotify_destroy_mark(struct fsnotify_mark *mark,
>  
>  /*
>   * Destroy all marks in the given list. The marks must be already detached from
> - * the original inode / vfsmount.
> + * the original inode / vfsmount. Note that we can race with
> + * fsnotify_clear_marks_by_group_flags(). However we hold a reference to each
> + * mark so they won't get freed from under us and nobody else touches our
> + * free_list list_head.
>   */
>  void fsnotify_destroy_marks(struct list_head *to_free)
>  {
> @@ -406,7 +391,7 @@ struct fsnotify_mark *fsnotify_find_mark(struct hlist_head *head,
>  }
>  
>  /*
> - * clear any marks in a group in which mark->flags & flags is true
> + * Clear any marks in a group in which mark->flags & flags is true.
>   */
>  void fsnotify_clear_marks_by_group_flags(struct fsnotify_group *group,
>  					 unsigned int flags)
> @@ -460,6 +445,7 @@ static int fsnotify_mark_destroy(void *ignored)
>  {
>  	struct fsnotify_mark *mark, *next;
>  	struct list_head private_destroy_list;
> +	struct fsnotify_group *group;
>  
>  	for (;;) {
>  		spin_lock(&destroy_lock);
> @@ -471,6 +457,14 @@ static int fsnotify_mark_destroy(void *ignored)
>  
>  		list_for_each_entry_safe(mark, next, &private_destroy_list, g_list) {
>  			list_del_init(&mark->g_list);
> +			group = mark->group;
> +			/*
> +			 * Some groups like to know that marks are being freed.
> +			 * This is a callback to the group function to let it
> +			 * know that this mark is being freed.
> +			 */
> +			if (group && group->ops->freeing_mark)
> +				group->ops->freeing_mark(mark, group);
>  			fsnotify_put_mark(mark);
>  		}

With this patch, I got so many memleak notice, 

unreferenced object 0xffff880035bef640 (size 64):
  comm "fsnotify_mark", pid 26, jiffies 4294673717 (age 628.737s)
  hex dump (first 32 bytes):
    28 36 3f 76 00 88 ff ff 28 36 3f 76 00 88 ff ff  (6?v....(6?v....
    00 00 00 00 00 00 00 00 00 80 00 00 00 00 ad de  ................
  backtrace:
    [<ffffffff816cd34e>] kmemleak_alloc+0x4e/0xb0
    [<ffffffff811ac6b5>] __kmalloc+0x1e5/0x290
    [<ffffffff81204f25>] inotify_handle_event+0x75/0x160
    [<ffffffff81205abc>] inotify_ignored_and_remove_idr+0x5c/0x80
    [<ffffffff8120505e>] inotify_freeing_mark+0xe/0x10
    [<ffffffff81203ca6>] fsnotify_mark_destroy+0xb6/0x150
    [<ffffffff810a4487>] kthread+0xd7/0xf0
    [<ffffffff816d92df>] ret_from_fork+0x3f/0x70
    [<ffffffffffffffff>] 0xffffffffffffffff

It is caused by ->freeing_mark() insert an event to group,
but snotify_put_mark() kfree the group without free the event.

thanks, 
Kinglong Mee
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jan Kara July 20, 2015, 2:46 p.m. UTC | #4
On Sun 19-07-15 18:21:49, Kinglong Mee wrote:
> On 7/15/2015 21:21, Jan Kara wrote:
> > From: Jan Kara <jack@suse.cz>
> > 
> > fsnotify_clear_marks_by_group_flags() can race with
> > fsnotify_destroy_marks() so when fsnotify_destroy_mark_locked() drops
> > mark_mutex, a mark from the list iterated by
> > fsnotify_clear_marks_by_group_flags() can be freed and we dereference
> > free memory in the loop there.
> > 
> > Fix the problem by keeping mark_mutex held in
> > fsnotify_destroy_mark_locked(). The reason why we drop that mutex is
> > that we need to call a ->freeing_mark() callback which may acquire
> > mark_mutex again. To avoid this and similar lock inversion issues, we
> > move the call to ->freeing_mark() callback to the kthread destroying the
> > mark.
> > 
> > Reported-by: Ashish Sangwan <a.sangwan@samsung.com>
> > Suggested-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
> > Signed-off-by: Jan Kara <jack@suse.cz>

<snip>

> With this patch, I got so many memleak notice, 
> 
> unreferenced object 0xffff880035bef640 (size 64):
>   comm "fsnotify_mark", pid 26, jiffies 4294673717 (age 628.737s)
>   hex dump (first 32 bytes):
>     28 36 3f 76 00 88 ff ff 28 36 3f 76 00 88 ff ff  (6?v....(6?v....
>     00 00 00 00 00 00 00 00 00 80 00 00 00 00 ad de  ................
>   backtrace:
>     [<ffffffff816cd34e>] kmemleak_alloc+0x4e/0xb0
>     [<ffffffff811ac6b5>] __kmalloc+0x1e5/0x290
>     [<ffffffff81204f25>] inotify_handle_event+0x75/0x160
>     [<ffffffff81205abc>] inotify_ignored_and_remove_idr+0x5c/0x80
>     [<ffffffff8120505e>] inotify_freeing_mark+0xe/0x10
>     [<ffffffff81203ca6>] fsnotify_mark_destroy+0xb6/0x150
>     [<ffffffff810a4487>] kthread+0xd7/0xf0
>     [<ffffffff816d92df>] ret_from_fork+0x3f/0x70
>     [<ffffffffffffffff>] 0xffffffffffffffff
> 
> It is caused by ->freeing_mark() insert an event to group,
> but snotify_put_mark() kfree the group without free the event.

Thanks for report! You are right that my patch introduces a race between
fsnotify kthread and fsnotify_destroy_group() which can result in leaking
inotify event on group destruction. I haven't yet decided whether the right
fix is not to queue events for dying notification group (as that is
pointless anyway) or whether we should just fix the original problem
differently... Whenever I look at fsnotify code mark handling I get lost in
the maze of locks, lists, and subtle differences between how different
notification systems handle notification marks :( I'll think about it over
night.

								Honza
Konstantin Khlebnikov July 20, 2015, 3:24 p.m. UTC | #5
On Sun, Jul 19, 2015 at 1:21 PM, Kinglong Mee <kinglongmee@gmail.com> wrote:
> On 7/15/2015 21:21, Jan Kara wrote:
>> From: Jan Kara <jack@suse.cz>
>>
>> fsnotify_clear_marks_by_group_flags() can race with
>> fsnotify_destroy_marks() so when fsnotify_destroy_mark_locked() drops
>> mark_mutex, a mark from the list iterated by
>> fsnotify_clear_marks_by_group_flags() can be freed and we dereference
>> free memory in the loop there.
>>
>> Fix the problem by keeping mark_mutex held in
>> fsnotify_destroy_mark_locked(). The reason why we drop that mutex is
>> that we need to call a ->freeing_mark() callback which may acquire
>> mark_mutex again. To avoid this and similar lock inversion issues, we
>> move the call to ->freeing_mark() callback to the kthread destroying the
>> mark.
>>
>> Reported-by: Ashish Sangwan <a.sangwan@samsung.com>
>> Suggested-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
>> Signed-off-by: Jan Kara <jack@suse.cz>
>> ---
>>  fs/notify/mark.c | 34 ++++++++++++++--------------------
>>  1 file changed, 14 insertions(+), 20 deletions(-)
>>
>> diff --git a/fs/notify/mark.c b/fs/notify/mark.c
>> index 92e48c70f0f0..3e594ce41010 100644
>> --- a/fs/notify/mark.c
>> +++ b/fs/notify/mark.c
>> @@ -152,31 +152,15 @@ void fsnotify_destroy_mark_locked(struct fsnotify_mark *mark,
>>               BUG();
>>
>>       list_del_init(&mark->g_list);
>> -
>>       spin_unlock(&mark->lock);
>>
>>       if (inode && (mark->flags & FSNOTIFY_MARK_FLAG_OBJECT_PINNED))
>>               iput(inode);
>> -     /* release lock temporarily */
>> -     mutex_unlock(&group->mark_mutex);
>>
>>       spin_lock(&destroy_lock);
>>       list_add(&mark->g_list, &destroy_list);
>>       spin_unlock(&destroy_lock);
>>       wake_up(&destroy_waitq);
>> -     /*
>> -      * We don't necessarily have a ref on mark from caller so the above destroy
>> -      * may have actually freed it, unless this group provides a 'freeing_mark'
>> -      * function which must be holding a reference.
>> -      */
>> -
>> -     /*
>> -      * Some groups like to know that marks are being freed.  This is a
>> -      * callback to the group function to let it know that this mark
>> -      * is being freed.
>> -      */
>> -     if (group->ops->freeing_mark)
>> -             group->ops->freeing_mark(mark, group);
>>
>>       /*
>>        * __fsnotify_update_child_dentry_flags(inode);
>> @@ -191,8 +175,6 @@ void fsnotify_destroy_mark_locked(struct fsnotify_mark *mark,
>>        */
>>
>>       atomic_dec(&group->num_marks);
>> -
>> -     mutex_lock_nested(&group->mark_mutex, SINGLE_DEPTH_NESTING);
>>  }
>>
>>  void fsnotify_destroy_mark(struct fsnotify_mark *mark,
>> @@ -205,7 +187,10 @@ void fsnotify_destroy_mark(struct fsnotify_mark *mark,
>>
>>  /*
>>   * Destroy all marks in the given list. The marks must be already detached from
>> - * the original inode / vfsmount.
>> + * the original inode / vfsmount. Note that we can race with
>> + * fsnotify_clear_marks_by_group_flags(). However we hold a reference to each
>> + * mark so they won't get freed from under us and nobody else touches our
>> + * free_list list_head.
>>   */
>>  void fsnotify_destroy_marks(struct list_head *to_free)
>>  {
>> @@ -406,7 +391,7 @@ struct fsnotify_mark *fsnotify_find_mark(struct hlist_head *head,
>>  }
>>
>>  /*
>> - * clear any marks in a group in which mark->flags & flags is true
>> + * Clear any marks in a group in which mark->flags & flags is true.
>>   */
>>  void fsnotify_clear_marks_by_group_flags(struct fsnotify_group *group,
>>                                        unsigned int flags)
>> @@ -460,6 +445,7 @@ static int fsnotify_mark_destroy(void *ignored)
>>  {
>>       struct fsnotify_mark *mark, *next;
>>       struct list_head private_destroy_list;
>> +     struct fsnotify_group *group;
>>
>>       for (;;) {
>>               spin_lock(&destroy_lock);
>> @@ -471,6 +457,14 @@ static int fsnotify_mark_destroy(void *ignored)
>>
>>               list_for_each_entry_safe(mark, next, &private_destroy_list, g_list) {
>>                       list_del_init(&mark->g_list);
>> +                     group = mark->group;
>> +                     /*
>> +                      * Some groups like to know that marks are being freed.
>> +                      * This is a callback to the group function to let it
>> +                      * know that this mark is being freed.
>> +                      */
>> +                     if (group && group->ops->freeing_mark)
>> +                             group->ops->freeing_mark(mark, group);
>>                       fsnotify_put_mark(mark);
>>               }
>
> With this patch, I got so many memleak notice,
>
> unreferenced object 0xffff880035bef640 (size 64):
>   comm "fsnotify_mark", pid 26, jiffies 4294673717 (age 628.737s)
>   hex dump (first 32 bytes):
>     28 36 3f 76 00 88 ff ff 28 36 3f 76 00 88 ff ff  (6?v....(6?v....
>     00 00 00 00 00 00 00 00 00 80 00 00 00 00 ad de  ................
>   backtrace:
>     [<ffffffff816cd34e>] kmemleak_alloc+0x4e/0xb0
>     [<ffffffff811ac6b5>] __kmalloc+0x1e5/0x290
>     [<ffffffff81204f25>] inotify_handle_event+0x75/0x160
>     [<ffffffff81205abc>] inotify_ignored_and_remove_idr+0x5c/0x80
>     [<ffffffff8120505e>] inotify_freeing_mark+0xe/0x10
>     [<ffffffff81203ca6>] fsnotify_mark_destroy+0xb6/0x150
>     [<ffffffff810a4487>] kthread+0xd7/0xf0
>     [<ffffffff816d92df>] ret_from_fork+0x3f/0x70
>     [<ffffffffffffffff>] 0xffffffffffffffff
>
> It is caused by ->freeing_mark() insert an event to group,
> but snotify_put_mark() kfree the group without free the event.

Yep. I see the same leak. Now inotify_freeing_mark() is called after
fsnotify_flush_notify() -- nobody releases these events anymore.


>
> thanks,
> Kinglong Mee
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jan Kara July 21, 2015, 8:03 p.m. UTC | #6
On Mon 20-07-15 16:46:42, Jan Kara wrote:
> On Sun 19-07-15 18:21:49, Kinglong Mee wrote:
> > On 7/15/2015 21:21, Jan Kara wrote:
> > > From: Jan Kara <jack@suse.cz>
> > > 
> > > fsnotify_clear_marks_by_group_flags() can race with
> > > fsnotify_destroy_marks() so when fsnotify_destroy_mark_locked() drops
> > > mark_mutex, a mark from the list iterated by
> > > fsnotify_clear_marks_by_group_flags() can be freed and we dereference
> > > free memory in the loop there.
> > > 
> > > Fix the problem by keeping mark_mutex held in
> > > fsnotify_destroy_mark_locked(). The reason why we drop that mutex is
> > > that we need to call a ->freeing_mark() callback which may acquire
> > > mark_mutex again. To avoid this and similar lock inversion issues, we
> > > move the call to ->freeing_mark() callback to the kthread destroying the
> > > mark.
> > > 
> > > Reported-by: Ashish Sangwan <a.sangwan@samsung.com>
> > > Suggested-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
> > > Signed-off-by: Jan Kara <jack@suse.cz>
> 
> <snip>
> 
> > With this patch, I got so many memleak notice, 
> > 
> > unreferenced object 0xffff880035bef640 (size 64):
> >   comm "fsnotify_mark", pid 26, jiffies 4294673717 (age 628.737s)
> >   hex dump (first 32 bytes):
> >     28 36 3f 76 00 88 ff ff 28 36 3f 76 00 88 ff ff  (6?v....(6?v....
> >     00 00 00 00 00 00 00 00 00 80 00 00 00 00 ad de  ................
> >   backtrace:
> >     [<ffffffff816cd34e>] kmemleak_alloc+0x4e/0xb0
> >     [<ffffffff811ac6b5>] __kmalloc+0x1e5/0x290
> >     [<ffffffff81204f25>] inotify_handle_event+0x75/0x160
> >     [<ffffffff81205abc>] inotify_ignored_and_remove_idr+0x5c/0x80
> >     [<ffffffff8120505e>] inotify_freeing_mark+0xe/0x10
> >     [<ffffffff81203ca6>] fsnotify_mark_destroy+0xb6/0x150
> >     [<ffffffff810a4487>] kthread+0xd7/0xf0
> >     [<ffffffff816d92df>] ret_from_fork+0x3f/0x70
> >     [<ffffffffffffffff>] 0xffffffffffffffff
> > 
> > It is caused by ->freeing_mark() insert an event to group,
> > but snotify_put_mark() kfree the group without free the event.
> 
> Thanks for report! You are right that my patch introduces a race between
> fsnotify kthread and fsnotify_destroy_group() which can result in leaking
> inotify event on group destruction. I haven't yet decided whether the right
> fix is not to queue events for dying notification group (as that is
> pointless anyway) or whether we should just fix the original problem
> differently... Whenever I look at fsnotify code mark handling I get lost in
> the maze of locks, lists, and subtle differences between how different
> notification systems handle notification marks :( I'll think about it over
> night.

OK, I have looked into the code some more and I found another relatively
simple way of fixing the original oops. It will be IMHO better than trying
to fixup this issue which has more potential for breakage. I'll ask Linus
to revert the fsnotify fix he already merged and send a new fix.

Andrew, the fsnotify patch "fsnotify: Make fsnotify_destroy_mark_locked()
safe without refcount" will stop applying after the revert happens. Just
drop it. I will send an updated version later as after spending a day in
fsnotify code I think I finally see a way how to get rid of
fsnotify_destroy_mark_locked() which is just too error-prone to use. But
that definitely isn't for this cycle.

								Honza
Jan Kara July 21, 2015, 8:35 p.m. UTC | #7
On Tue 21-07-15 22:03:39, Jan Kara wrote:
> On Mon 20-07-15 16:46:42, Jan Kara wrote:
> > On Sun 19-07-15 18:21:49, Kinglong Mee wrote:
> > > On 7/15/2015 21:21, Jan Kara wrote:
> > > > From: Jan Kara <jack@suse.cz>
> > > > 
> > > > fsnotify_clear_marks_by_group_flags() can race with
> > > > fsnotify_destroy_marks() so when fsnotify_destroy_mark_locked() drops
> > > > mark_mutex, a mark from the list iterated by
> > > > fsnotify_clear_marks_by_group_flags() can be freed and we dereference
> > > > free memory in the loop there.
> > > > 
> > > > Fix the problem by keeping mark_mutex held in
> > > > fsnotify_destroy_mark_locked(). The reason why we drop that mutex is
> > > > that we need to call a ->freeing_mark() callback which may acquire
> > > > mark_mutex again. To avoid this and similar lock inversion issues, we
> > > > move the call to ->freeing_mark() callback to the kthread destroying the
> > > > mark.
> > > > 
> > > > Reported-by: Ashish Sangwan <a.sangwan@samsung.com>
> > > > Suggested-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
> > > > Signed-off-by: Jan Kara <jack@suse.cz>
> > 
> > <snip>
> > 
> > > With this patch, I got so many memleak notice, 
> > > 
> > > unreferenced object 0xffff880035bef640 (size 64):
> > >   comm "fsnotify_mark", pid 26, jiffies 4294673717 (age 628.737s)
> > >   hex dump (first 32 bytes):
> > >     28 36 3f 76 00 88 ff ff 28 36 3f 76 00 88 ff ff  (6?v....(6?v....
> > >     00 00 00 00 00 00 00 00 00 80 00 00 00 00 ad de  ................
> > >   backtrace:
> > >     [<ffffffff816cd34e>] kmemleak_alloc+0x4e/0xb0
> > >     [<ffffffff811ac6b5>] __kmalloc+0x1e5/0x290
> > >     [<ffffffff81204f25>] inotify_handle_event+0x75/0x160
> > >     [<ffffffff81205abc>] inotify_ignored_and_remove_idr+0x5c/0x80
> > >     [<ffffffff8120505e>] inotify_freeing_mark+0xe/0x10
> > >     [<ffffffff81203ca6>] fsnotify_mark_destroy+0xb6/0x150
> > >     [<ffffffff810a4487>] kthread+0xd7/0xf0
> > >     [<ffffffff816d92df>] ret_from_fork+0x3f/0x70
> > >     [<ffffffffffffffff>] 0xffffffffffffffff
> > > 
> > > It is caused by ->freeing_mark() insert an event to group,
> > > but snotify_put_mark() kfree the group without free the event.
> > 
> > Thanks for report! You are right that my patch introduces a race between
> > fsnotify kthread and fsnotify_destroy_group() which can result in leaking
> > inotify event on group destruction. I haven't yet decided whether the right
> > fix is not to queue events for dying notification group (as that is
> > pointless anyway) or whether we should just fix the original problem
> > differently... Whenever I look at fsnotify code mark handling I get lost in
> > the maze of locks, lists, and subtle differences between how different
> > notification systems handle notification marks :( I'll think about it over
> > night.
> 
> OK, I have looked into the code some more and I found another relatively
> simple way of fixing the original oops. It will be IMHO better than trying
> to fixup this issue which has more potential for breakage. I'll ask Linus
> to revert the fsnotify fix he already merged and send a new fix.

Linus, please revert commit

a2673b6e0406 fsnotify: fix oops in fsnotify_clear_marks_by_group_flags()

It fixes the oops but introduces a race which can leak event structure on
group destruction. I'll fix the oops in a different way and it would
basically undo what this patch did anyway. Thanks!

								Honza
Jan Kara July 21, 2015, 8:36 p.m. UTC | #8
On Mon 20-07-15 16:46:42, Jan Kara wrote:
> On Sun 19-07-15 18:21:49, Kinglong Mee wrote:
> > On 7/15/2015 21:21, Jan Kara wrote:
> > > From: Jan Kara <jack@suse.cz>
> > > 
> > > fsnotify_clear_marks_by_group_flags() can race with
> > > fsnotify_destroy_marks() so when fsnotify_destroy_mark_locked() drops
> > > mark_mutex, a mark from the list iterated by
> > > fsnotify_clear_marks_by_group_flags() can be freed and we dereference
> > > free memory in the loop there.
> > > 
> > > Fix the problem by keeping mark_mutex held in
> > > fsnotify_destroy_mark_locked(). The reason why we drop that mutex is
> > > that we need to call a ->freeing_mark() callback which may acquire
> > > mark_mutex again. To avoid this and similar lock inversion issues, we
> > > move the call to ->freeing_mark() callback to the kthread destroying the
> > > mark.
> > > 
> > > Reported-by: Ashish Sangwan <a.sangwan@samsung.com>
> > > Suggested-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
> > > Signed-off-by: Jan Kara <jack@suse.cz>
> 
> <snip>
> 
> > With this patch, I got so many memleak notice, 
> > 
> > unreferenced object 0xffff880035bef640 (size 64):
> >   comm "fsnotify_mark", pid 26, jiffies 4294673717 (age 628.737s)
> >   hex dump (first 32 bytes):
> >     28 36 3f 76 00 88 ff ff 28 36 3f 76 00 88 ff ff  (6?v....(6?v....
> >     00 00 00 00 00 00 00 00 00 80 00 00 00 00 ad de  ................
> >   backtrace:
> >     [<ffffffff816cd34e>] kmemleak_alloc+0x4e/0xb0
> >     [<ffffffff811ac6b5>] __kmalloc+0x1e5/0x290
> >     [<ffffffff81204f25>] inotify_handle_event+0x75/0x160
> >     [<ffffffff81205abc>] inotify_ignored_and_remove_idr+0x5c/0x80
> >     [<ffffffff8120505e>] inotify_freeing_mark+0xe/0x10
> >     [<ffffffff81203ca6>] fsnotify_mark_destroy+0xb6/0x150
> >     [<ffffffff810a4487>] kthread+0xd7/0xf0
> >     [<ffffffff816d92df>] ret_from_fork+0x3f/0x70
> >     [<ffffffffffffffff>] 0xffffffffffffffff
> > 
> > It is caused by ->freeing_mark() insert an event to group,
> > but snotify_put_mark() kfree the group without free the event.
> 
> Thanks for report! You are right that my patch introduces a race between
> fsnotify kthread and fsnotify_destroy_group() which can result in leaking
> inotify event on group destruction. I haven't yet decided whether the right
> fix is not to queue events for dying notification group (as that is
> pointless anyway) or whether we should just fix the original problem
> differently... Whenever I look at fsnotify code mark handling I get lost in
> the maze of locks, lists, and subtle differences between how different
> notification systems handle notification marks :( I'll think about it over
> night.

Bah, my sent emails were just getting queued for last two days. Sorry for
the delay..

								Honza
Linus Torvalds July 21, 2015, 11:14 p.m. UTC | #9
On Tue, Jul 21, 2015 at 1:35 PM, Jan Kara <jack@suse.cz> wrote:
>
> Linus, please revert commit
>
> a2673b6e0406 fsnotify: fix oops in fsnotify_clear_marks_by_group_flags()
>
> It fixes the oops but introduces a race which can leak event structure on
> group destruction. I'll fix the oops in a different way and it would
> basically undo what this patch did anyway. Thanks!

Ok, it is reverted in my tree now (and doing a build before pushing it
out). However, that original commit was also marked for stable, so we
need to kill it there.

Greg, is there some protocol to kill stable entries? I did *not* mark
the revert for stable, because I'm hoping the original never made it
into the queue anyway (it was committed last Friday), but I don't know
of any way to notify stable except this kind of "hey guys, do not
apply that particular patch".

                  Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Greg KH July 22, 2015, 12:26 a.m. UTC | #10
On Tue, Jul 21, 2015 at 04:14:00PM -0700, Linus Torvalds wrote:
> On Tue, Jul 21, 2015 at 1:35 PM, Jan Kara <jack@suse.cz> wrote:
> >
> > Linus, please revert commit
> >
> > a2673b6e0406 fsnotify: fix oops in fsnotify_clear_marks_by_group_flags()
> >
> > It fixes the oops but introduces a race which can leak event structure on
> > group destruction. I'll fix the oops in a different way and it would
> > basically undo what this patch did anyway. Thanks!
> 
> Ok, it is reverted in my tree now (and doing a build before pushing it
> out). However, that original commit was also marked for stable, so we
> need to kill it there.
> 
> Greg, is there some protocol to kill stable entries? I did *not* mark
> the revert for stable, because I'm hoping the original never made it
> into the queue anyway (it was committed last Friday), but I don't know
> of any way to notify stable except this kind of "hey guys, do not
> apply that particular patch".

That's the whole protocol, I'll go drop it from my queue now, it never
made it into a stable release just yet.

thanks,

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/notify/mark.c b/fs/notify/mark.c
index 92e48c70f0f0..3e594ce41010 100644
--- a/fs/notify/mark.c
+++ b/fs/notify/mark.c
@@ -152,31 +152,15 @@  void fsnotify_destroy_mark_locked(struct fsnotify_mark *mark,
 		BUG();
 
 	list_del_init(&mark->g_list);
-
 	spin_unlock(&mark->lock);
 
 	if (inode && (mark->flags & FSNOTIFY_MARK_FLAG_OBJECT_PINNED))
 		iput(inode);
-	/* release lock temporarily */
-	mutex_unlock(&group->mark_mutex);
 
 	spin_lock(&destroy_lock);
 	list_add(&mark->g_list, &destroy_list);
 	spin_unlock(&destroy_lock);
 	wake_up(&destroy_waitq);
-	/*
-	 * We don't necessarily have a ref on mark from caller so the above destroy
-	 * may have actually freed it, unless this group provides a 'freeing_mark'
-	 * function which must be holding a reference.
-	 */
-
-	/*
-	 * Some groups like to know that marks are being freed.  This is a
-	 * callback to the group function to let it know that this mark
-	 * is being freed.
-	 */
-	if (group->ops->freeing_mark)
-		group->ops->freeing_mark(mark, group);
 
 	/*
 	 * __fsnotify_update_child_dentry_flags(inode);
@@ -191,8 +175,6 @@  void fsnotify_destroy_mark_locked(struct fsnotify_mark *mark,
 	 */
 
 	atomic_dec(&group->num_marks);
-
-	mutex_lock_nested(&group->mark_mutex, SINGLE_DEPTH_NESTING);
 }
 
 void fsnotify_destroy_mark(struct fsnotify_mark *mark,
@@ -205,7 +187,10 @@  void fsnotify_destroy_mark(struct fsnotify_mark *mark,
 
 /*
  * Destroy all marks in the given list. The marks must be already detached from
- * the original inode / vfsmount.
+ * the original inode / vfsmount. Note that we can race with
+ * fsnotify_clear_marks_by_group_flags(). However we hold a reference to each
+ * mark so they won't get freed from under us and nobody else touches our
+ * free_list list_head.
  */
 void fsnotify_destroy_marks(struct list_head *to_free)
 {
@@ -406,7 +391,7 @@  struct fsnotify_mark *fsnotify_find_mark(struct hlist_head *head,
 }
 
 /*
- * clear any marks in a group in which mark->flags & flags is true
+ * Clear any marks in a group in which mark->flags & flags is true.
  */
 void fsnotify_clear_marks_by_group_flags(struct fsnotify_group *group,
 					 unsigned int flags)
@@ -460,6 +445,7 @@  static int fsnotify_mark_destroy(void *ignored)
 {
 	struct fsnotify_mark *mark, *next;
 	struct list_head private_destroy_list;
+	struct fsnotify_group *group;
 
 	for (;;) {
 		spin_lock(&destroy_lock);
@@ -471,6 +457,14 @@  static int fsnotify_mark_destroy(void *ignored)
 
 		list_for_each_entry_safe(mark, next, &private_destroy_list, g_list) {
 			list_del_init(&mark->g_list);
+			group = mark->group;
+			/*
+			 * Some groups like to know that marks are being freed.
+			 * This is a callback to the group function to let it
+			 * know that this mark is being freed.
+			 */
+			if (group && group->ops->freeing_mark)
+				group->ops->freeing_mark(mark, group);
 			fsnotify_put_mark(mark);
 		}