mm: cleancache: fix potential race in cleancache apis

Message ID 20210630073310epcms1p2ad6803cfd9dbc8ab501c4c99f799f4da@epcms1p2 (mailing list archive)
State New

Commit Message

권오훈 June 30, 2021, 7:33 a.m. UTC
The current cleancache API implementation has a potential race, shown below,
which might lead to corruption in filesystems that use cleancache.

thread 0                thread 1                        thread 2

                        in put_page
                        get pool_id K for fs1
invalidate_fs on fs1
frees pool_id K
                                                        init_fs for fs2
                                                        allocates pool_id K
                        put_page puts page
                        which belongs to fs1
                        into cleancache pool for fs2

At this point, a file cache page that originally belonged to fs1 may have
been copied into the cleancache pool of fs2. It might later be used as if
it were a valid cleancache page of fs2, and could eventually corrupt fs2
when flushed back.
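
As an illustration only, the schedule above can be replayed sequentially in
a small userspace model. This is a sketch, not kernel code: struct pool,
model_init_fs(), model_invalidate_fs(), model_put_page() and replay_race()
are hypothetical names, and the lowest-free-id allocation policy is an
assumption about how a backend might reuse pool ids.

```c
/* Userspace model of the interleaving; every name here is hypothetical. */
#include <assert.h>
#include <stdbool.h>

#define MAX_POOLS 4
#define NO_POOL   (-1)

/* One backend pool: who owns it and whose page it last stored. */
struct pool {
	bool in_use;
	int owner_fs;        /* filesystem that currently owns this pool id */
	int stored_page_fs;  /* filesystem the stored page came from, or -1 */
};

static struct pool pools[MAX_POOLS];

/* init_fs: hand out the lowest free pool id (assumed allocation policy). */
static int model_init_fs(int fs)
{
	for (int id = 0; id < MAX_POOLS; id++) {
		if (!pools[id].in_use) {
			pools[id] = (struct pool){ true, fs, -1 };
			return id;
		}
	}
	return NO_POOL;
}

/* invalidate_fs: free the pool id so that it can be handed out again. */
static void model_invalidate_fs(int pool_id)
{
	pools[pool_id].in_use = false;
}

/* put_page: store a page that came from page_fs into pool pool_id. */
static void model_put_page(int pool_id, int page_fs)
{
	assert(pool_id >= 0 && pool_id < MAX_POOLS);
	pools[pool_id].stored_page_fs = page_fs;
}

/*
 * Replay the three-thread schedule one step at a time.
 * Returns true if a page of fs1 ended up in a pool owned by fs2.
 */
bool replay_race(void)
{
	const int fs1 = 1, fs2 = 2;

	for (int id = 0; id < MAX_POOLS; id++)
		pools[id] = (struct pool){ false, -1, -1 };

	int k = model_init_fs(fs1);    /* fs1 is mounted and owns pool id K */
	int stale = k;                 /* thread 1: reads pool id K for fs1 */
	model_invalidate_fs(k);        /* thread 0: invalidate_fs frees K */
	int k2 = model_init_fs(fs2);   /* thread 2: fs2 is given K again */
	model_put_page(stale, fs1);    /* thread 1: put_page with the stale id */

	return k2 == stale &&
	       pools[stale].owner_fs == fs2 &&
	       pools[stale].stored_page_fs == fs1;
}
```

In this model replay_race() reports the aliasing because nothing ties the
pool id read by thread 1 to the lifetime of the pool it named.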

Add an rwlock to synchronize invalidate_fs with the other cleancache
operations.

In normal situations, where filesystems are not frequently mounted or
unmounted, there will be little performance impact, since the page
operations only take the lock for reading (read_lock/read_unlock).

Signed-off-by: Ohhoon Kwon <ohoono.kwon@samsung.com>
---
 fs/super.c         |  1 +
 include/linux/fs.h |  1 +
 mm/cleancache.c    | 29 ++++++++++++++++++++++++++---
 3 files changed, 28 insertions(+), 3 deletions(-)

Comments

Greg KH June 30, 2021, 8:13 a.m. UTC | #1

What commit does this fix?  Should it go to stable kernels?

thanks,

greg k-h
Matthew Wilcox June 30, 2021, 11:26 a.m. UTC | #2
On Wed, Jun 30, 2021 at 10:13:28AM +0200, gregkh@linuxfoundation.org wrote:
> What commit does this fix?  Should it go to stable kernels?

I have a commit I haven't submitted yet with this changelog:

    Remove cleancache

    The last cleancache backend was deleted in v5.3 ("xen: remove tmem
    driver"), so it has been unused since.  Remove all its filesystem hooks.

    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Greg KH June 30, 2021, 12:29 p.m. UTC | #3
On Wed, Jun 30, 2021 at 12:26:57PM +0100, Matthew Wilcox wrote:
> I have a commit I haven't submitted yet with this changelog:
> 
>     Remove cleancache
> 
>     The last cleancache backend was deleted in v5.3 ("xen: remove tmem
>     driver"), so it has been unused since.  Remove all its filesystem hooks.
> 
>     Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

That's even better!

But if so, how is the above reported problem even a problem if no one is
using cleancache?

thanks,

greg k-h
권오훈 July 1, 2021, 5:06 a.m. UTC | #4
On Wed, Jun 30, 2021 at 02:29:23PM +0200, gregkh@linuxfoundation.org wrote:
> That's even better!
>  
> But if so, how is the above reported problem even a problem if no one is
> using cleancache?
>  
> thanks,
>  
> greg k-h
> 
Dear all.

We are using the cleancache APIs for a proprietary feature at Samsung.
As Wilcox mentioned, however, there is no cleancache backend in the current
mainline kernel.
So even if the race patch is accepted, it seems unnecessary to backport it to
previous stable kernels.

Meanwhile, I personally think the cleancache API still has the potential to be
useful when combined with emerging technologies such as pmem or NVMe.

So I suggest postponing the removal of cleancache for a while.

Thanks,
Ohhoon Kwon.
Greg KH July 1, 2021, 5:58 a.m. UTC | #5
On Thu, Jul 01, 2021 at 02:06:44PM +0900, 권오훈 wrote:
> Dear all.
> 
> We are using the cleancache APIs for a proprietary feature at Samsung.
> As Wilcox mentioned, however, there is no cleancache backend in the current
> mainline kernel.
> So even if the race patch is accepted, it seems unnecessary to backport it to
> previous stable kernels.
> 
> Meanwhile, I personally think the cleancache API still has the potential to be
> useful when combined with emerging technologies such as pmem or NVMe.
> 
> So I suggest postponing the removal of cleancache for a while.

If there are no in-kernel users, it needs to be removed.  If you rely on
this, wonderful, please submit your code as soon as possible.

thanks,

greg k-h
Christoph Hellwig July 1, 2021, 8:14 a.m. UTC | #6
On Wed, Jun 30, 2021 at 12:26:57PM +0100, Matthew Wilcox wrote:
> I have a commit I haven't submitted yet with this changelog:
> 
>     Remove cleancache
> 
>     The last cleancache backend was deleted in v5.3 ("xen: remove tmem
>     driver"), so it has been unused since.  Remove all its filesystem hooks.
> 
>     Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

This would be really helpful.  cleancache is such a mess.
권오훈 July 1, 2021, 8:56 a.m. UTC | #7
On Thu, Jul 01, 2021 at 07:58:36AM +0200, gregkh@linuxfoundation.org wrote:
> If there are no in-kernel users, it needs to be removed.  If you rely on
> this, wonderful, please submit your code as soon as possible.
>  
> thanks,
>  
> greg k-h
> 
We will discuss internally and see if we can submit our feature.

Thanks,
Ohhoon Kwon
Matthew Wilcox July 1, 2021, 11:57 a.m. UTC | #8
On Thu, Jul 01, 2021 at 05:56:50PM +0900, 권오훈 wrote:
> On Thu, Jul 01, 2021 at 07:58:36AM +0200, gregkh@linuxfoundation.org wrote:
> > On Thu, Jul 01, 2021 at 02:06:44PM +0900, 권오훈 wrote:
> > > We are using the cleancache APIs for a proprietary feature at Samsung.
> > > As Wilcox mentioned, however, there is no cleancache backend in the current
> > > mainline kernel.
> > > So even if the race patch is accepted, it seems unnecessary to backport it to
> > > previous stable kernels.
> > > 
> > > Meanwhile, I personally think the cleancache API still has the potential to be
> > > useful when combined with emerging technologies such as pmem or NVMe.
> > > 
> > > So I suggest postponing the removal of cleancache for a while.
> >  
> > If there are no in-kernel users, it needs to be removed.  If you rely on
> > this, wonderful, please submit your code as soon as possible.
> >  
> > thanks,
> >  
> > greg k-h
> > 
> We will discuss internally and see if we can submit our feature.

You have two months.  I'll submit the removal then if no new backend has
shown up.

Patch

diff --git a/fs/super.c b/fs/super.c
index 11b7e7213fd1..6810b685490c 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -261,6 +261,7 @@  static struct super_block *alloc_super(struct file_system_type *type, int flags,
 	s->s_time_min = TIME64_MIN;
 	s->s_time_max = TIME64_MAX;
 	s->cleancache_poolid = CLEANCACHE_NO_POOL;
+	rwlock_init(&s->cleancache_pool_lock);
 
 	s->s_shrink.seeks = DEFAULT_SEEKS;
 	s->s_shrink.scan_objects = super_cache_scan;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index c3c88fdb9b2a..f61008c9e8fc 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1501,6 +1501,7 @@  struct super_block {
 	 * Saved pool identifier for cleancache (-1 means none)
 	 */
 	int cleancache_poolid;
+	rwlock_t cleancache_pool_lock;
 
 	struct shrinker s_shrink;	/* per-sb shrinker handle */
 
diff --git a/mm/cleancache.c b/mm/cleancache.c
index db7eee9c0886..10b436a28219 100644
--- a/mm/cleancache.c
+++ b/mm/cleancache.c
@@ -114,12 +114,14 @@  void __cleancache_init_fs(struct super_block *sb)
 {
 	int pool_id = CLEANCACHE_NO_BACKEND;
 
+	write_lock(&sb->cleancache_pool_lock);
 	if (cleancache_ops) {
 		pool_id = cleancache_ops->init_fs(PAGE_SIZE);
 		if (pool_id < 0)
 			pool_id = CLEANCACHE_NO_POOL;
 	}
 	sb->cleancache_poolid = pool_id;
+	write_unlock(&sb->cleancache_pool_lock);
 }
 EXPORT_SYMBOL(__cleancache_init_fs);
 
@@ -128,12 +130,14 @@  void __cleancache_init_shared_fs(struct super_block *sb)
 {
 	int pool_id = CLEANCACHE_NO_BACKEND_SHARED;
 
+	write_lock(&sb->cleancache_pool_lock);
 	if (cleancache_ops) {
 		pool_id = cleancache_ops->init_shared_fs(&sb->s_uuid, PAGE_SIZE);
 		if (pool_id < 0)
 			pool_id = CLEANCACHE_NO_POOL;
 	}
 	sb->cleancache_poolid = pool_id;
+	write_unlock(&sb->cleancache_pool_lock);
 }
 EXPORT_SYMBOL(__cleancache_init_shared_fs);
 
@@ -185,6 +189,7 @@  int __cleancache_get_page(struct page *page)
 	}
 
 	VM_BUG_ON_PAGE(!PageLocked(page), page);
+	read_lock(&page->mapping->host->i_sb->cleancache_pool_lock);
 	pool_id = page->mapping->host->i_sb->cleancache_poolid;
 	if (pool_id < 0)
 		goto out;
@@ -198,6 +203,7 @@  int __cleancache_get_page(struct page *page)
 	else
 		cleancache_failed_gets++;
 out:
+	read_unlock(&page->mapping->host->i_sb->cleancache_pool_lock);
 	return ret;
 }
 EXPORT_SYMBOL(__cleancache_get_page);
@@ -223,12 +229,14 @@  void __cleancache_put_page(struct page *page)
 	}
 
 	VM_BUG_ON_PAGE(!PageLocked(page), page);
+	read_lock(&page->mapping->host->i_sb->cleancache_pool_lock);
 	pool_id = page->mapping->host->i_sb->cleancache_poolid;
 	if (pool_id >= 0 &&
 		cleancache_get_key(page->mapping->host, &key) >= 0) {
 		cleancache_ops->put_page(pool_id, key, page->index, page);
 		cleancache_puts++;
 	}
+	read_unlock(&page->mapping->host->i_sb->cleancache_pool_lock);
 }
 EXPORT_SYMBOL(__cleancache_put_page);
 
@@ -244,12 +252,15 @@  void __cleancache_invalidate_page(struct address_space *mapping,
 					struct page *page)
 {
 	/* careful... page->mapping is NULL sometimes when this is called */
-	int pool_id = mapping->host->i_sb->cleancache_poolid;
+	int pool_id;
 	struct cleancache_filekey key = { .u.key = { 0 } };
 
 	if (!cleancache_ops)
 		return;
 
+	read_lock(&mapping->host->i_sb->cleancache_pool_lock);
+	pool_id = mapping->host->i_sb->cleancache_poolid;
+
 	if (pool_id >= 0) {
 		VM_BUG_ON_PAGE(!PageLocked(page), page);
 		if (cleancache_get_key(mapping->host, &key) >= 0) {
@@ -258,6 +269,7 @@  void __cleancache_invalidate_page(struct address_space *mapping,
 			cleancache_invalidates++;
 		}
 	}
+	read_unlock(&mapping->host->i_sb->cleancache_pool_lock);
 }
 EXPORT_SYMBOL(__cleancache_invalidate_page);
 
@@ -272,14 +284,19 @@  EXPORT_SYMBOL(__cleancache_invalidate_page);
  */
 void __cleancache_invalidate_inode(struct address_space *mapping)
 {
-	int pool_id = mapping->host->i_sb->cleancache_poolid;
+	int pool_id;
 	struct cleancache_filekey key = { .u.key = { 0 } };
 
 	if (!cleancache_ops)
 		return;
 
+	read_lock(&mapping->host->i_sb->cleancache_pool_lock);
+	pool_id = mapping->host->i_sb->cleancache_poolid;
+
 	if (pool_id >= 0 && cleancache_get_key(mapping->host, &key) >= 0)
 		cleancache_ops->invalidate_inode(pool_id, key);
+
+	read_unlock(&mapping->host->i_sb->cleancache_pool_lock);
 }
 EXPORT_SYMBOL(__cleancache_invalidate_inode);
 
@@ -292,11 +309,17 @@  void __cleancache_invalidate_fs(struct super_block *sb)
 {
 	int pool_id;
 
+	if (!cleancache_ops)
+		return;
+
+	write_lock(&sb->cleancache_pool_lock);
 	pool_id = sb->cleancache_poolid;
 	sb->cleancache_poolid = CLEANCACHE_NO_POOL;
 
-	if (cleancache_ops && pool_id >= 0)
+	if (pool_id >= 0)
 		cleancache_ops->invalidate_fs(pool_id);
+
+	write_unlock(&sb->cleancache_pool_lock);
 }
 EXPORT_SYMBOL(__cleancache_invalidate_fs);