
[3/7] cachefiles: Fix page leak in cachefiles_read_backing_file while vmscan is active

Message ID 154359603369.18703.763590641473461495.stgit@warthog.procyon.org.uk
State New, archived
Series FS-Cache: Miscellaneous fixes

Commit Message

David Howells Nov. 30, 2018, 4:40 p.m. UTC
From: Kiran Kumar Modukuri <kiran.modukuri@gmail.com>

[Description]

In a heavily loaded system where the system page cache is nearing its memory
limit and fscache is enabled, fscache can leak pages while trying to read
pages from the cachefiles backend.  This can happen because two applications
can be reading the same page from a single mount, so two threads can be trying
to read the backing page at the same time.  This results in one of the threads
finding that a page for the backing file or netfs file is already in the radix
tree.  In that error path, cachefiles does not drop its reference on the
backing page, so the page is leaked.

[Fix]
The fix is straightforward: drop the reference on the backing page (and
clear the page pointers) when adding the netfs page fails with -EEXIST.
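To make the leak concrete, here is a minimal userspace sketch, not kernel code. struct toy_page, toy_get_page(), toy_put_page(), toy_handle_eexist() and toy_refs_left() are hypothetical stand-ins for struct page, the references taken when the pages are looked up, put_page() and the -EEXIST branch; only reference counts are modelled. With fixed set to false it follows the pre-patch branch, and with true it follows the patched one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct page: only the reference count is modelled. */
struct toy_page {
	int refcount;
};

struct toy_page *toy_get_page(struct toy_page *p)
{
	p->refcount++;
	return p;
}

void toy_put_page(struct toy_page *p)
{
	assert(p->refcount > 0);	/* putting a reference we don't hold is a bug */
	p->refcount--;
}

/*
 * The -EEXIST branch of one loop iteration, reduced to reference handling.
 * On entry the caller holds one reference on each page.  fixed=false
 * mirrors the pre-patch code (only the netfs page is put); fixed=true
 * mirrors the patch: both pages are put and both pointers are cleared
 * before the loop moves on.
 */
void toy_handle_eexist(struct toy_page **backpage, struct toy_page **netpage,
		       bool fixed)
{
	if (fixed) {
		toy_put_page(*backpage);
		*backpage = NULL;
	}
	toy_put_page(*netpage);
	if (fixed)
		*netpage = NULL;
}

/* References still held after one iteration takes the -EEXIST branch. */
int toy_refs_left(bool fixed)
{
	struct toy_page back = { 0 }, net = { 0 };
	struct toy_page *backpage = toy_get_page(&back);
	struct toy_page *netpage = toy_get_page(&net);

	toy_handle_eexist(&backpage, &netpage, fixed);
	return back.refcount + net.refcount;	/* 1 if leaked, 0 if fixed */
}
```

toy_refs_left(false) returns 1 (the backing page reference is left behind on every such iteration), while toy_refs_left(true) returns 0.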

  [dhowells: Note that I've removed the clearing and put of newpage as
   they aren't mentioned in the commit message and don't appear to actually
   achieve anything, since a new page is only allocated if newpage == NULL and
   any residual new page is cleared before returning.]
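The note relies on the loop keeping an unused newpage as a spare for later iterations. A minimal userspace sketch of that pattern follows, under loudly stated assumptions: malloc()/free() stand in for page allocation and release, and the hypothetical need_new[] array says whether an iteration installs the spare or loses a race and leaves it unused. It is a simplification, not the cachefiles loop itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical model of the "spare newpage" pattern.  need_new[i] says
 * whether iteration i installs the spare page (true) or loses a race and
 * leaves it unused (false).  Returns the number of allocations made, or -1
 * on allocation failure; *consumed receives the number of pages installed.
 */
int toy_spare_page_loop(const bool *need_new, int n, int *consumed)
{
	void *newpage = NULL;	/* spare page carried across iterations */
	int allocs = 0;

	assert(n >= 0);
	*consumed = 0;
	for (int i = 0; i < n; i++) {
		if (!newpage) {		/* allocate only when no spare is held */
			newpage = malloc(4096);
			if (!newpage)
				return -1;
			allocs++;
		}
		if (need_new[i]) {
			/* Toy only: "installing" the page just frees it. */
			free(newpage);
			newpage = NULL;	/* the spare has been used up */
			(*consumed)++;
		}
		/* Otherwise the unused spare is kept for the next iteration. */
	}

	free(newpage);		/* discard any residual spare; free(NULL) is a no-op */
	return allocs;
}
```

For need_new = {false, false, true}, a single allocation serves all three iterations, which is why clearing and putting newpage on the error path would not change behaviour.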

[Testing]
I have tested the fix using the following method for 12+ hours.

1) mkdir -p /mnt/nfs ; mount -o vers=3,fsc <server_ip>:/export /mnt/nfs
2) create 10000 files of 2.8MB each in an NFS mount.
3) start a thread to simulate heavy VM pressure
   (while true ; do echo 3 > /proc/sys/vm/drop_caches ; sleep 1 ; done)&
4) start multiple parallel readers over the data set at the same time
   find /mnt/nfs -type f | xargs -P 80 cat > /dev/null &
   find /mnt/nfs -type f | xargs -P 80 cat > /dev/null &
   find /mnt/nfs -type f | xargs -P 80 cat > /dev/null &
   ..
   ..
   find /mnt/nfs -type f | xargs -P 80 cat > /dev/null &
   find /mnt/nfs -type f | xargs -P 80 cat > /dev/null &
5) finally, check the output of cat /proc/fs/fscache/stats | grep -i pages,
   free -h, cat /proc/meminfo and page-types -r -b lru
   to ensure all pages are freed.

Reviewed-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Shantanu Goel <sgoel01@yahoo.com>
Signed-off-by: Kiran Kumar Modukuri <kiran.modukuri@gmail.com>
[dja: forward ported to current upstream]
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/cachefiles/rdwr.c |    6 ++++++
 1 file changed, 6 insertions(+)

Comments

Daniel Axtens Dec. 1, 2018, 12:23 a.m. UTC | #1
David Howells <dhowells@redhat.com> writes:

> From: Kiran Kumar Modukuri <kiran.modukuri@gmail.com>
>
> [Description]
>
> In a heavily loaded system where the system page cache is nearing its memory
> limit and fscache is enabled, fscache can leak pages while trying to read
> pages from the cachefiles backend.  This can happen because two applications
> can be reading the same page from a single mount, so two threads can be trying
> to read the backing page at the same time.  This results in one of the threads
> finding that a page for the backing file or netfs file is already in the radix
> tree.  In that error path, cachefiles does not drop its reference on the
> backing page, so the page is leaked.
>
> [Fix]
> The fix is straightforward: drop the reference on the backing page (and
> clear the page pointers) when adding the netfs page fails with -EEXIST.
>
>   [dhowells: Note that I've removed the clearing and put of newpage as
>    they aren't mentioned in the commit message and don't appear to actually
>    achieve anything, since a new page is only allocated if newpage == NULL and
>    any residual new page is cleared before returning.]

Sorry I hadn't got back to you on this; I think we also discussed this
with the Ubuntu kernel team and concluded - as you did - that these
didn't fix any bugs but did make things seem more consistent.

Regards,
Daniel
David Howells Dec. 1, 2018, 1:36 p.m. UTC | #2
Daniel Axtens <dja@axtens.net> wrote:

> >   [dhowells: Note that I've removed the clearing and put of newpage as
> >    they aren't mentioned in the commit message and don't appear to actually
> >    achieve anything, since a new page is only allocated if newpage == NULL and
> >    any residual new page is cleared before returning.]
> 
> Sorry I hadn't got back to you on this; I think we also discussed this
> with the Ubuntu kernel team and concluded - as you did - that these
> didn't fix any bugs but did make things seem more consistent.

The idea is that if it fails to use the new page, it keeps it for the next
iteration of the loop rather than going to the allocator twice.  But if you
make the change you proposed, you should also remove the bit that discards the
page on the way out of the function, and you probably shouldn't initialise
newpage at the top of the function, so that the compiler will let you know
about paths that don't handle it.
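The alternative described above can be sketched in userspace as follows. This is a hypothetical simplification, not a proposed patch: malloc()/free() stand in for page allocation and put_page(), need_new[] is an invented input saying whether each iteration uses its page, and newpage is deliberately declared without an initialiser so that compilers such as GCC and Clang can warn (for example under -Wall) about a path that reads it before assigning it. Whether a given path is diagnosed depends on the compiler and optimisation level.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical alternative: no spare page survives an iteration, so there
 * is nothing to discard on the way out.  Returns the number of allocations
 * made, or -1 on allocation failure; *consumed receives the number of
 * pages that would have been installed.
 */
int toy_no_spare_loop(const bool *need_new, int n, int *consumed)
{
	void *newpage;		/* no initialiser on purpose */
	int allocs = 0;

	assert(n >= 0);
	*consumed = 0;
	for (int i = 0; i < n; i++) {
		newpage = malloc(4096);	/* fresh page on every iteration */
		if (!newpage)
			return -1;
		allocs++;
		if (need_new[i])
			(*consumed)++;	/* the page would be installed here */
		/*
		 * Toy only: whether it was "installed" or lost the race, the
		 * page is released before the next iteration.
		 */
		free(newpage);
	}
	return allocs;
}
```

For an input where only the last of three iterations uses its page, this version allocates three times, whereas keeping the unused page for the next iteration would need only one allocation; that is the trade-off between the two approaches.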

David

Patch

diff --git a/fs/cachefiles/rdwr.c b/fs/cachefiles/rdwr.c
index 40f7595aad10..db233588a69a 100644
--- a/fs/cachefiles/rdwr.c
+++ b/fs/cachefiles/rdwr.c
@@ -535,7 +535,10 @@ static int cachefiles_read_backing_file(struct cachefiles_object *object,
 					    netpage->index, cachefiles_gfp);
 		if (ret < 0) {
 			if (ret == -EEXIST) {
+				put_page(backpage);
+				backpage = NULL;
 				put_page(netpage);
+				netpage = NULL;
 				fscache_retrieval_complete(op, 1);
 				continue;
 			}
@@ -608,7 +611,10 @@ static int cachefiles_read_backing_file(struct cachefiles_object *object,
 					    netpage->index, cachefiles_gfp);
 		if (ret < 0) {
 			if (ret == -EEXIST) {
+				put_page(backpage);
+				backpage = NULL;
 				put_page(netpage);
+				netpage = NULL;
 				fscache_retrieval_complete(op, 1);
 				continue;
 			}