
[2/3] f2fs crypto: use bounce pages from mempool first

Message ID 1432255225-76792-2-git-send-email-jaegeuk@kernel.org (mailing list archive)
State New, archived

Commit Message

Jaegeuk Kim May 22, 2015, 12:40 a.m. UTC
If a lot of write streams are triggered, alloc_page and __free_page are
called frequently and at high cost, resulting in high memory pressure.

In order to avoid that, let's reuse mempool pages for writeback pages.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
---
 fs/f2fs/crypto.c | 16 ++++------------
 1 file changed, 4 insertions(+), 12 deletions(-)

Comments

Chao Yu May 25, 2015, 10 a.m. UTC | #1
Hi Jaegeuk,

> -----Original Message-----
> From: Jaegeuk Kim [mailto:jaegeuk@kernel.org]
> Sent: Friday, May 22, 2015 8:40 AM
> To: linux-kernel@vger.kernel.org; linux-fsdevel@vger.kernel.org;
> linux-f2fs-devel@lists.sourceforge.net
> Cc: Jaegeuk Kim
> Subject: [f2fs-dev] [PATCH 2/3] f2fs crypto: use bounce pages from mempool first
> 
> If a lot of write streams are triggered, alloc_page and __free_page are
> costly called, resulting in high memory pressure.
> 
> In order to avoid that, let's reuse mempool pages for writeback pages.
> 
> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> ---
>  fs/f2fs/crypto.c | 16 ++++------------
>  1 file changed, 4 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/f2fs/crypto.c b/fs/f2fs/crypto.c
> index 2c7819a..2ceee68 100644
> --- a/fs/f2fs/crypto.c
> +++ b/fs/f2fs/crypto.c
> @@ -408,20 +408,12 @@ struct page *f2fs_encrypt(struct inode *inode,
>  		return (struct page *)ctx;
> 
>  	/* The encryption operation will require a bounce page. */
> -	ciphertext_page = alloc_page(GFP_NOFS);
> +	ciphertext_page = mempool_alloc(f2fs_bounce_page_pool, GFP_NOFS);
>  	if (!ciphertext_page) {
> -		/*
> -		 * This is a potential bottleneck, but at least we'll have
> -		 * forward progress.
> -		 */
> -		ciphertext_page = mempool_alloc(f2fs_bounce_page_pool,
> -							GFP_NOFS);
> -		if (WARN_ON_ONCE(!ciphertext_page))
> -			ciphertext_page = mempool_alloc(f2fs_bounce_page_pool,
> -						GFP_NOFS | __GFP_WAIT);
> -		ctx->flags &= ~F2FS_BOUNCE_PAGE_REQUIRES_FREE_ENCRYPT_FL;
> -	} else {
> +		ciphertext_page = alloc_page(GFP_NOFS);

How about using alloc_page(GFP_NOFS | __GFP_WAIT) to avoid failure?

Thanks,

>  		ctx->flags |= F2FS_BOUNCE_PAGE_REQUIRES_FREE_ENCRYPT_FL;
> +	} else {
> +		ctx->flags &= ~F2FS_BOUNCE_PAGE_REQUIRES_FREE_ENCRYPT_FL;
>  	}
>  	ctx->flags |= F2FS_WRITE_PATH_FL;
>  	ctx->w.bounce_page = ciphertext_page;
> --
> 2.1.1
> 
> 
> _______________________________________________
> Linux-f2fs-devel mailing list
> Linux-f2fs-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Theodore Ts'o May 25, 2015, 7:55 p.m. UTC | #2
On Thu, May 21, 2015 at 05:40:24PM -0700, Jaegeuk Kim wrote:
> If a lot of write streams are triggered, alloc_page and __free_page are
> costly called, resulting in high memory pressure.
> 
> In order to avoid that, let's reuse mempool pages for writeback pages.

The reason why the mempool pages were used as a fallback was because
once we are deep in the writeback code, handling memory allocation
failures is close to impossible, since we've already made enough
changes that unwinding them would be extremely difficult.  So the
basic idea was to use the mempool as an emergency reserve, since
Failure Is Not An Option, and the alternative, which is to simply loop
until the mm subsystem sees fit to give us a page, has sometimes led
to deadlock.

The primary source of write streams should be either (a) fsync
operations, or (b) calls from the writeback thread.  Are there any
additional sources for f2fs?  If they are calls from fsync operations,
and we have more than a threshold number of write operations in play,
we might want to think about blocking the fsync/fdatasync writeback,
**before** the operation starts taking locks, so other write
operations can proceed.  And the writeback thread should keep the
number of write operations to a reasonable number, especially given
that we are treating page encryption as a blocking operation.  Or is
there something else going on which is making this to be more of a
problem for f2fs?

Regards,

						- Ted
Jaegeuk Kim May 27, 2015, 7:09 p.m. UTC | #3
Hi Chao,

On Mon, May 25, 2015 at 06:00:47PM +0800, Chao Yu wrote:
> Hi Jaegeuk,
> 
> > -----Original Message-----
> > From: Jaegeuk Kim [mailto:jaegeuk@kernel.org]
> > Sent: Friday, May 22, 2015 8:40 AM
> > To: linux-kernel@vger.kernel.org; linux-fsdevel@vger.kernel.org;
> > linux-f2fs-devel@lists.sourceforge.net
> > Cc: Jaegeuk Kim
> > Subject: [f2fs-dev] [PATCH 2/3] f2fs crypto: use bounce pages from mempool first
> > 
> > If a lot of write streams are triggered, alloc_page and __free_page are
> > costly called, resulting in high memory pressure.
> > 
> > In order to avoid that, let's reuse mempool pages for writeback pages.
> > 
> > Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> > ---
> >  fs/f2fs/crypto.c | 16 ++++------------
> >  1 file changed, 4 insertions(+), 12 deletions(-)
> > 
> > diff --git a/fs/f2fs/crypto.c b/fs/f2fs/crypto.c
> > index 2c7819a..2ceee68 100644
> > --- a/fs/f2fs/crypto.c
> > +++ b/fs/f2fs/crypto.c
> > @@ -408,20 +408,12 @@ struct page *f2fs_encrypt(struct inode *inode,
> >  		return (struct page *)ctx;
> > 
> >  	/* The encryption operation will require a bounce page. */
> > -	ciphertext_page = alloc_page(GFP_NOFS);
> > +	ciphertext_page = mempool_alloc(f2fs_bounce_page_pool, GFP_NOFS);
> >  	if (!ciphertext_page) {
> > -		/*
> > -		 * This is a potential bottleneck, but at least we'll have
> > -		 * forward progress.
> > -		 */
> > -		ciphertext_page = mempool_alloc(f2fs_bounce_page_pool,
> > -							GFP_NOFS);
> > -		if (WARN_ON_ONCE(!ciphertext_page))
> > -			ciphertext_page = mempool_alloc(f2fs_bounce_page_pool,
> > -						GFP_NOFS | __GFP_WAIT);
> > -		ctx->flags &= ~F2FS_BOUNCE_PAGE_REQUIRES_FREE_ENCRYPT_FL;
> > -	} else {
> > +		ciphertext_page = alloc_page(GFP_NOFS);
> 
> Using alloc_page(GFP_NOFS | __GFP_WAIT) to avoid failure?

#define GFP_NOFS	(__GFP_WAIT | __GFP_IO)

Thanks,

> 
> Thanks,
> 
> >  		ctx->flags |= F2FS_BOUNCE_PAGE_REQUIRES_FREE_ENCRYPT_FL;
> > +	} else {
> > +		ctx->flags &= ~F2FS_BOUNCE_PAGE_REQUIRES_FREE_ENCRYPT_FL;
> >  	}
> >  	ctx->flags |= F2FS_WRITE_PATH_FL;
> >  	ctx->w.bounce_page = ciphertext_page;
> > --
> > 2.1.1
> > 
> > 
Jaegeuk Kim May 27, 2015, 9:18 p.m. UTC | #4
On Mon, May 25, 2015 at 03:55:51PM -0400, Theodore Ts'o wrote:
> On Thu, May 21, 2015 at 05:40:24PM -0700, Jaegeuk Kim wrote:
> > If a lot of write streams are triggered, alloc_page and __free_page are
> > costly called, resulting in high memory pressure.
> > 
> > In order to avoid that, let's reuse mempool pages for writeback pages.
> 
> The reason why the mempool pages was used as a fallback was because
> once we are deep in the writeback code, handling memory allocation
> failures is close to impossible, since we've already made enough
> changes that unwinding them would be extremely difficult.  So the
> basic idea was to use the mempool as an emergency reserve, since
> Failure Is Not An Option, and the alternative, which is to simply loop
> until the mm subsystem sees fit to give us a page, has sometimes led
> to deadlock.

So, in the current flow,

  ciphertext_page = mempool_alloc(f2fs_bounce_page_pool, GFP_NOFS);

  if (WARN_ON_ONCE(!ciphertext_page))
    ciphertext_page = mempool_alloc(f2fs_bounce_page_pool,
                                            GFP_NOFS | __GFP_WAIT);
                                                      ^^^^^^^^^^^^
Was it intended                                       __GFP_NOFAIL?

Anyway, f2fs handles ENOMEM in ->writepage by:

...
redirty_out:
  redirty_page_for_writepage(wbc, page);
  return AOP_WRITEPAGE_ACTIVATE;
}

> 
> The primary source of write streams should be either (a) fsync
> operations, or (b) calls from the writeback thread.  Are there any
> additional sources for f2fs?  If they are calls from fsync operations,
> and we have more than a threshold number of write operations in play,
> we might want to think about blocking the fsync/fdatasync writeback,
> **before** the operation starts taking locks, so other write
> operations can proceed.  And the writeback thread should keep the
> number of write operations to a reasonable number, especially given
> that we are treating page encryption as a blocking operation.  Or is
> there something else going on which is making this to be more of a
> problem for f2fs?

The problem that I'd like to address here is to reduce the number of
paired calls that allocate and free pages.

When I conducted xfstests/224 under 1GB DRAM, I saw several oom
killers triggered, and at that moment a huge number of inactive
anonymous pages were registered in the page cache. I'm not sure why
those pages were not reclaimed seamlessly, though.

Nevertheless, once I changed the flow to reuse the pages in the mempool for
encryption/decryption, I was able to resolve that issue.
And, I thought that there is no reason to allocate new pages for every request.

For general purposes, it may need an additional mempool too.

Thanks,
Theodore Ts'o May 28, 2015, 6:18 p.m. UTC | #5
On Wed, May 27, 2015 at 02:18:54PM -0700, Jaegeuk Kim wrote:
> The problem that I'd like to address here is to reduce the call counts of
> allocating and freeing a number of pages in pairs.
> 
> When I conduct xfstests/224 under 1GB DRAM, I've seen triggering several oom
> killers, and in that moment, a huge number of inactive anonymous pages are
> registered in the page cache. Not sure why those pages are not reclaimed
> seamlessly though.

If the system is running 8 fio processes, each one writing 1 meg
(BIO_MAX pages) at a time, one of the things that is going on is that
we need to grab 256 4k pages before submitting the bio, and then
if there are a large number of bios queued, we can have potentially a
very large number of pages allocated until the I/O has been completed.

So the problem is it's extremely difficult to determine ahead of time
how many pages need to be reserved in a mempool.  Simply
increasing the number of pages in the mempool from 32 to 256 is no guarantee
that it will be enough.  We originally only reserved 32 pages so that
in the case of an extreme memory crunch, we could make at least some
amount of forward progress.

I can imagine a number of different solutions (and these are not
mutually exclusive):

1) Try to dynamically adjust the number of pages we keep in the
mempool so that we ramp up under I/O load and then gradually ramp down
when the I/O pressure decreases.

2) Keep track of how many temporary encryption bounce pages are
outstanding; if we exceed some number, push back in writepages for
encrypted inodes.  That way we can make it a tunable
so that we don't end up using a huge number of pages, and can start
throttling encrypted writeback even before we start getting allocation
failures.

I'm currently leaning towards #2, myself.  I haven't tried doing some
kernel performance measurements to see how much CPU time we're
spending in alloc_page() and free_page() when under a very heavy
memory load.  I assume you've done some measurements and the overhead
has been very heavy.  Can you give more details about how much CPU time is
getting burned by alloc_page() and free_page()?  I had been assuming
that if we're I/O bound, the extra CPU time to allocate and free the
pages wouldn't be really onerous.  If you're seeing something
different, I'd love to see some data (perf traces, etc.) to correct my
impressions.

Cheers,

					- Ted
Chao Yu May 29, 2015, 2:45 a.m. UTC | #6
Hi Jaegeuk,

> -----Original Message-----
> From: Jaegeuk Kim [mailto:jaegeuk@kernel.org]
> Sent: Thursday, May 28, 2015 3:10 AM
> To: Chao Yu
> Cc: linux-kernel@vger.kernel.org; linux-fsdevel@vger.kernel.org;
> linux-f2fs-devel@lists.sourceforge.net
> Subject: Re: [f2fs-dev] [PATCH 2/3] f2fs crypto: use bounce pages from mempool first
> 
> Hi Chao,
> 
> On Mon, May 25, 2015 at 06:00:47PM +0800, Chao Yu wrote:
> > Hi Jaegeuk,
> >

[snip]

> >
> > Using alloc_page(GFP_NOFS | __GFP_WAIT) to avoid failure?
> 
> #define GFP_NOFS	(__GFP_WAIT | __GFP_IO)

That's right, my mistake.

Thanks,



Patch

diff --git a/fs/f2fs/crypto.c b/fs/f2fs/crypto.c
index 2c7819a..2ceee68 100644
--- a/fs/f2fs/crypto.c
+++ b/fs/f2fs/crypto.c
@@ -408,20 +408,12 @@  struct page *f2fs_encrypt(struct inode *inode,
 		return (struct page *)ctx;
 
 	/* The encryption operation will require a bounce page. */
-	ciphertext_page = alloc_page(GFP_NOFS);
+	ciphertext_page = mempool_alloc(f2fs_bounce_page_pool, GFP_NOFS);
 	if (!ciphertext_page) {
-		/*
-		 * This is a potential bottleneck, but at least we'll have
-		 * forward progress.
-		 */
-		ciphertext_page = mempool_alloc(f2fs_bounce_page_pool,
-							GFP_NOFS);
-		if (WARN_ON_ONCE(!ciphertext_page))
-			ciphertext_page = mempool_alloc(f2fs_bounce_page_pool,
-						GFP_NOFS | __GFP_WAIT);
-		ctx->flags &= ~F2FS_BOUNCE_PAGE_REQUIRES_FREE_ENCRYPT_FL;
-	} else {
+		ciphertext_page = alloc_page(GFP_NOFS);
 		ctx->flags |= F2FS_BOUNCE_PAGE_REQUIRES_FREE_ENCRYPT_FL;
+	} else {
+		ctx->flags &= ~F2FS_BOUNCE_PAGE_REQUIRES_FREE_ENCRYPT_FL;
 	}
 	ctx->flags |= F2FS_WRITE_PATH_FL;
 	ctx->w.bounce_page = ciphertext_page;