Message ID: 1432255225-76792-2-git-send-email-jaegeuk@kernel.org (mailing list archive)
State: New, archived
Hi Jaegeuk,

> -----Original Message-----
> From: Jaegeuk Kim [mailto:jaegeuk@kernel.org]
> Sent: Friday, May 22, 2015 8:40 AM
> To: linux-kernel@vger.kernel.org; linux-fsdevel@vger.kernel.org;
> linux-f2fs-devel@lists.sourceforge.net
> Cc: Jaegeuk Kim
> Subject: [f2fs-dev] [PATCH 2/3] f2fs crypto: use bounce pages from mempool first
>
> If a lot of write streams are triggered, alloc_page and __free_page are
> costly called, resulting in high memory pressure.
>
> In order to avoid that, let's reuse mempool pages for writeback pages.
>
> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> ---
>  fs/f2fs/crypto.c | 16 ++++------------
>  1 file changed, 4 insertions(+), 12 deletions(-)
>
> diff --git a/fs/f2fs/crypto.c b/fs/f2fs/crypto.c
> index 2c7819a..2ceee68 100644
> --- a/fs/f2fs/crypto.c
> +++ b/fs/f2fs/crypto.c
> @@ -408,20 +408,12 @@ struct page *f2fs_encrypt(struct inode *inode,
>  		return (struct page *)ctx;
>
>  	/* The encryption operation will require a bounce page. */
> -	ciphertext_page = alloc_page(GFP_NOFS);
> +	ciphertext_page = mempool_alloc(f2fs_bounce_page_pool, GFP_NOFS);
>  	if (!ciphertext_page) {
> -		/*
> -		 * This is a potential bottleneck, but at least we'll have
> -		 * forward progress.
> -		 */
> -		ciphertext_page = mempool_alloc(f2fs_bounce_page_pool,
> -							GFP_NOFS);
> -		if (WARN_ON_ONCE(!ciphertext_page))
> -			ciphertext_page = mempool_alloc(f2fs_bounce_page_pool,
> -						GFP_NOFS | __GFP_WAIT);
> -		ctx->flags &= ~F2FS_BOUNCE_PAGE_REQUIRES_FREE_ENCRYPT_FL;
> -	} else {
> +		ciphertext_page = alloc_page(GFP_NOFS);

Using alloc_page(GFP_NOFS | __GFP_WAIT) to avoid failure?

Thanks,

>  		ctx->flags |= F2FS_BOUNCE_PAGE_REQUIRES_FREE_ENCRYPT_FL;
> +	} else {
> +		ctx->flags &= ~F2FS_BOUNCE_PAGE_REQUIRES_FREE_ENCRYPT_FL;
>  	}
>  	ctx->flags |= F2FS_WRITE_PATH_FL;
>  	ctx->w.bounce_page = ciphertext_page;
> --
> 2.1.1
>
> ------------------------------------------------------------------------------
> One dashboard for servers and applications across Physical-Virtual-Cloud
> Widest out-of-the-box monitoring support with 50+ applications
> Performance metrics, stats and reports that give you Actionable Insights
> Deep dive visibility with transaction tracing using APM Insight.
> http://ad.doubleclick.net/ddm/clk/290420510;117567292;y
> _______________________________________________
> Linux-f2fs-devel mailing list
> Linux-f2fs-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, May 21, 2015 at 05:40:24PM -0700, Jaegeuk Kim wrote:
> If a lot of write streams are triggered, alloc_page and __free_page are
> costly called, resulting in high memory pressure.
>
> In order to avoid that, let's reuse mempool pages for writeback pages.

The reason why the mempool pages were used as a fallback was that once
we are deep in the writeback code, handling memory allocation failures
is close to impossible, since we've already made enough changes that
unwinding them would be extremely difficult. So the basic idea was to
use the mempool as an emergency reserve, since Failure Is Not An Option,
and the alternative, which is to simply loop until the mm subsystem sees
fit to give us a page, has sometimes led to deadlock.

The primary source of write streams should be either (a) fsync
operations, or (b) calls from the writeback thread. Are there any
additional sources for f2fs? If they are calls from fsync operations,
and we have more than a threshold number of write operations in play, we
might want to think about blocking the fsync/fdatasync writeback,
**before** the operation starts taking locks, so other write operations
can proceed. And the writeback thread should keep the number of write
operations to a reasonable number, especially given that we are treating
page encryption as a blocking operation. Or is there something else
going on which is making this more of a problem for f2fs?

Regards,

						- Ted
Hi Chao,

On Mon, May 25, 2015 at 06:00:47PM +0800, Chao Yu wrote:
> Hi Jaegeuk,
>
> > -----Original Message-----
> > From: Jaegeuk Kim [mailto:jaegeuk@kernel.org]
> > Sent: Friday, May 22, 2015 8:40 AM
> > To: linux-kernel@vger.kernel.org; linux-fsdevel@vger.kernel.org;
> > linux-f2fs-devel@lists.sourceforge.net
> > Cc: Jaegeuk Kim
> > Subject: [f2fs-dev] [PATCH 2/3] f2fs crypto: use bounce pages from mempool first
> >
[snip]
> > -	ciphertext_page = alloc_page(GFP_NOFS);
> > +	ciphertext_page = mempool_alloc(f2fs_bounce_page_pool, GFP_NOFS);
> >  	if (!ciphertext_page) {
[snip]
> > +		ciphertext_page = alloc_page(GFP_NOFS);
>
> Using alloc_page(GFP_NOFS | __GFP_WAIT) to avoid failure?

#define GFP_NOFS	(__GFP_WAIT | __GFP_IO)

Thanks,

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
On Mon, May 25, 2015 at 03:55:51PM -0400, Theodore Ts'o wrote:
> On Thu, May 21, 2015 at 05:40:24PM -0700, Jaegeuk Kim wrote:
> > If a lot of write streams are triggered, alloc_page and __free_page are
> > costly called, resulting in high memory pressure.
> >
> > In order to avoid that, let's reuse mempool pages for writeback pages.
>
> The reason why the mempool pages was used as a fallback was because
> once we are deep in the writeback code, handling memory allocation
> failures is close to impossible, since we've already made enough
> changes that unwinding them would be extremely difficult. So the
> basic idea was to use the mempool as an emergency reserve, since
> Failure Is Not An Option, and the alternative, which is to simply loop
> until the mm subsystem sees fit to give us a page, has sometimes led
> to deadlock.

So, in the current flow,

	ciphertext_page = mempool_alloc(f2fs_bounce_page_pool, GFP_NOFS);
	if (WARN_ON_ONCE(!ciphertext_page))
		ciphertext_page = mempool_alloc(f2fs_bounce_page_pool,
						GFP_NOFS | __GFP_WAIT);
							   ^^^^^^^^^^

was __GFP_NOFAIL intended?

Anyway, f2fs handles ENOMEM in ->writepage by:

	...
redirty_out:
	redirty_page_for_writepage();
	return AOP_WRITEPAGE_ACTIVATE;
}

> The primary source of write streams should be either (a) fsync
> operations, or (b) calls from the writeback thread. Are there any
> additional sources for f2fs? If they are calls from fsync operations,
> and we have more than a threshold number of write operations in play,
> we might want to think about blocking the fsync/fdatasync writeback,
> **before** the operation starts taking locks, so other write
> operations can proceed. And the writeback thread should keep the
> number of write operations to a reasonable number, especially given
> that we are treating page encryption as a blocking operation. Or is
> there something else going on which is making this to be more of a
> problem for f2fs?
The problem that I'd like to address here is to reduce the call counts of
allocating and freeing a number of pages in pairs.

When I conduct xfstests/224 under 1GB DRAM, I've seen several OOM killers
triggered, and at that moment a huge number of inactive anonymous pages
are registered in the page cache. I'm not sure why those pages are not
reclaimed seamlessly, though. Nevertheless, once I changed the flow to
reuse the pages in the mempool for encryption/decryption, I could resolve
that issue.

And, I thought that there is no reason to allocate new pages for every
request. For general purposes, it may need an additional mempool too.

Thanks,
On Wed, May 27, 2015 at 02:18:54PM -0700, Jaegeuk Kim wrote:
> The problem that I'd like to address here is to reduce the call counts of
> allocating and freeing a number of pages in pairs.
>
> When I conduct xfstests/224 under 1GB DRAM, I've seen triggering several oom
> killers, and in that moment, a huge number of inactive anonymous pages are
> registered in the page cache. Not sure why those pages are not reclaimed
> seamlessly though.

If the system is running 8 fio processes, each one writing 1 meg
(BIO_MAX pages) at a time, one of the things that is going on is that we
need to grab 256 4k pages before submitting the bio, and then if there
are a large number of bios queued, we can have potentially a very large
number of pages allocated until the I/O has been completed.

So the problem is that it's extremely difficult to determine ahead of
time how many pages need to be reserved in a mempool. Simply increasing
the number of pages in the mempool from 32 to 256 is no guarantee that
it will be enough. We originally only reserved 32 pages so that in the
case of an extreme memory crunch, we could make at least some amount of
forward progress.

I can imagine a number of different solutions (and these are not
mutually exclusive):

1) Try to dynamically adjust the number of pages we keep in the mempool
so that we ramp up under I/O load and then gradually ramp down when the
I/O pressure decreases.

2) Keep track of how many temporary encryption bounce pages are
outstanding; if we exceed some number, push back in writepages for the
encrypted inode. That way we can make it a tunable so that we don't end
up using a huge number of pages, and can start throttling encrypted
writeback even before we start getting allocation failures.

I'm currently leaning towards #2, myself. I haven't tried doing some
kernel performance measurements to see how much CPU time we're spending
in alloc_page() and free_page() when under a very heavy memory load.
I assume you've done some measurements and this has been very heavy. Can
you give more details about how much CPU time is getting burned by
alloc_page() and free_page()? I had been assuming that if we're I/O
bound, the extra CPU time to allocate and free the pages wouldn't be
really onerous. If you're seeing something different, I'd love to see
some data (perf traces, etc.) to correct my impressions.

Cheers,

						- Ted
Hi Jaegeuk,

> -----Original Message-----
> From: Jaegeuk Kim [mailto:jaegeuk@kernel.org]
> Sent: Thursday, May 28, 2015 3:10 AM
> To: Chao Yu
> Cc: linux-kernel@vger.kernel.org; linux-fsdevel@vger.kernel.org;
> linux-f2fs-devel@lists.sourceforge.net
> Subject: Re: [f2fs-dev] [PATCH 2/3] f2fs crypto: use bounce pages from mempool first
>
> Hi Chao,
>
> On Mon, May 25, 2015 at 06:00:47PM +0800, Chao Yu wrote:
> > Hi Jaegeuk,
> >
[snip]
> >
> > Using alloc_page(GFP_NOFS | __GFP_WAIT) to avoid failure?
>
> #define GFP_NOFS	(__GFP_WAIT | __GFP_IO)

That's right, my mistake.

Thanks,
diff --git a/fs/f2fs/crypto.c b/fs/f2fs/crypto.c
index 2c7819a..2ceee68 100644
--- a/fs/f2fs/crypto.c
+++ b/fs/f2fs/crypto.c
@@ -408,20 +408,12 @@ struct page *f2fs_encrypt(struct inode *inode,
 		return (struct page *)ctx;
 
 	/* The encryption operation will require a bounce page. */
-	ciphertext_page = alloc_page(GFP_NOFS);
+	ciphertext_page = mempool_alloc(f2fs_bounce_page_pool, GFP_NOFS);
 	if (!ciphertext_page) {
-		/*
-		 * This is a potential bottleneck, but at least we'll have
-		 * forward progress.
-		 */
-		ciphertext_page = mempool_alloc(f2fs_bounce_page_pool,
-							GFP_NOFS);
-		if (WARN_ON_ONCE(!ciphertext_page))
-			ciphertext_page = mempool_alloc(f2fs_bounce_page_pool,
-						GFP_NOFS | __GFP_WAIT);
-		ctx->flags &= ~F2FS_BOUNCE_PAGE_REQUIRES_FREE_ENCRYPT_FL;
-	} else {
+		ciphertext_page = alloc_page(GFP_NOFS);
 		ctx->flags |= F2FS_BOUNCE_PAGE_REQUIRES_FREE_ENCRYPT_FL;
+	} else {
+		ctx->flags &= ~F2FS_BOUNCE_PAGE_REQUIRES_FREE_ENCRYPT_FL;
 	}
 	ctx->flags |= F2FS_WRITE_PATH_FL;
 	ctx->w.bounce_page = ciphertext_page;
If a lot of write streams are triggered, alloc_page and __free_page are
costly called, resulting in high memory pressure.

In order to avoid that, let's reuse mempool pages for writeback pages.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
---
 fs/f2fs/crypto.c | 16 ++++------------
 1 file changed, 4 insertions(+), 12 deletions(-)