Message ID | 39b20c5e65df079ad99aa06ec7f70f164a541c09.1726011483.git.wqu@suse.com
State | New, archived
Series | btrfs: wait for writeback if sector size is smaller than page size
On Wed, Sep 11, 2024 at 09:09:05AM +0930, Qu Wenruo wrote:
> [PROBLEM]
> If sector perfect compression is enabled for sector size < page size
> case, the following case can lead dirty ranges not being written back:
>
> 0        32K       64K       96K        128K
> |        |///////||//////|             |/|
>                                        124K
>
> In above example, the page size is 64K, and we need to write back above
> two pages.
>
> - Submit for page 0 (main thread)
>   We found delalloc range [32K, 96K), which can be compressed.
>   So we queue an async range for [32K, 96K).
>   This means, the page unlock/clearing dirty/setting writeback will
>   all happen in a workqueue context.
>
> - The compression is done, and compressed range is submitted (workqueue)
>   Since the compression is done in asynchronously, the compression can
>   be done before the main thread to submit for page 64K.
>
>   Now the whole range [32K, 96K), involving two pages, will be marked
>   writeback.
>
> - Submit for page 64K (main thread)
>   extent_write_cache_pages() got its wbc->sync_mode is WB_SYNC_NONE,
>   so it skips the writeback wait.
>
>   And unlock the page and exit. This means the dirty range [124K, 128K)
>   will never be submitted, until next writeback happens for page 64K.
>
> This will never happen for previous kernels because:
>
> - For sector size == page size case
>   Since one page is one sector, if a page is marked writeback it will
>   not have dirty flags.
>   So this corner case will never hit.
>
> - For sector size < page size case
>   We never do subpage compression, a range can only be submitted for
>   compression if the range is fully page aligned.
>   This change makes the subpage behavior mostly the same as non-subpage
>   cases.
>
> [ENHANCEMENT]
> Instead of relying WB_SYNC_NONE check only, if it's a subpage case, then
> always wait for writeback flags.
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/extent_io.c | 22 +++++++++++++++++++++-
>  1 file changed, 21 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 385e88b7fcf5..644e00d5b0f8 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -2116,7 +2116,27 @@ static int extent_write_cache_pages(struct address_space *mapping,
>  			continue;
>  		}
>
> -		if (wbc->sync_mode != WB_SYNC_NONE) {
> +		/*
> +		 * For subpage case, compression can lead to mixed
> +		 * writeback and dirty flags, e.g:
> +		 * 0     32K    64K    96K     128K
> +		 * |     |//////||/////|    |//|
> +		 *
> +		 * In above case, [32K, 96K) is asynchronously submitted
> +		 * for compression, and [124K, 128K) needs to be written back.
> +		 *
> +		 * If we didn't wait wrtiteback for page 64K, [128K, 128K)

Should this be referring to the page from offset 64K, ie [64K, 128K)?
Otherwise the range in the comment does not make sense.

> +		 * won't be submitted as the page still has writeback flag
> +		 * and will be skipped in the next check.
> +		 *
> +		 * This mixed writeback and dirty case is only possible for
> +		 * subpage case.
> +		 *
> +		 * TODO: Remove this check after migrating compression to
> +		 * regular submission.

Please don't add the TODOs, keep a note or add an assertion instead.

> +		 */
> +		if (wbc->sync_mode != WB_SYNC_NONE ||
> +		    btrfs_is_subpage(inode_to_fs_info(inode), mapping)) {
>  			if (folio_test_writeback(folio))
>  				submit_write_bio(bio_ctrl, 0);
>  			folio_wait_writeback(folio);
> --
> 2.46.0
>
On 2024/9/18 02:15, David Sterba wrote:
> On Wed, Sep 11, 2024 at 09:09:05AM +0930, Qu Wenruo wrote:
>> [PROBLEM]
>> If sector perfect compression is enabled for sector size < page size
>> case, the following case can lead dirty ranges not being written back:
>>
>> 0        32K       64K       96K        128K
>> |        |///////||//////|             |/|
>>                                        124K
>>
>> In above example, the page size is 64K, and we need to write back above
>> two pages.
>>
>> - Submit for page 0 (main thread)
>>   We found delalloc range [32K, 96K), which can be compressed.
>>   So we queue an async range for [32K, 96K).
>>   This means, the page unlock/clearing dirty/setting writeback will
>>   all happen in a workqueue context.
>>
>> - The compression is done, and compressed range is submitted (workqueue)
>>   Since the compression is done in asynchronously, the compression can
>>   be done before the main thread to submit for page 64K.
>>
>>   Now the whole range [32K, 96K), involving two pages, will be marked
>>   writeback.
>>
>> - Submit for page 64K (main thread)
>>   extent_write_cache_pages() got its wbc->sync_mode is WB_SYNC_NONE,
>>   so it skips the writeback wait.
>>
>>   And unlock the page and exit. This means the dirty range [124K, 128K)
>>   will never be submitted, until next writeback happens for page 64K.
>>
>> This will never happen for previous kernels because:
>>
>> - For sector size == page size case
>>   Since one page is one sector, if a page is marked writeback it will
>>   not have dirty flags.
>>   So this corner case will never hit.
>>
>> - For sector size < page size case
>>   We never do subpage compression, a range can only be submitted for
>>   compression if the range is fully page aligned.
>>   This change makes the subpage behavior mostly the same as non-subpage
>>   cases.
>>
>> [ENHANCEMENT]
>> Instead of relying WB_SYNC_NONE check only, if it's a subpage case, then
>> always wait for writeback flags.
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>  fs/btrfs/extent_io.c | 22 +++++++++++++++++++++-
>>  1 file changed, 21 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
>> index 385e88b7fcf5..644e00d5b0f8 100644
>> --- a/fs/btrfs/extent_io.c
>> +++ b/fs/btrfs/extent_io.c
>> @@ -2116,7 +2116,27 @@ static int extent_write_cache_pages(struct address_space *mapping,
>>  			continue;
>>  		}
>>
>> -		if (wbc->sync_mode != WB_SYNC_NONE) {
>> +		/*
>> +		 * For subpage case, compression can lead to mixed
>> +		 * writeback and dirty flags, e.g:
>> +		 * 0     32K    64K    96K     128K
>> +		 * |     |//////||/////|    |//|
>> +		 *
>> +		 * In above case, [32K, 96K) is asynchronously submitted
>> +		 * for compression, and [124K, 128K) needs to be written back.
>> +		 *
>> +		 * If we didn't wait wrtiteback for page 64K, [128K, 128K)
>
> Should this be referring to the page from offset 64K, ie [64K, 128K)?
> Otherwise the range in the comment does not make sense.

My bad, I mean [124K, 128K).

>
>> +		 * won't be submitted as the page still has writeback flag
>> +		 * and will be skipped in the next check.
>> +		 *
>> +		 * This mixed writeback and dirty case is only possible for
>> +		 * subpage case.
>> +		 *
>> +		 * TODO: Remove this check after migrating compression to
>> +		 * regular submission.
>
> Please don't add the TODOs, keep a note or add an assertion instead.

OK.

Thanks,
Qu

>
>> +		 */
>> +		if (wbc->sync_mode != WB_SYNC_NONE ||
>> +		    btrfs_is_subpage(inode_to_fs_info(inode), mapping)) {
>>  			if (folio_test_writeback(folio))
>>  				submit_write_bio(bio_ctrl, 0);
>>  			folio_wait_writeback(folio);
>> --
>> 2.46.0
>>
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 385e88b7fcf5..644e00d5b0f8 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2116,7 +2116,27 @@ static int extent_write_cache_pages(struct address_space *mapping,
 			continue;
 		}
 
-		if (wbc->sync_mode != WB_SYNC_NONE) {
+		/*
+		 * For subpage case, compression can lead to mixed
+		 * writeback and dirty flags, e.g:
+		 * 0     32K    64K    96K     128K
+		 * |     |//////||/////|    |//|
+		 *
+		 * In above case, [32K, 96K) is asynchronously submitted
+		 * for compression, and [124K, 128K) needs to be written back.
+		 *
+		 * If we didn't wait wrtiteback for page 64K, [128K, 128K)
+		 * won't be submitted as the page still has writeback flag
+		 * and will be skipped in the next check.
+		 *
+		 * This mixed writeback and dirty case is only possible for
+		 * subpage case.
+		 *
+		 * TODO: Remove this check after migrating compression to
+		 * regular submission.
+		 */
+		if (wbc->sync_mode != WB_SYNC_NONE ||
+		    btrfs_is_subpage(inode_to_fs_info(inode), mapping)) {
 			if (folio_test_writeback(folio))
 				submit_write_bio(bio_ctrl, 0);
 			folio_wait_writeback(folio);
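
Editorial sketch only: assuming both review comments above are addressed as discussed (the comment range corrected to [124K, 128K) and the TODO turned into a plain note), the hunk might end up reading roughly as below. This is not the author's follow-up revision, just an illustration built from the code already shown in the diff.

		/*
		 * For the subpage case, compression can lead to mixed
		 * writeback and dirty flags, e.g.:
		 * 0     32K    64K    96K     128K
		 * |     |//////||/////|    |//|
		 *
		 * In the above case, [32K, 96K) is asynchronously submitted
		 * for compression, and [124K, 128K) needs to be written back.
		 *
		 * If we didn't wait for writeback on page 64K, [124K, 128K)
		 * won't be submitted as the page still has the writeback flag
		 * and will be skipped in the next check.
		 *
		 * This mixed writeback and dirty case is only possible for
		 * the subpage case.
		 *
		 * Note: this extra wait should become unnecessary once
		 * compression is migrated to regular (non-async) submission.
		 */
		if (wbc->sync_mode != WB_SYNC_NONE ||
		    btrfs_is_subpage(inode_to_fs_info(inode), mapping)) {
			if (folio_test_writeback(folio))
				submit_write_bio(bio_ctrl, 0);
			folio_wait_writeback(folio);
		}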
[PROBLEM]
If sector perfect compression is enabled for the sector size < page size
case, the following situation can lead to dirty ranges not being written
back:

0        32K       64K       96K        128K
|        |///////||//////|             |/|
                                       124K

In the above example, the page size is 64K, and we need to write back the
above two pages.

- Submit for page 0 (main thread)
  We find the delalloc range [32K, 96K), which can be compressed.
  So we queue an async range for [32K, 96K).
  This means the page unlock/clearing dirty/setting writeback will
  all happen in a workqueue context.

- The compression is done, and the compressed range is submitted (workqueue)
  Since the compression is done asynchronously, it can finish before the
  main thread gets to submit page 64K.

  Now the whole range [32K, 96K), involving two pages, will be marked
  writeback.

- Submit for page 64K (main thread)
  extent_write_cache_pages() sees wbc->sync_mode == WB_SYNC_NONE,
  so it skips the writeback wait.

  It then unlocks the page and exits. This means the dirty range
  [124K, 128K) will never be submitted until the next writeback happens
  for page 64K.

This can never happen on previous kernels because:

- For the sector size == page size case
  Since one page is one sector, a page marked writeback will not have
  the dirty flag.
  So this corner case is never hit.

- For the sector size < page size case
  We never did subpage compression; a range could only be submitted for
  compression if it was fully page aligned.
  That restriction made the subpage behavior mostly the same as the
  non-subpage cases.

[ENHANCEMENT]
Instead of relying on the WB_SYNC_NONE check only, always wait for the
writeback flag if it is a subpage case.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)
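
For context, a simplified sketch (not the verbatim kernel code) of the pre-patch check that the changelog describes: under WB_SYNC_NONE the whole wait branch is skipped, which is what allows a folio that is already under writeback but still has a dirty tail to be passed over.

		/*
		 * Simplified pre-patch sketch: with WB_SYNC_NONE the wait
		 * below is skipped entirely, even if the async compression
		 * path has already marked the folio writeback while a tail
		 * such as [124K, 128K) is still dirty.
		 */
		if (wbc->sync_mode != WB_SYNC_NONE) {
			if (folio_test_writeback(folio))
				submit_write_bio(bio_ctrl, 0);
			folio_wait_writeback(folio);
		}
		/*
		 * Per the changelog, the folio is then skipped by a later
		 * writeback-flag check, so the dirty tail is not submitted
		 * until the next writeback cycle for that page.
		 */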