diff mbox series

btrfs: wait for writeback if sector size is smaller than page size

Message ID 39b20c5e65df079ad99aa06ec7f70f164a541c09.1726011483.git.wqu@suse.com (mailing list archive)
State New
Headers show
Series btrfs: wait for writeback if sector size is smaller than page size | expand

Commit Message

Qu Wenruo Sept. 10, 2024, 11:39 p.m. UTC
[PROBLEM]
If sector perfect compression is enabled for sector size < page size
case, the following case can lead dirty ranges not being written back:

     0     32K     64K     96K     128K
     |     |///////||//////|     |/|
                                 124K

In above example, the page size is 64K, and we need to write back above
two pages.

- Submit for page 0 (main thread)
  We found delalloc range [32K, 96K), which can be compressed.
  So we queue an async range for [32K, 96K).
  This means, the page unlock/clearing dirty/setting writeback will
  all happen in a workqueue context.

- The compression is done, and compressed range is submitted (workqueue)
  Since the compression is done in asynchronously, the compression can
  be done before the main thread to submit for page 64K.

  Now the whole range [32K, 96K), involving two pages, will be marked
  writeback.

- Submit for page 64K (main thread)
  extent_write_cache_pages() got its wbc->sync_mode is WB_SYNC_NONE,
  so it skips the writeback wait.

  And unlock the page and exit. This means the dirty range [124K, 128K)
  will never be submitted, until next writeback happens for page 64K.

This will never happen for previous kernels because:

- For sector size == page size case
  Since one page is one sector, if a page is marked writeback it will
  not have dirty flags.
  So this corner case will never hit.

- For sector size < page size case
  We never do subpage compression, a range can only be submitted for
  compression if the range is fully page aligned.
  This change makes the subpage behavior mostly the same as non-subpage
  cases.

[ENHANCEMENT]
Instead of relying WB_SYNC_NONE check only, if it's a subpage case, then
always wait for writeback flags.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

Comments

David Sterba Sept. 17, 2024, 4:45 p.m. UTC | #1
On Wed, Sep 11, 2024 at 09:09:05AM +0930, Qu Wenruo wrote:
> [PROBLEM]
> If sector perfect compression is enabled for sector size < page size
> case, the following case can lead dirty ranges not being written back:
> 
>      0     32K     64K     96K     128K
>      |     |///////||//////|     |/|
>                                  124K
> 
> In above example, the page size is 64K, and we need to write back above
> two pages.
> 
> - Submit for page 0 (main thread)
>   We found delalloc range [32K, 96K), which can be compressed.
>   So we queue an async range for [32K, 96K).
>   This means, the page unlock/clearing dirty/setting writeback will
>   all happen in a workqueue context.
> 
> - The compression is done, and compressed range is submitted (workqueue)
>   Since the compression is done in asynchronously, the compression can
>   be done before the main thread to submit for page 64K.
> 
>   Now the whole range [32K, 96K), involving two pages, will be marked
>   writeback.
> 
> - Submit for page 64K (main thread)
>   extent_write_cache_pages() got its wbc->sync_mode is WB_SYNC_NONE,
>   so it skips the writeback wait.
> 
>   And unlock the page and exit. This means the dirty range [124K, 128K)
>   will never be submitted, until next writeback happens for page 64K.
> 
> This will never happen for previous kernels because:
> 
> - For sector size == page size case
>   Since one page is one sector, if a page is marked writeback it will
>   not have dirty flags.
>   So this corner case will never hit.
> 
> - For sector size < page size case
>   We never do subpage compression, a range can only be submitted for
>   compression if the range is fully page aligned.
>   This change makes the subpage behavior mostly the same as non-subpage
>   cases.
> 
> [ENHANCEMENT]
> Instead of relying WB_SYNC_NONE check only, if it's a subpage case, then
> always wait for writeback flags.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/extent_io.c | 22 +++++++++++++++++++++-
>  1 file changed, 21 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 385e88b7fcf5..644e00d5b0f8 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -2116,7 +2116,27 @@ static int extent_write_cache_pages(struct address_space *mapping,
>  				continue;
>  			}
>  
> -			if (wbc->sync_mode != WB_SYNC_NONE) {
> +			/*
> +			 * For subpage case, compression can lead to mixed
> +			 * writeback and dirty flags, e.g:
> +			 * 0     32K    64K    96K    128K
> +			 * |     |//////||/////|   |//|
> +			 *
> +			 * In above case, [32K, 96K) is asynchronously submitted
> +			 * for compression, and [124K, 128K) needs to be written back.
> +			 *
> +			 * If we didn't wait wrtiteback for page 64K, [128K, 128K)

Should this be referring to the page from offset 64K, ie [64K, 128K)?
Otherwise the range in the comment does not make sense.

> +			 * won't be submitted as the page still has writeback flag
> +			 * and will be skipped in the next check.
> +			 *
> +			 * This mixed writeback and dirty case is only possible for
> +			 * subpage case.
> +			 *
> +			 * TODO: Remove this check after migrating compression to
> +			 * regular submission.

Please don't add the TODOs, keep a note or add an assertion instead.

> +			 */
> +			if (wbc->sync_mode != WB_SYNC_NONE ||
> +			    btrfs_is_subpage(inode_to_fs_info(inode), mapping)) {
>  				if (folio_test_writeback(folio))
>  					submit_write_bio(bio_ctrl, 0);
>  				folio_wait_writeback(folio);
> -- 
> 2.46.0
>
Qu Wenruo Sept. 17, 2024, 9:16 p.m. UTC | #2
在 2024/9/18 02:15, David Sterba 写道:
> On Wed, Sep 11, 2024 at 09:09:05AM +0930, Qu Wenruo wrote:
>> [PROBLEM]
>> If sector perfect compression is enabled for sector size < page size
>> case, the following case can lead dirty ranges not being written back:
>>
>>       0     32K     64K     96K     128K
>>       |     |///////||//////|     |/|
>>                                   124K
>>
>> In above example, the page size is 64K, and we need to write back above
>> two pages.
>>
>> - Submit for page 0 (main thread)
>>    We found delalloc range [32K, 96K), which can be compressed.
>>    So we queue an async range for [32K, 96K).
>>    This means, the page unlock/clearing dirty/setting writeback will
>>    all happen in a workqueue context.
>>
>> - The compression is done, and compressed range is submitted (workqueue)
>>    Since the compression is done in asynchronously, the compression can
>>    be done before the main thread to submit for page 64K.
>>
>>    Now the whole range [32K, 96K), involving two pages, will be marked
>>    writeback.
>>
>> - Submit for page 64K (main thread)
>>    extent_write_cache_pages() got its wbc->sync_mode is WB_SYNC_NONE,
>>    so it skips the writeback wait.
>>
>>    And unlock the page and exit. This means the dirty range [124K, 128K)
>>    will never be submitted, until next writeback happens for page 64K.
>>
>> This will never happen for previous kernels because:
>>
>> - For sector size == page size case
>>    Since one page is one sector, if a page is marked writeback it will
>>    not have dirty flags.
>>    So this corner case will never hit.
>>
>> - For sector size < page size case
>>    We never do subpage compression, a range can only be submitted for
>>    compression if the range is fully page aligned.
>>    This change makes the subpage behavior mostly the same as non-subpage
>>    cases.
>>
>> [ENHANCEMENT]
>> Instead of relying WB_SYNC_NONE check only, if it's a subpage case, then
>> always wait for writeback flags.
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>   fs/btrfs/extent_io.c | 22 +++++++++++++++++++++-
>>   1 file changed, 21 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
>> index 385e88b7fcf5..644e00d5b0f8 100644
>> --- a/fs/btrfs/extent_io.c
>> +++ b/fs/btrfs/extent_io.c
>> @@ -2116,7 +2116,27 @@ static int extent_write_cache_pages(struct address_space *mapping,
>>   				continue;
>>   			}
>>
>> -			if (wbc->sync_mode != WB_SYNC_NONE) {
>> +			/*
>> +			 * For subpage case, compression can lead to mixed
>> +			 * writeback and dirty flags, e.g:
>> +			 * 0     32K    64K    96K    128K
>> +			 * |     |//////||/////|   |//|
>> +			 *
>> +			 * In above case, [32K, 96K) is asynchronously submitted
>> +			 * for compression, and [124K, 128K) needs to be written back.
>> +			 *
>> +			 * If we didn't wait wrtiteback for page 64K, [128K, 128K)
>
> Should this be referring to the page from offset 64K, ie [64K, 128K)?
> Otherwise the range in the comment does not make sense.

My bad, I mean [124K, 128K).

>
>> +			 * won't be submitted as the page still has writeback flag
>> +			 * and will be skipped in the next check.
>> +			 *
>> +			 * This mixed writeback and dirty case is only possible for
>> +			 * subpage case.
>> +			 *
>> +			 * TODO: Remove this check after migrating compression to
>> +			 * regular submission.
>
> Please don't add the TODOs, keep a note or add an assertion instead.

OK.

Thanks,
Qu

>
>> +			 */
>> +			if (wbc->sync_mode != WB_SYNC_NONE ||
>> +			    btrfs_is_subpage(inode_to_fs_info(inode), mapping)) {
>>   				if (folio_test_writeback(folio))
>>   					submit_write_bio(bio_ctrl, 0);
>>   				folio_wait_writeback(folio);
>> --
>> 2.46.0
>>
>
diff mbox series

Patch

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 385e88b7fcf5..644e00d5b0f8 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2116,7 +2116,27 @@  static int extent_write_cache_pages(struct address_space *mapping,
 				continue;
 			}
 
-			if (wbc->sync_mode != WB_SYNC_NONE) {
+			/*
+			 * For subpage case, compression can lead to mixed
+			 * writeback and dirty flags, e.g:
+			 * 0     32K    64K    96K    128K
+			 * |     |//////||/////|   |//|
+			 *
+			 * In above case, [32K, 96K) is asynchronously submitted
+			 * for compression, and [124K, 128K) needs to be written back.
+			 *
+			 * If we didn't wait wrtiteback for page 64K, [128K, 128K)
+			 * won't be submitted as the page still has writeback flag
+			 * and will be skipped in the next check.
+			 *
+			 * This mixed writeback and dirty case is only possible for
+			 * subpage case.
+			 *
+			 * TODO: Remove this check after migrating compression to
+			 * regular submission.
+			 */
+			if (wbc->sync_mode != WB_SYNC_NONE ||
+			    btrfs_is_subpage(inode_to_fs_info(inode), mapping)) {
 				if (folio_test_writeback(folio))
 					submit_write_bio(bio_ctrl, 0);
 				folio_wait_writeback(folio);