diff mbox series

[3/3] btrfs: get rid of filemap_get_folios_contig() calls

Message ID 577429c985d01407c27141db4015c50d8ba7c746.1743731232.git.wqu@suse.com (mailing list archive)
State New
Headers show
Series btrfs: fsstress hang fix for large data folios | expand

Commit Message

Qu Wenruo April 4, 2025, 1:47 a.m. UTC
With large folios, filemap_get_folios_contig() can return duplicated
folios, for example:

	704K                     768K	                          1M
        |<-- 64K sized folio --->|<------- 256K sized folio ----->|
	                      |          |
			      764K       800K

If we call lock_delalloc_folios() with range [762K, 800K) on above
layout with locked folio at 704K, we will hit the following sequence:

1. filemap_get_folios_contig() returned 1 for range [764K, 768K)
   As this is a folio boundary.

   The returned folio will be folio at 704K.

   Since the folio is already locked, we will not lock the folio.

2. filemap_get_folios_contig() returned 8 for range [768K, 800K)
   All the entries are the same folio at 768K.

3. We lock folio 768K for the slot 0 of the fbatch

4. We lock folio 768K for the slot 1 of the fbatch
   We deadlock, as we have already locked the same folio at step 3.

The filemap_get_folios_contig() behavior is definitely not ideal, but on
the other hand, we do not really need the filemap_get_folios_contig()
call either.

The current filemap_get_folios() is already good enough, and we require
no strong contiguous requirement either, we only need the returned folios
contiguous at file map level (aka, their folio file offsets are contiguous).

So get rid of the cursed filemap_get_folios_contig() and use regular
filemap_get_folios() instead, this will fix the above deadlock for large
folios.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c             | 6 ++----
 fs/btrfs/tests/extent-io-tests.c | 3 +--
 2 files changed, 3 insertions(+), 6 deletions(-)

Comments

Filipe Manana April 4, 2025, 4:38 p.m. UTC | #1
On Fri, Apr 4, 2025 at 2:48 AM Qu Wenruo <wqu@suse.com> wrote:
>
> With large folios, filemap_get_folios_contig() can return duplicated
> folios, for example:
>
>         704K                     768K                             1M
>         |<-- 64K sized folio --->|<------- 256K sized folio ----->|
>                               |          |
>                               764K       800K
>
> If we call lock_delalloc_folios() with range [762K, 800K) on above
> layout with locked folio at 704K, we will hit the following sequence:
>
> 1. filemap_get_folios_contig() returned 1 for range [764K, 768K)
>    As this is a folio boundary.
>
>    The returned folio will be folio at 704K.
>
>    Since the folio is already locked, we will not lock the folio.
>
> 2. filemap_get_folios_contig() returned 8 for range [768K, 800K)
>    All the entries are the same folio at 768K.
>
> 3. We lock folio 768K for the slot 0 of the fbatch
>
> 4. We lock folio 768K for the slot 1 of the fbatch
>    We deadlock, as we have already locked the same folio at step 3.
>
> The filemap_get_folios_contig() behavior is definitely not ideal, but on
> the other hand, we do not really need the filemap_get_folios_contig()
> call either.
>
> The current filemap_get_folios() is already good enough, and we require
> no strong contiguous requirement either, we only need the returned folios
> contiguous at file map level (aka, their folio file offsets are contiguous).

This paragraph is confusing.
This says filemap_get_folios() provides contiguous results in the
sense that the file offset of each folio is greater than the previous
ones (the folios in the batch are ordered by file offsets).
But then what is the kind of contiguous results that
filemap_get_folios_contig() provides? How different is it? Is it that
the batch doesn't get "holes" - i.e. a folio's start always matches
the end of the previous one +1?

>
> So get rid of the cursed filemap_get_folios_contig() and use regular
> filemap_get_folios() instead, this will fix the above deadlock for large
> folios.

I think it's worth adding in this changelog that it is known that
filemap_get_folios_contig() has a bug and there's a pending fix for
it, adding links to the thread you started and the respective fix:

https://lore.kernel.org/linux-btrfs/b714e4de-2583-4035-b829-72cfb5eb6fc6@gmx.com/
https://lore.kernel.org/linux-btrfs/Z-8s1-kiIDkzgRbc@fedora/

Anyway, it seems good, so with those small changes:

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Thanks.

>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/extent_io.c             | 6 ++----
>  fs/btrfs/tests/extent-io-tests.c | 3 +--
>  2 files changed, 3 insertions(+), 6 deletions(-)
>
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index f0d51f6ed951..986bda2eff1c 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -207,8 +207,7 @@ static void __process_folios_contig(struct address_space *mapping,
>         while (index <= end_index) {
>                 int found_folios;
>
> -               found_folios = filemap_get_folios_contig(mapping, &index,
> -                               end_index, &fbatch);
> +               found_folios = filemap_get_folios(mapping, &index, end_index, &fbatch);
>                 for (i = 0; i < found_folios; i++) {
>                         struct folio *folio = fbatch.folios[i];
>
> @@ -245,8 +244,7 @@ static noinline int lock_delalloc_folios(struct inode *inode,
>         while (index <= end_index) {
>                 unsigned int found_folios, i;
>
> -               found_folios = filemap_get_folios_contig(mapping, &index,
> -                               end_index, &fbatch);
> +               found_folios = filemap_get_folios(mapping, &index, end_index, &fbatch);
>                 if (found_folios == 0)
>                         goto out;
>
> diff --git a/fs/btrfs/tests/extent-io-tests.c b/fs/btrfs/tests/extent-io-tests.c
> index 74aca7180a5a..e762eca8a99f 100644
> --- a/fs/btrfs/tests/extent-io-tests.c
> +++ b/fs/btrfs/tests/extent-io-tests.c
> @@ -32,8 +32,7 @@ static noinline int process_page_range(struct inode *inode, u64 start, u64 end,
>         folio_batch_init(&fbatch);
>
>         while (index <= end_index) {
> -               ret = filemap_get_folios_contig(inode->i_mapping, &index,
> -                               end_index, &fbatch);
> +               ret = filemap_get_folios(inode->i_mapping, &index, end_index, &fbatch);
>                 for (i = 0; i < ret; i++) {
>                         struct folio *folio = fbatch.folios[i];
>
> --
> 2.49.0
>
>
Qu Wenruo April 4, 2025, 9:51 p.m. UTC | #2
在 2025/4/5 03:08, Filipe Manana 写道:
> On Fri, Apr 4, 2025 at 2:48 AM Qu Wenruo <wqu@suse.com> wrote:
>>
>> With large folios, filemap_get_folios_contig() can return duplicated
>> folios, for example:
>>
>>          704K                     768K                             1M
>>          |<-- 64K sized folio --->|<------- 256K sized folio ----->|
>>                                |          |
>>                                764K       800K
>>
>> If we call lock_delalloc_folios() with range [762K, 800K) on above
>> layout with locked folio at 704K, we will hit the following sequence:
>>
>> 1. filemap_get_folios_contig() returned 1 for range [764K, 768K)
>>     As this is a folio boundary.
>>
>>     The returned folio will be folio at 704K.
>>
>>     Since the folio is already locked, we will not lock the folio.
>>
>> 2. filemap_get_folios_contig() returned 8 for range [768K, 800K)
>>     All the entries are the same folio at 768K.
>>
>> 3. We lock folio 768K for the slot 0 of the fbatch
>>
>> 4. We lock folio 768K for the slot 1 of the fbatch
>>     We deadlock, as we have already locked the same folio at step 3.
>>
>> The filemap_get_folios_contig() behavior is definitely not ideal, but on
>> the other hand, we do not really need the filemap_get_folios_contig()
>> call either.
>>
>> The current filemap_get_folios() is already good enough, and we require
>> no strong contiguous requirement either, we only need the returned folios
>> contiguous at file map level (aka, their folio file offsets are contiguous).
> 
> This paragraph is confusing.
> This says filemap_get_folios() provides contiguous results in the
> sense that the file offset of each folio is greater than the previous
> ones (the folios in the batch are ordered by file offsets).
> But then what is the kind of contiguous results that
> filemap_get_folios_contig() provides? How different is it? Is it that
> the batch doesn't get "holes" - i.e. a folio's start always matches
> the end of the previous one +1?

 From my understanding, filemap_get_folios_contig() returns physically 
contiguous pages/folios.

And the hole handling is always the caller's responsibility, no matter 
if it's the _contig() version or not.

Thus for filemap_get_folios_contig(), and the above two large folios 
cases, as long as the two large folios are not physically contiguous, 
the call returns the first folio, then the next call it returns the next 
folio.

Not returning both in one go, and this behavior is not that useful to us 
either.


And talking about holes, due to the _contig() behavior itself, if we hit 
a hole the function will definitely return, but between different calls 
caller should check the folio's position between calls, to make sure no 
holes is between two filemap_get_folios_contig() calls.

Of course, no caller is really doing that, thus another reason why we do 
not need the filemap_get_folios_contig() call.

Thanks,
Qu>
>>
>> So get rid of the cursed filemap_get_folios_contig() and use regular
>> filemap_get_folios() instead, this will fix the above deadlock for large
>> folios.
> 
> I think it's worth adding in this changelog that it is known that
> filemap_get_folios_contig() has a bug and there's a pending fix for
> it, adding links to the thread you started and the respective fix:
> 
> https://lore.kernel.org/linux-btrfs/b714e4de-2583-4035-b829-72cfb5eb6fc6@gmx.com/
> https://lore.kernel.org/linux-btrfs/Z-8s1-kiIDkzgRbc@fedora/
> 
> Anyway, it seems good, so with those small changes:
> 
> Reviewed-by: Filipe Manana <fdmanana@suse.com>
> 
> Thanks.
> 
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>   fs/btrfs/extent_io.c             | 6 ++----
>>   fs/btrfs/tests/extent-io-tests.c | 3 +--
>>   2 files changed, 3 insertions(+), 6 deletions(-)
>>
>> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
>> index f0d51f6ed951..986bda2eff1c 100644
>> --- a/fs/btrfs/extent_io.c
>> +++ b/fs/btrfs/extent_io.c
>> @@ -207,8 +207,7 @@ static void __process_folios_contig(struct address_space *mapping,
>>          while (index <= end_index) {
>>                  int found_folios;
>>
>> -               found_folios = filemap_get_folios_contig(mapping, &index,
>> -                               end_index, &fbatch);
>> +               found_folios = filemap_get_folios(mapping, &index, end_index, &fbatch);
>>                  for (i = 0; i < found_folios; i++) {
>>                          struct folio *folio = fbatch.folios[i];
>>
>> @@ -245,8 +244,7 @@ static noinline int lock_delalloc_folios(struct inode *inode,
>>          while (index <= end_index) {
>>                  unsigned int found_folios, i;
>>
>> -               found_folios = filemap_get_folios_contig(mapping, &index,
>> -                               end_index, &fbatch);
>> +               found_folios = filemap_get_folios(mapping, &index, end_index, &fbatch);
>>                  if (found_folios == 0)
>>                          goto out;
>>
>> diff --git a/fs/btrfs/tests/extent-io-tests.c b/fs/btrfs/tests/extent-io-tests.c
>> index 74aca7180a5a..e762eca8a99f 100644
>> --- a/fs/btrfs/tests/extent-io-tests.c
>> +++ b/fs/btrfs/tests/extent-io-tests.c
>> @@ -32,8 +32,7 @@ static noinline int process_page_range(struct inode *inode, u64 start, u64 end,
>>          folio_batch_init(&fbatch);
>>
>>          while (index <= end_index) {
>> -               ret = filemap_get_folios_contig(inode->i_mapping, &index,
>> -                               end_index, &fbatch);
>> +               ret = filemap_get_folios(inode->i_mapping, &index, end_index, &fbatch);
>>                  for (i = 0; i < ret; i++) {
>>                          struct folio *folio = fbatch.folios[i];
>>
>> --
>> 2.49.0
>>
>>
Qu Wenruo April 4, 2025, 10 p.m. UTC | #3
在 2025/4/5 08:21, Qu Wenruo 写道:
> 
> 
> 在 2025/4/5 03:08, Filipe Manana 写道:
>> On Fri, Apr 4, 2025 at 2:48 AM Qu Wenruo <wqu@suse.com> wrote:
>>>
>>> With large folios, filemap_get_folios_contig() can return duplicated
>>> folios, for example:
>>>
>>>          704K                     768K                             1M
>>>          |<-- 64K sized folio --->|<------- 256K sized folio ----->|
>>>                                |          |
>>>                                764K       800K
>>>
>>> If we call lock_delalloc_folios() with range [762K, 800K) on above
>>> layout with locked folio at 704K, we will hit the following sequence:
>>>
>>> 1. filemap_get_folios_contig() returned 1 for range [764K, 768K)
>>>     As this is a folio boundary.
>>>
>>>     The returned folio will be folio at 704K.
>>>
>>>     Since the folio is already locked, we will not lock the folio.
>>>
>>> 2. filemap_get_folios_contig() returned 8 for range [768K, 800K)
>>>     All the entries are the same folio at 768K.
>>>
>>> 3. We lock folio 768K for the slot 0 of the fbatch
>>>
>>> 4. We lock folio 768K for the slot 1 of the fbatch
>>>     We deadlock, as we have already locked the same folio at step 3.
>>>
>>> The filemap_get_folios_contig() behavior is definitely not ideal, but on
>>> the other hand, we do not really need the filemap_get_folios_contig()
>>> call either.
>>>
>>> The current filemap_get_folios() is already good enough, and we require
>>> no strong contiguous requirement either, we only need the returned 
>>> folios
>>> contiguous at file map level (aka, their folio file offsets are 
>>> contiguous).
>>
>> This paragraph is confusing.
>> This says filemap_get_folios() provides contiguous results in the
>> sense that the file offset of each folio is greater than the previous
>> ones (the folios in the batch are ordered by file offsets).
>> But then what is the kind of contiguous results that
>> filemap_get_folios_contig() provides? How different is it? Is it that
>> the batch doesn't get "holes" - i.e. a folio's start always matches
>> the end of the previous one +1?
> 
>  From my understanding, filemap_get_folios_contig() returns physically 
> contiguous pages/folios.
> 
> And the hole handling is always the caller's responsibility, no matter 
> if it's the _contig() version or not.
> 
> Thus for filemap_get_folios_contig(), and the above two large folios 
> cases, as long as the two large folios are not physically contiguous, 
> the call returns the first folio, then the next call it returns the next 
> folio.
> 
> Not returning both in one go, and this behavior is not that useful to us 
> either.

My bad, I should double check the code before commenting on this.

It turns out it's not really splitting the result based on physical 
addresses, but purely returning to the end of a large folio.
Which is another thing we do not really want in btrfs' code.

So the contiguous part really seems to be the following points:

- There should always be a folio at the start index
   Or it will return 0 immediately

- No holes in the returned fbatch
   Although it can still return the result early, due to large folios.

Anyway I'll add these to the next version of the commit message, to 
explain the behavior in details and why it's safe we go the regular 
filemap_get_folios() call.

Thanks,
Qu

> 
> 
> And talking about holes, due to the _contig() behavior itself, if we hit 
> a hole the function will definitely return, but between different calls 
> caller should check the folio's position between calls, to make sure no 
> holes is between two filemap_get_folios_contig() calls.
> 
> Of course, no caller is really doing that, thus another reason why we do 
> not need the filemap_get_folios_contig() call.
> 
> Thanks,
> Qu>
>>>
>>> So get rid of the cursed filemap_get_folios_contig() and use regular
>>> filemap_get_folios() instead, this will fix the above deadlock for large
>>> folios.
>>
>> I think it's worth adding in this changelog that it is known that
>> filemap_get_folios_contig() has a bug and there's a pending fix for
>> it, adding links to the thread you started and the respective fix:
>>
>> https://lore.kernel.org/linux-btrfs/b714e4de-2583-4035- 
>> b829-72cfb5eb6fc6@gmx.com/
>> https://lore.kernel.org/linux-btrfs/Z-8s1-kiIDkzgRbc@fedora/
>>
>> Anyway, it seems good, so with those small changes:
>>
>> Reviewed-by: Filipe Manana <fdmanana@suse.com>
>>
>> Thanks.
>>
>>>
>>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>>> ---
>>>   fs/btrfs/extent_io.c             | 6 ++----
>>>   fs/btrfs/tests/extent-io-tests.c | 3 +--
>>>   2 files changed, 3 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
>>> index f0d51f6ed951..986bda2eff1c 100644
>>> --- a/fs/btrfs/extent_io.c
>>> +++ b/fs/btrfs/extent_io.c
>>> @@ -207,8 +207,7 @@ static void __process_folios_contig(struct 
>>> address_space *mapping,
>>>          while (index <= end_index) {
>>>                  int found_folios;
>>>
>>> -               found_folios = filemap_get_folios_contig(mapping, 
>>> &index,
>>> -                               end_index, &fbatch);
>>> +               found_folios = filemap_get_folios(mapping, &index, 
>>> end_index, &fbatch);
>>>                  for (i = 0; i < found_folios; i++) {
>>>                          struct folio *folio = fbatch.folios[i];
>>>
>>> @@ -245,8 +244,7 @@ static noinline int lock_delalloc_folios(struct 
>>> inode *inode,
>>>          while (index <= end_index) {
>>>                  unsigned int found_folios, i;
>>>
>>> -               found_folios = filemap_get_folios_contig(mapping, 
>>> &index,
>>> -                               end_index, &fbatch);
>>> +               found_folios = filemap_get_folios(mapping, &index, 
>>> end_index, &fbatch);
>>>                  if (found_folios == 0)
>>>                          goto out;
>>>
>>> diff --git a/fs/btrfs/tests/extent-io-tests.c b/fs/btrfs/tests/ 
>>> extent-io-tests.c
>>> index 74aca7180a5a..e762eca8a99f 100644
>>> --- a/fs/btrfs/tests/extent-io-tests.c
>>> +++ b/fs/btrfs/tests/extent-io-tests.c
>>> @@ -32,8 +32,7 @@ static noinline int process_page_range(struct inode 
>>> *inode, u64 start, u64 end,
>>>          folio_batch_init(&fbatch);
>>>
>>>          while (index <= end_index) {
>>> -               ret = filemap_get_folios_contig(inode->i_mapping, 
>>> &index,
>>> -                               end_index, &fbatch);
>>> +               ret = filemap_get_folios(inode->i_mapping, &index, 
>>> end_index, &fbatch);
>>>                  for (i = 0; i < ret; i++) {
>>>                          struct folio *folio = fbatch.folios[i];
>>>
>>> -- 
>>> 2.49.0
>>>
>>>
> 
>
diff mbox series

Patch

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index f0d51f6ed951..986bda2eff1c 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -207,8 +207,7 @@  static void __process_folios_contig(struct address_space *mapping,
 	while (index <= end_index) {
 		int found_folios;
 
-		found_folios = filemap_get_folios_contig(mapping, &index,
-				end_index, &fbatch);
+		found_folios = filemap_get_folios(mapping, &index, end_index, &fbatch);
 		for (i = 0; i < found_folios; i++) {
 			struct folio *folio = fbatch.folios[i];
 
@@ -245,8 +244,7 @@  static noinline int lock_delalloc_folios(struct inode *inode,
 	while (index <= end_index) {
 		unsigned int found_folios, i;
 
-		found_folios = filemap_get_folios_contig(mapping, &index,
-				end_index, &fbatch);
+		found_folios = filemap_get_folios(mapping, &index, end_index, &fbatch);
 		if (found_folios == 0)
 			goto out;
 
diff --git a/fs/btrfs/tests/extent-io-tests.c b/fs/btrfs/tests/extent-io-tests.c
index 74aca7180a5a..e762eca8a99f 100644
--- a/fs/btrfs/tests/extent-io-tests.c
+++ b/fs/btrfs/tests/extent-io-tests.c
@@ -32,8 +32,7 @@  static noinline int process_page_range(struct inode *inode, u64 start, u64 end,
 	folio_batch_init(&fbatch);
 
 	while (index <= end_index) {
-		ret = filemap_get_folios_contig(inode->i_mapping, &index,
-				end_index, &fbatch);
+		ret = filemap_get_folios(inode->i_mapping, &index, end_index, &fbatch);
 		for (i = 0; i < ret; i++) {
 			struct folio *folio = fbatch.folios[i];