Message ID | 577429c985d01407c27141db4015c50d8ba7c746.1743731232.git.wqu@suse.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | btrfs: fsstress hang fix for large data folios | expand |
On Fri, Apr 4, 2025 at 2:48 AM Qu Wenruo <wqu@suse.com> wrote: > > With large folios, filemap_get_folios_contig() can return duplicated > folios, for example: > > 704K 768K 1M > |<-- 64K sized folio --->|<------- 256K sized folio ----->| > | | > 764K 800K > > If we call lock_delalloc_folios() with range [762K, 800K) on above > layout with locked folio at 704K, we will hit the following sequence: > > 1. filemap_get_folios_contig() returned 1 for range [764K, 768K) > As this is a folio boundary. > > The returned folio will be folio at 704K. > > Since the folio is already locked, we will not lock the folio. > > 2. filemap_get_folios_contig() returned 8 for range [768K, 800K) > All the entries are the same folio at 768K. > > 3. We lock folio 768K for the slot 0 of the fbatch > > 4. We lock folio 768K for the slot 1 of the fbatch > We deadlock, as we have already locked the same folio at step 3. > > The filemap_get_folios_contig() behavior is definitely not ideal, but on > the other hand, we do not really need the filemap_get_folios_contig() > call either. > > The current filemap_get_folios() is already good enough, and we require > no strong contiguous requirement either, we only need the returned folios > contiguous at file map level (aka, their folio file offsets are contiguous). This paragraph is confusing. This says filemap_get_folios() provides contiguous results in the sense that the file offset of each folio is greater than the previous ones (the folios in the batch are ordered by file offsets). But then what is the kind of contiguous results that filemap_get_folios_contig() provides? How different is it? Is it that the batch doesn't get "holes" - i.e. a folio's start always matches the end of the previous one +1? > > So get rid of the cursed filemap_get_folios_contig() and use regular > filemap_get_folios() instead, this will fix the above deadlock for large > folios. I think it's worth adding in this changelog that it is known that filemap_get_folios_contig() has a bug and there's a pending fix for it, adding links to the thread you started and the respective fix: https://lore.kernel.org/linux-btrfs/b714e4de-2583-4035-b829-72cfb5eb6fc6@gmx.com/ https://lore.kernel.org/linux-btrfs/Z-8s1-kiIDkzgRbc@fedora/ Anyway, it seems good, so with those small changes: Reviewed-by: Filipe Manana <fdmanana@suse.com> Thanks. > > Signed-off-by: Qu Wenruo <wqu@suse.com> > --- > fs/btrfs/extent_io.c | 6 ++---- > fs/btrfs/tests/extent-io-tests.c | 3 +-- > 2 files changed, 3 insertions(+), 6 deletions(-) > > diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c > index f0d51f6ed951..986bda2eff1c 100644 > --- a/fs/btrfs/extent_io.c > +++ b/fs/btrfs/extent_io.c > @@ -207,8 +207,7 @@ static void __process_folios_contig(struct address_space *mapping, > while (index <= end_index) { > int found_folios; > > - found_folios = filemap_get_folios_contig(mapping, &index, > - end_index, &fbatch); > + found_folios = filemap_get_folios(mapping, &index, end_index, &fbatch); > for (i = 0; i < found_folios; i++) { > struct folio *folio = fbatch.folios[i]; > > @@ -245,8 +244,7 @@ static noinline int lock_delalloc_folios(struct inode *inode, > while (index <= end_index) { > unsigned int found_folios, i; > > - found_folios = filemap_get_folios_contig(mapping, &index, > - end_index, &fbatch); > + found_folios = filemap_get_folios(mapping, &index, end_index, &fbatch); > if (found_folios == 0) > goto out; > > diff --git a/fs/btrfs/tests/extent-io-tests.c b/fs/btrfs/tests/extent-io-tests.c > index 74aca7180a5a..e762eca8a99f 100644 > --- a/fs/btrfs/tests/extent-io-tests.c > +++ b/fs/btrfs/tests/extent-io-tests.c > @@ -32,8 +32,7 @@ static noinline int process_page_range(struct inode *inode, u64 start, u64 end, > folio_batch_init(&fbatch); > > while (index <= end_index) { > - ret = filemap_get_folios_contig(inode->i_mapping, &index, > - end_index, &fbatch); > + ret = filemap_get_folios(inode->i_mapping, &index, end_index, &fbatch); > for (i = 0; i < ret; i++) { > struct folio *folio = fbatch.folios[i]; > > -- > 2.49.0 > >
在 2025/4/5 03:08, Filipe Manana 写道: > On Fri, Apr 4, 2025 at 2:48 AM Qu Wenruo <wqu@suse.com> wrote: >> >> With large folios, filemap_get_folios_contig() can return duplicated >> folios, for example: >> >> 704K 768K 1M >> |<-- 64K sized folio --->|<------- 256K sized folio ----->| >> | | >> 764K 800K >> >> If we call lock_delalloc_folios() with range [762K, 800K) on above >> layout with locked folio at 704K, we will hit the following sequence: >> >> 1. filemap_get_folios_contig() returned 1 for range [764K, 768K) >> As this is a folio boundary. >> >> The returned folio will be folio at 704K. >> >> Since the folio is already locked, we will not lock the folio. >> >> 2. filemap_get_folios_contig() returned 8 for range [768K, 800K) >> All the entries are the same folio at 768K. >> >> 3. We lock folio 768K for the slot 0 of the fbatch >> >> 4. We lock folio 768K for the slot 1 of the fbatch >> We deadlock, as we have already locked the same folio at step 3. >> >> The filemap_get_folios_contig() behavior is definitely not ideal, but on >> the other hand, we do not really need the filemap_get_folios_contig() >> call either. >> >> The current filemap_get_folios() is already good enough, and we require >> no strong contiguous requirement either, we only need the returned folios >> contiguous at file map level (aka, their folio file offsets are contiguous). > > This paragraph is confusing. > This says filemap_get_folios() provides contiguous results in the > sense that the file offset of each folio is greater than the previous > ones (the folios in the batch are ordered by file offsets). > But then what is the kind of contiguous results that > filemap_get_folios_contig() provides? How different is it? Is it that > the batch doesn't get "holes" - i.e. a folio's start always matches > the end of the previous one +1? From my understanding, filemap_get_folios_contig() returns physically contiguous pages/folios. And the hole handling is always the caller's responsibility, no matter if it's the _contig() version or not. Thus for filemap_get_folios_contig(), and the above two large folios cases, as long as the two large folios are not physically contiguous, the call returns the first folio, then the next call it returns the next folio. Not returning both in one go, and this behavior is not that useful to us either. And talking about holes, due to the _contig() behavior itself, if we hit a hole the function will definitely return, but between different calls caller should check the folio's position between calls, to make sure no holes is between two filemap_get_folios_contig() calls. Of course, no caller is really doing that, thus another reason why we do not need the filemap_get_folios_contig() call. Thanks, Qu> >> >> So get rid of the cursed filemap_get_folios_contig() and use regular >> filemap_get_folios() instead, this will fix the above deadlock for large >> folios. > > I think it's worth adding in this changelog that it is known that > filemap_get_folios_contig() has a bug and there's a pending fix for > it, adding links to the thread you started and the respective fix: > > https://lore.kernel.org/linux-btrfs/b714e4de-2583-4035-b829-72cfb5eb6fc6@gmx.com/ > https://lore.kernel.org/linux-btrfs/Z-8s1-kiIDkzgRbc@fedora/ > > Anyway, it seems good, so with those small changes: > > Reviewed-by: Filipe Manana <fdmanana@suse.com> > > Thanks. > >> >> Signed-off-by: Qu Wenruo <wqu@suse.com> >> --- >> fs/btrfs/extent_io.c | 6 ++---- >> fs/btrfs/tests/extent-io-tests.c | 3 +-- >> 2 files changed, 3 insertions(+), 6 deletions(-) >> >> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c >> index f0d51f6ed951..986bda2eff1c 100644 >> --- a/fs/btrfs/extent_io.c >> +++ b/fs/btrfs/extent_io.c >> @@ -207,8 +207,7 @@ static void __process_folios_contig(struct address_space *mapping, >> while (index <= end_index) { >> int found_folios; >> >> - found_folios = filemap_get_folios_contig(mapping, &index, >> - end_index, &fbatch); >> + found_folios = filemap_get_folios(mapping, &index, end_index, &fbatch); >> for (i = 0; i < found_folios; i++) { >> struct folio *folio = fbatch.folios[i]; >> >> @@ -245,8 +244,7 @@ static noinline int lock_delalloc_folios(struct inode *inode, >> while (index <= end_index) { >> unsigned int found_folios, i; >> >> - found_folios = filemap_get_folios_contig(mapping, &index, >> - end_index, &fbatch); >> + found_folios = filemap_get_folios(mapping, &index, end_index, &fbatch); >> if (found_folios == 0) >> goto out; >> >> diff --git a/fs/btrfs/tests/extent-io-tests.c b/fs/btrfs/tests/extent-io-tests.c >> index 74aca7180a5a..e762eca8a99f 100644 >> --- a/fs/btrfs/tests/extent-io-tests.c >> +++ b/fs/btrfs/tests/extent-io-tests.c >> @@ -32,8 +32,7 @@ static noinline int process_page_range(struct inode *inode, u64 start, u64 end, >> folio_batch_init(&fbatch); >> >> while (index <= end_index) { >> - ret = filemap_get_folios_contig(inode->i_mapping, &index, >> - end_index, &fbatch); >> + ret = filemap_get_folios(inode->i_mapping, &index, end_index, &fbatch); >> for (i = 0; i < ret; i++) { >> struct folio *folio = fbatch.folios[i]; >> >> -- >> 2.49.0 >> >>
在 2025/4/5 08:21, Qu Wenruo 写道: > > > 在 2025/4/5 03:08, Filipe Manana 写道: >> On Fri, Apr 4, 2025 at 2:48 AM Qu Wenruo <wqu@suse.com> wrote: >>> >>> With large folios, filemap_get_folios_contig() can return duplicated >>> folios, for example: >>> >>> 704K 768K 1M >>> |<-- 64K sized folio --->|<------- 256K sized folio ----->| >>> | | >>> 764K 800K >>> >>> If we call lock_delalloc_folios() with range [762K, 800K) on above >>> layout with locked folio at 704K, we will hit the following sequence: >>> >>> 1. filemap_get_folios_contig() returned 1 for range [764K, 768K) >>> As this is a folio boundary. >>> >>> The returned folio will be folio at 704K. >>> >>> Since the folio is already locked, we will not lock the folio. >>> >>> 2. filemap_get_folios_contig() returned 8 for range [768K, 800K) >>> All the entries are the same folio at 768K. >>> >>> 3. We lock folio 768K for the slot 0 of the fbatch >>> >>> 4. We lock folio 768K for the slot 1 of the fbatch >>> We deadlock, as we have already locked the same folio at step 3. >>> >>> The filemap_get_folios_contig() behavior is definitely not ideal, but on >>> the other hand, we do not really need the filemap_get_folios_contig() >>> call either. >>> >>> The current filemap_get_folios() is already good enough, and we require >>> no strong contiguous requirement either, we only need the returned >>> folios >>> contiguous at file map level (aka, their folio file offsets are >>> contiguous). >> >> This paragraph is confusing. >> This says filemap_get_folios() provides contiguous results in the >> sense that the file offset of each folio is greater than the previous >> ones (the folios in the batch are ordered by file offsets). >> But then what is the kind of contiguous results that >> filemap_get_folios_contig() provides? How different is it? Is it that >> the batch doesn't get "holes" - i.e. a folio's start always matches >> the end of the previous one +1? > > From my understanding, filemap_get_folios_contig() returns physically > contiguous pages/folios. > > And the hole handling is always the caller's responsibility, no matter > if it's the _contig() version or not. > > Thus for filemap_get_folios_contig(), and the above two large folios > cases, as long as the two large folios are not physically contiguous, > the call returns the first folio, then the next call it returns the next > folio. > > Not returning both in one go, and this behavior is not that useful to us > either. My bad, I should double check the code before commenting on this. It turns out it's not really splitting the result based on physical addresses, but purely returning to the end of a large folio. Which is another thing we do not really want in btrfs' code. So the contiguous part really seems to be the following points: - There should always be a folio at the start index Or it will return 0 immediately - No holes in the returned fbatch Although it can still return the result early, due to large folios. Anyway I'll add these to the next version of the commit message, to explain the behavior in details and why it's safe we go the regular filemap_get_folios() call. Thanks, Qu > > > And talking about holes, due to the _contig() behavior itself, if we hit > a hole the function will definitely return, but between different calls > caller should check the folio's position between calls, to make sure no > holes is between two filemap_get_folios_contig() calls. > > Of course, no caller is really doing that, thus another reason why we do > not need the filemap_get_folios_contig() call. > > Thanks, > Qu> >>> >>> So get rid of the cursed filemap_get_folios_contig() and use regular >>> filemap_get_folios() instead, this will fix the above deadlock for large >>> folios. >> >> I think it's worth adding in this changelog that it is known that >> filemap_get_folios_contig() has a bug and there's a pending fix for >> it, adding links to the thread you started and the respective fix: >> >> https://lore.kernel.org/linux-btrfs/b714e4de-2583-4035- >> b829-72cfb5eb6fc6@gmx.com/ >> https://lore.kernel.org/linux-btrfs/Z-8s1-kiIDkzgRbc@fedora/ >> >> Anyway, it seems good, so with those small changes: >> >> Reviewed-by: Filipe Manana <fdmanana@suse.com> >> >> Thanks. >> >>> >>> Signed-off-by: Qu Wenruo <wqu@suse.com> >>> --- >>> fs/btrfs/extent_io.c | 6 ++---- >>> fs/btrfs/tests/extent-io-tests.c | 3 +-- >>> 2 files changed, 3 insertions(+), 6 deletions(-) >>> >>> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c >>> index f0d51f6ed951..986bda2eff1c 100644 >>> --- a/fs/btrfs/extent_io.c >>> +++ b/fs/btrfs/extent_io.c >>> @@ -207,8 +207,7 @@ static void __process_folios_contig(struct >>> address_space *mapping, >>> while (index <= end_index) { >>> int found_folios; >>> >>> - found_folios = filemap_get_folios_contig(mapping, >>> &index, >>> - end_index, &fbatch); >>> + found_folios = filemap_get_folios(mapping, &index, >>> end_index, &fbatch); >>> for (i = 0; i < found_folios; i++) { >>> struct folio *folio = fbatch.folios[i]; >>> >>> @@ -245,8 +244,7 @@ static noinline int lock_delalloc_folios(struct >>> inode *inode, >>> while (index <= end_index) { >>> unsigned int found_folios, i; >>> >>> - found_folios = filemap_get_folios_contig(mapping, >>> &index, >>> - end_index, &fbatch); >>> + found_folios = filemap_get_folios(mapping, &index, >>> end_index, &fbatch); >>> if (found_folios == 0) >>> goto out; >>> >>> diff --git a/fs/btrfs/tests/extent-io-tests.c b/fs/btrfs/tests/ >>> extent-io-tests.c >>> index 74aca7180a5a..e762eca8a99f 100644 >>> --- a/fs/btrfs/tests/extent-io-tests.c >>> +++ b/fs/btrfs/tests/extent-io-tests.c >>> @@ -32,8 +32,7 @@ static noinline int process_page_range(struct inode >>> *inode, u64 start, u64 end, >>> folio_batch_init(&fbatch); >>> >>> while (index <= end_index) { >>> - ret = filemap_get_folios_contig(inode->i_mapping, >>> &index, >>> - end_index, &fbatch); >>> + ret = filemap_get_folios(inode->i_mapping, &index, >>> end_index, &fbatch); >>> for (i = 0; i < ret; i++) { >>> struct folio *folio = fbatch.folios[i]; >>> >>> -- >>> 2.49.0 >>> >>> > >
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index f0d51f6ed951..986bda2eff1c 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -207,8 +207,7 @@ static void __process_folios_contig(struct address_space *mapping, while (index <= end_index) { int found_folios; - found_folios = filemap_get_folios_contig(mapping, &index, - end_index, &fbatch); + found_folios = filemap_get_folios(mapping, &index, end_index, &fbatch); for (i = 0; i < found_folios; i++) { struct folio *folio = fbatch.folios[i]; @@ -245,8 +244,7 @@ static noinline int lock_delalloc_folios(struct inode *inode, while (index <= end_index) { unsigned int found_folios, i; - found_folios = filemap_get_folios_contig(mapping, &index, - end_index, &fbatch); + found_folios = filemap_get_folios(mapping, &index, end_index, &fbatch); if (found_folios == 0) goto out; diff --git a/fs/btrfs/tests/extent-io-tests.c b/fs/btrfs/tests/extent-io-tests.c index 74aca7180a5a..e762eca8a99f 100644 --- a/fs/btrfs/tests/extent-io-tests.c +++ b/fs/btrfs/tests/extent-io-tests.c @@ -32,8 +32,7 @@ static noinline int process_page_range(struct inode *inode, u64 start, u64 end, folio_batch_init(&fbatch); while (index <= end_index) { - ret = filemap_get_folios_contig(inode->i_mapping, &index, - end_index, &fbatch); + ret = filemap_get_folios(inode->i_mapping, &index, end_index, &fbatch); for (i = 0; i < ret; i++) { struct folio *folio = fbatch.folios[i];
With large folios, filemap_get_folios_contig() can return duplicated folios, for example: 704K 768K 1M |<-- 64K sized folio --->|<------- 256K sized folio ----->| | | 764K 800K If we call lock_delalloc_folios() with range [762K, 800K) on above layout with locked folio at 704K, we will hit the following sequence: 1. filemap_get_folios_contig() returned 1 for range [764K, 768K) As this is a folio boundary. The returned folio will be folio at 704K. Since the folio is already locked, we will not lock the folio. 2. filemap_get_folios_contig() returned 8 for range [768K, 800K) All the entries are the same folio at 768K. 3. We lock folio 768K for the slot 0 of the fbatch 4. We lock folio 768K for the slot 1 of the fbatch We deadlock, as we have already locked the same folio at step 3. The filemap_get_folios_contig() behavior is definitely not ideal, but on the other hand, we do not really need the filemap_get_folios_contig() call either. The current filemap_get_folios() is already good enough, and we require no strong contiguous requirement either, we only need the returned folios contiguous at file map level (aka, their folio file offsets are contiguous). So get rid of the cursed filemap_get_folios_contig() and use regular filemap_get_folios() instead, this will fix the above deadlock for large folios. Signed-off-by: Qu Wenruo <wqu@suse.com> --- fs/btrfs/extent_io.c | 6 ++---- fs/btrfs/tests/extent-io-tests.c | 3 +-- 2 files changed, 3 insertions(+), 6 deletions(-)