Message ID | dd32747467e46ee7ce4feb8a1c3a30d93fd4b133.1702593423.git.josef@toxicpanda.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [RESEND] btrfs: don't double put our subpage reference in alloc_extent_buffer | expand |
On 2023/12/15 09:09, Josef Bacik wrote: > ** Apologies if you're getting this twice, I fat fingered my email command ** > > This fix can be folded into "btrfs: refactor alloc_extent_buffer() to > allocate-then-attach method". > > We have been seeing panics in the CI for the subpage stuff recently, it > happens on btrfs/187 but could potentially happen anywhere. > > In the subpage case, if we race with somebody else inserting the same > extent buffer, the error case will end up calling > detach_extent_buffer_page() on the page twice. > > This is done first in the bit > > for (int i = 0; i < attached; i++) > detach_extent_buffer_page(eb, eb->pages[i]; > > and then again in btrfs_release_extent_buffer(). > > This works fine for !subpage because we're the only person who ever has > ourselves on the private, and so when we do the initial > detach_extent_buffer_page() we know we've completely removed it. > > However for subpage we could be using this page private elsewhere, so > this results in a double put on the subpage, which can result in an > early free'ing. > > The fix here is to clear eb->pages[i] for everything we detach. Then > anything still attached to the eb is freed in > btrfs_release_extent_buffer(). > > Because of this change we must update > btrfs_release_extent_buffer_pages() to not use num_extent_folios, > because it assumes eb->folio[0] is set properly. Since this is only > interested in free'ing any pages we have on the extent buffer we can > simply use INLINE_EXTENT_BUFFER_PAGES. > > Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Thanks for exposing and fix the bug, Qu > --- > fs/btrfs/extent_io.c | 22 +++++++++++++++++++--- > 1 file changed, 19 insertions(+), 3 deletions(-) > > diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c > index 7898e17e8d84..b42603098b6b 100644 > --- a/fs/btrfs/extent_io.c > +++ b/fs/btrfs/extent_io.c > @@ -3185,11 +3185,9 @@ static void detach_extent_buffer_folio(struct extent_buffer *eb, struct folio *f > /* Release all pages attached to the extent buffer */ > static void btrfs_release_extent_buffer_pages(struct extent_buffer *eb) > { > - int num_folios = num_extent_folios(eb); > - > ASSERT(!extent_buffer_under_io(eb)); > > - for (int i = 0; i < num_folios; i++) { > + for (int i = 0; i < INLINE_EXTENT_BUFFER_PAGES; i++) { > struct folio *folio = eb->folios[i]; > > if (!folio) > @@ -3754,10 +3752,28 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info, > > out: > WARN_ON(!atomic_dec_and_test(&eb->refs)); > + > + /* > + * Any attached folios need to be detached before we unlock them. This > + * is because when we're inserting our new folios into the mapping, and > + * then attaching our eb to that folio. If we fail to insert our folio > + * we'll lookup the folio for that index, and grab that EB. We do not > + * want that to grab this eb, as we're getting ready to free it. So we > + * have to detach it first and then unlock it. > + * > + * We have to drop our reference and NULL it out here because in the > + * subpage case detaching does a btrfs_folio_dec_eb_refs() for our eb. > + * Below when we call btrfs_release_extent_buffer() we will call > + * detach_extent_buffer_folio() on our remaining pages in the !subpage > + * case. If we left eb->folios[i] populated in the subpage case we'd > + * double put our reference and be super sad. > + */ > for (int i = 0; i < attached; i++) { > ASSERT(eb->folios[i]); > detach_extent_buffer_folio(eb, eb->folios[i]); > unlock_page(folio_page(eb->folios[i], 0)); > + folio_put(eb->folios[i]); > + eb->folios[i] = NULL; > } > /* > * Now all pages of that extent buffer is unmapped, set UNMAPPED flag,
On Thu, Dec 14, 2023 at 05:39:38PM -0500, Josef Bacik wrote: > ** Apologies if you're getting this twice, I fat fingered my email command ** > > This fix can be folded into "btrfs: refactor alloc_extent_buffer() to > allocate-then-attach method". > > We have been seeing panics in the CI for the subpage stuff recently, it > happens on btrfs/187 but could potentially happen anywhere. > > In the subpage case, if we race with somebody else inserting the same > extent buffer, the error case will end up calling > detach_extent_buffer_page() on the page twice. > > This is done first in the bit > > for (int i = 0; i < attached; i++) > detach_extent_buffer_page(eb, eb->pages[i]; > > and then again in btrfs_release_extent_buffer(). > > This works fine for !subpage because we're the only person who ever has > ourselves on the private, and so when we do the initial > detach_extent_buffer_page() we know we've completely removed it. > > However for subpage we could be using this page private elsewhere, so > this results in a double put on the subpage, which can result in an > early free'ing. > > The fix here is to clear eb->pages[i] for everything we detach. Then > anything still attached to the eb is freed in > btrfs_release_extent_buffer(). > > Because of this change we must update > btrfs_release_extent_buffer_pages() to not use num_extent_folios, > because it assumes eb->folio[0] is set properly. Since this is only > interested in free'ing any pages we have on the extent buffer we can > simply use INLINE_EXTENT_BUFFER_PAGES. > > Signed-off-by: Josef Bacik <josef@toxicpanda.com> The patch where this applies to best is "btrfs: cleanup metadata page pointer usage" but left it as a standalone fix so we get the explanation. The other folio conversion patches are grouped together so if this ends up in the middle of a bisection at least it should be easy to find the fix. Also it's only for subpage.
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 7898e17e8d84..b42603098b6b 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -3185,11 +3185,9 @@ static void detach_extent_buffer_folio(struct extent_buffer *eb, struct folio *f /* Release all pages attached to the extent buffer */ static void btrfs_release_extent_buffer_pages(struct extent_buffer *eb) { - int num_folios = num_extent_folios(eb); - ASSERT(!extent_buffer_under_io(eb)); - for (int i = 0; i < num_folios; i++) { + for (int i = 0; i < INLINE_EXTENT_BUFFER_PAGES; i++) { struct folio *folio = eb->folios[i]; if (!folio) @@ -3754,10 +3752,28 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info, out: WARN_ON(!atomic_dec_and_test(&eb->refs)); + + /* + * Any attached folios need to be detached before we unlock them. This + * is because when we're inserting our new folios into the mapping, and + * then attaching our eb to that folio. If we fail to insert our folio + * we'll lookup the folio for that index, and grab that EB. We do not + * want that to grab this eb, as we're getting ready to free it. So we + * have to detach it first and then unlock it. + * + * We have to drop our reference and NULL it out here because in the + * subpage case detaching does a btrfs_folio_dec_eb_refs() for our eb. + * Below when we call btrfs_release_extent_buffer() we will call + * detach_extent_buffer_folio() on our remaining pages in the !subpage + * case. If we left eb->folios[i] populated in the subpage case we'd + * double put our reference and be super sad. + */ for (int i = 0; i < attached; i++) { ASSERT(eb->folios[i]); detach_extent_buffer_folio(eb, eb->folios[i]); unlock_page(folio_page(eb->folios[i], 0)); + folio_put(eb->folios[i]); + eb->folios[i] = NULL; } /* * Now all pages of that extent buffer is unmapped, set UNMAPPED flag,
** Apologies if you're getting this twice, I fat fingered my email command ** This fix can be folded into "btrfs: refactor alloc_extent_buffer() to allocate-then-attach method". We have been seeing panics in the CI for the subpage stuff recently, it happens on btrfs/187 but could potentially happen anywhere. In the subpage case, if we race with somebody else inserting the same extent buffer, the error case will end up calling detach_extent_buffer_page() on the page twice. This is done first in the bit for (int i = 0; i < attached; i++) detach_extent_buffer_page(eb, eb->pages[i]; and then again in btrfs_release_extent_buffer(). This works fine for !subpage because we're the only person who ever has ourselves on the private, and so when we do the initial detach_extent_buffer_page() we know we've completely removed it. However for subpage we could be using this page private elsewhere, so this results in a double put on the subpage, which can result in an early free'ing. The fix here is to clear eb->pages[i] for everything we detach. Then anything still attached to the eb is freed in btrfs_release_extent_buffer(). Because of this change we must update btrfs_release_extent_buffer_pages() to not use num_extent_folios, because it assumes eb->folio[0] is set properly. Since this is only interested in free'ing any pages we have on the extent buffer we can simply use INLINE_EXTENT_BUFFER_PAGES. Signed-off-by: Josef Bacik <josef@toxicpanda.com> --- fs/btrfs/extent_io.c | 22 +++++++++++++++++++--- 1 file changed, 19 insertions(+), 3 deletions(-)