Message ID | cover.1637781110.git.josef@toxicpanda.com (mailing list archive) |
---|---|
Headers | show |
Series | Metadata IO error fixes | expand |
On 24.11.21 г. 21:14, Josef Bacik wrote: > v1->v2: > - I was debugging generic/484 separately because I thought it was data related, > but it turned out to be metadata related as well, so I've added the patch > "btrfs: call mapping_set_error() on btree inode with a write error" to the > series. > > --- Original email --- > > Hello, > > I saw a dmesg failure with generic/281 on our overnight runs. This turned out > to be because we weren't getting an error back from btrfs_search_slot() even > though we found a metadata block that shouldn't have been uptodate. > > The root cause is that write errors on the page clear uptodate on the page, but > not on the extent buffer itself. Since we rely on that bit to tell wether the > extent buffer is valid or not we don't notice that the eb is bogus when we find > it in cache in a subsequent write, and eventually trip over > assert_eb_page_uptodate() warnings. > > This fixes the problem I was seeing, I could easily reproduce by running > generic/281 in a loop a few times. With these pages I haven't reproduced in 20 > loops. Thanks, > > Josef > > Josef Bacik (3): > btrfs: clear extent buffer uptodate when we fail to write it > btrfs: check the root node for uptodate before returning it > btrfs: call mapping_set_error() on btree inode with a write error > > fs/btrfs/ctree.c | 19 +++++++++++++++---- > fs/btrfs/extent_io.c | 14 ++++++++++++++ > 2 files changed, 29 insertions(+), 4 deletions(-) > This is stable material as well, right?
On Thu, Nov 25, 2021 at 11:12:53AM +0200, Nikolay Borisov wrote: > > > On 24.11.21 г. 21:14, Josef Bacik wrote: > > v1->v2: > > - I was debugging generic/484 separately because I thought it was data related, > > but it turned out to be metadata related as well, so I've added the patch > > "btrfs: call mapping_set_error() on btree inode with a write error" to the > > series. > > > > --- Original email --- > > > > Hello, > > > > I saw a dmesg failure with generic/281 on our overnight runs. This turned out > > to be because we weren't getting an error back from btrfs_search_slot() even > > though we found a metadata block that shouldn't have been uptodate. > > > > The root cause is that write errors on the page clear uptodate on the page, but > > not on the extent buffer itself. Since we rely on that bit to tell wether the > > extent buffer is valid or not we don't notice that the eb is bogus when we find > > it in cache in a subsequent write, and eventually trip over > > assert_eb_page_uptodate() warnings. > > > > This fixes the problem I was seeing, I could easily reproduce by running > > generic/281 in a loop a few times. With these pages I haven't reproduced in 20 > > loops. Thanks, > > > > Josef > > > > Josef Bacik (3): > > btrfs: clear extent buffer uptodate when we fail to write it > > btrfs: check the root node for uptodate before returning it > > btrfs: call mapping_set_error() on btree inode with a write error > > > > fs/btrfs/ctree.c | 19 +++++++++++++++---- > > fs/btrfs/extent_io.c | 14 ++++++++++++++ > > 2 files changed, 29 insertions(+), 4 deletions(-) > > > > > This is stable material as well, right? Yes, tags added.
On Wed, Nov 24, 2021 at 02:14:22PM -0500, Josef Bacik wrote: > v1->v2: > - I was debugging generic/484 separately because I thought it was data related, > but it turned out to be metadata related as well, so I've added the patch > "btrfs: call mapping_set_error() on btree inode with a write error" to the > series. > > --- Original email --- > > Hello, > > I saw a dmesg failure with generic/281 on our overnight runs. This turned out > to be because we weren't getting an error back from btrfs_search_slot() even > though we found a metadata block that shouldn't have been uptodate. > > The root cause is that write errors on the page clear uptodate on the page, but > not on the extent buffer itself. Since we rely on that bit to tell wether the > extent buffer is valid or not we don't notice that the eb is bogus when we find > it in cache in a subsequent write, and eventually trip over > assert_eb_page_uptodate() warnings. > > This fixes the problem I was seeing, I could easily reproduce by running > generic/281 in a loop a few times. With these pages I haven't reproduced in 20 > loops. Thanks, > > Josef > > Josef Bacik (3): > btrfs: clear extent buffer uptodate when we fail to write it > btrfs: check the root node for uptodate before returning it > btrfs: call mapping_set_error() on btree inode with a write error Added to misc-next, thanks.