Message ID | 20220601104547.260949-8-amir73il@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | xfs stable candidate patches for 5.10.y (part 2) | expand |
On Wed, Jun 01, 2022 at 01:45:46PM +0300, Amir Goldstein wrote: > From: Brian Foster <bfoster@redhat.com> > > commit 1cd738b13ae9b29e03d6149f0246c61f76e81fcf upstream. > > The assert in xfs_btree_del_cursor() checks that the bmapbt block > allocation field has been handled correctly before the cursor is > freed. This field is used for accurate calculation of indirect block > reservation requirements (for delayed allocations), for example. > generic/019 reproduces a scenario where this assert fails because > the filesystem has shutdown while in the middle of a bmbt record > insertion. This occurs after a bmbt block has been allocated via the > cursor but before the higher level bmap function (i.e. > xfs_bmap_add_extent_hole_real()) completes and resets the field. > > Update the assert to accommodate the transient state if the > filesystem has shutdown. While here, clean up the indentation and > comments in the function. > > Signed-off-by: Brian Foster <bfoster@redhat.com> > Reviewed-by: Darrick J. Wong <djwong@kernel.org> > Signed-off-by: Darrick J. Wong <djwong@kernel.org> > Signed-off-by: Amir Goldstein <amir73il@gmail.com> > --- > fs/xfs/libxfs/xfs_btree.c | 33 ++++++++++++--------------------- > 1 file changed, 12 insertions(+), 21 deletions(-) https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git/commit/?h=for-next&id=56486f307100e8fc66efa2ebd8a71941fa10bf6f
On Thu, Jun 2, 2022 at 3:38 AM Dave Chinner <david@fromorbit.com> wrote: > > On Wed, Jun 01, 2022 at 01:45:46PM +0300, Amir Goldstein wrote: > > From: Brian Foster <bfoster@redhat.com> > > > > commit 1cd738b13ae9b29e03d6149f0246c61f76e81fcf upstream. > > > > The assert in xfs_btree_del_cursor() checks that the bmapbt block > > allocation field has been handled correctly before the cursor is > > freed. This field is used for accurate calculation of indirect block > > reservation requirements (for delayed allocations), for example. > > generic/019 reproduces a scenario where this assert fails because > > the filesystem has shutdown while in the middle of a bmbt record > > insertion. This occurs after a bmbt block has been allocated via the > > cursor but before the higher level bmap function (i.e. > > xfs_bmap_add_extent_hole_real()) completes and resets the field. > > > > Update the assert to accommodate the transient state if the > > filesystem has shutdown. While here, clean up the indentation and > > comments in the function. > > > > Signed-off-by: Brian Foster <bfoster@redhat.com> > > Reviewed-by: Darrick J. Wong <djwong@kernel.org> > > Signed-off-by: Darrick J. Wong <djwong@kernel.org> > > Signed-off-by: Amir Goldstein <amir73il@gmail.com> > > --- > > fs/xfs/libxfs/xfs_btree.c | 33 ++++++++++++--------------------- > > 1 file changed, 12 insertions(+), 21 deletions(-) > > https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git/commit/?h=for-next&id=56486f307100e8fc66efa2ebd8a71941fa10bf6f > Warm from the over :) I will need more time to verify that this new fix is not breaking LTS but I don't think that it should be blocking taking the old 5.12 fix now. Right? Thanks, Amir.
On Thu, Jun 02, 2022 at 07:24:26AM +0300, Amir Goldstein wrote: > On Thu, Jun 2, 2022 at 3:38 AM Dave Chinner <david@fromorbit.com> wrote: > > > > On Wed, Jun 01, 2022 at 01:45:46PM +0300, Amir Goldstein wrote: > > > From: Brian Foster <bfoster@redhat.com> > > > > > > commit 1cd738b13ae9b29e03d6149f0246c61f76e81fcf upstream. > > > > > > The assert in xfs_btree_del_cursor() checks that the bmapbt block > > > allocation field has been handled correctly before the cursor is > > > freed. This field is used for accurate calculation of indirect block > > > reservation requirements (for delayed allocations), for example. > > > generic/019 reproduces a scenario where this assert fails because > > > the filesystem has shutdown while in the middle of a bmbt record > > > insertion. This occurs after a bmbt block has been allocated via the > > > cursor but before the higher level bmap function (i.e. > > > xfs_bmap_add_extent_hole_real()) completes and resets the field. > > > > > > Update the assert to accommodate the transient state if the > > > filesystem has shutdown. While here, clean up the indentation and > > > comments in the function. > > > > > > Signed-off-by: Brian Foster <bfoster@redhat.com> > > > Reviewed-by: Darrick J. Wong <djwong@kernel.org> > > > Signed-off-by: Darrick J. Wong <djwong@kernel.org> > > > Signed-off-by: Amir Goldstein <amir73il@gmail.com> > > > --- > > > fs/xfs/libxfs/xfs_btree.c | 33 ++++++++++++--------------------- > > > 1 file changed, 12 insertions(+), 21 deletions(-) > > > > https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git/commit/?h=for-next&id=56486f307100e8fc66efa2ebd8a71941fa10bf6f > > > > Warm from the over :) > > I will need more time to verify that this new fix is not breaking LTS > but I don't think that it should be blocking taking the old 5.12 fix now. > Right? Rule #1: don't introduce new bugs into stable kernels. This commit has a known (and fixed) bug in it. If you are going to back port it to a stable kernel, then you need to also pull in the fix for that commit, too. But the bigger question is this: why propose backports of commits that only change debug code? ASSERT()s are not compiled into production kernels - they are only compiled into developer builds when CONFIG_XFS_DEBUG=y is set. It is test code, not production code, hence nobody will be using this in production kernels. I don't see the value in backporting debug fixes unless there is some other dependency that requires them. But if you are going to back port them, Rule #1 applies. Cheers, Dave.
On Thu, Jun 2, 2022 at 8:15 AM Dave Chinner <david@fromorbit.com> wrote: > > On Thu, Jun 02, 2022 at 07:24:26AM +0300, Amir Goldstein wrote: > > On Thu, Jun 2, 2022 at 3:38 AM Dave Chinner <david@fromorbit.com> wrote: > > > > > > On Wed, Jun 01, 2022 at 01:45:46PM +0300, Amir Goldstein wrote: > > > > From: Brian Foster <bfoster@redhat.com> > > > > > > > > commit 1cd738b13ae9b29e03d6149f0246c61f76e81fcf upstream. > > > > > > > > The assert in xfs_btree_del_cursor() checks that the bmapbt block > > > > allocation field has been handled correctly before the cursor is > > > > freed. This field is used for accurate calculation of indirect block > > > > reservation requirements (for delayed allocations), for example. > > > > generic/019 reproduces a scenario where this assert fails because > > > > the filesystem has shutdown while in the middle of a bmbt record > > > > insertion. This occurs after a bmbt block has been allocated via the > > > > cursor but before the higher level bmap function (i.e. > > > > xfs_bmap_add_extent_hole_real()) completes and resets the field. > > > > > > > > Update the assert to accommodate the transient state if the > > > > filesystem has shutdown. While here, clean up the indentation and > > > > comments in the function. > > > > > > > > Signed-off-by: Brian Foster <bfoster@redhat.com> > > > > Reviewed-by: Darrick J. Wong <djwong@kernel.org> > > > > Signed-off-by: Darrick J. Wong <djwong@kernel.org> > > > > Signed-off-by: Amir Goldstein <amir73il@gmail.com> > > > > --- > > > > fs/xfs/libxfs/xfs_btree.c | 33 ++++++++++++--------------------- > > > > 1 file changed, 12 insertions(+), 21 deletions(-) > > > > > > https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git/commit/?h=for-next&id=56486f307100e8fc66efa2ebd8a71941fa10bf6f > > > > > > > Warm from the over :) > > > > I will need more time to verify that this new fix is not breaking LTS > > but I don't think that it should be blocking taking the old 5.12 fix now. > > Right? > > Rule #1: don't introduce new bugs into stable kernels. > > This commit has a known (and fixed) bug in it. If you are going to > back port it to a stable kernel, then you need to also pull in the > fix for that commit, too. Oh. I misunderstood. I thought this wasn't a Fixes: situation. I thought you pointed me to another related bug fix. > > But the bigger question is this: why propose backports of commits > that only change debug code? > > ASSERT()s are not compiled into production kernels - they are only > compiled into developer builds when CONFIG_XFS_DEBUG=y is set. It is > test code, not production code, hence nobody will be using this in > production kernels. > > I don't see the value in backporting debug fixes unless there > is some other dependency that requires them. The value is in testing of LTS kernel. For my backport work to be serious, I need to do serious testing. Serious means running as many tests as I can and running the tests on many configs and many times over. When I first joined Luis in testing LTS baseline, CONFIG_XFS_DEBUG was not enabled on the tested kernels. I enabled it so I could get better test coverage for fstests that use error injection and tests that check for asserts. This helped me find a regression with one of the backported patches [1]. IOW, for LTS code to be in good quality, it needs to also have the correct assertions. For the same reason, I am also going to queue the following as stable candidate: 756b1c343333 xfs: use current->journal_info for detecting transaction recursion Because it has already proved to be helpful in detecting bugs on our internal product tests. > But if you are going to back port them, Rule #1 applies. > Of course. I will defer sending this patch to stable and test it along with the new fix. Thanks! Amir. [1] https://lore.kernel.org/linux-xfs/YpY6hUknor2S1iMd@bfoster/T/#mf1add66b8309a75a8984f28ea08718f22033bce7
On Thu, Jun 2, 2022 at 8:55 AM Amir Goldstein <amir73il@gmail.com> wrote: > > On Thu, Jun 2, 2022 at 8:15 AM Dave Chinner <david@fromorbit.com> wrote: > > > > On Thu, Jun 02, 2022 at 07:24:26AM +0300, Amir Goldstein wrote: > > > On Thu, Jun 2, 2022 at 3:38 AM Dave Chinner <david@fromorbit.com> wrote: > > > > > > > > On Wed, Jun 01, 2022 at 01:45:46PM +0300, Amir Goldstein wrote: > > > > > From: Brian Foster <bfoster@redhat.com> > > > > > > > > > > commit 1cd738b13ae9b29e03d6149f0246c61f76e81fcf upstream. > > > > > > > > > > The assert in xfs_btree_del_cursor() checks that the bmapbt block > > > > > allocation field has been handled correctly before the cursor is > > > > > freed. This field is used for accurate calculation of indirect block > > > > > reservation requirements (for delayed allocations), for example. > > > > > generic/019 reproduces a scenario where this assert fails because > > > > > the filesystem has shutdown while in the middle of a bmbt record > > > > > insertion. This occurs after a bmbt block has been allocated via the > > > > > cursor but before the higher level bmap function (i.e. > > > > > xfs_bmap_add_extent_hole_real()) completes and resets the field. > > > > > > > > > > Update the assert to accommodate the transient state if the > > > > > filesystem has shutdown. While here, clean up the indentation and > > > > > comments in the function. > > > > > > > > > > Signed-off-by: Brian Foster <bfoster@redhat.com> > > > > > Reviewed-by: Darrick J. Wong <djwong@kernel.org> > > > > > Signed-off-by: Darrick J. Wong <djwong@kernel.org> > > > > > Signed-off-by: Amir Goldstein <amir73il@gmail.com> > > > > > --- > > > > > fs/xfs/libxfs/xfs_btree.c | 33 ++++++++++++--------------------- > > > > > 1 file changed, 12 insertions(+), 21 deletions(-) > > > > > > > > https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git/commit/?h=for-next&id=56486f307100e8fc66efa2ebd8a71941fa10bf6f > > > > > > > > > > Warm from the over :) > > > > > > I will need more time to verify that this new fix is not breaking LTS > > > but I don't think that it should be blocking taking the old 5.12 fix now. > > > Right? > > > > Rule #1: don't introduce new bugs into stable kernels. > > > > This commit has a known (and fixed) bug in it. If you are going to > > back port it to a stable kernel, then you need to also pull in the > > fix for that commit, too. > > Oh. I misunderstood. > I thought this wasn't a Fixes: situation. > I thought you pointed me to another related bug fix. > Just to make sure we are all on the same page. I have applied both patches to my test tree: 1. 1cd738b13ae9 xfs: consider shutdown in bmapbt cursor delete assert 2. 56486f307100 xfs: assert in xfs_btree_del_cursor should take into account error Patch #2 looks pretty safe and it only affects builds with XFS_WARN/DEBUG, so I am not too concerned about a soaking period. I plan to send it along with patch #1 to stable after a few more test runs. If my understanding is correct, the ASSERT has been there since git epoc. The too strict ASSERT was relaxed two times by patch #1 and then by patch #2. Maybe I am missing something, but I do not see how applying patch #1 introduces a bug, but anyway, I am going to send both patches together. Thanks, Amir.
diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c index 2d25bab68764..9f9f9feccbcd 100644 --- a/fs/xfs/libxfs/xfs_btree.c +++ b/fs/xfs/libxfs/xfs_btree.c @@ -353,20 +353,17 @@ xfs_btree_free_block( */ void xfs_btree_del_cursor( - xfs_btree_cur_t *cur, /* btree cursor */ - int error) /* del because of error */ + struct xfs_btree_cur *cur, /* btree cursor */ + int error) /* del because of error */ { - int i; /* btree level */ + int i; /* btree level */ /* - * Clear the buffer pointers, and release the buffers. - * If we're doing this in the face of an error, we - * need to make sure to inspect all of the entries - * in the bc_bufs array for buffers to be unlocked. - * This is because some of the btree code works from - * level n down to 0, and if we get an error along - * the way we won't have initialized all the entries - * down to 0. + * Clear the buffer pointers and release the buffers. If we're doing + * this because of an error, inspect all of the entries in the bc_bufs + * array for buffers to be unlocked. This is because some of the btree + * code works from level n down to 0, and if we get an error along the + * way we won't have initialized all the entries down to 0. */ for (i = 0; i < cur->bc_nlevels; i++) { if (cur->bc_bufs[i]) @@ -374,17 +371,11 @@ xfs_btree_del_cursor( else if (!error) break; } - /* - * Can't free a bmap cursor without having dealt with the - * allocated indirect blocks' accounting. - */ - ASSERT(cur->bc_btnum != XFS_BTNUM_BMAP || - cur->bc_ino.allocated == 0); - /* - * Free the cursor. - */ + + ASSERT(cur->bc_btnum != XFS_BTNUM_BMAP || cur->bc_ino.allocated == 0 || + XFS_FORCED_SHUTDOWN(cur->bc_mp)); if (unlikely(cur->bc_flags & XFS_BTREE_STAGING)) - kmem_free((void *)cur->bc_ops); + kmem_free(cur->bc_ops); kmem_cache_free(xfs_btree_cur_zone, cur); }