[PATH,5.10,0/4] xfs stable candidate patches for 5.10.y (part 1)

Message ID 20220525111715.2769700-1-amir73il@gmail.com (mailing list archive)

Message

Amir Goldstein May 25, 2022, 11:17 a.m. UTC
Hi all!

During LSFMM 2022, I had the opportunity to speak with developers
from several different companies that showed interest in collaborating
on the effort of improving the state of xfs code in LTS kernels.

I would like to kick off this effort for the 5.10 LTS kernel, in the
hope that others will join me in the future to produce a better common
baseline for everyone to build on.

This is the first of 6 series of stable patch candidates that
I collected from xfs releases v5.11..v5.18 [1].

My intention is to post the parts for review on the xfs list on
a ~weekly basis and forward them to stable only after xfs developers
have had the chance to review the selection.

I used "b4 rn", a gadget that I developed, which produces high-level
"release notes" with references to the posted patch series and also
looks for mentions of fstests test names in the discussions on lore.
I then used an elimination process to select the stable tree candidate
patches. The selection process is documented in the git log of [1].
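
For illustration, here is a minimal Python sketch of such a scan
(hypothetical; this is not the actual "b4 rn" implementation, and the
regex patterns, rev range, and path are my assumptions):

# Hypothetical candidate scan -- not the actual "b4 rn" code.
import re
import subprocess

FIXES_RE = re.compile(r'^Fixes: [0-9a-f]{8,}', re.MULTILINE)
# fstests test names look like generic/475, xfs/538, shared/010, ...
FSTEST_RE = re.compile(r'\b(?:generic|xfs|shared)/\d{3}\b')

def candidate_commits(rev_range='v5.10..v5.11'):
    """Yield (sha, subject) for commits whose log message carries a
    Fixes: tag or mentions an fstests test name."""
    log = subprocess.run(
        ['git', 'log', '--format=%H%x00%s%x00%b%x1e',
         rev_range, '--', 'fs/xfs'],
        capture_output=True, text=True, check=True).stdout
    for record in log.split('\x1e'):
        if not record.strip():
            continue
        sha, subject, body = record.lstrip('\n').split('\x00', 2)
        if FIXES_RE.search(body) or FSTEST_RE.search(subject + body):
            yield sha, subject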

After I had candidates, Luis helped me to set up a kdevops testing
environment on a server that Samsung has contributed to the effort.
Luis and I have spent a considerable amount of time establishing the
expunge lists that produce stable baseline results for v5.10.y [2].
Eventually, we ran the auto group test over 100 times to sanitize the
baseline, on the following configurations:
reflink_normapbt (default), reflink, reflink_1024, nocrc, nocrc_512.
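
For readers unfamiliar with the format: each of those configurations
gets its own expunge file, a plain list of tests to skip, one per line,
with an optional comment. An illustrative excerpt (invented here, not
copied from [2]) might read:

generic/475 # flaky: hangs roughly 1 in 10 runs on this config
xfs/168 # needs newer xfsprogs than the guest images carry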

The patches in this part are from circa v5.11 release.
They have been through 36 auto group runs with the configs listed above
and no regressions from baseline were observed.

At least two of the fixes have regression tests in fstests that were used
to verify the fix. I also annotated [3] the fix commits in the tests.

I would like to thank Luis for his huge part in this still ongoing effort
and I would like to thank Samsung for contributing the hardware resources
to drive this effort.

Your inputs on the selection in this part and in upcoming parts [1]
are most welcome!

Thanks,
Amir.

[1] https://github.com/amir73il/b4/blob/xfs-5.10.y/xfs-5.10..5.17-fixes.rst
[2] https://github.com/linux-kdevops/kdevops/tree/master/workflows/fstests/expunges/5.10.105/xfs/unassigned
[3] https://lore.kernel.org/fstests/20220520143249.2103631-1-amir73il@gmail.com/

Darrick J. Wong (3):
  xfs: detect overflows in bmbt records
  xfs: fix the forward progress assertion in xfs_iwalk_run_callbacks
  xfs: fix an ABBA deadlock in xfs_rename

Kaixu Xia (1):
  xfs: show the proper user quota options

 fs/xfs/libxfs/xfs_bmap.c    |  5 +++++
 fs/xfs/libxfs/xfs_dir2.h    |  2 --
 fs/xfs/libxfs/xfs_dir2_sf.c |  2 +-
 fs/xfs/xfs_inode.c          | 42 ++++++++++++++++++++++---------------
 fs/xfs/xfs_iwalk.c          |  2 +-
 fs/xfs/xfs_super.c          | 10 +++++----
 6 files changed, 38 insertions(+), 25 deletions(-)

Comments

Darrick J. Wong May 26, 2022, 5:27 p.m. UTC | #1
On Wed, May 25, 2022 at 02:17:11PM +0300, Amir Goldstein wrote:
> Hi all!
> 
> During LSFMM 2022, I had the opportunity to speak with developers
> from several different companies that showed interest in collaborating
> on the effort of improving the state of xfs code in LTS kernels.
> 
> I would like to kick off this effort for the 5.10 LTS kernel, in the
> hope that others will join me in the future to produce a better common
> baseline for everyone to build on.
> 
> This is the first of 6 series of stable patch candidates that
> I collected from xfs releases v5.11..v5.18 [1].
> 
> My intention is to post the parts for review on the xfs list on
> a ~weekly basis and forward them to stable only after xfs developers
> have had the chance to review the selection.
> 
> I used "b4 rn", a gadget that I developed, which produces high-level
> "release notes" with references to the posted patch series and also
> looks for mentions of fstests test names in the discussions on lore.
> I then used an elimination process to select the stable tree candidate
> patches. The selection process is documented in the git log of [1].
> 
> After I had candidates, Luis helped me to set up a kdevops testing
> environment on a server that Samsung has contributed to the effort.
> Luis and I have spent a considerable amount of time establishing the
> expunge lists that produce stable baseline results for v5.10.y [2].
> Eventually, we ran the auto group test over 100 times to sanitize the
> baseline, on the following configurations:
> reflink_normapbt (default), reflink, reflink_1024, nocrc, nocrc_512.
> 
> The patches in this part are from circa v5.11 release.
> They have been through 36 auto group runs with the configs listed above
> and no regressions from baseline were observed.

Woot!

> At least two of the fixes have regression tests in fstests that were used
> to verify the fix. I also annotated [3] the fix commits in the tests.
> 
> I would like to thank Luis for his huge part in this still ongoing effort
> and I would like to thank Samsung for contributing the hardware resources
> to drive this effort.
> 
> Your inputs on the selection in this part and in upcoming parts [1]
> are most welcome!

/me wonders if you need commit 9a5280b312e2 xfs: reorder iunlink remove
operation in xfs_ifree ?  Or did that one already get pulled in?

The changes proposed look reasonable to me, and moreso with the testing
to prove it.  Minor nit: patchsets should be tagged "PATCH", not "PATH".

Thanks to you and Luis for doing this work! :)

> Thanks,
> Amir.
> 
> [1] https://github.com/amir73il/b4/blob/xfs-5.10.y/xfs-5.10..5.17-fixes.rst
> [2] https://github.com/linux-kdevops/kdevops/tree/master/workflows/fstests/expunges/5.10.105/xfs/unassigned

/me looks and sees a large collection of expunge lists, along with
comments about how often failures occur and/or reasons.  Neat!

Leah mentioned on the ext4 call this morning that she would have found
it helpful to know (before she started working on 5.15 backports) which
tests were of the flaky variety so that she could better prioritize the
time she had to look into fstests failures.  (IOWS: saw a test fail a
small percentage of the time and then burned a lot of machine time only
to figure out that 5.15.0 also failed a percentage of the time).

We talked about where it would be most useful for maintainers and QA
people to store their historical pass/fail data, before settling on
"somewhere public where everyone can review their colleagues' notes" and
"somewhere minimizing commit friction".  At the time, we were thinking
about having people contribute their notes directly to the fstests
source code, but I guess Luis has been doing that in the kdevops repo
for a few years now.

So, maybe there?

--D

> [3] https://lore.kernel.org/fstests/20220520143249.2103631-1-amir73il@gmail.com/
> 
> Darrick J. Wong (3):
>   xfs: detect overflows in bmbt records
>   xfs: fix the forward progress assertion in xfs_iwalk_run_callbacks
>   xfs: fix an ABBA deadlock in xfs_rename
> 
> Kaixu Xia (1):
>   xfs: show the proper user quota options
> 
>  fs/xfs/libxfs/xfs_bmap.c    |  5 +++++
>  fs/xfs/libxfs/xfs_dir2.h    |  2 --
>  fs/xfs/libxfs/xfs_dir2_sf.c |  2 +-
>  fs/xfs/xfs_inode.c          | 42 ++++++++++++++++++++++---------------
>  fs/xfs/xfs_iwalk.c          |  2 +-
>  fs/xfs/xfs_super.c          | 10 +++++----
>  6 files changed, 38 insertions(+), 25 deletions(-)
> 
> -- 
> 2.25.1
>
Luis Chamberlain May 26, 2022, 6:44 p.m. UTC | #2
On Thu, May 26, 2022 at 10:27:41AM -0700, Darrick J. Wong wrote:
> /me looks and sees a large collection of expunge lists, along with
> comments about how often failures occur and/or reasons.  Neat!
> 
> Leah mentioned on the ext4 call this morning that she would have found
> it helpful to know (before she started working on 5.15 backports) which
> tests were of the flaky variety so that she could better prioritize the
> time she had to look into fstests failures.  (IOWS: saw a test fail a
> small percentage of the time and then burned a lot of machine time only
> to figure out that 5.15.0 also failed a percentage of the time).

See my proposal to try to make this easier to parse:

https://lore.kernel.org/all/YoW0ZC+zM27Pi0Us@bombadil.infradead.org/

> We talked about where it would be most useful for maintainers and QA
> people to store their historical pass/fail data, before settling on
> "somewhere public where everyone can review their colleagues' notes" and
> "somewhere minimizing commit friction".  At the time, we were thinking
> about having people contribute their notes directly to the fstests
> source code, but I guess Luis has been doing that in the kdevops repo
> for a few years now.
> 
> So, maybe there?

For now sure, I'm happy to add others the linux-kdevops org on github
and they get immediate write access to the repo. This is working well
so far. Long term we need to decide if we want to spin off the
expunge list as a separate effort and make it a git subtree (note
it is different than a git sub module). Another example of a use case
for a git subtree, to use it as an example, is the way I forked
kconfig from Linux into a standalone git tree so to allow any project
to bump kconfig code with just one command. So different projects
don't need to fork kconfig as they do today.

The value in doing the git subtree for expunges is any runner can use
it. I did design kdevops though to run on *any* cloud, and support
local virtualization tech like libvirt and virtualbox.

The linux-kdevops git org also has other projects which both fstests
and blktests depend on, for example dbench, which I had forked and
cleaned up a while ago. It may make sense to share the keeping of oddball
efforts like these, which are no longer maintained, in this repo.

There is other tech I'm evaluating for this sort of collaborative test
effort, such as ledgers, but that is in its infancy at this point in
time. I have a sense though it may be a good outlet for collection of
test artifacts in a decentralized way and also allow *any* entity to
participate in bringing confidence to stable kernel branches or dev
branches prior to release.

  Luis
Amir Goldstein May 26, 2022, 6:47 p.m. UTC | #3
On Thu, May 26, 2022 at 8:27 PM Darrick J. Wong <djwong@kernel.org> wrote:
>
> On Wed, May 25, 2022 at 02:17:11PM +0300, Amir Goldstein wrote:
> > Hi all!
> >
> > During LSFMM 2022, I had the opportunity to speak with developers
> > from several different companies that showed interest in collaborating
> > on the effort of improving the state of xfs code in LTS kernels.
> >
> > I would like to kick off this effort for the 5.10 LTS kernel, in the
> > hope that others will join me in the future to produce a better common
> > baseline for everyone to build on.
> >
> > This is the first of 6 series of stable patch candidates that
> > I collected from xfs releases v5.11..v5.18 [1].
> >
> > My intention is to post the parts for review on the xfs list on
> > a ~weekly basis and forward them to stable only after xfs developers
> > have had the chance to review the selection.
> >
> > I used "b4 rn", a gadget that I developed, which produces high-level
> > "release notes" with references to the posted patch series and also
> > looks for mentions of fstests test names in the discussions on lore.
> > I then used an elimination process to select the stable tree candidate
> > patches. The selection process is documented in the git log of [1].
> >
> > After I had candidates, Luis helped me to set up a kdevops testing
> > environment on a server that Samsung has contributed to the effort.
> > Luis and I have spent a considerable amount of time establishing the
> > expunge lists that produce stable baseline results for v5.10.y [2].
> > Eventually, we ran the auto group test over 100 times to sanitize the
> > baseline, on the following configurations:
> > reflink_normapbt (default), reflink, reflink_1024, nocrc, nocrc_512.
> >
> > The patches in this part are from circa v5.11 release.
> > They have been through 36 auto group runs with the configs listed above
> > and no regressions from baseline were observed.
>
> Woot!
>
> > At least two of the fixes have regression tests in fstests that were used
> > to verify the fix. I also annotated [3] the fix commits in the tests.
> >
> > I would like to thank Luis for his huge part in this still ongoing effort
> > and I would like to thank Samsung for contributing the hardware resources
> > to drive this effort.
> >
> > Your inputs on the selection in this part and in upcoming parts [1]
> > are most welcome!
>
> /me wonders if you need commit 9a5280b312e2 xfs: reorder iunlink remove
> operation in xfs_ifree ?  Or did that one already get pulled in?
>

Ok. I added it to my branch.
Note that 5.10.y was not getting any xfs fixes for 2 years, so for now I am
working my way up from v5.11 and not expediting any fixes, because if
downstream users waited for 2 years, they can wait a few more weeks.

My list of candidates was extracted at the time of 5.18-rc1, and I intentionally
wanted to let the 5.18 patches soak before I consider them.
I will get to the 5.18 release when I am done posting fixes up to 5.17.

> The changes proposed look reasonable to me, and moreso with the testing
> to prove it.  Minor nit: patchsets should be tagged "PATCH", not "PATH".
>

Oy :)

I'll be sure to fix that when I post to stable.

Thanks,
Amir.
Amir Goldstein May 26, 2022, 6:59 p.m. UTC | #4
On Thu, May 26, 2022 at 9:44 PM Luis Chamberlain <mcgrof@kernel.org> wrote:
>
> On Thu, May 26, 2022 at 10:27:41AM -0700, Darrick J. Wong wrote:
> > /me looks and sees a large collection of expunge lists, along with
> > comments about how often failures occur and/or reasons.  Neat!
> >
> > Leah mentioned on the ext4 call this morning that she would have found
> > it helpful to know (before she started working on 5.15 backports) which
> > tests were of the flaky variety so that she could better prioritize the
> > time she had to look into fstests failures.  (IOWS: saw a test fail a
> > small percentage of the time and then burned a lot of machine time only
> > to figure out that 5.15.0 also failed a percentage of the time).
>
> See my proposal to try to make this easier to parse:
>
> https://lore.kernel.org/all/YoW0ZC+zM27Pi0Us@bombadil.infradead.org/
>
> > We talked about where it would be most useful for maintainers and QA
> > people to store their historical pass/fail data, before settling on
> > "somewhere public where everyone can review their colleagues' notes" and
> > "somewhere minimizing commit friction".  At the time, we were thinking
> > about having people contribute their notes directly to the fstests
> > source code, but I guess Luis has been doing that in the kdevops repo
> > for a few years now.
> >
> > So, maybe there?
>
> For now sure, I'm happy to add others the linux-kdevops org on github
> and they get immediate write access to the repo. This is working well
> so far. Long term we need to decide if we want to spin off the
> expunge list as a separate effort and make it a git subtree (note
> it is different than a git sub module). Another example of a use case
> for a git subtree, to use it as an example, is the way I forked
> kconfig from Linux into a standalone git tree so to allow any project
> to bump kconfig code with just one command. So different projects
> don't need to fork kconfig as they do today.
>
> The value in doing the git subtree for expunges is any runner can use
> it. I did design kdevops though to run on *any* cloud, and support
> local virtualization tech like libvirt and virtualbox.
>
> The linux-kdevops git org also has other projects which both fstests
> and blktests depend on, for example dbench, which I had forked and
> cleaned up a while ago. It may make sense to share the keeping of oddball
> efforts like these, which are no longer maintained, in this repo.
>
> There is other tech I'm evaluating for this sort of collaborative test
> effort, such as ledgers, but that is in its infancy at this point in
> time. I have a sense though it may be a good outlet for collection of
> test artifacts in a decentralized way and also allow *any* entity to
> participate in bringing confidence to stable kernel branches or dev
> branches prior to release.
>

There are a few problems I noticed with the current workflow.

1. It will not scale to maintain this in git as more and more testers
start using kdevops and adding more and more fs and configs and distros.
How many more developers do you want to give push access to linux-kdevops?
I don't know how test labs report to KernelCI, but we need to look at their
model and not reinvent the wheel.

2. kdevops is very focused on stabilizing the baseline fast, which is
very good, but there is no good process of getting a test out of an expunge list.
We have a very strong suspicion that some of the tests that we put in
expunge lists failed due to some setup issue in the host OS that caused
NVME IO errors in the guests. I tried to put that into comments when
I noticed that, but I am afraid there may have been other tests that are
falsely accused of failing. All developers make those mistakes in their
own expunge lists, but if we start propagating those mistakes to the world,
it becomes an issue.

For those two reasons I think that the model to aspire to should be
composed of a database where absolutely everyone can post data
points in the form of facts (i.e. the test failed after N runs on this kernel
and this hardware...) and another process, partly AI, partly human to
digest all those facts into a knowledge base that is valuable and
maintained by experts. Much easier said than done...
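
To make that concrete, the "facts" could be records of the following
shape (a strawman sketch in Python; the field names are my assumption,
not an existing kdevops or KernelCI schema):

# Strawman record for a shared test-results database -- names assumed.
from dataclasses import dataclass

@dataclass
class TestFact:
    test: str        # e.g. "generic/475"
    kernel: str      # e.g. "5.10.105"
    config: str      # e.g. "xfs/reflink_1024"
    runs: int        # how many times the test was run
    failures: int    # how many of those runs failed
    hardware: str    # free-form description of the test rig
    reporter: str    # who/what submitted the data point

    @property
    def failure_rate(self) -> float:
        return self.failures / self.runs if self.runs else 0.0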

Thanks,
Amir.
Christoph Hellwig May 27, 2022, 6:06 a.m. UTC | #5
FYI, below is the 5.10-stable backport I have been testing earlier this
week that fixes a bugzilla reported hang:

https://bugzilla.kernel.org/show_bug.cgi?id=214767

I was just going to submit it to the stable maintainers today after
being out on holiday, but if you want to add it to this queue
that is fine with me as well.

---
From 8e0464752b24f2b3919e8e92298759d116b283eb Mon Sep 17 00:00:00 2001
From: Dave Chinner <dchinner@redhat.com>
Date: Fri, 18 Jun 2021 08:21:51 -0700
Subject: xfs: Fix CIL throttle hang when CIL space used going backwards

commit 19f4e7cc819771812a7f527d7897c2deffbf7a00 upstream.

A hang with tasks stuck on the CIL hard throttle was reported and
largely diagnosed by Donald Buczek, who discovered that it was a
result of the CIL context space usage decrementing in committed
transactions once the hard throttle limit had been hit and processes
were already blocked.  This resulted in the CIL push not waking up
those waiters because the CIL context was no longer over the hard
throttle limit.

The surprising aspect of this was the CIL space usage going
backwards regularly enough to trigger this situation. Assumptions
had been made in design that the relogging process would only
increase the size of the objects in the CIL, and so that space would
only increase.

This change and commit message fixes the issue and documents the
result of an audit of the triggers that can cause the CIL space to
go backwards, how large the backwards steps tend to be, the
frequency in which they occur, and what the impact on the CIL
accounting code is.

Even though the CIL ctx->space_used can go backwards, it will only
do so if the log item is already logged to the CIL and contains a
space reservation for its entire logged state. This is tracked by
the shadow buffer state on the log item. If the item is not
previously logged in the CIL it has no shadow buffer nor log vector,
and hence the entire size of the logged item copied to the log
vector is accounted to the CIL space usage. i.e.  it will always go
up in this case.

If the item has a log vector (i.e. already in the CIL) and the size
decreases, then the existing log vector will be overwritten and the
space usage will go down. This is the only condition where the space
usage reduces, and it can only occur when an item is already tracked
in the CIL. Hence we are safe from CIL space usage underruns as a
result of log items decreasing in size when they are relogged.

Typically this reduction in CIL usage occurs from metadata blocks
being freed, such as when a btree block merge occurs or a directory
entry/xattr entry is removed and the da-tree is reduced in size.
This generally results in a reduction in size of around a single
block in the CIL, but also tends to increase the number of log
vectors because the parent and sibling nodes in the tree need to be
updated when a btree block is removed. If a multi-level merge
occurs, then we see reduction in size of 2+ blocks, but again the
log vector count goes up.

The other vector is inode fork size changes, which only log the
current size of the fork and ignore the previously logged size when
the fork is relogged. Hence if we are removing items from the inode
fork (dir/xattr removal in shortform, extent record removal in
extent form, etc) the relogged size of the inode fork can decrease.

No other log items can decrease in size either because they are a
fixed size (e.g. dquots) or they cannot be relogged (e.g. relogging
an intent actually creates a new intent log item and doesn't relog
the old item at all.) Hence the only two vectors for CIL context
size reduction are relogging inode forks and marking buffers active
in the CIL as stale.

Long story short: the majority of the code does the right thing and
handles the reduction in log item size correctly, and only the CIL
hard throttle implementation is problematic and needs fixing. This
patch makes that fix, as well as adds comments in the log item code
that result in items shrinking in size when they are relogged as a
clear reminder that this can and does happen frequently.

The throttle fix is based upon the change Donald proposed, though it
goes further to ensure that once the throttle is activated, it
captures all tasks until the CIL push issues a wakeup, regardless of
whether the CIL space used has gone back under the throttle
threshold.

This ensures that we prevent tasks reducing the CIL slightly under
the throttle threshold and then making more changes that push it
well over the throttle limit. This is achieved by checking if the
throttle wait queue is already active as a condition of throttling.
Hence once we start throttling, we continue to apply the throttle
until the CIL context push wakes everything on the wait queue.

We can use waitqueue_active() for the waitqueue manipulations and
checks as they are all done under the ctx->xc_push_lock. Hence the
waitqueue has external serialisation and we can safely peek inside
the wait queue without holding the internal waitqueue locks.

Many thanks to Donald for his diagnostic and analysis work to
isolate the cause of this hang.

Reported-and-tested-by: Donald Buczek <buczek@molgen.mpg.de>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_buf_item.c   | 37 ++++++++++++++++++-------------------
 fs/xfs/xfs_inode_item.c | 14 ++++++++++++++
 fs/xfs/xfs_log_cil.c    | 22 +++++++++++++++++-----
 3 files changed, 49 insertions(+), 24 deletions(-)

diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
index 0356f2e340a10e..8c6e26d62ef288 100644
--- a/fs/xfs/xfs_buf_item.c
+++ b/fs/xfs/xfs_buf_item.c
@@ -56,14 +56,12 @@ xfs_buf_log_format_size(
 }
 
 /*
- * This returns the number of log iovecs needed to log the
- * given buf log item.
+ * Return the number of log iovecs and space needed to log the given buf log
+ * item segment.
  *
- * It calculates this as 1 iovec for the buf log format structure
- * and 1 for each stretch of non-contiguous chunks to be logged.
- * Contiguous chunks are logged in a single iovec.
- *
- * If the XFS_BLI_STALE flag has been set, then log nothing.
+ * It calculates this as 1 iovec for the buf log format structure and 1 for each
+ * stretch of non-contiguous chunks to be logged.  Contiguous chunks are logged
+ * in a single iovec.
  */
 STATIC void
 xfs_buf_item_size_segment(
@@ -119,11 +117,8 @@ xfs_buf_item_size_segment(
 }
 
 /*
- * This returns the number of log iovecs needed to log the given buf log item.
- *
- * It calculates this as 1 iovec for the buf log format structure and 1 for each
- * stretch of non-contiguous chunks to be logged.  Contiguous chunks are logged
- * in a single iovec.
+ * Return the number of log iovecs and space needed to log the given buf log
+ * item.
  *
  * Discontiguous buffers need a format structure per region that is being
  * logged. This makes the changes in the buffer appear to log recovery as though
@@ -133,7 +128,11 @@ xfs_buf_item_size_segment(
  * what ends up on disk.
  *
  * If the XFS_BLI_STALE flag has been set, then log nothing but the buf log
- * format structures.
+ * format structures. If the item has previously been logged and has dirty
+ * regions, we do not relog them in stale buffers. This has the effect of
+ * reducing the size of the relogged item by the amount of dirty data tracked
+ * by the log item. This can result in the committing transaction reducing the
+ * amount of space being consumed by the CIL.
  */
 STATIC void
 xfs_buf_item_size(
@@ -147,9 +146,9 @@ xfs_buf_item_size(
 	ASSERT(atomic_read(&bip->bli_refcount) > 0);
 	if (bip->bli_flags & XFS_BLI_STALE) {
 		/*
-		 * The buffer is stale, so all we need to log
-		 * is the buf log format structure with the
-		 * cancel flag in it.
+		 * The buffer is stale, so all we need to log is the buf log
+		 * format structure with the cancel flag in it as we are never
+		 * going to replay the changes tracked in the log item.
 		 */
 		trace_xfs_buf_item_size_stale(bip);
 		ASSERT(bip->__bli_format.blf_flags & XFS_BLF_CANCEL);
@@ -164,9 +163,9 @@ xfs_buf_item_size(
 
 	if (bip->bli_flags & XFS_BLI_ORDERED) {
 		/*
-		 * The buffer has been logged just to order it.
-		 * It is not being included in the transaction
-		 * commit, so no vectors are used at all.
+		 * The buffer has been logged just to order it. It is not being
+		 * included in the transaction commit, so no vectors are used at
+		 * all.
 		 */
 		trace_xfs_buf_item_size_ordered(bip);
 		*nvecs = XFS_LOG_VEC_ORDERED;
diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
index 17e20a6d8b4e27..6ff91e5bf3cd73 100644
--- a/fs/xfs/xfs_inode_item.c
+++ b/fs/xfs/xfs_inode_item.c
@@ -28,6 +28,20 @@ static inline struct xfs_inode_log_item *INODE_ITEM(struct xfs_log_item *lip)
 	return container_of(lip, struct xfs_inode_log_item, ili_item);
 }
 
+/*
+ * The logged size of an inode fork is always the current size of the inode
+ * fork. This means that when an inode fork is relogged, the size of the logged
+ * region is determined by the current state, not the combination of the
+ * previously logged state + the current state. This is different relogging
+ * behaviour to most other log items which will retain the size of the
+ * previously logged changes when smaller regions are relogged.
+ *
+ * Hence operations that remove data from the inode fork (e.g. shortform
+ * dir/attr remove, extent form extent removal, etc), the size of the relogged
+ * inode gets -smaller- rather than stays the same size as the previously logged
+ * size and this can result in the committing transaction reducing the amount of
+ * space being consumed by the CIL.
+ */
 STATIC void
 xfs_inode_item_data_fork_size(
 	struct xfs_inode_log_item *iip,
diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
index b0ef071b3cb53a..cd5c04dabe2e18 100644
--- a/fs/xfs/xfs_log_cil.c
+++ b/fs/xfs/xfs_log_cil.c
@@ -668,9 +668,14 @@ xlog_cil_push_work(
 	ASSERT(push_seq <= ctx->sequence);
 
 	/*
-	 * Wake up any background push waiters now this context is being pushed.
+	 * As we are about to switch to a new, empty CIL context, we no longer
+	 * need to throttle tasks on CIL space overruns. Wake any waiters that
+	 * the hard push throttle may have caught so they can start committing
+	 * to the new context. The ctx->xc_push_lock provides the serialisation
+	 * necessary for safely using the lockless waitqueue_active() check in
+	 * this context.
 	 */
-	if (ctx->space_used >= XLOG_CIL_BLOCKING_SPACE_LIMIT(log))
+	if (waitqueue_active(&cil->xc_push_wait))
 		wake_up_all(&cil->xc_push_wait);
 
 	/*
@@ -907,7 +912,7 @@ xlog_cil_push_background(
 	ASSERT(!list_empty(&cil->xc_cil));
 
 	/*
-	 * don't do a background push if we haven't used up all the
+	 * Don't do a background push if we haven't used up all the
 	 * space available yet.
 	 */
 	if (cil->xc_ctx->space_used < XLOG_CIL_SPACE_LIMIT(log)) {
@@ -931,9 +936,16 @@ xlog_cil_push_background(
 
 	/*
 	 * If we are well over the space limit, throttle the work that is being
-	 * done until the push work on this context has begun.
+	 * done until the push work on this context has begun. Enforce the hard
+	 * throttle on all transaction commits once it has been activated, even
+	 * if the committing transactions have resulted in the space usage
+	 * dipping back down under the hard limit.
+	 *
+	 * The ctx->xc_push_lock provides the serialisation necessary for safely
+	 * using the lockless waitqueue_active() check in this context.
 	 */
-	if (cil->xc_ctx->space_used >= XLOG_CIL_BLOCKING_SPACE_LIMIT(log)) {
+	if (cil->xc_ctx->space_used >= XLOG_CIL_BLOCKING_SPACE_LIMIT(log) ||
+	    waitqueue_active(&cil->xc_push_wait)) {
 		trace_xfs_log_cil_wait(log, cil->xc_ctx->ticket);
 		ASSERT(cil->xc_ctx->space_used < log->l_logsize);
 		xlog_wait(&cil->xc_push_wait, &cil->xc_push_lock);
Amir Goldstein May 27, 2022, 7:01 a.m. UTC | #6
On Fri, May 27, 2022 at 9:06 AM Christoph Hellwig <hch@infradead.org> wrote:
>
> FYI, below is the 5.10-stable backport I have been testing earlier this
> week that fixes a bugzilla reported hang:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=214767
>
> I was just going to submit it to the stable maintainers today after
> being out on holiday, but if you want to add it to this queue
> that is fine with me as well.
>

Let me take it for a short spin in our xfs stable test environment, since
this env has caught one regression with an allegedly safe fix.
This env has also VMs with old xfsprogs, which is kind of important to
test since those LTS patches may end up in distros with old xfsprogs.

If all is well, I'll send your patch later today to stable maintainers
with this first for-5.10 series.

> ---
> From 8e0464752b24f2b3919e8e92298759d116b283eb Mon Sep 17 00:00:00 2001
> From: Dave Chinner <dchinner@redhat.com>
> Date: Fri, 18 Jun 2021 08:21:51 -0700
> Subject: xfs: Fix CIL throttle hang when CIL space used going backwards
>

Damn! this patch slipped through my process (even though I did see
the correspondence on the list).

My (human) process has eliminated the entire 38 patch series
https://lore.kernel.org/linux-xfs/20210603052240.171998-1-david@fromorbit.com/
without noticing the fix that was inside it.

I did read carefully every cover letter of the optimization series that I
eliminated, to look for important fixes, and more than once I did pick
individual fix patches from optimization/cleanup series.

In this case, I guess Dave was not aware of the severity of the bug fixed
when he wrote the cover letter(?), but the fact that the series is not only
an improvement was not mentioned.

It's good that we have many vectors to find stable fixes :)

Cheers,
Amir.
Dave Chinner May 27, 2022, 9:08 a.m. UTC | #7
On Fri, May 27, 2022 at 10:01:48AM +0300, Amir Goldstein wrote:
> On Fri, May 27, 2022 at 9:06 AM Christoph Hellwig <hch@infradead.org> wrote:
> >
> > FYI, below is the 5.10-stable backport I have been testing earlier this
> > week that fixes a bugzilla reported hang:
> >
> > https://bugzilla.kernel.org/show_bug.cgi?id=214767
> >
> > I was just going to submit it to the stable maintainers today after
> > being out on holiday, but if you want to add it to this queue
> > that is fine with me as well.
> >
> 
> > Let me take it for a short spin in our xfs stable test environment, since
> this env has caught one regression with an allegedly safe fix.
> This env has also VMs with old xfsprogs, which is kind of important to
> test since those LTS patches may end up in distros with old xfsprogs.
> 
> If all is well, I'll send your patch later today to stable maintainers
> with this first for-5.10 series.
> 
> > ---
> > From 8e0464752b24f2b3919e8e92298759d116b283eb Mon Sep 17 00:00:00 2001
> > From: Dave Chinner <dchinner@redhat.com>
> > Date: Fri, 18 Jun 2021 08:21:51 -0700
> > Subject: xfs: Fix CIL throttle hang when CIL space used going backwards
> >
> 
> Damn! this patch slipped through my process (even though I did see
> the correspondence on the list).
> 
> My (human) process has eliminated the entire 38 patch series
> https://lore.kernel.org/linux-xfs/20210603052240.171998-1-david@fromorbit.com/
> without noticing the fix that was inside it.

The first two times it was in much smaller patch series (5 and 8
patches total).


Also, you probably need to search for commit IDs on the list, too,
because this discussion was held in November about backporting the
fix to 5.10 stable kernels:

Subject: Help deciding about backported patch (kernel bug 214767, 19f4e7cc8197 xfs: Fix CIL throttle hang when CIL space used going backwards)
https://lore.kernel.org/linux-xfs/C1EC87A2-15B4-45B1-ACE2-F225E9E30DA9@flyingcircus.io/

> In this case, I guess Dave was not aware of the severity of the bug fixed

I was very aware of the severity of the problem, and I don't need
anyone trying to tell me what I should have been doing 18 months
ago.

It simply wasn't a severe bug. We had one user reporting it, and
when I found the bug I realised that it was a zero-day thinko in
delayed logging accounting I made back in 2010 (~2.6.38 timeframe,
IIRC).  IOWs, it took 10 years before we got the first indication
there was a deep, dark corner case bug lurking in that code.

The time between first post of the bug fix and merge was about 6
months, so it also wasn't considered serious by anyone at the time
as it missed 2 whole kernel releases before it was reviewed and
merged...

There's been a small handful of user reports of this bug since (e.g
the bz above and the backport discussions), but it's pretty clear
that this bug is not (and never has been) a widespread issue.  It
just doesn't fit any of the criteria for a severe bug.

Backport candidate: yes. Severe: absolutely not.

-Dave.
Amir Goldstein May 27, 2022, 12:24 p.m. UTC | #8
On Fri, May 27, 2022 at 12:08 PM Dave Chinner <david@fromorbit.com> wrote:
>
> On Fri, May 27, 2022 at 10:01:48AM +0300, Amir Goldstein wrote:
> > On Fri, May 27, 2022 at 9:06 AM Christoph Hellwig <hch@infradead.org> wrote:
> > >
> > > FYI, below is the 5.10-stable backport I have been testing earlier this
> > > week that fixes a bugzilla reported hang:
> > >
> > > https://bugzilla.kernel.org/show_bug.cgi?id=214767
> > >
> > > I was just going to submit it to the stable maintainers today after
> > > being out on holiday, but if you want to add it to this queue
> > > that is fine with me as well.
> > >
> >
> > Let me take it for a short spin in our xfs stable test environment, since
> > this env has caught one regression with an allegedly safe fix.
> > This env has also VMs with old xfsprogs, which is kind of important to
> > test since those LTS patches may end up in distros with old xfsprogs.
> >
> > If all is well, I'll send your patch later today to stable maintainers
> > with this first for-5.10 series.
> >
> > > ---
> > > From 8e0464752b24f2b3919e8e92298759d116b283eb Mon Sep 17 00:00:00 2001
> > > From: Dave Chinner <dchinner@redhat.com>
> > > Date: Fri, 18 Jun 2021 08:21:51 -0700
> > > Subject: xfs: Fix CIL throttle hang when CIL space used going backwards
> > >
> >
> > Damn! this patch slipped through my process (even though I did see
> > the correspondence on the list).
> >
> > My (human) process has eliminated the entire 38 patch series
> > https://lore.kernel.org/linux-xfs/20210603052240.171998-1-david@fromorbit.com/
> > without noticing the fix that was inside it.
>
> The first two times it was in much smaller patch series (5 and 8
> patches total).
>

The tool I was using tried to find the latest reference in lore,
to get the closest match to what was eventually merged, under the
assumption that the last cover letter revision is the best place
to look for an overview.

In this case, however, there was an actual pull request for
the final version, but my tool wasn't aware of this practice yet,
so it wasn't looking for it.

>
> Also, you probably need to search for commit IDs on the list, too,
> because this discussion was held in November about backporting the
> fix to 5.10 stable kernels:
>
> Subject: Help deciding about backported patch (kernel bug 214767, 19f4e7cc8197 xfs: Fix CIL throttle hang when CIL space used going backwards)
> https://lore.kernel.org/linux-xfs/C1EC87A2-15B4-45B1-ACE2-F225E9E30DA9@flyingcircus.io/
>

Yes, as a human I was aware of this correspondence.
As a human I also forgot about it, so having the tool search
the lists for follow-ups by commit ID sounds like a good idea.

> > In this case, I guess Dave was not aware of the severity of the bug fixed
>
> I was very aware of the severity of the problem, and I don't need
> anyone trying to tell me what I should have been doing 18 months
> ago.
>

I will make a note of that.
I wasn't trying to pass judgement on you.
I was trying to analyse what needed fixing in my process, as this is
the first time I have employed it and this was a case study of something
that went wrong.
Forgive me if I got carried away with trying to read your thoughts
about the bug.

> It simply wasn't a severe bug. We had one user reporting it, and
> when I found the bug I realised that it was a zero-day thinko in
> delayed logging accounting I made back in 2010 (~2.6.38 timeframe,
> IIRC).  IOWs, it took 10 years before we got the first indication
> there was a deep, dark corner case bug lurking in that code.
>
> The time between first post of the bug fix and merge was about 6
> months, so it also wasn't considered serious by anyone at the time
> as it missed 2 whole kernel releases before it was reviewed and
> merged...
>
> There's been a small handful of user reports of this bug since (e.g
> the bz above and the backport discussions), but it's pretty clear
> that this bug is not (and never has been) a widespread issue.  It
> just doesn't fit any of the criteria for a severe bug.
>
> Backport candidate: yes. Severe: absolutely not.
>

Understood.

In the future, if you are writing a cover letter for an improvement
series or internal pull request and you know there is a backport
candidate inside, if you happen to remember to mention it, it would
be of great help to me.

Thanks!
Amir.
Luis Chamberlain May 27, 2022, 1:10 p.m. UTC | #9
On Thu, May 26, 2022 at 09:59:19PM +0300, Amir Goldstein wrote:
> On Thu, May 26, 2022 at 9:44 PM Luis Chamberlain <mcgrof@kernel.org> wrote:
> >
> > On Thu, May 26, 2022 at 10:27:41AM -0700, Darrick J. Wong wrote:
> > > /me looks and sees a large collection of expunge lists, along with
> > > comments about how often failures occur and/or reasons.  Neat!
> > >
> > > Leah mentioned on the ext4 call this morning that she would have found
> > > it helpful to know (before she started working on 5.15 backports) which
> > > tests were of the flaky variety so that she could better prioritize the
> > > time she had to look into fstests failures.  (IOWS: saw a test fail a
> > > small percentage of the time and then burned a lot of machine time only
> > > to figure out that 5.15.0 also failed a percentage of the time).
> >
> > See my proposal to try to make this easier to parse:
> >
> > https://lore.kernel.org/all/YoW0ZC+zM27Pi0Us@bombadil.infradead.org/
> >
> > > We talked about where it would be most useful for maintainers and QA
> > > people to store their historical pass/fail data, before settling on
> > > "somewhere public where everyone can review their colleagues' notes" and
> > > "somewhere minimizing commit friction".  At the time, we were thinking
> > > about having people contribute their notes directly to the fstests
> > > source code, but I guess Luis has been doing that in the kdevops repo
> > > for a few years now.
> > >
> > > So, maybe there?
> >
> > For now sure, I'm happy to add others the linux-kdevops org on github
> > and they get immediate write access to the repo. This is working well
> > so far. Long term we need to decide if we want to spin off the
> > expunge list as a separate effort and make it a git subtree (note
> > it is different than a git sub module). Another example of a use case
> > for a git subtree, to use it as an example, is the way I forked
> > kconfig from Linux into a standalone git tree so to allow any project
> > to bump kconfig code with just one command. So different projects
> > don't need to fork kconfig as they do today.
> >
> > The value in doing the git subtree for expunges is any runner can use
> > it. I did design kdevops though to run on *any* cloud, and support
> > local virtualization tech like libvirt and virtualbox.
> >
> > The linux-kdevops git org also has other projects which both fstests
> > and blktests depend on, for example dbench, which I had forked and
> > cleaned up a while ago. It may make sense to share the keeping of oddball
> > efforts like these, which are no longer maintained, in this repo.
> >
> > There is other tech I'm evaluating for this sort of collaborative test
> > effort, such as ledgers, but that is in its infancy at this point in
> > time. I have a sense though it may be a good outlet for collection of
> > test artifacts in a decentralized way and also allow *any* entity to
> > participate in bringing confidence to stable kernel branches or dev
> > branches prior to release.
> >
> 
> There are a few problems I noticed with the current workflow.
> 
> 1. It will not scale to maintain this in git as more and more testers
> start using kdevops and adding more and more fs and configs and distros.

You say that but do not explain why you think this is the case.
Quite the contrary, I don't think so and I'll explain why. Let's
just stick to the expunge list as that is what matters here in this
context.

The expunge list is already divided by directory per target kernel when
using upstream kernels. So this applies to any stable kernel, vanilla
kernels, or linux-next. Folks working on these kernels would very likely
be collaborating just as you and I have.

Distro kernels also have their own directory, and so they'd very likely
collaborate.

> How many more developers do you want to give push access to linux-kdevops?

Only those really collaborating; the idea is not to give access to the
world here. The challenge I'm thinking about for the future though is
how to scale this beyond just those few in a meaningful way in such a
way that you don't limit your scope of evaluation only to what resources
these folks have.

That is a research question and beyond the scope of just using git in a
shared linux repo.

> I don't know how test labs report to KernelCI, but we need to look at their
> model and not reinvent the wheel.

I looked at it and, to say the least, I was not in any way, shape, or form
drawn to it or what it was using. You are free to look at it too.

The distributed aspect is what I don't agree with, and it is why I am
evaluating alternative decentralized technologies for the future.

It relies on LAVA, the Linaro Automated Validation Architecture. The
LAVA project home page [0] mentions "LAVA is an automated validation
architecture primarily aimed at testing deployments of systems based
around the Linux kernel on ARM devices, specifically ARMv7 and later".
The SOC [1] page however now lists x86, but it is not the main focus of
the project. You can add a new test lab and add new tests, these tests
are intended to be public. If running tests for private consumption
you’d have to set up your own backend and front end. All this and the
experience with the results page was enough for me to decide this
wasn't an immediate good fit for automation for fstests and blktests
when I started considering this for enterprise Linux.

[0] https://git.lavasoftware.org
[1] https://linux.kernelci.org/soc/

It does not mean one cannot use a centralized methodology to share an
expunge list / artifacts, etc for fstests or blktests. A shared expunge
set in the linux-kdevops organization is a *simple* centralized way to do that
to start off with, and if you limit access to folks who collaborate on
one directory (as you kind of do in Linux development with maintainers)
you avoid merge conflicts. We're not at a point yet where we're going to
have 100 folks who want access to say the v5.10.y directory for expunges
for XFS for example.... it's just you and me right now. Likewise for
other filesystems it would be similar. But from a research perspective
it does invite one to consider how to scale this in a sensible way
beyond those. When I looked at kernelci, I didn't personally think that
was an optimal way to scale, but that is beyond the scope of the simple
ramp up we're still discussing.

> 2. kdevops is very focused on stabilizing the baseline fast, 

Although it does help with this, I still think there are small efforts to
help automate this further in the future. A runner should be able to
spin this off without intervention if possible. Today upon failures it
requires manual verification, adding a new failure to an expunge list,
etc.  We can do better, and the goal is to slowly automate each of those
menial tasks which today we do manually. Building a completely new
baseline without manual intervention I think is possible and we should
strive towards it slowly and carefully.

> which is
> very good, but there is no good process of getting a test out of an expunge list.

Yes, *part* of this involves a nice atomic task which can be dedicated to a
runner. So this goal alone needs to be broken up into parts:

a) Is this test still failing? --> easily automated today
b) How can we avoid this to fail --> not easily autmated today

As for a), a simple dedicated guest could for example evaluate a target kernel
on a filesystem configuration and run through each expunge and *verify* it
is indeed still failing. If it is not, and there is high confidence that
this is the case (say it verified that it is not failing many times), then
clearly the issue may have been fixed (say a stable kernel update) and
the task can inform us of that.
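
To illustrate, task a) could be as small as the sketch below, assuming
a run_test(name) -> bool helper provided by the runner (hypothetical;
this is not an existing kdevops API):

# Hypothetical sketch of task a): confirm expunged tests still fail.
# run_test() is an assumed helper that runs one test and returns True
# on pass.
def recheck_expunges(expunge_file, run_test, attempts=10):
    """Return the expunged tests that passed every attempt, i.e. the
    entries that may no longer belong on the list."""
    with open(expunge_file) as f:
        tests = [line.split('#')[0].strip()
                 for line in f if line.split('#')[0].strip()]
    return [test for test in tests
            if all(run_test(test) for _ in range(attempts))]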

That task b) requires years and years of work.

> We have a very strong suspicion that some of the tests that we put in
> expunge lists failed due to some setup issue in the host OS that caused
> NVME IO errors in the guests.

We already know that qemu with qcow2 nvme files does incur some delays
when doing full swing drive discards and this can cause some of these
nvme IO errors (timeouts). We are now also aware that the odds of this
timeout happening twice are low, but it *is* possible. We *also* now
know that when two consecutive nvme timeouts happen due to this it can
also *somehow* trigger an RCU false positive for blktests in some corner
cases when testing ZNS [0] but this was *what* made us realize that this
issue was a qemu issue and the qemu nvme maintainer has noted that
this needs to be fixed in qemu.

[0] https://lkml.kernel.org/r/YliZ9M6QWISXvhAJ@bombadil.infradead.org

But these sorts of qemu bugs should not cause filesystem
issues. We also already know that this is a qemu bug and that this will
be fixed in the long term. Upon review with the qemu nvme maintainer
the way kdevops uses nvme is not incorrect.

Yes we can switch to raw format to skip the suboptimal way to do
discards, but we *want* to find more bugs, not less. We could
simply just make a new Kconfig entry on kdevops to enable users to use
raw files for the nvme drives for those that want to opt-out of these
timeouts for now.

> I tried to put that into comments when
> I noticed that, but I am afraid there may have been other tests that are
> falsely accused of failing.

There are two things we should consider in light of this:

c) We do need semantics for common exceptions to failures
d) We need an appreciation for why some of these exceptions may be
   real odd issues and it may take time to either fix them or
   to acknowledge they are a non-issue somehow.

As for c) I had proposed a way to annotate failure rate, perhaps
we need a way to annotate these odd issues as well.

In my talk at LSFMM I mentioned how 25% of the time *alone* on the test
automation effort consists of dealing with low-hanging fruit. Since
companies are now trying to dedicate some resources towards stable
filesystem efforts, it may be worthwhile for them to consider this so that
they are aware that some of these oddball issues may end up
lurking in odd corners. I gave one example which took 8 months to root
cause on the blktests front alone at LSFMM.

> All developers make those mistakes in their
> own expunge lists, but if we start propagating those mistakes to the world,
> it becomes an issue.

Agreed, but note that the conversation is shifting from not sharing
expunges to possibly sharing some notion of expunges *somehow*. That is
a small step forward. I agree we need to address these semantics issues
and they are important, but without the will to share expunges there
would have been no point to address some of these common pain points.

> For those two reasons I think that the model to aspire to should be
> composed of a database where absolutely everyone can post data
> points in the form of facts (i.e. the test failed after N runs on this kernel
> and this hardware...) and another process, partly AI, partly human to
> digest all those facts into a knowledge base that is valuable and
> maintained by experts. Much easier said than done...

Anything is possible, sure. A centralized database is one way to go
about some of these things. I'm however suspicious that perhaps there
is a better way, and am still evaluating a ledger as a way to scale
test results. Both paths can be taken, in fact. One does not negate
the other.

*For now*... I do think those who *are* collaborating
on expunges can share a simple repo, as we have been doing for a few
months.

The need for scaling has to be addressed, but that is for the long-term
growth of the endeavour.

  Luis
Luis Chamberlain May 27, 2022, 3:40 p.m. UTC | #10
On Fri, May 27, 2022 at 03:24:02PM +0300, Amir Goldstein wrote:
> On Fri, May 27, 2022 at 12:08 PM Dave Chinner <david@fromorbit.com> wrote:
> > Backport candidate: yes. Severe: absolutely not.
> In the future, if you are writing a cover letter for an improvement
> series or internal pull request and you know there is a backport
> candidate inside, if you happen to remember to mention it, it would
> be of great help to me.

Amir, since you wrote a tool enhancement to scrape for possible
candidates, *if* we defined some sort of meta-data to describe
this sort of stuff on the cover letter:

Backport candidate: yes. Severe: absolutely not

It would be useful when scraping, thereby leaving the effort to
attempt the backport / assess feasibility to others. This would be
different from a stable Cc tag, as those have a high degree of certainty.

How about something like:

Backport-candidate: yes
Impact: low
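
If trailers like these existed, scraping them would be trivial. A
sketch (the trailer names here are only the proposal above, not
something any tool recognizes today):

# Sketch: extract the proposed trailers from a cover letter body.
import re

TRAILER_RE = re.compile(r'^(Backport-candidate|Impact):\s*(\S+)',
                        re.MULTILINE)

def cover_letter_tags(body):
    return dict(TRAILER_RE.findall(body))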

  Luis
Darrick J. Wong May 27, 2022, 5:19 p.m. UTC | #11
On Fri, May 27, 2022 at 08:40:14AM -0700, Luis Chamberlain wrote:
> On Fri, May 27, 2022 at 03:24:02PM +0300, Amir Goldstein wrote:
> > On Fri, May 27, 2022 at 12:08 PM Dave Chinner <david@fromorbit.com> wrote:
> > > Backport candidate: yes. Severe: absolutely not.
> > In the future, if you are writing a cover letter for an improvement
> > series or internal pull request and you know there is a backport
> > candidate inside, if you happen to remember to mention it, it would
> > be of great help to me.
> 
> Amir, since you wrote a tool enhancement to scrape for possible
> candidates, *if* we defined some sort of meta-data to describe
> this sort of stuff on the cover letter:
> 
> Backport candidate: yes. Severe: absolutely not
> 
> It would be useful when scraping. Therefore, leaving the effort
> to try to backport / feasibility to others. This would be different
> than a stable Cc tag, as those have a high degree of certainty.
> 
> How about something like:
> 
> Backport-candidate: yes
> Impact: low

This particular patch (because we've recently hit it internally too)
would have been one of those:

Stable-Soak: January 2022

patches that I was talking about.

--D

>   Luis
Dave Chinner May 27, 2022, 11:42 p.m. UTC | #12
On Fri, May 27, 2022 at 08:40:14AM -0700, Luis Chamberlain wrote:
> On Fri, May 27, 2022 at 03:24:02PM +0300, Amir Goldstein wrote:
> > On Fri, May 27, 2022 at 12:08 PM Dave Chinner <david@fromorbit.com> wrote:
> > > Backport candidate: yes. Severe: absolutely not.
> > In the future, if you are writing a cover letter for an improvement
> > series or internal pull request and you know there is a backport
> > candidate inside, if you happen to remember to mention it, it would
> > be of great help to me.

That's what "fixes" and "cc: stable" tags in the commit itself are
for, not the cover letter.

> Amir, since you wrote a tool enhancement to scrape for possible
> candidates, *if* we defined some sort of meta-data to describe
> this sort of stuff on the cover letter:
> 
> Backport candidate: yes. Severe: absolutely not
> 
> It would be useful when scraping. Therefore, leaving the effort
> to try to backport / feasibility to others. This would be different
> than a stable Cc tag, as those have a high degree of certainty.
> 
> How about something like:
> 
> Backport-candidate: yes
> Impact: low

No.

This places the responsibility/burden on upstream developers to
classify everything they do that *might* get backported regardless
of the context the change is being made under. Stable maintainers
have the benefit of hindsight, user bug reports, etc and can make
much better determinations of whether any particular change should
be backported.

Indeed, a single user reporting a bug doesn't make the fix an
automatic backport candidate. The fix might be too complex to
backport, it might have serious dependency requirements, it might be
specific to a really niche configuration or workload, the user might
be running their own custom kernel and does their own backports, it
might be impossible for anyone but the person fixing the bug to
validate that the fix works correctly and it might take days or
weeks of effort to perform that validation, etc.

IOWs, there are many reasons that bug fixes may not be considered
backport candidates at the time they are fixed and committed. If
circumstances change 6 months later and only then it becomes clear
that we need to backport the fix, then that is not a problem the
upstream process can solve. Upstream has not done anything wrong
here, nor could they have known that the initial classification of
"rare, whacky, almost no exposure anywhere" might be completely
wrong because of something else they did not know about at the
time...

IMO, placing the responsibility on upstream developers to accurately
identify and classify every possible commit that might be a backport
candidate for the benefit of stable kernel maintainers is completely
upside down.

Identifying the fixes for problems that stable kernel users are
hitting is the responsibility of the stable kernel maintainers. A
stable kernel maintainer is not just a role of "do backports and
test kernels". The stable kernel maintainer must also provide front
line user support so they are aware of the bugs those users are
hitting that *haven't already been identified* as affecting users of
those kernels. Stable kernel maintainers are *always* going to miss
bug fixes or changes that happen upstream that unintentionally fix
or mitigate problems that occur in older kernels.

Hence stable maintainers need to start from the position of "users
will always hit stuff we haven't fixed", not start from the position
of "upstream developers should tell us what we should fix". Many of
the upstream bug fixes come from user bug reports on stable kernels,
not from users on bleeding edge upstream kernels. Stable kernel
maintainers need to be triaging these bug reports and determining if
the problem is fixed upstream and the commit that fixes it, or if it
hasn't been fixed working with upstream to get the bugs fixed and
then backported.

If the upstream developer knows that it is a regression fix or a new
bug that likely needs to be backported, then we already have
"Fixes:" and "cc: stable" for communicating "please backport this"
back down the stack. Anything more is just asking us to make wild
guesses about an unknown future.
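
For reference, this is the form those tags take in a commit footer
(the SHA and subject below are made up for illustration):

    Fixes: 1234567890ab ("xfs: subject of the commit that introduced the bug")
    Cc: stable@vger.kernel.org # v5.10+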

Cheers,

-Dave.
Amir Goldstein May 28, 2022, 5 a.m. UTC | #13
On Sat, May 28, 2022 at 2:42 AM Dave Chinner <david@fromorbit.com> wrote:
>
> On Fri, May 27, 2022 at 08:40:14AM -0700, Luis Chamberlain wrote:
> > On Fri, May 27, 2022 at 03:24:02PM +0300, Amir Goldstein wrote:
> > > On Fri, May 27, 2022 at 12:08 PM Dave Chinner <david@fromorbit.com> wrote:
> > > > Backport candidate: yes. Severe: absolutely not.
> > > In the future, if you are writing a cover letter for an improvement
> > > series or internal pull request and you know there is a backport
> > > candidate inside, if you happen to remember to mention it, it would
> > > be of great help to me.
>
> That's what "fixes" and "cc: stable" tags in the commit itself are
> for, not the cover letter.

A cover letter is an overview of the work.
A good cover letter includes an overview of the individual patches
in the context of the whole work as your cover letter did:

Summary of series:

Patches Modifications
------- -------------
1-7: log write FUA/FLUSH optimisations
8: bug fix
9-11: Async CIL pushes
12-25: xlog_write() rework
26-39: CIL commit scalability

So it was a lapse of judgement on my part or carelessness that made me
eliminate the series without noting patch #8.

Furthermore, the subject of the patch has Fix and trailer has
Reported-and-tested-by:
so auto candidate selection would have picked it up easily, but my scripts
only looked for the obvious Fixes: tag inside the eliminated series, so that
is a problem with my process that I need to improve.
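
Something along these lines (a simplified sketch, not my actual
b4-based tooling; the patterns and rev range are illustrative) would
have flagged it:

#!/usr/bin/env python3
# Sketch: flag commits whose subject or trailers hint at a bug fix,
# instead of matching only an explicit Fixes: tag.
import re
import subprocess

FIX_SUBJECT = re.compile(r'\bfix(es|ed)?\b', re.IGNORECASE)
FIX_TRAILERS = re.compile(
    r'^(Fixes:|Reported(-and-tested)?-by:|Cc: *stable)',
    re.IGNORECASE | re.MULTILINE)

def fix_candidates(rev_range):
    # %x00/%x01 are NUL/SOH separators so subjects and bodies split safely
    out = subprocess.run(
        ['git', 'log', '--format=%H%x00%s%x00%b%x01', rev_range],
        capture_output=True, text=True, check=True).stdout
    for entry in out.split('\x01'):
        if not entry.strip():
            continue
        sha, subject, body = entry.strip().split('\x00', 2)
        if FIX_SUBJECT.search(subject) or FIX_TRAILERS.search(body):
            yield sha[:12], subject

for sha, subject in fix_candidates('v5.10..v5.11'):
    print(sha, subject)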

So the blame is entirely on me! not on you!

And yet.
"bug fix"?
Really?

I may not have been expecting more of other developers.
But I consider you to be one of the best when it comes to analyzing and
documenting complex and subtle code, so forgive me for expecting more.
I am not talking about severity or rareness or other things that may change
in time perspective. I am talking about describing the changes in a way that
benefits the prospective consumers of the cover letter (i.e. me).

It makes me sad that you are being defensive about this, because I wish
to be able to provide feedback to developers without having to deal
with responses like:
"I don't need anyone trying to tell me what I should have been doing"

Not every suggestion for an improvement is an attack.
You do not have to agree with my suggestions nor to adopt them,
but please do not dismiss them, because they come in good faith.

Thanks,
Amir.
Dave Chinner June 1, 2022, 4:31 a.m. UTC | #14
On Sat, May 28, 2022 at 08:00:48AM +0300, Amir Goldstein wrote:
> On Sat, May 28, 2022 at 2:42 AM Dave Chinner <david@fromorbit.com> wrote:
> >
> > On Fri, May 27, 2022 at 08:40:14AM -0700, Luis Chamberlain wrote:
> > > On Fri, May 27, 2022 at 03:24:02PM +0300, Amir Goldstein wrote:
> > > > On Fri, May 27, 2022 at 12:08 PM Dave Chinner <david@fromorbit.com> wrote:
> > > > > Backport candidate: yes. Severe: absolutely not.
> > > > In the future, if you are writing a cover letter for an improvement
> > > > series or internal pull request and you know there is a backport
> > > > candidate inside, if you happen to remember to mention it, it would
> > > > be of great help to me.
> >
> > That's what "fixes" and "cc: stable" tags in the commit itself are
> > for, not the cover letter.
> 
> A cover letter is an overview of the work.
> A good cover letter includes an overview of the individual patches
> in the context of the whole work as your cover letter did:
> 
> Summary of series:
> 
> Patches Modifications
> ------- -------------
> 1-7: log write FUA/FLUSH optimisations
> 8: bug fix
> 9-11: Async CIL pushes
> 12-25: xlog_write() rework
> 26-39: CIL commit scalability
> 
> So it was a lapse of judgement on my part or carelessness that made me
> eliminate the series without noting patch #8.
> 
> Furthermore, the subject of the patch has Fix and trailer has
> Reported-and-tested-by:
> so auto candidate selection would have picked it up easily, but my scripts
> only looked for the obvious Fixes: tag inside the eliminated series, so that
> is a problem with my process that I need to improve.
>
> So the blame is entirely on me! not on you!

I can feel a "But..." is about to arrive....

> And yet.
> "bug fix"?
> Really?

... and there's the passive-aggressive blame-shift.

As you just said yourself, all the information you required was in
both the cover letter and the patch, but you missed them both. You
also missed the other 3 times this patch was posted to the list,
too. Hence even if that cover letter said "patch 8: CIL log space
overrun bug fix", it wouldn't have made any difference because of
the process failures on your side.

So what's the point you're trying to make with this comment? What is
the problem it demonstrates that we need to address? We can't be
much more explicit than "patch X is a bug fix" in a cover letter, so
what are you expecting us to do differently?

> I may not have been expecting more of other developers.
> But I consider you to be one of the best when it comes to analyzing and
> documenting complex and subtle code, so forgive me for expecting more.

That's not very fair.  If you are going to hold me to a high bar,
then you need to hold everyone to that same high bar.....

> It makes me sad that you are being defensive about this, because I wish

.... because people tend to get defensive when they feel like they
are being singled out repeatedly for things that nobody else is
getting called out for.

Cheers,

Dave.
Amir Goldstein June 1, 2022, 7:10 a.m. UTC | #15
On Wed, Jun 1, 2022 at 7:31 AM Dave Chinner <david@fromorbit.com> wrote:
>
> On Sat, May 28, 2022 at 08:00:48AM +0300, Amir Goldstein wrote:
> > On Sat, May 28, 2022 at 2:42 AM Dave Chinner <david@fromorbit.com> wrote:
> > >
> > > On Fri, May 27, 2022 at 08:40:14AM -0700, Luis Chamberlain wrote:
> > > > On Fri, May 27, 2022 at 03:24:02PM +0300, Amir Goldstein wrote:
> > > > > On Fri, May 27, 2022 at 12:08 PM Dave Chinner <david@fromorbit.com> wrote:
> > > > > > Backport candidate: yes. Severe: absolutely not.
> > > > > In the future, if you are writing a cover letter for an improvement
> > > > > series or internal pull request and you know there is a backport
> > > > > candidate inside, if you happen to remember to mention it, it would
> > > > > be of great help to me.
> > >
> > > That's what "fixes" and "cc: stable" tags in the commit itself are
> > > for, not the cover letter.
> >
> > A cover letter is an overview of the work.
> > A good cover letter includes an overview of the individual patches
> > in the context of the whole work as your cover letter did:
> >
> > Summary of series:
> >
> > Patches Modifications
> > ------- -------------
> > 1-7: log write FUA/FLUSH optimisations
> > 8: bug fix
> > 9-11: Async CIL pushes
> > 12-25: xlog_write() rework
> > 26-39: CIL commit scalability
> >
> > So it was a lapse of judgement on my part or carelessness that made me
> > eliminate the series without noting patch #8.
> >
> > Furthermore, the subject of the patch has Fix and trailer has
> > Reported-and-tested-by:
> > so auto candidate selection would have picked it up easily, but my scripts
> > only looked for the obvious Fixes: tag inside the eliminated series, so that
> > is a problem with my process that I need to improve.
> >
> > So the blame is entirely on me! not on you!
>
> I can feel a "But..." is about to arrive....
>
> > And yet.
> > "bug fix"?
> > Really?
>
> ... and there's the passive-aggressive blame-shift.
>

Not very passive ;)

> As you just said yourself, all the information you required was in
> both the cover letter and the patch, but you missed them both. You
> also missed the other 3 times this patch was posted to the list,
> too. Hence even if that cover letter said "patch 8: CIL log space
> overrun bug fix", it wouldn't have made any difference because of
> the process failures on your side.
>

That's fair.

> So what's the point you're trying to make with this comment? What is
> the problem it demonstrates that we need to address? We can't be
> much more explicit than "patch X is a bug fix" in a cover letter, so
> what are you expecting us to do differently?
>

It is a bit hard for me to express my expectations in technical terms,
because you are right - you did provide all the information.

At a very high level, I would very much like the authors of patches
and cover letters to consider the people working on backports
when they are describing their work.

There are many people working on backports at every major
company/distro, so I think I am not alone in this request.

The best I can do is point to an example patch set that
I eliminated from backport for being a "rework" and where it
was made very clear that the patch set contains an independent bug fix:
https://lore.kernel.org/all/20210121154526.1852176-1-bfoster@redhat.com/

> > I may not have been expecting more of other developers.
> > But I consider you to be one of the best when it comes to analyzing and
> > documenting complex and subtle code, so forgive me for expecting more.
>
> That's not very fair.  If you are going to hold me to a high bar,
> then you need to hold everyone to that same high bar.....
>

Holding everyone to your standards is not going to be easy,
but I will try :)

> > It makes me sad that you are being defensive about this, because I wish
>
> .... because people tend to get defensive when they feel like they
> are being singled out repeatedly for things that nobody else is
> getting called out for.
>

I am sorry that I made you feel this way.

The trigger for this thread was a patch that I missed.
Doing the post mortem, I was raising the postulate [with a (?)]
that the author (who happened to be you) was not considering
the bug to be severe, which you later confirmed.

After that, the conversation just escalated for the wrong reasons.
I was trying to make a general plea for more consideration for
the people who do backporting work and it came out wrong, as
an attack on you. I need to watch my phrasing more carefully.

I sincerely apologize. It was not my intention at all.
I hope we can move past this and avoid clashes in the future.

Moving on...

I was thinking later that there is another point of failure in the
backport process that is demonstrated in this incident -
An elephant in the room even -
We have no *standard* way to mark a patch as a bug fix,
so everyone is using their own scripts and regex magic.

Fixes: is a standard that works, but it does not apply in many
cases, for example:

1. Zero day (some mark those as Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2"))
2. Hard work to figure out the fix commit
3. Discourage AUTOSEL

I am not sure if the patch in question was not annotated with Fixes:
for one or more of the above(?).

Regarding AUTOSEL, I hope this is not a problem anymore, because
AUTOSEL has been told to stay away from xfs code for a while now.

Regarding Zero day, I think this is a common case that deserves
attention by annotation.

Any annotation along the lines of Fixes: ????/0000
could cover cases #1 and #2 and may work with some existing auto
selection scripts without having to change them.

Granted, it may also break some existing scripts by making them
unselect a fix for a commit they cannot find.
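
As a strawman (the sentinel values and trailer form here are
hypothetical, not an agreed convention), a selection script could
special-case such annotations like this:

import re

FIXES_RE = re.compile(r'^Fixes: +(\S+)', re.MULTILINE)
DAY1_SENTINELS = {'????', '0000', '1da177e4c3f4'}  # bug predates git history

def fixes_target(msg):
    # Return the buggy commit a fix points at, 'day1' for a sentinel,
    # or None when the commit carries no Fixes: tag at all.
    m = FIXES_RE.search(msg)
    if not m:
        return None
    ref = m.group(1).split('/')[0]  # accept combined forms like ????/0000
    return 'day1' if ref in DAY1_SENTINELS else ref

def relevant_for_branch(msg, branch_has_commit):
    # Day-1 fixes are relevant to every stable branch; otherwise the
    # buggy commit must exist in the branch (branch_has_commit is a
    # callable, e.g. a wrapper around 'git merge-base --is-ancestor').
    target = fixes_target(msg)
    if target is None:
        return False
    return target == 'day1' or branch_has_commit(target)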

Maybe instead of annotating what the fix fixes, annotation could
be somehow related to how the bug was detected, on which version,
with which test? Not reproducible?

Thoughts?

Thanks,
Amir.
Theodore Ts'o June 2, 2022, 4:12 a.m. UTC | #16
On Wed, Jun 01, 2022 at 10:10:11AM +0300, Amir Goldstein wrote:
> At a very high level, I would very much like the authors of patches
> and cover letters to consider the people working on backports
> when they are describing their work.
> 
> There are many people working on backports at every major
> company/distro, so I think I am not alone in this request.

So while there are people on my team who are working on backports,
and need to support a variety of kernels for different varieties of
production, I'm also an upstream maintainer, so I see both sides of the
issue.

Yes, there are a lot of people who are working on backports.  But at
the same time, many backport efforts are being done by companies that
have a profit motive for supporting these production/product kernels,
and so it is often easier to get funding, in the form of head count
allocations, for supporting these production/product kernels, than it
is to get funding to support upstream kernel work, from a comparative
point of view.  Take a look at how many kernel engineers at Red Hat,
SuSE, etc., work on supporting their revenue product, versus those who
work on the upstream kernel.

There is a reason why historically, we've optimized for the upstream
maintainers, and not for the product kernels.  After all, companies
are getting paid $$$ for the product kernels.  If companies funded
more head count to work on making life easier for stable backports,
that would be great.  But otherwise, requests like this end up coming
as effective unfunded mandates on upstream developers' time.

(And it's not just stable kernel backports; it's also Syzbot reports,
Luis's failures found by using loop devices, etc.  If I had an
infinite amount of time, or if I have personal time on weekends where
I'm looking for something extra to do for fun, great.  But otherwise,
someone has to fund these efforts, or it's just going to make upstream
developers get irritated at worst, and not pay a lot of attention to
these requests at best.)

The reality is that if a backport is associated with a revenue
product, it's much easier to justify getting headcount when it comes
time to fight the company budget headcount wargames, and I've seen
this at many different companies.  It's just the way of the world.

> I was thinking later that there is another point of failure in the
> backport process that is demonstrated in this incident -
> An elephant in the room even -
> We have no *standard* way to mark a patch as a bug fix,
> so everyone is using their own scripts and regex magic.
> 
> Fixes: is a standard that works, but it does not apply in many
> cases, for example:
> 
> 1. Zero day (some mark those as Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2"))
> 2. Hard work to figure out the fix commit
> 3. Discourage AUTOSEL

For security fixes, just marking a bug as fixing a security bug, even
if you try to obfuscate the Fixes tag, is often enough to tip off a
potential attacker.   So I would consider:

    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")

to Not Be A Solution for security related patches.  The real fix here
is to put the commit id in the CVE, and to have kernel security teams
monitoring the CVE feeds looking for bugs that need to be remediated
post haste.  If you sell to the US Federal government, FedRAMP RA-5d
has some interesting mitigation timeline requirements depending on the
CVSS score.  And for people who are getting paid to sell into
that particular set of customers, they presumably have built into
their pricing model the cost of having a security team do that work.

Given the tight requirements for mitigating CVEs with a CVSS > 7.0,
you probably can't afford to wait for a LTS kernel to get those
patches into a revenue product (especially once you include time to QA
the updated kernel), so it should hopefully be not that hard to find a
product security team willing to identify commits that need to be
backported into the LTS Stable kernel to address security issues.
Especially for high-severity CVEs, they won't need to worry about
unduly giving their competitors in that market segment a free ride.  :-)


The hard work of figuring out the fix commit is real, and this is
an example of where the interests of upstream and people who want to
do backports come into partial conflict.  The more we do code cleanup,
refactoring, etc., to reduce technical debt --- something which is of
great interest to upstream developers --- the harder it is to
identify the fixes tag, and the harder it is to backport bug and
security fixes after the tech debt reduction commit has gone in.  So
someone who only cares about backports into product kernels, to the
exclusion of all else, would want to discourage tech debt reduction
commits.  Except they know that won't fly, and they would be flamed to
a crisp if they try.  :-)

I suppose one workaround is if an upstream developer is too tired or
too harried to figure out the correct value of a fixes tag, one cheesy
way around it would be to use:

    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")

to basically mean: this fixes a bug, but I'm not sure where it was
introduced, and it was long enough ago that if the patch doesn't apply
cleanly, it's better to ask the backport folks to do the code
archeology and figure out whether it applies to an ancient kernel like
4.19 or 5.10, because again, the product backport folks are likely to
outnumber the upstream developers, and the product backport folks are
linked to revenue streams.

So I would argue that maybe a more sustainable approach is to find a
way for the product backport folks to work together to take load off
of the upstream developers.  I'm sure there will be times when there
are some easy things that upstream folks can do to make things better,
but trying to move all or most of the burden onto the upstream
developers is as much of an unfunded mandate as Syzbot is.  :-/

	      	      	    	     	     - Ted
Amir Goldstein June 2, 2022, 5:34 a.m. UTC | #17
> > I was thinking later that there is another point of failure in the
> > backport process that is demonstrated in this incident -
> > An elephant in the room even -
> > We have no *standard* way to mark a patch as a bug fix,
> > so everyone is using their own scripts and regex magic.
> >
> > Fixes: is a standard that works, but it does not apply in many
> > cases, for example:
> >
> > 1. Zero day (some mark those as Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2"))
> > 2. Hard work to figure out the fix commit
> > 3. Discourage AUTOSEL
>
> For security fixes, just marking a bug as fixing a security bug, even
> if you try to obfuscate the Fixes tag, is often enough to tip off a
> potential attacker.   So I would consider:
>
>     Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
>
> to Not Be A Solution for security related patches.  The real fix here

I miswrote. I meant fixes for bugs that existed since day 1.
The annotation above is adequate, but is also a bit ugly IMO.

>
>
> The hard work of figuring out the fix commit is real, and this is
> an example of where the interests of upstream and people who want to
> do backports come into partial conflict.  The more we do code cleanup,
> refactoring, etc., to reduce technical debt --- something which is of
> great interest to upstream developers --- the harder it is to
> identify the fixes tag, and the harder it is to backport bug and
> security fixes after the tech debt reduction commit has gone in.  So
> someone who only cares about backports into product kernels, to the
> exclusion of all else, would want to discourage tech debt reduction
> commits.  Except they know that won't fly, and they would be flamed to
> a crisp if they try.  :-)
>
> I suppose one workaround is if an upstream developer is too tired or
> too harried to figure out the correct value of a fixes tag, one cheesy
> way around it would be to use:
>
>     Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
>

I am sure that we can be more expressive than that :)

I suggested:
  Fixes: ????

Not as a joke. It is more descriptive than the above, which could really
mean "I know this bug existed since day 1"

> to basically mean: this fixes a bug, but I'm not sure where it was
> introduced, and it was long enough ago that if the patch doesn't apply
> cleanly, it's better to ask the backport folks to do the code
> archeology and figure out whether it applies to an ancient kernel like
> 4.19 or 5.10, because again, the product backport folks are likely to
> outnumber the upstream developers, and the product backport folks are
> linked to revenue streams.
>
> So I would argue that maybe a more sustainable approach is to find a
> way for the product backport folks to work together to take load off
> of the upstream developers.  I'm sure there will be times when there
> are some easy things that upstream folks can do to make things better,
> but trying to move all or most of the burden onto the upstream
> developers is as much of an unfunded mandate as Syzbot is.  :-/
>

The funny thing is that my rant was about a cover letter written
by an upstream developer, so unless what you mean is that backport
teams should invest in review and make those comments during
review, it's too late by the time the backport team gets to do its job.

You raise the resources problem and I understand that, but
I am not looking for a solution that creates more work for upstream
maintainers. Quite the contrary.

I am simply looking for a standard way to express
"attention backport teams".

What I am looking for is a solution for this problem:

Developer/maintainer has information that can be very helpful to backport
teams.

The information is known at the time of writing and requires no extra
effort on their part, but is not mentioned in a standard way because there
is no standard way that does not incur extra work.
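
For instance, even a free-form cover letter trailer (this tag is
purely hypothetical, not an existing convention) would carry that
information at zero extra cost:

    Backport-notes: patch 8 is an independent bug fix; the rest is rework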

Some maintainers that I am working with are very diligent about requiring Fixes:
when applicable.

It may be because of $$$ as you say, but I have a more romantic interpretation.
I believe it is because they care about the code they are developing and they
understand that mainline is not a product and has no real users.

Unless maintainers care about downstream products, real users will perceive
their code as bad quality - no matter whose responsibility it was to
backport the fixes or point to the relevant fixes.

Thanks,
Amir.