[0/7] xfs: log race fixes and cleanups

Message ID 20190904042451.9314-1-david@fromorbit.com

Message

Dave Chinner Sept. 4, 2019, 4:24 a.m. UTC
Hi folks,

This series of patches contains fixes for the generic/530 hangs that
Chandan and Jan reported recently. The first patch in the series is
the main fix for this, though in theory the last patch in the series
is necessary to totally close the problem off.

The second patch is an outstanding fix from Rik van Riel for a log
flush wakeup race in the xlog_state_do_callback() code. Given that
the next 5 patches in the series clean up and address the tail
pushing race in xlog_state_do_callback(), it makes sense to get this
into the kernel at the same time.

Patches 3-6 factor several independent chunks out of
xlog_state_do_callback, greatly simplifying that function and
breaking the iclog IO completion processing down into discrete
chunks of separate functionality. This makes the code much easier
to read, follow and understand.
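
As a rough sketch of the resulting shape (helper names here are made
up for illustration, not necessarily what the patches introduce):

	/*
	 * Illustrative shape only: the monolithic completion loop
	 * delegates each distinct job to its own helper.
	 */
	static void
	xlog_state_do_callback(struct xlog *log)
	{
		bool	progress;

		do {
			/* move each iclog through its completion state
			 * machine, noting whether anything changed */
			progress = example_process_iclog_states(log);

			/* run the callbacks attached to any iclog that
			 * just completed */
			example_run_iclog_callbacks(log);
		} while (progress);

		/* wake anyone sleeping on log I/O completion */
		example_wake_log_waiters(log);
	}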

And the last patch kicks the AIL if necessary when the log tail is
moved to avoid the situation where the AIL pushing has stalled
because the previous last_sync_lsn is further into the log than the
push target reaches.
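
In pseudo-code terms the idea is roughly this (a sketch only, with
structure and field names written from memory of the current tree,
not taken from the patch itself):

	/*
	 * Sketch: on a tail update, check whether the pusher may have
	 * gone idle against a stale target, and kick it if so.
	 */
	static void
	example_ail_tail_moved(struct xfs_ail *ailp, xfs_lsn_t target)
	{
		if (XFS_LSN_CMP(ailp->ail_target, target) < 0) {
			ailp->ail_target = target;
			wake_up_process(ailp->ail_task);
		}
	}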

Comments welcome.

-Dave.

Comments

Christoph Hellwig Sept. 4, 2019, 5:26 a.m. UTC | #1
On Wed, Sep 04, 2019 at 02:24:44PM +1000, Dave Chinner wrote:
> Hi folks,
> 
> This series of patches contains fixes for the generic/530 hangs that
> Chandan and Jan reported recently. The first patch in the series is
> the main fix for this, though in theory the last patch in the series
> is necessary to totally close the problem off.

While I haven't been active in the thread, I already reported this weeks
ago as well...
Christoph Hellwig Sept. 4, 2019, 5:56 a.m. UTC | #2
On Tue, Sep 03, 2019 at 10:26:00PM -0700, Christoph Hellwig wrote:
> On Wed, Sep 04, 2019 at 02:24:44PM +1000, Dave Chinner wrote:
> > Hi folks,
> > 
> > This series of patches contains fixes for the generic/530 hangs that
> > Chandan and Jan reported recently. The first patch in the series is
> > the main fix for this, though in theory the last patch in the series
> > is necessary to totally close the problem off.
> 
> While I haven't been active in the thread, I already reported this weeks
> ago as well...

And unfortunately generic/530 still hangs for me with this series.
This is an x86-64 VM with 4G of RAM and virtio-blk, default mkfs.xfs
options from current xfsprogs, 20G test and scratch fs.
Dave Chinner Sept. 4, 2019, 10:57 p.m. UTC | #3
On Tue, Sep 03, 2019 at 10:56:19PM -0700, Christoph Hellwig wrote:
> On Tue, Sep 03, 2019 at 10:26:00PM -0700, Christoph Hellwig wrote:
> > On Wed, Sep 04, 2019 at 02:24:44PM +1000, Dave Chinner wrote:
> > > Hi folks,
> > > 
> > > This series of patches contains fixes for the generic/530 hangs that
> > > Chandan and Jan reported recently. The first patch in the series is
> > > the main fix for this, though in theory the last patch in the series
> > > is necessary to totally close the problem off.
> > 
> > While I haven't been active in the thread, I already reported this weeks
> > ago as well...

Ok, I didn't know that, and now that I look there's no information
that I can use to determine what went wrong...

> And unfortunately generic/530 still hangs for me with this series.

Where does it hang?

> This is an x86-64 VM with 4G of RAM and virtio-blk, default mkfs.xfs
> options from current xfsprogs, 20G test and scratch fs.

That's pretty much what one of my test rigs is, except iscsi luns
rather than virtio-blk. I haven't been able to reproduce the issues,
so I'm kinda flying blind w.r.t. testing them here. Can you
get a trace of what is happening (xfs_trans*, xfs_log*, xfs_ail*
tracepoints) so I can have a deeper look?
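
(Something like

	trace-cmd record -e 'xfs:xfs_trans*' -e 'xfs:xfs_log*' -e 'xfs:xfs_ail*'

run across the test should capture those - the exact event globs may
need tweaking for your kernel.)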

Cheers,

Dave.
Dave Chinner Sept. 5, 2019, 7:10 a.m. UTC | #4
On Wed, Sep 04, 2019 at 11:51:33PM -0700, Christoph Hellwig wrote:
> On Thu, Sep 05, 2019 at 08:57:16AM +1000, Dave Chinner wrote:
> > > And unfortunately generic/530 still hangs for me with this series.
> > 
> > Where does it hang?
> > 
> > > This is an x86-64 VM with 4G of RAM and virtio-blk, default mkfs.xfs
> > > options from current xfsprogs, 20G test and scratch fs.
> > 
> > That's pretty much what one of my test rigs is, except iscsi luns
> > rather than virtio-blk. I haven't been able to reproduce the issues,
> > so I'm kinda flying blind w.r.t. testing them here. Can you
> > get a trace of what is happening (xfs_trans*, xfs_log*, xfs_ail*
> > tracepoints) so I can have a deeper look?
> 
> console output below, traces attached.

Thanks, I'll have a look in a minute. I'm pretty sure I know what it
will show - I got a trace from Chandan earlier this afternoon and
the problem is log recovery doesn't yield the CPU until it runs out
of transaction reservation space, so the push work doesn't run
because workqueue default behaviour is strict "run work only on the
CPU it is queued on"....
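
To illustrate the distinction (sketch only - the names are made up,
and this is not the patch I'm testing):

	#include <linux/errno.h>
	#include <linux/workqueue.h>

	static struct workqueue_struct *example_push_wq;
	static struct work_struct example_push_work;

	static void example_push_worker(struct work_struct *work)
	{
		/* push the log tail, wake reservation waiters, etc. */
	}

	static int example_setup(void)
	{
		/*
		 * A default (bound) workqueue runs a work item on the
		 * CPU that queued it, so a tight-looping kthread on
		 * that CPU can starve the work on a non-preemptible
		 * kernel. WQ_UNBOUND lets the scheduler place the work
		 * on any CPU instead.
		 */
		example_push_wq = alloc_workqueue("example-push",
						  WQ_UNBOUND, 0);
		if (!example_push_wq)
			return -ENOMEM;

		INIT_WORK(&example_push_work, example_push_worker);
		queue_work(example_push_wq, &example_push_work);
		return 0;
	}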

I've got an additional patch in testing right now...

Cheers,

Dave.
Dave Chinner Sept. 5, 2019, 7:28 a.m. UTC | #5
On Thu, Sep 05, 2019 at 05:10:31PM +1000, Dave Chinner wrote:
> On Wed, Sep 04, 2019 at 11:51:33PM -0700, Christoph Hellwig wrote:
> > On Thu, Sep 05, 2019 at 08:57:16AM +1000, Dave Chinner wrote:
> > > > And unfortunately generic/530 still hangs for me with this series.
> > > 
> > > Where does it hang?
> > > 
> > > > This is an x86-64 VM with 4G of RAM and virtio-blk, default mkfs.xfs
> > > > options from current xfsprogs, 20G test and scratch fs.
> > > 
> > > That's pretty much what one of my test rigs is, except iscsi luns
> > > rather than virtio-blk. I haven't been able to reproduce the issues,
> > > so I'm kinda flying blind w.r.t. testing them here. Can you
> > > get a trace of what is happening (xfs_trans*, xfs_log*, xfs_ail*
> > > tracepoints) so I can have a deeper look?
> > 
> > console output below, traces attached.
> 
> Thanks, I'll have a look in a minute. I'm pretty sure I know what it
> will show - I got a trace from Chandan earlier this afternoon and
> the problem is log recovery doesn't yield the CPU until it runs out
> of transaction reservation space, so the push work doesn't run
> because workqueue default behaviour is strict "run work only on the
> CPU it is queued on"....

Yup, exactly the same trace. Right down to the lsns in the log and
the 307 iclog writes just after the log runs out of space. To quote
from #xfs earlier this afternoon:

[5/9/19 14:21] <dchinner> I see what is -likely- to be a cil checkpoint but without the closing commit record
[5/9/19 14:21] <chandan> which line number in the trace log are you noticing that?
[5/9/19 14:22] <dchinner> 307 sequential calls to xfs_log_assign_tail_lsn() from a kworker and then releasing a log reservation
[5/9/19 14:22] <dchinner> Assuming 32kB iclog size (default)
[5/9/19 14:23] <dchinner> that's 307 * 32 / 4 filesystem blocks, which is 2456 blocks
[5/9/19 14:24] <dchinner> that's 96% of the log in a single CIL commit
[5/9/19 14:24] <dchinner> this isn't a "why hasn't there been iclog completion" problem
[5/9/19 14:24] <dchinner> this is a "why didn't the CIL push occur when it passed 12% of the log...?" problem
[5/9/19 14:26] <dchinner> oooohhhh
[5/9/19 14:27] <dchinner> this isn't a preemptible kernel, is it?
[5/9/19 14:27] <chandan> correct. Linux kernel on ppc64le isn't preemptible
[5/9/19 14:28] <dchinner> so a kernel thread running in a tight loop will delay a kworker thread scheduled on the same CPU until the running kthread yields the CPU
[5/9/19 14:28] <dchinner> but, because we've recovered all the inodes, etc, everything is hot in cache
[5/9/19 14:28] <dchinner> so the unlink workload runs without blocking, and so never yields the CPU until it runs out of transaction space.
[5/9/19 14:29] <dchinner> and only then does the background kworker get scheduled to run.
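
(Expanding the arithmetic from the log above: 307 iclogs at 32kB each
is 9824kB of journal writes; at 4kB per filesystem block that's
9824 / 4 = 2456 blocks. Against the ~2560 block (10MB) log I'd expect
mkfs to give a filesystem of this size - my assumption for the default
here - that's where the ~96% figure comes from.)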

I'll send the updated patch set soon...

Cheers,

Dave.