
[16/26] xfs: synchronous AIL pushing

Message ID: 20191009032124.10541-17-david@fromorbit.com
State: New, archived
Series: mm, xfs: non-blocking inode reclaim

Commit Message

Dave Chinner Oct. 9, 2019, 3:21 a.m. UTC
From: Dave Chinner <dchinner@redhat.com>

Provide an interface to push the AIL to a target LSN and wait for
the tail of the log to move past that LSN. This is used to wait for
all items older than a specific LSN to either be cleaned (written
back) or relogged to a higher LSN in the AIL. The primary use for
this is to allow IO-free inode reclaim throttling.
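
For illustration, a reclaim-side caller could throttle on a dirty
inode with something like the sketch below. This is hypothetical
(the real caller is added later in the series); ip->i_itemp and
ili_item.li_lsn are the usual inode log item fields:

	/*
	 * Hypothetical sketch: block until the AIL has pushed past
	 * the inode's log item, without issuing IO from reclaim.
	 */
	if (ip->i_itemp)
		xfs_ail_push_sync(ip->i_mount->m_ail,
				  ip->i_itemp->ili_item.li_lsn);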

Factor the common AIL deletion code that does all the wakeups into a
helper so we only have one copy of this somewhat tricky code to
interface with all the wakeups necessary when the LSN of the log
tail changes.

xfs_ail_push_sync() is temporary infrastructure to facilitate
non-blocking, IO-less inode reclaim throttling that allows further
structural changes to be made. Once those structural changes are
made, the need for this function goes away and it is removed,
leaving us with only the xfs_ail_update_finish() factoring when this
is all done.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_trans_ail.c  | 33 +++++++++++++++++++++++++++++++++
 fs/xfs/xfs_trans_priv.h |  2 ++
 2 files changed, 35 insertions(+)

Comments

Christoph Hellwig Oct. 11, 2019, 10:18 a.m. UTC | #1
On Wed, Oct 09, 2019 at 02:21:14PM +1100, Dave Chinner wrote:
> Factor the common AIL deletion code that does all the wakeups into a
> helper so we only have one copy of this somewhat tricky code to
> interface with all the wakeups necessary when the LSN of the log
> tail changes.
> 
> xfs_ail_push_sync() is temporary infrastructure to facilitate
> non-blocking, IO-less inode reclaim throttling that allows further
> structural changes to be made. Once those structural changes are
> made, the need for this function goes away and it is removed,
> leaving us with only the xfs_ail_update_finish() factoring when this
> is all done.

The xfs_ail_update_finish work here is in an earlier patch, so the
changelog will need some updates.

> +	spin_lock(&ailp->ail_lock);
> +	while ((lip = xfs_ail_min(ailp)) != NULL) {
> +		prepare_to_wait(&ailp->ail_push, &wait, TASK_UNINTERRUPTIBLE);
> +		if (XFS_FORCED_SHUTDOWN(ailp->ail_mount) ||
> +		    XFS_LSN_CMP(threshold_lsn, lip->li_lsn) <= 0)
> +			break;
> +		/* XXX: cmpxchg? */
> +		while (XFS_LSN_CMP(threshold_lsn, ailp->ail_target) > 0)
> +			xfs_trans_ail_copy_lsn(ailp, &ailp->ail_target, &threshold_lsn);

This code looks broken on 32-bit given that xfs_trans_ail_copy_lsn takes
the ail_lock there.  Just replacing the xfs_trans_ail_copy_lsn call with
a direct assignment would fix that, no need for cmpxchg either as far
as I can tell (and it would fix that too long line as well).

But a:

		while (XFS_LSN_CMP(threshold_lsn, ailp->ail_target) > 0)
			ailp->ail_target = threshold_lsn;

still looks odd, I think this should simply be an if. 
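
i.e. with both changes applied, a sketch of what would replace the loop:

		if (XFS_LSN_CMP(threshold_lsn, ailp->ail_target) > 0)
			ailp->ail_target = threshold_lsn;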

> +		wake_up_process(ailp->ail_task);
> +		spin_unlock(&ailp->ail_lock);

xfsaild will take ail_lock pretty quickly.  I think we should drop
the lock before waking it.
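
That is, a sketch of the reordering:

		spin_unlock(&ailp->ail_lock);
		wake_up_process(ailp->ail_task);
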
Brian Foster Oct. 11, 2019, 3:29 p.m. UTC | #2
On Fri, Oct 11, 2019 at 03:18:25AM -0700, Christoph Hellwig wrote:
> On Wed, Oct 09, 2019 at 02:21:14PM +1100, Dave Chinner wrote:
> > Factor the common AIL deletion code that does all the wakeups into a
> > helper so we only have one copy of this somewhat tricky code to
> > interface with all the wakeups necessary when the LSN of the log
> > tail changes.
> > 
> > xfs_ail_push_sync() is temporary infrastructure to facilitate
> > non-blocking, IO-less inode reclaim throttling that allows further
> > structural changes to be made. Once those structural changes are
> > made, the need for this function goes away and it is removed,
> > leaving us with only the xfs_ail_update_finish() factoring when this
> > is all done.
> 
> The xfs_ail_update_finish work here is in an earlier patch, so the
> changelog will need some updates.
> 
> > +	spin_lock(&ailp->ail_lock);
> > +	while ((lip = xfs_ail_min(ailp)) != NULL) {
> > +		prepare_to_wait(&ailp->ail_push, &wait, TASK_UNINTERRUPTIBLE);
> > +		if (XFS_FORCED_SHUTDOWN(ailp->ail_mount) ||
> > +		    XFS_LSN_CMP(threshold_lsn, lip->li_lsn) <= 0)

Wasn't this supposed to change to < 0? The rfc series had that logic,
but it changed from <= to < later in the wrong patch.
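
i.e. the break condition would become (sketch):

	if (XFS_FORCED_SHUTDOWN(ailp->ail_mount) ||
	    XFS_LSN_CMP(threshold_lsn, lip->li_lsn) < 0)
		break;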

> > +			break;
> > +		/* XXX: cmpxchg? */
> > +		while (XFS_LSN_CMP(threshold_lsn, ailp->ail_target) > 0)
> > +			xfs_trans_ail_copy_lsn(ailp, &ailp->ail_target, &threshold_lsn);
> 
> This code looks broken on 32-bit given that xfs_trans_ail_copy_lsn takes
> the ail_lock there.  Just replacing the xfs_trans_ail_copy_lsn call with
> a direct assignment would fix that, no need for cmpxchg either as far
> as I can tell (and it would fix that too long line as well).
> 
> But a:
> 
> 		while (XFS_LSN_CMP(threshold_lsn, ailp->ail_target) > 0)
> 			ailp->ail_target = threshold_lsn;
> 
> still looks odd, I think this should simply be an if. 
> 
> > +		wake_up_process(ailp->ail_task);
> > +		spin_unlock(&ailp->ail_lock);
> 
> xfsaild will take ail_lock pretty quickly.  I think we should drop
> the lock before waking it.

Can't we replace this whole thing with something that repurposes
xfs_ail_push_all_sync()? That only requires some tweaks to the existing
function and the new _push_all_sync() wrapper ends up looking something
like:

	while ((threshold_lsn = xfs_ail_max_lsn(ailp)) != 0)
		xfs_ail_push_sync(ailp, threshold_lsn);

There's an extra lock cycle, but that's still only on tail updates. That
doesn't seem unreasonable to me for the usage of _push_all_sync().
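
Fleshed out, that wrapper might look like this sketch (reusing the
existing xfs_ail_max_lsn() helper):

	void
	xfs_ail_push_all_sync(
		struct xfs_ail	*ailp)
	{
		xfs_lsn_t	threshold_lsn;

		/* keep pushing to the current max LSN until the AIL drains */
		while ((threshold_lsn = xfs_ail_max_lsn(ailp)) != 0)
			xfs_ail_push_sync(ailp, threshold_lsn);
	}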

Brian
Dave Chinner Oct. 11, 2019, 11:27 p.m. UTC | #3
On Fri, Oct 11, 2019 at 11:29:45AM -0400, Brian Foster wrote:
> On Fri, Oct 11, 2019 at 03:18:25AM -0700, Christoph Hellwig wrote:
> > On Wed, Oct 09, 2019 at 02:21:14PM +1100, Dave Chinner wrote:
> > > Factor the common AIL deletion code that does all the wakeups into a
> > > helper so we only have one copy of this somewhat tricky code to
> > > interface with all the wakeups necessary when the LSN of the log
> > > tail changes.
> > > 
> > > xfs_ail_push_sync() is temporary infrastructure to facilitate
> > > non-blocking, IO-less inode reclaim throttling that allows further
> > > structural changes to be made. Once those structural changes are
> > > made, the need for this function goes away and it is removed,
> > > leaving us with only the xfs_ail_update_finish() factoring when this
> > > is all done.
> > 
> > The xfs_ail_update_finish work here is in an earlier patch, so the
> > changelog will need some updates.
> > 
> > > +	spin_lock(&ailp->ail_lock);
> > > +	while ((lip = xfs_ail_min(ailp)) != NULL) {
> > > +		prepare_to_wait(&ailp->ail_push, &wait, TASK_UNINTERRUPTIBLE);
> > > +		if (XFS_FORCED_SHUTDOWN(ailp->ail_mount) ||
> > > +		    XFS_LSN_CMP(threshold_lsn, lip->li_lsn) <= 0)
> 
> Wasn't this supposed to change to < 0? The rfc series had that logic,
> but it changed from <= to < later in the wrong patch.

I probably forgot because this code gets removed at the end of the
series. Hence I haven't cared about exact correctness or neatness,
as it's just temporary scaffolding to keep stuff from breaking
horribly as the changeover to non-blocking algorithms is done.

It works well enough that I can't break it as it stands - I've
tested each patch individually with both load and fstests, and so
this code as it stands doesn't introduce any bisect landmines - it
prevents a bunch of problems in OOM conditions by retaining the
blocking behaviour of reclaim until we no longer need it...

> > > +			break;
> > > +		/* XXX: cmpxchg? */
> > > +		while (XFS_LSN_CMP(threshold_lsn, ailp->ail_target) > 0)
> > > +			xfs_trans_ail_copy_lsn(ailp, &ailp->ail_target, &threshold_lsn);
> > 
> > This code looks broken on 32-bit given that xfs_trans_ail_copy_lsn takes
> > the ail_lock there.  Just replacing the xfs_trans_ail_copy_lsn call with
> > a direct assignment would fix that, no need for cmpxchg either as far
> > as I can tell (and it would fix that too long line as well).

Oh, right. I'll fix that.

> > still looks odd, I think this should simply be an if. 
> > 
> > > +		wake_up_process(ailp->ail_task);
> > > +		spin_unlock(&ailp->ail_lock);
> > 
> > xfsaild will take ail_lock pretty quickly.  I think we should drop
> > the lock before waking it.
> 
> Can't we replace this whole thing with something that repurposes
> xfs_ail_push_all_sync()? That only requires some tweaks to the existing
> function and the new _push_all_sync() wrapper ends up looking something
> like:
> 
> 	while ((threshold_lsn = xfs_ail_max_lsn(ailp)) != 0)
> 		xfs_ail_push_sync(ailp, threshold_lsn);
> 
> There's an extra lock cycle, but that's still only on tail updates. That
> doesn't seem unreasonable to me for the usage of _push_all_sync().

The whole thing goes away, so there is zero point in trying to
optimise or perfect this code. It's temporary code, treat it as
such.

Cheers,

Dave.
Brian Foster Oct. 12, 2019, 12:08 p.m. UTC | #4
On Sat, Oct 12, 2019 at 10:27:16AM +1100, Dave Chinner wrote:
> On Fri, Oct 11, 2019 at 11:29:45AM -0400, Brian Foster wrote:
> > On Fri, Oct 11, 2019 at 03:18:25AM -0700, Christoph Hellwig wrote:
> > > On Wed, Oct 09, 2019 at 02:21:14PM +1100, Dave Chinner wrote:
> > > > Factor the common AIL deletion code that does all the wakeups into a
> > > > helper so we only have one copy of this somewhat tricky code to
> > > > interface with all the wakeups necessary when the LSN of the log
> > > > tail changes.
> > > > 
> > > > xfs_ail_push_sync() is temporary infrastructure to facilitate
> > > > non-blocking, IO-less inode reclaim throttling that allows further
> > > > structural changes to be made. Once those structural changes are
> > > > made, the need for this function goes away and it is removed,
> > > > leaving us with only the xfs_ail_update_finish() factoring when this
> > > > is all done.
> > > 
> > > The xfs_ail_update_finish work here is in an earlier patch, so the
> > > changelog will need some updates.
> > > 
> > > > +	spin_lock(&ailp->ail_lock);
> > > > +	while ((lip = xfs_ail_min(ailp)) != NULL) {
> > > > +		prepare_to_wait(&ailp->ail_push, &wait, TASK_UNINTERRUPTIBLE);
> > > > +		if (XFS_FORCED_SHUTDOWN(ailp->ail_mount) ||
> > > > +		    XFS_LSN_CMP(threshold_lsn, lip->li_lsn) <= 0)
> > 
> > Wasn't this supposed to change to < 0? The rfc series had that logic,
> > but it changed from <= to < later in the wrong patch.
> 
> I probably forgot because this code gets removed at the end of the
> series. Hence I haven't cared about exact correctness or neatness,
> as it's just temporary scaffolding to keep stuff from breaking
> horribly as the changeover to non-blocking algorithms is done.
> 
> It works well enough that I can't break it as it stands - I've
> tested each patch individually with both load and fstests, and so
> this code as it stands doesn't introduce any bisect landmines - it
> prevents a bunch of problems in OOM conditions by retaining the
> blocking behaviour of reclaim until we no longer need it...
> 

Ok, I guess I forgot that this was temporary. FWIW, I think it's worth
fixing the small things like this comparison and the couple things
Christoph points out, if nothing else to facilitate review. Disregard
the larger refactoring feedback...

Brian

> > > > +			break;
> > > > +		/* XXX: cmpxchg? */
> > > > +		while (XFS_LSN_CMP(threshold_lsn, ailp->ail_target) > 0)
> > > > +			xfs_trans_ail_copy_lsn(ailp, &ailp->ail_target, &threshold_lsn);
> > > 
> > > This code looks broken on 32-bit given that xfs_trans_ail_copy_lsn takes
> > > the ail_lock there.  Just replacing the xfs_trans_ail_copy_lsn call with
> > > a direct assignment would fix that, no need for cmpxchg either as far
> > > as I can tell (and it would fix that too long line as well).
> 
> Oh, right. I'll fix that.
> 
> > > still looks odd, I think this should simply be an if. 
> > > 
> > > > +		wake_up_process(ailp->ail_task);
> > > > +		spin_unlock(&ailp->ail_lock);
> > > 
> > > xfsaild will take ail_lock pretty quickly.  I think we should drop
> > > the lock before waking it.
> > 
> > Can't we replace this whole thing with something that repurposes
> > xfs_ail_push_all_sync()? That only requires some tweaks to the existing
> > function and the new _push_all_sync() wrapper ends up looking something
> > like:
> > 
> > 	while ((threshold_lsn = xfs_ail_max_lsn(ailp)) != 0)
> > 		xfs_ail_push_sync(ailp, threshold_lsn);
> > 
> > There's an extra lock cycle, but that's still only on tail updates. That
> > doesn't seem unreasonable to me for the usage of _push_all_sync().
> 
> The whole thing goes away, so there is zero point in trying to
> optimise or perfect this code. It's temporary code, treat it as
> such.
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

Patch

diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
index 685a21cd24c0..5e500a75b62b 100644
--- a/fs/xfs/xfs_trans_ail.c
+++ b/fs/xfs/xfs_trans_ail.c
@@ -662,6 +662,37 @@  xfs_ail_push_all(
 		xfs_ail_push(ailp, threshold_lsn);
 }
 
+/*
+ * Push the AIL to a specific lsn and wait for it to complete.
+ */
+void
+xfs_ail_push_sync(
+	struct xfs_ail		*ailp,
+	xfs_lsn_t		threshold_lsn)
+{
+	struct xfs_log_item	*lip;
+	DEFINE_WAIT(wait);
+
+	spin_lock(&ailp->ail_lock);
+	while ((lip = xfs_ail_min(ailp)) != NULL) {
+		prepare_to_wait(&ailp->ail_push, &wait, TASK_UNINTERRUPTIBLE);
+		if (XFS_FORCED_SHUTDOWN(ailp->ail_mount) ||
+		    XFS_LSN_CMP(threshold_lsn, lip->li_lsn) <= 0)
+			break;
+		/* XXX: cmpxchg? */
+		while (XFS_LSN_CMP(threshold_lsn, ailp->ail_target) > 0)
+			xfs_trans_ail_copy_lsn(ailp, &ailp->ail_target, &threshold_lsn);
+		wake_up_process(ailp->ail_task);
+		spin_unlock(&ailp->ail_lock);
+		schedule();
+		spin_lock(&ailp->ail_lock);
+	}
+	spin_unlock(&ailp->ail_lock);
+
+	finish_wait(&ailp->ail_push, &wait);
+}
+
+
 /*
  * Push out all items in the AIL immediately and wait until the AIL is empty.
  */
@@ -702,6 +733,7 @@  xfs_ail_update_finish(
 	if (!XFS_FORCED_SHUTDOWN(mp))
 		xlog_assign_tail_lsn_locked(mp);
 
+	wake_up_all(&ailp->ail_push);
 	if (list_empty(&ailp->ail_head))
 		wake_up_all(&ailp->ail_empty);
 	spin_unlock(&ailp->ail_lock);
@@ -858,6 +890,7 @@  xfs_trans_ail_init(
 	spin_lock_init(&ailp->ail_lock);
 	INIT_LIST_HEAD(&ailp->ail_buf_list);
 	init_waitqueue_head(&ailp->ail_empty);
+	init_waitqueue_head(&ailp->ail_push);
 
 	ailp->ail_task = kthread_run(xfsaild, ailp, "xfsaild/%s",
 			ailp->ail_mount->m_fsname);
diff --git a/fs/xfs/xfs_trans_priv.h b/fs/xfs/xfs_trans_priv.h
index 35655eac01a6..1b6f4bbd47c0 100644
--- a/fs/xfs/xfs_trans_priv.h
+++ b/fs/xfs/xfs_trans_priv.h
@@ -61,6 +61,7 @@  struct xfs_ail {
 	int			ail_log_flush;
 	struct list_head	ail_buf_list;
 	wait_queue_head_t	ail_empty;
+	wait_queue_head_t	ail_push;
 };
 
 /*
@@ -113,6 +114,7 @@  xfs_trans_ail_remove(
 }
 
 void			xfs_ail_push(struct xfs_ail *, xfs_lsn_t);
+void			xfs_ail_push_sync(struct xfs_ail *, xfs_lsn_t);
 void			xfs_ail_push_all(struct xfs_ail *);
 void			xfs_ail_push_all_sync(struct xfs_ail *);
 struct xfs_log_item	*xfs_ail_min(struct xfs_ail  *ailp);