xfs: largely extend aild sleep time if no work has to be done

Message ID 20170329204224.6412-1-dev@lynxeye.de
State New

Commit Message

Lucas Stach March 29, 2017, 8:42 p.m. UTC
If the AIL has been pushed up to the target LSN, there is no
point in waking up every 50ms to check if there is more work
to do. All functions that move the target LSN forward make sure
to wake aild as appropriate.

Keep the timeout wakeup as a watchdog in case we miss the
wakeup from a target LSN update to guarantee forward progress,
but extend the timeout to 10 seconds.

This keeps the safety net, but also makes laptop users happy
as it gets rid of almost all the wakeups caused by a lightly
loaded FS.

Signed-off-by: Lucas Stach <dev@lynxeye.de>
---
 fs/xfs/xfs_trans_ail.c | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

Comments

Dave Chinner March 29, 2017, 11:22 p.m. UTC | #1
On Wed, Mar 29, 2017 at 10:42:24PM +0200, Lucas Stach wrote:
> If the AIL has been pushed up to the target LSN, there is no
> point in waking up every 50ms to check if there is more work
> to do. All functions that move the target LSN forward make sure
> to wake aild as appropriate.
> 
> Keep the timeout wakeup as a watchdog in case we miss the
> wakeup from a target LSN update to guarantee forward progress,
> but extend the timeout to 10 seconds.
> 
> This keeps the safety net, but also makes laptop users happy
> as it gets rid of almost all the wakeups caused by a lightly
> loaded FS.

The aild already has an idle capability that occurs when the target
has been reached. See xfsaild() - it will ignore the timeout and
schedule indefinitely when the AIL has been emptied and the target
has not been updated during the last push. i.e. this timeout is
not a watchdog, just a backoff for the next check if there is still
work to be done.

Keep in mind that XFS doesn't fully empty the AIL until the log has
been covered, and this takes 60-90s to occur after the last
modification has occurred to the filesystem. Delaying pushes on an
uncovered log risks breaking the covering state machine (it's
dependent on writeback from the AIL occurring within a certain time)
and so changes like this may break idling on more machines than they
"fix".

FYI, filesystems that refuse to idle are typically a sign of
userspace touching the filesystem every 2-3 minutes. IIRC, the XFS
ail event tracing will tell you if metadata is being dirtied
regularly and so whether the AIL is staying empty or not and
hence whether it should actually be idle...

Yes, there have been bugs in this code in the past, and there may be
bugs now. However, just bumping the timeout up to something massive
is not a solution if there are still bugs lurking here...

Cheers,

Dave.
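
The idle behaviour described in the reply above can be illustrated with a small userspace model. This is only a sketch of the stated rule, not the kernel's xfsaild() code: the function name aild_sleep_ms(), its parameters, and the SLEEP_UNTIL_WOKEN sentinel are hypothetical names introduced here. It assumes the thread sleeps indefinitely when the AIL is empty and the target was not updated during the last push, and otherwise sleeps for the backoff the last push selected.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sentinel: sleep until explicitly woken, with no timeout. */
#define SLEEP_UNTIL_WOKEN (-1L)

/*
 * Userspace model only, not kernel code.
 *
 * ail_empty:      the AIL has been fully emptied
 * target_updated: the push target moved during the last push
 * tout_ms:        backoff chosen by the last push, in milliseconds
 */
long aild_sleep_ms(bool ail_empty, bool target_updated, long tout_ms)
{
	assert(tout_ms >= 0);

	/* Nothing to push and nobody asked for more: idle indefinitely. */
	if (ail_empty && !target_updated)
		return SLEEP_UNTIL_WOKEN;

	/* Otherwise the timeout is a backoff before the next check. */
	return tout_ms;
}
```

Under this model a longer tout_ms changes nothing once the AIL is empty. It only delays the next check while items are still on the AIL, which is the case the reply warns about for an uncovered log.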

Patch

diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
index d6c9c3e..1eb40dc 100644
--- a/fs/xfs/xfs_trans_ail.c
+++ b/fs/xfs/xfs_trans_ail.c
@@ -457,12 +457,21 @@  xfsaild_push(
 	if (xfs_buf_delwri_submit_nowait(&ailp->xa_buf_list))
 		ailp->xa_log_flush++;
 
-	if (!count || XFS_LSN_CMP(lsn, target) >= 0) {
+	if (!count) {
 out_done:
 		/*
-		 * We reached the target or the AIL is empty, so wait a bit
-		 * longer for I/O to complete and remove pushed items from the
-		 * AIL before we start the next scan from the start of the AIL.
+		 * If there was nothing to be pushed we can go to sleep longer,
+		 * as this is purely a watchdog timeout. If the target gets
+		 * moved forward we will get scheduled in before hitting this
+		 * timeout.
+		 */
+		tout = 10000;
+		ailp->xa_last_pushed_lsn = 0;
+	} else if (XFS_LSN_CMP(lsn, target) >= 0) {
+		/*
+		 * We reached the target, so wait a bit longer for I/O to
+		 * complete and remove pushed items from the AIL before we
+		 * start the next scan from the start of the AIL.
 		 */
 		tout = 50;
 		ailp->xa_last_pushed_lsn = 0;
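
As a rough illustration of what the hunk above changes, the timeout selection can be modelled in userspace as follows. This is a sketch under assumptions, not the kernel function: patched_tout_ms() and lsn_cmp() are hypothetical names, LSNs are treated as plain integers rather than going through XFS_LSN_CMP, and other_tout_ms stands in for the branches of xfsaild_push() that the hunk does not show.

```c
#include <assert.h>
#include <stdint.h>

enum {
	TOUT_NOTHING_PUSHED_MS = 10000,	/* new value from the patch */
	TOUT_TARGET_REACHED_MS = 50	/* value kept from the old code */
};

/* Simplified stand-in for XFS_LSN_CMP: plain integer comparison. */
static int lsn_cmp(uint64_t a, uint64_t b)
{
	return (a > b) - (a < b);
}

/*
 * Userspace model of the patched branch order.
 *
 * count:         number of items examined by this push
 * lsn, target:   last pushed LSN and the push target
 * other_tout_ms: timeout used by the branches not shown in the hunk
 */
long patched_tout_ms(unsigned int count, uint64_t lsn, uint64_t target,
		     long other_tout_ms)
{
	assert(other_tout_ms >= 0);

	if (count == 0)
		return TOUT_NOTHING_PUSHED_MS;
	if (lsn_cmp(lsn, target) >= 0)
		return TOUT_TARGET_REACHED_MS;
	return other_tout_ms;
}
```

Before the patch both conditions shared one branch and returned 50 ms. The model makes the split visible: only the count == 0 case gets the 10 second sleep.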