From patchwork Wed Sep 4 04:24:45 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Chinner X-Patchwork-Id: 11129243 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 532651510 for ; Wed, 4 Sep 2019 04:24:56 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 3D04822DBF for ; Wed, 4 Sep 2019 04:24:56 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1725791AbfIDEYz (ORCPT ); Wed, 4 Sep 2019 00:24:55 -0400 Received: from mail105.syd.optusnet.com.au ([211.29.132.249]:40635 "EHLO mail105.syd.optusnet.com.au" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725864AbfIDEYz (ORCPT ); Wed, 4 Sep 2019 00:24:55 -0400 Received: from dread.disaster.area (pa49-181-255-194.pa.nsw.optusnet.com.au [49.181.255.194]) by mail105.syd.optusnet.com.au (Postfix) with ESMTPS id CE72C36148E for ; Wed, 4 Sep 2019 14:24:53 +1000 (AEST) Received: from discord.disaster.area ([192.168.253.110]) by dread.disaster.area with esmtp (Exim 4.92) (envelope-from ) id 1i5Mqi-0007OL-TK for linux-xfs@vger.kernel.org; Wed, 04 Sep 2019 14:24:52 +1000 Received: from dave by discord.disaster.area with local (Exim 4.92) (envelope-from ) id 1i5Mqi-0002UI-RW for linux-xfs@vger.kernel.org; Wed, 04 Sep 2019 14:24:52 +1000 From: Dave Chinner To: linux-xfs@vger.kernel.org Subject: [PATCH 1/7] xfs: push the AIL in xlog_grant_head_wake Date: Wed, 4 Sep 2019 14:24:45 +1000 Message-Id: <20190904042451.9314-2-david@fromorbit.com> X-Mailer: git-send-email 2.23.0.rc1 In-Reply-To: <20190904042451.9314-1-david@fromorbit.com> References: <20190904042451.9314-1-david@fromorbit.com> MIME-Version: 1.0 X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.2 cv=P6RKvmIu c=1 sm=1 tr=0 a=YO9NNpcXwc8z/SaoS+iAiA==:117 a=YO9NNpcXwc8z/SaoS+iAiA==:17 a=jpOVt7BSZ2e4Z31A5e1TngXxSK0=:19 a=J70Eh1EUuV4A:10 a=20KFwNOVAAAA:8 a=iJFTkyUmMBOPLpsSiA4A:9 Sender: linux-xfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org From: Dave Chinner In the situation where the log is full and the CIL has not recently flushed, the AIL push threshold is throttled back to the where the last write of the head of the log was completed. This is stored in log->l_last_sync_lsn. Hence if the CIL holds > 25% of the log space pinned by flushes and/or aggregation in progress, we can get the situation where the head of the log lags a long way behind the reservation grant head. When this happens, the AIL push target is trimmed back from where the reservation grant head wants to push the log tail to, back to where the head of the log currently is. This means the push target doesn't reach far enough into the log to actually move the tail before the transaction reservation goes to sleep. When the CIL push completes, it moves the log head forward such that the AIL push target can now be moved, but that has no mechanism for puhsing the log tail. Further, if the next tail movement of the log is not large enough wake the waiter (i.e. still not enough space for it to have a reservation granted), we don't wake anything up, and hence we do not update the AIL push target to take into account the head of the log moving and allowing the push target to be moved forwards. To avoid this particular condition, if we fail to wake the first waiter on the grant head because we don't have enough space, push on the AIL again. This will pick up any movement of the log head and allow the push target to move forward due to completion of CIL pushing. Signed-off-by: Dave Chinner --- fs/xfs/xfs_log.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c index b159a9e9fef0..941f10ff99d9 100644 --- a/fs/xfs/xfs_log.c +++ b/fs/xfs/xfs_log.c @@ -214,15 +214,20 @@ xlog_grant_head_wake( { struct xlog_ticket *tic; int need_bytes; + bool woken_task = false; list_for_each_entry(tic, &head->waiters, t_queue) { need_bytes = xlog_ticket_reservation(log, head, tic); - if (*free_bytes < need_bytes) + if (*free_bytes < need_bytes) { + if (!woken_task) + xlog_grant_push_ail(log, need_bytes); return false; + } *free_bytes -= need_bytes; trace_xfs_log_grant_wake_up(log, tic); wake_up_process(tic->t_task); + woken_task = true; } return true; From patchwork Wed Sep 4 04:24:46 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Chinner X-Patchwork-Id: 11129241 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E8DCE1398 for ; Wed, 4 Sep 2019 04:24:55 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id D20A422DBF for ; Wed, 4 Sep 2019 04:24:55 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1725908AbfIDEYz (ORCPT ); Wed, 4 Sep 2019 00:24:55 -0400 Received: from mail104.syd.optusnet.com.au ([211.29.132.246]:41173 "EHLO mail104.syd.optusnet.com.au" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725791AbfIDEYz (ORCPT ); Wed, 4 Sep 2019 00:24:55 -0400 Received: from dread.disaster.area (pa49-181-255-194.pa.nsw.optusnet.com.au [49.181.255.194]) by mail104.syd.optusnet.com.au (Postfix) with ESMTPS id 6514543E36C for ; Wed, 4 Sep 2019 14:24:53 +1000 (AEST) Received: from discord.disaster.area ([192.168.253.110]) by dread.disaster.area with esmtp (Exim 4.92) (envelope-from ) id 1i5Mqi-0007OO-UL for linux-xfs@vger.kernel.org; Wed, 04 Sep 2019 14:24:52 +1000 Received: from dave by discord.disaster.area with local (Exim 4.92) (envelope-from ) id 1i5Mqi-0002UL-So for linux-xfs@vger.kernel.org; Wed, 04 Sep 2019 14:24:52 +1000 From: Dave Chinner To: linux-xfs@vger.kernel.org Subject: [PATCH 2/7] xfs: fix missed wakeup on l_flush_wait Date: Wed, 4 Sep 2019 14:24:46 +1000 Message-Id: <20190904042451.9314-3-david@fromorbit.com> X-Mailer: git-send-email 2.23.0.rc1 In-Reply-To: <20190904042451.9314-1-david@fromorbit.com> References: <20190904042451.9314-1-david@fromorbit.com> MIME-Version: 1.0 X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.2 cv=D+Q3ErZj c=1 sm=1 tr=0 a=YO9NNpcXwc8z/SaoS+iAiA==:117 a=YO9NNpcXwc8z/SaoS+iAiA==:17 a=jpOVt7BSZ2e4Z31A5e1TngXxSK0=:19 a=J70Eh1EUuV4A:10 a=fwyzoN0nAAAA:8 a=FOH2dFAWAAAA:8 a=20KFwNOVAAAA:8 a=TIbIJjZQYPkjqnKAcywA:9 a=Sc3RvPAMVtkGz6dGeUiH:22 a=i3VuKzQdj-NEYjvDI-p3:22 Sender: linux-xfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org From: Rik van Riel The code in xlog_wait uses the spinlock to make adding the task to the wait queue, and setting the task state to UNINTERRUPTIBLE atomic with respect to the waker. Doing the wakeup after releasing the spinlock opens up the following race condition: Task 1 task 2 add task to wait queue wake up task set task state to UNINTERRUPTIBLE This issue was found through code inspection as a result of kworkers being observed stuck in UNINTERRUPTIBLE state with an empty wait queue. It is rare and largely unreproducable. Simply moving the spin_unlock to after the wake_up_all results in the waker not being able to see a task on the waitqueue before it has set its state to UNINTERRUPTIBLE. This bug dates back to the conversion of this code to generic waitqueue infrastructure from a counting semaphore back in 2008 which didn't place the wakeups consistently w.r.t. to the relevant spin locks. [dchinner: Also fix a similar issue in the shutdown path on xc_commit_wait. Update commit log with more details of the issue.] Fixes: d748c62367eb ("[XFS] Convert l_flushsema to a sv_t") Reported-by: Chris Mason Signed-off-by: Rik van Riel Signed-off-by: Dave Chinner --- fs/xfs/xfs_log.c | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c index 941f10ff99d9..6f494f6369e8 100644 --- a/fs/xfs/xfs_log.c +++ b/fs/xfs/xfs_log.c @@ -2628,7 +2628,6 @@ xlog_state_do_callback( int funcdidcallbacks; /* flag: function did callbacks */ int repeats; /* for issuing console warnings if * looping too many times */ - int wake = 0; spin_lock(&log->l_icloglock); first_iclog = iclog = log->l_iclog; @@ -2824,11 +2823,9 @@ xlog_state_do_callback( #endif if (log->l_iclog->ic_state & (XLOG_STATE_ACTIVE|XLOG_STATE_IOERROR)) - wake = 1; - spin_unlock(&log->l_icloglock); - - if (wake) wake_up_all(&log->l_flush_wait); + + spin_unlock(&log->l_icloglock); } @@ -3928,7 +3925,9 @@ xfs_log_force_umount( * item committed callback functions will do this again under lock to * avoid races. */ + spin_lock(&log->l_cilp->xc_push_lock); wake_up_all(&log->l_cilp->xc_commit_wait); + spin_unlock(&log->l_cilp->xc_push_lock); xlog_state_do_callback(log, true, NULL); #ifdef XFSERRORDEBUG From patchwork Wed Sep 4 04:24:47 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Chinner X-Patchwork-Id: 11129249 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2D86F1510 for ; Wed, 4 Sep 2019 04:24:58 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 17D4622DBF for ; Wed, 4 Sep 2019 04:24:58 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1725864AbfIDEY5 (ORCPT ); Wed, 4 Sep 2019 00:24:57 -0400 Received: from mail105.syd.optusnet.com.au ([211.29.132.249]:40651 "EHLO mail105.syd.optusnet.com.au" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725877AbfIDEY5 (ORCPT ); Wed, 4 Sep 2019 00:24:57 -0400 Received: from dread.disaster.area (pa49-181-255-194.pa.nsw.optusnet.com.au [49.181.255.194]) by mail105.syd.optusnet.com.au (Postfix) with ESMTPS id E85E136128F for ; Wed, 4 Sep 2019 14:24:53 +1000 (AEST) Received: from discord.disaster.area ([192.168.253.110]) by dread.disaster.area with esmtp (Exim 4.92) (envelope-from ) id 1i5Mqi-0007OR-VW for linux-xfs@vger.kernel.org; Wed, 04 Sep 2019 14:24:52 +1000 Received: from dave by discord.disaster.area with local (Exim 4.92) (envelope-from ) id 1i5Mqi-0002UO-Tt for linux-xfs@vger.kernel.org; Wed, 04 Sep 2019 14:24:52 +1000 From: Dave Chinner To: linux-xfs@vger.kernel.org Subject: [PATCH 3/7] xfs: factor debug code out of xlog_state_do_callback() Date: Wed, 4 Sep 2019 14:24:47 +1000 Message-Id: <20190904042451.9314-4-david@fromorbit.com> X-Mailer: git-send-email 2.23.0.rc1 In-Reply-To: <20190904042451.9314-1-david@fromorbit.com> References: <20190904042451.9314-1-david@fromorbit.com> MIME-Version: 1.0 X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.2 cv=P6RKvmIu c=1 sm=1 tr=0 a=YO9NNpcXwc8z/SaoS+iAiA==:117 a=YO9NNpcXwc8z/SaoS+iAiA==:17 a=jpOVt7BSZ2e4Z31A5e1TngXxSK0=:19 a=J70Eh1EUuV4A:10 a=20KFwNOVAAAA:8 a=hUrIRvtVvqYdoKubQJkA:9 Sender: linux-xfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org From: Dave Chinner Start making this function readable by lifting the debug code into a conditional function. Signed-off-by: Dave Chinner --- fs/xfs/xfs_log.c | 83 ++++++++++++++++++++++++++++-------------------- 1 file changed, 48 insertions(+), 35 deletions(-) diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c index 6f494f6369e8..b3d8dbb3b956 100644 --- a/fs/xfs/xfs_log.c +++ b/fs/xfs/xfs_log.c @@ -2612,6 +2612,53 @@ xlog_get_lowest_lsn( return lowest_lsn; } +#ifdef DEBUG +/* + * Make one last gasp attempt to see if iclogs are being left in limbo. If the + * above loop finds an iclog earlier than the current iclog and in one of the + * syncing states, the current iclog is put into DO_CALLBACK and the callbacks + * are deferred to the completion of the earlier iclog. Walk the iclogs in order + * and make sure that no iclog is in DO_CALLBACK unless an earlier iclog is in + * one of the syncing states. + * + * Note that SYNCING|IOABORT is a valid state so we cannot just check for + * ic_state == SYNCING. + */ +static void +xlog_state_callback_check_state( + struct xlog *log, + int did_callbacks) +{ + struct xlog_in_core *iclog; + struct xlog_in_core *first_iclog; + + if (!did_callbacks) + return; + + first_iclog = iclog = log->l_iclog; + do { + ASSERT(iclog->ic_state != XLOG_STATE_DO_CALLBACK); + /* + * Terminate the loop if iclogs are found in states + * which will cause other threads to clean up iclogs. + * + * SYNCING - i/o completion will go through logs + * DONE_SYNC - interrupt thread should be waiting for + * l_icloglock + * IOERROR - give up hope all ye who enter here + */ + if (iclog->ic_state == XLOG_STATE_WANT_SYNC || + iclog->ic_state & XLOG_STATE_SYNCING || + iclog->ic_state == XLOG_STATE_DONE_SYNC || + iclog->ic_state == XLOG_STATE_IOERROR ) + break; + iclog = iclog->ic_next; + } while (first_iclog != iclog); +} +#else +#define xlog_state_callback_check_state(l,d) ((void)0) +#endif + STATIC void xlog_state_do_callback( struct xlog *log, @@ -2786,41 +2833,7 @@ xlog_state_do_callback( } } while (!ioerrors && loopdidcallbacks); -#ifdef DEBUG - /* - * Make one last gasp attempt to see if iclogs are being left in limbo. - * If the above loop finds an iclog earlier than the current iclog and - * in one of the syncing states, the current iclog is put into - * DO_CALLBACK and the callbacks are deferred to the completion of the - * earlier iclog. Walk the iclogs in order and make sure that no iclog - * is in DO_CALLBACK unless an earlier iclog is in one of the syncing - * states. - * - * Note that SYNCING|IOABORT is a valid state so we cannot just check - * for ic_state == SYNCING. - */ - if (funcdidcallbacks) { - first_iclog = iclog = log->l_iclog; - do { - ASSERT(iclog->ic_state != XLOG_STATE_DO_CALLBACK); - /* - * Terminate the loop if iclogs are found in states - * which will cause other threads to clean up iclogs. - * - * SYNCING - i/o completion will go through logs - * DONE_SYNC - interrupt thread should be waiting for - * l_icloglock - * IOERROR - give up hope all ye who enter here - */ - if (iclog->ic_state == XLOG_STATE_WANT_SYNC || - iclog->ic_state & XLOG_STATE_SYNCING || - iclog->ic_state == XLOG_STATE_DONE_SYNC || - iclog->ic_state == XLOG_STATE_IOERROR ) - break; - iclog = iclog->ic_next; - } while (first_iclog != iclog); - } -#endif + xlog_state_callback_check_state(log, funcdidcallbacks); if (log->l_iclog->ic_state & (XLOG_STATE_ACTIVE|XLOG_STATE_IOERROR)) wake_up_all(&log->l_flush_wait); From patchwork Wed Sep 4 04:24:48 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Chinner X-Patchwork-Id: 11129253 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id DDA891398 for ; Wed, 4 Sep 2019 04:24:59 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id C9AAB22DBF for ; Wed, 4 Sep 2019 04:24:59 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1725938AbfIDEY7 (ORCPT ); Wed, 4 Sep 2019 00:24:59 -0400 Received: from mail105.syd.optusnet.com.au ([211.29.132.249]:40867 "EHLO mail105.syd.optusnet.com.au" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725945AbfIDEY6 (ORCPT ); Wed, 4 Sep 2019 00:24:58 -0400 Received: from dread.disaster.area (pa49-181-255-194.pa.nsw.optusnet.com.au [49.181.255.194]) by mail105.syd.optusnet.com.au (Postfix) with ESMTPS id E9A9A3617E2 for ; Wed, 4 Sep 2019 14:24:53 +1000 (AEST) Received: from discord.disaster.area ([192.168.253.110]) by dread.disaster.area with esmtp (Exim 4.92) (envelope-from ) id 1i5Mqj-0007OT-08 for linux-xfs@vger.kernel.org; Wed, 04 Sep 2019 14:24:53 +1000 Received: from dave by discord.disaster.area with local (Exim 4.92) (envelope-from ) id 1i5Mqi-0002UU-Up for linux-xfs@vger.kernel.org; Wed, 04 Sep 2019 14:24:52 +1000 From: Dave Chinner To: linux-xfs@vger.kernel.org Subject: [PATCH 4/7] xfs: factor callbacks out of xlog_state_do_callback() Date: Wed, 4 Sep 2019 14:24:48 +1000 Message-Id: <20190904042451.9314-5-david@fromorbit.com> X-Mailer: git-send-email 2.23.0.rc1 In-Reply-To: <20190904042451.9314-1-david@fromorbit.com> References: <20190904042451.9314-1-david@fromorbit.com> MIME-Version: 1.0 X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.2 cv=P6RKvmIu c=1 sm=1 tr=0 a=YO9NNpcXwc8z/SaoS+iAiA==:117 a=YO9NNpcXwc8z/SaoS+iAiA==:17 a=jpOVt7BSZ2e4Z31A5e1TngXxSK0=:19 a=J70Eh1EUuV4A:10 a=20KFwNOVAAAA:8 a=lng_JIwEjGLabVi12EoA:9 Sender: linux-xfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org From: Dave Chinner Simplify the code flow by lifting the iclog callback work out of the main iclog iteration loop. This isolates the log juggling and callbacks from the iclog state change logic in the loop. Signed-off-by: Dave Chinner Reviewed-by: Christoph Hellwig --- fs/xfs/xfs_log.c | 69 +++++++++++++++++++++++++++++++----------------- 1 file changed, 45 insertions(+), 24 deletions(-) diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c index b3d8dbb3b956..c3a43fe023ea 100644 --- a/fs/xfs/xfs_log.c +++ b/fs/xfs/xfs_log.c @@ -2612,6 +2612,47 @@ xlog_get_lowest_lsn( return lowest_lsn; } +/* + * Keep processing entries in the iclog callback list until we come around and + * it is empty. We need to atomically see that the list is empty and change the + * state to DIRTY so that we don't miss any more callbacks being added. + * + * This function is called with the icloglock held and returns with it held. We + * drop it while running callbacks, however, as holding it over thousands of + * callbacks is unnecessary and causes excessive contention if we do. + */ +static bool +xlog_state_do_iclog_callbacks( + struct xlog *log, + struct xlog_in_core *iclog, + bool aborted) +{ + bool ret = false; + + spin_unlock(&log->l_icloglock); + + spin_lock(&iclog->ic_callback_lock); + while (!list_empty(&iclog->ic_callbacks)) { + LIST_HEAD(tmp); + + list_splice_init(&iclog->ic_callbacks, &tmp); + + spin_unlock(&iclog->ic_callback_lock); + xlog_cil_process_committed(&tmp, aborted); + spin_lock(&iclog->ic_callback_lock); + ret = true; + } + + /* + * Pick up the icloglock while still holding the callback lock so we + * serialise against anyone trying to add more callbacks to this iclog + * now we've finished processing. + */ + spin_lock(&log->l_icloglock); + spin_unlock(&iclog->ic_callback_lock); + return ret; +} + #ifdef DEBUG /* * Make one last gasp attempt to see if iclogs are being left in limbo. If the @@ -2784,31 +2825,9 @@ xlog_state_do_callback( } else ioerrors++; - spin_unlock(&log->l_icloglock); + if (xlog_state_do_iclog_callbacks(log, iclog, aborted)) + loopdidcallbacks++; - /* - * Keep processing entries in the callback list until - * we come around and it is empty. We need to - * atomically see that the list is empty and change the - * state to DIRTY so that we don't miss any more - * callbacks being added. - */ - spin_lock(&iclog->ic_callback_lock); - while (!list_empty(&iclog->ic_callbacks)) { - LIST_HEAD(tmp); - - list_splice_init(&iclog->ic_callbacks, &tmp); - - spin_unlock(&iclog->ic_callback_lock); - xlog_cil_process_committed(&tmp, aborted); - spin_lock(&iclog->ic_callback_lock); - } - - loopdidcallbacks++; - funcdidcallbacks++; - - spin_lock(&log->l_icloglock); - spin_unlock(&iclog->ic_callback_lock); if (!(iclog->ic_state & XLOG_STATE_IOERROR)) iclog->ic_state = XLOG_STATE_DIRTY; @@ -2824,6 +2843,8 @@ xlog_state_do_callback( iclog = iclog->ic_next; } while (first_iclog != iclog); + funcdidcallbacks += loopdidcallbacks; + if (repeats > 5000) { flushcnt += repeats; repeats = 0; From patchwork Wed Sep 4 04:24:49 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Chinner X-Patchwork-Id: 11129251 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 91BCF1510 for ; Wed, 4 Sep 2019 04:24:59 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 725AC22DBF for ; Wed, 4 Sep 2019 04:24:59 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1725963AbfIDEY7 (ORCPT ); Wed, 4 Sep 2019 00:24:59 -0400 Received: from mail105.syd.optusnet.com.au ([211.29.132.249]:40869 "EHLO mail105.syd.optusnet.com.au" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725938AbfIDEY6 (ORCPT ); Wed, 4 Sep 2019 00:24:58 -0400 Received: from dread.disaster.area (pa49-181-255-194.pa.nsw.optusnet.com.au [49.181.255.194]) by mail105.syd.optusnet.com.au (Postfix) with ESMTPS id 00622361996 for ; Wed, 4 Sep 2019 14:24:53 +1000 (AEST) Received: from discord.disaster.area ([192.168.253.110]) by dread.disaster.area with esmtp (Exim 4.92) (envelope-from ) id 1i5Mqj-0007OX-1T for linux-xfs@vger.kernel.org; Wed, 04 Sep 2019 14:24:53 +1000 Received: from dave by discord.disaster.area with local (Exim 4.92) (envelope-from ) id 1i5Mqi-0002UX-W1 for linux-xfs@vger.kernel.org; Wed, 04 Sep 2019 14:24:52 +1000 From: Dave Chinner To: linux-xfs@vger.kernel.org Subject: [PATCH 5/7] xfs: factor iclog state processing out of xlog_state_do_callback() Date: Wed, 4 Sep 2019 14:24:49 +1000 Message-Id: <20190904042451.9314-6-david@fromorbit.com> X-Mailer: git-send-email 2.23.0.rc1 In-Reply-To: <20190904042451.9314-1-david@fromorbit.com> References: <20190904042451.9314-1-david@fromorbit.com> MIME-Version: 1.0 X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.2 cv=FNpr/6gs c=1 sm=1 tr=0 a=YO9NNpcXwc8z/SaoS+iAiA==:117 a=YO9NNpcXwc8z/SaoS+iAiA==:17 a=jpOVt7BSZ2e4Z31A5e1TngXxSK0=:19 a=J70Eh1EUuV4A:10 a=20KFwNOVAAAA:8 a=T-GVzMzSLi8Sjll_dAQA:9 Sender: linux-xfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org From: Dave Chinner The iclog IO completion state processing is somewhat complex, and because it's inside two nested loops it is highly indented and very hard to read. Factor it out, flatten the logic flow and clean up the comments so that it much easier to see what the code is doing both in processing the individual iclogs and in the over xlog_state_do_callback() operation. Signed-off-by: Dave Chinner --- fs/xfs/xfs_log.c | 199 ++++++++++++++++++++++++----------------------- 1 file changed, 103 insertions(+), 96 deletions(-) diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c index c3a43fe023ea..c3efece72847 100644 --- a/fs/xfs/xfs_log.c +++ b/fs/xfs/xfs_log.c @@ -2612,6 +2612,87 @@ xlog_get_lowest_lsn( return lowest_lsn; } +/* + * Return true if we need to stop processing, false to continue to the next + * iclog. The caller will need to run callbacks if the iclog is returned in the + * XLOG_STATE_CALLBACK state. + */ +static bool +xlog_state_iodone_process_iclog( + struct xlog *log, + struct xlog_in_core *iclog, + struct xlog_in_core *completed_iclog) +{ + xfs_lsn_t lowest_lsn; + + /* Skip all iclogs in the ACTIVE & DIRTY states */ + if (iclog->ic_state & (XLOG_STATE_ACTIVE|XLOG_STATE_DIRTY)) + return false; + + /* + * Between marking a filesystem SHUTDOWN and stopping the log, we do + * flush all iclogs to disk (if there wasn't a log I/O error). So, we do + * want things to go smoothly in case of just a SHUTDOWN w/o a + * LOG_IO_ERROR. + */ + if (iclog->ic_state & XLOG_STATE_IOERROR) + return false; + + /* + * Can only perform callbacks in order. Since this iclog is not in the + * DONE_SYNC/ DO_CALLBACK state, we skip the rest and just try to clean + * up. If we set our iclog to DO_CALLBACK, we will not process it when + * we retry since a previous iclog is in the CALLBACK and the state + * cannot change since we are holding the l_icloglock. + */ + if (!(iclog->ic_state & + (XLOG_STATE_DONE_SYNC | XLOG_STATE_DO_CALLBACK))) { + if (completed_iclog && + (completed_iclog->ic_state == XLOG_STATE_DONE_SYNC)) { + completed_iclog->ic_state = XLOG_STATE_DO_CALLBACK; + } + return true; + } + + /* + * We now have an iclog that is in either the DO_CALLBACK or DONE_SYNC + * states. The other states (WANT_SYNC, SYNCING, or CALLBACK were caught + * by the above if and are going to clean (i.e. we aren't doing their + * callbacks) see the above if. + * + * We will do one more check here to see if we have chased our tail + * around. + */ + lowest_lsn = xlog_get_lowest_lsn(log); + if (lowest_lsn && + XFS_LSN_CMP(lowest_lsn, be64_to_cpu(iclog->ic_header.h_lsn)) < 0) + return false; /* Leave this iclog for another thread */ + + iclog->ic_state = XLOG_STATE_CALLBACK; + + /* + * Completion of a iclog IO does not imply that a transaction has + * completed, as transactions can be large enough to span many iclogs. + * We cannot change the tail of the log half way through a transaction + * as this may be the only transaction in the log and moving th etail to + * point to the middle of it will prevent recovery from finding the + * start of the transaction. Hence we should only update the + * last_sync_lsn if this iclog contains transaction completion callbacks + * on it. + * + * We have to do this before we drop the icloglock to ensure we are the + * only one that can update it. + */ + ASSERT(XFS_LSN_CMP(atomic64_read(&log->l_last_sync_lsn), + be64_to_cpu(iclog->ic_header.h_lsn)) <= 0); + if (!list_empty_careful(&iclog->ic_callbacks)) + atomic64_set(&log->l_last_sync_lsn, + be64_to_cpu(iclog->ic_header.h_lsn)); + + return false; + +} + /* * Keep processing entries in the iclog callback list until we come around and * it is empty. We need to atomically see that the list is empty and change the @@ -2706,22 +2787,16 @@ xlog_state_do_callback( bool aborted, struct xlog_in_core *ciclog) { - xlog_in_core_t *iclog; - xlog_in_core_t *first_iclog; /* used to know when we've - * processed all iclogs once */ - int flushcnt = 0; - xfs_lsn_t lowest_lsn; - int ioerrors; /* counter: iclogs with errors */ - int loopdidcallbacks; /* flag: inner loop did callbacks*/ - int funcdidcallbacks; /* flag: function did callbacks */ - int repeats; /* for issuing console warnings if - * looping too many times */ + struct xlog_in_core *iclog; + struct xlog_in_core *first_iclog; + int flushcnt = 0; + int ioerrors = 0; + int funcdidcallbacks = 0; + int loopdidcallbacks; + int repeats = 0; + int ret; spin_lock(&log->l_icloglock); - first_iclog = iclog = log->l_iclog; - ioerrors = 0; - funcdidcallbacks = 0; - repeats = 0; do { /* @@ -2738,92 +2813,24 @@ xlog_state_do_callback( repeats++; do { - - /* skip all iclogs in the ACTIVE & DIRTY states */ - if (iclog->ic_state & - (XLOG_STATE_ACTIVE|XLOG_STATE_DIRTY)) { - iclog = iclog->ic_next; - continue; - } - /* - * Between marking a filesystem SHUTDOWN and stopping - * the log, we do flush all iclogs to disk (if there - * wasn't a log I/O error). So, we do want things to - * go smoothly in case of just a SHUTDOWN w/o a - * LOG_IO_ERROR. + * We have to process iclogs marked with errors at + * least once, so we count errors rather than terminate + * the inner loop here. */ - if (!(iclog->ic_state & XLOG_STATE_IOERROR)) { - /* - * Can only perform callbacks in order. Since - * this iclog is not in the DONE_SYNC/ - * DO_CALLBACK state, we skip the rest and - * just try to clean up. If we set our iclog - * to DO_CALLBACK, we will not process it when - * we retry since a previous iclog is in the - * CALLBACK and the state cannot change since - * we are holding the l_icloglock. - */ - if (!(iclog->ic_state & - (XLOG_STATE_DONE_SYNC | - XLOG_STATE_DO_CALLBACK))) { - if (ciclog && (ciclog->ic_state == - XLOG_STATE_DONE_SYNC)) { - ciclog->ic_state = XLOG_STATE_DO_CALLBACK; - } - break; - } - /* - * We now have an iclog that is in either the - * DO_CALLBACK or DONE_SYNC states. The other - * states (WANT_SYNC, SYNCING, or CALLBACK were - * caught by the above if and are going to - * clean (i.e. we aren't doing their callbacks) - * see the above if. - */ - - /* - * We will do one more check here to see if we - * have chased our tail around. - */ - - lowest_lsn = xlog_get_lowest_lsn(log); - if (lowest_lsn && - XFS_LSN_CMP(lowest_lsn, - be64_to_cpu(iclog->ic_header.h_lsn)) < 0) { - iclog = iclog->ic_next; - continue; /* Leave this iclog for - * another thread */ - } - - iclog->ic_state = XLOG_STATE_CALLBACK; - + if (iclog->ic_state & XLOG_STATE_IOERROR) + ioerrors++; - /* - * Completion of a iclog IO does not imply that - * a transaction has completed, as transactions - * can be large enough to span many iclogs. We - * cannot change the tail of the log half way - * through a transaction as this may be the only - * transaction in the log and moving th etail to - * point to the middle of it will prevent - * recovery from finding the start of the - * transaction. Hence we should only update the - * last_sync_lsn if this iclog contains - * transaction completion callbacks on it. - * - * We have to do this before we drop the - * icloglock to ensure we are the only one that - * can update it. - */ - ASSERT(XFS_LSN_CMP(atomic64_read(&log->l_last_sync_lsn), - be64_to_cpu(iclog->ic_header.h_lsn)) <= 0); - if (!list_empty_careful(&iclog->ic_callbacks)) - atomic64_set(&log->l_last_sync_lsn, - be64_to_cpu(iclog->ic_header.h_lsn)); + ret = xlog_state_iodone_process_iclog(log, iclog, + ciclog); + if (ret) + break; - } else - ioerrors++; + if (!(iclog->ic_state & + (XLOG_STATE_CALLBACK | XLOG_STATE_IOERROR))) { + iclog = iclog->ic_next; + continue; + } if (xlog_state_do_iclog_callbacks(log, iclog, aborted)) loopdidcallbacks++; From patchwork Wed Sep 4 04:24:50 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Chinner X-Patchwork-Id: 11129255 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 423DA1398 for ; Wed, 4 Sep 2019 04:25:01 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 2BE1022DBF for ; Wed, 4 Sep 2019 04:25:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1725945AbfIDEZA (ORCPT ); Wed, 4 Sep 2019 00:25:00 -0400 Received: from mail105.syd.optusnet.com.au ([211.29.132.249]:41049 "EHLO mail105.syd.optusnet.com.au" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725877AbfIDEZA (ORCPT ); Wed, 4 Sep 2019 00:25:00 -0400 Received: from dread.disaster.area (pa49-181-255-194.pa.nsw.optusnet.com.au [49.181.255.194]) by mail105.syd.optusnet.com.au (Postfix) with ESMTPS id 7E7C6361B48 for ; Wed, 4 Sep 2019 14:24:54 +1000 (AEST) Received: from discord.disaster.area ([192.168.253.110]) by dread.disaster.area with esmtp (Exim 4.92) (envelope-from ) id 1i5Mqj-0007Oa-2v for linux-xfs@vger.kernel.org; Wed, 04 Sep 2019 14:24:53 +1000 Received: from dave by discord.disaster.area with local (Exim 4.92) (envelope-from ) id 1i5Mqj-0002Ua-1D for linux-xfs@vger.kernel.org; Wed, 04 Sep 2019 14:24:53 +1000 From: Dave Chinner To: linux-xfs@vger.kernel.org Subject: [PATCH 6/7] xfs: push iclog state cleaning into xlog_state_clean_log Date: Wed, 4 Sep 2019 14:24:50 +1000 Message-Id: <20190904042451.9314-7-david@fromorbit.com> X-Mailer: git-send-email 2.23.0.rc1 In-Reply-To: <20190904042451.9314-1-david@fromorbit.com> References: <20190904042451.9314-1-david@fromorbit.com> MIME-Version: 1.0 X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.2 cv=P6RKvmIu c=1 sm=1 tr=0 a=YO9NNpcXwc8z/SaoS+iAiA==:117 a=YO9NNpcXwc8z/SaoS+iAiA==:17 a=jpOVt7BSZ2e4Z31A5e1TngXxSK0=:19 a=J70Eh1EUuV4A:10 a=20KFwNOVAAAA:8 a=2D6X0XNKtPc4CptNA_0A:9 Sender: linux-xfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org From: Dave Chinner xlog_state_clean_log() is only called from one place, and it occurs when an iclog is transitioning back to ACTIVE. Prior to calling xlog_state_clean_log, the iclog we are processing has a hard coded state check to DIRTY so that xlog_state_clean_log() processes it correctly. We also have a hard coded wakeup after xlog_state_clean_log() to enfore log force waiters on that iclog are woken correctly. Both of these things are operations required to finish processing an iclog and return it to the ACTIVE state again, so they make little sense to be separated from the rest of the clean state transition code. Hence push these things inside xlog_state_clean_log(), document the behaviour and rename it xlog_state_clean_iclog() to indicate that it's being driven by an iclog state change and does the iclog state change work itself. Signed-off-by: Dave Chinner Reviewed-by: Christoph Hellwig --- fs/xfs/xfs_log.c | 57 ++++++++++++++++++++++++++++-------------------- 1 file changed, 33 insertions(+), 24 deletions(-) diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c index c3efece72847..6a59d71d4c60 100644 --- a/fs/xfs/xfs_log.c +++ b/fs/xfs/xfs_log.c @@ -2505,21 +2505,35 @@ xlog_write( ***************************************************************************** */ -/* Clean iclogs starting from the head. This ordering must be - * maintained, so an iclog doesn't become ACTIVE beyond one that - * is SYNCING. This is also required to maintain the notion that we use - * a ordered wait queue to hold off would be writers to the log when every - * iclog is trying to sync to disk. +/* + * An iclog has just finished it's completion processing, so we need to update + * the iclog state and propagate that up into the overall log state. Hence we + * prepare the iclog for cleaning, and then clean all the pending dirty iclogs + * starting from the head, and then wake up any threads that are waiting for the + * iclog to be marked clean. + * + * The ordering of marking iclogs ACTIVE must be maintained, so an iclog + * doesn't become ACTIVE beyond one that is SYNCING. This is also required to + * maintain the notion that we use a ordered wait queue to hold off would be + * writers to the log when every iclog is trying to sync to disk. + * + * Caller must hold the icloglock before calling us. * - * State Change: DIRTY -> ACTIVE + * State Change: !IOERROR -> DIRTY -> ACTIVE */ STATIC void -xlog_state_clean_log( - struct xlog *log) +xlog_state_clean_iclog( + struct xlog *log, + struct xlog_in_core *dirty_iclog) { - xlog_in_core_t *iclog; - int changed = 0; + struct xlog_in_core *iclog; + int changed = 0; + + /* Prepare the completed iclog. */ + if (!(dirty_iclog->ic_state & XLOG_STATE_IOERROR)) + dirty_iclog->ic_state = XLOG_STATE_DIRTY; + /* Walk all the iclogs to update the ordered active state. */ iclog = log->l_iclog; do { if (iclog->ic_state == XLOG_STATE_DIRTY) { @@ -2557,7 +2571,13 @@ xlog_state_clean_log( iclog = iclog->ic_next; } while (iclog != log->l_iclog); - /* log is locked when we are called */ + + /* + * Wake up threads waiting in xfs_log_force() for the dirty iclog + * to be cleaned. + */ + wake_up_all(&dirty_iclog->ic_force_wait); + /* * Change state for the dummy log recording. * We usually go to NEED. But we go to NEED2 if the changed indicates @@ -2591,7 +2611,7 @@ xlog_state_clean_log( ASSERT(0); } } -} /* xlog_state_clean_log */ +} STATIC xfs_lsn_t xlog_get_lowest_lsn( @@ -2835,18 +2855,7 @@ xlog_state_do_callback( if (xlog_state_do_iclog_callbacks(log, iclog, aborted)) loopdidcallbacks++; - if (!(iclog->ic_state & XLOG_STATE_IOERROR)) - iclog->ic_state = XLOG_STATE_DIRTY; - - /* - * Transition from DIRTY to ACTIVE if applicable. - * NOP if STATE_IOERROR. - */ - xlog_state_clean_log(log); - - /* wake up threads waiting in xfs_log_force() */ - wake_up_all(&iclog->ic_force_wait); - + xlog_state_clean_iclog(log, iclog); iclog = iclog->ic_next; } while (first_iclog != iclog); From patchwork Wed Sep 4 04:24:51 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Chinner X-Patchwork-Id: 11129245 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 67B151398 for ; Wed, 4 Sep 2019 04:24:57 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 5146F22DBF for ; Wed, 4 Sep 2019 04:24:57 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1725947AbfIDEY4 (ORCPT ); Wed, 4 Sep 2019 00:24:56 -0400 Received: from mail104.syd.optusnet.com.au ([211.29.132.246]:41185 "EHLO mail104.syd.optusnet.com.au" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725840AbfIDEY4 (ORCPT ); Wed, 4 Sep 2019 00:24:56 -0400 Received: from dread.disaster.area (pa49-181-255-194.pa.nsw.optusnet.com.au [49.181.255.194]) by mail104.syd.optusnet.com.au (Postfix) with ESMTPS id 84C5543E35B for ; Wed, 4 Sep 2019 14:24:53 +1000 (AEST) Received: from discord.disaster.area ([192.168.253.110]) by dread.disaster.area with esmtp (Exim 4.92) (envelope-from ) id 1i5Mqj-0007Oc-3l for linux-xfs@vger.kernel.org; Wed, 04 Sep 2019 14:24:53 +1000 Received: from dave by discord.disaster.area with local (Exim 4.92) (envelope-from ) id 1i5Mqj-0002Ud-2R for linux-xfs@vger.kernel.org; Wed, 04 Sep 2019 14:24:53 +1000 From: Dave Chinner To: linux-xfs@vger.kernel.org Subject: [PATCH 7/7] xfs: push the grant head when the log head moves forward Date: Wed, 4 Sep 2019 14:24:51 +1000 Message-Id: <20190904042451.9314-8-david@fromorbit.com> X-Mailer: git-send-email 2.23.0.rc1 In-Reply-To: <20190904042451.9314-1-david@fromorbit.com> References: <20190904042451.9314-1-david@fromorbit.com> MIME-Version: 1.0 X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.2 cv=P6RKvmIu c=1 sm=1 tr=0 a=YO9NNpcXwc8z/SaoS+iAiA==:117 a=YO9NNpcXwc8z/SaoS+iAiA==:17 a=jpOVt7BSZ2e4Z31A5e1TngXxSK0=:19 a=J70Eh1EUuV4A:10 a=20KFwNOVAAAA:8 a=7Q2-SO56R8pWWUt_uDsA:9 Sender: linux-xfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org From: Dave Chinner When the log fills up, we can get into the state where the outstanding items in the CIL being committed and aggregated are larger than the range that the reservation grant head tail pushing will attempt to clean. This can result in the tail pushing range being trimmed back to the the log head (l_last_sync_lsn) and so may not actually move the push target at all. When the iclogs associated with the CIL commit finally land, the log head moves forward, and this removes the restriction on the AIL push target. However, if we already have transactions sleeping on the grant head, and there's nothing in the AIL still to flush from the current push target, then nothing will move the tail of the log and trigger a log reservation wakeup. Hence the there is nothing that will trigger xlog_grant_push_ail() to recalculate the AIL push target and start pushing on the AIL again to write back the metadata objects that pin the tail of the log and hence free up space and allow the transaction reservations to be woken and make progress. Hence we need to push on the grant head when we move the log head forward, as this may be the only trigger we have that can move the AIL push target forwards in this situation. Signed-off-by: Dave Chinner Nacked-by: to the upstream commit I guess so I at least remember this --- fs/xfs/xfs_log.c | 72 +++++++++++++++++++++++++++++++----------------- 1 file changed, 47 insertions(+), 25 deletions(-) diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c index 6a59d71d4c60..733693e1ac9f 100644 --- a/fs/xfs/xfs_log.c +++ b/fs/xfs/xfs_log.c @@ -2632,6 +2632,46 @@ xlog_get_lowest_lsn( return lowest_lsn; } +/* + * Completion of a iclog IO does not imply that a transaction has completed, as + * transactions can be large enough to span many iclogs. We cannot change the + * tail of the log half way through a transaction as this may be the only + * transaction in the log and moving the tail to point to the middle of it + * will prevent recovery from finding the start of the transaction. Hence we + * should only update the last_sync_lsn if this iclog contains transaction + * completion callbacks on it. + * + * We have to do this before we drop the icloglock to ensure we are the only one + * that can update it. + * + * If we are moving the last_sync_lsn forwards, we also need to ensure we kick + * the reservation grant head pushing. This is due to the fact that the push + * target is bound by the current last_sync_lsn value. Hence if we have a large + * amount of log space bound up in this committing transaction then the + * last_sync_lsn value may be the limiting factor preventing tail pushing from + * freeing space in the log. Hence once we've updated the last_sync_lsn we + * should push the AIL to ensure the push target (and hence the grant head) is + * no longer bound by the old log head location and can move forwards and make + * progress again. + */ +static void +xlog_state_set_callback( + struct xlog *log, + struct xlog_in_core *iclog, + xfs_lsn_t header_lsn) +{ + iclog->ic_state = XLOG_STATE_CALLBACK; + + ASSERT(XFS_LSN_CMP(atomic64_read(&log->l_last_sync_lsn), header_lsn) <= 0); + + if (list_empty_careful(&iclog->ic_callbacks)) + return; + + atomic64_set(&log->l_last_sync_lsn, header_lsn); + xlog_grant_push_ail(log, 0); + +} + /* * Return true if we need to stop processing, false to continue to the next * iclog. The caller will need to run callbacks if the iclog is returned in the @@ -2644,6 +2684,7 @@ xlog_state_iodone_process_iclog( struct xlog_in_core *completed_iclog) { xfs_lsn_t lowest_lsn; + xfs_lsn_t header_lsn; /* Skip all iclogs in the ACTIVE & DIRTY states */ if (iclog->ic_state & (XLOG_STATE_ACTIVE|XLOG_STATE_DIRTY)) @@ -2681,34 +2722,15 @@ xlog_state_iodone_process_iclog( * callbacks) see the above if. * * We will do one more check here to see if we have chased our tail - * around. + * around. If this is not the lowest lsn iclog, then we will leave it + * for another completion to process. */ + header_lsn = be64_to_cpu(iclog->ic_header.h_lsn); lowest_lsn = xlog_get_lowest_lsn(log); - if (lowest_lsn && - XFS_LSN_CMP(lowest_lsn, be64_to_cpu(iclog->ic_header.h_lsn)) < 0) - return false; /* Leave this iclog for another thread */ - - iclog->ic_state = XLOG_STATE_CALLBACK; - - /* - * Completion of a iclog IO does not imply that a transaction has - * completed, as transactions can be large enough to span many iclogs. - * We cannot change the tail of the log half way through a transaction - * as this may be the only transaction in the log and moving th etail to - * point to the middle of it will prevent recovery from finding the - * start of the transaction. Hence we should only update the - * last_sync_lsn if this iclog contains transaction completion callbacks - * on it. - * - * We have to do this before we drop the icloglock to ensure we are the - * only one that can update it. - */ - ASSERT(XFS_LSN_CMP(atomic64_read(&log->l_last_sync_lsn), - be64_to_cpu(iclog->ic_header.h_lsn)) <= 0); - if (!list_empty_careful(&iclog->ic_callbacks)) - atomic64_set(&log->l_last_sync_lsn, - be64_to_cpu(iclog->ic_header.h_lsn)); + if (lowest_lsn && XFS_LSN_CMP(lowest_lsn, header_lsn) < 0) + return false; + xlog_state_set_callback(log, iclog, header_lsn); return false; }