From patchwork Fri Oct 12 17:46:54 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Brian Foster X-Patchwork-Id: 10639033 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5A2F913AD for ; Fri, 12 Oct 2018 17:46:58 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 497FB2B326 for ; Fri, 12 Oct 2018 17:46:58 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 3D9852B9FC; Fri, 12 Oct 2018 17:46:58 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id CAEB02B326 for ; Fri, 12 Oct 2018 17:46:57 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726788AbeJMBUc (ORCPT ); Fri, 12 Oct 2018 21:20:32 -0400 Received: from mx1.redhat.com ([209.132.183.28]:33184 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726665AbeJMBUc (ORCPT ); Fri, 12 Oct 2018 21:20:32 -0400 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 8FCE490C65 for ; Fri, 12 Oct 2018 17:46:55 +0000 (UTC) Received: from bfoster.bos.redhat.com (dhcp-41-2.bos.redhat.com [10.18.41.2]) by smtp.corp.redhat.com (Postfix) with ESMTP id 4A95910027D8 for ; Fri, 12 Oct 2018 17:46:55 +0000 (UTC) From: Brian Foster To: linux-xfs@vger.kernel.org Subject: [PATCH] xfs: clear ail delwri queued bufs on unmount of shutdown fs Date: Fri, 12 Oct 2018 13:46:54 -0400 Message-Id: <20181012174654.2557-1-bfoster@redhat.com> X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.29]); Fri, 12 Oct 2018 17:46:55 +0000 (UTC) Sender: linux-xfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP In the typical unmount case, the AIL is forced out by the unmount sequence before the xfsaild task is stopped. Since AIL items are removed on writeback completion, this means that the AIL ->ail_buf_list delwri queue has been drained. This is not always true in the shutdown case, however. It's possible for buffers to sit on a delwri queue for a period of time across submission attempts if said items are locked or have been relogged and pinned since first added to the queue. If the attempt to log such an item results in a log I/O error, the error processing can shutdown the fs, remove the item from the AIL, stale the buffer (dropping the LRU reference) and clear its delwri queue state. The latter bit means the buffer will be released from a delwri queue on the next submission attempt, but this might never occur if the filesystem has shutdown and the AIL is empty. This means that such buffers are held indefinitely by the AIL delwri queue across destruction of the AIL. Aside from being a memory leak, these buffers can also hold references to in-core perag structures. The latter problem manifests as a generic/475 failure, reproducing the following asserts at unmount time: XFS: Assertion failed: atomic_read(&pag->pag_ref) == 0, file: fs/xfs/xfs_mount.c, line: 151 XFS: Assertion failed: atomic_read(&pag->pag_ref) == 0, file: fs/xfs/xfs_mount.c, line: 132 To prevent this problem, clear the AIL delwri queue as a final step before xfsaild() exit. The !empty state should never occur in the normal case, so add an assert to catch unexpected problems going forward. Signed-off-by: Brian Foster --- Hi all, This is a minor issue I ran into via generic/475. That test still occasionally fails for me with this patch in place, but the failure appears to be unrelated. Brian fs/xfs/xfs_trans_ail.c | 28 ++++++++++++++++++++++------ 1 file changed, 22 insertions(+), 6 deletions(-) diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c index 55326f971cb3..d3a4e89bf4a0 100644 --- a/fs/xfs/xfs_trans_ail.c +++ b/fs/xfs/xfs_trans_ail.c @@ -531,17 +531,33 @@ xfsaild( set_current_state(TASK_INTERRUPTIBLE); /* - * Check kthread_should_stop() after we set the task state - * to guarantee that we either see the stop bit and exit or - * the task state is reset to runnable such that it's not - * scheduled out indefinitely and detects the stop bit at - * next iteration. - * + * Check kthread_should_stop() after we set the task state to + * guarantee that we either see the stop bit and exit or the + * task state is reset to runnable such that it's not scheduled + * out indefinitely and detects the stop bit at next iteration. * A memory barrier is included in above task state set to * serialize again kthread_stop(). */ if (kthread_should_stop()) { __set_current_state(TASK_RUNNING); + + /* + * The caller forces out the AIL before stopping the + * thread in the common case, which means the delwri + * queue is drained. In the shutdown case, the queue may + * still hold relogged buffers that haven't been + * submitted because they were pinned since added to the + * queue. + * + * Log I/O error processing stales the underlying buffer + * and clears the delwri state, expecting the buf to be + * removed on the next submission attempt. That won't + * happen if we're shutting down, so this is the last + * opportunity to release such buffers from the queue. + */ + ASSERT(list_empty(&ailp->ail_buf_list) || + XFS_FORCED_SHUTDOWN(ailp->ail_mount)); + xfs_buf_delwri_cancel(&ailp->ail_buf_list); break; }