From patchwork Tue Sep 5 13:48:45 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Hou Tao
X-Patchwork-Id: 9939077
From: Hou Tao
Subject: umount XFS hung when stopping the xfsaild kthread
Date: Tue, 5 Sep 2017 21:48:45 +0800
X-Mailing-List: linux-xfs@vger.kernel.org

Hi all,

We recently encountered an XFS umount hang. As the following stacks show,
the umount process was trying to stop the xfsaild kthread and was waiting
for it to exit, while the xfsaild thread was waiting for a wake-up.

[] kthread_stop+0x4a/0xe0
[] xfs_trans_ail_destroy+0x17/0x30 [xfs]
[] xfs_log_unmount+0x1e/0x60 [xfs]
[] xfs_unmountfs+0xd5/0x190 [xfs]
[] xfs_fs_put_super+0x32/0x90 [xfs]
[] generic_shutdown_super+0x56/0xe0
[] kill_block_super+0x27/0x70
[] deactivate_locked_super+0x49/0x60
[] deactivate_super+0x46/0x60
[] mntput_no_expire+0xc5/0x120
[] SyS_umount+0x9f/0x3c0
[] system_call_fastpath+0x16/0x1b
[] 0xffffffffffffffff

[] xfsaild+0x537/0x5e0 [xfs]
[] kthread+0xcf/0xe0
[] ret_from_fork+0x58/0x90
[] 0xffffffffffffffff

The kernel is from RHEL 7.3. We are trying to reproduce the hang but have
not managed to yet. I have checked the related code and suspect the same
problem may also exist in mainline.

The following is a possible sequence that may lead to the umount hang:

xfsaild: kthread_should_stop()  // returns false, so xfsaild continues
umount: set_bit(KTHREAD_SHOULD_STOP, &kthread->flags)  // by kthread_stop()
umount: wake_up_process()  // xfsaild is still running, so 0 is returned
xfsaild: __set_current_state()
xfsaild: schedule()  // now no one will wake it up

I think the solution is to add an extra kthread_should_stop() check before
invoking schedule().
An smp_mb() may be needed too, because we need to ensure that the read of
the stop flag happens after the write of the task state. Something like the
patch below.

Any suggestions?

Regards,
Tao
---
diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
index 9056c0f..6313f67 100644
--- a/fs/xfs/xfs_trans_ail.c
+++ b/fs/xfs/xfs_trans_ail.c
@@ -520,6 +520,11 @@ xfsaild(
 		if (!xfs_ail_min(ailp) &&
 		    ailp->xa_target == ailp->xa_target_prev) {
 			spin_unlock(&ailp->xa_lock);
+
+			smp_mb();
+			if (kthread_should_stop())
+				break;
+
 			freezable_schedule();
 			tout = 0;
 			continue;
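
For illustration, the interleaving described in this report can be replayed
in a small single-threaded userspace model. This is only a sketch under
stated assumptions, not kernel code: struct task_model, model_wake_up(),
model_schedule() and replay_interleaving() are hypothetical names standing in
for the task state, wake_up_process(), schedule() and the xfsaild loop. The
model replays one fixed ordering of steps and ignores memory ordering
entirely, so it says nothing about whether smp_mb() is required.

```c
#include <stdbool.h>

/* Hypothetical userspace model of the race; none of these are kernel APIs. */
enum model_state { MODEL_RUNNING, MODEL_SLEEPING };

struct task_model {
	enum model_state state;	/* stands in for the task state of xfsaild */
	bool should_stop;	/* stands in for KTHREAD_SHOULD_STOP */
};

/* Models wake_up_process(): returns 1 only if the task was actually asleep. */
static int model_wake_up(struct task_model *t)
{
	if (t->state == MODEL_RUNNING)
		return 0;
	t->state = MODEL_RUNNING;
	return 1;
}

/*
 * Models schedule(): returns true if the task would block. In the replay
 * below the only wake-up has already been spent by the time this runs,
 * so a task that blocks here never runs again.
 */
static bool model_schedule(const struct task_model *t)
{
	return t->state == MODEL_SLEEPING;
}

/*
 * Replays the sequence from the report. With recheck == false it follows
 * the reported ordering; with recheck == true it adds the extra stop check
 * before schedule(), as in the proposed patch.
 * Returns true if the modelled xfsaild hangs.
 */
bool replay_interleaving(bool recheck)
{
	struct task_model t = { MODEL_RUNNING, false };

	/* xfsaild: kthread_should_stop() returns false, loop continues */
	if (t.should_stop)
		return false;

	/* umount: kthread_stop() sets the flag, then tries to wake xfsaild */
	t.should_stop = true;
	(void)model_wake_up(&t);	/* returns 0, xfsaild is still running */

	/* xfsaild: __set_current_state() */
	t.state = MODEL_SLEEPING;

	/* proposed fix: look at the stop flag again before sleeping */
	if (recheck && t.should_stop)
		return false;

	/* xfsaild: schedule(); nobody is left to wake it up */
	return model_schedule(&t);
}
```

In this model, replay_interleaving(false) returns true (the thread goes to
sleep with no wake-up pending), and replay_interleaving(true) returns false
because the second check sees the stop flag before sleeping.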