From patchwork Wed Apr 25 20:07:48 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Fabiano Rosas
X-Patchwork-Id: 10364209
To: linux-block@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, jack@suse.cz, tj@kernel.org
From: Fabiano Rosas
Subject: write call hangs in kernel space after virtio hot-remove
Date: Wed, 25 Apr 2018 17:07:48 -0300

I'm looking into an issue where removing a virtio disk via sysfs while
another process is issuing write() calls results in the writing task going
into a livelock:

root@guest # cat test.sh
#!/bin/bash

dd if=/dev/zero of=/dev/vda bs=1M count=10000 &
sleep 1
echo 1 > /sys/bus/pci/devices/0000:00:04.0/remove

root@guest # ls /dev/vd*
/dev/vda

root@guest # grep Dirty /proc/meminfo
Dirty: 0 kB

root@guest # sh test.sh

root@guest # ps aux | grep "[d]d if"
root 1699 38.6 0.0 111424 1216 hvc0 D+ 10:48 0:01 dd if=/dev/zero of=/dev/vda bs=1M count=10000

root@guest # ls /dev/vd*
ls: cannot access /dev/vd*: No such file or directory

root@guest # cat /proc/1699/stack
[<0>] 0xc0000000ffe28218
[<0>] __switch_to+0x31c/0x480
[<0>] balance_dirty_pages+0x990/0xb90
[<0>] balance_dirty_pages_ratelimited+0x50c/0x6c0
[<0>] generic_perform_write+0x1b0/0x260
[<0>] __generic_file_write_iter+0x200/0x240
[<0>] blkdev_write_iter+0xa4/0x150
[<0>] __vfs_write+0x14c/0x240
[<0>] vfs_write+0xd0/0x240
[<0>] ksys_write+0x6c/0x110
[<0>] system_call+0x58/0x6c

root@guest # grep Dirty /proc/meminfo
Dirty: 1506816 kB

---

I have done some tracing and I believe this is caused by the clearing of
'WB_registered' in 'wb_shutdown':

sh-1697 [000] .... 3994.541664: sysfs_remove_link <-del_gendisk
sh-1697 [000] ....
3994.541671: wb_shutdown <-bdi_unregister

Later, when 'balance_dirty_pages' tries to start writeback, nothing happens
because 'WB_registered' is no longer set:

fs/fs-writeback.c

static void wb_wakeup(struct bdi_writeback *wb)
{
	spin_lock_bh(&wb->work_lock);
	if (test_bit(WB_registered, &wb->state))
		mod_delayed_work(bdi_wq, &wb->dwork, 0);
	spin_unlock_bh(&wb->work_lock);
}

So we get stuck in a loop in 'balance_dirty_pages':

root@guest # cat /sys/kernel/debug/tracing/set_ftrace_filter
balance_dirty_pages_ratelimited
balance_dirty_pages
wb_wakeup
wb_workfn
io_schedule_timeout

dd-1699 [000] .... 11192.535946: wb_wakeup <-balance_dirty_pages
dd-1699 [000] .... 11192.535950: io_schedule_timeout <-balance_dirty_pages
dd-1699 [000] .... 11192.745968: wb_wakeup <-balance_dirty_pages
dd-1699 [000] .... 11192.745972: io_schedule_timeout <-balance_dirty_pages

The test on 'WB_registered' before starting the writeback task was
introduced by commit 5acda9 ("bdi: avoid oops on device removal").

I have made a *naive* attempt at fixing it by allowing writeback to happen
even without 'WB_registered' (see the diff at the end of this message).

The effect is that the 'dd' process now finishes successfully and we get
"Buffer I/O error"s for the dirty pages that remain. I believe this
conforms to the existing interfaces, since dd does not issue any fsync()
calls.

Does my analysis make sense, and would something along these lines be
acceptable as a solution?

Cheers

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index d4d04fe..050b067 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -982,7 +982,7 @@ void wb_start_background_writeback(struct bdi_writeback *wb)
 	 * writeback as soon as there is no other work to do.
 	 */
 	trace_writeback_wake_background(wb);
-	wb_wakeup(wb);
+	mod_delayed_work(bdi_wq, &wb->dwork, 0);
 }
 
 /*
@@ -1933,7 +1933,7 @@ void wb_workfn(struct work_struct *work)
 						struct bdi_writeback, dwork);
 	long pages_written;
 
-	set_worker_desc("flush-%s", dev_name(wb->bdi->dev));
+	set_worker_desc("flush-%s", wb->bdi->dev ? dev_name(wb->bdi->dev) : "?" );
 	current->flags |= PF_SWAPWRITE;
 
 	if (likely(!current_is_workqueue_rescuer() ||