From patchwork Tue May 10 14:09:32 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Anand Jain
X-Patchwork-Id: 9058351
From: Anand Jain
To: linux-btrfs@vger.kernel.org
Cc: dsterba@suse.cz, yauhen.kharuzhy@zavadatar.com
Subject: [PATCH 13/13] btrfs: check for failed device and hot replace
Date: Tue, 10 May 2016 22:09:32 +0800
Message-Id: <1462889372-5274-15-git-send-email-anand.jain@oracle.com>
X-Mailer: git-send-email 2.7.0
In-Reply-To: <1462889372-5274-1-git-send-email-anand.jain@oracle.com>
References: <1462889372-5274-1-git-send-email-anand.jain@oracle.com>
X-Mailing-List: linux-btrfs@vger.kernel.org

From: Anand Jain

This patch checks for a failed device from the health kthread and
kicks off an automatic replace. If the user wants to disable auto
replace, a future sysfs or ioctl interface can set
fs_info->no_auto_replace to 1.

Signed-off-by: Anand Jain
Tested-by: Austin S. Hemmelgarn
Tested-by: Yauhen Kharuzhy
---
 fs/btrfs/ctree.h   |  2 ++
 fs/btrfs/disk-io.c | 35 +++++++++++++++++++++++++++++++++++
 2 files changed, 37 insertions(+)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 47e9cd9dd29a..67bb36bb82ee 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1862,6 +1862,8 @@ struct btrfs_fs_info {
 	struct list_head pinned_chunks;
 
 	int creating_free_space_tree;
+
+	int no_auto_replace;
 };
 
 struct btrfs_subvolume_writers {
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 1deb5714cc3a..5c5c51319bec 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1876,6 +1876,39 @@ sleep:
 	return 0;
 }
 
+static int btrfs_recuperate(struct btrfs_root *root)
+{
+	int ret = 0;
+	u64 failed_devid = 0;
+	struct btrfs_device *device;
+	struct btrfs_fs_devices *fs_devices;
+
+	fs_devices = root->fs_info->fs_devices;
+
+	/* FIXME: is device_list_mutex needed here? */
+	mutex_lock(&fs_devices->device_list_mutex);
+	rcu_read_lock();
+	list_for_each_entry_rcu(device,
+			&fs_devices->devices, dev_list) {
+		if (device->failed) {
+			failed_devid = device->devid;
+			break;
+		}
+	}
+	rcu_read_unlock();
+	mutex_unlock(&fs_devices->device_list_mutex);
+
+	/*
+	 * We use the replace code, which should be interruptible
+	 * during unmount. As of now we support no userland stop
+	 * request, so the replace runs until it is complete.
+	 */
+	if (failed_devid && !root->fs_info->no_auto_replace)
+		ret = btrfs_auto_replace_start(root, failed_devid);
+
+	return ret;
+}
+
 /*
  * returns:
  * < 0 : Check didn't run, std error
@@ -1951,6 +1984,8 @@ static int health_kthread(void *arg)
 		/* Check devices health */
 		btrfs_update_devices_health(root);
 
+		btrfs_recuperate(root);
+
 		mutex_unlock(&root->fs_info->health_mutex);
 
 sleep:
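
For readers following the control flow outside the kernel tree, the
decision made by btrfs_recuperate() can be sketched as a small userspace
model. This is only an illustration under stated assumptions: struct
model_device, struct model_fs, model_find_failed_devid() and
model_auto_replace_start() are hypothetical stand-ins invented for this
note, not btrfs APIs, and the stub merely records the devid it is given.
Locking is left out entirely. As in the patch, a devid of 0 means "no
failed device found"; the sketch initializes its return value so the
path that starts no replace returns 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for struct btrfs_device. */
struct model_device {
	uint64_t devid;
	bool failed;
};

/* Hypothetical stand-in for the parts of fs_info the patch touches. */
struct model_fs {
	const struct model_device *devices;
	size_t nr_devices;
	int no_auto_replace;		/* mirrors fs_info->no_auto_replace */
	uint64_t replace_started;	/* devid given to the stub, 0 if none */
};

/* Stub in place of btrfs_auto_replace_start(): only records the devid. */
static int model_auto_replace_start(struct model_fs *fs, uint64_t devid)
{
	fs->replace_started = devid;
	return 0;
}

/* Return the devid of the first failed device, or 0 if none failed. */
static uint64_t model_find_failed_devid(const struct model_fs *fs)
{
	for (size_t i = 0; i < fs->nr_devices; i++) {
		if (fs->devices[i].failed)
			return fs->devices[i].devid;
	}
	return 0;
}

/* Same decision as btrfs_recuperate(), minus the kernel locking. */
static int model_recuperate(struct model_fs *fs)
{
	int ret = 0;	/* defined result when no replace is started */
	uint64_t failed_devid;

	assert(fs != NULL);
	failed_devid = model_find_failed_devid(fs);

	if (failed_devid && !fs->no_auto_replace)
		ret = model_auto_replace_start(fs, failed_devid);

	return ret;
}
```

With devices 1 (healthy), 2 (failed) and 3 (failed), the model hands
devid 2 to the replace stub, since only the first failed device is
picked per pass. Setting no_auto_replace to 1 leaves the stub untouched.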