From patchwork Sat Apr 2 01:30:49 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Anand Jain
X-Patchwork-Id: 8730111
Return-Path:
X-Original-To: patchwork-linux-btrfs@patchwork.kernel.org
Delivered-To: patchwork-parsemail@patchwork2.web.kernel.org
Received: from mail.kernel.org (mail.kernel.org [198.145.29.136])
	by patchwork2.web.kernel.org (Postfix) with ESMTP id B386BC0553
	for ; Sat, 2 Apr 2016 01:31:51 +0000 (UTC)
Received: from mail.kernel.org (localhost [127.0.0.1])
	by mail.kernel.org (Postfix) with ESMTP id AB503200F2
	for ; Sat, 2 Apr 2016 01:31:50 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by mail.kernel.org (Postfix) with ESMTP id 8793A20386
	for ; Sat, 2 Apr 2016 01:31:49 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S932427AbcDBBbl (ORCPT ); Fri, 1 Apr 2016 21:31:41 -0400
Received: from userp1040.oracle.com ([156.151.31.81]:48450 "EHLO
	userp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S932359AbcDBBb2 (ORCPT );
	Fri, 1 Apr 2016 21:31:28 -0400
Received: from userv0021.oracle.com (userv0021.oracle.com [156.151.31.71])
	by userp1040.oracle.com (Sentrion-MTA-4.3.2/Sentrion-MTA-4.3.2)
	with ESMTP id u321VPDa032149
	(version=TLSv1 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK);
	Sat, 2 Apr 2016 01:31:25 GMT
Received: from userv0121.oracle.com (userv0121.oracle.com [156.151.31.72])
	by userv0021.oracle.com (8.13.8/8.13.8) with ESMTP id u321VP26023968
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK);
	Sat, 2 Apr 2016 01:31:25 GMT
Received: from abhmp0019.oracle.com (abhmp0019.oracle.com [141.146.116.25])
	by userv0121.oracle.com (8.13.8/8.13.8) with ESMTP id u321VP7v014175;
	Sat, 2 Apr 2016 01:31:25 GMT
Received: from arch2.localdomain (/42.60.24.64)
	by default (Oracle Beehive Gateway v4.0) with ESMTP ;
	Fri, 01 Apr 2016 18:31:24 -0700
From: Anand Jain
To: linux-btrfs@vger.kernel.org
Cc: yauhen.kharuzhy@zavadatar.com, dsterba@suse.cz
Subject: [PATCH 11/13] btrfs: introduce device dynamic state transition to offline or failed
Date: Sat, 2 Apr 2016 09:30:49 +0800
Message-Id: <1459560651-14809-12-git-send-email-anand.jain@oracle.com>
X-Mailer: git-send-email 2.7.0
In-Reply-To: <1459560651-14809-1-git-send-email-anand.jain@oracle.com>
References: <1459560651-14809-1-git-send-email-anand.jain@oracle.com>
X-Source-IP: userv0021.oracle.com [156.151.31.71]
Sender: linux-btrfs-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-btrfs@vger.kernel.org
X-Spam-Status: No, score=-7.9 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_HI,
	RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

This patch provides helper functions to force a device into the offline
or failed state. These device states are needed for the following
reasons:

1) a. a device failure can be reported when it happens
   b. the device is closed when it goes offline, so that the block
      layer can clean up
2) to identify the candidate for auto replace
3) to avoid further commit errors being reported against the failing
   device
4) a device in a multi-device btrfs may go offline from the system
   (as of now, in some system configurations btrfs gets unmounted in
   this context, which is not correct behavior)

Signed-off-by: Anand Jain
Tested-by: Austin S. Hemmelgarn
---
 fs/btrfs/volumes.c | 137 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/volumes.h |  13 +++++
 2 files changed, 150 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 072cefac958c..eb9f28504d3f 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -7149,3 +7149,140 @@ out:
 	read_unlock(&map_tree->map_tree.lock);
 	return ret;
 }
+
+static void __close_device(struct work_struct *work)
+{
+	struct btrfs_device *device;
+
+	device = container_of(work, struct btrfs_device, rcu_work);
+
+	if (device->bdev)
+		blkdev_put(device->bdev, device->mode);
+
+	device->bdev = NULL;
+}
+
+static void close_device(struct rcu_head *head)
+{
+	struct btrfs_device *device;
+
+	device = container_of(head, struct btrfs_device, rcu);
+
+	INIT_WORK(&device->rcu_work, __close_device);
+	schedule_work(&device->rcu_work);
+}
+
+void btrfs_close_one_device_dont_free(struct btrfs_device *device)
+{
+	struct btrfs_fs_devices *fs_devices = device->fs_devices;
+
+	if (device->bdev)
+		fs_devices->open_devices--;
+
+	if (device->writeable &&
+	    device->devid != BTRFS_DEV_REPLACE_DEVID) {
+		list_del_init(&device->dev_alloc_list);
+		fs_devices->rw_devices--;
+	}
+
+	device->writeable = 0;
+
+	call_rcu(&device->rcu, close_device);
+}
+
+void force_device_close(struct btrfs_device *device)
+{
+	struct btrfs_device *next_device;
+	struct btrfs_fs_devices *fs_devices;
+
+	fs_devices = device->fs_devices;
+
+	mutex_lock(&fs_devices->device_list_mutex);
+	lock_chunks(fs_devices->fs_info->fs_root);
+
+	next_device = list_entry(fs_devices->devices.next,
+				 struct btrfs_device, dev_list);
+	if (device->bdev == fs_devices->fs_info->sb->s_bdev)
+		fs_devices->fs_info->sb->s_bdev = next_device->bdev;
+
+	if (device->bdev == fs_devices->latest_bdev)
+		fs_devices->latest_bdev = next_device->bdev;
+
+	btrfs_close_one_device_dont_free(device);
+
+	/*
+	 * TODO: works for now, but it's better to keep the state of
+	 * missing and offline different, and update the rest of the
+	 * places where we check for only missing and not for failed
+	 * or offline as of now.
+	 */
+	device->missing = 1;
+	fs_devices->missing_devices++;
+	device->writeable = 0;
+
+	rcu_barrier();
+
+	unlock_chunks(fs_devices->fs_info->fs_root);
+	mutex_unlock(&fs_devices->device_list_mutex);
+}
+
+void btrfs_enforce_device_state(struct btrfs_device *dev, char *why)
+{
+	bool degrade_option;
+	int tolerated_fail;
+	struct btrfs_fs_info *fs_info;
+	struct btrfs_fs_devices *fs_devices;
+
+	fs_devices = dev->fs_devices;
+	fs_info = fs_devices->fs_info;
+	degrade_option = btrfs_test_opt(fs_info->fs_root, DEGRADED);
+
+	/* todo: support seed later */
+	if (fs_devices->seeding)
+		return;
+
+	/* this shouldn't be called if device is already missing */
+	if (dev->missing || !dev->bdev)
+		return;
+
+	if (dev->offline || dev->failed)
+		return;
+
+	/* Only one RW device is left: don't force close, let FS handle it */
+	if (fs_devices->rw_devices == 1) {
+		btrfs_std_error(fs_info, -EIO,
+			"force offline last RW device");
+		return;
+	}
+
+	if (!strcmp(why, "offline"))
+		dev->offline = 1;
+	else if (!strcmp(why, "failed"))
+		dev->failed = 1;
+	else
+		return;
+
+	btrfs_sysfs_rm_device_link(fs_devices, dev);
+
+	force_device_close(dev);
+
+	tolerated_fail = btrfs_check_degradable(fs_info,
+					fs_info->sb->s_flags);
+	if (tolerated_fail > 0) {
+		btrfs_warn_in_rcu(fs_info, "device %s %s, chunks degraded",
+			rcu_str_deref(dev->name), why);
+	} else if (tolerated_fail < 0) {
+		btrfs_warn_in_rcu(fs_info,
+			"device %s %s, chunks failed",
+			rcu_str_deref(dev->name), why);
+		btrfs_std_error(fs_info, -EIO, "devices below critical level");
+	} else {
+		btrfs_warn_in_rcu(fs_info,
+			"device %s %s, No chunks are degraded",
+			rcu_str_deref(dev->name), why);
+	}
+	btrfs_info_in_rcu(fs_info,
+		"num_devices %llu rw_devices %llu degraded-option: %s",
+		fs_devices->num_devices, fs_devices->rw_devices,
+		degrade_option ? "set":"unset");
+}
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index b4308afa3097..da7d3b8ba50e 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -72,7 +72,19 @@ struct btrfs_device {
 
 	int writeable;
 	int in_fs_metadata;
+	/* missing: device wasn't found at the time of mount */
 	int missing;
+	/* failed: device confirmed to have experienced critical io failure */
+	int failed;
+	/*
+	 * offline: system or user or block layer transport has removed or
+	 * offlined the device which was once present and without going
+	 * through unmount. Implies an interim communication breakdown
+	 * and not necessarily a candidate for the device replace. And
+	 * device might be online after user intervention or after
+	 * block transport layer error recovery.
+	 */
+	int offline;
 	int can_discard;
 	int is_tgtdev_for_dev_replace;
 
@@ -575,5 +587,6 @@ struct list_head *btrfs_get_fs_uuids(void);
 void btrfs_set_fs_info_ptr(struct btrfs_fs_info *fs_info);
 void btrfs_reset_fs_info_ptr(struct btrfs_fs_info *fs_info);
 int btrfs_check_degradable(struct btrfs_fs_info *fs_info, unsigned flags);
+void btrfs_enforce_device_state(struct btrfs_device *dev, char *why);
 
 #endif