From patchwork Mon Apr 6 11:51:07 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anand Jain X-Patchwork-Id: 11475401 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B45D61805 for ; Mon, 6 Apr 2020 11:51:35 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 93700206F8 for ; Mon, 6 Apr 2020 11:51:35 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="ns/JMUsI" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727575AbgDFLve (ORCPT ); Mon, 6 Apr 2020 07:51:34 -0400 Received: from userp2120.oracle.com ([156.151.31.85]:40632 "EHLO userp2120.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726329AbgDFLve (ORCPT ); Mon, 6 Apr 2020 07:51:34 -0400 Received: from pps.filterd (userp2120.oracle.com [127.0.0.1]) by userp2120.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 036BnHj6027615 for ; Mon, 6 Apr 2020 11:51:32 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : subject : date : message-id : in-reply-to : references; s=corp-2020-01-29; bh=bszKer2+tebRvMuF2lhAhjztr5cP3v/1OBY0PY660F8=; b=ns/JMUsI6FDu5ISrFbA/3P5yiksji5kj0FGZmF8nVcs1+M9+nk8MqV/vvwlIcM2CuBa8 4zkaLIt14BCjoFTRdVitApo9BEehu83eOX0tnaQdK6s2jIQEDBFtfzCVdIr5HsFhQzsU LbJfYkKZF+T3Qa0qRyDAwTjGa8Up4WM+WkQqdzeYiNQ6QwE5RmboV6KjFHKBTVAOx/kr VVSqD7BDc5mKW5C6xxE83ZcNh1fg4ijs9mD5PxFG5v2HI9WwC86BBwdrdo+tlSBOvLQy ahQE498Sf9oAnVR42l+XmH3N4M0JOoM6uRcb5yzN4Tr7WaYLN4FbTvTjx02CcJH86z0/ xg== Received: from userp3030.oracle.com (userp3030.oracle.com [156.151.31.80]) by userp2120.oracle.com with ESMTP id 306jvmx8eb-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK) for ; Mon, 06 Apr 2020 11:51:32 +0000 
Received: from pps.filterd (userp3030.oracle.com [127.0.0.1]) by userp3030.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 036BlEPo064708 for ; Mon, 6 Apr 2020 11:51:32 GMT Received: from userv0121.oracle.com (userv0121.oracle.com [156.151.31.72]) by userp3030.oracle.com with ESMTP id 3073qcw4tp-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK) for ; Mon, 06 Apr 2020 11:51:32 +0000 Received: from abhmp0008.oracle.com (abhmp0008.oracle.com [141.146.116.14]) by userv0121.oracle.com (8.14.4/8.13.8) with ESMTP id 036BpV94000660 for ; Mon, 6 Apr 2020 11:51:31 GMT Received: from tp.localdomain (/39.109.145.141) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Mon, 06 Apr 2020 04:51:31 -0700 From: Anand Jain To: linux-btrfs@vger.kernel.org Subject: [PATCH v7 1/5] btrfs: add btrfs_strmatch helper Date: Mon, 6 Apr 2020 19:51:07 +0800 Message-Id: <1586173871-5559-2-git-send-email-anand.jain@oracle.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1586173871-5559-1-git-send-email-anand.jain@oracle.com> References: <1586173871-5559-1-git-send-email-anand.jain@oracle.com> X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9582 signatures=668685 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 phishscore=0 mlxscore=0 mlxlogscore=999 spamscore=0 bulkscore=0 adultscore=0 malwarescore=0 suspectscore=1 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2004060104 X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9582 signatures=668685 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 phishscore=0 suspectscore=1 mlxlogscore=999 mlxscore=0 bulkscore=0 adultscore=0 priorityscore=1501 lowpriorityscore=0 clxscore=1015 malwarescore=0 impostorscore=0 spamscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2004060104 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org 
Add a generic helper that matches the golden string against the given
string, ignoring any leading and trailing whitespace.

Signed-off-by: Anand Jain
Suggested-by: David Sterba
---
v5: born

 fs/btrfs/sysfs.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index 93cf76118a04..7bb68cef98ab 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -809,6 +809,29 @@ static ssize_t btrfs_checksum_show(struct kobject *kobj,

 BTRFS_ATTR(, checksum, btrfs_checksum_show);

+/*
+ * Match the %golden in the %given. Ignore the leading and trailing whitespaces
+ * if any.
+ */
+static int btrfs_strmatch(const char *given, const char *golden)
+{
+	size_t len = strlen(golden);
+	char *stripped;
+
+	/* strip leading whitespace */
+	stripped = skip_spaces(given);
+
+	if (strncmp(stripped, golden, len) == 0) {
+		/* strip trailing whitespace */
+		if (strlen(skip_spaces(stripped + len)))
+			return -EINVAL;
+
+		return 0;
+	}
+
+	return -EINVAL;
+}
+
 static const struct attribute *btrfs_attrs[] = {
 	BTRFS_ATTR_PTR(, label),
 	BTRFS_ATTR_PTR(, nodesize),

From patchwork Mon Apr 6 11:51:08 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anand Jain X-Patchwork-Id: 11475403 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 07F0F174A for ; Mon, 6 Apr 2020 11:51:40 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id D13F72072A for ; Mon, 6 Apr 2020 11:51:39 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="JvreHjuT" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727797AbgDFLvi (ORCPT ); Mon, 6 Apr 2020 07:51:38 -0400 Received: from aserp2120.oracle.com ([141.146.126.78]:60682 "EHLO
aserp2120.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727717AbgDFLvg (ORCPT ); Mon, 6 Apr 2020 07:51:36 -0400 Received: from pps.filterd (aserp2120.oracle.com [127.0.0.1]) by aserp2120.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 036Bm1EL079091 for ; Mon, 6 Apr 2020 11:51:34 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : subject : date : message-id : in-reply-to : references; s=corp-2020-01-29; bh=U0SHMAK8qmXmvSrWMkcr793HwUl7gEh/nV+MFcBBSzE=; b=JvreHjuTM6ynt5VfaInjFWF0SL/XJi6TSo7QHXYJ59NfhJ3vadoFcgSJ41IDQTxjPrtu mi7bIIftrl6Hadd1byAkLvlUuqGR8GX1FoFbX0eyzl5PfAjvGIqR8/a70oCxmYD/2TIi YvVPuvQ2Od6OpfmAWr9aAG447GXncX0i2X5lTuA65i2NrPBhZ7KnN/NUsTUmqTbtrj7X KVDOHwK53srH8H1qoBwwYngjPOINLxUEG1HHHUpJfH5TdhCFu7/eIdhCu5QEYJUnNeqD LD1I8KFe1dHnlaCLuIth4VWJvEs3zPwkfIPteWGIOaCrbM1FQds/fr35AnLJFiHlBFe1 Pw== Received: from aserp3020.oracle.com (aserp3020.oracle.com [141.146.126.70]) by aserp2120.oracle.com with ESMTP id 306j6m6arf-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK) for ; Mon, 06 Apr 2020 11:51:34 +0000 Received: from pps.filterd (aserp3020.oracle.com [127.0.0.1]) by aserp3020.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 036BlMQE043963 for ; Mon, 6 Apr 2020 11:51:34 GMT Received: from aserv0122.oracle.com (aserv0122.oracle.com [141.146.126.236]) by aserp3020.oracle.com with ESMTP id 30741a96gh-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK) for ; Mon, 06 Apr 2020 11:51:34 +0000 Received: from abhmp0008.oracle.com (abhmp0008.oracle.com [141.146.116.14]) by aserv0122.oracle.com (8.14.4/8.14.4) with ESMTP id 036BpXx1009318 for ; Mon, 6 Apr 2020 11:51:33 GMT Received: from tp.localdomain (/39.109.145.141) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Mon, 06 Apr 2020 04:51:33 -0700 From: Anand Jain To: linux-btrfs@vger.kernel.org Subject: [PATCH v7 2/5] btrfs: create read policy framework Date: Mon, 6 Apr 2020 19:51:08 +0800 Message-Id: 
<1586173871-5559-3-git-send-email-anand.jain@oracle.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1586173871-5559-1-git-send-email-anand.jain@oracle.com> References: <1586173871-5559-1-git-send-email-anand.jain@oracle.com> X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9582 signatures=668685 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=1 phishscore=0 malwarescore=0 bulkscore=0 spamscore=0 adultscore=99 mlxlogscore=999 mlxscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2004060104 X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9582 signatures=668685 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 bulkscore=0 mlxlogscore=999 spamscore=0 priorityscore=1501 suspectscore=1 lowpriorityscore=0 malwarescore=0 impostorscore=0 mlxscore=0 phishscore=0 adultscore=99 clxscore=1015 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2004060104 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org

As of now we use the %pid method to read striped mirrored data, which
means the process id determines the stripe id to be read. This type of
routing typically helps in a system with many small independent processes
trying to read random data. On the other hand, the %pid based read IO
policy is inefficient when a single process reads large data, because the
overall disk bandwidth remains under-utilized.

So this patch introduces a read policy framework so that we can add more
read policies, such as IO routing based on the device's wait-queue, manual
selection when we have a read-preferred device, or a policy based on the
target storage caching.

Signed-off-by: Anand Jain
---
(rebased on 5.6)
v7: Fix missing /* fall through */ in the switch
    Removed Reviewed-by: Josef Bacik
v6:-
v5: Title renamed from:- btrfs: add read_policy framework
    Change log updated.
    Unnecessary comment dropped, added more where necessary.
    Optimize code in the switch, remove duplicate code.
    Define BTRFS_READ_POLICY_DEFAULT dropped.
    Rename enum btrfs_read_policy_type to enum btrfs_read_policy.
    Rename BTRFS_READ_BY_PID to BTRFS_READ_POLICY_PID.
    (As it's mainly renames, Reviewed-by retained.)
v4: -
v3: Declare fs_devices::readmirror as enum btrfs_readmirror_policy_type
v2: Declare fs_devices::readmirror as u8 instead of atomic_t
    A small change in comment and change log wordings.

 fs/btrfs/volumes.c | 15 ++++++++++++++-
 fs/btrfs/volumes.h | 14 ++++++++++++++
 2 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index c1909e5f4506..bafcf10f72ea 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1206,6 +1206,7 @@ static int open_fs_devices(struct btrfs_fs_devices *fs_devices,
 	fs_devices->latest_bdev = latest_dev->bdev;
 	fs_devices->total_rw_bytes = 0;
 	fs_devices->chunk_alloc_policy = BTRFS_CHUNK_ALLOC_REGULAR;
+	fs_devices->read_policy = BTRFS_READ_POLICY_PID;
 out:
 	return ret;
 }
@@ -5445,7 +5446,19 @@ static int find_live_mirror(struct btrfs_fs_info *fs_info,
 	else
 		num_stripes = map->num_stripes;

-	preferred_mirror = first + current->pid % num_stripes;
+	switch (fs_info->fs_devices->read_policy) {
+	default:
+		/*
+		 * Shouldn't happen, just warn and use pid instead of failing.
+		 */
+		btrfs_warn_rl(fs_info,
+			      "unknown read_policy type %u, fallback to pid",
+			      fs_info->fs_devices->read_policy);
+		/* fall through */
+	case BTRFS_READ_POLICY_PID:
+		preferred_mirror = first + current->pid % num_stripes;
+		break;
+	}

 	if (dev_replace_is_ongoing &&
 	    fs_info->dev_replace.cont_reading_from_srcdev_mode ==
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index f067b5934c46..f5ed864e4c5d 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -211,6 +211,15 @@ enum btrfs_chunk_allocation_policy {
 	BTRFS_CHUNK_ALLOC_REGULAR,
 };

+/*
+ * Read policies for the mirrored block groups, read picks the stripe based
+ * on these policies.
+ */
+enum btrfs_read_policy {
+	BTRFS_READ_POLICY_PID,
+	BTRFS_NR_READ_POLICY,
+};
+
 struct btrfs_fs_devices {
 	u8 fsid[BTRFS_FSID_SIZE]; /* FS specific uuid */
 	u8 metadata_uuid[BTRFS_FSID_SIZE];
@@ -264,6 +273,11 @@ struct btrfs_fs_devices {
 	struct completion kobj_unregister;

 	enum btrfs_chunk_allocation_policy chunk_alloc_policy;
+
+	/*
+	 * policy used to read the mirrored stripes
+	 */
+	enum btrfs_read_policy read_policy;
 };

 #define BTRFS_BIO_INLINE_CSUM_SIZE	64

From patchwork Mon Apr 6 11:51:09 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anand Jain X-Patchwork-Id: 11475405 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C01F914DD for ; Mon, 6 Apr 2020 11:51:40 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 9E6242072A for ; Mon, 6 Apr 2020 11:51:40 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="RV3omNjq" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727801AbgDFLvj (ORCPT ); Mon, 6 Apr 2020 07:51:39 -0400 Received: from aserp2120.oracle.com
([141.146.126.78]:60732 "EHLO aserp2120.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727192AbgDFLvj (ORCPT ); Mon, 6 Apr 2020 07:51:39 -0400 Received: from pps.filterd (aserp2120.oracle.com [127.0.0.1]) by aserp2120.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 036BlgAG078970 for ; Mon, 6 Apr 2020 11:51:37 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : subject : date : message-id : in-reply-to : references; s=corp-2020-01-29; bh=EcATJ5oaElG740C/SLF1CHMX2B7HfhpTjWfGX7OL3vw=; b=RV3omNjqeM+5D4OrIyyDTNu5lmpEIvGpKWfJHmtwcVSyAQN7VavOXI97xvb2yVpJ2vsF QVO/PL4sv52da0bJ+rREimsZGhXhOqQ+MWrt0tedrMlfC7uhGNjtLhhT1NvzEZBrXZWq rpnW+yx2br409Z3PUlyNcyLIXFdmNMlvvUK/9yXOM2NUlHLFdLbmD9RSFaQjTC4ANP9h hrcI2D53AcOVpC4dNjccD0v9Xwv5pDZXE6vwEeAABIbxrL63gztiuHcN/ifIMc1ZVDjT npeA4fLlbxtA7WSIiKqD23U2EMELRw5hkj0G2nGBxupV59G63mG/bq1a7wmDIugocOqT /A== Received: from userp3020.oracle.com (userp3020.oracle.com [156.151.31.79]) by aserp2120.oracle.com with ESMTP id 306j6m6arn-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK) for ; Mon, 06 Apr 2020 11:51:37 +0000 Received: from pps.filterd (userp3020.oracle.com [127.0.0.1]) by userp3020.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 036BmPro007355 for ; Mon, 6 Apr 2020 11:51:36 GMT Received: from aserv0121.oracle.com (aserv0121.oracle.com [141.146.126.235]) by userp3020.oracle.com with ESMTP id 30839ptwaa-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK) for ; Mon, 06 Apr 2020 11:51:36 +0000 Received: from abhmp0008.oracle.com (abhmp0008.oracle.com [141.146.116.14]) by aserv0121.oracle.com (8.14.4/8.13.8) with ESMTP id 036BpZZT025403 for ; Mon, 6 Apr 2020 11:51:35 GMT Received: from tp.localdomain (/39.109.145.141) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Mon, 06 Apr 2020 04:51:35 -0700 From: Anand Jain To: linux-btrfs@vger.kernel.org Subject: [PATCH v7 3/5] btrfs: create read policy sysfs attribute, pid Date: Mon, 6 Apr 2020 
19:51:09 +0800 Message-Id: <1586173871-5559-4-git-send-email-anand.jain@oracle.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1586173871-5559-1-git-send-email-anand.jain@oracle.com> References: <1586173871-5559-1-git-send-email-anand.jain@oracle.com> X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9582 signatures=668685 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 phishscore=0 spamscore=0 malwarescore=0 mlxscore=0 mlxlogscore=999 bulkscore=0 suspectscore=1 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2004060104 X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9582 signatures=668685 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 bulkscore=0 mlxlogscore=999 spamscore=0 priorityscore=1501 suspectscore=1 lowpriorityscore=0 malwarescore=0 impostorscore=0 mlxscore=0 phishscore=0 adultscore=0 clxscore=1015 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2004060104 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org

Add the /sys/fs/btrfs/UUID/read_policy attribute so that the read policy
for the raid1 and raid10 chunks can be tuned.

When this attribute is read, it shows all available policies, with the
active policy enclosed in [ ]. The read_policy attribute can be written
with one of the items listed there.

For example:
  $ cat /sys/fs/btrfs/UUID/read_policy
  [pid]
  $ echo pid > /sys/fs/btrfs/UUID/read_policy

Signed-off-by: Anand Jain
---
v5:
  Title rename: old: btrfs: sysfs, add read_policy attribute
  Uses the btrfs_strmatch() helper (BTRFS_READ_POLICY_NAME_MAX dropped).
  Use the table for the policy names.
  Rename len to ret.
  Use a simple logic to prefix space in btrfs_read_policy_show()
  Reviewed-by: Josef Bacik dropped.
v4:-
v3: rename [by_pid] to [pid]
v2: check input len before strip and kstrdup

 fs/btrfs/sysfs.c | 49 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 49 insertions(+)

diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index 7bb68cef98ab..c9a8850b186a 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -832,6 +832,54 @@ static int btrfs_strmatch(const char *given, const char *golden)
 	return -EINVAL;
 }

+static const char* const btrfs_read_policy_name[] = { "pid" };
+
+static ssize_t btrfs_read_policy_show(struct kobject *kobj,
+				      struct kobj_attribute *a, char *buf)
+{
+	int i;
+	ssize_t ret = 0;
+	struct btrfs_fs_devices *fs_devices = to_fs_devs(kobj);
+
+	for (i = 0; i < BTRFS_NR_READ_POLICY; i++) {
+		if (fs_devices->read_policy == i)
+			ret += snprintf(buf + ret, PAGE_SIZE - ret, "%s[%s]",
+					(ret == 0 ? "" : " "),
+					btrfs_read_policy_name[i]);
+		else
+			ret += snprintf(buf + ret, PAGE_SIZE - ret, "%s%s",
+					(ret == 0 ? "" : " "),
+					btrfs_read_policy_name[i]);
+	}
+
+	ret += snprintf(buf + ret, PAGE_SIZE - ret, "\n");
+
+	return ret;
+}
+
+static ssize_t btrfs_read_policy_store(struct kobject *kobj,
+				       struct kobj_attribute *a,
+				       const char *buf, size_t len)
+{
+	int i;
+	struct btrfs_fs_devices *fs_devices = to_fs_devs(kobj);
+
+	for (i = 0; i < BTRFS_NR_READ_POLICY; i++) {
+		if (btrfs_strmatch(buf, btrfs_read_policy_name[i]) == 0) {
+			if (i != fs_devices->read_policy) {
+				fs_devices->read_policy = i;
+				btrfs_info(fs_devices->fs_info,
+					   "read policy set to '%s'",
+					   btrfs_read_policy_name[i]);
+			}
+			return len;
+		}
+	}
+
+	return -EINVAL;
+}
+BTRFS_ATTR_RW(, read_policy, btrfs_read_policy_show, btrfs_read_policy_store);
+
 static const struct attribute *btrfs_attrs[] = {
 	BTRFS_ATTR_PTR(, label),
 	BTRFS_ATTR_PTR(, nodesize),
@@ -840,6 +888,7 @@ static const struct attribute *btrfs_attrs[] = {
 	BTRFS_ATTR_PTR(, quota_override),
 	BTRFS_ATTR_PTR(, metadata_uuid),
 	BTRFS_ATTR_PTR(, checksum),
+	BTRFS_ATTR_PTR(, read_policy),
 	NULL,
 };

From patchwork Mon Apr 6
11:51:10 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anand Jain X-Patchwork-Id: 11475407 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B127614DD for ; Mon, 6 Apr 2020 11:51:41 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 8FF972072A for ; Mon, 6 Apr 2020 11:51:41 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="gyvKjkoR" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727822AbgDFLvk (ORCPT ); Mon, 6 Apr 2020 07:51:40 -0400 Received: from userp2120.oracle.com ([156.151.31.85]:40696 "EHLO userp2120.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727717AbgDFLvj (ORCPT ); Mon, 6 Apr 2020 07:51:39 -0400 Received: from pps.filterd (userp2120.oracle.com [127.0.0.1]) by userp2120.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 036BmsVE027369 for ; Mon, 6 Apr 2020 11:51:39 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : subject : date : message-id : in-reply-to : references; s=corp-2020-01-29; bh=667cYBjl1Ex8+r7OzuZZfGNo9GXxKqYez6IrgT8bItY=; b=gyvKjkoRPUShGzml12mfegD3Stij0dmvp6fgAeANEKTMtSkR2gxFNuT/NVC253ymh0Ai 5mOC2OsxrGL/XaKUfbd3if6nA6akRNyMVVO6Kh7OunOSRiM7cUZilWu4lHG8BsWEXvA3 oLk0+09lcqeMzrRQyA3ytJtgiBGPQRpHRbi2gC/UkBlr80lRScM4P8t5HuVW+CUsLKkX VjCV0Gn9T9q0dVkd2dOWwg09LBx00ynr9I+tSo3fGAw/dr0YC89dLk55XZ871rgaPBUe jkt1MYQAvgbWAGDBU4HLeRX1DoZzT1QG2IaV05RLHhRGJfjzCGx3YEQvru5rrdytEEkS Tg== Received: from aserp3020.oracle.com (aserp3020.oracle.com [141.146.126.70]) by userp2120.oracle.com with ESMTP id 306jvmx8ev-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK) for ; Mon, 06 Apr 2020 11:51:39 +0000 Received: from 
pps.filterd (aserp3020.oracle.com [127.0.0.1]) by aserp3020.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 036BlOvq044059 for ; Mon, 6 Apr 2020 11:51:38 GMT Received: from aserv0121.oracle.com (aserv0121.oracle.com [141.146.126.235]) by aserp3020.oracle.com with ESMTP id 30741a96qn-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK) for ; Mon, 06 Apr 2020 11:51:38 +0000 Received: from abhmp0008.oracle.com (abhmp0008.oracle.com [141.146.116.14]) by aserv0121.oracle.com (8.14.4/8.13.8) with ESMTP id 036Bpb5R025445 for ; Mon, 6 Apr 2020 11:51:37 GMT Received: from tp.localdomain (/39.109.145.141) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Mon, 06 Apr 2020 04:51:37 -0700 From: Anand Jain To: linux-btrfs@vger.kernel.org Subject: [PATCH v7 4/5] btrfs: introduce new device-state read_preferred Date: Mon, 6 Apr 2020 19:51:10 +0800 Message-Id: <1586173871-5559-5-git-send-email-anand.jain@oracle.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1586173871-5559-1-git-send-email-anand.jain@oracle.com> References: <1586173871-5559-1-git-send-email-anand.jain@oracle.com> X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9582 signatures=668685 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=1 phishscore=0 malwarescore=0 bulkscore=0 spamscore=0 adultscore=0 mlxlogscore=999 mlxscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2004060104 X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9582 signatures=668685 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 phishscore=0 suspectscore=1 mlxlogscore=999 mlxscore=0 bulkscore=0 adultscore=0 priorityscore=1501 lowpriorityscore=0 clxscore=1015 malwarescore=0 impostorscore=0 spamscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2004060104 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: 
linux-btrfs@vger.kernel.org

This is a preparatory patch. It introduces a new generic device flag,
'read_preferred'. Together with the read_policy 'device' added in the
following patch, it lets the user route read IO to the device of choice.

This also provides a sysfs interface to set the device state as
read_preferred.

Signed-off-by: Anand Jain
---
v7: Change log updated.
v6: If there is no change in device's read prefer then don't log.
    Add pid to the logs.
v5: born

 fs/btrfs/sysfs.c   | 55 ++++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/volumes.h |  1 +
 2 files changed, 56 insertions(+)

diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index c9a8850b186a..72daaedb7b04 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -1317,11 +1317,66 @@ static ssize_t btrfs_devinfo_writeable_show(struct kobject *kobj,
 }
 BTRFS_ATTR(devid, writeable, btrfs_devinfo_writeable_show);

+static ssize_t btrfs_devinfo_read_pref_show(struct kobject *kobj,
+					    struct kobj_attribute *a, char *buf)
+{
+	int val;
+	struct btrfs_device *device = container_of(kobj, struct btrfs_device,
+						   devid_kobj);
+
+	val = !!test_bit(BTRFS_DEV_STATE_READ_PREFERRED, &device->dev_state);
+
+	return snprintf(buf, PAGE_SIZE, "%d\n", val);
+}
+
+static ssize_t btrfs_devinfo_read_pref_store(struct kobject *kobj,
+					     struct kobj_attribute *a,
+					     const char *buf, size_t len)
+{
+	int ret;
+	unsigned long val;
+	struct btrfs_device *device;
+
+	ret = kstrtoul(skip_spaces(buf), 0, &val);
+	if (ret)
+		return ret;
+
+	if (val != 0 && val != 1)
+		return -EINVAL;
+
+	/*
+	 * lock is not required, the btrfs_device struct can't be freed while
+	 * its kobject btrfs_device::devid_kobj is still open.
+	 */
+	device = container_of(kobj, struct btrfs_device, devid_kobj);
+
+	if (val &&
+	    !test_bit(BTRFS_DEV_STATE_READ_PREFERRED, &device->dev_state)) {
+
+		set_bit(BTRFS_DEV_STATE_READ_PREFERRED, &device->dev_state);
+		btrfs_info(device->fs_devices->fs_info,
+			   "set read preferred on devid %llu (%d)",
+			   device->devid, task_pid_nr(current));
+	} else if (!val &&
+		   test_bit(BTRFS_DEV_STATE_READ_PREFERRED, &device->dev_state)) {
+
+		clear_bit(BTRFS_DEV_STATE_READ_PREFERRED, &device->dev_state);
+		btrfs_info(device->fs_devices->fs_info,
+			   "reset read preferred on devid %llu (%d)",
+			   device->devid, task_pid_nr(current));
+	}
+
+	return len;
+}
+BTRFS_ATTR_RW(devid, read_preferred, btrfs_devinfo_read_pref_show,
+	      btrfs_devinfo_read_pref_store);
+
 static struct attribute *devid_attrs[] = {
 	BTRFS_ATTR_PTR(devid, in_fs_metadata),
 	BTRFS_ATTR_PTR(devid, missing),
 	BTRFS_ATTR_PTR(devid, replace_target),
 	BTRFS_ATTR_PTR(devid, writeable),
+	BTRFS_ATTR_PTR(devid, read_preferred),
 	NULL
 };
 ATTRIBUTE_GROUPS(devid);
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 1775d35706ab..487a54c3140e 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -50,6 +50,7 @@ struct btrfs_io_geometry {
 #define BTRFS_DEV_STATE_MISSING		(2)
 #define BTRFS_DEV_STATE_REPLACE_TGT	(3)
 #define BTRFS_DEV_STATE_FLUSH_SENT	(4)
+#define BTRFS_DEV_STATE_READ_PREFERRED	(5)

 struct btrfs_device {
 	struct list_head dev_list; /* device_list_mutex */

From patchwork Mon Apr 6 11:51:11 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anand Jain X-Patchwork-Id: 11475409 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 15C9814DD for ; Mon, 6 Apr 2020 11:51:43 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id DFDEE2072A for ; Mon, 6 Apr 2020 11:51:42 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com
header.i=@oracle.com header.b="G/s2qAh+" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727831AbgDFLvm (ORCPT ); Mon, 6 Apr 2020 07:51:42 -0400 Received: from userp2120.oracle.com ([156.151.31.85]:40718 "EHLO userp2120.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727717AbgDFLvl (ORCPT ); Mon, 6 Apr 2020 07:51:41 -0400 Received: from pps.filterd (userp2120.oracle.com [127.0.0.1]) by userp2120.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 036BmsVG027369 for ; Mon, 6 Apr 2020 11:51:40 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : subject : date : message-id : in-reply-to : references; s=corp-2020-01-29; bh=b8D3KCYlcHlSc8ubkZB8GxPYL1MW6t6EH36FG0m0ifM=; b=G/s2qAh+An6B5OvM0+iNFCGJopAMmRQvyol2fhxv9UuJMtXfvDRCcs6pJRQCUdoeRYmY ioAAIhvs//676tezxhzIaR69DyBacWQfn8SBlNVDjgh3KsPrvG5VtJB24t4/jLwFG+RA l9HSaRgzjtVYMtByYiR+CqdiAR9lqf02tvmpw573AjyoO0gc/mLWVM6/kqY6pT7k+f44 R92QKmUoIbsU4mpHSU+n3kP7nJee03Ck6E5jkOwKpqFiXH4MvWCYPLu5RP+U3cYC4WGa thV0zjf4LFE/OTo/4e63rVDyLrAR1gv3ddA11d8RqgTTNFiu4my3jcqng8BGkjBEPMkY fA== Received: from userp3030.oracle.com (userp3030.oracle.com [156.151.31.80]) by userp2120.oracle.com with ESMTP id 306jvmx8f3-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK) for ; Mon, 06 Apr 2020 11:51:40 +0000 Received: from pps.filterd (userp3030.oracle.com [127.0.0.1]) by userp3030.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 036BlEJo064751 for ; Mon, 6 Apr 2020 11:51:40 GMT Received: from aserv0122.oracle.com (aserv0122.oracle.com [141.146.126.236]) by userp3030.oracle.com with ESMTP id 3073qcw523-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK) for ; Mon, 06 Apr 2020 11:51:40 +0000 Received: from abhmp0008.oracle.com (abhmp0008.oracle.com [141.146.116.14]) by aserv0122.oracle.com (8.14.4/8.14.4) with ESMTP id 036BpdnM009335 for ; Mon, 6 Apr 2020 11:51:39 GMT Received: from tp.localdomain (/39.109.145.141) by default (Oracle 
Beehive Gateway v4.0) with ESMTP ; Mon, 06 Apr 2020 04:51:39 -0700 From: Anand Jain To: linux-btrfs@vger.kernel.org Subject: [PATCH v7 5/5] btrfs: introduce new read_policy device Date: Mon, 6 Apr 2020 19:51:11 +0800 Message-Id: <1586173871-5559-6-git-send-email-anand.jain@oracle.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1586173871-5559-1-git-send-email-anand.jain@oracle.com> References: <1586173871-5559-1-git-send-email-anand.jain@oracle.com> X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9582 signatures=668685 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 phishscore=0 mlxscore=0 mlxlogscore=999 spamscore=0 bulkscore=0 adultscore=0 malwarescore=0 suspectscore=1 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2004060104 X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9582 signatures=668685 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 phishscore=0 suspectscore=1 mlxlogscore=999 mlxscore=0 bulkscore=0 adultscore=0 priorityscore=1501 lowpriorityscore=0 clxscore=1015 malwarescore=0 impostorscore=0 spamscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2004060104 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org

Read-policy type 'device' and device flag 'read-preferred':

The read-policy type 'device' picks the device(s) flagged as
read-preferred for reading chunks of type raid1, raid10, raid1c3 and
raid1c4.

A system might contain ssd, nvme, iscsi or san lun devices, all of which
are non-rotational, so it's not a good idea to set read-preferred
automatically. Instead, the 'device' read-policy along with the
read-preferred flag provides the ability to do it manually. This advanced
tuning is useful in more than one situation, for example:
 - In a heterogeneous-disk volume, it provides the ability to choose the
   low latency disks for reading.
 - Useful for more accurate testing.
 - Avoid reading from a known problematic device until it is replaced
   (by marking the good devices as read-preferred).

Note:

If the read-policy type is set to 'device', but no device is flagged as
read-preferred, then stripe 0 is used for reading.

Device replace won't migrate the read-preferred flag to the new replace
target device.

As of now this is an in-memory only feature.

It's pointless to set the read-preferred flag on a missing device, as IOs
aren't submitted to a missing device.

If there is more than one read-preferred device in a chunk, the read IO
goes to the first read-preferred device found (as of now; once the qdepth
patches are integrated we will use the least busy device among the
read-preferred devices).

Usage example:

Consider a typical two-disk raid1.

Configure devid 1 for reading.

$ echo 1 > devinfo/1/read_preferred
$ cat devinfo/1/read_preferred; cat devinfo/2/read_preferred
1
0

$ pwd
/sys/fs/btrfs/12345678-1234-1234-1234-123456789abc

$ cat read_policy; echo device > ./read_policy; cat read_policy
[pid] device
pid [device]

Now read IOs are sent to devid 1 (sdb).

$ echo 3 > /proc/sys/vm/drop_caches; md5sum /btrfs/YkZI

$ iostat -zy 1 | egrep 'sdb|sdc' (from another terminal)
sdb 50.00 40048.00 0.00 40048 0

Change the read-preferred device from devid 1 to devid 2 (sdc).

$ echo 0 > ./devinfo/1/read_preferred; echo 1 > ./devinfo/2/read_preferred;

[ 3343.918658] BTRFS info (device sdb): reset read preferred on devid 1 (1334)
[ 3343.919876] BTRFS info (device sdb): set read preferred on devid 2 (1334)

Further read IOs are sent to devid 2 (sdc).

$ echo 3 > /proc/sys/vm/drop_caches; md5sum /btrfs/YkZI

$ iostat -zy 1 | egrep 'sdb|sdc' (from another terminal)
sdc 49.00 40048.00 0.00 40048 0

Signed-off-by: Anand Jain
---
v7: Change log updated.
v6:
  . If there isn't a read preferred device in the chunk, don't reset
    read policy to default, instead just use stripe 0.
    As this is in the read path, it avoids going through the device list
    to find a read preferred device. In line with this, drop the check
    for a read preferred device before setting read policy to device.
  . Commit log updated. Adds more info about this new feature.
v5: born

 fs/btrfs/sysfs.c   |  3 ++-
 fs/btrfs/volumes.c | 24 ++++++++++++++++++++++++
 fs/btrfs/volumes.h |  1 +
 3 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index 72daaedb7b04..af53ed879dd6 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -832,7 +832,8 @@ static int btrfs_strmatch(const char *given, const char *golden)
 	return -EINVAL;
 }

-static const char* const btrfs_read_policy_name[] = { "pid" };
+/* Must follow the order as in enum btrfs_read_policy */
+static const char* const btrfs_read_policy_name[] = { "pid", "device" };

 static ssize_t btrfs_read_policy_show(struct kobject *kobj,
 				      struct kobj_attribute *a, char *buf)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 9dd7e3687463..5e53380e1d8d 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -5380,6 +5380,26 @@ int btrfs_is_parity_mirror(struct btrfs_fs_info *fs_info, u64 logical, u64 len)
 	return ret;
 }

+static int btrfs_find_read_preferred(struct map_lookup *map, int num_stripe)
+{
+	int i;
+
+	/*
+	 * If there are more than one read preferred devices, then just pick the
+	 * first found read preferred device as of now. Once we have the Qdepth
+	 * based device selection, we could pick the least busy device among the
+	 * read preferred devices.
+	 */
+	for (i = 0; i < num_stripe; i++) {
+		if (test_bit(BTRFS_DEV_STATE_READ_PREFERRED,
+			     &map->stripes[i].dev->dev_state))
+			return i;
+	}
+
+	/* If there is no read preferred device then just use stripe 0 */
+	return 0;
+}
+
 static int find_live_mirror(struct btrfs_fs_info *fs_info,
 			    struct map_lookup *map, int first,
 			    int dev_replace_is_ongoing)
@@ -5399,6 +5419,10 @@ static int find_live_mirror(struct btrfs_fs_info *fs_info,
 		num_stripes = map->num_stripes;

 	switch (fs_info->fs_devices->read_policy) {
+	case BTRFS_READ_POLICY_DEVICE:
+		preferred_mirror = btrfs_find_read_preferred(map, num_stripes);
+		preferred_mirror = first + preferred_mirror;
+		break;
 	default:
 		/*
 		 * Shouldn't happen, just warn and use pid instead of failing.
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 487a54c3140e..efa9635a4748 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -214,6 +214,7 @@ BTRFS_DEVICE_GETSET_FUNCS(bytes_used);
  */
 enum btrfs_read_policy {
 	BTRFS_READ_POLICY_PID,
+	BTRFS_READ_POLICY_DEVICE,
 	BTRFS_NR_READ_POLICY,
 };
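
Illustrative note (not part of the posted series): the mirror selection
that the 'pid' and 'device' policies perform can be modelled in user space
roughly as below. This is only a sketch under stated assumptions. The names
read_policy, stripe, find_read_preferred() and pick_mirror() are
hypothetical stand-ins, not kernel symbols. The sketch also scans the
stripes starting at 'first', whereas the posted helper indexes
map->stripes[] from 0 and adds 'first' afterwards; scanning from 'first'
is an assumption about the intent, not something the series states.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-ins; these are not the kernel's types. */
enum read_policy {
	READ_POLICY_PID,
	READ_POLICY_DEVICE,
	NR_READ_POLICY,
};

struct stripe {
	bool read_preferred;	/* models BTRFS_DEV_STATE_READ_PREFERRED */
};

/* Return the index of the first read-preferred stripe, or 0 if none is flagged. */
static int find_read_preferred(const struct stripe *stripes, int num_stripes)
{
	for (int i = 0; i < num_stripes; i++) {
		if (stripes[i].read_preferred)
			return i;
	}
	return 0;
}

/*
 * Pick the preferred mirror for a read. 'first' is the index of the first
 * stripe of the mirror set and 'pid' is the id of the reading process.
 * Unknown policy values fall back to the pid policy.
 */
static int pick_mirror(enum read_policy policy, const struct stripe *stripes,
		       int first, int num_stripes, int pid)
{
	assert(num_stripes > 0 && pid >= 0);

	switch (policy) {
	case READ_POLICY_DEVICE:
		/* Assumption: scan only the mirror set that starts at 'first'. */
		return first + find_read_preferred(stripes + first, num_stripes);
	case READ_POLICY_PID:
	default:
		return first + pid % num_stripes;
	}
}
```

With the pid policy a single reader always lands on the same mirror, which
is the under-utilization the series describes. With the device policy the
result depends only on the flags, so it stays stable across processes.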