From patchwork Wed Jun 12 06:48:55 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 10988683 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 49BD96C5 for ; Wed, 12 Jun 2019 06:51:20 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 33A8328987 for ; Wed, 12 Jun 2019 06:51:20 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 1DAAE2899F; Wed, 12 Jun 2019 06:51:20 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.0 required=2.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI, UNPARSEABLE_RELAY autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id DD373289AB for ; Wed, 12 Jun 2019 06:51:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2404582AbfFLGvS (ORCPT ); Wed, 12 Jun 2019 02:51:18 -0400 Received: from userp2120.oracle.com ([156.151.31.85]:35850 "EHLO userp2120.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2403835AbfFLGvS (ORCPT ); Wed, 12 Jun 2019 02:51:18 -0400 Received: from pps.filterd (userp2120.oracle.com [127.0.0.1]) by userp2120.oracle.com (8.16.0.27/8.16.0.27) with SMTP id x5C6mwXK055436; Wed, 12 Jun 2019 06:50:58 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=subject : from : to : cc : date : message-id : in-reply-to : references : mime-version : content-type : content-transfer-encoding; s=corp-2018-07-02; bh=eIAf1m2AR/bPmVTeiHenMrmQtJoSnA2UaqNNug8sYJ0=; b=t+73FP8b2eWhlGqg/JmUBc/0RrzpKsj/Va8zmayM7w12+6/eDXCZ/tAumsIRa45hrLaC ++CBxZVWoJhoeaAuEIb3I0a8L0NuY1NZli3wKve9p0b7Iei+LwT0BOX5rNHrqwlfsTNn eM3AX0bh0qNaryh1rOQb7S7zSv/8iXAsA68ZFpKgnnoyppTy/XmBU6EHChguk7Wrfnni bHd1HQFssFi4ySW3H2o7cb1frFTk3G5UKiAKVU86rQgF6WkYHS7Z4f03GSfU3scyUgns Do0ORP9FQ9y4R83/R7eptAKvj6AcBJBKt+T5ocKR5hf3xEwfDq5+LXdDv+pmwrkWY95V 9Q== Received: from aserp3030.oracle.com (aserp3030.oracle.com [141.146.126.71]) by userp2120.oracle.com with ESMTP id 2t05nqsc2t-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 12 Jun 2019 06:50:58 +0000 Received: from pps.filterd (aserp3030.oracle.com [127.0.0.1]) by aserp3030.oracle.com (8.16.0.27/8.16.0.27) with SMTP id x5C6mt20049391; Wed, 12 Jun 2019 06:48:57 GMT Received: from aserv0122.oracle.com (aserv0122.oracle.com [141.146.126.236]) by aserp3030.oracle.com with ESMTP id 2t04hyrwqd-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 12 Jun 2019 06:48:57 +0000 Received: from abhmp0012.oracle.com (abhmp0012.oracle.com [141.146.116.18]) by aserv0122.oracle.com (8.14.4/8.14.4) with ESMTP id x5C6mvLq019048; Wed, 12 Jun 2019 06:48:57 GMT Received: from localhost (/67.169.218.210) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Tue, 11 Jun 2019 23:48:56 -0700 Subject: [PATCH 13/14] xfs: multithreaded iwalk implementation From: "Darrick J. Wong" To: darrick.wong@oracle.com Cc: linux-xfs@vger.kernel.org, bfoster@redhat.com Date: Tue, 11 Jun 2019 23:48:55 -0700 Message-ID: <156032213550.3774243.7211431131768873383.stgit@magnolia> In-Reply-To: <156032205136.3774243.15725828509940520561.stgit@magnolia> References: <156032205136.3774243.15725828509940520561.stgit@magnolia> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9285 signatures=668687 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=3 malwarescore=0 phishscore=0 bulkscore=0 spamscore=0 mlxscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1810050000 definitions=main-1906120047 X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9285 signatures=668687 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 priorityscore=1501 malwarescore=0 suspectscore=3 phishscore=0 bulkscore=0 spamscore=0 clxscore=1015 lowpriorityscore=0 mlxscore=0 impostorscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1810050000 definitions=main-1906120047 Sender: linux-xfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Darrick J. Wong Create a parallel iwalk implementation and switch quotacheck to use it. Signed-off-by: Darrick J. Wong --- fs/xfs/Makefile | 1 fs/xfs/xfs_globals.c | 3 + fs/xfs/xfs_iwalk.c | 82 +++++++++++++++++++++++++++++++++ fs/xfs/xfs_iwalk.h | 2 + fs/xfs/xfs_pwork.c | 126 ++++++++++++++++++++++++++++++++++++++++++++++++++ fs/xfs/xfs_pwork.h | 58 +++++++++++++++++++++++ fs/xfs/xfs_qm.c | 2 - fs/xfs/xfs_sysctl.h | 6 ++ fs/xfs/xfs_sysfs.c | 40 ++++++++++++++++ fs/xfs/xfs_trace.h | 18 +++++++ 10 files changed, 337 insertions(+), 1 deletion(-) create mode 100644 fs/xfs/xfs_pwork.c create mode 100644 fs/xfs/xfs_pwork.h diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile index 74d30ef0dbce..48940a27d4aa 100644 --- a/fs/xfs/Makefile +++ b/fs/xfs/Makefile @@ -84,6 +84,7 @@ xfs-y += xfs_aops.o \ xfs_message.o \ xfs_mount.o \ xfs_mru_cache.o \ + xfs_pwork.o \ xfs_reflink.o \ xfs_stats.o \ xfs_super.o \ diff --git a/fs/xfs/xfs_globals.c b/fs/xfs/xfs_globals.c index d0d377384120..4f93f2c4dc38 100644 --- a/fs/xfs/xfs_globals.c +++ b/fs/xfs/xfs_globals.c @@ -31,6 +31,9 @@ xfs_param_t xfs_params = { .fstrm_timer = { 1, 30*100, 3600*100}, .eofb_timer = { 1, 300, 3600*24}, .cowb_timer = { 1, 1800, 3600*24}, +#ifdef DEBUG + .pwork_threads = { 0, 0, NR_CPUS }, +#endif }; struct xfs_globals xfs_globals = { diff --git a/fs/xfs/xfs_iwalk.c b/fs/xfs/xfs_iwalk.c index def37347a362..0fe740298981 100644 --- a/fs/xfs/xfs_iwalk.c +++ b/fs/xfs/xfs_iwalk.c @@ -20,6 +20,7 @@ #include "xfs_icache.h" #include "xfs_health.h" #include "xfs_trans.h" +#include "xfs_pwork.h" /* * Walking Inodes in the Filesystem @@ -45,6 +46,9 @@ */ struct xfs_iwalk_ag { + /* parallel work control data; will be null if single threaded */ + struct xfs_pwork pwork; + struct xfs_mount *mp; struct xfs_trans *tp; @@ -181,6 +185,9 @@ xfs_iwalk_ag_recs( trace_xfs_iwalk_ag_rec(mp, agno, irec); + if (xfs_pwork_want_abort(&iwag->pwork)) + return 0; + if (iwag->inobt_walk_fn) { error = iwag->inobt_walk_fn(mp, tp, agno, irec, iwag->data); @@ -192,6 +199,9 @@ xfs_iwalk_ag_recs( continue; for (j = 0; j < XFS_INODES_PER_CHUNK; j++) { + if (xfs_pwork_want_abort(&iwag->pwork)) + return 0; + /* Skip if this inode is free */ if (XFS_INOBT_MASK(j) & irec->ir_free) continue; @@ -386,6 +396,8 @@ xfs_iwalk_ag( struct xfs_inobt_rec_incore *irec; cond_resched(); + if (xfs_pwork_want_abort(&iwag->pwork)) + goto out; /* Fetch the inobt record. */ irec = &iwag->recs[iwag->nr_recs]; @@ -506,6 +518,7 @@ xfs_iwalk( .startino = startino, .trim_start = 1, .skip_empty = 1, + .pwork = XFS_PWORK_SINGLE_THREADED, }; xfs_agnumber_t agno = XFS_INO_TO_AGNO(mp, startino); int error; @@ -528,6 +541,74 @@ xfs_iwalk( return error; } +/* Run per-thread iwalk work. */ +static int +xfs_iwalk_ag_work( + struct xfs_mount *mp, + struct xfs_pwork *pwork) +{ + struct xfs_iwalk_ag *iwag; + int error; + + iwag = container_of(pwork, struct xfs_iwalk_ag, pwork); + if (xfs_pwork_want_abort(pwork)) + goto out; + + error = xfs_iwalk_alloc(iwag); + if (error) + goto out; + + error = xfs_iwalk_ag(iwag); + xfs_iwalk_free(iwag); +out: + kmem_free(iwag); + return error; +} + +/* + * Walk all the inodes in the filesystem using multiple threads to process each + * AG. + */ +int +xfs_iwalk_threaded( + struct xfs_mount *mp, + xfs_ino_t startino, + xfs_iwalk_fn iwalk_fn, + unsigned int max_prefetch, + void *data) +{ + struct xfs_pwork_ctl pctl; + xfs_agnumber_t agno = XFS_INO_TO_AGNO(mp, startino); + unsigned int nr_threads; + int error; + + ASSERT(agno < mp->m_sb.sb_agcount); + + nr_threads = xfs_pwork_guess_datadev_parallelism(mp); + error = xfs_pwork_init(mp, &pctl, xfs_iwalk_ag_work, "xfs_iwalk", + nr_threads); + if (error) + return error; + + for (; agno < mp->m_sb.sb_agcount; agno++) { + struct xfs_iwalk_ag *iwag; + + if (xfs_pwork_ctl_want_abort(&pctl)) + break; + + iwag = kmem_zalloc(sizeof(struct xfs_iwalk_ag), KM_SLEEP); + iwag->mp = mp; + iwag->iwalk_fn = iwalk_fn; + iwag->data = data; + iwag->startino = startino; + xfs_iwalk_set_prefetch(iwag, max_prefetch); + xfs_pwork_queue(&pctl, &iwag->pwork); + startino = XFS_AGINO_TO_INO(mp, agno + 1, 0); + } + + return xfs_pwork_destroy(&pctl); +} + /* * Walk all inode btree records in the filesystem starting from @startino. The * @inobt_walk_fn will be called for each btree record, being passed the incore @@ -549,6 +630,7 @@ xfs_inobt_walk( .inobt_walk_fn = inobt_walk_fn, .data = data, .startino = startino, + .pwork = XFS_PWORK_SINGLE_THREADED, }; xfs_agnumber_t agno = XFS_INO_TO_AGNO(mp, startino); int error; diff --git a/fs/xfs/xfs_iwalk.h b/fs/xfs/xfs_iwalk.h index 97c1120d4237..56e0dfe1b2ce 100644 --- a/fs/xfs/xfs_iwalk.h +++ b/fs/xfs/xfs_iwalk.h @@ -15,6 +15,8 @@ typedef int (*xfs_iwalk_fn)(struct xfs_mount *mp, struct xfs_trans *tp, int xfs_iwalk(struct xfs_mount *mp, struct xfs_trans *tp, xfs_ino_t startino, xfs_iwalk_fn iwalk_fn, unsigned int max_prefetch, void *data); +int xfs_iwalk_threaded(struct xfs_mount *mp, xfs_ino_t startino, + xfs_iwalk_fn iwalk_fn, unsigned int max_prefetch, void *data); /* Walk all inode btree records in the filesystem starting from @startino. */ typedef int (*xfs_inobt_walk_fn)(struct xfs_mount *mp, struct xfs_trans *tp, diff --git a/fs/xfs/xfs_pwork.c b/fs/xfs/xfs_pwork.c new file mode 100644 index 000000000000..8d0d5f130252 --- /dev/null +++ b/fs/xfs/xfs_pwork.c @@ -0,0 +1,126 @@ +// SPDX-License-Identifier: GPL-2.0+ +/* + * Copyright (C) 2019 Oracle. All Rights Reserved. + * Author: Darrick J. Wong + */ +#include "xfs.h" +#include "xfs_fs.h" +#include "xfs_shared.h" +#include "xfs_format.h" +#include "xfs_log_format.h" +#include "xfs_trans_resv.h" +#include "xfs_mount.h" +#include "xfs_trace.h" +#include "xfs_sysctl.h" +#include "xfs_pwork.h" + +/* + * Parallel Work Queue + * =================== + * + * Abstract away the details of running a large and "obviously" parallelizable + * task across multiple CPUs. Callers initialize the pwork control object with + * a desired level of parallelization and a work function. Next, they embed + * struct xfs_pwork in whatever structure they use to pass work context to a + * worker thread and queue that pwork. The work function will be passed the + * pwork item when it is run (from process context) and any returned error will + * be recorded in xfs_pwork_ctl.error. Work functions should check for errors + * and abort if necessary; the non-zeroness of xfs_pwork_ctl.error does not + * stop workqueue item processing. + * + * This is the rough equivalent of the xfsprogs workqueue code, though we can't + * reuse that name here. + */ + +/* Invoke our caller's function. */ +static void +xfs_pwork_work( + struct work_struct *work) +{ + struct xfs_pwork *pwork; + struct xfs_pwork_ctl *pctl; + int error; + + pwork = container_of(work, struct xfs_pwork, work); + pctl = pwork->pctl; + error = pctl->work_fn(pctl->mp, pwork); + if (error && !pctl->error) + pctl->error = error; +} + +/* + * Set up control data for parallel work. @work_fn is the function that will + * be called. @tag will be written into the kernel threads. @nr_threads is + * the level of parallelism desired, or 0 for no limit. + */ +int +xfs_pwork_init( + struct xfs_mount *mp, + struct xfs_pwork_ctl *pctl, + xfs_pwork_work_fn work_fn, + const char *tag, + unsigned int nr_threads) +{ +#ifdef DEBUG + if (xfs_globals.pwork_threads > 0) + nr_threads = xfs_globals.pwork_threads; + else if (xfs_globals.pwork_threads < 0) + nr_threads = 0; +#endif + trace_xfs_pwork_init(mp, nr_threads, current->pid); + + pctl->wq = alloc_workqueue("%s-%d", WQ_FREEZABLE, nr_threads, tag, + current->pid); + if (!pctl->wq) + return -ENOMEM; + pctl->work_fn = work_fn; + pctl->error = 0; + pctl->mp = mp; + + return 0; +} + +/* Queue some parallel work. */ +void +xfs_pwork_queue( + struct xfs_pwork_ctl *pctl, + struct xfs_pwork *pwork) +{ + INIT_WORK(&pwork->work, xfs_pwork_work); + pwork->pctl = pctl; + queue_work(pctl->wq, &pwork->work); +} + +/* Wait for the work to finish and tear down the control structure. */ +int +xfs_pwork_destroy( + struct xfs_pwork_ctl *pctl) +{ + destroy_workqueue(pctl->wq); + pctl->wq = NULL; + return pctl->error; +} + +/* + * Return the amount of parallelism that the data device can handle, or 0 for + * no limit. + */ +unsigned int +xfs_pwork_guess_datadev_parallelism( + struct xfs_mount *mp) +{ + struct xfs_buftarg *btp = mp->m_ddev_targp; + int iomin; + int ioopt; + + if (blk_queue_nonrot(btp->bt_bdev->bd_queue)) + return num_online_cpus(); + if (mp->m_sb.sb_width && mp->m_sb.sb_unit) + return mp->m_sb.sb_width / mp->m_sb.sb_unit; + iomin = bdev_io_min(btp->bt_bdev); + ioopt = bdev_io_opt(btp->bt_bdev); + if (iomin && ioopt) + return ioopt / iomin; + + return 1; +} diff --git a/fs/xfs/xfs_pwork.h b/fs/xfs/xfs_pwork.h new file mode 100644 index 000000000000..4cf1a6f48237 --- /dev/null +++ b/fs/xfs/xfs_pwork.h @@ -0,0 +1,58 @@ +// SPDX-License-Identifier: GPL-2.0+ +/* + * Copyright (C) 2019 Oracle. All Rights Reserved. + * Author: Darrick J. Wong + */ +#ifndef __XFS_PWORK_H__ +#define __XFS_PWORK_H__ + +struct xfs_pwork; +struct xfs_mount; + +typedef int (*xfs_pwork_work_fn)(struct xfs_mount *mp, struct xfs_pwork *pwork); + +/* + * Parallel work coordination structure. + */ +struct xfs_pwork_ctl { + struct workqueue_struct *wq; + struct xfs_mount *mp; + xfs_pwork_work_fn work_fn; + int error; +}; + +/* + * Embed this parallel work control item inside your own work structure, + * then queue work with it. + */ +struct xfs_pwork { + struct work_struct work; + struct xfs_pwork_ctl *pctl; +}; + +#define XFS_PWORK_SINGLE_THREADED { .pctl = NULL } + +/* Have we been told to abort? */ +static inline bool +xfs_pwork_ctl_want_abort( + struct xfs_pwork_ctl *pctl) +{ + return pctl && pctl->error; +} + +/* Have we been told to abort? */ +static inline bool +xfs_pwork_want_abort( + struct xfs_pwork *pwork) +{ + return xfs_pwork_ctl_want_abort(pwork->pctl); +} + +int xfs_pwork_init(struct xfs_mount *mp, struct xfs_pwork_ctl *pctl, + xfs_pwork_work_fn work_fn, const char *tag, + unsigned int nr_threads); +void xfs_pwork_queue(struct xfs_pwork_ctl *pctl, struct xfs_pwork *pwork); +int xfs_pwork_destroy(struct xfs_pwork_ctl *pctl); +unsigned int xfs_pwork_guess_datadev_parallelism(struct xfs_mount *mp); + +#endif /* __XFS_PWORK_H__ */ diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c index 52e8ec0aa064..8004c931c86e 100644 --- a/fs/xfs/xfs_qm.c +++ b/fs/xfs/xfs_qm.c @@ -1304,7 +1304,7 @@ xfs_qm_quotacheck( flags |= XFS_PQUOTA_CHKD; } - error = xfs_iwalk(mp, NULL, 0, xfs_qm_dqusage_adjust, 0, NULL); + error = xfs_iwalk_threaded(mp, 0, xfs_qm_dqusage_adjust, 0, NULL); if (error) goto error_return; diff --git a/fs/xfs/xfs_sysctl.h b/fs/xfs/xfs_sysctl.h index ad7f9be13087..b555e045e2f4 100644 --- a/fs/xfs/xfs_sysctl.h +++ b/fs/xfs/xfs_sysctl.h @@ -37,6 +37,9 @@ typedef struct xfs_param { xfs_sysctl_val_t fstrm_timer; /* Filestream dir-AG assoc'n timeout. */ xfs_sysctl_val_t eofb_timer; /* Interval between eofb scan wakeups */ xfs_sysctl_val_t cowb_timer; /* Interval between cowb scan wakeups */ +#ifdef DEBUG + xfs_sysctl_val_t pwork_threads; /* Parallel workqueue thread count */ +#endif } xfs_param_t; /* @@ -82,6 +85,9 @@ enum { extern xfs_param_t xfs_params; struct xfs_globals { +#ifdef DEBUG + int pwork_threads; /* parallel workqueue threads */ +#endif int log_recovery_delay; /* log recovery delay (secs) */ int mount_delay; /* mount setup delay (secs) */ bool bug_on_assert; /* BUG() the kernel on assert failure */ diff --git a/fs/xfs/xfs_sysfs.c b/fs/xfs/xfs_sysfs.c index cabda13f3c64..910e6b9cb1a7 100644 --- a/fs/xfs/xfs_sysfs.c +++ b/fs/xfs/xfs_sysfs.c @@ -206,11 +206,51 @@ always_cow_show( } XFS_SYSFS_ATTR_RW(always_cow); +#ifdef DEBUG +/* + * Override how many threads the parallel work queue is allowed to create. + * This has to be a debug-only global (instead of an errortag) because one of + * the main users of parallel workqueues is mount time quotacheck. + */ +STATIC ssize_t +pwork_threads_store( + struct kobject *kobject, + const char *buf, + size_t count) +{ + int ret; + int val; + + ret = kstrtoint(buf, 0, &val); + if (ret) + return ret; + + if (val < 0 || val > NR_CPUS) + return -EINVAL; + + xfs_globals.pwork_threads = val; + + return count; +} + +STATIC ssize_t +pwork_threads_show( + struct kobject *kobject, + char *buf) +{ + return snprintf(buf, PAGE_SIZE, "%d\n", xfs_globals.pwork_threads); +} +XFS_SYSFS_ATTR_RW(pwork_threads); +#endif /* DEBUG */ + static struct attribute *xfs_dbg_attrs[] = { ATTR_LIST(bug_on_assert), ATTR_LIST(log_recovery_delay), ATTR_LIST(mount_delay), ATTR_LIST(always_cow), +#ifdef DEBUG + ATTR_LIST(pwork_threads), +#endif NULL, }; diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h index f9bb1d50bc0e..658cbade1998 100644 --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -3556,6 +3556,24 @@ TRACE_EVENT(xfs_iwalk_ag_rec, __entry->startino, __entry->freemask) ) +TRACE_EVENT(xfs_pwork_init, + TP_PROTO(struct xfs_mount *mp, unsigned int nr_threads, pid_t pid), + TP_ARGS(mp, nr_threads, pid), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(unsigned int, nr_threads) + __field(pid_t, pid) + ), + TP_fast_assign( + __entry->dev = mp->m_super->s_dev; + __entry->nr_threads = nr_threads; + __entry->pid = pid; + ), + TP_printk("dev %d:%d nr_threads %u pid %u", + MAJOR(__entry->dev), MINOR(__entry->dev), + __entry->nr_threads, __entry->pid) +) + #endif /* _TRACE_XFS_H */ #undef TRACE_INCLUDE_PATH