From patchwork Wed Nov 16 23:50:03 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Logan Gunthorpe X-Patchwork-Id: 13045938 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9B2D4C433FE for ; Wed, 16 Nov 2022 23:50:17 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234080AbiKPXuQ (ORCPT ); Wed, 16 Nov 2022 18:50:16 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41820 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234083AbiKPXuP (ORCPT ); Wed, 16 Nov 2022 18:50:15 -0500 Received: from ale.deltatee.com (ale.deltatee.com [204.191.154.188]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 600C628716 for ; Wed, 16 Nov 2022 15:50:14 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=deltatee.com; s=20200525; h=Subject:MIME-Version:References:In-Reply-To: Message-Id:Date:Cc:To:From:content-disposition; bh=BtJOqeCS7SNVXx7/5uk1g4w35hWKG91Ntf3f/X0gC7M=; b=oID8kqgTHDaR0Jje1P2rrLRk6h BhdlGqbWYRweJzfe8iU/Z/lWbLlGXsJVdkJDYzdk+casdDt2SuAb+pMcviLr7AFS7BgRmbViXxWs7 6TH1bsf2ri1wsY5VI9F9aQ4c2Z3Lg5S2qyN4E/cyyku1m2KrfClpGi9H+w/aj74GBhfkYwxDEVT6S IynkkdrW/bZsN+iHbWIzyqGzVVtPGdRgOAsCDBmf/97480e/12nqP3EoZ0BmuI2rKSecqSJ35ghgB BFH2zI1oi2ZLqAG57PmDi8+JyMV5/xHF0NXqOGYM7MgmGKYqwngz+MS7ZCmlnpmdEi9G3GcDAx8Rs KtfaWQEQ==; Received: from cgy1-donard.priv.deltatee.com ([172.16.1.31]) by ale.deltatee.com with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1ovSAW-0043zP-KJ; Wed, 16 Nov 2022 16:50:13 -0700 Received: from gunthorp by cgy1-donard.priv.deltatee.com with local (Exim 4.94.2) (envelope-from ) id 1ovSAW-000KnF-Dt; Wed, 16 Nov 2022 16:50:12 -0700 From: Logan Gunthorpe To: 
linux-raid@vger.kernel.org, Jes Sorensen Cc: Guoqing Jiang , Xiao Ni , Mariusz Tkaczyk , Coly Li , Chaitanya Kulkarni , Jonmichael Hands , Stephen Bates , Martin Oliveira , David Sloan , Logan Gunthorpe , Kinga Tanska Date: Wed, 16 Nov 2022 16:50:03 -0700 Message-Id: <20221116235009.79875-2-logang@deltatee.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20221116235009.79875-1-logang@deltatee.com> References: <20221116235009.79875-1-logang@deltatee.com> MIME-Version: 1.0 X-SA-Exim-Connect-IP: 172.16.1.31 X-SA-Exim-Rcpt-To: linux-raid@vger.kernel.org, jes@trained-monkey.org, guoqing.jiang@linux.dev, xni@redhat.com, colyli@suse.de, chaitanyak@nvidia.com, jm@chia.net, sbates@raithlin.com, Martin.Oliveira@eideticom.com, David.Sloan@eideticom.com, logang@deltatee.com, mariusz.tkaczyk@linux.intel.com, kinga.tanska@linux.intel.com X-SA-Exim-Mail-From: gunthorp@deltatee.com Subject: [PATCH mdadm v5 1/7] Create: goto abort_locked instead of return 1 in error path X-SA-Exim-Version: 4.2.1 (built Sat, 13 Feb 2021 17:57:42 +0000) X-SA-Exim-Scanned: Yes (on ale.deltatee.com) Precedence: bulk List-ID: X-Mailing-List: linux-raid@vger.kernel.org The return 1 after the fstat_is_blkdev() check should be replaced with an error return that goes through the error path to unlock resources locked by this function. 
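As an illustration only, the pattern this fix relies on can be sketched as below. The names example_lock(), example_unlock() and example_create() are hypothetical stand-ins and do not exist in mdadm; the sketch only shows why every failure after the lock is taken has to leave through the abort_locked label.

```c
/* Hypothetical stand-ins for the lock taken in Create(); not mdadm APIs. */
static int lock_held;

static void example_lock(void)   { lock_held = 1; }
static void example_unlock(void) { lock_held = 0; }

/*
 * Returns 0 on success and 1 on failure. Every failure after
 * example_lock() jumps to abort_locked so the lock is always dropped.
 */
static int example_create(int device_is_blkdev)
{
	example_lock();

	/* A bare "return 1" here would leave the lock held. */
	if (!device_is_blkdev)
		goto abort_locked;

	example_unlock();
	return 0;

abort_locked:
	example_unlock();
	return 1;
}
```

Calling example_create(0) takes the failure path and still leaves lock_held at 0, which is the property the one-line change restores.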
Signed-off-by: Logan Gunthorpe Acked-by: Kinga Tanska --- Create.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Create.c b/Create.c index 953e73722518..2e8203ecdccd 100644 --- a/Create.c +++ b/Create.c @@ -939,7 +939,7 @@ int Create(struct supertype *st, char *mddev, goto abort_locked; } if (!fstat_is_blkdev(fd, dv->devname, &rdev)) - return 1; + goto abort_locked; inf->disk.major = major(rdev); inf->disk.minor = minor(rdev); } From patchwork Wed Nov 16 23:50:04 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Logan Gunthorpe X-Patchwork-Id: 13045940 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7D1CEC43217 for ; Wed, 16 Nov 2022 23:50:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238529AbiKPXuT (ORCPT ); Wed, 16 Nov 2022 18:50:19 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41822 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234644AbiKPXuQ (ORCPT ); Wed, 16 Nov 2022 18:50:16 -0500 Received: from ale.deltatee.com (ale.deltatee.com [204.191.154.188]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 79C5828716 for ; Wed, 16 Nov 2022 15:50:15 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=deltatee.com; s=20200525; h=Subject:MIME-Version:References:In-Reply-To: Message-Id:Date:Cc:To:From:content-disposition; bh=rlxtNyPmFWFPQgVVnPcBXfpoPxvJcmCLlS4oCAhQclo=; b=onMmH4fJI08BkZKmTiZh0S/nrr mf2vq9vvEPr/aFfMvMG4gMxvYnPqJzij7aKYApLEsjN68BXrjof0c86H4F8sUkNg/fCHNGZ4H4vHT w7gwlW9+viCyoyzHw5mzg+MVsI1UoMrbf+ccX2dqts76163AFreCu4pCKw3arRR6xMT+EIR6HRwRd khE2BVHJjqbPfdlayXgrzN2Jk/OMdSoV1kVcV9ENlVK9WDwBQWsAiDaXmvN0poiP3nmttf8mjnvOb 
sRW4KFnkYIzHCAKef50FTx+KLdVgX/8cNT49NPKzN0aCHrN9jyNETtT0okpOYTB0dneWxyCfLChlx 1GboOQ0Q==; Received: from cgy1-donard.priv.deltatee.com ([172.16.1.31]) by ale.deltatee.com with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1ovSAW-0043zQ-KO; Wed, 16 Nov 2022 16:50:14 -0700 Received: from gunthorp by cgy1-donard.priv.deltatee.com with local (Exim 4.94.2) (envelope-from ) id 1ovSAW-000KnI-HP; Wed, 16 Nov 2022 16:50:12 -0700 From: Logan Gunthorpe To: linux-raid@vger.kernel.org, Jes Sorensen Cc: Guoqing Jiang , Xiao Ni , Mariusz Tkaczyk , Coly Li , Chaitanya Kulkarni , Jonmichael Hands , Stephen Bates , Martin Oliveira , David Sloan , Logan Gunthorpe , Kinga Tanska Date: Wed, 16 Nov 2022 16:50:04 -0700 Message-Id: <20221116235009.79875-3-logang@deltatee.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20221116235009.79875-1-logang@deltatee.com> References: <20221116235009.79875-1-logang@deltatee.com> MIME-Version: 1.0 X-SA-Exim-Connect-IP: 172.16.1.31 X-SA-Exim-Rcpt-To: linux-raid@vger.kernel.org, jes@trained-monkey.org, guoqing.jiang@linux.dev, xni@redhat.com, colyli@suse.de, chaitanyak@nvidia.com, jm@chia.net, sbates@raithlin.com, Martin.Oliveira@eideticom.com, David.Sloan@eideticom.com, logang@deltatee.com, mariusz.tkaczyk@linux.intel.com, kinga.tanska@linux.intel.com X-SA-Exim-Mail-From: gunthorp@deltatee.com Subject: [PATCH mdadm v5 2/7] Create: remove safe_mode_delay local variable X-SA-Exim-Version: 4.2.1 (built Sat, 13 Feb 2021 17:57:42 +0000) X-SA-Exim-Scanned: Yes (on ale.deltatee.com) Precedence: bulk List-ID: X-Mailing-List: linux-raid@vger.kernel.org Every .getinfo_super() call sets the info.safe_mode_delay variable to a constant value, so no matter what the current state is, that function will always set it to the same value. Create() calls .getinfo_super() multiple times while creating the array. 
The value is stored in a local variable for every disk in the loop to add disks (so the call for the last disk takes precedence). The local variable is then used in the call to sysfs_set_safemode(). This can be simplified by using info.safe_mode_delay directly. The info variable had .getinfo_super() called on it early in the function so, by the reasoning above, it will have the same value as the local variable, which can thus be removed. Doing this allows for factoring out code from Create() in a subsequent patch. Signed-off-by: Logan Gunthorpe Acked-by: Kinga Tanska --- Create.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/Create.c b/Create.c index 2e8203ecdccd..8ded81dc265d 100644 --- a/Create.c +++ b/Create.c @@ -137,7 +137,6 @@ int Create(struct supertype *st, char *mddev, int did_default = 0; int do_default_layout = 0; int do_default_chunk = 0; - unsigned long safe_mode_delay = 0; char chosen_name[1024]; struct map_ent *map = NULL; unsigned long long newsize; @@ -952,7 +951,6 @@ int Create(struct supertype *st, char *mddev, goto abort_locked; } st->ss->getinfo_super(st, inf, NULL); - safe_mode_delay = inf->safe_mode_delay; if (have_container && c->verbose > 0) pr_err("Using %s for device %d\n", @@ -1065,7 +1063,7 @@ int Create(struct supertype *st, char *mddev, "readonly"); break; } - sysfs_set_safemode(&info, safe_mode_delay); + sysfs_set_safemode(&info, info.safe_mode_delay); if (err) { pr_err("failed to activate array.\n"); ioctl(mdfd, STOP_ARRAY, NULL); From patchwork Wed Nov 16 23:50:05 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Logan Gunthorpe X-Patchwork-Id: 13045944 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id ED184C4332F for ; Wed, 16 Nov 2022 23:50:27 +0000 (UTC) Received: 
(majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234704AbiKPXu0 (ORCPT ); Wed, 16 Nov 2022 18:50:26 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41846 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234612AbiKPXuR (ORCPT ); Wed, 16 Nov 2022 18:50:17 -0500 Received: from ale.deltatee.com (ale.deltatee.com [204.191.154.188]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B62C84AF06 for ; Wed, 16 Nov 2022 15:50:15 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=deltatee.com; s=20200525; h=Subject:MIME-Version:References:In-Reply-To: Message-Id:Date:Cc:To:From:content-disposition; bh=9HNQQ7vpqG7tbLU5LWomzVfWMMKNI2PFBkHz+ntg0fE=; b=MXxgpEe0wR5jIO5ye9roDhPjGS OhNh9La90/2QBh1QBEjyWiut8RhvtAH1uMFttZeR4JuFv9B3rrq9yDGHbwBAPp3DIJDPYvSBSkJPD 2HfTulG4//epUbA/DFaSY0xt1vjP7xYuv2s7wrf6AhqVexa1n67luOKRfR3WvsSyqjc2zdLF3zvXy dKpixnn8ia8oml4yKd1WLDMFtQybxHSFTvZzu2Mk6F4qPRXshHUOwIBJuc0IQJxxSCxLy1mGhBrsU j5vXYFkQDujYeybul3DNxMmh1TTAKnXAbKa1gFKey7KjnUSw48dHhf5e4MAAb7HS6wpEC2O0Yzp/v FndeU4Zg==; Received: from cgy1-donard.priv.deltatee.com ([172.16.1.31]) by ale.deltatee.com with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1ovSAW-0043zU-O2; Wed, 16 Nov 2022 16:50:14 -0700 Received: from gunthorp by cgy1-donard.priv.deltatee.com with local (Exim 4.94.2) (envelope-from ) id 1ovSAW-000KnL-Ky; Wed, 16 Nov 2022 16:50:12 -0700 From: Logan Gunthorpe To: linux-raid@vger.kernel.org, Jes Sorensen Cc: Guoqing Jiang , Xiao Ni , Mariusz Tkaczyk , Coly Li , Chaitanya Kulkarni , Jonmichael Hands , Stephen Bates , Martin Oliveira , David Sloan , Logan Gunthorpe , Kinga Tanska Date: Wed, 16 Nov 2022 16:50:05 -0700 Message-Id: <20221116235009.79875-4-logang@deltatee.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20221116235009.79875-1-logang@deltatee.com> References: <20221116235009.79875-1-logang@deltatee.com> MIME-Version: 1.0 
X-SA-Exim-Connect-IP: 172.16.1.31 X-SA-Exim-Rcpt-To: linux-raid@vger.kernel.org, jes@trained-monkey.org, guoqing.jiang@linux.dev, xni@redhat.com, colyli@suse.de, chaitanyak@nvidia.com, jm@chia.net, sbates@raithlin.com, Martin.Oliveira@eideticom.com, David.Sloan@eideticom.com, logang@deltatee.com, mariusz.tkaczyk@linux.intel.com, kinga.tanska@linux.intel.com X-SA-Exim-Mail-From: gunthorp@deltatee.com Subject: [PATCH mdadm v5 3/7] Create: Factor out add_disks() helpers X-SA-Exim-Version: 4.2.1 (built Sat, 13 Feb 2021 17:57:42 +0000) X-SA-Exim-Scanned: Yes (on ale.deltatee.com) Precedence: bulk List-ID: X-Mailing-List: linux-raid@vger.kernel.org The Create function is massive with a very large number of variables. Reading and understanding the function is almost impossible. To help with this, factor out the two pass loop that adds the disks to the array. This moves about 160 lines into three new helper functions and removes a bunch of local variables from the main Create function. The main new helper function add_disks() does the two pass loop and calls into add_disk_to_super() and update_metadata(). Factoring out the latter two helpers also helps to reduce a ton of indentation. No functional changes intended. 
Signed-off-by: Logan Gunthorpe Acked-by: Kinga Tanska --- Create.c | 382 +++++++++++++++++++++++++++++++------------------------ 1 file changed, 213 insertions(+), 169 deletions(-) diff --git a/Create.c b/Create.c index 8ded81dc265d..6a0446644e04 100644 --- a/Create.c +++ b/Create.c @@ -91,6 +91,214 @@ int default_layout(struct supertype *st, int level, int verbose) return layout; } +static int add_disk_to_super(int mdfd, struct shape *s, struct context *c, + struct supertype *st, struct mddev_dev *dv, + struct mdinfo *info, int have_container, int major_num) +{ + dev_t rdev; + int fd; + + if (dv->disposition == 'j') { + info->disk.raid_disk = MD_DISK_ROLE_JOURNAL; + info->disk.state = (1<<MD_DISK_JOURNAL); + } else if (info->disk.raid_disk < s->raiddisks) { + info->disk.state = (1<<MD_DISK_ACTIVE) | + (1<<MD_DISK_SYNC); + } else { + info->disk.state = 0; + } + + if (dv->writemostly == FlagSet) { + if (major_num == BITMAP_MAJOR_CLUSTERED) { + pr_err("Can not set %s --write-mostly with a clustered bitmap\n",dv->devname); + return 1; + } else { + info->disk.state |= (1<<MD_DISK_WRITEMOSTLY); + } + } + + if (dv->failfast == FlagSet) + info->disk.state |= (1<<MD_DISK_FAILFAST); + + if (have_container) { + fd = -1; + } else { + if (st->ss->external && st->container_devnm[0]) + fd = open(dv->devname, O_RDWR); + else + fd = open(dv->devname, O_RDWR|O_EXCL); + + if (fd < 0) { + pr_err("failed to open %s after earlier success - aborting\n", + dv->devname); + return 1; + } + if (!fstat_is_blkdev(fd, dv->devname, &rdev)) + return 1; + info->disk.major = major(rdev); + info->disk.minor = minor(rdev); + } + if (fd >= 0) + remove_partitions(fd); + if (st->ss->add_to_super(st, &info->disk, fd, dv->devname, + dv->data_offset)) { + ioctl(mdfd, STOP_ARRAY, NULL); + return 1; + } + st->ss->getinfo_super(st, info, NULL); + + if (have_container && c->verbose > 0) + pr_err("Using %s for device %d\n", + map_dev(info->disk.major, info->disk.minor, 0), + info->disk.number); + + if (!have_container) { + /* getinfo_super might have lost these ... 
*/ + info->disk.major = major(rdev); + info->disk.minor = minor(rdev); + } + + return 0; +} + +static int update_metadata(int mdfd, struct shape *s, struct supertype *st, + struct map_ent **map, struct mdinfo *info, + char *chosen_name) +{ + struct mdinfo info_new; + struct map_ent *me = NULL; + + /* check to see if the uuid has changed due to these + * metadata changes, and if so update the member array + * and container uuid. Note ->write_init_super clears + * the subarray cursor such that ->getinfo_super once + * again returns container info. + */ + st->ss->getinfo_super(st, &info_new, NULL); + if (st->ss->external && !is_container(s->level) && + !same_uuid(info_new.uuid, info->uuid, 0)) { + map_update(map, fd2devnm(mdfd), + info_new.text_version, + info_new.uuid, chosen_name); + me = map_by_devnm(map, st->container_devnm); + } + + if (st->ss->write_init_super(st)) { + st->ss->free_super(st); + return 1; + } + + /* + * Before activating the array, perform extra steps + * required to configure the internal write-intent + * bitmap. 
+ */ + if (info_new.consistency_policy == CONSISTENCY_POLICY_BITMAP && + st->ss->set_bitmap && st->ss->set_bitmap(st, info)) { + st->ss->free_super(st); + return 1; + } + + /* update parent container uuid */ + if (me) { + char *path = xstrdup(me->path); + + st->ss->getinfo_super(st, &info_new, NULL); + map_update(map, st->container_devnm, info_new.text_version, + info_new.uuid, path); + free(path); + } + + flush_metadata_updates(st); + st->ss->free_super(st); + + return 0; +} + +static int add_disks(int mdfd, struct mdinfo *info, struct shape *s, + struct context *c, struct supertype *st, + struct map_ent **map, struct mddev_dev *devlist, + int total_slots, int have_container, int insert_point, + int major_num, char *chosen_name) +{ + struct mddev_dev *moved_disk = NULL; + int pass, raid_disk_num, dnum; + struct mddev_dev *dv; + struct mdinfo *infos; + int ret = 0; + + infos = xmalloc(sizeof(*infos) * total_slots); + enable_fds(total_slots); + for (pass = 1; pass <= 2; pass++) { + for (dnum = 0, raid_disk_num = 0, dv = devlist; dv; + dv = (dv->next) ? 
(dv->next) : moved_disk, dnum++) { + if (dnum >= total_slots) + abort(); + if (dnum == insert_point) { + raid_disk_num += 1; + moved_disk = dv; + continue; + } + if (strcasecmp(dv->devname, "missing") == 0) { + raid_disk_num += 1; + continue; + } + if (have_container) + moved_disk = NULL; + if (have_container && dnum < total_slots - 1) + /* repeatedly use the container */ + moved_disk = dv; + + switch(pass) { + case 1: + infos[dnum] = *info; + infos[dnum].disk.number = dnum; + infos[dnum].disk.raid_disk = raid_disk_num++; + + if (dv->disposition == 'j') + raid_disk_num--; + + ret = add_disk_to_super(mdfd, s, c, st, dv, + &infos[dnum], have_container, + major_num); + if (ret) + goto out; + + break; + case 2: + infos[dnum].errors = 0; + + ret = add_disk(mdfd, st, info, &infos[dnum]); + if (ret) { + pr_err("ADD_NEW_DISK for %s failed: %s\n", + dv->devname, strerror(errno)); + if (errno == EINVAL && + info->array.level == 0) { + pr_err("Possibly your kernel doesn't support RAID0 layouts.\n"); + pr_err("Either upgrade, or use --layout=dangerous\n"); + } + goto out; + } + break; + } + if (!have_container && + dv == moved_disk && dnum != insert_point) break; + } + + if (pass == 1) { + ret = update_metadata(mdfd, s, st, map, info, + chosen_name); + if (ret) + goto out; + } + } + +out: + free(infos); + return ret; +} + int Create(struct supertype *st, char *mddev, char *name, int *uuid, int subdevs, struct mddev_dev *devlist, @@ -117,7 +325,7 @@ int Create(struct supertype *st, char *mddev, unsigned long long minsize = 0, maxsize = 0; char *mindisc = NULL; char *maxdisc = NULL; - int dnum, raid_disk_num; + int dnum; struct mddev_dev *dv; dev_t rdev; int fail = 0, warn = 0; @@ -126,14 +334,13 @@ int Create(struct supertype *st, char *mddev, int missing_disks = 0; int insert_point = subdevs * 2; /* where to insert a missing drive */ int total_slots; - int pass; int rv; int bitmap_fd; int have_container = 0; int container_fd = -1; int need_mdmon = 0; unsigned long long 
bitmapsize; - struct mdinfo info, *infos; + struct mdinfo info; int did_default = 0; int do_default_layout = 0; int do_default_chunk = 0; @@ -869,174 +1076,11 @@ int Create(struct supertype *st, char *mddev, } } - infos = xmalloc(sizeof(*infos) * total_slots); - enable_fds(total_slots); - for (pass = 1; pass <= 2; pass++) { - struct mddev_dev *moved_disk = NULL; /* the disk that was moved out of the insert point */ - - for (dnum = 0, raid_disk_num = 0, dv = devlist; dv; - dv = (dv->next) ? (dv->next) : moved_disk, dnum++) { - int fd; - struct mdinfo *inf = &infos[dnum]; - - if (dnum >= total_slots) - abort(); - if (dnum == insert_point) { - raid_disk_num += 1; - moved_disk = dv; - continue; - } - if (strcasecmp(dv->devname, "missing") == 0) { - raid_disk_num += 1; - continue; - } - if (have_container) - moved_disk = NULL; - if (have_container && dnum < info.array.raid_disks - 1) - /* repeatedly use the container */ - moved_disk = dv; - - switch(pass) { - case 1: - *inf = info; - - inf->disk.number = dnum; - inf->disk.raid_disk = raid_disk_num++; - - if (dv->disposition == 'j') { - inf->disk.raid_disk = MD_DISK_ROLE_JOURNAL; - inf->disk.state = (1<<MD_DISK_JOURNAL); - } else if (inf->disk.raid_disk < s->raiddisks) - inf->disk.state = (1<<MD_DISK_ACTIVE) | - (1<<MD_DISK_SYNC); - else - inf->disk.state = 0; - - if (dv->writemostly == FlagSet) { - if (major_num == BITMAP_MAJOR_CLUSTERED) { - pr_err("Can not set %s --write-mostly with a clustered bitmap\n",dv->devname); - goto abort_locked; - } else - inf->disk.state |= (1<<MD_DISK_WRITEMOSTLY); - } - if (dv->failfast == FlagSet) - inf->disk.state |= (1<<MD_DISK_FAILFAST); - - if (have_container) - fd = -1; - else { - if (st->ss->external && - st->container_devnm[0]) - fd = open(dv->devname, O_RDWR); - else - fd = open(dv->devname, O_RDWR|O_EXCL); - - if (fd < 0) { - pr_err("failed to open %s after earlier success - aborting\n", - dv->devname); - goto abort_locked; - } - if (!fstat_is_blkdev(fd, dv->devname, &rdev)) - goto abort_locked; - inf->disk.major = major(rdev); - inf->disk.minor = minor(rdev); - } - if (fd >= 0) - remove_partitions(fd); - if (st->ss->add_to_super(st, &inf->disk, - fd, dv->devname, - 
dv->data_offset)) { - ioctl(mdfd, STOP_ARRAY, NULL); - goto abort_locked; - } - st->ss->getinfo_super(st, inf, NULL); - - if (have_container && c->verbose > 0) - pr_err("Using %s for device %d\n", - map_dev(inf->disk.major, - inf->disk.minor, - 0), dnum); - - if (!have_container) { - /* getinfo_super might have lost these ... */ - inf->disk.major = major(rdev); - inf->disk.minor = minor(rdev); - } - break; - case 2: - inf->errors = 0; - - rv = add_disk(mdfd, st, &info, inf); - - if (rv) { - pr_err("ADD_NEW_DISK for %s failed: %s\n", - dv->devname, strerror(errno)); - if (errno == EINVAL && - info.array.level == 0) { - pr_err("Possibly your kernel doesn't support RAID0 layouts.\n"); - pr_err("Either upgrade, or use --layout=dangerous\n"); - } - goto abort_locked; - } - break; - } - if (!have_container && - dv == moved_disk && dnum != insert_point) break; - } - if (pass == 1) { - struct mdinfo info_new; - struct map_ent *me = NULL; - - /* check to see if the uuid has changed due to these - * metadata changes, and if so update the member array - * and container uuid. Note ->write_init_super clears - * the subarray cursor such that ->getinfo_super once - * again returns container info. - */ - st->ss->getinfo_super(st, &info_new, NULL); - if (st->ss->external && !is_container(s->level) && - !same_uuid(info_new.uuid, info.uuid, 0)) { - map_update(&map, fd2devnm(mdfd), - info_new.text_version, - info_new.uuid, chosen_name); - me = map_by_devnm(&map, st->container_devnm); - } - - if (st->ss->write_init_super(st)) { - st->ss->free_super(st); - goto abort_locked; - } - /* - * Before activating the array, perform extra steps - * required to configure the internal write-intent - * bitmap. 
- */ - if (info_new.consistency_policy == - CONSISTENCY_POLICY_BITMAP && - st->ss->set_bitmap && - st->ss->set_bitmap(st, &info)) { - st->ss->free_super(st); - goto abort_locked; - } - - /* update parent container uuid */ - if (me) { - char *path = xstrdup(me->path); - - st->ss->getinfo_super(st, &info_new, NULL); - map_update(&map, st->container_devnm, - info_new.text_version, - info_new.uuid, path); - free(path); - } + if (add_disks(mdfd, &info, s, c, st, &map, devlist, total_slots, + have_container, insert_point, major_num, chosen_name)) + goto abort_locked; - flush_metadata_updates(st); - st->ss->free_super(st); - } - } map_unlock(&map); - free(infos); if (is_container(s->level)) { /* No need to start. But we should signal udev to From patchwork Wed Nov 16 23:50:06 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Logan Gunthorpe X-Patchwork-Id: 13045943 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id DDEC9C433FE for ; Wed, 16 Nov 2022 23:50:26 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238722AbiKPXuX (ORCPT ); Wed, 16 Nov 2022 18:50:23 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41824 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234704AbiKPXuQ (ORCPT ); Wed, 16 Nov 2022 18:50:16 -0500 Received: from ale.deltatee.com (ale.deltatee.com [204.191.154.188]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7D17831FB9 for ; Wed, 16 Nov 2022 15:50:15 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=deltatee.com; s=20200525; h=Subject:MIME-Version:References:In-Reply-To: Message-Id:Date:Cc:To:From:content-disposition; bh=Vs88+o17X8gJhTh4HNSRTOQqoUl4H5167igPUxm1Fns=; 
b=t4wpjJk1/fMb3BwHXqdRzJE80f 6E4upnBwOR0lwyyFgAHyvHltjs7vitjsmR1ZJXVmU9cwsV4yCQRG5UpdgpSN4JgKP6yb47GF+AwJl vbWC0UR5nHWdUn7ICQJRlkSJV+7i63Hh0lAQPQLGCuyp9TNqFsd6kp7FkG7x/c0pkv1fMYFg82O74 0JCs0+QvZh2S5aKA2AYB61UrxNZ3LFqQfsUWD3z4BNVXR3XY92ju96gY9NCzkkRPhxA7UhHQopLjR fezzrlIMyIEG0p9CZn0qSpCBbyUTKy8dhIR81wiRaAUmrfLgF55AdUjDDvFjt5b7VtCE4Hy6KVxeD MNQubGjg==; Received: from cgy1-donard.priv.deltatee.com ([172.16.1.31]) by ale.deltatee.com with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1ovSAW-0043zW-Rz; Wed, 16 Nov 2022 16:50:14 -0700 Received: from gunthorp by cgy1-donard.priv.deltatee.com with local (Exim 4.94.2) (envelope-from ) id 1ovSAW-000KnO-OT; Wed, 16 Nov 2022 16:50:12 -0700 From: Logan Gunthorpe To: linux-raid@vger.kernel.org, Jes Sorensen Cc: Guoqing Jiang , Xiao Ni , Mariusz Tkaczyk , Coly Li , Chaitanya Kulkarni , Jonmichael Hands , Stephen Bates , Martin Oliveira , David Sloan , Logan Gunthorpe , Kinga Tanska Date: Wed, 16 Nov 2022 16:50:06 -0700 Message-Id: <20221116235009.79875-5-logang@deltatee.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20221116235009.79875-1-logang@deltatee.com> References: <20221116235009.79875-1-logang@deltatee.com> MIME-Version: 1.0 X-SA-Exim-Connect-IP: 172.16.1.31 X-SA-Exim-Rcpt-To: linux-raid@vger.kernel.org, jes@trained-monkey.org, guoqing.jiang@linux.dev, xni@redhat.com, colyli@suse.de, chaitanyak@nvidia.com, jm@chia.net, sbates@raithlin.com, Martin.Oliveira@eideticom.com, David.Sloan@eideticom.com, logang@deltatee.com, mariusz.tkaczyk@linux.intel.com, kinga.tanska@linux.intel.com X-SA-Exim-Mail-From: gunthorp@deltatee.com Subject: [PATCH mdadm v5 4/7] mdadm: Introduce pr_info() X-SA-Exim-Version: 4.2.1 (built Sat, 13 Feb 2021 17:57:42 +0000) X-SA-Exim-Scanned: Yes (on ale.deltatee.com) Precedence: bulk List-ID: X-Mailing-List: linux-raid@vger.kernel.org Feedback was given to avoid informational pr_err() calls that print to stderr, even though that's done throughout the 
code. Using printf() directly doesn't maintain the same format (an "mdadm" prefix on every line). So introduce pr_info(), which prints to stdout with the same format, and use it for a couple of informational pr_err() calls in Create(). Future work can use this call in more cases. Signed-off-by: Logan Gunthorpe Acked-by: Kinga Tanska --- Create.c | 7 ++++--- mdadm.h | 2 ++ 2 files changed, 6 insertions(+), 3 deletions(-) diff --git a/Create.c b/Create.c index 6a0446644e04..4acda30c5256 100644 --- a/Create.c +++ b/Create.c @@ -984,11 +984,12 @@ int Create(struct supertype *st, char *mddev, mdi = sysfs_read(-1, devnm, GET_VERSION); - pr_err("Creating array inside %s container %s\n", + pr_info("Creating array inside %s container %s\n", mdi?mdi->text_version:"managed", devnm); sysfs_free(mdi); } else - pr_err("Defaulting to version %s metadata\n", info.text_version); + pr_info("Defaulting to version %s metadata\n", + info.text_version); } map_update(&map, fd2devnm(mdfd), info.text_version, @@ -1145,7 +1146,7 @@ int Create(struct supertype *st, char *mddev, ioctl(mdfd, RESTART_ARRAY_RW, NULL); } if (c->verbose >= 0) - pr_err("array %s started.\n", mddev); + pr_info("array %s started.\n", mddev); if (st->ss->external && st->container_devnm[0]) { if (need_mdmon) start_mdmon(st->container_devnm); diff --git a/mdadm.h b/mdadm.h index 3673494e560b..18c24915e94c 100644 --- a/mdadm.h +++ b/mdadm.h @@ -1798,6 +1798,8 @@ static inline int xasprintf(char **strp, const char *fmt, ...) { #endif #define cont_err(fmt ...) fprintf(stderr, " " fmt) +#define pr_info(fmt, args...) 
printf("%s: "fmt, Name, ##args) + void *xmalloc(size_t len); void *xrealloc(void *ptr, size_t len); void *xcalloc(size_t num, size_t size); From patchwork Wed Nov 16 23:50:07 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Logan Gunthorpe X-Patchwork-Id: 13045945 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6E076C433FE for ; Wed, 16 Nov 2022 23:50:29 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238854AbiKPXu1 (ORCPT ); Wed, 16 Nov 2022 18:50:27 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41858 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237359AbiKPXuR (ORCPT ); Wed, 16 Nov 2022 18:50:17 -0500 Received: from ale.deltatee.com (ale.deltatee.com [204.191.154.188]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 60CC324F16 for ; Wed, 16 Nov 2022 15:50:16 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=deltatee.com; s=20200525; h=Subject:MIME-Version:References:In-Reply-To: Message-Id:Date:Cc:To:From:content-disposition; bh=0qvuAlawolcCLxr5rOJ2frNjI+CYZR0rofieicqHXsE=; b=WyyrIyCe3dksDPNqbbCeGGYU7G iSLlae4CUAyQH3k3VAX85UKINht2ud8CfNNlUD7o4fwAZf9Vlx/J+yfTbkQIReVtG6SK6tVkgRMKy fqEW6SsOAzV2Ir4cBWnPCXtDz3z4BUltJUE3dwYGa69sFp4CTLs0QuqXcWU8pBjkzgosxGb+qbarv FU5qbPGRZM/VvnOLv0tThEbYKIcnzWIQEJx3TzEuuCO0Spe+itTNNCD5TvXadK14TA+MtJ4sUGuHU Sk8G2roS9kttu4PzBaRxAGmcTqpWjYYZZrRAXqFBX9IaH1AxMKqdHn38Mkwz/vs7WpgvI69xlR5Uo Q74wzNXw==; Received: from cgy1-donard.priv.deltatee.com ([172.16.1.31]) by ale.deltatee.com with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1ovSAY-0043zQ-6M; Wed, 16 Nov 2022 16:50:15 -0700 Received: from gunthorp by 
cgy1-donard.priv.deltatee.com with local (Exim 4.94.2) (envelope-from ) id 1ovSAW-000KnR-SW; Wed, 16 Nov 2022 16:50:12 -0700 From: Logan Gunthorpe To: linux-raid@vger.kernel.org, Jes Sorensen Cc: Guoqing Jiang , Xiao Ni , Mariusz Tkaczyk , Coly Li , Chaitanya Kulkarni , Jonmichael Hands , Stephen Bates , Martin Oliveira , David Sloan , Logan Gunthorpe , Kinga Tanska Date: Wed, 16 Nov 2022 16:50:07 -0700 Message-Id: <20221116235009.79875-6-logang@deltatee.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20221116235009.79875-1-logang@deltatee.com> References: <20221116235009.79875-1-logang@deltatee.com> MIME-Version: 1.0 X-SA-Exim-Connect-IP: 172.16.1.31 X-SA-Exim-Rcpt-To: linux-raid@vger.kernel.org, jes@trained-monkey.org, guoqing.jiang@linux.dev, xni@redhat.com, colyli@suse.de, chaitanyak@nvidia.com, jm@chia.net, sbates@raithlin.com, Martin.Oliveira@eideticom.com, David.Sloan@eideticom.com, logang@deltatee.com, mariusz.tkaczyk@linux.intel.com, kinga.tanska@linux.intel.com X-SA-Exim-Mail-From: gunthorp@deltatee.com Subject: [PATCH mdadm v5 5/7] mdadm: Add --write-zeros option for Create X-SA-Exim-Version: 4.2.1 (built Sat, 13 Feb 2021 17:57:42 +0000) X-SA-Exim-Scanned: Yes (on ale.deltatee.com) Precedence: bulk List-ID: X-Mailing-List: linux-raid@vger.kernel.org Add the --write-zeros option for Create which will send a write zeros request to all the disks before assembling the array. After zeroing the array, the disks will be in a known clean state and the initial sync may be skipped. Writing zeroes is best used when there is a hardware offload method to zero the data. But even still, zeroing can take several minutes on a large device. Because of this, all disks are zeroed in parallel using their own forked process and a message is printed to the user. The main process will proceed only after all the zeroing processes have completed successfully. 
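As a rough sketch of the request splitting described above (not the code in this patch), the loop below cuts a byte range into requests no larger than a given size. zero_in_chunks(), zero_fn and the counting callback are hypothetical names; the callback stands in for the real zeroing request so the sketch performs no I/O.

```c
/* Hypothetical callback type standing in for the real zeroing request. */
typedef int (*zero_fn)(unsigned long long offset, unsigned long long len,
		       void *ctx);

/*
 * Split [offset, offset + size) into requests of at most req_size bytes
 * and pass each one to fn. Returns 1 at the first failing request and 0
 * otherwise. Assumes req_size is non-zero.
 */
static int zero_in_chunks(unsigned long long offset, unsigned long long size,
			  unsigned long long req_size, zero_fn fn, void *ctx)
{
	while (size) {
		unsigned long long sz = size < req_size ? size : req_size;

		if (fn(offset, sz, ctx))
			return 1;

		offset += sz;
		size -= sz;
	}

	return 0;
}

/* Example callback that only counts requests and bytes. */
struct zero_stats {
	unsigned long long requests;
	unsigned long long bytes;
};

static int count_request(unsigned long long offset, unsigned long long len,
			 void *ctx)
{
	struct zero_stats *stats = ctx;

	(void)offset;
	stats->requests++;
	stats->bytes += len;
	return 0;
}
```

With a 1 GiB request size, a 2.5 GiB range is split into three requests, so an interrupt is noticed after at most one request's worth of work rather than after the whole device.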
Signed-off-by: Logan Gunthorpe Acked-by: Kinga Tanska --- Create.c | 173 ++++++++++++++++++++++++++++++++++++++++++++++++++++++- ReadMe.c | 2 + mdadm.c | 9 +++ mdadm.h | 5 ++ 4 files changed, 187 insertions(+), 2 deletions(-) diff --git a/Create.c b/Create.c index 4acda30c5256..11636efbb12b 100644 --- a/Create.c +++ b/Create.c @@ -26,6 +26,10 @@ #include "md_u.h" #include "md_p.h" #include <ctype.h> +#include <fcntl.h> +#include <signal.h> +#include <sys/signalfd.h> +#include <sys/wait.h> static int round_size_and_verify(unsigned long long *size, int chunk) { @@ -91,9 +95,146 @@ int default_layout(struct supertype *st, int level, int verbose) return layout; } +static pid_t write_zeroes_fork(int fd, struct shape *s, struct supertype *st, + struct mddev_dev *dv) + +{ + const unsigned long long req_size = 1 << 30; + unsigned long long offset_bytes, size_bytes, sz; + sigset_t sigset; + int ret = 0; + pid_t pid; + + size_bytes = KIB_TO_BYTES(s->size); + + /* + * If size_bytes is zero, this is a zoned raid array where + * each disk is of a different size and uses its full + * disk. Thus zero the entire disk. + */ + if (!size_bytes && !get_dev_size(fd, dv->devname, &size_bytes)) + return -1; + + if (dv->data_offset != INVALID_SECTORS) + offset_bytes = SEC_TO_BYTES(dv->data_offset); + else + offset_bytes = SEC_TO_BYTES(st->data_offset); + + pr_info("zeroing data from %lld to %lld on: %s\n", + offset_bytes, size_bytes, dv->devname); + + pid = fork(); + if (pid < 0) { + pr_err("Could not fork to zero disks: %m\n"); + return pid; + } else if (pid != 0) { + return pid; + } + + sigemptyset(&sigset); + sigaddset(&sigset, SIGINT); + sigprocmask(SIG_UNBLOCK, &sigset, NULL); + + while (size_bytes) { + /* + * Split requests to the kernel into 1GB chunks seeing the + * fallocate() call is not interruptible and blocking a + * ctrl-c for several minutes is not desirable. 
+		 *
+		 * 1GB is chosen as a compromise: the user may still have
+		 * to wait several seconds if they ctrl-c on devices that
+		 * zero slowly, but will reduce the number of requests
+		 * required and thus the overhead on devices that perform
+		 * better.
+		 */
+		sz = size_bytes;
+		if (sz >= req_size)
+			sz = req_size;
+
+		if (fallocate(fd, FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE,
+			      offset_bytes, sz)) {
+			pr_err("zeroing %s failed: %m\n", dv->devname);
+			ret = 1;
+			break;
+		}
+
+		offset_bytes += sz;
+		size_bytes -= sz;
+	}
+
+	exit(ret);
+}
+
+static int wait_for_zero_forks(int *zero_pids, int count)
+{
+	int wstatus, ret = 0, i, sfd, wait_count = 0;
+	struct signalfd_siginfo fdsi;
+	bool interrupted = false;
+	sigset_t sigset;
+	ssize_t s;
+
+	for (i = 0; i < count; i++)
+		if (zero_pids[i])
+			wait_count++;
+	if (!wait_count)
+		return 0;
+
+	sigemptyset(&sigset);
+	sigaddset(&sigset, SIGINT);
+	sigaddset(&sigset, SIGCHLD);
+	sigprocmask(SIG_BLOCK, &sigset, NULL);
+
+	sfd = signalfd(-1, &sigset, 0);
+	if (sfd < 0) {
+		pr_err("Unable to create signalfd: %m\n");
+		return 1;
+	}
+
+	while (1) {
+		s = read(sfd, &fdsi, sizeof(fdsi));
+		if (s != sizeof(fdsi)) {
+			pr_err("Invalid signalfd read: %m\n");
+			close(sfd);
+			return 1;
+		}
+
+		if (fdsi.ssi_signo == SIGINT) {
+			printf("\n");
+			pr_info("Interrupting zeroing processes, please wait...\n");
+			interrupted = true;
+		} else if (fdsi.ssi_signo == SIGCHLD) {
+			if (!--wait_count)
+				break;
+		}
+	}
+
+	close(sfd);
+
+	for (i = 0; i < count; i++) {
+		if (!zero_pids[i])
+			continue;
+
+		waitpid(zero_pids[i], &wstatus, 0);
+		zero_pids[i] = 0;
+		if (!WIFEXITED(wstatus) || WEXITSTATUS(wstatus))
+			ret = 1;
+	}
+
+	if (interrupted)
+		return 1;
+
+	if (ret)
+		pr_err("zeroing failed!\n");
+	else
+		pr_info("zeroing finished\n");
+
+	return ret;
+}
+
 static int add_disk_to_super(int mdfd, struct shape *s, struct context *c,
 	struct supertype *st, struct mddev_dev *dv,
-	struct mdinfo *info, int have_container, int major_num)
+	struct mdinfo *info, int have_container, int major_num,
+	int *zero_pid)
 {
 	dev_t rdev;
 	int fd;
@@ -148,6 +289,14 @@ static int add_disk_to_super(int mdfd, struct shape *s, struct context *c,
 	}
 	st->ss->getinfo_super(st, info, NULL);
 
+	if (fd >= 0 && s->write_zeroes) {
+		*zero_pid = write_zeroes_fork(fd, s, st, dv);
+		if (*zero_pid <= 0) {
+			ioctl(mdfd, STOP_ARRAY, NULL);
+			return 1;
+		}
+	}
+
 	if (have_container && c->verbose > 0)
 		pr_err("Using %s for device %d\n",
 		       map_dev(info->disk.major, info->disk.minor, 0),
@@ -224,10 +373,23 @@ static int add_disks(int mdfd, struct mdinfo *info, struct shape *s,
 {
 	struct mddev_dev *moved_disk = NULL;
 	int pass, raid_disk_num, dnum;
+	int zero_pids[total_slots];
 	struct mddev_dev *dv;
 	struct mdinfo *infos;
+	sigset_t sigset, orig_sigset;
 	int ret = 0;
 
+	/*
+	 * Block SIGINT so the main process will always wait for the
+	 * zeroing processes when being interrupted. Otherwise the
+	 * zeroing processes will finish their work in the background
+	 * keeping the disk busy.
+	 */
+	sigemptyset(&sigset);
+	sigaddset(&sigset, SIGINT);
+	sigprocmask(SIG_BLOCK, &sigset, &orig_sigset);
+	memset(zero_pids, 0, sizeof(zero_pids));
+
 	infos = xmalloc(sizeof(*infos) * total_slots);
 	enable_fds(total_slots);
 	for (pass = 1; pass <= 2; pass++) {
@@ -261,7 +423,7 @@ static int add_disks(int mdfd, struct mdinfo *info, struct shape *s,
 
 			ret = add_disk_to_super(mdfd, s, c, st, dv,
 					&infos[dnum], have_container,
-					major_num);
+					major_num, &zero_pids[dnum]);
 			if (ret)
 				goto out;
 
@@ -287,6 +449,10 @@ static int add_disks(int mdfd, struct mdinfo *info, struct shape *s,
 		}
 
 		if (pass == 1) {
+			ret = wait_for_zero_forks(zero_pids, total_slots);
+			if (ret)
+				goto out;
+
 			ret = update_metadata(mdfd, s, st, map, info,
 					      chosen_name);
 			if (ret)
@@ -295,7 +461,10 @@ static int add_disks(int mdfd, struct mdinfo *info, struct shape *s,
 	}
 
 out:
+	if (ret)
+		wait_for_zero_forks(zero_pids, total_slots);
 	free(infos);
+	sigprocmask(SIG_SETMASK, &orig_sigset, NULL);
 	return ret;
 }
 
diff --git a/ReadMe.c b/ReadMe.c
index 50a5e36d05fc..9424bfc3eeca 100644
--- a/ReadMe.c
+++ b/ReadMe.c
@@ -138,6 +138,7 @@ struct option long_options[] = {
     {"size", 1, 0, 'z'},
     {"auto", 1, 0, Auto}, /* also for --assemble */
     {"assume-clean",0,0, AssumeClean },
+    {"write-zeroes",0,0, WriteZeroes },
     {"metadata", 1, 0, 'e'}, /* superblock format */
     {"bitmap", 1, 0, Bitmap},
     {"bitmap-chunk", 1, 0, BitmapChunk},
@@ -390,6 +391,7 @@ char Help_create[] =
 " --write-journal= : Specify journal device for RAID-4/5/6 array\n"
 " --consistency-policy= : Specify the policy that determines how the array\n"
 " -k : maintains consistency in case of unexpected shutdown.\n"
+" --write-zeroes : Write zeroes to the disks before creating. This will bypass initial sync.\n"
 "\n"
 ;
 
diff --git a/mdadm.c b/mdadm.c
index 972adb524dfb..141838bd394f 100644
--- a/mdadm.c
+++ b/mdadm.c
@@ -602,6 +602,10 @@ int main(int argc, char *argv[])
 			s.assume_clean = 1;
 			continue;
 
+		case O(CREATE, WriteZeroes):
+			s.write_zeroes = 1;
+			continue;
+
 		case O(GROW,'n'):
 		case O(CREATE,'n'):
 		case O(BUILD,'n'): /* number of raid disks */
@@ -1306,6 +1310,11 @@ int main(int argc, char *argv[])
 		}
 	}
 
+	if (s.write_zeroes && !s.assume_clean) {
+		pr_info("Disk zeroing requested, setting --assume-clean to skip resync\n");
+		s.assume_clean = 1;
+	}
+
 	if (!mode && devs_found) {
 		mode = MISC;
 		devmode = 'Q';
diff --git a/mdadm.h b/mdadm.h
index 18c24915e94c..82e920fb523a 100644
--- a/mdadm.h
+++ b/mdadm.h
@@ -273,6 +273,9 @@ static inline void __put_unaligned32(__u32 val, void *p)
 
 #define ARRAY_SIZE(x) (sizeof(x)/sizeof(x[0]))
 
+#define KIB_TO_BYTES(x)	((x) << 10)
+#define SEC_TO_BYTES(x)	((x) << 9)
+
 extern const char Name[];
 
 struct md_bb_entry {
@@ -433,6 +436,7 @@ extern char Version[], Usage[], Help[], OptionHelp[],
  */
 enum special_options {
 	AssumeClean = 300,
+	WriteZeroes,
 	BitmapChunk,
 	WriteBehind,
 	ReAdd,
@@ -593,6 +597,7 @@ struct shape {
 	int	bitmap_chunk;
 	char	*bitmap_file;
 	int	assume_clean;
+	bool	write_zeroes;
 	int	write_behind;
 	unsigned long long size;
 	unsigned long long data_offset;

From patchwork Wed Nov 16 23:50:08 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Logan Gunthorpe
X-Patchwork-Id: 13045942
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id AE980C4332F
	for ; Wed, 16 Nov 2022 23:50:23 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S233023AbiKPXuV (ORCPT ); Wed, 16 Nov 2022 18:50:21 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41856 "EHLO
	lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S234881AbiKPXuQ (ORCPT ); Wed, 16 Nov 2022 18:50:16 -0500
Received: from ale.deltatee.com (ale.deltatee.com [204.191.154.188])
	by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7ADFD31229
	for ; Wed, 16 Nov 2022 15:50:15 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed;
	d=deltatee.com; s=20200525;
	h=Subject:MIME-Version:References:In-Reply-To:
	Message-Id:Date:Cc:To:From:content-disposition;
	bh=fsT4GyZraUULLUCSqNPFpxDD29UsbWsjXCW1JwNTiNU=;
	b=VVXkauRoa9FKUq7qtVH3i5VYNL
	3O0ONQ6ISHg5I37Sk82Lewdayhzk/ze82K70sNa1c659LWNMRaFIqCy5xxUTtVij7vhqELrQKvW1a
	ybxyOBHdt5FNzIxGP8Ldk1bgKz/xhyvfLYEobVEUw8WIveEwIYAAV7MRier8jwMwuiEToESSa5wgg
	EgnP0CNAiYx7CqbT5OKPh5f+/l1wIdx4exgpQSxU9HPGM5+RPNDa62l4bQocVMEvlgEXCnvlBXXjp
	lIEHcQga6m9i3I6D5tcfHgN2A59Sfe0HwK+/X76k5RYpQBhR2kspKKIpsF+7TvJ7WQjW1ELFB3r1z
	w1m7Pw0Q==;
Received: from cgy1-donard.priv.deltatee.com ([172.16.1.31])
	by ale.deltatee.com with esmtps (TLS1.3) tls
	TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from )
	id 1ovSAX-0043zO-Hp; Wed, 16 Nov 2022 16:50:14 -0700
Received: from gunthorp by cgy1-donard.priv.deltatee.com with local
	(Exim 4.94.2) (envelope-from ) id 1ovSAX-000KnU-0I;
	Wed, 16 Nov 2022 16:50:13 -0700
From: Logan Gunthorpe
To: linux-raid@vger.kernel.org, Jes Sorensen
Cc: Guoqing Jiang,
	Xiao Ni,
	Mariusz Tkaczyk,
	Coly Li,
	Chaitanya Kulkarni,
	Jonmichael Hands,
	Stephen Bates,
	Martin Oliveira,
	David Sloan,
	Logan Gunthorpe,
	Kinga Tanska
Date: Wed, 16 Nov 2022 16:50:08 -0700
Message-Id: <20221116235009.79875-7-logang@deltatee.com>
X-Mailer: git-send-email 2.30.2
In-Reply-To: <20221116235009.79875-1-logang@deltatee.com>
References: <20221116235009.79875-1-logang@deltatee.com>
MIME-Version: 1.0
X-SA-Exim-Connect-IP: 172.16.1.31
X-SA-Exim-Rcpt-To: linux-raid@vger.kernel.org, jes@trained-monkey.org,
	guoqing.jiang@linux.dev, xni@redhat.com, colyli@suse.de,
	chaitanyak@nvidia.com, jm@chia.net, sbates@raithlin.com,
	Martin.Oliveira@eideticom.com, David.Sloan@eideticom.com,
	logang@deltatee.com, mariusz.tkaczyk@linux.intel.com,
	kinga.tanska@linux.intel.com
X-SA-Exim-Mail-From: gunthorp@deltatee.com
Subject: [PATCH mdadm v5 6/7] tests/00raid5-zero: Introduce test to exercise
	--write-zeroes.
X-SA-Exim-Version: 4.2.1 (built Sat, 13 Feb 2021 17:57:42 +0000)
X-SA-Exim-Scanned: Yes (on ale.deltatee.com)
Precedence: bulk
List-ID:
X-Mailing-List: linux-raid@vger.kernel.org

Attempt to create a raid5 array with --write-zeroes. If it succeeds,
check the array to ensure it is in sync. If it fails and an
"Operation not supported" error was printed, skip the test.
Signed-off-by: Logan Gunthorpe
Acked-by: Kinga Tanska
---
 tests/00raid5-zero | 12 ++++++++++++
 1 file changed, 12 insertions(+)
 create mode 100644 tests/00raid5-zero

diff --git a/tests/00raid5-zero b/tests/00raid5-zero
new file mode 100644
index 000000000000..7d0f05a12539
--- /dev/null
+++ b/tests/00raid5-zero
@@ -0,0 +1,12 @@
+
+if mdadm -CfR $md0 -l 5 -n3 $dev0 $dev1 $dev2 --write-zeroes ; then
+	check nosync
+	echo check > /sys/block/md0/md/sync_action;
+	check wait
+elif grep "zeroing [^ ]* failed: Operation not supported" \
+     $targetdir/stderr; then
+	echo "write-zeroes not supported, skipping"
+else
+	echo >&2 "ERROR: mdadm returned failure without a 'not supported' message"
+	exit 1
+fi

From patchwork Wed Nov 16 23:50:09 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Logan Gunthorpe
X-Patchwork-Id: 13045941
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 8A680C433FE
	for ; Wed, 16 Nov 2022 23:50:21 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S238534AbiKPXuU (ORCPT ); Wed, 16 Nov 2022 18:50:20 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41858 "EHLO
	lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S237236AbiKPXuQ (ORCPT ); Wed, 16 Nov 2022 18:50:16 -0500
Received: from ale.deltatee.com (ale.deltatee.com [204.191.154.188])
	by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 79EA12C11A
	for ; Wed, 16 Nov 2022 15:50:15 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed;
	d=deltatee.com; s=20200525;
	h=Subject:MIME-Version:References:In-Reply-To:
	Message-Id:Date:Cc:To:From:content-disposition;
	bh=bEvuirV01yQz9L2a6Cvbj9KL8+zwytplhbfj5fXDfJI=;
	b=Cb83onqcKFFzhQdVTnujtAKUXF
	c1fY9QPAbSULlfVvhj94r5yWaQe51F0usCMmM/boBCRUaJmNydDvmcaFXrxs4an2GSnfR3sKgQpcO
	oE4N2MRiApZILsXB1vS5d3gF//Fkdec/B08CsdbBenfno2Wd0TeEvpwudj/a/7xMmaMGXZI9B2Jxg
	241zKy5yoeqr9pfvEDLJlo3759R4luzx0gWE8Co5hUsSdF2ga+oXd6YoavDKZ5krAz4p4xV/IGOHt
	FeURuKMQeheGpEdLW3x0zgzcmErU6FlS/KaIlCJUkHMEc+to/N6CYMIVBsqw8aV/hfTBZ0ODLV+Na
	WWnvYRLw==;
Received: from cgy1-donard.priv.deltatee.com ([172.16.1.31])
	by ale.deltatee.com with esmtps (TLS1.3) tls
	TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from )
	id 1ovSAX-0043zP-9o; Wed, 16 Nov 2022 16:50:14 -0700
Received: from gunthorp by cgy1-donard.priv.deltatee.com with local
	(Exim 4.94.2) (envelope-from ) id 1ovSAX-000KnX-2i;
	Wed, 16 Nov 2022 16:50:13 -0700
From: Logan Gunthorpe
To: linux-raid@vger.kernel.org, Jes Sorensen
Cc: Guoqing Jiang,
	Xiao Ni,
	Mariusz Tkaczyk,
	Coly Li,
	Chaitanya Kulkarni,
	Jonmichael Hands,
	Stephen Bates,
	Martin Oliveira,
	David Sloan,
	Logan Gunthorpe,
	Kinga Tanska
Date: Wed, 16 Nov 2022 16:50:09 -0700
Message-Id: <20221116235009.79875-8-logang@deltatee.com>
X-Mailer: git-send-email 2.30.2
In-Reply-To: <20221116235009.79875-1-logang@deltatee.com>
References: <20221116235009.79875-1-logang@deltatee.com>
MIME-Version: 1.0
X-SA-Exim-Connect-IP: 172.16.1.31
X-SA-Exim-Rcpt-To: linux-raid@vger.kernel.org, jes@trained-monkey.org,
	guoqing.jiang@linux.dev, xni@redhat.com, colyli@suse.de,
	chaitanyak@nvidia.com, jm@chia.net, sbates@raithlin.com,
	Martin.Oliveira@eideticom.com, David.Sloan@eideticom.com,
	logang@deltatee.com, mariusz.tkaczyk@linux.intel.com,
	kinga.tanska@linux.intel.com
X-SA-Exim-Mail-From: gunthorp@deltatee.com
Subject: [PATCH mdadm v5 7/7] manpage: Add --write-zeroes option to manpage
X-SA-Exim-Version: 4.2.1 (built Sat, 13 Feb 2021 17:57:42 +0000)
X-SA-Exim-Scanned: Yes (on ale.deltatee.com)
Precedence: bulk
List-ID:
X-Mailing-List: linux-raid@vger.kernel.org

Document the new --write-zeroes option in the manpage.
Signed-off-by: Logan Gunthorpe
Acked-by: Kinga Tanska
---
 mdadm.8.in | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/mdadm.8.in b/mdadm.8.in
index 70c79d1e6e76..3beb475fd955 100644
--- a/mdadm.8.in
+++ b/mdadm.8.in
@@ -837,6 +837,22 @@ array is resynced at creation. From Linux version 3.0,
 .B \-\-assume\-clean
 can be used with that command to avoid the automatic resync.
 
+.TP
+.BR \-\-write\-zeroes
+When creating an array, send write zeroes requests to all the block
+devices. This should zero the data area on all disks such that the
+initial sync is not necessary and, if successful, will behave
+as if
+.B \-\-assume\-clean
+was specified.
+.IP
+This is intended for use with devices that have hardware offload for
+zeroing, but despite this, zeroing can still take several minutes for
+large disks. Thus a message is printed before and after zeroing and
+each disk is zeroed in parallel with the others.
+.IP
+This is only meaningful with \-\-create.
+
 .TP
 .BR \-\-backup\-file=
 This is needed when