From patchwork Fri Feb 24 18:33:21 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jonathan Derrick X-Patchwork-Id: 13151671 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 377F6C7EE23 for ; Fri, 24 Feb 2023 18:36:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229888AbjBXSgi (ORCPT ); Fri, 24 Feb 2023 13:36:38 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35156 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229481AbjBXSgh (ORCPT ); Fri, 24 Feb 2023 13:36:37 -0500 X-Greylist: delayed 150 seconds by postgrey-1.37 at lindbergh.monkeyblade.net; Fri, 24 Feb 2023 10:36:36 PST Received: from resqmta-h1p-028589.sys.comcast.net (resqmta-h1p-028589.sys.comcast.net [IPv6:2001:558:fd02:2446::6]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5C7746C52F for ; Fri, 24 Feb 2023 10:36:36 -0800 (PST) Received: from resomta-h1p-027911.sys.comcast.net ([96.102.179.202]) by resqmta-h1p-028589.sys.comcast.net with ESMTP id VX6dpHL7vEHHIVctRpD1Ws; Fri, 24 Feb 2023 18:34:05 +0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=comcastmailservice.net; s=20211018a; t=1677263645; bh=OQQr3ZZ1aotkrNi+c3xKvuXGnWynaXu23fZ7KOEQ4jE=; h=Received:Received:From:To:Subject:Date:Message-Id:MIME-Version: Xfinity-Spam-Result; b=MZ8g5fkJpgrx+OyvwJdInsP+4YjfDg3IpQi5iTS6+9I+B9GdCCzf3N/yE+3L0Bsap Upwaa49yW2zt9nZMj+p+2qXvZZ0GzEmv6J3mGnWW70sCIiLRnXaTZ8Z+7Z/n2f9DfU UNyq5k0msVVWhMydJ6lg4dDtloOVE8yJVHrJMGuKbQfLrcjzZbG7DWp1fOZSzIR+Za VsRrVHDyr7XWfsRw0OSscR3LGkeqUQ80VsqZVtP/QL9SM2VNSvW4KAZdGxvcKdYZvY e0ZXAIuGBMP97xWOBdluwYQk45NiSbtU3paKg24P/D+SPds4fz1vYkaBgLwgXYlMRF eaaSBKIPt+QMg== Received: from jderrick-mobl4.amr.corp.intel.com ([71.205.181.50]) by resomta-h1p-027911.sys.comcast.net with ESMTPA id VcsvpymQMN2puVct6pamiB; Fri, 24 Feb 2023 18:33:45 +0000 X-Xfinity-VAAS: gggruggvucftvghtrhhoucdtuddrgedvhedrudekfedguddtlecutefuodetggdotefrodftvfcurfhrohhfihhlvgemucevohhmtggrshhtqdftvghsihdpqfgfvfdppffquffrtefokffrnecuuegrihhlohhuthemuceftddunecusecvtfgvtghiphhivghnthhsucdlqddutddtmdenucfjughrpefhvfevufffkffojghfggfgsedtkeertdertddtnecuhfhrohhmpeflohhnrghthhgrnhcuffgvrhhrihgtkhcuoehjohhnrghthhgrnhdruggvrhhrihgtkheslhhinhhugidruggvvheqnecuggftrfgrthhtvghrnheptdetleejfffgffevhefhteevfeeuvdehveffffehtdejuedvvefgfedttdehfedtnecukfhppeejuddrvddthedrudekuddrhedtnecuvehluhhsthgvrhfuihiivgeptdenucfrrghrrghmpehhvghlohepjhguvghrrhhitghkqdhmohgslhegrdgrmhhrrdgtohhrphdrihhnthgvlhdrtghomhdpihhnvghtpeejuddrvddthedrudekuddrhedtpdhmrghilhhfrhhomhepjhhonhgrthhhrghnrdguvghrrhhitghksehlihhnuhigrdguvghvpdhnsggprhgtphhtthhopeelpdhrtghpthhtohepshhonhhgsehkvghrnhgvlhdrohhrghdprhgtphhtthhopehlihhnuhigqdhrrghiugesvhhgvghrrdhkvghrnhgvlhdrohhrghdprhgtphhtthhopehhrdhrvghinhgulhesthhhvghlohhunhhgvgdrnhgvthdprhgtphhtthhopeignhhisehrvgguhhgrthdrtghomhdprhgtphhtthhope hhtghhsehlshhtrdguvgdprhgtphhtthhopehhtghhsehinhhfrhgruggvrggurdhorhhg X-Xfinity-VMeta: sc=-100.00;st=legit From: Jonathan Derrick To: Song Liu , Cc: Reindl Harald , Xiao Ni , Christoph Hellwig , Christoph Hellwig , Paul Menzel , Sushma Kalakota , Jon Derrick Subject: [PATCH v5 1/3] md: Move sb writer loop to its own function Date: Fri, 24 Feb 2023 11:33:21 -0700 Message-Id: <20230224183323.638-2-jonathan.derrick@linux.dev> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20230224183323.638-1-jonathan.derrick@linux.dev> References: <20230224183323.638-1-jonathan.derrick@linux.dev> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-raid@vger.kernel.org From: Jon Derrick Preparatory patch for optimal I/O size calculation. Move the sb writer loop routine into its own function for clarity. Reviewed-by: Christoph Hellwig Signed-off-by: Jon Derrick --- drivers/md/md-bitmap.c | 125 +++++++++++++++++++++-------------------- 1 file changed, 65 insertions(+), 60 deletions(-) diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c index e7cc6ba1b657..cdc61e65abae 100644 --- a/drivers/md/md-bitmap.c +++ b/drivers/md/md-bitmap.c @@ -209,76 +209,81 @@ static struct md_rdev *next_active_rdev(struct md_rdev *rdev, struct mddev *mdde return NULL; } -static int write_sb_page(struct bitmap *bitmap, struct page *page, int wait) +static int __write_sb_page(struct md_rdev *rdev, struct bitmap *bitmap, + struct page *page) { - struct md_rdev *rdev; struct block_device *bdev; struct mddev *mddev = bitmap->mddev; struct bitmap_storage *store = &bitmap->storage; + loff_t offset = mddev->bitmap_info.offset; + int size = PAGE_SIZE; + + bdev = (rdev->meta_bdev) ? rdev->meta_bdev : rdev->bdev; + if (page->index == store->file_pages - 1) { + int last_page_size = store->bytes & (PAGE_SIZE - 1); + + if (last_page_size == 0) + last_page_size = PAGE_SIZE; + size = roundup(last_page_size, + bdev_logical_block_size(bdev)); + } + + /* Just make sure we aren't corrupting data or metadata */ + if (mddev->external) { + /* Bitmap could be anywhere. */ + if (rdev->sb_start + offset + + (page->index * (PAGE_SIZE / SECTOR_SIZE)) + > rdev->data_offset && + rdev->sb_start + offset + < (rdev->data_offset + mddev->dev_sectors + + (PAGE_SIZE / SECTOR_SIZE))) + return -EINVAL; + } else if (offset < 0) { + /* DATA BITMAP METADATA */ + if (offset + + (long)(page->index * (PAGE_SIZE / SECTOR_SIZE)) + + size / SECTOR_SIZE > 0) + /* bitmap runs in to metadata */ + return -EINVAL; + + if (rdev->data_offset + mddev->dev_sectors + > rdev->sb_start + offset) + /* data runs in to bitmap */ + return -EINVAL; + } else if (rdev->sb_start < rdev->data_offset) { + /* METADATA BITMAP DATA */ + if (rdev->sb_start + offset + + page->index * (PAGE_SIZE / SECTOR_SIZE) + + size / SECTOR_SIZE > rdev->data_offset) + /* bitmap runs in to data */ + return -EINVAL; + } else { + /* DATA METADATA BITMAP - no problems */ + } -restart: - rdev = NULL; - while ((rdev = next_active_rdev(rdev, mddev)) != NULL) { - int size = PAGE_SIZE; - loff_t offset = mddev->bitmap_info.offset; + md_super_write(mddev, rdev, + rdev->sb_start + offset + + page->index * (PAGE_SIZE / SECTOR_SIZE), + size, page); + return 0; +} - bdev = (rdev->meta_bdev) ? rdev->meta_bdev : rdev->bdev; +static int write_sb_page(struct bitmap *bitmap, struct page *page, int wait) +{ + struct md_rdev *rdev; + struct mddev *mddev = bitmap->mddev; + int ret; - if (page->index == store->file_pages-1) { - int last_page_size = store->bytes & (PAGE_SIZE-1); - if (last_page_size == 0) - last_page_size = PAGE_SIZE; - size = roundup(last_page_size, - bdev_logical_block_size(bdev)); - } - /* Just make sure we aren't corrupting data or - * metadata - */ - if (mddev->external) { - /* Bitmap could be anywhere. */ - if (rdev->sb_start + offset + (page->index - * (PAGE_SIZE/512)) - > rdev->data_offset - && - rdev->sb_start + offset - < (rdev->data_offset + mddev->dev_sectors - + (PAGE_SIZE/512))) - goto bad_alignment; - } else if (offset < 0) { - /* DATA BITMAP METADATA */ - if (offset - + (long)(page->index * (PAGE_SIZE/512)) - + size/512 > 0) - /* bitmap runs in to metadata */ - goto bad_alignment; - if (rdev->data_offset + mddev->dev_sectors - > rdev->sb_start + offset) - /* data runs in to bitmap */ - goto bad_alignment; - } else if (rdev->sb_start < rdev->data_offset) { - /* METADATA BITMAP DATA */ - if (rdev->sb_start - + offset - + page->index*(PAGE_SIZE/512) + size/512 - > rdev->data_offset) - /* bitmap runs in to data */ - goto bad_alignment; - } else { - /* DATA METADATA BITMAP - no problems */ + do { + rdev = NULL; + while ((rdev = next_active_rdev(rdev, mddev)) != NULL) { + ret = __write_sb_page(rdev, bitmap, page); + if (ret) + return ret; } - md_super_write(mddev, rdev, - rdev->sb_start + offset - + page->index * (PAGE_SIZE/512), - size, - page); - } + } while (wait && md_super_wait(mddev) < 0); - if (wait && md_super_wait(mddev) < 0) - goto restart; return 0; - - bad_alignment: - return -EINVAL; } static void md_bitmap_file_kick(struct bitmap *bitmap); From patchwork Fri Feb 24 18:33:22 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jonathan Derrick X-Patchwork-Id: 13151674 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 40706C7EE2F for ; Fri, 24 Feb 2023 18:36:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229836AbjBXSgk (ORCPT ); Fri, 24 Feb 2023 13:36:40 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35164 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229832AbjBXSgi (ORCPT ); Fri, 24 Feb 2023 13:36:38 -0500 Received: from resqmta-h1p-028591.sys.comcast.net (resqmta-h1p-028591.sys.comcast.net [IPv6:2001:558:fd02:2446::9]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4DE4D6C52D for ; Fri, 24 Feb 2023 10:36:37 -0800 (PST) Received: from resomta-h1p-027911.sys.comcast.net ([96.102.179.202]) by resqmta-h1p-028591.sys.comcast.net with ESMTP id VX6dpz14qGC5dVctRpKfHM; Fri, 24 Feb 2023 18:34:05 +0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=comcastmailservice.net; s=20211018a; t=1677263645; bh=5Z0w+ugex+J+O+Uy6m1Y4yEW69xQp3sPnb0QS7Al950=; h=Received:Received:From:To:Subject:Date:Message-Id:MIME-Version: Xfinity-Spam-Result; b=Wk4VaX/enhWV84ZBtNVkaNwZR7+cq4t/JOl62MQNIMCzqcRQDqPSlkf7Y5PcbJ8QN n3A4MLm6HJ/nAh9rwWXRJjtpgeIcI+wP7NBQW8UtyWgNHlCFpbQpl0e3NKjhaekxl+ dmnUn9H7RT3BtoRTnbt6QXnqGBKHm1Zgvy64f3P5aEPGAtXWtFtB2KZ2T/0EG6Bf7U e0wixWhx+Xp+RMEJVhjfmQMVeXW2360++WpFDYUgx9QSAI65fIPk+IjYwNE6VVGTIL xs+j3CIsYsh8QjgcaNlVVumCJl80Geq+N8z8Ma3N3uq7kOkFw7EthHc+YRb6xZT5I9 c7+0cyFFdyNYw== Received: from jderrick-mobl4.amr.corp.intel.com ([71.205.181.50]) by resomta-h1p-027911.sys.comcast.net with ESMTPA id VcsvpymQMN2puVct8pamiL; Fri, 24 Feb 2023 18:33:46 +0000 X-Xfinity-VAAS: gggruggvucftvghtrhhoucdtuddrgedvhedrudekfedguddtlecutefuodetggdotefrodftvfcurfhrohhfihhlvgemucevohhmtggrshhtqdftvghsihdpqfgfvfdppffquffrtefokffrnecuuegrihhlohhuthemuceftddunecusecvtfgvtghiphhivghnthhsucdlqddutddtmdenucfjughrpefhvfevufffkffojghfggfgsedtkeertdertddtnecuhfhrohhmpeflohhnrghthhgrnhcuffgvrhhrihgtkhcuoehjohhnrghthhgrnhdruggvrhhrihgtkheslhhinhhugidruggvvheqnecuggftrfgrthhtvghrnheptdetleejfffgffevhefhteevfeeuvdehveffffehtdejuedvvefgfedttdehfedtnecukfhppeejuddrvddthedrudekuddrhedtnecuvehluhhsthgvrhfuihiivgeptdenucfrrghrrghmpehhvghlohepjhguvghrrhhitghkqdhmohgslhegrdgrmhhrrdgtohhrphdrihhnthgvlhdrtghomhdpihhnvghtpeejuddrvddthedrudekuddrhedtpdhmrghilhhfrhhomhepjhhonhgrthhhrghnrdguvghrrhhitghksehlihhnuhigrdguvghvpdhnsggprhgtphhtthhopeelpdhrtghpthhtohepshhonhhgsehkvghrnhgvlhdrohhrghdprhgtphhtthhopehlihhnuhigqdhrrghiugesvhhgvghrrdhkvghrnhgvlhdrohhrghdprhgtphhtthhopehhrdhrvghinhgulhesthhhvghlohhunhhgvgdrnhgvthdprhgtphhtthhopeignhhisehrvgguhhgrthdrtghomhdprhgtphhtthhope hhtghhsehlshhtrdguvgdprhgtphhtthhopehhtghhsehinhhfrhgruggvrggurdhorhhg X-Xfinity-VMeta: sc=-100.00;st=legit From: Jonathan Derrick To: Song Liu , Cc: Reindl Harald , Xiao Ni , Christoph Hellwig , Christoph Hellwig , Paul Menzel , Sushma Kalakota , Jon Derrick Subject: [PATCH v5 2/3] md: Fix types in sb writer Date: Fri, 24 Feb 2023 11:33:22 -0700 Message-Id: <20230224183323.638-3-jonathan.derrick@linux.dev> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20230224183323.638-1-jonathan.derrick@linux.dev> References: <20230224183323.638-1-jonathan.derrick@linux.dev> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-raid@vger.kernel.org From: Jon Derrick Page->index is a pgoff_t and multiplying could cause overflows on a 32-bit architecture. In the sb writer, this is used to calculate and verify the sector being used, and is multiplied by a sector value. Using sector_t will cast it to a u64 type and is the more appropriate type for the unit. Additionally, the integer size unit is converted to a sector unit in later calculations, and is now corrected to be an unsigned type. Finally, clean up the calculations using variable aliases to improve readabiliy. Reviewed-by: Christoph Hellwig Signed-off-by: Jon Derrick --- drivers/md/md-bitmap.c | 35 ++++++++++++++--------------------- 1 file changed, 14 insertions(+), 21 deletions(-) diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c index cdc61e65abae..bf250a5e3a86 100644 --- a/drivers/md/md-bitmap.c +++ b/drivers/md/md-bitmap.c @@ -215,12 +215,13 @@ static int __write_sb_page(struct md_rdev *rdev, struct bitmap *bitmap, struct block_device *bdev; struct mddev *mddev = bitmap->mddev; struct bitmap_storage *store = &bitmap->storage; - loff_t offset = mddev->bitmap_info.offset; - int size = PAGE_SIZE; + sector_t offset = mddev->bitmap_info.offset; + sector_t ps, sboff, doff; + unsigned int size = PAGE_SIZE; bdev = (rdev->meta_bdev) ? rdev->meta_bdev : rdev->bdev; if (page->index == store->file_pages - 1) { - int last_page_size = store->bytes & (PAGE_SIZE - 1); + unsigned int last_page_size = store->bytes & (PAGE_SIZE - 1); if (last_page_size == 0) last_page_size = PAGE_SIZE; @@ -228,43 +229,35 @@ static int __write_sb_page(struct md_rdev *rdev, struct bitmap *bitmap, bdev_logical_block_size(bdev)); } + ps = page->index * PAGE_SIZE / SECTOR_SIZE; + sboff = rdev->sb_start + offset; + doff = rdev->data_offset; + /* Just make sure we aren't corrupting data or metadata */ if (mddev->external) { /* Bitmap could be anywhere. */ - if (rdev->sb_start + offset - + (page->index * (PAGE_SIZE / SECTOR_SIZE)) - > rdev->data_offset && - rdev->sb_start + offset - < (rdev->data_offset + mddev->dev_sectors - + (PAGE_SIZE / SECTOR_SIZE))) + if (sboff + ps > doff && + sboff < (doff + mddev->dev_sectors + PAGE_SIZE / SECTOR_SIZE)) return -EINVAL; } else if (offset < 0) { /* DATA BITMAP METADATA */ - if (offset - + (long)(page->index * (PAGE_SIZE / SECTOR_SIZE)) - + size / SECTOR_SIZE > 0) + if (offset + ps + size / SECTOR_SIZE > 0) /* bitmap runs in to metadata */ return -EINVAL; - if (rdev->data_offset + mddev->dev_sectors - > rdev->sb_start + offset) + if (doff + mddev->dev_sectors > sboff) /* data runs in to bitmap */ return -EINVAL; } else if (rdev->sb_start < rdev->data_offset) { /* METADATA BITMAP DATA */ - if (rdev->sb_start + offset - + page->index * (PAGE_SIZE / SECTOR_SIZE) - + size / SECTOR_SIZE > rdev->data_offset) + if (sboff + ps + size / SECTOR_SIZE > doff) /* bitmap runs in to data */ return -EINVAL; } else { /* DATA METADATA BITMAP - no problems */ } - md_super_write(mddev, rdev, - rdev->sb_start + offset - + page->index * (PAGE_SIZE / SECTOR_SIZE), - size, page); + md_super_write(mddev, rdev, sboff + ps, (int) size, page); return 0; } From patchwork Fri Feb 24 18:33:23 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jonathan Derrick X-Patchwork-Id: 13151673 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 63A6EC7EE30 for ; Fri, 24 Feb 2023 18:36:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229890AbjBXSgl (ORCPT ); Fri, 24 Feb 2023 13:36:41 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35168 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229887AbjBXSgi (ORCPT ); Fri, 24 Feb 2023 13:36:38 -0500 Received: from resqmta-h1p-028594.sys.comcast.net (resqmta-h1p-028594.sys.comcast.net [IPv6:2001:558:fd02:2446::2]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3ECB16C51F for ; Fri, 24 Feb 2023 10:36:37 -0800 (PST) Received: from resomta-h1p-027911.sys.comcast.net ([96.102.179.202]) by resqmta-h1p-028594.sys.comcast.net with ESMTP id VXICpGQKuzZgKVctRpvO6g; Fri, 24 Feb 2023 18:34:05 +0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=comcastmailservice.net; s=20211018a; t=1677263645; bh=0VaiQ+Sfjsrp/FZM+TlYPCjyB5kxNAvjLskIS4PXKUc=; h=Received:Received:From:To:Subject:Date:Message-Id:MIME-Version: Xfinity-Spam-Result; b=ubCm1ybGx14TyzWDxFqCpEylURuKyYam9v9bGuMV5lar3rYrqXWnyzoMvub0Qs1Lz Tr0tNi5v8J9NNVoXw0JqNoNl/sDfPWmd75nH9uZW3ih/C/QZ6x1lggBmYRPgco49cF zj6peSYAa97U6r3xcVEJ28ezxgzoqETq/L7QQkj+JJLDjyfdr7A0GwPPfH6jlp4u9U FTa+1z2/7FRVDy8nGx+sBnub3z8sdnZE4lmcBbMzAiDUy+4OlDOm76JTnUq38NP/ZG 1G9m4DrFDWkGa6U8iB9GKHQk9VhgWlxkRN3pgO41oQ7K5wm0memASrZmWfRVCGM/Qi zVGTkPWoTBTPA== Received: from jderrick-mobl4.amr.corp.intel.com ([71.205.181.50]) by resomta-h1p-027911.sys.comcast.net with ESMTPA id VcsvpymQMN2puVct9pamiU; Fri, 24 Feb 2023 18:33:48 +0000 X-Xfinity-VAAS: gggruggvucftvghtrhhoucdtuddrgedvhedrudekfedguddtlecutefuodetggdotefrodftvfcurfhrohhfihhlvgemucevohhmtggrshhtqdftvghsihdpqfgfvfdppffquffrtefokffrnecuuegrihhlohhuthemuceftddunecusecvtfgvtghiphhivghnthhsucdlqddutddtmdenucfjughrpefhvfevufffkffojghfggfgsedtkeertdertddtnecuhfhrohhmpeflohhnrghthhgrnhcuffgvrhhrihgtkhcuoehjohhnrghthhgrnhdruggvrhhrihgtkheslhhinhhugidruggvvheqnecuggftrfgrthhtvghrnheptdetleejfffgffevhefhteevfeeuvdehveffffehtdejuedvvefgfedttdehfedtnecukfhppeejuddrvddthedrudekuddrhedtnecuvehluhhsthgvrhfuihiivgeptdenucfrrghrrghmpehhvghlohepjhguvghrrhhitghkqdhmohgslhegrdgrmhhrrdgtohhrphdrihhnthgvlhdrtghomhdpihhnvghtpeejuddrvddthedrudekuddrhedtpdhmrghilhhfrhhomhepjhhonhgrthhhrghnrdguvghrrhhitghksehlihhnuhigrdguvghvpdhnsggprhgtphhtthhopeelpdhrtghpthhtohepshhonhhgsehkvghrnhgvlhdrohhrghdprhgtphhtthhopehlihhnuhigqdhrrghiugesvhhgvghrrdhkvghrnhgvlhdrohhrghdprhgtphhtthhopehhrdhrvghinhgulhesthhhvghlohhunhhgvgdrnhgvthdprhgtphhtthhopeignhhisehrvgguhhgrthdrtghomhdprhgtphhtthhope hhtghhsehlshhtrdguvgdprhgtphhtthhopehhtghhsehinhhfrhgruggvrggurdhorhhg X-Xfinity-VMeta: sc=-100.00;st=legit From: Jonathan Derrick To: Song Liu , Cc: Reindl Harald , Xiao Ni , Christoph Hellwig , Christoph Hellwig , Paul Menzel , Sushma Kalakota , Jon Derrick Subject: [PATCH v5 3/3] md: Use optimal I/O size for last bitmap page Date: Fri, 24 Feb 2023 11:33:23 -0700 Message-Id: <20230224183323.638-4-jonathan.derrick@linux.dev> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20230224183323.638-1-jonathan.derrick@linux.dev> References: <20230224183323.638-1-jonathan.derrick@linux.dev> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-raid@vger.kernel.org From: Jon Derrick If the bitmap space has enough room, size the I/O for the last bitmap page write to the optimal I/O size for the storage device. The expanded write is checked that it won't overrun the data or metadata. The drive this was tested against has higher latencies when there are sub-4k writes due to device-side read-mod-writes of its atomic 4k write unit. This change helps increase performance by sizing the last bitmap page I/O for the device's preferred write unit, if it is given. Example Intel/Solidigm P5520 Raid10, Chunk-size 64M, bitmap-size 57228 bits $ mdadm --create /dev/md0 --level=10 --raid-devices=4 /dev/nvme{0,1,2,3}n1 --assume-clean --bitmap=internal --bitmap-chunk=64M $ fio --name=test --direct=1 --filename=/dev/md0 --rw=randwrite --bs=4k --runtime=60 Without patch: write: IOPS=1676, BW=6708KiB/s (6869kB/s)(393MiB/60001msec); 0 zone resets With patch: write: IOPS=15.7k, BW=61.4MiB/s (64.4MB/s)(3683MiB/60001msec); 0 zone resets Biosnoop: Without patch: Time Process PID Device LBA Size Lat 1.410377 md0_raid10 6900 nvme0n1 W 16 4096 0.02 1.410387 md0_raid10 6900 nvme2n1 W 16 4096 0.02 1.410374 md0_raid10 6900 nvme3n1 W 16 4096 0.01 1.410381 md0_raid10 6900 nvme1n1 W 16 4096 0.02 1.410411 md0_raid10 6900 nvme1n1 W 115346512 4096 0.01 1.410418 md0_raid10 6900 nvme0n1 W 115346512 4096 0.02 1.410915 md0_raid10 6900 nvme2n1 W 24 3584 0.43 <-- 1.410935 md0_raid10 6900 nvme3n1 W 24 3584 0.45 <-- 1.411124 md0_raid10 6900 nvme1n1 W 24 3584 0.64 <-- 1.411147 md0_raid10 6900 nvme0n1 W 24 3584 0.66 <-- 1.411176 md0_raid10 6900 nvme3n1 W 2019022184 4096 0.01 1.411189 md0_raid10 6900 nvme2n1 W 2019022184 4096 0.02 With patch: Time Process PID Device LBA Size Lat 5.747193 md0_raid10 727 nvme0n1 W 16 4096 0.01 5.747192 md0_raid10 727 nvme1n1 W 16 4096 0.02 5.747195 md0_raid10 727 nvme3n1 W 16 4096 0.01 5.747202 md0_raid10 727 nvme2n1 W 16 4096 0.02 5.747229 md0_raid10 727 nvme3n1 W 1196223704 4096 0.02 5.747224 md0_raid10 727 nvme0n1 W 1196223704 4096 0.01 5.747279 md0_raid10 727 nvme0n1 W 24 4096 0.01 <-- 5.747279 md0_raid10 727 nvme1n1 W 24 4096 0.02 <-- 5.747284 md0_raid10 727 nvme3n1 W 24 4096 0.02 <-- 5.747291 md0_raid10 727 nvme2n1 W 24 4096 0.02 <-- 5.747314 md0_raid10 727 nvme2n1 W 2234636712 4096 0.01 5.747317 md0_raid10 727 nvme1n1 W 2234636712 4096 0.02 Reviewed-by: Christoph Hellwig Signed-off-by: Jon Derrick --- drivers/md/md-bitmap.c | 33 +++++++++++++++++++++++++++++---- 1 file changed, 29 insertions(+), 4 deletions(-) diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c index bf250a5e3a86..920bb68156d2 100644 --- a/drivers/md/md-bitmap.c +++ b/drivers/md/md-bitmap.c @@ -209,6 +209,28 @@ static struct md_rdev *next_active_rdev(struct md_rdev *rdev, struct mddev *mdde return NULL; } +static unsigned int optimal_io_size(struct block_device *bdev, + unsigned int last_page_size, + unsigned int io_size) +{ + if (bdev_io_opt(bdev) > bdev_logical_block_size(bdev)) + return roundup(last_page_size, bdev_io_opt(bdev)); + return io_size; +} + +static unsigned int bitmap_io_size(unsigned int io_size, unsigned int opt_size, + sector_t start, sector_t boundary) +{ + if (io_size != opt_size && + start + opt_size / SECTOR_SIZE <= boundary) + return opt_size; + if (start + io_size / SECTOR_SIZE <= boundary) + return io_size; + + /* Overflows boundary */ + return 0; +} + static int __write_sb_page(struct md_rdev *rdev, struct bitmap *bitmap, struct page *page) { @@ -218,6 +240,7 @@ static int __write_sb_page(struct md_rdev *rdev, struct bitmap *bitmap, sector_t offset = mddev->bitmap_info.offset; sector_t ps, sboff, doff; unsigned int size = PAGE_SIZE; + unsigned int opt_size = PAGE_SIZE; bdev = (rdev->meta_bdev) ? rdev->meta_bdev : rdev->bdev; if (page->index == store->file_pages - 1) { @@ -225,8 +248,8 @@ static int __write_sb_page(struct md_rdev *rdev, struct bitmap *bitmap, if (last_page_size == 0) last_page_size = PAGE_SIZE; - size = roundup(last_page_size, - bdev_logical_block_size(bdev)); + size = roundup(last_page_size, bdev_logical_block_size(bdev)); + opt_size = optimal_io_size(bdev, last_page_size, size); } ps = page->index * PAGE_SIZE / SECTOR_SIZE; @@ -241,7 +264,8 @@ static int __write_sb_page(struct md_rdev *rdev, struct bitmap *bitmap, return -EINVAL; } else if (offset < 0) { /* DATA BITMAP METADATA */ - if (offset + ps + size / SECTOR_SIZE > 0) + size = bitmap_io_size(size, opt_size, offset + ps, 0); + if (size == 0) /* bitmap runs in to metadata */ return -EINVAL; @@ -250,7 +274,8 @@ static int __write_sb_page(struct md_rdev *rdev, struct bitmap *bitmap, return -EINVAL; } else if (rdev->sb_start < rdev->data_offset) { /* METADATA BITMAP DATA */ - if (sboff + ps + size / SECTOR_SIZE > doff) + size = bitmap_io_size(size, opt_size, sboff + ps, doff); + if (size == 0) /* bitmap runs in to data */ return -EINVAL; } else {