From patchwork Thu Oct 18 13:18:13 2018
X-Patchwork-Submitter: Ming Lei
X-Patchwork-Id: 10647221
From: Ming Lei
To: Jens Axboe
Cc: linux-block@vger.kernel.org, Ming Lei, Vitaly Kuznetsov, Dave Chinner,
    Linux FS Devel, "Darrick J. Wong", xfs@vger.kernel.org,
    Christoph Hellwig, Bart Van Assche, Matthew Wilcox
Subject: [PATCH 1/5] block: warn on un-aligned DMA IO buffer
Date: Thu, 18 Oct 2018 21:18:13 +0800
Message-Id: <20181018131817.11813-2-ming.lei@redhat.com>
In-Reply-To: <20181018131817.11813-1-ming.lei@redhat.com>
References: <20181018131817.11813-1-ming.lei@redhat.com>

Currently we only check whether a DMA IO buffer is aligned to
queue_dma_alignment() for pass-through requests; the check is not done
for normal IO requests. Given the check has to be done on each bvec,
it isn't efficient to add it to generic_make_request_checks().

This patch adds a WARN in blk_queue_split() to capture this issue.

Cc: Vitaly Kuznetsov
Cc: Dave Chinner
Cc: Linux FS Devel
Cc: Darrick J. Wong
Cc: xfs@vger.kernel.org
Cc: Dave Chinner
Cc: Christoph Hellwig
Cc: Bart Van Assche
Cc: Matthew Wilcox
Signed-off-by: Ming Lei
Reviewed-by: Christoph Hellwig
---
 block/blk-merge.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 42a46744c11b..d2dbd508cb6d 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -174,6 +174,8 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
 	const unsigned max_sectors = get_max_io_size(q, bio);
 
 	bio_for_each_segment(bv, bio, iter) {
+		WARN_ON_ONCE(queue_dma_alignment(q) & bv.bv_offset);
+
 		/*
 		 * If the queue doesn't support SG gaps and adding this
 		 * offset would create a gap, disallow it.
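For context, queue_dma_alignment() returns an alignment mask (the required
alignment minus one, 511 by default), so ANDing it with a bvec offset is
non-zero exactly when the offset is not a multiple of the required alignment.
The mask arithmetic behind the new WARN_ON_ONCE() can be demonstrated with a
minimal standalone C sketch (illustrative only, not part of the patch):

#include <assert.h>
#include <stdbool.h>

/* mask is alignment - 1, e.g. 511 for 512-byte DMA alignment */
static bool dma_offset_aligned(unsigned int mask, unsigned int offset)
{
	return (mask & offset) == 0;
}

int main(void)
{
	assert(dma_offset_aligned(511, 0));	/* sector-aligned offset */
	assert(dma_offset_aligned(511, 1024));	/* multiple of 512 */
	assert(!dma_offset_aligned(511, 8));	/* small kmalloc offset would trip the warning */
	return 0;
}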
Wong" , xfs@vger.kernel.org, Christoph Hellwig , Bart Van Assche , Matthew Wilcox Subject: [PATCH 2/5] block: move .dma_alignment into q->limits Date: Thu, 18 Oct 2018 21:18:14 +0800 Message-Id: <20181018131817.11813-3-ming.lei@redhat.com> In-Reply-To: <20181018131817.11813-1-ming.lei@redhat.com> References: <20181018131817.11813-1-ming.lei@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.15 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.41]); Thu, 18 Oct 2018 13:18:44 +0000 (UTC) Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Turns out q->dma_alignement should be stack limit because now bvec table is immutalbe, the underlying queue's dma alignment has to be perceptible by stack driver, so IO buffer can be allocated as dma aligned before adding to bio. So this patch moves .dma_alignment into q->limits and prepares for making it as one stacked limit. Cc: Vitaly Kuznetsov Cc: Dave Chinner Cc: Linux FS Devel Cc: Darrick J. Wong Cc: xfs@vger.kernel.org Cc: Dave Chinner Cc: Christoph Hellwig Cc: Bart Van Assche Cc: Matthew Wilcox Signed-off-by: Ming Lei Reviewed-by: Christoph Hellwig Reviewed-by: Bart Van Assche --- block/blk-settings.c | 6 +++--- include/linux/blkdev.h | 4 ++-- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/block/blk-settings.c b/block/blk-settings.c index ffd459969689..cf9cd241dc16 100644 --- a/block/blk-settings.c +++ b/block/blk-settings.c @@ -830,7 +830,7 @@ EXPORT_SYMBOL(blk_queue_virt_boundary); **/ void blk_queue_dma_alignment(struct request_queue *q, int mask) { - q->dma_alignment = mask; + q->limits.dma_alignment = mask; } EXPORT_SYMBOL(blk_queue_dma_alignment); @@ -852,8 +852,8 @@ void blk_queue_update_dma_alignment(struct request_queue *q, int mask) { BUG_ON(mask > PAGE_SIZE); - if (mask > q->dma_alignment) - q->dma_alignment = mask; + if (mask > q->limits.dma_alignment) + q->limits.dma_alignment = mask; } EXPORT_SYMBOL(blk_queue_update_dma_alignment); diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 61207560e826..be938a31bc2e 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -366,6 +366,7 @@ struct queue_limits { unsigned long seg_boundary_mask; unsigned long virt_boundary_mask; + unsigned int dma_alignment; unsigned int max_hw_sectors; unsigned int max_dev_sectors; unsigned int chunk_sectors; @@ -561,7 +562,6 @@ struct request_queue { unsigned int dma_drain_size; void *dma_drain_buffer; unsigned int dma_pad_mask; - unsigned int dma_alignment; struct blk_queue_tag *queue_tags; @@ -1617,7 +1617,7 @@ static inline unsigned int bdev_zone_sectors(struct block_device *bdev) static inline int queue_dma_alignment(struct request_queue *q) { - return q ? q->dma_alignment : 511; + return q ? 
From patchwork Thu Oct 18 13:18:15 2018
X-Patchwork-Submitter: Ming Lei
X-Patchwork-Id: 10647237
From: Ming Lei
To: Jens Axboe
Cc: linux-block@vger.kernel.org, Ming Lei, Vitaly Kuznetsov, Dave Chinner,
    Linux FS Devel, "Darrick J. Wong", xfs@vger.kernel.org,
    Christoph Hellwig, Bart Van Assche, Matthew Wilcox
Subject: [PATCH 3/5] block: make dma_alignment as stacked limit
Date: Thu, 18 Oct 2018 21:18:15 +0800
Message-Id: <20181018131817.11813-4-ming.lei@redhat.com>
In-Reply-To: <20181018131817.11813-1-ming.lei@redhat.com>
References: <20181018131817.11813-1-ming.lei@redhat.com>

This patch converts .dma_alignment into a stacked limit, so that a
stacking driver is updated with the underlying queues' DMA alignment
and IO buffers can be allocated with the queue's DMA alignment.

Cc: Vitaly Kuznetsov
Cc: Dave Chinner
Cc: Linux FS Devel
Cc: Darrick J. Wong
Cc: xfs@vger.kernel.org
Cc: Dave Chinner
Cc: Christoph Hellwig
Cc: Bart Van Assche
Cc: Matthew Wilcox
Signed-off-by: Ming Lei
---
 block/blk-settings.c | 89 +++++++++++++++++++++++++++++-----------------------
 1 file changed, 50 insertions(+), 39 deletions(-)

diff --git a/block/blk-settings.c b/block/blk-settings.c
index cf9cd241dc16..aef4510a99b6 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -525,6 +525,54 @@ void blk_queue_stack_limits(struct request_queue *t, struct request_queue *b)
 EXPORT_SYMBOL(blk_queue_stack_limits);
 
 /**
+ * blk_queue_dma_alignment - set dma length and memory alignment
+ * @q:     the request queue for the device
+ * @mask:  alignment mask
+ *
+ * description:
+ *    set required memory and length alignment for direct dma transactions.
+ *    this is used when building direct io requests for the queue.
+ *
+ **/
+void blk_queue_dma_alignment(struct request_queue *q, int mask)
+{
+	q->limits.dma_alignment = mask;
+}
+EXPORT_SYMBOL(blk_queue_dma_alignment);
+
+static int __blk_queue_update_dma_alignment(struct queue_limits *t, int mask)
+{
+	BUG_ON(mask >= PAGE_SIZE);
+
+	if (mask > t->dma_alignment)
+		return mask;
+	else
+		return t->dma_alignment;
+}
+
+/**
+ * blk_queue_update_dma_alignment - update dma length and memory alignment
+ * @q:     the request queue for the device
+ * @mask:  alignment mask
+ *
+ * description:
+ *    update required memory and length alignment for direct dma transactions.
+ *    If the requested alignment is larger than the current alignment, then
+ *    the current queue alignment is updated to the new value, otherwise it
+ *    is left alone.  The design of this is to allow multiple objects
+ *    (driver, device, transport etc) to set their respective
+ *    alignments without having them interfere.
+ *
+ **/
+void blk_queue_update_dma_alignment(struct request_queue *q, int mask)
+{
+	q->limits.dma_alignment =
+		__blk_queue_update_dma_alignment(&q->limits, mask);
+}
+EXPORT_SYMBOL(blk_queue_update_dma_alignment);
+
+
+/**
  * blk_stack_limits - adjust queue_limits for stacked devices
  * @t:	the stacking driver limits (top device)
  * @b:  the underlying queue limits (bottom, component device)
@@ -563,6 +611,8 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
 					    b->seg_boundary_mask);
 	t->virt_boundary_mask = min_not_zero(t->virt_boundary_mask,
 					    b->virt_boundary_mask);
+	t->dma_alignment = __blk_queue_update_dma_alignment(t,
+					b->dma_alignment);
 
 	t->max_segments = min_not_zero(t->max_segments, b->max_segments);
 	t->max_discard_segments = min_not_zero(t->max_discard_segments,
@@ -818,45 +868,6 @@ void blk_queue_virt_boundary(struct request_queue *q, unsigned long mask)
 }
 EXPORT_SYMBOL(blk_queue_virt_boundary);
 
-/**
- * blk_queue_dma_alignment - set dma length and memory alignment
- * @q:     the request queue for the device
- * @mask:  alignment mask
- *
- * description:
- *    set required memory and length alignment for direct dma transactions.
- *    this is used when building direct io requests for the queue.
- *
- **/
-void blk_queue_dma_alignment(struct request_queue *q, int mask)
-{
-	q->limits.dma_alignment = mask;
-}
-EXPORT_SYMBOL(blk_queue_dma_alignment);
-
-/**
- * blk_queue_update_dma_alignment - update dma length and memory alignment
- * @q:     the request queue for the device
- * @mask:  alignment mask
- *
- * description:
- *    update required memory and length alignment for direct dma transactions.
- *    If the requested alignment is larger than the current alignment, then
- *    the current queue alignment is updated to the new value, otherwise it
- *    is left alone.  The design of this is to allow multiple objects
- *    (driver, device, transport etc) to set their respective
- *    alignments without having them interfere.
- *
- **/
-void blk_queue_update_dma_alignment(struct request_queue *q, int mask)
-{
-	BUG_ON(mask > PAGE_SIZE);
-
-	if (mask > q->limits.dma_alignment)
-		q->limits.dma_alignment = mask;
-}
-EXPORT_SYMBOL(blk_queue_update_dma_alignment);
-
 void blk_queue_flush_queueable(struct request_queue *q, bool queueable)
 {
 	if (queueable)
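The point of routing the update through blk_stack_limits() is that a stacking
driver (device mapper, MD, and friends) ends up with the largest alignment
mask of all component queues, so a buffer that satisfies the top queue
satisfies every bottom device. A minimal sketch of that behaviour follows
(illustrative only; the function and variable names are not from the patch):

#include <linux/blkdev.h>

static void example_stack_dma_alignment(struct queue_limits *top,
					struct queue_limits *bottom_a,
					struct queue_limits *bottom_b)
{
	/* error handling for conflicting limits omitted for brevity */
	blk_stack_limits(top, bottom_a, 0);
	blk_stack_limits(top, bottom_b, 0);

	/*
	 * With this patch, top->dma_alignment is now
	 * max(bottom_a->dma_alignment, bottom_b->dma_alignment),
	 * just like the other mask-type limits stacked above it.
	 */
}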
From patchwork Thu Oct 18 13:18:16 2018
X-Patchwork-Submitter: Ming Lei
X-Patchwork-Id: 10647239
From: Ming Lei
To: Jens Axboe
Cc: linux-block@vger.kernel.org, Ming Lei, Vitaly Kuznetsov, Dave Chinner,
    Linux FS Devel, "Darrick J. Wong", xfs@vger.kernel.org,
    Christoph Hellwig, Bart Van Assche, Matthew Wilcox
Subject: [PATCH 4/5] block: introduce helpers for allocating IO buffers from slab
Date: Thu, 18 Oct 2018 21:18:16 +0800
Message-Id: <20181018131817.11813-5-ming.lei@redhat.com>
In-Reply-To: <20181018131817.11813-1-ming.lei@redhat.com>
References: <20181018131817.11813-1-ming.lei@redhat.com>

One big issue is that a buffer allocated from slab has to respect the
queue DMA alignment limit.

This patch adds support for creating per-queue kmem_caches for
less-than-PAGE_SIZE allocations, and makes sure such allocations are
aligned to the queue DMA alignment. Allocations of PAGE_SIZE or more
should be done via the buddy allocator directly.

Cc: Vitaly Kuznetsov
Cc: Dave Chinner
Cc: Linux FS Devel
Cc: Darrick J. Wong
Cc: xfs@vger.kernel.org
Cc: Dave Chinner
Cc: Christoph Hellwig
Cc: Bart Van Assche
Cc: Matthew Wilcox
Signed-off-by: Ming Lei
---
 block/Makefile              |   3 +-
 block/blk-core.c            |   2 +
 block/blk-sec-buf.c         | 144 ++++++++++++++++++++++++++++++++++++++++++++
 include/linux/blk-sec-buf.h |  43 +++++++++++++
 include/linux/blkdev.h      |   5 ++
 5 files changed, 196 insertions(+), 1 deletion(-)
 create mode 100644 block/blk-sec-buf.c
 create mode 100644 include/linux/blk-sec-buf.h

diff --git a/block/Makefile b/block/Makefile
index 27eac600474f..74f3ed6ef954 100644
--- a/block/Makefile
+++ b/block/Makefile
@@ -9,7 +9,8 @@ obj-$(CONFIG_BLOCK) := bio.o elevator.o blk-core.o blk-tag.o blk-sysfs.o \
 			blk-lib.o blk-mq.o blk-mq-tag.o blk-stat.o \
 			blk-mq-sysfs.o blk-mq-cpumap.o blk-mq-sched.o ioctl.o \
 			genhd.o partition-generic.o ioprio.o \
-			badblocks.o partitions/ blk-rq-qos.o
+			badblocks.o partitions/ blk-rq-qos.o \
+			blk-sec-buf.o
 
 obj-$(CONFIG_BOUNCE)	+= bounce.o
 obj-$(CONFIG_BLK_SCSI_REQUEST)	+= scsi_ioctl.o
diff --git a/block/blk-core.c b/block/blk-core.c
index cdfabc5646da..02fe17dd5e67 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1079,6 +1079,8 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id,
 	if (blkcg_init_queue(q))
 		goto fail_ref;
 
+	mutex_init(&q->blk_sec_buf_slabs_mutex);
+
 	return q;
 
 fail_ref:
diff --git a/block/blk-sec-buf.c b/block/blk-sec-buf.c
new file mode 100644
index 000000000000..2842a913a3d1
--- /dev/null
+++ b/block/blk-sec-buf.c
@@ -0,0 +1,144 @@
+/*
+ * Sector size level IO buffer allocation helpers for less-than PAGE_SIZE
+ * allocation.
+ *
+ * Controllers may have a DMA alignment requirement, while a filesystem or
+ * other upper layer component may allocate an IO buffer via slab and submit
+ * a bio with this buffer directly. Then the DMA alignment limit can't be
+ * respected.
+ *
+ * Create a DMA aligned slab, and allocate less-than PAGE_SIZE IO buffers
+ * from the created slab for the above users.
+ *
+ * Copyright (C) 2018 Ming Lei
+ *
+ */
+#include
+#include
+
+static void __blk_destroy_sec_buf_slabs(struct blk_sec_buf_slabs *slabs)
+{
+	int i;
+
+	if (!slabs)
+		return;
+
+	for (i = 0; i < BLK_NR_SEC_BUF_SLAB; i++)
+		kmem_cache_destroy(slabs->slabs[i]);
+	kfree(slabs);
+}
+
+void blk_destroy_sec_buf_slabs(struct request_queue *q)
+{
+	mutex_lock(&q->blk_sec_buf_slabs_mutex);
+	if (q->sec_buf_slabs && !--q->sec_buf_slabs->ref_cnt) {
+		__blk_destroy_sec_buf_slabs(q->sec_buf_slabs);
+		q->sec_buf_slabs = NULL;
+	}
+	mutex_unlock(&q->blk_sec_buf_slabs_mutex);
+}
+EXPORT_SYMBOL_GPL(blk_destroy_sec_buf_slabs);
+
+int blk_create_sec_buf_slabs(char *name, struct request_queue *q)
+{
+	struct blk_sec_buf_slabs *slabs;
+	char *slab_name;
+	int i;
+	int nr_slabs = BLK_NR_SEC_BUF_SLAB;
+	int ret = -ENOMEM;
+
+	/* No need to create kmem_cache if kmalloc is fine */
+	if (!q || queue_dma_alignment(q) < ARCH_KMALLOC_MINALIGN)
+		return 0;
+
+	slab_name = kmalloc(strlen(name) + 5, GFP_KERNEL);
+	if (!slab_name)
+		return ret;
+
+	mutex_lock(&q->blk_sec_buf_slabs_mutex);
+	if (q->sec_buf_slabs) {
+		q->sec_buf_slabs->ref_cnt++;
+		ret = 0;
+		goto out;
+	}
+
+	slabs = kzalloc(sizeof(*slabs), GFP_KERNEL);
+	if (!slabs)
+		goto out;
+
+	for (i = 0; i < nr_slabs; i++) {
+		int size = (i == nr_slabs - 1) ? PAGE_SIZE - 512 : (i + 1) << 9;
+
+		sprintf(slab_name, "%s-%d", name, i);
+		slabs->slabs[i] = kmem_cache_create(slab_name, size,
+				queue_dma_alignment(q) + 1,
+				SLAB_PANIC, NULL);
+		if (!slabs->slabs[i])
+			goto fail;
+	}
+
+	slabs->ref_cnt = 1;
+	q->sec_buf_slabs = slabs;
+	ret = 0;
+	goto out;
+
+ fail:
+	__blk_destroy_sec_buf_slabs(slabs);
+ out:
+	mutex_unlock(&q->blk_sec_buf_slabs_mutex);
+	kfree(slab_name);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(blk_create_sec_buf_slabs);
+
+void *blk_alloc_sec_buf(struct request_queue *q, int size, gfp_t flags)
+{
+	int i;
+
+	/* We only serve less-than PAGE_SIZE allocation */
+	if (size >= PAGE_SIZE)
+		return NULL;
+
+	/*
+	 * Fall back to kmalloc if no queue is provided, or if kmalloc is
+	 * enough to respect the queue dma alignment
+	 */
+	if (!q || queue_dma_alignment(q) < ARCH_KMALLOC_MINALIGN)
+		return kmalloc(size, flags);
+
+	if (WARN_ON_ONCE(!q->sec_buf_slabs))
+		return NULL;
+
+	i = round_up(size, 512) >> 9;
+	i = i < BLK_NR_SEC_BUF_SLAB ? i : BLK_NR_SEC_BUF_SLAB;
+
+	return kmem_cache_alloc(q->sec_buf_slabs->slabs[i - 1], flags);
+}
+EXPORT_SYMBOL_GPL(blk_alloc_sec_buf);
+
+void blk_free_sec_buf(struct request_queue *q, void *buf, int size)
+{
+	int i;
+
+	/* We only serve less-than PAGE_SIZE allocation */
+	if (size >= PAGE_SIZE)
+		return;
+
+	/*
+	 * Fall back to kfree if no queue is provided, or if kmalloc is
+	 * enough to respect the queue dma alignment
+	 */
+	if (!q || queue_dma_alignment(q) < ARCH_KMALLOC_MINALIGN) {
+		kfree(buf);
+		return;
+	}
+
+	if (WARN_ON_ONCE(!q->sec_buf_slabs))
+		return;
+
+	i = round_up(size, 512) >> 9;
+	i = i < BLK_NR_SEC_BUF_SLAB ? i : BLK_NR_SEC_BUF_SLAB;
+
+	kmem_cache_free(q->sec_buf_slabs->slabs[i - 1], buf);
+}
+EXPORT_SYMBOL_GPL(blk_free_sec_buf);
diff --git a/include/linux/blk-sec-buf.h b/include/linux/blk-sec-buf.h
new file mode 100644
index 000000000000..dc81d8fc0d68
--- /dev/null
+++ b/include/linux/blk-sec-buf.h
@@ -0,0 +1,43 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_BLK_SEC_BUF_H
+#define _LINUX_BLK_SEC_BUF_H
+
+#include
+#include
+
+#define BLK_NR_SEC_BUF_SLAB	((PAGE_SIZE >> 9) > 128 ? 128 : (PAGE_SIZE >> 9))
+
+struct blk_sec_buf_slabs {
+	int ref_cnt;
+	struct kmem_cache *slabs[BLK_NR_SEC_BUF_SLAB];
+};
+
+int blk_create_sec_buf_slabs(char *name, struct request_queue *q);
+void blk_destroy_sec_buf_slabs(struct request_queue *q);
+
+void *blk_alloc_sec_buf(struct request_queue *q, int size, gfp_t flags);
+void blk_free_sec_buf(struct request_queue *q, void *buf, int size);
+
+static inline int bdev_create_sec_buf_slabs(struct block_device *bdev)
+{
+	char *name = bdev->bd_disk ? bdev->bd_disk->disk_name : "unknown";
+
+	return blk_create_sec_buf_slabs(name, bdev->bd_queue);
+}
+
+static inline void bdev_destroy_sec_buf_slabs(struct block_device *bdev)
+{
+	blk_destroy_sec_buf_slabs(bdev->bd_queue);
+}
+
+static inline void *bdev_alloc_sec_buf(struct block_device *bdev, int size,
+				       gfp_t flags)
+{
+	return blk_alloc_sec_buf(bdev->bd_queue, size, flags);
+}
+
+static inline void bdev_free_sec_buf(struct block_device *bdev, void *buf,
+				     int size)
+{
+	blk_free_sec_buf(bdev->bd_queue, buf, size);
+}
+
+#endif
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index be938a31bc2e..30f5324d1f95 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -27,6 +27,7 @@
 #include
 #include
 #include
+#include
 
 struct module;
 struct scsi_ioctl_command;
@@ -523,6 +524,10 @@ struct request_queue {
 	 */
 	gfp_t			bounce_gfp;
 
+	/* for allocating less-than PAGE_SIZE io buffers */
+	struct blk_sec_buf_slabs *sec_buf_slabs;
+	struct mutex		blk_sec_buf_slabs_mutex;
+
 	/*
 	 * protects queue structures from reentrancy. ->__queue_lock should
 	 * _never_ be used directly, it is queue private. always use
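Putting the new API together, a submitter that needs a DMA-safe sub-page
buffer would use the helpers roughly as below (usage sketch only, error
handling trimmed; the functions wrapping the helpers are invented for
illustration). The create/destroy pair is reference counted per queue, and
the allocation falls back to plain kmalloc() when ARCH_KMALLOC_MINALIGN
already satisfies the queue's alignment:

#include <linux/blkdev.h>
#include <linux/blk-sec-buf.h>

static void *example_get_sector_buf(struct block_device *bdev)
{
	/* cheap if the slabs already exist: just takes a reference */
	if (bdev_create_sec_buf_slabs(bdev))
		return NULL;

	/* 512-byte buffer aligned to queue_dma_alignment(bdev->bd_queue) + 1 */
	return bdev_alloc_sec_buf(bdev, 512, GFP_KERNEL);
}

static void example_put_sector_buf(struct block_device *bdev, void *buf)
{
	bdev_free_sec_buf(bdev, buf, 512);
	bdev_destroy_sec_buf_slabs(bdev);	/* drop the reference taken above */
}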
From patchwork Thu Oct 18 13:18:17 2018
X-Patchwork-Submitter: Ming Lei
X-Patchwork-Id: 10647245
From: Ming Lei
To: Jens Axboe
Cc: linux-block@vger.kernel.org, Ming Lei, Vitaly Kuznetsov, Dave Chinner,
    Linux FS Devel, "Darrick J. Wong", xfs@vger.kernel.org,
    Christoph Hellwig, Bart Van Assche, Matthew Wilcox
Subject: [PATCH 5/5] xfs: use block layer helpers to allocate io buffer from slab
Date: Thu, 18 Oct 2018 21:18:17 +0800
Message-Id: <20181018131817.11813-6-ming.lei@redhat.com>
In-Reply-To: <20181018131817.11813-1-ming.lei@redhat.com>
References: <20181018131817.11813-1-ming.lei@redhat.com>

XFS may use kmalloc() to allocate IO buffers; such buffers may not
respect the request queue's DMA alignment limit and can cause data
corruption.

This patch uses the newly introduced block layer helpers to allocate
this kind of IO buffer, and makes sure that the DMA alignment is
respected.

Cc: Vitaly Kuznetsov
Cc: Dave Chinner
Cc: Linux FS Devel
Cc: Darrick J. Wong
Cc: xfs@vger.kernel.org
Cc: Dave Chinner
Cc: Christoph Hellwig
Cc: Bart Van Assche
Cc: Matthew Wilcox
Signed-off-by: Ming Lei
---
 fs/xfs/xfs_buf.c   | 28 +++++++++++++++++++++++++---
 fs/xfs/xfs_super.c | 13 ++++++++++++-
 2 files changed, 37 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index e839907e8492..fabee5e1706b 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -314,12 +314,34 @@ xfs_buf_free(
 			__free_page(page);
 		}
 	} else if (bp->b_flags & _XBF_KMEM)
-		kmem_free(bp->b_addr);
+		bdev_free_sec_buf(bp->b_target->bt_bdev, bp->b_addr,
+				  BBTOB(bp->b_length));
 	_xfs_buf_free_pages(bp);
 	xfs_buf_free_maps(bp);
 	kmem_zone_free(xfs_buf_zone, bp);
 }
 
+void *
+xfs_buf_allocate_memory_from_slab(xfs_buf_t *bp, int size)
+{
+	int retries = 0;
+	gfp_t lflags = kmem_flags_convert(KM_NOFS);
+	void *ptr;
+
+	do {
+		ptr = bdev_alloc_sec_buf(bp->b_target->bt_bdev, size, lflags);
+		if (ptr)
+			return ptr;
+		if (!(++retries % 100))
+			xfs_err(NULL,
+	"%s(%u) possible memory allocation deadlock size %u in %s (mode:0x%x)",
+				current->comm, current->pid,
+				(unsigned int)size, __func__, lflags);
+		congestion_wait(BLK_RW_ASYNC, HZ/50);
+	} while (1);
+
+}
+
 /*
  * Allocates all the pages for buffer in question and builds it's page list.
 */
@@ -342,7 +364,7 @@ xfs_buf_allocate_memory(
 	 */
 	size = BBTOB(bp->b_length);
 	if (size < PAGE_SIZE) {
-		bp->b_addr = kmem_alloc(size, KM_NOFS);
+		bp->b_addr = xfs_buf_allocate_memory_from_slab(bp, size);
 		if (!bp->b_addr) {
 			/* low memory - use alloc_page loop instead */
 			goto use_alloc_page;
@@ -351,7 +373,7 @@ xfs_buf_allocate_memory(
 		if (((unsigned long)(bp->b_addr + size - 1) & PAGE_MASK) !=
 		    ((unsigned long)bp->b_addr & PAGE_MASK)) {
 			/* b_addr spans two pages - use alloc_page instead */
-			kmem_free(bp->b_addr);
+			bdev_free_sec_buf(bp->b_target->bt_bdev, bp->b_addr, size);
 			bp->b_addr = NULL;
 			goto use_alloc_page;
 		}
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 207ee302b1bb..026cdae3aa4f 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -664,6 +664,10 @@ xfs_blkdev_get(
 		xfs_warn(mp, "Invalid device [%s], error=%d", name, error);
 	}
 
+	error = bdev_create_sec_buf_slabs(*bdevp);
+	if (error)
+		blkdev_put(*bdevp, FMODE_READ|FMODE_WRITE|FMODE_EXCL);
+
 	return error;
 }
 
@@ -671,8 +675,10 @@ STATIC void
 xfs_blkdev_put(
 	struct block_device	*bdev)
 {
-	if (bdev)
+	if (bdev) {
+		bdev_destroy_sec_buf_slabs(bdev);
 		blkdev_put(bdev, FMODE_READ|FMODE_WRITE|FMODE_EXCL);
+	}
 }
 
 void
@@ -706,6 +712,8 @@ xfs_close_devices(
 	}
 	xfs_free_buftarg(mp->m_ddev_targp);
 	fs_put_dax(dax_ddev);
+
+	bdev_destroy_sec_buf_slabs(mp->m_super->s_bdev);
 }
 
 /*
@@ -774,6 +782,9 @@ xfs_open_devices(
 		mp->m_logdev_targp = mp->m_ddev_targp;
 	}
 
+	if (bdev_create_sec_buf_slabs(ddev))
+		goto out_free_rtdev_targ;
+
 	return 0;
 
  out_free_rtdev_targ: