From patchwork Thu Oct 18 13:18:16 2018
X-Patchwork-Submitter: Ming Lei
X-Patchwork-Id: 10647239
From: Ming Lei
To: Jens Axboe
Cc: linux-block@vger.kernel.org, Ming Lei, Vitaly Kuznetsov, Dave Chinner,
    Linux FS Devel, "Darrick J. Wong", xfs@vger.kernel.org,
    Christoph Hellwig, Bart Van Assche, Matthew Wilcox
Subject: [PATCH 4/5] block: introduce helpers for allocating IO buffers from slab
Date: Thu, 18 Oct 2018 21:18:16 +0800
Message-Id: <20181018131817.11813-5-ming.lei@redhat.com>
In-Reply-To: <20181018131817.11813-1-ming.lei@redhat.com>
References: <20181018131817.11813-1-ming.lei@redhat.com>

One big issue is that a buffer allocated from slab has to respect the
queue's DMA alignment limit.

This patch adds support for creating per-queue kmem_caches for
less-than-PAGE_SIZE allocations, and makes sure that each allocation is
aligned to the queue's DMA alignment.

Allocations of PAGE_SIZE or more should still be done directly from the
buddy allocator.

Cc: Vitaly Kuznetsov
Cc: Dave Chinner
Cc: Linux FS Devel
Cc: Darrick J. Wong
Cc: xfs@vger.kernel.org
Cc: Dave Chinner
Cc: Christoph Hellwig
Cc: Bart Van Assche
Cc: Matthew Wilcox
Signed-off-by: Ming Lei
---
 block/Makefile              |   3 +-
 block/blk-core.c            |   2 +
 block/blk-sec-buf.c         | 144 ++++++++++++++++++++++++++++++++++++++++++++
 include/linux/blk-sec-buf.h |  43 +++++++++++++
 include/linux/blkdev.h      |   5 ++
 5 files changed, 196 insertions(+), 1 deletion(-)
 create mode 100644 block/blk-sec-buf.c
 create mode 100644 include/linux/blk-sec-buf.h
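
As a rough illustration of how an upper layer could consume the new
interface, the hypothetical sketch below allocates a DMA-aligned sub-page
buffer through the bdev_* wrappers added further down. The example_*
function names and the sb_size parameter are invented for the example;
only the bdev_*_sec_buf* helpers come from this series.

/*
 * Hypothetical usage sketch only; not part of the patch.
 */
#include <linux/blkdev.h>
#include <linux/blk-sec-buf.h>

static void *example_alloc_super_buf(struct block_device *bdev, int sb_size)
{
	void *buf;

	/* Create (or take a reference on) the per-queue DMA-aligned caches. */
	if (bdev_create_sec_buf_slabs(bdev))
		return NULL;

	/* sb_size < PAGE_SIZE, so this comes from the matching kmem_cache. */
	buf = bdev_alloc_sec_buf(bdev, sb_size, GFP_KERNEL);
	if (!buf)
		bdev_destroy_sec_buf_slabs(bdev);

	return buf;
}

static void example_free_super_buf(struct block_device *bdev, void *buf,
				   int sb_size)
{
	bdev_free_sec_buf(bdev, buf, sb_size);

	/* Drop the reference taken in example_alloc_super_buf(). */
	bdev_destroy_sec_buf_slabs(bdev);
}
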
diff --git a/block/Makefile b/block/Makefile
index 27eac600474f..74f3ed6ef954 100644
--- a/block/Makefile
+++ b/block/Makefile
@@ -9,7 +9,8 @@ obj-$(CONFIG_BLOCK) := bio.o elevator.o blk-core.o blk-tag.o blk-sysfs.o \
 			blk-lib.o blk-mq.o blk-mq-tag.o blk-stat.o \
 			blk-mq-sysfs.o blk-mq-cpumap.o blk-mq-sched.o ioctl.o \
 			genhd.o partition-generic.o ioprio.o \
-			badblocks.o partitions/ blk-rq-qos.o
+			badblocks.o partitions/ blk-rq-qos.o \
+			blk-sec-buf.o
 
 obj-$(CONFIG_BOUNCE)	+= bounce.o
 obj-$(CONFIG_BLK_SCSI_REQUEST)	+= scsi_ioctl.o
diff --git a/block/blk-core.c b/block/blk-core.c
index cdfabc5646da..02fe17dd5e67 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1079,6 +1079,8 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id,
 	if (blkcg_init_queue(q))
 		goto fail_ref;
 
+	mutex_init(&q->blk_sec_buf_slabs_mutex);
+
 	return q;
 
 fail_ref:
diff --git a/block/blk-sec-buf.c b/block/blk-sec-buf.c
new file mode 100644
index 000000000000..2842a913a3d1
--- /dev/null
+++ b/block/blk-sec-buf.c
@@ -0,0 +1,144 @@
+/*
+ * Sector-size-granular IO buffer allocation helpers for less-than-PAGE_SIZE
+ * allocations.
+ *
+ * Controllers may have DMA alignment requirements, while a filesystem or
+ * another upper-layer component may allocate an IO buffer via slab and
+ * submit a bio with that buffer directly. The DMA alignment limit then
+ * can't be respected.
+ *
+ * Create DMA-aligned slab caches, and serve these less-than-PAGE_SIZE IO
+ * buffers from them for such users.
+ *
+ * Copyright (C) 2018 Ming Lei
+ *
+ */
+#include <linux/kernel.h>
+#include <linux/blk-sec-buf.h>
+
+static void __blk_destroy_sec_buf_slabs(struct blk_sec_buf_slabs *slabs)
+{
+	int i;
+
+	if (!slabs)
+		return;
+
+	for (i = 0; i < BLK_NR_SEC_BUF_SLAB; i++)
+		kmem_cache_destroy(slabs->slabs[i]);
+	kfree(slabs);
+}
+
+void blk_destroy_sec_buf_slabs(struct request_queue *q)
+{
+	mutex_lock(&q->blk_sec_buf_slabs_mutex);
+	if (q->sec_buf_slabs && !--q->sec_buf_slabs->ref_cnt) {
+		__blk_destroy_sec_buf_slabs(q->sec_buf_slabs);
+		q->sec_buf_slabs = NULL;
+	}
+	mutex_unlock(&q->blk_sec_buf_slabs_mutex);
+}
+EXPORT_SYMBOL_GPL(blk_destroy_sec_buf_slabs);
+
+int blk_create_sec_buf_slabs(char *name, struct request_queue *q)
+{
+	struct blk_sec_buf_slabs *slabs;
+	char *slab_name;
+	int i;
+	int nr_slabs = BLK_NR_SEC_BUF_SLAB;
+	int ret = -ENOMEM;
+
+	/* No need to create kmem_cache if kmalloc is fine */
+	if (!q || queue_dma_alignment(q) < ARCH_KMALLOC_MINALIGN)
+		return 0;
+
+	slab_name = kmalloc(strlen(name) + 5, GFP_KERNEL);
+	if (!slab_name)
+		return ret;
+
+	mutex_lock(&q->blk_sec_buf_slabs_mutex);
+	if (q->sec_buf_slabs) {
+		q->sec_buf_slabs->ref_cnt++;
+		ret = 0;
+		goto out;
+	}
+
+	slabs = kzalloc(sizeof(*slabs), GFP_KERNEL);
+	if (!slabs)
+		goto out;
+
+	for (i = 0; i < nr_slabs; i++) {
+		int size = (i == nr_slabs - 1) ? PAGE_SIZE - 512 :
+			(i + 1) << 9;
+
+		sprintf(slab_name, "%s-%d", name, i);
+		slabs->slabs[i] = kmem_cache_create(slab_name, size,
+				queue_dma_alignment(q) + 1,
+				SLAB_PANIC, NULL);
+		if (!slabs->slabs[i])
+			goto fail;
+	}
+
+	slabs->ref_cnt = 1;
+	q->sec_buf_slabs = slabs;
+	ret = 0;
+	goto out;
+
+ fail:
+	__blk_destroy_sec_buf_slabs(slabs);
+ out:
+	mutex_unlock(&q->blk_sec_buf_slabs_mutex);
+	kfree(slab_name);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(blk_create_sec_buf_slabs);
+
+void *blk_alloc_sec_buf(struct request_queue *q, int size, gfp_t flags)
+{
+	int i;
+
+	/* We only serve less-than-PAGE_SIZE allocations */
+	if (size >= PAGE_SIZE)
+		return NULL;
+
+	/*
+	 * Fall back to kmalloc if no queue is provided, or if kmalloc is
+	 * enough to respect the queue DMA alignment.
+	 */
+	if (!q || queue_dma_alignment(q) < ARCH_KMALLOC_MINALIGN)
+		return kmalloc(size, flags);
+
+	if (WARN_ON_ONCE(!q->sec_buf_slabs))
+		return NULL;
+
+	i = round_up(size, 512) >> 9;
+	i = i < BLK_NR_SEC_BUF_SLAB ? i : BLK_NR_SEC_BUF_SLAB;
+
+	return kmem_cache_alloc(q->sec_buf_slabs->slabs[i - 1], flags);
+}
+EXPORT_SYMBOL_GPL(blk_alloc_sec_buf);
+
+void blk_free_sec_buf(struct request_queue *q, void *buf, int size)
+{
+	int i;
+
+	/* We only serve less-than-PAGE_SIZE allocations */
+	if (size >= PAGE_SIZE)
+		return;
+
+	/*
+	 * Fall back to kfree if no queue is provided, or if kmalloc was
+	 * enough to respect the queue DMA alignment.
+	 */
+	if (!q || queue_dma_alignment(q) < ARCH_KMALLOC_MINALIGN) {
+		kfree(buf);
+		return;
+	}
+
+	if (WARN_ON_ONCE(!q->sec_buf_slabs))
+		return;
+
+	i = round_up(size, 512) >> 9;
+	i = i < BLK_NR_SEC_BUF_SLAB ? i : BLK_NR_SEC_BUF_SLAB;
+
+	kmem_cache_free(q->sec_buf_slabs->slabs[i - 1], buf);
+}
+EXPORT_SYMBOL_GPL(blk_free_sec_buf);
diff --git a/include/linux/blk-sec-buf.h b/include/linux/blk-sec-buf.h
new file mode 100644
index 000000000000..dc81d8fc0d68
--- /dev/null
+++ b/include/linux/blk-sec-buf.h
@@ -0,0 +1,43 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_BLK_SEC_BUF_H
+#define _LINUX_BLK_SEC_BUF_H
+
+#include <linux/slab.h>
+#include <linux/blkdev.h>
+
+#define BLK_NR_SEC_BUF_SLAB	((PAGE_SIZE >> 9) > 128 ? 128 : (PAGE_SIZE >> 9))
+struct blk_sec_buf_slabs {
+	int ref_cnt;
+	struct kmem_cache *slabs[BLK_NR_SEC_BUF_SLAB];
+};
+
+int blk_create_sec_buf_slabs(char *name, struct request_queue *q);
+void blk_destroy_sec_buf_slabs(struct request_queue *q);
+
+void *blk_alloc_sec_buf(struct request_queue *q, int size, gfp_t flags);
+void blk_free_sec_buf(struct request_queue *q, void *buf, int size);
+
+static inline int bdev_create_sec_buf_slabs(struct block_device *bdev)
+{
+	char *name = bdev->bd_disk ? bdev->bd_disk->disk_name : "unknown";
+
+	return blk_create_sec_buf_slabs(name, bdev->bd_queue);
+}
+
+static inline void bdev_destroy_sec_buf_slabs(struct block_device *bdev)
+{
+	blk_destroy_sec_buf_slabs(bdev->bd_queue);
+}
+
+static inline void *bdev_alloc_sec_buf(struct block_device *bdev, int size,
+					gfp_t flags)
+{
+	return blk_alloc_sec_buf(bdev->bd_queue, size, flags);
+}
+static inline void bdev_free_sec_buf(struct block_device *bdev, void *buf,
+				     int size)
+{
+	blk_free_sec_buf(bdev->bd_queue, buf, size);
+}
+
+#endif
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index be938a31bc2e..30f5324d1f95 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -27,6 +27,7 @@
 #include
 #include
 #include
+#include <linux/blk-sec-buf.h>
 
 struct module;
 struct scsi_ioctl_command;
@@ -523,6 +524,10 @@ struct request_queue {
 	 */
 	gfp_t			bounce_gfp;
 
+	/* for allocating less-than-PAGE_SIZE IO buffers */
+	struct blk_sec_buf_slabs *sec_buf_slabs;
+	struct mutex blk_sec_buf_slabs_mutex;
+
 	/*
 	 * protects queue structures from reentrancy. ->__queue_lock should
 	 * _never_ be used directly, it is queue private. always use
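
For reference, the cache picked by blk_alloc_sec_buf() and blk_free_sec_buf()
is round_up(size, 512) >> 9, clamped to BLK_NR_SEC_BUF_SLAB, so on a 4K-page
system a 1300-byte request is served from the 1536-byte cache. The stand-alone
snippet below only illustrates that arithmetic; it is not part of the patch,
and it re-derives the kernel macros locally under the 4K-page assumption.

/* Stand-alone illustration of the size-to-cache mapping; not kernel code. */
#include <stdio.h>

#define PAGE_SIZE		4096UL	/* assume 4K pages for the example */
#define BLK_NR_SEC_BUF_SLAB	((PAGE_SIZE >> 9) > 128 ? 128 : (PAGE_SIZE >> 9))

/* same rounding as the kernel's round_up(size, 512) */
static unsigned long round_up_512(unsigned long v)
{
	return (v + 511) & ~511UL;
}

int main(void)
{
	unsigned long sizes[] = { 100, 512, 513, 1300, 3072 };
	unsigned long k;

	for (k = 0; k < sizeof(sizes) / sizeof(sizes[0]); k++) {
		unsigned long i = round_up_512(sizes[k]) >> 9;
		unsigned long obj;

		if (i > BLK_NR_SEC_BUF_SLAB)
			i = BLK_NR_SEC_BUF_SLAB;
		/* per the creation loop: last cache is PAGE_SIZE - 512 bytes */
		obj = (i == BLK_NR_SEC_BUF_SLAB) ? PAGE_SIZE - 512 : i << 9;
		printf("size %4lu -> slabs[%lu] (%lu-byte objects)\n",
		       sizes[k], i - 1, obj);
	}
	return 0;
}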