
[0/5] block: introduce helpers for allocating io buffer from slab

Message ID 20181018131817.11813-1-ming.lei@redhat.com (mailing list archive)

Message

Ming Lei Oct. 18, 2018, 1:18 p.m. UTC
Hi,

Filesystems may allocate I/O buffers from slab and use them to submit
bios. This can break storage drivers that have special requirements on
DMA alignment.

Patch 1 adds a warning when an I/O buffer isn't aligned to the queue's
DMA alignment.

Patches 2 and 3 turn DMA alignment into a stacked queue limit.

Patch 4 introduces helpers for allocating I/O buffers from slab in a
way that respects DMA alignment.

Patch 5 converts XFS to use the new helpers for allocating its I/O
buffers from slab.

See previous discussion on this topic:

https://marc.info/?t=153734857500004&r=1&w=2

Thanks,
Ming

Ming Lei (5):
  block: warn on un-aligned DMA IO buffer
  block: move .dma_alignment into q->limits
  block: make dma_alignment as stacked limit
  block: introduce helpers for allocating IO buffers from slab
  xfs: use block layer helpers to allocate io buffer from slab

 block/Makefile              |   3 +-
 block/blk-core.c            |   2 +
 block/blk-merge.c           |   2 +
 block/blk-sec-buf.c         | 144 ++++++++++++++++++++++++++++++++++++++++++++
 block/blk-settings.c        |  89 +++++++++++++++------------
 fs/xfs/xfs_buf.c            |  28 ++++++++-
 fs/xfs/xfs_super.c          |  13 +++-
 include/linux/blk-sec-buf.h |  43 +++++++++++++
 include/linux/blkdev.h      |   9 ++-
 9 files changed, 287 insertions(+), 46 deletions(-)
 create mode 100644 block/blk-sec-buf.c
 create mode 100644 include/linux/blk-sec-buf.h

Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Linux FS Devel <linux-fsdevel@vger.kernel.org>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: xfs@vger.kernel.org
Cc: Christoph Hellwig <hch@lst.de>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Matthew Wilcox <willy@infradead.org>

Comments

Matthew Wilcox Oct. 18, 2018, 2:03 p.m. UTC | #1
On Thu, Oct 18, 2018 at 09:18:12PM +0800, Ming Lei wrote:
> Hi,
> 
> Filesystems may allocate I/O buffers from slab and use them to submit
> bios. This can break storage drivers that have special requirements on
> DMA alignment.

Before we go down this road, could we have a discussion about what
hardware actually requires this?  Storage has this weird assumption that
I/Os must be at least 512 byte aligned in memory, and I don't know where
this idea comes from.  Network devices can do arbitrary byte alignment.
Even USB controllers can do arbitrary byte alignment.  Sure, performance
is going to suck and there are definite risks on some architectures
with doing IOs that are sub-cacheline aligned, but why is storage such a
special snowflake that we assume that host controllers are only capable
of doing 512-byte aligned DMAs?

I just dragged out the NCR53c810 data sheet from 1993, and it's capable of
doing arbitrary alignment of DMA.  NVMe is capable of 4-byte aligned DMA.
What hardware needs this 512 byte alignment?
Christoph Hellwig Oct. 18, 2018, 2:05 p.m. UTC | #2
On Thu, Oct 18, 2018 at 07:03:42AM -0700, Matthew Wilcox wrote:
> Before we go down this road, could we have a discussion about what
> hardware actually requires this?  Storage has this weird assumption that
> I/Os must be at least 512 byte aligned in memory, and I don't know where
> this idea comes from.  Network devices can do arbitrary byte alignment.
> Even USB controllers can do arbitrary byte alignment.  Sure, performance
> is going to suck and there are definite risks on some architectures
> with doing IOs that are sub-cacheline aligned, but why is storage such a
> special snowflake that we assume that host controllers are only capable
> of doing 512-byte aligned DMAs?

Actually most storage controllers require 4-byte alignment, but there is
a significant subset that requires 512-byte alignment.
Matthew Wilcox Oct. 18, 2018, 3:06 p.m. UTC | #3
On Thu, Oct 18, 2018 at 04:05:51PM +0200, Christoph Hellwig wrote:
> On Thu, Oct 18, 2018 at 07:03:42AM -0700, Matthew Wilcox wrote:
> > Before we go down this road, could we have a discussion about what
> > hardware actually requires this?  Storage has this weird assumption that
> > I/Os must be at least 512 byte aligned in memory, and I don't know where
> > this idea comes from.  Network devices can do arbitrary byte alignment.
> > Even USB controllers can do arbitrary byte alignment.  Sure, performance
> > is going to suck and there are definite risks on some architectures
> > with doing IOs that are sub-cacheline aligned, but why is storage such a
> > special snowflake that we assume that host controllers are only capable
> > of doing 512-byte aligned DMAs?
> 
> Actually most storage controllers require 4-byte alignment, but there is
> a significant subset that requires 512-byte alignment.

Can you name one that does require 512-byte alignment, preferably still
in use?  Or even >4-byte alignment.  I just checked AHCI and that requires
only 2-byte alignment.

I have reason to believe that these are uncommon because of the feedback
we got in the NVMe committee after releasing 1.0, which required 4-byte
alignment, from people whining that they just couldn't guarantee 4-byte
alignment in their host devices and absolutely needed to have no
alignment requirements (!)
Christoph Hellwig Oct. 18, 2018, 3:21 p.m. UTC | #4
On Thu, Oct 18, 2018 at 08:06:05AM -0700, Matthew Wilcox wrote:
> Can you name one that does require 512-byte alignment, preferably still
> in use?  Or even >4-byte alignment.  I just checked AHCI and that requires
> only 2-byte alignment.

Xen-blkfront, rsxx, various SD/MMC card readers for example.

> I have reason to believe that these are uncommon because of the feedback
> we got in the NVMe committee after releasing 1.0 which required 4-byte
> alignment from people whining that they just couldn't guarantee 4-byte
> alignment in their host devices and they absolutely needed to have no
> alignment requirements (!)

See how things turned: after NVMe SGLs followed the no-alignment rule,
enough controller vendors rebelled that NVMe 1.3 added an option to
support SGLs only when they are 4-byte aligned.
Bart Van Assche Oct. 18, 2018, 3:50 p.m. UTC | #5
On Thu, 2018-10-18 at 07:03 -0700, Matthew Wilcox wrote:
> On Thu, Oct 18, 2018 at 09:18:12PM +0800, Ming Lei wrote:
> > Filesystems may allocate I/O buffers from slab and use them to submit
> > bios. This can break storage drivers that have special requirements on
> > DMA alignment.
> 
> Before we go down this road, could we have a discussion about what
> hardware actually requires this?  Storage has this weird assumption that
> I/Os must be at least 512 byte aligned in memory, and I don't know where
> this idea comes from.  Network devices can do arbitrary byte alignment.
> Even USB controllers can do arbitrary byte alignment.  Sure, performance
> is going to suck and there are definite risks on some architectures
> with doing IOs that are sub-cacheline aligned, but why is storage such a
> special snowflake that we assume that host controllers are only capable
> of doing 512-byte aligned DMAs?
> 
> I just dragged out the NCR53c810 data sheet from 1993, and it's capable of
> doing arbitrary alignment of DMA.  NVMe is capable of 4-byte aligned DMA.
> What hardware needs this 512 byte alignment?

How about starting with modifying the queue_dma_alignment() function? The
current implementation of that function is as follows:

static inline int queue_dma_alignment(struct request_queue *q)
{
	return q ? q->dma_alignment : 511;
}

In other words, block drivers that do not set the DMA alignment
explicitly are assumed to need 512-byte alignment. I think the "512-byte
alignment by default" was introduced in 2002. From Thomas Gleixner's
history tree, commit ad519c6902fb:

+static inline int queue_dma_alignment(request_queue_t *q)
+{
+       int retval = 511;
+
+       if (q && q->dma_alignment)
+               retval = q->dma_alignment;
+
+       return retval;
+}
+
+static inline int bdev_dma_aligment(struct block_device *bdev)
+{
+       return queue_dma_alignment(bdev_get_queue(bdev));
+}
+

Bart.