diff mbox series

[1/2] iomap: Add iomap_iter API

Message ID 20200401152522.20737-2-willy@infradead.org (mailing list archive)
State New, archived
Headers show
Series Begin switching iomap from apply to iter | expand

Commit Message

Matthew Wilcox April 1, 2020, 3:25 p.m. UTC
From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

To replace the apply API

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/iomap/Makefile     |  2 +-
 fs/iomap/iter.c       | 81 +++++++++++++++++++++++++++++++++++++++++++
 include/linux/iomap.h | 24 +++++++++++++
 3 files changed, 106 insertions(+), 1 deletion(-)
 create mode 100644 fs/iomap/iter.c

Comments

Christoph Hellwig April 1, 2020, 3:42 p.m. UTC | #1
> +loff_t iomap_iter(struct iomap_iter *iter, loff_t written)
> +{
> +	const struct iomap_ops *ops = iter->ops;
> +	struct iomap *iomap = &iter->iomap;
> +	struct iomap *srcmap = &iter->srcmap;

I think it makes sense to only have members in the iter structure
that this function modifies.  That is, just pass inode, ops and flags
as explicit parameters.

OTOH the len argument / return value seems like something that would
seems useful in the iter structure.  That would require renaming the
current len to something like total_len..

> +/* Magic value for first call to iterator */
> +#define IOMAP_FIRST_CALL	LLONG_MIN

Can we find a way to make a a zero initialized field the indicatator
of the first call?  That way we don't need any knowledge of magic
values in the callers.  And also don't need any special initializer
value, but just leave it to the caller to initialize .pos and
.total_len, and be done with it.
Matthew Wilcox April 1, 2020, 6:42 p.m. UTC | #2
On Wed, Apr 01, 2020 at 08:42:48AM -0700, Christoph Hellwig wrote:
> > +loff_t iomap_iter(struct iomap_iter *iter, loff_t written)
> > +{
> > +	const struct iomap_ops *ops = iter->ops;
> > +	struct iomap *iomap = &iter->iomap;
> > +	struct iomap *srcmap = &iter->srcmap;
> 
> I think it makes sense to only have members in the iter structure
> that this function modifies.  That is, just pass inode, ops and flags
> as explicit parameters.

One of the annoying things we do when looking at the disassembly is
spend a lot of instructions shuffling arguments around.  Passing as many
arguments as possible in a struct minimises that.  Ideally we'd pass
the iomap_iter to iomap_begin() and iomap_end().  Agreed passing the
ops there makes no sense, but I'd like to keep inode and flags in
the iomap_iter struct so they don't need to be passed to begin/end
as explicit arguments.

> OTOH the len argument / return value seems like something that would
> seems useful in the iter structure.  That would require renaming the
> current len to something like total_len..

I'm inclined to go with seg_len and op_len.

> > +/* Magic value for first call to iterator */
> > +#define IOMAP_FIRST_CALL	LLONG_MIN
> 
> Can we find a way to make a a zero initialized field the indicatator
> of the first call?  That way we don't need any knowledge of magic
> values in the callers.  And also don't need any special initializer
> value, but just leave it to the caller to initialize .pos and
> .total_len, and be done with it.

Yeah; this was just a quick hack.  I'll do something neater.
Matthew Wilcox April 1, 2020, 7:20 p.m. UTC | #3
On Wed, Apr 01, 2020 at 08:42:48AM -0700, Christoph Hellwig wrote:
> OTOH the len argument / return value seems like something that would
> seems useful in the iter structure.  That would require renaming the
> current len to something like total_len..

Ah, I remembered why I didn't do it that way.  For more complicated
users (eg readahead ...), we want to be able to pass an errno here.
Then iomap_iter() will call ops->iomap_end() and return the errno
you passed in, and we can terminate the loop.
Christoph Hellwig April 2, 2020, 7:48 a.m. UTC | #4
On Wed, Apr 01, 2020 at 12:20:38PM -0700, Matthew Wilcox wrote:
> On Wed, Apr 01, 2020 at 08:42:48AM -0700, Christoph Hellwig wrote:
> > OTOH the len argument / return value seems like something that would
> > seems useful in the iter structure.  That would require renaming the
> > current len to something like total_len..
> 
> Ah, I remembered why I didn't do it that way.  For more complicated
> users (eg readahead ...), we want to be able to pass an errno here.
> Then iomap_iter() will call ops->iomap_end() and return the errno
> you passed in, and we can terminate the loop.

Once we have the iter struct we can and should separate the written
counter from the errno, as that is a much cleaner interface.  The prime
reason I avoided that earlier was to keep the number of arguments down,
and even that was a bad reason in retrospective.
Jan Kara April 22, 2020, 3:19 p.m. UTC | #5
On Wed 01-04-20 11:42:45, Matthew Wilcox wrote:
> On Wed, Apr 01, 2020 at 08:42:48AM -0700, Christoph Hellwig wrote:
> > > +loff_t iomap_iter(struct iomap_iter *iter, loff_t written)
> > > +{
> > > +	const struct iomap_ops *ops = iter->ops;
> > > +	struct iomap *iomap = &iter->iomap;
> > > +	struct iomap *srcmap = &iter->srcmap;
> > 
> > I think it makes sense to only have members in the iter structure
> > that this function modifies.  That is, just pass inode, ops and flags
> > as explicit parameters.
> 
> One of the annoying things we do when looking at the disassembly is
> spend a lot of instructions shuffling arguments around.  Passing as many
> arguments as possible in a struct minimises that.

Somewhat late to the game but ... from the conversions of "explicit
arguments to struct of arguments" I've seen (e.g. in xarray) compilers seem
to generate somewhat slower code when arguments are passed in structs. From
the profiling I did it just seems that when arguments are passed directly,
they are in registers which is generally the fastest access you can get.
When you pass arguments in structs, compilers just fetch the value from
stack which is slower even if its cached. And when the argument is not used
frequently or there's something else cache heavy going on, you may have to
go to L2 or L3 which is when you feel the pain... E.g. I've observed some
of the xarray functions which were "logically" identical to their
radix-tree counterparts generate non-negligible amount of cache misses when
reading their arguments from the passed struct.

I don't think iomap is as CPU sensitive as xarray (generally there's much
heavier work that happens in the filesystem) so I'd just strive for code
simplicity here. But I wanted to mention this so that it's clear that
pushing arguments to structs isn't free either.

								Honza
diff mbox series

Patch

diff --git a/fs/iomap/Makefile b/fs/iomap/Makefile
index eef2722d93a1..477e5e79f874 100644
--- a/fs/iomap/Makefile
+++ b/fs/iomap/Makefile
@@ -9,7 +9,7 @@  ccflags-y += -I $(srctree)/$(src)		# needed for trace events
 obj-$(CONFIG_FS_IOMAP)		+= iomap.o
 
 iomap-y				+= trace.o \
-				   apply.o \
+				   apply.o iter.o \
 				   buffered-io.o \
 				   direct-io.o \
 				   fiemap.o \
diff --git a/fs/iomap/iter.c b/fs/iomap/iter.c
new file mode 100644
index 000000000000..1d668fdd928e
--- /dev/null
+++ b/fs/iomap/iter.c
@@ -0,0 +1,81 @@ 
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2010 Red Hat, Inc.
+ * Copyright (c) 2016-2018 Christoph Hellwig.
+ */
+#include <linux/module.h>
+#include <linux/compiler.h>
+#include <linux/fs.h>
+#include <linux/iomap.h>
+#include "trace.h"
+
+/*
+ * Execute a iomap write on a segment of the mapping that spans a
+ * contiguous range of pages that have identical block mapping state.
+ *
+ * This avoids the need to map pages individually, do individual allocations
+ * for each page and most importantly avoid the need for filesystem specific
+ * locking per page. Instead, all the operations are amortised over the entire
+ * range of pages. It is assumed that the filesystems will lock whatever
+ * resources they require in the iomap_begin call, and release them in the
+ * iomap_end call.
+ */
+loff_t iomap_iter(struct iomap_iter *iter, loff_t written)
+{
+	const struct iomap_ops *ops = iter->ops;
+	struct iomap *iomap = &iter->iomap;
+	struct iomap *srcmap = &iter->srcmap;
+	loff_t end, ret = 0;
+
+	trace_iomap_apply(iter->inode, iter->pos, iter->len, iter->flags,
+			iter->ops, NULL, _RET_IP_);
+
+	if (written != IOMAP_FIRST_CALL) {
+		if (ops->iomap_end)
+			ret = ops->iomap_end(iter->inode, iter->pos,
+					iter->len, written > 0 ? written : 0,
+					iter->flags, iomap);
+		if (written < 0)
+			return written;
+		if (ret < 0)
+			return ret;
+		iter->pos += written;
+		iter->len -= written;
+	}
+
+	/*
+	 * Need to map a range from start position for length bytes. This can
+	 * span multiple pages - it is only guaranteed to return a range of a
+	 * single type of pages (e.g. all into a hole, all mapped or all
+	 * unwritten). Failure at this point has nothing to undo.
+	 *
+	 * If allocation is required for this range, reserve the space now so
+	 * that the allocation is guaranteed to succeed later on. Once we copy
+	 * the data into the page cache pages, then we cannot fail otherwise we
+	 * expose transient stale data. If the reserve fails, we can safely
+	 * back out at this point as there is nothing to undo.
+	 */
+	ret = ops->iomap_begin(iter->inode, iter->pos, iter->len,
+			iter->flags, iomap, srcmap);
+	if (ret)
+		return ret;
+	if (WARN_ON(iomap->offset > iter->pos))
+		return -EIO;
+	if (WARN_ON(iomap->offset + iomap->length <= iter->pos))
+		return -EIO;
+	if (WARN_ON(iomap->length == 0))
+		return -EIO;
+
+	trace_iomap_apply_dstmap(iter->inode, iomap);
+	if (srcmap->type != IOMAP_HOLE)
+		trace_iomap_apply_srcmap(iter->inode, srcmap);
+
+	/*
+	 * Cut down the length to the one actually provided by the filesystem,
+	 * as it might not be able to give us the whole size that we requested.
+	 */
+	end = iomap->offset + iomap->length;
+	if (srcmap->type != IOMAP_HOLE)
+		end = min_t(loff_t, end, srcmap->offset + srcmap->length);
+	return min(iter->len, end - iter->pos);
+}
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 8b09463dae0d..ec00a2268f14 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -142,6 +142,30 @@  struct iomap_ops {
 			ssize_t written, unsigned flags, struct iomap *iomap);
 };
 
+struct iomap_iter {
+	struct inode *inode;
+	const struct iomap_ops *ops;
+	loff_t pos;
+	loff_t len;
+	unsigned flags;
+	struct iomap iomap;
+	struct iomap srcmap;
+};
+
+#define DEFINE_IOMAP_ITER(name, _inode, _pos, _len, _flags, _ops)	\
+	struct iomap_iter name = {					\
+		.inode = _inode,					\
+		.ops = _ops,						\
+		.pos = _pos,						\
+		.len = _len,						\
+		.flags = _flags,					\
+	}
+
+/* Magic value for first call to iterator */
+#define IOMAP_FIRST_CALL	LLONG_MIN
+
+loff_t iomap_iter(struct iomap_iter *, loff_t written);
+
 /*
  * Main iomap iterator function.
  */