diff mbox series

[v12,14/17] btrfs: send: write larger chunks when using stream v2

Message ID 051232485e4ac1a1a5fd35de7328208385c59f65.1637179348.git.osandov@fb.com (mailing list archive)
State New, archived
Headers show
Series btrfs: add ioctls and send/receive support for reading/writing compressed data | expand

Commit Message

Omar Sandoval Nov. 17, 2021, 8:19 p.m. UTC
From: Omar Sandoval <osandov@fb.com>

The length field of the send stream TLV header is 16 bits. This means
that the maximum amount of data that can be sent for one write is 64k
minus one. However, encoded writes must be able to send the maximum
compressed extent (128k) in one command. To support this, send stream
version 2 encodes the DATA attribute differently: it has no length
field, and the length is implicitly up to the end of containing command
(which has a 32-bit length field). Although this is necessary for
encoded writes, normal writes can benefit from it, too.

Also add a check to enforce that the DATA attribute is last. It is only
strictly necessary for v2, but we might as well make v1 consistent with
it.

For v2, let's bump up the send buffer to the maximum compressed extent
size plus 16k for the other metadata (144k total). Since this will most
likely be vmalloc'd (and always will be after the next commit), we round
it up to the next page since we might as well use the rest of the page
on systems with >16k pages.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/send.c | 42 ++++++++++++++++++++++++++++++++++--------
 1 file changed, 34 insertions(+), 8 deletions(-)

Comments

David Sterba Nov. 18, 2021, 3:50 p.m. UTC | #1
On Wed, Nov 17, 2021 at 12:19:24PM -0800, Omar Sandoval wrote:
> From: Omar Sandoval <osandov@fb.com>
> 
> The length field of the send stream TLV header is 16 bits. This means
> that the maximum amount of data that can be sent for one write is 64k
> minus one. However, encoded writes must be able to send the maximum
> compressed extent (128k) in one command. To support this, send stream
> version 2 encodes the DATA attribute differently: it has no length
> field, and the length is implicitly up to the end of containing command
> (which has a 32-bit length field). Although this is necessary for
> encoded writes, normal writes can benefit from it, too.
> 
> Also add a check to enforce that the DATA attribute is last. It is only
> strictly necessary for v2, but we might as well make v1 consistent with
> it.
> 
> For v2, let's bump up the send buffer to the maximum compressed extent
> size plus 16k for the other metadata (144k total).

I'm not sure we want to set the number like that, it feels quite
limiting for potential compression enhancements.


> Since this will most
> likely be vmalloc'd (and always will be after the next commit), we round
> it up to the next page since we might as well use the rest of the page
> on systems with >16k pages.

Would it work also without the virtual mappings? For speedup it makes
sense to use vmalloc area, but as a fallback writing in smaller portions
or page by page eventually should be also possible. For that reason I
don't think we should set the maximum other than what fits to 32bit
number minus some overhead.
Omar Sandoval Nov. 18, 2021, 7:34 p.m. UTC | #2
On Thu, Nov 18, 2021 at 04:50:37PM +0100, David Sterba wrote:
> On Wed, Nov 17, 2021 at 12:19:24PM -0800, Omar Sandoval wrote:
> > From: Omar Sandoval <osandov@fb.com>
> > 
> > The length field of the send stream TLV header is 16 bits. This means
> > that the maximum amount of data that can be sent for one write is 64k
> > minus one. However, encoded writes must be able to send the maximum
> > compressed extent (128k) in one command. To support this, send stream
> > version 2 encodes the DATA attribute differently: it has no length
> > field, and the length is implicitly up to the end of containing command
> > (which has a 32-bit length field). Although this is necessary for
> > encoded writes, normal writes can benefit from it, too.
> > 
> > Also add a check to enforce that the DATA attribute is last. It is only
> > strictly necessary for v2, but we might as well make v1 consistent with
> > it.
> > 
> > For v2, let's bump up the send buffer to the maximum compressed extent
> > size plus 16k for the other metadata (144k total).
> 
> I'm not sure we want to set the number like that, it feels quite
> limiting for potential compression enhancements.

This is all we need for now, but we can always raise it in the future. I
amended the protocol and the progs send parsing code to assume no hard
limit.

> > Since this will most
> > likely be vmalloc'd (and always will be after the next commit), we round
> > it up to the next page since we might as well use the rest of the page
> > on systems with >16k pages.
> 
> Would it work also without the virtual mappings? For speedup it makes
> sense to use vmalloc area, but as a fallback writing in smaller portions
> or page by page eventually should be also possible. For that reason I
> don't think we should set the maximum other than what fits to 32bit
> number minus some overhead.

I think you're saying that we could allocate a smaller buffer and do
smaller reads that we immediately write to the send pipe/file? So
something like:

send_write() {
	write_tlv_metadata_to_pipe();
	while (written < to_write) {
		read_small_chunk();
		write_small_chunk_to_pipe();
		written += size_of_small_chunk();
	}
}

And from the protocol's point of view, it's still one big command,
although we didn't have to keep it all in memory at once.

If I'm understanding correctly, then yes, I think that's something we
could do eventually. And my description of v2 allows this:

-#define BTRFS_SEND_BUF_SIZE SZ_64K
+/*
+ * In send stream v1, no command is larger than 64k. In send stream v2, no limit
+ * should be assumed.
+ */
+#define BTRFS_SEND_BUF_SIZE_V1 SZ_64K

Although receive would have to be more intelligent about reading huge
commands.
diff mbox series

Patch

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 53b3cc2276ea..12844cb20584 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -81,6 +81,7 @@  struct send_ctx {
 	char *send_buf;
 	u32 send_size;
 	u32 send_max_size;
+	bool put_data;
 	u64 flags;	/* 'flags' member of btrfs_ioctl_send_args is u64 */
 	/* Protocol version compatibility requested */
 	u32 proto;
@@ -584,6 +585,9 @@  static int tlv_put(struct send_ctx *sctx, u16 attr, const void *data, int len)
 	int total_len = sizeof(*hdr) + len;
 	int left = sctx->send_max_size - sctx->send_size;
 
+	if (WARN_ON_ONCE(sctx->put_data))
+		return -EINVAL;
+
 	if (unlikely(left < total_len))
 		return -EOVERFLOW;
 
@@ -721,6 +725,7 @@  static int send_cmd(struct send_ctx *sctx)
 					&sctx->send_off);
 
 	sctx->send_size = 0;
+	sctx->put_data = false;
 
 	return ret;
 }
@@ -4902,14 +4907,30 @@  static inline u64 max_send_read_size(const struct send_ctx *sctx)
 
 static int put_data_header(struct send_ctx *sctx, u32 len)
 {
-	struct btrfs_tlv_header *hdr;
+	if (WARN_ON_ONCE(sctx->put_data))
+		return -EINVAL;
+	sctx->put_data = true;
+	if (sctx->proto >= 2) {
+		/*
+		 * In v2, the data attribute header doesn't include a length; it
+		 * is implicitly to the end of the command.
+		 */
+		if (sctx->send_max_size - sctx->send_size < 2 + len)
+			return -EOVERFLOW;
+		put_unaligned_le16(BTRFS_SEND_A_DATA,
+				   sctx->send_buf + sctx->send_size);
+		sctx->send_size += 2;
+	} else {
+		struct btrfs_tlv_header *hdr;
 
-	if (sctx->send_max_size - sctx->send_size < sizeof(*hdr) + len)
-		return -EOVERFLOW;
-	hdr = (struct btrfs_tlv_header *)(sctx->send_buf + sctx->send_size);
-	put_unaligned_le16(BTRFS_SEND_A_DATA, &hdr->tlv_type);
-	put_unaligned_le16(len, &hdr->tlv_len);
-	sctx->send_size += sizeof(*hdr);
+		if (sctx->send_max_size - sctx->send_size < sizeof(*hdr) + len)
+			return -EOVERFLOW;
+		hdr = (struct btrfs_tlv_header *)(sctx->send_buf +
+						  sctx->send_size);
+		put_unaligned_le16(BTRFS_SEND_A_DATA, &hdr->tlv_type);
+		put_unaligned_le16(len, &hdr->tlv_len);
+		sctx->send_size += sizeof(*hdr);
+	}
 	return 0;
 }
 
@@ -7292,7 +7313,12 @@  long btrfs_ioctl_send(struct file *mnt_file, struct btrfs_ioctl_send_args *arg)
 
 	sctx->clone_roots_cnt = arg->clone_sources_count;
 
-	sctx->send_max_size = BTRFS_SEND_BUF_SIZE_V1;
+	if (sctx->proto >= 2) {
+		sctx->send_max_size = ALIGN(SZ_16K + BTRFS_MAX_COMPRESSED,
+					    PAGE_SIZE);
+	} else {
+		sctx->send_max_size = BTRFS_SEND_BUF_SIZE_V1;
+	}
 	sctx->send_buf = kvmalloc(sctx->send_max_size, GFP_KERNEL);
 	if (!sctx->send_buf) {
 		ret = -ENOMEM;