[v3,13/13] btrfs: add subpage overview comments

Message ID 20210325071445.90896-14-wqu@suse.com (mailing list archive)
State New
Series btrfs: support read-write for subpage metadata

Commit Message

Qu Wenruo March 25, 2021, 7:14 a.m. UTC
This patch adds an overview of how btrfs subpage support works,
including:

- Limitations
- Behaviors
- Basic implementation points

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/subpage.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 54 insertions(+)
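
The overview comment in the patch below states that a tree block must not cross a 64K page boundary. As a reading aid, the arithmetic behind that limitation can be sketched as follows. This is a standalone userspace sketch, not code from the patch; `sketch_crosses_64k` and `SKETCH_BOUNDARY` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed boundary size, taken from the "64K page" limitation in the comment. */
#define SKETCH_BOUNDARY 65536ull

/*
 * Hypothetical helper: return true if the tree block occupying
 * [start, start + nodesize) spans more than one 64K-aligned region.
 */
static bool sketch_crosses_64k(uint64_t start, uint32_t nodesize)
{
	assert(nodesize > 0);
	/* Compare the 64K region of the first byte with that of the last byte. */
	return start / SKETCH_BOUNDARY !=
	       (start + nodesize - 1) / SKETCH_BOUNDARY;
}
```

With a 16K nodesize, a block starting at 48K ends exactly on the boundary and is fine, while a block starting at 56K would spill into the next 64K region and would be one of the cases the comment says are rejected.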
Patch

diff --git a/fs/btrfs/subpage.c b/fs/btrfs/subpage.c
index 2a326d6385ed..c35db695886b 100644
--- a/fs/btrfs/subpage.c
+++ b/fs/btrfs/subpage.c
@@ -1,5 +1,59 @@ 
 // SPDX-License-Identifier: GPL-2.0
 
+/*
+ * Overview of subpage (sectorsize < PAGE_SIZE) support for btrfs:
+ *
+ * Limitations:
+ * - Only 64K page size is supported for now
+ *   This makes metadata handling easier, as a 64K page ensures that any
+ *   nodesize fits inside one page, thus we don't need to handle cases
+ *   where a tree block crosses several pages.
+ *
+ * - Only metadata read-write is supported for now
+ *   The data read-write part is under heavy testing and still has several
+ *   bugs remaining.
+ *
+ * - Metadata can't cross a 64K page boundary
+ *   btrfs-progs and the kernel have enforced this for a while, thus only
+ *   ancient filesystems could have such a problem.
+ *   In such a case, btrfs will do a graceful rejection.
+ *
+ * Special behaviors:
+ * - Metadata
+ *   Metadata read is fully subpage.
+ *   Meaning reading one tree block will only trigger the read for the
+ *   needed range, other unrelated ranges in the same page will not be touched.
+ *
+ *   Metadata write is partially subpage.
+ *   The writeback is still for the full page, but btrfs will only submit
+ *   the dirty extent buffers in the page.
+ *
+ *   This means, if we have a metadata page like this:
+ *   Page offset
+ *   0         16K         32K         48K        64K
+ *   |/////////|           |///////////|
+ *        \- Tree block A        \- Tree block B
+ *
+ *   Even if we just want to write back tree block A, we will also write back
+ *   tree block B if it's also dirty.
+ *
+ *   This may cause extra metadata writeback, which results in more COW.
+ *
+ * Implementation:
+ * - Common
+ *   Both metadata and data will use a new structure, btrfs_subpage, to
+ *   record the status of each sector inside a page.
+ *   This provides the extra granularity needed.
+ *
+ * - Metadata
+ *   Since we have multiple tree blocks inside one page, we can't rely on page
+ *   locking anymore, or we will have greatly reduced concurrency or even
+ *   deadlocks (holding one tree lock while trying to take another tree lock
+ *   in the same page).
+ *
+ *   Thus for metadata locking, subpage support relies on io_tree locking only.
+ *   This means slightly higher tree locking latency.
+ */
 #include <linux/slab.h>
 #include "ctree.h"
 #include "subpage.h"
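
The overview comment above says that a per-page structure records the status of each sector, and that writing back a page submits every dirty tree block in it. A minimal userspace sketch of that idea follows. It is not the kernel's `struct btrfs_subpage`; every `sketch_*` name is hypothetical, and it assumes 4K sectors, a 64K page, and tree blocks aligned to nodesize within the page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed sizes for illustration: 64K page, 4K sectors, so 16 sectors per page. */
#define SKETCH_PAGE_SIZE  65536u
#define SKETCH_SECTORSIZE 4096u

/* Hypothetical per-page state: one dirty bit per sector. */
struct sketch_subpage {
	uint16_t dirty;
};

/* Build the bitmap covering [start, start + len) inside the page. */
static uint16_t sketch_range_bits(uint32_t start, uint32_t len)
{
	uint32_t first = start / SKETCH_SECTORSIZE;
	uint32_t nbits = len / SKETCH_SECTORSIZE;

	assert(start % SKETCH_SECTORSIZE == 0 && len % SKETCH_SECTORSIZE == 0);
	assert(len > 0 && start + len <= SKETCH_PAGE_SIZE);
	return (uint16_t)(((1u << nbits) - 1u) << first);
}

/* Mark every sector of the range dirty. */
static void sketch_set_dirty(struct sketch_subpage *sp, uint32_t start,
			     uint32_t len)
{
	sp->dirty |= sketch_range_bits(start, len);
}

/* True only if every sector of the range is dirty. */
static bool sketch_range_dirty(const struct sketch_subpage *sp, uint32_t start,
			       uint32_t len)
{
	uint16_t bits = sketch_range_bits(start, len);

	return (sp->dirty & bits) == bits;
}

/*
 * Count the tree blocks a full-page writeback would submit: every
 * nodesize-aligned block in the page whose sectors are all dirty.
 */
static unsigned int sketch_count_dirty_blocks(const struct sketch_subpage *sp,
					      uint32_t nodesize)
{
	unsigned int count = 0;

	for (uint32_t off = 0; off < SKETCH_PAGE_SIZE; off += nodesize)
		if (sketch_range_dirty(sp, off, nodesize))
			count++;
	return count;
}
```

Marking the ranges 0-16K and 32K-48K dirty reproduces the diagram in the comment: a writeback of the page would pick up both tree block A and tree block B, even if only A was the intended target.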