[15/17] btrfs: introduce subpage_eb_mapping for extent buffers

Message ID 20200908075230.86856-16-wqu@suse.com (mailing list archive)
State New, archived
Series btrfs: add read-only support for subpage sector size

Commit Message

Qu Wenruo Sept. 8, 2020, 7:52 a.m. UTC
One of the design blockers for subpage support is the btree inode's
page::private mapping.

Currently page::private for the btree inode is a pointer to the extent
buffer that owns the page.
This is fine for the sectorsize == PAGE_SIZE case, but not suitable for
subpage support, where one page can hold multiple tree blocks.

So to support subpage, here we introduce a new structure,
subpage_eb_mapping, to record which extent buffers are referring to
one page.

It uses a bitmap (at most 16 bits used) to record tree blocks, and an
array of extent buffer pointers (also at most 16 entries) to record the
owners.
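
As a worked example: with a 64K page, 4K sectorsize and 16K nodesize,
one page has 16 sector slots and each tree block occupies
nodesize / sectorsize = 4 consecutive bits. A minimal userspace sketch
of the index math used by attach_subpage_mapping() below (the helper
name here is illustrative, not part of the patch):

#include <stdint.h>

/* First bitmap slot covered by a tree block inside its page. */
static int subpage_eb_start_index(uint64_t eb_start, uint64_t page_start,
				  uint32_t sectorsize)
{
	/* One bit per sector: bit 0 covers page_start, bit 1 the next sector. */
	return (int)((eb_start - page_start) / sectorsize);
}

A 16K tree block starting 32K into its page gets start index
32K / 4K = 8 and occupies bits 8-11 of the bitmap.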

This patch modifies the following functions to add subpage support
using the subpage_eb_mapping structure:
- attach_extent_buffer_page()
- detach_extent_buffer_page()
- grab_extent_buffer_from_page()
- try_release_extent_buffer()

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 221 ++++++++++++++++++++++++++++++++++++++++---
 fs/btrfs/extent_io.h |   3 +
 2 files changed, 212 insertions(+), 12 deletions(-)

Comments

kernel test robot Sept. 8, 2020, 10:22 a.m. UTC | #1
Hi Qu,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on v5.9-rc4]
[also build test WARNING on next-20200903]
[cannot apply to kdave/for-next btrfs/next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Qu-Wenruo/btrfs-add-read-only-support-for-subpage-sector-size/20200908-155601
base:    f4d51dffc6c01a9e94650d95ce0104964f8ae822
config: xtensa-allyesconfig (attached as .config)
compiler: xtensa-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=xtensa 

If you fix the issue, kindly add the following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> fs/btrfs/extent_io.c:6309:5: warning: no previous prototype for 'try_release_subpage_ebs' [-Wmissing-prototypes]
    6309 | int try_release_subpage_ebs(struct page *page)
         |     ^~~~~~~~~~~~~~~~~~~~~~~

# https://github.com/0day-ci/linux/commit/3ef1cb4eb96ce4dce4dc94e3f06c4dd41879e977
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Qu-Wenruo/btrfs-add-read-only-support-for-subpage-sector-size/20200908-155601
git checkout 3ef1cb4eb96ce4dce4dc94e3f06c4dd41879e977
vim +/try_release_subpage_ebs +6309 fs/btrfs/extent_io.c

  6308	
> 6309	int try_release_subpage_ebs(struct page *page)
  6310	{
  6311		struct subpage_eb_mapping *mapping;
  6312		int i;
  6313	
  6314		assert_spin_locked(&page->mapping->private_lock);
  6315		if (!PagePrivate(page))
  6316			return 1;
  6317	
  6318		mapping = (struct subpage_eb_mapping *)page->private;
  6319		for (i = 0; i < SUBPAGE_NR_EXTENT_BUFFERS && PagePrivate(page); i++) {
  6320			struct btrfs_fs_info *fs_info = page_to_fs_info(page);
  6321			struct extent_buffer *eb;
  6322			int ret;
  6323	
  6324			if (!test_bit(i, &mapping->bitmap))
  6325				continue;
  6326	
  6327			eb = mapping->buffers[i];
  6328			spin_unlock(&page->mapping->private_lock);
  6329			spin_lock(&eb->refs_lock);
  6330			ret = release_extent_buffer(eb);
  6331			spin_lock(&page->mapping->private_lock);
  6332	
  6333			/*
  6334		 * The extent buffer can't be freed yet, so jump over its remaining
  6335		 * slots to avoid calling release_extent_buffer() on it again.
  6336			 */
  6337			if (!ret)
  6338				i += (fs_info->nodesize / fs_info->sectorsize - 1);
  6339		}
  6340		/*
  6341		 * detach_subpage_mapping() from release_extent_buffer() has detached
  6342		 * all ebs from this page. All related ebs are released.
  6343		 */
  6344		if (!PagePrivate(page))
  6345			return 1;
  6346		return 0;
  6347	}
  6348	
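
Since try_release_subpage_ebs() has no callers outside
fs/btrfs/extent_io.c in this series, a likely fix for the W=1 warning
(a suggestion sketched here, not a posted follow-up) is to give the
function static linkage:

-int try_release_subpage_ebs(struct page *page)
+static int try_release_subpage_ebs(struct page *page)

Alternatively, a prototype in extent_io.h would also silence the
warning, if later patches need to call this from another file.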

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
Dan Carpenter Sept. 8, 2020, 2:24 p.m. UTC | #2
Hi Qu,

url:    https://github.com/0day-ci/linux/commits/Qu-Wenruo/btrfs-add-read-only-support-for-subpage-sector-size/20200908-155601
base:    f4d51dffc6c01a9e94650d95ce0104964f8ae822
config: x86_64-randconfig-m001-20200907 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-15) 9.3.0

If you fix the issue, kindly add the following tags as appropriate
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>

New smatch warnings:
fs/btrfs/extent_io.c:5516 alloc_extent_buffer() error: uninitialized symbol 'num_pages'.

Old smatch warnings:
fs/btrfs/extent_io.c:6397 try_release_extent_buffer() warn: inconsistent returns 'eb->refs_lock'.

# https://github.com/0day-ci/linux/commit/3ef1cb4eb96ce4dce4dc94e3f06c4dd41879e977
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Qu-Wenruo/btrfs-add-read-only-support-for-subpage-sector-size/20200908-155601
git checkout 3ef1cb4eb96ce4dce4dc94e3f06c4dd41879e977
vim +/subpage_mapping +5511 fs/btrfs/extent_io.c

f28491e0a6c46d Josef Bacik         2013-12-16  5389  struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
ce3e69847e3ec7 David Sterba        2014-06-15  5390  					  u64 start)
d1310b2e0cd98e Chris Mason         2008-01-24  5391  {
da17066c40472c Jeff Mahoney        2016-06-15  5392  	unsigned long len = fs_info->nodesize;
cc5e31a4775d0d David Sterba        2018-03-01  5393  	int num_pages;
cc5e31a4775d0d David Sterba        2018-03-01  5394  	int i;
09cbfeaf1a5a67 Kirill A. Shutemov  2016-04-01  5395  	unsigned long index = start >> PAGE_SHIFT;
d1310b2e0cd98e Chris Mason         2008-01-24  5396  	struct extent_buffer *eb;
6af118ce51b52c Chris Mason         2008-07-22  5397  	struct extent_buffer *exists = NULL;
d1310b2e0cd98e Chris Mason         2008-01-24  5398  	struct page *p;
f28491e0a6c46d Josef Bacik         2013-12-16  5399  	struct address_space *mapping = fs_info->btree_inode->i_mapping;
3ef1cb4eb96ce4 Qu Wenruo           2020-09-08  5400  	struct subpage_eb_mapping *subpage_mapping = NULL;
3ef1cb4eb96ce4 Qu Wenruo           2020-09-08  5401  	bool subpage = (fs_info->sectorsize < PAGE_SIZE);
d1310b2e0cd98e Chris Mason         2008-01-24  5402  	int uptodate = 1;
19fe0a8b787d7c Miao Xie            2010-10-26  5403  	int ret;
d1310b2e0cd98e Chris Mason         2008-01-24  5404  
da17066c40472c Jeff Mahoney        2016-06-15  5405  	if (!IS_ALIGNED(start, fs_info->sectorsize)) {
c871b0f2fd27e7 Liu Bo              2016-06-06  5406  		btrfs_err(fs_info, "bad tree block start %llu", start);
c871b0f2fd27e7 Liu Bo              2016-06-06  5407  		return ERR_PTR(-EINVAL);
c871b0f2fd27e7 Liu Bo              2016-06-06  5408  	}
039b297b76d816 Qu Wenruo           2020-09-08  5409  	if (fs_info->sectorsize < PAGE_SIZE && round_down(start, PAGE_SIZE) !=
039b297b76d816 Qu Wenruo           2020-09-08  5410  	    round_down(start + len - 1, PAGE_SIZE)) {
039b297b76d816 Qu Wenruo           2020-09-08  5411  		btrfs_err(fs_info,
039b297b76d816 Qu Wenruo           2020-09-08  5412  		"tree block crosses page boundary, start %llu nodesize %lu",
039b297b76d816 Qu Wenruo           2020-09-08  5413  			  start, len);
039b297b76d816 Qu Wenruo           2020-09-08  5414  		return ERR_PTR(-EINVAL);
039b297b76d816 Qu Wenruo           2020-09-08  5415  	}
c871b0f2fd27e7 Liu Bo              2016-06-06  5416  
f28491e0a6c46d Josef Bacik         2013-12-16  5417  	eb = find_extent_buffer(fs_info, start);
452c75c3d21870 Chandra Seetharaman 2013-10-07  5418  	if (eb)
6af118ce51b52c Chris Mason         2008-07-22  5419  		return eb;
6af118ce51b52c Chris Mason         2008-07-22  5420  
23d79d81b13431 David Sterba        2014-06-15  5421  	eb = __alloc_extent_buffer(fs_info, start, len);
2b114d1d33551a Peter               2008-04-01  5422  	if (!eb)
c871b0f2fd27e7 Liu Bo              2016-06-06  5423  		return ERR_PTR(-ENOMEM);
d1310b2e0cd98e Chris Mason         2008-01-24  5424  
3ef1cb4eb96ce4 Qu Wenruo           2020-09-08  5425  	if (subpage) {
3ef1cb4eb96ce4 Qu Wenruo           2020-09-08  5426  		subpage_mapping = kmalloc(sizeof(*subpage_mapping), GFP_NOFS);
3ef1cb4eb96ce4 Qu Wenruo           2020-09-08  5427  		if (!mapping) {

This was probably supposed to be "if (!subpage_mapping)" instead of
"if (!mapping)".

3ef1cb4eb96ce4 Qu Wenruo           2020-09-08  5428  			exists = ERR_PTR(-ENOMEM);
3ef1cb4eb96ce4 Qu Wenruo           2020-09-08  5429  			goto free_eb;

The "num_pages" variable is uninitialized on this goto path.

3ef1cb4eb96ce4 Qu Wenruo           2020-09-08  5430  		}
3ef1cb4eb96ce4 Qu Wenruo           2020-09-08  5431  	}
3ef1cb4eb96ce4 Qu Wenruo           2020-09-08  5432  
65ad010488a5cc David Sterba        2018-06-29  5433  	num_pages = num_extent_pages(eb);
727011e07cbdf8 Chris Mason         2010-08-06  5434  	for (i = 0; i < num_pages; i++, index++) {
d1b5c5671d010d Michal Hocko        2015-08-19  5435  		p = find_or_create_page(mapping, index, GFP_NOFS|__GFP_NOFAIL);
c871b0f2fd27e7 Liu Bo              2016-06-06  5436  		if (!p) {
c871b0f2fd27e7 Liu Bo              2016-06-06  5437  			exists = ERR_PTR(-ENOMEM);
6af118ce51b52c Chris Mason         2008-07-22  5438  			goto free_eb;
c871b0f2fd27e7 Liu Bo              2016-06-06  5439  		}
4f2de97acee653 Josef Bacik         2012-03-07  5440  
4f2de97acee653 Josef Bacik         2012-03-07  5441  		spin_lock(&mapping->private_lock);
4f2de97acee653 Josef Bacik         2012-03-07  5442  		if (PagePrivate(p)) {
3ef1cb4eb96ce4 Qu Wenruo           2020-09-08  5443  			exists = grab_extent_buffer_from_page(p, start);
b76bd038ff9290 Qu Wenruo           2020-09-08  5444  			if (exists && atomic_inc_not_zero(&exists->refs)) {
                                                                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This increment doesn't pair perfectly with the decrement in the error
handling.  In other words, we sometimes decrement it when it was never
incremented.  This presumably will lead to a use after free.

4f2de97acee653 Josef Bacik         2012-03-07  5445  				spin_unlock(&mapping->private_lock);
4f2de97acee653 Josef Bacik         2012-03-07  5446  				unlock_page(p);
09cbfeaf1a5a67 Kirill A. Shutemov  2016-04-01  5447  				put_page(p);
2457aec63745e2 Mel Gorman          2014-06-04  5448  				mark_extent_buffer_accessed(exists, p);
4f2de97acee653 Josef Bacik         2012-03-07  5449  				goto free_eb;
4f2de97acee653 Josef Bacik         2012-03-07  5450  			}
5ca64f45e92dc5 Omar Sandoval       2015-02-24  5451  			exists = NULL;
4f2de97acee653 Josef Bacik         2012-03-07  5452  
3ef1cb4eb96ce4 Qu Wenruo           2020-09-08  5453  			if (!subpage) {
4f2de97acee653 Josef Bacik         2012-03-07  5454  				/*
3ef1cb4eb96ce4 Qu Wenruo           2020-09-08  5455  				 * Do this so attach doesn't complain and we
3ef1cb4eb96ce4 Qu Wenruo           2020-09-08  5456  				 * need to drop the ref the old guy had.
4f2de97acee653 Josef Bacik         2012-03-07  5457  				 */
4f2de97acee653 Josef Bacik         2012-03-07  5458  				ClearPagePrivate(p);
0b32f4bbb423f0 Josef Bacik         2012-03-13  5459  				WARN_ON(PageDirty(p));
09cbfeaf1a5a67 Kirill A. Shutemov  2016-04-01  5460  				put_page(p);
d1310b2e0cd98e Chris Mason         2008-01-24  5461  			}
3ef1cb4eb96ce4 Qu Wenruo           2020-09-08  5462  		}
3ef1cb4eb96ce4 Qu Wenruo           2020-09-08  5463  		attach_extent_buffer_page(eb, p, subpage_mapping);
4f2de97acee653 Josef Bacik         2012-03-07  5464  		spin_unlock(&mapping->private_lock);
3ef1cb4eb96ce4 Qu Wenruo           2020-09-08  5465  		subpage_mapping = NULL;
0b32f4bbb423f0 Josef Bacik         2012-03-13  5466  		WARN_ON(PageDirty(p));
727011e07cbdf8 Chris Mason         2010-08-06  5467  		eb->pages[i] = p;
d1310b2e0cd98e Chris Mason         2008-01-24  5468  		if (!PageUptodate(p))
d1310b2e0cd98e Chris Mason         2008-01-24  5469  			uptodate = 0;
eb14ab8ed24a04 Chris Mason         2011-02-10  5470  
eb14ab8ed24a04 Chris Mason         2011-02-10  5471  		/*
b16d011e79fb35 Nikolay Borisov     2018-07-04  5472  		 * We can't unlock the pages just yet since the extent buffer
b16d011e79fb35 Nikolay Borisov     2018-07-04  5473  		 * hasn't been properly inserted in the radix tree, this
b16d011e79fb35 Nikolay Borisov     2018-07-04  5474  		 * opens a race with btree_releasepage which can free a page
b16d011e79fb35 Nikolay Borisov     2018-07-04  5475  		 * while we are still filling in all pages for the buffer and
b16d011e79fb35 Nikolay Borisov     2018-07-04  5476  		 * we could crash.
eb14ab8ed24a04 Chris Mason         2011-02-10  5477  		 */
d1310b2e0cd98e Chris Mason         2008-01-24  5478  	}
d1310b2e0cd98e Chris Mason         2008-01-24  5479  	if (uptodate)
b4ce94de9b4d64 Chris Mason         2009-02-04  5480  		set_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags);
115391d2315239 Josef Bacik         2012-03-09  5481  again:
e1860a7724828a David Sterba        2016-05-09  5482  	ret = radix_tree_preload(GFP_NOFS);
c871b0f2fd27e7 Liu Bo              2016-06-06  5483  	if (ret) {
c871b0f2fd27e7 Liu Bo              2016-06-06  5484  		exists = ERR_PTR(ret);
19fe0a8b787d7c Miao Xie            2010-10-26  5485  		goto free_eb;
c871b0f2fd27e7 Liu Bo              2016-06-06  5486  	}
19fe0a8b787d7c Miao Xie            2010-10-26  5487  
f28491e0a6c46d Josef Bacik         2013-12-16  5488  	spin_lock(&fs_info->buffer_lock);
f28491e0a6c46d Josef Bacik         2013-12-16  5489  	ret = radix_tree_insert(&fs_info->buffer_radix,
d31cf6896006df Qu Wenruo           2020-09-08  5490  				start / fs_info->sectorsize, eb);
f28491e0a6c46d Josef Bacik         2013-12-16  5491  	spin_unlock(&fs_info->buffer_lock);
19fe0a8b787d7c Miao Xie            2010-10-26  5492  	radix_tree_preload_end();
452c75c3d21870 Chandra Seetharaman 2013-10-07  5493  	if (ret == -EEXIST) {
f28491e0a6c46d Josef Bacik         2013-12-16  5494  		exists = find_extent_buffer(fs_info, start);
452c75c3d21870 Chandra Seetharaman 2013-10-07  5495  		if (exists)
6af118ce51b52c Chris Mason         2008-07-22  5496  			goto free_eb;
452c75c3d21870 Chandra Seetharaman 2013-10-07  5497  		else
452c75c3d21870 Chandra Seetharaman 2013-10-07  5498  			goto again;
6af118ce51b52c Chris Mason         2008-07-22  5499  	}
6af118ce51b52c Chris Mason         2008-07-22  5500  	/* add one reference for the tree */
0b32f4bbb423f0 Josef Bacik         2012-03-13  5501  	check_buffer_tree_ref(eb);
34b41acec1ccc0 Josef Bacik         2013-12-13  5502  	set_bit(EXTENT_BUFFER_IN_TREE, &eb->bflags);
eb14ab8ed24a04 Chris Mason         2011-02-10  5503  
eb14ab8ed24a04 Chris Mason         2011-02-10  5504  	/*
b16d011e79fb35 Nikolay Borisov     2018-07-04  5505  	 * Now it's safe to unlock the pages because any calls to
b16d011e79fb35 Nikolay Borisov     2018-07-04  5506  	 * btree_releasepage will correctly detect that a page belongs to a
b16d011e79fb35 Nikolay Borisov     2018-07-04  5507  	 * live buffer and won't free them prematurely.
eb14ab8ed24a04 Chris Mason         2011-02-10  5508  	 */
28187ae569e8a6 Nikolay Borisov     2018-07-04  5509  	for (i = 0; i < num_pages; i++)
28187ae569e8a6 Nikolay Borisov     2018-07-04  5510  		unlock_page(eb->pages[i]);
d1310b2e0cd98e Chris Mason         2008-01-24 @5511  	return eb;
d1310b2e0cd98e Chris Mason         2008-01-24  5512  
6af118ce51b52c Chris Mason         2008-07-22  5513  free_eb:
5ca64f45e92dc5 Omar Sandoval       2015-02-24  5514  	WARN_ON(!atomic_dec_and_test(&eb->refs));
3ef1cb4eb96ce4 Qu Wenruo           2020-09-08  5515  	kfree(subpage_mapping);
727011e07cbdf8 Chris Mason         2010-08-06 @5516  	for (i = 0; i < num_pages; i++) {
727011e07cbdf8 Chris Mason         2010-08-06  5517  		if (eb->pages[i])
727011e07cbdf8 Chris Mason         2010-08-06  5518  			unlock_page(eb->pages[i]);
727011e07cbdf8 Chris Mason         2010-08-06  5519  	}
eb14ab8ed24a04 Chris Mason         2011-02-10  5520  
897ca6e9b4fef8 Miao Xie            2010-10-26  5521  	btrfs_release_extent_buffer(eb);
6af118ce51b52c Chris Mason         2008-07-22  5522  	return exists;
d1310b2e0cd98e Chris Mason         2008-01-24  5523  }
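
Both smatch reports point at the same hunk. One possible fixup,
sketched against this version of the patch (untested): check the
freshly allocated pointer rather than the address_space, and
initialize num_pages so the free_eb path never reads it uninitialized:

	int num_pages = 0;

	/* ... */

	if (subpage) {
		subpage_mapping = kmalloc(sizeof(*subpage_mapping), GFP_NOFS);
		if (!subpage_mapping) {
			exists = ERR_PTR(-ENOMEM);
			goto free_eb;
		}
	}

With num_pages starting at zero, the page unlock loop under free_eb is
simply skipped when the allocation fails before any page has been
attached to the eb.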

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

Patch

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index a83b63ecc5f8..87b3bb781532 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -29,6 +29,34 @@  static struct kmem_cache *extent_state_cache;
 static struct kmem_cache *extent_buffer_cache;
 static struct bio_set btrfs_bioset;
 
+/* Upper limit of how many extent buffers can be stored in one page */
+#define SUBPAGE_NR_EXTENT_BUFFERS (SZ_64K / SZ_4K)
+/*
+ * Structure for subpage support, recording the page -> extent buffer mapping
+ *
+ * For subpage support, one 64K page can contain several tree blocks, unlike
+ * the 1:1 page <-> extent buffer mapping of the sectorsize == PAGE_SIZE case.
+ */
+struct subpage_eb_mapping {
+	/*
+	 * Which sectors are covered by an extent buffer.
+	 *
+	 * One bit represents one sector, and the bit number gives the
+	 * sector's offset within the page. At most 16 bits are utilized.
+	 */
+	unsigned long bitmap;
+
+	/* We only support 64K PAGE_SIZE systems mounting a 4K sectorsize fs */
+	struct extent_buffer *buffers[SUBPAGE_NR_EXTENT_BUFFERS];
+};
+
+struct btrfs_fs_info *page_to_fs_info(struct page *page)
+{
+	ASSERT(page && page->mapping);
+
+	return BTRFS_I(page->mapping->host)->root->fs_info;
+}
+
 static inline bool extent_state_in_tree(const struct extent_state *state)
 {
 	return !RB_EMPTY_NODE(&state->rb_node);
@@ -3098,12 +3126,50 @@  static int submit_extent_page(unsigned int opf,
 	return ret;
 }
 
+static void attach_subpage_mapping(struct extent_buffer *eb,
+				   struct page *page,
+				   struct subpage_eb_mapping *mapping)
+{
+	u32 sectorsize = eb->fs_info->sectorsize;
+	u32 nodesize = eb->fs_info->nodesize;
+	int index_start = (eb->start - page_offset(page)) / sectorsize;
+	int nr_bits = nodesize / sectorsize;
+	int i;
+
+	ASSERT(mapping);
+	if (!PagePrivate(page)) {
+		/* Attach mapping to page::private and initialize */
+		memset(mapping, 0, sizeof(*mapping));
+		attach_page_private(page, mapping);
+	} else {
+		/* Use the existing page::private as mapping */
+		kfree(mapping);
+		mapping = (struct subpage_eb_mapping *) page->private;
+	}
+
+	/* Set the bitmap and pointers */
+	for (i = index_start; i < index_start + nr_bits; i++) {
+		set_bit(i, &mapping->bitmap);
+		mapping->buffers[i] = eb;
+	}
+}
+
 static void attach_extent_buffer_page(struct extent_buffer *eb,
-				      struct page *page)
+				      struct page *page,
+				      struct subpage_eb_mapping *mapping)
 {
+	bool subpage = (eb->fs_info->sectorsize < PAGE_SIZE);
 	if (page->mapping)
 		assert_spin_locked(&page->mapping->private_lock);
 
+	if (subpage && page->mapping) {
+		attach_subpage_mapping(eb, page, mapping);
+		return;
+	}
+	/*
+	 * Anonymous pages and the sectorsize == PAGE_SIZE case use
+	 * page::private as a direct pointer to the eb.
+	 */
 	if (!PagePrivate(page))
 		attach_page_private(page, eb);
 	else
@@ -4928,16 +4994,61 @@  int extent_buffer_under_io(const struct extent_buffer *eb)
 		test_bit(EXTENT_BUFFER_DIRTY, &eb->bflags));
 }
 
+static void detach_subpage_mapping(struct extent_buffer *eb, struct page *page)
+{
+	struct subpage_eb_mapping *mapping;
+	u32 sectorsize = eb->fs_info->sectorsize;
+	int start_index;
+	int nr_bits = eb->fs_info->nodesize / sectorsize;
+	int i;
+
+	/* Page already detached */
+	if (!PagePrivate(page))
+		return;
+
+	assert_spin_locked(&page->mapping->private_lock);
+	ASSERT(eb->start >= page_offset(page) &&
+	       eb->start < page_offset(page) + PAGE_SIZE);
+
+	mapping = (struct subpage_eb_mapping *)page->private;
+	start_index = (eb->start - page_offset(page)) / sectorsize;
+
+	for (i = start_index; i < start_index + nr_bits; i++) {
+		if (test_bit(i, &mapping->bitmap) &&
+		    mapping->buffers[i] == eb) {
+			clear_bit(i, &mapping->bitmap);
+			mapping->buffers[i] = NULL;
+		}
+	}
+
+	/* Are we the last owner? */
+	if (mapping->bitmap == 0) {
+		kfree(mapping);
+		detach_page_private(page);
+		/* Drop the ref taken when the page was first allocated */
+		put_page(page);
+	}
+}
+
 static void detach_extent_buffer_page(struct extent_buffer *eb,
 				      struct page *page)
 {
 	bool mapped = !test_bit(EXTENT_BUFFER_UNMAPPED, &eb->bflags);
+	bool subpage = (eb->fs_info->sectorsize < PAGE_SIZE);
 
 	if (!page)
 		return;
 
 	if (mapped)
 		spin_lock(&page->mapping->private_lock);
+
+	if (subpage && page->mapping) {
+		detach_subpage_mapping(eb, page);
+		if (mapped)
+			spin_unlock(&page->mapping->private_lock);
+		return;
+	}
+
 	if (PagePrivate(page) && page->private == (unsigned long)eb) {
 		BUG_ON(test_bit(EXTENT_BUFFER_DIRTY, &eb->bflags));
 		BUG_ON(PageDirty(page));
@@ -5035,7 +5146,7 @@  struct extent_buffer *btrfs_clone_extent_buffer(const struct extent_buffer *src)
 			btrfs_release_extent_buffer(new);
 			return NULL;
 		}
-		attach_extent_buffer_page(new, p);
+		attach_extent_buffer_page(new, p, NULL);
 		WARN_ON(PageDirty(p));
 		SetPageUptodate(p);
 		new->pages[i] = p;
@@ -5243,8 +5354,31 @@  struct extent_buffer *alloc_test_extent_buffer(struct btrfs_fs_info *fs_info,
  * The function here is to ensure we have proper locking and detect such race
 * so we won't allocate an eb twice.
  */
-static struct extent_buffer *grab_extent_buffer_from_page(struct page *page)
+static struct extent_buffer *grab_extent_buffer_from_page(struct page *page,
+							  u64 bytenr)
 {
+	struct btrfs_fs_info *fs_info = page_to_fs_info(page);
+	bool subpage = (fs_info->sectorsize < PAGE_SIZE);
+
+	if (!PagePrivate(page))
+		return NULL;
+
+	if (subpage) {
+		struct subpage_eb_mapping *mapping;
+		u32 sectorsize = fs_info->sectorsize;
+		int start_index;
+
+		ASSERT(bytenr >= page_offset(page) &&
+		       bytenr < page_offset(page) + PAGE_SIZE);
+
+		start_index = (bytenr - page_offset(page)) / sectorsize;
+		mapping = (struct subpage_eb_mapping *)page->private;
+
+		if (test_bit(start_index, &mapping->bitmap))
+			return mapping->buffers[start_index];
+		return NULL;
+	}
+
 	/*
 	 * For PAGE_SIZE == sectorsize case, a btree_inode page should have its
 	 * private pointer as extent buffer who owns this page.
@@ -5263,6 +5397,8 @@  struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 	struct extent_buffer *exists = NULL;
 	struct page *p;
 	struct address_space *mapping = fs_info->btree_inode->i_mapping;
+	struct subpage_eb_mapping *subpage_mapping = NULL;
+	bool subpage = (fs_info->sectorsize < PAGE_SIZE);
 	int uptodate = 1;
 	int ret;
 
@@ -5286,6 +5422,14 @@  struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 	if (!eb)
 		return ERR_PTR(-ENOMEM);
 
+	if (subpage) {
+		subpage_mapping = kmalloc(sizeof(*subpage_mapping), GFP_NOFS);
+		if (!mapping) {
+			exists = ERR_PTR(-ENOMEM);
+			goto free_eb;
+		}
+	}
+
 	num_pages = num_extent_pages(eb);
 	for (i = 0; i < num_pages; i++, index++) {
 		p = find_or_create_page(mapping, index, GFP_NOFS|__GFP_NOFAIL);
@@ -5296,7 +5440,7 @@  struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 
 		spin_lock(&mapping->private_lock);
 		if (PagePrivate(p)) {
-			exists = grab_extent_buffer_from_page(p);
+			exists = grab_extent_buffer_from_page(p, start);
 			if (exists && atomic_inc_not_zero(&exists->refs)) {
 				spin_unlock(&mapping->private_lock);
 				unlock_page(p);
@@ -5306,16 +5450,19 @@  struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 			}
 			exists = NULL;
 
-			/*
-			 * Do this so attach doesn't complain and we need to
-			 * drop the ref the old guy had.
-			 */
-			ClearPagePrivate(p);
-			WARN_ON(PageDirty(p));
-			put_page(p);
+			if (!subpage) {
+				/*
+				 * Do this so attach doesn't complain and we
+				 * need to drop the ref the old guy had.
+				 */
+				ClearPagePrivate(p);
+				WARN_ON(PageDirty(p));
+				put_page(p);
+			}
 		}
-		attach_extent_buffer_page(eb, p);
+		attach_extent_buffer_page(eb, p, subpage_mapping);
 		spin_unlock(&mapping->private_lock);
+		subpage_mapping = NULL;
 		WARN_ON(PageDirty(p));
 		eb->pages[i] = p;
 		if (!PageUptodate(p))
@@ -5365,6 +5512,7 @@  struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 
 free_eb:
 	WARN_ON(!atomic_dec_and_test(&eb->refs));
+	kfree(subpage_mapping);
 	for (i = 0; i < num_pages; i++) {
 		if (eb->pages[i])
 			unlock_page(eb->pages[i]);
@@ -6158,8 +6306,49 @@  void memmove_extent_buffer(const struct extent_buffer *dst,
 	}
 }
 
+int try_release_subpage_ebs(struct page *page)
+{
+	struct subpage_eb_mapping *mapping;
+	int i;
+
+	assert_spin_locked(&page->mapping->private_lock);
+	if (!PagePrivate(page))
+		return 1;
+
+	mapping = (struct subpage_eb_mapping *)page->private;
+	for (i = 0; i < SUBPAGE_NR_EXTENT_BUFFERS && PagePrivate(page); i++) {
+		struct btrfs_fs_info *fs_info = page_to_fs_info(page);
+		struct extent_buffer *eb;
+		int ret;
+
+		if (!test_bit(i, &mapping->bitmap))
+			continue;
+
+		eb = mapping->buffers[i];
+		spin_unlock(&page->mapping->private_lock);
+		spin_lock(&eb->refs_lock);
+		ret = release_extent_buffer(eb);
+		spin_lock(&page->mapping->private_lock);
+
+		/*
+		 * The extent buffer can't be freed yet, so jump over its remaining
+		 * slots to avoid calling release_extent_buffer() on it again.
+		 */
+		if (!ret)
+			i += (fs_info->nodesize / fs_info->sectorsize - 1);
+	}
+	/*
+	 * detach_subpage_mapping() from release_extent_buffer() has detached
+	 * all ebs from this page. All related ebs are released.
+	 */
+	if (!PagePrivate(page))
+		return 1;
+	return 0;
+}
+
 int try_release_extent_buffer(struct page *page)
 {
+	bool subpage = (page_to_fs_info(page)->sectorsize < PAGE_SIZE);
 	struct extent_buffer *eb;
 
 	/*
@@ -6172,6 +6361,14 @@  int try_release_extent_buffer(struct page *page)
 		return 1;
 	}
 
+	if (subpage) {
+		int ret;
+
+		ret = try_release_subpage_ebs(page);
+		spin_unlock(&page->mapping->private_lock);
+		return ret;
+	}
+
 	eb = (struct extent_buffer *)page->private;
 	BUG_ON(!eb);
 
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index e16c5449ba48..6593b6883438 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -184,6 +184,9 @@  static inline int extent_compress_type(unsigned long bio_flags)
 	return bio_flags >> EXTENT_BIO_FLAG_SHIFT;
 }
 
+/* Unable to inline it due to the requirement for both ASSERT() and BTRFS_I() */
+struct btrfs_fs_info *page_to_fs_info(struct page *page);
+
 struct extent_map_tree;
 
 typedef struct extent_map *(get_extent_t)(struct btrfs_inode *inode,