Message ID | 2f8e30f966cfd9e06b0745a691bd1a5566aab780.1690171333.git.naohiro.aota@wdc.com (mailing list archive)
---|---
State | New, archived
Series | btrfs: zoned: write-time activation of metadata block group
Hi Naohiro,

kernel test robot noticed the following build errors:

[auto build test ERROR on kdave/for-next]
[also build test ERROR on linus/master v6.5-rc3 next-20230721]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Naohiro-Aota/btrfs-zoned-introduce-block_group-context-for-submit_eb_page/20230724-122053
base:   https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-next
patch link:    https://lore.kernel.org/r/2f8e30f966cfd9e06b0745a691bd1a5566aab780.1690171333.git.naohiro.aota%40wdc.com
patch subject: [PATCH 5/8] btrfs: zoned: activate metadata block group on write time
config: parisc-randconfig-r023-20230724 (https://download.01.org/0day-ci/archive/20230724/202307241420.4nG9x5F7-lkp@intel.com/config)
compiler: hppa-linux-gcc (GCC) 12.3.0
reproduce: (https://download.01.org/0day-ci/archive/20230724/202307241420.4nG9x5F7-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202307241420.4nG9x5F7-lkp@intel.com/

All errors (new ones prefixed by >>):

   fs/btrfs/extent_io.c: In function 'submit_eb_page':
>> fs/btrfs/extent_io.c:1827:58: error: passing argument 2 of 'btrfs_check_meta_write_pointer' from incompatible pointer type [-Werror=incompatible-pointer-types]
    1827 |         if (!btrfs_check_meta_write_pointer(eb->fs_info, wbc, eb, bg_context)) {
         |                                                           ^~~
         |                                                           |
         |                                                           struct writeback_control *
   In file included from fs/btrfs/extent_io.c:30:
   fs/btrfs/zoned.h:192:54: note: expected 'struct extent_buffer *' but argument is of type 'struct writeback_control *'
     192 |                                                   struct extent_buffer *eb,
         |                                                   ~~~~~~~~~~~~~~~~~~~~~~^~
   fs/btrfs/extent_io.c:1827:63: error: passing argument 3 of 'btrfs_check_meta_write_pointer' from incompatible pointer type [-Werror=incompatible-pointer-types]
    1827 |         if (!btrfs_check_meta_write_pointer(eb->fs_info, wbc, eb, bg_context)) {
         |                                                                ^~
         |                                                                |
         |                                                                struct extent_buffer *
   fs/btrfs/zoned.h:193:59: note: expected 'struct btrfs_block_group **' but argument is of type 'struct extent_buffer *'
     193 |                                                   struct btrfs_block_group **cache_ret)
         |                                                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~
>> fs/btrfs/extent_io.c:1827:14: error: too many arguments to function 'btrfs_check_meta_write_pointer'
    1827 |         if (!btrfs_check_meta_write_pointer(eb->fs_info, wbc, eb, bg_context)) {
         |              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   fs/btrfs/zoned.h:191:20: note: declared here
     191 | static inline bool btrfs_check_meta_write_pointer(struct btrfs_fs_info *fs_info,
         |                    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors


vim +/btrfs_check_meta_write_pointer +1827 fs/btrfs/extent_io.c

  1766	
  1767	/*
  1768	 * Submit all page(s) of one extent buffer.
  1769	 *
  1770	 * @page:	the page of one extent buffer
  1771	 * @eb_context:	to determine if we need to submit this page, if current page
  1772	 *		belongs to this eb, we don't need to submit
  1773	 *
  1774	 * The caller should pass each page in their bytenr order, and here we use
  1775	 * @eb_context to determine if we have submitted pages of one extent buffer.
  1776	 *
  1777	 * If we have, we just skip until we hit a new page that doesn't belong to
  1778	 * current @eb_context.
  1779	 *
  1780	 * If not, we submit all the page(s) of the extent buffer.
  1781	 *
  1782	 * Return >0 if we have submitted the extent buffer successfully.
  1783	 * Return 0 if we don't need to submit the page, as it's already submitted by
  1784	 * previous call.
  1785	 * Return <0 for fatal error.
  1786	 */
  1787	static int submit_eb_page(struct page *page, struct writeback_control *wbc,
  1788				  struct extent_buffer **eb_context,
  1789				  struct btrfs_block_group **bg_context)
  1790	{
  1791		struct address_space *mapping = page->mapping;
  1792		struct extent_buffer *eb;
  1793		int ret;
  1794	
  1795		if (!PagePrivate(page))
  1796			return 0;
  1797	
  1798		if (btrfs_sb(page->mapping->host->i_sb)->nodesize < PAGE_SIZE)
  1799			return submit_eb_subpage(page, wbc);
  1800	
  1801		spin_lock(&mapping->private_lock);
  1802		if (!PagePrivate(page)) {
  1803			spin_unlock(&mapping->private_lock);
  1804			return 0;
  1805		}
  1806	
  1807		eb = (struct extent_buffer *)page->private;
  1808	
  1809		/*
  1810		 * Shouldn't happen and normally this would be a BUG_ON but no point
  1811		 * crashing the machine for something we can survive anyway.
  1812		 */
  1813		if (WARN_ON(!eb)) {
  1814			spin_unlock(&mapping->private_lock);
  1815			return 0;
  1816		}
  1817	
  1818		if (eb == *eb_context) {
  1819			spin_unlock(&mapping->private_lock);
  1820			return 0;
  1821		}
  1822		ret = atomic_inc_not_zero(&eb->refs);
  1823		spin_unlock(&mapping->private_lock);
  1824		if (!ret)
  1825			return 0;
  1826	
> 1827		if (!btrfs_check_meta_write_pointer(eb->fs_info, wbc, eb, bg_context)) {
  1828			/*
  1829			 * If for_sync, this hole will be filled with
  1830			 * transaction commit.
  1831			 */
  1832			if (wbc->sync_mode == WB_SYNC_ALL && !wbc->for_sync)
  1833				ret = -EAGAIN;
  1834			else
  1835				ret = 0;
  1836			free_extent_buffer(eb);
  1837			return ret;
  1838		}
  1839	
  1840		*eb_context = eb;
  1841	
  1842		if (!lock_extent_buffer_for_io(eb, wbc)) {
  1843			free_extent_buffer(eb);
  1844			return 0;
  1845		}
  1846		if (*bg_context) {
  1847			/* Implies write in zoned mode. */
  1848			struct btrfs_block_group *bg = *bg_context;
  1849	
  1850			/* Mark the last eb in the block group. */
  1851			btrfs_schedule_zone_finish_bg(bg, eb);
  1852			bg->meta_write_pointer += eb->len;
  1853		}
  1854		write_one_eb(eb, wbc);
  1855		free_extent_buffer(eb);
  1856		return 1;
  1857	}
  1858	
Hi Naohiro,

kernel test robot noticed the following build errors:

[auto build test ERROR on kdave/for-next]
[also build test ERROR on linus/master v6.5-rc3 next-20230724]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Naohiro-Aota/btrfs-zoned-introduce-block_group-context-for-submit_eb_page/20230724-122053
base:   https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-next
patch link:    https://lore.kernel.org/r/2f8e30f966cfd9e06b0745a691bd1a5566aab780.1690171333.git.naohiro.aota%40wdc.com
patch subject: [PATCH 5/8] btrfs: zoned: activate metadata block group on write time
config: x86_64-rhel-8.3-rust (https://download.01.org/0day-ci/archive/20230724/202307241530.1g5O8gwF-lkp@intel.com/config)
compiler: clang version 15.0.7 (https://github.com/llvm/llvm-project.git 8dfdcc7b7bf66834a761bd8de445840ef68e4d1a)
reproduce: (https://download.01.org/0day-ci/archive/20230724/202307241530.1g5O8gwF-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202307241530.1g5O8gwF-lkp@intel.com/

All errors (new ones prefixed by >>):

   fs/btrfs/extent_io.c:214:16: warning: variable 'pages_processed' set but not used [-Wunused-but-set-variable]
           unsigned long pages_processed = 0;
                         ^
>> fs/btrfs/extent_io.c:1827:60: error: too many arguments to function call, expected 3, have 4
           if (!btrfs_check_meta_write_pointer(eb->fs_info, wbc, eb, bg_context)) {
                ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                     ^~~~~~~~~~
   fs/btrfs/zoned.h:191:20: note: 'btrfs_check_meta_write_pointer' declared here
   static inline bool btrfs_check_meta_write_pointer(struct btrfs_fs_info *fs_info,
                      ^
   1 warning and 1 error generated.


vim +1827 fs/btrfs/extent_io.c
Hi Naohiro,

kernel test robot noticed the following build errors:

[auto build test ERROR on kdave/for-next]
[also build test ERROR on linus/master v6.5-rc3 next-20230724]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Naohiro-Aota/btrfs-zoned-introduce-block_group-context-for-submit_eb_page/20230724-122053
base:   https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-next
patch link:    https://lore.kernel.org/r/2f8e30f966cfd9e06b0745a691bd1a5566aab780.1690171333.git.naohiro.aota%40wdc.com
patch subject: [PATCH 5/8] btrfs: zoned: activate metadata block group on write time
config: arm-randconfig-r002-20230724 (https://download.01.org/0day-ci/archive/20230724/202307241541.8w52nEnt-lkp@intel.com/config)
compiler: clang version 17.0.0 (https://github.com/llvm/llvm-project.git 4a5ac14ee968ff0ad5d2cc1ffa0299048db4c88a)
reproduce: (https://download.01.org/0day-ci/archive/20230724/202307241541.8w52nEnt-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202307241541.8w52nEnt-lkp@intel.com/

All errors (new ones prefixed by >>):

   fs/btrfs/extent_io.c:214:16: warning: variable 'pages_processed' set but not used [-Wunused-but-set-variable]
     214 |         unsigned long pages_processed = 0;
         |                       ^
>> fs/btrfs/extent_io.c:1827:60: error: too many arguments to function call, expected 3, have 4
    1827 |         if (!btrfs_check_meta_write_pointer(eb->fs_info, wbc, eb, bg_context)) {
         |              ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                     ^~~~~~~~~~
   fs/btrfs/zoned.h:191:20: note: 'btrfs_check_meta_write_pointer' declared here
     191 | static inline bool btrfs_check_meta_write_pointer(struct btrfs_fs_info *fs_info,
         |                    ^
   1 warning and 1 error generated.


vim +1827 fs/btrfs/extent_io.c
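All three reports come down to the same mismatch: the patch adds a wbc
argument at the call site in submit_eb_page() and to the prototype in
fs/btrfs/zoned.h, but the static inline stub at fs/btrfs/zoned.h:191 (the
variant compiled when CONFIG_BLK_DEV_ZONED is disabled, as in these
randconfigs) still has the old three-argument signature. A minimal sketch
of the presumably missing hunk; the stub body returning true is assumed
from context rather than quoted from the tree:

	static inline bool btrfs_check_meta_write_pointer(struct btrfs_fs_info *fs_info,
							  struct writeback_control *wbc,
							  struct extent_buffer *eb,
							  struct btrfs_block_group **cache_ret)
	{
		/* Zoned support compiled out: nothing to check. */
		return true;
	}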
On Mon, Jul 24, 2023 at 01:18:34PM +0900, Naohiro Aota wrote:
> Since metadata write-out is always allocated sequentially, when we need to
> write to a non-active block group, we can wait for the ongoing IOs to
> complete, activate a new block group, and then proceed with writing to the
> new block group.

Somewhat unrelated, but isn't the same true for data as well, and maybe
in a follow-on the same activation policy should apply there as well?
I guess it won't quite work without much work as reservations are tied
to a BG, but it would seem like the better scheme.

> +	if (test_bit(BTRFS_FS_ACTIVE_ZONE_TRACKING, &fs_info->flags)) {

Just a more or less cosmetic nit: there is a lot of code in this
branch, and I wonder if the active zone tracking code would benefit
from being split out into one or more helpers.

> +		bool is_system = cache->flags & BTRFS_BLOCK_GROUP_SYSTEM;
> +
> +		spin_lock(&cache->lock);
> +		if (test_bit(BLOCK_GROUP_FLAG_ZONE_IS_ACTIVE,
> +			     &cache->runtime_flags)) {
> +			spin_unlock(&cache->lock);
> +			return true;
> +		}
> +
> +		spin_unlock(&cache->lock);

What is the point of the cache->lock critical section here?  The
information can be out of date as soon as you drop the lock, so it
looks superfluous to me.

> +		if (fs_info->treelog_bg == cache->start) {
> +			if (!btrfs_zone_activate(cache)) {
> +				int ret_fin = btrfs_zone_finish_one_bg(fs_info);
> +
> +				if (ret_fin != 1 || !btrfs_zone_activate(cache))
> +					return false;

The ret_fin variable here doesn't seem to be actually needed.

> +			}
> +		} else if ((!is_system && fs_info->active_meta_bg != cache) ||
> +			   (is_system && fs_info->active_system_bg != cache)) {
> +			struct btrfs_block_group *tgt = is_system ?
> +				fs_info->active_system_bg : fs_info->active_meta_bg;

There are a lot of checks for is_system here and later on in the logic.
If we had a helper for this, you could just pass in a bg double pointer
argument that the caller sets to &fs_info->active_system_bg or
&fs_info->active_meta_bg and simplify a lot of the logic.
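For illustration, the ret_fin nit amounts to folding the call into the
condition, roughly like this (a sketch against the quoted hunk, not a
posted patch):

	if (fs_info->treelog_bg == cache->start) {
		if (!btrfs_zone_activate(cache)) {
			/* Finish one block group to free an active zone, then retry. */
			if (btrfs_zone_finish_one_bg(fs_info) != 1 ||
			    !btrfs_zone_activate(cache))
				return false;
		}
	}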
On Mon, Jul 24, 2023 at 08:12:28AM -0700, Christoph Hellwig wrote:
> On Mon, Jul 24, 2023 at 01:18:34PM +0900, Naohiro Aota wrote:
> > Since metadata write-out is always allocated sequentially, when we need to
> > write to a non-active block group, we can wait for the ongoing IOs to
> > complete, activate a new block group, and then proceed with writing to the
> > new block group.
>
> Somewhat unrelated, but isn't the same true for data as well, and maybe
> in a follow-on the same activation policy should apply there as well?
> I guess it won't quite work without much work as reservations are tied
> to a BG, but it would seem like the better scheme.

Yeah, we may be able to apply write-time activation to data as well. But
there are some differences between data and metadata which make it less
useful there:

- For data, allocation and write-out are done in the same context under
  run_delalloc_zoned(), so they are close in timing
- Metadata, by contrast, is allocated while building a B-tree and only
  written out on a transaction commit
- There is no over-commit for data in the first place

> > +	if (test_bit(BTRFS_FS_ACTIVE_ZONE_TRACKING, &fs_info->flags)) {
>
> Just a more or less cosmetic nit: there is a lot of code in this
> branch, and I wonder if the active zone tracking code would benefit
> from being split out into one or more helpers.

Yeah, I'll create a helper.

> > +		bool is_system = cache->flags & BTRFS_BLOCK_GROUP_SYSTEM;
> > +
> > +		spin_lock(&cache->lock);
> > +		if (test_bit(BLOCK_GROUP_FLAG_ZONE_IS_ACTIVE,
> > +			     &cache->runtime_flags)) {
> > +			spin_unlock(&cache->lock);
> > +			return true;
> > +		}
> > +
> > +		spin_unlock(&cache->lock);
>
> What is the point of the cache->lock critical section here?  The
> information can be out of date as soon as you drop the lock, so it
> looks superfluous to me.

The block group's activeness starts from non-active, then activated, and
finally finished. So, if it is "ACTIVE" here, the block group is going to
be still active or finished. It is OK if it is still active. If it is
finished, the IO will fail anyway, which is a bug in itself.

> > +		if (fs_info->treelog_bg == cache->start) {
> > +			if (!btrfs_zone_activate(cache)) {
> > +				int ret_fin = btrfs_zone_finish_one_bg(fs_info);
> > +
> > +				if (ret_fin != 1 || !btrfs_zone_activate(cache))
> > +					return false;
>
> The ret_fin variable here doesn't seem to be actually needed.

Indeed.

> > +			}
> > +		} else if ((!is_system && fs_info->active_meta_bg != cache) ||
> > +			   (is_system && fs_info->active_system_bg != cache)) {
> > +			struct btrfs_block_group *tgt = is_system ?
> > +				fs_info->active_system_bg : fs_info->active_meta_bg;
>
> There are a lot of checks for is_system here and later on in the logic.
> If we had a helper for this, you could just pass in a bg double pointer
> argument that the caller sets to &fs_info->active_system_bg or
> &fs_info->active_meta_bg and simplify a lot of the logic.

Sure.
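A rough shape for the helper agreed on above, with a hypothetical name
(check_bg_is_active) and only a skeleton of the pivot logic; this is
sketched from the quoted hunks, not taken from any posted patch:

	/* Hypothetical helper: the caller picks the active-BG slot once,
	 * so the repeated is_system checks collapse into one pointer. */
	static bool check_bg_is_active(struct btrfs_block_group *cache,
				       struct btrfs_block_group **active_bg)
	{
		if (*active_bg == cache)
			return true;

		if (*active_bg) {
			/* Wait for writeback of and finish *active_bg,
			 * then drop its reference and clear the slot. */
		}

		if (!btrfs_zone_activate(cache))
			return false;

		btrfs_get_block_group(cache);
		*active_bg = cache;
		return true;
	}

On the caller side, the only is_system-dependent step left would be
choosing between &fs_info->active_system_bg and &fs_info->active_meta_bg.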
On Tue, Jul 25, 2023 at 09:28:09AM +0000, Naohiro Aota wrote:
> > > +		bool is_system = cache->flags & BTRFS_BLOCK_GROUP_SYSTEM;
> > > +
> > > +		spin_lock(&cache->lock);
> > > +		if (test_bit(BLOCK_GROUP_FLAG_ZONE_IS_ACTIVE,
> > > +			     &cache->runtime_flags)) {
> > > +			spin_unlock(&cache->lock);
> > > +			return true;
> > > +		}
> > > +
> > > +		spin_unlock(&cache->lock);
> >
> > What is the point of the cache->lock critical section here?  The
> > information can be out of date as soon as you drop the lock, so it
> > looks superfluous to me.
>
> The block group's activeness starts from non-active, then activated, and
> finally finished. So, if it is "ACTIVE" here, the block group is going to
> be still active or finished. It is OK if it is still active. If it is
> finished, the IO will fail anyway, which is a bug in itself.

Yes, but what I mean is that there is no point in taking cache->lock here.
The only thing that's done under it is testing a single bit, and the
information from that bit is only used after dropping the lock.  So there
should be no need to take the lock in the first place.
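Concretely, since test_bit() is an atomic read on its own, the quoted hunk
could reduce to something like this (sketch only):

	/* No cache->lock needed: test_bit() is atomic, and the result can
	 * become stale the moment a lock would be dropped anyway. Since
	 * activeness only moves forward (non-active -> active -> finished),
	 * a set bit is still meaningful without the lock. */
	if (test_bit(BLOCK_GROUP_FLAG_ZONE_IS_ACTIVE, &cache->runtime_flags))
		return true;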
diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 91d38f38c1e7..75f8482f45e5 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -4274,6 +4274,17 @@ int btrfs_free_block_groups(struct btrfs_fs_info *info)
 	struct btrfs_caching_control *caching_ctl;
 	struct rb_node *n;
 
+	if (btrfs_is_zoned(info)) {
+		if (info->active_meta_bg) {
+			btrfs_put_block_group(info->active_meta_bg);
+			info->active_meta_bg = NULL;
+		}
+		if (info->active_system_bg) {
+			btrfs_put_block_group(info->active_system_bg);
+			info->active_system_bg = NULL;
+		}
+	}
+
 	write_lock(&info->block_group_cache_lock);
 	while (!list_empty(&info->caching_block_groups)) {
 		caching_ctl = list_entry(info->caching_block_groups.next,
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 46a0b5357009..3f104559c0cc 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1894,7 +1894,7 @@ static int submit_eb_page(struct page *page, struct writeback_control *wbc,
 	if (!ret)
 		return 0;
 
-	if (!btrfs_check_meta_write_pointer(eb->fs_info, eb, bg_context)) {
+	if (!btrfs_check_meta_write_pointer(eb->fs_info, wbc, eb, bg_context)) {
 		/*
 		 * If for_sync, this hole will be filled with
 		 * transaction commit.
diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h
index 203d2a267828..1f2d33112106 100644
--- a/fs/btrfs/fs.h
+++ b/fs/btrfs/fs.h
@@ -766,6 +766,9 @@ struct btrfs_fs_info {
 	u64 data_reloc_bg;
 	struct mutex zoned_data_reloc_io_lock;
 
+	struct btrfs_block_group *active_meta_bg;
+	struct btrfs_block_group *active_system_bg;
+
 	u64 nr_global_roots;
 
 	spinlock_t zone_active_bgs_lock;
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index dbfa49c70c1a..f440853dff1c 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -65,6 +65,9 @@
 
 #define SUPER_INFO_SECTORS	((u64)BTRFS_SUPER_INFO_SIZE >> SECTOR_SHIFT)
 
+static void wait_eb_writebacks(struct btrfs_block_group *block_group);
+static int do_zone_finish(struct btrfs_block_group *block_group, bool fully_written);
+
 static inline bool sb_zone_is_full(const struct blk_zone *zone)
 {
 	return (zone->cond == BLK_ZONE_COND_FULL) ||
@@ -1748,6 +1751,7 @@ void btrfs_finish_ordered_zoned(struct btrfs_ordered_extent *ordered)
 }
 
 bool btrfs_check_meta_write_pointer(struct btrfs_fs_info *fs_info,
+				    struct writeback_control *wbc,
 				    struct extent_buffer *eb,
 				    struct btrfs_block_group **cache_ret)
 {
@@ -1779,6 +1783,74 @@ bool btrfs_check_meta_write_pointer(struct btrfs_fs_info *fs_info,
 	if (cache->meta_write_pointer != eb->start)
 		return false;
 
+	if (test_bit(BTRFS_FS_ACTIVE_ZONE_TRACKING, &fs_info->flags)) {
+		bool is_system = cache->flags & BTRFS_BLOCK_GROUP_SYSTEM;
+
+		spin_lock(&cache->lock);
+		if (test_bit(BLOCK_GROUP_FLAG_ZONE_IS_ACTIVE,
+			     &cache->runtime_flags)) {
+			spin_unlock(&cache->lock);
+			return true;
+		}
+
+		spin_unlock(&cache->lock);
+		if (fs_info->treelog_bg == cache->start) {
+			if (!btrfs_zone_activate(cache)) {
+				int ret_fin = btrfs_zone_finish_one_bg(fs_info);
+
+				if (ret_fin != 1 || !btrfs_zone_activate(cache))
+					return false;
+			}
+		} else if ((!is_system && fs_info->active_meta_bg != cache) ||
+			   (is_system && fs_info->active_system_bg != cache)) {
+			struct btrfs_block_group *tgt = is_system ?
+				fs_info->active_system_bg : fs_info->active_meta_bg;
+
+			/*
+			 * zoned_meta_io_lock protects
+			 * fs_info->active_{meta,system}_bg.
+			 */
+			lockdep_assert_held(&fs_info->zoned_meta_io_lock);
+
+			if (tgt) {
+				/*
+				 * If there is an unsent IO left in the
+				 * allocated area, we cannot wait for them
+				 * as it may cause a deadlock.
+				 */
+				if (tgt->meta_write_pointer < tgt->start + tgt->alloc_offset) {
+					if (wbc->sync_mode == WB_SYNC_NONE ||
+					    (wbc->sync_mode == WB_SYNC_ALL && !wbc->for_sync))
+						return false;
+				}
+
+				/* Pivot active metadata/system block group. */
+				btrfs_zoned_meta_io_unlock(fs_info);
+				wait_eb_writebacks(tgt);
+				do_zone_finish(tgt, true);
+				btrfs_zoned_meta_io_lock(fs_info);
+				if (is_system && fs_info->active_system_bg == tgt) {
+					btrfs_put_block_group(tgt);
+					fs_info->active_system_bg = NULL;
+				} else if (!is_system && fs_info->active_meta_bg == tgt) {
+					btrfs_put_block_group(tgt);
+					fs_info->active_meta_bg = NULL;
+				}
+			}
+			if (!btrfs_zone_activate(cache))
+				return false;
+			if (is_system && fs_info->active_system_bg != cache) {
+				ASSERT(fs_info->active_system_bg == NULL);
+				fs_info->active_system_bg = cache;
+				btrfs_get_block_group(cache);
+			} else if (!is_system && fs_info->active_meta_bg != cache) {
+				ASSERT(fs_info->active_meta_bg == NULL);
+				fs_info->active_meta_bg = cache;
+				btrfs_get_block_group(cache);
+			}
+		}
+	}
+
 	return true;
 }
 
diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h
index 42a4df94dc29..6935e04effdd 100644
--- a/fs/btrfs/zoned.h
+++ b/fs/btrfs/zoned.h
@@ -59,6 +59,7 @@ void btrfs_redirty_list_add(struct btrfs_transaction *trans,
 bool btrfs_use_zone_append(struct btrfs_bio *bbio);
 void btrfs_record_physical_zoned(struct btrfs_bio *bbio);
 bool btrfs_check_meta_write_pointer(struct btrfs_fs_info *fs_info,
+				    struct writeback_control *wbc,
 				    struct extent_buffer *eb,
 				    struct btrfs_block_group **cache_ret);
 int btrfs_zoned_issue_zeroout(struct btrfs_device *device, u64 physical, u64 length);
In the current implementation, block groups are activated at reservation
time to ensure that all reserved bytes can be written to an active metadata
block group. However, this approach has proven to be less efficient, as it
activates block groups more frequently than necessary, putting pressure on
the active zone resource and leading to potential issues such as early
ENOSPC or hung_task.

Another drawback of the current method is that it hampers metadata
over-commit, and necessitates additional flush operations and block group
allocations, resulting in decreased overall performance.

To address these issues, this commit introduces write-time activation of
metadata and system block groups. This involves reserving at least one
active block group specifically for metadata and system block groups.

Since metadata write-out is always allocated sequentially, when we need to
write to a non-active block group, we can wait for the ongoing IOs to
complete, activate a new block group, and then proceed with writing to the
new block group.

Fixes: b09315139136 ("btrfs: zoned: activate metadata block group on flush_space")
CC: stable@vger.kernel.org # 6.1+
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/block-group.c | 11 +++++++
 fs/btrfs/extent_io.c   |  2 +-
 fs/btrfs/fs.h          |  3 ++
 fs/btrfs/zoned.c       | 72 ++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/zoned.h       |  1 +
 5 files changed, 88 insertions(+), 1 deletion(-)