Message ID | cover.1706068026.git.wqu@suse.com (mailing list archive) |
---|---|
Headers | show |
Series | btrfs: defrag: further preparation for multi-page sector size | expand |
Adding MM list to this cover letter. On 2024/1/24 14:29, Qu Wenruo wrote: > With the folio interface, it's much easier to support multi-page sector > size (aka, sector/block size > PAGE_SIZE, which is rare between major > upstream filesystems). > > The basic idea is, if we firstly convert to full folio interface, and > allow an address space to only allocate folio which is exactly > block/sector size, the support for multi-page would be mostly done. > > But before that support, there are still quite some conversion left for > btrfs. > > Furthermore, with both subpage and multipage sector size, we need to > handle folio different: > > - For subpage > The folio would always be page sized. > > - For multipage (and regular sectorsize == PAGE_SIZE) > The folio would be sector sized. > > Furthermore, the filemap interface would make various shifts more > complex. > As filemap_*() interfaces use index which is PAGE_SHIFT based, > meanwhile with potential larger folio, the folio shift can be larger > than PAGE_SHIFT. As I really want some feedback on this part. I'm pretty sure we would have some filesystems go utilizing larger folios to implement their multi-page block size support. Thus in that case, can we have an interface change to make all folio versions of filemap_*() to accept a file offset instead of page index? Thanks, Qu
On Wed, Jan 24, 2024 at 02:33:22PM +1030, Qu Wenruo wrote: > I'm pretty sure we would have some filesystems go utilizing larger folios to > implement their multi-page block size support. > > Thus in that case, can we have an interface change to make all folio > versions of filemap_*() to accept a file offset instead of page index? You're confused. There's no change needed to the filemap API to support large folios used by large block sizes. Quite possibly more of btrfs is confused, but it's really very simple. index == pos / PAGE_SIZE. That's all. Even if you have a 64kB block size device on a 4kB PAGE_SIZE machine. That implies that folios must be at least order-4, but you can still look up a folio at index 23 and get back the folio which was stored at index 16 (range 16-31). hugetlbfs made the mistake of 'hstate->order' and it's still not fixed. It's a little better than it was (thanks to Sid), but more work is needed. Just use the same approach as THPs or you're going to end up hurt.
On 2024/1/24 15:18, Matthew Wilcox wrote: > On Wed, Jan 24, 2024 at 02:33:22PM +1030, Qu Wenruo wrote: >> I'm pretty sure we would have some filesystems go utilizing larger folios to >> implement their multi-page block size support. >> >> Thus in that case, can we have an interface change to make all folio >> versions of filemap_*() to accept a file offset instead of page index? > > You're confused. There's no change needed to the filemap API to support > large folios used by large block sizes. Quite possibly more of btrfs > is confused, but it's really very simple. index == pos / PAGE_SIZE. > That's all. Even if you have a 64kB block size device on a 4kB PAGE_SIZE > machine. Yes, I understand that filemap API is always working on PAGE_SHIFTed index. The concern is, (hopefully) with more fses going to utilized large folios, there would be two shifts. One folio shift (ilog2(blocksize)), one PAGE_SHIFT for filemap interfaces. And I'm pretty sure it's going to cause confusion, e.g. someone doing the conversion without much think, and all go the folio shift, even for filemap_get_folio(). Thus I'm wondering if it's possible to get a bytenr version of filemap_get_folio(). (Or is it better just creating an inline wrapper inside the fs to avoid confusion?) > > That implies that folios must be at least order-4, but you can still > look up a folio at index 23 and get back the folio which was stored at > index 16 (range 16-31). Yep, that's also what I expect, and that is very handy. Thanks, Qu > > hugetlbfs made the mistake of 'hstate->order' and it's still not fixed. > It's a little better than it was (thanks to Sid), but more work is needed. > Just use the same approach as THPs or you're going to end up hurt. >
On Wed, Jan 24, 2024 at 03:57:39PM +1030, Qu Wenruo wrote: > > > On 2024/1/24 15:18, Matthew Wilcox wrote: > > On Wed, Jan 24, 2024 at 02:33:22PM +1030, Qu Wenruo wrote: > > > I'm pretty sure we would have some filesystems go utilizing larger folios to > > > implement their multi-page block size support. > > > > > > Thus in that case, can we have an interface change to make all folio > > > versions of filemap_*() to accept a file offset instead of page index? > > > > You're confused. There's no change needed to the filemap API to support > > large folios used by large block sizes. Quite possibly more of btrfs > > is confused, but it's really very simple. index == pos / PAGE_SIZE. > > That's all. Even if you have a 64kB block size device on a 4kB PAGE_SIZE > > machine. > > Yes, I understand that filemap API is always working on PAGE_SHIFTed index. OK, good. > The concern is, (hopefully) with more fses going to utilized large > folios, there would be two shifts. > > One folio shift (ilog2(blocksize)), one PAGE_SHIFT for filemap interfaces. Don't shift the file position by the folio_shift(). You want to support large(r) folios _and_ large blocksizes at the same time. ie 64kB might be the block size, but all that would mean would be that folio_shift() would be at least 16. It might be 17, 18 or 21 (for a THP). Filesystems already have to deal with different PAGE_SIZE, SECTOR_SIZE, fsblock size and LBA size. Folios aren't making things any worse here (they're also not making anything better in this area, but I never claimed they would). btrfs is slightly unusual in that it defined PAGE_SIZE and fsblock size to be the same (and then had to deal with the consequences of arm64/x86 interoperability later). But most filesystems have pretty good separation of the four concepts.
On 2024/1/24 16:13, Matthew Wilcox wrote: > On Wed, Jan 24, 2024 at 03:57:39PM +1030, Qu Wenruo wrote: >> >> >> On 2024/1/24 15:18, Matthew Wilcox wrote: >>> On Wed, Jan 24, 2024 at 02:33:22PM +1030, Qu Wenruo wrote: >>>> I'm pretty sure we would have some filesystems go utilizing larger folios to >>>> implement their multi-page block size support. >>>> >>>> Thus in that case, can we have an interface change to make all folio >>>> versions of filemap_*() to accept a file offset instead of page index? >>> >>> You're confused. There's no change needed to the filemap API to support >>> large folios used by large block sizes. Quite possibly more of btrfs >>> is confused, but it's really very simple. index == pos / PAGE_SIZE. >>> That's all. Even if you have a 64kB block size device on a 4kB PAGE_SIZE >>> machine. >> >> Yes, I understand that filemap API is always working on PAGE_SHIFTed index. > > OK, good. > >> The concern is, (hopefully) with more fses going to utilized large >> folios, there would be two shifts. >> >> One folio shift (ilog2(blocksize)), one PAGE_SHIFT for filemap interfaces. > > Don't shift the file position by the folio_shift(). You want to support > large(r) folios _and_ large blocksizes at the same time. ie 64kB might > be the block size, but all that would mean would be that folio_shift() > would be at least 16. It might be 17, 18 or 21 (for a THP). Indeed, I forgot we have THP. > > Filesystems already have to deal with different PAGE_SIZE, SECTOR_SIZE, > fsblock size and LBA size. Folios aren't making things any worse here > (they're also not making anything better in this area, but I never > claimed they would). OK, that makes sense. So all the folio shifts would be an fs internal deal, and we have to handle it properly. > > btrfs is slightly unusual in that it defined PAGE_SIZE and fsblock size > to be the same (and then had to deal with the consequences of arm64/x86 > interoperability later). But most filesystems have pretty good separation > of the four concepts. Indeed, although I also found most major fses do not support larger block size (> PAGE_SIZE) either. I guess subpage is simpler in the past, and hopefully with larger folio, we can see more fses support multi-page blocksize soon. Thanks, Qu