Message ID | 20240920143654.1008756-1-wangkefeng.wang@huawei.com
---|---
State | New
Series | [v2] tmpfs: fault in smaller chunks if large folio allocation not allowed
On Fri, Sep 20, 2024 at 10:36:54PM +0800, Kefeng Wang wrote:
> tmpfs supports large folios, but there are several configurable options
> to enable/disable large folio allocation, and for huge=within_size a
> large folio is only allowed if it fits fully within i_size. So there is
> a performance issue when performing writes without large folios; the
> issue is similar to commit 4e527d5841e2 ("iomap: fault in smaller
> chunks for non-large folio mappings").

No.  What's wrong with my earlier suggestion?
On 2024/9/22 8:35, Matthew Wilcox wrote:
> On Fri, Sep 20, 2024 at 10:36:54PM +0800, Kefeng Wang wrote:
>> [...]
>
> No.  What's wrong with my earlier suggestion?

tmpfs has mount options (never/always/within_size/madvise) for large
folios, and also the sysfs file
/sys/kernel/mm/transparent_hugepage/shmem_enabled to deny/force large
folios at runtime. As I replied in v1, I think that breaks the rules of
mapping_set_folio_order_range():

"Do not tune it based on, eg, i_size."
  --- for tmpfs, whether to use a large folio is chosen based on i_size

"Context: This should not be called while the inode is active as it is
non-atomic."
  --- during a buffered write, the inode is active

So this is why I don't use mapping_set_folio_order_range() here, but
correct me if I am wrong.
On Mon, Sep 23, 2024 at 09:39:07AM +0800, Kefeng Wang wrote:
> [...]
>
> So this is why I don't use mapping_set_folio_order_range() here, but
> correct me if I am wrong.

Yeah, the inode is active here, as the max folio size is decided based
on the write size, so mapping_set_folio_order_range() will probably not
be a safe option.
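For reference, a minimal sketch of the intended usage pattern of this
helper, as I understand current include/linux/pagemap.h; the exact
details here are assumptions, not quotes from the thread:

#include <linux/pagemap.h>

/*
 * The documented pattern: set the folio order range once, at inode
 * initialisation, before the inode is active and visible to other
 * tasks. The flags update is non-atomic, which is why the kerneldoc
 * quoted above forbids calling it on an active inode.
 */
static void sketch_init_mapping(struct inode *inode)
{
	/* allow order-0 .. order-4 folios on this mapping */
	mapping_set_folio_order_range(inode->i_mapping, 0, 4);
}

What the thread argues against is calling it from the write path, where
the inode is live and mapping->flags may be read concurrently.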
On Thu, Sep 26, 2024 at 10:38:34AM +0200, Pankaj Raghav (Samsung) wrote:
> > So this is why I don't use mapping_set_folio_order_range() here, but
> > correct me if I am wrong.
>
> Yeah, the inode is active here as the max folio size is decided based on
> the write size, so probably mapping_set_folio_order_range() will not be
> a safe option.

You really are all making too much of this.  Here's the patch I think we
need:

+++ b/mm/shmem.c
@@ -2831,7 +2831,8 @@ static struct inode *__shmem_get_inode(struct mnt_idmap *idmap,
 	cache_no_acl(inode);
 	if (sbinfo->noswap)
 		mapping_set_unevictable(inode->i_mapping);
-	mapping_set_large_folios(inode->i_mapping);
+	if (sbinfo->huge)
+		mapping_set_large_folios(inode->i_mapping);

 	switch (mode & S_IFMT) {
 	default:
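To see why a one-line gate can be enough: in current kernels, as I
understand it, mapping_set_large_folios() is a thin wrapper that widens
the allowed folio order range; a sketch, with the exact definition an
assumption:

static inline void sketch_set_large_folios(struct address_space *mapping)
{
	/* allow orders 0 .. MAX_PAGECACHE_ORDER on this mapping */
	mapping_set_folio_order_range(mapping, 0, MAX_PAGECACHE_ORDER);
}

If the call is skipped, the mapping stays order-0 only, so
mapping_max_folio_size() returns PAGE_SIZE and generic_perform_write()
falls back to page-sized fault-in chunks with no further changes needed.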
On 2024/9/26 21:52, Matthew Wilcox wrote:
> [...]
>
> You really are all making too much of this.  Here's the patch I think we
> need:
>
> -	mapping_set_large_folios(inode->i_mapping);
> +	if (sbinfo->huge)
> +		mapping_set_large_folios(inode->i_mapping);

But it can't solve all the cases, e.g.
  mount with huge=SHMEM_HUGE_WITHIN_SIZE, or
  mount with SHMEM_HUGE_ALWAYS + runtime SHMEM_HUGE_DENY
and the above change will break
  mount with SHMEM_HUGE_NEVER + runtime SHMEM_HUGE_FORCE
On Thu, Sep 26, 2024 at 10:20:54PM +0800, Kefeng Wang wrote:
> But it can't solve all the cases, e.g.
>   mount with huge=SHMEM_HUGE_WITHIN_SIZE, or

The page cache will not create folios which overhang the end of the file
by more than the minimum folio size for that mapping.  So this is wrong.

>   mount with SHMEM_HUGE_ALWAYS + runtime SHMEM_HUGE_DENY

That's a tweak to this patch, not a fundamental problem with it.

> and the above change will break
>   mount with SHMEM_HUGE_NEVER + runtime SHMEM_HUGE_FORCE

Likewise.
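A worked illustration of the rule Matthew states: with 4KiB pages,
minimum order 0 and i_size = 20KiB, an order-2 (16KiB) folio at index 4
would overhang EOF by 3 pages and so would not be created, which is why
within_size semantics fall out of the page cache naturally. A sketch of
that clamping idea; the name is invented and the real logic lives in the
page-cache allocation paths:

#include <linux/math.h>
#include <linux/pagemap.h>

/*
 * Illustration only: shrink 'order' until the folio at 'index'
 * overhangs EOF by no more than one minimum-order folio.
 */
static unsigned int sketch_clamp_order(pgoff_t index, loff_t i_size,
				       unsigned int order,
				       unsigned int min_order)
{
	pgoff_t eof = DIV_ROUND_UP(i_size, PAGE_SIZE);

	while (order > min_order) {
		pgoff_t base = round_down(index, 1UL << order);

		/* overhang is limited to one min-order folio */
		if (base + (1UL << order) <= eof + (1UL << min_order))
			break;
		order--;
	}
	return order;
}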
On 2024/9/26 22:58, Matthew Wilcox wrote:
> [...]
>
>> But it can't solve all the cases, e.g.
>>   mount with huge=SHMEM_HUGE_WITHIN_SIZE, or
>
> The page cache will not create folios which overhang the end of the file
> by more than the minimum folio size for that mapping.  So this is wrong.

Sorry for the late reply; I'm not very familiar with this part and will
test it after I'm back in the office in the next few days.

>> mount with SHMEM_HUGE_ALWAYS + runtime SHMEM_HUGE_DENY
>
> That's a tweak to this patch, not a fundamental problem with it.
>
>> and the above change will break
>> mount with SHMEM_HUGE_NEVER + runtime SHMEM_HUGE_FORCE
>
> Likewise.

But SHMEM_HUGE_DENY/FORCE can be changed at runtime, and I haven't found
a better way to handle that; any suggestions would be appreciated,
thanks.
On 2024/9/26 21:52, Matthew Wilcox wrote:
> [...]
>
> You really are all making too much of this.  Here's the patch I think we
> need:
>
> +++ b/mm/shmem.c
> @@ -2831,7 +2831,8 @@ static struct inode *__shmem_get_inode(struct mnt_idmap *idmap,
>  	cache_no_acl(inode);
>  	if (sbinfo->noswap)
>  		mapping_set_unevictable(inode->i_mapping);
> -	mapping_set_large_folios(inode->i_mapping);
> +	if (sbinfo->huge)
> +		mapping_set_large_folios(inode->i_mapping);
>
>  	switch (mode & S_IFMT) {
>  	default:

IMHO, we no longer need the 'sbinfo->huge' validation after adding
support for large folios in the tmpfs write and fallocate paths [1].

Kefeng, can you try whether the following RFC patch [1] solves your
problem?  Thanks.
(PS: I will revise the patch according to Matthew's suggestion)

[1] https://lore.kernel.org/all/c03ec1cb1392332726ab265a3d826fe1c408c7e7.1727338549.git.baolin.wang@linux.alibaba.com/
On 2024/9/30 10:02, Baolin Wang wrote:
> [...]
>
> IMHO, we no longer need the 'sbinfo->huge' validation after adding
> support for large folios in the tmpfs write and fallocate paths [1].
>
> Kefeng, can you try whether the following RFC patch [1] solves your
> problem?  Thanks.
> (PS: I will revise the patch according to Matthew's suggestion)

Sure, I will try it once I'm back, but [1] won't solve the issue when
force/deny is set at runtime; e.g. mount with always/within_size but set
deny at runtime: we still fault in large chunks, but we can't allocate a
large folio, so write performance will degrade.

> [1] https://lore.kernel.org/all/c03ec1cb1392332726ab265a3d826fe1c408c7e7.1727338549.git.baolin.wang@linux.alibaba.com/
On 2024/9/30 10:30, Kefeng Wang wrote:
> [...]
>
> Sure, I will try it once I'm back, but [1] won't solve the issue when

Cool.  Thanks.

> force/deny is set at runtime; e.g. mount with always/within_size but set
> deny at runtime: we still fault in large chunks, but we can't allocate a
> large folio, so write performance will degrade.

Yes, but as Hugh mentioned before, the 'force' and 'deny' options are
rather testing artifacts from the old ages.  So I doubt the 'deny'
option will be set in real products; do we really need to consider this
case too much?
On 2024/9/30 10:52, Baolin Wang wrote:
> On 2024/9/30 10:30, Kefeng Wang wrote:
>> [...]
>>
>>> IMHO, we no longer need the 'sbinfo->huge' validation after adding
>>> support for large folios in the tmpfs write and fallocate paths [1].

Forgot to mention: we still need to check sbinfo->huge.  If we mount
with huge=never but fault in large chunks, write is slower than without
9aac777aaf94; the above change or my patch could fix that.

>>> Kefeng, can you try whether the following RFC patch [1] solves your
>>> problem?  Thanks.
>>> (PS: I will revise the patch according to Matthew's suggestion)
>>
>> Sure, I will try it once I'm back, but [1] won't solve the issue when
>
> Cool.  Thanks.
>
>> force/deny is set at runtime; e.g. mount with always/within_size but set
>> deny at runtime: we still fault in large chunks, but we can't allocate a
>> large folio, so write performance will degrade.
>
> Yes, but as Hugh mentioned before, the 'force' and 'deny' options are
> rather testing artifacts from the old ages.  So I doubt the 'deny'
> option will be set in real products; do we really need to consider this
> case too much?

OK, so maybe just update the documentation.
On 2024/9/30 11:15, Kefeng Wang wrote:
> [...]
>
> Forgot to mention: we still need to check sbinfo->huge.  If we mount
> with huge=never but fault in large chunks, write is slower than without
> 9aac777aaf94; the above change or my patch could fix that.

My patch will allow allocating large folios in the tmpfs write and
fallocate paths even though the 'huge' option is 'never'.

My initial thought for supporting large folios was: if the 'huge' option
is enabled, to maintain backward compatibility, we only allow 2M
PMD-sized order allocations; if the 'huge' option is disabled
(huge=never), we still allow large folio allocations based on the write
length.

Another choice is to allow different sized large folio allocations based
on the write length when the 'huge' option is enabled, rather than just
the 2M PMD size, but to force the huge orders off if the 'huge' option
is disabled.

We still need some discussion to determine which method is preferable.
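A sketch of the "order from write length" policy discussed above; the
helper name and the exact clamping are illustrative assumptions, not
code from Baolin's RFC:

#include <linux/log2.h>
#include <linux/huge_mm.h>

/*
 * Pick the highest folio order that the write length can fill, capped
 * at PMD size.  Real code would also have to respect i_size, alignment
 * and the per-mapping allowable orders.
 */
static unsigned int sketch_write_order(size_t len)
{
	unsigned int order = 0;

	if (len > PAGE_SIZE)
		order = ilog2(len >> PAGE_SHIFT);

	return min_t(unsigned int, order, HPAGE_PMD_ORDER);
}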
On 2024/9/30 14:48, Baolin Wang wrote:
> [...]
>
> My patch will allow allocating large folios in the tmpfs write and
> fallocate paths even though the 'huge' option is 'never'.

Yes, indeed; after testing your patch, the "Writing intelligently"
results from 'Bonnie -d /mnt/tmpfs/ -s 1024', based on next-20241008:

1) huge=never
   the base:                                2016438 K/Sec
   my v1/v2 or Matthew's patch:             2874504 K/Sec
   your patch with filemap_get_order() fix: 6330604 K/Sec

2) huge=always
   the write performance:                   7168917 K/Sec

With large folios supported in the tmpfs write path, we do see much
better performance, as shown above; that's great.

> My initial thought for supporting large folios was: if the 'huge' option
> is enabled, to maintain backward compatibility, we only allow 2M
> PMD-sized order allocations; if the 'huge' option is disabled
> (huge=never), we still allow large folio allocations based on the write
> length.
>
> Another choice is to allow different sized large folio allocations based
> on the write length when the 'huge' option is enabled, rather than just
> the 2M PMD size, but to force the huge orders off if the 'huge' option
> is disabled.

"huge=never  Do not allocate huge pages.  This is the default."
Per the documentation, it is better not to allocate large folios here,
but then we need some special handling for huge=never and the runtime
deny/force values.

> We still need some discussion to determine which method is preferable.

Personally, I like your current implementation, but it does not match
the documentation.
On 2024/10/9 15:09, Kefeng Wang wrote:
> [...]
>
> 1) huge=never
>    the base:                                2016438 K/Sec
>    my v1/v2 or Matthew's patch:             2874504 K/Sec
>    your patch with filemap_get_order() fix: 6330604 K/Sec
>
> 2) huge=always
>    the write performance:                   7168917 K/Sec
>
> With large folios supported in the tmpfs write path, we do see much
> better performance, as shown above; that's great.

Great.  Thanks for testing.

> "huge=never  Do not allocate huge pages.  This is the default."
> Per the documentation, it is better not to allocate large folios here,
> but then we need some special handling for huge=never and the runtime
> deny/force values.

Yes.  I'm thinking of adding a new option (something like 'huge=mTHP')
to allocate large folios based on the write size.  I will resend the
patchset, and we can discuss it there.
diff --git a/mm/filemap.c b/mm/filemap.c
index 3e46ca45e13d..b33f260fa32f 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -4126,13 +4126,28 @@ generic_file_direct_write(struct kiocb *iocb, struct iov_iter *from)
 }
 EXPORT_SYMBOL(generic_file_direct_write);

+static size_t generic_mapping_max_folio_size(struct address_space *mapping,
+					     loff_t pos)
+{
+	struct inode *inode = mapping->host;
+	pgoff_t index = pos >> PAGE_SHIFT;
+
+	if (!shmem_mapping(mapping))
+		goto out;
+
+	if (!shmem_allowable_huge_orders(inode, NULL, index, 0, false))
+		return PAGE_SIZE;
+out:
+	return mapping_max_folio_size(mapping);
+}
+
 ssize_t generic_perform_write(struct kiocb *iocb, struct iov_iter *i)
 {
 	struct file *file = iocb->ki_filp;
 	loff_t pos = iocb->ki_pos;
 	struct address_space *mapping = file->f_mapping;
 	const struct address_space_operations *a_ops = mapping->a_ops;
-	size_t chunk = mapping_max_folio_size(mapping);
+	size_t chunk = generic_mapping_max_folio_size(mapping, pos);
 	long status = 0;
 	ssize_t written = 0;
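For context, a condensed paraphrase of generic_perform_write()'s loop
as introduced by 9aac777aaf94 (not the verbatim kernel code; details
elided): 'chunk' bounds both the up-front fault-in of the user buffer
and the length handed to ->write_begin(), so a large chunk over a
mapping that can only allocate order-0 folios pays the fault-in cost
without the large-copy benefit, which is the regression this patch
avoids.

ssize_t sketch_perform_write(struct kiocb *iocb, struct iov_iter *i,
			     size_t chunk)
{
	struct address_space *mapping = iocb->ki_filp->f_mapping;
	loff_t pos = iocb->ki_pos;
	ssize_t written = 0;

	do {
		size_t offset = pos & (chunk - 1);
		size_t bytes = min(chunk - offset, iov_iter_count(i));
		struct folio *folio;
		void *fsdata = NULL;

		/* pre-fault up to 'bytes' of the source buffer */
		if (fault_in_iov_iter_readable(i, bytes) == bytes)
			return written ? written : -EFAULT;

		/* the fs may return a folio smaller than 'bytes' */
		if (mapping->a_ops->write_begin(iocb->ki_filp, mapping, pos,
						bytes, &folio, &fsdata) < 0)
			break;

		/* copy into the folio, ->write_end(), advance (elided) */
		pos += bytes;
		written += bytes;
	} while (iov_iter_count(i));

	return written;
}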
tmpfs supports large folios, but there are several configurable options
to enable/disable large folio allocation, and for huge=within_size a
large folio is only allowed if it fits fully within i_size. So there is
a performance issue when performing writes without large folios; the
issue is similar to commit 4e527d5841e2 ("iomap: fault in smaller
chunks for non-large folio mappings").

Fix it by checking whether large folio allocation is allowed before
performing the write.

Fixes: 9aac777aaf94 ("filemap: Convert generic_perform_write() to support large folios")
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
v2:
- Don't use IOCB flags

 mm/filemap.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)