mbox series

[RFC,0/6] mm: THP-agnostic refactor on huge mappings

Message ID 20240717220219.3743374-1-peterx@redhat.com (mailing list archive)
Headers show
Series mm: THP-agnostic refactor on huge mappings | expand

Message

Peter Xu July 17, 2024, 10:02 p.m. UTC
This is an RFC series, so not yet for merging.  Please don't be scared by
the code changes: most of them are code movements only.

This series is based on the dax mprotect fix series here (while that one is
based on mm-unstable):

  [PATCH v3 0/8] mm/mprotect: Fix dax puds
  https://lore.kernel.org/r/20240715192142.3241557-1-peterx@redhat.com

Overview
========

This series doesn't provide any feature change.  The only goal of this
series is to start decoupling two ideas: "THP" and "huge mapping".  We
already started with having PGTABLE_HAS_HUGE_LEAVES config option, and this
one extends that idea into the code.

The issue is that we have so many functions that only compile with
CONFIG_THP=on, even though they're about huge mappings, and huge mapping is
a pretty common concept, which can apply to many things besides THPs
nowadays.  The major THP file is mm/huge_memory.c as of now.

The first example of such huge mapping users will be hugetlb.  We lived
until now with no problem simply because Linux almost duplicated all the
logics there in the "THP" files into hugetlb APIs.  If we want to get rid
of hugetlb specific APIs and paths, this _might_ be the first thing we want
to do, because we want to be able to e.g., zapping a hugetlb pmd entry even
if !CONFIG_THP.

Then consider other things like dax / pfnmaps.  Dax can depend on THP, then
it'll naturally be able to use pmd/pud helpers, that's okay.  However is it
a must?  Do we also want to have every new pmd/pud mappings in the future
to depend on THP (like PFNMAP)?  My answer is no, but I'm open to opinions.

If anyone agrees with me that "huge mapping" (aka, PMD/PUD mappings that
are larger than PAGE_SIZE) is a more generic concept than THP, then I think
at some point we need to move the generic code out of THP code into a
common code base.

This is what this series does as a start.

In general, this series tries to move many THP things (mostly resides in
huge_memory.c right now) into two new files: huge_mapping_{pmd|pud}.c.
When I move them out, I also put them separately into different files for
different layers.  Then if an arch supports e.g. only PMD, it can avoid
compile the PUD helpers, with things like:

        CONFIG_PGTABLE_HAS_PUD_LEAVES=n
        obj-$(CONFIG_PGTABLE_HAS_PUD_LEAVES) += huge_mapping_pud.o

Note that there're a few tree-wide changes into arch/, but that's not a
lot, to make this not disturbing too much people, I only copied the open
lists of each arch not yet the arch maintainers.

Tests
=====

My normal 19-archs cross-compilation tests pass with it, and smoke tested
on x86_64 with a local config of mine.

Comments welcomed, thanks.

Peter Xu (6):
  mm/treewide: Remove pgd_devmap()
  mm: PGTABLE_HAS_P[MU]D_LEAVES config options
  mm/treewide: Make pgtable-generic.c THP agnostic
  mm: Move huge mapping declarations from internal.h to huge_mm.h
  mm/huge_mapping: Create huge_mapping_pxx.c
  mm: Convert "*_trans_huge() || *_devmap()" to use *_leaf()

 arch/arm64/include/asm/pgtable.h             |   11 +-
 arch/powerpc/include/asm/book3s/64/pgtable.h |    7 +-
 arch/powerpc/mm/book3s64/pgtable.c           |    2 +-
 arch/riscv/include/asm/pgtable.h             |    4 +-
 arch/s390/include/asm/pgtable.h              |    2 +-
 arch/s390/mm/pgtable.c                       |    4 +-
 arch/sparc/mm/tlb.c                          |    2 +-
 arch/x86/include/asm/pgtable.h               |    5 -
 arch/x86/mm/pgtable.c                        |   15 +-
 include/linux/huge_mm.h                      |  332 ++++--
 include/linux/mm.h                           |   18 +
 include/linux/mm_types.h                     |    2 +-
 include/linux/pgtable.h                      |   61 +-
 include/trace/events/huge_mapping.h          |   41 +
 include/trace/events/thp.h                   |   28 -
 mm/Kconfig                                   |    6 +
 mm/Makefile                                  |    2 +
 mm/gup.c                                     |    2 -
 mm/hmm.c                                     |    4 +-
 mm/huge_mapping_pmd.c                        |  976 +++++++++++++++
 mm/huge_mapping_pud.c                        |  235 ++++
 mm/huge_memory.c                             | 1125 +-----------------
 mm/internal.h                                |   33 -
 mm/mapping_dirty_helpers.c                   |    4 +-
 mm/memory.c                                  |   16 +-
 mm/migrate_device.c                          |    2 +-
 mm/mprotect.c                                |    4 +-
 mm/mremap.c                                  |    5 +-
 mm/page_vma_mapped.c                         |    5 +-
 mm/pgtable-generic.c                         |   37 +-
 30 files changed, 1595 insertions(+), 1395 deletions(-)
 create mode 100644 include/trace/events/huge_mapping.h
 create mode 100644 mm/huge_mapping_pmd.c
 create mode 100644 mm/huge_mapping_pud.c

Comments

David Hildenbrand July 22, 2024, 1:29 p.m. UTC | #1
On 18.07.24 00:02, Peter Xu wrote:
> This is an RFC series, so not yet for merging.  Please don't be scared by
> the code changes: most of them are code movements only.
> 
> This series is based on the dax mprotect fix series here (while that one is
> based on mm-unstable):
> 
>    [PATCH v3 0/8] mm/mprotect: Fix dax puds
>    https://lore.kernel.org/r/20240715192142.3241557-1-peterx@redhat.com
> 
> Overview
> ========
> 
> This series doesn't provide any feature change.  The only goal of this
> series is to start decoupling two ideas: "THP" and "huge mapping".  We
> already started with having PGTABLE_HAS_HUGE_LEAVES config option, and this
> one extends that idea into the code.
> 
> The issue is that we have so many functions that only compile with
> CONFIG_THP=on, even though they're about huge mappings, and huge mapping is
> a pretty common concept, which can apply to many things besides THPs
> nowadays.  The major THP file is mm/huge_memory.c as of now.
> 
> The first example of such huge mapping users will be hugetlb.  We lived
> until now with no problem simply because Linux almost duplicated all the
> logics there in the "THP" files into hugetlb APIs.  If we want to get rid
> of hugetlb specific APIs and paths, this _might_ be the first thing we want
> to do, because we want to be able to e.g., zapping a hugetlb pmd entry even
> if !CONFIG_THP.
> 
> Then consider other things like dax / pfnmaps.  Dax can depend on THP, then
> it'll naturally be able to use pmd/pud helpers, that's okay.  However is it
> a must?  Do we also want to have every new pmd/pud mappings in the future
> to depend on THP (like PFNMAP)?  My answer is no, but I'm open to opinions.
> 
> If anyone agrees with me that "huge mapping" (aka, PMD/PUD mappings that
> are larger than PAGE_SIZE) is a more generic concept than THP, then I think
> at some point we need to move the generic code out of THP code into a
> common code base.
> 
> This is what this series does as a start.

Hi Peter!

 From a quick glimpse, patch #1-#4 do make sense independent of patch #5.

I am not so sure about all of the code movement in patch #5. If large 
folios are the future, then likely huge_memory.c should simply be the 
home for all that logic.

Maybe the goal should better be to compile huge_memory.c not only for 
THP, but also for other use cases that require that logic, and fence off 
all THP specific stuff using #ifdef?

Not sure, though. But a lot of this code movements/churn might be avoidable.
Peter Xu July 22, 2024, 3:31 p.m. UTC | #2
On Mon, Jul 22, 2024 at 03:29:43PM +0200, David Hildenbrand wrote:
> On 18.07.24 00:02, Peter Xu wrote:
> > This is an RFC series, so not yet for merging.  Please don't be scared by
> > the code changes: most of them are code movements only.
> > 
> > This series is based on the dax mprotect fix series here (while that one is
> > based on mm-unstable):
> > 
> >    [PATCH v3 0/8] mm/mprotect: Fix dax puds
> >    https://lore.kernel.org/r/20240715192142.3241557-1-peterx@redhat.com
> > 
> > Overview
> > ========
> > 
> > This series doesn't provide any feature change.  The only goal of this
> > series is to start decoupling two ideas: "THP" and "huge mapping".  We
> > already started with having PGTABLE_HAS_HUGE_LEAVES config option, and this
> > one extends that idea into the code.
> > 
> > The issue is that we have so many functions that only compile with
> > CONFIG_THP=on, even though they're about huge mappings, and huge mapping is
> > a pretty common concept, which can apply to many things besides THPs
> > nowadays.  The major THP file is mm/huge_memory.c as of now.
> > 
> > The first example of such huge mapping users will be hugetlb.  We lived
> > until now with no problem simply because Linux almost duplicated all the
> > logics there in the "THP" files into hugetlb APIs.  If we want to get rid
> > of hugetlb specific APIs and paths, this _might_ be the first thing we want
> > to do, because we want to be able to e.g., zapping a hugetlb pmd entry even
> > if !CONFIG_THP.
> > 
> > Then consider other things like dax / pfnmaps.  Dax can depend on THP, then
> > it'll naturally be able to use pmd/pud helpers, that's okay.  However is it
> > a must?  Do we also want to have every new pmd/pud mappings in the future
> > to depend on THP (like PFNMAP)?  My answer is no, but I'm open to opinions.
> > 
> > If anyone agrees with me that "huge mapping" (aka, PMD/PUD mappings that
> > are larger than PAGE_SIZE) is a more generic concept than THP, then I think
> > at some point we need to move the generic code out of THP code into a
> > common code base.
> > 
> > This is what this series does as a start.
> 
> Hi Peter!
> 
> From a quick glimpse, patch #1-#4 do make sense independent of patch #5.
> 
> I am not so sure about all of the code movement in patch #5. If large folios
> are the future, then likely huge_memory.c should simply be the home for all
> that logic.
> 
> Maybe the goal should better be to compile huge_memory.c not only for THP,
> but also for other use cases that require that logic, and fence off all THP
> specific stuff using #ifdef?
> 
> Not sure, though. But a lot of this code movements/churn might be avoidable.

I'm fine using ifdefs in the current fine, but IMHO it's a matter of
whether we want to keep huge_memory.c growing into even larger file, and
keep all large folio logics only in that file.  Currently it's ~4000 LOCs.

Nornally I don't see this as much of a "code churn" category, because it
doesn't changes the code itself but only move things.  I personally also
prefer without code churns, but only in the case where there'll be tiny
little functional changes here and there without real benefit.

It's pretty unavoidable to me when one file grows too large and we'll need
to split, and in this case git doesn't have a good way to track such
movement..

Irrelevant of this, just to mention I think there's still one option that I
at least can make the huge pfnmap depends on THP again which shouldn't be a
huge deal (I don't have any use case that needs huge pfnmap but disable
THP, anyway..), so this series isn't an immediate concern to me for that
route.  But for a hugetlb rework this might be something we need to do,
because we simplly can't make CONFIG_HUGETLB rely on CONFIG_THP..

Thanks,
David Hildenbrand July 23, 2024, 8:18 a.m. UTC | #3
On 22.07.24 17:31, Peter Xu wrote:
> On Mon, Jul 22, 2024 at 03:29:43PM +0200, David Hildenbrand wrote:
>> On 18.07.24 00:02, Peter Xu wrote:
>>> This is an RFC series, so not yet for merging.  Please don't be scared by
>>> the code changes: most of them are code movements only.
>>>
>>> This series is based on the dax mprotect fix series here (while that one is
>>> based on mm-unstable):
>>>
>>>     [PATCH v3 0/8] mm/mprotect: Fix dax puds
>>>     https://lore.kernel.org/r/20240715192142.3241557-1-peterx@redhat.com
>>>
>>> Overview
>>> ========
>>>
>>> This series doesn't provide any feature change.  The only goal of this
>>> series is to start decoupling two ideas: "THP" and "huge mapping".  We
>>> already started with having PGTABLE_HAS_HUGE_LEAVES config option, and this
>>> one extends that idea into the code.
>>>
>>> The issue is that we have so many functions that only compile with
>>> CONFIG_THP=on, even though they're about huge mappings, and huge mapping is
>>> a pretty common concept, which can apply to many things besides THPs
>>> nowadays.  The major THP file is mm/huge_memory.c as of now.
>>>
>>> The first example of such huge mapping users will be hugetlb.  We lived
>>> until now with no problem simply because Linux almost duplicated all the
>>> logics there in the "THP" files into hugetlb APIs.  If we want to get rid
>>> of hugetlb specific APIs and paths, this _might_ be the first thing we want
>>> to do, because we want to be able to e.g., zapping a hugetlb pmd entry even
>>> if !CONFIG_THP.
>>>
>>> Then consider other things like dax / pfnmaps.  Dax can depend on THP, then
>>> it'll naturally be able to use pmd/pud helpers, that's okay.  However is it
>>> a must?  Do we also want to have every new pmd/pud mappings in the future
>>> to depend on THP (like PFNMAP)?  My answer is no, but I'm open to opinions.
>>>
>>> If anyone agrees with me that "huge mapping" (aka, PMD/PUD mappings that
>>> are larger than PAGE_SIZE) is a more generic concept than THP, then I think
>>> at some point we need to move the generic code out of THP code into a
>>> common code base.
>>>
>>> This is what this series does as a start.
>>
>> Hi Peter!
>>
>>  From a quick glimpse, patch #1-#4 do make sense independent of patch #5.
>>
>> I am not so sure about all of the code movement in patch #5. If large folios
>> are the future, then likely huge_memory.c should simply be the home for all
>> that logic.
>>
>> Maybe the goal should better be to compile huge_memory.c not only for THP,
>> but also for other use cases that require that logic, and fence off all THP
>> specific stuff using #ifdef?
>>
>> Not sure, though. But a lot of this code movements/churn might be avoidable.
> 
> I'm fine using ifdefs in the current fine, but IMHO it's a matter of
> whether we want to keep huge_memory.c growing into even larger file, and
> keep all large folio logics only in that file.  Currently it's ~4000 LOCs.

Depends on "how much" for sure. huge_memory.c is currently on place 12 
of the biggest files in mm/. So there might not be immediate cause for 
action ... just yet :) [guess which file is on #2 :) ]

> 
> Nornally I don't see this as much of a "code churn" category, because it
> doesn't changes the code itself but only move things.  I personally also
> prefer without code churns, but only in the case where there'll be tiny
> little functional changes here and there without real benefit.
> 
> It's pretty unavoidable to me when one file grows too large and we'll need
> to split, and in this case git doesn't have a good way to track such
> movement..

Yes, that's what I mean.

I've been recently thinking if we should pursue a different direction:

Just as we recently relocated most follow_huge_* stuff into gup.c, 
likely we should rather look into moving copy_huge_pmd, change_huge_pmd, 
copy_huge_pmd ... into the files where they logically belong to.

In madvise.c, we've been doing that in some places already: For 
madvise_cold_or_pageout_pte_range() we inline the code, but not for 
madvise_free_huge_pmd().

pmd_trans_huge() would already compile to a NOP without 
CONFIG_TRANSPARENT_HUGEPAGE, but to make that code avoid most 
CONFIG_TRANSPARENT_HUGEPAGE, we'd need a couple more function stubs to 
make the compiler happy while still being able to compile that code out 
when not required.

The idea would be that e.g., pmd_leaf() would return "false" at compile 
time if no active configuration (THP, HUGETLB, ...) would be active. So 
we could just use pmd_leaf() similar to pmd_trans_huge() in relevant 
code and have the compiler optimize it all out without putting it into 
separate files.

That means, large folios and PMD/PUD mappings will become "more common" 
and better integrated, without the need to jump between files.

Just some thought about an alternative that would make sense to me.

> 
> Irrelevant of this, just to mention I think there's still one option that I
> at least can make the huge pfnmap depends on THP again which shouldn't be a
> huge deal (I don't have any use case that needs huge pfnmap but disable
> THP, anyway..), so this series isn't an immediate concern to me for that
> route.  But for a hugetlb rework this might be something we need to do,
> because we simplly can't make CONFIG_HUGETLB rely on CONFIG_THP..

Yes, likely. FSDAX went a similar direction and called that FSDAX thing 
a "THP" whereby it really doesn't have anything in common with a THP, 
besides being partially mappable -- IMHO.
Peter Xu July 23, 2024, 9:04 p.m. UTC | #4
On Tue, Jul 23, 2024 at 10:18:37AM +0200, David Hildenbrand wrote:
> On 22.07.24 17:31, Peter Xu wrote:
> > On Mon, Jul 22, 2024 at 03:29:43PM +0200, David Hildenbrand wrote:
> > > On 18.07.24 00:02, Peter Xu wrote:
> > > > This is an RFC series, so not yet for merging.  Please don't be scared by
> > > > the code changes: most of them are code movements only.
> > > > 
> > > > This series is based on the dax mprotect fix series here (while that one is
> > > > based on mm-unstable):
> > > > 
> > > >     [PATCH v3 0/8] mm/mprotect: Fix dax puds
> > > >     https://lore.kernel.org/r/20240715192142.3241557-1-peterx@redhat.com
> > > > 
> > > > Overview
> > > > ========
> > > > 
> > > > This series doesn't provide any feature change.  The only goal of this
> > > > series is to start decoupling two ideas: "THP" and "huge mapping".  We
> > > > already started with having PGTABLE_HAS_HUGE_LEAVES config option, and this
> > > > one extends that idea into the code.
> > > > 
> > > > The issue is that we have so many functions that only compile with
> > > > CONFIG_THP=on, even though they're about huge mappings, and huge mapping is
> > > > a pretty common concept, which can apply to many things besides THPs
> > > > nowadays.  The major THP file is mm/huge_memory.c as of now.
> > > > 
> > > > The first example of such huge mapping users will be hugetlb.  We lived
> > > > until now with no problem simply because Linux almost duplicated all the
> > > > logics there in the "THP" files into hugetlb APIs.  If we want to get rid
> > > > of hugetlb specific APIs and paths, this _might_ be the first thing we want
> > > > to do, because we want to be able to e.g., zapping a hugetlb pmd entry even
> > > > if !CONFIG_THP.
> > > > 
> > > > Then consider other things like dax / pfnmaps.  Dax can depend on THP, then
> > > > it'll naturally be able to use pmd/pud helpers, that's okay.  However is it
> > > > a must?  Do we also want to have every new pmd/pud mappings in the future
> > > > to depend on THP (like PFNMAP)?  My answer is no, but I'm open to opinions.
> > > > 
> > > > If anyone agrees with me that "huge mapping" (aka, PMD/PUD mappings that
> > > > are larger than PAGE_SIZE) is a more generic concept than THP, then I think
> > > > at some point we need to move the generic code out of THP code into a
> > > > common code base.
> > > > 
> > > > This is what this series does as a start.
> > > 
> > > Hi Peter!
> > > 
> > >  From a quick glimpse, patch #1-#4 do make sense independent of patch #5.
> > > 
> > > I am not so sure about all of the code movement in patch #5. If large folios
> > > are the future, then likely huge_memory.c should simply be the home for all
> > > that logic.
> > > 
> > > Maybe the goal should better be to compile huge_memory.c not only for THP,
> > > but also for other use cases that require that logic, and fence off all THP
> > > specific stuff using #ifdef?
> > > 
> > > Not sure, though. But a lot of this code movements/churn might be avoidable.
> > 
> > I'm fine using ifdefs in the current fine, but IMHO it's a matter of
> > whether we want to keep huge_memory.c growing into even larger file, and
> > keep all large folio logics only in that file.  Currently it's ~4000 LOCs.
> 
> Depends on "how much" for sure. huge_memory.c is currently on place 12 of
> the biggest files in mm/. So there might not be immediate cause for action
> ... just yet :) [guess which file is on #2 :) ]

7821, hugetlb.c  
7602, vmscan.c          
7275, slub.c       
7072, page_alloc.c
6673, memory.c     
5402, memcontrol.c 
5239, shmem.c   
5155, vmalloc.c      
4419, filemap.c       
4060, mmap.c       
3882, huge_memory.c

IMHO a split is normally better than keeping everything in one file, but
yeah I'd confess THP file isn't that bad comparing to others..  And I'm
definitely surprised it's even out of top ten.

> 
> > 
> > Nornally I don't see this as much of a "code churn" category, because it
> > doesn't changes the code itself but only move things.  I personally also
> > prefer without code churns, but only in the case where there'll be tiny
> > little functional changes here and there without real benefit.
> > 
> > It's pretty unavoidable to me when one file grows too large and we'll need
> > to split, and in this case git doesn't have a good way to track such
> > movement..
> 
> Yes, that's what I mean.
> 
> I've been recently thinking if we should pursue a different direction:
> 
> Just as we recently relocated most follow_huge_* stuff into gup.c, likely we
> should rather look into moving copy_huge_pmd, change_huge_pmd, copy_huge_pmd
> ... into the files where they logically belong to.
> 
> In madvise.c, we've been doing that in some places already: For
> madvise_cold_or_pageout_pte_range() we inline the code, but not for
> madvise_free_huge_pmd().
> 
> pmd_trans_huge() would already compile to a NOP without
> CONFIG_TRANSPARENT_HUGEPAGE, but to make that code avoid most
> CONFIG_TRANSPARENT_HUGEPAGE, we'd need a couple more function stubs to make
> the compiler happy while still being able to compile that code out when not
> required.

Right, I had a patch does exactly that, where it's called pmd_is_leaf(),
for example, but taking CONFIG_* into account.

I remember I had some issue with that, e.g. I used to see pmd_trans_huge()
(when !THP) can optimize some path but pmd_is_leaf() didn't do the same job
even if all configs were off.  But that's another story and I didn't yet
dig deeper.  Could be something small but overlooked.

> 
> The idea would be that e.g., pmd_leaf() would return "false" at compile time
> if no active configuration (THP, HUGETLB, ...) would be active. So we could
> just use pmd_leaf() similar to pmd_trans_huge() in relevant code and have
> the compiler optimize it all out without putting it into separate files.
> 
> That means, large folios and PMD/PUD mappings will become "more common" and
> better integrated, without the need to jump between files.
> 
> Just some thought about an alternative that would make sense to me.

Yeah comments are always welcomed, thanks.

So I suppose maybe it would be easier for now that I make the pfnmap branch
depending on THP. It looks to me something like this may still take some
time to consolidate.  When it's light enough, maybe it can be a few initial
patches on top of a hugetlb series that can start to use this.  Maybe
that'll at least make the patches easier to review.

Thanks,
David Hildenbrand July 23, 2024, 9:22 p.m. UTC | #5
On 23.07.24 23:04, Peter Xu wrote:
> On Tue, Jul 23, 2024 at 10:18:37AM +0200, David Hildenbrand wrote:
>> On 22.07.24 17:31, Peter Xu wrote:
>>> On Mon, Jul 22, 2024 at 03:29:43PM +0200, David Hildenbrand wrote:
>>>> On 18.07.24 00:02, Peter Xu wrote:
>>>>> This is an RFC series, so not yet for merging.  Please don't be scared by
>>>>> the code changes: most of them are code movements only.
>>>>>
>>>>> This series is based on the dax mprotect fix series here (while that one is
>>>>> based on mm-unstable):
>>>>>
>>>>>      [PATCH v3 0/8] mm/mprotect: Fix dax puds
>>>>>      https://lore.kernel.org/r/20240715192142.3241557-1-peterx@redhat.com
>>>>>
>>>>> Overview
>>>>> ========
>>>>>
>>>>> This series doesn't provide any feature change.  The only goal of this
>>>>> series is to start decoupling two ideas: "THP" and "huge mapping".  We
>>>>> already started with having PGTABLE_HAS_HUGE_LEAVES config option, and this
>>>>> one extends that idea into the code.
>>>>>
>>>>> The issue is that we have so many functions that only compile with
>>>>> CONFIG_THP=on, even though they're about huge mappings, and huge mapping is
>>>>> a pretty common concept, which can apply to many things besides THPs
>>>>> nowadays.  The major THP file is mm/huge_memory.c as of now.
>>>>>
>>>>> The first example of such huge mapping users will be hugetlb.  We lived
>>>>> until now with no problem simply because Linux almost duplicated all the
>>>>> logics there in the "THP" files into hugetlb APIs.  If we want to get rid
>>>>> of hugetlb specific APIs and paths, this _might_ be the first thing we want
>>>>> to do, because we want to be able to e.g., zapping a hugetlb pmd entry even
>>>>> if !CONFIG_THP.
>>>>>
>>>>> Then consider other things like dax / pfnmaps.  Dax can depend on THP, then
>>>>> it'll naturally be able to use pmd/pud helpers, that's okay.  However is it
>>>>> a must?  Do we also want to have every new pmd/pud mappings in the future
>>>>> to depend on THP (like PFNMAP)?  My answer is no, but I'm open to opinions.
>>>>>
>>>>> If anyone agrees with me that "huge mapping" (aka, PMD/PUD mappings that
>>>>> are larger than PAGE_SIZE) is a more generic concept than THP, then I think
>>>>> at some point we need to move the generic code out of THP code into a
>>>>> common code base.
>>>>>
>>>>> This is what this series does as a start.
>>>>
>>>> Hi Peter!
>>>>
>>>>   From a quick glimpse, patch #1-#4 do make sense independent of patch #5.
>>>>
>>>> I am not so sure about all of the code movement in patch #5. If large folios
>>>> are the future, then likely huge_memory.c should simply be the home for all
>>>> that logic.
>>>>
>>>> Maybe the goal should better be to compile huge_memory.c not only for THP,
>>>> but also for other use cases that require that logic, and fence off all THP
>>>> specific stuff using #ifdef?
>>>>
>>>> Not sure, though. But a lot of this code movements/churn might be avoidable.
>>>
>>> I'm fine using ifdefs in the current fine, but IMHO it's a matter of
>>> whether we want to keep huge_memory.c growing into even larger file, and
>>> keep all large folio logics only in that file.  Currently it's ~4000 LOCs.
>>
>> Depends on "how much" for sure. huge_memory.c is currently on place 12 of
>> the biggest files in mm/. So there might not be immediate cause for action
>> ... just yet :) [guess which file is on #2 :) ]
> 
> 7821, hugetlb.c
> 7602, vmscan.c
> 7275, slub.c
> 7072, page_alloc.c
> 6673, memory.c
> 5402, memcontrol.c
> 5239, shmem.c
> 5155, vmalloc.c
> 4419, filemap.c
> 4060, mmap.c
> 3882, huge_memory.c
> 
> IMHO a split is normally better than keeping everything in one file, but
> yeah I'd confess THP file isn't that bad comparing to others..  And I'm
> definitely surprised it's even out of top ten.

It's always interesting looking at the numbers here. For v6.10 we had:

     8521 mm/memcontrol.c
     7813 mm/hugetlb.c
     7550 mm/vmscan.c
     7266 mm/slub.c
     7018 mm/page_alloc.c
     6468 mm/memory.c
     5154 mm/vmalloc.c
     5002 mm/shmem.c
     4419 mm/filemap.c
     4019 mm/mmap.c
     3954 mm/ksm.c
     3740 mm/swapfile.c
     3730 mm/huge_memory.c
     3689 mm/gup.c
     3542 mm/mempolicy.c

I suspect memcontrol.c shrunk because of the v1 split-off, leaving 
hugetlb.c now at #1 :)
LEROY Christophe Aug. 22, 2024, 5:08 p.m. UTC | #6
Le 23/07/2024 à 23:04, Peter Xu a écrit :
>>
>>>
>>> Nornally I don't see this as much of a "code churn" category, because it
>>> doesn't changes the code itself but only move things.  I personally also
>>> prefer without code churns, but only in the case where there'll be tiny
>>> little functional changes here and there without real benefit.
>>>
>>> It's pretty unavoidable to me when one file grows too large and we'll need
>>> to split, and in this case git doesn't have a good way to track such
>>> movement..
>>
>> Yes, that's what I mean.
>>
>> I've been recently thinking if we should pursue a different direction:
>>
>> Just as we recently relocated most follow_huge_* stuff into gup.c, likely we
>> should rather look into moving copy_huge_pmd, change_huge_pmd, copy_huge_pmd
>> ... into the files where they logically belong to.
>>
>> In madvise.c, we've been doing that in some places already: For
>> madvise_cold_or_pageout_pte_range() we inline the code, but not for
>> madvise_free_huge_pmd().
>>
>> pmd_trans_huge() would already compile to a NOP without
>> CONFIG_TRANSPARENT_HUGEPAGE, but to make that code avoid most
>> CONFIG_TRANSPARENT_HUGEPAGE, we'd need a couple more function stubs to make
>> the compiler happy while still being able to compile that code out when not
>> required.
> 
> Right, I had a patch does exactly that, where it's called pmd_is_leaf(),
> for example, but taking CONFIG_* into account.
> 
> I remember I had some issue with that, e.g. I used to see pmd_trans_huge()
> (when !THP) can optimize some path but pmd_is_leaf() didn't do the same job
> even if all configs were off.  But that's another story and I didn't yet
> dig deeper.  Could be something small but overlooked.

When I prepared series 
https://patchwork.kernel.org/project/linux-mm/list/?series=871008 , I 
detected that some architectures define some pXd_leaf() for levels they 
don't support, that's the reason why Andrew had to drop v2 at the last 
minute.

And that's maybe the reason why some of the code you expect to get 
folded-off remains.

Since then I sent v3 that fixes that. Don't know if Andrew is planning 
to take it.

Christophe