Message ID | 20190218231319.178224-1-yuzhao@google.com |
---|---|
State | New, archived |
Series | [v2,1/3] arm64: mm: use appropriate ctors for page tables |
On 02/19/2019 04:43 AM, Yu Zhao wrote:
> For pte page, use pgtable_page_ctor(); for pmd page, use
> pgtable_pmd_page_ctor() if not folded; and for the rest (pud,
> p4d and pgd), don't use any.

pgtable_page_ctor()/dtor() is not optional for any level page table page
as it determines the struct page state and zone statistics.

We should not skip it for any page table page.

As stated before pgtable_pmd_page_ctor() is not a replacement for
pgtable_page_ctor().
On Tue, Feb 19, 2019 at 09:51:01AM +0530, Anshuman Khandual wrote:
> On 02/19/2019 04:43 AM, Yu Zhao wrote:
> > For pte page, use pgtable_page_ctor(); for pmd page, use
> > pgtable_pmd_page_ctor() if not folded; and for the rest (pud,
> > p4d and pgd), don't use any.
> pgtable_page_ctor()/dtor() is not optional for any level page table page
> as it determines the struct page state and zone statistics.

This is not true. pgtable_page_ctor() is only meant for user pte
page. The name isn't perfect (we named it this way before we had
split pmd page table lock, and never bothered to change it).

The commit cccd843f54be ("mm: mark pages in use for page tables")
clearly states so:
  Note that only pages currently accounted as NR_PAGETABLES are
  tracked as PageTable; this does not include pgd/p4d/pud/pmd pages.

I'm sure if we go back further, we can find similar stories: we
don't set PageTable on page tables other than pte; and we don't
account page tables other than pte. I don't have any objection if
you want to change these two. But please make sure they are consistent
across all archs.

> We should not skip it for any page table page.

In fact, calling it on pmd/pud/p4d is peculiar, and may even be
considered wrong. AFAIK, no other arch does so.

> As stated before pgtable_pmd_page_ctor() is not a replacement for
> pgtable_page_ctor().

pgtable_pmd_page_ctor() must be used on user pmd. For kernel pmd,
it's okay to use pgtable_page_ctor() instead only because kernel
doesn't have thp.
+ Matthew Wilcox

On 02/19/2019 11:02 AM, Yu Zhao wrote:
> On Tue, Feb 19, 2019 at 09:51:01AM +0530, Anshuman Khandual wrote:
>> On 02/19/2019 04:43 AM, Yu Zhao wrote:
>>> For pte page, use pgtable_page_ctor(); for pmd page, use
>>> pgtable_pmd_page_ctor() if not folded; and for the rest (pud,
>>> p4d and pgd), don't use any.
>> pgtable_page_ctor()/dtor() is not optional for any level page table page
>> as it determines the struct page state and zone statistics.
>
> This is not true. pgtable_page_ctor() is only meant for user pte
> page. The name isn't perfect (we named it this way before we had
> split pmd page table lock, and never bothered to change it).
>
> The commit cccd843f54be ("mm: mark pages in use for page tables")
> clearly states so:
>   Note that only pages currently accounted as NR_PAGETABLES are
>   tracked as PageTable; this does not include pgd/p4d/pud/pmd pages.

I think the commit is the following one and it does say so. But what is
the rationale of tagging only PTE page as PageTable and updating the zone
stat but not doing so for higher level page table pages? Aren't they
used as page table pages? Shouldn't they count towards NR_PAGETABLE?

1d40a5ea01d53251c ("mm: mark pages in use for page tables")

>
> I'm sure if we go back further, we can find similar stories: we
> don't set PageTable on page tables other than pte; and we don't
> account page tables other than pte. I don't have any objection if
> you want change these two. But please make sure they are consistent
> across all archs.

pgtable_page_ctor/dtor() use across arch is not consistent and there is a need
for generalization which has been already acknowledged earlier. But for now we
can at least fix this on arm64.

https://lore.kernel.org/lkml/1547619692-7946-1-git-send-email-anshuman.khandual@arm.com/

>
>> We should not skip it for any page table page.
>
> In fact, calling it on pmd/pud/p4d is peculiar, and may even be
> considered wrong. AFAIK, no other arch does so.

Why would it be considered wrong? IIUC archs have their own understanding
of this and there are different implementations. But doing something for
PTE page and skipping for others is plain inconsistent.

>
>> As stated before pgtable_pmd_page_ctor() is not a replacement for
>> pgtable_page_ctor().
>
> pgtable_pmd_page_ctor() must be used on user pmd. For kernel pmd,
> it's okay to use pgtable_page_ctor() instead only because kernel
> doesn't have thp.

The only extra thing to be done for THP is initializing page->pmd_huge_pte
apart from calling pgtable_page_ctor(). Right now it just works on arm64,
maybe because page->pmd_huge_pte never gets accessed before its init and
no path checks for it when not THP. It's better to init/reset pmd_huge_pte.
On Tue, Feb 19, 2019 at 11:47:12AM +0530, Anshuman Khandual wrote:
> + Matthew Wilcox
>
> On 02/19/2019 11:02 AM, Yu Zhao wrote:
> > On Tue, Feb 19, 2019 at 09:51:01AM +0530, Anshuman Khandual wrote:
> >> On 02/19/2019 04:43 AM, Yu Zhao wrote:
> >>> For pte page, use pgtable_page_ctor(); for pmd page, use
> >>> pgtable_pmd_page_ctor() if not folded; and for the rest (pud,
> >>> p4d and pgd), don't use any.
> >> pgtable_page_ctor()/dtor() is not optional for any level page table page
> >> as it determines the struct page state and zone statistics.
> >
> > This is not true. pgtable_page_ctor() is only meant for user pte
> > page. The name isn't perfect (we named it this way before we had
> > split pmd page table lock, and never bothered to change it).
> >
> > The commit cccd843f54be ("mm: mark pages in use for page tables")
> > clearly states so:
> >   Note that only pages currently accounted as NR_PAGETABLES are
> >   tracked as PageTable; this does not include pgd/p4d/pud/pmd pages.
>
> I think the commit is the following one and it does say so. But what is
> the rationale of tagging only PTE page as PageTable and updating the zone
> stat but not doing so for higher level page table pages ? Are not they
> used as page table pages ? Should not they count towards NR_PAGETABLE ?
>
> 1d40a5ea01d53251c ("mm: mark pages in use for page tables")

Well, I was just trying to clarify how the ctor is meant to be used.
The rationale behind it is probably another topic.

For starters, the number of pmd/pud/p4d/pgd is at least two orders
of magnitude less than the number of pte, which makes them almost
negligible. And some archs use kmem for them, so it's infeasible to
SetPageTable on or account them in the way the ctor does on those
archs.

But, as I said, it's not something that can't be changed. It's just not
the concern of this patch.

> >
> > I'm sure if we go back further, we can find similar stories: we
> > don't set PageTable on page tables other than pte; and we don't
> > account page tables other than pte. I don't have any objection if
> > you want change these two. But please make sure they are consistent
> > across all archs.
>
> pgtable_page_ctor/dtor() use across arch is not consistent and there is a need
> for generalization which has been already acknowledged earlier. But for now we
> can atleast fix this on arm64.
>
> https://lore.kernel.org/lkml/1547619692-7946-1-git-send-email-anshuman.khandual@arm.com/

This is again not true. Please stop making claims not backed up by
facts. And the link is completely irrelevant to the ctor.

I just checked *all* arches. Only four arches call the ctor outside
pte_alloc_one(). They are arm, arm64, ppc and s390. The last two do
so not because they want to SetPageTable on or account pmd/pud/p4d/
pgd, but because they have to work around something, as arm/arm64
do.

> >
> >> We should not skip it for any page table page.
> >
> > In fact, calling it on pmd/pud/p4d is peculiar, and may even be
> > considered wrong. AFAIK, no other arch does so.
>
> Why would it be considered wrong ? IIUC archs have their own understanding
> of this and there are different implementations. But doing something for
> PTE page and skipping for others is plain inconsistent.

Allocating memory that will never be used is wrong. Please look into
the ctor and find out what exactly it does under different configs.

And why did I say "may"? Because we know there is only a negligible number
of pmd/pud/p4d, so the memory allocated may be considered negligible
as well.

> >
> >> As stated before pgtable_pmd_page_ctor() is not a replacement for
> >> pgtable_page_ctor().
> >
> > pgtable_pmd_page_ctor() must be used on user pmd. For kernel pmd,
> > it's okay to use pgtable_page_ctor() instead only because kernel
> > doesn't have thp.
>
> The only extra thing to be done for THP is initializing page->pmd_huge_pte
> apart from calling pgtable_page_ctor(). Right not it just works on arm64
> may be because page->pmd_huge_pte never gets accessed before it's init and
> no path checks for it when not THP. Its better to init/reset pmd_huge_pte.

This is not the reason. Arm64 gets by with calling pgtable_page_ctor() on pmd because it only does so on efi_mm. efi_mm is not user mm, therefore doesn't involve thp.
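The "at least two orders of magnitude" estimate above can be checked with a little arithmetic. The sketch below assumes an arm64-style geometry of 4 KiB pages and 8-byte descriptors (so 512 entries per table page), and a fully populated, aligned region mapped with base pages only (no block mappings); the `demo_*` names are hypothetical. Under those assumptions each level needs 512 times fewer table pages than the level below it, roughly 2.7 orders of magnitude.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed geometry (illustrative only): 4 KiB pages, 8-byte entries. */
#define DEMO_PAGE_SHIFT 12
#define DEMO_LEVEL_BITS 9 /* log2(4096 / 8): 512 entries per table page */

enum demo_level { DEMO_PTE = 0, DEMO_PMD = 1, DEMO_PUD = 2 };

/* Bytes of virtual address space covered by one table page at @level. */
static uint64_t demo_table_span(enum demo_level level)
{
	return UINT64_C(1) << (DEMO_PAGE_SHIFT + DEMO_LEVEL_BITS * (level + 1));
}

/*
 * Table pages needed at @level to map @bytes, assuming the region starts
 * on a span boundary and every page is mapped with a base-page PTE.
 */
static uint64_t demo_tables_needed(uint64_t bytes, enum demo_level level)
{
	uint64_t span = demo_table_span(level);

	assert(bytes <= UINT64_MAX - span); /* keep the round-up from overflowing */
	return (bytes + span - 1) / span;
}
```

For a 4 GiB region this gives 2048 PTE table pages, 4 PMD table pages and 1 PUD table page.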
On Tue, Feb 19, 2019 at 11:47:12AM +0530, Anshuman Khandual wrote:
> + Matthew Wilcox
> On 02/19/2019 11:02 AM, Yu Zhao wrote:
> > On Tue, Feb 19, 2019 at 09:51:01AM +0530, Anshuman Khandual wrote:
> >> On 02/19/2019 04:43 AM, Yu Zhao wrote:
> >>> For pte page, use pgtable_page_ctor(); for pmd page, use
> >>> pgtable_pmd_page_ctor() if not folded; and for the rest (pud,
> >>> p4d and pgd), don't use any.
> >> pgtable_page_ctor()/dtor() is not optional for any level page table page
> >> as it determines the struct page state and zone statistics.
> >
> > This is not true. pgtable_page_ctor() is only meant for user pte
> > page. The name isn't perfect (we named it this way before we had
> > split pmd page table lock, and never bothered to change it).
> >
> > The commit cccd843f54be ("mm: mark pages in use for page tables")

Where did you get that commit ID from? In Linus' tree, it's
1d40a5ea01d53251c23c7be541d3f4a656cfc537

> > clearly states so:
> >   Note that only pages currently accounted as NR_PAGETABLES are
> >   tracked as PageTable; this does not include pgd/p4d/pud/pmd pages.
>
> I think the commit is the following one and it does say so. But what is
> the rationale of tagging only PTE page as PageTable and updating the zone
> stat but not doing so for higher level page table pages ? Are not they
> used as page table pages ? Should not they count towards NR_PAGETABLE ?
>
> 1d40a5ea01d53251c ("mm: mark pages in use for page tables")

I think they should all be accounted towards NR_PAGETABLE and marked
as being PageTable. Somebody needs to make the case for that and
send the patches. That patch even says that there should be follow-up
patches to do that. I've been a little busy and haven't got back to it.
I thought you said you were going to do it.

> pgtable_page_ctor/dtor() use across arch is not consistent and there is a need
> for generalization which has been already acknowledged earlier. But for now we
> can atleast fix this on arm64.
>
> https://lore.kernel.org/lkml/1547619692-7946-1-git-send-email-anshuman.khandual@arm.com/

... were you not listening when you were told that was completely
inadequate?
On 02/20/2019 07:04 AM, Matthew Wilcox wrote:
> On Tue, Feb 19, 2019 at 11:47:12AM +0530, Anshuman Khandual wrote:
>> + Matthew Wilcox
>> On 02/19/2019 11:02 AM, Yu Zhao wrote:
>>> On Tue, Feb 19, 2019 at 09:51:01AM +0530, Anshuman Khandual wrote:
>>>> On 02/19/2019 04:43 AM, Yu Zhao wrote:
>>>>> For pte page, use pgtable_page_ctor(); for pmd page, use
>>>>> pgtable_pmd_page_ctor() if not folded; and for the rest (pud,
>>>>> p4d and pgd), don't use any.
>>>> pgtable_page_ctor()/dtor() is not optional for any level page table page
>>>> as it determines the struct page state and zone statistics.
>>>
>>> This is not true. pgtable_page_ctor() is only meant for user pte
>>> page. The name isn't perfect (we named it this way before we had
>>> split pmd page table lock, and never bothered to change it).
>>>
>>> The commit cccd843f54be ("mm: mark pages in use for page tables")
>
> Where did you get that commit ID from? In Linus' tree, it's
> 1d40a5ea01d53251c23c7be541d3f4a656cfc537
>
>>> clearly states so:
>>>   Note that only pages currently accounted as NR_PAGETABLES are
>>>   tracked as PageTable; this does not include pgd/p4d/pud/pmd pages.
>>
>> I think the commit is the following one and it does say so. But what is
>> the rationale of tagging only PTE page as PageTable and updating the zone
>> stat but not doing so for higher level page table pages ? Are not they
>> used as page table pages ? Should not they count towards NR_PAGETABLE ?
>>
>> 1d40a5ea01d53251c ("mm: mark pages in use for page tables")
>
> I think they should all be accounted towards NR_PAGETABLE and marked
> as being PageTable. Somebody needs to make the case for that and

Okay, so we agree on the applicability part.

> send the patches. That patch even says that there should be follow-up
> patches to do that. I've been a little busy and haven't got back to it.
> I thought you said you were going to do it.

This is very much arch specific. pgtable_page_ctor()/dtor() are not uniformly called for all page table level allocations (user or kernel) across different archs. Yes, I am planning to make generic page table allocation functions for all levels which archs can choose to use. But for now I have a series to fix the situation on arm64.

>
>> pgtable_page_ctor/dtor() use across arch is not consistent and there is a need
>> for generalization which has been already acknowledged earlier. But for now we
>> can atleast fix this on arm64.
>>
>> https://lore.kernel.org/lkml/1547619692-7946-1-git-send-email-anshuman.khandual@arm.com/
>
> ... were you not listening when you were told that was completely
> inadequate?

Agreed. The discussion on the thread made it clear that the above patch was inadequate. What I was trying to point out (probably not very clearly) is that there is a need for larger generalization/consolidation on the page table page allocation front, including but not necessarily limited to an allocation flag for user/kernel page tables, standard allocation functions, etc. The very idea of quoting the above URL here was to bring attention to the fact that different archs are already doing these allocations differently.
On 02/20/2019 03:58 AM, Yu Zhao wrote:
> On Tue, Feb 19, 2019 at 11:47:12AM +0530, Anshuman Khandual wrote:
>> + Matthew Wilcox
>>
>> On 02/19/2019 11:02 AM, Yu Zhao wrote:
>>> On Tue, Feb 19, 2019 at 09:51:01AM +0530, Anshuman Khandual wrote:
>>>> On 02/19/2019 04:43 AM, Yu Zhao wrote:
>>>>> For pte page, use pgtable_page_ctor(); for pmd page, use
>>>>> pgtable_pmd_page_ctor() if not folded; and for the rest (pud,
>>>>> p4d and pgd), don't use any.
>>>> pgtable_page_ctor()/dtor() is not optional for any level page table page
>>>> as it determines the struct page state and zone statistics.
>>>
>>> This is not true. pgtable_page_ctor() is only meant for user pte
>>> page. The name isn't perfect (we named it this way before we had
>>> split pmd page table lock, and never bothered to change it).
>>>
>>> The commit cccd843f54be ("mm: mark pages in use for page tables")
>>> clearly states so:
>>>   Note that only pages currently accounted as NR_PAGETABLES are
>>>   tracked as PageTable; this does not include pgd/p4d/pud/pmd pages.
>>
>> I think the commit is the following one and it does say so. But what is
>> the rationale of tagging only PTE page as PageTable and updating the zone
>> stat but not doing so for higher level page table pages ? Are not they
>> used as page table pages ? Should not they count towards NR_PAGETABLE ?
>>
>> 1d40a5ea01d53251c ("mm: mark pages in use for page tables")
>
> Well, I was just trying to clarify how the ctor is meant to be used.
> The rational behind it is probably another topic.
>
> For starters, the number of pmd/pud/p4d/pgd is at least two orders
> of magnitude less than the number of pte, which makes them almost
> negligible. And some archs use kmem for them, so it's infeasible to
> SetPageTable on or account them in the way the ctor does on those
> archs.

I understand the kmem cases which are definitely problematic and should
be fixed. IIRC there is a mechanism to custom init pages allocated for
slab cache with a ctor function which in turn can call pgtable_page_ctor().
But destructor helper support for slab has been dropped I guess.

> But, as I said, it's not something can't be changed. It's just not
> the concern of this patch.

Using pgtable_pmd_page_ctor() during PMD level pgtable page allocation
as suggested in the patch breaks pmd_alloc_one() changes as per the
previous proposal. Hence we all would need some agreement here.

https://www.spinics.net/lists/arm-kernel/msg701960.html

We can still accommodate the split PMD ptlock feature in pmd_alloc_one().
A possible solution, on top of the previous series, could look like this:

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index a4168d366127..c02abb2a69f7 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -9,6 +9,7 @@ config ARM64
 	select ACPI_SPCR_TABLE if ACPI
 	select ACPI_PPTT if ACPI
 	select ARCH_CLOCKSOURCE_DATA
+	select ARCH_ENABLE_SPLIT_PMD_PTLOCK if HAVE_ARCH_TRANSPARENT_HUGEPAGE
 	select ARCH_HAS_DEBUG_VIRTUAL
 	select ARCH_HAS_DEVMEM_IS_ALLOWED
 	select ARCH_HAS_DMA_COHERENT_TO_PFN
diff --git a/arch/arm64/include/asm/pgalloc.h b/arch/arm64/include/asm/pgalloc.h
index a02a4d1d967d..258e09fb3ce2 100644
--- a/arch/arm64/include/asm/pgalloc.h
+++ b/arch/arm64/include/asm/pgalloc.h
@@ -37,13 +37,29 @@ static inline void pte_free(struct mm_struct *mm, pgtable_t pte);
 
 static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
 {
-	return (pmd_t *)pte_alloc_one_virt(mm);
+	pgtable_t ptr;
+
+	ptr = pte_alloc_one(mm);
+	if (!ptr)
+		return 0;
+
+#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && USE_SPLIT_PMD_PTLOCKS
+	ptr->pmd_huge_pte = NULL;
+#endif
+	return (pmd_t *)page_to_virt(ptr);
 }
 
 static inline void pmd_free(struct mm_struct *mm, pmd_t *pmdp)
 {
+	struct page *page;
+
 	BUG_ON((unsigned long)pmdp & (PAGE_SIZE-1));
-	pte_free(mm, virt_to_page(pmdp));
+	page = virt_to_page(pmdp);
+
+#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && USE_SPLIT_PMD_PTLOCKS
+	VM_BUG_ON_PAGE(page->pmd_huge_pte, page);
+#endif
+	pte_free(mm, page);
 }

>
>>> I'm sure if we go back further, we can find similar stories: we
>>> don't set PageTable on page tables other than pte; and we don't
>>> account page tables other than pte. I don't have any objection if
>>> you want change these two. But please make sure they are consistent
>>> across all archs.
>>
>> pgtable_page_ctor/dtor() use across arch is not consistent and there is a need
>> for generalization which has been already acknowledged earlier. But for now we
>> can atleast fix this on arm64.
>>
>> https://lore.kernel.org/lkml/1547619692-7946-1-git-send-email-anshuman.khandual@arm.com/
>
> This is again not true. Please stop making claims not backed up by
> facts. And the link is completely irrelevant to the ctor.
>
> I just checked *all* arches. Only four arches call the ctor outside
> pte_alloc_one(). They are arm, arm64, ppc and s390. The last two do
> so not because they want to SetPageTable on or account pmd/pud/p4d/
> pgd, but because they have to work around something, as arm/arm64
> do.

That reaffirms the fact that pgtable_page_ctor()/dtor() are not being used
in a consistent manner.

>
>>>> We should not skip it for any page table page.
>>>
>>> In fact, calling it on pmd/pud/p4d is peculiar, and may even be
>>> considered wrong. AFAIK, no other arch does so.
>>
>> Why would it be considered wrong ? IIUC archs have their own understanding
>> of this and there are different implementations. But doing something for
>> PTE page and skipping for others is plain inconsistent.
>
> Allocating memory that will never be used is wrong. Please look into
> the ctor and find out what exactly it does under different configs.

Are you referring to ptlock_init() --> ptlock_alloc() triggered spinlock_t
allocations with USE_SPLIT_PTE_PTLOCKS and ALLOC_SPLIT_PTLOCKS?

>
> And why I said "may"? Because we know there is only negligible number
> of pmd/pud/p4d, so the memory allocated may be considered negligible
> as well.

Okay.
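To make "find out what exactly it does under different configs" concrete, here is a userspace model of the three things the PTE constructor is understood to do around this time: initialize the split page-table lock (a separate allocation when ALLOC_SPLIT_PTLOCKS applies), mark the page as PageTable, and bump the NR_PAGETABLE counter. Everything here (`mock_page`, `DEMO_ALLOC_SPLIT_PTLOCKS`, `live_ptlocks`) is a hypothetical stand-in, not kernel API. It only illustrates why running the ctor on every table level can cost one extra lock allocation per table page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical switch standing in for ALLOC_SPLIT_PTLOCKS: 1 means the lock
 * does not fit in the page descriptor and must be allocated separately.
 */
#ifndef DEMO_ALLOC_SPLIT_PTLOCKS
#define DEMO_ALLOC_SPLIT_PTLOCKS 1
#endif

#define DEMO_PG_TABLE 0x1UL /* stands in for the PageTable page type */

typedef struct { int locked; } demo_spinlock_t;

struct mock_page {
	unsigned long flags;
#if DEMO_ALLOC_SPLIT_PTLOCKS
	demo_spinlock_t *ptl; /* lock lives in a separate allocation */
#else
	demo_spinlock_t ptl;  /* lock embedded in the page descriptor */
#endif
};

static long nr_pagetable; /* stands in for the NR_PAGETABLE counter */
static long live_ptlocks; /* separately allocated locks still alive */

static bool demo_ptlock_init(struct mock_page *page)
{
#if DEMO_ALLOC_SPLIT_PTLOCKS
	page->ptl = malloc(sizeof(*page->ptl));
	if (!page->ptl)
		return false;
	live_ptlocks++;
	page->ptl->locked = 0;
#else
	page->ptl.locked = 0;
#endif
	return true;
}

/* Model of the ctor: lock init (maybe an allocation), page type, accounting. */
static bool demo_pgtable_page_ctor(struct mock_page *page)
{
	assert(page != NULL);
	if (!demo_ptlock_init(page))
		return false;
	page->flags |= DEMO_PG_TABLE;
	nr_pagetable++;
	return true;
}

/* Model of the dtor: undo everything the ctor did. */
static void demo_pgtable_page_dtor(struct mock_page *page)
{
#if DEMO_ALLOC_SPLIT_PTLOCKS
	free(page->ptl);
	page->ptl = NULL;
	live_ptlocks--;
#endif
	page->flags &= ~DEMO_PG_TABLE;
	nr_pagetable--;
}
```

With the split-lock configuration, every ctor call on a pmd/pud/p4d page would add one more entry to `live_ptlocks` even if nothing ever takes that lock.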
On Wed, Feb 20, 2019 at 03:57:59PM +0530, Anshuman Khandual wrote:
> On 02/20/2019 03:58 AM, Yu Zhao wrote:
> > On Tue, Feb 19, 2019 at 11:47:12AM +0530, Anshuman Khandual wrote:
> >> On 02/19/2019 11:02 AM, Yu Zhao wrote:
> >>> On Tue, Feb 19, 2019 at 09:51:01AM +0530, Anshuman Khandual wrote:
> >>>> On 02/19/2019 04:43 AM, Yu Zhao wrote:
> >>>>> For pte page, use pgtable_page_ctor(); for pmd page, use
> >>>>> pgtable_pmd_page_ctor() if not folded; and for the rest (pud,
> >>>>> p4d and pgd), don't use any.
> >>>> pgtable_page_ctor()/dtor() is not optional for any level page table page
> >>>> as it determines the struct page state and zone statistics.
> >>>
> >>> This is not true. pgtable_page_ctor() is only meant for user pte
> >>> page. The name isn't perfect (we named it this way before we had
> >>> split pmd page table lock, and never bothered to change it).
> >>>
> >>> The commit cccd843f54be ("mm: mark pages in use for page tables")
> >>> clearly states so:
> >>>   Note that only pages currently accounted as NR_PAGETABLES are
> >>>   tracked as PageTable; this does not include pgd/p4d/pud/pmd pages.
> >>
> >> I think the commit is the following one and it does say so. But what is
> >> the rationale of tagging only PTE page as PageTable and updating the zone
> >> stat but not doing so for higher level page table pages ? Are not they
> >> used as page table pages ? Should not they count towards NR_PAGETABLE ?
> >>
> >> 1d40a5ea01d53251c ("mm: mark pages in use for page tables")
> >
> > Well, I was just trying to clarify how the ctor is meant to be used.
> > The rational behind it is probably another topic.
> >
> > For starters, the number of pmd/pud/p4d/pgd is at least two orders
> > of magnitude less than the number of pte, which makes them almost
> > negligible. And some archs use kmem for them, so it's infeasible to
> > SetPageTable on or account them in the way the ctor does on those
> > archs.
>
> I understand the kmem cases which are definitely problematic and should
> be fixed. IIRC there is a mechanism to custom init pages allocated for
> slab cache with a ctor function which in turn can call pgtable_page_ctor().
> But destructor helper support for slab has been dropped I guess.

You can't put a spinlock in the struct page if the page is allocated through slab. Slab uses basically all of struct page for its own purposes. I tried to make that clear with the new layout of struct page where everything's in a union discriminated by what the page is allocated for.
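Matthew's point about slab-owned pages can be illustrated with a toy descriptor. The sketch below is a hypothetical, heavily simplified stand-in for `struct page` (the real layout is larger and different); it only models the idea of one union whose meaning depends on what the page is allocated for. Because the slab bookkeeping and the page-table lock word share storage, initializing a "lock" in a slab-owned page overwrites slab metadata.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical, heavily simplified page descriptor. Only the "one union,
 * discriminated by what the page is used for" idea is modeled here.
 */
struct demo_page {
	unsigned long flags;
	union {
		struct {                    /* page owned by a slab cache */
			void *slab_cache;
			void *freelist;
			unsigned long counters; /* slab bookkeeping */
		} slab;
		struct {                    /* page used as a page table */
			void *pt_unused;
			void *pmd_huge_pte;
			unsigned long ptl;      /* split page-table lock word */
		} pt;
	};
};

/* Initializing a "lock" in a slab-owned page overwrites slab metadata. */
static unsigned long demo_clobber_slab_counters(void)
{
	struct demo_page page = { 0 };

	page.slab.counters = 42;   /* slab's own bookkeeping */
	page.pt.ptl = 0;           /* what a page-table ctor would write */
	return page.slab.counters; /* same storage, so it is no longer 42 */
}
```

Both third members sit at the same offset, so the union gives no room for a page-table lock without disturbing slab state.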
On Wed, Feb 20, 2019 at 03:57:59PM +0530, Anshuman Khandual wrote:
>
> On 02/20/2019 03:58 AM, Yu Zhao wrote:
> > On Tue, Feb 19, 2019 at 11:47:12AM +0530, Anshuman Khandual wrote:
> >> + Matthew Wilcox
> >>
> >> On 02/19/2019 11:02 AM, Yu Zhao wrote:
> >>> On Tue, Feb 19, 2019 at 09:51:01AM +0530, Anshuman Khandual wrote:
> >>>> On 02/19/2019 04:43 AM, Yu Zhao wrote:
> >>>>> For pte page, use pgtable_page_ctor(); for pmd page, use
> >>>>> pgtable_pmd_page_ctor() if not folded; and for the rest (pud,
> >>>>> p4d and pgd), don't use any.
> >>>> pgtable_page_ctor()/dtor() is not optional for any level page table page
> >>>> as it determines the struct page state and zone statistics.
> >>>
> >>> This is not true. pgtable_page_ctor() is only meant for user pte
> >>> page. The name isn't perfect (we named it this way before we had
> >>> split pmd page table lock, and never bothered to change it).
> >>>
> >>> The commit cccd843f54be ("mm: mark pages in use for page tables")
> >>> clearly states so:
> >>>   Note that only pages currently accounted as NR_PAGETABLES are
> >>>   tracked as PageTable; this does not include pgd/p4d/pud/pmd pages.
> >>
> >> I think the commit is the following one and it does say so. But what is
> >> the rationale of tagging only PTE page as PageTable and updating the zone
> >> stat but not doing so for higher level page table pages ? Are not they
> >> used as page table pages ? Should not they count towards NR_PAGETABLE ?
> >>
> >> 1d40a5ea01d53251c ("mm: mark pages in use for page tables")
> >
> > Well, I was just trying to clarify how the ctor is meant to be used.
> > The rational behind it is probably another topic.
> >
> > For starters, the number of pmd/pud/p4d/pgd is at least two orders
> > of magnitude less than the number of pte, which makes them almost
> > negligible. And some archs use kmem for them, so it's infeasible to
> > SetPageTable on or account them in the way the ctor does on those
> > archs.
>
> I understand the kmem cases which are definitely problematic and should
> be fixed. IIRC there is a mechanism to custom init pages allocated for
> slab cache with a ctor function which in turn can call pgtable_page_ctor().
> But destructor helper support for slab has been dropped I guess.
>
> > But, as I said, it's not something can't be changed. It's just not
> > the concern of this patch.
>
> Using pgtable_pmd_page_ctor() during PMD level pgtable page allocation
> as suggested in the patch breaks pmd_alloc_one() changes as per the
> previous proposal. Hence we all would need some agreement here.
>
> https://www.spinics.net/lists/arm-kernel/msg701960.html

A proposal that requires all page tables to go through the same set of
ctors on all archs is not only inefficient (for kernel page tables)
but also infeasible (for arches that use kmem for page tables). I've
explained this clearly.

The generalized page table functions must recognize the differences
on different levels and between user and kernel page tables, and
provide unified api that is capable of handling the differences.

The change below is not helping at all.

>
> We can still accommodate the split PMD ptlock feature in pmd_alloc_one().
> A possible solution can be like this above and over the previous series.
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index a4168d366127..c02abb2a69f7 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -9,6 +9,7 @@ config ARM64
>  	select ACPI_SPCR_TABLE if ACPI
>  	select ACPI_PPTT if ACPI
>  	select ARCH_CLOCKSOURCE_DATA
> +	select ARCH_ENABLE_SPLIT_PMD_PTLOCK if HAVE_ARCH_TRANSPARENT_HUGEPAGE
>  	select ARCH_HAS_DEBUG_VIRTUAL
>  	select ARCH_HAS_DEVMEM_IS_ALLOWED
>  	select ARCH_HAS_DMA_COHERENT_TO_PFN
> diff --git a/arch/arm64/include/asm/pgalloc.h b/arch/arm64/include/asm/pgalloc.h
> index a02a4d1d967d..258e09fb3ce2 100644
> --- a/arch/arm64/include/asm/pgalloc.h
> +++ b/arch/arm64/include/asm/pgalloc.h
> @@ -37,13 +37,29 @@ static inline void pte_free(struct mm_struct *mm, pgtable_t pte);
>  
>  static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
>  {
> -	return (pmd_t *)pte_alloc_one_virt(mm);
> +	pgtable_t ptr;
> +
> +	ptr = pte_alloc_one(mm);
> +	if (!ptr)
> +		return 0;
> +
> +#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && USE_SPLIT_PMD_PTLOCKS
> +	ptr->pmd_huge_pte = NULL;
> +#endif
> +	return (pmd_t *)page_to_virt(ptr);
>  }
>  
>  static inline void pmd_free(struct mm_struct *mm, pmd_t *pmdp)
>  {
> +	struct page *page;
> +
>  	BUG_ON((unsigned long)pmdp & (PAGE_SIZE-1));
> -	pte_free(mm, virt_to_page(pmdp));
> +	page = virt_to_page(pmdp);
> +
> +#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && USE_SPLIT_PMD_PTLOCKS
> +	VM_BUG_ON_PAGE(page->pmd_huge_pte, page);
> +#endif
> +	pte_free(mm, page);
>  }
>
> >>>
> >>> I'm sure if we go back further, we can find similar stories: we
> >>> don't set PageTable on page tables other than pte; and we don't
> >>> account page tables other than pte. I don't have any objection if
> >>> you want change these two. But please make sure they are consistent
> >>> across all archs.
> >>
> >> pgtable_page_ctor/dtor() use across arch is not consistent and there is a need
> >> for generalization which has been already acknowledged earlier. But for now we
> >> can atleast fix this on arm64.
> >>
> >> https://lore.kernel.org/lkml/1547619692-7946-1-git-send-email-anshuman.khandual@arm.com/
> >
> > This is again not true. Please stop making claims not backed up by
> > facts. And the link is completely irrelevant to the ctor.
> >
> > I just checked *all* arches. Only four arches call the ctor outside
> > pte_alloc_one(). They are arm, arm64, ppc and s390. The last two do
> > so not because they want to SetPageTable on or account pmd/pud/p4d/
> > pgd, but because they have to work around something, as arm/arm64
> > do.
>
> That reaffirms the fact that pgtable_page_ctor()/dtor() are getting used
> not in a consistent manner.

Now it's getting absurd. I'll just stop before this turns into complete nonsense.
On Wed, Feb 20, 2019 at 01:22:44PM -0700, Yu Zhao wrote:
> On Wed, Feb 20, 2019 at 03:57:59PM +0530, Anshuman Khandual wrote:
> > Using pgtable_pmd_page_ctor() during PMD level pgtable page allocation
> > as suggested in the patch breaks pmd_alloc_one() changes as per the
> > previous proposal. Hence we all would need some agreement here.
> >
> > https://www.spinics.net/lists/arm-kernel/msg701960.html
>
> A proposal that requires all page tables to go through a same set of
> ctors on all archs is not only inefficient (for kernel page tables)
> but also infeasible (for arches use kmem for page tables). I've
> explained this clearly.
>
> The generalized page table functions must recognize the differences
> on different levels and between user and kernel page tables, and
> provide unified api that is capable of handling the differences.

The two architectures I'm aware of (s390 and power) which use sub-page allocations for page tables do so by allocating entire pages and then implementing their own allocators. It shouldn't be a huge problem to use a ctor for the pages. We can probably even implement a dtor for them.

Oh, another corner-case I've just remembered is x86-32's PAE with four 8-byte entries in the PGD. That should also go away and be replaced with a shared implementation of sub-page allocations which can also be marked as PageTable.

Ideally PTEs, PMDs, etc, etc would all be accounted to the individual processes causing them to be allocated. This isn't really feasible with the x86 PGD; by definition there's only one per process. I'm OK with failing to account this 32-byte allocation to the task though. So maybe the pgd_cache can remain separate from the hypothetical unified ppc/s390 code.
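The sub-page scheme described above (allocate a whole page, then carve it up) is what would let a per-page ctor still apply. Below is a minimal sketch of such a fragment allocator, assuming fixed-size fragments in a single 4 KiB backing page, for example 32 bytes for a PAE-style PGD of four 8-byte entries. It is a generic illustration with hypothetical names, not the s390 or powerpc implementation, which also handle multiple pages, locking and reuse policy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_PAGE_SIZE 4096u

/* Hypothetical fragment allocator: one whole backing page cut into
 * equal-sized fragments. */
struct frag_page {
	unsigned char *mem;  /* the backing page (page alignment omitted here) */
	size_t frag_size;
	size_t nr_frags;
	unsigned char *used; /* one in-use flag per fragment */
	bool ctor_ran;       /* where a whole-page ctor/accounting would hook in */
};

static int frag_page_init(struct frag_page *fp, size_t frag_size)
{
	assert(frag_size > 0 && DEMO_PAGE_SIZE % frag_size == 0);
	fp->mem = malloc(DEMO_PAGE_SIZE);
	fp->frag_size = frag_size;
	fp->nr_frags = DEMO_PAGE_SIZE / frag_size;
	fp->used = calloc(fp->nr_frags, 1);
	if (!fp->mem || !fp->used) {
		free(fp->mem);
		free(fp->used);
		return -1;
	}
	/* The backing memory is one full page, so a page-level ctor
	 * (PageTable marking, accounting) could run exactly once here. */
	fp->ctor_ran = true;
	return 0;
}

static void *frag_alloc(struct frag_page *fp)
{
	for (size_t i = 0; i < fp->nr_frags; i++) {
		if (!fp->used[i]) {
			fp->used[i] = 1;
			return fp->mem + i * fp->frag_size;
		}
	}
	return NULL; /* page is full; a real allocator would add another page */
}

static void frag_free(struct frag_page *fp, void *p)
{
	size_t off = (size_t)((unsigned char *)p - fp->mem);
	size_t idx = off / fp->frag_size;

	assert(off % fp->frag_size == 0);
	assert(idx < fp->nr_frags && fp->used[idx]);
	fp->used[idx] = 0;
}

static void frag_page_destroy(struct frag_page *fp)
{
	free(fp->mem);
	free(fp->used);
	fp->ctor_ran = false; /* a matching whole-page dtor would run here */
}
```

With 32-byte fragments one page holds 4096 / 32 = 128 PGDs, and the ctor/dtor pair runs once per backing page rather than once per fragment.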
On Mon, Feb 18, 2019 at 10:32:05PM -0700, Yu Zhao wrote:
> pgtable_pmd_page_ctor() must be used on user pmd. For kernel pmd,
> it's okay to use pgtable_page_ctor() instead only because kernel
> doesn't have thp.

I'm not sure that's true. I think you can create THPs in vmalloc these days. See HAVE_ARCH_HUGE_VMAP which is supported by arm64.
Hi,

On Mon, Feb 18, 2019 at 04:13:17PM -0700, Yu Zhao wrote:
> For pte page, use pgtable_page_ctor(); for pmd page, use
> pgtable_pmd_page_ctor() if not folded; and for the rest (pud,
> p4d and pgd), don't use any.
>
> Signed-off-by: Yu Zhao <yuzhao@google.com>
> ---
>  arch/arm64/mm/mmu.c | 33 +++++++++++++++++++++------------
>  1 file changed, 21 insertions(+), 12 deletions(-)

[...]

> -static phys_addr_t pgd_pgtable_alloc(void)
> +static phys_addr_t pgd_pgtable_alloc(int shift)
>  {
>  	void *ptr = (void *)__get_free_page(PGALLOC_GFP);
> -	if (!ptr || !pgtable_page_ctor(virt_to_page(ptr)))
> -		BUG();
> +	BUG_ON(!ptr);
> +
> +	/*
> +	 * Initialize page table locks in case later we need to
> +	 * call core mm functions like apply_to_page_range() on
> +	 * this pre-allocated page table.
> +	 */
> +	if (shift == PAGE_SHIFT)
> +		BUG_ON(!pgtable_page_ctor(virt_to_page(ptr)));
> +	else if (shift == PMD_SHIFT && PMD_SHIFT != PUD_SHIFT)
> +		BUG_ON(!pgtable_pmd_page_ctor(virt_to_page(ptr)));

IIUC, this is for nopmd kernels, where we only have real PGD and PTE
levels of table. From my PoV, that would be clearer if we did:

	else if (shift == PMD_SHIFT && !is_defined(__PAGETABLE_PMD_FOLDED))

... though IMO it would be a bit nicer if the generic
pgtable_pmd_page_ctor() were nop'd out for __PAGETABLE_PMD_FOLDED
builds, so that callers don't have to be aware of folding.

I couldn't think of a nicer way of distinguishing levels of table, and
having separate function pointers for each level seems over-the-top, so
other than that this looks good to me.

Assuming you're happy with the above change:

Acked-by: Mark Rutland <mark.rutland@arm.com>

Thanks,
Mark.
On Tue, Feb 26, 2019 at 03:12:31PM +0000, Mark Rutland wrote: > Hi, > > On Mon, Feb 18, 2019 at 04:13:17PM -0700, Yu Zhao wrote: > > For pte page, use pgtable_page_ctor(); for pmd page, use > > pgtable_pmd_page_ctor() if not folded; and for the rest (pud, > > p4d and pgd), don't use any. > > > > Signed-off-by: Yu Zhao <yuzhao@google.com> > > --- > > arch/arm64/mm/mmu.c | 33 +++++++++++++++++++++------------ > > 1 file changed, 21 insertions(+), 12 deletions(-) > > [...] > > > -static phys_addr_t pgd_pgtable_alloc(void) > > +static phys_addr_t pgd_pgtable_alloc(int shift) > > { > > void *ptr = (void *)__get_free_page(PGALLOC_GFP); > > - if (!ptr || !pgtable_page_ctor(virt_to_page(ptr))) > > - BUG(); > > + BUG_ON(!ptr); > > + > > + /* > > + * Initialize page table locks in case later we need to > > + * call core mm functions like apply_to_page_range() on > > + * this pre-allocated page table. > > + */ > > + if (shift == PAGE_SHIFT) > > + BUG_ON(!pgtable_page_ctor(virt_to_page(ptr))); > > + else if (shift == PMD_SHIFT && PMD_SHIFT != PUD_SHIFT) > > + BUG_ON(!pgtable_pmd_page_ctor(virt_to_page(ptr))); > > IIUC, this is for nopmd kernels, where we only have real PGD and PTE > levels of table. From my PoV, that would be clearer if we did: > > else if (shift == PMD_SHIFT && !is_defined(__PAGETABLE_PMD_FOLDED)) > > ... though IMO it would be a bit nicer if the generic > pgtable_pmd_page_ctor() were nop'd out for __PAGETABLE_PMD_FOLDED > builds, so that callers don't have to be aware of folding. Agreed. Will make pgtable_pmd_page_ctor() nop when pmd is folded. > I couldn't think of a nicer way of distinguishing levels of table, and > having separate function pointers for each level seems over-the-top, so > other than that this looks good to me. > > Assuming you're happy with the above change: > > Acked-by: Mark Rutland <mark.rutland@arm.com> > > Thanks, > Mark.
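Mark Rutland's suggestion, which Yu Zhao accepts above, is to make pgtable_pmd_page_ctor() a no-op on folded-PMD builds, so that callers can dispatch on the table level alone. Below is a minimal userspace model of that design. For illustration it assumes arm64's two-level layout with 64 KiB pages and 42-bit VAs (PAGE_SHIFT 16, PMD_SHIFT == PUD_SHIFT == 29). `MODEL_PMD_FOLDED`, `model_pte_ctor()`, `model_pmd_ctor()` and `model_pgtable_ctor()` are hypothetical stand-ins, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Assumed nopmd configuration: the PMD level is folded into the PUD. */
#define MODEL_PMD_FOLDED 1
#define PAGE_SHIFT 16
#define PMD_SHIFT 29
#define PUD_SHIFT 29

static int pte_ctor_calls, pmd_ctor_calls;

static bool model_pte_ctor(void)
{
	pte_ctor_calls++;
	return true;
}

/* With a folded PMD there is no separate pmd page to initialize. */
static bool model_pmd_ctor(void)
{
#if MODEL_PMD_FOLDED
	return true;		/* no-op */
#else
	pmd_ctor_calls++;
	return true;
#endif
}

/* The caller no longer needs to know whether the PMD level is folded. */
static void model_pgtable_ctor(int shift)
{
	if (shift == PAGE_SHIFT) {
		if (!model_pte_ctor())
			abort();	/* stands in for BUG_ON() */
	} else if (shift == PMD_SHIFT) {
		if (!model_pmd_ctor())
			abort();
	}
}
```

When the PMD is folded, PMD_SHIFT equals PUD_SHIFT, so a request for a PUD-level table also takes the PMD branch. With a no-op ctor that is harmless, which is what lets the explicit `PMD_SHIFT != PUD_SHIFT` test be dropped from the caller.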
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index b6f5aa52ac67..fa7351877af3 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -98,7 +98,7 @@ pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
 }
 EXPORT_SYMBOL(phys_mem_access_prot);
 
-static phys_addr_t __init early_pgtable_alloc(void)
+static phys_addr_t __init early_pgtable_alloc(int shift)
 {
 	phys_addr_t phys;
 	void *ptr;
@@ -173,7 +173,7 @@ static void init_pte(pmd_t *pmdp, unsigned long addr, unsigned long end,
 static void alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr,
 				unsigned long end, phys_addr_t phys,
 				pgprot_t prot,
-				phys_addr_t (*pgtable_alloc)(void),
+				phys_addr_t (*pgtable_alloc)(int),
 				int flags)
 {
 	unsigned long next;
@@ -183,7 +183,7 @@ static void alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr,
 	if (pmd_none(pmd)) {
 		phys_addr_t pte_phys;
 		BUG_ON(!pgtable_alloc);
-		pte_phys = pgtable_alloc();
+		pte_phys = pgtable_alloc(PAGE_SHIFT);
 		__pmd_populate(pmdp, pte_phys, PMD_TYPE_TABLE);
 		pmd = READ_ONCE(*pmdp);
 	}
@@ -207,7 +207,7 @@ static void alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr,
 
 static void init_pmd(pud_t *pudp, unsigned long addr, unsigned long end,
 		     phys_addr_t phys, pgprot_t prot,
-		     phys_addr_t (*pgtable_alloc)(void), int flags)
+		     phys_addr_t (*pgtable_alloc)(int), int flags)
 {
 	unsigned long next;
 	pmd_t *pmdp;
@@ -245,7 +245,7 @@ static void init_pmd(pud_t *pudp, unsigned long addr, unsigned long end,
 static void alloc_init_cont_pmd(pud_t *pudp, unsigned long addr,
 				unsigned long end, phys_addr_t phys,
 				pgprot_t prot,
-				phys_addr_t (*pgtable_alloc)(void), int flags)
+				phys_addr_t (*pgtable_alloc)(int), int flags)
 {
 	unsigned long next;
 	pud_t pud = READ_ONCE(*pudp);
@@ -257,7 +257,7 @@ static void alloc_init_cont_pmd(pud_t *pudp, unsigned long addr,
 	if (pud_none(pud)) {
 		phys_addr_t pmd_phys;
 		BUG_ON(!pgtable_alloc);
-		pmd_phys = pgtable_alloc();
+		pmd_phys = pgtable_alloc(PMD_SHIFT);
 		__pud_populate(pudp, pmd_phys, PUD_TYPE_TABLE);
 		pud = READ_ONCE(*pudp);
 	}
@@ -293,7 +293,7 @@ static inline bool use_1G_block(unsigned long addr, unsigned long next,
 
 static void alloc_init_pud(pgd_t *pgdp, unsigned long addr, unsigned long end,
 			   phys_addr_t phys, pgprot_t prot,
-			   phys_addr_t (*pgtable_alloc)(void),
+			   phys_addr_t (*pgtable_alloc)(int),
 			   int flags)
 {
 	unsigned long next;
@@ -303,7 +303,7 @@ static void alloc_init_pud(pgd_t *pgdp, unsigned long addr, unsigned long end,
 	if (pgd_none(pgd)) {
 		phys_addr_t pud_phys;
 		BUG_ON(!pgtable_alloc);
-		pud_phys = pgtable_alloc();
+		pud_phys = pgtable_alloc(PUD_SHIFT);
 		__pgd_populate(pgdp, pud_phys, PUD_TYPE_TABLE);
 		pgd = READ_ONCE(*pgdp);
 	}
@@ -344,7 +344,7 @@ static void alloc_init_pud(pgd_t *pgdp, unsigned long addr, unsigned long end,
 static void __create_pgd_mapping(pgd_t *pgdir, phys_addr_t phys,
 				 unsigned long virt, phys_addr_t size,
 				 pgprot_t prot,
-				 phys_addr_t (*pgtable_alloc)(void),
+				 phys_addr_t (*pgtable_alloc)(int),
 				 int flags)
 {
 	unsigned long addr, length, end, next;
@@ -370,11 +370,20 @@ static void __create_pgd_mapping(pgd_t *pgdir, phys_addr_t phys,
 	} while (pgdp++, addr = next, addr != end);
 }
 
-static phys_addr_t pgd_pgtable_alloc(void)
+static phys_addr_t pgd_pgtable_alloc(int shift)
 {
 	void *ptr = (void *)__get_free_page(PGALLOC_GFP);
-	if (!ptr || !pgtable_page_ctor(virt_to_page(ptr)))
-		BUG();
+	BUG_ON(!ptr);
+
+	/*
+	 * Initialize page table locks in case later we need to
+	 * call core mm functions like apply_to_page_range() on
+	 * this pre-allocated page table.
+	 */
+	if (shift == PAGE_SHIFT)
+		BUG_ON(!pgtable_page_ctor(virt_to_page(ptr)));
+	else if (shift == PMD_SHIFT && PMD_SHIFT != PUD_SHIFT)
+		BUG_ON(!pgtable_pmd_page_ctor(virt_to_page(ptr)));
 
 	/* Ensure the zeroed page is visible to the page table walker */
 	dsb(ishst);
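The `shift` argument threaded through the patch identifies which level of table is being allocated. Callers pass PAGE_SHIFT for a PTE table, PMD_SHIFT for a PMD table and PUD_SHIFT for a PUD table. The branch logic of the new pgd_pgtable_alloc() can be checked in isolation with the userspace model below. It assumes a 4-level layout with 4 KiB pages (shifts 12/21/30). `enum ctor_kind` and `ctor_for_shift()` are hypothetical names that only report which ctor the patched code would call.

```c
#include <assert.h>

/* Assumed 4-level layout: 4 KiB pages, so shifts are 12/21/30. */
#define PAGE_SHIFT 12
#define PMD_SHIFT 21
#define PUD_SHIFT 30

enum ctor_kind { CTOR_NONE, CTOR_PTE, CTOR_PMD };

/* Mirrors the branch structure of the patched pgd_pgtable_alloc(). */
static enum ctor_kind ctor_for_shift(int shift)
{
	if (shift == PAGE_SHIFT)
		return CTOR_PTE;	/* pte page: pgtable_page_ctor() */
	else if (shift == PMD_SHIFT && PMD_SHIFT != PUD_SHIFT)
		return CTOR_PMD;	/* unfolded pmd page: pgtable_pmd_page_ctor() */
	return CTOR_NONE;		/* pud/p4d/pgd pages: no ctor */
}
```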
For pte page, use pgtable_page_ctor(); for pmd page, use
pgtable_pmd_page_ctor() if not folded; and for the rest (pud,
p4d and pgd), don't use any.

Signed-off-by: Yu Zhao <yuzhao@google.com>
---
 arch/arm64/mm/mmu.c | 33 +++++++++++++++++++++------------
 1 file changed, 21 insertions(+), 12 deletions(-)