
[PATCHv4,08/25] thp: support file pages in zap_huge_pmd()

Message ID 1457737157-38573-9-git-send-email-kirill.shutemov@linux.intel.com (mailing list archive)
State New, archived

Commit Message

Kirill A. Shutemov March 11, 2016, 10:59 p.m. UTC
split_huge_pmd() for file mappings (and DAX too) is implemented by just
clearing the PMD entry, as we can re-fill this area from the page cache at
the PTE level later.

This means we don't need to deposit page tables when a file THP is mapped.
Therefore we shouldn't try to withdraw a page table in zap_huge_pmd() for a
file THP PMD.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/huge_memory.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

Comments

Aneesh Kumar K.V March 18, 2016, 1:53 p.m. UTC | #1
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> writes:

> split_huge_pmd() for file mappings (and DAX too) is implemented by just
> clearing the PMD entry, as we can re-fill this area from the page cache at
> the PTE level later.
>
> This means we don't need to deposit page tables when a file THP is mapped.
> Therefore we shouldn't try to withdraw a page table in zap_huge_pmd() for a
> file THP PMD.

Archs like ppc64 use the deposited page table to track hardware page
table slot information. We may want to add hooks that archs can
use to achieve the same even with file THP.

>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  mm/huge_memory.c | 12 +++++++++---
>  1 file changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 44468fb7cdbf..c22144e3fe11 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1684,10 +1684,16 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
>  		struct page *page = pmd_page(orig_pmd);
>  		page_remove_rmap(page, true);
>  		VM_BUG_ON_PAGE(page_mapcount(page) < 0, page);
> -		add_mm_counter(tlb->mm, MM_ANONPAGES, -HPAGE_PMD_NR);
>  		VM_BUG_ON_PAGE(!PageHead(page), page);
> -		pte_free(tlb->mm, pgtable_trans_huge_withdraw(tlb->mm, pmd));
> -		atomic_long_dec(&tlb->mm->nr_ptes);
> +		if (PageAnon(page)) {
> +			pgtable_t pgtable;
> +			pgtable = pgtable_trans_huge_withdraw(tlb->mm, pmd);
> +			pte_free(tlb->mm, pgtable);
> +			atomic_long_dec(&tlb->mm->nr_ptes);
> +			add_mm_counter(tlb->mm, MM_ANONPAGES, -HPAGE_PMD_NR);
> +		} else {
> +			add_mm_counter(tlb->mm, MM_FILEPAGES, -HPAGE_PMD_NR);
> +		}
>  		spin_unlock(ptl);
>  		tlb_remove_page(tlb, page);
>  	}
> -- 
> 2.7.0
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Kirill A. Shutemov March 19, 2016, 1:02 a.m. UTC | #2
On Fri, Mar 18, 2016 at 07:23:41PM +0530, Aneesh Kumar K.V wrote:
> "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> writes:
> 
> > split_huge_pmd() for file mappings (and DAX too) is implemented by just
> > clearing the PMD entry, as we can re-fill this area from the page cache at
> > the PTE level later.
> >
> > This means we don't need to deposit page tables when a file THP is mapped.
> > Therefore we shouldn't try to withdraw a page table in zap_huge_pmd() for a
> > file THP PMD.
> 
> Archs like ppc64 use the deposited page table to track hardware page
> table slot information. We may want to add hooks that archs can
> use to achieve the same even with file THP.

Could you describe in more detail what kind of information you're talking about?
Aneesh Kumar K.V March 21, 2016, 4:33 a.m. UTC | #3
"Kirill A. Shutemov" <kirill@shutemov.name> writes:

> On Fri, Mar 18, 2016 at 07:23:41PM +0530, Aneesh Kumar K.V wrote:
>> "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> writes:
>> 
>> > split_huge_pmd() for file mappings (and DAX too) is implemented by just
>> > clearing the PMD entry, as we can re-fill this area from the page cache at
>> > the PTE level later.
>> >
>> > This means we don't need to deposit page tables when a file THP is mapped.
>> > Therefore we shouldn't try to withdraw a page table in zap_huge_pmd() for a
>> > file THP PMD.
>> 
>> Archs like ppc64 use the deposited page table to track hardware page
>> table slot information. We may want to add hooks that archs can
>> use to achieve the same even with file THP.
>
> Could you describe in more detail what kind of information you're talking about?
>

The hardware page table on ppc64 requires us to map each subpage of the huge
page. This is needed because, at a low level, we use the segment base page
size to find the hash slot: on a TLB miss, we use the faulting address and
the base page size (which is 64K even with THP) to determine whether the
page is mapped in the hash page table. Since we use a base page size of 64K,
we need to make sure that subpages are mapped (on demand) in the hash page
table. Once they are mapped, we also need to track their hash table slot
information so that we can clear it when the huge page is invalidated.

With THP, we used the deposited page table to store the hash slot
information.

-aneesh

Kirill A. Shutemov March 21, 2016, 2:39 p.m. UTC | #4
On Mon, Mar 21, 2016 at 10:03:29AM +0530, Aneesh Kumar K.V wrote:
> "Kirill A. Shutemov" <kirill@shutemov.name> writes:
> 
> > On Fri, Mar 18, 2016 at 07:23:41PM +0530, Aneesh Kumar K.V wrote:
> >> "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> writes:
> >> 
> >> > split_huge_pmd() for file mappings (and DAX too) is implemented by just
> >> > clearing the PMD entry, as we can re-fill this area from the page cache at
> >> > the PTE level later.
> >> >
> >> > This means we don't need to deposit page tables when a file THP is mapped.
> >> > Therefore we shouldn't try to withdraw a page table in zap_huge_pmd() for a
> >> > file THP PMD.
> >> 
> >> Archs like ppc64 use the deposited page table to track hardware page
> >> table slot information. We may want to add hooks that archs can
> >> use to achieve the same even with file THP.
> >
> > Could you describe in more detail what kind of information you're talking about?
> >
> 
> The hardware page table on ppc64 requires us to map each subpage of the huge
> page. This is needed because, at a low level, we use the segment base page
> size to find the hash slot: on a TLB miss, we use the faulting address and
> the base page size (which is 64K even with THP) to determine whether the
> page is mapped in the hash page table. Since we use a base page size of 64K,
> we need to make sure that subpages are mapped (on demand) in the hash page
> table. Once they are mapped, we also need to track their hash table slot
> information so that we can clear it when the huge page is invalidated.
> 
> With THP, we used the deposited page table to store the hash slot
> information.

Could you prepare a patch with these hooks so we can discuss the details?
I still have a hard time wrapping my head around this.

I think you have the same problem with DAX huge pages. Or do you not care
about DAX?
Aneesh Kumar K.V March 21, 2016, 4:42 p.m. UTC | #5
"Kirill A. Shutemov" <kirill@shutemov.name> writes:

> On Mon, Mar 21, 2016 at 10:03:29AM +0530, Aneesh Kumar K.V wrote:
>> "Kirill A. Shutemov" <kirill@shutemov.name> writes:
>> 
>> > On Fri, Mar 18, 2016 at 07:23:41PM +0530, Aneesh Kumar K.V wrote:
>> >> "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> writes:
>> >> 
>> >> > split_huge_pmd() for file mappings (and DAX too) is implemented by just
>> >> > clearing the PMD entry, as we can re-fill this area from the page cache at
>> >> > the PTE level later.
>> >> >
>> >> > This means we don't need to deposit page tables when a file THP is mapped.
>> >> > Therefore we shouldn't try to withdraw a page table in zap_huge_pmd() for a
>> >> > file THP PMD.
>> >> 
>> >> Archs like ppc64 use the deposited page table to track hardware page
>> >> table slot information. We may want to add hooks that archs can
>> >> use to achieve the same even with file THP.
>> >
>> > Could you describe in more detail what kind of information you're talking about?
>> >
>> 
>> The hardware page table on ppc64 requires us to map each subpage of the huge
>> page. This is needed because, at a low level, we use the segment base page
>> size to find the hash slot: on a TLB miss, we use the faulting address and
>> the base page size (which is 64K even with THP) to determine whether the
>> page is mapped in the hash page table. Since we use a base page size of 64K,
>> we need to make sure that subpages are mapped (on demand) in the hash page
>> table. Once they are mapped, we also need to track their hash table slot
>> information so that we can clear it when the huge page is invalidated.
>> 
>> With THP, we used the deposited page table to store the hash slot
>> information.
>
> Could you prepare a patch with these hooks so we can discuss the details?
> I still have a hard time wrapping my head around this.


ok

>
> I think you have the same problem with DAX huge pages. Or do you not care
> about DAX?
>

Yes, DAX huge pages will not work on ppc64. Getting them working is on
the TODO list :).

-aneesh


Patch

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 44468fb7cdbf..c22144e3fe11 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1684,10 +1684,16 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 		struct page *page = pmd_page(orig_pmd);
 		page_remove_rmap(page, true);
 		VM_BUG_ON_PAGE(page_mapcount(page) < 0, page);
-		add_mm_counter(tlb->mm, MM_ANONPAGES, -HPAGE_PMD_NR);
 		VM_BUG_ON_PAGE(!PageHead(page), page);
-		pte_free(tlb->mm, pgtable_trans_huge_withdraw(tlb->mm, pmd));
-		atomic_long_dec(&tlb->mm->nr_ptes);
+		if (PageAnon(page)) {
+			pgtable_t pgtable;
+			pgtable = pgtable_trans_huge_withdraw(tlb->mm, pmd);
+			pte_free(tlb->mm, pgtable);
+			atomic_long_dec(&tlb->mm->nr_ptes);
+			add_mm_counter(tlb->mm, MM_ANONPAGES, -HPAGE_PMD_NR);
+		} else {
+			add_mm_counter(tlb->mm, MM_FILEPAGES, -HPAGE_PMD_NR);
+		}
 		spin_unlock(ptl);
 		tlb_remove_page(tlb, page);
 	}