Message ID | 1457737157-38573-9-git-send-email-kirill.shutemov@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> writes: > [ text/plain ] > split_huge_pmd() for file mappings (and DAX too) is implemented by just > clearing pmd entry as we can re-fill this area from page cache on pte > level later. > > This means we don't need deposit page tables when file THP is mapped. > Therefore we shouldn't try to withdraw a page table on zap_huge_pmd() > file THP PMD. Archs like ppc64 use deposited page table to track the hardware page table slot information. We probably may want to add hooks which arch can use to achieve the same even with file THP > > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> > --- > mm/huge_memory.c | 12 +++++++++--- > 1 file changed, 9 insertions(+), 3 deletions(-) > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c > index 44468fb7cdbf..c22144e3fe11 100644 > --- a/mm/huge_memory.c > +++ b/mm/huge_memory.c > @@ -1684,10 +1684,16 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, > struct page *page = pmd_page(orig_pmd); > page_remove_rmap(page, true); > VM_BUG_ON_PAGE(page_mapcount(page) < 0, page); > - add_mm_counter(tlb->mm, MM_ANONPAGES, -HPAGE_PMD_NR); > VM_BUG_ON_PAGE(!PageHead(page), page); > - pte_free(tlb->mm, pgtable_trans_huge_withdraw(tlb->mm, pmd)); > - atomic_long_dec(&tlb->mm->nr_ptes); > + if (PageAnon(page)) { > + pgtable_t pgtable; > + pgtable = pgtable_trans_huge_withdraw(tlb->mm, pmd); > + pte_free(tlb->mm, pgtable); > + atomic_long_dec(&tlb->mm->nr_ptes); > + add_mm_counter(tlb->mm, MM_ANONPAGES, -HPAGE_PMD_NR); > + } else { > + add_mm_counter(tlb->mm, MM_FILEPAGES, -HPAGE_PMD_NR); > + } > spin_unlock(ptl); > tlb_remove_page(tlb, page); > } > -- > 2.7.0 > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . 
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Mar 18, 2016 at 07:23:41PM +0530, Aneesh Kumar K.V wrote:
> "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> writes:
>
> > split_huge_pmd() for file mappings (and DAX too) is implemented by just
> > clearing the pmd entry, as we can re-fill this area from the page cache
> > at pte level later.
> >
> > This means we don't need to deposit page tables when a file THP is
> > mapped. Therefore we shouldn't try to withdraw a page table in
> > zap_huge_pmd() for a file THP PMD.
>
> Archs like ppc64 use the deposited page table to track the hardware page
> table slot information. We may want to add hooks which an arch can use
> to achieve the same even with file THP.

Could you describe in more detail what kind of information you're
talking about?
"Kirill A. Shutemov" <kirill@shutemov.name> writes: > [ text/plain ] > On Fri, Mar 18, 2016 at 07:23:41PM +0530, Aneesh Kumar K.V wrote: >> "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> writes: >> >> > [ text/plain ] >> > split_huge_pmd() for file mappings (and DAX too) is implemented by just >> > clearing pmd entry as we can re-fill this area from page cache on pte >> > level later. >> > >> > This means we don't need deposit page tables when file THP is mapped. >> > Therefore we shouldn't try to withdraw a page table on zap_huge_pmd() >> > file THP PMD. >> >> Archs like ppc64 use deposited page table to track the hardware page >> table slot information. We probably may want to add hooks which arch can >> use to achieve the same even with file THP > > Could you describe more on what kind of information you're talking about? > Hardware page table in ppc64 requires us to map each subpage of the huge page. This is needed because at low level we use segment base page size to find the hash slot and on TLB miss, we use the faulting address and base page size (which is 64k even with THP) to find whether we have the page mapped in hash page table. Since we use base page size of 64K, we need to make sure that subpages are mapped (on demand) in hash page table. If we have then mapped we also need to track their hash table slot information so that we can clear it on invalidate of hugepage. With THP we used the deposited page table to store the hash slot information. -aneesh -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Mar 21, 2016 at 10:03:29AM +0530, Aneesh Kumar K.V wrote:
> "Kirill A. Shutemov" <kirill@shutemov.name> writes:
>
> > On Fri, Mar 18, 2016 at 07:23:41PM +0530, Aneesh Kumar K.V wrote:
> >> "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> writes:
> >>
> >> > split_huge_pmd() for file mappings (and DAX too) is implemented by just
> >> > clearing the pmd entry, as we can re-fill this area from the page cache
> >> > at pte level later.
> >> >
> >> > This means we don't need to deposit page tables when a file THP is
> >> > mapped. Therefore we shouldn't try to withdraw a page table in
> >> > zap_huge_pmd() for a file THP PMD.
> >>
> >> Archs like ppc64 use the deposited page table to track the hardware page
> >> table slot information. We may want to add hooks which an arch can use
> >> to achieve the same even with file THP.
> >
> > Could you describe in more detail what kind of information you're
> > talking about?
> >
>
> The hardware page table in ppc64 requires us to map each subpage of the
> huge page. This is needed because at the low level we use the segment
> base page size to find the hash slot: on a TLB miss, we use the faulting
> address and the base page size (which is 64K even with THP) to find
> whether we have the page mapped in the hash page table. Since we use a
> base page size of 64K, we need to make sure that subpages are mapped (on
> demand) in the hash page table. If we have them mapped, we also need to
> track their hash table slot information so that we can clear it when the
> hugepage is invalidated.
>
> With THP we used the deposited page table to store the hash slot
> information.

Could you prepare the patch with these hooks so we can discuss the
details? I still have a hard time wrapping my head around this.

I think you have the same problem with DAX huge pages. Or don't you care
about DAX?
"Kirill A. Shutemov" <kirill@shutemov.name> writes: > [ text/plain ] > On Mon, Mar 21, 2016 at 10:03:29AM +0530, Aneesh Kumar K.V wrote: >> "Kirill A. Shutemov" <kirill@shutemov.name> writes: >> >> > [ text/plain ] >> > On Fri, Mar 18, 2016 at 07:23:41PM +0530, Aneesh Kumar K.V wrote: >> >> "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> writes: >> >> >> >> > [ text/plain ] >> >> > split_huge_pmd() for file mappings (and DAX too) is implemented by just >> >> > clearing pmd entry as we can re-fill this area from page cache on pte >> >> > level later. >> >> > >> >> > This means we don't need deposit page tables when file THP is mapped. >> >> > Therefore we shouldn't try to withdraw a page table on zap_huge_pmd() >> >> > file THP PMD. >> >> >> >> Archs like ppc64 use deposited page table to track the hardware page >> >> table slot information. We probably may want to add hooks which arch can >> >> use to achieve the same even with file THP >> > >> > Could you describe more on what kind of information you're talking about? >> > >> >> Hardware page table in ppc64 requires us to map each subpage of the huge >> page. This is needed because at low level we use segment base page size >> to find the hash slot and on TLB miss, we use the faulting address and >> base page size (which is 64k even with THP) to find whether we have >> the page mapped in hash page table. Since we use base page size of 64K, >> we need to make sure that subpages are mapped (on demand) in hash page >> table. If we have then mapped we also need to track their hash table >> slot information so that we can clear it on invalidate of hugepage. >> >> With THP we used the deposited page table to store the hash slot >> information. > > Could you prepare the patch with these hooks so we can discuss it details? > I still have hard time wrap my had around this. ok > > I think you have the same problem with DAX huge pages. Or you don't care > about DAX? > Yes, DAX hugepage will not be working on ppc64. 
It is there in the TODO list to get it working :). -aneesh -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
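The hooks discussed in this thread can be pictured with the userspace model below. It is not a kernel patch. A hypothetical flag, `arch_needs_deposit`, stands in for an arch hook. When the flag is set, a file THP mapping also deposits a page table, and the zap path withdraws it under the same condition, so deposits and withdrawals stay balanced. All names here (`struct hook_mm`, `map_huge_pmd_hook()`, `zap_huge_pmd_hook()`) are illustrative assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical arch hook: true on an arch (such as ppc64 with the hash
 * MMU) that keeps per-subpage state in the deposited page table. */
static bool arch_needs_deposit;

/* Simplified stand-in for the mm fields involved. */
struct hook_mm {
	long nr_ptes;     /* page tables allocated, like mm->nr_ptes */
	void *deposited;  /* at most one deposited table in this model */
};

static void deposit(struct hook_mm *mm)
{
	assert(mm->deposited == NULL);
	mm->deposited = malloc(64); /* stand-in for a page table page */
	assert(mm->deposited != NULL);
	mm->nr_ptes++;
}

static void withdraw_and_free(struct hook_mm *mm)
{
	assert(mm->deposited != NULL);
	free(mm->deposited);
	mm->deposited = NULL;
	mm->nr_ptes--;
}

/* Map a huge PMD: anon always deposits; file only if the arch asks. */
static void map_huge_pmd_hook(struct hook_mm *mm, bool anon)
{
	if (anon || arch_needs_deposit)
		deposit(mm);
}

/* Zap uses the same condition as map, so nothing leaks and nothing
 * is withdrawn that was never deposited. */
static void zap_huge_pmd_hook(struct hook_mm *mm, bool anon)
{
	if (anon || arch_needs_deposit)
		withdraw_and_free(mm);
}
```

The design choice being illustrated is symmetry. If only the map side consulted the hook, a file THP zap on such an arch would leak the deposited table. The patch as posted avoids this only because it never deposits for file THP.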
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 44468fb7cdbf..c22144e3fe11 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1684,10 +1684,16 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 		struct page *page = pmd_page(orig_pmd);
 		page_remove_rmap(page, true);
 		VM_BUG_ON_PAGE(page_mapcount(page) < 0, page);
-		add_mm_counter(tlb->mm, MM_ANONPAGES, -HPAGE_PMD_NR);
 		VM_BUG_ON_PAGE(!PageHead(page), page);
-		pte_free(tlb->mm, pgtable_trans_huge_withdraw(tlb->mm, pmd));
-		atomic_long_dec(&tlb->mm->nr_ptes);
+		if (PageAnon(page)) {
+			pgtable_t pgtable;
+			pgtable = pgtable_trans_huge_withdraw(tlb->mm, pmd);
+			pte_free(tlb->mm, pgtable);
+			atomic_long_dec(&tlb->mm->nr_ptes);
+			add_mm_counter(tlb->mm, MM_ANONPAGES, -HPAGE_PMD_NR);
+		} else {
+			add_mm_counter(tlb->mm, MM_FILEPAGES, -HPAGE_PMD_NR);
+		}
 		spin_unlock(ptl);
 		tlb_remove_page(tlb, page);
 	}
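The accounting change in this hunk can be modeled outside the kernel as follows. This is a simplified sketch. `struct model_mm` and `zap_huge_pmd_model()` are hypothetical stand-ins for the `mm_struct` fields the patch touches. `HPAGE_PMD_NR` is assumed to be 512, which corresponds to a 2M PMD of 4K pages, as on x86-64.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed x86-64 geometry: 2M PMD / 4K pages = 512 subpages. */
#define HPAGE_PMD_NR 512

/* Model-only counter indices (not the kernel's enum values). */
enum { MM_FILEPAGES, MM_ANONPAGES, NR_MM_COUNTERS };

/* Simplified stand-in for the mm_struct state the patch touches. */
struct model_mm {
	long rss[NR_MM_COUNTERS];
	long nr_ptes;
	int deposited; /* number of deposited page tables */
};

/* Mirrors the patched branch of zap_huge_pmd(): only an anonymous
 * THP has a deposited page table to withdraw and free. */
static void zap_huge_pmd_model(struct model_mm *mm, bool page_anon)
{
	if (page_anon) {
		assert(mm->deposited > 0); /* withdraw must find a table */
		mm->deposited--;           /* pgtable_trans_huge_withdraw() */
		mm->nr_ptes--;             /* pte_free() + nr_ptes decrement */
		mm->rss[MM_ANONPAGES] -= HPAGE_PMD_NR;
	} else {
		/* file THP: nothing was deposited, only fix the RSS */
		mm->rss[MM_FILEPAGES] -= HPAGE_PMD_NR;
	}
}
```

Zapping a file THP leaves `nr_ptes` and the deposit count untouched and lowers only the file RSS counter. Before the patch, the unconditional withdraw would have tried to take a table that was never deposited.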
split_huge_pmd() for file mappings (and DAX too) is implemented by just
clearing the pmd entry, as we can re-fill this area from the page cache
at pte level later.

This means we don't need to deposit page tables when a file THP is
mapped. Therefore we shouldn't try to withdraw a page table in
zap_huge_pmd() for a file THP PMD.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/huge_memory.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)
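The commit message distinguishes anonymous splits from file-backed splits, and a toy PMD can sketch the difference. This is an assumption-laden model, not kernel code. `struct toy_pmd`, `split_huge_pmd_model()` and `file_fault_model()` are hypothetical, and `HPAGE_PMD_NR` is assumed to be 512. An anonymous split uses up the deposited table and fills every PTE at once, because there is no page cache to fault the data back in from. A file split just clears the PMD and lets later faults rebuild PTEs from the page cache.

```c
#include <assert.h>
#include <stdbool.h>

#define HPAGE_PMD_NR 512 /* assumed: 2M PMD, 4K pages */

/* Toy PMD: empty, mapping a huge page, or pointing to a PTE table. */
enum pmd_state { PMD_NONE, PMD_HUGE, PMD_TABLE };

struct toy_pmd {
	enum pmd_state state;
	bool has_deposit;   /* a page table was deposited at map time */
	int ptes_populated; /* PTEs filled in since the split */
};

/* Anon: the deposited table becomes the PTE table and is filled right
 * away, since the data exists only in the huge page itself.
 * File: just clear the PMD; no deposited table is needed. */
static void split_huge_pmd_model(struct toy_pmd *pmd, bool anon)
{
	assert(pmd->state == PMD_HUGE);
	if (anon) {
		assert(pmd->has_deposit);
		pmd->has_deposit = false;
		pmd->state = PMD_TABLE;
		pmd->ptes_populated = HPAGE_PMD_NR;
	} else {
		pmd->state = PMD_NONE;
		pmd->ptes_populated = 0;
	}
}

/* A later fault on a cleared file PMD: set up a PTE table (on demand)
 * and map one PTE from the page cache. */
static void file_fault_model(struct toy_pmd *pmd)
{
	if (pmd->state == PMD_NONE)
		pmd->state = PMD_TABLE;
	pmd->ptes_populated++;
}
```

After a file split, the PMD is empty until the first fault. After an anonymous split, all 512 PTEs are present immediately, and the deposited table has been consumed.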