Message ID | 20200908195539.25896-3-willy@infradead.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Remove assumptions of THP size | expand |
On Tue, Sep 08, 2020 at 08:55:29PM +0100, Matthew Wilcox (Oracle) wrote: > A compound page in the page cache will not necessarily be of PMD size, > so check explicitly. > > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> > --- > mm/memory.c | 7 ++++--- > 1 file changed, 4 insertions(+), 3 deletions(-) > > diff --git a/mm/memory.c b/mm/memory.c > index 602f4283122f..4b35b4e71e64 100644 > --- a/mm/memory.c > +++ b/mm/memory.c > @@ -3562,13 +3562,14 @@ static vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page) > unsigned long haddr = vmf->address & HPAGE_PMD_MASK; > pmd_t entry; > int i; > - vm_fault_t ret; > + vm_fault_t ret = VM_FAULT_FALLBACK; > > if (!transhuge_vma_suitable(vma, haddr)) > - return VM_FAULT_FALLBACK; > + return ret; > > - ret = VM_FAULT_FALLBACK; > page = compound_head(page); > + if (page_order(page) != HPAGE_PMD_ORDER) > + return ret; Maybe also VM_BUG_ON_PAGE(page_order(page) > HPAGE_PMD_ORDER, page)? Just in case. > > /* > * Archs like ppc64 need additonal space to store information > -- > 2.28.0 >
On Wed, Sep 09, 2020 at 05:29:04PM +0300, Kirill A. Shutemov wrote: > On Tue, Sep 08, 2020 at 08:55:29PM +0100, Matthew Wilcox (Oracle) wrote: > > A compound page in the page cache will not necessarily be of PMD size, > > so check explicitly. > > > > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> > > --- > > mm/memory.c | 7 ++++--- > > 1 file changed, 4 insertions(+), 3 deletions(-) > > > > diff --git a/mm/memory.c b/mm/memory.c > > index 602f4283122f..4b35b4e71e64 100644 > > --- a/mm/memory.c > > +++ b/mm/memory.c > > @@ -3562,13 +3562,14 @@ static vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page) > > unsigned long haddr = vmf->address & HPAGE_PMD_MASK; > > pmd_t entry; > > int i; > > - vm_fault_t ret; > > + vm_fault_t ret = VM_FAULT_FALLBACK; > > > > if (!transhuge_vma_suitable(vma, haddr)) > > - return VM_FAULT_FALLBACK; > > + return ret; > > > > - ret = VM_FAULT_FALLBACK; > > page = compound_head(page); > > + if (page_order(page) != HPAGE_PMD_ORDER) > > + return ret; > > Maybe also VM_BUG_ON_PAGE(page_order(page) > HPAGE_PMD_ORDER, page)? > Just in case. In the patch where I actually start creating THPs, I limit the order to HPAGE_PMD_ORDER, so we're not going to see this today. At some point in the future, I can imagine that we allow THPs larger than PMD size, and what we'd want alloc_set_pte() to look like is: if (pud_none(*vmf->pud) && PageTransCompound(page)) { ret = do_set_pud(vmf, page); if (ret != VM_FAULT_FALLBACK) return ret; } if (pmd_none(*vmf->pmd) && PageTransCompound(page)) { ret = do_set_pmd(vmf, page); if (ret != VM_FAULT_FALLBACK) return ret; } Once we're in that situation, in do_set_pmd(), we'd want to figure out which sub-page of the >PMD-sized page to insert. But I don't want to write code for that now. So, what's the right approach if somebody does call alloc_set_pte() with a >PMD sized page? It's not exported, so the only two ways to get it called with a >PMD sized page is to (1) persuade filemap_map_pages() to call it, which means putting it in the page cache or (2) return it from vm_ops->fault. If someone actually does that (an interesting device driver, perhaps), I don't think hitting it with a BUG is the right response. I think it should actually be to map the right PMD-sized chunk of the page, but we don't even do that today -- we map the first PMD-sized chunk of the page. With this patch, we'll simply map the appropriate PAGE_SIZE chunk at the requested address. So this would be a bugfix for such a demented driver. At some point, it'd be nice to handle this with a PMD, but I don't want to write that code without a test-case. We could probably simulate it with the page cache THP code and be super-aggressive about creating order-10 pages ... but this is feeling more and more out of scope for this patch set, which today hit 99 patches.
On Wed, Sep 09, 2020 at 03:50:35PM +0100, Matthew Wilcox wrote: > On Wed, Sep 09, 2020 at 05:29:04PM +0300, Kirill A. Shutemov wrote: > > On Tue, Sep 08, 2020 at 08:55:29PM +0100, Matthew Wilcox (Oracle) wrote: > > > A compound page in the page cache will not necessarily be of PMD size, > > > so check explicitly. > > > > > > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> > > > --- > > > mm/memory.c | 7 ++++--- > > > 1 file changed, 4 insertions(+), 3 deletions(-) > > > > > > diff --git a/mm/memory.c b/mm/memory.c > > > index 602f4283122f..4b35b4e71e64 100644 > > > --- a/mm/memory.c > > > +++ b/mm/memory.c > > > @@ -3562,13 +3562,14 @@ static vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page) > > > unsigned long haddr = vmf->address & HPAGE_PMD_MASK; > > > pmd_t entry; > > > int i; > > > - vm_fault_t ret; > > > + vm_fault_t ret = VM_FAULT_FALLBACK; > > > > > > if (!transhuge_vma_suitable(vma, haddr)) > > > - return VM_FAULT_FALLBACK; > > > + return ret; > > > > > > - ret = VM_FAULT_FALLBACK; > > > page = compound_head(page); > > > + if (page_order(page) != HPAGE_PMD_ORDER) > > > + return ret; > > > > Maybe also VM_BUG_ON_PAGE(page_order(page) > HPAGE_PMD_ORDER, page)? > > Just in case. > > In the patch where I actually start creating THPs, I limit the order to > HPAGE_PMD_ORDER, so we're not going to see this today. At some point > in the future, I can imagine that we allow THPs larger than PMD size, > and what we'd want alloc_set_pte() to look like is: > > if (pud_none(*vmf->pud) && PageTransCompound(page)) { > ret = do_set_pud(vmf, page); > if (ret != VM_FAULT_FALLBACK) > return ret; > } > if (pmd_none(*vmf->pmd) && PageTransCompound(page)) { > ret = do_set_pmd(vmf, page); > if (ret != VM_FAULT_FALLBACK) > return ret; > } > > Once we're in that situation, in do_set_pmd(), we'd want to figure out > which sub-page of the >PMD-sized page to insert. But I don't want to > write code for that now. > > So, what's the right approach if somebody does call alloc_set_pte() > with a >PMD sized page? It's not exported, so the only two ways to get > it called with a >PMD sized page is to (1) persuade filemap_map_pages() > to call it, which means putting it in the page cache or (2) return it > from vm_ops->fault. If someone actually does that (an interesting > device driver, perhaps), I don't think hitting it with a BUG is the > right response. I think it should actually be to map the right PMD-sized > chunk of the page, but we don't even do that today -- we map the first > PMD-sized chunk of the page. > > With this patch, we'll simply map the appropriate PAGE_SIZE chunk at the > requested address. So this would be a bugfix for such a demented driver. > At some point, it'd be nice to handle this with a PMD, but I don't want > to write that code without a test-case. We could probably simulate > it with the page cache THP code and be super-aggressive about creating > order-10 pages ... but this is feeling more and more out of scope for > this patch set, which today hit 99 patches. Okay, fair enough. VM_BUG is too strong reaction here as we can make a reasonable fallback. Maybe WARN_ON_ONCE() would make sense? It would indicate the place that has to be adjust once we would get abouve PMD-order pages.
diff --git a/mm/memory.c b/mm/memory.c index 602f4283122f..4b35b4e71e64 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -3562,13 +3562,14 @@ static vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page) unsigned long haddr = vmf->address & HPAGE_PMD_MASK; pmd_t entry; int i; - vm_fault_t ret; + vm_fault_t ret = VM_FAULT_FALLBACK; if (!transhuge_vma_suitable(vma, haddr)) - return VM_FAULT_FALLBACK; + return ret; - ret = VM_FAULT_FALLBACK; page = compound_head(page); + if (page_order(page) != HPAGE_PMD_ORDER) + return ret; /* * Archs like ppc64 need additonal space to store information
A compound page in the page cache will not necessarily be of PMD size, so check explicitly. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> --- mm/memory.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)