Message ID | 87e23dfbac6f4a68e61d91cddfdfe157163975c1.1602093760.git.yuleixzhang@tencent.com |
---|---|
State | New, archived |
Series | Enhance memory utilization with DMEMFS |
On 08/10/20 09:53, yulei.kernel@gmail.com wrote:
> From: Yulei Zhang <yuleixzhang@tencent.com>
>
> x86 pat uses 'struct page' by only checking if it's system ram,
> however it is not true if dmem is used, let's teach pat to
> recognize this case if it is ram but it is !pfn_valid()
>
> We always use WB for dmem and any attempt to change this
> behavior will be rejected and WARN_ON is triggered
>
> Signed-off-by: Xiao Guangrong <gloryxiao@tencent.com>
> Signed-off-by: Yulei Zhang <yuleixzhang@tencent.com>

Hooks like these will make it very hard to merge this series.

I like the idea of struct page-backed memory, but this is a lot of code
and I wonder if it's worth adding all these complications.

One can already use mem= to remove the "struct page" cost for most of
the host memory, and manage the allocation of the remaining memory in
userspace with /dev/mem. What is the advantage of doing this in the kernel?

Paolo
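The "struct page" cost Paolo mentions can be made concrete with a little arithmetic. This sketch assumes 4 KiB base pages and a 64-byte `struct page`, which is common on x86-64 but depends on the kernel configuration; `struct_page_overhead()` is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions (typical x86-64 build, not taken from the patch). */
#define BASE_PAGE_SIZE   4096ULL /* 4 KiB base pages */
#define STRUCT_PAGE_SIZE 64ULL   /* bytes per struct page */

/* Bytes of struct page metadata needed to describe mem_bytes of RAM. */
static uint64_t struct_page_overhead(uint64_t mem_bytes)
{
	assert(mem_bytes % BASE_PAGE_SIZE == 0);
	return (mem_bytes / BASE_PAGE_SIZE) * STRUCT_PAGE_SIZE;
}
```

Under these assumptions the metadata is 1/64 of RAM (about 1.6%): for a 1 TiB host, `struct_page_overhead(1ULL << 40)` is 16 GiB. Memory hidden from the kernel with mem= does not get a `struct page`, which is the saving Paolo refers to.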
On Tue, Oct 13, 2020 at 3:27 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 08/10/20 09:53, yulei.kernel@gmail.com wrote:
> > From: Yulei Zhang <yuleixzhang@tencent.com>
> >
> > x86 pat uses 'struct page' by only checking if it's system ram,
> > however it is not true if dmem is used, let's teach pat to
> > recognize this case if it is ram but it is !pfn_valid()
> >
> > We always use WB for dmem and any attempt to change this
> > behavior will be rejected and WARN_ON is triggered
> >
> > Signed-off-by: Xiao Guangrong <gloryxiao@tencent.com>
> > Signed-off-by: Yulei Zhang <yuleixzhang@tencent.com>
>
> Hooks like these will make it very hard to merge this series.
>
> I like the idea of struct page-backed memory, but this is a lot of code
> and I wonder if it's worth adding all these complications.
>
> One can already use mem= to remove the "struct page" cost for most of
> the host memory, and manage the allocation of the remaining memory in
> userspace with /dev/mem. What is the advantage of doing this in the kernel?
>
> Paolo

Hi Paolo, as far as I know, there are a few limitations to using /dev/mem in this case:

1. Access to /dev/mem is restricted for security reasons, but our virtual machines usually run as unprivileged processes.
2. What we get from /dev/mem is a single large block of memory. Because VMs are created and destroyed dynamically, running them on /dev/mem fragments that memory, so extra logic is needed to manage allocation and reclamation and avoid wasting memory. dmemfs supports this and can also leverage the kernel's TLB management.
3. It needs to support huge pages at different page-size granularities.
4. MCE (machine check exception) recovery capability is also required.
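Points 2 and 3 of the reply describe the bookkeeping a userspace /dev/mem approach would have to reimplement: handing out pieces of one reserved block to VMs, taking them back, and honoring huge-page alignment. A minimal first-fit sketch of that bookkeeping follows; it assumes the reserved range is tracked in 2 MiB units, and `struct dmem_region`, `region_alloc()` and `region_free()` are hypothetical names, not part of dmemfs or any kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define UNIT_SHIFT 21                   /* track memory in 2 MiB units */
#define UNIT_SIZE  (1ULL << UNIT_SHIFT)
#define MAX_UNITS  4096                 /* toy limit: 8 GiB of reserved memory */

/* Hypothetical bookkeeping for one physical range hidden from the kernel. */
struct dmem_region {
	uint64_t base;          /* physical base address of the range */
	size_t nr_units;        /* number of 2 MiB units in the range */
	bool used[MAX_UNITS];   /* per-unit allocation flags */
};

static void region_init(struct dmem_region *r, uint64_t base, size_t nr_units)
{
	assert(nr_units <= MAX_UNITS);
	r->base = base;
	r->nr_units = nr_units;
	for (size_t i = 0; i < nr_units; i++)
		r->used[i] = false;
}

/*
 * First-fit allocation of `size` bytes whose physical address is aligned to
 * `page_size` (for example 2 MiB or 1 GiB). Returns UINT64_MAX on failure.
 */
static uint64_t region_alloc(struct dmem_region *r, uint64_t size,
			     uint64_t page_size)
{
	assert(page_size >= UNIT_SIZE && page_size % UNIT_SIZE == 0);
	assert(size > 0 && size % page_size == 0);

	size_t need = (size_t)(size >> UNIT_SHIFT);

	for (size_t start = 0; start + need <= r->nr_units; start++) {
		uint64_t addr = r->base + ((uint64_t)start << UNIT_SHIFT);
		bool free_run = true;

		if (addr % page_size != 0)
			continue;       /* huge pages must be naturally aligned */
		for (size_t i = 0; i < need; i++) {
			if (r->used[start + i]) {
				free_run = false;
				break;
			}
		}
		if (!free_run)
			continue;
		for (size_t i = 0; i < need; i++)
			r->used[start + i] = true;
		return addr;
	}
	return UINT64_MAX;
}

/* Return a range to the pool so that later VMs can reuse it. */
static void region_free(struct dmem_region *r, uint64_t addr, uint64_t size)
{
	assert(addr >= r->base);
	size_t start = (size_t)((addr - r->base) >> UNIT_SHIFT);
	size_t n = (size_t)(size >> UNIT_SHIFT);

	assert(start + n <= r->nr_units);
	for (size_t i = 0; i < n; i++)
		r->used[start + i] = false;
}
```

For example, with a 2 GiB region at physical address 4 GiB, an 8 MiB allocation at 2 MiB granularity occupies the first units, so a 1 GiB-aligned request has to skip to the next 1 GiB boundary, and a second 1 GiB request fails until the 8 MiB range is freed. A production allocator would use a bitmap or buddy lists and locking; the linear scan only keeps the sketch short.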
```diff
diff --git a/arch/x86/mm/pat/memtype.c b/arch/x86/mm/pat/memtype.c
index 8f665c352bf0..fd8a298fc30b 100644
--- a/arch/x86/mm/pat/memtype.c
+++ b/arch/x86/mm/pat/memtype.c
@@ -511,6 +511,13 @@ static int reserve_ram_pages_type(u64 start, u64 end,
 	for (pfn = (start >> PAGE_SHIFT); pfn < (end >> PAGE_SHIFT); ++pfn) {
 		enum page_cache_mode type;
 
+		/*
+		 * it's dmem if it's ram but not 'struct page' backend,
+		 * we always use WB
+		 */
+		if (WARN_ON(!pfn_valid(pfn)))
+			return -EBUSY;
+
 		page = pfn_to_page(pfn);
 		type = get_page_memtype(page);
 		if (type != _PAGE_CACHE_MODE_WB) {
@@ -539,6 +546,13 @@ static int free_ram_pages_type(u64 start, u64 end)
 	u64 pfn;
 
 	for (pfn = (start >> PAGE_SHIFT); pfn < (end >> PAGE_SHIFT); ++pfn) {
+		/*
+		 * it's dmem, see the comments in
+		 * reserve_ram_pages_type()
+		 */
+		if (WARN_ON(!pfn_valid(pfn)))
+			continue;
+
 		page = pfn_to_page(pfn);
 		set_page_memtype(page, _PAGE_CACHE_MODE_WB);
 	}
@@ -714,6 +728,13 @@ static enum page_cache_mode lookup_memtype(u64 paddr)
 	if (pat_pagerange_is_ram(paddr, paddr + PAGE_SIZE)) {
 		struct page *page;
 
+		/*
+		 * dmem always uses WB, see the comments in
+		 * reserve_ram_pages_type()
+		 */
+		if (!pfn_valid(paddr >> PAGE_SHIFT))
+			return rettype;
+
 		page = pfn_to_page(paddr >> PAGE_SHIFT);
 		return get_page_memtype(page);
 	}
```
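The three hooks in the patch can be exercised outside the kernel with a toy model. In this sketch PFNs 0-7 stand for `struct page`-backed RAM and PFNs 8-15 for dmem; `toy_pfn_valid()`, `TOY_WARN_ON()`, the `toy_*` functions and the `cache_mode` values are hypothetical stand-ins for `pfn_valid()`, `WARN_ON()` and `enum page_cache_mode`, and ranges are given as PFNs rather than byte addresses. The second pass in `toy_reserve()`, which records the requested type, is outside the diff shown and is an assumption of the model.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel's enum page_cache_mode values. */
enum cache_mode { MODE_WB, MODE_WC, MODE_UC_MINUS };

#define NR_PFNS 16

/* Toy layout (assumption): PFNs 0-7 have a struct page, PFNs 8-15 are dmem. */
static bool toy_pfn_valid(uint64_t pfn) { return pfn < 8; }

static enum cache_mode page_memtype[NR_PFNS]; /* zero-initialized: MODE_WB */
static int warn_count;                        /* counts simulated WARN_ON hits */

/* User-space stand-in for WARN_ON(): count the hit and return the condition. */
#define TOY_WARN_ON(cond) ((cond) ? (warn_count++, true) : false)

/* Models the patched reserve_ram_pages_type() on the PFN range [start, end). */
static int toy_reserve(uint64_t start, uint64_t end, enum cache_mode req)
{
	for (uint64_t pfn = start; pfn < end; pfn++) {
		if (TOY_WARN_ON(!toy_pfn_valid(pfn)))
			return -EBUSY;          /* dmem: the reservation is refused */
		if (page_memtype[pfn] != MODE_WB)
			return -EBUSY;          /* page already has a non-WB type */
	}
	/* Assumed second pass: record the requested type for every page. */
	for (uint64_t pfn = start; pfn < end; pfn++)
		page_memtype[pfn] = req;
	return 0;
}

/* Models the patched free_ram_pages_type(): dmem PFNs are skipped. */
static void toy_free(uint64_t start, uint64_t end)
{
	for (uint64_t pfn = start; pfn < end; pfn++) {
		if (TOY_WARN_ON(!toy_pfn_valid(pfn)))
			continue;
		page_memtype[pfn] = MODE_WB;
	}
}

/* Models the RAM branch of the patched lookup_memtype(). */
static enum cache_mode toy_lookup(uint64_t pfn)
{
	if (!toy_pfn_valid(pfn))
		return MODE_WB;  /* follows the patch comment: dmem is always WB */
	return page_memtype[pfn];
}
```

In this model a reservation that touches a dmem PFN fails with `-EBUSY` and one warning even if the earlier PFNs in the range are ordinary RAM, and freeing a range that includes dmem warns once per dmem PFN. The model's lookup returns `MODE_WB` explicitly to match the patch comment; the patch itself returns whatever `rettype` holds at that point in `lookup_memtype()`.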