[04/35] dmem: let pat recognize dmem

Message ID 87e23dfbac6f4a68e61d91cddfdfe157163975c1.1602093760.git.yuleixzhang@tencent.com (mailing list archive)
State New, archived
Series Enhance memory utilization with DMEMFS

Commit Message

yulei zhang Oct. 8, 2020, 7:53 a.m. UTC
From: Yulei Zhang <yuleixzhang@tencent.com>

The x86 PAT code assumes that any range it identifies as system RAM
is backed by 'struct page'. That does not hold when dmem is used, so
teach PAT to recognize memory that is RAM but is !pfn_valid().

dmem always uses WB; any attempt to change this behavior is rejected
and triggers WARN_ON.

Signed-off-by: Xiao Guangrong <gloryxiao@tencent.com>
Signed-off-by: Yulei Zhang <yuleixzhang@tencent.com>
---
 arch/x86/mm/pat/memtype.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

Comments

Paolo Bonzini Oct. 13, 2020, 7:27 a.m. UTC | #1
On 08/10/20 09:53, yulei.kernel@gmail.com wrote:
> From: Yulei Zhang <yuleixzhang@tencent.com>
> 
> x86 pat uses 'struct page' by only checking if it's system ram,
> however it is not true if dmem is used, let's teach pat to
> recognize this case if it is ram but it is !pfn_valid()
> 
> We always use WB for dmem and any attempt to change this
> behavior will be rejected and WARN_ON is triggered
> 
> Signed-off-by: Xiao Guangrong <gloryxiao@tencent.com>
> Signed-off-by: Yulei Zhang <yuleixzhang@tencent.com>

Hooks like these will make it very hard to merge this series.

I like the idea of struct page-backed memory, but this is a lot of code
and I wonder if it's worth adding all these complications.

One can already use mem= to remove the "struct page" cost for most of
the host memory, and manage the allocation of the remaining memory in
userspace with /dev/mem.  What is the advantage of doing this in the kernel?

Paolo
yulei zhang Oct. 13, 2020, 9:53 a.m. UTC | #2
On Tue, Oct 13, 2020 at 3:27 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 08/10/20 09:53, yulei.kernel@gmail.com wrote:
> > From: Yulei Zhang <yuleixzhang@tencent.com>
> >
> > x86 pat uses 'struct page' by only checking if it's system ram,
> > however it is not true if dmem is used, let's teach pat to
> > recognize this case if it is ram but it is !pfn_valid()
> >
> > We always use WB for dmem and any attempt to change this
> > behavior will be rejected and WARN_ON is triggered
> >
> > Signed-off-by: Xiao Guangrong <gloryxiao@tencent.com>
> > Signed-off-by: Yulei Zhang <yuleixzhang@tencent.com>
>
> Hooks like these will make it very hard to merge this series.
>
> I like the idea of struct page-backed memory, but this is a lot of code
> and I wonder if it's worth adding all these complications.
>
> One can already use mem= to remove the "struct page" cost for most of
> the host memory, and manage the allocation of the remaining memory in
> userspace with /dev/mem.  What is the advantage of doing this in the kernel?
>
> Paolo
>

Hi Paolo, as far as I know there are a few limitations to using
/dev/mem in this case:
1. Access to /dev/mem is restricted for security reasons, but our
virtual machines usually run as unprivileged processes.
2. What we get from /dev/mem is one whole block of memory. Because VMs
come and go dynamically, the block becomes fragmented, so extra logic is
needed to manage allocation and reclaim without wasting memory. dmemfs
supports this and also leverages the kernel's TLB management.
3. Hugepages with different page-size granularities need to be supported.
4. MCE recovery capability is also required.
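
To make point 2 concrete, here is a minimal userspace sketch of the kind of bookkeeping a /dev/mem-based setup would have to carry itself. It is not taken from dmemfs or from this series: POOL_PAGES, pool_alloc() and pool_free() are hypothetical names, and the first-fit policy is only an assumption chosen for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define POOL_PAGES 16 /* hypothetical pool size, in pages */

/* One flag per page of the pool; false means free. */
static bool pool_used[POOL_PAGES];

/*
 * First-fit allocation of n contiguous pages.
 * Returns the index of the first page, or -1 if no free run is long enough.
 */
static long pool_alloc(size_t n)
{
	size_t start, i;

	if (n == 0 || n > POOL_PAGES)
		return -1;

	for (start = 0; start + n <= POOL_PAGES; ++start) {
		/* Count how many pages starting at 'start' are free. */
		for (i = 0; i < n && !pool_used[start + i]; ++i)
			;
		if (i == n) {
			for (i = 0; i < n; ++i)
				pool_used[start + i] = true;
			return (long)start;
		}
	}
	return -1;
}

/* Return n pages starting at 'start' to the pool. */
static void pool_free(size_t start, size_t n)
{
	size_t i;

	assert(start <= POOL_PAGES && n <= POOL_PAGES - start);
	for (i = 0; i < n; ++i)
		pool_used[start + i] = false;
}
```

If three guests take 4, 8 and 4 pages and the first and last then exit, 8 pages are free but a new 8-page request still fails, because the free pages are split into two 4-page runs. Avoiding that waste is the extra logic the reply refers to.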

Patch

diff --git a/arch/x86/mm/pat/memtype.c b/arch/x86/mm/pat/memtype.c
index 8f665c352bf0..fd8a298fc30b 100644
--- a/arch/x86/mm/pat/memtype.c
+++ b/arch/x86/mm/pat/memtype.c
@@ -511,6 +511,13 @@  static int reserve_ram_pages_type(u64 start, u64 end,
 	for (pfn = (start >> PAGE_SHIFT); pfn < (end >> PAGE_SHIFT); ++pfn) {
 		enum page_cache_mode type;
 
+		/*
+		 * it's dmem if it's ram but not backed by 'struct page';
+		 * dmem always uses WB
+		 */
+		if (WARN_ON(!pfn_valid(pfn)))
+			return -EBUSY;
+
 		page = pfn_to_page(pfn);
 		type = get_page_memtype(page);
 		if (type != _PAGE_CACHE_MODE_WB) {
@@ -539,6 +546,13 @@  static int free_ram_pages_type(u64 start, u64 end)
 	u64 pfn;
 
 	for (pfn = (start >> PAGE_SHIFT); pfn < (end >> PAGE_SHIFT); ++pfn) {
+		/*
+		 * it's dmem, see the comments in
+		 * reserve_ram_pages_type()
+		 */
+		if (WARN_ON(!pfn_valid(pfn)))
+			continue;
+
 		page = pfn_to_page(pfn);
 		set_page_memtype(page, _PAGE_CACHE_MODE_WB);
 	}
@@ -714,6 +728,13 @@  static enum page_cache_mode lookup_memtype(u64 paddr)
 	if (pat_pagerange_is_ram(paddr, paddr + PAGE_SIZE)) {
 		struct page *page;
 
+		/*
+		 * dmem always uses WB, see the comments in
+		 * reserve_ram_pages_type()
+		 */
+		if (!pfn_valid(paddr >> PAGE_SHIFT))
+			return rettype;
+
 		page = pfn_to_page(paddr >> PAGE_SHIFT);
 		return get_page_memtype(page);
 	}
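
The behavior of the three hunks can be summarized with a small userspace model. This is only a sketch under stated assumptions, not kernel code: model_pfn_valid(), model_reserve(), model_free(), model_lookup(), NR_PAGES and the page_memtype array are hypothetical stand-ins for pfn_valid(), the three patched functions and the per-page memtype; the WARN_ON side effect and the kernel's other checks are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define NR_PAGES 8 /* hypothetical: only pfns [0, NR_PAGES) have a struct page */

enum cache_mode { MODE_WB, MODE_WC, MODE_UC_MINUS };

/* Stand-in for the memtype stored per struct page; zero-init means WB. */
static enum cache_mode page_memtype[NR_PAGES];

/* Model of pfn_valid(): RAM above NR_PAGES plays the role of dmem. */
static bool model_pfn_valid(uint64_t pfn)
{
	return pfn < NR_PAGES;
}

/* Like reserve_ram_pages_type(): a range touching dmem is rejected. */
static int model_reserve(uint64_t start, uint64_t end, enum cache_mode req)
{
	uint64_t pfn;

	assert(start <= end);
	for (pfn = start >> PAGE_SHIFT; pfn < (end >> PAGE_SHIFT); ++pfn) {
		if (!model_pfn_valid(pfn))
			return -EBUSY; /* dmem stays WB */
		if (page_memtype[pfn] != MODE_WB)
			return -EBUSY; /* already reserved with another type */
	}
	for (pfn = start >> PAGE_SHIFT; pfn < (end >> PAGE_SHIFT); ++pfn)
		page_memtype[pfn] = req;
	return 0;
}

/* Like free_ram_pages_type(): dmem pfns are skipped, the rest reset to WB. */
static void model_free(uint64_t start, uint64_t end)
{
	uint64_t pfn;

	for (pfn = start >> PAGE_SHIFT; pfn < (end >> PAGE_SHIFT); ++pfn) {
		if (!model_pfn_valid(pfn))
			continue;
		page_memtype[pfn] = MODE_WB;
	}
}

/* Like lookup_memtype() for RAM: dmem always reports WB. */
static enum cache_mode model_lookup(uint64_t paddr)
{
	uint64_t pfn = paddr >> PAGE_SHIFT;

	if (!model_pfn_valid(pfn))
		return MODE_WB;
	return page_memtype[pfn];
}
```

In this model a reserve on ordinary pages succeeds and is visible through model_lookup(), while a reserve on a pfn without a struct page returns -EBUSY and a lookup on it reports WB, which matches the intent stated in the commit message.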