
[3/3] kvm: add a function to check if page is from NVDIMM pmem.

Message ID 359fdf0103b61014bf811d88d4ce36bc793d18f2.1530716899.git.yi.z.zhang@linux.intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Zhang, Yi July 4, 2018, 3:30 p.m. UTC
For device-specific memory space, when we move these pfn ranges into a
memory zone, we set the page reserved flag at that time. Some of these
pages are reserved for device MMIO, and some are not, such as NVDIMM
pmem.

Now, when we map these dev_dax or fs_dax pages into KVM as the
DIMM/NVDIMM backend, the check in kvm_is_reserved_pfn() misconceives
those pages as MMIO because they are reserved. Therefore, we introduce
two page map types, MEMORY_DEVICE_FS_DAX/MEMORY_DEVICE_DEV_DAX, to
identify pages that come from NVDIMM pmem, and let KVM treat them as
normal pages.

Without this patch, many operations will be missed due to this
mistreatment of pmem pages. For example, a page may never be unpinned
for the KVM guest (in kvm_release_pfn_clean), and cannot be marked as
dirty/accessed (in kvm_set_pfn_dirty/accessed), etc.

Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
Signed-off-by: Zhang Yu <yu.c.zhang@linux.intel.com>
---
 virt/kvm/kvm_main.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

Comments

Dan Williams July 4, 2018, 2:50 p.m. UTC | #1
[ adding Jerome ]

On Wed, Jul 4, 2018 at 8:30 AM, Zhang Yi <yi.z.zhang@linux.intel.com> wrote:
> +static bool kvm_is_nd_pfn(kvm_pfn_t pfn)
> +{
> +       struct page *page = pfn_to_page(pfn);
> +
> +       return is_zone_device_page(page) &&
> +               ((page->pgmap->type == MEMORY_DEVICE_FS_DAX) ||
> +                (page->pgmap->type == MEMORY_DEVICE_DEV_DAX));

Jerome, might there be any use case to pass MEMORY_DEVICE_PUBLIC
memory to a guest vm?

> +}
Paolo Bonzini July 4, 2018, 3:25 p.m. UTC | #2
On 04/07/2018 17:30, Zhang Yi wrote:
> +static bool kvm_is_nd_pfn(kvm_pfn_t pfn)
> +{
> +	struct page *page = pfn_to_page(pfn);
> +
> +	return is_zone_device_page(page) &&
> +		((page->pgmap->type == MEMORY_DEVICE_FS_DAX) ||
> +		 (page->pgmap->type == MEMORY_DEVICE_DEV_DAX));
> +}

If the mm people agree, I'd prefer something that takes a struct page *
and is exported by include/linux/mm.h.  Then KVM can just do something like

	struct page *page;
	if (!pfn_valid(pfn))
		return true;

	page = pfn_to_page(pfn);
	return PageReserved(page) && !is_dax_page(page);

Thanks,

Paolo
Paolo Bonzini July 4, 2018, 3:27 p.m. UTC | #3
On 04/07/2018 16:50, Dan Williams wrote:
>> +       return is_zone_device_page(page) &&
>> +               ((page->pgmap->type == MEMORY_DEVICE_FS_DAX) ||
>> +                (page->pgmap->type == MEMORY_DEVICE_DEV_DAX));
> Jerome, might there be any use case to pass MEMORY_DEVICE_PUBLIC
> memory to a guest vm?
> 

An even better reason to place this in mm.h. :)  There should be a
function to tell you whether a reserved page has accessed/dirty bits
etc.; that's all that KVM needs to know.

Paolo
Zhang, Yi July 5, 2018, 1:19 p.m. UTC | #4
On 04/07/2018 23:25, Paolo Bonzini wrote:
> If the mm people agree, I'd prefer something that takes a struct page *
> and is exported by include/linux/mm.h.  Then KVM can just do something like
>
> 	struct page *page;
> 	if (!pfn_valid(pfn))
> 		return true;
>
> 	page = pfn_to_page(pfn);
> 	return PageReserved(page) && !is_dax_page(page);
>
> Thanks,
>
> Paolo
Yeah, that could be much better. Thanks for your comments, Paolo.

Hi Jan, do you have any opinions/ideas on adding such a definition in mm?

Regards,
Yi
Jan Kara July 9, 2018, 12:36 p.m. UTC | #5
On Thu 05-07-18 21:19:30, Zhang,Yi wrote:
> > If the mm people agree, I'd prefer something that takes a struct page *
> > and is exported by include/linux/mm.h.  Then KVM can just do something like
> >
> > 	struct page *page;
> > 	if (!pfn_valid(pfn))
> > 		return true;
> >
> > 	page = pfn_to_page(pfn);
> > 	return PageReserved(page) && !is_dax_page(page);
> >
> > Thanks,
> >
> > Paolo
> Yeah, that could be much better. Thanks for your comments, Paolo.
> 
> Hi Jan, do you have any opinions/ideas on adding such a definition in mm?

What Paolo suggests sounds good to me.

								Honza

Patch

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index afb2e6e..1365d18 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -140,10 +140,23 @@  __weak void kvm_arch_mmu_notifier_invalidate_range(struct kvm *kvm,
 {
 }
 
+static bool kvm_is_nd_pfn(kvm_pfn_t pfn)
+{
+	struct page *page = pfn_to_page(pfn);
+
+	return is_zone_device_page(page) &&
+		((page->pgmap->type == MEMORY_DEVICE_FS_DAX) ||
+		 (page->pgmap->type == MEMORY_DEVICE_DEV_DAX));
+}
+
 bool kvm_is_reserved_pfn(kvm_pfn_t pfn)
 {
-	if (pfn_valid(pfn))
-		return PageReserved(pfn_to_page(pfn));
+	struct page *page;
+
+	if (pfn_valid(pfn)) {
+		page = pfn_to_page(pfn);
+		return kvm_is_nd_pfn(pfn) ? false : PageReserved(page);
+	}
 
 	return true;
 }