Message ID | 20191021172353.3056-1-david@redhat.com (mailing list archive) |
---|---|
Headers | show |
Series | mm: Memory offlining + page isolation cleanups | expand |
On 21.10.19 19:23, David Hildenbrand wrote: > Two cleanups that popped up while working on (and discussing) virtio-mem: > https://lkml.org/lkml/2019/9/19/463 > > Tested with DIMMs on x86. > > As discussed with michal in v1, I'll soon look into removing the use > of PG_reserved during memory onlining completely - most probably > disallowing to offline memory blocks with holes, cleaning up the > onlining+offlining code. BTW, I remember that ZONE_DEVICE pages are still required to be set PG_reserved. That has to be sorted out first. I remember that somebody was working on it a while ago but didn't hear about that again. Will look into that as well - should be as easy as adding a zone check (if there isn't a pfn_to_online_page() check already). But of course, there might be special cases ....
On Tue 22-10-19 08:52:28, David Hildenbrand wrote: > On 21.10.19 19:23, David Hildenbrand wrote: > > Two cleanups that popped up while working on (and discussing) virtio-mem: > > https://lkml.org/lkml/2019/9/19/463 > > > > Tested with DIMMs on x86. > > > > As discussed with michal in v1, I'll soon look into removing the use > > of PG_reserved during memory onlining completely - most probably > > disallowing to offline memory blocks with holes, cleaning up the > > onlining+offlining code. > > BTW, I remember that ZONE_DEVICE pages are still required to be set > PG_reserved. That has to be sorted out first. Do they? > I remember that somebody was > working on it a while ago but didn't hear about that again. Will look into > that as well - should be as easy as adding a zone check (if there isn't a > pfn_to_online_page() check already). But of course, there might be special > cases .... I remember Alexander didn't want to change the PageReserved handling because he was worried about unforeseeable side effects. I have a vague recollection he (or maybe Dan) has promissed some follow up clean ups which didn't seem to materialize.
On 22.10.19 10:08, Michal Hocko wrote: > On Tue 22-10-19 08:52:28, David Hildenbrand wrote: >> On 21.10.19 19:23, David Hildenbrand wrote: >>> Two cleanups that popped up while working on (and discussing) virtio-mem: >>> https://lkml.org/lkml/2019/9/19/463 >>> >>> Tested with DIMMs on x86. >>> >>> As discussed with michal in v1, I'll soon look into removing the use >>> of PG_reserved during memory onlining completely - most probably >>> disallowing to offline memory blocks with holes, cleaning up the >>> onlining+offlining code. >> >> BTW, I remember that ZONE_DEVICE pages are still required to be set >> PG_reserved. That has to be sorted out first. > > Do they? Yes, especially KVM code :/ > >> I remember that somebody was >> working on it a while ago but didn't hear about that again. Will look into >> that as well - should be as easy as adding a zone check (if there isn't a >> pfn_to_online_page() check already). But of course, there might be special >> cases .... > > I remember Alexander didn't want to change the PageReserved handling > because he was worried about unforeseeable side effects. I have a vague > recollection he (or maybe Dan) has promissed some follow up clean ups > which didn't seem to materialize. I'm looking into it right now, especially the KVM part.
On Tue 22-10-19 10:15:07, David Hildenbrand wrote: > On 22.10.19 10:08, Michal Hocko wrote: > > On Tue 22-10-19 08:52:28, David Hildenbrand wrote: > > > On 21.10.19 19:23, David Hildenbrand wrote: > > > > Two cleanups that popped up while working on (and discussing) virtio-mem: > > > > https://lkml.org/lkml/2019/9/19/463 > > > > > > > > Tested with DIMMs on x86. > > > > > > > > As discussed with michal in v1, I'll soon look into removing the use > > > > of PG_reserved during memory onlining completely - most probably > > > > disallowing to offline memory blocks with holes, cleaning up the > > > > onlining+offlining code. > > > > > > BTW, I remember that ZONE_DEVICE pages are still required to be set > > > PG_reserved. That has to be sorted out first. > > > > Do they? > > Yes, especially KVM code :/ Details please?
On 22.10.19 10:21, Michal Hocko wrote: > On Tue 22-10-19 10:15:07, David Hildenbrand wrote: >> On 22.10.19 10:08, Michal Hocko wrote: >>> On Tue 22-10-19 08:52:28, David Hildenbrand wrote: >>>> On 21.10.19 19:23, David Hildenbrand wrote: >>>>> Two cleanups that popped up while working on (and discussing) virtio-mem: >>>>> https://lkml.org/lkml/2019/9/19/463 >>>>> >>>>> Tested with DIMMs on x86. >>>>> >>>>> As discussed with michal in v1, I'll soon look into removing the use >>>>> of PG_reserved during memory onlining completely - most probably >>>>> disallowing to offline memory blocks with holes, cleaning up the >>>>> onlining+offlining code. >>>> >>>> BTW, I remember that ZONE_DEVICE pages are still required to be set >>>> PG_reserved. That has to be sorted out first. >>> >>> Do they? >> >> Yes, especially KVM code :/ > > Details please? > E.g., arch/x86/kvm/mmu.c:kvm_is_mmio_pfn() And I currently have From 55606751b67989bd06d17844a6bcfbf85d44ee69 Mon Sep 17 00:00:00 2001 From: David Hildenbrand <david@redhat.com> Date: Tue, 22 Oct 2019 10:08:04 +0200 Subject: [PATCH v1] KVM: x86/mmu: Prepare kvm_is_mmio_pfn() for PG_reserved changes Right now, ZONE_DEVICE memory is always set PG_reserved. We want to change that in the future. KVM has this weird use case that you can map anything from /dev/mem into the guest. pfn_valid() is not a reliable check whether the memmap was initialized and can be touched. pfn_to_online_page() makes sure that we have an initialized memmap - however, there is no reliable and fast check to detect memmaps that were initialized and are ZONE_DEVICE. Let's rewrite kvm_is_mmio_pfn() so we really only touch initialized memmaps that are guaranteed to not contain garbage. Make sure that RAM without a memmap is still not detected as MMIO and that ZONE_DEVICE that is not UC/UC-/WC is not detected as MMIO. Signed-off-by: David Hildenbrand <david@redhat.com> --- arch/x86/kvm/mmu.c | 38 ++++++++++++++++++++++---------------- 1 file changed, 22 insertions(+), 16 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 24c23c66b226..c91c9a5d14dc 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -2962,23 +2962,29 @@ static bool mmu_need_write_protect(struct kvm_vcpu *vcpu, gfn_t gfn, static bool kvm_is_mmio_pfn(kvm_pfn_t pfn) { - if (pfn_valid(pfn)) - return !is_zero_pfn(pfn) && PageReserved(pfn_to_page(pfn)) && - /* - * Some reserved pages, such as those from NVDIMM - * DAX devices, are not for MMIO, and can be mapped - * with cached memory type for better performance. - * However, the above check misconceives those pages - * as MMIO, and results in KVM mapping them with UC - * memory type, which would hurt the performance. - * Therefore, we check the host memory type in addition - * and only treat UC/UC-/WC pages as MMIO. - */ - (!pat_enabled() || pat_pfn_immune_to_uc_mtrr(pfn)); + struct page *page = pfn_to_online_page(pfn); + + /* + * Online pages consist of pages managed by the buddy. Especially, + * ZONE_DEVICE pages are never online. Online pages that are reserved + * indicate the zero page and MMIO pages. + */ + if (page) + return !is_zero_pfn(pfn) && PageReserved(pfn_to_page(pfn)); - return !e820__mapped_raw_any(pfn_to_hpa(pfn), - pfn_to_hpa(pfn + 1) - 1, - E820_TYPE_RAM); + /* + * Any RAM that is not online (e.g., mapped via /dev/mem without + * a memmap or with an uninitialized memmap) is not MMIO. + */ + if (e820__mapped_raw_any(pfn_to_hpa(pfn), pfn_to_hpa(pfn + 1) - 1, + E820_TYPE_RAM)) + return false; + + /* + * Finally, anything with a valid memmap could be ZONE_DEVICE - or the + * memmap could be uninitialized. Treat only UC/UC-/WC pages as MMIO. + */ + return pfn_valid() && !pat_enabled() || pat_pfn_immune_to_uc_mtrr(pfn); } /* Bits which may be returned by set_spte() */
On 22.10.19 10:32, David Hildenbrand wrote: > On 22.10.19 10:21, Michal Hocko wrote: >> On Tue 22-10-19 10:15:07, David Hildenbrand wrote: >>> On 22.10.19 10:08, Michal Hocko wrote: >>>> On Tue 22-10-19 08:52:28, David Hildenbrand wrote: >>>>> On 21.10.19 19:23, David Hildenbrand wrote: >>>>>> Two cleanups that popped up while working on (and discussing) virtio-mem: >>>>>> https://lkml.org/lkml/2019/9/19/463 >>>>>> >>>>>> Tested with DIMMs on x86. >>>>>> >>>>>> As discussed with michal in v1, I'll soon look into removing the use >>>>>> of PG_reserved during memory onlining completely - most probably >>>>>> disallowing to offline memory blocks with holes, cleaning up the >>>>>> onlining+offlining code. >>>>> >>>>> BTW, I remember that ZONE_DEVICE pages are still required to be set >>>>> PG_reserved. That has to be sorted out first. >>>> >>>> Do they? >>> >>> Yes, especially KVM code :/ >> >> Details please? >> Oh, and I think you might be wondering "how can we have RAM without a memmap in the guest", see https://lwn.net/Articles/778240/
On Tue 22-10-19 10:32:11, David Hildenbrand wrote:
[...]
> E.g., arch/x86/kvm/mmu.c:kvm_is_mmio_pfn()
Thanks for these references. I am not really familiar with kvm so I
cannot really comment on the specific code but I am wondering why
it simply doesn't check for ZONE_DEVICE explicitly? Also we do care
about holes in RAM (from the early boot), those should be reserved
already AFAIR. So we are left with hotplugged memory with holes and
I am not really sure we should bother with this until there is a clear
usecase in sight.
On 22.10.19 11:14, Michal Hocko wrote: > On Tue 22-10-19 10:32:11, David Hildenbrand wrote: > [...] >> E.g., arch/x86/kvm/mmu.c:kvm_is_mmio_pfn() > > Thanks for these references. I am not really familiar with kvm so I > cannot really comment on the specific code but I am wondering why > it simply doesn't check for ZONE_DEVICE explicitly? Also we do care > about holes in RAM (from the early boot), those should be reserved > already AFAIR. So we are left with hotplugged memory with holes and > I am not really sure we should bother with this until there is a clear > usecase in sight. Well, checking for ZONE_DEVICE is only possible if you have an initialized memmap. And that is not guaranteed when you start mapping random stuff into your guest via /dev/mem. I am reworking these patches right now and audit the whole kernel for PageReserved() checks that might affect ZONE_DEVICE. I'll send the collection of patches as RFC.
On Tue 22-10-19 11:17:24, David Hildenbrand wrote: > On 22.10.19 11:14, Michal Hocko wrote: > > On Tue 22-10-19 10:32:11, David Hildenbrand wrote: > > [...] > > > E.g., arch/x86/kvm/mmu.c:kvm_is_mmio_pfn() > > > > Thanks for these references. I am not really familiar with kvm so I > > cannot really comment on the specific code but I am wondering why > > it simply doesn't check for ZONE_DEVICE explicitly? Also we do care > > about holes in RAM (from the early boot), those should be reserved > > already AFAIR. So we are left with hotplugged memory with holes and > > I am not really sure we should bother with this until there is a clear > > usecase in sight. > > Well, checking for ZONE_DEVICE is only possible if you have an initialized > memmap. And that is not guaranteed when you start mapping random stuff into > your guest via /dev/mem. Yes, I can understand that part but checking PageReserved on an uninitialized memmap is pointless as well. So if you can test for it you can very well test for ZONE_DEVICE as well. PageReserved -> ZONE_DEVICE is a terrible assumption. > I am reworking these patches right now and audit the whole kernel for > PageReserved() checks that might affect ZONE_DEVICE. I'll send the > collection of patches as RFC. Thanks a lot!
On 22.10.19 11:24, Michal Hocko wrote: > On Tue 22-10-19 11:17:24, David Hildenbrand wrote: >> On 22.10.19 11:14, Michal Hocko wrote: >>> On Tue 22-10-19 10:32:11, David Hildenbrand wrote: >>> [...] >>>> E.g., arch/x86/kvm/mmu.c:kvm_is_mmio_pfn() >>> >>> Thanks for these references. I am not really familiar with kvm so I >>> cannot really comment on the specific code but I am wondering why >>> it simply doesn't check for ZONE_DEVICE explicitly? Also we do care >>> about holes in RAM (from the early boot), those should be reserved >>> already AFAIR. So we are left with hotplugged memory with holes and >>> I am not really sure we should bother with this until there is a clear >>> usecase in sight. >> >> Well, checking for ZONE_DEVICE is only possible if you have an initialized >> memmap. And that is not guaranteed when you start mapping random stuff into >> your guest via /dev/mem. > > Yes, I can understand that part but checking PageReserved on an > uninitialized memmap is pointless as well. So if you can test for it you That's why I add pfn_to_online_page() :) > can very well test for ZONE_DEVICE as well. PageReserved -> ZONE_DEVICE > is a terrible assumption. Indeed, it is. But there are more parts in the kernel that I'll be fixing. Stay tuned.