Message ID | 4991FD0D.1070108@goop.org (mailing list archive)
---|---
State | Accepted, archived
On Tue, Feb 10, 2009 at 02:17:49PM -0800, Jeremy Fitzhardinge wrote:
> Marcelo Tosatti wrote:
>> KVM's paravirt mmu pte batching has issues with, at least, kernel
>> updates from DEBUG_PAGEALLOC.
>>
>> This has been experienced with slab allocation from irq context from
>> within lazy mmu sections:
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=480822
>>
>> DEBUG_PAGEALLOC will map/unmap the kernel pagetables to catch bad
>> accesses, with code such as:
>>
>> __change_page_attr():
>>
>>         /*
>>          * Do we really change anything ?
>>          */
>>         if (pte_val(old_pte) != pte_val(new_pte)) {
>>                 set_pte_atomic(kpte, new_pte);
>>                 cpa->flags |= CPA_FLUSHTLB;
>>         }
>>
>> A present->nonpresent update can be queued, but not yet committed to
>> memory. So the set_pte_atomic will be skipped, but the update flushed
>> afterwards.
>
> Are you saying that there's a queued update which means that old_pte is
> a stale value which happens to equal new_pte, so new_pte is never set?
> OK, sounds like a generic problem, of the same sort we've had with
> kmap_atomic being used in interrupt routines in lazy mode.

Yes. It seems however that only set_pte_at/pte_update/_defer are used
under significantly long lazy mmu sections (long as in number of
updates). Is it worthwhile to bother (and risk) batching kernel pte
updates?

Until someone forgets about arch_flush_lazy_mmu_mode again...

> In this case, I think the proper fix is to call
> arch_flush_lazy_mmu_mode() before reading old_pte to make sure it's up
> to date, and calling it again when processing CPA_FLUSHTLB.
>
> Could you try the patch below instead?

It should work, yes.

> (BTW, set_pte_atomic doesn't mean synchronous; it just means it's safe
> to use on live ptes on 32-bit PAE machines which can't otherwise
> atomically update a pte.)

Doh, of course.
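A minimal userspace model of the lost update may make the interleaving
easier to see. All names here are hypothetical stand-ins; the real
players are the paravirt lazy-mmu queue and __change_page_attr():

    #include <stdbool.h>
    #include <stdio.h>

    static unsigned long pte;        /* the "in-memory" pte          */
    static unsigned long queued_val; /* a pending lazy-mode update   */
    static bool queued;

    /* Queue an update instead of applying it, as lazy mmu mode does. */
    static void lazy_set_pte(unsigned long val)
    {
            queued_val = val;
            queued = true;
    }

    /* Commit pending updates, as arch_flush_lazy_mmu_mode() does. */
    static void lazy_flush(void)
    {
            if (queued) {
                    pte = queued_val;
                    queued = false;
            }
    }

    int main(void)
    {
            pte = 0x1;              /* present */

            lazy_set_pte(0x0);      /* present->nonpresent, queued only */

            /* __change_page_attr()-style read-compare-write: */
            unsigned long old_pte = pte;   /* stale: still reads 0x1 */
            unsigned long new_pte = 0x1;   /* CPA wants "present"    */
            if (old_pte != new_pte)
                    pte = new_pte;         /* skipped: values match  */

            lazy_flush();           /* queued 0x0 lands *after* CPA */

            /* CPA believed the pte is 0x1; it is actually 0x0. */
            printf("pte = %#lx (CPA expected %#lx)\n", pte, new_pte);
            return 0;
    }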
Marcelo Tosatti wrote:
> On Tue, Feb 10, 2009 at 02:17:49PM -0800, Jeremy Fitzhardinge wrote:
>> Are you saying that there's a queued update which means that old_pte
>> is a stale value which happens to equal new_pte, so new_pte is never
>> set? OK, sounds like a generic problem, of the same sort we've had
>> with kmap_atomic being used in interrupt routines in lazy mode.
>
> Yes. It seems however that only set_pte_at/pte_update/_defer are used
> under significantly long lazy mmu sections (long as in number of
> updates). Is it worthwhile to bother (and risk) batching kernel pte
> updates?

Well, that depends on how expensive each update is. For something like
kunmap_atomic, I think combining the clear+tlb flush probably is
worthwhile.

> Until someone forgets about arch_flush_lazy_mmu_mode again...

It has been surprisingly unproblematic, until this CPA issue came to
light. But given that there are only a few "correct" ways to update the
kernel mappings now (kmap/vmap/vmalloc, kmap_atomic and cpa, I think),
it should be easy to cover all the bases. (Hm, better check vmap.)

    J
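The pattern under discussion, sketched below with stubbed bodies: the
hook names match the paravirt API of that era, but the implementations
here are userspace stand-ins, not the real hypercall-batching code.
Every lazy section is bracketed, and anything that needs committed pte
state mid-section must flush first:

    #include <stdio.h>

    static int pending;   /* stand-in for the queued hypercall batch */

    static void arch_enter_lazy_mmu_mode(void) { }

    static void arch_flush_lazy_mmu_mode(void)
    {
            if (pending) {
                    /* the real hook issues the batched hypercall here */
                    printf("committing %d queued updates\n", pending);
                    pending = 0;
            }
    }

    static void arch_leave_lazy_mmu_mode(void)
    {
            arch_flush_lazy_mmu_mode();     /* leaving always commits */
    }

    static void set_pte_stub(void) { pending++; } /* queued, not applied */

    int main(void)
    {
            arch_enter_lazy_mmu_mode();
            for (int i = 0; i < 4; i++)
                    set_pte_stub();
            /*
             * Anything here that reads pte state -- CPA, a kmap_atomic
             * taken in an interrupt -- must flush first, which is the
             * invariant the patch below restores for CPA.
             */
            arch_flush_lazy_mmu_mode();
            arch_leave_lazy_mmu_mode();
            return 0;
    }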
Jeremy Fitzhardinge wrote:
>> Yes. It seems however that only set_pte_at/pte_update/_defer are used
>> under significantly long lazy mmu sections (long as in number of
>> updates). Is it worthwhile to bother (and risk) batching kernel pte
>> updates?
>
> Well, that depends on how expensive each update is. For something like
> kunmap_atomic, I think combining the clear+tlb flush probably is
> worthwhile.

I agree, kmap_atomic() is fairly common.
Avi Kivity wrote:
> Jeremy Fitzhardinge wrote:
>>> Yes. It seems however that only set_pte_at/pte_update/_defer are
>>> used under significantly long lazy mmu sections (long as in number
>>> of updates). Is it worthwhile to bother (and risk) batching kernel
>>> pte updates?
>>
>> Well, that depends on how expensive each update is. For something
>> like kunmap_atomic, I think combining the clear+tlb flush probably
>> is worthwhile.
>
> I agree, kmap_atomic() is fairly common.

(Not that we're actually batching it at present; we need to work out
proper semantics for nesting batches...)

    J
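One possible shape for those nesting semantics, as a purely hypothetical
sketch (nothing like this is in the tree): a depth counter so that only
the outermost leave commits, while an explicit flush stays legal at any
depth:

    #include <assert.h>

    static int lazy_depth;   /* per-CPU in a real implementation */
    static int pending;      /* queued updates                   */

    static void flush_batch(void)
    {
            /* the batched hypercall would go here */
            pending = 0;
    }

    static void enter_lazy(void)
    {
            lazy_depth++;
    }

    static void leave_lazy(void)
    {
            assert(lazy_depth > 0);
            if (--lazy_depth == 0)
                    flush_batch();   /* only the outermost leave commits */
    }

    int main(void)
    {
            enter_lazy();            /* e.g. a long set_pte_at() loop   */
            enter_lazy();            /*   nested: a kmap_atomic batch   */
            pending += 2;            /*   (clear + tlb flush, combined) */
            leave_lazy();            /* no flush yet: depth is still 1  */
            pending += 1;
            leave_lazy();            /* commits everything              */
            return 0;
    }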
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 84ba748..fb12f06 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -576,6 +576,13 @@ static int __change_page_attr(struct cpa_data *cpa, int primary)
 	else
 		address = *cpa->vaddr;
 
+	/*
+	 * If we're called with lazy mmu updates enabled, the
+	 * in-memory pte state may be stale. Flush pending updates to
+	 * bring them up to date.
+	 */
+	arch_flush_lazy_mmu_mode();
+
 repeat:
 	kpte = lookup_address(address, &level);
 	if (!kpte)
@@ -854,6 +861,13 @@ static int change_page_attr_set_clr(unsigned long *addr, int numpages,
 	} else
 		cpa_flush_all(cache);
 
+	/*
+	 * If we've been called with lazy mmu updates enabled, then
+	 * make sure that everything gets flushed out before we
+	 * return.
+	 */
+	arch_flush_lazy_mmu_mode();
+
 out:
 	return ret;
 }
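Mirroring the patch's first hunk in the userspace model above (an
excerpt replacing the read-compare-write lines there, not standalone
code) shows the race closing: the flush commits the queued
present->nonpresent update before CPA samples the pte, so the
comparison no longer sees a stale match:

            lazy_flush();                /* first hunk: commit queued 0x0 */
            unsigned long old_pte = pte; /* now reads 0x0, not stale 0x1  */
            unsigned long new_pte = 0x1;
            if (old_pte != new_pte)      /* 0x0 != 0x1                    */
                    pte = new_pte;       /* CPA's store actually happens  */

The second hunk covers the other direction: CPA's own updates are
committed before change_page_attr_set_clr() returns, so no caller ever
sees them still queued.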