Message ID | CALCETrUPa9xvugPNcTmShJFfgSesa31dD-wy0hY3XnH1Knjn6g@mail.gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 04/26/2017 04:52 PM, Andy Lutomirski wrote: > I was trying to understand xen_drop_mm_ref() to update it for some > changes I'm working on, and I'm wondering whether we need > xen_exit_mmap() at all. > > AFAICS the intent is to force all CPUs to drop their lazy uses of the > mm being destroyed so it can be unpinned before tearing down the page > tables, thus making it faster to tear down the page tables. This > seems like it'll speed up xen_set_pud() and xen_set_pmd(), but this > seems like it may be of rather limited value. Why do you think it's of limited value? Without it we will end up with a hypercall for each update. Or is your point that the number of those update is relatively small when we are tearing down? > Could we get away with > deleting it? > > Also, this code in drop_other_mm_ref() looks dubious to me: > > /* If this cpu still has a stale cr3 reference, then make sure > it has been flushed. */ > if (this_cpu_read(xen_current_cr3) == __pa(mm->pgd)) > load_cr3(swapper_pg_dir); > > If cr3 hasn't been flushed to the hypervisor because we're in a lazy > mode, why would load_cr3() help? Shouldn't this be xen_mc_flush() > instead? load_cr3() actually ends with xen_mc_flush() by way of xen_write_cr3() -> xen_mc_issue(). -boris
On Wed, Apr 26, 2017 at 3:45 PM, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote: > On 04/26/2017 04:52 PM, Andy Lutomirski wrote: >> I was trying to understand xen_drop_mm_ref() to update it for some >> changes I'm working on, and I'm wondering whether we need >> xen_exit_mmap() at all. >> >> AFAICS the intent is to force all CPUs to drop their lazy uses of the >> mm being destroyed so it can be unpinned before tearing down the page >> tables, thus making it faster to tear down the page tables. This >> seems like it'll speed up xen_set_pud() and xen_set_pmd(), but this >> seems like it may be of rather limited value. > > Why do you think it's of limited value? Without it we will end up with a > hypercall for each update. > > Or is your point that the number of those update is relatively small > when we are tearing down? The latter. Also, unless I'm missing something, xen_set_pte() doesn't have the optimization. I haven't looked at exactly how page table teardown works, but if it clears each PTE individually, then that's the bulk of the work. > > >> Could we get away with >> deleting it? >> >> Also, this code in drop_other_mm_ref() looks dubious to me: >> >> /* If this cpu still has a stale cr3 reference, then make sure >> it has been flushed. */ >> if (this_cpu_read(xen_current_cr3) == __pa(mm->pgd)) >> load_cr3(swapper_pg_dir); >> >> If cr3 hasn't been flushed to the hypervisor because we're in a lazy >> mode, why would load_cr3() help? Shouldn't this be xen_mc_flush() >> instead? > > load_cr3() actually ends with xen_mc_flush() by way of xen_write_cr3() > -> xen_mc_issue(). xen_mc_issue() does: if ((paravirt_get_lazy_mode() & mode) == 0) xen_mc_flush(); I assume the load_cr3() is intended to deal with the case where we're in lazy mode, but we'll still be in lazy mode, right? Or does it serve some other purpose? --Andy
On 04/26/2017 06:49 PM, Andy Lutomirski wrote: > On Wed, Apr 26, 2017 at 3:45 PM, Boris Ostrovsky > <boris.ostrovsky@oracle.com> wrote: >> On 04/26/2017 04:52 PM, Andy Lutomirski wrote: >>> I was trying to understand xen_drop_mm_ref() to update it for some >>> changes I'm working on, and I'm wondering whether we need >>> xen_exit_mmap() at all. >>> >>> AFAICS the intent is to force all CPUs to drop their lazy uses of the >>> mm being destroyed so it can be unpinned before tearing down the page >>> tables, thus making it faster to tear down the page tables. This >>> seems like it'll speed up xen_set_pud() and xen_set_pmd(), but this >>> seems like it may be of rather limited value. >> >> Why do you think it's of limited value? Without it we will end up with a >> hypercall for each update. >> >> Or is your point that the number of those update is relatively small >> when we are tearing down? > > The latter. Also, unless I'm missing something, xen_set_pte() doesn't > have the optimization. I haven't looked at exactly how page table > teardown works, but if it clears each PTE individually, then that's > the bulk of the work. > >> >> >>> Could we get away with >>> deleting it? >>> >>> Also, this code in drop_other_mm_ref() looks dubious to me: >>> >>> /* If this cpu still has a stale cr3 reference, then make sure >>> it has been flushed. */ >>> if (this_cpu_read(xen_current_cr3) == __pa(mm->pgd)) >>> load_cr3(swapper_pg_dir); >>> >>> If cr3 hasn't been flushed to the hypervisor because we're in a lazy >>> mode, why would load_cr3() help? Shouldn't this be xen_mc_flush() >>> instead? >> >> load_cr3() actually ends with xen_mc_flush() by way of xen_write_cr3() >> -> xen_mc_issue(). > > xen_mc_issue() does: > > if ((paravirt_get_lazy_mode() & mode) == 0) > xen_mc_flush(); > > I assume the load_cr3() is intended to deal with the case where we're > in lazy mode, but we'll still be in lazy mode, right? Or does it > serve some other purpose? Of course. I can't read (I ignored the "== 0" part). Apparently the early version had an explicit flush but then it disappeared (commit 9f79991d4186089e228274196413572cc000143b). The point of CR3 loading here, I believe, is to make sure the hypervisor knows that the (v)CPU is no longer using the the mm's cr3 (we are loading swapper_pgdir here). -boris
On Wed, Apr 26, 2017 at 5:55 PM, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote: > > > On 04/26/2017 06:49 PM, Andy Lutomirski wrote: >> >> On Wed, Apr 26, 2017 at 3:45 PM, Boris Ostrovsky >> <boris.ostrovsky@oracle.com> wrote: >>> >>> On 04/26/2017 04:52 PM, Andy Lutomirski wrote: >>>> >>>> I was trying to understand xen_drop_mm_ref() to update it for some >>>> changes I'm working on, and I'm wondering whether we need >>>> xen_exit_mmap() at all. >>>> >>>> AFAICS the intent is to force all CPUs to drop their lazy uses of the >>>> mm being destroyed so it can be unpinned before tearing down the page >>>> tables, thus making it faster to tear down the page tables. This >>>> seems like it'll speed up xen_set_pud() and xen_set_pmd(), but this >>>> seems like it may be of rather limited value. >>> >>> >>> Why do you think it's of limited value? Without it we will end up with a >>> hypercall for each update. >>> >>> Or is your point that the number of those update is relatively small >>> when we are tearing down? >> >> >> The latter. Also, unless I'm missing something, xen_set_pte() doesn't >> have the optimization. I haven't looked at exactly how page table >> teardown works, but if it clears each PTE individually, then that's >> the bulk of the work. >> >>> >>> >>>> Could we get away with >>>> deleting it? >>>> >>>> Also, this code in drop_other_mm_ref() looks dubious to me: >>>> >>>> /* If this cpu still has a stale cr3 reference, then make sure >>>> it has been flushed. */ >>>> if (this_cpu_read(xen_current_cr3) == __pa(mm->pgd)) >>>> load_cr3(swapper_pg_dir); >>>> >>>> If cr3 hasn't been flushed to the hypervisor because we're in a lazy >>>> mode, why would load_cr3() help? Shouldn't this be xen_mc_flush() >>>> instead? >>> >>> >>> load_cr3() actually ends with xen_mc_flush() by way of xen_write_cr3() >>> -> xen_mc_issue(). >> >> >> xen_mc_issue() does: >> >> if ((paravirt_get_lazy_mode() & mode) == 0) >> xen_mc_flush(); >> >> I assume the load_cr3() is intended to deal with the case where we're >> in lazy mode, but we'll still be in lazy mode, right? Or does it >> serve some other purpose? > > > Of course. I can't read (I ignored the "== 0" part). > > Apparently the early version had an explicit flush but then it disappeared > (commit 9f79991d4186089e228274196413572cc000143b). > > The point of CR3 loading here, I believe, is to make sure the hypervisor > knows that the (v)CPU is no longer using the the mm's cr3 (we are loading > swapper_pgdir here). But that's what leave_mm() does. To be fair, the x86 lazy TLB management is a big mess, and this came up because I'm trying to clean it up without removing it. I suppose I can try to keep xen_exit_mmap() working. Is there a simple way to try to unpin but to not treat it as an error if the hypervisor rejects it?
* Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote: > > xen_mc_issue() does: > > > > if ((paravirt_get_lazy_mode() & mode) == 0) > > xen_mc_flush(); > > > > I assume the load_cr3() is intended to deal with the case where we're > > in lazy mode, but we'll still be in lazy mode, right? Or does it > > serve some other purpose? > > Of course. I can't read (I ignored the "== 0" part). Ha, ob'sidenote: the preferred form to write this is: if (!(paravirt_get_lazy_mode() & mode)) xen_mc_flush(); ... exactly due to the readability problem you ran into: a 'pre' negation is much easier to read, plus '==' tends to trigger 'equal to' attributes in the brain, which is the opposite of negation. So it's very easy to mis-read such syntactic constructs even if they are technically correct. I think '== 0' should be forbidden in all cases where the purpose is a logic operation and should be used strictly only in cases where we do explicit integer arithmetics. (Bools and '== false' are suboptimal for similar reasons.) ... but I digress! Ingo
>>> On 27.04.17 at 02:55, <boris.ostrovsky@oracle.com> wrote: > The point of CR3 loading here, I believe, is to make sure the hypervisor > knows that the (v)CPU is no longer using the the mm's cr3 (we are > loading swapper_pgdir here). Correct, or else there would still be a non-zero refcount for the page tables hanging off of that CR3, disallowing those pages to become writable, and in turn disallowing the use of direct writes instead of hypercalls to clear the individual entries (or to be precise, direct writes would still be allowed, but their use would be even slower than using hypercalls because they would trap into the hypervisor for emulation). Jan
>>>> >>>>> Also, this code in drop_other_mm_ref() looks dubious to me: >>>>> >>>>> /* If this cpu still has a stale cr3 reference, then make sure >>>>> it has been flushed. */ >>>>> if (this_cpu_read(xen_current_cr3) == __pa(mm->pgd)) >>>>> load_cr3(swapper_pg_dir); >>>>> >>>>> If cr3 hasn't been flushed to the hypervisor because we're in a lazy >>>>> mode, why would load_cr3() help? Shouldn't this be xen_mc_flush() >>>>> instead? >>>> >>>> load_cr3() actually ends with xen_mc_flush() by way of xen_write_cr3() >>>> -> xen_mc_issue(). >>> >>> xen_mc_issue() does: >>> >>> if ((paravirt_get_lazy_mode() & mode) == 0) >>> xen_mc_flush(); >>> >>> I assume the load_cr3() is intended to deal with the case where we're >>> in lazy mode, but we'll still be in lazy mode, right? Or does it >>> serve some other purpose? >> >> Of course. I can't read (I ignored the "== 0" part). >> >> Apparently the early version had an explicit flush but then it disappeared >> (commit 9f79991d4186089e228274196413572cc000143b). >> >> The point of CR3 loading here, I believe, is to make sure the hypervisor >> knows that the (v)CPU is no longer using the the mm's cr3 (we are loading >> swapper_pgdir here). > But that's what leave_mm() does. To be fair, the x86 lazy TLB > management is a big mess, and this came up because I'm trying to clean > it up without removing it. True. I don't know though if you can guarantee that leave_mm() (or load_cr3() inside it) is actually called if we are in lazy mode. > > I suppose I can try to keep xen_exit_mmap() working. Is there a > simple way to try to unpin but to not treat it as an error if the > hypervisor rejects it? Even if we managed to craft a call in Linux to do this (current xen_pgd_unpin() will result in a WARNing in xen_mc_flush()) this will still cause a bunch of warnings in the hypervisor (if it is built as DEBUG, but bad nevertheless). But even without that, it is an error for a reason so how are you planning to continue if you ignore it? -boris
On Thu, Apr 27, 2017 at 6:21 AM, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote: > >>>>> >>>>>> Also, this code in drop_other_mm_ref() looks dubious to me: >>>>>> >>>>>> /* If this cpu still has a stale cr3 reference, then make sure >>>>>> it has been flushed. */ >>>>>> if (this_cpu_read(xen_current_cr3) == __pa(mm->pgd)) >>>>>> load_cr3(swapper_pg_dir); >>>>>> >>>>>> If cr3 hasn't been flushed to the hypervisor because we're in a lazy >>>>>> mode, why would load_cr3() help? Shouldn't this be xen_mc_flush() >>>>>> instead? >>>>> >>>>> load_cr3() actually ends with xen_mc_flush() by way of xen_write_cr3() >>>>> -> xen_mc_issue(). >>>> >>>> xen_mc_issue() does: >>>> >>>> if ((paravirt_get_lazy_mode() & mode) == 0) >>>> xen_mc_flush(); >>>> >>>> I assume the load_cr3() is intended to deal with the case where we're >>>> in lazy mode, but we'll still be in lazy mode, right? Or does it >>>> serve some other purpose? >>> >>> Of course. I can't read (I ignored the "== 0" part). >>> >>> Apparently the early version had an explicit flush but then it disappeared >>> (commit 9f79991d4186089e228274196413572cc000143b). >>> >>> The point of CR3 loading here, I believe, is to make sure the hypervisor >>> knows that the (v)CPU is no longer using the the mm's cr3 (we are loading >>> swapper_pgdir here). >> But that's what leave_mm() does. To be fair, the x86 lazy TLB >> management is a big mess, and this came up because I'm trying to clean >> it up without removing it. > > True. I don't know though if you can guarantee that leave_mm() (or > load_cr3() inside it) is actually called if we are in lazy mode. The code just before that makes these calls. Anyway, I propose to rewrite the whole thing like this: https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/tlbflush_cleanup&id=ff143a54bb3bafaaad6e32145a9cfbc112e8584f > >> >> I suppose I can try to keep xen_exit_mmap() working. Is there a >> simple way to try to unpin but to not treat it as an error if the >> hypervisor rejects it? > > Even if we managed to craft a call in Linux to do this (current > xen_pgd_unpin() will result in a WARNing in xen_mc_flush()) this will > still cause a bunch of warnings in the hypervisor (if it is built as > DEBUG, but bad nevertheless). > > But even without that, it is an error for a reason so how are you > planning to continue if you ignore it? > I was imagining that we'd just try to unpin and carry on if it fails. We can always unpin later in xen_pgd_free(). --Andy
On 04/27/2017 12:46 PM, Andy Lutomirski wrote: > On Thu, Apr 27, 2017 at 6:21 AM, Boris Ostrovsky > <boris.ostrovsky@oracle.com> wrote: >>>>>>> Also, this code in drop_other_mm_ref() looks dubious to me: >>>>>>> >>>>>>> /* If this cpu still has a stale cr3 reference, then make sure >>>>>>> it has been flushed. */ >>>>>>> if (this_cpu_read(xen_current_cr3) == __pa(mm->pgd)) >>>>>>> load_cr3(swapper_pg_dir); >>>>>>> >>>>>>> If cr3 hasn't been flushed to the hypervisor because we're in a lazy >>>>>>> mode, why would load_cr3() help? Shouldn't this be xen_mc_flush() >>>>>>> instead? >>>>>> load_cr3() actually ends with xen_mc_flush() by way of xen_write_cr3() >>>>>> -> xen_mc_issue(). >>>>> xen_mc_issue() does: >>>>> >>>>> if ((paravirt_get_lazy_mode() & mode) == 0) >>>>> xen_mc_flush(); >>>>> >>>>> I assume the load_cr3() is intended to deal with the case where we're >>>>> in lazy mode, but we'll still be in lazy mode, right? Or does it >>>>> serve some other purpose? >>>> Of course. I can't read (I ignored the "== 0" part). >>>> >>>> Apparently the early version had an explicit flush but then it disappeared >>>> (commit 9f79991d4186089e228274196413572cc000143b). >>>> >>>> The point of CR3 loading here, I believe, is to make sure the hypervisor >>>> knows that the (v)CPU is no longer using the the mm's cr3 (we are loading >>>> swapper_pgdir here). >>> But that's what leave_mm() does. To be fair, the x86 lazy TLB >>> management is a big mess, and this came up because I'm trying to clean >>> it up without removing it. >> True. I don't know though if you can guarantee that leave_mm() (or >> load_cr3() inside it) is actually called if we are in lazy mode. > The code just before that makes these calls. Yes, and I was unsure whether we always get to make these calls, based on mm and cpu_tlbstate. I think we do and with your changes it is made even more clear. > > Anyway, I propose to rewrite the whole thing like this: > > https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/tlbflush_cleanup&id=ff143a54bb3bafaaad6e32145a9cfbc112e8584f Can you explain xen_pgd_free() change? When do you expect xen_exit_mmap() to fail unpinning (compared to what we have now)? -boris
On Thu, Apr 27, 2017 at 1:00 PM, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote: > On 04/27/2017 12:46 PM, Andy Lutomirski wrote: >> On Thu, Apr 27, 2017 at 6:21 AM, Boris Ostrovsky >> <boris.ostrovsky@oracle.com> wrote: >>>>>>>> Also, this code in drop_other_mm_ref() looks dubious to me: >>>>>>>> >>>>>>>> /* If this cpu still has a stale cr3 reference, then make sure >>>>>>>> it has been flushed. */ >>>>>>>> if (this_cpu_read(xen_current_cr3) == __pa(mm->pgd)) >>>>>>>> load_cr3(swapper_pg_dir); >>>>>>>> >>>>>>>> If cr3 hasn't been flushed to the hypervisor because we're in a lazy >>>>>>>> mode, why would load_cr3() help? Shouldn't this be xen_mc_flush() >>>>>>>> instead? >>>>>>> load_cr3() actually ends with xen_mc_flush() by way of xen_write_cr3() >>>>>>> -> xen_mc_issue(). >>>>>> xen_mc_issue() does: >>>>>> >>>>>> if ((paravirt_get_lazy_mode() & mode) == 0) >>>>>> xen_mc_flush(); >>>>>> >>>>>> I assume the load_cr3() is intended to deal with the case where we're >>>>>> in lazy mode, but we'll still be in lazy mode, right? Or does it >>>>>> serve some other purpose? >>>>> Of course. I can't read (I ignored the "== 0" part). >>>>> >>>>> Apparently the early version had an explicit flush but then it disappeared >>>>> (commit 9f79991d4186089e228274196413572cc000143b). >>>>> >>>>> The point of CR3 loading here, I believe, is to make sure the hypervisor >>>>> knows that the (v)CPU is no longer using the the mm's cr3 (we are loading >>>>> swapper_pgdir here). >>>> But that's what leave_mm() does. To be fair, the x86 lazy TLB >>>> management is a big mess, and this came up because I'm trying to clean >>>> it up without removing it. >>> True. I don't know though if you can guarantee that leave_mm() (or >>> load_cr3() inside it) is actually called if we are in lazy mode. >> The code just before that makes these calls. > > Yes, and I was unsure whether we always get to make these calls, based > on mm and cpu_tlbstate. I think we do and with your changes it is made > even more clear. :) > >> >> Anyway, I propose to rewrite the whole thing like this: >> >> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/tlbflush_cleanup&id=ff143a54bb3bafaaad6e32145a9cfbc112e8584f > > Can you explain xen_pgd_free() change? When do you expect > xen_exit_mmap() to fail unpinning (compared to what we have now)? I don't expect it to fail, but I made fairly extensive changes. --Andy
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c index 37cb5aad71de..e4e073844cbf 100644 --- a/arch/x86/xen/mmu.c +++ b/arch/x86/xen/mmu.c @@ -998,101 +998,6 @@ static void xen_dup_mmap(struct mm_struct *oldmm, struct mm_struct *mm) spin_unlock(&mm->page_table_lock); } - -#ifdef CONFIG_SMP -/* Another cpu may still have their %cr3 pointing at the pagetable, so - we need to repoint it somewhere else before we can unpin it. */ -static void drop_other_mm_ref(void *info) -{ - struct mm_struct *mm = info; - struct mm_struct *active_mm; - - active_mm = this_cpu_read(cpu_tlbstate.active_mm); - - if (active_mm == mm && this_cpu_read(cpu_tlbstate.state) != TLBSTATE_OK) - leave_mm(smp_processor_id()); - - /* If this cpu still has a stale cr3 reference, then make sure - it has been flushed. */ - if (this_cpu_read(xen_current_cr3) == __pa(mm->pgd)) - load_cr3(swapper_pg_dir); -} - -static void xen_drop_mm_ref(struct mm_struct *mm) -{ - cpumask_var_t mask; - unsigned cpu; - - if (current->active_mm == mm) { - if (current->mm == mm) - load_cr3(swapper_pg_dir); - else - leave_mm(smp_processor_id()); - } - - /* Get the "official" set of cpus referring to our pagetable. */ - if (!alloc_cpumask_var(&mask, GFP_ATOMIC)) { - for_each_online_cpu(cpu) { - if (!cpumask_test_cpu(cpu, mm_cpumask(mm)) - && per_cpu(xen_current_cr3, cpu) != __pa(mm->pgd)) - continue; - smp_call_function_single(cpu, drop_other_mm_ref, mm, 1); - } - return; - } - cpumask_copy(mask, mm_cpumask(mm)); - - /* It's possible that a vcpu may have a stale reference to our - cr3, because its in lazy mode, and it hasn't yet flushed - its set of pending hypercalls yet. In this case, we can - look at its actual current cr3 value, and force it to flush - if needed. */ - for_each_online_cpu(cpu) { - if (per_cpu(xen_current_cr3, cpu) == __pa(mm->pgd)) - cpumask_set_cpu(cpu, mask); - } - - if (!cpumask_empty(mask)) - smp_call_function_many(mask, drop_other_mm_ref, mm, 1); - free_cpumask_var(mask); -} -#else -static void xen_drop_mm_ref(struct mm_struct *mm) -{ - if (current->active_mm == mm) - load_cr3(swapper_pg_dir); -} -#endif - -/* - * While a process runs, Xen pins its pagetables, which means that the - * hypervisor forces it to be read-only, and it controls all updates - * to it. This means that all pagetable updates have to go via the - * hypervisor, which is moderately expensive. - * - * Since we're pulling the pagetable down, we switch to use init_mm, - * unpin old process pagetable and mark it all read-write, which - * allows further operations on it to be simple memory accesses. - * - * The only subtle point is that another CPU may be still using the - * pagetable because of lazy tlb flushing. This means we need need to - * switch all CPUs off this pagetable before we can unpin it. - */ -static void xen_exit_mmap(struct mm_struct *mm) -{ - get_cpu(); /* make sure we don't move around */ - xen_drop_mm_ref(mm); - put_cpu(); - - spin_lock(&mm->page_table_lock); - - /* pgd may not be pinned in the error exit path of execve */ - if (xen_page_pinned(mm->pgd)) - xen_pgd_unpin(mm); - - spin_unlock(&mm->page_table_lock); -} - static void xen_post_allocator_init(void);