Message ID | 5010C083.30102@linux.vnet.ibm.com (mailing list archive) |
---|---|
State | New, archived |
On 07/26/2012 06:58 AM, Xiao Guangrong wrote:
> Currently, kvm allocates some pages and uses them as error indicators,
> which wastes memory and is not good for scalability.
>
> Based on Avi's suggestion, we use error codes instead of these pages
> to indicate the error conditions.
>
>
> +static pfn_t get_bad_pfn(void)
> +{
> +	return -ENOENT;
> +}
> +
> +pfn_t get_fault_pfn(void)
> +{
> +	return -EFAULT;
> +}
> +EXPORT_SYMBOL_GPL(get_fault_pfn);
> +
> +static pfn_t get_hwpoison_pfn(void)
> +{
> +	return -EHWPOISON;
> +}
> +

Would be better as #defines

>  int is_hwpoison_pfn(pfn_t pfn)
>  {
> -	return pfn == hwpoison_pfn;
> +	return pfn == -EHWPOISON;
>  }
>  EXPORT_SYMBOL_GPL(is_hwpoison_pfn);
>
>  int is_noslot_pfn(pfn_t pfn)
>  {
> -	return pfn == bad_pfn;
> +	return pfn == -ENOENT;
>  }
>  EXPORT_SYMBOL_GPL(is_noslot_pfn);
>
>  int is_invalid_pfn(pfn_t pfn)
>  {
> -	return pfn == hwpoison_pfn || pfn == fault_pfn;
> +	return !is_noslot_pfn(pfn) && is_error_pfn(pfn);
>  }
>  EXPORT_SYMBOL_GPL(is_invalid_pfn);
>

So is_*_pfn() could go away and be replaced by ==.

>
>  EXPORT_SYMBOL_GPL(gfn_to_page);
>
>  void kvm_release_page_clean(struct page *page)
>  {
> -	kvm_release_pfn_clean(page_to_pfn(page));
> +	if (!is_error_page(page))
> +		kvm_release_pfn_clean(page_to_pfn(page));
>  }
>  EXPORT_SYMBOL_GPL(kvm_release_page_clean);

Note, we can remove calls to kvm_release_page_clean() from error paths
now, so in the future we can drop the test.

Since my comments are better done as a separate patch, I applied all
three patches. Thanks!
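The two suggestions in the message above (constants instead of getter functions, and plain `==` instead of `is_*_pfn()` helpers) could look roughly like the following. This is a minimal userspace sketch, not the follow-up patch: the `PFN_ERR_*` names, `describe_pfn()` and `check_encodings_distinct()` are hypothetical, `pfn_t` is modeled as `uint64_t`, and a fallback `EHWPOISON` value is supplied for C libraries that do not define it.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's pfn_t (assumption: 64-bit unsigned). */
typedef uint64_t pfn_t;

#ifndef EHWPOISON
#define EHWPOISON 133	/* common Linux value; used only if libc lacks it */
#endif

/* Hypothetical names: error pfns as constants rather than getter functions. */
#define PFN_ERR_NOSLOT   ((pfn_t)-ENOENT)
#define PFN_ERR_FAULT    ((pfn_t)-EFAULT)
#define PFN_ERR_HWPOISON ((pfn_t)-EHWPOISON)

/* Sanity check: the encodings must be distinct for == to classify them. */
static void check_encodings_distinct(void)
{
	assert(PFN_ERR_NOSLOT != PFN_ERR_FAULT);
	assert(PFN_ERR_FAULT != PFN_ERR_HWPOISON);
	assert(PFN_ERR_NOSLOT != PFN_ERR_HWPOISON);
}

/* Hypothetical caller: classifies a pfn with plain == comparisons. */
static const char *describe_pfn(pfn_t pfn)
{
	if (pfn == PFN_ERR_NOSLOT)
		return "noslot";
	if (pfn == PFN_ERR_FAULT)
		return "fault";
	if (pfn == PFN_ERR_HWPOISON)
		return "hwpoison";
	return "ok";
}
```

With constants, each caller compares against the value it cares about directly, so no exported helper is needed per error kind.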
On Thu, 26 Jul 2012 11:56:15 +0300
Avi Kivity <avi@redhat.com> wrote:

> Since my comments are better done as a separate patch, I applied all
> three patches. Thanks!

Is this patch really safe for all architectures?

IS_ERR_VALUE() casts -MAX_ERRNO to unsigned long and then does the comparison.
Isn't it possible to conflict with valid pfns?

What are the underlying assumptions?

	Takuya
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On 07/26/2012 04:56 PM, Avi Kivity wrote:
> On 07/26/2012 06:58 AM, Xiao Guangrong wrote:
>> Currently, kvm allocates some pages and uses them as error indicators,
>> which wastes memory and is not good for scalability.
>>
>> Based on Avi's suggestion, we use error codes instead of these pages
>> to indicate the error conditions.
>>
>>
>> +static pfn_t get_bad_pfn(void)
>> +{
>> +	return -ENOENT;
>> +}
>> +
>> +pfn_t get_fault_pfn(void)
>> +{
>> +	return -EFAULT;
>> +}
>> +EXPORT_SYMBOL_GPL(get_fault_pfn);
>> +
>> +static pfn_t get_hwpoison_pfn(void)
>> +{
>> +	return -EHWPOISON;
>> +}
>> +
>
> Would be better as #defines

Okay.

>
>>  int is_hwpoison_pfn(pfn_t pfn)
>>  {
>> -	return pfn == hwpoison_pfn;
>> +	return pfn == -EHWPOISON;
>>  }
>>  EXPORT_SYMBOL_GPL(is_hwpoison_pfn);
>>
>>  int is_noslot_pfn(pfn_t pfn)
>>  {
>> -	return pfn == bad_pfn;
>> +	return pfn == -ENOENT;
>>  }
>>  EXPORT_SYMBOL_GPL(is_noslot_pfn);
>>
>>  int is_invalid_pfn(pfn_t pfn)
>>  {
>> -	return pfn == hwpoison_pfn || pfn == fault_pfn;
>> +	return !is_noslot_pfn(pfn) && is_error_pfn(pfn);
>>  }
>>  EXPORT_SYMBOL_GPL(is_invalid_pfn);
>>
>
> So is_*_pfn() could go away and be replaced by ==.
>

Okay.

>>
>>  EXPORT_SYMBOL_GPL(gfn_to_page);
>>
>>  void kvm_release_page_clean(struct page *page)
>>  {
>> -	kvm_release_pfn_clean(page_to_pfn(page));
>> +	if (!is_error_page(page))
>> +		kvm_release_pfn_clean(page_to_pfn(page));
>>  }
>>  EXPORT_SYMBOL_GPL(kvm_release_page_clean);
>
> Note, we can remove calls to kvm_release_page_clean() from error paths
> now, so in the future we can drop the test.
>

Right, since the release path (kvm_release_page_clean) is used in many
places and on many architectures, I kept the change as small as possible
to make it easy to review.

> Since my comments are better done as a separate patch,

Yes, I will make a patch that applies all your comments.

Thank you, Avi!
On 07/26/2012 05:20 PM, Takuya Yoshikawa wrote:
> On Thu, 26 Jul 2012 11:56:15 +0300
> Avi Kivity <avi@redhat.com> wrote:
>
>> Since my comments are better done as a separate patch, I applied all
>> three patches. Thanks!
>
> Is this patch really safe for all architectures?
>
> IS_ERR_VALUE() casts -MAX_ERRNO to unsigned long and then does the comparison.
> Isn't it possible to conflict with valid pfns?
>

See IS_ERR_VALUE():

#define IS_ERR_VALUE(x) unlikely((x) >= (unsigned long)-MAX_ERRNO)

The minimal value of an error code is 0xfffff001 on 32-bit and
0xfffffffffffff001 on 64-bit, which is far larger than any valid pfn
(for a pfn, the top 12 bits are always 0).

Note, PAE is a special case, but only 64G of physical memory is valid
there, so 0xfffff001 is also suitable for it.
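The range argument above can be checked with a small userspace model. This is a sketch under stated assumptions, not kernel code: `IS_ERR_VALUE` is copied without `unlikely()` (which is only a compiler hint), `pfn_t` is modeled as `uint64_t`, `PAGE_SHIFT` is assumed to be 12 (4 KiB pages), and `max_pfn_for()` is a hypothetical helper introduced for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Userspace copy of the kernel check, minus the unlikely() hint. */
#define IS_ERR_VALUE(x) ((x) >= (unsigned long)-MAX_ERRNO)

/* Assumptions: 64-bit pfn_t and 4 KiB pages. */
typedef uint64_t pfn_t;
#define PAGE_SHIFT 12

/* Hypothetical helper: largest pfn addressable with phys_bits address bits. */
static pfn_t max_pfn_for(unsigned phys_bits)
{
	assert(phys_bits > PAGE_SHIFT && phys_bits < 64);
	return (((pfn_t)1 << phys_bits) >> PAGE_SHIFT) - 1;
}

static bool is_error_pfn(pfn_t pfn)
{
	return IS_ERR_VALUE(pfn);
}
```

For the PAE case mentioned above, `max_pfn_for(36)` is `0xffffff` (64G of physical memory in 4 KiB pages), which sits well below `0xfffff001`, so `is_error_pfn()` reports it as a valid pfn, while `(pfn_t)-2` (the `-ENOENT` encoding) is reported as an error.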
On Thu, 26 Jul 2012 17:35:13 +0800
Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> wrote:

> > Is this patch really safe for all architectures?
> >
> > IS_ERR_VALUE() casts -MAX_ERRNO to unsigned long and then does the comparison.
> > Isn't it possible to conflict with valid pfns?
> >
>
> See IS_ERR_VALUE():
>
> #define IS_ERR_VALUE(x) unlikely((x) >= (unsigned long)-MAX_ERRNO)
>
> The minimal value of an error code is 0xfffff001 on 32-bit and
> 0xfffffffffffff001 on 64-bit, which is far larger than any valid pfn
> (for a pfn, the top 12 bits are always 0).
>
> Note, PAE is a special case, but only 64G of physical memory is valid
> there, so 0xfffff001 is also suitable for it.

Ah, I see.
I misread the type pfn_t and was confused.

Thank you!
	Takuya
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 6d1a51e..f4e132c 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -411,6 +411,7 @@ void kvm_arch_flush_shadow(struct kvm *kvm);
 int gfn_to_page_many_atomic(struct kvm *kvm, gfn_t gfn, struct page **pages,
 			    int nr_pages);
 
+struct page *get_bad_page(void);
 struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn);
 unsigned long gfn_to_hva(struct kvm *kvm, gfn_t gfn);
 void kvm_release_page_clean(struct page *page);
@@ -564,7 +565,7 @@ void kvm_arch_sync_events(struct kvm *kvm);
 int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu);
 void kvm_vcpu_kick(struct kvm_vcpu *vcpu);
 
-int kvm_is_mmio_pfn(pfn_t pfn);
+bool kvm_is_mmio_pfn(pfn_t pfn);
 
 struct kvm_irq_ack_notifier {
 	struct hlist_node link;
diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
index ebae24b..7972278 100644
--- a/virt/kvm/async_pf.c
+++ b/virt/kvm/async_pf.c
@@ -203,8 +203,7 @@ int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu)
 	if (!work)
 		return -ENOMEM;
 
-	work->page = bad_page;
-	get_page(bad_page);
+	work->page = get_bad_page();
 	INIT_LIST_HEAD(&work->queue); /* for list_del to work */
 
 	spin_lock(&vcpu->async_pf.lock);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index eb15833..92aae8b 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -100,17 +100,11 @@ EXPORT_SYMBOL_GPL(kvm_rebooting);
 
 static bool largepages_enabled = true;
 
-struct page *bad_page;
-static pfn_t bad_pfn;
-
-static struct page *hwpoison_page;
-static pfn_t hwpoison_pfn;
-
-static struct page *fault_page;
-static pfn_t fault_pfn;
-
-inline int kvm_is_mmio_pfn(pfn_t pfn)
+bool kvm_is_mmio_pfn(pfn_t pfn)
 {
+	if (is_error_pfn(pfn))
+		return false;
+
 	if (pfn_valid(pfn)) {
 		int reserved;
 		struct page *tail = pfn_to_page(pfn);
@@ -936,34 +930,55 @@ EXPORT_SYMBOL_GPL(kvm_disable_largepages);
 
 int is_error_page(struct page *page)
 {
-	return page == bad_page || page == hwpoison_page || page == fault_page;
+	return IS_ERR(page);
 }
 EXPORT_SYMBOL_GPL(is_error_page);
 
 int is_error_pfn(pfn_t pfn)
 {
-	return pfn == bad_pfn || pfn == hwpoison_pfn || pfn == fault_pfn;
+	return IS_ERR_VALUE(pfn);
 }
 EXPORT_SYMBOL_GPL(is_error_pfn);
 
+static pfn_t get_bad_pfn(void)
+{
+	return -ENOENT;
+}
+
+pfn_t get_fault_pfn(void)
+{
+	return -EFAULT;
+}
+EXPORT_SYMBOL_GPL(get_fault_pfn);
+
+static pfn_t get_hwpoison_pfn(void)
+{
+	return -EHWPOISON;
+}
+
 int is_hwpoison_pfn(pfn_t pfn)
 {
-	return pfn == hwpoison_pfn;
+	return pfn == -EHWPOISON;
 }
 EXPORT_SYMBOL_GPL(is_hwpoison_pfn);
 
 int is_noslot_pfn(pfn_t pfn)
 {
-	return pfn == bad_pfn;
+	return pfn == -ENOENT;
 }
 EXPORT_SYMBOL_GPL(is_noslot_pfn);
 
 int is_invalid_pfn(pfn_t pfn)
 {
-	return pfn == hwpoison_pfn || pfn == fault_pfn;
+	return !is_noslot_pfn(pfn) && is_error_pfn(pfn);
 }
 EXPORT_SYMBOL_GPL(is_invalid_pfn);
 
+struct page *get_bad_page(void)
+{
+	return ERR_PTR(-ENOENT);
+}
+
 static inline unsigned long bad_hva(void)
 {
 	return PAGE_OFFSET;
@@ -1035,13 +1050,6 @@ unsigned long gfn_to_hva(struct kvm *kvm, gfn_t gfn)
 }
 EXPORT_SYMBOL_GPL(gfn_to_hva);
 
-pfn_t get_fault_pfn(void)
-{
-	get_page(fault_page);
-	return fault_pfn;
-}
-EXPORT_SYMBOL_GPL(get_fault_pfn);
-
 int get_user_page_nowait(struct task_struct *tsk, struct mm_struct *mm,
 	unsigned long start, int write, struct page **page)
 {
@@ -1119,8 +1127,7 @@ static pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async,
 		if (npages == -EHWPOISON ||
 			(!async && check_user_page_hwpoison(addr))) {
 			up_read(&current->mm->mmap_sem);
-			get_page(hwpoison_page);
-			return page_to_pfn(hwpoison_page);
+			return get_hwpoison_pfn();
 		}
 
 		vma = find_vma_intersection(current->mm, addr, addr+1);
@@ -1158,10 +1165,8 @@ static pfn_t __gfn_to_pfn(struct kvm *kvm, gfn_t gfn, bool atomic, bool *async,
 		*async = false;
 
 	addr = gfn_to_hva(kvm, gfn);
-	if (kvm_is_error_hva(addr)) {
-		get_page(bad_page);
-		return page_to_pfn(bad_page);
-	}
+	if (kvm_is_error_hva(addr))
+		return get_bad_pfn();
 
 	return hva_to_pfn(addr, atomic, async, write_fault, writable);
 }
@@ -1215,37 +1220,45 @@ int gfn_to_page_many_atomic(struct kvm *kvm, gfn_t gfn, struct page **pages,
 }
 EXPORT_SYMBOL_GPL(gfn_to_page_many_atomic);
 
+static struct page *kvm_pfn_to_page(pfn_t pfn)
+{
+	WARN_ON(kvm_is_mmio_pfn(pfn));
+
+	if (is_error_pfn(pfn) || kvm_is_mmio_pfn(pfn))
+		return get_bad_page();
+
+	return pfn_to_page(pfn);
+}
+
 struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn)
 {
 	pfn_t pfn;
 
 	pfn = gfn_to_pfn(kvm, gfn);
-	if (!kvm_is_mmio_pfn(pfn))
-		return pfn_to_page(pfn);
-
-	WARN_ON(kvm_is_mmio_pfn(pfn));
 
-	get_page(bad_page);
-	return bad_page;
+	return kvm_pfn_to_page(pfn);
 }
 EXPORT_SYMBOL_GPL(gfn_to_page);
 
 void kvm_release_page_clean(struct page *page)
 {
-	kvm_release_pfn_clean(page_to_pfn(page));
+	if (!is_error_page(page))
+		kvm_release_pfn_clean(page_to_pfn(page));
 }
 EXPORT_SYMBOL_GPL(kvm_release_page_clean);
 
 void kvm_release_pfn_clean(pfn_t pfn)
 {
-	if (!kvm_is_mmio_pfn(pfn))
+	if (!is_error_pfn(pfn) && !kvm_is_mmio_pfn(pfn))
 		put_page(pfn_to_page(pfn));
 }
 EXPORT_SYMBOL_GPL(kvm_release_pfn_clean);
 
 void kvm_release_page_dirty(struct page *page)
 {
+	WARN_ON(is_error_page(page));
+
 	kvm_release_pfn_dirty(page_to_pfn(page));
 }
 EXPORT_SYMBOL_GPL(kvm_release_page_dirty);
 
@@ -2724,33 +2737,6 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
 	if (r)
 		goto out_fail;
 
-	bad_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
-
-	if (bad_page == NULL) {
-		r = -ENOMEM;
-		goto out;
-	}
-
-	bad_pfn = page_to_pfn(bad_page);
-
-	hwpoison_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
-
-	if (hwpoison_page == NULL) {
-		r = -ENOMEM;
-		goto out_free_0;
-	}
-
-	hwpoison_pfn = page_to_pfn(hwpoison_page);
-
-	fault_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
-
-	if (fault_page == NULL) {
-		r = -ENOMEM;
-		goto out_free_0;
-	}
-
-	fault_pfn = page_to_pfn(fault_page);
-
 	if (!zalloc_cpumask_var(&cpus_hardware_enabled, GFP_KERNEL)) {
 		r = -ENOMEM;
 		goto out_free_0;
@@ -2825,12 +2811,6 @@ out_free_1:
 out_free_0a:
 	free_cpumask_var(cpus_hardware_enabled);
 out_free_0:
-	if (fault_page)
-		__free_page(fault_page);
-	if (hwpoison_page)
-		__free_page(hwpoison_page);
-	__free_page(bad_page);
-out:
 	kvm_arch_exit();
 out_fail:
 	return r;
@@ -2850,8 +2830,5 @@ void kvm_exit(void)
 	kvm_arch_hardware_unsetup();
 	kvm_arch_exit();
 	free_cpumask_var(cpus_hardware_enabled);
-	__free_page(fault_page);
-	__free_page(hwpoison_page);
-	__free_page(bad_page);
 }
 EXPORT_SYMBOL_GPL(kvm_exit);
Currently, kvm allocates some pages and uses them as error indicators,
which wastes memory and is not good for scalability.

Based on Avi's suggestion, we use error codes instead of these pages
to indicate the error conditions.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 include/linux/kvm_host.h |    3 +-
 virt/kvm/async_pf.c      |    3 +-
 virt/kvm/kvm_main.c      |  121 +++++++++++++++++++---------------------------
 3 files changed, 52 insertions(+), 75 deletions(-)
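For readers unfamiliar with how an error code can stand in for an allocated page, here is a minimal userspace model of the pointer side of the change. It is a sketch under assumptions, not the kernel implementation: `ERR_PTR`, `PTR_ERR` and `IS_ERR` are re-implemented locally to mirror the kernel helpers of the same names, `struct page` is a toy type holding only a reference count, and `release_page_clean()` is a hypothetical stand-in for the guarded release path in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Toy stand-in for the kernel's struct page (assumption). */
struct page {
	int refcount;
};

/* Local re-implementations modeled on the kernel helpers of the same names. */
static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline bool IS_ERR(const void *ptr)
{
	/* The last MAX_ERRNO addresses are reserved for encoded errors. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* As in the patch: the "bad page" is an encoded -ENOENT, not an allocation. */
static struct page *get_bad_page(void)
{
	return ERR_PTR(-ENOENT);
}

static bool is_error_page(struct page *page)
{
	return IS_ERR(page);
}

/* Hypothetical release path: drops a reference unless page encodes an error. */
static void release_page_clean(struct page *page)
{
	if (is_error_page(page))
		return;
	assert(page->refcount > 0);
	page->refcount--;
}
```

Because the error value is never dereferenced, no memory has to be allocated at init time, and releasing an error "page" is a no-op rather than a `put_page()` on a sentinel page.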