From patchwork Wed Jul 20 04:34:31 2016
X-Patchwork-Submitter: Alexey Kardashevskiy
X-Patchwork-Id: 9238861
From: Alexey Kardashevskiy
To: linuxppc-dev@lists.ozlabs.org
Cc: Alexey Kardashevskiy, Alex Williamson, Paul Mackerras,
    kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Balbir Singh,
    David Gibson, Benjamin Herrenschmidt, Nicholas Piggin
Subject: [PATCH kernel v2 2/2] powerpc/mm/iommu: Put pages on process exit
Date: Wed, 20 Jul 2016 14:34:31 +1000
Message-Id: <1468989271-753-3-git-send-email-aik@ozlabs.ru>
X-Mailer: git-send-email 2.5.0.rc3
In-Reply-To: <1468989271-753-1-git-send-email-aik@ozlabs.ru>
References: <1468989271-753-1-git-send-email-aik@ozlabs.ru>
X-Mailing-List: kvm@vger.kernel.org

At the moment the VFIO IOMMU SPAPR v2 driver pins all guest RAM pages when
userspace starts using VFIO. When the userspace process finishes, all the
pinned pages need to be put; this is done as a part of the userspace memory
context (MM) destruction which happens on the very last mmdrop().

This approach has a problem: the MM of the userspace process may live longer
than the userspace process itself, as kernel threads keep using the MM of
whatever userspace process was last running on the CPU where the kernel
thread was scheduled. If this happens, the MM remains referenced until that
exact kernel thread wakes up again and releases the very last reference to
the MM; on an idle system this can take hours.

This patch references and caches the MM once per container and tracks how
many times each preregistered area was registered in a specific container.
This way we do not depend on @current pointing to a valid task descriptor.
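As context for the hunks below, here is a minimal sketch of how a container
can take and cache that single MM reference on first use. The helper name
tce_iommu_mm_set() and its exact placement are assumptions, not part of this
patch; on kernels of this vintage the bare reference is taken with
atomic_inc(&mm->mm_count) and is paired with the mmdrop() in
tce_iommu_release():

/* Hypothetical helper, not part of this patch: cache current->mm in
 * the container once, so that later ioctls and the release path do
 * not depend on @current pointing to a valid task descriptor.
 */
static long tce_iommu_mm_set(struct tce_container *container)
{
	if (container->mm) {
		/* A container serves exactly one MM */
		return (container->mm == current->mm) ? 0 : -EPERM;
	}
	if (!current->mm)
		return -ESRCH;	/* the process has already exited */
	container->mm = current->mm;
	atomic_inc(&container->mm->mm_count);	/* paired with mmdrop() */
	return 0;
}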
This changes the userspace interface to return EBUSY if memory is already
registered (mm_iommu_get() used to just increment a counter); however this
should not have any practical effect as the only userspace tool available
now registers a memory area once per container anyway. A userspace sketch
of the interface follows the diff.

As tce_iommu_register_pages()/tce_iommu_unregister_pages() are called under
container->lock, this does not need additional locking.

Cc: David Gibson
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Balbir Singh
Cc: Nicholas Piggin
Signed-off-by: Alexey Kardashevskiy
---
 arch/powerpc/include/asm/mmu_context.h |  1 -
 arch/powerpc/mm/mmu_context_book3s64.c |  4 ---
 arch/powerpc/mm/mmu_context_iommu.c    | 10 -------
 drivers/vfio/vfio_iommu_spapr_tce.c    | 56 ++++++++++++++++++++++++++++++++++-
 4 files changed, 55 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
index 745b4bd..90338fd 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -24,7 +24,6 @@ extern long mm_iommu_get(struct mm_struct *mm, unsigned long ua, unsigned long e
 extern long mm_iommu_put(struct mm_struct *mm,
 		struct mm_iommu_table_group_mem_t *mem);
 extern void mm_iommu_init(mm_context_t *ctx);
-extern void mm_iommu_cleanup(mm_context_t *ctx);
 extern struct mm_iommu_table_group_mem_t *mm_iommu_lookup(struct mm_struct *mm,
 		unsigned long ua, unsigned long size);
 extern struct mm_iommu_table_group_mem_t *mm_iommu_find(struct mm_struct *mm,
diff --git a/arch/powerpc/mm/mmu_context_book3s64.c b/arch/powerpc/mm/mmu_context_book3s64.c
index 227b2a6..5c67d1c 100644
--- a/arch/powerpc/mm/mmu_context_book3s64.c
+++ b/arch/powerpc/mm/mmu_context_book3s64.c
@@ -159,10 +159,6 @@ static inline void destroy_pagetable_page(struct mm_struct *mm)
 
 void destroy_context(struct mm_struct *mm)
 {
-#ifdef CONFIG_SPAPR_TCE_IOMMU
-	mm_iommu_cleanup(&mm->context);
-#endif
-
 #ifdef CONFIG_PPC_ICSWX
 	drop_cop(mm->context.acop, mm);
 	kfree(mm->context.cop_lockp);
diff --git a/arch/powerpc/mm/mmu_context_iommu.c b/arch/powerpc/mm/mmu_context_iommu.c
index 65086bf..901773d 100644
--- a/arch/powerpc/mm/mmu_context_iommu.c
+++ b/arch/powerpc/mm/mmu_context_iommu.c
@@ -293,13 +293,3 @@ void mm_iommu_init(mm_context_t *ctx)
 {
 	INIT_LIST_HEAD_RCU(&ctx->iommu_group_mem_list);
 }
-
-void mm_iommu_cleanup(mm_context_t *ctx)
-{
-	struct mm_iommu_table_group_mem_t *mem, *tmp;
-
-	list_for_each_entry_safe(mem, tmp, &ctx->iommu_group_mem_list, next) {
-		list_del_rcu(&mem->next);
-		mm_iommu_do_free(mem);
-	}
-}
diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
index 9752e77..40e71a0 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -89,6 +89,15 @@ struct tce_iommu_group {
 };
 
 /*
+ * A container needs to remember which preregistered areas it has referenced
+ * (and how many times) so it can do proper cleanup at userspace process exit.
+ */
+struct tce_iommu_prereg {
+	struct list_head next;
+	struct mm_iommu_table_group_mem_t *mem;
+};
+
+/*
  * The container descriptor supports only a single group per container.
  * Required by the API as the container is not supplied with the IOMMU group
  * at the moment of initialization.
@@ -101,12 +110,26 @@ struct tce_container {
 	struct mm_struct *mm;
 	struct iommu_table *tables[IOMMU_TABLE_GROUP_MAX_TABLES];
 	struct list_head group_list;
+	struct list_head prereg_list;
 };
 
+static long tce_iommu_prereg_free(struct tce_container *container,
+		struct tce_iommu_prereg *tcemem)
+{
+	long ret;
+
+	list_del(&tcemem->next);
+	ret = mm_iommu_put(container->mm, tcemem->mem);
+	kfree(tcemem);
+
+	return ret;
+}
+
 static long tce_iommu_unregister_pages(struct tce_container *container,
 		__u64 vaddr, __u64 size)
 {
 	struct mm_iommu_table_group_mem_t *mem;
+	struct tce_iommu_prereg *tcemem;
 
 	if ((vaddr & ~PAGE_MASK) || (size & ~PAGE_MASK))
 		return -EINVAL;
@@ -115,7 +138,12 @@ static long tce_iommu_unregister_pages(struct tce_container *container,
 	if (!mem)
 		return -ENOENT;
 
-	return mm_iommu_put(container->mm, mem);
+	list_for_each_entry(tcemem, &container->prereg_list, next) {
+		if (tcemem->mem == mem)
+			return tce_iommu_prereg_free(container, tcemem);
+	}
+
+	return -ENOENT;
 }
 
 static long tce_iommu_register_pages(struct tce_container *container,
@@ -123,6 +151,7 @@ static long tce_iommu_register_pages(struct tce_container *container,
 {
 	long ret = 0;
 	struct mm_iommu_table_group_mem_t *mem = NULL;
+	struct tce_iommu_prereg *tcemem;
 	unsigned long entries = size >> PAGE_SHIFT;
 
 	if ((vaddr & ~PAGE_MASK) || (size & ~PAGE_MASK) ||
@@ -140,6 +169,22 @@ static long tce_iommu_register_pages(struct tce_container *container,
 	ret = mm_iommu_get(container->mm, vaddr, entries, &mem);
 	if (ret)
 		return ret;
+
+	list_for_each_entry(tcemem, &container->prereg_list, next) {
+		if (tcemem->mem == mem) {
+			mm_iommu_put(container->mm, mem);
+			return -EBUSY;
+		}
+	}
+
+	tcemem = kzalloc(sizeof(*tcemem), GFP_KERNEL);
+	if (!tcemem) {
+		mm_iommu_put(container->mm, mem);
+		return -ENOMEM;
+	}
+	tcemem->mem = mem;
+	list_add(&tcemem->next, &container->prereg_list);
+
 	container->enabled = true;
 	return 0;
@@ -325,6 +370,7 @@ static void *tce_iommu_open(unsigned long arg)
 
 	mutex_init(&container->lock);
 	INIT_LIST_HEAD_RCU(&container->group_list);
+	INIT_LIST_HEAD_RCU(&container->prereg_list);
 
 	container->v2 = arg == VFIO_SPAPR_TCE_v2_IOMMU;
 
@@ -362,6 +408,14 @@ static void tce_iommu_release(void *iommu_data)
 		tce_iommu_free_table(tbl);
 	}
 
+	while (!list_empty(&container->prereg_list)) {
+		struct tce_iommu_prereg *tcemem;
+
+		tcemem = list_first_entry(&container->prereg_list,
+				struct tce_iommu_prereg, next);
+		tce_iommu_prereg_free(container, tcemem);
+	}
+
 	if (container->mm)
 		mmdrop(container->mm);
 	tce_iommu_disable(container);
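For reference, a minimal userspace sketch of the preregistration interface
whose semantics change here. Container setup, group attachment and error
handling are elided; the buffer size is illustrative. The ioctls and struct
vfio_iommu_spapr_register_memory are the existing uapi from linux/vfio.h:

#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/vfio.h>

/* Sketch: exercise the SPAPR memory preregistration ioctls on an
 * already-opened VFIO_SPAPR_TCE_v2_IOMMU container fd. With this
 * patch, registering the same area twice in one container fails
 * with EBUSY instead of bumping a kernel-side counter.
 */
static int prereg_demo(int container_fd)
{
	size_t size = 1UL << 24;	/* 16MB, illustrative */
	void *buf = mmap(NULL, size, PROT_READ | PROT_WRITE,
			MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	struct vfio_iommu_spapr_register_memory reg = {
		.argsz = sizeof(reg),
		.flags = 0,
		.vaddr = (__u64)(unsigned long)buf,
		.size = size,
	};

	if (ioctl(container_fd, VFIO_IOMMU_SPAPR_REGISTER_MEMORY, &reg))
		return -1;

	/* A second registration of the same area now fails with EBUSY */
	if (ioctl(container_fd, VFIO_IOMMU_SPAPR_REGISTER_MEMORY, &reg))
		perror("re-registration (EBUSY expected)");

	return ioctl(container_fd, VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY, &reg);
}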