From patchwork Thu Jan 21 07:00:53 2016
X-Patchwork-Submitter: Education Directorate
X-Patchwork-Id: 8078621
Date: Thu, 21 Jan 2016 18:00:53 +1100
From: Balbir Singh
To: kvm-ppc@vger.kernel.org, linux-mm@kvack.org, kvm@vger.kernel.org
Cc: akpm@linux-foundation.org, aik@ozlabs.ru, paulus@samba.org
Subject: [RFC][PATCH] KVM-PPC: Migrate Pinned Pages out of CMA
Message-ID: <20160121070053.GA4319@cotter.ozlabs.ibm.com>
Reply-To: bsingharora@gmail.com
X-Mailing-List: kvm@vger.kernel.org

From: Balbir Singh

When PCI/device pass-through is enabled via VFIO, KVM-PPC pins pages using
get_user_pages_fast(). One of the downsides of the pinning is that the page
could be in the CMA region, which is also used for other allocations such as
the hash page table. Ideally we want the pinned pages to come from a non-CMA
region.

This patch (currently only for KVM-PPC with VFIO) forcefully migrates the
pages out (huge pages are omitted for the moment). There are more efficient
ways of doing this, but they would be more elaborate and might impact a
larger audience beyond just the KVM-PPC implementation. The magic is in
new_iommu_non_cma_page(), which allocates the new page from a non-CMA
region.

I've tested the patches lightly at my end, but there might be bugs. For
example, if the page is still not isolated after lru_add_drain(), is that a
BUG? Second question: is mm_iommu_move_page_from_cma() generic enough to be
used as a helper outside of KVM-PPC?

Previous discussion was at
http://permalink.gmane.org/gmane.linux.kernel.mm/136738

Signed-off-by: Balbir Singh
---
 arch/powerpc/include/asm/mmu_context.h |  1 +
 arch/powerpc/mm/mmu_context_iommu.c    | 80 ++++++++++++++++++++++++++++++++--
 2 files changed, 77 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
index 878c277..2ef72aa 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -18,6 +18,7 @@ extern void destroy_context(struct mm_struct *mm);
 #ifdef CONFIG_SPAPR_TCE_IOMMU
 struct mm_iommu_table_group_mem_t;
 
+extern int isolate_lru_page(struct page *page);	/* internal.h */
 extern bool mm_iommu_preregistered(void);
 extern long mm_iommu_get(unsigned long ua, unsigned long entries,
 		struct mm_iommu_table_group_mem_t **pmem);
diff --git a/arch/powerpc/mm/mmu_context_iommu.c b/arch/powerpc/mm/mmu_context_iommu.c
index da6a216..ad843a5 100644
--- a/arch/powerpc/mm/mmu_context_iommu.c
+++ b/arch/powerpc/mm/mmu_context_iommu.c
@@ -15,6 +15,9 @@
 #include <linux/rculist.h>
 #include <linux/vmalloc.h>
 #include <linux/mutex.h>
+#include <linux/migrate.h>
+#include <linux/hugetlb.h>
+#include <linux/swap.h>
 #include <asm/mmu_context.h>
 
 static DEFINE_MUTEX(mem_list_mutex);
@@ -72,6 +75,54 @@ bool mm_iommu_preregistered(void)
 }
 EXPORT_SYMBOL_GPL(mm_iommu_preregistered);
 
+/*
+ * Taken from alloc_migrate_target with changes to remove CMA allocations
+ */
+struct page *new_iommu_non_cma_page(struct page *page, unsigned long private,
+				int **resultp)
+{
+	gfp_t gfp_mask = GFP_USER;
+	struct page *new_page;
+
+	if (PageHuge(page) || PageTransHuge(page))
+		return NULL;
+
+	if (PageHighMem(page))
+		gfp_mask |= __GFP_HIGHMEM;
+
+	/*
+	 * We don't want the allocation to force an OOM if possible
+	 */
+	new_page = alloc_page(gfp_mask | __GFP_NORETRY | __GFP_NOWARN);
+	return new_page;
+}
+
+static int mm_iommu_move_page_from_cma(struct page *page)
+{
+	int ret;
+	LIST_HEAD(cma_migrate_pages);
+
+	/* Ignore huge/THP pages for now */
+	if (PageHuge(page) || PageTransHuge(page))
+		return -EBUSY;
+
+	lru_add_drain();
+	ret = isolate_lru_page(page);
+	if (ret)
+		get_page(page);	/* Potential BUG? */
+
+	list_add(&page->lru, &cma_migrate_pages);
+	put_page(page);	/* Drop the gup reference */
+
+	ret = migrate_pages(&cma_migrate_pages, new_iommu_non_cma_page,
+				NULL, 0, MIGRATE_SYNC, MR_CMA);
+	if (ret) {
+		if (!list_empty(&cma_migrate_pages))
+			putback_movable_pages(&cma_migrate_pages);
+	}
+	return 0;
+}
+
 long mm_iommu_get(unsigned long ua, unsigned long entries,
 		struct mm_iommu_table_group_mem_t **pmem)
 {
@@ -124,15 +175,36 @@ long mm_iommu_get(unsigned long ua, unsigned long entries,
 	for (i = 0; i < entries; ++i) {
 		if (1 != get_user_pages_fast(ua + (i << PAGE_SHIFT),
 					1/* pages */, 1/* iswrite */, &page)) {
+			ret = -EFAULT;
 			for (j = 0; j < i; ++j)
-				put_page(pfn_to_page(
-						mem->hpas[j] >> PAGE_SHIFT));
+				put_page(pfn_to_page(mem->hpas[j] >>
+						PAGE_SHIFT));
 			vfree(mem->hpas);
 			kfree(mem);
-			ret = -EFAULT;
 			goto unlock_exit;
 		}
-
+		/*
+		 * If we get a page from the CMA zone, since we are going to
+		 * be pinning these entries, we might as well move them out
+		 * of the CMA zone if possible. NOTE: faulting in + migration
+		 * can be expensive. Batching can be considered later
+		 */
+		if (get_pageblock_migratetype(page) == MIGRATE_CMA) {
+			if (mm_iommu_move_page_from_cma(page))
+				goto populate;
+			if (1 != get_user_pages_fast(ua + (i << PAGE_SHIFT),
+						1/* pages */, 1/* iswrite */,
+						&page)) {
+				ret = -EFAULT;
+				for (j = 0; j < i; ++j)
+					put_page(pfn_to_page(mem->hpas[j] >>
+							PAGE_SHIFT));
+				vfree(mem->hpas);
+				kfree(mem);
+				goto unlock_exit;
+			}
+		}
+populate:
 		mem->hpas[i] = page_to_pfn(page) << PAGE_SHIFT;
 	}
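
If it helps review, below is a small userspace sketch of the control flow the
patch adds. It is only an illustration: struct mock_page, alloc_non_cma() and
mock_migrate() are made-up stand-ins for struct page, new_iommu_non_cma_page()
and migrate_pages(), not kernel APIs, and error handling is reduced to the
fallback path (keep the original page if migration fails).

/*
 * Stand-alone mock of the migrate-out-of-CMA flow.  Nothing here is a
 * kernel API; the types and helpers are simplified stand-ins so the
 * flow can be compiled and stepped through in userspace.
 */
#include <stdio.h>
#include <stdbool.h>

struct mock_page {
	int id;
	bool in_cma;	/* stands in for get_pageblock_migratetype() == MIGRATE_CMA */
};

/* Stand-in for new_iommu_non_cma_page(): hand back a replacement page
 * that lives outside the "CMA" region. */
static struct mock_page *alloc_non_cma(struct mock_page *old)
{
	static struct mock_page replacement;

	replacement.id = old->id;
	replacement.in_cma = false;
	return &replacement;
}

/* Stand-in for migrate_pages(): ask the allocation callback for a new
 * page and switch the caller's reference over to it.  On failure the
 * caller keeps the original page (the putback path in the patch). */
static int mock_migrate(struct mock_page **pagep,
			struct mock_page *(*get_new)(struct mock_page *))
{
	struct mock_page *new_page = get_new(*pagep);

	if (!new_page)
		return -1;
	*pagep = new_page;
	return 0;
}

int main(void)
{
	struct mock_page cma_page = { .id = 42, .in_cma = true };
	struct mock_page *page = &cma_page;

	/* Mirrors the new check in mm_iommu_get(): only migrate when the
	 * page came from the CMA region, and fall back to pinning the
	 * original page if migration fails. */
	if (page->in_cma && mock_migrate(&page, alloc_non_cma) == 0)
		printf("page %d re-pinned outside CMA\n", page->id);
	else
		printf("page %d pinned where it is\n", page->id);

	return 0;
}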