
[RFC,V2,4/4] powerpc/mm/iommu: Allow migration of cma allocated pages during mm_iommu_get

Message ID 20180906054342.25094-4-aneesh.kumar@linux.ibm.com (mailing list archive)
State New, archived
Series [RFC,V2,1/4] mm: Export alloc_migrate_huge_page

Commit Message

Aneesh Kumar K.V Sept. 6, 2018, 5:43 a.m. UTC
The current code does not migrate a page if it is a compound page.
With HugeTLB migration support, we can end up allocating hugetlb pages from
the CMA region. THP pages can also be allocated from the CMA region. This patch
updates the code to handle compound pages correctly.

This uses the new helper get_user_pages_cma_migrate(). It does one get_user_pages
call with the right count, instead of one get_user_pages call per page. That
avoids walking the page table multiple times.

The patch also converts the hpas member of mm_iommu_table_group_mem_t to a union.
We use the same storage location to store pointers to struct page. We cannot
update all the code paths to use struct page *, because we access hpas in real
mode and we cannot do the struct page * to pfn conversion in real mode.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/powerpc/mm/mmu_context_iommu.c | 124 +++++++++-------------------
 1 file changed, 37 insertions(+), 87 deletions(-)

Comments

Michal Hocko Sept. 6, 2018, 12:53 p.m. UTC | #1
On Thu 06-09-18 11:13:42, Aneesh Kumar K.V wrote:
> The current code does not migrate a page if it is a compound page.
> With HugeTLB migration support, we can end up allocating hugetlb pages from
> the CMA region. THP pages can also be allocated from the CMA region. This patch
> updates the code to handle compound pages correctly.
>
> This uses the new helper get_user_pages_cma_migrate(). It does one get_user_pages
> call with the right count, instead of one get_user_pages call per page. That
> avoids walking the page table multiple times.
>
> The patch also converts the hpas member of mm_iommu_table_group_mem_t to a union.
> We use the same storage location to store pointers to struct page. We cannot
> update all the code paths to use struct page *, because we access hpas in real
> mode and we cannot do the struct page * to pfn conversion in real mode.

I am not familiar with this code so bear with me. I am completely
missing the purpose of this patch. The changelog doesn't really explain
that AFAICS. I can only guess that you do not want to establish long
pins on CMA pages, right? So whenever you are about to pin a page that
is in CMA you migrate it away to a different !__GFP_MOVABLE page, right?
If that is the case then how do you handle pins which are already in
zone_movable? I do not see any specific check for those.

Btw. why is this a proper thing to do? Problems with long-term pins are
not limited to CMA/ZONE_MOVABLE pages. Pinned pages are not reclaimable
either, so there is a risk of OOMs if there are too many of them. We have
discussed approaches that would allow forcing pin invalidation/revocation
at LSF/MM. Isn't that a more appropriate solution to the problem you are
seeing?
Aneesh Kumar K.V Sept. 6, 2018, 1:30 p.m. UTC | #2
On 09/06/2018 06:23 PM, Michal Hocko wrote:
> I am not familiar with this code so bear with me. I am completely
> missing the purpose of this patch. The changelog doesn't really explain
> that AFAICS. I can only guess that you do not want to establish long
> pins on CMA pages, right? So whenever you are about to pin a page that
> is in CMA you migrate it away to a different !__GFP_MOVABLE page, right?

That is right.

> If that is the case then how do you handle pins which are already in
> zone_movable? I do not see any specific check for those.


> 
> Btw. why is this a proper thing to do? Problems with long-term pins are
> not limited to CMA/ZONE_MOVABLE pages. Pinned pages are not reclaimable
> either, so there is a risk of OOMs if there are too many of them. We have
> discussed approaches that would allow forcing pin invalidation/revocation
> at LSF/MM. Isn't that a more appropriate solution to the problem you are
> seeing?
> 

The CMA area is used on powerpc platforms to allocate the guest-specific
page table (hash page table). If we don't have sufficient free pages, we
fail to allocate the hash page table, which results in a failure to start
the guest.

Now with vfio, we end up pinning the entire guest RAM. There is a
possibility that these guest RAM pages were allocated from the CMA region.
We already support migrating those pages out, except for compound
pages. What this patch does is add support for migrating compound pages
that were allocated from the CMA region (i.e., THP pages and
hugetlb pages, if the platform supports hugetlb migration).

Now to do that I added a helper get_user_pages_cma_migrate().

I agree that long term pinned pages do have other issues. The patchset 
is not solving that issue.

-aneesh
Michal Hocko Sept. 7, 2018, 9:03 a.m. UTC | #3
On Thu 06-09-18 19:00:43, Aneesh Kumar K.V wrote:
> The CMA area is used on powerpc platforms to allocate the guest-specific
> page table (hash page table). If we don't have sufficient free pages, we
> fail to allocate the hash page table, which results in a failure to start
> the guest.
>
> Now with vfio, we end up pinning the entire guest RAM. There is a
> possibility that these guest RAM pages were allocated from the CMA region.
> We already support migrating those pages out, except for compound
> pages. What this patch does is add support for migrating compound pages
> that were allocated from the CMA region (i.e., THP pages and
> hugetlb pages, if the platform supports hugetlb migration).

This definitely belongs in the changelog.

> Now to do that I added a helper get_user_pages_cma_migrate().
> 
> I agree that long term pinned pages do have other issues. The patchset is
> not solving that issue.

It would be great to note why a generic approach is not viable. I assume
the main reason is that those pins are pretty much permanent for the
guest lifetime so the situation has to be handled in advance. In other
words, more information please.
Aneesh Kumar K.V Sept. 7, 2018, 11:15 a.m. UTC | #4
On 09/07/2018 02:33 PM, Michal Hocko wrote:
> It would be great to note why a generic approach is not viable. I assume
> the main reason is that those pins are pretty much permanent for the
> guest lifetime so the situation has to be handled in advance. In other
> words, more information please.
> 

That is correct. I will add these details to the commit message and will
also write a cover letter for the patch series.

-aneesh
Michal Hocko Sept. 7, 2018, 11:25 a.m. UTC | #5
On Fri 07-09-18 16:45:09, Aneesh Kumar K.V wrote:
> That is correct. I will add these details to the commit message and will
> also write a cover letter for the patch series.

OK, then the early migration makes some sense. I suspect this will lead
to other issues (OOM in kernel zones), but the revocation approach is
clearly not usable. Excessive pinning simply sucks.

Thanks a lot for the updated information though!

Patch

diff --git a/arch/powerpc/mm/mmu_context_iommu.c b/arch/powerpc/mm/mmu_context_iommu.c
index f472965f7638..607acd03ab06 100644
--- a/arch/powerpc/mm/mmu_context_iommu.c
+++ b/arch/powerpc/mm/mmu_context_iommu.c
@@ -20,6 +20,7 @@ 
 #include <linux/swap.h>
 #include <asm/mmu_context.h>
 #include <asm/pte-walk.h>
+#include <linux/mm_inline.h>
 
 static DEFINE_MUTEX(mem_list_mutex);
 
@@ -30,8 +31,18 @@  struct mm_iommu_table_group_mem_t {
 	atomic64_t mapped;
 	unsigned int pageshift;
 	u64 ua;			/* userspace address */
-	u64 entries;		/* number of entries in hpas[] */
-	u64 *hpas;		/* vmalloc'ed */
+	u64 entries;		/* number of entries in hpages[] */
+	/*
+	 * In mm_iommu_get() we temporarily use this to store
+	 * struct page pointers.
+	 *
+	 * We need to convert ua to hpa in real mode. Make that
+	 * simpler by storing the physical address.
+	 */
+	union {
+		struct page **hpages;	/* vmalloc'ed */
+		phys_addr_t *hpas;
+	};
 };
 
 static long mm_iommu_adjust_locked_vm(struct mm_struct *mm,
@@ -74,63 +85,12 @@  bool mm_iommu_preregistered(struct mm_struct *mm)
 }
 EXPORT_SYMBOL_GPL(mm_iommu_preregistered);
 
-/*
- * Taken from alloc_migrate_target with changes to remove CMA allocations
- */
-struct page *new_iommu_non_cma_page(struct page *page, unsigned long private)
-{
-	gfp_t gfp_mask = GFP_USER;
-	struct page *new_page;
-
-	if (PageCompound(page))
-		return NULL;
-
-	if (PageHighMem(page))
-		gfp_mask |= __GFP_HIGHMEM;
-
-	/*
-	 * We don't want the allocation to force an OOM if possibe
-	 */
-	new_page = alloc_page(gfp_mask | __GFP_NORETRY | __GFP_NOWARN);
-	return new_page;
-}
-
-static int mm_iommu_move_page_from_cma(struct page *page)
-{
-	int ret = 0;
-	LIST_HEAD(cma_migrate_pages);
-
-	/* Ignore huge pages for now */
-	if (PageCompound(page))
-		return -EBUSY;
-
-	lru_add_drain();
-	ret = isolate_lru_page(page);
-	if (ret)
-		return ret;
-
-	list_add(&page->lru, &cma_migrate_pages);
-	put_page(page); /* Drop the gup reference */
-
-	ret = migrate_pages(&cma_migrate_pages, new_iommu_non_cma_page,
-				NULL, 0, MIGRATE_SYNC, MR_CONTIG_RANGE);
-	if (ret) {
-		if (!list_empty(&cma_migrate_pages))
-			putback_movable_pages(&cma_migrate_pages);
-	}
-
-	return 0;
-}
-
 long mm_iommu_get(struct mm_struct *mm, unsigned long ua, unsigned long entries,
 		struct mm_iommu_table_group_mem_t **pmem)
 {
 	struct mm_iommu_table_group_mem_t *mem;
-	long i, j, ret = 0, locked_entries = 0;
+	long i, ret = 0, locked_entries = 0;
 	unsigned int pageshift;
-	unsigned long flags;
-	unsigned long cur_ua;
-	struct page *page = NULL;
 
 	mutex_lock(&mem_list_mutex);
 
@@ -177,47 +137,37 @@  long mm_iommu_get(struct mm_struct *mm, unsigned long ua, unsigned long entries,
 		goto unlock_exit;
 	}
 
+	ret = get_user_pages_cma_migrate(ua, entries, 1, mem->hpages);
+	if (ret != entries) {
+		/* free the reference taken */
+		for (i = 0; i < ret; i++)
+			put_page(mem->hpages[i]);
+
+		vfree(mem->hpas);
+		kfree(mem);
+		ret = -EFAULT;
+		goto unlock_exit;
+	} else
+		ret = 0;
+
+	pageshift = PAGE_SHIFT;
 	for (i = 0; i < entries; ++i) {
-		cur_ua = ua + (i << PAGE_SHIFT);
-		if (1 != get_user_pages_fast(cur_ua,
-					1/* pages */, 1/* iswrite */, &page)) {
-			ret = -EFAULT;
-			for (j = 0; j < i; ++j)
-				put_page(pfn_to_page(mem->hpas[j] >>
-						PAGE_SHIFT));
-			vfree(mem->hpas);
-			kfree(mem);
-			goto unlock_exit;
-		}
+		struct page *page = mem->hpages[i];
 		/*
-		 * If we get a page from the CMA zone, since we are going to
-		 * be pinning these entries, we might as well move them out
-		 * of the CMA zone if possible. NOTE: faulting in + migration
-		 * can be expensive. Batching can be considered later
+		 * Allow to use larger than 64k IOMMU pages. Only do that
+		 * if we are backed by hugetlb.
 		 */
-		if (is_migrate_cma_page(page)) {
-			if (mm_iommu_move_page_from_cma(page))
-				goto populate;
-			if (1 != get_user_pages_fast(cur_ua,
-						1/* pages */, 1/* iswrite */,
-						&page)) {
-				ret = -EFAULT;
-				for (j = 0; j < i; ++j)
-					put_page(pfn_to_page(mem->hpas[j] >>
-								PAGE_SHIFT));
-				vfree(mem->hpas);
-				kfree(mem);
-				goto unlock_exit;
-			}
-		}
-populate:
-		pageshift = PAGE_SHIFT;
-		if (mem->pageshift > PAGE_SHIFT && PageHuge(page)) {
+		if ((mem->pageshift > PAGE_SHIFT) && PageHuge(page)) {
 			struct page *head = compound_head(page);
 			pageshift = compound_order(head) + PAGE_SHIFT;
 		}
 		mem->pageshift = min(mem->pageshift, pageshift);
+		/*
+		 * We don't need the struct page reference any more, switch
+		 * to the physical address.
+		 */
 		mem->hpas[i] = page_to_pfn(page) << PAGE_SHIFT;
+
 	}
 
 	atomic64_set(&mem->mapped, 1);