From patchwork Mon Jan 14 09:54:35 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Aneesh Kumar K.V"
X-Patchwork-Id: 10761853
From: "Aneesh Kumar K.V"
To: akpm@linux-foundation.org, Michal Hocko, Alexey Kardashevskiy, David Gibson, Andrea Arcangeli, mpe@ellerman.id.au
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, "Aneesh Kumar K.V"
Subject: [PATCH V7 2/4] mm: Update get_user_pages_longterm to migrate pages allocated from CMA region
Date: Mon, 14 Jan 2019 15:24:35 +0530
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20190114095438.32470-1-aneesh.kumar@linux.ibm.com>
References: <20190114095438.32470-1-aneesh.kumar@linux.ibm.com>
Message-Id: <20190114095438.32470-4-aneesh.kumar@linux.ibm.com>
Sender: owner-linux-mm@kvack.org
Precedence: bulk

This patch updates get_user_pages_longterm to migrate pages allocated
from the CMA region out of that region. This makes sure that we don't
keep non-movable pages (non-movable because of the elevated page
reference count) in the CMA area.

This will be used by ppc64 in a later patch to avoid pinning pages in
the CMA region. ppc64 uses the CMA region for allocation of the hardware
page table (hash page table), and being unable to migrate pages out of
the CMA region results in page table allocation failures.

One case where we hit this easily is when a guest is using a VFIO
passthrough device. VFIO locks all of the guest's memory, and if the
guest memory is backed by the CMA region, it becomes unmovable. This
fragments the CMA region and can prevent other guests from allocating a
large enough hash page table.

NOTE: We allocate the new page without using __GFP_THISNODE

Signed-off-by: Aneesh Kumar K.V
---
 include/linux/hugetlb.h |   2 +
 include/linux/mm.h      |   3 +-
 mm/gup.c                | 200 +++++++++++++++++++++++++++++++++++-----
 mm/hugetlb.c            |   4 +-
 4 files changed, 182 insertions(+), 27 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 087fd5f48c91..1eed0cdaec0e 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -371,6 +371,8 @@ struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
 				nodemask_t *nmask);
 struct page *alloc_huge_page_vma(struct hstate *h, struct vm_area_struct *vma,
 				unsigned long address);
+struct page *alloc_migrate_huge_page(struct hstate *h, gfp_t gfp_mask,
+				     int nid, nodemask_t *nmask);
 int huge_add_to_page_cache(struct page *page, struct address_space *mapping,
 			pgoff_t idx);

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 80bb6408fe73..20ec56f8e2bb 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1536,7 +1536,8 @@ long get_user_pages_locked(unsigned long start, unsigned long nr_pages,
 		    unsigned int gup_flags, struct page **pages, int *locked);
 long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
 		    struct page **pages, unsigned int gup_flags);
-#ifdef CONFIG_FS_DAX
+
+#if defined(CONFIG_FS_DAX) || defined(CONFIG_CMA)
 long get_user_pages_longterm(unsigned long start, unsigned long nr_pages,
 			    unsigned int gup_flags, struct page **pages,
 			    struct vm_area_struct **vmas);
diff --git a/mm/gup.c b/mm/gup.c
index 05acd7e2eb22..6e8152594e83 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -13,6 +13,9 @@
 #include
 #include
 #include
+#include
+#include
+#include

 #include
 #include
@@ -1126,7 +1129,167 @@ long get_user_pages(unsigned long start, unsigned long nr_pages,
 }
 EXPORT_SYMBOL(get_user_pages);

+#if defined(CONFIG_FS_DAX) || defined (CONFIG_CMA)
+
 #ifdef CONFIG_FS_DAX
+static bool check_dax_vmas(struct vm_area_struct **vmas, long nr_pages)
+{
+	long i;
+	struct vm_area_struct *vma_prev = NULL;
+
+	for (i = 0; i < nr_pages; i++) {
+		struct vm_area_struct *vma = vmas[i];
+
+		if (vma == vma_prev)
+			continue;
+
+		vma_prev = vma;
+
+		if (vma_is_fsdax(vma))
+			return true;
+	}
+	return false;
+}
+#else
+static inline bool check_dax_vmas(struct vm_area_struct **vmas, long nr_pages)
+{
+	return false;
+}
+#endif
+
+#ifdef CONFIG_CMA
+static struct page *new_non_cma_page(struct page *page, unsigned long private)
+{
+	/*
+	 * We want to make sure we allocate the new page from the same node
+	 * as the source page.
+	 */
+	int nid = page_to_nid(page);
+	/*
+	 * Trying to allocate a page for migration. Ignore allocation
+	 * failure warnings. We don't force __GFP_THISNODE here because
+	 * this node here is the node where we have CMA reservation and
+	 * in some case these nodes will have really less non movable
+	 * allocation memory.
+	 */
+	gfp_t gfp_mask = GFP_USER | __GFP_NOWARN;
+
+	if (PageHighMem(page))
+		gfp_mask |= __GFP_HIGHMEM;
+
+#ifdef CONFIG_HUGETLB_PAGE
+	if (PageHuge(page)) {
+		struct hstate *h = page_hstate(page);
+		/*
+		 * We don't want to dequeue from the pool because pool pages will
+		 * mostly be from the CMA region.
+		 */
+		return alloc_migrate_huge_page(h, gfp_mask, nid, NULL);
+	}
+#endif
+	if (PageTransHuge(page)) {
+		struct page *thp;
+		/*
+		 * ignore allocation failure warnings
+		 */
+		gfp_t thp_gfpmask = GFP_TRANSHUGE | __GFP_NOWARN;
+
+		/*
+		 * Remove the movable mask so that we don't allocate from
+		 * CMA area again.
+		 */
+		thp_gfpmask &= ~__GFP_MOVABLE;
+		thp = __alloc_pages_node(nid, thp_gfpmask, HPAGE_PMD_ORDER);
+		if (!thp)
+			return NULL;
+		prep_transhuge_page(thp);
+		return thp;
+	}
+
+	return __alloc_pages_node(nid, gfp_mask, 0);
+}
+
+static long check_and_migrate_cma_pages(unsigned long start, long nr_pages,
+					unsigned int gup_flags,
+					struct page **pages,
+					struct vm_area_struct **vmas)
+{
+	long i;
+	bool drain_allow = true;
+	bool migrate_allow = true;
+	LIST_HEAD(cma_page_list);
+
+check_again:
+	for (i = 0; i < nr_pages; i++) {
+		/*
+		 * If we get a page from the CMA zone, since we are going to
+		 * be pinning these entries, we might as well move them out
+		 * of the CMA zone if possible.
+		 */
+		if (is_migrate_cma_page(pages[i])) {
+
+			struct page *head = compound_head(pages[i]);
+
+			if (PageHuge(head)) {
+				isolate_huge_page(head, &cma_page_list);
+			} else {
+				if (!PageLRU(head) && drain_allow) {
+					lru_add_drain_all();
+					drain_allow = false;
+				}
+
+				if (!isolate_lru_page(head)) {
+					list_add_tail(&head->lru, &cma_page_list);
+					mod_node_page_state(page_pgdat(head),
+							    NR_ISOLATED_ANON +
+							    page_is_file_cache(head),
+							    hpage_nr_pages(head));
+				}
+			}
+		}
+	}
+
+	if (!list_empty(&cma_page_list)) {
+		/*
+		 * drop the above get_user_pages reference.
+		 */
+		for (i = 0; i < nr_pages; i++)
+			put_page(pages[i]);
+
+		if (migrate_pages(&cma_page_list, new_non_cma_page,
+				  NULL, 0, MIGRATE_SYNC, MR_CONTIG_RANGE)) {
+			/*
+			 * some of the pages failed migration. Do get_user_pages
+			 * without migration.
+			 */
+			migrate_allow = false;
+
+			if (!list_empty(&cma_page_list))
+				putback_movable_pages(&cma_page_list);
+		}
+		/*
+		 * We did migrate all the pages, Try to get the page references again
+		 * migrating any new CMA pages which we failed to isolate earlier.
+		 */
+		nr_pages = get_user_pages(start, nr_pages, gup_flags, pages, vmas);
+		if ((nr_pages > 0) && migrate_allow) {
+			drain_allow = true;
+			goto check_again;
+		}
+	}
+
+	return nr_pages;
+}
+#else
+static inline long check_and_migrate_cma_pages(unsigned long start, long nr_pages,
+					       unsigned int gup_flags,
+					       struct page **pages,
+					       struct vm_area_struct **vmas)
+{
+	return nr_pages;
+}
+#endif
+
 /*
  * This is the same as get_user_pages() in that it assumes we are
  * operating on the current task's mm, but it goes further to validate
@@ -1140,11 +1303,11 @@ EXPORT_SYMBOL(get_user_pages);
  * Contrast this to iov_iter_get_pages() usages which are transient.
  */
 long get_user_pages_longterm(unsigned long start, unsigned long nr_pages,
-		unsigned int gup_flags, struct page **pages,
-		struct vm_area_struct **vmas_arg)
+			     unsigned int gup_flags, struct page **pages,
+			     struct vm_area_struct **vmas_arg)
 {
 	struct vm_area_struct **vmas = vmas_arg;
-	struct vm_area_struct *vma_prev = NULL;
+	unsigned long flags;
 	long rc, i;

 	if (!pages)
@@ -1157,31 +1320,20 @@ long get_user_pages_longterm(unsigned long start, unsigned long nr_pages,
 			return -ENOMEM;
 	}

+	flags = memalloc_nocma_save();
 	rc = get_user_pages(start, nr_pages, gup_flags, pages, vmas);
+	memalloc_nocma_restore(flags);
+	if (rc < 0)
+		goto out;

-	for (i = 0; i < rc; i++) {
-		struct vm_area_struct *vma = vmas[i];
-
-		if (vma == vma_prev)
-			continue;
-
-		vma_prev = vma;
-
-		if (vma_is_fsdax(vma))
-			break;
-	}
-
-	/*
-	 * Either get_user_pages() failed, or the vma validation
-	 * succeeded, in either case we don't need to put_page() before
-	 * returning.
-	 */
-	if (i >= rc)
+	if (check_dax_vmas(vmas, rc)) {
+		for (i = 0; i < rc; i++)
+			put_page(pages[i]);
+		rc = -EOPNOTSUPP;
 		goto out;
+	}

-	for (i = 0; i < rc; i++)
-		put_page(pages[i]);
-	rc = -EOPNOTSUPP;
+	rc = check_and_migrate_cma_pages(start, rc, gup_flags, pages, vmas);
 out:
 	if (vmas != vmas_arg)
 		kfree(vmas);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index df2e7dd5ff17..913862771808 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1586,8 +1586,8 @@ static struct page *alloc_surplus_huge_page(struct hstate *h, gfp_t gfp_mask,
 	return page;
 }

-static struct page *alloc_migrate_huge_page(struct hstate *h, gfp_t gfp_mask,
-		int nid, nodemask_t *nmask)
+struct page *alloc_migrate_huge_page(struct hstate *h, gfp_t gfp_mask,
+				     int nid, nodemask_t *nmask)
 {
 	struct page *page;
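
Illustrative note, not part of the patch: the control flow of check_and_migrate_cma_pages() (pin, look for CMA pages, drop the references, migrate, pin again, stop retrying once a migration fails) can be modelled in plain userspace C. The sketch below is a toy under stated assumptions. Every name in it (struct toy_page, toy_pin, toy_unpin, toy_migrate, toy_count_cma, toy_pin_longterm) is hypothetical and none of it is kernel API. A page that cannot be migrated is modelled by a simple flag, and the isolation, LRU draining and hugepage handling of the real code are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page: only what the sketch needs. */
struct toy_page {
	bool in_cma;   /* page currently sits in the simulated CMA region */
	bool movable;  /* false models a page whose migration fails */
	int refcount;  /* simulated pin count */
};

/* Take one reference per page, standing in for get_user_pages(). */
static long toy_pin(struct toy_page *pages, long nr)
{
	for (long i = 0; i < nr; i++)
		pages[i].refcount++;
	return nr;
}

/* Drop the references again, standing in for the put_page() loop. */
static void toy_unpin(struct toy_page *pages, long nr)
{
	for (long i = 0; i < nr; i++)
		pages[i].refcount--;
}

/* Count pages that are still backed by the simulated CMA region. */
static long toy_count_cma(const struct toy_page *pages, long nr)
{
	long n = 0;

	for (long i = 0; i < nr; i++)
		if (pages[i].in_cma)
			n++;
	return n;
}

/*
 * Move every movable CMA page out of CMA.
 * Returns the number of CMA pages that could not be moved.
 */
static long toy_migrate(struct toy_page *pages, long nr)
{
	long failed = 0;

	for (long i = 0; i < nr; i++) {
		if (!pages[i].in_cma)
			continue;
		if (pages[i].movable)
			pages[i].in_cma = false;
		else
			failed++;
	}
	return failed;
}

/*
 * Pin, then retry "unpin, migrate, pin" while CMA pages remain.
 * After the first failed migration the pages are pinned as they are,
 * which mirrors the migrate_allow flag in the patch.
 */
static long toy_pin_longterm(struct toy_page *pages, long nr)
{
	bool migrate_allow = true;
	long pinned = toy_pin(pages, nr);

	while (pinned > 0 && migrate_allow &&
	       toy_count_cma(pages, pinned) > 0) {
		toy_unpin(pages, pinned);        /* references block migration */
		if (toy_migrate(pages, pinned))  /* some page stayed in CMA */
			migrate_allow = false;
		pinned = toy_pin(pages, pinned); /* take the references again */
	}
	return pinned;
}
```

In this model, calling toy_pin_longterm() on an array holding one movable CMA page, one non-CMA page and one unmovable CMA page returns 3, leaves each page with a single reference, moves the first page out of CMA and leaves the third where it was. That mirrors the fallback in the patch, where a failed migration does not fail the pin.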