From patchwork Tue Mar 8 21:34:16 2022
X-Patchwork-Submitter: Zach O'Keefe
X-Patchwork-Id: 12774402
Date: Tue, 8 Mar 2022 13:34:16 -0800
In-Reply-To: <20220308213417.1407042-1-zokeefe@google.com>
Message-Id: <20220308213417.1407042-14-zokeefe@google.com>
References: <20220308213417.1407042-1-zokeefe@google.com>
X-Mailer: git-send-email 2.35.1.616.g0bdcbb4464-goog
Subject: [RFC PATCH 13/14] mm/madvise: add __madvise_collapse_*_batch() actions.
From: "Zach O'Keefe"
To: Alex Shi, David Hildenbrand, David Rientjes, Michal Hocko,
 Pasha Tatashin, SeongJae Park, Song Liu, Vlastimil Babka, Zi Yan,
 linux-mm@kvack.org
Cc: Andrea Arcangeli, Andrew Morton, Arnd Bergmann, Axel Rasmussen,
 Chris Kennelly, Chris Zankel, Helge Deller, Hugh Dickins,
 Ivan Kokshaysky, "James E.J. Bottomley", Jens Axboe,
 "Kirill A. Shutemov", Matthew Wilcox, Matt Turner, Max Filippov,
 Miaohe Lin, Minchan Kim, Patrick Xia, Pavel Begunkov, Peter Xu,
 Richard Henderson, Thomas Bogendoerfer, Yang Shi, "Zach O'Keefe"

Add implementations for the following batch actions:

scan_pmd: Iterate over the batch and scan each pmd for eligibility.
Note that this function is called with mmap_lock held in read mode and
does not drop it before returning. If a batch entry fails, the
->continue_collapse field of its madvise_collapse_data is set to
'false' so that later _batch actions know to ignore it. Return the
number of THPs already in the batch, which _madvise_collapse() needs to
determine the overall "success" criteria (all pmds either collapsed
successfully, or already THP-backed).

prealloc_hpages: Iterate over the batch and allocate / charge
hugepages. Before allocating a new page, check the local free hugepage
list first. Likewise, if charging the memcg fails after allocating a
hugepage, save the hugepage on the local free list for future use.

swapin_pmd: Iterate over the batch and attempt to swap in pages that
are currently swapped out. Called with mmap_lock held in read mode, and
returns with it held; however, it might drop and reacquire the lock
internally. Specifically, __collapse_huge_page_swapin() might drop +
reacquire the mmap_lock, and when it does so, it only revalidates the
vma/address for a single pmd. Since we need to revalidate the vma for
the entire region covered by the batch, we need to be notified when the
lock is dropped so that we can perform the required revalidation. As
such, add an argument to __collapse_huge_page_swapin() to notify the
caller when mmap_lock is dropped.

collapse_pmd: Iterate over the batch and perform the actual collapse
for each pmd. Note that this is done while holding the mmap_lock in
write mode for the entire batch action.
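For orientation, the calling sequence _madvise_collapse() is expected
to follow when driving these helpers is sketched below. This is
pseudo-code only, not part of the patch: the argument lists are
abridged (the full prototypes are not visible in the hunks below), and
error handling / retry logic is omitted.

	/* 1) Scan each pmd in the batch; mmap_lock held for read. */
	mmap_read_lock(mm);
	thps = __madvise_collapse_scan_pmd_batch(mm, ...);
	mmap_read_unlock(mm);

	/*
	 * 2) Allocate + charge one hugepage per still-eligible pmd,
	 *    reusing cc->free_hpages[node] before hitting the allocator.
	 */
	__madvise_collapse_prealloc_hpages_batch(mm, ...);

	/*
	 * 3) Swap in swapped-out ptes; the read lock may be dropped and
	 *    retaken internally, forcing revalidation of the whole batch.
	 */
	mmap_read_lock(mm);
	__madvise_collapse_swapin_pmd_batch(mm, ...);
	mmap_read_unlock(mm);

	/* 4) Perform the collapses with mmap_lock held for write. */
	mmap_write_lock(mm);
	collapsed = __madvise_collapse_pmd_batch(mm, ...);
	mmap_write_unlock(mm);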
Signed-off-by: Zach O'Keefe
---
 mm/khugepaged.c | 153 +++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 145 insertions(+), 8 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index ea53c706602e..e8156f15a3da 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2572,8 +2572,23 @@ __madvise_collapse_scan_pmd_batch(struct mm_struct *mm,
 				  int batch_size,
 				  struct collapse_control *cc)
 {
-	/* Implemented in later patch */
-	return 0;
+	unsigned long addr, i;
+	int thps = 0;
+
+	mmap_assert_locked(mm);
+
+	for (addr = batch_start, i = 0; i < batch_size;
+	     addr += HPAGE_PMD_SIZE, ++i) {
+		struct madvise_collapse_data *data = batch_data + i;
+
+		scan_pmd(mm, vma, addr, cc, &data->scan_result);
+		data->continue_collapse =
+				data->scan_result.result == SCAN_SUCCEED;
+		if (data->scan_result.result == SCAN_PAGE_COMPOUND)
+			++thps;
+	}
+	mmap_assert_locked(mm);
+	return thps;
 }
 
 /*
@@ -2590,8 +2605,39 @@ __madvise_collapse_prealloc_hpages_batch(struct mm_struct *mm,
 					 int batch_size,
 					 struct collapse_control *cc)
 {
-	/* Implemented in later patch */
-	return 0;
+	int nr_hpages = 0;
+	int i;
+
+	for (i = 0; i < batch_size; ++i) {
+		struct madvise_collapse_data *data = batch_data + i;
+
+		if (!data->continue_collapse)
+			continue;
+
+		if (!list_empty(&cc->free_hpages[node])) {
+			data->hpage = list_first_entry(&cc->free_hpages[node],
+						       struct page, lru);
+			list_del(&data->hpage->lru);
+		} else {
+			data->hpage = __alloc_pages_node(node, gfp,
+							 HPAGE_PMD_ORDER);
+			if (unlikely(!data->hpage))
+				break;
+
+			prep_transhuge_page(data->hpage);
+
+			if (unlikely(mem_cgroup_charge(page_folio(data->hpage),
+						       mm, gfp))) {
+				/* No use reusing page, so give it back */
+				put_page(data->hpage);
+				data->hpage = NULL;
+				data->continue_collapse = false;
+				break;
+			}
+		}
+		++nr_hpages;
+	}
+	return nr_hpages;
 }
 
 /*
@@ -2612,8 +2658,67 @@ __madvise_collapse_swapin_pmd_batch(struct mm_struct *mm,
 				    struct collapse_control *cc)
 {
-	/* Implemented in later patch */
-	return true;
+	unsigned long addr;
+	int i;
+	bool ret = true;
+
+	/*
+	 * This function is called with mmap_lock held, and returns with it
+	 * held. However, __collapse_huge_page_swapin() may internally drop and
+	 * reacquire the lock. When it does, it only revalidates the single pmd
+	 * provided to it. We need to know when it drops the lock so that we can
+	 * revalidate the batch of pmds we are operating on.
+	 *
+	 * Initially setting this to 'true' because the caller just locked
+	 * mmap_lock and so we need to revalidate before doing anything else.
+	 */
+	bool need_revalidate_pmd_count = true;
+
+	for (addr = batch_start, i = 0;
+	     i < batch_size;
+	     addr += HPAGE_PMD_SIZE, ++i) {
+		struct vm_area_struct *vma;
+		struct madvise_collapse_data *data = batch_data + i;
+
+		mmap_assert_locked(mm);
+
+		/*
+		 * We might have dropped the lock during previous iteration.
+		 * It's acceptable to exit this function without revalidating
+		 * the vma since the caller immediately unlocks mmap_lock
+		 * anyway.
+		 */
+		if (!data->continue_collapse)
+			continue;
+
+		if (need_revalidate_pmd_count) {
+			if (madvise_collapse_vma_revalidate_pmd_count(mm,
+								      batch_start,
+								      batch_size,
+								      &vma)) {
+				ret = false;
+				break;
+			}
+			need_revalidate_pmd_count = false;
+		}
+
+		data->pmd = mm_find_pmd(mm, addr);
+
+		if (!data->pmd ||
+		    (data->scan_result.unmapped &&
+		     !__collapse_huge_page_swapin(mm, vma, addr, data->pmd,
+						  VM_NOHUGEPAGE,
+						  data->scan_result.referenced,
+						  &need_revalidate_pmd_count))) {
+			/* Hold on to the THP until we know we don't need it. */
+			data->continue_collapse = false;
+			list_add_tail(&data->hpage->lru,
+				      &cc->free_hpages[node]);
+			data->hpage = NULL;
+		}
+	}
+	mmap_assert_locked(mm);
+	return ret;
 }
 
 /*
@@ -2630,8 +2735,40 @@ __madvise_collapse_pmd_batch(struct mm_struct *mm,
 			     int node,
 			     struct collapse_control *cc)
 {
-	/* Implemented in later patch */
-	return 0;
+	unsigned long addr;
+	struct vm_area_struct *vma;
+	int i, ret = 0;
+
+	mmap_assert_write_locked(mm);
+
+	if (madvise_collapse_vma_revalidate_pmd_count(mm, batch_start,
+						      batch_size, &vma))
+		goto out;
+
+	for (addr = batch_start, i = 0;
+	     i < batch_size;
+	     addr += HPAGE_PMD_SIZE, ++i) {
+		int result;
+		struct madvise_collapse_data *data = batch_data + i;
+
+		if (!data->continue_collapse ||
+		    (mm_find_pmd(mm, addr) != data->pmd))
+			continue;
+
+		result = __do_collapse_huge_page(mm, vma, addr, data->pmd,
+						 data->hpage,
+						 cc->enforce_pte_scan_limits,
+						 NULL);
+
+		if (result == SCAN_SUCCEED)
+			++ret;
+		else
+			list_add_tail(&data->hpage->lru,
+				      &cc->free_hpages[node]);
+		data->hpage = NULL;
+	}
+out:
+	return ret;
 }
 
 static bool continue_collapse(struct madvise_collapse_data *batch_data,
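
Not part of this patch, but for anyone who wants to exercise the series
from userspace, a minimal example follows. It is a sketch under two
assumptions: the kernel and uapi headers have this series applied (so
that MADV_COLLAPSE is defined; it is not in any released header), and
the pmd size is 2M.

	#include <stdint.h>
	#include <stdio.h>
	#include <string.h>
	#include <sys/mman.h>

	int main(void)
	{
		size_t hpage = 2UL << 20;	/* assumed pmd size */
		size_t len = 2 * hpage;
		char *map, *p;

		map = mmap(NULL, len, PROT_READ | PROT_WRITE,
			   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (map == MAP_FAILED) {
			perror("mmap");
			return 1;
		}
		/*
		 * Align to a pmd boundary so one full hugepage-sized
		 * region falls inside the mapping.
		 */
		p = (char *)(((uintptr_t)map + hpage - 1) & ~(hpage - 1));
		memset(p, 1, hpage);	/* fault in base pages */
		if (madvise(p, hpage, MADV_COLLAPSE))
			perror("madvise(MADV_COLLAPSE)");
		return 0;
	}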