From patchwork Fri Apr 4 21:06:58 2025
X-Patchwork-Submitter: SeongJae Park
X-Patchwork-Id: 14038912
From: SeongJae Park
To: Andrew Morton
Cc: SeongJae Park, "Liam R. Howlett", David Hildenbrand, Lorenzo Stoakes, Rik van Riel, Shakeel Butt, Vlastimil Babka, kernel-team@meta.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH v2 2/4] mm/madvise: batch tlb flushes for MADV_FREE
Date: Fri, 4 Apr 2025 14:06:58 -0700
Message-Id: <20250404210700.2156-3-sj@kernel.org>
X-Mailer: git-send-email 2.39.5
In-Reply-To: <20250404210700.2156-1-sj@kernel.org>
References: <20250404210700.2156-1-sj@kernel.org>

MADV_FREE handling for [process_]madvise() flushes the TLB for each vma
of each address range.  Update the logic to do the TLB flushes in a
batched way.  Initialize an mmu_gather object from do_madvise() and
vector_madvise(), which are the entry-level functions for madvise() and
process_madvise(), respectively, and pass the object to the per-vma
work function via the madvise_behavior struct.  Make the per-vma logic
not flush the TLB on its own, but instead just save the TLB entries to
the received mmu_gather object.  Finally, let the entry-level functions
flush the TLB entries gathered for the entire user request, at once.

Signed-off-by: SeongJae Park
---
 mm/madvise.c | 59 +++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 47 insertions(+), 12 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index 8bcfdd995d18..564095e381b2 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -799,12 +799,13 @@ static const struct mm_walk_ops madvise_free_walk_ops = {
 	.walk_lock = PGWALK_RDLOCK,
 };
 
-static int madvise_free_single_vma(struct vm_area_struct *vma,
-		unsigned long start_addr, unsigned long end_addr)
+static int madvise_free_single_vma(
+		struct madvise_behavior *behavior, struct vm_area_struct *vma,
+		unsigned long start_addr, unsigned long end_addr)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	struct mmu_notifier_range range;
-	struct mmu_gather tlb;
+	struct mmu_gather *tlb = behavior->tlb;
 
 	/* MADV_FREE works for only anon vma at the moment */
 	if (!vma_is_anonymous(vma))
@@ -820,17 +821,14 @@ static int madvise_free_single_vma(struct vm_area_struct *vma,
 				range.start, range.end);
 
 	lru_add_drain();
-	tlb_gather_mmu(&tlb, mm);
 	update_hiwater_rss(mm);
 
 	mmu_notifier_invalidate_range_start(&range);
-	tlb_start_vma(&tlb, vma);
+	tlb_start_vma(tlb, vma);
 	walk_page_range(vma->vm_mm, range.start, range.end,
-			&madvise_free_walk_ops, &tlb);
-	tlb_end_vma(&tlb, vma);
+			&madvise_free_walk_ops, tlb);
+	tlb_end_vma(tlb, vma);
 	mmu_notifier_invalidate_range_end(&range);
-	tlb_finish_mmu(&tlb);
-
 	return 0;
 }
 
@@ -953,7 +951,7 @@ static long madvise_dontneed_free(struct vm_area_struct *vma,
 	if (action == MADV_DONTNEED || action == MADV_DONTNEED_LOCKED)
 		return madvise_dontneed_single_vma(vma, start, end);
 	else if (action == MADV_FREE)
-		return madvise_free_single_vma(vma, start, end);
+		return madvise_free_single_vma(behavior, vma, start, end);
 	else
 		return -EINVAL;
 }
@@ -1626,6 +1624,29 @@ static void madvise_unlock(struct mm_struct *mm, int behavior)
 	mmap_read_unlock(mm);
 }
 
+static bool madvise_batch_tlb_flush(int behavior)
+{
+	switch (behavior) {
+	case MADV_FREE:
+		return true;
+	default:
+		return false;
+	}
+}
+
+static void madvise_init_tlb(struct madvise_behavior *madv_behavior,
+		struct mm_struct *mm)
+{
+	if (madvise_batch_tlb_flush(madv_behavior->behavior))
+		tlb_gather_mmu(madv_behavior->tlb, mm);
+}
+
+static void madvise_finish_tlb(struct madvise_behavior *madv_behavior)
+{
+	if (madvise_batch_tlb_flush(madv_behavior->behavior))
+		tlb_finish_mmu(madv_behavior->tlb);
+}
+
 static bool is_valid_madvise(unsigned long start, size_t len_in, int behavior)
 {
 	size_t len;
@@ -1782,14 +1803,20 @@ static int madvise_do_behavior(struct mm_struct *mm,
 int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int behavior)
 {
 	int error;
-	struct madvise_behavior madv_behavior = {.behavior = behavior};
+	struct mmu_gather tlb;
+	struct madvise_behavior madv_behavior = {
+		.behavior = behavior,
+		.tlb = &tlb,
+	};
 
 	if (madvise_should_skip(start, len_in, behavior, &error))
 		return error;
 	error = madvise_lock(mm, behavior);
 	if (error)
 		return error;
+	madvise_init_tlb(&madv_behavior, mm);
 	error = madvise_do_behavior(mm, start, len_in, &madv_behavior);
+	madvise_finish_tlb(&madv_behavior);
 	madvise_unlock(mm, behavior);
 
 	return error;
@@ -1806,13 +1833,18 @@ static ssize_t vector_madvise(struct mm_struct *mm, struct iov_iter *iter,
 {
 	ssize_t ret = 0;
 	size_t total_len;
-	struct madvise_behavior madv_behavior = {.behavior = behavior};
+	struct mmu_gather tlb;
+	struct madvise_behavior madv_behavior = {
+		.behavior = behavior,
+		.tlb = &tlb,
+	};
 
 	total_len = iov_iter_count(iter);
 
 	ret = madvise_lock(mm, behavior);
 	if (ret)
 		return ret;
+	madvise_init_tlb(&madv_behavior, mm);
 
 	while (iov_iter_count(iter)) {
 		unsigned long start = (unsigned long)iter_iov_addr(iter);
@@ -1841,14 +1873,17 @@ static ssize_t vector_madvise(struct mm_struct *mm, struct iov_iter *iter,
 			}
 
 			/* Drop and reacquire lock to unwind race. */
+			madvise_finish_tlb(&madv_behavior);
 			madvise_unlock(mm, behavior);
 			madvise_lock(mm, behavior);
+			madvise_init_tlb(&madv_behavior, mm);
 			continue;
 		}
 		if (ret < 0)
 			break;
 		iov_iter_advance(iter, iter_iov_len(iter));
 	}
+	madvise_finish_tlb(&madv_behavior);
 	madvise_unlock(mm, behavior);
 
 	ret = (total_len - iov_iter_count(iter)) ? : ret;