From patchwork Thu Sep 7 07:54:51 2023
X-Patchwork-Submitter: Gregory Price
X-Patchwork-Id: 13377668
From: Gregory Price
To: linux-mm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org,
    linux-api@vger.kernel.org, linux-cxl@vger.kernel.org, luto@kernel.org,
    tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
    dave.hansen@linux.intel.com, hpa@zytor.com, arnd@arndb.de,
    akpm@linux-foundation.org, x86@kernel.org, Gregory Price
Subject: [RFC PATCH 1/3] mm/migrate: remove unused mm argument from do_move_pages_to_node
Date: Thu, 7 Sep 2023 03:54:51 -0400
Message-Id: <20230907075453.350554-2-gregory.price@memverge.com>
In-Reply-To: <20230907075453.350554-1-gregory.price@memverge.com>
References: <20230907075453.350554-1-gregory.price@memverge.com>

Preparatory work to re-use do_move_pages_to_node() with a physical
address instead of a virtual address. The function does not actually
use the mm_struct, so the argument can be removed.

Signed-off-by: Gregory Price
---
 mm/migrate.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index b7fa020003f3..6ecb1e68c34a 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2026,8 +2026,7 @@ static int store_status(int __user *status, int start, int value, int nr)
 	return 0;
 }
 
-static int do_move_pages_to_node(struct mm_struct *mm,
-		struct list_head *pagelist, int node)
+static int do_move_pages_to_node(struct list_head *pagelist, int node)
 {
 	int err;
 	struct migration_target_control mtc = {
@@ -2123,7 +2122,7 @@ static int add_page_for_migration(struct mm_struct *mm, const void __user *p,
 	return err;
 }
 
-static int move_pages_and_store_status(struct mm_struct *mm, int node,
+static int move_pages_and_store_status(int node,
 		struct list_head *pagelist, int __user *status,
 		int start, int i, unsigned long nr_pages)
 {
@@ -2132,7 +2131,7 @@ static int move_pages_and_store_status(struct mm_struct *mm, int node,
 	if (list_empty(pagelist))
 		return 0;
 
-	err = do_move_pages_to_node(mm, pagelist, node);
+	err = do_move_pages_to_node(pagelist, node);
 	if (err) {
 		/*
 		 * Positive err means the number of failed
@@ -2190,7 +2189,7 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
 			current_node = node;
 			start = i;
 		} else if (node != current_node) {
-			err = move_pages_and_store_status(mm, current_node,
+			err = move_pages_and_store_status(current_node,
 					&pagelist, status, start, i, nr_pages);
 			if (err)
 				goto out;
@@ -2225,7 +2224,7 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
 		if (err)
 			goto out_flush;
 
-		err = move_pages_and_store_status(mm, current_node, &pagelist,
+		err = move_pages_and_store_status(current_node, &pagelist,
 				status, start, i, nr_pages);
 		if (err) {
 			/* We have accounted for page i */
@@ -2237,7 +2236,7 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
 	}
 out_flush:
 	/* Make sure we do not overwrite the existing error */
-	err1 = move_pages_and_store_status(mm, current_node, &pagelist,
+	err1 = move_pages_and_store_status(current_node, &pagelist,
 			status, start, i, nr_pages);
 	if (err >= 0)
 		err = err1;
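For context, the virtual-address interface that this series generalizes is
the existing move_pages(2) syscall. Below is a minimal userspace sketch of
that call -- not part of the patch, and assuming libnuma's <numaif.h>
wrapper (link with -lnuma) plus an online node 1:

#include <numaif.h>	/* move_pages(), MPOL_MF_MOVE */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
	/* One page of our own memory; touch it so it has a backing frame */
	char *buf = aligned_alloc(4096, 4096);
	buf[0] = 1;

	void *pages[1] = { buf };
	int nodes[1] = { 1 };	/* requested target node per page */
	int status[1];

	/* pid 0 means the calling process; status[0] becomes the node or -errno */
	if (move_pages(0, 1, pages, nodes, status, MPOL_MF_MOVE))
		perror("move_pages");
	else
		printf("page now on node %d\n", status[0]);

	free(buf);
	return 0;
}

The move_phys_pages call added in patch 3 keeps this argument shape, minus
the pid, with physical addresses in the pages array instead of virtual ones.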
From patchwork Thu Sep 7 07:54:52 2023
X-Patchwork-Submitter: Gregory Price
X-Patchwork-Id: 13377669
From: Gregory Price
To: linux-mm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org,
    linux-api@vger.kernel.org, linux-cxl@vger.kernel.org, luto@kernel.org,
    tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
    dave.hansen@linux.intel.com, hpa@zytor.com, arnd@arndb.de,
    akpm@linux-foundation.org, x86@kernel.org, Gregory Price
Subject: [RFC PATCH 2/3] mm/migrate: refactor add_page_for_migration for code re-use
Date: Thu, 7 Sep 2023 03:54:52 -0400
Message-Id: <20230907075453.350554-3-gregory.price@memverge.com>
In-Reply-To: <20230907075453.350554-1-gregory.price@memverge.com>
References: <20230907075453.350554-1-gregory.price@memverge.com>

add_page_for_migration presently does two things:

  1) validates that the page is present and migratable
  2) isolates the page from the LRU and puts it into the migration list

Break add_page_for_migration into two functions:

  add_page_for_migration      - isolate the page from the LRU and add it
                                to the migration list
  add_virt_page_for_migration - validate the page and call the above

add_page_for_migration no longer requires the mm_struct, so it can be
re-used by a physical-addressing version of move_pages.

Signed-off-by: Gregory Price
---
 mm/migrate.c | 79 ++++++++++++++++++++++++++++++----------------------
 1 file changed, 46 insertions(+), 33 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 6ecb1e68c34a..3506b8202937 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2042,52 +2042,33 @@ static int do_move_pages_to_node(struct list_head *pagelist, int node)
 }
 
 /*
- * Resolves the given address to a struct page, isolates it from the LRU and
- * puts it to the given pagelist.
+ * Isolates the page from the LRU and puts it into the given pagelist
  * Returns:
  *     errno - if the page cannot be found/isolated
  *     0 - when it doesn't have to be migrated because it is already on the
  *         target node
  *     1 - when it has been queued
  */
-static int add_page_for_migration(struct mm_struct *mm, const void __user *p,
-		int node, struct list_head *pagelist, bool migrate_all)
+static int add_page_for_migration(struct page *page, int node,
+		struct list_head *pagelist, bool migrate_all)
 {
-	struct vm_area_struct *vma;
-	unsigned long addr;
-	struct page *page;
 	int err;
 	bool isolated;
 
-	mmap_read_lock(mm);
-	addr = (unsigned long)untagged_addr_remote(mm, p);
-
-	err = -EFAULT;
-	vma = vma_lookup(mm, addr);
-	if (!vma || !vma_migratable(vma))
-		goto out;
-
-	/* FOLL_DUMP to ignore special (like zero) pages */
-	page = follow_page(vma, addr, FOLL_GET | FOLL_DUMP);
-
-	err = PTR_ERR(page);
-	if (IS_ERR(page))
-		goto out;
-
 	err = -ENOENT;
 	if (!page)
 		goto out;
 
 	if (is_zone_device_page(page))
-		goto out_putpage;
+		goto out;
 
 	err = 0;
 	if (page_to_nid(page) == node)
-		goto out_putpage;
+		goto out;
 
 	err = -EACCES;
 	if (page_mapcount(page) > 1 && !migrate_all)
-		goto out_putpage;
+		goto out;
 
 	if (PageHuge(page)) {
 		if (PageHead(page)) {
@@ -2101,7 +2082,7 @@ static int add_page_for_migration(struct mm_struct *mm, const void __user *p,
 		isolated = isolate_lru_page(head);
 		if (!isolated) {
 			err = -EBUSY;
-			goto out_putpage;
+			goto out;
 		}
 
 		err = 1;
@@ -2110,12 +2091,44 @@ static int add_page_for_migration(struct mm_struct *mm, const void __user *p,
 				NR_ISOLATED_ANON + page_is_file_lru(head),
 				thp_nr_pages(head));
 	}
-out_putpage:
-	/*
-	 * Either remove the duplicate refcount from
-	 * isolate_lru_page() or drop the page ref if it was
-	 * not isolated.
-	 */
+out:
+	return err;
+}
+
+/*
+ * Resolves the given address to a struct page, isolates it from the LRU and
+ * puts it to the given pagelist.
+ * Returns:
+ *     errno - if the page cannot be found/isolated
+ *     0 - when it doesn't have to be migrated because it is already on the
+ *         target node
+ *     1 - when it has been queued
+ */
+static int add_virt_page_for_migration(struct mm_struct *mm,
+		const void __user *p, int node, struct list_head *pagelist,
+		bool migrate_all)
+{
+	struct vm_area_struct *vma;
+	unsigned long addr;
+	struct page *page;
+	int err = -EFAULT;
+
+	mmap_read_lock(mm);
+	addr = (unsigned long)untagged_addr_remote(mm, p);
+
+	vma = vma_lookup(mm, addr);
+	if (!vma || !vma_migratable(vma))
+		goto out;
+
+	/* FOLL_DUMP to ignore special (like zero) pages */
+	page = follow_page(vma, addr, FOLL_GET | FOLL_DUMP);
+
+	err = PTR_ERR(page);
+	if (IS_ERR(page))
+		goto out;
+
+	err = add_page_for_migration(page, node, pagelist, migrate_all);
+	put_page(page);
 out:
 	mmap_read_unlock(mm);
 	return err;
@@ -2201,7 +2214,7 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
 		 * Errors in the page lookup or isolation are not fatal and we simply
 		 * report them via status
 		 */
-		err = add_page_for_migration(mm, p, current_node, &pagelist,
+		err = add_virt_page_for_migration(mm, p, current_node, &pagelist,
 				flags & MPOL_MF_MOVE_ALL);
 
 		if (err > 0) {
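The effect of the split, in miniature: once validation is separated from
isolation, a caller that obtains a struct page by some means other than a
virtual-address walk can feed it straight into the isolation half. A rough
sketch under that assumption -- queue_pfn_for_migration is hypothetical,
and the rmap-based permission checks that patch 3 performs are omitted:

/* Hypothetical PFN-based caller of the refactored helper (sketch only) */
static int queue_pfn_for_migration(unsigned long pfn, int node,
		struct list_head *pagelist, bool migrate_all)
{
	/* Resolve the PFN directly; no mm_struct or vma walk required */
	struct page *page = pfn_to_online_page(pfn);
	int err;

	if (!page)
		return -ENOENT;

	get_page(page);	/* hold a reference across isolation, as the virt path does */
	err = add_page_for_migration(page, node, pagelist, migrate_all);
	put_page(page);
	return err;
}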
From patchwork Thu Sep 7 07:54:53 2023
X-Patchwork-Submitter: Gregory Price
X-Patchwork-Id: 13377670

From: Gregory Price
To: linux-mm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org,
    linux-api@vger.kernel.org, linux-cxl@vger.kernel.org, luto@kernel.org,
    tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
    dave.hansen@linux.intel.com, hpa@zytor.com, arnd@arndb.de,
    akpm@linux-foundation.org, x86@kernel.org, Gregory Price
Subject: [RFC PATCH 3/3] mm/migrate: Create move_phys_pages syscall
Date: Thu, 7 Sep 2023 03:54:53 -0400
Message-Id: <20230907075453.350554-4-gregory.price@memverge.com>
In-Reply-To: <20230907075453.350554-1-gregory.price@memverge.com>
References: <20230907075453.350554-1-gregory.price@memverge.com>

Similar to the move_pages system call, this system call takes a list of
physical addresses instead of a pid and a list of virtual addresses.

Because there is no task to validate the memory policy against, each
page must be interrogated individually to determine whether the
migration is valid; this means checking every task that maps the page.
That is accomplished via an rmap_walk on the folio containing the page.
Since pages are interrogated one at a time, the cost should be
considered when using this interface to migrate large shared regions.

The remaining logic is the same as the move_pages syscall. One change
is made to do_pages_move (a check for whether an mm_struct was passed)
so that the existing migration code can be re-used.
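For illustration only -- no libc wrapper exists, the physical address below
is a placeholder, and the syscall number is taken from the x86 tables added
in this patch -- userspace would invoke the call through syscall(2):

#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/mempolicy.h>	/* MPOL_MF_MOVE */

#ifndef __NR_move_phys_pages
#define __NR_move_phys_pages 454	/* from the syscall tables in this patch */
#endif

int main(void)
{
	/* Placeholder physical address; in practice this would come from a
	 * hardware source such as a CXL device's poison list. */
	void *pages[1] = { (void *)0x100000000UL };
	int nodes[1] = { 1 };	/* target node for each page */
	int status[1];

	if (syscall(__NR_move_phys_pages, 1UL, pages, nodes, status,
		    MPOL_MF_MOVE) < 0)
		perror("move_phys_pages");
	else
		printf("page 0: status %d\n", status[0]);
	return 0;
}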
Signed-off-by: Gregory Price
---
 arch/x86/entry/syscalls/syscall_32.tbl  |   1 +
 arch/x86/entry/syscalls/syscall_64.tbl  |   1 +
 include/linux/syscalls.h                |   5 +
 include/uapi/asm-generic/unistd.h       |   8 +-
 kernel/sys_ni.c                         |   1 +
 mm/migrate.c                            | 178 +++++++++++++++++++++++-
 tools/include/uapi/asm-generic/unistd.h |   8 +-
 7 files changed, 197 insertions(+), 5 deletions(-)

diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 2d0b1bd866ea..25db6d71af0c 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -457,3 +457,4 @@
 450	i386	set_mempolicy_home_node	sys_set_mempolicy_home_node
 451	i386	cachestat		sys_cachestat
 452	i386	fchmodat2		sys_fchmodat2
+454	i386	move_phys_pages		sys_move_phys_pages
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 1d6eee30eceb..9676f2e7698c 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -375,6 +375,7 @@
 451	common	cachestat		sys_cachestat
 452	common	fchmodat2		sys_fchmodat2
 453	64	map_shadow_stack	sys_map_shadow_stack
+454	common	move_phys_pages		sys_move_phys_pages
 
 #
 # Due to a historical design error, certain syscalls are numbered differently
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 22bc6bc147f8..6860675a942f 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -821,6 +821,11 @@ asmlinkage long sys_move_pages(pid_t pid, unsigned long nr_pages,
 				const int __user *nodes,
 				int __user *status,
 				int flags);
+asmlinkage long sys_move_phys_pages(unsigned long nr_pages,
+				const void __user * __user *pages,
+				const int __user *nodes,
+				int __user *status,
+				int flags);
 asmlinkage long sys_rt_tgsigqueueinfo(pid_t tgid, pid_t pid, int sig,
 		siginfo_t __user *uinfo);
 asmlinkage long sys_perf_event_open(
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index abe087c53b4b..8838fcfaf261 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -823,8 +823,14 @@ __SYSCALL(__NR_cachestat, sys_cachestat)
 #define __NR_fchmodat2 452
 __SYSCALL(__NR_fchmodat2, sys_fchmodat2)
 
+/* CONFIG_MMU only */
+#ifndef __ARCH_NOMMU
+#define __NR_move_phys_pages 454
+__SYSCALL(__NR_move_phys_pages, sys_move_phys_pages)
+#endif
+
 #undef __NR_syscalls
-#define __NR_syscalls 453
+#define __NR_syscalls 455
 
 /*
  * 32 bit systems traditionally used different
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index e137c1385c56..07441b10f92a 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -192,6 +192,7 @@ COND_SYSCALL(migrate_pages);
 COND_SYSCALL(move_pages);
 COND_SYSCALL(set_mempolicy_home_node);
 COND_SYSCALL(cachestat);
+COND_SYSCALL(move_phys_pages);
 
 COND_SYSCALL(perf_event_open);
 COND_SYSCALL(accept4);
diff --git a/mm/migrate.c b/mm/migrate.c
index 3506b8202937..8a6f1eb6e512 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2161,6 +2161,101 @@ static int move_pages_and_store_status(int node,
 	return store_status(status, start, node, i - start);
 }
 
+struct rmap_page_ctxt {
+	bool found;
+	bool migratable;
+	bool node_allowed;
+	int node;
+};
+
+/*
+ * Walks each vma mapping a given page and determines if those
+ * vma's are both migratable, and that the target node is within
+ * the allowed cpuset of the owning task.
+ */
+static bool phys_page_migratable(struct folio *folio,
+		struct vm_area_struct *vma,
+		unsigned long address,
+		void *arg)
+{
+	struct rmap_page_ctxt *ctxt = (struct rmap_page_ctxt *)arg;
+	struct task_struct *owner = vma->vm_mm->owner;
+	nodemask_t task_nodes = cpuset_mems_allowed(owner);
+
+	ctxt->found |= true;
+	ctxt->migratable &= vma_migratable(vma);
+	ctxt->node_allowed &= node_isset(ctxt->node, task_nodes);
+
+	return ctxt->migratable && ctxt->node_allowed;
+}
+
+static struct folio *phys_migrate_get_folio(struct page *page)
+{
+	struct folio *folio;
+
+	folio = page_folio(page);
+	if (!folio_test_lru(folio) || !folio_try_get(folio))
+		return NULL;
+	if (unlikely(page_folio(page) != folio || !folio_test_lru(folio))) {
+		folio_put(folio);
+		folio = NULL;
+	}
+	return folio;
+}
+
+/*
+ * Validates the physical address is online and migratable. Walks the folio
+ * containing the page to validate the vma is migratable and the cpuset node
+ * restrictions. Then calls add_page_for_migration to isolate it from the
+ * LRU and place it into the given pagelist.
+ * Returns:
+ *     errno - if the page is not online, migratable, or can't be isolated
+ *     0 - when it doesn't have to be migrated because it is already on the
+ *         target node
+ *     1 - when it has been queued
+ */
+static int add_phys_page_for_migration(const void __user *p, int node,
+		struct list_head *pagelist, bool migrate_all)
+{
+	unsigned long pfn;
+	struct page *page;
+	struct folio *folio;
+	int err;
+	struct rmap_page_ctxt rmctxt = {
+		.found = false,
+		.migratable = true,
+		.node_allowed = true,
+		.node = node
+	};
+	struct rmap_walk_control rwc = {
+		.rmap_one = phys_page_migratable,
+		.arg = &rmctxt
+	};
+
+	pfn = ((unsigned long)p) >> PAGE_SHIFT;
+	page = pfn_to_online_page(pfn);
+	if (!page || PageTail(page))
+		return -ENOENT;
+
+	folio = phys_migrate_get_folio(page);
+	if (folio) {
+		rmap_walk(folio, &rwc);
+		folio_put(folio);
+	}
+
+	if (!rmctxt.found)
+		err = -ENOENT;
+	else if (!rmctxt.migratable)
+		err = -EFAULT;
+	else if (!rmctxt.node_allowed)
+		err = -EACCES;
+	else
+		err = add_page_for_migration(page, node, pagelist, migrate_all);
+
+	return err;
+}
+
 /*
  * Migrate an array of page address onto an array of nodes and fill
  * the corresponding array of status.
@@ -2214,8 +2309,13 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
 		 * Errors in the page lookup or isolation are not fatal and we simply
 		 * report them via status
 		 */
-		err = add_virt_page_for_migration(mm, p, current_node, &pagelist,
-				flags & MPOL_MF_MOVE_ALL);
+		if (mm)
+			err = add_virt_page_for_migration(mm, p, current_node,
+					&pagelist, flags & MPOL_MF_MOVE_ALL);
+		else
+			err = add_phys_page_for_migration(p, current_node,
+					&pagelist, flags & MPOL_MF_MOVE_ALL);
+
 
 		if (err > 0) {
 			/* The page is successfully queued for migration */
@@ -2303,6 +2403,36 @@ static void do_pages_stat_array(struct mm_struct *mm, unsigned long nr_pages,
 	mmap_read_unlock(mm);
 }
 
+/*
+ * Determine the nodes of an array of pages and store it in an array of status.
+ */
+static void do_phys_pages_stat_array(unsigned long nr_pages,
+		const void __user **pages, int *status)
+{
+	unsigned long i;
+
+	for (i = 0; i < nr_pages; i++) {
+		unsigned long pfn = (unsigned long)(*pages) >> PAGE_SHIFT;
+		struct page *page = pfn_to_online_page(pfn);
+		int err = -ENOENT;
+
+		if (!page)
+			goto set_status;
+
+		get_page(page);
+
+		if (!is_zone_device_page(page))
+			err = page_to_nid(page);
+
+		put_page(page);
+set_status:
+		*status = err;
+
+		pages++;
+		status++;
+	}
+}
+
 static int get_compat_pages_array(const void __user *chunk_pages[],
 		const void __user * __user *pages,
 		unsigned long chunk_nr)
@@ -2345,7 +2475,10 @@ static int do_pages_stat(struct mm_struct *mm, unsigned long nr_pages,
 			break;
 		}
 
-		do_pages_stat_array(mm, chunk_nr, chunk_pages, chunk_status);
+		if (mm)
+			do_pages_stat_array(mm, chunk_nr, chunk_pages, chunk_status);
+		else
+			do_phys_pages_stat_array(chunk_nr, chunk_pages, chunk_status);
 
 		if (copy_to_user(status, chunk_status, chunk_nr * sizeof(*status)))
 			break;
@@ -2446,6 +2579,45 @@ SYSCALL_DEFINE6(move_pages, pid_t, pid, unsigned long, nr_pages,
 	return kernel_move_pages(pid, nr_pages, pages, nodes, status, flags);
 }
 
+/*
+ * Move a list of physically-addressed pages to the specified nodes, on
+ * behalf of whichever tasks currently map them.
+ */
+static int kernel_move_phys_pages(unsigned long nr_pages,
+		const void __user * __user *pages,
+		const int __user *nodes,
+		int __user *status, int flags)
+{
+	int err;
+	nodemask_t target_nodes;
+
+	/* Check flags */
+	if (flags & ~(MPOL_MF_MOVE|MPOL_MF_MOVE_ALL))
+		return -EINVAL;
+
+	if ((flags & MPOL_MF_MOVE_ALL) && !capable(CAP_SYS_NICE))
+		return -EPERM;
+
+	/* All tasks mapping each page is checked in phys_page_migratable */
+	nodes_setall(target_nodes);
+	if (nodes)
+		err = do_pages_move(NULL, target_nodes, nr_pages, pages,
+				nodes, status, flags);
+	else
+		err = do_pages_stat(NULL, nr_pages, pages, status);
+
+	return err;
+}
+
+SYSCALL_DEFINE5(move_phys_pages, unsigned long, nr_pages,
+		const void __user * __user *, pages,
+		const int __user *, nodes,
+		int __user *, status, int, flags)
+{
+	return kernel_move_phys_pages(nr_pages, pages, nodes, status, flags);
+}
+
+
 #ifdef CONFIG_NUMA_BALANCING
 /*
  * Returns true if this is a safe migration target node for misplaced NUMA
diff --git a/tools/include/uapi/asm-generic/unistd.h b/tools/include/uapi/asm-generic/unistd.h
index fd6c1cb585db..b140ad444946 100644
--- a/tools/include/uapi/asm-generic/unistd.h
+++ b/tools/include/uapi/asm-generic/unistd.h
@@ -820,8 +820,14 @@ __SYSCALL(__NR_set_mempolicy_home_node, sys_set_mempolicy_home_node)
 #define __NR_cachestat 451
 __SYSCALL(__NR_cachestat, sys_cachestat)
 
+/* CONFIG_MMU only */
+#ifndef __ARCH_NOMMU
+#define __NR_move_phys_pages 454
+__SYSCALL(__NR_move_phys_pages, sys_move_phys_pages)
+#endif
+
 #undef __NR_syscalls
-#define __NR_syscalls 452
+#define __NR_syscalls 455
 
 /*
  * 32 bit systems traditionally used different
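One behavior of kernel_move_phys_pages() above worth noting: passing a NULL
nodes array selects the status-query path (do_pages_stat() with a NULL mm),
mirroring move_pages(2). A sketch of that mode, under the same placeholder
physical-address and syscall-number assumptions as the earlier example:

#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>

#ifndef __NR_move_phys_pages
#define __NR_move_phys_pages 454
#endif

int main(void)
{
	void *pages[1] = { (void *)0x100000000UL };	/* placeholder phys addr */
	int status[1];

	/* nodes == NULL: don't migrate, just report each page's current node */
	if (syscall(__NR_move_phys_pages, 1UL, pages, NULL, status, 0) == 0)
		printf("page 0 is on node %d\n", status[0]);
	else
		perror("move_phys_pages");
	return 0;
}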