From patchwork Mon May 16 12:54:00 2022
X-Patchwork-Id: 12850763
From: Jakub Matěna <matenajakub@gmail.com>
To: linux-mm@kvack.org
Cc: patches@lists.linux.dev, linux-kernel@vger.kernel.org, vbabka@suse.cz,
 mhocko@kernel.org, mgorman@techsingularity.net, willy@infradead.org,
 liam.howlett@oracle.com, hughd@google.com, kirill@shutemov.name,
 riel@surriel.com, rostedt@goodmis.org, peterz@infradead.org,
 david@redhat.com, Jakub Matěna <matenajakub@gmail.com>
Subject: [RFC PATCH v3 1/6] [PATCH 1/6] mm: refactor of vma_merge()
Date: Mon, 16 May 2022 14:54:00 +0200
Message-Id: <20220516125405.1675-2-matenajakub@gmail.com>
In-Reply-To: <20220516125405.1675-1-matenajakub@gmail.com>
References: <20220516125405.1675-1-matenajakub@gmail.com>

Refactor vma_merge() to make it shorter, easier to understand and
suitable for tracing of the successful merges made possible by the
following patches in the series. The main change is the elimination of
code duplication in the "can we merge next" check: the checks are now
done first and their results cached before the merge itself is
executed. The exit paths are also unified.

Signed-off-by: Jakub Matěna
Acked-by: Kirill A. Shutemov
---
 mm/mmap.c | 81 +++++++++++++++++++++++++------------------------
 1 file changed, 36 insertions(+), 45 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index 3aa839f81e63..4a4611443593 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1171,7 +1171,9 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm,
 {
 	pgoff_t pglen = (end - addr) >> PAGE_SHIFT;
 	struct vm_area_struct *area, *next;
-	int err;
+	int err = -1;
+	bool merge_prev = false;
+	bool merge_next = false;
 
 	/*
 	 * We later require that vma->vm_flags == vm_flags,
@@ -1190,66 +1192,55 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm,
 	VM_WARN_ON(area && end > area->vm_end);
 	VM_WARN_ON(addr >= end);
 
-	/*
-	 * Can it merge with the predecessor?
-	 */
+	/* Can we merge the predecessor? */
 	if (prev && prev->vm_end == addr &&
 			mpol_equal(vma_policy(prev), policy) &&
 			can_vma_merge_after(prev, vm_flags,
 					    anon_vma, file, pgoff,
 					    vm_userfaultfd_ctx, anon_name)) {
-		/*
-		 * OK, it can.  Can we now merge in the successor as well?
-		 */
-		if (next && end == next->vm_start &&
-				mpol_equal(policy, vma_policy(next)) &&
-				can_vma_merge_before(next, vm_flags,
-						     anon_vma, file,
-						     pgoff+pglen,
-						     vm_userfaultfd_ctx, anon_name) &&
-				is_mergeable_anon_vma(prev->anon_vma,
-						      next->anon_vma, NULL)) {
-							/* cases 1, 6 */
-			err = __vma_adjust(prev, prev->vm_start,
-					 next->vm_end, prev->vm_pgoff, NULL,
-					 prev);
-		} else					/* cases 2, 5, 7 */
-			err = __vma_adjust(prev, prev->vm_start,
-					 end, prev->vm_pgoff, NULL, prev);
-		if (err)
-			return NULL;
-		khugepaged_enter_vma_merge(prev, vm_flags);
-		return prev;
+		merge_prev = true;
+		area = prev;
 	}
-
-	/*
-	 * Can this new request be merged in front of next?
-	 */
+	/* Can we merge the successor? */
 	if (next && end == next->vm_start &&
 			mpol_equal(policy, vma_policy(next)) &&
 			can_vma_merge_before(next, vm_flags,
 					     anon_vma, file, pgoff+pglen,
 					     vm_userfaultfd_ctx, anon_name)) {
+		merge_next = true;
+	}
+	/* Can we merge both the predecessor and the successor? */
+	if (merge_prev && merge_next &&
+			is_mergeable_anon_vma(prev->anon_vma,
+				next->anon_vma, NULL)) {	/* cases 1, 6 */
+		err = __vma_adjust(prev, prev->vm_start,
+					next->vm_end, prev->vm_pgoff, NULL,
+					prev);
+	} else if (merge_prev) {			/* cases 2, 5, 7 */
+		err = __vma_adjust(prev, prev->vm_start,
+					end, prev->vm_pgoff, NULL, prev);
+	} else if (merge_next) {
 		if (prev && addr < prev->vm_end)	/* case 4 */
 			err = __vma_adjust(prev, prev->vm_start,
-					 addr, prev->vm_pgoff, NULL, next);
-		else {					/* cases 3, 8 */
+					addr, prev->vm_pgoff, NULL, next);
+		else					/* cases 3, 8 */
 			err = __vma_adjust(area, addr, next->vm_end,
-					 next->vm_pgoff - pglen, NULL, next);
-			/*
-			 * In case 3 area is already equal to next and
-			 * this is a noop, but in case 8 "area" has
-			 * been removed and next was expanded over it.
-			 */
-			area = next;
-		}
-		if (err)
-			return NULL;
-		khugepaged_enter_vma_merge(area, vm_flags);
-		return area;
+					next->vm_pgoff - pglen, NULL, next);
+		/*
+		 * In case 3 and 4 area is already equal to next and
+		 * this is a noop, but in case 8 "area" has
+		 * been removed and next was expanded over it.
+		 */
+		area = next;
 	}
 
-	return NULL;
+	/*
+	 * Cannot merge with predecessor or successor or error in __vma_adjust?
+	 */
+	if (err)
+		return NULL;
+	khugepaged_enter_vma_merge(area, vm_flags);
+	return area;
 }
 
 /*
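Illustration (not part of the patch): the effect of vma_merge() can be
observed from userspace by counting the lines of /proc/self/maps. The
sketch below first splits one anonymous mapping into three VMAs with
mprotect() and then restores the protection so the kernel can merge
them back into one. Error handling is omitted for brevity.

#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Count the mappings currently listed in /proc/self/maps. */
static int count_vmas(void)
{
	FILE *f = fopen("/proc/self/maps", "r");
	char line[512];
	int n = 0;

	while (fgets(line, sizeof(line), f))
		n++;
	fclose(f);
	return n;
}

int main(void)
{
	long pg = sysconf(_SC_PAGESIZE);
	char *p = mmap(NULL, 3 * pg, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	printf("one mapping:  %d vmas\n", count_vmas());
	/* A different protection in the middle splits it into three VMAs. */
	mprotect(p + pg, pg, PROT_READ);
	printf("after split:  %d vmas\n", count_vmas());
	/* Restoring the protection lets vma_merge() fuse them back into one. */
	mprotect(p + pg, pg, PROT_READ | PROT_WRITE);
	printf("after merge:  %d vmas\n", count_vmas());
	return 0;
}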
From patchwork Mon May 16 12:54:01 2022
X-Patchwork-Id: 12850762
From: Jakub Matěna <matenajakub@gmail.com>
To: linux-mm@kvack.org
Cc: patches@lists.linux.dev, linux-kernel@vger.kernel.org, vbabka@suse.cz,
 mhocko@kernel.org, mgorman@techsingularity.net, willy@infradead.org,
 liam.howlett@oracle.com, hughd@google.com, kirill@shutemov.name,
 riel@surriel.com, rostedt@goodmis.org, peterz@infradead.org,
 david@redhat.com, Jakub Matěna <matenajakub@gmail.com>
Subject: [RFC PATCH v3 2/6] [PATCH 2/6] mm: add merging after mremap resize
Date: Mon, 16 May 2022 14:54:01 +0200
Message-Id: <20220516125405.1675-3-matenajakub@gmail.com>
In-Reply-To: <20220516125405.1675-1-matenajakub@gmail.com>
References: <20220516125405.1675-1-matenajakub@gmail.com>

When an mremap() call results in an expansion, the expanded VMA may
become adjacent to the next VMA, and it may then be possible to merge
the two. Add a vma_merge() call after the expansion is done to attempt
such a merge.
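Illustration (not part of the patch) of the situation this change
targets; whether the two mappings really end up as a single VMA after
the expansion can be checked in /proc/self/maps:

#define _GNU_SOURCE
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	long pg = sysconf(_SC_PAGESIZE);
	/* One anonymous mapping of three pages ... */
	char *base = mmap(NULL, 3 * pg, PROT_READ | PROT_WRITE,
			  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	/* ... with the middle page unmapped: two VMAs around a hole. */
	munmap(base + pg, pg);
	/* Growing the first VMA in place fills the hole and makes it
	 * adjacent to the last page; with this patch vma_merge() is
	 * attempted after the expansion, so a single VMA can result. */
	mremap(base, pg, 2 * pg, 0);
	pause();	/* inspect /proc/self/maps here */
	return 0;
}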
Signed-off-by: Jakub Matěna
---
 mm/mremap.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/mm/mremap.c b/mm/mremap.c
index 303d3290b938..75cda854ec58 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -9,6 +9,7 @@
  */
 
 #include
+#include
 #include
 #include
 #include
@@ -1022,8 +1023,10 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
 			}
 		}
 
-		if (vma_adjust(vma, vma->vm_start, addr + new_len,
-			       vma->vm_pgoff, NULL)) {
+		if (!vma_merge(mm, vma, addr + old_len, addr + new_len,
+			       vma->vm_flags, vma->anon_vma, vma->vm_file,
+			       vma->vm_pgoff + (old_len >> PAGE_SHIFT), vma_policy(vma),
+			       vma->vm_userfaultfd_ctx, anon_vma_name(vma))) {
 			vm_unacct_memory(pages);
 			ret = -ENOMEM;
 			goto out;
From patchwork Mon May 16 12:54:02 2022
X-Patchwork-Id: 12850764
From: Jakub Matěna <matenajakub@gmail.com>
To: linux-mm@kvack.org
Cc: patches@lists.linux.dev, linux-kernel@vger.kernel.org, vbabka@suse.cz,
 mhocko@kernel.org, mgorman@techsingularity.net, willy@infradead.org,
 liam.howlett@oracle.com, hughd@google.com, kirill@shutemov.name,
 riel@surriel.com, rostedt@goodmis.org, peterz@infradead.org,
 david@redhat.com, Jakub Matěna <matenajakub@gmail.com>
Subject: [RFC PATCH v3 3/6] [PATCH 3/6] mm: add migration waiting and rmap locking to pagewalk
Date: Mon, 16 May 2022 14:54:02 +0200
Message-Id: <20220516125405.1675-4-matenajakub@gmail.com>
In-Reply-To: <20220516125405.1675-1-matenajakub@gmail.com>
References: <20220516125405.1675-1-matenajakub@gmail.com>

The following patches need to wait for migration and take the rmap
locks before they work with the pte itself. This is a self-contained
change and is therefore extracted into this patch. A new flag is added
to pagewalk to optionally wait for migration at the
walk_pte_range_inner() level when a page is being migrated, and a
similar flag is added to take the rmap locks at the same level. When
waiting for migration, the pte lock and the rmap locks must be dropped
and taken again after the migration has ended. The same mechanism is
used when pte_entry() sets ACTION_AGAIN, which happens in the following
patch when a deadlock is encountered because of a different lock order
used during the page update.

Migration waiting is done only at the PTE level and presumes that no
pmd_entry() is specified. If pmd_entry() is set together with the page
migration flag, a warning is logged. PMD migration waiting can be
implemented later if anyone needs it. At this time the flags can be
specified only by calling walk_page_vma(); if needed, they can also be
added to the other pagewalk API calls.
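A minimal sketch of a caller of the new interface (the callback and ops
names below are made up for illustration; only walk_page_vma(),
WALK_MIGRATION and WALK_LOCK_RMAP come from this patch):

#include <linux/pagewalk.h>

/* Hypothetical callback; runs with the pte lock and, because of
 * WALK_LOCK_RMAP, the rmap locks held.  With WALK_MIGRATION the walk
 * has already waited for any migration entry at this address, so *pte
 * is never a migration entry here.  Setting walk->action = ACTION_AGAIN
 * would make the walk drop and retake the locks and retry this pte. */
static int my_pte_entry(pte_t *pte, unsigned long addr,
			unsigned long next, struct mm_walk *walk)
{
	return 0;
}

static const struct mm_walk_ops my_walk_ops = {
	.pte_entry = my_pte_entry,
};

static int walk_one_vma(struct vm_area_struct *vma, void *priv)
{
	return walk_page_vma(vma, &my_walk_ops, priv,
			     WALK_MIGRATION | WALK_LOCK_RMAP);
}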
Signed-off-by: Jakub Matěna --- fs/proc/task_mmu.c | 4 +-- include/linux/pagewalk.h | 11 ++++++- include/linux/rmap.h | 2 ++ mm/mremap.c | 17 +--------- mm/pagewalk.c | 71 +++++++++++++++++++++++++++++++++++++--- mm/rmap.c | 16 +++++++++ 6 files changed, 97 insertions(+), 24 deletions(-) diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index f46060eb91b5..fd72263456e9 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -794,7 +794,7 @@ static void smap_gather_stats(struct vm_area_struct *vma, #endif /* mmap_lock is held in m_start */ if (!start) - walk_page_vma(vma, ops, mss); + walk_page_vma(vma, ops, mss, 0); else walk_page_range(vma->vm_mm, start, vma->vm_end, ops, mss); } @@ -1938,7 +1938,7 @@ static int show_numa_map(struct seq_file *m, void *v) seq_puts(m, " huge"); /* mmap_lock is held by m_start */ - walk_page_vma(vma, &show_numa_ops, md); + walk_page_vma(vma, &show_numa_ops, md, 0); if (!md->pages) goto out; diff --git a/include/linux/pagewalk.h b/include/linux/pagewalk.h index ac7b38ad5903..07345df51324 100644 --- a/include/linux/pagewalk.h +++ b/include/linux/pagewalk.h @@ -70,6 +70,13 @@ enum page_walk_action { ACTION_AGAIN = 2 }; +/* Walk flags */ + +/* Wait for migration before pte entry, not implemented for pmd entries */ +#define WALK_MIGRATION 0x1 +/* Take rmap locks before pte entries */ +#define WALK_LOCK_RMAP 0x2 + /** * struct mm_walk - walk_page_range data * @ops: operation to call during the walk @@ -77,6 +84,7 @@ enum page_walk_action { * @pgd: pointer to PGD; only valid with no_vma (otherwise set to NULL) * @vma: vma currently walked (NULL if walking outside vmas) * @action: next action to perform (see enum page_walk_action) + * @flags: flags performing additional operations (see walk flags) * @no_vma: walk ignoring vmas (vma will always be NULL) * @private: private data for callbacks' usage * @@ -88,6 +96,7 @@ struct mm_walk { pgd_t *pgd; struct vm_area_struct *vma; enum page_walk_action action; + unsigned long flags; bool no_vma; void *private; }; @@ -100,7 +109,7 @@ int walk_page_range_novma(struct mm_struct *mm, unsigned long start, pgd_t *pgd, void *private); int walk_page_vma(struct vm_area_struct *vma, const struct mm_walk_ops *ops, - void *private); + void *private, unsigned long flags); int walk_page_mapping(struct address_space *mapping, pgoff_t first_index, pgoff_t nr, const struct mm_walk_ops *ops, void *private); diff --git a/include/linux/rmap.h b/include/linux/rmap.h index 17230c458341..d2d5e511dd93 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -138,6 +138,8 @@ static inline void anon_vma_unlock_read(struct anon_vma *anon_vma) */ void anon_vma_init(void); /* create anon_vma_cachep */ int __anon_vma_prepare(struct vm_area_struct *); +void take_rmap_locks(struct vm_area_struct *vma); +void drop_rmap_locks(struct vm_area_struct *vma); void unlink_anon_vmas(struct vm_area_struct *); int anon_vma_clone(struct vm_area_struct *, struct vm_area_struct *); int anon_vma_fork(struct vm_area_struct *, struct vm_area_struct *); diff --git a/mm/mremap.c b/mm/mremap.c index 75cda854ec58..309fab7ed706 100644 --- a/mm/mremap.c +++ b/mm/mremap.c @@ -24,6 +24,7 @@ #include #include #include +#include #include #include @@ -101,22 +102,6 @@ static pmd_t *alloc_new_pmd(struct mm_struct *mm, struct vm_area_struct *vma, return pmd; } -static void take_rmap_locks(struct vm_area_struct *vma) -{ - if (vma->vm_file) - i_mmap_lock_write(vma->vm_file->f_mapping); - if (vma->anon_vma) - anon_vma_lock_write(vma->anon_vma); -} - -static void 
drop_rmap_locks(struct vm_area_struct *vma) -{ - if (vma->anon_vma) - anon_vma_unlock_write(vma->anon_vma); - if (vma->vm_file) - i_mmap_unlock_write(vma->vm_file->f_mapping); -} - static pte_t move_soft_dirty_pte(pte_t pte) { /* diff --git a/mm/pagewalk.c b/mm/pagewalk.c index 9b3db11a4d1d..0bfb8c9255f3 100644 --- a/mm/pagewalk.c +++ b/mm/pagewalk.c @@ -3,6 +3,9 @@ #include #include #include +#include +#include +#include /* * We want to know the real level where a entry is located ignoring any @@ -20,14 +23,62 @@ static int real_depth(int depth) return depth; } +/* + * Relock pte lock and optionally rmap locks to prevent possible deadlock + * @pte: Locked pte + * @addr: Address of the pte + * @walk: Pagewalk structure + * @ptl: Pte spinlock + * @pmd: Pmd to wait for migration * + */ +static void walk_pte_relock(pte_t **pte, unsigned long addr, struct mm_walk *walk, + spinlock_t *ptl, pmd_t *pmd) +{ + if (walk->no_vma) + pte_unmap(*pte); + else + pte_unmap_unlock(*pte, ptl); + + if (walk->flags & WALK_LOCK_RMAP) + drop_rmap_locks(walk->vma); + + if (walk->flags & WALK_MIGRATION) + migration_entry_wait(walk->mm, pmd, addr); + + if (walk->flags & WALK_LOCK_RMAP) + take_rmap_locks(walk->vma); + + if (walk->no_vma) + *pte = pte_offset_map(pmd, addr); + else + *pte = pte_offset_map_lock(walk->mm, pmd, addr, &ptl); +} + static int walk_pte_range_inner(pte_t *pte, unsigned long addr, - unsigned long end, struct mm_walk *walk) + unsigned long end, struct mm_walk *walk, + spinlock_t *ptl, pmd_t *pmd) { const struct mm_walk_ops *ops = walk->ops; int err = 0; for (;;) { + walk->action = ACTION_SUBTREE; + if ((walk->flags & WALK_MIGRATION) && !pte_present(*pte)) { + swp_entry_t entry; + + if (!pte_none(*pte)) { + entry = pte_to_swp_entry(*pte); + if (is_migration_entry(entry)) { + walk_pte_relock(&pte, addr, walk, ptl, pmd); + continue; /* retry iteration */ + } + } + } err = ops->pte_entry(pte, addr, addr + PAGE_SIZE, walk); + if (walk->action == ACTION_AGAIN) { + walk_pte_relock(&pte, addr, walk, ptl, pmd); + continue; /* retry iteration */ + } if (err) break; if (addr >= end - PAGE_SIZE) @@ -45,16 +96,22 @@ static int walk_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end, int err = 0; spinlock_t *ptl; + if (walk->flags & WALK_LOCK_RMAP) + take_rmap_locks(walk->vma); + if (walk->no_vma) { pte = pte_offset_map(pmd, addr); - err = walk_pte_range_inner(pte, addr, end, walk); + err = walk_pte_range_inner(pte, addr, end, walk, ptl, pmd); pte_unmap(pte); } else { pte = pte_offset_map_lock(walk->mm, pmd, addr, &ptl); - err = walk_pte_range_inner(pte, addr, end, walk); + err = walk_pte_range_inner(pte, addr, end, walk, ptl, pmd); pte_unmap_unlock(pte, ptl); } + if (walk->flags & WALK_LOCK_RMAP) + drop_rmap_locks(walk->vma); + return err; } @@ -124,8 +181,11 @@ static int walk_pmd_range(pud_t *pud, unsigned long addr, unsigned long end, * This implies that each ->pmd_entry() handler * needs to know about pmd_trans_huge() pmds */ - if (ops->pmd_entry) + if (ops->pmd_entry) { + /* Migration waiting is not implemented for pmd entries */ + WARN_ON_ONCE(walk->flags & WALK_MIGRATION); err = ops->pmd_entry(pmd, addr, next, walk); + } if (err) break; @@ -507,13 +567,14 @@ int walk_page_range_novma(struct mm_struct *mm, unsigned long start, } int walk_page_vma(struct vm_area_struct *vma, const struct mm_walk_ops *ops, - void *private) + void *private, unsigned long flags) { struct mm_walk walk = { .ops = ops, .mm = vma->vm_mm, .vma = vma, .private = private, + .flags = flags }; int err; diff --git 
a/mm/rmap.c b/mm/rmap.c index fedb82371efe..d4d95ada0946 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -2200,6 +2200,22 @@ int make_device_exclusive_range(struct mm_struct *mm, unsigned long start, EXPORT_SYMBOL_GPL(make_device_exclusive_range); #endif +void take_rmap_locks(struct vm_area_struct *vma) +{ + if (vma->vm_file) + i_mmap_lock_write(vma->vm_file->f_mapping); + if (vma->anon_vma) + anon_vma_lock_write(vma->anon_vma); +} + +void drop_rmap_locks(struct vm_area_struct *vma) +{ + if (vma->anon_vma) + anon_vma_unlock_write(vma->anon_vma); + if (vma->vm_file) + i_mmap_unlock_write(vma->vm_file->f_mapping); +} + void __put_anon_vma(struct anon_vma *anon_vma) { struct anon_vma *root = anon_vma->root; From patchwork Mon May 16 12:54:03 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Jakub_Mat=C4=9Bna?= X-Patchwork-Id: 12850766 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6A167C433F5 for ; Mon, 16 May 2022 12:53:54 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 15C586B0075; Mon, 16 May 2022 08:53:49 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 0E2406B007E; Mon, 16 May 2022 08:53:49 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id D56646B0078; Mon, 16 May 2022 08:53:48 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id B23906B007B for ; Mon, 16 May 2022 08:53:48 -0400 (EDT) Received: from smtpin23.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id 884F761BB1 for ; Mon, 16 May 2022 12:53:48 +0000 (UTC) X-FDA: 79471598136.23.D747196 Received: from mail-wr1-f46.google.com (mail-wr1-f46.google.com [209.85.221.46]) by imf07.hostedemail.com (Postfix) with ESMTP id 5FF74400BE for ; Mon, 16 May 2022 12:53:40 +0000 (UTC) Received: by mail-wr1-f46.google.com with SMTP id a5so16595135wrp.7 for ; Mon, 16 May 2022 05:53:47 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=A0rWI2lLONIT7+b9pf3JkzKqaAFR6Sio8Kh0BbPP5t4=; b=cRNQBWUSaU8ZTb3mLWayUvSt2WgVFapHE6E+2n78CQWApmnv3YTT/1CpT4J0OFjVJG cxoaZ1tDhKtqOChZ7/K0lynsretSamTSt56kDmZm+L/iPkMCrJ/EYEG4mAd+2qPVZM1r 3TDktZv12TOO4Fk7YUIfYwtHTCYKew20BwJDx5nbPVZ2Wxwd8DvKuGflHem6KlM/hGCM EYLyNgYr4OWW6EIgl9g+nkgrTyms4tm8LnIUCtQQoe0Ep+3Eon/PhxktLFMBA79iBGue ed9l1U4fNmaYQG663niRCxbqgrZAnq8MmVhgFh/IVrmu+4zRK+Qm52lXqPYo+8ViWh0D 5nxA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=A0rWI2lLONIT7+b9pf3JkzKqaAFR6Sio8Kh0BbPP5t4=; b=xEf2Q3e6f8Yftfd1dglRDFPRd359ZxnddbZRGUPuOFXPZ7mzePdpuklzyCji9RmDtI 0LIF5btFrH/aLX8U1AFfGVlWIAvK+V0G5QjiI7aIy7gudk2i5mlBiOLs7jJZLfx5vfJX biRVveHJ5CEjegkkdqUMdMduDGboL4rfgN2Zqzd9HM6x0g9bTOrjnUxzPa8u+3OHnHhA hzIpGQEvMcCEVrAuyJupuhlkeDWggpYb0hZbDeU2whVMK8DzVmfqjFwCb0vQnyK91yL/ 1TvuUBdQyE46++EUIMZp2n1R40ewqhZqKgoBV9vbaK5IPIT+5rFfhaLBd82r5VgsQfC5 oNfA== X-Gm-Message-State: 
From patchwork Mon May 16 12:54:03 2022
X-Patchwork-Id: 12850766
From: Jakub Matěna <matenajakub@gmail.com>
To: linux-mm@kvack.org
Cc: patches@lists.linux.dev, linux-kernel@vger.kernel.org, vbabka@suse.cz,
 mhocko@kernel.org, mgorman@techsingularity.net, willy@infradead.org,
 liam.howlett@oracle.com, hughd@google.com, kirill@shutemov.name,
 riel@surriel.com, rostedt@goodmis.org, peterz@infradead.org,
 david@redhat.com, Jakub Matěna <matenajakub@gmail.com>
Subject: [RFC PATCH v3 4/6] [PATCH 4/6] mm: adjust page offset in mremap
Date: Mon, 16 May 2022 14:54:03 +0200
Message-Id: <20220516125405.1675-5-matenajakub@gmail.com>
In-Reply-To: <20220516125405.1675-1-matenajakub@gmail.com>
References: <20220516125405.1675-1-matenajakub@gmail.com>

Adjust the page offset of a VMA when it is moved to a new location by
mremap. This is possible for all VMAs that do not share their anonymous
pages with other processes, which is checked by going through the
anon_vma tree and looking for a parent-child relationship. Also, and
maybe redundantly, this is checked for the individual struct pages
belonging to the given VMA by looking at their mapcount, or at the swap
entry reference count if the page is swapped out. This is all done in
can_update_faulted_pgoff() and is_shared_pte().

If none of the pages is shared, we proceed with the page offset update.
This means updating the page offset in copy_vma(), which is used when
creating the VMA copy and possibly when deciding whether to merge with
a neighboring VMA. We also set update_pgoff to true so that the page
offsets of the individual pages are updated later; this is done in
move_page_tables() when moving the individual pte entries to the target
VMA. The page offset update forces the move to happen at the pte level
using move_ptes(): the page update must happen atomically with the
move, which is not possible when moving bigger entries like pmd or pud.
Swapped-out pages do not need to be updated, because their page offset
is reconstructed automatically from the VMA after the page is swapped
in.

As mentioned above, there is a small window between checking and
actually updating the page offset of the pages, as well as between
merging VMAs and again updating the pages.
This could potentially interfere with an rmap walk, but fortunately in
that case the rmap walk can use the still-existing old VMA, as it would
before the mremap started. Any other change to the VMA or its pages is
prevented by mmap_lock, which prevents forking and therefore also COW
and hence any raising of the mapcount. Because the pages are not shared
but belong to only one process, there is no other process which might
fork and in that way increase the mapcount of the pages in question. If
a page is shared, we cannot update its page offset, because that would
interfere with the page offset seen by the other processes using the
page; the page offset is basically immutable as long as the page is
used by more than one process. Previously, adjusting the page offset
was possible only for not-yet-faulted VMAs, even though a page offset
matching the virtual address of the anonymous VMA is necessary to
successfully merge with another VMA.

Signed-off-by: Jakub Matěna
Reported-by: kernel test robot
---
 fs/exec.c                |   2 +-
 include/linux/mm.h       |   4 +-
 include/linux/pagewalk.h |   2 +
 include/linux/rmap.h     |   2 +
 mm/mmap.c                | 113 +++++++++++++++++++++++++++++++------
 mm/mremap.c              | 117 +++++++++++++++++++++++++++++++++------
 mm/pagewalk.c            |   2 +-
 mm/rmap.c                |  41 ++++++++++++++
 8 files changed, 244 insertions(+), 39 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index e3e55d5e0be1..207f60fcb2b4 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -709,7 +709,7 @@ static int shift_arg_pages(struct vm_area_struct *vma, unsigned long shift)
 	 * process cleanup to remove whatever mess we made.
 	 */
 	if (length != move_page_tables(vma, old_start,
-				       vma, new_start, length, false))
+				       vma, new_start, length, false, false))
 		return -ENOMEM;
 
 	lru_add_drain();
diff --git a/include/linux/mm.h b/include/linux/mm.h
index e34edb775334..d8e482aef901 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1951,7 +1951,7 @@ int get_cmdline(struct task_struct *task, char *buffer, int buflen);
 extern unsigned long move_page_tables(struct vm_area_struct *vma,
 				      unsigned long old_addr, struct vm_area_struct *new_vma,
 				      unsigned long new_addr, unsigned long len,
-				      bool need_rmap_locks);
+				      bool need_rmap_locks, bool update_pgoff);
 
 /*
  * Flags used by change_protection().
For now we make it a bitmap so @@ -2637,7 +2637,7 @@ extern void __vma_link_rb(struct mm_struct *, struct vm_area_struct *, extern void unlink_file_vma(struct vm_area_struct *); extern struct vm_area_struct *copy_vma(struct vm_area_struct **, unsigned long addr, unsigned long len, pgoff_t pgoff, - bool *need_rmap_locks); + bool *need_rmap_locks, bool *update_pgoff); extern void exit_mmap(struct mm_struct *); static inline int check_data_rlimit(unsigned long rlim, diff --git a/include/linux/pagewalk.h b/include/linux/pagewalk.h index 07345df51324..11c99c8d343b 100644 --- a/include/linux/pagewalk.h +++ b/include/linux/pagewalk.h @@ -101,6 +101,8 @@ struct mm_walk { void *private; }; +int walk_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end, + struct mm_walk *walk); int walk_page_range(struct mm_struct *mm, unsigned long start, unsigned long end, const struct mm_walk_ops *ops, void *private); diff --git a/include/linux/rmap.h b/include/linux/rmap.h index d2d5e511dd93..9fee804f47ea 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -144,6 +144,8 @@ void unlink_anon_vmas(struct vm_area_struct *); int anon_vma_clone(struct vm_area_struct *, struct vm_area_struct *); int anon_vma_fork(struct vm_area_struct *, struct vm_area_struct *); +bool rbt_no_children(struct anon_vma *av); + static inline int anon_vma_prepare(struct vm_area_struct *vma) { if (likely(vma->anon_vma)) diff --git a/mm/mmap.c b/mm/mmap.c index 4a4611443593..3ca78baaee13 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -48,6 +48,8 @@ #include #include #include +#include +#include #include #include @@ -3189,28 +3191,100 @@ int insert_vm_struct(struct mm_struct *mm, struct vm_area_struct *vma) return 0; } +/* + * is_shared_pte() - Check if the given pte points to a page that is not shared between processes. + * @pte: pte to check + * @addr: Address where the page is mapped + * @end: Not used + * @walk: Pagewalk structure holding pointer to VMA where the page belongs + */ +static int is_shared_pte(pte_t *pte, unsigned long addr, + unsigned long end, struct mm_walk *walk) +{ + int err; + struct page *page; + struct vm_area_struct *old = walk->vma; + + if (is_swap_pte(*pte)) { + swp_entry_t entry = pte_to_swp_entry(*pte); + struct swap_info_struct *info = swp_swap_info(entry); + /* + * If the reference count is higher than one than the swap slot is used by + * more than one process or the swap cache is active, which means that the + * page is mapped by at least one process and swapped out by at least one + * process, so in both cases this means the page is shared. + * There can also exist continuation pages if the reference count is too + * high to fit in just one cell. This is specified by the flag COUNT_CONTINUED, + * which again triggers the below condition if set. + */ + return info->swap_map[swp_offset(entry)] > 1; + } + + if (!pte_present(*pte)) + return 0; + page = vm_normal_page(old, addr, *pte); + if (page == NULL) + return 0; + /* Check page is not shared with other processes */ + err = page_mapcount(page) + page_swapcount(page) > 1; + return err; +} + +/** + * can_update_faulted_pgoff() - Check if pgoff update is possible for faulted pages of a vma + * @vma: VMA which should be moved + * @addr: new virtual address + * If the vma and its pages are not shared with another process, updating + * the new pgoff and also updating index parameter (copy of the pgoff) in + * all faulted pages is possible. 
+ */ +static bool can_update_faulted_pgoff(struct vm_area_struct *vma, unsigned long addr) +{ + const struct mm_walk_ops can_update_pgoff_ops = { + .pte_entry = is_shared_pte + }; + + /* Check vma is not shared with other processes */ + if (vma->anon_vma->root != vma->anon_vma || !rbt_no_children(vma->anon_vma)) + return 1; + /* walk_page_vma() returns 0 on success */ + return !walk_page_vma(vma, &can_update_pgoff_ops, NULL, WALK_MIGRATION | WALK_LOCK_RMAP); +} + /* * Copy the vma structure to a new location in the same mm, * prior to moving page table entries, to effect an mremap move. */ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap, unsigned long addr, unsigned long len, pgoff_t pgoff, - bool *need_rmap_locks) + bool *need_rmap_locks, bool *update_pgoff) { struct vm_area_struct *vma = *vmap; unsigned long vma_start = vma->vm_start; struct mm_struct *mm = vma->vm_mm; struct vm_area_struct *new_vma, *prev; struct rb_node **rb_link, *rb_parent; - bool faulted_in_anon_vma = true; + bool anon_pgoff_updated = false; + *need_rmap_locks = false; + *update_pgoff = false; /* - * If anonymous vma has not yet been faulted, update new pgoff + * Try to update new pgoff for anonymous vma * to match new location, to increase its chance of merging. */ - if (unlikely(vma_is_anonymous(vma) && !vma->anon_vma)) { - pgoff = addr >> PAGE_SHIFT; - faulted_in_anon_vma = false; + if (unlikely(vma_is_anonymous(vma))) { + if (!vma->anon_vma) { + pgoff = addr >> PAGE_SHIFT; + anon_pgoff_updated = true; + } else { + anon_pgoff_updated = can_update_faulted_pgoff(vma, addr); + if (anon_pgoff_updated) { + /* Update pgoff of the copied VMA */ + pgoff = addr >> PAGE_SHIFT; + *update_pgoff = true; + *need_rmap_locks = true; + } + } } if (find_vma_links(mm, addr, addr + len, &prev, &rb_link, &rb_parent)) @@ -3227,19 +3301,25 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap, /* * The only way we can get a vma_merge with * self during an mremap is if the vma hasn't - * been faulted in yet and we were allowed to - * reset the dst vma->vm_pgoff to the - * destination address of the mremap to allow - * the merge to happen. mremap must change the - * vm_pgoff linearity between src and dst vmas - * (in turn preventing a vma_merge) to be - * safe. It is only safe to keep the vm_pgoff - * linear if there are no pages mapped yet. + * been faulted in yet or is not shared and + * we were allowed to reset the dst + * vma->vm_pgoff to the destination address of + * the mremap to allow the merge to happen. + * mremap must change the vm_pgoff linearity + * between src and dst vmas (in turn + * preventing a vma_merge) to be safe. It is + * only safe to keep the vm_pgoff linear if + * there are no pages mapped yet or the none + * of the pages are shared with another process. */ - VM_BUG_ON_VMA(faulted_in_anon_vma, new_vma); + VM_BUG_ON_VMA(!anon_pgoff_updated, new_vma); *vmap = vma = new_vma; } - *need_rmap_locks = (new_vma->vm_pgoff <= vma->vm_pgoff); + /* + * If new_vma is located before the old vma, rmap traversal order is altered + * and we need to apply rmap locks on vma later. 
+ */ + *need_rmap_locks |= (new_vma->vm_pgoff <= vma->vm_pgoff); } else { new_vma = vm_area_dup(vma); if (!new_vma) @@ -3256,7 +3336,6 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap, if (new_vma->vm_ops && new_vma->vm_ops->open) new_vma->vm_ops->open(new_vma); vma_link(mm, new_vma, prev, rb_link, rb_parent); - *need_rmap_locks = false; } return new_vma; diff --git a/mm/mremap.c b/mm/mremap.c index 309fab7ed706..2ef444abb08a 100644 --- a/mm/mremap.c +++ b/mm/mremap.c @@ -24,6 +24,7 @@ #include #include #include +#include #include #include @@ -117,10 +118,66 @@ static pte_t move_soft_dirty_pte(pte_t pte) return pte; } +/* + * update_pgoff_page() - Update page offset stored in page->index, if the page is not NULL. + * @addr: new address to calculate the page offset. + * @page: page to update + */ +static int update_pgoff_page(unsigned long addr, struct page *page) +{ + if (page != NULL) { + get_page(page); + if (!trylock_page(page)) { + put_page(page); + return -1; + } + page->index = addr >> PAGE_SHIFT; + unlock_page(page); + put_page(page); + } + return 0; +} + +/* + * update_pgoff_pte_inner() - Wait for migration and update page offset of + * a page represented by pte, if the pte points to mapped page. + */ +static int update_pgoff_pte_inner(pte_t *old_pte, unsigned long old_addr, + struct vm_area_struct *vma, spinlock_t *old_ptl, + pmd_t *old_pmd, unsigned long new_addr) +{ + struct page *page; + /* + * If pte is in migration state then wait for migration + * and return with -1 to trigger relocking mechanism in move_ptes(). + */ + if (!pte_present(*old_pte)) { + if (!pte_none(*old_pte)) { + swp_entry_t entry; + entry = pte_to_swp_entry(*old_pte); + if (is_migration_entry(entry)) { + migration_entry_wait(vma->vm_mm, old_pmd, old_addr); + return -1; + } + } + /* + * If there is no migration entry, but at the same + * time the page is not present then the page offset + * will be reconstructed automatically from the + * VMA after the page is moved back into RAM. + */ + return 0; + } + + page = vm_normal_page(vma, old_addr, *old_pte); + return update_pgoff_page(new_addr, page); +} + static void move_ptes(struct vm_area_struct *vma, pmd_t *old_pmd, unsigned long old_addr, unsigned long old_end, struct vm_area_struct *new_vma, pmd_t *new_pmd, - unsigned long new_addr, bool need_rmap_locks) + unsigned long new_addr, bool need_rmap_locks, + bool update_pgoff) { struct mm_struct *mm = vma->vm_mm; pte_t *old_pte, *new_pte, pte; @@ -146,6 +203,8 @@ static void move_ptes(struct vm_area_struct *vma, pmd_t *old_pmd, * serialize access to individual ptes, but only rmap traversal * order guarantees that we won't miss both the old and new ptes). */ + +retry: if (need_rmap_locks) take_rmap_locks(vma); @@ -166,6 +225,10 @@ static void move_ptes(struct vm_area_struct *vma, pmd_t *old_pmd, if (pte_none(*old_pte)) continue; + if (update_pgoff) + if (update_pgoff_pte_inner(old_pte, old_addr, vma, old_ptl, + old_pmd, new_addr)) + break; /* Causes unlock after for cycle and goto retry */ pte = ptep_get_and_clear(mm, old_addr, old_pte); /* * If we are remapping a valid PTE, make sure @@ -194,6 +257,8 @@ static void move_ptes(struct vm_area_struct *vma, pmd_t *old_pmd, pte_unmap_unlock(old_pte - 1, old_ptl); if (need_rmap_locks) drop_rmap_locks(vma); + if (old_addr < old_end) + goto retry; } #ifndef arch_supports_page_table_move @@ -422,11 +487,19 @@ static __always_inline unsigned long get_extent(enum pgt_entry entry, * pgt_entry. Returns true if the move was successful, else false. 
*/ static bool move_pgt_entry(enum pgt_entry entry, struct vm_area_struct *vma, - unsigned long old_addr, unsigned long new_addr, - void *old_entry, void *new_entry, bool need_rmap_locks) + struct vm_area_struct *new_vma, unsigned long old_addr, + unsigned long new_addr, void *old_entry, void *new_entry, + bool need_rmap_locks, bool update_pgoff) { bool moved = false; + /* + * In case of page offset update move must be done + * at the pte level using move_ptes() + */ + if (update_pgoff) + return false; + /* See comment in move_ptes() */ if (need_rmap_locks) take_rmap_locks(vma); @@ -465,7 +538,7 @@ static bool move_pgt_entry(enum pgt_entry entry, struct vm_area_struct *vma, unsigned long move_page_tables(struct vm_area_struct *vma, unsigned long old_addr, struct vm_area_struct *new_vma, unsigned long new_addr, unsigned long len, - bool need_rmap_locks) + bool need_rmap_locks, bool update_pgoff) { unsigned long extent, old_end; struct mmu_notifier_range range; @@ -492,7 +565,14 @@ unsigned long move_page_tables(struct vm_area_struct *vma, * If extent is PUD-sized try to speed up the move by moving at the * PUD level if possible. */ - extent = get_extent(NORMAL_PUD, old_addr, old_end, new_addr); + if (update_pgoff) + /* + * In case of pgoff update, extent is set to PMD + * and is done using move_ptes() + */ + extent = get_extent(NORMAL_PMD, old_addr, old_end, new_addr); + else + extent = get_extent(NORMAL_PUD, old_addr, old_end, new_addr); old_pud = get_old_pud(vma->vm_mm, old_addr); if (!old_pud) @@ -502,15 +582,15 @@ unsigned long move_page_tables(struct vm_area_struct *vma, break; if (pud_trans_huge(*old_pud) || pud_devmap(*old_pud)) { if (extent == HPAGE_PUD_SIZE) { - move_pgt_entry(HPAGE_PUD, vma, old_addr, new_addr, - old_pud, new_pud, need_rmap_locks); + move_pgt_entry(HPAGE_PUD, vma, new_vma, old_addr, new_addr, + old_pud, new_pud, need_rmap_locks, update_pgoff); /* We ignore and continue on error? */ continue; } } else if (IS_ENABLED(CONFIG_HAVE_MOVE_PUD) && extent == PUD_SIZE) { - if (move_pgt_entry(NORMAL_PUD, vma, old_addr, new_addr, - old_pud, new_pud, true)) + if (move_pgt_entry(NORMAL_PUD, vma, new_vma, old_addr, new_addr, + old_pud, new_pud, true, update_pgoff)) continue; } @@ -524,8 +604,8 @@ unsigned long move_page_tables(struct vm_area_struct *vma, if (is_swap_pmd(*old_pmd) || pmd_trans_huge(*old_pmd) || pmd_devmap(*old_pmd)) { if (extent == HPAGE_PMD_SIZE && - move_pgt_entry(HPAGE_PMD, vma, old_addr, new_addr, - old_pmd, new_pmd, need_rmap_locks)) + move_pgt_entry(HPAGE_PMD, vma, new_vma, old_addr, new_addr, + old_pmd, new_pmd, need_rmap_locks, update_pgoff)) continue; split_huge_pmd(vma, old_pmd, old_addr); if (pmd_trans_unstable(old_pmd)) @@ -536,15 +616,15 @@ unsigned long move_page_tables(struct vm_area_struct *vma, * If the extent is PMD-sized, try to speed the move by * moving at the PMD level if possible. 
*/ - if (move_pgt_entry(NORMAL_PMD, vma, old_addr, new_addr, - old_pmd, new_pmd, true)) + if (move_pgt_entry(NORMAL_PMD, vma, new_vma, old_addr, new_addr, + old_pmd, new_pmd, true, update_pgoff)) continue; } if (pte_alloc(new_vma->vm_mm, new_pmd)) break; move_ptes(vma, old_pmd, old_addr, old_addr + extent, new_vma, - new_pmd, new_addr, need_rmap_locks); + new_pmd, new_addr, need_rmap_locks, update_pgoff); } mmu_notifier_invalidate_range_end(&range); @@ -568,7 +648,8 @@ static unsigned long move_vma(struct vm_area_struct *vma, unsigned long hiwater_vm; int split = 0; int err = 0; - bool need_rmap_locks; + bool need_rmap_locks = false; + bool update_pgoff = false; /* * We'd prefer to avoid failure later on in do_munmap: @@ -608,7 +689,7 @@ static unsigned long move_vma(struct vm_area_struct *vma, new_pgoff = vma->vm_pgoff + ((old_addr - vma->vm_start) >> PAGE_SHIFT); new_vma = copy_vma(&vma, new_addr, new_len, new_pgoff, - &need_rmap_locks); + &need_rmap_locks, &update_pgoff); if (!new_vma) { if (vm_flags & VM_ACCOUNT) vm_unacct_memory(to_account >> PAGE_SHIFT); @@ -616,7 +697,7 @@ static unsigned long move_vma(struct vm_area_struct *vma, } moved_len = move_page_tables(vma, old_addr, new_vma, new_addr, old_len, - need_rmap_locks); + need_rmap_locks, update_pgoff); if (moved_len < old_len) { err = -ENOMEM; } else if (vma->vm_ops && vma->vm_ops->mremap) { @@ -630,7 +711,7 @@ static unsigned long move_vma(struct vm_area_struct *vma, * and then proceed to unmap new area instead of old. */ move_page_tables(new_vma, new_addr, vma, old_addr, moved_len, - true); + true, update_pgoff); vma = new_vma; old_len = new_len; old_addr = new_addr; diff --git a/mm/pagewalk.c b/mm/pagewalk.c index 0bfb8c9255f3..d603962ddd52 100644 --- a/mm/pagewalk.c +++ b/mm/pagewalk.c @@ -89,7 +89,7 @@ static int walk_pte_range_inner(pte_t *pte, unsigned long addr, return err; } -static int walk_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end, +int walk_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end, struct mm_walk *walk) { pte_t *pte; diff --git a/mm/rmap.c b/mm/rmap.c index d4d95ada0946..b1bddabd21c6 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -389,6 +389,47 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma) return -ENOMEM; } + +/* + * rbst_no_children() - Used by rbt_no_children to check node subtree. + * Check if none of the VMAs connected to the node subtree via + * anon_vma_chain are in child relationship to the given anon_vma. 
+ * @av: anon_vma to check
+ * @node: node to check in this level
+ */
+static bool rbst_no_children(struct anon_vma *av, struct rb_node *node)
+{
+	struct anon_vma_chain *model;
+	struct anon_vma_chain *avc;
+
+	if (node == NULL)	/* leaf node */
+		return true;
+	avc = container_of(node, typeof(*(model)), rb);
+	if (avc->vma->anon_vma != av)
+		/*
+		 * Inequality implies avc belongs
+		 * to a VMA of a child process
+		 */
+		return false;
+	return (rbst_no_children(av, node->rb_left) &&
+		rbst_no_children(av, node->rb_right));
+}
+
+/*
+ * rbt_no_children() - Check if none of the VMAs connected to the given
+ * anon_vma via anon_vma_chain are in child relationship
+ * @av: anon_vma to check if it has children
+ */
+bool rbt_no_children(struct anon_vma *av)
+{
+	struct rb_node *root_node;
+
+	if (av == NULL || av->degree <= 1)	/* Higher degree might not necessarily imply children */
+		return true;
+	root_node = av->rb_root.rb_root.rb_node;
+	return rbst_no_children(av, root_node);
+}
+
 void unlink_anon_vmas(struct vm_area_struct *vma)
 {
 	struct anon_vma_chain *avc, *next;
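Illustration (not part of the patch): a userspace scenario this change
aims at. A faulted, non-shared anonymous mapping is moved with mremap()
directly behind another anonymous mapping; with a stale page offset the
subsequent vma_merge() cannot succeed, while with this series the
offsets (and the pages' index fields) are rewritten. Whether the two
finally become one VMA also depends on the anon_vma handling in the
next patch and can be checked in /proc/self/maps.

#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	long pg = sysconf(_SC_PAGESIZE);
	/* Reserve two pages so the layout is under our control. */
	char *area = mmap(NULL, 2 * pg, PROT_NONE,
			  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	char *b = mmap(NULL, pg, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	mprotect(area, pg, PROT_READ | PROT_WRITE);
	memset(area, 1, pg);	/* fault the pages in, so both VMAs   */
	memset(b, 1, pg);	/* have an anon_vma and a page offset */

	/* Move b directly behind the first page of the reserved area. */
	mremap(b, pg, pg, MREMAP_MAYMOVE | MREMAP_FIXED, area + pg);
	pause();		/* inspect /proc/self/maps here */
	return 0;
}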
From patchwork Mon May 16 12:54:04 2022
X-Patchwork-Id: 12850768
From: Jakub Matěna <matenajakub@gmail.com>
To: linux-mm@kvack.org
Cc: patches@lists.linux.dev, linux-kernel@vger.kernel.org, vbabka@suse.cz,
 mhocko@kernel.org, mgorman@techsingularity.net, willy@infradead.org,
 liam.howlett@oracle.com, hughd@google.com, kirill@shutemov.name,
 riel@surriel.com, rostedt@goodmis.org, peterz@infradead.org,
 david@redhat.com, Jakub Matěna <matenajakub@gmail.com>
Subject: [RFC PATCH v3 5/6] [PATCH 5/6] mm: enable merging of VMAs with different anon_vmas
Date: Mon, 16 May 2022 14:54:04 +0200
Message-Id: <20220516125405.1675-6-matenajakub@gmail.com>
In-Reply-To: <20220516125405.1675-1-matenajakub@gmail.com>
References: <20220516125405.1675-1-matenajakub@gmail.com>

Enable merging of a VMA even when it is linked to a different anon_vma
than the VMA it is being merged to, but only if the VMA in question
does not share any pages with a parent or child process. This enables
merges that would otherwise not be possible and therefore decreases the
number of VMAs of a process.

In this patch the VMA is checked only at the anon_vma level to find out
whether it shares any pages with a parent or child process. This check
is performed in is_mergeable_anon_vma(), which is part of vma_merge().
Here it is not easily possible to check the mapcount of individual
pages (as opposed to the previous commit "mm: adjust page offset in
mremap"), because vma_merge() does not have a pointer to the VMA or any
other means of easily accessing the page structures.

The following two paragraphs use cases 1 through 8 as described in the
comment before vma_merge(). The update itself is done during
__vma_adjust() for cases 4 through 8 and partially for case 1.
The remaining cases must be handled elsewhere, because __vma_adjust() can only work with pages that already reside at the location of the merge, in other words only if a VMA already exists where the merge is happening. This is not true for cases 2, 3 and partially case 1, where the next VMA is already present but the middle part is not. Cases 1, 2 and 3 either expand or move a VMA to the location of the merge, but at the time of the merge the mapping is not there yet, so the page update has to be done later. An easy way out is when the pages do not exist yet and there is nothing to update; this happens e.g. when expanding a mapping in mmap_region() or in do_brk_flags(). On the other hand, the pages do exist and have to be updated during an mremap call that moves an already existing and possibly faulted mapping. In that case the page update is done in move_page_tables(). This is actually quite simple, as the previous commit "mm: adjust page offset in mremap" already introduced the page update, so the only change is passing one more parameter. If an rmap walk happens between __vma_adjust() and the page update done in move_page_tables(), the old VMA and old anon_vma are used, just as they would have been before the merge started.

Cases 4 through 8 correspond to merges which are the result of an mprotect call or any other flag update that does not move or resize the mapping. Together with part of case 1, the update of the physical pages is then handled directly in __vma_adjust() as mentioned before. First it is determined which address range should be updated, which depends on the specific case 1, 4, 5, 6, 7 or 8; the range is stored in the variables pu_start and pu_end. Secondly, the value to store in page->mapping must be determined; it is always the anon_vma belonging to the expand parameter of the __vma_adjust() call. The reason the pages are updated at all is that __vma_adjust() updates vm_start and vm_end of the involved VMAs, so afterwards physical pages can belong to a different VMA and anon_vma than before.

The problem is that these two updates (VMAs and pages) should happen atomically from the rmap walk point of view. This would normally be solved by holding the rmap locks, but page migration uses an rmap walk both at its start and at its end, and holding the rmap locks could trap it in the middle. The PTEs would then not point to actual pages but remain migration entries, which would block the page update. The solution is to drop the rmap lock so that page migration can finish, walk all the relevant pages, wait for any migration to end and update page->mapping (i.e. the anon_vma), and then take the rmap lock again. This whole page update must be done after the expand VMA is already enlarged but while the source VMA still has its original range. That way, if an rmap walk happens while the pages are being updated, the rmap lock will work either with the old or the new anon_vma and therefore also with the old or the new VMA. If a page is swapped out, or is the zero page or a KSM page, it is not changed and the correct mapping will be reconstructed from the VMA itself when the page returns to its normal state.

Again, as explained in the previous commit "mm: adjust page offset in mremap", the mapcount of the pages should not change between the vma_merge() checks and the actual merge in __vma_adjust(), because a potential fork is prevented by mmap_lock.
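As a rough sketch of the per-page step described above: once the expand VMA has been enlarged, every mapped page in the pu_start..pu_end range has its page->mapping switched to the anon_vma of expand. The hypothetical helper below shows only that final step; the real code is reconnect_pages_range() and reconnect_page() in the mm/rmap.c hunk of this patch, which additionally take the page lock, retry pages under migration and are driven by a page walk. The sketch also uses the existing page_anon_vma() helper instead of the open-coded page->mapping arithmetic in the patch.

/*
 * Illustration only: re-point one page at the anon_vma of the
 * expanded VMA. Pages without an anon_vma of their own (e.g.
 * ZERO_PAGE or KSM pages) are left alone.
 */
static void switch_page_anon_vma(struct page *page, struct vm_area_struct *expand)
{
	struct anon_vma *page_av = page_anon_vma(page);

	if (!page_av || page_av == expand->anon_vma)
		return;

	/* Update the anon_vma stored in page->mapping */
	page_move_anon_rmap(page, expand);
}

Because the source VMA still covers its original range while this runs, a concurrent rmap walk finds the page either through the old anon_vma (not yet reconnected) or through the new one (already reconnected), so it always sees a consistent pairing of VMA and anon_vma.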
If one of the VMAs is not yet faulted and therefore does not have anon_vma assigned then this patch is not needed and merge happens even without it. Signed-off-by: Jakub Matěna Reported-by: kernel test robot --- include/linux/pagewalk.h | 2 + include/linux/rmap.h | 15 ++++++- mm/mmap.c | 60 ++++++++++++++++++++++++--- mm/mremap.c | 22 +++++++--- mm/pagewalk.c | 2 +- mm/rmap.c | 87 ++++++++++++++++++++++++++++++++++++++++ 6 files changed, 176 insertions(+), 12 deletions(-) diff --git a/include/linux/pagewalk.h b/include/linux/pagewalk.h index 11c99c8d343b..9685d1a26f17 100644 --- a/include/linux/pagewalk.h +++ b/include/linux/pagewalk.h @@ -106,6 +106,8 @@ int walk_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end, int walk_page_range(struct mm_struct *mm, unsigned long start, unsigned long end, const struct mm_walk_ops *ops, void *private); +int __walk_page_range(unsigned long start, unsigned long end, + struct mm_walk *walk); int walk_page_range_novma(struct mm_struct *mm, unsigned long start, unsigned long end, const struct mm_walk_ops *ops, pgd_t *pgd, diff --git a/include/linux/rmap.h b/include/linux/rmap.h index 9fee804f47ea..c1ba908f92e6 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -138,6 +138,8 @@ static inline void anon_vma_unlock_read(struct anon_vma *anon_vma) */ void anon_vma_init(void); /* create anon_vma_cachep */ int __anon_vma_prepare(struct vm_area_struct *); +void reconnect_pages_range(struct mm_struct *mm, unsigned long start, unsigned long end, + struct vm_area_struct *target, struct vm_area_struct *source); void take_rmap_locks(struct vm_area_struct *vma); void drop_rmap_locks(struct vm_area_struct *vma); void unlink_anon_vmas(struct vm_area_struct *); @@ -154,10 +156,21 @@ static inline int anon_vma_prepare(struct vm_area_struct *vma) return __anon_vma_prepare(vma); } +/** + * anon_vma_merge() - Merge anon_vmas of the given VMAs + * @vma: VMA being merged to + * @next: VMA being merged + */ static inline void anon_vma_merge(struct vm_area_struct *vma, struct vm_area_struct *next) { - VM_BUG_ON_VMA(vma->anon_vma != next->anon_vma, vma); + struct anon_vma *anon_vma1 = vma->anon_vma; + struct anon_vma *anon_vma2 = next->anon_vma; + + VM_BUG_ON_VMA(anon_vma1 && anon_vma2 && anon_vma1 != anon_vma2 && + ((anon_vma2 != anon_vma2->root) + || !rbt_no_children(anon_vma2)), vma); + unlink_anon_vmas(next); } diff --git a/mm/mmap.c b/mm/mmap.c index 3ca78baaee13..e7760e378a68 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -753,6 +753,8 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start, bool start_changed = false, end_changed = false; long adjust_next = 0; int remove_next = 0; + unsigned long pu_start = 0; + unsigned long pu_end = 0; if (next && !insert) { struct vm_area_struct *exporter = NULL, *importer = NULL; @@ -778,6 +780,8 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start, remove_next = 3; VM_WARN_ON(file != next->vm_file); swap(vma, next); + pu_start = start; + pu_end = vma->vm_end; } else { VM_WARN_ON(expand != vma); /* @@ -789,6 +793,10 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start, end != next->vm_next->vm_end); /* trim end to next, for case 6 first pass */ end = next->vm_end; + VM_WARN_ON(vma == NULL); + + pu_start = next->vm_start; + pu_end = next->vm_end; } exporter = next; @@ -810,6 +818,8 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start, exporter = next; importer = vma; VM_WARN_ON(expand != importer); + pu_start = vma->vm_end; + pu_end = end; } else if (end < 
vma->vm_end) { /* * vma shrinks, and !insert tells it's not @@ -820,6 +830,8 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start, exporter = vma; importer = next; VM_WARN_ON(expand != importer); + pu_start = end; + pu_end = vma->vm_end; } /* @@ -863,8 +875,6 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start, if (!anon_vma && adjust_next) anon_vma = next->anon_vma; if (anon_vma) { - VM_WARN_ON(adjust_next && next->anon_vma && - anon_vma != next->anon_vma); anon_vma_lock_write(anon_vma); anon_vma_interval_tree_pre_update_vma(vma); if (adjust_next) @@ -887,6 +897,31 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start, end_changed = true; } vma->vm_pgoff = pgoff; + + /* Update the anon_vma stored in pages in the range specified by pu_start and pu_end */ + if (anon_vma && next && anon_vma != next->anon_vma && pu_start != pu_end) { + struct vm_area_struct *source; + + anon_vma_unlock_write(anon_vma); + VM_WARN_ON(expand == vma && next->anon_vma && + (next->anon_vma != next->anon_vma->root + || !rbt_no_children(next->anon_vma))); + VM_WARN_ON(expand == next && + (anon_vma != anon_vma->root || !rbt_no_children(anon_vma))); + VM_WARN_ON(expand != vma && expand != next); + VM_WARN_ON(expand == NULL); + if (expand == vma) + source = next; + else + source = vma; + /* + * Page walk over affected address range. + * Wait for migration and update page->mapping. + */ + reconnect_pages_range(mm, pu_start, pu_end, expand, source); + anon_vma_lock_write(anon_vma); + } + if (adjust_next) { next->vm_start += adjust_next; next->vm_pgoff += adjust_next >> PAGE_SHIFT; @@ -991,6 +1026,8 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start, if (remove_next == 2) { remove_next = 1; end = next->vm_end; + pu_start = next->vm_start; + pu_end = next->vm_end; goto again; } else if (next) @@ -1067,7 +1104,20 @@ static inline int is_mergeable_anon_vma(struct anon_vma *anon_vma1, if ((!anon_vma1 || !anon_vma2) && (!vma || list_is_singular(&vma->anon_vma_chain))) return 1; - return anon_vma1 == anon_vma2; + if (anon_vma1 == anon_vma2) + return 1; + /* + * Different anon_vma but not shared by several processes + */ + else if ((anon_vma1 && anon_vma2) && + (anon_vma1 == anon_vma1->root) + && (rbt_no_children(anon_vma1))) + return 1; + /* + * Different anon_vma and shared -> unmergeable + */ + else + return 0; } /* @@ -1213,8 +1263,8 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm, } /* Can we merge both the predecessor and the successor? */ if (merge_prev && merge_next && - is_mergeable_anon_vma(prev->anon_vma, - next->anon_vma, NULL)) { /* cases 1, 6 */ + is_mergeable_anon_vma(next->anon_vma, + prev->anon_vma, NULL)) { /* cases 1, 6 */ err = __vma_adjust(prev, prev->vm_start, next->vm_end, prev->vm_pgoff, NULL, prev); diff --git a/mm/mremap.c b/mm/mremap.c index 2ef444abb08a..3b2428288b0e 100644 --- a/mm/mremap.c +++ b/mm/mremap.c @@ -119,12 +119,16 @@ static pte_t move_soft_dirty_pte(pte_t pte) } /* - * update_pgoff_page() - Update page offset stored in page->index, if the page is not NULL. + * update_pgoff_page() - Update page offset stored in page->index + * and anon_vma in page->mapping, if the page is not NULL. * @addr: new address to calculate the page offset. 
* @page: page to update + * @vma: vma to get anon_vma */ -static int update_pgoff_page(unsigned long addr, struct page *page) +static int update_pgoff_page(unsigned long addr, struct page *page, struct vm_area_struct *vma) { + struct anon_vma *page_anon_vma; + unsigned long anon_mapping; if (page != NULL) { get_page(page); if (!trylock_page(page)) { @@ -132,6 +136,13 @@ static int update_pgoff_page(unsigned long addr, struct page *page) return -1; } page->index = addr >> PAGE_SHIFT; + + anon_mapping = (unsigned long)READ_ONCE(page->mapping); + page_anon_vma = (struct anon_vma *) (anon_mapping - PAGE_MAPPING_ANON); + if (page_anon_vma != vma->anon_vma + && page_anon_vma != NULL) { /* NULL in case of ZERO_PAGE or KSM page */ + page_move_anon_rmap(page, vma); /* Update physical page's mapping */ + } unlock_page(page); put_page(page); } @@ -144,7 +155,8 @@ static int update_pgoff_page(unsigned long addr, struct page *page) */ static int update_pgoff_pte_inner(pte_t *old_pte, unsigned long old_addr, struct vm_area_struct *vma, spinlock_t *old_ptl, - pmd_t *old_pmd, unsigned long new_addr) + pmd_t *old_pmd, unsigned long new_addr, + struct vm_area_struct *new_vma) { struct page *page; /* @@ -170,7 +182,7 @@ static int update_pgoff_pte_inner(pte_t *old_pte, unsigned long old_addr, } page = vm_normal_page(vma, old_addr, *old_pte); - return update_pgoff_page(new_addr, page); + return update_pgoff_page(new_addr, page, new_vma); } static void move_ptes(struct vm_area_struct *vma, pmd_t *old_pmd, @@ -227,7 +239,7 @@ static void move_ptes(struct vm_area_struct *vma, pmd_t *old_pmd, if (update_pgoff) if (update_pgoff_pte_inner(old_pte, old_addr, vma, old_ptl, - old_pmd, new_addr)) + old_pmd, new_addr, new_vma)) break; /* Causes unlock after for cycle and goto retry */ pte = ptep_get_and_clear(mm, old_addr, old_pte); /* diff --git a/mm/pagewalk.c b/mm/pagewalk.c index d603962ddd52..4076a5ecdec0 100644 --- a/mm/pagewalk.c +++ b/mm/pagewalk.c @@ -419,7 +419,7 @@ static int walk_page_test(unsigned long start, unsigned long end, return 0; } -static int __walk_page_range(unsigned long start, unsigned long end, +int __walk_page_range(unsigned long start, unsigned long end, struct mm_walk *walk) { int err = 0; diff --git a/mm/rmap.c b/mm/rmap.c index b1bddabd21c6..7caa6ec6110a 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -73,6 +73,7 @@ #include #include #include +#include #include @@ -389,6 +390,92 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma) return -ENOMEM; } +/* + * reconnect_page() - If the page is not NULL and has a non-NULL anon_vma, + * reconnect the page to a anon_vma of the given new VMA. 
+ * @page: Page to reconnect to different anon_vma + * @old: Old VMA the page is connected to + * @new: New VMA the page will be reconnected to + */ +static int reconnect_page(struct page *page, struct vm_area_struct *old, + struct vm_area_struct *new) +{ + struct anon_vma *page_anon_vma; + unsigned long anon_mapping; + /* Do some checks and lock the page */ + if (page == NULL) + return 0; /* Virtual memory page is not mapped */ + get_page(page); + if (!trylock_page(page)) { + put_page(page); + return -1; + } + anon_mapping = (unsigned long)READ_ONCE(page->mapping); + page_anon_vma = (struct anon_vma *) (anon_mapping - PAGE_MAPPING_ANON); + if (page_anon_vma != NULL) { /* NULL in case of ZERO_PAGE or KSM page */ + VM_WARN_ON(page_anon_vma != old->anon_vma); + VM_WARN_ON(old->anon_vma == new->anon_vma); + /* Update physical page's mapping */ + page_move_anon_rmap(page, new); + } + unlock_page(page); + put_page(page); + return 0; +} + +/* + * reconnect_page_pte() - Reconnect page mapped by pte from old anon_vma + * to new anon_vma. + * @pte: pte to work with + * @addr: Address where the page should be mapped. + * @end: Not used + * @walk: Pagewalk structure holding pointer to old and new VMAs. + */ +static int reconnect_page_pte(pte_t *pte, unsigned long addr, + unsigned long end, struct mm_walk *walk) +{ + struct vm_area_struct *old = walk->vma; + struct page *page; + + /* + * Page's anon_vma will be reconstructed automatically from the + * VMA after the data will be moved back into RAM + */ + if (!pte_present(*pte)) + return 0; + + page = vm_normal_page(old, addr, *pte); + + if (reconnect_page(page, old, walk->private) == -1) + walk->action = ACTION_AGAIN; + return 0; +} + +/* + * reconnect_pages_range() - Reconnect physical pages to anon_vma of target VMA + * @mm: Memory descriptor + * @start: range start + * @end: range end + * @target: VMA to newly contain all physical pages + * @source: VMA which contains the all physical page before reconnecting them + */ +void reconnect_pages_range(struct mm_struct *mm, unsigned long start, unsigned long end, + struct vm_area_struct *target, struct vm_area_struct *source) +{ + const struct mm_walk_ops reconnect_pages_ops = { + .pte_entry = reconnect_page_pte + }; + + struct mm_walk walk = { + .ops = &reconnect_pages_ops, + .mm = mm, + .private = target, + .flags = WALK_MIGRATION & WALK_LOCK_RMAP, + .vma = source + }; + /* Modify page->mapping for all pages in range */ + __walk_page_range(start, end, &walk); +} /* * rbst_no_children() - Used by rbt_no_children to check node subtree. 
From patchwork Mon May 16 12:54:05 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Jakub_Mat=C4=9Bna?= X-Patchwork-Id: 12850767
From: =?utf-8?q?Jakub_Mat=C4=9Bna?= To: linux-mm@kvack.org Cc: patches@lists.linux.dev, linux-kernel@vger.kernel.org, vbabka@suse.cz, mhocko@kernel.org, mgorman@techsingularity.net, willy@infradead.org, liam.howlett@oracle.com, hughd@google.com, kirill@shutemov.name, riel@surriel.com, rostedt@goodmis.org, peterz@infradead.org, david@redhat.com, =?utf-8?q?Jakub_Mat=C4=9Bna?= Subject: [RFC PATCH v3 6/6] [PATCH 6/6] mm: add tracing for VMA merges Date: Mon, 16 May 2022 14:54:05 +0200 Message-Id: <20220516125405.1675-7-matenajakub@gmail.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220516125405.1675-1-matenajakub@gmail.com> References: <20220516125405.1675-1-matenajakub@gmail.com> MIME-Version: 1.0

Add trace support for vma_merge() to measure successful and unsuccessful merges of two VMAs with distinct anon_vmas, and also for merges enabled by the page offset update introduced by an earlier patch in this series.

Signed-off-by: Jakub Matěna --- include/trace/events/mmap.h | 83 +++++++++++++++++++++++++++++++++++++ mm/internal.h | 12 ++++++ mm/mmap.c | 69 ++++++++++++++++-------------- 3 files changed, 133 insertions(+), 31 deletions(-) diff --git a/include/trace/events/mmap.h b/include/trace/events/mmap.h index 4661f7ba07c0..bad7abe4899c 100644 --- a/include/trace/events/mmap.h +++ b/include/trace/events/mmap.h @@ -6,6 +6,27 @@ #define _TRACE_MMAP_H #include +#include <../mm/internal.h> + +#define AV_MERGE_TYPES \ + EM(MERGE_FAILED) \ + EM(AV_MERGE_FAILED) \ + EM(AV_MERGE_NULL) \ + EM(AV_MERGE_SAME) \ + EMe(AV_MERGE_DIFFERENT) + +#undef EM +#undef EMe +#define EM(a) TRACE_DEFINE_ENUM(a); +#define EMe(a) TRACE_DEFINE_ENUM(a); + +AV_MERGE_TYPES + +#undef EM +#undef EMe + +#define EM(a) { a, #a }, +#define EMe(a) { a, #a } TRACE_EVENT(vm_unmapped_area, @@ -42,6 +63,68 @@ TRACE_EVENT(vm_unmapped_area, __entry->low_limit, __entry->high_limit, __entry->align_mask, __entry->align_offset) ); + +TRACE_EVENT(vm_av_merge, + + TP_PROTO(int merged, enum vma_merge_res merge_prev, + enum vma_merge_res merge_next, enum vma_merge_res merge_both), + + TP_ARGS(merged, merge_prev, merge_next, merge_both), + + TP_STRUCT__entry( + __field(int, merged) + __field(enum vma_merge_res, predecessor_different_av) + __field(enum vma_merge_res, successor_different_av) + __field(enum vma_merge_res, predecessor_with_successor_different_av) + __field(int, same_count) + __field(int, diff_count) + __field(int, failed_count) + ), + + TP_fast_assign( + __entry->merged = merged == 0; + __entry->predecessor_different_av = merge_prev; + __entry->successor_different_av = merge_next; + __entry->predecessor_with_successor_different_av = merge_both; + __entry->same_count = (merge_prev == AV_MERGE_SAME) + + (merge_next == AV_MERGE_SAME) + + (merge_both == AV_MERGE_SAME); + __entry->diff_count = (merge_prev == AV_MERGE_DIFFERENT) + + (merge_next == AV_MERGE_DIFFERENT) + + (merge_both == AV_MERGE_DIFFERENT); +
__entry->failed_count = (merge_prev == AV_MERGE_FAILED) + + (merge_next == AV_MERGE_FAILED) + + (merge_both == AV_MERGE_FAILED); + ), + + TP_printk("merged=%d predecessor=%s successor=%s predecessor_with_successor=%s same_count=%d diff_count=%d failed_count=%d", + __entry->merged, + __print_symbolic(__entry->predecessor_different_av, AV_MERGE_TYPES), + __print_symbolic(__entry->successor_different_av, AV_MERGE_TYPES), + __print_symbolic(__entry->predecessor_with_successor_different_av, AV_MERGE_TYPES), + __entry->same_count, __entry->diff_count, __entry->failed_count) + +); + +TRACE_EVENT(vm_pgoff_merge, + + TP_PROTO(struct vm_area_struct *vma, bool anon_pgoff_updated), + + TP_ARGS(vma, anon_pgoff_updated), + + TP_STRUCT__entry( + __field(bool, faulted) + __field(bool, updated) + ), + + TP_fast_assign( + __entry->faulted = vma->anon_vma; + __entry->updated = anon_pgoff_updated; + ), + + TP_printk("faulted=%d updated=%d\n", + __entry->faulted, __entry->updated) +); #endif /* This part must be outside protection */ diff --git a/mm/internal.h b/mm/internal.h index cf16280ce132..9284e779f53d 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -35,6 +35,18 @@ struct folio_batch; /* Do not use these with a slab allocator */ #define GFP_SLAB_BUG_MASK (__GFP_DMA32|__GFP_HIGHMEM|~__GFP_BITS_MASK) +/* + * Following values indicate reason for merge success or failure. + */ +enum vma_merge_res { + MERGE_FAILED, + AV_MERGE_FAILED, + AV_MERGE_NULL, + MERGE_OK = AV_MERGE_NULL, + AV_MERGE_SAME, + AV_MERGE_DIFFERENT, +}; + void page_writeback_init(void); static inline void *folio_raw_mapping(struct folio *folio) diff --git a/mm/mmap.c b/mm/mmap.c index e7760e378a68..3cecc2efe763 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1103,21 +1103,21 @@ static inline int is_mergeable_anon_vma(struct anon_vma *anon_vma1, */ if ((!anon_vma1 || !anon_vma2) && (!vma || list_is_singular(&vma->anon_vma_chain))) - return 1; + return AV_MERGE_NULL; if (anon_vma1 == anon_vma2) - return 1; + return AV_MERGE_SAME; /* * Different anon_vma but not shared by several processes */ else if ((anon_vma1 && anon_vma2) && (anon_vma1 == anon_vma1->root) && (rbt_no_children(anon_vma1))) - return 1; + return AV_MERGE_DIFFERENT; /* * Different anon_vma and shared -> unmergeable */ else - return 0; + return AV_MERGE_FAILED; } /* @@ -1138,12 +1138,10 @@ can_vma_merge_before(struct vm_area_struct *vma, unsigned long vm_flags, struct vm_userfaultfd_ctx vm_userfaultfd_ctx, struct anon_vma_name *anon_name) { - if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx, anon_name) && - is_mergeable_anon_vma(anon_vma, vma->anon_vma, vma)) { + if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx, anon_name)) if (vma->vm_pgoff == vm_pgoff) - return 1; - } - return 0; + return is_mergeable_anon_vma(anon_vma, vma->anon_vma, vma); + return MERGE_FAILED; } /* @@ -1160,14 +1158,13 @@ can_vma_merge_after(struct vm_area_struct *vma, unsigned long vm_flags, struct vm_userfaultfd_ctx vm_userfaultfd_ctx, struct anon_vma_name *anon_name) { - if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx, anon_name) && - is_mergeable_anon_vma(anon_vma, vma->anon_vma, vma)) { + if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx, anon_name)) { pgoff_t vm_pglen; vm_pglen = vma_pages(vma); if (vma->vm_pgoff + vm_pglen == vm_pgoff) - return 1; + return is_mergeable_anon_vma(anon_vma, vma->anon_vma, vma); } - return 0; + return MERGE_FAILED; } /* @@ -1224,8 +1221,14 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm, pgoff_t pglen = (end - addr) >> 
PAGE_SHIFT; struct vm_area_struct *area, *next; int err = -1; - bool merge_prev = false; - bool merge_next = false; + /* + * Following three variables are used to store values + * indicating wheather this VMA and its anon_vma can + * be merged and also the type of failure or success. + */ + enum vma_merge_res merge_prev = MERGE_FAILED; + enum vma_merge_res merge_both = MERGE_FAILED; + enum vma_merge_res merge_next = MERGE_FAILED; /* * We later require that vma->vm_flags == vm_flags, @@ -1246,32 +1249,34 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm, /* Can we merge the predecessor? */ if (prev && prev->vm_end == addr && - mpol_equal(vma_policy(prev), policy) && - can_vma_merge_after(prev, vm_flags, + mpol_equal(vma_policy(prev), policy)) { + merge_prev = can_vma_merge_after(prev, vm_flags, anon_vma, file, pgoff, - vm_userfaultfd_ctx, anon_name)) { - merge_prev = true; - area = prev; + vm_userfaultfd_ctx, anon_name); } + /* Can we merge the successor? */ if (next && end == next->vm_start && - mpol_equal(policy, vma_policy(next)) && - can_vma_merge_before(next, vm_flags, - anon_vma, file, pgoff+pglen, - vm_userfaultfd_ctx, anon_name)) { - merge_next = true; + mpol_equal(policy, vma_policy(next))) { + merge_next = can_vma_merge_before(next, vm_flags, + anon_vma, file, pgoff+pglen, + vm_userfaultfd_ctx, anon_name); } + /* Can we merge both the predecessor and the successor? */ - if (merge_prev && merge_next && - is_mergeable_anon_vma(next->anon_vma, - prev->anon_vma, NULL)) { /* cases 1, 6 */ + if (merge_prev >= MERGE_OK && merge_next >= MERGE_OK) + merge_both = is_mergeable_anon_vma(next->anon_vma, prev->anon_vma, NULL); + + if (merge_both >= MERGE_OK) { /* cases 1, 6 */ err = __vma_adjust(prev, prev->vm_start, next->vm_end, prev->vm_pgoff, NULL, prev); - } else if (merge_prev) { /* cases 2, 5, 7 */ + area = prev; + } else if (merge_prev >= MERGE_OK) { /* cases 2, 5, 7 */ err = __vma_adjust(prev, prev->vm_start, end, prev->vm_pgoff, NULL, prev); - } else if (merge_next) { + area = prev; + } else if (merge_next >= MERGE_OK) { if (prev && addr < prev->vm_end) /* case 4 */ err = __vma_adjust(prev, prev->vm_start, addr, prev->vm_pgoff, NULL, next); @@ -1285,7 +1290,7 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm, */ area = next; } - + trace_vm_av_merge(err, merge_prev, merge_next, merge_both); /* * Cannot merge with predecessor or successor or error in __vma_adjust? */ @@ -3346,6 +3351,8 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap, /* * Source vma may have been merged into new_vma */ + trace_vm_pgoff_merge(vma, anon_pgoff_updated); + if (unlikely(vma_start >= new_vma->vm_start && vma_start < new_vma->vm_end)) { /*