From patchwork Wed Jan 25 01:57:37 2023
X-Patchwork-Submitter: Zach O'Keefe
X-Patchwork-Id: 13114947
Date: Tue, 24 Jan 2023 17:57:37 -0800
Message-ID: <20230125015738.912924-1-zokeefe@google.com>
Subject: [PATCH 1/2] mm/MADV_COLLAPSE: set EAGAIN on unexpected page refcount
From: "Zach O'Keefe"
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Andrew Morton, Hugh Dickins, Yang Shi,
 "Zach O'Keefe"

During collapse, in a few places we check to see if a given small page
has any unaccounted references.
If the refcount on the page doesn't match our expectations, there must
be an unknown user concurrently interested in the page, and so it's not
safe to move the contents elsewhere. However, the unaccounted pins are
likely an ephemeral state.

In such a situation, make MADV_COLLAPSE set errno to EAGAIN, indicating
that collapse may succeed on retry.

Fixes: 7d8faaf15545 ("mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse")
Reported-by: Hugh Dickins
Signed-off-by: Zach O'Keefe
Reviewed-by: Yang Shi
Acked-by: Hugh Dickins
---
 mm/khugepaged.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index e23619bfecc4..fa38cae240b9 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2712,6 +2712,7 @@ static int madvise_collapse_errno(enum scan_result r)
 	case SCAN_CGROUP_CHARGE_FAIL:
 		return -EBUSY;
 	/* Resource temporary unavailable - trying again might succeed */
+	case SCAN_PAGE_COUNT:
 	case SCAN_PAGE_LOCK:
 	case SCAN_PAGE_LRU:
 	case SCAN_DEL_PAGE_LRU:

From patchwork Wed Jan 25 01:57:38 2023
X-Patchwork-Submitter: Zach O'Keefe
X-Patchwork-Id: 13114948
Date: Tue, 24 Jan 2023 17:57:38 -0800
In-Reply-To: <20230125015738.912924-1-zokeefe@google.com>
References: <20230125015738.912924-1-zokeefe@google.com>
Message-ID: <20230125015738.912924-2-zokeefe@google.com>
Subject: [PATCH 2/2] mm/MADV_COLLAPSE: catch !none !huge !bad pmd lookups
From: "Zach O'Keefe"
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Andrew Morton, Hugh Dickins, Yang Shi,
 "Zach O'Keefe", stable@vger.kernel.org

In commit 34488399fa08 ("mm/madvise: add file and shmem support to
MADV_COLLAPSE") we make the following change to
find_pmd_or_thp_or_none():

	-       if (!pmd_present(pmde))
	-               return SCAN_PMD_NULL;
	+       if (pmd_none(pmde))
	+               return SCAN_PMD_NONE;

This was for use by MADV_COLLAPSE file/shmem codepaths, where
MADV_COLLAPSE might identify a pte-mapped hugepage, only to have
khugepaged race in, free the pte table, and clear the pmd. Such
codepaths include:

A) If we find a suitably-aligned compound page of order HPAGE_PMD_ORDER
   already in the pagecache.
B) In retract_page_tables(), if we fail to grab mmap_lock for the target
   mm/address.

In these cases, collapse_pte_mapped_thp() really does expect a none (not
just !present) pmd, and we want to suitably identify that case separate
from the case where no pmd is found, or it's a bad-pmd (of course, many
things could happen once we drop mmap_lock, and the pmd could plausibly
undergo multiple transitions due to intervening fault, split, etc).
Regardless, the code is prepared to install a huge-pmd only when the
existing pmd entry is either a genuine pte-table-mapping-pmd, or the
none-pmd.

However, the commit introduces a logical hole; namely, that we've
allowed !none- && !huge- && !bad-pmds to be classified as genuine
pte-table-mapping-pmds. One example that could leak through is a swap
entry. The pmd values aren't checked again before use in
pte_offset_map_lock(), which is expecting nothing less than a genuine
pte-table-mapping-pmd.

We want to put back the !pmd_present() check (below the pmd_none()
check), but need to be careful to deal with subtleties in pmd
transitions and their treatment by the various architectures.

The issue is that __split_huge_pmd_locked() temporarily clears the
present bit (or otherwise marks the entry as invalid), but pmd_present()
and pmd_trans_huge() still need to return true while the pmd is in this
transitory state. For example, x86's pmd_present() also checks the
_PAGE_PSE bit, riscv's version also checks the _PAGE_LEAF bit, and arm64
also checks a PMD_PRESENT_INVALID bit.

Covering all 4 cases for x86 (all checks done on the same pmd value):

1) pmd_present() && pmd_trans_huge()
   All we actually know here is that the PSE bit is set. Either:
   a) We aren't racing with __split_huge_page(), and PRESENT or PROTNONE
      is set.
      => huge-pmd
   b) We are currently racing with __split_huge_page(). The danger here
      is that we proceed as if we have a huge-pmd, but really we are
      looking at a pte-mapping-pmd. So, what is the risk of this
      danger?

      The only relevant path is:

	madvise_collapse() -> collapse_pte_mapped_thp()

      Where we might just incorrectly report back "success", when really
      the memory isn't pmd-backed. This is fine, since split could
      happen immediately after (actually) successful madvise_collapse().
      So, it should be safe to just assume huge-pmd here.

2) pmd_present() && !pmd_trans_huge()
   Either:
   a) PSE not set and either PRESENT or PROTNONE is.
      => pte-table-mapping pmd (or PROT_NONE)
   b) devmap. This routine can be called immediately after
      unlocking/locking mmap_lock -- or called with no locks held (see
      khugepaged_scan_mm_slot()), so previous VMA checks have since been
      invalidated.

3) !pmd_present() && pmd_trans_huge()
   Not possible.

4) !pmd_present() && !pmd_trans_huge()
   Neither PRESENT nor PROTNONE set
   => not present

I've checked all archs that implement pmd_trans_huge() (arm64, riscv,
powerpc, loongarch, x86, mips, s390) and this logic roughly translates
(though devmap treatment is unique to x86 and powerpc, and (3) doesn't
necessarily hold in general -- but that doesn't matter since
!pmd_present() always takes the failure path).

Also, add a comment above find_pmd_or_thp_or_none() to help future
travelers reason about the validity of the code; namely, the possible
mutations that might happen out from under us, depending on how
mmap_lock is held (if at all).

Fixes: 34488399fa08 ("mm/madvise: add file and shmem support to MADV_COLLAPSE")
Reported-by: Hugh Dickins
Signed-off-by: Zach O'Keefe
Cc: stable@vger.kernel.org
---
Request that this be pulled into stable since it's theoretically
possible (though I have no reproducer) that while mmap_lock is dropped,
racing thp migration installs a pmd migration entry which then has a
path to be consumed, unchecked, by pte_offset_map().
---
 mm/khugepaged.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index fa38cae240b9..7ea668bbea70 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -941,6 +941,10 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
 	return SCAN_SUCCEED;
 }
 
+/*
+ * See pmd_trans_unstable() for how the result may change out from
+ * underneath us, even if we hold mmap_lock in read.
+ */
 static int find_pmd_or_thp_or_none(struct mm_struct *mm,
 				   unsigned long address,
 				   pmd_t **pmd)
@@ -959,8 +963,12 @@ static int find_pmd_or_thp_or_none(struct mm_struct *mm,
 #endif
 	if (pmd_none(pmde))
 		return SCAN_PMD_NONE;
+	if (!pmd_present(pmde))
+		return SCAN_PMD_NULL;
 	if (pmd_trans_huge(pmde))
 		return SCAN_PMD_MAPPED;
+	if (pmd_devmap(pmde))
+		return SCAN_PMD_NULL;
 	if (pmd_bad(pmde))
 		return SCAN_PMD_NULL;
 	return SCAN_SUCCEED;
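
For anyone wanting to poke at the four x86 cases from patch 2/2 outside
the kernel, the check order can be sketched as a small userspace model.
This is only an illustration under stated assumptions: every MODEL_* bit
value and model_* name below is made up for the sketch and is not the
kernel's definition, and the predicates assume the x86 behaviour
described in the commit message (pmd_present() also honouring PSE,
pmd_trans_huge() meaning PSE set without DEVMAP).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout for illustration; NOT the real x86 encoding. */
#define MODEL_PRESENT	(1u << 0)
#define MODEL_PROTNONE	(1u << 1)
#define MODEL_PSE	(1u << 2)
#define MODEL_DEVMAP	(1u << 3)
#define MODEL_BAD	(1u << 4)	/* stand-in for whatever pmd_bad() rejects */
/* Non-zero value with none of the "present" bits, e.g. a swap/migration entry. */
#define MODEL_SWAP_ENTRY	(1u << 8)

/* Mirrors the scan results named in the patch; the MODEL_ prefix is ours. */
enum model_scan_result {
	MODEL_SCAN_SUCCEED,
	MODEL_SCAN_PMD_NULL,
	MODEL_SCAN_PMD_NONE,
	MODEL_SCAN_PMD_MAPPED,
};

static inline bool model_pmd_none(uint32_t v)
{
	return v == 0;
}

/* Assumed x86-like: PSE alone keeps the entry "present" during a split. */
static inline bool model_pmd_present(uint32_t v)
{
	return (v & (MODEL_PRESENT | MODEL_PROTNONE | MODEL_PSE)) != 0;
}

/* Assumed x86-like: huge means PSE set and DEVMAP clear. */
static inline bool model_pmd_trans_huge(uint32_t v)
{
	return (v & (MODEL_PSE | MODEL_DEVMAP)) == MODEL_PSE;
}

static inline bool model_pmd_devmap(uint32_t v)
{
	return (v & MODEL_DEVMAP) != 0;
}

static inline bool model_pmd_bad(uint32_t v)
{
	return (v & MODEL_BAD) != 0;
}

/* Same check order as the patched function, applied to one snapshot value. */
enum model_scan_result model_classify(uint32_t v)
{
	/* Case 3 (!present && trans_huge) cannot occur in this model. */
	assert(!(model_pmd_trans_huge(v) && !model_pmd_present(v)));

	if (model_pmd_none(v))
		return MODEL_SCAN_PMD_NONE;
	if (!model_pmd_present(v))
		return MODEL_SCAN_PMD_NULL;	/* case 4: swap/migration entries */
	if (model_pmd_trans_huge(v))
		return MODEL_SCAN_PMD_MAPPED;	/* case 1, including mid-split */
	if (model_pmd_devmap(v))
		return MODEL_SCAN_PMD_NULL;	/* case 2b */
	if (model_pmd_bad(v))
		return MODEL_SCAN_PMD_NULL;
	return MODEL_SCAN_SUCCEED;		/* case 2a: pte-table-mapping pmd */
}
```

In this model, model_classify(MODEL_SWAP_ENTRY) yields
MODEL_SCAN_PMD_NULL. Dropping the !present check would let the same
value fall through to the final return, which is the hole the patch
closes; whether the real pmd_bad() would catch such an entry depends on
the architecture.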