From patchwork Wed Jul 6 20:23:40 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mike Kravetz
X-Patchwork-Id: 12908567
From: Mike Kravetz
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Muchun Song, Michal Hocko, Peter Xu, Naoya Horiguchi, David Hildenbrand, "Aneesh Kumar K . V", Andrea Arcangeli, "Kirill A . Shutemov", Davidlohr Bueso, Prakash Sangappa, James Houghton, Mina Almasry, Pasha Tatashin, Axel Rasmussen, Ray Fucillo, Andrew Morton, Mike Kravetz
Subject: [RFC PATCH v4 1/8] hugetlbfs: revert use i_mmap_rwsem to address page fault/truncate race
Date: Wed, 6 Jul 2022 13:23:40 -0700
Message-Id: <20220706202347.95150-2-mike.kravetz@oracle.com>
X-Mailer: git-send-email 2.35.3
In-Reply-To: <20220706202347.95150-1-mike.kravetz@oracle.com>
References: <20220706202347.95150-1-mike.kravetz@oracle.com>

Commit c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing
synchronization") added code to take i_mmap_rwsem in read mode for the
duration of fault processing. The use of i_mmap_rwsem to prevent
fault/truncate races depends on this. However, this has been shown to
cause performance/scaling issues. As a result, that code will be
reverted. Since the use of i_mmap_rwsem to address page fault/truncate
races depends on this, it must also be reverted.

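For readers following the locking change, here is a small user-space
model in C of the scheme this revert goes back to: the fault path checks
the file size before allocating and again after taking the page table
lock, while truncation updates the size first and then takes the same
lock to unmap. This is a sketch under stated assumptions, not kernel
code. struct model_file, model_fault(), model_truncate(), the race_hook
callback and the pthread mutex standing in for the page table lock are
all hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGES 16	/* hypothetical upper bound on the model file size */

/* Hypothetical model of a file; nothing here exists in the kernel. */
struct model_file {
	pthread_mutex_t ptl;		/* stands in for the page table lock */
	atomic_size_t size_pages;	/* stands in for i_size in huge pages */
	bool mapped[MODEL_PAGES];	/* true if the page index is "mapped" */
};

enum fault_result { FAULT_MAPPED, FAULT_BEYOND_EOF, FAULT_BACKED_OUT };

/* Optional callback run between the two size checks to simulate a race. */
typedef void (*race_hook)(struct model_file *f);

void model_init(struct model_file *f, size_t initial_pages)
{
	assert(initial_pages <= MODEL_PAGES);
	pthread_mutex_init(&f->ptl, NULL);
	atomic_init(&f->size_pages, initial_pages);
	for (size_t i = 0; i < MODEL_PAGES; i++)
		f->mapped[i] = false;
}

/* Truncation: publish the new size first, then unmap under the lock. */
void model_truncate(struct model_file *f, size_t new_pages)
{
	atomic_store(&f->size_pages, new_pages);
	pthread_mutex_lock(&f->ptl);
	for (size_t i = new_pages; i < MODEL_PAGES; i++)
		f->mapped[i] = false;
	pthread_mutex_unlock(&f->ptl);
}

/* Ready-made hook: truncate the whole file inside the race window. */
void model_truncate_to_zero(struct model_file *f)
{
	model_truncate(f, 0);
}

enum fault_result model_fault(struct model_file *f, size_t idx, race_hook hook)
{
	/* First check, no lock held: reject faults beyond end of file. */
	if (idx >= MODEL_PAGES || idx >= atomic_load(&f->size_pages))
		return FAULT_BEYOND_EOF;

	/* Window where allocation would happen; truncation may run here. */
	if (hook)
		hook(f);

	/* Second check under the same lock truncation uses to unmap. */
	pthread_mutex_lock(&f->ptl);
	if (idx >= atomic_load(&f->size_pages)) {
		pthread_mutex_unlock(&f->ptl);
		return FAULT_BACKED_OUT;
	}
	f->mapped[idx] = true;
	pthread_mutex_unlock(&f->ptl);
	return FAULT_MAPPED;
}
```

In this model, calling model_fault(&f, 2, model_truncate_to_zero) on a
four-page file drives the truncation into the window between the two
checks, so the fault backs out and no page is left mapped beyond the new
size. Because the size is written before the lock is taken, a fault that
gets the lock after truncation has unmapped the range is guaranteed to
see the new size.
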
In a subsequent patch, code will be added to detect the fault/truncate
race and back out operations as required.

Signed-off-by: Mike Kravetz
---
 fs/hugetlbfs/inode.c | 30 +++++++++---------------------
 mm/hugetlb.c         | 23 ++++++++++++-----------
 2 files changed, 21 insertions(+), 32 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 20336cb3c040..9fa1af39382d 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -451,9 +451,10 @@ hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end,
  * In this case, we first scan the range and release found pages.
  * After releasing pages, hugetlb_unreserve_pages cleans up region/reserve
  * maps and global counts. Page faults can not race with truncation
- * in this routine. hugetlb_no_page() holds i_mmap_rwsem and prevents
- * page faults in the truncated range by checking i_size. i_size is
- * modified while holding i_mmap_rwsem.
+ * in this routine. hugetlb_no_page() prevents page faults in the
+ * truncated range. It checks i_size before allocation, and again after
+ * with the page table lock for the page held. The same lock must be
+ * acquired to unmap a page.
  * hole punch is indicated if end is not LLONG_MAX
  * In the hole punch case we scan the range and release found pages.
  * Only when releasing a page is the associated region/reserve map
@@ -483,16 +484,8 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
 			u32 hash = 0;
 
 			index = folio->index;
-			if (!truncate_op) {
-				/*
-				 * Only need to hold the fault mutex in the
-				 * hole punch case. This prevents races with
-				 * page faults. Races are not possible in the
-				 * case of truncation.
-				 */
-				hash = hugetlb_fault_mutex_hash(mapping, index);
-				mutex_lock(&hugetlb_fault_mutex_table[hash]);
-			}
+			hash = hugetlb_fault_mutex_hash(mapping, index);
+			mutex_lock(&hugetlb_fault_mutex_table[hash]);
 
 			/*
 			 * If folio is mapped, it was faulted in after being
@@ -536,8 +529,7 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
 			}
 
 			folio_unlock(folio);
-			if (!truncate_op)
-				mutex_unlock(&hugetlb_fault_mutex_table[hash]);
+			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
 		}
 		folio_batch_release(&fbatch);
 		cond_resched();
@@ -575,8 +567,8 @@ static void hugetlb_vmtruncate(struct inode *inode, loff_t offset)
 	BUG_ON(offset & ~huge_page_mask(h));
 	pgoff = offset >> PAGE_SHIFT;
 
-	i_mmap_lock_write(mapping);
 	i_size_write(inode, offset);
+	i_mmap_lock_write(mapping);
 	if (!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root))
 		hugetlb_vmdelete_list(&mapping->i_mmap, pgoff, 0,
 				      ZAP_FLAG_DROP_MARKER);
@@ -735,11 +727,7 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
 		/* addr is the offset within the file (zero based) */
 		addr = index * hpage_size;
 
-		/*
-		 * fault mutex taken here, protects against fault path
-		 * and hole punch. inode_lock previously taken protects
-		 * against truncation.
-		 */
+		/* mutex taken here, fault path and hole punch */
 		hash = hugetlb_fault_mutex_hash(mapping, index);
 		mutex_lock(&hugetlb_fault_mutex_table[hash]);
 
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index ca081078e814..02cceb7b8cce 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5532,18 +5532,17 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 	}
 
 	/*
-	 * We can not race with truncation due to holding i_mmap_rwsem.
-	 * i_size is modified when holding i_mmap_rwsem, so check here
-	 * once for faults beyond end of file.
+	 * Use page lock to guard against racing truncation
+	 * before we get page_table_lock.
 	 */
-	size = i_size_read(mapping->host) >> huge_page_shift(h);
-	if (idx >= size)
-		goto out;
-
 retry:
 	new_page = false;
 	page = find_lock_page(mapping, idx);
 	if (!page) {
+		size = i_size_read(mapping->host) >> huge_page_shift(h);
+		if (idx >= size)
+			goto out;
+
 		/* Check for page in userfault range */
 		if (userfaultfd_missing(vma)) {
 			ret = hugetlb_handle_userfault(vma, mapping, idx,
@@ -5633,6 +5632,10 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 	}
 
 	ptl = huge_pte_lock(h, mm, ptep);
+	size = i_size_read(mapping->host) >> huge_page_shift(h);
+	if (idx >= size)
+		goto backout;
+
 	ret = 0;
 	/* If pte changed from under us, retry */
 	if (!pte_same(huge_ptep_get(ptep), old_pte))
@@ -5741,10 +5744,8 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 
 	/*
 	 * Acquire i_mmap_rwsem before calling huge_pte_alloc and hold
-	 * until finished with ptep. This serves two purposes:
-	 * 1) It prevents huge_pmd_unshare from being called elsewhere
-	 *    and making the ptep no longer valid.
-	 * 2) It synchronizes us with i_size modifications during truncation.
+	 * until finished with ptep. This prevents huge_pmd_unshare from
+	 * being called elsewhere and making the ptep no longer valid.
 	 *
 	 * ptep could have already be assigned via huge_pte_offset. That
 	 * is OK, as huge_pte_alloc will return the same value unless

From patchwork Wed Jul 6 20:23:41 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mike Kravetz
X-Patchwork-Id: 12908568
From: Mike Kravetz
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Muchun Song, Michal Hocko, Peter Xu, Naoya Horiguchi, David Hildenbrand, "Aneesh Kumar K . V", Andrea Arcangeli, "Kirill A . Shutemov", Davidlohr Bueso, Prakash Sangappa, James Houghton, Mina Almasry, Pasha Tatashin, Axel Rasmussen, Ray Fucillo, Andrew Morton, Mike Kravetz
Subject: [RFC PATCH v4 2/8] hugetlbfs: revert use i_mmap_rwsem for more pmd sharing synchronization
Date: Wed, 6 Jul 2022 13:23:41 -0700
Message-Id: <20220706202347.95150-3-mike.kravetz@oracle.com>
X-Mailer: git-send-email 2.35.3
In-Reply-To: <20220706202347.95150-1-mike.kravetz@oracle.com>
References: <20220706202347.95150-1-mike.kravetz@oracle.com>

Commit c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing
synchronization") added code to take i_mmap_rwsem in read mode for the
duration of fault processing. However, this has been shown to cause
performance/scaling issues. Revert the code and go back to only taking
the semaphore in huge_pmd_share during the fault path.

Keep the code that takes i_mmap_rwsem in write mode before calling
try_to_unmap as this is required if huge_pmd_unshare is called.

NOTE: Reverting this code does expose the following race condition. Faulting thread Unsharing thread ... ... ptep = huge_pte_offset() or ptep = huge_pte_alloc() ... i_mmap_lock_write lock page table ptep invalid <------------------------ huge_pmd_unshare() Could be in a previously unlock_page_table sharing process or worse i_mmap_unlock_write ... ptl = huge_pte_lock(ptep) get/update pte set_pte_at(pte, ptep) It is unknown if the above race was ever experienced by a user. It was discovered via code inspection when initially addressed. In subsequent patches, a new synchronization mechanism will be added to coordinate pmd sharing and eliminate this race. Signed-off-by: Mike Kravetz --- fs/hugetlbfs/inode.c | 2 -- mm/hugetlb.c | 77 +++++++------------------------------------- mm/rmap.c | 8 +---- mm/userfaultfd.c | 11 ++----- 4 files changed, 15 insertions(+), 83 deletions(-) diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index 9fa1af39382d..7a9f25fff869 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -499,9 +499,7 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart, if (unlikely(folio_mapped(folio))) { BUG_ON(truncate_op); - mutex_unlock(&hugetlb_fault_mutex_table[hash]); i_mmap_lock_write(mapping); - mutex_lock(&hugetlb_fault_mutex_table[hash]); hugetlb_vmdelete_list(&mapping->i_mmap, index * pages_per_huge_page(h), (index + 1) * pages_per_huge_page(h), diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 02cceb7b8cce..c1a0e879e0dc 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -4738,7 +4738,6 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src, struct hstate *h = hstate_vma(src_vma); unsigned long sz = huge_page_size(h); unsigned long npages = pages_per_huge_page(h); - struct address_space *mapping = src_vma->vm_file->f_mapping; struct mmu_notifier_range range; unsigned long last_addr_mask; int ret = 0; @@ -4750,14 +4749,6 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src, 
 		mmu_notifier_invalidate_range_start(&range);
 		mmap_assert_write_locked(src);
 		raw_write_seqcount_begin(&src->write_protect_seq);
-	} else {
-		/*
-		 * For shared mappings i_mmap_rwsem must be held to call
-		 * huge_pte_alloc, otherwise the returned ptep could go
-		 * away if part of a shared pmd and another thread calls
-		 * huge_pmd_unshare.
-		 */
-		i_mmap_lock_read(mapping);
 	}

 	last_addr_mask = hugetlb_mask_last_page(h);
@@ -4909,8 +4900,6 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 	if (cow) {
 		raw_write_seqcount_end(&src->write_protect_seq);
 		mmu_notifier_invalidate_range_end(&range);
-	} else {
-		i_mmap_unlock_read(mapping);
 	}

 	return ret;
@@ -5304,30 +5293,9 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
 		 * may get SIGKILLed if it later faults.
 		 */
 		if (outside_reserve) {
-			struct address_space *mapping = vma->vm_file->f_mapping;
-			pgoff_t idx;
-			u32 hash;
-
 			put_page(old_page);
 			BUG_ON(huge_pte_none(pte));
-			/*
-			 * Drop hugetlb_fault_mutex and i_mmap_rwsem before
-			 * unmapping. unmapping needs to hold i_mmap_rwsem
-			 * in write mode. Dropping i_mmap_rwsem in read mode
-			 * here is OK as COW mappings do not interact with
-			 * PMD sharing.
-			 *
-			 * Reacquire both after unmap operation.
-			 */
-			idx = vma_hugecache_offset(h, vma, haddr);
-			hash = hugetlb_fault_mutex_hash(mapping, idx);
-			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
-			i_mmap_unlock_read(mapping);
-
 			unmap_ref_private(mm, vma, old_page, haddr);
-
-			i_mmap_lock_read(mapping);
-			mutex_lock(&hugetlb_fault_mutex_table[hash]);
 			spin_lock(ptl);
 			ptep = huge_pte_offset(mm, haddr, huge_page_size(h));
 			if (likely(ptep &&
@@ -5495,9 +5463,7 @@ static inline vm_fault_t hugetlb_handle_userfault(struct vm_area_struct *vma,
 	 */
 	hash = hugetlb_fault_mutex_hash(mapping, idx);
 	mutex_unlock(&hugetlb_fault_mutex_table[hash]);
-	i_mmap_unlock_read(mapping);
 	ret = handle_userfault(&vmf, reason);
-	i_mmap_lock_read(mapping);
 	mutex_lock(&hugetlb_fault_mutex_table[hash]);

 	return ret;
@@ -5728,11 +5694,6 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,

 	ptep = huge_pte_offset(mm, haddr, huge_page_size(h));
 	if (ptep) {
-		/*
-		 * Since we hold no locks, ptep could be stale. That is
-		 * OK as we are only making decisions based on content and
-		 * not actually modifying content here.
-		 */
 		entry = huge_ptep_get(ptep);
 		if (unlikely(is_hugetlb_entry_migration(entry))) {
 			migration_entry_wait_huge(vma, ptep);
@@ -5740,31 +5701,20 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 		} else if (unlikely(is_hugetlb_entry_hwpoisoned(entry)))
 			return VM_FAULT_HWPOISON_LARGE |
 				VM_FAULT_SET_HINDEX(hstate_index(h));
+	} else {
+		ptep = huge_pte_alloc(mm, vma, haddr, huge_page_size(h));
+		if (!ptep)
+			return VM_FAULT_OOM;
 	}

-	/*
-	 * Acquire i_mmap_rwsem before calling huge_pte_alloc and hold
-	 * until finished with ptep. This prevents huge_pmd_unshare from
-	 * being called elsewhere and making the ptep no longer valid.
-	 *
-	 * ptep could have already be assigned via huge_pte_offset. That
-	 * is OK, as huge_pte_alloc will return the same value unless
-	 * something has changed.
-	 */
 	mapping = vma->vm_file->f_mapping;
-	i_mmap_lock_read(mapping);
-	ptep = huge_pte_alloc(mm, vma, haddr, huge_page_size(h));
-	if (!ptep) {
-		i_mmap_unlock_read(mapping);
-		return VM_FAULT_OOM;
-	}
+	idx = vma_hugecache_offset(h, vma, haddr);

 	/*
 	 * Serialize hugepage allocation and instantiation, so that we don't
 	 * get spurious allocation failures if two CPUs race to instantiate
 	 * the same page in the page cache.
 	 */
-	idx = vma_hugecache_offset(h, vma, haddr);
 	hash = hugetlb_fault_mutex_hash(mapping, idx);
 	mutex_lock(&hugetlb_fault_mutex_table[hash]);

@@ -5832,7 +5782,6 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 			put_page(pagecache_page);
 		}
 		mutex_unlock(&hugetlb_fault_mutex_table[hash]);
-		i_mmap_unlock_read(mapping);
 		return handle_userfault(&vmf, VM_UFFD_WP);
 	}

@@ -5876,7 +5825,6 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	}
 out_mutex:
 	mutex_unlock(&hugetlb_fault_mutex_table[hash]);
-	i_mmap_unlock_read(mapping);
 	/*
 	 * Generally it's safe to hold refcount during waiting page lock. But
 	 * here we just wait to defer the next page fault to avoid busy loop and
@@ -6716,12 +6664,10 @@ void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma,
  * Search for a shareable pmd page for hugetlb. In any case calls pmd_alloc()
  * and returns the corresponding pte. While this is not necessary for the
  * !shared pmd case because we can allocate the pmd later as well, it makes the
- * code much cleaner.
- *
- * This routine must be called with i_mmap_rwsem held in at least read mode if
- * sharing is possible. For hugetlbfs, this prevents removal of any page
- * table entries associated with the address space. This is important as we
- * are setting up sharing based on existing page table entries (mappings).
+ * code much cleaner. pmd allocation is essential for the shared case because
+ * pud has to be populated inside the same i_mmap_rwsem section - otherwise
+ * racing tasks could either miss the sharing (see huge_pte_offset) or select a
+ * bad pmd for sharing.
  */
 pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
 		      unsigned long addr, pud_t *pud)
@@ -6735,7 +6681,7 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
 	pte_t *pte;
 	spinlock_t *ptl;

-	i_mmap_assert_locked(mapping);
+	i_mmap_lock_read(mapping);
 	vma_interval_tree_foreach(svma, &mapping->i_mmap, idx, idx) {
 		if (svma == vma)
 			continue;
@@ -6765,6 +6711,7 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
 	spin_unlock(ptl);
 out:
 	pte = (pte_t *)pmd_alloc(mm, pud, addr);
+	i_mmap_unlock_read(mapping);
 	return pte;
 }

@@ -6775,7 +6722,7 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
  * indicated by page_count > 1, unmap is achieved by clearing pud and
  * decrementing the ref count. If count == 1, the pte page is not shared.
  *
- * Called with page table lock held and i_mmap_rwsem held in write mode.
+ * Called with page table lock held.
  *
  * returns: 1 successfully unmapped a shared pte page
  *	    0 the underlying pte page is not shared, or it is the last user
diff --git a/mm/rmap.c b/mm/rmap.c
index edc06c52bc82..6593299d3b18 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -23,10 +23,9 @@
  * inode->i_rwsem	(while writing or truncating, not reading or faulting)
  *   mm->mmap_lock
  *     mapping->invalidate_lock (in filemap_fault)
- *       page->flags PG_locked (lock_page)   * (see hugetlbfs below)
+ *       page->flags PG_locked (lock_page)
  *         hugetlbfs_i_mmap_rwsem_key (in huge_pmd_share)
  *           mapping->i_mmap_rwsem
- *             hugetlb_fault_mutex (hugetlbfs specific page fault mutex)
  *             anon_vma->rwsem
  *               mm->page_table_lock or pte_lock
  *                 swap_lock (in swap_duplicate, swap_info_get)
@@ -45,11 +44,6 @@
  * anon_vma->rwsem,mapping->i_mmap_rwsem   (memory_failure, collect_procs_anon)
  *   ->tasklist_lock
  *     pte map lock
- *
- * * hugetlbfs PageHuge() pages take locks in this order:
- *         mapping->i_mmap_rwsem
- *           hugetlb_fault_mutex (hugetlbfs specific page fault mutex)
- *             page->flags PG_locked (lock_page)
  */

 #include
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 07d3befc80e4..3225b5f70bd8 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -377,14 +377,10 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 		BUG_ON(dst_addr >= dst_start + len);

 		/*
-		 * Serialize via i_mmap_rwsem and hugetlb_fault_mutex.
-		 * i_mmap_rwsem ensures the dst_pte remains valid even
-		 * in the case of shared pmds. fault mutex prevents
-		 * races with other faulting threads.
+		 * Serialize via hugetlb_fault_mutex.
 		 */
-		mapping = dst_vma->vm_file->f_mapping;
-		i_mmap_lock_read(mapping);
 		idx = linear_page_index(dst_vma, dst_addr);
+		mapping = dst_vma->vm_file->f_mapping;
 		hash = hugetlb_fault_mutex_hash(mapping, idx);
 		mutex_lock(&hugetlb_fault_mutex_table[hash]);

@@ -392,7 +388,6 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 		dst_pte = huge_pte_alloc(dst_mm, dst_vma, dst_addr, vma_hpagesize);
 		if (!dst_pte) {
 			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
-			i_mmap_unlock_read(mapping);
 			goto out_unlock;
 		}

@@ -400,7 +395,6 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 		    !huge_pte_none_mostly(huge_ptep_get(dst_pte))) {
 			err = -EEXIST;
 			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
-			i_mmap_unlock_read(mapping);
 			goto out_unlock;
 		}

@@ -409,7 +403,6 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 						       wp_copy);

 		mutex_unlock(&hugetlb_fault_mutex_table[hash]);
-		i_mmap_unlock_read(mapping);

 		cond_resched();


From patchwork Wed Jul 6 20:23:42 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mike Kravetz
X-Patchwork-Id: 12908569
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id A34EFC433EF for ; Wed, 6 Jul 2022 20:24:10 +0000 (UTC)
Received: by kanga.kvack.org (Postfix) id 884C18E0002; Wed, 6 Jul 2022 16:24:09 -0400 (EDT)
Received: by kanga.kvack.org (Postfix, from userid 40) id 80EB38E0001; Wed, 6 Jul 2022 16:24:09 -0400 (EDT)
X-Delivered-To: int-list-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix, from userid 63042) id 610688E0002; Wed, 6 Jul 2022 16:24:09 -0400 (EDT)
X-Delivered-To: linux-mm@kvack.org
Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id
4E71A8E0001 for ; Wed, 6 Jul 2022 16:24:09 -0400 (EDT) Received: from smtpin18.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay12.hostedemail.com (Postfix) with ESMTP id 2255912109D for ; Wed, 6 Jul 2022 20:24:09 +0000 (UTC) X-FDA: 79657801818.18.C61423B Received: from mx0b-00069f02.pphosted.com (mx0b-00069f02.pphosted.com [205.220.177.32]) by imf05.hostedemail.com (Postfix) with ESMTP id 7901C100018 for ; Wed, 6 Jul 2022 20:24:08 +0000 (UTC) Received: from pps.filterd (m0246631.ppops.net [127.0.0.1]) by mx0b-00069f02.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 266IWeMN009665; Wed, 6 Jul 2022 20:24:03 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : content-transfer-encoding : content-type : mime-version; s=corp-2021-07-09; bh=Tba8ZDaXK+/Iic7ZWzu0i4uW3zG08BBO8BWQTMPgjPA=; b=T1qYSD+YVuvYg2YtGqgQtOypxCwEaguGnUzJi2K9tGRrH5VRm5P+fjH4/XvDQ3lYRPm4 Fdwet8VlmBYS023MR3GNj8SwKPXIRqIhBRYPG0LHCUCJ21Au2ZtUu6tFLxH8GpsQmlpx l0JX8tmWqRHCHXS45I3jGkDo6vFSUewxWa/neTgPIy3zAuMlt+PKxFgkxOc0nJoCqxxI VfL2o53m5a+61pPBQ7bgRF+uywXnSq2hOKNjNyWXRj9eTvl1/EY53hZ/lMRdaJaV7tHT /FBOIyJ9H/4qqzrqRGXdULm7MMM03ZpbJtCPQwcDra6KsGX99wJ233ldpgrrKPYbvHGW vQ== Received: from iadpaimrmta03.imrmtpd1.prodappiadaev1.oraclevcn.com (iadpaimrmta03.appoci.oracle.com [130.35.103.27]) by mx0b-00069f02.pphosted.com (PPS) with ESMTPS id 3h4ubyb9ww-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 06 Jul 2022 20:24:02 +0000 Received: from pps.filterd (iadpaimrmta03.imrmtpd1.prodappiadaev1.oraclevcn.com [127.0.0.1]) by iadpaimrmta03.imrmtpd1.prodappiadaev1.oraclevcn.com (8.16.1.2/8.16.1.2) with SMTP id 266KLTPl036189; Wed, 6 Jul 2022 20:24:01 GMT Received: from nam11-bn8-obe.outbound.protection.outlook.com (mail-bn8nam11lp2169.outbound.protection.outlook.com [104.47.58.169]) by iadpaimrmta03.imrmtpd1.prodappiadaev1.oraclevcn.com with ESMTP id 3h4ud8c0wj-1 (version=TLSv1.2 
cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 06 Jul 2022 20:24:01 +0000 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=FbpqYd8YJvyJbpFfxOS6iI1WaChNn6BqOrFUOSJBitECD5VTSLKJraF+q+32LOkiFjTji355cgFp8J9SCJQYdd/eSV8/Ee+MgP3FGYIMv4/VWbrLOPwu4aSaCGP7v6TsL9SAp3cVo07oI+dDjpZSlXP5urqnuzXc9i+KELFxzHyOaMH4QGQtHWwOyd3OzADToRLscJopBR9oO6iyGhUJpUNq16Oh4yXQuu34gER4btZjBmBNAhsLUeKqdX77mOHT//uT6Yz2ByStgTn7ILID5W6AASyjR3RQLQeUZdd8mlOZM5m9OFUA9OrjCKRWWB90EVeYliYYB1y3giFuSap3yg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=Tba8ZDaXK+/Iic7ZWzu0i4uW3zG08BBO8BWQTMPgjPA=; b=CHU8sAs3zu4vJEkTyLbpAj6F/nB9+aay0YBVRnYUlO/nv15ENwj4TZNVizAmOW/5iNPrGqStRHIctUzJBgH6aaE4iLrqD9hZiyFx0XOS0eSwrQnQoxKMLTGdXeBg3o/Y7osNYbQsJ8US3MzwkGeksmCba+aWBbUtfBofmmtcezNkbwYAZFjeK8W6dV4+2+OSOXe59GP5hl2qDF1cDikcmfxQ3qQDGDahVRdNb4s+Bnlt3Auv2qwJshWVDVQk/38UfGWM4CXdmZwefP1gsY0H14ZbclDS7Qie86qRSwD+x9oOcYRPrVJxyn/3cXpJz4ZRfU90PLomYNV7ae+KcAQDQA== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=oracle.com; dmarc=pass action=none header.from=oracle.com; dkim=pass header.d=oracle.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.onmicrosoft.com; s=selector2-oracle-onmicrosoft-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=Tba8ZDaXK+/Iic7ZWzu0i4uW3zG08BBO8BWQTMPgjPA=; b=O0R+gES8DBNTz1h+F69T5GMCcAVX6YADHFUYOM6jmli441SAuTVTAuzv7V2Ihji6p1QdS61neKg8XKu20KuuXi0JjTWlyDbxz8txTx6aEOWqhyVK6dXlraySI72kodcLgJIzOQGSdsBzSC/jdwTG5WJpUpFE+258GB+bFnxy2Ck= Received: from BY5PR10MB4196.namprd10.prod.outlook.com (2603:10b6:a03:20d::23) by MN2PR10MB4032.namprd10.prod.outlook.com (2603:10b6:208:181::21) with Microsoft SMTP Server (version=TLS1_2, 
cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5417.15; Wed, 6 Jul 2022 20:24:00 +0000 Received: from BY5PR10MB4196.namprd10.prod.outlook.com ([fe80::c1ba:c197:f81f:ec0]) by BY5PR10MB4196.namprd10.prod.outlook.com ([fe80::c1ba:c197:f81f:ec0%6]) with mapi id 15.20.5417.016; Wed, 6 Jul 2022 20:24:00 +0000 From: Mike Kravetz To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: Muchun Song , Michal Hocko , Peter Xu , Naoya Horiguchi , David Hildenbrand , "Aneesh Kumar K . V" , Andrea Arcangeli , "Kirill A . Shutemov" , Davidlohr Bueso , Prakash Sangappa , James Houghton , Mina Almasry , Pasha Tatashin , Axel Rasmussen , Ray Fucillo , Andrew Morton , Mike Kravetz Subject: [RFC PATCH v4 3/8] hugetlbfs: move routine remove_huge_page to hugetlb.c Date: Wed, 6 Jul 2022 13:23:42 -0700 Message-Id: <20220706202347.95150-4-mike.kravetz@oracle.com> X-Mailer: git-send-email 2.35.3 In-Reply-To: <20220706202347.95150-1-mike.kravetz@oracle.com> References: <20220706202347.95150-1-mike.kravetz@oracle.com> X-ClientProxiedBy: MW3PR05CA0020.namprd05.prod.outlook.com (2603:10b6:303:2b::25) To BY5PR10MB4196.namprd10.prod.outlook.com (2603:10b6:a03:20d::23) MIME-Version: 1.0 X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id: 684123f5-8930-4b8a-f424-08da5f8d7606 X-MS-TrafficTypeDiagnostic: MN2PR10MB4032:EE_ X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: 
aGe0DpcqxMOIQzo3gWbnTCeMM1NbmLNu4MTA7PxUzJ0QVXurLogYkdoU23oDWjhcVPjqR13DpxmxO2l6laDBGUc2ua/J1Nr2cVkhCVEKPpvDpfp+8ECuPg/QizDE2pfY/VD6erzORn5s2GkJr3FHdb8gdse4jYa6jdiJCQvq4GdnCdIJlHMymnSkGZX8d8lb/aBppAaYO5bg8V8zBlUuXUY7zB0JYTQfsOXarR9IhF8f3kh6tubnmAXIxccz4L5tT2k8aPagb927FHuhHMl8PF6pCKuk3tLoAT9CCm8e82Nftu6cLThwDTeRGKxWcu9OM7HsvK+liwiX7n4jLTIFsWgn63Kk47IV0t1AO9INrnzTw527CwbCVc8P4JBsAwpZNVJt1Lx091J9rxHQbaJjwk615e4eZQ/48XaSyb3bmce3JCTGkK4VPelpKMwhq1VA6gDKh2ZWA0URXouaD7SxygxMPaahMT+kXjJWbXgL++DcXsUAJD+sfgnDmFTIsX2RJh59YB6PkTnZzKnlLYHNdE3D64C6OWRIOUcjzGrEak4+uJu2gEPgncl3voHTK1yz2ev1R/Qne5R7ONfhKmC+oyD1hd2R3bxTwoLsdWa+aQeGSFPT4bm99Jcs+KGuiElHxfCaXGfuaw9mzqkEfyl9WlQXiAkhqYY/5tWrae+lnZ3/KvJrZQA3lh4K55GhA4inVDrw7hCL+9HWjlkFSTAkADpiTchLYS7n1BtlEPKtj2hSJlpYgCbsveNfRED8k5yNC1EIC1u8KqJ9qOqwxY0/esqSqG197OSUP4qUbAH37q1Xa8nkg0xVJjOY8hAB3Vjn X-Forefront-Antispam-Report: CIP:255.255.255.255;CTRY:;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:BY5PR10MB4196.namprd10.prod.outlook.com;PTR:;CAT:NONE;SFS:(13230016)(39860400002)(366004)(136003)(396003)(346002)(376002)(44832011)(8936002)(54906003)(5660300002)(7416002)(316002)(478600001)(6486002)(36756003)(66556008)(66476007)(8676002)(6506007)(41300700001)(6512007)(6666004)(4326008)(66946007)(2906002)(26005)(2616005)(107886003)(186003)(1076003)(83380400001)(86362001)(38100700002)(14583001);DIR:OUT;SFP:1101; X-MS-Exchange-AntiSpam-MessageData-ChunkCount: 1 X-MS-Exchange-AntiSpam-MessageData-0: 
+MfF0uZo5bDsHkxzKZcHhHJU/I7vSc5Oz/fKqJ3sIwolJHhxCuE6Y/JAJM0eqOCZMUZJMrteGAk4Z0bEdy6L949Ht0CoPGZfi4wOfuHmHnrUXKvfSOr+0W+qfHltlGWN0TrnKqm3DcRbig91CfvZTcSrqaEbtybfl2hz6bGxR3FnkTkv2INoZevRa3/bIxsg1fejfKUs9pfH0jFuRZey1za9UnSVBxXCxfY131GZ3k+KxwcrWxDc4X0qWF3hQW6dXMFoX1cVpT4wGwHHrv8DtGHeyi14TmaZtjXLjZWC0lxIloYYs6GtxtMXcMZIxLDQK/qtWm3OITbhbtuHUdKl24gtlwgK2iQ9Zt9KadwxL5kN0qTJQdw8Ae5O6shitdWy9IPsdohLQEPIqonwwp1C0QNH2BuPfCxS9L+EV0D6fjagfdEei6K0QbjWq3HC+/mAzRnUdc4vzjUgdUEwhacvHPXMKlfSyl6BdGYDnrEyPhMowhYDmoEXfSbdOc96+sFu4yjJCiNNAH8julcVd5L8wCQnK0R3Kza3u0KXfQk33BtIoXBJIcxVNitG3+xd4oHBXfB3jlVP5wLSeKRcSwA1gmMZCBKQgYmO2RhcV9Wv9SeK7aQItwBcoAFOxThCwN8ws9yeWBOVUPMrdgzb27LXkHbLgzpZe193PeWKgT+ZyGfPoVlgKoK2iNnRM19wy17gYH2vAuzWPotI3YVwsSdG4u/JTE/moiVXadn+uYGbb21MsrjUVeMhN/XgdjZMFv2chtRZD1auvN+2CW5PAwxNOwoyMDU9zW9ki95mTbCxdZkdbIP3bXldbRBoCgxIRUIlNTXZqUd1EJy83P2TYfR1jZ+KcVNmoOZzLjk3fKGiucOCfKCAoSLkkS3VrkJUGZ89Pxc0eXC7js9zQFsq52rkLLyT1clGMcOjuz1lmDH5EvjsJrv3s9+abDPrNvxnUU04Z1BL/6QOisIwR9OrEK22JDi1DvU/zz4pH3qgNPEKmkYUc4dYCAq/eVXeAdcjAP7fcXbFFW/kSoVCADS1dR/xyfFOv4VVxt25bq5CG95ShlvPgKZGz155FvElwI2k+hd2wj5GqGjktSi30IqnOHj9wV5DHlUGjUelAY3oRLjnabH0XVXCNht1wTNSrg+elE69wiS8NR1xdPOFNpiiVDQ1JKwPelorI05MuoNCpF3wcZpHtzwzyAIqgZQhxGjuHRw+sxvtwoGfp3+Q5+hp2Y+zdh+MBD6RKAN9k0joKPfP0Kw2wgtyhhP6X0NztPifTdzYiJMLLGxeE6DKuqnRyYkeQsQ90bVaiOB6vhrNPKJmfwydM4FxeENBMKB68nCoP6+8iC1fTpBWHGRGh74D/p87KtlHs/A+4psRd9knjdNYTKhFBC9a2CDRe7+JZw/I8308lBraWJmYEztRZrmP837hxk7sVsbc9LQIj/BhdXUtdPnijOHiAf4G8IV45n+dmvjLVuf9jOIp8+X7nbWXQUSMclEmIdjIl91VoUjDkMxZ4MRp6Nb8S0BZ9UdXHAzdHRjEGRh42MrQtZU7mEpIIYZw8Th6D+2cnGHbolL2MYTl10m6ghvkMcWatfIS0etTHu167JAQxq9KzrMLDeqFrvOKJg== X-OriginatorOrg: oracle.com X-MS-Exchange-CrossTenant-Network-Message-Id: 684123f5-8930-4b8a-f424-08da5f8d7606 X-MS-Exchange-CrossTenant-AuthSource: BY5PR10MB4196.namprd10.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Internal X-MS-Exchange-CrossTenant-OriginalArrivalTime: 06 Jul 2022 20:24:00.3103 (UTC) X-MS-Exchange-CrossTenant-FromEntityHeader: Hosted 
X-MS-Exchange-CrossTenant-Id: 4e2c6054-71cb-48f1-bd6c-3a9705aca71b X-MS-Exchange-CrossTenant-MailboxType: HOSTED X-MS-Exchange-CrossTenant-UserPrincipalName: kZZht0+swWRZRhO30SKk7rGnR0vzpJDhDgDoUXbiYZHyQcYqxiZ/y+yiJ5emNfBn8RqpGUpzQquHGpYrLpNPWQ== X-MS-Exchange-Transport-CrossTenantHeadersStamped: MN2PR10MB4032 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.517,18.0.883 definitions=2022-07-06_12:2022-06-28,2022-07-06 signatures=0 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 phishscore=0 bulkscore=0 malwarescore=0 mlxscore=0 spamscore=0 suspectscore=0 adultscore=0 mlxlogscore=999 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2206140000 definitions=main-2207060078 X-Proofpoint-ORIG-GUID: NWd5dsC_EKr2vcwura-SapuSvW-MrwqT X-Proofpoint-GUID: NWd5dsC_EKr2vcwura-SapuSvW-MrwqT ARC-Seal: i=2; s=arc-20220608; d=hostedemail.com; t=1657139048; a=rsa-sha256; cv=pass; b=LGyJ4kVETMPJKxBsZ7Mh36CJqBj4j0U7D3ninYYl0hrFfIjZI0pLw8huLAoHrz1X78UVPI FzUdRWKxGezgVkgbWNTHxUpnyTgR5rGlLENZCCwfiW9SvHWpzjpgmtrfnONAshhYcFdUcQ s2EiW6zgVOoBC+9ztelQsPufU8kxm44= ARC-Authentication-Results: i=2; imf05.hostedemail.com; dkim=pass header.d=oracle.com header.s=corp-2021-07-09 header.b=T1qYSD+Y; dkim=pass header.d=oracle.onmicrosoft.com header.s=selector2-oracle-onmicrosoft-com header.b=O0R+gES8; spf=none (imf05.hostedemail.com: domain of mike.kravetz@oracle.com has no SPF policy when checking 205.220.177.32) smtp.mailfrom=mike.kravetz@oracle.com; dmarc=pass (policy=none) header.from=oracle.com; arc=pass ("microsoft.com:s=arcselector9901:i=1") ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1657139048; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=Tba8ZDaXK+/Iic7ZWzu0i4uW3zG08BBO8BWQTMPgjPA=; 
b=5A8v+IDEdzF9qe4q1qnVAJk+q9e1weXBNVHmjxN4fnxra8ywJV8aLiNhsCKG/B5h3zUaaI dmYX2brMTxwrFhbBCrGRpujWkAjN/5MbzW1hue/AcImB5fgUcl5tl2A2xJjXy9NzwT0Ekr mRVLCD/lOQbMjoft9/yznO2l/W/XT4I=
X-Rspamd-Server: rspam09
X-Rspamd-Queue-Id: 7901C100018
X-Rspam-User:
Authentication-Results: imf05.hostedemail.com; dkim=pass header.d=oracle.com header.s=corp-2021-07-09 header.b=T1qYSD+Y; dkim=pass header.d=oracle.onmicrosoft.com header.s=selector2-oracle-onmicrosoft-com header.b=O0R+gES8; spf=none (imf05.hostedemail.com: domain of mike.kravetz@oracle.com has no SPF policy when checking 205.220.177.32) smtp.mailfrom=mike.kravetz@oracle.com; dmarc=pass (policy=none) header.from=oracle.com; arc=pass ("microsoft.com:s=arcselector9901:i=1")
X-Stat-Signature: 1t79mks93gyzwxdpedenuiionfumfjrh
X-HE-Tag: 1657139048-560333
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID:

In preparation for code in hugetlb.c removing pages from the page cache,
move remove_huge_page to hugetlb.c. For a more descriptive global name,
rename to hugetlb_delete_from_page_cache. Also, rename
huge_add_to_page_cache to be consistent.

Signed-off-by: Mike Kravetz
Reviewed-by: David Hildenbrand
---
 fs/hugetlbfs/inode.c    | 24 ++++++++----------------
 include/linux/hugetlb.h |  3 ++-
 mm/hugetlb.c            | 15 +++++++++++----
 3 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 7a9f25fff869..a878c672cf6d 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -396,13 +396,6 @@ static int hugetlbfs_write_end(struct file *file, struct address_space *mapping,
 	return -EINVAL;
 }

-static void remove_huge_page(struct page *page)
-{
-	ClearPageDirty(page);
-	ClearPageUptodate(page);
-	delete_from_page_cache(page);
-}
-
 static void hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start,
 		      pgoff_t end,
 		      zap_flags_t zap_flags)
@@ -510,15 +503,14 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
 			folio_lock(folio);
 			/*
 			 * We must free the huge page and remove from page
-			 * cache (remove_huge_page) BEFORE removing the
-			 * region/reserve map (hugetlb_unreserve_pages). In
-			 * rare out of memory conditions, removal of the
-			 * region/reserve map could fail. Correspondingly,
-			 * the subpool and global reserve usage count can need
-			 * to be adjusted.
+			 * cache BEFORE removing the region/reserve map
+			 * (hugetlb_unreserve_pages). In rare out of memory
+			 * conditions, removal of the region/reserve map could
+			 * fail. Correspondingly, the subpool and global
+			 * reserve usage count can need to be adjusted.
 			 */
 			VM_BUG_ON(HPageRestoreReserve(&folio->page));
-			remove_huge_page(&folio->page);
+			hugetlb_delete_from_page_cache(&folio->page);
 			freed++;
 			if (!truncate_op) {
 				if (unlikely(hugetlb_unreserve_pages(inode,
@@ -755,7 +747,7 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
 		}
 		clear_huge_page(page, addr, pages_per_huge_page(h));
 		__SetPageUptodate(page);
-		error = huge_add_to_page_cache(page, mapping, index);
+		error = hugetlb_add_to_page_cache(page, mapping, index);
 		if (unlikely(error)) {
 			restore_reserve_on_error(h, &pseudo_vma, addr, page);
 			put_page(page);
@@ -1012,7 +1004,7 @@ static int hugetlbfs_error_remove_page(struct address_space *mapping,
 	struct inode *inode = mapping->host;
 	pgoff_t index = page->index;

-	remove_huge_page(page);
+	hugetlb_delete_from_page_cache(page);
 	if (unlikely(hugetlb_unreserve_pages(inode, index, index + 1, 1)))
 		hugetlb_fix_reserve_counts(inode);

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 29c4d0883d36..05c3a293dab2 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -665,8 +665,9 @@ struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
 				nodemask_t *nmask, gfp_t gfp_mask);
 struct page *alloc_huge_page_vma(struct hstate *h, struct vm_area_struct *vma,
 				unsigned long address);
-int huge_add_to_page_cache(struct page *page, struct address_space *mapping,
+int hugetlb_add_to_page_cache(struct page *page, struct address_space *mapping,
 			pgoff_t idx);
+void hugetlb_delete_from_page_cache(struct page *page);
 void restore_reserve_on_error(struct hstate *h, struct vm_area_struct *vma,
 				unsigned long address, struct page *page);

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index c1a0e879e0dc..a9f320c676e4 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5402,7 +5402,7 @@ static bool hugetlbfs_pagecache_present(struct hstate *h,
 	return page != NULL;
 }

-int huge_add_to_page_cache(struct page *page, struct address_space *mapping,
+int hugetlb_add_to_page_cache(struct page *page, struct address_space *mapping,
 			   pgoff_t idx)
 {
 	struct folio *folio = page_folio(page);
@@ -5431,6 +5431,13 @@ int huge_add_to_page_cache(struct page *page, struct address_space *mapping,
 	return 0;
 }

+void hugetlb_delete_from_page_cache(struct page *page)
+{
+	ClearPageDirty(page);
+	ClearPageUptodate(page);
+	delete_from_page_cache(page);
+}
+
 static inline vm_fault_t hugetlb_handle_userfault(struct vm_area_struct *vma,
 						  struct address_space *mapping,
 						  pgoff_t idx,
@@ -5543,7 +5550,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 		new_page = true;

 		if (vma->vm_flags & VM_MAYSHARE) {
-			int err = huge_add_to_page_cache(page, mapping, idx);
+			int err = hugetlb_add_to_page_cache(page, mapping, idx);
 			if (err) {
 				put_page(page);
 				if (err == -EEXIST)
@@ -5951,11 +5958,11 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,

 		/*
 		 * Serialization between remove_inode_hugepages() and
-		 * huge_add_to_page_cache() below happens through the
+		 * hugetlb_add_to_page_cache() below happens through the
 		 * hugetlb_fault_mutex_table that here must be hold by
 		 * the caller.
 		 */
-		ret = huge_add_to_page_cache(page, mapping, idx);
+		ret = hugetlb_add_to_page_cache(page, mapping, idx);
 		if (ret)
 			goto out_release_nounlock;
 		page_in_pagecache = true;

From patchwork Wed Jul 6 20:23:43 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mike Kravetz
X-Patchwork-Id: 12908570
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id D6C28C43334 for ; Wed, 6 Jul 2022 20:24:13 +0000 (UTC)
Received: by kanga.kvack.org (Postfix) id 3A1F28E0003; Wed, 6 Jul 2022 16:24:13 -0400 (EDT)
Received: by kanga.kvack.org (Postfix, from userid 40) id 309D08E0001; Wed, 6 Jul 2022 16:24:13 -0400 (EDT)
X-Delivered-To: int-list-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix, from userid 63042) id 06C1F8E0003; Wed, 6 Jul 2022 16:24:12 -0400 (EDT)
X-Delivered-To: linux-mm@kvack.org
Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id E3AA18E0001 for ; Wed, 6 Jul 2022 16:24:12 -0400 (EDT)
Received: from smtpin01.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id C5A6360C7E for ; Wed, 6 Jul 2022 20:24:12 +0000 (UTC)
X-FDA: 79657801944.01.95BED45
Received: from mx0b-00069f02.pphosted.com (mx0b-00069f02.pphosted.com [205.220.177.32]) by imf21.hostedemail.com (Postfix) with ESMTP id 33D821C0029 for ; Wed, 6 Jul 2022 20:24:11 +0000 (UTC)
Received: from pps.filterd (m0246630.ppops.net [127.0.0.1]) by mx0b-00069f02.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 266Ijf04006141; Wed, 6 Jul 2022 20:24:06 GMT
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : content-transfer-encoding : content-type : mime-version;
s=corp-2021-07-09; bh=te8H6FiW+ZCi1kHi4vnGpyWN3HEzotPXjbRd4dg6mzU=; b=TaXdFZcrNT+jH0wLXV6gL8n9BmEGt8hEzRC2j4tQ1M8KXifbzxMwH69Qc8VA/D4MoW2r aa4vppCGitV/cidQagqyTxjKO6niKUD1nZ+oYFC7fh3KTOn3v4NT4EcCe2uJAwqyyRys miCkOlfzX0Hwq+HJD7qLC3OUlmje5eY/a77/g7idj79WpKxVc3cLHEJ8hllpsnn1WwYe i2aLfU8XFqeAsP6SzDrDZIBylPJAw6WrCG21AIyyta+W+9seSDFxR/reo5iXh2LK7D+A ZqGkgu2N7mPlLu8Iax0xIeR0f9Y83D9PW/Ir5VCOVkzd/zcw5MrF01d1l8bztY0qFjJj Pw== Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta01.appoci.oracle.com [138.1.114.2]) by mx0b-00069f02.pphosted.com (PPS) with ESMTPS id 3h4ubyu6vh-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 06 Jul 2022 20:24:06 +0000 Received: from pps.filterd (phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (8.16.1.2/8.16.1.2) with SMTP id 266KLH6w015962; Wed, 6 Jul 2022 20:24:05 GMT Received: from nam11-bn8-obe.outbound.protection.outlook.com (mail-bn8nam11lp2169.outbound.protection.outlook.com [104.47.58.169]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com with ESMTP id 3h4ud56ec6-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 06 Jul 2022 20:24:05 +0000 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=Stz45zuajHqqVxA1A27a7WlAxuwnEUoHYuXwXzCxSUWIPkJESgRSb7qK0BKHZtDPtdMY7gE1xtfb/35c5pxp+YdFOu/n985LIrrGVh3GdMAdoeIukg9xJ7efLSWtbYNTYLoA7xK3mZHiGuqL6p/ug12WvHt93gzCMJG/FXeYDNSBgoJKWayWfklXsmTaURm+XDwQFTIlgm6aWFd+a2TXQuCPfICESQcXGC51+u4lnZfW4z/YOJRq1o+Vk3vLT4+OoXILKWmp77N1T8Tjt2nARjLEUuWmvJzOooQ4ZzGE9MBUXPPzB1Adgti9DDd/W521cSnfnb4zVB6vLuFmVas+Aw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=te8H6FiW+ZCi1kHi4vnGpyWN3HEzotPXjbRd4dg6mzU=; 
From: Mike Kravetz
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Muchun Song, Michal Hocko, Peter Xu, Naoya Horiguchi,
	David Hildenbrand, "Aneesh Kumar K . V", Andrea Arcangeli,
	"Kirill A . Shutemov", Davidlohr Bueso, Prakash Sangappa,
	James Houghton, Mina Almasry, Pasha Tatashin, Axel Rasmussen,
	Ray Fucillo, Andrew Morton, Mike Kravetz
Subject: [RFC PATCH v4 4/8] hugetlbfs: catch and handle truncate racing with page faults
Date: Wed, 6 Jul 2022 13:23:43 -0700
Message-Id: <20220706202347.95150-5-mike.kravetz@oracle.com>
X-Mailer: git-send-email 2.35.3
In-Reply-To: <20220706202347.95150-1-mike.kravetz@oracle.com>
References: <20220706202347.95150-1-mike.kravetz@oracle.com>

Most hugetlb fault handling code checks for faults beyond i_size.
While there are early checks in the code paths, the most difficult
to handle are those discovered after taking the page table lock.
At this point, we have possibly allocated a page and consumed
associated reservations and possibly added the page to the page cache.

When discovering a fault beyond i_size, be sure to:
- Remove the page from page cache, else it will sit there until the
  file is removed.
- Do not restore any reservation for the page consumed. Otherwise
  there will be an outstanding reservation for an offset beyond the
  end of file.

The 'truncation' code in remove_inode_hugepages must deal with fault
code potentially removing a page/folio from the cache after the page was
returned by filemap_get_folios and before locking the page. This can be
discovered by a change in folio_mapping() after taking folio lock. In
addition, this code must deal with fault code potentially consuming
and returning reservations. To synchronize this, remove_inode_hugepages
will now take the fault mutex for ALL indices in the hole or truncated
range. In this way, it KNOWS fault code has finished with the page/index
OR fault code will see the updated file size.

Signed-off-by: Mike Kravetz
---
 fs/hugetlbfs/inode.c | 88 ++++++++++++++++++++++++++++++--------------
 mm/hugetlb.c         | 39 +++++++++++++++-----
 2 files changed, 90 insertions(+), 37 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index a878c672cf6d..31bd4325fce5 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -443,11 +443,10 @@ hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end,
  * truncation is indicated by end of range being LLONG_MAX
  *	In this case, we first scan the range and release found pages.
  *	After releasing pages, hugetlb_unreserve_pages cleans up region/reserve
- *	maps and global counts. Page faults can not race with truncation
- *	in this routine. hugetlb_no_page() prevents page faults in the
- *	truncated range. It checks i_size before allocation, and again after
- *	with the page table lock for the page held. The same lock must be
- *	acquired to unmap a page.
+ *	maps and global counts. Page faults can race with truncation.
+ *	During faults, hugetlb_no_page() checks i_size before page allocation,
+ *	and again after obtaining page table lock. It will 'back out'
+ *	allocations in the truncated range.
  * hole punch is indicated if end is not LLONG_MAX
  *	In the hole punch case we scan the range and release found pages.
  *	Only when releasing a page is the associated region/reserve map
@@ -456,27 +455,46 @@ hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end,
  *	This is indicated if we find a mapped page.
  * Note: If the passed end of range value is beyond the end of file, but
  * not LLONG_MAX this routine still performs a hole punch operation.
+ *
+ * Since page faults can race with this routine, care must be taken as both
+ * modify huge page reservation data. To somewhat synchronize these operations
+ * the hugetlb fault mutex is taken for EVERY index in the range to be hole
+ * punched or truncated. In this way, we KNOW fault code will either have
+ * completed backout operations under the mutex, or fault code will see the
+ * updated file size and not allocate a page for offsets beyond truncated size.
+ * The parameter 'lm_end' indicates the offset of the end of hole or file
+ * before truncation. For hole punch lm_end == lend.
  */
 static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
-				   loff_t lend)
+				   loff_t lend, loff_t lm_end)
 {
 	struct hstate *h = hstate_inode(inode);
 	struct address_space *mapping = &inode->i_data;
 	const pgoff_t start = lstart >> huge_page_shift(h);
 	const pgoff_t end = lend >> huge_page_shift(h);
+	pgoff_t m_end = lm_end >> huge_page_shift(h);
+	pgoff_t m_start, m_index;
 	struct folio_batch fbatch;
 	pgoff_t next, index;
 	int i, freed = 0;
+	u32 hash;
 	bool truncate_op = (lend == LLONG_MAX);
 
 	folio_batch_init(&fbatch);
-	next = start;
+	next = m_start = start;
 	while (filemap_get_folios(mapping, &next, end - 1, &fbatch)) {
 		for (i = 0; i < folio_batch_count(&fbatch); ++i) {
 			struct folio *folio = fbatch.folios[i];
-			u32 hash = 0;
 
 			index = folio->index;
+			/* Take fault mutex for missing folios before index */
+			for (m_index = m_start; m_index < index; m_index++) {
+				hash = hugetlb_fault_mutex_hash(mapping,
+								m_index);
+				mutex_lock(&hugetlb_fault_mutex_table[hash]);
+				mutex_unlock(&hugetlb_fault_mutex_table[hash]);
+			}
+			m_start = index + 1;
 			hash = hugetlb_fault_mutex_hash(mapping, index);
 			mutex_lock(&hugetlb_fault_mutex_table[hash]);
 
@@ -485,13 +503,8 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
 			 * unmapped in caller. Unmap (again) now after taking
 			 * the fault mutex. The mutex will prevent faults
 			 * until we finish removing the folio.
-			 *
-			 * This race can only happen in the hole punch case.
-			 * Getting here in a truncate operation is a bug.
 			 */
 			if (unlikely(folio_mapped(folio))) {
-				BUG_ON(truncate_op);
-
 				i_mmap_lock_write(mapping);
 				hugetlb_vmdelete_list(&mapping->i_mmap,
 					index * pages_per_huge_page(h),
@@ -502,20 +515,30 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
 
 			folio_lock(folio);
 			/*
-			 * We must free the huge page and remove from page
-			 * cache BEFORE removing the region/reserve map
-			 * (hugetlb_unreserve_pages). In rare out of memory
-			 * conditions, removal of the region/reserve map could
-			 * fail. Correspondingly, the subpool and global
-			 * reserve usage count can need to be adjusted.
+			 * After locking page, make sure mapping is the same.
+			 * We could have raced with page fault populate and
+			 * backout code.
 			 */
-			VM_BUG_ON(HPageRestoreReserve(&folio->page));
-			hugetlb_delete_from_page_cache(&folio->page);
-			freed++;
-			if (!truncate_op) {
-				if (unlikely(hugetlb_unreserve_pages(inode,
+			if (folio_mapping(folio) == mapping) {
+				/*
+				 * We must free the folio and remove from
+				 * page cache BEFORE removing the region/
+				 * reserve map (hugetlb_unreserve_pages). In
+				 * rare out of memory conditions, removal of
+				 * the region/reserve map could fail.
+				 * Correspondingly, the subpool and global
+				 * reserve usage count can need to be adjusted.
+				 */
+				VM_BUG_ON(HPageRestoreReserve(&folio->page));
+				hugetlb_delete_from_page_cache(&folio->page);
+				freed++;
+				if (!truncate_op) {
+					if (unlikely(
+					    hugetlb_unreserve_pages(inode,
 							index, index + 1, 1)))
-					hugetlb_fix_reserve_counts(inode);
+						hugetlb_fix_reserve_counts(
+								inode);
+				}
 			}
 
 			folio_unlock(folio);
@@ -525,6 +548,13 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
 		cond_resched();
 	}
 
+	/* Take fault mutex for missing folios at end of range */
+	for (m_index = m_start; m_index < m_end; m_index++) {
+		hash = hugetlb_fault_mutex_hash(mapping, m_index);
+		mutex_lock(&hugetlb_fault_mutex_table[hash]);
+		mutex_unlock(&hugetlb_fault_mutex_table[hash]);
+	}
+
 	if (truncate_op)
 		(void)hugetlb_unreserve_pages(inode, start, LONG_MAX, freed);
 }
@@ -532,8 +562,9 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
 static void hugetlbfs_evict_inode(struct inode *inode)
 {
 	struct resv_map *resv_map;
+	loff_t prev_size = i_size_read(inode);
 
-	remove_inode_hugepages(inode, 0, LLONG_MAX);
+	remove_inode_hugepages(inode, 0, LLONG_MAX, prev_size);
 
 	/*
 	 * Get the resv_map from the address space embedded in the inode.
@@ -553,6 +584,7 @@ static void hugetlb_vmtruncate(struct inode *inode, loff_t offset)
 	pgoff_t pgoff;
 	struct address_space *mapping = inode->i_mapping;
 	struct hstate *h = hstate_inode(inode);
+	loff_t prev_size = i_size_read(inode);
 
 	BUG_ON(offset & ~huge_page_mask(h));
 	pgoff = offset >> PAGE_SHIFT;
@@ -563,7 +595,7 @@ static void hugetlb_vmtruncate(struct inode *inode, loff_t offset)
 		hugetlb_vmdelete_list(&mapping->i_mmap, pgoff, 0,
 				      ZAP_FLAG_DROP_MARKER);
 	i_mmap_unlock_write(mapping);
-	remove_inode_hugepages(inode, offset, LLONG_MAX);
+	remove_inode_hugepages(inode, offset, LLONG_MAX, prev_size);
 }
 
 static void hugetlbfs_zero_partial_page(struct hstate *h,
@@ -635,7 +667,7 @@ static long hugetlbfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
 
 	/* Remove full pages from the file. */
 	if (hole_end > hole_start)
-		remove_inode_hugepages(inode, hole_start, hole_end);
+		remove_inode_hugepages(inode, hole_start, hole_end, hole_end);
 
 	inode_unlock(inode);
 
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index a9f320c676e4..25f644a3a981 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5491,6 +5491,8 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 	spinlock_t *ptl;
 	unsigned long haddr = address & huge_page_mask(h);
 	bool new_page, new_pagecache_page = false;
+	bool beyond_i_size = false;
+	bool reserve_alloc = false;
 
 	/*
 	 * Currently, we are forced to kill the process in the event the
@@ -5548,6 +5550,8 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 		clear_huge_page(page, address, pages_per_huge_page(h));
 		__SetPageUptodate(page);
 		new_page = true;
+		if (HPageRestoreReserve(page))
+			reserve_alloc = true;
 
 		if (vma->vm_flags & VM_MAYSHARE) {
 			int err = hugetlb_add_to_page_cache(page, mapping, idx);
@@ -5606,8 +5610,10 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 
 	ptl = huge_pte_lock(h, mm, ptep);
 	size = i_size_read(mapping->host) >> huge_page_shift(h);
-	if (idx >= size)
+	if (idx >= size) {
+		beyond_i_size = true;
 		goto backout;
+	}
 
 	ret = 0;
 	/* If pte changed from under us, retry */
@@ -5652,10 +5658,25 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 backout:
 	spin_unlock(ptl);
 backout_unlocked:
+	if (new_page) {
+		if (new_pagecache_page)
+			hugetlb_delete_from_page_cache(page);
+
+		/*
+		 * If reserve was consumed, make sure flag is set so that it
+		 * will be restored in free_huge_page().
+		 */
+		if (reserve_alloc)
+			SetHPageRestoreReserve(page);
+
+		/*
+		 * Do not restore reserve map entries beyond i_size.
+		 * Otherwise, there will be leaks when the file is removed.
+		 */
+		if (!beyond_i_size)
+			restore_reserve_on_error(h, vma, haddr, page);
+	}
 	unlock_page(page);
-	/* restore reserve for newly allocated pages not in page cache */
-	if (new_page && !new_pagecache_page)
-		restore_reserve_on_error(h, vma, haddr, page);
 	put_page(page);
 	goto out;
 }
@@ -5975,15 +5996,15 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 	 * Recheck the i_size after holding PT lock to make sure not
 	 * to leave any page mapped (as page_mapped()) beyond the end
 	 * of the i_size (remove_inode_hugepages() is strict about
-	 * enforcing that). If we bail out here, we'll also leave a
-	 * page in the radix tree in the vm_shared case beyond the end
-	 * of the i_size, but remove_inode_hugepages() will take care
-	 * of it as soon as we drop the hugetlb_fault_mutex_table.
+	 * enforcing that). If we bail out here, remove the page
+	 * added to the radix tree.
 	 */
 	size = i_size_read(mapping->host) >> huge_page_shift(h);
 	ret = -EFAULT;
-	if (idx >= size)
+	if (idx >= size) {
+		hugetlb_delete_from_page_cache(page);
 		goto out_release_unlock;
+	}
 
 	ret = -EEXIST;
 	/*

From patchwork Wed Jul 6 20:23:44 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mike Kravetz
X-Patchwork-Id: 12908571
From: Mike Kravetz
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Muchun Song, Michal Hocko, Peter Xu, Naoya Horiguchi,
	David Hildenbrand, "Aneesh Kumar K . V", Andrea Arcangeli,
	"Kirill A . Shutemov", Davidlohr Bueso, Prakash Sangappa,
	James Houghton, Mina Almasry, Pasha Tatashin, Axel Rasmussen,
	Ray Fucillo, Andrew Morton, Mike Kravetz
Subject: [RFC PATCH v4 5/8] hugetlb: rename vma_shareable() and refactor code
Date: Wed, 6 Jul 2022 13:23:44 -0700
Message-Id: <20220706202347.95150-6-mike.kravetz@oracle.com>
X-Mailer: git-send-email 2.35.3
In-Reply-To: <20220706202347.95150-1-mike.kravetz@oracle.com>
References: <20220706202347.95150-1-mike.kravetz@oracle.com>

Rename the routine vma_shareable to vma_addr_pmd_shareable as it is
checking a specific address within the vma. Refactor code to check if
an aligned range is shareable as this will be needed in a subsequent
patch.

Signed-off-by: Mike Kravetz --- mm/hugetlb.c | 19 +++++++++++++------ 1 file changed, 13 insertions(+), 6 deletions(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 25f644a3a981..3d5f3c103927 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -6639,26 +6639,33 @@ static unsigned long page_table_shareable(struct vm_area_struct *svma, return saddr; } -static bool vma_shareable(struct vm_area_struct *vma, unsigned long addr) +static bool __vma_aligned_range_pmd_shareable(struct vm_area_struct *vma, + unsigned long start, unsigned long end) { - unsigned long base = addr & PUD_MASK; - unsigned long end = base + PUD_SIZE; - /* * check on proper vm_flags and page table alignment */ - if (vma->vm_flags & VM_MAYSHARE && range_in_vma(vma, base, end)) + if (vma->vm_flags & VM_MAYSHARE && range_in_vma(vma, start, end)) return true; return false; } +static bool vma_addr_pmd_shareable(struct vm_area_struct *vma, + unsigned long addr) +{ + unsigned long start = addr & PUD_MASK; + unsigned long end = start + PUD_SIZE; + + return __vma_aligned_range_pmd_shareable(vma, start, end); +} + bool want_pmd_share(struct vm_area_struct *vma, unsigned long addr) { #ifdef CONFIG_USERFAULTFD if (uffd_disable_huge_pmd_share(vma)) return false; #endif - return vma_shareable(vma, addr); + return vma_addr_pmd_shareable(vma, addr); } /* From patchwork Wed Jul 6 20:23:45 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mike Kravetz X-Patchwork-Id: 12908572 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 49BABC433EF for ; Wed, 6 Jul 2022 20:24:19 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id DDA8A8E0006; Wed, 6 Jul 2022 16:24:18 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id D64188E0001; Wed, 6 Jul 2022 16:24:18 -0400 (EDT) 
From: Mike Kravetz
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Muchun Song, Michal Hocko, Peter Xu, Naoya Horiguchi, David Hildenbrand,
 "Aneesh Kumar K . V", Andrea Arcangeli, "Kirill A . Shutemov",
 Davidlohr Bueso, Prakash Sangappa, James Houghton, Mina Almasry,
 Pasha Tatashin, Axel Rasmussen, Ray Fucillo, Andrew Morton, Mike Kravetz
Subject: [RFC PATCH v4 6/8] hugetlb: add vma based lock for pmd sharing synchronization
Date: Wed, 6 Jul 2022 13:23:45 -0700
Message-Id: <20220706202347.95150-7-mike.kravetz@oracle.com>
X-Mailer: git-send-email 2.35.3
In-Reply-To: <20220706202347.95150-1-mike.kravetz@oracle.com>
References: <20220706202347.95150-1-mike.kravetz@oracle.com>

Allocate a rw semaphore and hang it off vm_private_data for
synchronization use by vmas that could be involved in pmd sharing.  Only
add infrastructure for the new lock here.  Actual use will be added in a
subsequent patch.

Signed-off-by: Mike Kravetz
---
 include/linux/hugetlb.h |  36 +++++++++-
 kernel/fork.c           |   6 +-
 mm/hugetlb.c            | 150 ++++++++++++++++++++++++++++++++++++----
 mm/rmap.c               |   8 ++-
 4 files changed, 178 insertions(+), 22 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 05c3a293dab2..248331c0f140 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -126,7 +126,7 @@ struct hugepage_subpool *hugepage_new_subpool(struct hstate *h, long max_hpages,
 						long min_hpages);
 void hugepage_put_subpool(struct hugepage_subpool *spool);
 
-void reset_vma_resv_huge_pages(struct vm_area_struct *vma);
+void hugetlb_dup_vma_private(struct vm_area_struct *vma);
 void clear_vma_resv_huge_pages(struct vm_area_struct *vma);
 int hugetlb_sysctl_handler(struct ctl_table *, int, void *, size_t *, loff_t *);
 int hugetlb_overcommit_handler(struct ctl_table *, int, void *, size_t *,
@@ -214,6 +214,13 @@ struct page *follow_huge_pud(struct mm_struct *mm, unsigned long address,
 struct page *follow_huge_pgd(struct mm_struct *mm, unsigned long address,
 			     pgd_t *pgd, int flags);
 
+void hugetlb_vma_lock_read(struct vm_area_struct *vma);
+void hugetlb_vma_unlock_read(struct vm_area_struct *vma);
+void hugetlb_vma_lock_write(struct vm_area_struct *vma);
+void hugetlb_vma_unlock_write(struct vm_area_struct *vma);
+int hugetlb_vma_trylock_write(struct vm_area_struct *vma);
+void hugetlb_vma_assert_locked(struct vm_area_struct *vma);
+
 int pmd_huge(pmd_t pmd);
 int pud_huge(pud_t pud);
 unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
@@ -225,7 +232,7 @@ void hugetlb_unshare_all_pmds(struct vm_area_struct *vma);
 
 #else /* !CONFIG_HUGETLB_PAGE */
 
-static inline void reset_vma_resv_huge_pages(struct vm_area_struct *vma)
+static inline void hugetlb_dup_vma_private(struct vm_area_struct *vma)
 {
 }
 
@@ -336,6 +343,31 @@ static inline int prepare_hugepage_range(struct file *file,
 	return -EINVAL;
 }
 
+static inline void hugetlb_vma_lock_read(struct vm_area_struct *vma)
+{
+}
+
+static inline void hugetlb_vma_unlock_read(struct vm_area_struct *vma)
+{
+}
+
+static inline void hugetlb_vma_lock_write(struct vm_area_struct *vma)
+{
+}
+
+static inline void hugetlb_vma_unlock_write(struct vm_area_struct *vma)
+{
+}
+
+static inline int hugetlb_vma_trylock_write(struct vm_area_struct *vma)
+{
+	return 1;
+}
+
+static inline void hugetlb_vma_assert_locked(struct vm_area_struct *vma)
+{
+}
+
 static inline int pmd_huge(pmd_t pmd)
 {
 	return 0;
diff --git a/kernel/fork.c b/kernel/fork.c
index 23f0ba3affe5..ec6e7ddaae12 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -674,12 +674,10 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
 		}
 
 		/*
-		 * Clear hugetlb-related page reserves for children. This only
-		 * affects MAP_PRIVATE mappings. Faults generated by the child
-		 * are not guaranteed to succeed, even if read-only
+		 * Copy/update hugetlb private vma information.
 		 */
 		if (is_vm_hugetlb_page(tmp))
-			reset_vma_resv_huge_pages(tmp);
+			hugetlb_dup_vma_private(tmp);
 
 		/* Link the vma into the MT */
 		mas.index = tmp->vm_start;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 3d5f3c103927..2eca89bb08ab 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -90,6 +90,7 @@ struct mutex *hugetlb_fault_mutex_table ____cacheline_aligned_in_smp;
 
 /* Forward declaration */
 static int hugetlb_acct_memory(struct hstate *h, long delta);
+static bool vma_pmd_shareable(struct vm_area_struct *vma);
 
 static inline bool subpool_is_free(struct hugepage_subpool *spool)
 {
@@ -904,6 +905,89 @@ resv_map_set_hugetlb_cgroup_uncharge_info(struct resv_map *resv_map,
 #endif
 }
 
+static bool __vma_shareable_flags_pmd(struct vm_area_struct *vma)
+{
+	return vma->vm_flags & (VM_MAYSHARE | VM_SHARED) &&
+		vma->vm_private_data;
+}
+
+void hugetlb_vma_lock_read(struct vm_area_struct *vma)
+{
+	if (__vma_shareable_flags_pmd(vma))
+		down_read((struct rw_semaphore *)vma->vm_private_data);
+}
+
+void hugetlb_vma_unlock_read(struct vm_area_struct *vma)
+{
+	if (__vma_shareable_flags_pmd(vma))
+		up_read((struct rw_semaphore *)vma->vm_private_data);
+}
+
+void hugetlb_vma_lock_write(struct vm_area_struct *vma)
+{
+	if (__vma_shareable_flags_pmd(vma))
+		down_write((struct rw_semaphore *)vma->vm_private_data);
+}
+
+void hugetlb_vma_unlock_write(struct vm_area_struct *vma)
+{
+	if (__vma_shareable_flags_pmd(vma))
+		up_write((struct rw_semaphore *)vma->vm_private_data);
+}
+
+int hugetlb_vma_trylock_write(struct vm_area_struct *vma)
+{
+	if (!__vma_shareable_flags_pmd(vma))
+		return 1;
+
+	return down_write_trylock((struct rw_semaphore *)vma->vm_private_data);
+}
+
+void hugetlb_vma_assert_locked(struct vm_area_struct *vma)
+{
+	if (__vma_shareable_flags_pmd(vma))
+		lockdep_assert_held((struct rw_semaphore *)
+				vma->vm_private_data);
+}
+
+static void hugetlb_free_vma_lock(struct vm_area_struct *vma)
+{
+	/* Only present in sharable vmas */
+	if (!vma || !(vma->vm_flags & (VM_MAYSHARE | VM_SHARED)))
+		return;
+
+	if (vma->vm_private_data) {
+		kfree(vma->vm_private_data);
+		vma->vm_private_data = NULL;
+	}
+}
+
+static void hugetlb_alloc_vma_lock(struct vm_area_struct *vma)
+{
+	struct rw_semaphore *vma_sema;
+
+	/* Only establish in (flags) sharable vmas */
+	if (!vma || !(vma->vm_flags & (VM_MAYSHARE | VM_SHARED)))
+		return;
+
+	if (!vma_pmd_shareable(vma)) {
+		vma->vm_private_data = NULL;
+		return;
+	}
+
+	vma_sema = kmalloc(sizeof(*vma_sema), GFP_KERNEL);
+	if (!vma_sema) {
+		/*
+		 * If we can not allocate semaphore, then vma can not
+		 * participate in pmd sharing.
+		 */
+		vma->vm_private_data = NULL;
+	} else {
+		init_rwsem(vma_sema);
+		vma->vm_private_data = vma_sema;
+	}
+}
+
 struct resv_map *resv_map_alloc(void)
 {
 	struct resv_map *resv_map = kmalloc(sizeof(*resv_map), GFP_KERNEL);
@@ -1007,12 +1091,22 @@ static int is_vma_resv_set(struct vm_area_struct *vma, unsigned long flag)
 	return (get_vma_private_data(vma) & flag) != 0;
 }
 
-/* Reset counters to 0 and clear all HPAGE_RESV_* flags */
-void reset_vma_resv_huge_pages(struct vm_area_struct *vma)
+void hugetlb_dup_vma_private(struct vm_area_struct *vma)
 {
+	/*
+	 * Clear hugetlb-related page reserves for children. This only
+	 * affects MAP_PRIVATE mappings. Faults generated by the child
+	 * are not guaranteed to succeed, even if read-only
+	 */
 	VM_BUG_ON_VMA(!is_vm_hugetlb_page(vma), vma);
 	if (!(vma->vm_flags & VM_MAYSHARE))
 		vma->vm_private_data = (void *)0;
+
+	/*
+	 * Allocate semaphore if pmd sharing is possible.  Private mappings
+	 * are ignored.
+	 */
+	hugetlb_alloc_vma_lock(vma);
 }
 
 /*
@@ -1043,7 +1137,7 @@ void clear_vma_resv_huge_pages(struct vm_area_struct *vma)
 		kref_put(&reservations->refs, resv_map_release);
 	}
 
-	reset_vma_resv_huge_pages(vma);
+	hugetlb_dup_vma_private(vma);
 }
 
 /* Returns true if the VMA has associated reserve pages */
@@ -4591,16 +4685,21 @@ static void hugetlb_vm_op_open(struct vm_area_struct *vma)
 		resv_map_dup_hugetlb_cgroup_uncharge_info(resv);
 		kref_get(&resv->refs);
 	}
+
+	hugetlb_alloc_vma_lock(vma);
 }
 
 static void hugetlb_vm_op_close(struct vm_area_struct *vma)
 {
 	struct hstate *h = hstate_vma(vma);
-	struct resv_map *resv = vma_resv_map(vma);
+	struct resv_map *resv;
 	struct hugepage_subpool *spool = subpool_vma(vma);
 	unsigned long reserve, start, end;
 	long gbl_reserve;
 
+	hugetlb_free_vma_lock(vma);
+
+	resv = vma_resv_map(vma);
 	if (!resv || !is_vma_resv_set(vma, HPAGE_RESV_OWNER))
 		return;
 
@@ -6438,6 +6537,11 @@ bool hugetlb_reserve_pages(struct inode *inode,
 		return false;
 	}
 
+	/*
+	 * vma specific semaphore used for pmd sharing synchronization
+	 */
+	hugetlb_alloc_vma_lock(vma);
+
 	/*
 	 * Only apply hugepage reservation if asked. At fault time, an
 	 * attempt will be made for VM_NORESERVE to allocate a page
@@ -6461,12 +6565,11 @@ bool hugetlb_reserve_pages(struct inode *inode,
 		resv_map = inode_resv_map(inode);
 
 		chg = region_chg(resv_map, from, to, &regions_needed);
-
 	} else {
 		/* Private mapping. */
 		resv_map = resv_map_alloc();
 		if (!resv_map)
-			return false;
+			goto out_err;
 
 		chg = to - from;
 
@@ -6561,6 +6664,7 @@ bool hugetlb_reserve_pages(struct inode *inode,
 	hugetlb_cgroup_uncharge_cgroup_rsvd(hstate_index(h),
 					    chg * pages_per_huge_page(h), h_cg);
 out_err:
+	hugetlb_free_vma_lock(vma);
 	if (!vma || vma->vm_flags & VM_MAYSHARE)
 		/* Only call region_abort if the region_chg succeeded but the
 		 * region_add failed or didn't run.
@@ -6640,14 +6744,34 @@ static unsigned long page_table_shareable(struct vm_area_struct *svma,
 }
 
 static bool __vma_aligned_range_pmd_shareable(struct vm_area_struct *vma,
-				unsigned long start, unsigned long end)
+				unsigned long start, unsigned long end,
+				bool check_vma_lock)
 {
+#ifdef CONFIG_USERFAULTFD
+	if (uffd_disable_huge_pmd_share(vma))
+		return false;
+#endif
 	/*
 	 * check on proper vm_flags and page table alignment
 	 */
-	if (vma->vm_flags & VM_MAYSHARE && range_in_vma(vma, start, end))
-		return true;
-	return false;
+	if (!(vma->vm_flags & VM_MAYSHARE))
+		return false;
+	if (check_vma_lock && !vma->vm_private_data)
+		return false;
+	if (!range_in_vma(vma, start, end))
+		return false;
+	return true;
+}
+
+static bool vma_pmd_shareable(struct vm_area_struct *vma)
+{
+	unsigned long start = ALIGN(vma->vm_start, PUD_SIZE),
+		      end = ALIGN_DOWN(vma->vm_end, PUD_SIZE);
+
+	if (start >= end)
+		return false;
+
+	return __vma_aligned_range_pmd_shareable(vma, start, end, false);
 }
 
 static bool vma_addr_pmd_shareable(struct vm_area_struct *vma,
@@ -6656,15 +6780,11 @@ static bool vma_addr_pmd_shareable(struct vm_area_struct *vma,
 	unsigned long start = addr & PUD_MASK;
 	unsigned long end = start + PUD_SIZE;
 
-	return __vma_aligned_range_pmd_shareable(vma, start, end);
+	return __vma_aligned_range_pmd_shareable(vma, start, end, true);
 }
 
 bool want_pmd_share(struct vm_area_struct *vma, unsigned long addr)
 {
-#ifdef CONFIG_USERFAULTFD
-	if (uffd_disable_huge_pmd_share(vma))
-		return false;
-#endif
 	return vma_addr_pmd_shareable(vma, addr);
 }
 
diff --git a/mm/rmap.c b/mm/rmap.c
index 6593299d3b18..64076c2a49c1 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -24,7 +24,7 @@
  *   mm->mmap_lock
  *     mapping->invalidate_lock (in filemap_fault)
  *       page->flags PG_locked (lock_page)
- *         hugetlbfs_i_mmap_rwsem_key (in huge_pmd_share)
+ *         hugetlbfs_i_mmap_rwsem_key (in huge_pmd_share, see hugetlbfs below)
  *           mapping->i_mmap_rwsem
  *             anon_vma->rwsem
  *               mm->page_table_lock or pte_lock
@@ -44,6 +44,12 @@
  * anon_vma->rwsem,mapping->i_mmap_rwsem   (memory_failure, collect_procs_anon)
  *   ->tasklist_lock
  *     pte map lock
+ *
+ * hugetlbfs PageHuge() take locks in this order:
+ *   hugetlb_fault_mutex (hugetlbfs specific page fault mutex)
+ *     vma_lock (hugetlb specific lock for pmd_sharing)
+ *       mapping->i_mmap_rwsem (also used for hugetlb pmd sharing)
+ *         page->flags PG_locked (lock_page)
  */
 
 #include

From patchwork Wed Jul 6 20:23:46 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mike Kravetz
X-Patchwork-Id: 12908573
X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id AD29C8E0001 for ; Wed, 6 Jul 2022 16:24:27 -0400 (EDT) Received: from smtpin27.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay13.hostedemail.com (Postfix) with ESMTP id 81B4F6013D for ; Wed, 6 Jul 2022 20:24:27 +0000 (UTC) X-FDA: 79657802574.27.F801702 Received: from mx0a-00069f02.pphosted.com (mx0a-00069f02.pphosted.com [205.220.165.32]) by imf11.hostedemail.com (Postfix) with ESMTP id F07B840030 for ; Wed, 6 Jul 2022 20:24:26 +0000 (UTC) Received: from pps.filterd (m0246627.ppops.net [127.0.0.1]) by mx0b-00069f02.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 266Il2I4015771; Wed, 6 Jul 2022 20:24:14 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : content-transfer-encoding : content-type : mime-version; s=corp-2021-07-09; bh=seh+t+7lpYZ7uIk7rcl3V3IrvZ/ruQR66XXKFOeUdH8=; b=ygDehNYXji0B6MvzD7xwkXguSoTwbpoKfAnXt5IKZDzVvoBPZoIZ6rlvmLnmS4T1H2xL oYHy4ewKpSjxSoXafRIVdrjOElrw9NvMRIWExODajIda9zYhtTi31zN5WwQkZJWsTpQY nCTw4RaUs86wJ14hDVWmu8ozXfkIfLVnMU91cEUCLd54rl9m0hj4mxYB0kmeWo8qkp+P wylyaTtze8TDz02Xmk3PTOXf4z2+QHUF1Sihb2qlb1oMtpTlly+HcGNzQR4Mk7+QzlhP HcAZftNYxqjToJgyWgKgnHDOzmTXMj5FeVxqbLgQ+248Xxe92zozYnOHZLOsbqLrQSqw 7A== Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta01.appoci.oracle.com [138.1.114.2]) by mx0b-00069f02.pphosted.com (PPS) with ESMTPS id 3h4ubyk99y-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 06 Jul 2022 20:24:13 +0000 Received: from pps.filterd (phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (8.16.1.2/8.16.1.2) with SMTP id 266KLJcw016094; Wed, 6 Jul 2022 20:24:13 GMT Received: from nam10-mw2-obe.outbound.protection.outlook.com 
(mail-mw2nam10lp2103.outbound.protection.outlook.com [104.47.55.103]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com with ESMTP id 3h4ud56ee5-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 06 Jul 2022 20:24:12 +0000 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=ILPuLCeexdPiWZYZDCgeHu+JohxpdbP2aYV872P1qKG92t1MbhRJ2XTEpFxJ1XFlXFhXX0I7qvCEjgeLQnTelxvkqSk7h53K5hd4Tmpz+tX5Fpy2HThj+5j86hRrb8HlP4+Vpz52BZHQeQ0/qZJdxasxWNE+7jUZTYozChnejCi7H0WMy2F/PQyxF/mB7EcA9bZyYMsdDrRY7TT5ixQ8siMR+NzHgPGKRCXA4ILVunU0aX/P68XMpIu6jG3Hne9Snsb0811wLavJ4DSEsxUmI1yAUKLynoyyOrItW6ppCo0Eqmecn79uv+i+04HdIi6WdPUKJ6Ihwzaw69rH2KYFqQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=seh+t+7lpYZ7uIk7rcl3V3IrvZ/ruQR66XXKFOeUdH8=; b=IP3iVrCJLZYGXZWnU0rE5O0n60IsPBPmCyebi5ZpyfIfmTQKNj6RLCDVJ39joLm6b98h3BJ5eNUPERy7JbtY8OgUCDVhdz9s3WZYmx9v5IRZPPCQ5RX7FmHa8cdaozCISHUwIiG0DcEZxCtpOw4gSceSoa2e8sj+iK/YC1QOfmmygNzgC1ATdtCaMgoqcgssmdeyfxOBt4bFfWeb2nZNJD5f0itmFfZ+/LburMXcb5BlL06FCWJrLqd81/ceqv6js5ZgxtqQQ52MSYb2tV22VF9XBx5pKaSxD+iBlT0BuYnRqyWI5KJBSiZuVj6rbNo/GnQwGPsLqR5ixVTENcOp5A== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=oracle.com; dmarc=pass action=none header.from=oracle.com; dkim=pass header.d=oracle.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.onmicrosoft.com; s=selector2-oracle-onmicrosoft-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=seh+t+7lpYZ7uIk7rcl3V3IrvZ/ruQR66XXKFOeUdH8=; b=ADOH6yurmd0gBFGOJn2yZWC96KGZS8l6kGfC7Hd3Aq2edVU3CKpDihgRHGzdmx22CF/OLaagcSURZeF2b5gnyzT56NiH6KdqGYxJE0CnRsYeS1GeppyZzTS+1FfvSdolZDFR++2uA7lIboSX3aq4h6AYY6ZIvcoeGkI01eCYCO4= Received: from 
BY5PR10MB4196.namprd10.prod.outlook.com (2603:10b6:a03:20d::23) by SN6PR10MB2847.namprd10.prod.outlook.com (2603:10b6:805:cb::12) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5395.18; Wed, 6 Jul 2022 20:24:10 +0000 Received: from BY5PR10MB4196.namprd10.prod.outlook.com ([fe80::c1ba:c197:f81f:ec0]) by BY5PR10MB4196.namprd10.prod.outlook.com ([fe80::c1ba:c197:f81f:ec0%6]) with mapi id 15.20.5417.016; Wed, 6 Jul 2022 20:24:10 +0000 From: Mike Kravetz To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: Muchun Song , Michal Hocko , Peter Xu , Naoya Horiguchi , David Hildenbrand , "Aneesh Kumar K . V" , Andrea Arcangeli , "Kirill A . Shutemov" , Davidlohr Bueso , Prakash Sangappa , James Houghton , Mina Almasry , Pasha Tatashin , Axel Rasmussen , Ray Fucillo , Andrew Morton , Mike Kravetz Subject: [RFC PATCH v4 7/8] hugetlb: create hugetlb_unmap_file_folio to unmap single file folio Date: Wed, 6 Jul 2022 13:23:46 -0700 Message-Id: <20220706202347.95150-8-mike.kravetz@oracle.com> X-Mailer: git-send-email 2.35.3 In-Reply-To: <20220706202347.95150-1-mike.kravetz@oracle.com> References: <20220706202347.95150-1-mike.kravetz@oracle.com> X-ClientProxiedBy: MWHPR18CA0052.namprd18.prod.outlook.com (2603:10b6:300:39::14) To BY5PR10MB4196.namprd10.prod.outlook.com (2603:10b6:a03:20d::23) MIME-Version: 1.0 X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id: 3fd38465-a142-4385-3330-08da5f8d7c01 X-MS-TrafficTypeDiagnostic: SN6PR10MB2847:EE_ X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: 
i8z2meG5AczYSf5r2p/+80l1VemxsyAb7rfVPCgl2hc9mWgVC8G+KUQLbeLY5GHK/ru3IGT+hdbmvSIMJgJ8sGl6nOAAPZc4gNh+x7kF+Ow394sUa6pFLVQapHHS5btbz1xb600Pgf5LpK0pPYVR9IYDYu9tlGBP0X7EBYdJu6WNWQaHhcsMCbaaqchNp0u4Tnc/YvmfXoauYYBZrff8FoGmsUlcHSDQMvUpE7SbFijJFFgDSaqEIMxeUZURCz1FhgyD8KBvbF54jBFIFm/6yNljS4uDt7n9NYxVncnFyG4qnLu1BzeY2s9sKnmhYCY089quAIkD/Tt7iuaSCcXrF7pMvl9/S1C08A6Rm0BHD1H0py4GxTSy2WSXlK2JTbzj1mFDD7ZvotSkBciRKExChQS6hN9A/YBWM+L0rDo2RbGACTLiEcOqoxLui4Y32IiTF2yK9OVKlmPdAYGKVp1R2YSoqvVu9rjKaLEjt2h71c4G2MhvSgqaxgNfwVoXfQzaI7FBBVpz0/RHsxdyYDbEJWY7WsMD5W7S1DxVKsu4ChRuNhIZuf1HFmEFZEgBPgLPINR+CVaGZXvBG3Vsx+X3vHTe89V9LxClRWzxN5gP5dpPTXUYUbvw6Di6RT9KX2Gs7uh0IGYSrr9fD7qkD42MU9IPhHZim+FefmbwoHCeJnfdpvFHE3n4xUElu9nxABTJoPNZEM/snycxXZmA3lzR3nsti80dCh7jXmz6BaMK/swIH0KnN1K7Ev3aNKtqVx9x X-Forefront-Antispam-Report: CIP:255.255.255.255;CTRY:;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:BY5PR10MB4196.namprd10.prod.outlook.com;PTR:;CAT:NONE;SFS:(13230016)(376002)(366004)(396003)(39860400002)(346002)(136003)(7416002)(478600001)(8936002)(5660300002)(6506007)(2906002)(66476007)(41300700001)(6666004)(8676002)(4326008)(66556008)(38100700002)(54906003)(44832011)(36756003)(26005)(186003)(6486002)(6512007)(86362001)(1076003)(2616005)(107886003)(83380400001)(316002)(66946007);DIR:OUT;SFP:1101; X-MS-Exchange-AntiSpam-MessageData-ChunkCount: 1 X-MS-Exchange-AntiSpam-MessageData-0: 
kL5qx0M7/7YqH3pus5MhJ37MXHpANdZpJtUX4eXfJQ3JEA5MMKH+ecHWJMR1pwMbUWXpGIRVMPl9tNVVKjNOcEE6vfCzR9E08AwCAywFlInjHejxMRxmHz/YG7MPZMnhWzGB4Y/HoUhK4JCShAGZCU6jzcFFyXhDx4pGBBJYd3S8oIEHKMlCO/CBXCvsyq9pV998kWJPc4kAW1q0PRkNRC3HVt2ujhJLZb2eH/dc/s1TMOyyOFolBSB6UWV/Asbrzu8QBOX+gMg74/7C29n53yZl/ia189FqjO7TBUH5IHEXCboHNe/7X8z1oCByzYJ1Oc1sAD13g93PEMg5Iuak4dRIe6/qEziBvrSrIXiZOM/MB6Qfnetf3+gU82/J9qhZi3A0GokCqrzgF6NvfD8uV77Y3MAv/1gdToYBtvxNmVPURojw2ZWKV5pDGBuECXv7eBfb2nmdb1H9ES3Q0lNwecLiB6vZdKWkb4BKI9XjHICpWMWc6T+Qn0A9HRvuscYHzJZQlnwZZgKKBpfIMDeqFu+JRVPRZ/sGzqcCJB4niyHtFKxMYOZ92rU4lG+he9O4VkBhIY4Kh7s8RR2tg/KvMSIpS6HHQbDF5s3cHXglNmK3bfybCtKT9m5X34jmGOBqn/PN1hvzasBr2M2+pPjZx1qxcMTSJnmcIV6yskXozeVCNMGc7w9B+roQbIC57IalZwa1eAqNZMBOA2Lprm4/+dwx2h0pHH82ZP//V8pj+aOvps6N/CjwQR9eg+C8druGY1XfhFCpz3u73MmXFC4DhWJy5piGCbA1iPAe6I35xp5Lj7gCQvjZLThOPUqpJgpVRT9LQkn5hjC/HK7CgPezu+ZJXcLuLj5YlMTqRdV1+H/D/IdXwOL6hTKxpnGl9fK+eSiRxBZMZWjTXJLJ3J2MIOCsjoj3jvngkjsW4saNXmuJe5NgFKvlx0WlRBm71COTWlGMS10Oof6+rZirSQnWoOSDjZToKt7OZ/U8fVaO77hhpK1qiXYjvCYCjQ+RimEgwsCGxIJ6PESYfW20Ej53Nsi8jYKaulEqgRKCI/eJRZVHu6K24Vx/9Y1B34NolSTnT7R59FWA2kj5MQnwzxgAXkubvQWEr3qKlaHpkYtozasr0g6GXUGeGsVQwrByGvVPpFEsNcx8MRndtuxO8oK5GDt8l+j6gFt5GKTNgzqbB9KDRGyBp5LEOepZBCBkCVrYhPUF0pvVpPtQZ45WEGJhL/+WpFclQPVDkQ8mBtv/OQgBjifN8qqQQ3/BAFIrvXlmaiTkiA8oDUW5zB3iWsoAa5eYVZrBI9R/BbxQulR12k9VFQoiKnDGmU+hS5ZP/axsH1mUD5k/fyBIIdoJ4K0BzqVNoLl36z5KYx0TQ2FUgkP1hpil4hZPV4NMuWVvRazRM8EICllf8s3Ueiu913wR7rqP+OHU3HMQWxVsst7Sx1Jh9wHpZk7c0DFOMDnkddxLQdhgrdG/Zdi8XrPg+pKk/U4HA+9gRVVQkYCjjs50kL0qnXRzMUy7hNB8xXisnIsb/JOe1nMne3f4UzhtRs5rwUpAzb6pZXhHswdevVxrSKMMBaQUJY53IFKBwq5KC/1SHkJyq98FNk3cyIaRS6gd7g== X-OriginatorOrg: oracle.com X-MS-Exchange-CrossTenant-Network-Message-Id: 3fd38465-a142-4385-3330-08da5f8d7c01 X-MS-Exchange-CrossTenant-AuthSource: BY5PR10MB4196.namprd10.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Internal X-MS-Exchange-CrossTenant-OriginalArrivalTime: 06 Jul 2022 20:24:10.3701 (UTC) X-MS-Exchange-CrossTenant-FromEntityHeader: Hosted 
X-MS-Exchange-CrossTenant-Id: 4e2c6054-71cb-48f1-bd6c-3a9705aca71b X-MS-Exchange-CrossTenant-MailboxType: HOSTED X-MS-Exchange-CrossTenant-UserPrincipalName: HcEviDu/TnYR2mkzD5GsXQmDtOiz2hWGvENvWuIGhN9YuAkor4q8O69/+3SAPJ68eScvXhCRPGCE7W6lC+oWjA== X-MS-Exchange-Transport-CrossTenantHeadersStamped: SN6PR10MB2847 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.517,18.0.883 definitions=2022-07-06_12:2022-06-28,2022-07-06 signatures=0 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 bulkscore=0 suspectscore=0 spamscore=0 mlxlogscore=999 phishscore=0 malwarescore=0 mlxscore=0 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2206140000 definitions=main-2207060078 X-Proofpoint-GUID: AlEAgMHsMK9UIBXJWx1DE4r5pC9zmHSv X-Proofpoint-ORIG-GUID: AlEAgMHsMK9UIBXJWx1DE4r5pC9zmHSv ARC-Seal: i=2; s=arc-20220608; d=hostedemail.com; t=1657139067; a=rsa-sha256; cv=pass; b=7khHI82CtlP666ALYMe3JczosObcP3Tjh3Wfm01l9hulTyDsqJiFA+3jz8vYxOoNnZlakc tIXUCYAqzgATxMUVUCUHm+0zryWkFgkGJIO8YZCA56ijsJHn2iWj+J291GkL2pGWbcOle/ REn3Qg0M5nn4Gpur28ilxYI+9647c0Y= ARC-Authentication-Results: i=2; imf11.hostedemail.com; dkim=pass header.d=oracle.com header.s=corp-2021-07-09 header.b=ygDehNYX; dkim=pass header.d=oracle.onmicrosoft.com header.s=selector2-oracle-onmicrosoft-com header.b=ADOH6yur; spf=none (imf11.hostedemail.com: domain of mike.kravetz@oracle.com has no SPF policy when checking 205.220.165.32) smtp.mailfrom=mike.kravetz@oracle.com; dmarc=pass (policy=none) header.from=oracle.com; arc=pass ("microsoft.com:s=arcselector9901:i=1") ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1657139067; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=seh+t+7lpYZ7uIk7rcl3V3IrvZ/ruQR66XXKFOeUdH8=; 
Create the new routine hugetlb_unmap_file_folio that will unmap a single
file folio.  This is refactored code from hugetlb_vmdelete_list.  It is
modified to do locking within the routine itself and check whether the
page is mapped within a specific vma before unmapping.

This refactoring will be put to use and expanded upon in a subsequent
patch adding vma specific locking.

Signed-off-by: Mike Kravetz
---
 fs/hugetlbfs/inode.c | 124 +++++++++++++++++++++++++++++++++----------
 1 file changed, 95 insertions(+), 29 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index 31bd4325fce5..0eac0ea2a245 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -396,6 +396,94 @@ static int hugetlbfs_write_end(struct file *file, struct address_space *mapping, return -EINVAL; } +/* + * Called with i_mmap_rwsem held for inode based vma maps. This makes + * sure vma (and vm_mm) will not go away. We also hold the hugetlb fault + * mutex for the page in the mapping. 
So, we can not race with page being + * faulted into the vma. + */ +static bool hugetlb_vma_maps_page(struct vm_area_struct *vma, + unsigned long addr, struct page *page) +{ + pte_t *ptep, pte; + + ptep = huge_pte_offset(vma->vm_mm, addr, + huge_page_size(hstate_vma(vma))); + + if (!ptep) + return false; + + pte = huge_ptep_get(ptep); + if (huge_pte_none(pte) || !pte_present(pte)) + return false; + + if (pte_page(pte) == page) + return true; + + return false; /* WTH??? */ +} + +/* + * Can vma_offset_start/vma_offset_end overflow on 32-bit arches? + * No, because the interval tree returns us only those vmas + * which overlap the truncated area starting at pgoff, + * and no vma on a 32-bit arch can span beyond the 4GB. + */ +static unsigned long vma_offset_start(struct vm_area_struct *vma, pgoff_t start) +{ + if (vma->vm_pgoff < start) + return (start - vma->vm_pgoff) << PAGE_SHIFT; + else + return 0; +} + +static unsigned long vma_offset_end(struct vm_area_struct *vma, pgoff_t end) +{ + unsigned long t_end; + + if (!end) + return vma->vm_end; + + t_end = ((end - vma->vm_pgoff) << PAGE_SHIFT) + vma->vm_start; + if (t_end > vma->vm_end) + t_end = vma->vm_end; + return t_end; +} + +/* + * Called with hugetlb fault mutex held. Therefore, no more mappings to + * this folio can be created while executing the routine. 
+ */ +static void hugetlb_unmap_file_folio(struct hstate *h, + struct address_space *mapping, + struct folio *folio, pgoff_t index) +{ + struct rb_root_cached *root = &mapping->i_mmap; + struct page *page = &folio->page; + struct vm_area_struct *vma; + unsigned long v_start; + unsigned long v_end; + pgoff_t start, end; + + start = index * pages_per_huge_page(h); + end = ((index + 1) * pages_per_huge_page(h)); + + i_mmap_lock_write(mapping); + + vma_interval_tree_foreach(vma, root, start, end - 1) { + v_start = vma_offset_start(vma, start); + v_end = vma_offset_end(vma, end); + + if (!hugetlb_vma_maps_page(vma, vma->vm_start + v_start, page)) + continue; + + unmap_hugepage_range(vma, vma->vm_start + v_start, v_end, + NULL, ZAP_FLAG_DROP_MARKER); + } + + i_mmap_unlock_write(mapping); +} + static void hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end, zap_flags_t zap_flags) @@ -408,30 +496,13 @@ hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end, * an inclusive "last". */ vma_interval_tree_foreach(vma, root, start, end ? end - 1 : ULONG_MAX) { - unsigned long v_offset; + unsigned long v_start; unsigned long v_end; - /* - * Can the expression below overflow on 32-bit arches? - * No, because the interval tree returns us only those vmas - * which overlap the truncated area starting at pgoff, - * and no vma on a 32-bit arch can span beyond the 4GB. 
- */ - if (vma->vm_pgoff < start) - v_offset = (start - vma->vm_pgoff) << PAGE_SHIFT; - else - v_offset = 0; - - if (!end) - v_end = vma->vm_end; - else { - v_end = ((end - vma->vm_pgoff) << PAGE_SHIFT) - + vma->vm_start; - if (v_end > vma->vm_end) - v_end = vma->vm_end; - } + v_start = vma_offset_start(vma, start); + v_end = vma_offset_end(vma, end); - unmap_hugepage_range(vma, vma->vm_start + v_offset, v_end, + unmap_hugepage_range(vma, vma->vm_start + v_start, v_end, NULL, zap_flags); } } @@ -504,14 +575,9 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart, * the fault mutex. The mutex will prevent faults * until we finish removing the folio. */ - if (unlikely(folio_mapped(folio))) { - i_mmap_lock_write(mapping); - hugetlb_vmdelete_list(&mapping->i_mmap, - index * pages_per_huge_page(h), - (index + 1) * pages_per_huge_page(h), - ZAP_FLAG_DROP_MARKER); - i_mmap_unlock_write(mapping); - } + if (unlikely(folio_mapped(folio))) + hugetlb_unmap_file_folio(h, mapping, folio, + index); folio_lock(folio); /* From patchwork Wed Jul 6 20:23:47 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mike Kravetz X-Patchwork-Id: 12908574 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id C71B4C43334 for ; Wed, 6 Jul 2022 20:24:31 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 6B0188E0008; Wed, 6 Jul 2022 16:24:31 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 6370B8E0001; Wed, 6 Jul 2022 16:24:31 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 4151C8E0008; Wed, 6 Jul 2022 16:24:31 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by 
From: Mike Kravetz
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Muchun Song, Michal Hocko, Peter Xu, Naoya Horiguchi, David Hildenbrand, "Aneesh Kumar K . V", Andrea Arcangeli, "Kirill A . Shutemov", Davidlohr Bueso, Prakash Sangappa, James Houghton, Mina Almasry, Pasha Tatashin, Axel Rasmussen, Ray Fucillo, Andrew Morton, Mike Kravetz
Subject: [RFC PATCH v4 8/8] hugetlb: use new vma_lock for pmd sharing synchronization
Date: Wed, 6 Jul 2022 13:23:47 -0700
Message-Id: <20220706202347.95150-9-mike.kravetz@oracle.com>
X-Mailer: git-send-email 2.35.3
In-Reply-To: <20220706202347.95150-1-mike.kravetz@oracle.com>
References: <20220706202347.95150-1-mike.kravetz@oracle.com>
MIME-Version: 1.0

The new hugetlb vma lock (rw semaphore) is used to address this race:

Faulting thread                                 Unsharing thread
...                                                  ...
ptep = huge_pte_offset()
      or
ptep = huge_pte_alloc()
...
                                                i_mmap_lock_write
                                                lock page table
ptep invalid   <------------------------        huge_pmd_unshare()
Could be in a previously                        unlock_page_table
sharing process or worse                        i_mmap_unlock_write
...
ptl = huge_pte_lock(ptep)
get/update pte
set_pte_at(pte, ptep)

The vma_lock is used as follows:
- During fault processing, the lock is acquired in read mode before
  doing a page table lock and allocation (huge_pte_alloc).  The lock is
  held until code is finished with the page table entry (ptep).
- The lock must be held in write mode whenever huge_pmd_unshare is
  called.

Lock ordering issues come into play when unmapping a page from all
vmas mapping the page.  The i_mmap_rwsem must be held to search for the
vmas, and the vma lock must be held before calling unmap which will
call huge_pmd_unshare.  This is done today in:
- try_to_migrate_one and try_to_unmap_one for page migration and memory
  error handling.  In these routines we 'try' to obtain the vma lock and
  fail to unmap if unsuccessful.  Calling routines already deal with the
  failure of unmapping.
- hugetlb_vmdelete_list for truncation and hole punch.  This routine
  also tries to acquire the vma lock.  If it fails, it skips the
  unmapping.  However, we can not have file truncation or hole punch
  fail because of contention.  After hugetlb_vmdelete_list, truncation
  and hole punch call remove_inode_hugepages.  remove_inode_hugepages
  checks for mapped pages and calls hugetlb_unmap_file_folio to unmap
  them.  hugetlb_unmap_file_folio is designed to drop locks and
  reacquire them in the correct order to guarantee unmap success.

Signed-off-by: Mike Kravetz
---
 fs/hugetlbfs/inode.c | 45 ++++++++++++++++++++
 mm/hugetlb.c         | 76 ++++++++++++++++++++++++++++++----
 mm/memory.c          |  2 +
 mm/rmap.c            | 99 ++++++++++++++++++++++++++++----------------
 mm/userfaultfd.c     |  9 +++-
 5 files changed, 186 insertions(+), 45 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index 0eac0ea2a245..be0a5073766f 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -459,6 +459,8 @@ static void hugetlb_unmap_file_folio(struct hstate *h, struct folio *folio, pgoff_t index) { struct rb_root_cached *root = &mapping->i_mmap; + unsigned long skipped_vm_start; + struct mm_struct *skipped_mm; struct page *page = &folio->page; struct vm_area_struct *vma; unsigned long v_start; @@ -469,6 +471,8 @@ static void hugetlb_unmap_file_folio(struct hstate *h, end = ((index + 1) * pages_per_huge_page(h)); i_mmap_lock_write(mapping); +retry: + skipped_mm = NULL; vma_interval_tree_foreach(vma, root, start, end - 1) { v_start = vma_offset_start(vma, start); @@ -477,11 +481,48 @@ static void hugetlb_unmap_file_folio(struct 
hstate *h, if (!hugetlb_vma_maps_page(vma, vma->vm_start + v_start, page)) continue; + if (!hugetlb_vma_trylock_write(vma)) { + /* + * If we can not get vma lock, we need to drop + * i_mmap_rwsem and take locks in order. + */ + skipped_vm_start = vma->vm_start; + skipped_mm = vma->vm_mm; + /* grab mm-struct as we will be dropping i_mmap_rwsem */ + mmgrab(skipped_mm); + break; + } + unmap_hugepage_range(vma, vma->vm_start + v_start, v_end, NULL, ZAP_FLAG_DROP_MARKER); + hugetlb_vma_unlock_write(vma); } i_mmap_unlock_write(mapping); + + if (skipped_mm) { + mmap_read_lock(skipped_mm); + mmdrop(skipped_mm); + vma = find_vma(skipped_mm, skipped_vm_start); + if (!vma || !is_vm_hugetlb_page(vma) || + vma->vm_file->f_mapping != mapping || + vma->vm_start != skipped_vm_start) { + mmap_read_unlock(skipped_mm); + goto retry; + } + + hugetlb_vma_lock_write(vma); + i_mmap_lock_write(mapping); + mmap_read_unlock(skipped_mm); + + v_start = vma_offset_start(vma, start); + v_end = vma_offset_end(vma, end); + unmap_hugepage_range(vma, vma->vm_start + v_start, v_end, + NULL, ZAP_FLAG_DROP_MARKER); + hugetlb_vma_unlock_write(vma); + + goto retry; + } } static void @@ -499,11 +540,15 @@ hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end, unsigned long v_start; unsigned long v_end; + if (!hugetlb_vma_trylock_write(vma)) + continue; + v_start = vma_offset_start(vma, start); v_end = vma_offset_end(vma, end); unmap_hugepage_range(vma, vma->vm_start + v_start, v_end, NULL, zap_flags); + hugetlb_vma_unlock_write(vma); } } diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 2eca89bb08ab..8369db31df13 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -4848,6 +4848,14 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src, mmu_notifier_invalidate_range_start(&range); mmap_assert_write_locked(src); raw_write_seqcount_begin(&src->write_protect_seq); + } else { + /* + * For shared mappings the vma lock must be held before + * calling huge_pte_offset in the src 
vma. Otherwise, the + * returned ptep could go away if part of a shared pmd and + * another thread calls huge_pmd_unshare. + */ + hugetlb_vma_lock_read(src_vma); } last_addr_mask = hugetlb_mask_last_page(h); @@ -4999,6 +5007,8 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src, if (cow) { raw_write_seqcount_end(&src->write_protect_seq); mmu_notifier_invalidate_range_end(&range); + } else { + hugetlb_vma_unlock_read(src_vma); } return ret; @@ -5057,6 +5067,7 @@ int move_hugetlb_page_tables(struct vm_area_struct *vma, mmu_notifier_invalidate_range_start(&range); last_addr_mask = hugetlb_mask_last_page(h); /* Prevent race with file truncation */ + hugetlb_vma_lock_write(vma); i_mmap_lock_write(mapping); for (; old_addr < old_end; old_addr += sz, new_addr += sz) { src_pte = huge_pte_offset(mm, old_addr, sz); @@ -5088,6 +5099,7 @@ int move_hugetlb_page_tables(struct vm_area_struct *vma, flush_tlb_range(vma, old_end - len, old_end); mmu_notifier_invalidate_range_end(&range); i_mmap_unlock_write(mapping); + hugetlb_vma_unlock_write(vma); return len + old_addr - old_end; } @@ -5392,9 +5404,30 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma, * may get SIGKILLed if it later faults. */ if (outside_reserve) { + struct address_space *mapping = vma->vm_file->f_mapping; + pgoff_t idx; + u32 hash; + put_page(old_page); BUG_ON(huge_pte_none(pte)); + /* + * Drop hugetlb_fault_mutex and vma_lock before + * unmapping. unmapping needs to hold vma_lock + * in write mode. Dropping vma_lock in read mode + * here is OK as COW mappings do not interact with + * PMD sharing. + * + * Reacquire both after unmap operation. 
+ */ + idx = vma_hugecache_offset(h, vma, haddr); + hash = hugetlb_fault_mutex_hash(mapping, idx); + mutex_unlock(&hugetlb_fault_mutex_table[hash]); + hugetlb_vma_unlock_read(vma); + unmap_ref_private(mm, vma, old_page, haddr); + + hugetlb_vma_lock_read(vma); + mutex_lock(&hugetlb_fault_mutex_table[hash]); spin_lock(ptl); ptep = huge_pte_offset(mm, haddr, huge_page_size(h)); if (likely(ptep && @@ -5563,14 +5596,16 @@ static inline vm_fault_t hugetlb_handle_userfault(struct vm_area_struct *vma, }; /* - * hugetlb_fault_mutex and i_mmap_rwsem must be + * vma_lock and hugetlb_fault_mutex must be * dropped before handling userfault. Reacquire * after handling fault to make calling code simpler. */ + hugetlb_vma_unlock_read(vma); hash = hugetlb_fault_mutex_hash(mapping, idx); mutex_unlock(&hugetlb_fault_mutex_table[hash]); ret = handle_userfault(&vmf, reason); mutex_lock(&hugetlb_fault_mutex_table[hash]); + hugetlb_vma_lock_read(vma); return ret; } @@ -5821,6 +5856,11 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, ptep = huge_pte_offset(mm, haddr, huge_page_size(h)); if (ptep) { + /* + * Since we hold no locks, ptep could be stale. That is + * OK as we are only making decisions based on content and + * not actually modifying content here. + */ entry = huge_ptep_get(ptep); if (unlikely(is_hugetlb_entry_migration(entry))) { migration_entry_wait_huge(vma, ptep); @@ -5828,23 +5868,35 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, } else if (unlikely(is_hugetlb_entry_hwpoisoned(entry))) return VM_FAULT_HWPOISON_LARGE | VM_FAULT_SET_HINDEX(hstate_index(h)); - } else { - ptep = huge_pte_alloc(mm, vma, haddr, huge_page_size(h)); - if (!ptep) - return VM_FAULT_OOM; } - mapping = vma->vm_file->f_mapping; - idx = vma_hugecache_offset(h, vma, haddr); - /* * Serialize hugepage allocation and instantiation, so that we don't * get spurious allocation failures if two CPUs race to instantiate * the same page in the page cache. 
*/ + mapping = vma->vm_file->f_mapping; + idx = vma_hugecache_offset(h, vma, haddr); hash = hugetlb_fault_mutex_hash(mapping, idx); mutex_lock(&hugetlb_fault_mutex_table[hash]); + /* + * Acquire vma lock before calling huge_pte_alloc and hold + * until finished with ptep. This prevents huge_pmd_unshare from + * being called elsewhere and making the ptep no longer valid. + * + * ptep could have already been assigned via huge_pte_offset. That + * is OK, as huge_pte_alloc will return the same value unless + * something has changed. + */ + hugetlb_vma_lock_read(vma); + ptep = huge_pte_alloc(mm, vma, haddr, huge_page_size(h)); + if (!ptep) { + hugetlb_vma_unlock_read(vma); + mutex_unlock(&hugetlb_fault_mutex_table[hash]); + return VM_FAULT_OOM; + } + entry = huge_ptep_get(ptep); /* PTE markers should be handled the same way as none pte */ if (huge_pte_none_mostly(entry)) { @@ -5908,6 +5960,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, unlock_page(pagecache_page); put_page(pagecache_page); } + hugetlb_vma_unlock_read(vma); mutex_unlock(&hugetlb_fault_mutex_table[hash]); return handle_userfault(&vmf, VM_UFFD_WP); } @@ -5951,6 +6004,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, put_page(pagecache_page); } out_mutex: + hugetlb_vma_unlock_read(vma); mutex_unlock(&hugetlb_fault_mutex_table[hash]); /* * Generally it's safe to hold refcount during waiting page lock. 
But @@ -6413,8 +6467,9 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma, flush_cache_range(vma, range.start, range.end); mmu_notifier_invalidate_range_start(&range); - last_addr_mask = hugetlb_mask_last_page(h); + hugetlb_vma_lock_write(vma); i_mmap_lock_write(vma->vm_file->f_mapping); + last_addr_mask = hugetlb_mask_last_page(h); for (; address < end; address += psize) { spinlock_t *ptl; ptep = huge_pte_offset(mm, address, psize); @@ -6513,6 +6568,7 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma, * See Documentation/mm/mmu_notifier.rst */ i_mmap_unlock_write(vma->vm_file->f_mapping); + hugetlb_vma_unlock_write(vma); mmu_notifier_invalidate_range_end(&range); return pages << h->order; @@ -6890,6 +6946,7 @@ int huge_pmd_unshare(struct mm_struct *mm, struct vm_area_struct *vma, pud_t *pud = pud_offset(p4d, addr); i_mmap_assert_write_locked(vma->vm_file->f_mapping); + hugetlb_vma_assert_locked(vma); BUG_ON(page_count(virt_to_page(ptep)) == 0); if (page_count(virt_to_page(ptep)) == 1) return 0; @@ -7271,6 +7328,7 @@ void hugetlb_unshare_all_pmds(struct vm_area_struct *vma) mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, mm, start, end); mmu_notifier_invalidate_range_start(&range); + hugetlb_vma_lock_write(vma); i_mmap_lock_write(vma->vm_file->f_mapping); for (address = start; address < end; address += PUD_SIZE) { ptep = huge_pte_offset(mm, address, sz); diff --git a/mm/memory.c b/mm/memory.c index 8917bea2f0bc..3131766f9c7d 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -1693,10 +1693,12 @@ static void unmap_single_vma(struct mmu_gather *tlb, if (vma->vm_file) { zap_flags_t zap_flags = details ? 
details->zap_flags : 0; + hugetlb_vma_lock_write(vma); i_mmap_lock_write(vma->vm_file->f_mapping); __unmap_hugepage_range_final(tlb, vma, start, end, NULL, zap_flags); i_mmap_unlock_write(vma->vm_file->f_mapping); + hugetlb_vma_unlock_write(vma); } } else unmap_page_range(tlb, vma, start, end, details); diff --git a/mm/rmap.c b/mm/rmap.c index 64076c2a49c1..e1c19d86cea6 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1557,24 +1557,39 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma, * To call huge_pmd_unshare, i_mmap_rwsem must be * held in write mode. Caller needs to explicitly * do this outside rmap routines. + * + * We also must hold hugetlb vma_lock in write mode. + * Lock order dictates acquiring vma_lock BEFORE + * i_mmap_rwsem. We can only try lock here and fail + * if unsuccessful. */ - VM_BUG_ON(!anon && !(flags & TTU_RMAP_LOCKED)); - if (!anon && huge_pmd_unshare(mm, vma, address, pvmw.pte)) { - flush_tlb_range(vma, range.start, range.end); - mmu_notifier_invalidate_range(mm, range.start, - range.end); - - /* - * The ref count of the PMD page was dropped - * which is part of the way map counting - * is done for shared PMDs. Return 'true' - * here. When there is no other sharing, - * huge_pmd_unshare returns false and we will - * unmap the actual page and drop map count - * to zero. - */ - page_vma_mapped_walk_done(&pvmw); - break; + if (!anon) { + VM_BUG_ON(!(flags & TTU_RMAP_LOCKED)); + if (!hugetlb_vma_trylock_write(vma)) { + page_vma_mapped_walk_done(&pvmw); + ret = false; + break; + } + if (huge_pmd_unshare(mm, vma, address, pvmw.pte)) { + hugetlb_vma_unlock_write(vma); + flush_tlb_range(vma, + range.start, range.end); + mmu_notifier_invalidate_range(mm, + range.start, range.end); + /* + * The ref count of the PMD page was + * dropped which is part of the way map + * counting is done for shared PMDs. + * Return 'true' here. 
When there is + * no other sharing, huge_pmd_unshare + * returns false and we will unmap the + * actual page and drop map count + * to zero. + */ + page_vma_mapped_walk_done(&pvmw); + break; + } + hugetlb_vma_unlock_write(vma); } pteval = huge_ptep_clear_flush(vma, address, pvmw.pte); } else { @@ -1933,26 +1947,41 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, * To call huge_pmd_unshare, i_mmap_rwsem must be * held in write mode. Caller needs to explicitly * do this outside rmap routines. + * + * We also must hold hugetlb vma_lock in write mode. + * Lock order dictates acquiring vma_lock BEFORE + * i_mmap_rwsem. We can only try lock here and + * fail if unsuccessful. */ - VM_BUG_ON(!anon && !(flags & TTU_RMAP_LOCKED)); - if (!anon && huge_pmd_unshare(mm, vma, address, pvmw.pte)) { - flush_tlb_range(vma, range.start, range.end); - mmu_notifier_invalidate_range(mm, range.start, - range.end); - - /* - * The ref count of the PMD page was dropped - * which is part of the way map counting - * is done for shared PMDs. Return 'true' - * here. When there is no other sharing, - * huge_pmd_unshare returns false and we will - * unmap the actual page and drop map count - * to zero. - */ - page_vma_mapped_walk_done(&pvmw); - break; + if (!anon) { + VM_BUG_ON(!(flags & TTU_RMAP_LOCKED)); + if (!hugetlb_vma_trylock_write(vma)) { + page_vma_mapped_walk_done(&pvmw); + ret = false; + break; + } + if (huge_pmd_unshare(mm, vma, address, pvmw.pte)) { + hugetlb_vma_unlock_write(vma); + flush_tlb_range(vma, + range.start, range.end); + mmu_notifier_invalidate_range(mm, + range.start, range.end); + + /* + * The ref count of the PMD page was + * dropped which is part of the way map + * counting is done for shared PMDs. + * Return 'true' here. When there is + * no other sharing, huge_pmd_unshare + * returns false and we will unmap the + * actual page and drop map count + * to zero. 
+					 */
+					page_vma_mapped_walk_done(&pvmw);
+					break;
+				}
+				hugetlb_vma_unlock_write(vma);
 			}
-
 			/* Nuke the hugetlb page table entry */
 			pteval = huge_ptep_clear_flush(vma, address, pvmw.pte);
 		} else {
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 3225b5f70bd8..1894e545a1a2 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -377,16 +377,21 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 		BUG_ON(dst_addr >= dst_start + len);
 
 		/*
-		 * Serialize via hugetlb_fault_mutex.
+		 * Serialize via vma_lock and hugetlb_fault_mutex.
+		 * vma_lock ensures the dst_pte remains valid even
+		 * in the case of shared pmds. fault mutex prevents
+		 * races with other faulting threads.
 		 */
 		idx = linear_page_index(dst_vma, dst_addr);
 		mapping = dst_vma->vm_file->f_mapping;
 		hash = hugetlb_fault_mutex_hash(mapping, idx);
 		mutex_lock(&hugetlb_fault_mutex_table[hash]);
+		hugetlb_vma_lock_read(dst_vma);
 
 		err = -ENOMEM;
 		dst_pte = huge_pte_alloc(dst_mm, dst_vma, dst_addr, vma_hpagesize);
 		if (!dst_pte) {
+			hugetlb_vma_unlock_read(dst_vma);
 			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
 			goto out_unlock;
 		}
@@ -394,6 +399,7 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 		if (mode != MCOPY_ATOMIC_CONTINUE &&
 		    !huge_pte_none_mostly(huge_ptep_get(dst_pte))) {
 			err = -EEXIST;
+			hugetlb_vma_unlock_read(dst_vma);
 			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
 			goto out_unlock;
 		}
@@ -402,6 +408,7 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 					       dst_addr, src_addr, mode, &page,
 					       wp_copy);
 
+		hugetlb_vma_unlock_read(dst_vma);
 		mutex_unlock(&hugetlb_fault_mutex_table[hash]);
 
 		cond_resched();