From patchwork Sun Oct 7 23:38:47 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mike Kravetz
X-Patchwork-Id: 10629843
From: Mike Kravetz
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Andrew Morton, Michal Hocko, Hugh Dickins, Naoya Horiguchi,
 "Aneesh Kumar K . V", Andrea Arcangeli, "Kirill A . Shutemov",
 Davidlohr Bueso, Mike Kravetz
Subject: [PATCH RFC 0/1] hugetlbfs: fix truncate/fault races
Date: Sun, 7 Oct 2018 16:38:47 -0700
Message-Id: <20181007233848.13397-1-mike.kravetz@oracle.com>
X-Mailer: git-send-email 2.17.1

Our DB team noticed negative hugetlb reserved page counts during development testing.
Related meminfo fields were as follows on one system:

  HugePages_Total:   47143
  HugePages_Free:    45610
  HugePages_Rsvd:    18446744073709551613
  HugePages_Surp:        0
  Hugepagesize:       2048 kB

The HugePages_Rsvd value is -3 printed as an unsigned 64-bit number.

Code inspection revealed that the most likely cause was a race between
truncate and page faults. In fact, I was able to write a fairly simple
program that triggers the races and reproduces the issue.

Way back in 2006, Hugh Dickins created a patch (ebed4bfc8da8) with this
message:

  "[PATCH] hugetlb: fix absurd HugePages_Rsvd

   If you truncated an mmap'ed hugetlbfs file, then faulted on the truncated
   area, /proc/meminfo's HugePages_Rsvd wrapped hugely "negative". Reinstate
   my preliminary i_size check before attempting to allocate the page (though
   this only fixes the most obvious case: more work will be needed here)."

Looks like we need to do more work.

While looking at the code, it became clear that correctly handling the
races and backing out partially made changes would mean addressing many
separate issues. Instead, why not just introduce a rw mutex to prevent the
races? Page faults would take the mutex in read mode, which still allows
multiple faults in parallel as it works today. Truncate code would take the
mutex in write mode and prevent faults for the duration of truncate
processing.

This seems almost too obvious. Something must be wrong with this approach,
or others would have employed it earlier.

The following patch describes the current race in detail and adds the
mutex to prevent truncate/fault races.

Mike Kravetz (1):
  hugetlbfs: introduce truncation/fault mutex to avoid races

 fs/hugetlbfs/inode.c    | 24 ++++++++++++++++++++----
 include/linux/hugetlb.h |  1 +
 mm/hugetlb.c            | 25 +++++++++++++++++++------
 mm/userfaultfd.c        |  8 +++++++-
 4 files changed, 47 insertions(+), 11 deletions(-)