From patchwork Sat Feb 18 00:28:08 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 13145401
Date: Sat, 18 Feb 2023 00:28:08 +0000
In-Reply-To: <20230218002819.1486479-1-jthoughton@google.com>
Mime-Version: 1.0
References: <20230218002819.1486479-1-jthoughton@google.com>
X-Mailer: git-send-email 2.39.2.637.g21b0678d19-goog
Message-ID: <20230218002819.1486479-36-jthoughton@google.com>
Subject: [PATCH v2 35/46] hugetlb: add check to prevent refcount overflow via HGM
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu, Andrew Morton
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
 "Zach O'Keefe", Manish Mishra, Naoya Horiguchi,
 "Dr. David Alan Gilbert", "Matthew Wilcox (Oracle)", Vlastimil Babka,
 Baolin Wang, Miaohe Lin, Yang Shi, Frank van der Linden, Jiaqi Yan,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton
With high-granularity mappings, it becomes quite trivial for userspace
to overflow a page's refcount or mapcount. It can be done like so:

 1. Create a 1G hugetlbfs file with a single 1G page.
 2. Create 8192 mappings of the file.
 3. Use UFFDIO_CONTINUE to map each mapping entirely at 4K granularity.

Each time step 3 is done for a mapping, the refcount and mapcount will
increase by 2^18 (512 * 512). Do that 2^13 times (8192), and you reach
2^31.

To avoid this, WARN_ON_ONCE when the refcount goes negative. If this
happens as a result of a page fault, return VM_FAULT_SIGBUS, and if it
happens as a result of a UFFDIO_CONTINUE, return EFAULT.

We can also create too many mappings by fork()ing a lot with VMAs set
up such that page tables must be copied at fork()-time (like if we have
VM_UFFD_WP). Use try_get_page() in copy_hugetlb_page_range() to deal
with this.

Signed-off-by: James Houghton

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index c4d189e5f1fd..34368072dabe 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5397,7 +5397,10 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 		} else {
 			ptepage = pte_page(entry);
 			hpage = compound_head(ptepage);
-			get_page(hpage);
+			if (!try_get_page(hpage)) {
+				ret = -EFAULT;
+				break;
+			}
 
 			/*
 			 * Failing to duplicate the anon rmap is a rare case
@@ -6132,6 +6135,30 @@ static bool hugetlb_pte_stable(struct hstate *h, struct hugetlb_pte *hpte,
 	return same;
 }
 
+/*
+ * Like filemap_lock_folio, but check the refcount of the page afterwards
+ * to see if we are at risk of overflowing refcount back to 0.
+ *
+ * This should be used in places that can easily be used to overflow
+ * refcount, like places that create high-granularity mappings.
+ */
+static struct folio *hugetlb_try_find_lock_folio(struct address_space *mapping,
+						 pgoff_t idx)
+{
+	struct folio *folio = filemap_lock_folio(mapping, idx);
+
+	/*
+	 * This check is very similar to the one in try_get_page().
+	 *
+	 * This check is inherently racy, so WARN_ON_ONCE() if this condition
+	 * ever occurs.
+	 */
+	if (WARN_ON_ONCE(folio && folio_ref_count(folio) <= 0))
+		return ERR_PTR(-EFAULT);
+
+	return folio;
+}
+
 static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 			struct vm_area_struct *vma,
 			struct address_space *mapping, pgoff_t idx,
@@ -6168,7 +6195,15 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 	 * before we get page_table_lock.
 	 */
 	new_folio = false;
-	folio = filemap_lock_folio(mapping, idx);
+	folio = hugetlb_try_find_lock_folio(mapping, idx);
+	if (IS_ERR(folio)) {
+		/*
+		 * We don't want to invoke the OOM killer here, as we aren't
+		 * actually OOMing.
+		 */
+		ret = VM_FAULT_SIGBUS;
+		goto out;
+	}
 	if (!folio) {
 		size = i_size_read(mapping->host) >> huge_page_shift(h);
 		if (idx >= size)
@@ -6600,8 +6635,8 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 
 	if (is_continue) {
 		ret = -EFAULT;
-		folio = filemap_lock_folio(mapping, idx);
-		if (!folio)
+		folio = hugetlb_try_find_lock_folio(mapping, idx);
+		if (IS_ERR_OR_NULL(folio))
 			goto out;
 		folio_in_pagecache = true;
 	} else if (!*pagep) {