From patchwork Mon Feb 5 19:18:41 2024
X-Patchwork-Submitter: Breno Leitao
X-Patchwork-Id: 13546071
From: Breno Leitao
To: mike.kravetz@oracle.com, linux-mm@kvack.org, akpm@linux-foundation.org,
 muchun.song@linux.dev
Cc: lstoakes@gmail.com, willy@infradead.org, hannes@cmpxchg.org,
 mhocko@kernel.org, roman.gushchin@linux.dev, linux-kernel@vger.kernel.org,
 Rik van Riel
Subject: [PATCH v2 1/2] mm/hugetlb: Restore the reservation if needed
Date: Mon, 5 Feb 2024 11:18:41 -0800
Message-Id: <20240205191843.4009640-2-leitao@debian.org>
In-Reply-To: <20240205191843.4009640-1-leitao@debian.org>
References: <20240205191843.4009640-1-leitao@debian.org>
Currently there is a bug where a huge page could be stolen, and when the
original owner tries to fault it in, it causes a page fault.
You can reproduce it as follows:

1) Create a single huge page:
	echo 1 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

2) mmap() the page above with MAP_HUGETLB into (void *ptr1).
	* This will mark the page as reserved.

3) Touch the page, which causes a page fault and allocates the page.
	* This will move the page out of the free list.
	* It will also unreserve the page, since there is no more free
	  page.

4) madvise(MADV_DONTNEED) the page.
	* This will free the page, but not mark it as reserved again.

5) Allocate a second page with mmap(MAP_HUGETLB) into (void *ptr2).
	* This should fail, since there is no available page left.
	* But, since the page above was not re-reserved, this mmap()
	  succeeds.

6) Fault at ptr1, which raises SIGBUS.
	* It will try to allocate a huge page, but there is none
	  available.

A full reproducer is in the selftest. See
https://lore.kernel.org/all/20240105155419.1939484-1-leitao@debian.org/

Fix this by restoring the reserved page if necessary. These are the
conditions for restoring the reservation:

 * The system is not using surplus pages. The goal is to reduce
   surplus usage for this case.
 * The VMA has the HPAGE_RESV_OWNER flag set, and is PRIVATE. This is
   safely checked using __vma_private_lock().
 * The page is anonymous.

Once this scenario is found, set the `hugetlb_restore_reserve` bit in
the folio. Then check if the resv reservation needs to be adjusted.
This is done later, after releasing the spinlock, since the
vma_xxxx_reservation() functions might take the file system lock.
Suggested-by: Rik van Riel
Signed-off-by: Breno Leitao
---
 mm/hugetlb.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index ed1581b670d4..44f1e6366d04 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5585,6 +5585,7 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
 	struct page *page;
 	struct hstate *h = hstate_vma(vma);
 	unsigned long sz = huge_page_size(h);
+	bool adjust_reservation = false;
 	unsigned long last_addr_mask;
 	bool force_flush = false;
 
@@ -5677,7 +5678,31 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
 
 		hugetlb_count_sub(pages_per_huge_page(h), mm);
 		hugetlb_remove_rmap(page_folio(page));
 
+		/*
+		 * Restore the reservation for an anonymous page, otherwise
+		 * the backing page could be stolen by someone.
+		 * If we are freeing a surplus page, do not set the restore
+		 * reservation bit.
+		 */
+		if (!h->surplus_huge_pages && __vma_private_lock(vma) &&
+		    folio_test_anon(page_folio(page))) {
+			folio_set_hugetlb_restore_reserve(page_folio(page));
+			/* Reservation to be adjusted after the spin lock */
+			adjust_reservation = true;
+		}
+
 		spin_unlock(ptl);
+
+		/*
+		 * Adjust the reservation for the region that will have the
+		 * reserve restored. Keep in mind that vma_needs_reservation() changes
+		 * resv->adds_in_progress if it succeeds. If this is not done,
+		 * do_exit() will not see it, and will keep the reservation
+		 * forever.
+		 */
+		if (adjust_reservation && vma_needs_reservation(h, vma, address))
+			vma_add_reservation(h, vma, address);
+
 		tlb_remove_page_size(tlb, page, huge_page_size(h));
 		/*
 		 * Bail out after unmapping reference page if supplied