From patchwork Sun Oct 30 21:29:21 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 13025214
From: Peter Xu
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Andrew Morton, James Houghton, Miaohe Lin, David Hildenbrand,
    Muchun Song, Andrea Arcangeli, Nadav Amit, Mike Kravetz,
    peterx@redhat.com, Rik van Riel
Subject: [PATCH RFC 02/10] mm/hugetlb: Comment huge_pte_offset() for its
 locking requirements
Date: Sun, 30 Oct 2022 17:29:21 -0400
Message-Id: <20221030212929.335473-3-peterx@redhat.com>
X-Mailer: git-send-email 2.37.3
In-Reply-To: <20221030212929.335473-1-peterx@redhat.com>
References: <20221030212929.335473-1-peterx@redhat.com>

huge_pte_offset() is potentially a pgtable walker, looking up the pte_t*
for a hugetlb address.

Normally, it is safe to walk the pgtable as long as the mmap lock is held
for either read or write, because that guarantees the pgtable pages stay
valid during the walk.

That is not true for hugetlbfs: hugetlbfs has the pmd sharing feature,
which means that even with the mmap lock held, the PUD pgtable page can
still go away from under us if pmd unsharing is possible during the walk.

Unsharing is not always possible, e.g.:

  (1) If the mapping is private, we're not prone to pmd sharing or
      unsharing, so it's okay.

  (2) If we hold the hugetlb vma lock for either read or write, it's okay
      too, because pmd unshare cannot happen at all.

Document all of this explicitly for huge_pte_offset(), because it's really
not that obvious.  This also tells callers what they need to guarantee for
huge_pte_offset() to be thread-safe.

Signed-off-by: Peter Xu
---
 arch/arm64/mm/hugetlbpage.c | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 35e9a468d13e..0bf930c75d4b 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -329,6 +329,38 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
 	return ptep;
 }
 
+/*
+ * huge_pte_offset(): Walk the hugetlb pgtable until the last level PTE.
+ * Returns the pte_t* if found, or NULL if the address is not mapped.
+ *
+ * NOTE: since this function will walk all the pgtable pages (including not
+ * only high-level pgtable page, but also PUD that can be unshared
+ * concurrently for VM_SHARED), the caller of this function should be
+ * responsible for its thread safety.  One can follow this rule:
+ *
+ *  (1) For private mappings: pmd unsharing is not possible, so it'll
+ *      always be safe if we hold the mmap sem for either read or write.
+ *      This is normally always the case, so IOW we don't need to do
+ *      anything special.
+ *
+ *  (2) For shared mappings: pmd unsharing is possible (so the PUD-ranged
+ *      pgtable page can go away from under us!  It can be done by a pmd
+ *      unshare with a follow up munmap() on the other process), then we
+ *      need either:
+ *
+ *     (2.1) hugetlb vma lock read or write held, to make sure pmd unshare
+ *           won't happen upon the range (it also makes sure the pte_t we
+ *           read is the right and stable one), or,
+ *
+ *     (2.2) RCU read lock, to make sure that even if pmd unsharing
+ *           happened, the old shared PUD page won't get freed from under
+ *           us, so even if the pteval can be obsolete, at least it's
+ *           still always safe to access the pgtable page (e.g.,
+ *           de-referencing pte_t* would not cause use-after-free).
+ *
+ * PS: regarding (2.2), it's the same logic as fast-gup being safe
+ * for generic mm, as long as RCU is used to free any pgtable page.
+ */
 pte_t *huge_pte_offset(struct mm_struct *mm,
 		       unsigned long addr, unsigned long sz)
 {
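
For readers who want the rule from the comment in a form that can be checked, here is a minimal userspace sketch of it as a decision function. It is not kernel code and not part of this patch: struct walk_ctx, enum walk_safety and huge_pte_walk_safety() are hypothetical names made up for illustration, and the model only encodes cases (1), (2.1) and (2.2) as the comment states them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the caller's state; these names are not kernel API. */
struct walk_ctx {
	bool vm_shared;       /* mapping is shared, so pmd unsharing is possible */
	bool mmap_lock_held;  /* mmap lock held for read or write */
	bool vma_lock_held;   /* hugetlb vma lock held for read or write */
	bool rcu_read_locked; /* inside an RCU read-side critical section */
};

enum walk_safety {
	WALK_UNSAFE,           /* pgtable page may go away under the walker */
	WALK_SAFE_MAYBE_STALE, /* no use-after-free, but pteval may be obsolete */
	WALK_SAFE,             /* pgtable page cannot go away during the walk */
};

/* Encodes rules (1), (2.1) and (2.2) from the huge_pte_offset() comment. */
static enum walk_safety huge_pte_walk_safety(const struct walk_ctx *c)
{
	assert(c != NULL);

	/* (1) Private mapping: no pmd unsharing, the mmap lock is enough. */
	if (!c->vm_shared)
		return c->mmap_lock_held ? WALK_SAFE : WALK_UNSAFE;

	/* (2.1) Shared mapping with the vma lock: pmd unshare cannot happen. */
	if (c->vma_lock_held)
		return WALK_SAFE;

	/* (2.2) Shared mapping under RCU: page is not freed, value may be old. */
	if (c->rcu_read_locked)
		return WALK_SAFE_MAYBE_STALE;

	/* Shared mapping with only the mmap lock (or nothing) is not enough. */
	return WALK_UNSAFE;
}
```

The case worth noting is a shared mapping with only the mmap lock held: the model returns WALK_UNSAFE there, which is exactly the situation the commit message warns about.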