From patchwork Fri Dec 16 15:52:26 2022
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 13075140
From: Peter Xu
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: David Hildenbrand, John Hubbard, Muchun Song, Mike Kravetz, Nadav Amit, Andrea Arcangeli, Rik van Riel, peterx@redhat.com, Miaohe Lin, Jann Horn, James Houghton, Andrew Morton
Subject: [PATCH v4 8/9] mm/hugetlb: Make walk_hugetlb_range() safe to pmd unshare
Date: Fri, 16 Dec 2022 10:52:26 -0500
Message-Id: <20221216155226.2043738-1-peterx@redhat.com>
In-Reply-To: <20221216155100.2043537-1-peterx@redhat.com>
References: <20221216155100.2043537-1-peterx@redhat.com>

Since walk_hugetlb_range() walks the pgtable, it needs the vma lock
to make sure the pgtable page will not be freed concurrently.

Reviewed-by: Mike Kravetz
Reviewed-by: John Hubbard
Signed-off-by: Peter Xu
---
 include/linux/pagewalk.h | 11 ++++++++++-
 mm/hmm.c                 | 15 ++++++++++++++-
 mm/pagewalk.c            |  2 ++
 3 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/include/linux/pagewalk.h b/include/linux/pagewalk.h
index 959f52e5867d..27a6df448ee5 100644
--- a/include/linux/pagewalk.h
+++ b/include/linux/pagewalk.h
@@ -21,7 +21,16 @@ struct mm_walk;
  *			depth is -1 if not known, 0:PGD, 1:P4D, 2:PUD, 3:PMD.
  *			Any folded depths (where PTRS_PER_P?D is equal to 1)
  *			are skipped.
- * @hugetlb_entry:	if set, called for each hugetlb entry
+ * @hugetlb_entry:	if set, called for each hugetlb entry. This hook
+ *			function is called with the vma lock held, in order to
+ *			protect against a concurrent freeing of the pte_t* or
+ *			the ptl. In some cases, the hook function needs to drop
+ *			and retake the vma lock in order to avoid deadlocks
+ *			while calling other functions. In such cases the hook
+ *			function must either refrain from accessing the pte or
+ *			ptl after dropping the vma lock, or else revalidate
+ *			those items after re-acquiring the vma lock and before
+ *			accessing them.
  * @test_walk:		caller specific callback function to determine whether
  *			we walk over the current vma or not. Returning 0 means
  *			"do page table walk over the current vma", returning
diff --git a/mm/hmm.c b/mm/hmm.c
index 3850fb625dda..796de6866089 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -493,8 +493,21 @@ static int hmm_vma_walk_hugetlb_entry(pte_t *pte, unsigned long hmask,
 	required_fault =
 		hmm_pte_need_fault(hmm_vma_walk, pfn_req_flags, cpu_flags);
 	if (required_fault) {
+		int ret;
+
 		spin_unlock(ptl);
-		return hmm_vma_fault(addr, end, required_fault, walk);
+		hugetlb_vma_unlock_read(vma);
+		/*
+		 * Avoid deadlock: drop the vma lock before calling
+		 * hmm_vma_fault(), which will itself potentially take and
+		 * drop the vma lock. This is also correct from a
+		 * protection point of view, because there is no further
+		 * use here of either pte or ptl after dropping the vma
+		 * lock.
+		 */
+		ret = hmm_vma_fault(addr, end, required_fault, walk);
+		hugetlb_vma_lock_read(vma);
+		return ret;
 	}
 
 	pfn = pte_pfn(entry) + ((start & ~hmask) >> PAGE_SHIFT);
diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index 7f1c9b274906..d98564a7be57 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -302,6 +302,7 @@ static int walk_hugetlb_range(unsigned long addr, unsigned long end,
 	const struct mm_walk_ops *ops = walk->ops;
 	int err = 0;
 
+	hugetlb_vma_lock_read(vma);
 	do {
 		next = hugetlb_entry_end(h, addr, end);
 		pte = huge_pte_offset(walk->mm, addr & hmask, sz);
@@ -314,6 +315,7 @@ static int walk_hugetlb_range(unsigned long addr, unsigned long end,
 		if (err)
 			break;
 	} while (addr = next, addr != end);
+	hugetlb_vma_unlock_read(vma);
 
 	return err;
 }