From patchwork Thu Jul 20 19:34:36 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jann Horn X-Patchwork-Id: 13321010 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9FC89EB64DA for ; Thu, 20 Jul 2023 19:34:48 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 198FD280156; Thu, 20 Jul 2023 15:34:48 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 149B728004C; Thu, 20 Jul 2023 15:34:48 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id F2BDB280156; Thu, 20 Jul 2023 15:34:47 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id DE33928004C for ; Thu, 20 Jul 2023 15:34:47 -0400 (EDT) Received: from smtpin17.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay08.hostedemail.com (Postfix) with ESMTP id A840C14026C for ; Thu, 20 Jul 2023 19:34:47 +0000 (UTC) X-FDA: 81032992614.17.24C859B Received: from mail-ed1-f48.google.com (mail-ed1-f48.google.com [209.85.208.48]) by imf23.hostedemail.com (Postfix) with ESMTP id DAC45140025 for ; Thu, 20 Jul 2023 19:34:45 +0000 (UTC) Authentication-Results: imf23.hostedemail.com; dkim=pass header.d=google.com header.s=20221208 header.b=th0hsoe8; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf23.hostedemail.com: domain of jannh@google.com designates 209.85.208.48 as permitted sender) smtp.mailfrom=jannh@google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1689881686; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:references:dkim-signature; bh=t32Deh0rq1ive9pTh7+v9bOa6uVnCR209D3Ju5e1XgA=; b=JIJ+ZV3FqzEm5NgXyyLx+anpzWHTmn8TDs07tPXderzkC0h3Uh1SF+aI+bmkY9Zg3uLFUa rnl/OCa5PRi/JEdzW6hzK0YFXVxgjQVlnyFyHnd8BumzNAAGOcSs5PMhsC2TulPU13OtIJ MfztcbAChlwJU/UEsMydgkHNi9f40ek= ARC-Authentication-Results: i=1; imf23.hostedemail.com; dkim=pass header.d=google.com header.s=20221208 header.b=th0hsoe8; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf23.hostedemail.com: domain of jannh@google.com designates 209.85.208.48 as permitted sender) smtp.mailfrom=jannh@google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1689881686; a=rsa-sha256; cv=none; b=wKx/ap/94yKa4+J8N+oVKBac50rQuTSwOfzFpTHcShK6Ukj3zV2GIpTYHP4jZfcEv1R6o6 eJs/jBgJL+p1pdClox01+lcQetvs1N1EPQyuS/QKBJTmdUBrfVNvOyQ8MswvFS8EBvUVAQ DV6O+tD7z8u8WvLTErOemAeFwsUCMOg= Received: by mail-ed1-f48.google.com with SMTP id 4fb4d7f45d1cf-516500163b2so14a12.1 for ; Thu, 20 Jul 2023 12:34:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1689881684; x=1690486484; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:from:to:cc:subject:date:message-id:reply-to; bh=t32Deh0rq1ive9pTh7+v9bOa6uVnCR209D3Ju5e1XgA=; b=th0hsoe8iAVfUQpoHROmt10fhpl4jBCsJhrVBKS1gSyCfcM9WdbyDtBevXnASCiXLZ /5gGvHfA3Ly1z4ShyZNYI6Ouineppw63pczjip6MfLVuuwHYgkL0kRoOvBZtGR+HU+9q lwjGA5WGrltSoNwL++z3wDvNYxxtQ103MVoXWK3270l+X0wJamiZ0ex+3ta+zbz6Itp0 IZLPLocq/lHtU2y6aRUNawA6J3tjhrQP/W4JtdN1rfjtmvEWrBrdxyGYQ7vFsWF27IOV 3L6g9yU7Vse8qeHAsFjCESR10hy782RK7M567CrQ2eduKKro7rp+QhIzT19cwJgY7rLL Qavw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1689881684; x=1690486484; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=t32Deh0rq1ive9pTh7+v9bOa6uVnCR209D3Ju5e1XgA=; b=GgUkTANGCsdUndboF8SzIpYsVadeJFVir4uW3VVevlcvHufvSSipAuc38sp+wKYtWW D1NnhpYHE+zva7SJgENWYi/2q2e7qxlnTiBlY1cnBaCS8h/k6SRRj9x3kdsj/S2VbI4d qk4XUkVRq0lddl9B3X270AqqsjKwf6EQlncT7E6NtKn5ngySypabB8BPyduFlHgbMxRH wx6Pm5+7h23TmxA++ZcTmNsIsYEIcODOH0qIep0LGYbOizgB1ghZkctw5MR+NdFIWwlr 7XBNLNboAE8LFzSxOXl5EmIXR7ZqKNG5XSY3ywVsVP6CVlHIg8wprl5tENjjHtzBCt/r +hbg== X-Gm-Message-State: ABy/qLbNFQXg/KjSCIFi1E+mVBW0jb1f2ZXA2nV35eCmK+rxn3jPgUYt ghH63IZQM05cPG866oCSXnNq7Q== X-Google-Smtp-Source: APBJJlEUI0k7zCtePQO9GpQLmLYNX6ZSbQuzDmKpGLap3hFDkG6Np3m+JsOlNPqAXd/RvLATa1bjAA== X-Received: by 2002:a50:99d7:0:b0:50b:c48c:8a25 with SMTP id n23-20020a5099d7000000b0050bc48c8a25mr17186edb.6.1689881684072; Thu, 20 Jul 2023 12:34:44 -0700 (PDT) Received: from localhost ([2a00:79e0:9d:4:5f41:554c:a4f4:69d]) by smtp.gmail.com with ESMTPSA id 12-20020a05600c020c00b003fc16ee2864sm1815609wmi.48.2023.07.20.12.34.43 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 20 Jul 2023 12:34:43 -0700 (PDT) From: Jann Horn To: Andrew Morton Cc: Suren Baghdasaryan , linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [PATCH v2] mm: Don't drop VMA locks in mm_drop_all_locks() Date: Thu, 20 Jul 2023 21:34:36 +0200 Message-ID: <20230720193436.454247-1-jannh@google.com> X-Mailer: git-send-email 2.41.0.487.g6d72f3e995-goog MIME-Version: 1.0 X-Rspam-User: X-Stat-Signature: nkxcsxhsmdg1f4uyknohxbak3jozoqht X-Rspamd-Server: rspam07 X-Rspamd-Queue-Id: DAC45140025 X-HE-Tag: 1689881685-754056 X-HE-Meta: U2FsdGVkX193WW9UsvHxSm5PNb04jBdZ7OU7eo5mdDNn7oXabXAOUrZenYo73Lkh6gEY6/VS0h9MTVPdccaviJjUp08LZYpTSgMHAufd0JMudoAC0+oVNZ22VAz5cUC/nwjfl4uc2gEVARUx+XnDzj66y2TnPMHEKZXaNGWcOXxS2QcNnYYfk4SvL4gNrCCvZT62JP85GfBr9xJBoj1BHBFnVy5okKszdhAk+WYbPoGnLTBeX874q05HHE8+BmCPRAaEd1YfV0g1FpguSDkE/cYS1dJLlwR/E0EhgQSNl7cftGbrDHiSWyqZp5hI674slvvcE1gUv4jPyJ2lYpRnozM5MVCImFH9SE7zFth75lIvS1ISWgRmuguVSUyE4qeOr7whcOZVqCk4miG8ZpvlMab+hH2M3+26vNAPkRp+haprfck/6FMhwH31O5+aQvyMtYu9jW+IklilBvmXZT6nwoxC29/H2nGhVoavdf0t8c5gjQ/mQf/dOxw8s53S4XPBWdt+hyCrVjiZl4KfP3GZXyf7WKfARxET8k78I/eUW47SYr75+F3JQ/+Bsk31b6tgx1FK7gJJ39p1eBwhPJN5ZejZoeR5dw2qYP5pTW02hZxCfM3FnufmOowdq/luMK8VnIlGG0iDvicmPpC5FXmmNXSQ1UMCR9On/KftDLDic+2Kr0VGoshngNAbMXAi/wSKp32jK72DeF4+bO5EbVsSAtLdKAQaxpvXYS7MRT9xICcA+Mm8dxPpZhKHpxj0QWZwu9eIpmjh7vQabDm1fZyDTFYHevuaQlXmG1Mn9AHyu1nwMM85ooqDoGytMCJr45+YygikhRGGqCEYuDnhjlE1Bh9FBIv9TF7xl9ewYLZekR1WISVulACVoDIeLP3ayR1zBhahWJSpLorXA+MCULdLTZEW/JwpOdq1oG4FhIkwN+He0iqB0kw2C55km7qBd25pVW35mTGykj7XMrZQEr/ YfGD9AvB e0Zn5y3KUR8l21thB47rqLoX7p67rb06AultanVIO+2knAPGgBtUMtjIY/5+hZ/vZEU4DIkuoHPnmUuTf/hKbgmKHZDAWWPU2n1Dg/Mi4D7zvT0UkMRBXaBjLuB978Ya6mZD0TML2lyySYxNS8i/4QThSGC7cos3VCpCIf2hW0Xwh2d8+IzEMq7+HLHgU0xXH8iIp89DX9UAodfn2CVRslrN8RQoowbuAS+lmvFEwvpOlDSe31sc0NGq2ZPwz6d4pL/lOnnaCKZjlzHorJKtqkoDWwukvF5JaJVdCfN+X2RtTcQGUUZVsEadFOyiJIb0JpIuKakzCL1rsXivtgS/yPXU6HEB9ZRsL2w9iBthTqN9YVdXIQRU70z1GFjQx3E17o2EWo6+lxPy08cW16rUZ8n1CBNShFjKd+uNnHvd9KKGSzo0= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Despite its name, mm_drop_all_locks() does not drop _all_ locks; the mmap lock is held write-locked by the caller, and the caller is responsible for dropping the mmap lock at a later point (which will also release the VMA locks). Calling vma_end_write_all() here is dangerous because the caller might have write-locked a VMA with the expectation that it will stay write-locked until the mmap_lock is released, as usual. This _almost_ becomes a problem in the following scenario: An anonymous VMA A and an SGX VMA B are mapped adjacent to each other. Userspace calls munmap() on a range starting at the start address of A and ending in the middle of B. Hypothetical call graph with additional notes in brackets: do_vmi_align_munmap [begin first for_each_vma_range loop] vma_start_write [on VMA A] vma_mark_detached [on VMA A] __split_vma [on VMA B] sgx_vma_open [== new->vm_ops->open] sgx_encl_mm_add __mmu_notifier_register [luckily THIS CAN'T ACTUALLY HAPPEN] mm_take_all_locks mm_drop_all_locks vma_end_write_all [drops VMA lock taken on VMA A before] vma_start_write [on VMA B] vma_mark_detached [on VMA B] [end first for_each_vma_range loop] vma_iter_clear_gfp [removes VMAs from maple tree] mmap_write_downgrade unmap_region mmap_read_unlock In this hypothetical scenario, while do_vmi_align_munmap() thinks it still holds a VMA write lock on VMA A, the VMA write lock has actually been invalidated inside __split_vma(). The call from sgx_encl_mm_add() to __mmu_notifier_register() can't actually happen here, as far as I understand, because we are duplicating an existing SGX VMA, but sgx_encl_mm_add() only calls __mmu_notifier_register() for the first SGX VMA created in a given process. So this could only happen in fork(), not on munmap(). But in my view it is just pure luck that this can't happen. Also, we wouldn't actually have any bad consequences from this in do_vmi_align_munmap(), because by the time the bug drops the lock on VMA A, we've already marked VMA A as detached, which makes it completely ineligible for any VMA-locked page faults. But again, that's just pure luck. So remove the vma_end_write_all(), so that VMA write locks are only ever released on mmap_write_unlock() or mmap_write_downgrade(). Also add comments to document the locking rules established by this patch. Fixes: eeff9a5d47f8 ("mm/mmap: prevent pagefault handler from racing with mmu_notifier registration") Reviewed-by: Suren Baghdasaryan Signed-off-by: Jann Horn --- v2: - add comments (Suren) include/linux/mm.h | 5 +++++ include/linux/mmap_lock.h | 8 ++++++++ mm/mmap.c | 7 ++++++- 3 files changed, 19 insertions(+), 1 deletion(-) base-commit: bfa3037d828050896ae52f6467b6ca2489ae6fb1 diff --git a/include/linux/mm.h b/include/linux/mm.h index 2dd73e4f3d8e..20b1fe17cccf 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -680,6 +680,11 @@ static bool __is_vma_write_locked(struct vm_area_struct *vma, int *mm_lock_seq) return (vma->vm_lock_seq == *mm_lock_seq); } +/* + * Begin writing to a VMA. + * Exclude concurrent readers under the per-VMA lock until the currently + * write-locked mmap_lock is dropped or downgraded. + */ static inline void vma_start_write(struct vm_area_struct *vma) { int mm_lock_seq; diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h index aab8f1b28d26..0b1d8430a638 100644 --- a/include/linux/mmap_lock.h +++ b/include/linux/mmap_lock.h @@ -73,6 +73,14 @@ static inline void mmap_assert_write_locked(struct mm_struct *mm) } #ifdef CONFIG_PER_VMA_LOCK +/* + * Drop all currently-held per-VMA locks. + * This is called from the mmap_lock implementation directly before releasing + * a write-locked mmap_lock (or downgrading it to read-locked). + * This should normally NOT be called manually from other places. + * If you want to call this manually anyway, keep in mind that this will release + * *all* VMA write locks, including ones from further up the stack. + */ static inline void vma_end_write_all(struct mm_struct *mm) { mmap_assert_write_locked(mm); diff --git a/mm/mmap.c b/mm/mmap.c index 3eda23c9ebe7..3dc509fb2102 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -3662,6 +3662,12 @@ int mm_take_all_locks(struct mm_struct *mm) mutex_lock(&mm_all_locks_mutex); + /* + * vma_start_write() does not have a complement in mm_drop_all_locks() + * because vma_start_write() is always asymmetrical; it marks a VMA as + * being written to until mmap_write_unlock() or mmap_write_downgrade() + * is reached. + */ mas_for_each(&mas, vma, ULONG_MAX) { if (signal_pending(current)) goto out_unlock; @@ -3758,7 +3764,6 @@ void mm_drop_all_locks(struct mm_struct *mm) if (vma->vm_file && vma->vm_file->f_mapping) vm_unlock_mapping(vma->vm_file->f_mapping); } - vma_end_write_all(mm); mutex_unlock(&mm_all_locks_mutex); }