From patchwork Tue Oct 25 22:01:36 2022
X-Patchwork-Submitter: Ira Weiny
X-Patchwork-Id: 13019921
From: ira.weiny@intel.com
To: Andrew Morton
Cc: Ira Weiny, Matthew Wilcox, Andrea Arcangeli, Peter Xu,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH V2] mm/userfaultfd: Replace kmap/kmap_atomic() with kmap_local_page()
Date: Tue, 25 Oct 2022 15:01:36 -0700
Message-Id: <20221025220136.2366143-1-ira.weiny@intel.com>
X-Mailer: git-send-email 2.37.2
From: Ira Weiny <ira.weiny@intel.com>

kmap() and kmap_atomic() are being deprecated in favor of
kmap_local_page(), which is appropriate for any thread-local context.[1]

A recent locking bug report with userfaultfd showed that the conversion
of the kmap_atomic() calls in those code flows requires care with regard
to the prevention of deadlock.[2]

git archaeology implied that the recursion may not be an actual bug.[3]
However, depending on the implementation of the mmap_lock and the
condition of the call, there may still be a deadlock.[4]  So this is not
purely a lockdep issue.  Considering a single-threaded call stack, there
are 3 options:

	1) Different mm's are in play (no issue)
	2) Readlock implementation is recursive and same mm is in play
	   (no issue)
	3) Readlock implementation is _not_ recursive (issue)

The mmap_lock is recursive, so with a single thread there is no issue.
However, Matthew pointed out a deadlock scenario when additional
processes and threads are considered:

	"The readlock implementation is only recursive if nobody else has
	taken a write lock.  If you have a multithreaded process, one of
	the other threads can call mmap() and that will prevent recursion
	(due to fairness).  Even if it's a different process that you're
	trying to acquire the mmap read lock on, you can still get into a
	deadly embrace.  eg:

	process A thread 1 takes read lock on own mmap_lock
	process A thread 2 calls mmap, blocks taking write lock
	process B thread 1 takes page fault, read lock on own mmap lock
	process B thread 2 calls mmap, blocks taking write lock
	process A thread 1 blocks taking read lock on process B
	process B thread 1 blocks taking read lock on process A

	Now all four threads are blocked waiting for each other."

Regardless, using pagefault_disable() ensures that, no matter which
locking implementation is used, a deadlock will not occur.

Complete the kmap conversion in userfaultfd by replacing the kmap() and
kmap_atomic() calls with kmap_local_page().  When replacing the
kmap_atomic() call, ensure page faults continue to be disabled to
support the correct fallback behavior, and add a comment to inform
future souls of the requirement.
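To make the conversion concrete, the before/after pattern for the
in-lock copy is sketched below (illustrative only, not part of the diff;
kmap_atomic() disables page faults implicitly, kmap_local_page() does
not, so the disable must become explicit):

	/* Before: page faults implicitly disabled by kmap_atomic(). */
	page_kaddr = kmap_atomic(page);
	ret = copy_from_user(page_kaddr,
			     (const void __user *) src_addr, PAGE_SIZE);
	kunmap_atomic(page_kaddr);

	/*
	 * After: kmap_local_page() leaves page faults enabled, so disable
	 * them explicitly to keep the copy non-faulting under mmap_lock.
	 */
	page_kaddr = kmap_local_page(page);
	pagefault_disable();
	ret = copy_from_user(page_kaddr,
			     (const void __user *) src_addr, PAGE_SIZE);
	pagefault_enable();
	kunmap_local(page_kaddr);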
[1] https://lore.kernel.org/all/20220813220034.806698-1-ira.weiny@intel.com/
[2] https://lore.kernel.org/all/Y1Mh2S7fUGQ%2FiKFR@iweiny-desk3/
[3] https://lore.kernel.org/all/Y1MymJ%2FINb45AdaY@iweiny-desk3/
[4] https://lore.kernel.org/lkml/Y1bXBtGTCym77%2FoD@casper.infradead.org/

Cc: Matthew Wilcox
Cc: Andrew Morton
Cc: Andrea Arcangeli
Cc: Peter Xu
Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from V1
	Update the commit message and comment based on additional discussion
	Thanks to Matt for pointing out the deadlock potential despite
	recursive reads
---
 mm/userfaultfd.c | 25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index e24e8a47ce8a..3d0fef3980b3 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -157,11 +157,28 @@ static int mcopy_atomic_pte(struct mm_struct *dst_mm,
 		if (!page)
 			goto out;
 
-		page_kaddr = kmap_atomic(page);
+		page_kaddr = kmap_local_page(page);
+		/*
+		 * The read mmap_lock is held here.  Despite the
+		 * mmap_lock being read recursive a deadlock is still
+		 * possible if a writer has taken a lock.  For example:
+		 *
+		 * process A thread 1 takes read lock on own mmap_lock
+		 * process A thread 2 calls mmap, blocks taking write lock
+		 * process B thread 1 takes page fault, read lock on own mmap lock
+		 * process B thread 2 calls mmap, blocks taking write lock
+		 * process A thread 1 blocks taking read lock on process B
+		 * process B thread 1 blocks taking read lock on process A
+		 *
+		 * Disable page faults to prevent potential deadlock
+		 * and retry the copy outside the mmap_lock.
+		 */
+		pagefault_disable();
 		ret = copy_from_user(page_kaddr,
 				     (const void __user *) src_addr,
 				     PAGE_SIZE);
-		kunmap_atomic(page_kaddr);
+		pagefault_enable();
+		kunmap_local(page_kaddr);
 
 		/* fallback to copy_from_user outside mmap_lock */
 		if (unlikely(ret)) {
@@ -646,11 +663,11 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm,
 			mmap_read_unlock(dst_mm);
 			BUG_ON(!page);
 
-			page_kaddr = kmap(page);
+			page_kaddr = kmap_local_page(page);
 			err = copy_from_user(page_kaddr,
 					     (const void __user *) src_addr,
 					     PAGE_SIZE);
-			kunmap(page);
+			kunmap_local(page_kaddr);
 			if (unlikely(err)) {
 				err = -EFAULT;
 				goto out;
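For context, the two hunks above cooperate roughly as follows (a
condensed sketch of the surrounding flow in mm/userfaultfd.c, not
literal kernel code; error handling and the retry loop are elided):

	/* mcopy_atomic_pte(), called with the mmap_lock held for read */
	page_kaddr = kmap_local_page(page);
	pagefault_disable();		/* a fault here could deadlock */
	ret = copy_from_user(page_kaddr,
			     (const void __user *) src_addr, PAGE_SIZE);
	pagefault_enable();
	kunmap_local(page_kaddr);
	if (unlikely(ret))
		return -ENOENT;		/* ask the caller to retry */

	/* __mcopy_atomic(): on -ENOENT, redo the copy without the lock */
	mmap_read_unlock(dst_mm);
	page_kaddr = kmap_local_page(page);
	err = copy_from_user(page_kaddr,
			     (const void __user *) src_addr, PAGE_SIZE);
	kunmap_local(page_kaddr);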