From patchwork Tue Oct 25 22:01:08 2022
X-Patchwork-Submitter: Ira Weiny
X-Patchwork-Id: 13019920
From: ira.weiny@intel.com
To: Andrew Morton
Cc: Ira Weiny, Randy Dunlap, Peter Xu, Andrea Arcangeli, Matthew Wilcox,
    kernel test robot, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH V2] mm/shmem: Ensure proper fallback if page faults
Date: Tue, 25 Oct 2022 15:01:08 -0700
Message-Id: <20221025220108.2366043-1-ira.weiny@intel.com>
X-Mailer: git-send-email 2.37.2
From: Ira Weiny <ira.weiny@intel.com>

The kernel test robot flagged a recursive lock as a result of a
conversion from kmap_atomic() to kmap_local_folio().[Link]  The cause
is that the code depends on the kmap_atomic() side effect of disabling
page faults.  In that case the code expects the fault to fail and the
fallback path to be taken.

git archaeology implied that the recursion may not be an actual bug.[1]
However, depending on the implementation of the mmap_lock and the
conditions of the call, there may still be a deadlock.[2]  So this is
not purely a lockdep issue.  Considering a single-threaded call stack
there are 3 options.

	1) Different mm's are in play (no issue)
	2) Readlock implementation is recursive and same mm is in play
	   (no issue)
	3) Readlock implementation is _not_ recursive (issue)

The mmap_lock read lock is recursive, so with a single thread there is
no issue.  However, Matthew pointed out a deadlock scenario when
additional processes and threads are considered:

	"The readlock implementation is only recursive if nobody else
	has taken a write lock.  If you have a multithreaded process,
	one of the other threads can call mmap() and that will prevent
	recursion (due to fairness).  Even if it's a different process
	that you're trying to acquire the mmap read lock on, you can
	still get into a deadly embrace.  eg:

	process A thread 1 takes read lock on own mmap_lock
	process A thread 2 calls mmap, blocks taking write lock
	process B thread 1 takes page fault, read lock on own mmap lock
	process B thread 2 calls mmap, blocks taking write lock
	process A thread 1 blocks taking read lock on process B
	process B thread 1 blocks taking read lock on process A

	Now all four threads are blocked waiting for each other."

In any case, using pagefault_disable() ensures that a deadlock cannot
occur no matter which locking implementation is used.

Add an explicit pagefault_disable() and a big comment to explain this
for future souls looking at this code.
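As an aside (illustration only, not part of the patch), the behavioral
difference between the two mapping patterns can be sketched as below.
The wrapper functions are made up for the example; the kernel APIs they
call are real.

/*
 * Illustrative sketch only -- the wrapper names are hypothetical.
 * copy_from_user() returns the number of bytes NOT copied, so a
 * non-zero return here means "fault suppressed, take the fallback".
 */
#include <linux/highmem.h>
#include <linux/mm.h>
#include <linux/uaccess.h>

/* Old pattern: kmap_atomic() disables page faults as a side effect. */
static unsigned long copy_with_kmap_atomic(struct page *page,
					   unsigned long src_addr)
{
	void *kaddr = kmap_atomic(page);
	unsigned long ret;

	ret = copy_from_user(kaddr, (const void __user *)src_addr, PAGE_SIZE);
	kunmap_atomic(kaddr);
	return ret;
}

/* New pattern: kmap_local_folio() leaves faults enabled, so disable
 * them explicitly around the copy. */
static unsigned long copy_with_kmap_local(struct folio *folio,
					  unsigned long src_addr)
{
	void *kaddr = kmap_local_folio(folio, 0);
	unsigned long ret;

	pagefault_disable();
	ret = copy_from_user(kaddr, (const void __user *)src_addr, PAGE_SIZE);
	pagefault_enable();
	kunmap_local(kaddr);
	return ret;
}

Without the explicit pagefault_disable(), a fault on src_addr while the
mmap_lock read lock is held is what opens the window for the deadly
embrace described above.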
[1] https://lore.kernel.org/all/Y1MymJ%2FINb45AdaY@iweiny-desk3/
[2] https://lore.kernel.org/lkml/Y1bXBtGTCym77%2FoD@casper.infradead.org/

Fixes: 7a7256d5f512 ("shmem: convert shmem_mfill_atomic_pte() to use a folio")
Cc: Andrew Morton
Cc: Randy Dunlap
Cc: Peter Xu
Cc: Andrea Arcangeli
Reported-by: Matthew Wilcox (Oracle)
Reported-by: kernel test robot
Link: https://lore.kernel.org/r/202210211215.9dc6efb5-yujie.liu@intel.com
Signed-off-by: Ira Weiny
---
Changes from V1
	Update the commit message and comment based on additional
	discussion.
	Thanks to Matt for pointing out the deadlock potential despite
	recursive reads.
	Thanks to Matt and Andrew for initial diagnosis.
	Thanks to Randy for pointing out C code needs ';' :-D
	Thanks to Andrew for suggesting an elaborate comment.
	Thanks to Peter for pointing out that the mm's may be the same.
---
 mm/shmem.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/mm/shmem.c b/mm/shmem.c
index 8280a5cb48df..c1d8b8a1aa3b 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2424,9 +2424,26 @@ int shmem_mfill_atomic_pte(struct mm_struct *dst_mm,
 		if (!zeropage) {	/* COPY */
 			page_kaddr = kmap_local_folio(folio, 0);
+			/*
+			 * The read mmap_lock is held here.  Despite the
+			 * mmap_lock being read recursive a deadlock is still
+			 * possible if a writer has taken a lock.  For example:
+			 *
+			 * process A thread 1 takes read lock on own mmap_lock
+			 * process A thread 2 calls mmap, blocks taking write lock
+			 * process B thread 1 takes page fault, read lock on own mmap lock
+			 * process B thread 2 calls mmap, blocks taking write lock
+			 * process A thread 1 blocks taking read lock on process B
+			 * process B thread 1 blocks taking read lock on process A
+			 *
+			 * Disable page faults to prevent potential deadlock
+			 * and retry the copy outside the mmap_lock.
+			 */
+			pagefault_disable();
 			ret = copy_from_user(page_kaddr,
 					     (const void __user *)src_addr,
 					     PAGE_SIZE);
+			pagefault_enable();
 			kunmap_local(page_kaddr);

 			/* fallback to copy_from_user outside mmap_lock */