From patchwork Wed Jun 19 00:05:07 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Elliot Berman
X-Patchwork-Id: 13703203
From: Elliot Berman
Date: Tue, 18 Jun 2024 17:05:07 -0700
Subject: [PATCH RFC 1/5] mm/gup: Move GUP_PIN_COUNTING_BIAS to page_ref.h
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org
Message-ID: <20240618-exclusive-gup-v1-1-30472a19c5d1@quicinc.com>
References: <20240618-exclusive-gup-v1-0-30472a19c5d1@quicinc.com>
In-Reply-To: <20240618-exclusive-gup-v1-0-30472a19c5d1@quicinc.com>
To: Andrew Morton, Shuah Khan, David Hildenbrand, Matthew Wilcox
CC: Elliot Berman, Fuad Tabba
X-Mailer: b4 0.13.0

From: Fuad Tabba

No functional change intended.

Signed-off-by: Fuad Tabba
Signed-off-by: Elliot Berman
---
 include/linux/mm.h       | 32 --------------------------------
 include/linux/page_ref.h | 32 ++++++++++++++++++++++++++++++++
 2 files changed, 32 insertions(+), 32 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 9849dfda44d43..fd0d10b08e7ac 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1580,38 +1580,6 @@ static inline void put_page(struct page *page)
 	folio_put(folio);
 }
 
-/*
- * GUP_PIN_COUNTING_BIAS, and the associated functions that use it, overload
- * the page's refcount so that two separate items are tracked: the original page
- * reference count, and also a new count of how many pin_user_pages() calls were
- * made against the page. ("gup-pinned" is another term for the latter).
- *
- * With this scheme, pin_user_pages() becomes special: such pages are marked as
- * distinct from normal pages. As such, the unpin_user_page() call (and its
- * variants) must be used in order to release gup-pinned pages.
- *
- * Choice of value:
- *
- * By making GUP_PIN_COUNTING_BIAS a power of two, debugging of page reference
- * counts with respect to pin_user_pages() and unpin_user_page() becomes
- * simpler, due to the fact that adding an even power of two to the page
- * refcount has the effect of using only the upper N bits, for the code that
- * counts up using the bias value. This means that the lower bits are left for
- * the exclusive use of the original code that increments and decrements by one
- * (or at least, by much smaller values than the bias value).
- *
- * Of course, once the lower bits overflow into the upper bits (and this is
- * OK, because subtraction recovers the original values), then visual inspection
- * no longer suffices to directly view the separate counts. However, for normal
- * applications that don't have huge page reference counts, this won't be an
- * issue.
- *
- * Locking: the lockless algorithm described in folio_try_get_rcu()
- * provides safe operation for get_user_pages(), page_mkclean() and
- * other calls that race to set up page table entries.
- */
-#define GUP_PIN_COUNTING_BIAS (1U << 10)
-
 void unpin_user_page(struct page *page);
 void unpin_user_pages_dirty_lock(struct page **pages, unsigned long npages,
 				 bool make_dirty);
diff --git a/include/linux/page_ref.h b/include/linux/page_ref.h
index 1acf5bac7f503..e6aeaafb143ca 100644
--- a/include/linux/page_ref.h
+++ b/include/linux/page_ref.h
@@ -62,6 +62,38 @@ static inline void __page_ref_unfreeze(struct page *page, int v)
 
 #endif
 
+/*
+ * GUP_PIN_COUNTING_BIAS, and the associated functions that use it, overload
+ * the page's refcount so that two separate items are tracked: the original page
+ * reference count, and also a new count of how many pin_user_pages() calls were
+ * made against the page. ("gup-pinned" is another term for the latter).
+ *
+ * With this scheme, pin_user_pages() becomes special: such pages are marked as
+ * distinct from normal pages. As such, the unpin_user_page() call (and its
+ * variants) must be used in order to release gup-pinned pages.
+ *
+ * Choice of value:
+ *
+ * By making GUP_PIN_COUNTING_BIAS a power of two, debugging of page reference
+ * counts with respect to pin_user_pages() and unpin_user_page() becomes
+ * simpler, due to the fact that adding an even power of two to the page
+ * refcount has the effect of using only the upper N bits, for the code that
+ * counts up using the bias value. This means that the lower bits are left for
+ * the exclusive use of the original code that increments and decrements by one
+ * (or at least, by much smaller values than the bias value).
+ *
+ * Of course, once the lower bits overflow into the upper bits (and this is
+ * OK, because subtraction recovers the original values), then visual inspection
+ * no longer suffices to directly view the separate counts. However, for normal
+ * applications that don't have huge page reference counts, this won't be an
+ * issue.
+ *
+ * Locking: the lockless algorithm described in folio_try_get_rcu()
+ * provides safe operation for get_user_pages(), page_mkclean() and
+ * other calls that race to set up page table entries.
+ */
+#define GUP_PIN_COUNTING_BIAS (1U << 10)
+
 static inline int page_ref_count(const struct page *page)
 {
 	return atomic_read(&page->_refcount);

From patchwork Wed Jun 19 00:05:08 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Elliot Berman
X-Patchwork-Id: 13703207
From: Elliot Berman
Date: Tue, 18 Jun 2024 17:05:08 -0700
Subject: [PATCH RFC 2/5] mm/gup: Add an option for obtaining an exclusive pin
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org
Message-ID: <20240618-exclusive-gup-v1-2-30472a19c5d1@quicinc.com>
References: <20240618-exclusive-gup-v1-0-30472a19c5d1@quicinc.com>
In-Reply-To: <20240618-exclusive-gup-v1-0-30472a19c5d1@quicinc.com>
To: Andrew Morton, Shuah Khan, David Hildenbrand, Matthew Wilcox
CC: Elliot Berman, Fuad Tabba
X-Mailer: b4 0.13.0

From: Fuad Tabba

Introduce the ability to obtain an exclusive long-term pin on a page.
This exclusive pin can only be held if there are no other pins on the
page, regular or exclusive. Moreover, once this pin is held, no other
pins can be grabbed until the exclusive pin is released.

This pin is grabbed using the (new) FOLL_EXCLUSIVE flag, and is gated by
the EXCLUSIVE_PIN configuration option.

Similar to how the normal GUP pin is obtained, the exclusive pin
overloads the _refcount field for normal pages, or the _pincount field
for large pages. It appropriates bit 30 of these two fields, which still
allows the detection of overflows into bit 31. It does, however, halve
the number of potential normal pins for a page.

In order to avoid the possibility of COWing such a page, once an
exclusive pin has been obtained, it's marked as AnonExclusive.
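
For illustration only (this is not part of the patch): a userspace
sketch, assuming C11 atomics, of how the counting bias and the exclusive
bias can share one counter for a small page. The model_* names and
struct model_page are hypothetical stand-ins. The normal-pin path here
uses a compare-and-swap for brevity, whereas the patch adds the bias
first and backs off if it then sees the exclusive bit.

```c
/* Userspace model only; none of these names exist in the kernel. */
#include <stdatomic.h>
#include <stdbool.h>

#define PIN_COUNTING_BIAS  (1U << 10)  /* mirrors GUP_PIN_COUNTING_BIAS */
#define PIN_EXCLUSIVE_BIAS (1U << 30)  /* mirrors GUP_PIN_EXCLUSIVE_BIAS */

/* Hypothetical stand-in for the _refcount field of a small page. */
struct model_page {
	atomic_uint refcount;
};

/* True once bit 30 (or anything above it) is set in the counter. */
static bool model_maybe_exclusive_pinned(struct model_page *p)
{
	return atomic_load(&p->refcount) >= PIN_EXCLUSIVE_BIAS;
}

/*
 * Try to take the exclusive pin: fail if any pin (normal or exclusive)
 * is already present, otherwise add one pin's worth of references plus
 * the exclusive bias in a single compare-and-swap.
 */
static bool model_try_pin_exclusive(struct model_page *p)
{
	unsigned int old = atomic_load(&p->refcount);
	unsigned int next;

	do {
		if (old >= PIN_COUNTING_BIAS)
			return false;
		next = old + PIN_COUNTING_BIAS + PIN_EXCLUSIVE_BIAS;
	} while (!atomic_compare_exchange_weak(&p->refcount, &old, next));

	return true;
}

/* Try to take a normal pin; refused while the exclusive pin is held. */
static bool model_try_pin(struct model_page *p)
{
	unsigned int old = atomic_load(&p->refcount);

	do {
		if (old >= PIN_EXCLUSIVE_BIAS)
			return false;
	} while (!atomic_compare_exchange_weak(&p->refcount, &old,
					       old + PIN_COUNTING_BIAS));

	return true;
}

/* Drop the exclusive pin together with the references taken with it. */
static void model_unpin_exclusive(struct model_page *p)
{
	atomic_fetch_sub(&p->refcount, PIN_COUNTING_BIAS + PIN_EXCLUSIVE_BIAS);
}
```

Starting from a refcount of 1, an exclusive pin succeeds, any further
pin attempt is refused, and dropping the exclusive pin returns the
counter to 1. Because the exclusive bit sits at bit 30, bit 31 stays
free to flag overflow, at the cost of half the normal pin range.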

Co-Developed-by: Elliot Berman
Signed-off-by: Elliot Berman
Signed-off-by: Fuad Tabba
Signed-off-by: Elliot Berman
---
 include/linux/mm.h       |  24 +++++
 include/linux/mm_types.h |   2 +
 include/linux/page_ref.h |  36 +++++++
 mm/Kconfig               |   5 +
 mm/gup.c                 | 239 +++++++++++++++++++++++++++++++++++++++++------
 5 files changed, 279 insertions(+), 27 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index fd0d10b08e7ac..d03d62bceba08 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1583,9 +1583,13 @@ static inline void put_page(struct page *page)
 void unpin_user_page(struct page *page);
 void unpin_user_pages_dirty_lock(struct page **pages, unsigned long npages,
 				 bool make_dirty);
+void unpin_exc_pages_dirty_lock(struct page **pages, unsigned long npages,
+				bool make_dirty);
 void unpin_user_page_range_dirty_lock(struct page *page, unsigned long npages,
 				      bool make_dirty);
 void unpin_user_pages(struct page **pages, unsigned long npages);
+void unpin_exc_pages(struct page **pages, unsigned long npages);
+void unexc_user_page(struct page *page);
 
 static inline bool is_cow_mapping(vm_flags_t flags)
 {
@@ -1958,6 +1962,26 @@ static inline bool folio_needs_cow_for_dma(struct vm_area_struct *vma,
 	return folio_maybe_dma_pinned(folio);
 }
 
+static inline bool folio_maybe_exclusive_pinned(const struct folio *folio)
+{
+	unsigned int count;
+
+	if (!IS_ENABLED(CONFIG_EXCLUSIVE_PIN))
+		return false;
+
+	if (folio_test_large(folio))
+		count = atomic_read(&folio->_pincount);
+	else
+		count = folio_ref_count(folio);
+
+	return count >= GUP_PIN_EXCLUSIVE_BIAS;
+}
+
+static inline bool page_maybe_exclusive_pinned(const struct page *page)
+{
+	return folio_maybe_exclusive_pinned(page_folio(page));
+}
+
 /**
  * is_zero_page - Query if a page is a zero page
  * @page: The page to query
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index af3a0256fa93b..dc397e3465c23 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -1465,6 +1465,8 @@ enum {
 	 * hinting faults.
 	 */
 	FOLL_HONOR_NUMA_FAULT = 1 << 12,
+	/* exclusive PIN only if there aren't other pins (including this) */
+	FOLL_EXCLUSIVE = 1 << 13,
 
 	/* See also internal only FOLL flags in mm/internal.h */
 };
diff --git a/include/linux/page_ref.h b/include/linux/page_ref.h
index e6aeaafb143ca..9d16e1f4db094 100644
--- a/include/linux/page_ref.h
+++ b/include/linux/page_ref.h
@@ -94,6 +94,14 @@ static inline void __page_ref_unfreeze(struct page *page, int v)
  */
 #define GUP_PIN_COUNTING_BIAS (1U << 10)
 
+/*
+ * GUP_PIN_EXCLUSIVE_BIAS is used to grab an exclusive pin over a page.
+ * This exclusive pin can only be taken once, and only if no other GUP pins
+ * exist for the page.
+ * After it's taken, no other gup pins can be taken.
+ */
+#define GUP_PIN_EXCLUSIVE_BIAS (1U << 30)
+
 static inline int page_ref_count(const struct page *page)
 {
 	return atomic_read(&page->_refcount);
@@ -147,6 +155,34 @@ static inline void init_page_count(struct page *page)
 	set_page_count(page, 1);
 }
 
+static __must_check inline bool page_ref_setexc(struct page *page, unsigned int refs)
+{
+	unsigned int old_count, new_count;
+
+	if (WARN_ON_ONCE(refs >= GUP_PIN_EXCLUSIVE_BIAS))
+		return false;
+
+	do {
+		old_count = atomic_read(&page->_refcount);
+
+		if (old_count >= GUP_PIN_COUNTING_BIAS)
+			return false;
+
+		if (check_add_overflow(old_count, refs + GUP_PIN_EXCLUSIVE_BIAS, &new_count))
+			return false;
+	} while (atomic_cmpxchg(&page->_refcount, old_count, new_count) != old_count);
+
+	if (page_ref_tracepoint_active(page_ref_mod))
+		__page_ref_mod(page, refs);
+
+	return true;
+}
+
+static __must_check inline bool folio_ref_setexc(struct folio *folio, unsigned int refs)
+{
+	return page_ref_setexc(&folio->page, refs);
+}
+
 static inline void page_ref_add(struct page *page, int nr)
 {
 	atomic_add(nr, &page->_refcount);
diff --git a/mm/Kconfig b/mm/Kconfig
index b4cb45255a541..56f8c80b996f5 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -1249,6 +1249,11 @@ config IOMMU_MM_DATA
 config EXECMEM
 	bool
 
+config EXCLUSIVE_PIN
+	def_bool y
+	help
+	  Add support for exclusive pins of pages.
+
 source "mm/damon/Kconfig"
 
 endmenu
diff --git a/mm/gup.c b/mm/gup.c
index ca0f5cedce9b2..7f20de33221da 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -97,6 +97,65 @@ static inline struct folio *try_get_folio(struct page *page, int refs)
 	return folio;
 }
 
+static bool large_folio_pin_setexc(struct folio *folio, unsigned int pins)
+{
+	unsigned int old_pincount, new_pincount;
+
+	if (WARN_ON_ONCE(pins >= GUP_PIN_EXCLUSIVE_BIAS))
+		return false;
+
+	do {
+		old_pincount = atomic_read(&folio->_pincount);
+
+		if (old_pincount > 0)
+			return false;
+
+		if (check_add_overflow(old_pincount, pins + GUP_PIN_EXCLUSIVE_BIAS, &new_pincount))
+			return false;
+	} while (atomic_cmpxchg(&folio->_pincount, old_pincount, new_pincount) != old_pincount);
+
+	return true;
+}
+
+static bool __try_grab_folio_excl(struct folio *folio, int pincount, int refcount)
+{
+	if (WARN_ON_ONCE(!IS_ENABLED(CONFIG_EXCLUSIVE_PIN)))
+		return false;
+
+	if (folio_test_large(folio)) {
+		if (!large_folio_pin_setexc(folio, pincount))
+			return false;
+	} else if (!folio_ref_setexc(folio, refcount)) {
+		return false;
+	}
+
+	if (!PageAnonExclusive(&folio->page))
+		SetPageAnonExclusive(&folio->page);
+
+	return true;
+}
+
+static bool try_grab_folio_excl(struct folio *folio, int refs)
+{
+	/*
+	 * When pinning a large folio, use an exact count to track it.
+	 *
+	 * However, be sure to *also* increment the normal folio
+	 * refcount field at least once, so that the folio really
+	 * is pinned. That's why the refcount from the earlier
+	 * try_get_folio() is left intact.
+	 */
+	return __try_grab_folio_excl(folio, refs,
+				     refs * (GUP_PIN_COUNTING_BIAS - 1));
+}
+
+static bool try_grab_page_excl(struct page *page)
+{
+	struct folio *folio = page_folio(page);
+
+	return __try_grab_folio_excl(folio, 1, GUP_PIN_COUNTING_BIAS);
+}
+
 /**
  * try_grab_folio() - Attempt to get or pin a folio.
  * @page: pointer to page to be grabbed
@@ -161,19 +220,41 @@ struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags)
 		return NULL;
 	}
 
-	/*
-	 * When pinning a large folio, use an exact count to track it.
-	 *
-	 * However, be sure to *also* increment the normal folio
-	 * refcount field at least once, so that the folio really
-	 * is pinned. That's why the refcount from the earlier
-	 * try_get_folio() is left intact.
-	 */
-	if (folio_test_large(folio))
-		atomic_add(refs, &folio->_pincount);
-	else
-		folio_ref_add(folio,
-				refs * (GUP_PIN_COUNTING_BIAS - 1));
+	if (unlikely(folio_maybe_exclusive_pinned(folio))) {
+		if (!put_devmap_managed_folio_refs(folio, refs))
+			folio_put_refs(folio, refs);
+		return NULL;
+	}
+
+	if (unlikely(flags & FOLL_EXCLUSIVE)) {
+		if (!try_grab_folio_excl(folio, refs))
+			return NULL;
+	} else {
+		/*
+		 * When pinning a large folio, use an exact count to track it.
+		 *
+		 * However, be sure to *also* increment the normal folio
+		 * refcount field at least once, so that the folio really
+		 * is pinned. That's why the refcount from the earlier
+		 * try_get_folio() is left intact.
+		 */
+		if (folio_test_large(folio))
+			atomic_add(refs, &folio->_pincount);
+		else
+			folio_ref_add(folio,
+					refs * (GUP_PIN_COUNTING_BIAS - 1));
+
+		if (unlikely(folio_maybe_exclusive_pinned(folio))) {
+			if (folio_test_large(folio))
+				atomic_sub(refs, &folio->_pincount);
+			else
+				folio_put_refs(folio,
+					refs * (GUP_PIN_COUNTING_BIAS - 1));
+
+			return NULL;
+		}
+	}
+
 	/*
 	 * Adjust the pincount before re-checking the PTE for changes.
 	 * This is essentially a smp_mb() and is paired with a memory
@@ -198,6 +279,26 @@ static void gup_put_folio(struct folio *folio, int refs, unsigned int flags)
 			refs *= GUP_PIN_COUNTING_BIAS;
 	}
 
+	if (unlikely(flags & FOLL_EXCLUSIVE)) {
+		if (WARN_ON_ONCE(!IS_ENABLED(CONFIG_EXCLUSIVE_PIN)))
+			goto out;
+		if (is_zero_folio(folio))
+			return;
+		if (folio_test_large(folio)) {
+			if (WARN_ON_ONCE((atomic_read(&folio->_pincount) < GUP_PIN_EXCLUSIVE_BIAS)))
+				goto out;
+			atomic_sub(GUP_PIN_EXCLUSIVE_BIAS, &folio->_pincount);
+		} else {
+			if (WARN_ON_ONCE((unsigned int)refs >= GUP_PIN_EXCLUSIVE_BIAS))
+				goto out;
+			if (WARN_ON_ONCE(folio_ref_count(folio) < GUP_PIN_EXCLUSIVE_BIAS))
+				goto out;
+
+			refs += GUP_PIN_EXCLUSIVE_BIAS;
+		}
+	}
+
+out:
 	if (!put_devmap_managed_folio_refs(folio, refs))
 		folio_put_refs(folio, refs);
 }
@@ -242,16 +343,35 @@ int __must_check try_grab_page(struct page *page, unsigned int flags)
 		if (is_zero_page(page))
 			return 0;
 
-		/*
-		 * Similar to try_grab_folio(): be sure to *also*
-		 * increment the normal page refcount field at least once,
-		 * so that the page really is pinned.
-		 */
-		if (folio_test_large(folio)) {
-			folio_ref_add(folio, 1);
-			atomic_add(1, &folio->_pincount);
+		if (unlikely(folio_maybe_exclusive_pinned(folio)))
+			return -EBUSY;
+
+		if (unlikely(flags & FOLL_EXCLUSIVE)) {
+			if (!try_grab_page_excl(page))
+				return -EBUSY;
 		} else {
-			folio_ref_add(folio, GUP_PIN_COUNTING_BIAS);
+			/*
+			 * Similar to try_grab_folio(): be sure to *also*
+			 * increment the normal page refcount field at least once,
+			 * so that the page really is pinned.
+			 */
+			if (folio_test_large(folio)) {
+				folio_ref_add(folio, 1);
+				atomic_add(1, &folio->_pincount);
+			} else {
+				folio_ref_add(folio, GUP_PIN_COUNTING_BIAS);
+			}
+
+			if (unlikely(folio_maybe_exclusive_pinned(folio))) {
+				if (folio_test_large(folio)) {
+					folio_put_refs(folio, 1);
+					atomic_sub(1, &folio->_pincount);
+				} else {
+					folio_put_refs(folio, GUP_PIN_COUNTING_BIAS);
+				}
+
+				return -EBUSY;
+			}
 		}
 
 		node_stat_mod_folio(folio, NR_FOLL_PIN_ACQUIRED, 1);
@@ -288,6 +408,9 @@ void folio_add_pin(struct folio *folio)
 	if (is_zero_folio(folio))
 		return;
 
+	if (unlikely(folio_maybe_exclusive_pinned(folio)))
+		return;
+
 	/*
 	 * Similar to try_grab_folio(): be sure to *also* increment the normal
 	 * page refcount field at least once, so that the page really is
@@ -301,6 +424,15 @@ void folio_add_pin(struct folio *folio)
 		WARN_ON_ONCE(folio_ref_count(folio) < GUP_PIN_COUNTING_BIAS);
 		folio_ref_add(folio, GUP_PIN_COUNTING_BIAS);
 	}
+
+	if (unlikely(folio_maybe_exclusive_pinned(folio))) {
+		if (folio_test_large(folio)) {
+			folio_put_refs(folio, 1);
+			atomic_sub(1, &folio->_pincount);
+		} else {
+			folio_put_refs(folio, GUP_PIN_COUNTING_BIAS);
+		}
+	}
 }
 
 static inline struct folio *gup_folio_range_next(struct page *start,
@@ -355,8 +487,8 @@ static inline struct folio *gup_folio_next(struct page **list,
  * set_page_dirty_lock(), unpin_user_page().
  *
  */
-void unpin_user_pages_dirty_lock(struct page **pages, unsigned long npages,
-				 bool make_dirty)
+static void __unpin_user_pages_dirty_lock(struct page **pages, unsigned long npages,
+					  bool make_dirty, unsigned int flags)
 {
 	unsigned long i;
 	struct folio *folio;
@@ -395,11 +527,28 @@ void unpin_user_pages_dirty_lock(struct page **pages, unsigned long npages,
 			folio_mark_dirty(folio);
 			folio_unlock(folio);
 		}
-		gup_put_folio(folio, nr, FOLL_PIN);
+		gup_put_folio(folio, nr, flags);
 	}
 }
+
+void unpin_user_pages_dirty_lock(struct page **pages, unsigned long npages,
+				 bool make_dirty)
+{
+	__unpin_user_pages_dirty_lock(pages, npages, make_dirty, FOLL_PIN);
+}
 EXPORT_SYMBOL(unpin_user_pages_dirty_lock);
 
+void unpin_exc_pages_dirty_lock(struct page **pages, unsigned long npages,
+				bool make_dirty)
+{
+	if (WARN_ON_ONCE(!IS_ENABLED(CONFIG_EXCLUSIVE_PIN)))
+		return;
+
+	__unpin_user_pages_dirty_lock(pages, npages, make_dirty,
+				      FOLL_PIN | FOLL_EXCLUSIVE);
+}
+EXPORT_SYMBOL(unpin_exc_pages_dirty_lock);
+
 /**
  * unpin_user_page_range_dirty_lock() - release and optionally dirty
  * gup-pinned page range
@@ -466,7 +615,7 @@ static void gup_fast_unpin_user_pages(struct page **pages, unsigned long npages)
  *
  * Please see the unpin_user_page() documentation for details.
  */
-void unpin_user_pages(struct page **pages, unsigned long npages)
+static void __unpin_user_pages(struct page **pages, unsigned long npages, unsigned int flags)
 {
 	unsigned long i;
 	struct folio *folio;
@@ -483,11 +632,35 @@ void unpin_user_pages(struct page **pages, unsigned long npages)
 	sanity_check_pinned_pages(pages, npages);
 	for (i = 0; i < npages; i += nr) {
 		folio = gup_folio_next(pages, npages, i, &nr);
-		gup_put_folio(folio, nr, FOLL_PIN);
+		gup_put_folio(folio, nr, flags);
 	}
 }
+
+void unpin_user_pages(struct page **pages, unsigned long npages)
+{
+	__unpin_user_pages(pages, npages, FOLL_PIN);
+}
 EXPORT_SYMBOL(unpin_user_pages);
 
+void unpin_exc_pages(struct page **pages, unsigned long npages)
+{
+	if (WARN_ON_ONCE(!IS_ENABLED(CONFIG_EXCLUSIVE_PIN)))
+		return;
+
+	__unpin_user_pages(pages, npages, FOLL_PIN | FOLL_EXCLUSIVE);
+}
+EXPORT_SYMBOL(unpin_exc_pages);
+
+void unexc_user_page(struct page *page)
+{
+	if (WARN_ON_ONCE(!IS_ENABLED(CONFIG_EXCLUSIVE_PIN)))
+		return;
+
+	sanity_check_pinned_pages(&page, 1);
+	gup_put_folio(page_folio(page), 0, FOLL_EXCLUSIVE);
+}
+EXPORT_SYMBOL(unexc_user_page);
+
 /*
  * Set the MMF_HAS_PINNED if not set yet; after set it'll be there for the mm's
  * lifecycle. Avoid setting the bit unless necessary, or it might cause write
@@ -2610,6 +2783,18 @@ static bool is_valid_gup_args(struct page **pages, int *locked,
 	if (WARN_ON_ONCE(!(gup_flags & FOLL_PIN) && (gup_flags & FOLL_LONGTERM)))
 		return false;
 
+	/* EXCLUSIVE can only be specified when config is enabled */
+	if (WARN_ON_ONCE(!IS_ENABLED(CONFIG_EXCLUSIVE_PIN) && (gup_flags & FOLL_EXCLUSIVE)))
+		return false;
+
+	/* EXCLUSIVE can only be specified when pinning */
+	if (WARN_ON_ONCE(!(gup_flags & FOLL_PIN) && (gup_flags & FOLL_EXCLUSIVE)))
+		return false;
+
+	/* EXCLUSIVE can only be specified when LONGTERM */
+	if (WARN_ON_ONCE(!(gup_flags & FOLL_LONGTERM) && (gup_flags & FOLL_EXCLUSIVE)))
+		return false;
+
 	/* Pages input must be given if using GET/PIN */
 	if (WARN_ON_ONCE((gup_flags & (FOLL_GET | FOLL_PIN)) && !pages))
 		return false;

From patchwork Wed Jun 19 00:05:09 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Elliot Berman
X-Patchwork-Id: 13703205
From: Elliot Berman
Date: Tue, 18 Jun 2024 17:05:09 -0700
Subject: [PATCH RFC 3/5] mm/gup: Add support for re-pinning a normal pinned page as exclusive
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org
Message-ID: <20240618-exclusive-gup-v1-3-30472a19c5d1@quicinc.com>
References: <20240618-exclusive-gup-v1-0-30472a19c5d1@quicinc.com>
In-Reply-To: <20240618-exclusive-gup-v1-0-30472a19c5d1@quicinc.com>
To: Andrew Morton, Shuah Khan, David Hildenbrand, Matthew Wilcox
CC: Elliot Berman, Fuad Tabba
X-Mailer: b4 0.13.0

From: Fuad Tabba

When a page is shared, the exclusive pin is dropped, but one normal pin
is maintained.
In order to be able to unshare a page, add the ability to reaquire the exclusive pin, but only if there is only one normal pin on the page, and only if the page is marked as AnonExclusive. Co-Developed-by: Elliot Berman Signed-off-by: Elliot Berman Signed-off-by: Fuad Tabba Signed-off-by: Elliot Berman --- include/linux/mm.h | 1 + include/linux/page_ref.h | 18 ++++++++++++------ mm/gup.c | 48 +++++++++++++++++++++++++++++++++++++----------- 3 files changed, 50 insertions(+), 17 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index d03d62bceba0..628ab936dd2b 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1590,6 +1590,7 @@ void unpin_user_page_range_dirty_lock(struct page *page, unsigned long npages, void unpin_user_pages(struct page **pages, unsigned long npages); void unpin_exc_pages(struct page **pages, unsigned long npages); void unexc_user_page(struct page *page); +int reexc_user_page(struct page *page); static inline bool is_cow_mapping(vm_flags_t flags) { diff --git a/include/linux/page_ref.h b/include/linux/page_ref.h index 9d16e1f4db09..e66130fe995d 100644 --- a/include/linux/page_ref.h +++ b/include/linux/page_ref.h @@ -92,7 +92,8 @@ static inline void __page_ref_unfreeze(struct page *page, int v) * provides safe operation for get_user_pages(), page_mkclean() and * other calls that race to set up page table entries. */ -#define GUP_PIN_COUNTING_BIAS (1U << 10) +#define GUP_PIN_COUNTING_SHIFT (10) +#define GUP_PIN_COUNTING_BIAS (1U << GUP_PIN_COUNTING_SHIFT) /* * GUP_PIN_EXCLUSIVE_BIAS is used to grab an exclusive pin over a page. @@ -100,7 +101,8 @@ static inline void __page_ref_unfreeze(struct page *page, int v) * exist for the page. * After it's taken, no other gup pins can be taken. 
*/ -#define GUP_PIN_EXCLUSIVE_BIAS (1U << 30) +#define GUP_PIN_EXCLUSIVE_SHIFT (30) +#define GUP_PIN_EXCLUSIVE_BIAS (1U << GUP_PIN_EXCLUSIVE_SHIFT) static inline int page_ref_count(const struct page *page) { @@ -155,7 +157,9 @@ static inline void init_page_count(struct page *page) set_page_count(page, 1); } -static __must_check inline bool page_ref_setexc(struct page *page, unsigned int refs) +static __must_check inline bool page_ref_setexc(struct page *page, + unsigned int expected_pins, + unsigned int refs) { unsigned int old_count, new_count; @@ -165,7 +169,7 @@ static __must_check inline bool page_ref_setexc(struct page *page, unsigned int do { old_count = atomic_read(&page->_refcount); - if (old_count >= GUP_PIN_COUNTING_BIAS) + if ((old_count >> GUP_PIN_COUNTING_SHIFT) != expected_pins) return false; if (check_add_overflow(old_count, refs + GUP_PIN_EXCLUSIVE_BIAS, &new_count)) @@ -178,9 +182,11 @@ static __must_check inline bool page_ref_setexc(struct page *page, unsigned int return true; } -static __must_check inline bool folio_ref_setexc(struct folio *folio, unsigned int refs) +static __must_check inline bool folio_ref_setexc(struct folio *folio, + unsigned int expected_pins, + unsigned int refs) { - return page_ref_setexc(&folio->page, refs); + return page_ref_setexc(&folio->page, expected_pins, refs); } static inline void page_ref_add(struct page *page, int nr) diff --git a/mm/gup.c b/mm/gup.c index 7f20de33221d..663030d03d95 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -97,7 +97,9 @@ static inline struct folio *try_get_folio(struct page *page, int refs) return folio; } -static bool large_folio_pin_setexc(struct folio *folio, unsigned int pins) +static bool large_folio_pin_setexc(struct folio *folio, + unsigned int expected_pins, + unsigned int pins) { unsigned int old_pincount, new_pincount; @@ -107,7 +109,7 @@ static bool large_folio_pin_setexc(struct folio *folio, unsigned int pins) do { old_pincount = atomic_read(&folio->_pincount); - if (old_pincount > 0) 
+ if (old_pincount != expected_pins) return false; if (check_add_overflow(old_pincount, pins + GUP_PIN_EXCLUSIVE_BIAS, &new_pincount)) @@ -117,15 +119,18 @@ static bool large_folio_pin_setexc(struct folio *folio, unsigned int pins) return true; } -static bool __try_grab_folio_excl(struct folio *folio, int pincount, int refcount) +static bool __try_grab_folio_excl(struct folio *folio, + unsigned int expected_pins, + int pincount, + int refcount) { if (WARN_ON_ONCE(!IS_ENABLED(CONFIG_EXCLUSIVE_PIN))) return false; if (folio_test_large(folio)) { - if (!large_folio_pin_setexc(folio, pincount)) + if (!large_folio_pin_setexc(folio, expected_pins, pincount)) return false; - } else if (!folio_ref_setexc(folio, refcount)) { + } else if (!folio_ref_setexc(folio, expected_pins, refcount)) { return false; } @@ -135,7 +140,9 @@ static bool __try_grab_folio_excl(struct folio *folio, int pincount, int refcoun return true; } -static bool try_grab_folio_excl(struct folio *folio, int refs) +static bool try_grab_folio_excl(struct folio *folio, + unsigned int expected_pins, + int refs) { /* * When pinning a large folio, use an exact count to track it. @@ -145,15 +152,17 @@ static bool try_grab_folio_excl(struct folio *folio, int refs) * is pinned. That's why the refcount from the earlier * try_get_folio() is left intact. 
*/ - return __try_grab_folio_excl(folio, refs, + return __try_grab_folio_excl(folio, expected_pins, refs, refs * (GUP_PIN_COUNTING_BIAS - 1)); } -static bool try_grab_page_excl(struct page *page) +static bool try_grab_page_excl(struct page *page, + unsigned int expected_pins) { struct folio *folio = page_folio(page); - return __try_grab_folio_excl(folio, 1, GUP_PIN_COUNTING_BIAS); + return __try_grab_folio_excl(folio, expected_pins, 1, + GUP_PIN_COUNTING_BIAS); } /** @@ -227,7 +236,7 @@ struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags) } if (unlikely(flags & FOLL_EXCLUSIVE)) { - if (!try_grab_folio_excl(folio, refs)) + if (!try_grab_folio_excl(folio, 0, refs)) return NULL; } else { /* @@ -347,7 +356,7 @@ int __must_check try_grab_page(struct page *page, unsigned int flags) return -EBUSY; if (unlikely(flags & FOLL_EXCLUSIVE)) { - if (!try_grab_page_excl(page)) + if (!try_grab_page_excl(page, 0)) return -EBUSY; } else { /* @@ -661,6 +670,23 @@ void unexc_user_page(struct page *page) } EXPORT_SYMBOL(unexc_user_page); +int reexc_user_page(struct page *page) +{ + if (WARN_ON_ONCE(!IS_ENABLED(CONFIG_EXCLUSIVE_PIN))) + return -EINVAL; + + sanity_check_pinned_pages(&page, 1); + + if (!PageAnonExclusive(page)) + return -EINVAL; + + if (!try_grab_page_excl(page, 1)) + return -EBUSY; + + return 0; +} +EXPORT_SYMBOL(reexc_user_page); + /* * Set the MMF_HAS_PINNED if not set yet; after set it'll be there for the mm's * lifecycle. 
Avoid setting the bit unless necessary, or it might cause write From patchwork Wed Jun 19 00:05:10 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Elliot Berman X-Patchwork-Id: 13703202 Received: from mx0b-0031df01.pphosted.com (mx0b-0031df01.pphosted.com [205.220.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3D1B5181; Wed, 19 Jun 2024 00:05:32 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=205.220.180.131 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1718755534; cv=none; b=nwEKATC7Q1UfrFG7lP6WoWklS19TOIpTzm4T+A+8oknzYoAo9OiC6glkhDUS+uuuYHgvvM4V+rAobEoj4ZkhFoPlBVLzRht2KySdUz3xWdq1nrztKKxCp2vXiWKfB0JoO0MZBcXDcnA4MHOZVCc4uWP5qbfV7J44dRnbxOhSWp8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1718755534; c=relaxed/simple; bh=mO7uNlI6EqQmvX/oussd84ugeoJdJFRO2MOzE1RAeQ8=; h=From:Date:Subject:MIME-Version:Content-Type:Message-ID:References: In-Reply-To:To:CC; b=bPJUIzKuw6At5lg+9hIkOh6JMgRHFVVFfWiUgXLANKKgn0cIOCu6+puG2d+DSP/o0rgNGu/UpY6S6a5VntWC5AfvM7vptgtOvpCn8684M4+ujnVGRS4sIBi9hLY2MbDj1d6eTIDBrkjvbDvhNhav5sO4pS6uuTxw7Bw7OzCJel8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=quicinc.com; spf=pass smtp.mailfrom=quicinc.com; dkim=pass (2048-bit key) header.d=quicinc.com header.i=@quicinc.com header.b=Ua/TTQQY; arc=none smtp.client-ip=205.220.180.131 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=quicinc.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=quicinc.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=quicinc.com header.i=@quicinc.com header.b="Ua/TTQQY" Received: from pps.filterd (m0279868.ppops.net [127.0.0.1]) by 
From: Elliot Berman
Date: Tue, 18 Jun 2024 17:05:10 -0700
Subject: [PATCH RFC 4/5] mm/gup-test: Verify exclusive pinned
Message-ID: <20240618-exclusive-gup-v1-4-30472a19c5d1@quicinc.com>
References: <20240618-exclusive-gup-v1-0-30472a19c5d1@quicinc.com>
In-Reply-To: <20240618-exclusive-gup-v1-0-30472a19c5d1@quicinc.com>
To: Andrew Morton , Shuah Khan , David Hildenbrand , Matthew Wilcox ,
CC: , , , , , , Elliot Berman

Add a test that pages have the exclusive pin bias when FOLL_EXCLUSIVE is
provided.

Signed-off-by: Elliot Berman
---
 mm/gup_test.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/mm/gup_test.c b/mm/gup_test.c
index eeb3f4d87c510..9c6b8c93e44a7 100644
--- a/mm/gup_test.c
+++ b/mm/gup_test.c
@@ -66,6 +66,26 @@ static void verify_dma_pinned(unsigned int cmd, struct page **pages,
 	}
 }
 
+static void verify_exclusive_pinned(unsigned int gup_flags, struct page **pages,
+				    unsigned long nr_pages)
+{
+	unsigned long i;
+	const struct folio *folio;
+
+	if (!(gup_flags & FOLL_EXCLUSIVE))
+		return;
+
+	for (i = 0; i < nr_pages; i++) {
+		folio = page_folio(pages[i]);
+
+		if (WARN(!folio_maybe_exclusive_pinned(folio),
+			 "pages[%lu] is not exclusive pinned\n", i)) {
+			dump_page(&folio->page, "gup_test failure");
+			break;
+		}
+	}
+}
+
 static void dump_pages_test(struct gup_test *gup, struct page **pages,
 			    unsigned long nr_pages)
 {
@@ -185,6 +205,8 @@ static int __gup_test_ioctl(unsigned int cmd,
 	 */
 	verify_dma_pinned(cmd, pages, nr_pages);
 
+	verify_exclusive_pinned(gup->gup_flags, pages, nr_pages);
+
 	if (cmd == DUMP_USER_PAGES_TEST)
 		dump_pages_test(gup, pages, nr_pages);
 

From patchwork Wed Jun 19 00:05:11 2024
X-Patchwork-Submitter: Elliot Berman
X-Patchwork-Id: 13703204
From: Elliot Berman
Date: Tue, 18 Jun 2024 17:05:11 -0700
Subject: [PATCH RFC 5/5] mm/gup_test: Verify GUP grabs same pages twice
Message-ID: <20240618-exclusive-gup-v1-5-30472a19c5d1@quicinc.com>
References: <20240618-exclusive-gup-v1-0-30472a19c5d1@quicinc.com>
In-Reply-To: <20240618-exclusive-gup-v1-0-30472a19c5d1@quicinc.com>
To: Andrew Morton , Shuah Khan , David Hildenbrand , Matthew Wilcox ,
CC: , , , , , , Elliot Berman

GUP'ing the same range twice should return the same pages; test it. In
the case of FOLL_EXCLUSIVE, the second pin should fail to get any pages.

Note: this change ought to be refactored to pull out the GUP'ing bits
that are duplicated between the original and the second GUP.

Signed-off-by: Elliot Berman
---
 mm/gup_test.c                         | 86 +++++++++++++++++++++++++++++++++++
 mm/gup_test.h                         |  1 +
 tools/testing/selftests/mm/gup_test.c |  5 +-
 3 files changed, 91 insertions(+), 1 deletion(-)

diff --git a/mm/gup_test.c b/mm/gup_test.c
index 9c6b8c93e44a7..28cc422b60b78 100644
--- a/mm/gup_test.c
+++ b/mm/gup_test.c
@@ -86,6 +86,89 @@ static void verify_exclusive_pinned(unsigned int gup_flags, struct page **pages,
 	}
 }
 
+static int verify_gup_twice(unsigned int cmd, struct gup_test *gup,
+			    struct page **expected_pages,
+			    unsigned long expected_nr_pages)
+{
+	unsigned long i, nr_pages, addr, next;
+	long nr;
+	struct page **pages __free(kfree) = NULL;
+	int ret = 0;
+
+	nr_pages = gup->size / PAGE_SIZE;
+	pages = kvcalloc(nr_pages, sizeof(void *), GFP_KERNEL);
+	if (!pages)
+		return -ENOMEM;
+
+	i = 0;
+	nr = gup->nr_pages_per_call;
+	for (addr = gup->addr; addr < gup->addr + gup->size; addr = next) {
+		if (nr != gup->nr_pages_per_call)
+			break;
+
+		next = addr + nr * PAGE_SIZE;
+		if (next > gup->addr + gup->size) {
+			next = gup->addr + gup->size;
+			nr = (next - addr) / PAGE_SIZE;
+		}
+
+		switch (cmd) {
+		case GUP_FAST_BENCHMARK:
+			nr = get_user_pages_fast(addr, nr, gup->gup_flags,
+						 pages + i);
+			break;
+		case GUP_BASIC_TEST:
+			nr = get_user_pages(addr, nr, gup->gup_flags, pages + i);
+			break;
+		case PIN_FAST_BENCHMARK:
+			nr = pin_user_pages_fast(addr, nr, gup->gup_flags,
+						 pages + i);
+			break;
+		case PIN_BASIC_TEST:
+			nr = pin_user_pages(addr, nr, gup->gup_flags, pages + i);
+			break;
+		case PIN_LONGTERM_BENCHMARK:
+			nr = pin_user_pages(addr, nr,
+					    gup->gup_flags | FOLL_LONGTERM,
+					    pages + i);
+			break;
+		default:
+			pr_err("cmd %d not supported for %s\n", cmd, __func__);
+			return -EINVAL;
+		}
+
+		if (nr <= 0)
+			break;
+		i += nr;
+	}
+
+	nr_pages = i;
+
+	if (gup->gup_flags & FOLL_EXCLUSIVE) {
+		if (WARN(nr_pages,
+			 "Able to acquire exclusive pin twice for %ld of %ld pages",
+			 nr_pages, expected_nr_pages)) {
+			dump_page(pages[0],
+				  "gup_test: verify_gup_twice() test");
+			ret = -EIO;
+		}
+	} else if (nr_pages != expected_nr_pages) {
+		pr_err("%s: Expected %ld pages, got %ld\n", __func__,
+		       expected_nr_pages, nr_pages);
+		ret = -EIO;
+	} else {
+		for (i = 0; i < nr_pages; i++) {
+			if (WARN(pages[i] != expected_pages[i],
+				 "pages[%lu] mismatch\n", i))
+				break;
+		}
+	}
+
+	put_back_pages(cmd, pages, nr_pages, gup->test_flags);
+
+	return ret;
+}
+
 static void dump_pages_test(struct gup_test *gup, struct page **pages,
 			    unsigned long nr_pages)
 {
@@ -210,6 +293,9 @@ static int __gup_test_ioctl(unsigned int cmd,
 	if (cmd == DUMP_USER_PAGES_TEST)
 		dump_pages_test(gup, pages, nr_pages);
 
+	if (gup->test_flags & GUP_TEST_FLAG_GUP_TWICE)
+		ret = verify_gup_twice(cmd, gup, pages, nr_pages);
+
 	start_time = ktime_get();
 
 	put_back_pages(cmd, pages, nr_pages, gup->test_flags);
diff --git a/mm/gup_test.h b/mm/gup_test.h
index 5b37b54e8bea6..fcd41919b0159 100644
--- a/mm/gup_test.h
+++ b/mm/gup_test.h
@@ -17,6 +17,7 @@
 #define GUP_TEST_MAX_PAGES_TO_DUMP		8
 
 #define GUP_TEST_FLAG_DUMP_PAGES_USE_PIN	0x1
+#define GUP_TEST_FLAG_GUP_TWICE			0x2
 
 struct gup_test {
 	__u64 get_delta_usec;
diff --git a/tools/testing/selftests/mm/gup_test.c b/tools/testing/selftests/mm/gup_test.c
index bdeaac67ff9aa..b4b10c8338f80 100644
--- a/tools/testing/selftests/mm/gup_test.c
+++ b/tools/testing/selftests/mm/gup_test.c
@@ -98,7 +98,7 @@ int main(int argc, char **argv)
 	pthread_t *tid;
 	char *p;
 
-	while ((opt = getopt(argc, argv, "m:r:n:F:f:abcj:tTLUuwWSHpz")) != -1) {
+	while ((opt = getopt(argc, argv, "m:r:n:F:f:abcj:dtTLUuwWSHpz")) != -1) {
 		switch (opt) {
 		case 'a':
 			cmd = PIN_FAST_BENCHMARK;
@@ -172,6 +172,9 @@ int main(int argc, char **argv)
 			/* fault pages in gup, do not fault in userland */
 			touch = 1;
 			break;
+		case 'd':
+			gup.test_flags |= GUP_TEST_FLAG_GUP_TWICE;
+			break;
 		default:
 			ksft_exit_fail_msg("Wrong argument\n");
 		}
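
For readers following the refcount arithmetic in patch 3/5, the expected_pins check added to page_ref_setexc() can be illustrated with a small userspace sketch. This is only a model under stated assumptions: model_setexc() and model_demo() are hypothetical names that do not exist in the kernel, the counter is a plain uint32_t rather than an atomic_t, and the cmpxchg retry loop is left out. The macro values are copied from the page_ref.h hunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the page_ref.h hunk in patch 3/5. */
#define GUP_PIN_COUNTING_SHIFT  10
#define GUP_PIN_COUNTING_BIAS   (1U << GUP_PIN_COUNTING_SHIFT)
#define GUP_PIN_EXCLUSIVE_SHIFT 30
#define GUP_PIN_EXCLUSIVE_BIAS  (1U << GUP_PIN_EXCLUSIVE_SHIFT)

/*
 * Hypothetical, non-atomic model of page_ref_setexc(): succeed only if
 * the bits above GUP_PIN_COUNTING_SHIFT equal expected_pins, then add
 * refs plus the exclusive bias.
 */
bool model_setexc(uint32_t *refcount, unsigned int expected_pins,
		  unsigned int refs)
{
	uint32_t old = *refcount;
	uint32_t add = refs + GUP_PIN_EXCLUSIVE_BIAS;

	/* An existing exclusive pin (bit 30) also shows up here and fails. */
	if ((old >> GUP_PIN_COUNTING_SHIFT) != expected_pins)
		return false;

	/* Stand-in for check_add_overflow() on an unsigned counter. */
	if (add > UINT32_MAX - old)
		return false;

	*refcount = old + add;
	return true;
}

/* Walks through the two callers' cases: expected_pins == 0 and == 1. */
void model_demo(void)
{
	uint32_t rc = 1;			/* one plain reference, no pins */

	/* First exclusive pin, as try_grab_page_excl(page, 0) would request. */
	assert(model_setexc(&rc, 0, GUP_PIN_COUNTING_BIAS));
	/* A second exclusive pin is refused. */
	assert(!model_setexc(&rc, 0, GUP_PIN_COUNTING_BIAS));

	rc = 1 + GUP_PIN_COUNTING_BIAS;		/* shared: one normal pin left */
	/* Re-pin as exclusive, as reexc_user_page() requests with 1. */
	assert(model_setexc(&rc, 1, GUP_PIN_COUNTING_BIAS));

	rc = 1 + 2 * GUP_PIN_COUNTING_BIAS;	/* two normal pins */
	assert(!model_setexc(&rc, 1, GUP_PIN_COUNTING_BIAS));
}
```

Calling model_demo() from a test harness exercises all four cases. The shift-and-compare form is what lets one helper serve both the initial exclusive pin (expected_pins of 0) and the re-pin after sharing (expected_pins of 1).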