From patchwork Fri Jan 31 09:06:13 2025
X-Patchwork-Submitter: Sergey Senozhatsky
X-Patchwork-Id: 13955140
From: Sergey Senozhatsky
To: Andrew Morton
Cc: Minchan Kim, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    Sergey Senozhatsky, Yosry Ahmed
Subject: [PATCHv4 14/17] zsmalloc: make zspage lock preemptible
Date: Fri, 31 Jan 2025 18:06:13 +0900
Message-ID: <20250131090658.3386285-15-senozhatsky@chromium.org>
X-Mailer: git-send-email 2.48.1.362.g079036d154-goog
In-Reply-To: <20250131090658.3386285-1-senozhatsky@chromium.org>
References: <20250131090658.3386285-1-senozhatsky@chromium.org>
MIME-Version: 1.0
Switch over from rwlock_t to an atomic_t variable that takes a negative
value when the zspage is under migration and positive values when the
zspage is in use by zsmalloc users (object mapping, etc.). A per-zspage
rwsem would be too memory heavy; a simple atomic_t suffices.

The zspage lock is a leaf lock for zs_map_object(), where it is
read-acquired. Since the lock now permits preemption, extra care is
needed when it is write-acquired: all writers grab it in atomic context,
so they cannot spin and wait for a (potentially preempted) reader to
unlock the zspage. There are only two writers at the moment, migration
and compaction, and in both cases we use write-try-lock and bail out if
the zspage is read-locked. Writers, on the other hand, never get
preempted, so readers can spin waiting for the writer to unlock the
zspage.

With this we can implement a preemptible object mapping.

Signed-off-by: Sergey Senozhatsky
Cc: Yosry Ahmed
---
 mm/zsmalloc.c | 135 +++++++++++++++++++++++++++++++-------------------
 1 file changed, 83 insertions(+), 52 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 4b4c77bc08f9..f5b5fe732e50 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -292,6 +292,9 @@ static inline void free_zpdesc(struct zpdesc *zpdesc)
         __free_page(page);
 }
 
+#define ZS_PAGE_UNLOCKED        0
+#define ZS_PAGE_WRLOCKED        -1
+
 struct zspage {
         struct {
                 unsigned int huge:HUGE_BITS;
@@ -304,7 +307,7 @@ struct zspage {
         struct zpdesc *first_zpdesc;
         struct list_head list; /* fullness list */
         struct zs_pool *pool;
-        rwlock_t lock;
+        atomic_t lock;
 };
 
 struct mapping_area {
@@ -314,6 +317,59 @@ struct mapping_area {
         enum zs_mapmode vm_mm; /* mapping mode */
 };
 
+static void zspage_lock_init(struct zspage *zspage)
+{
+        atomic_set(&zspage->lock, ZS_PAGE_UNLOCKED);
+}
+
+/*
+ * zspage lock permits preemption on the reader-side (there can be multiple
+ * readers). Writers (exclusive zspage ownership), on the other hand, are
+ * always run in atomic context and cannot spin waiting for a (potentially
+ * preempted) reader to unlock zspage. This, basically, means that writers
+ * can only call write-try-lock and must bail out if it didn't succeed.
+ *
+ * At the same time, writers cannot reschedule under zspage write-lock,
+ * so readers can spin waiting for the writer to unlock zspage.
+ */
+static void zspage_read_lock(struct zspage *zspage)
+{
+        atomic_t *lock = &zspage->lock;
+        int old = atomic_read(lock);
+
+        do {
+                if (old == ZS_PAGE_WRLOCKED) {
+                        cpu_relax();
+                        old = atomic_read(lock);
+                        continue;
+                }
+        } while (!atomic_try_cmpxchg(lock, &old, old + 1));
+}
+
+static void zspage_read_unlock(struct zspage *zspage)
+{
+        atomic_dec(&zspage->lock);
+}
+
+static bool zspage_try_write_lock(struct zspage *zspage)
+{
+        atomic_t *lock = &zspage->lock;
+        int old = ZS_PAGE_UNLOCKED;
+
+        preempt_disable();
+        if (atomic_try_cmpxchg(lock, &old, ZS_PAGE_WRLOCKED))
+                return true;
+
+        preempt_enable();
+        return false;
+}
+
+static void zspage_write_unlock(struct zspage *zspage)
+{
+        atomic_set(&zspage->lock, ZS_PAGE_UNLOCKED);
+        preempt_enable();
+}
+
 /* huge object: pages_per_zspage == 1 && maxobj_per_zspage == 1 */
 static void SetZsHugePage(struct zspage *zspage)
 {
@@ -325,12 +381,6 @@ static bool ZsHugePage(struct zspage *zspage)
         return zspage->huge;
 }
 
-static void lock_init(struct zspage *zspage);
-static void migrate_read_lock(struct zspage *zspage);
-static void migrate_read_unlock(struct zspage *zspage);
-static void migrate_write_lock(struct zspage *zspage);
-static void migrate_write_unlock(struct zspage *zspage);
-
 #ifdef CONFIG_COMPACTION
 static void kick_deferred_free(struct zs_pool *pool);
 static void init_deferred_free(struct zs_pool *pool);
@@ -1026,7 +1076,7 @@ static struct zspage *alloc_zspage(struct zs_pool *pool,
                 return NULL;
 
         zspage->magic = ZSPAGE_MAGIC;
-        lock_init(zspage);
+        zspage_lock_init(zspage);
 
         for (i = 0; i < class->pages_per_zspage; i++) {
                 struct zpdesc *zpdesc;
@@ -1251,7 +1301,7 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle,
          * zs_unmap_object API so delegate the locking from class to zspage
          * which is smaller granularity.
          */
-        migrate_read_lock(zspage);
+        zspage_read_lock(zspage);
         pool_read_unlock(pool);
 
         class = zspage_class(pool, zspage);
@@ -1311,7 +1361,7 @@ void zs_unmap_object(struct zs_pool *pool, unsigned long handle)
         }
         local_unlock(&zs_map_area.lock);
 
-        migrate_read_unlock(zspage);
+        zspage_read_unlock(zspage);
 }
 EXPORT_SYMBOL_GPL(zs_unmap_object);
 
@@ -1705,18 +1755,18 @@ static void lock_zspage(struct zspage *zspage)
         /*
          * Pages we haven't locked yet can be migrated off the list while we're
          * trying to lock them, so we need to be careful and only attempt to
-         * lock each page under migrate_read_lock(). Otherwise, the page we lock
+         * lock each page under zspage_read_lock(). Otherwise, the page we lock
          * may no longer belong to the zspage. This means that we may wait for
          * the wrong page to unlock, so we must take a reference to the page
-         * prior to waiting for it to unlock outside migrate_read_lock().
+         * prior to waiting for it to unlock outside zspage_read_lock().
          */
         while (1) {
-                migrate_read_lock(zspage);
+                zspage_read_lock(zspage);
                 zpdesc = get_first_zpdesc(zspage);
                 if (zpdesc_trylock(zpdesc))
                         break;
                 zpdesc_get(zpdesc);
-                migrate_read_unlock(zspage);
+                zspage_read_unlock(zspage);
                 zpdesc_wait_locked(zpdesc);
                 zpdesc_put(zpdesc);
         }
@@ -1727,41 +1777,16 @@ static void lock_zspage(struct zspage *zspage)
                         curr_zpdesc = zpdesc;
                 } else {
                         zpdesc_get(zpdesc);
-                        migrate_read_unlock(zspage);
+                        zspage_read_unlock(zspage);
                         zpdesc_wait_locked(zpdesc);
                         zpdesc_put(zpdesc);
-                        migrate_read_lock(zspage);
+                        zspage_read_lock(zspage);
                 }
         }
-        migrate_read_unlock(zspage);
+        zspage_read_unlock(zspage);
 }
 #endif /* CONFIG_COMPACTION */
 
-static void lock_init(struct zspage *zspage)
-{
-        rwlock_init(&zspage->lock);
-}
-
-static void migrate_read_lock(struct zspage *zspage) __acquires(&zspage->lock)
-{
-        read_lock(&zspage->lock);
-}
-
-static void migrate_read_unlock(struct zspage *zspage) __releases(&zspage->lock)
-{
-        read_unlock(&zspage->lock);
-}
-
-static void migrate_write_lock(struct zspage *zspage)
-{
-        write_lock(&zspage->lock);
-}
-
-static void migrate_write_unlock(struct zspage *zspage)
-{
-        write_unlock(&zspage->lock);
-}
-
 #ifdef CONFIG_COMPACTION
 static const struct movable_operations zsmalloc_mops;
 
@@ -1803,7 +1828,7 @@ static bool zs_page_isolate(struct page *page, isolate_mode_t mode)
 }
 
 static int zs_page_migrate(struct page *newpage, struct page *page,
-                enum migrate_mode mode)
+                        enum migrate_mode mode)
 {
         struct zs_pool *pool;
         struct size_class *class;
@@ -1819,15 +1844,12 @@ static int zs_page_migrate(struct page *newpage, struct page *page,
 
         VM_BUG_ON_PAGE(!zpdesc_is_isolated(zpdesc), zpdesc_page(zpdesc));
 
-        /* We're committed, tell the world that this is a Zsmalloc page. */
-        __zpdesc_set_zsmalloc(newzpdesc);
-
         /* The page is locked, so this pointer must remain valid */
         zspage = get_zspage(zpdesc);
         pool = zspage->pool;
 
         /*
-         * The pool lock protects the race between zpage migration
+         * The pool->lock protects the race between zpage migration
          * and zs_free.
          */
         pool_write_lock(pool);
@@ -1837,8 +1859,15 @@ static int zs_page_migrate(struct page *newpage, struct page *page,
          * the class lock protects zpage alloc/free in the zspage.
          */
         size_class_lock(class);
-        /* the migrate_write_lock protects zpage access via zs_map_object */
-        migrate_write_lock(zspage);
+        /* the zspage write_lock protects zpage access via zs_map_object */
+        if (!zspage_try_write_lock(zspage)) {
+                size_class_unlock(class);
+                pool_write_unlock(pool);
+                return -EINVAL;
+        }
+
+        /* We're committed, tell the world that this is a Zsmalloc page. */
+        __zpdesc_set_zsmalloc(newzpdesc);
 
         offset = get_first_obj_offset(zpdesc);
         s_addr = kmap_local_zpdesc(zpdesc);
@@ -1869,7 +1898,7 @@ static int zs_page_migrate(struct page *newpage, struct page *page,
          */
         pool_write_unlock(pool);
         size_class_unlock(class);
-        migrate_write_unlock(zspage);
+        zspage_write_unlock(zspage);
 
         zpdesc_get(newzpdesc);
         if (zpdesc_zone(newzpdesc) != zpdesc_zone(zpdesc)) {
@@ -2005,9 +2034,11 @@ static unsigned long __zs_compact(struct zs_pool *pool,
                 if (!src_zspage)
                         break;
 
-                migrate_write_lock(src_zspage);
+                if (!zspage_try_write_lock(src_zspage))
+                        break;
+
                 migrate_zspage(pool, src_zspage, dst_zspage);
-                migrate_write_unlock(src_zspage);
+                zspage_write_unlock(src_zspage);
 
                 fg = putback_zspage(class, src_zspage);
                 if (fg == ZS_INUSE_RATIO_0) {
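[Editor's note] For readers who want to experiment with the locking scheme
described in the changelog, below is a minimal userspace sketch of the same
protocol using C11 atomics in place of the kernel's atomic_t API. It is
illustrative only and not part of the patch: "struct zspage_lock",
sched_yield() standing in for cpu_relax(), and the main() driver are
assumptions made for a standalone build, and preempt_disable()/
preempt_enable() have no userspace equivalent, so they appear only as
comments. The read-lock retry loop is restructured slightly so that the
cmpxchg is attempted only when no writer is seen.

/*
 * Standalone model of the zspage lock: <0 means write-locked,
 * 0 means unlocked, >0 is the number of readers.
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <sched.h>

#define ZS_PAGE_UNLOCKED         0
#define ZS_PAGE_WRLOCKED        -1

struct zspage_lock {
        atomic_int lock;
};

static void zspage_lock_init(struct zspage_lock *zl)
{
        atomic_store(&zl->lock, ZS_PAGE_UNLOCKED);
}

/* Readers may be preempted while holding the lock, but spin while a writer holds it. */
static void zspage_read_lock(struct zspage_lock *zl)
{
        int old = atomic_load(&zl->lock);

        for (;;) {
                if (old == ZS_PAGE_WRLOCKED) {
                        sched_yield();  /* cpu_relax() in the kernel */
                        old = atomic_load(&zl->lock);
                        continue;
                }
                /* on failure, 'old' is refreshed with the current lock value */
                if (atomic_compare_exchange_weak(&zl->lock, &old, old + 1))
                        return;
        }
}

static void zspage_read_unlock(struct zspage_lock *zl)
{
        atomic_fetch_sub(&zl->lock, 1);
}

/* Writers never wait: either the lock is free or they bail out immediately. */
static bool zspage_try_write_lock(struct zspage_lock *zl)
{
        int old = ZS_PAGE_UNLOCKED;

        /* the kernel disables preemption here, so readers may safely spin on us */
        return atomic_compare_exchange_strong(&zl->lock, &old, ZS_PAGE_WRLOCKED);
}

static void zspage_write_unlock(struct zspage_lock *zl)
{
        atomic_store(&zl->lock, ZS_PAGE_UNLOCKED);
        /* the kernel re-enables preemption here */
}

int main(void)
{
        struct zspage_lock zl;

        zspage_lock_init(&zl);

        zspage_read_lock(&zl);
        /* a writer must bail out while any reader holds the lock */
        printf("try_write while read-locked: %d\n", zspage_try_write_lock(&zl));
        zspage_read_unlock(&zl);

        printf("try_write while unlocked:    %d\n", zspage_try_write_lock(&zl));
        zspage_write_unlock(&zl);
        return 0;
}

The asymmetry mirrors the changelog: writers run in atomic context and so can
only try-lock and bail out, while readers can afford to spin because a writer
is never preempted while holding the lock.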