From patchwork Fri Feb 21 22:25:32 2025
X-Patchwork-Submitter: Sergey Senozhatsky
X-Patchwork-Id: 13986374
From: Sergey Senozhatsky <senozhatsky@chromium.org>
To: Andrew Morton
Cc: Yosry Ahmed, Hillf Danton, Kairui Song, Sebastian Andrzej Siewior,
    Minchan Kim, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    Sergey Senozhatsky
Subject: [PATCH v8 01/17] zram: sleepable entry locking
Date: Sat, 22 Feb 2025 07:25:32 +0900
Message-ID: <20250221222958.2225035-2-senozhatsky@chromium.org>
In-Reply-To: <20250221222958.2225035-1-senozhatsky@chromium.org>
References: <20250221222958.2225035-1-senozhatsky@chromium.org>
MIME-Version: 1.0
Concurrent modifications of meta table entries are currently handled by a
per-entry spin-lock. This has a number of shortcomings.

First, it imposes atomicity requirements on compression backends. zram can
call both zcomp_compress() and zcomp_decompress() under the entry spin-lock,
which means that only compression algorithms that don't schedule/sleep/wait
during compression and decompression can be used. This, for instance, makes
it impossible to use some asynchronous compression algorithm implementations
(H/W compression, etc.).

Second, it can potentially trigger watchdogs. For example, entry
re-compression with secondary algorithms is performed under the entry
spin-lock. Given that secondary compression algorithms are chained and that
some of them can be configured for the best compression ratio (and hence the
worst compression speed), zram can stay under the spin-lock for quite some
time.

A per-entry mutex (or, for instance, an rw-semaphore) would significantly
increase the sizeof() of each entry and hence of the meta table. Therefore
entry locking returns to bit locking, as before, but this time in a
preempt-rt friendly manner, because it waits on the bit instead of spinning
on it. Lock owners are now also permitted to schedule, which is a first step
on the path toward making zram non-atomic.
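[Editor's illustration, not part of the patch: a minimal sketch of the
wait-on-bit locking pattern described above, built from the same kernel
primitives the patch uses (test_and_set_bit_lock(), wait_on_bit_lock(),
clear_and_wake_up_bit()). The entry_* helper names and the bit index are
invented for the example; lockdep annotations are omitted for brevity.]

#include <linux/bitops.h>
#include <linux/sched.h>
#include <linux/wait_bit.h>

#define ENTRY_LOCK_BIT	0	/* illustrative bit index within ->flags */

/* Atomic-context safe: never sleeps, may fail to take the lock. */
static bool entry_trylock(unsigned long *flags)
{
	return !test_and_set_bit_lock(ENTRY_LOCK_BIT, flags);
}

/* Sleeps (schedules) until the bit is acquired, instead of spinning. */
static void entry_lock(unsigned long *flags)
{
	wait_on_bit_lock(flags, ENTRY_LOCK_BIT, TASK_UNINTERRUPTIBLE);
}

/* Clears the lock bit and wakes up any sleeping waiters. */
static void entry_unlock(unsigned long *flags)
{
	clear_and_wake_up_bit(ENTRY_LOCK_BIT, flags);
}

Because the lock state lives in a single bit of the existing flags word,
this costs no extra per-entry memory, yet waiters sleep on a wait queue
rather than burning CPU, which is what makes it preempt-rt friendly.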
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
 drivers/block/zram/zram_drv.c | 62 ++++++++++++++++++++++++++++-------
 drivers/block/zram/zram_drv.h | 20 +++++++----
 2 files changed, 65 insertions(+), 17 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 9f5020b077c5..37c5651305c2 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -58,19 +58,62 @@ static void zram_free_page(struct zram *zram, size_t index);
 static int zram_read_from_zspool(struct zram *zram, struct page *page,
 				 u32 index);
 
-static int zram_slot_trylock(struct zram *zram, u32 index)
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+#define slot_dep_map(zram, index) (&(zram)->table[(index)].dep_map)
+#define zram_lock_class(zram) (&(zram)->lock_class)
+#else
+#define slot_dep_map(zram, index) NULL
+#define zram_lock_class(zram) NULL
+#endif
+
+static void zram_slot_lock_init(struct zram *zram, u32 index)
 {
-	return spin_trylock(&zram->table[index].lock);
+	lockdep_init_map(slot_dep_map(zram, index),
+			 "zram->table[index].lock",
+			 zram_lock_class(zram), 0);
+}
+
+/*
+ * entry locking rules:
+ *
+ * 1) Lock is exclusive
+ *
+ * 2) lock() function can sleep waiting for the lock
+ *
+ * 3) Lock owner can sleep
+ *
+ * 4) Use TRY lock variant when in atomic context
+ *    - must check return value and handle locking failures
+ */
+static __must_check bool zram_slot_trylock(struct zram *zram, u32 index)
+{
+	unsigned long *lock = &zram->table[index].flags;
+
+	if (!test_and_set_bit_lock(ZRAM_ENTRY_LOCK, lock)) {
+		mutex_acquire(slot_dep_map(zram, index), 0, 1, _RET_IP_);
+		lock_acquired(slot_dep_map(zram, index), _RET_IP_);
+		return true;
+	}
+
+	lock_contended(slot_dep_map(zram, index), _RET_IP_);
+	return false;
 }
 
 static void zram_slot_lock(struct zram *zram, u32 index)
 {
-	spin_lock(&zram->table[index].lock);
+	unsigned long *lock = &zram->table[index].flags;
+
+	mutex_acquire(slot_dep_map(zram, index), 0, 0, _RET_IP_);
+	wait_on_bit_lock(lock, ZRAM_ENTRY_LOCK, TASK_UNINTERRUPTIBLE);
+	lock_acquired(slot_dep_map(zram, index), _RET_IP_);
 }
 
 static void zram_slot_unlock(struct zram *zram, u32 index)
 {
-	spin_unlock(&zram->table[index].lock);
+	unsigned long *lock = &zram->table[index].flags;
+
+	mutex_release(slot_dep_map(zram, index), _RET_IP_);
+	clear_and_wake_up_bit(ZRAM_ENTRY_LOCK, lock);
 }
 
 static inline bool init_done(struct zram *zram)
@@ -93,7 +136,6 @@ static void zram_set_handle(struct zram *zram, u32 index, unsigned long handle)
 	zram->table[index].handle = handle;
 }
 
-/* flag operations require table entry bit_spin_lock() being held */
 static bool zram_test_flag(struct zram *zram, u32 index,
 			enum zram_pageflags flag)
 {
@@ -1473,15 +1515,11 @@ static bool zram_meta_alloc(struct zram *zram, u64 disksize)
 		huge_class_size = zs_huge_class_size(zram->mem_pool);
 
 	for (index = 0; index < num_pages; index++)
-		spin_lock_init(&zram->table[index].lock);
+		zram_slot_lock_init(zram, index);
+
 	return true;
 }
 
-/*
- * To protect concurrent access to the same index entry,
- * caller should hold this table index entry's bit_spinlock to
- * indicate this index entry is accessing.
- */
 static void zram_free_page(struct zram *zram, size_t index)
 {
 	unsigned long handle;
@@ -2625,6 +2663,7 @@ static int zram_add(void)
 	if (ret)
 		goto out_cleanup_disk;
 
+	lockdep_register_key(zram_lock_class(zram));
 	zram_debugfs_register(zram);
 	pr_info("Added device: %s\n", zram->disk->disk_name);
 	return device_id;
@@ -2653,6 +2692,7 @@ static int zram_remove(struct zram *zram)
 	zram->claim = true;
 	mutex_unlock(&zram->disk->open_mutex);
 
+	lockdep_unregister_key(zram_lock_class(zram));
 	zram_debugfs_unregister(zram);
 
 	if (claimed) {
diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
index db78d7c01b9a..794c9234e627 100644
--- a/drivers/block/zram/zram_drv.h
+++ b/drivers/block/zram/zram_drv.h
@@ -28,7 +28,6 @@
 #define ZRAM_SECTOR_PER_LOGICAL_BLOCK	\
 	(1 << (ZRAM_LOGICAL_BLOCK_SHIFT - SECTOR_SHIFT))
 
-
 /*
  * ZRAM is mainly used for memory efficiency so we want to keep memory
  * footprint small and thus squeeze size and zram pageflags into a flags
@@ -46,6 +45,7 @@
 /* Flags for zram pages (table[page_no].flags) */
 enum zram_pageflags {
 	ZRAM_SAME = ZRAM_FLAG_SHIFT,	/* Page consists the same element */
+	ZRAM_ENTRY_LOCK, /* entry access lock bit */
 	ZRAM_WB,	/* page is stored on backing_device */
 	ZRAM_PP_SLOT,	/* Selected for post-processing */
 	ZRAM_HUGE,	/* Incompressible page */
@@ -58,13 +58,18 @@ enum zram_pageflags {
 	__NR_ZRAM_PAGEFLAGS,
 };
 
-/*-- Data structures */
-
-/* Allocated for each disk page */
+/*
+ * Allocated for each disk page. We use bit-lock (ZRAM_ENTRY_LOCK bit
+ * of flags) to save memory. There can be plenty of entries and standard
+ * locking primitives (e.g. mutex) will significantly increase sizeof()
+ * of each entry and hence of the meta table.
+ */
 struct zram_table_entry {
 	unsigned long handle;
-	unsigned int flags;
-	spinlock_t lock;
+	unsigned long flags;
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+	struct lockdep_map dep_map;
+#endif
 #ifdef CONFIG_ZRAM_TRACK_ENTRY_ACTIME
 	ktime_t ac_time;
 #endif
@@ -137,5 +142,8 @@ struct zram {
 	struct dentry *debugfs_dir;
 #endif
 	atomic_t pp_in_progress;
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+	struct lock_class_key lock_class;
+#endif
 };
 #endif
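[Editor's illustration, not part of the patch: a hypothetical caller
following the locking rules documented above. The example_* functions are
invented for the example; zram's real call sites differ.]

/* Process context: zram_slot_lock() may schedule while waiting. */
static void example_slot_op(struct zram *zram, u32 index)
{
	zram_slot_lock(zram, index);
	/* ... inspect/modify zram->table[index] under the entry lock ... */
	zram_slot_unlock(zram, index);
}

/* Atomic context: must use the TRY variant and handle failure. */
static bool example_slot_op_atomic(struct zram *zram, u32 index)
{
	if (!zram_slot_trylock(zram, index))
		return false;	/* caller retries or bails out */
	/* ... */
	zram_slot_unlock(zram, index);
	return true;
}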