From patchwork Wed Feb 12 06:26:59 2025
X-Patchwork-Submitter: Sergey Senozhatsky <senozhatsky@chromium.org>
X-Patchwork-Id: 13971029
From: Sergey Senozhatsky <senozhatsky@chromium.org>
To: Andrew Morton
Cc: Yosry Ahmed, Kairui Song, Minchan Kim, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, Sergey Senozhatsky
Subject: [PATCH v5 01/18] zram: sleepable entry locking
Date: Wed, 12 Feb 2025 15:26:59 +0900
Message-ID: <20250212063153.179231-2-senozhatsky@chromium.org>
In-Reply-To: <20250212063153.179231-1-senozhatsky@chromium.org>
References: <20250212063153.179231-1-senozhatsky@chromium.org>
Concurrent modifications of meta table entries are currently serialized
by a per-entry spin-lock. This has a number of shortcomings.

First, it imposes atomic requirements on compression backends. zram can
call both zcomp_compress() and zcomp_decompress() under entry spin-lock,
which implies that we can use only compression algorithms that don't
schedule/sleep/wait during compression and decompression. This, for
instance, makes it impossible to use some ASYNC compression algorithm
implementations (H/W compression, etc.).

Second, it can potentially trigger watchdogs. For example, entry
re-compression with secondary algorithms is performed under entry
spin-lock. Given that we chain secondary compression algorithms, and
that some of them can be configured for best compression ratio (and
worst compression speed), zram can stay under spin-lock for quite
some time.

Having a per-entry mutex (or, for instance, a rw-semaphore) would
significantly increase the sizeof() of each entry and hence of the meta
table. Therefore entry locking returns to bit locking, as before;
however, this time it is also preempt-rt friendly, because it waits on
the bit instead of spinning on it. Lock owners are now also permitted
to schedule, which is a first step on the path to making zram
non-atomic.
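For readers skimming past the full diff, the core of the new scheme,
condensed from the patch below (the CONFIG_DEBUG_LOCK_ALLOC lockdep
annotations are omitted here), is a single flags bit used as a
sleepable lock:

static void zram_slot_lock(struct zram *zram, u32 index)
{
	unsigned long *lock = &zram->table[index].flags;

	/* Sleep until ZRAM_ENTRY_LOCK is clear, then set it (acquire). */
	wait_on_bit_lock(lock, ZRAM_ENTRY_LOCK, TASK_UNINTERRUPTIBLE);
}

static void zram_slot_unlock(struct zram *zram, u32 index)
{
	unsigned long *lock = &zram->table[index].flags;

	/* Clear the bit (release semantics) and wake any waiters. */
	clear_and_wake_up_bit(ZRAM_ENTRY_LOCK, lock);
}

Contenders sleep in wait_on_bit_lock() rather than spinning, and the
lock owner itself may schedule, which is exactly what the per-entry
spin-lock forbade.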
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
 drivers/block/zram/zram_drv.c | 65 ++++++++++++++++++++++++++++-------
 drivers/block/zram/zram_drv.h | 20 +++++++----
 2 files changed, 67 insertions(+), 18 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 9f5020b077c5..3708436f1d1f 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -58,19 +58,57 @@ static void zram_free_page(struct zram *zram, size_t index);
 static int zram_read_from_zspool(struct zram *zram, struct page *page,
 				 u32 index);
 
-static int zram_slot_trylock(struct zram *zram, u32 index)
+static void zram_slot_lock_init(struct zram *zram, u32 index)
 {
-	return spin_trylock(&zram->table[index].lock);
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+	lockdep_init_map(&zram->table[index].lockdep_map, "zram-entry->lock",
+			 &zram->table_lockdep_key, 0);
+#endif
+}
+
+/*
+ * entry locking rules:
+ *
+ * 1) Lock is exclusive
+ *
+ * 2) lock() function can sleep waiting for the lock
+ *
+ * 3) Lock owner can sleep
+ *
+ * 4) Use TRY lock variant when in atomic context
+ *    - must check return value and handle locking failures
+ */
+static __must_check bool zram_slot_try_lock(struct zram *zram, u32 index)
+{
+	unsigned long *lock = &zram->table[index].flags;
+
+	if (!test_and_set_bit_lock(ZRAM_ENTRY_LOCK, lock)) {
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+		mutex_acquire(&zram->table[index].lockdep_map, 0, 1, _RET_IP_);
+#endif
+		return true;
+	}
+	return false;
 }
 
 static void zram_slot_lock(struct zram *zram, u32 index)
 {
-	spin_lock(&zram->table[index].lock);
+	unsigned long *lock = &zram->table[index].flags;
+
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+	mutex_acquire(&zram->table[index].lockdep_map, 0, 0, _RET_IP_);
+#endif
+	wait_on_bit_lock(lock, ZRAM_ENTRY_LOCK, TASK_UNINTERRUPTIBLE);
 }
 
 static void zram_slot_unlock(struct zram *zram, u32 index)
 {
-	spin_unlock(&zram->table[index].lock);
+	unsigned long *lock = &zram->table[index].flags;
+
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+	mutex_release(&zram->table[index].lockdep_map, _RET_IP_);
+#endif
+	clear_and_wake_up_bit(ZRAM_ENTRY_LOCK, lock);
 }
 
 static inline bool init_done(struct zram *zram)
@@ -93,7 +131,6 @@ static void zram_set_handle(struct zram *zram, u32 index, unsigned long handle)
 	zram->table[index].handle = handle;
 }
 
-/* flag operations require table entry bit_spin_lock() being held */
 static bool zram_test_flag(struct zram *zram, u32 index,
 			enum zram_pageflags flag)
 {
@@ -1473,15 +1510,11 @@ static bool zram_meta_alloc(struct zram *zram, u64 disksize)
 		huge_class_size = zs_huge_class_size(zram->mem_pool);
 
 	for (index = 0; index < num_pages; index++)
-		spin_lock_init(&zram->table[index].lock);
+		zram_slot_lock_init(zram, index);
+
 	return true;
 }
 
-/*
- * To protect concurrent access to the same index entry,
- * caller should hold this table index entry's bit_spinlock to
- * indicate this index entry is accessing.
- */
 static void zram_free_page(struct zram *zram, size_t index)
 {
 	unsigned long handle;
@@ -2321,7 +2354,7 @@ static void zram_slot_free_notify(struct block_device *bdev,
 	zram = bdev->bd_disk->private_data;
 	atomic64_inc(&zram->stats.notify_free);
 
-	if (!zram_slot_trylock(zram, index)) {
+	if (!zram_slot_try_lock(zram, index)) {
 		atomic64_inc(&zram->stats.miss_free);
 		return;
 	}
@@ -2625,6 +2658,10 @@ static int zram_add(void)
 	if (ret)
 		goto out_cleanup_disk;
 
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+	lockdep_register_key(&zram->table_lockdep_key);
+#endif
+
 	zram_debugfs_register(zram);
 	pr_info("Added device: %s\n", zram->disk->disk_name);
 	return device_id;
@@ -2681,6 +2718,10 @@ static int zram_remove(struct zram *zram)
 	 */
 	zram_reset_device(zram);
 
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+	lockdep_unregister_key(&zram->table_lockdep_key);
+#endif
+
 	put_disk(zram->disk);
 	kfree(zram);
 	return 0;

diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
index db78d7c01b9a..63b933059cb6 100644
--- a/drivers/block/zram/zram_drv.h
+++ b/drivers/block/zram/zram_drv.h
@@ -28,7 +28,6 @@
 #define ZRAM_SECTOR_PER_LOGICAL_BLOCK	\
 	(1 << (ZRAM_LOGICAL_BLOCK_SHIFT - SECTOR_SHIFT))
 
-
 /*
  * ZRAM is mainly used for memory efficiency so we want to keep memory
  * footprint small and thus squeeze size and zram pageflags into a flags
@@ -46,6 +45,7 @@
 /* Flags for zram pages (table[page_no].flags) */
 enum zram_pageflags {
 	ZRAM_SAME = ZRAM_FLAG_SHIFT,	/* Page consists the same element */
+	ZRAM_ENTRY_LOCK,	/* entry access lock bit */
 	ZRAM_WB,	/* page is stored on backing_device */
 	ZRAM_PP_SLOT,	/* Selected for post-processing */
 	ZRAM_HUGE,	/* Incompressible page */
@@ -58,13 +58,18 @@ enum zram_pageflags {
 	__NR_ZRAM_PAGEFLAGS,
 };
 
-/*-- Data structures */
-
-/* Allocated for each disk page */
+/*
+ * Allocated for each disk page. We use bit-lock (ZRAM_ENTRY_LOCK bit
+ * of flags) to save memory. There can be plenty of entries and standard
+ * locking primitives (e.g. mutex) will significantly increase sizeof()
+ * of each entry and hence of the meta table.
+ */
 struct zram_table_entry {
 	unsigned long handle;
-	unsigned int flags;
-	spinlock_t lock;
+	unsigned long flags;
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+	struct lockdep_map lockdep_map;
+#endif
 #ifdef CONFIG_ZRAM_TRACK_ENTRY_ACTIME
 	ktime_t ac_time;
 #endif
@@ -137,5 +142,8 @@ struct zram {
 	struct dentry *debugfs_dir;
 #endif
 	atomic_t pp_in_progress;
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+	struct lock_class_key table_lockdep_key;
+#endif
 };
 #endif
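As an illustration only (a hypothetical caller sketch, not part of the
patch; it reuses zram_slot_lock()/zram_slot_unlock() and
zram_read_from_zspool() from this driver), here is the kind of code the
sleepable lock now permits:

/*
 * Hypothetical example: the lock owner may now sleep, e.g. inside a
 * decompression backend that waits for H/W completion.
 */
static int example_read_entry(struct zram *zram, struct page *page,
			      u32 index)
{
	int ret;

	zram_slot_lock(zram, index);	/* may sleep waiting for the bit */
	ret = zram_read_from_zspool(zram, page, index);	/* owner may sleep */
	zram_slot_unlock(zram, index);

	return ret;
}

In atomic context (e.g. zram_slot_free_notify() above) the TRY variant
must be used instead, and its return value checked.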