From patchwork Fri Feb 14 04:50:13 2025
X-Patchwork-Submitter: Sergey Senozhatsky <senozhatsky@chromium.org>
X-Patchwork-Id: 13974463
From: Sergey Senozhatsky <senozhatsky@chromium.org>
To: Andrew Morton
Cc: Yosry Ahmed, Hillf Danton, Kairui Song, Minchan Kim,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Sergey Senozhatsky
Subject: [PATCH v6 01/17] zram: sleepable entry locking
Date: Fri, 14 Feb 2025 13:50:13 +0900
Message-ID: <20250214045208.1388854-2-senozhatsky@chromium.org>
X-Mailer: git-send-email 2.48.1.601.g30ceb7b040-goog
In-Reply-To: <20250214045208.1388854-1-senozhatsky@chromium.org>
References: <20250214045208.1388854-1-senozhatsky@chromium.org>
MIME-Version: 1.0
Concurrent modifications of meta table entries are currently handled
by a per-entry spin-lock. This has a number of shortcomings.

First, it imposes atomic requirements on compression backends. zram
can call both zcomp_compress() and zcomp_decompress() under entry
spin-lock, which implies that we can use only compression algorithms
that don't schedule/sleep/wait during compression and decompression.
This, for instance, makes it impossible to use some of the ASYNC
compression algorithm implementations (H/W compression, etc.).

Second, it can potentially trigger watchdogs. For example, entry
re-compression with secondary algorithms is performed under entry
spin-lock. Given that we chain secondary compression algorithms, and
that some of them can be configured for best compression ratio (and
worst compression speed), zram can stay under spin-lock for quite
some time.

Having a per-entry mutex (or, for instance, a rw-semaphore) would
significantly increase the sizeof() of each entry and hence of the
meta table. Therefore entry locking returns to bit locking, as
before; this time, however, it is also preempt-rt friendly, because
it waits on the bit instead of spinning on it. Lock owners are now
also permitted to schedule, which is a first step on the path of
making zram non-atomic.
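To illustrate (this snippet is not part of the patch; the demo_*
and DEMO_LOCK_BIT names are made up for this sketch), the sleepable
bit-lock pattern boils down to three primitives from
<linux/bitops.h> and <linux/wait_bit.h>:

	#include <linux/bitops.h>
	#include <linux/sched.h>
	#include <linux/wait_bit.h>

	#define DEMO_LOCK_BIT	0	/* lock bit within ->flags */

	struct demo_entry {
		unsigned long flags;	/* one bit doubles as the lock */
	};

	/* Never sleeps, safe in atomic context; caller must check result */
	static bool demo_trylock(struct demo_entry *e)
	{
		return !test_and_set_bit_lock(DEMO_LOCK_BIT, &e->flags);
	}

	/* Sleeps (instead of spinning) until the bit is acquired */
	static void demo_lock(struct demo_entry *e)
	{
		wait_on_bit_lock(&e->flags, DEMO_LOCK_BIT,
				 TASK_UNINTERRUPTIBLE);
	}

	/* Releases the bit and wakes up sleeping waiters */
	static void demo_unlock(struct demo_entry *e)
	{
		clear_and_wake_up_bit(DEMO_LOCK_BIT, &e->flags);
	}

Waiters sleep in wait_on_bit_lock() and are woken by
clear_and_wake_up_bit(), so the entry lock costs a single bit in
->flags instead of a full locking primitive per entry.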
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
 drivers/block/zram/zram_drv.c | 105 ++++++++++++++++++++++++++++++----
 drivers/block/zram/zram_drv.h |  20 +++++--
 2 files changed, 108 insertions(+), 17 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 9f5020b077c5..65e16117f2db 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -58,19 +58,99 @@ static void zram_free_page(struct zram *zram, size_t index);
 static int zram_read_from_zspool(struct zram *zram, struct page *page,
 				 u32 index);
 
-static int zram_slot_trylock(struct zram *zram, u32 index)
+static void zram_slot_lock_init(struct zram *zram, u32 index)
 {
-	return spin_trylock(&zram->table[index].lock);
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+	lockdep_init_map(&zram->table[index].dep_map,
+			 "zram->table[index].lock",
+			 &zram->lock_class, 0);
+#endif
+}
+
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+static inline bool __slot_trylock(struct zram *zram, u32 index)
+{
+	struct lockdep_map *dep_map = &zram->table[index].dep_map;
+	unsigned long *lock = &zram->table[index].flags;
+
+	if (!test_and_set_bit_lock(ZRAM_ENTRY_LOCK, lock)) {
+		mutex_acquire(dep_map, 0, 1, _RET_IP_);
+		lock_acquired(dep_map, _RET_IP_);
+		return true;
+	}
+
+	lock_contended(dep_map, _RET_IP_);
+	return false;
+}
+
+static inline void __slot_lock(struct zram *zram, u32 index)
+{
+	struct lockdep_map *dep_map = &zram->table[index].dep_map;
+	unsigned long *lock = &zram->table[index].flags;
+
+	mutex_acquire(dep_map, 0, 0, _RET_IP_);
+	wait_on_bit_lock(lock, ZRAM_ENTRY_LOCK, TASK_UNINTERRUPTIBLE);
+	lock_acquired(dep_map, _RET_IP_);
+}
+
+static inline void __slot_unlock(struct zram *zram, u32 index)
+{
+	struct lockdep_map *dep_map = &zram->table[index].dep_map;
+	unsigned long *lock = &zram->table[index].flags;
+
+	mutex_release(dep_map, _RET_IP_);
+	clear_and_wake_up_bit(ZRAM_ENTRY_LOCK, lock);
+}
+#else
+static inline bool __slot_trylock(struct zram *zram, u32 index)
+{
+	unsigned long *lock = &zram->table[index].flags;
+
+	if (!test_and_set_bit_lock(ZRAM_ENTRY_LOCK, lock))
+		return true;
+	return false;
+}
+
+static inline void __slot_lock(struct zram *zram, u32 index)
+{
+	unsigned long *lock = &zram->table[index].flags;
+
+	wait_on_bit_lock(lock, ZRAM_ENTRY_LOCK, TASK_UNINTERRUPTIBLE);
+}
+
+static inline void __slot_unlock(struct zram *zram, u32 index)
+{
+	unsigned long *lock = &zram->table[index].flags;
+
+	clear_and_wake_up_bit(ZRAM_ENTRY_LOCK, lock);
+}
+#endif /* CONFIG_DEBUG_LOCK_ALLOC */
+
+/*
+ * entry locking rules:
+ *
+ * 1) Lock is exclusive
+ *
+ * 2) lock() function can sleep waiting for the lock
+ *
+ * 3) Lock owner can sleep
+ *
+ * 4) Use TRY lock variant when in atomic context
+ *    - must check return value and handle locking failures
+ */
+static __must_check bool zram_slot_trylock(struct zram *zram, u32 index)
+{
+	return __slot_trylock(zram, index);
 }
 
 static void zram_slot_lock(struct zram *zram, u32 index)
 {
-	spin_lock(&zram->table[index].lock);
+	return __slot_lock(zram, index);
 }
 
 static void zram_slot_unlock(struct zram *zram, u32 index)
 {
-	spin_unlock(&zram->table[index].lock);
+	return __slot_unlock(zram, index);
 }
 
 static inline bool init_done(struct zram *zram)
@@ -93,7 +173,6 @@ static void zram_set_handle(struct zram *zram, u32 index, unsigned long handle)
 	zram->table[index].handle = handle;
 }
 
-/* flag operations require table entry bit_spin_lock() being held */
 static bool zram_test_flag(struct zram *zram, u32 index,
 			enum zram_pageflags flag)
 {
@@ -1473,15 +1552,11 @@ static bool zram_meta_alloc(struct zram *zram, u64 disksize)
 	huge_class_size = zs_huge_class_size(zram->mem_pool);
 
 	for (index = 0; index < num_pages; index++)
-		spin_lock_init(&zram->table[index].lock);
+		zram_slot_lock_init(zram, index);
+
 	return true;
 }
 
-/*
- * To protect concurrent access to the same index entry,
- * caller should hold this table index entry's bit_spinlock to
- * indicate this index entry is accessing.
- */
 static void zram_free_page(struct zram *zram, size_t index)
 {
 	unsigned long handle;
@@ -2625,6 +2700,10 @@ static int zram_add(void)
 	if (ret)
 		goto out_cleanup_disk;
 
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+	lockdep_register_key(&zram->lock_class);
+#endif
+
 	zram_debugfs_register(zram);
 	pr_info("Added device: %s\n", zram->disk->disk_name);
 	return device_id;
@@ -2681,6 +2760,10 @@ static int zram_remove(struct zram *zram)
 	 */
 	zram_reset_device(zram);
 
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+	lockdep_unregister_key(&zram->lock_class);
+#endif
+
 	put_disk(zram->disk);
 	kfree(zram);
 	return 0;
diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
index db78d7c01b9a..794c9234e627 100644
--- a/drivers/block/zram/zram_drv.h
+++ b/drivers/block/zram/zram_drv.h
@@ -28,7 +28,6 @@
 #define ZRAM_SECTOR_PER_LOGICAL_BLOCK	\
 	(1 << (ZRAM_LOGICAL_BLOCK_SHIFT - SECTOR_SHIFT))
 
-
 /*
  * ZRAM is mainly used for memory efficiency so we want to keep memory
  * footprint small and thus squeeze size and zram pageflags into a flags
@@ -46,6 +45,7 @@
 /* Flags for zram pages (table[page_no].flags) */
 enum zram_pageflags {
 	ZRAM_SAME = ZRAM_FLAG_SHIFT,	/* Page consists the same element */
+	ZRAM_ENTRY_LOCK,	/* entry access lock bit */
 	ZRAM_WB,	/* page is stored on backing_device */
 	ZRAM_PP_SLOT,	/* Selected for post-processing */
 	ZRAM_HUGE,	/* Incompressible page */
@@ -58,13 +58,18 @@ enum zram_pageflags {
 	__NR_ZRAM_PAGEFLAGS,
 };
 
-/*-- Data structures */
-
-/* Allocated for each disk page */
+/*
+ * Allocated for each disk page. We use bit-lock (ZRAM_ENTRY_LOCK bit
+ * of flags) to save memory. There can be plenty of entries and standard
+ * locking primitives (e.g. mutex) will significantly increase sizeof()
+ * of each entry and hence of the meta table.
+ */
 struct zram_table_entry {
 	unsigned long handle;
-	unsigned int flags;
-	spinlock_t lock;
+	unsigned long flags;
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+	struct lockdep_map dep_map;
+#endif
 #ifdef CONFIG_ZRAM_TRACK_ENTRY_ACTIME
 	ktime_t ac_time;
 #endif
@@ -137,5 +142,8 @@ struct zram {
 	struct dentry *debugfs_dir;
 #endif
 	atomic_t pp_in_progress;
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+	struct lock_class_key lock_class;
+#endif
 };
 #endif
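
For illustration only, here is how callers would follow the locking
rules documented in the patch; the example_* functions below are
hypothetical and not part of this series:

	/* Sleepable context: may block on the lock and while holding it */
	static void example_sleepable_path(struct zram *zram, u32 index)
	{
		zram_slot_lock(zram, index);	/* may sleep (rule 2) */
		/* owner may sleep too, e.g. in zcomp_compress() (rule 3) */
		zram_slot_unlock(zram, index);
	}

	/* Atomic context: use the TRY variant and handle failure (rule 4) */
	static bool example_atomic_path(struct zram *zram, u32 index)
	{
		if (!zram_slot_trylock(zram, index))
			return false;	/* contended, caller must back off */
		/* non-sleeping work only */
		zram_slot_unlock(zram, index);
		return true;
	}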