From: Sergey Senozhatsky
To: Andrew Morton
Cc: Yosry Ahmed, Kairui Song, Minchan Kim, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Sergey Senozhatsky
Subject: [PATCH v5 00/18] zsmalloc/zram: there be preemption
Date: Wed, 12 Feb 2025 15:26:58 +0900
Message-ID: <20250212063153.179231-1-senozhatsky@chromium.org>

Currently zram runs compression and decompression in non-preemptible
sections, e.g.

    zcomp_stream_get()     // grabs CPU local lock
    zcomp_compress()

or

    zram_slot_lock()       // grabs entry spin-lock
    zcomp_stream_get()     // grabs CPU local lock
    zs_map_object()        // grabs rwlock and CPU local lock
    zcomp_decompress()

This is potentially troublesome for a number of reasons.
For instance, this makes it impossible to use async compression
algorithms and/or H/W compression algorithms, which can wait for OP
completion or resource availability. This also restricts what
compression algorithms can do internally; for example, zstd can
allocate internal state memory for C/D dictionaries:

do_fsync()
 do_writepages()
  zram_bio_write()
   zram_write_page()                          // becomes non-preemptible
    zcomp_compress()
     zstd_compress()
      ZSTD_compress_usingCDict()
       ZSTD_compressBegin_usingCDict_internal()
        ZSTD_resetCCtx_usingCDict()
         ZSTD_resetCCtx_internal()
          zstd_custom_alloc()                 // memory allocation

Not to mention that the system can be configured to maximize
compression ratio at the cost of CPU/HW time (e.g. lz4hc or deflate
with a very high compression level), so zram can stay in a
non-preemptible section (even under a spin-lock and/or rwlock) for an
extended period of time.

Aside from compression algorithms, this also restricts what zram can
do. One particular example is zsmalloc handle allocation in
zram_write_page(), which has an optimistic allocation path
(disallowing direct reclaim) and a pessimistic fallback path. The
fallback has to release the per-CPU compression stream, which holds
the compressed data, before it can sleep, and this forces zram to
compress the page one more time.

This series changes zram to not directly impose atomicity restrictions
on compression algorithms (and on itself), which makes zram write()
fully preemptible; zram read(), sadly, is not always preemptible yet.
There are still indirect atomicity restrictions imposed by zsmalloc.
One notable example is the object mapping API, which returns with:

a) local CPU lock held
b) zspage rwlock held

First, zsmalloc is converted to use a sleepable RW-"lock" (it's an
atomic_t in fact) for zspage migration protection. Second, a new
handle mapping API is introduced which doesn't use per-CPU buffers
(and hence no local CPU lock) and does fewer memcpy() calls, but
requires users to provide a pointer to a temp buffer for object
copy-in (when needed). Third, zram is converted to the new zsmalloc
mapping API, and thus zram read() becomes preemptible.
v4 -> v5:
- switched to preemptible per-CPU comp streams (Yosry)
- switched to preemptible bit-locks for zram entry locking (Andrew)
- added lockdep annotations to new zsmalloc/zram locks (Hillf, Yosry)
- perf measurements
- reworked re-compression loop (a bunch of minor fixes)
- fixed potential physical page leaks on writeback/recompression error paths
- documented new locking rules

Sergey Senozhatsky (18):
  zram: sleepable entry locking
  zram: permit preemption with active compression stream
  zram: remove crypto include
  zram: remove max_comp_streams device attr
  zram: remove two-staged handle allocation
  zram: remove writestall zram_stats member
  zram: limit max recompress prio to num_active_comps
  zram: filter out recomp targets based on priority
  zram: rework recompression loop
  zsmalloc: factor out pool locking helpers
  zsmalloc: factor out size-class locking helpers
  zsmalloc: make zspage lock preemptible
  zsmalloc: introduce new object mapping API
  zram: switch to new zsmalloc object mapping API
  zram: permit reclaim in zstd custom allocator
  zram: do not leak page on recompress_store error path
  zram: do not leak page on writeback_store error path
  zram: add might_sleep to zcomp API

 Documentation/ABI/testing/sysfs-block-zram  |   8 -
 Documentation/admin-guide/blockdev/zram.rst |  36 +-
 drivers/block/zram/backend_zstd.c           |  11 +-
 drivers/block/zram/zcomp.c                  |  43 +-
 drivers/block/zram/zcomp.h                  |   8 +-
 drivers/block/zram/zram_drv.c               | 286 +++++++------
 drivers/block/zram/zram_drv.h               |  22 +-
 include/linux/zsmalloc.h                    |   8 +
 mm/zsmalloc.c                               | 420 +++++++++++++++-----
 9 files changed, 536 insertions(+), 306 deletions(-)