From: Sergey Senozhatsky
To: Andrew Morton
Cc: Yosry Ahmed, Hillf Danton, Kairui Song, Sebastian Andrzej Siewior,
    Minchan Kim, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    Sergey Senozhatsky
Subject: [PATCH v8 00/17] zsmalloc/zram: there be preemption
Date: Sat, 22 Feb 2025 07:25:31 +0900
Message-ID: <20250221222958.2225035-1-senozhatsky@chromium.org>

Currently zram runs compression and decompression in non-preemptible
sections, e.g.

    zcomp_stream_get()     // grabs CPU local lock
    zcomp_compress()

or

    zram_slot_lock()       // grabs entry spin-lock
    zcomp_stream_get()     // grabs CPU local lock
    zs_map_object()        // grabs rwlock and CPU local lock
    zcomp_decompress()

This is potentially troublesome for a number of reasons.
For instance, this makes it impossible to use async compression
algorithms and/or H/W compression algorithms, which can wait for
operation completion or resource availability.  This also restricts
what compression algorithms can do internally; for example, zstd can
allocate internal state memory for C/D dictionaries:

    do_fsync()
     do_writepages()
      zram_bio_write()
       zram_write_page()                          // becomes non-preemptible
        zcomp_compress()
         zstd_compress()
          ZSTD_compress_usingCDict()
           ZSTD_compressBegin_usingCDict_internal()
            ZSTD_resetCCtx_usingCDict()
             ZSTD_resetCCtx_internal()
              zstd_custom_alloc()                 // memory allocation

Not to mention that the system can be configured to maximize
compression ratio at the cost of CPU/HW time (e.g. lz4hc or deflate
with a very high compression level), so zram can stay in a
non-preemptible section (even under spin-lock and/or rwlock) for an
extended period of time.  Aside from compression algorithms, this also
restricts what zram can do.  One particular example is the
zram_write_page() zsmalloc handle allocation, which has an optimistic
allocation (disallowing direct reclaim) and a pessimistic fallback
path, which then forces zram to compress the page one more time.

This series changes zram to not directly impose atomicity restrictions
on compression algorithms (and on itself), which makes zram write()
fully preemptible; zram read(), sadly, is not always preemptible yet.
There are still indirect atomicity restrictions imposed by zsmalloc.
One notable example is the object mapping API, which returns with:
a) local CPU lock held
b) zspage rwlock held

First, zsmalloc's zspage lock is converted from rwlock to a special
type of RW-lookalike lock with some extra guarantees/features.  Second,
a new handle mapping is introduced which doesn't use per-CPU buffers
(and hence no local CPU lock), does fewer memcpy() calls, but requires
users to provide a pointer to a temp buffer for object copy-in (when
needed).
Third, zram is converted to the new zsmalloc mapping API and thus
zram read() becomes preemptible.

v7 -> v8
- also provide helpers for lockdep class_lock to remove even more
  ifdef-s (Yosry)
- moved zsmalloc lockdep class_lock registration so that on error we
  don't un-register a not-yet-registered class
- tweaked some commit messages

Sergey Senozhatsky (17):
  zram: sleepable entry locking
  zram: permit preemption with active compression stream
  zram: remove unused crypto include
  zram: remove max_comp_streams device attr
  zram: remove second stage of handle allocation
  zram: remove writestall zram_stats member
  zram: limit max recompress prio to num_active_comps
  zram: filter out recomp targets based on priority
  zram: rework recompression loop
  zsmalloc: rename pool lock
  zsmalloc: make zspage lock preemptible
  zsmalloc: introduce new object mapping API
  zram: switch to new zsmalloc object mapping API
  zram: permit reclaim in zstd custom allocator
  zram: do not leak page on recompress_store error path
  zram: do not leak page on writeback_store error path
  zram: add might_sleep to zcomp API

 Documentation/ABI/testing/sysfs-block-zram  |   8 -
 Documentation/admin-guide/blockdev/zram.rst |  36 +-
 drivers/block/zram/backend_zstd.c           |  11 +-
 drivers/block/zram/zcomp.c                  |  48 ++-
 drivers/block/zram/zcomp.h                  |   8 +-
 drivers/block/zram/zram_drv.c               | 283 ++++++++--------
 drivers/block/zram/zram_drv.h               |  22 +-
 include/linux/zsmalloc.h                    |   8 +
 mm/zsmalloc.c                               | 351 ++++++++++++++++----
 9 files changed, 488 insertions(+), 287 deletions(-)

---
2.48.1.601.g30ceb7b040-goog
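
For illustration only, here is a minimal userspace sketch of the idea
behind the caller-provided temp buffer mentioned above: an object that
sits entirely within one page can be handed back as a direct pointer
(no memcpy() at all), while an object that straddles two pages is
assembled into the caller's buffer.  This is a model under stated
assumptions, not the zsmalloc code from this series; `struct obj_ref`,
`obj_read_begin()` and `PAGE_SZ` are hypothetical names invented for
the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical tiny "page" size so the example is easy to follow. */
#define PAGE_SZ 16

/*
 * Hypothetical model of a stored object: it starts at byte offset
 * `off` in pages[0] and may spill over into pages[1].  The two pages
 * are not assumed to be adjacent in memory.
 */
struct obj_ref {
	const unsigned char *pages[2];
	size_t off;	/* start offset within pages[0] */
	size_t len;	/* object length in bytes */
};

/*
 * Return a pointer to the object's bytes.  If the object fits in the
 * first page, point straight into that page (zero copies).  Otherwise
 * copy both halves into the caller-provided `tmp` buffer, which must
 * hold at least `len` bytes.  No per-CPU buffer is involved, so no
 * CPU-local lock would be needed in this model.
 */
const void *obj_read_begin(const struct obj_ref *o, void *tmp)
{
	assert(o->off < PAGE_SZ && o->len <= PAGE_SZ);

	if (o->off + o->len <= PAGE_SZ)
		return o->pages[0] + o->off;

	size_t first = PAGE_SZ - o->off;	/* bytes available in page 0 */

	memcpy(tmp, o->pages[0] + o->off, first);
	memcpy((unsigned char *)tmp + first, o->pages[1], o->len - first);
	return tmp;
}
```

A caller in this model would compare the returned pointer against its
own buffer only if it cares whether a copy happened; otherwise it just
reads `len` bytes from whatever pointer comes back.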