From: Sergey Senozhatsky
To: Andrew Morton
Cc: Yosry Ahmed, Hillf Danton, Kairui Song, Sebastian Andrzej Siewior,
 Minchan Kim, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 Sergey Senozhatsky
Subject: [PATCH v7 12/17] zsmalloc: introduce new object mapping API
Date: Fri, 21 Feb 2025 18:38:05 +0900
Message-ID: <20250221093832.1949691-13-senozhatsky@chromium.org>
In-Reply-To: <20250221093832.1949691-1-senozhatsky@chromium.org>
References: <20250221093832.1949691-1-senozhatsky@chromium.org>
X-Patchwork-Id: 13985088

Current object mapping API is a little cumbersome. First, it's
inconsistent, sometimes it returns with page-faults disabled and
sometimes with page-faults enabled. Second, and most importantly,
it enforces atomicity restrictions on its users.
zs_map_object() has to return a linear object address which is not
always possible because some objects span multiple physical
(non-contiguous) pages. For such objects zsmalloc uses a per-CPU
buffer to which object's data is copied before a pointer to that
per-CPU buffer is returned back to the caller. This leads to another,
final, issue - extra memcpy(). Since the caller gets a pointer to
per-CPU buffer it can memcpy() data only to that buffer, and during
zs_unmap_object() zsmalloc will memcpy() from that per-CPU buffer to
physical pages that object in question spans across.

New API splits functions by access mode:
- zs_obj_read_begin(handle, local_copy)
  Returns a pointer to handle memory. For objects that span two
  physical pages a local_copy buffer is used to store object's
  data before the address is returned to the caller. Otherwise
  the object's page is kmap_local mapped directly.

- zs_obj_read_end(handle, buf)
  Unmaps the page if it was kmap_local mapped by zs_obj_read_begin().

- zs_obj_write(handle, buf, len)
  Copies len-bytes from compression buffer to handle memory
  (takes care of objects that span two pages). This does not
  need any additional (e.g. per-CPU) buffers and writes the data
  directly to zsmalloc pool pages.

In terms of performance, on a synthetic and completely reproducible
test that allocates fixed number of objects of fixed sizes and
iterates over those objects, first mapping in RO then in RW mode:

OLD API
=======

3 first results out of 10

  369,205,778      instructions        #    0.80  insn per cycle
   40,467,926      branches            #  113.732 M/sec

  369,002,122      instructions        #    0.62  insn per cycle
   40,426,145      branches            #  189.361 M/sec

  369,036,706      instructions        #    0.63  insn per cycle
   40,430,860      branches            #  204.105 M/sec

[..]

NEW API
=======

3 first results out of 10

  265,799,293      instructions        #    0.51  insn per cycle
   29,834,567      branches            #  170.281 M/sec

  265,765,970      instructions        #    0.55  insn per cycle
   29,829,019      branches            #  161.602 M/sec

  265,764,702      instructions        #    0.51  insn per cycle
   29,828,015      branches            #  189.677 M/sec

[..]

T-test on all 10 runs
=====================

Difference at 95.0% confidence
   -1.03219e+08 +/- 55308.7
   -27.9705% +/- 0.0149878%
   (Student's t, pooled s = 58864.4)

The old API will stay around until the remaining users switch
to the new one. After that we'll also remove zsmalloc per-CPU
buffer and CPU hotplug handling.

The split of map(RO) and map(WO) into read_{begin/end}/write is
suggested by Yosry Ahmed.

Suggested-by: Yosry Ahmed
Signed-off-by: Sergey Senozhatsky
---
 include/linux/zsmalloc.h |   8 +++
 mm/zsmalloc.c            | 129 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 137 insertions(+)

diff --git a/include/linux/zsmalloc.h b/include/linux/zsmalloc.h
index a48cd0ffe57d..7d70983cf398 100644
--- a/include/linux/zsmalloc.h
+++ b/include/linux/zsmalloc.h
@@ -58,4 +58,12 @@ unsigned long zs_compact(struct zs_pool *pool);
 unsigned int zs_lookup_class_index(struct zs_pool *pool, unsigned int size);
 
 void zs_pool_stats(struct zs_pool *pool, struct zs_pool_stats *stats);
+
+void *zs_obj_read_begin(struct zs_pool *pool, unsigned long handle,
+			void *local_copy);
+void zs_obj_read_end(struct zs_pool *pool, unsigned long handle,
+		     void *handle_mem);
+void zs_obj_write(struct zs_pool *pool, unsigned long handle,
+		  void *handle_mem, size_t mem_len);
+
 #endif
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 250f1fddaf34..71d030500d2b 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -1375,6 +1375,135 @@ void zs_unmap_object(struct zs_pool *pool, unsigned long handle)
 }
 EXPORT_SYMBOL_GPL(zs_unmap_object);
 
+void *zs_obj_read_begin(struct zs_pool *pool, unsigned long handle,
+			void *local_copy)
+{
+	struct zspage *zspage;
+	struct zpdesc *zpdesc;
+	unsigned long obj, off;
+	unsigned int obj_idx;
+	struct size_class *class;
+	void *addr;
+
+	WARN_ON(in_interrupt());
+
+	/* Guarantee we can get zspage from handle safely */
+	read_lock(&pool->lock);
+	obj = handle_to_obj(handle);
+	obj_to_location(obj, &zpdesc, &obj_idx);
+	zspage = get_zspage(zpdesc);
+
+	/* Make sure migration doesn't move any pages in this zspage */
+	zspage_read_lock(zspage);
+	read_unlock(&pool->lock);
+
+	class = zspage_class(pool, zspage);
+	off = offset_in_page(class->size * obj_idx);
+
+	if (off + class->size <= PAGE_SIZE) {
+		/* this object is contained entirely within a page */
+		addr = kmap_local_zpdesc(zpdesc);
+		addr += off;
+	} else {
+		size_t sizes[2];
+
+		/* this object spans two pages */
+		sizes[0] = PAGE_SIZE - off;
+		sizes[1] = class->size - sizes[0];
+		addr = local_copy;
+
+		memcpy_from_page(addr, zpdesc_page(zpdesc),
+				 off, sizes[0]);
+		zpdesc = get_next_zpdesc(zpdesc);
+		memcpy_from_page(addr + sizes[0],
+				 zpdesc_page(zpdesc),
+				 0, sizes[1]);
+	}
+
+	if (!ZsHugePage(zspage))
+		addr += ZS_HANDLE_SIZE;
+
+	return addr;
+}
+EXPORT_SYMBOL_GPL(zs_obj_read_begin);
+
+void zs_obj_read_end(struct zs_pool *pool, unsigned long handle,
+		     void *handle_mem)
+{
+	struct zspage *zspage;
+	struct zpdesc *zpdesc;
+	unsigned long obj, off;
+	unsigned int obj_idx;
+	struct size_class *class;
+
+	obj = handle_to_obj(handle);
+	obj_to_location(obj, &zpdesc, &obj_idx);
+	zspage = get_zspage(zpdesc);
+	class = zspage_class(pool, zspage);
+	off = offset_in_page(class->size * obj_idx);
+
+	if (off + class->size <= PAGE_SIZE) {
+		if (!ZsHugePage(zspage))
+			off += ZS_HANDLE_SIZE;
+		handle_mem -= off;
+		kunmap_local(handle_mem);
+	}
+
+	zspage_read_unlock(zspage);
+}
+EXPORT_SYMBOL_GPL(zs_obj_read_end);
+
+void zs_obj_write(struct zs_pool *pool, unsigned long handle,
+		  void *handle_mem, size_t mem_len)
+{
+	struct zspage *zspage;
+	struct zpdesc *zpdesc;
+	unsigned long obj, off;
+	unsigned int obj_idx;
+	struct size_class *class;
+
+	WARN_ON(in_interrupt());
+
+	/* Guarantee we can get zspage from handle safely */
+	read_lock(&pool->lock);
+	obj = handle_to_obj(handle);
+	obj_to_location(obj, &zpdesc, &obj_idx);
+	zspage = get_zspage(zpdesc);
+
+	/* Make sure migration doesn't move any pages in this zspage */
+	zspage_read_lock(zspage);
+	read_unlock(&pool->lock);
+
+	class = zspage_class(pool, zspage);
+	off = offset_in_page(class->size * obj_idx);
+
+	if (off + class->size <= PAGE_SIZE) {
+		/* this object is contained entirely within a page */
+		void *dst = kmap_local_zpdesc(zpdesc);
+
+		if (!ZsHugePage(zspage))
+			off += ZS_HANDLE_SIZE;
+		memcpy(dst + off, handle_mem, mem_len);
+		kunmap_local(dst);
+	} else {
+		/* this object spans two pages */
+		size_t sizes[2];
+
+		off += ZS_HANDLE_SIZE;
+		sizes[0] = PAGE_SIZE - off;
+		sizes[1] = mem_len - sizes[0];
+
+		memcpy_to_page(zpdesc_page(zpdesc), off,
+			       handle_mem, sizes[0]);
+		zpdesc = get_next_zpdesc(zpdesc);
+		memcpy_to_page(zpdesc_page(zpdesc), 0,
+			       handle_mem + sizes[0], sizes[1]);
+	}
+
+	zspage_read_unlock(zspage);
+}
+EXPORT_SYMBOL_GPL(zs_obj_write);
+
 /**
  * zs_huge_class_size() - Returns the size (in bytes) of the first huge
  * zsmalloc &size_class.
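
Illustration only, not part of the patch: a minimal userspace sketch of
the split-copy idea behind zs_obj_read_begin() and zs_obj_write(),
assuming an object that either fits in one page or spans exactly two
non-contiguous pages. Every model_* name and MODEL_PAGE_SIZE is
hypothetical. The handle header (ZS_HANDLE_SIZE), kmap_local mapping and
zspage locking are deliberately left out, and the fits-in-one-page check
here uses the caller's length rather than class->size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's PAGE_SIZE. */
#define MODEL_PAGE_SIZE 4096u

/* Two separately allocated pages, i.e. not contiguous in memory. */
struct model_zspage {
	unsigned char *pages[2];
};

/* Write path: copy directly into the page(s), no bounce buffer. */
static void model_obj_write(struct model_zspage *zs, size_t off,
			    const void *src, size_t len)
{
	size_t sizes[2];

	assert(off < MODEL_PAGE_SIZE && len <= MODEL_PAGE_SIZE);

	if (off + len <= MODEL_PAGE_SIZE) {
		/* object is contained entirely within the first page */
		memcpy(zs->pages[0] + off, src, len);
		return;
	}

	/* object spans two pages: tail of page 0, head of page 1 */
	sizes[0] = MODEL_PAGE_SIZE - off;
	sizes[1] = len - sizes[0];
	memcpy(zs->pages[0] + off, src, sizes[0]);
	memcpy(zs->pages[1], (const unsigned char *)src + sizes[0], sizes[1]);
}

/*
 * Read path: return a pointer into the page when the object is
 * contained in it, otherwise linearize it into the caller's buffer.
 */
static const void *model_obj_read_begin(const struct model_zspage *zs,
					size_t off, size_t len,
					void *local_copy)
{
	size_t sizes[2];

	assert(off < MODEL_PAGE_SIZE && len <= MODEL_PAGE_SIZE);

	if (off + len <= MODEL_PAGE_SIZE)
		return zs->pages[0] + off;

	sizes[0] = MODEL_PAGE_SIZE - off;
	sizes[1] = len - sizes[0];
	memcpy(local_copy, zs->pages[0] + off, sizes[0]);
	memcpy((unsigned char *)local_copy + sizes[0], zs->pages[1], sizes[1]);
	return local_copy;
}

/* Returns 0 when both the contained and the spanning case round-trip. */
int model_selftest(void)
{
	struct model_zspage zs;
	unsigned char src[100], local_copy[100];
	const unsigned char *p;
	size_t i, off;
	int ret = -1;

	zs.pages[0] = calloc(1, MODEL_PAGE_SIZE);
	zs.pages[1] = calloc(1, MODEL_PAGE_SIZE);
	if (!zs.pages[0] || !zs.pages[1])
		goto out;

	for (i = 0; i < sizeof(src); i++)
		src[i] = (unsigned char)(i + 1);

	/* Contained object: the returned pointer aliases the page. */
	off = 128;
	model_obj_write(&zs, off, src, sizeof(src));
	p = model_obj_read_begin(&zs, off, sizeof(src), local_copy);
	if (p != zs.pages[0] + off || memcmp(p, src, sizeof(src)))
		goto out;

	/* Spanning object: 40 bytes on page 0, 60 bytes on page 1. */
	off = MODEL_PAGE_SIZE - 40;
	model_obj_write(&zs, off, src, sizeof(src));
	p = model_obj_read_begin(&zs, off, sizeof(src), local_copy);
	if (p != local_copy || memcmp(p, src, sizeof(src)))
		goto out;
	if (zs.pages[1][0] != src[40])
		goto out;

	ret = 0;
out:
	free(zs.pages[0]);
	free(zs.pages[1]);
	return ret;
}
```

In this model the write path never touches an intermediate buffer, and
the read path copies only when the object crosses the page boundary,
which is the property the commit message attributes to the new API.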