From patchwork Mon Jan 27 07:59:30 2025
X-Patchwork-Submitter: Sergey Senozhatsky <senozhatsky@chromium.org>
X-Patchwork-Id: 13951057
From: Sergey Senozhatsky <senozhatsky@chromium.org>
To: Andrew Morton, Minchan Kim, Johannes Weiner, Yosry Ahmed, Nhat Pham
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Sergey Senozhatsky
Subject: [RFC PATCH 5/6] zsmalloc: introduce handle mapping API
Date: Mon, 27 Jan 2025 16:59:30 +0900
Message-ID: <20250127080254.1302026-6-senozhatsky@chromium.org>
In-Reply-To: <20250127080254.1302026-1-senozhatsky@chromium.org>
References: <20250127080254.1302026-1-senozhatsky@chromium.org>
MIME-Version: 1.0
Introduce a new API to map/unmap a zsmalloc handle/object. The key
difference is that the new API does not impose atomicity restrictions
on its users: unlike zs_map_object(), which returns with page faults
and preemption disabled, the handle mapping API does not need a
per-CPU vm-area, because users are required to provide an aux buffer
for objects that span several physical pages.
Keep zs_map_object/zs_unmap_object for the time being, as there are
still users of them, but eventually the old API will be removed.

Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
 include/linux/zsmalloc.h |  29 ++++++++
 mm/zsmalloc.c            | 148 ++++++++++++++++++++++++++++-----------
 2 files changed, 138 insertions(+), 39 deletions(-)

diff --git a/include/linux/zsmalloc.h b/include/linux/zsmalloc.h
index a48cd0ffe57d..72d84537dd38 100644
--- a/include/linux/zsmalloc.h
+++ b/include/linux/zsmalloc.h
@@ -58,4 +58,33 @@ unsigned long zs_compact(struct zs_pool *pool);
 unsigned int zs_lookup_class_index(struct zs_pool *pool, unsigned int size);
 
 void zs_pool_stats(struct zs_pool *pool, struct zs_pool_stats *stats);
+
+struct zs_handle_mapping {
+	unsigned long handle;
+	/* Points to start of the object data either within local_copy or
+	 * within local_mapping. This is what callers should use to access
+	 * or modify handle data.
+	 */
+	void *handle_mem;
+
+	enum zs_mapmode mode;
+	union {
+		/*
+		 * Handle object data copied, because it spans across several
+		 * (non-contiguous) physical pages. This pointer should be
+		 * set by the zs_map_handle() caller beforehand and should
+		 * never be accessed directly.
+		 */
+		void *local_copy;
+		/*
+		 * Handle object mapped directly. Should never be used
+		 * directly.
+		 */
+		void *local_mapping;
+	};
+};
+
+int zs_map_handle(struct zs_pool *pool, struct zs_handle_mapping *map);
+void zs_unmap_handle(struct zs_pool *pool, struct zs_handle_mapping *map);
+
 #endif
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index a5c1f9852072..281bba4a3277 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -1132,18 +1132,14 @@ static inline void __zs_cpu_down(struct mapping_area *area)
 	area->vm_buf = NULL;
 }
 
-static void *__zs_map_object(struct mapping_area *area,
-			     struct zpdesc *zpdescs[2], int off, int size)
+static void zs_obj_copyin(void *buf, struct zpdesc *zpdesc, int off, int size)
 {
+	struct zpdesc *zpdescs[2];
 	size_t sizes[2];
-	char *buf = area->vm_buf;
-
-	/* disable page faults to match kmap_local_page() return conditions */
-	pagefault_disable();
-
-	/* no read fastpath */
-	if (area->vm_mm == ZS_MM_WO)
-		goto out;
+
+	zpdescs[0] = zpdesc;
+	zpdescs[1] = get_next_zpdesc(zpdesc);
+	BUG_ON(!zpdescs[1]);
 
 	sizes[0] = PAGE_SIZE - off;
 	sizes[1] = size - sizes[0];
@@ -1151,21 +1147,17 @@ static void *__zs_map_object(struct mapping_area *area,
 	/* copy object to per-cpu buffer */
 	memcpy_from_page(buf, zpdesc_page(zpdescs[0]), off, sizes[0]);
 	memcpy_from_page(buf + sizes[0], zpdesc_page(zpdescs[1]), 0, sizes[1]);
-out:
-	return area->vm_buf;
 }
 
-static void __zs_unmap_object(struct mapping_area *area,
-			      struct zpdesc *zpdescs[2], int off, int size)
+static void zs_obj_copyout(void *buf, struct zpdesc *zpdesc, int off, int size)
 {
+	struct zpdesc *zpdescs[2];
 	size_t sizes[2];
-	char *buf;
 
-	/* no write fastpath */
-	if (area->vm_mm == ZS_MM_RO)
-		goto out;
+	zpdescs[0] = zpdesc;
+	zpdescs[1] = get_next_zpdesc(zpdesc);
+	BUG_ON(!zpdescs[1]);
 
-	buf = area->vm_buf;
 	buf = buf + ZS_HANDLE_SIZE;
 	size -= ZS_HANDLE_SIZE;
 	off += ZS_HANDLE_SIZE;
@@ -1176,10 +1168,6 @@ static void __zs_unmap_object(struct mapping_area *area,
 	/* copy per-cpu buffer to object */
 	memcpy_to_page(zpdesc_page(zpdescs[0]), off, buf, sizes[0]);
 	memcpy_to_page(zpdesc_page(zpdescs[1]), 0,
 		       buf + sizes[0], sizes[1]);
-
-out:
-	/* enable page faults to match kunmap_local() return conditions */
-	pagefault_enable();
 }
 
 static int zs_cpu_prepare(unsigned int cpu)
@@ -1260,6 +1248,8 @@ EXPORT_SYMBOL_GPL(zs_get_total_pages);
  * against nested mappings.
  *
  * This function returns with preemption and page faults disabled.
+ *
+ * NOTE: this function is deprecated and will be removed.
  */
 void *zs_map_object(struct zs_pool *pool, unsigned long handle,
 		    enum zs_mapmode mm)
@@ -1268,10 +1258,8 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle,
 	struct zpdesc *zpdesc;
 	unsigned long obj, off;
 	unsigned int obj_idx;
-
 	struct size_class *class;
 	struct mapping_area *area;
-	struct zpdesc *zpdescs[2];
 	void *ret;
 
 	/*
@@ -1309,12 +1297,14 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle,
 		goto out;
 	}
 
-	/* this object spans two pages */
-	zpdescs[0] = zpdesc;
-	zpdescs[1] = get_next_zpdesc(zpdesc);
-	BUG_ON(!zpdescs[1]);
+	ret = area->vm_buf;
+	/* disable page faults to match kmap_local_page() return conditions */
+	pagefault_disable();
+	if (mm != ZS_MM_WO) {
+		/* this object spans two pages */
+		zs_obj_copyin(area->vm_buf, zpdesc, off, class->size);
+	}
 
-	ret = __zs_map_object(area, zpdescs, off, class->size);
 out:
 	if (likely(!ZsHugePage(zspage)))
 		ret += ZS_HANDLE_SIZE;
@@ -1323,13 +1313,13 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle,
 }
 EXPORT_SYMBOL_GPL(zs_map_object);
 
+/* NOTE: this function is deprecated and will be removed. */
 void zs_unmap_object(struct zs_pool *pool, unsigned long handle)
 {
 	struct zspage *zspage;
 	struct zpdesc *zpdesc;
 	unsigned long obj, off;
 	unsigned int obj_idx;
-
 	struct size_class *class;
 	struct mapping_area *area;
 
@@ -1340,23 +1330,103 @@ void zs_unmap_object(struct zs_pool *pool, unsigned long handle)
 	off = offset_in_page(class->size * obj_idx);
 
 	area = this_cpu_ptr(&zs_map_area);
-	if (off + class->size <= PAGE_SIZE)
+	if (off + class->size <= PAGE_SIZE) {
 		kunmap_local(area->vm_addr);
-	else {
-		struct zpdesc *zpdescs[2];
+		goto out;
+	}
 
-		zpdescs[0] = zpdesc;
-		zpdescs[1] = get_next_zpdesc(zpdesc);
-		BUG_ON(!zpdescs[1]);
+	if (area->vm_mm != ZS_MM_RO)
+		zs_obj_copyout(area->vm_buf, zpdesc, off, class->size);
+	/* enable page faults to match kunmap_local() return conditions */
+	pagefault_enable();
 
-		__zs_unmap_object(area, zpdescs, off, class->size);
-	}
+out:
 	local_unlock(&zs_map_area.lock);
-
 	zspage_read_unlock(zspage);
 }
 EXPORT_SYMBOL_GPL(zs_unmap_object);
 
+void zs_unmap_handle(struct zs_pool *pool, struct zs_handle_mapping *map)
+{
+	struct zspage *zspage;
+	struct zpdesc *zpdesc;
+	unsigned long obj, off;
+	unsigned int obj_idx;
+	struct size_class *class;
+
+	obj = handle_to_obj(map->handle);
+	obj_to_location(obj, &zpdesc, &obj_idx);
+	zspage = get_zspage(zpdesc);
+	class = zspage_class(pool, zspage);
+	off = offset_in_page(class->size * obj_idx);
+
+	if (off + class->size <= PAGE_SIZE) {
+		kunmap_local(map->local_mapping);
+		goto out;
+	}
+
+	if (map->mode != ZS_MM_RO)
+		zs_obj_copyout(map->local_copy, zpdesc, off, class->size);
+
+out:
+	zspage_read_unlock(zspage);
+}
+EXPORT_SYMBOL_GPL(zs_unmap_handle);
+
+int zs_map_handle(struct zs_pool *pool, struct zs_handle_mapping *map)
+{
+	struct zspage *zspage;
+	struct zpdesc *zpdesc;
+	unsigned long obj, off;
+	unsigned int obj_idx;
+	struct size_class *class;
+
+	WARN_ON(in_interrupt());
+
+	/* It guarantees it can get zspage from handle safely */
+	pool_read_lock(pool);
+	obj = handle_to_obj(map->handle);
+	obj_to_location(obj, &zpdesc, &obj_idx);
+	zspage = get_zspage(zpdesc);
+
+	/*
+	 * migration cannot move any zpages in this zspage. Here, class->lock
+	 * is too heavy since callers would take some time until they calls
+	 * zs_unmap_object API so delegate the locking from class to zspage
+	 * which is smaller granularity.
+	 */
+	zspage_read_lock(zspage);
+	pool_read_unlock(pool);
+
+	class = zspage_class(pool, zspage);
+	off = offset_in_page(class->size * obj_idx);
+
+	if (off + class->size <= PAGE_SIZE) {
+		/* this object is contained entirely within a page */
+		map->local_mapping = kmap_local_zpdesc(zpdesc);
+		map->handle_mem = map->local_mapping + off;
+		goto out;
+	}
+
+	if (WARN_ON_ONCE(!map->local_copy)) {
+		zspage_read_unlock(zspage);
+		return -EINVAL;
+	}
+
+	map->handle_mem = map->local_copy;
+	if (map->mode != ZS_MM_WO) {
+		/* this object spans two pages */
+		zs_obj_copyin(map->local_copy, zpdesc, off, class->size);
+	}
+
+out:
+	if (likely(!ZsHugePage(zspage)))
+		map->handle_mem += ZS_HANDLE_SIZE;

+	return 0;
+}
+EXPORT_SYMBOL_GPL(zs_map_handle);
+
 /**
  * zs_huge_class_size() - Returns the size (in bytes) of the first huge
  * zsmalloc &size_class.