From patchwork Mon Apr 3 12:27:38 2023
X-Patchwork-Submitter: Peng Zhang
X-Patchwork-Id: 13198120
From: Peng Zhang <zhangpeng.00@bytedance.com>
To: glider@google.com, elver@google.com, dvyukov@google.com, akpm@linux-foundation.org
Cc: kasan-dev@googlegroups.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Peng Zhang <zhangpeng.00@bytedance.com>
Subject: [PATCH v2] mm: kfence: Improve the performance of __kfence_alloc() and __kfence_free()
Date: Mon, 3 Apr 2023 20:27:38 +0800
Message-Id: <20230403122738.6006-1-zhangpeng.00@bytedance.com>
X-Mailer: git-send-email 2.37.0 (Apple Git-136)
MIME-Version: 1.0
List-ID: <linux-mm.kvack.org>
In __kfence_alloc() and __kfence_free(), we set and check the canary bytes. If the size of the object is close to 0, nearly 4k memory accesses are required, because the canary is set and checked byte by byte.
The canary is currently defined as follows:

KFENCE_CANARY_PATTERN(addr) ((u8)0xaa ^ (u8)((unsigned long)(addr) & 0x7))

Observe that the canary only depends on the lower three bits of the address, so every 8 bytes of canary are the same. We can access an 8-byte canary each time instead of going byte by byte, thereby reducing nearly 4k memory accesses to 4k/8.

Use the bcc tool funclatency to measure the latency of __kfence_alloc() and __kfence_free(); the numbers (with the latency distribution omitted) are posted below. Though different object sizes will have an impact on the measurement, we ignore it for now and assume the average object size is roughly equal.

Before patching:
__kfence_alloc: avg = 5055 nsecs, total: 5515252 nsecs, count: 1091
__kfence_free:  avg = 5319 nsecs, total: 9735130 nsecs, count: 1830

After patching:
__kfence_alloc: avg = 3597 nsecs, total: 6428491 nsecs, count: 1787
__kfence_free:  avg = 3046 nsecs, total: 3415390 nsecs, count: 1121

The numbers indicate a ~30% - ~40% performance improvement.

Signed-off-by: Peng Zhang
Reviewed-by: Marco Elver
---
 mm/kfence/core.c   | 70 ++++++++++++++++++++++++++++++++--------------
 mm/kfence/kfence.h | 10 ++++++-
 mm/kfence/report.c |  2 +-
 3 files changed, 59 insertions(+), 23 deletions(-)

diff --git a/mm/kfence/core.c b/mm/kfence/core.c
index 79c94ee55f97..b7fe2a2493a0 100644
--- a/mm/kfence/core.c
+++ b/mm/kfence/core.c
@@ -297,20 +297,13 @@ metadata_update_state(struct kfence_metadata *meta, enum kfence_object_state nex
 	WRITE_ONCE(meta->state, next);
 }
 
-/* Write canary byte to @addr. */
-static inline bool set_canary_byte(u8 *addr)
-{
-	*addr = KFENCE_CANARY_PATTERN(addr);
-	return true;
-}
-
 /* Check canary byte at @addr. */
 static inline bool check_canary_byte(u8 *addr)
 {
 	struct kfence_metadata *meta;
 	unsigned long flags;
 
-	if (likely(*addr == KFENCE_CANARY_PATTERN(addr)))
+	if (likely(*addr == KFENCE_CANARY_PATTERN_U8(addr)))
 		return true;
 
 	atomic_long_inc(&counters[KFENCE_COUNTER_BUGS]);
@@ -323,15 +316,31 @@ static inline bool check_canary_byte(u8 *addr)
 	return false;
 }
 
-/* __always_inline this to ensure we won't do an indirect call to fn. */
-static __always_inline void for_each_canary(const struct kfence_metadata *meta, bool (*fn)(u8 *))
+static inline void set_canary(const struct kfence_metadata *meta)
 {
 	const unsigned long pageaddr = ALIGN_DOWN(meta->addr, PAGE_SIZE);
-	unsigned long addr;
+	unsigned long addr = pageaddr;
+
+	/*
+	 * The canary may be written to part of the object memory, but it does
+	 * not affect it. The user should initialize the object before using it.
+	 */
+	for (; addr < meta->addr; addr += sizeof(u64))
+		*((u64 *)addr) = KFENCE_CANARY_PATTERN_U64;
+
+	addr = ALIGN_DOWN(meta->addr + meta->size, sizeof(u64));
+	for (; addr - pageaddr < PAGE_SIZE; addr += sizeof(u64))
+		*((u64 *)addr) = KFENCE_CANARY_PATTERN_U64;
+}
+
+static inline void check_canary(const struct kfence_metadata *meta)
+{
+	const unsigned long pageaddr = ALIGN_DOWN(meta->addr, PAGE_SIZE);
+	unsigned long addr = pageaddr;
 
 	/*
-	 * We'll iterate over each canary byte per-side until fn() returns
-	 * false. However, we'll still iterate over the canary bytes to the
+	 * We'll iterate over each canary byte per-side until a corrupted byte
+	 * is found. However, we'll still iterate over the canary bytes to the
 	 * right of the object even if there was an error in the canary bytes to
 	 * the left of the object. Specifically, if check_canary_byte()
 	 * generates an error, showing both sides might give more clues as to
@@ -339,16 +348,35 @@ static __always_inline void for_each_canary(const struct kfence_metadata *meta,
 	 */
 
 	/* Apply to left of object. */
-	for (addr = pageaddr; addr < meta->addr; addr++) {
-		if (!fn((u8 *)addr))
+	for (; meta->addr - addr >= sizeof(u64); addr += sizeof(u64)) {
+		if (unlikely(*((u64 *)addr) != KFENCE_CANARY_PATTERN_U64))
 			break;
 	}
 
-	/* Apply to right of object. */
-	for (addr = meta->addr + meta->size; addr < pageaddr + PAGE_SIZE; addr++) {
-		if (!fn((u8 *)addr))
+	/*
+	 * If the canary is corrupted in a certain 64 bytes, or the canary
+	 * memory cannot be completely covered by multiple consecutive 64 bytes,
+	 * it needs to be checked one by one.
+	 */
+	for (; addr < meta->addr; addr++) {
+		if (unlikely(!check_canary_byte((u8 *)addr)))
 			break;
 	}
+
+	/* Apply to right of object. */
+	for (addr = meta->addr + meta->size; addr % sizeof(u64) != 0; addr++) {
+		if (unlikely(!check_canary_byte((u8 *)addr)))
+			return;
+	}
+	for (; addr - pageaddr < PAGE_SIZE; addr += sizeof(u64)) {
+		if (unlikely(*((u64 *)addr) != KFENCE_CANARY_PATTERN_U64)) {
+
+			for (; addr - pageaddr < PAGE_SIZE; addr++) {
+				if (!check_canary_byte((u8 *)addr))
+					return;
+			}
+		}
+	}
 }
 
 static void *kfence_guarded_alloc(struct kmem_cache *cache, size_t size, gfp_t gfp,
@@ -434,7 +462,7 @@ static void *kfence_guarded_alloc(struct kmem_cache *cache, size_t size, gfp_t g
 #endif
 
 	/* Memory initialization. */
-	for_each_canary(meta, set_canary_byte);
+	set_canary(meta);
 
 	/*
 	 * We check slab_want_init_on_alloc() ourselves, rather than letting
@@ -495,7 +523,7 @@ static void kfence_guarded_free(void *addr, struct kfence_metadata *meta, bool z
 	alloc_covered_add(meta->alloc_stack_hash, -1);
 
 	/* Check canary bytes for memory corruption. */
-	for_each_canary(meta, check_canary_byte);
+	check_canary(meta);
 
 	/*
 	 * Clear memory if init-on-free is set. While we protect the page, the
@@ -751,7 +779,7 @@ static void kfence_check_all_canary(void)
 		struct kfence_metadata *meta = &kfence_metadata[i];
 
 		if (meta->state == KFENCE_OBJECT_ALLOCATED)
-			for_each_canary(meta, check_canary_byte);
+			check_canary(meta);
 	}
 }
 
diff --git a/mm/kfence/kfence.h b/mm/kfence/kfence.h
index 600f2e2431d6..2aafc46a4aaf 100644
--- a/mm/kfence/kfence.h
+++ b/mm/kfence/kfence.h
@@ -21,7 +21,15 @@
  * lower 3 bits of the address, to detect memory corruptions with higher
  * probability, where similar constants are used.
  */
-#define KFENCE_CANARY_PATTERN(addr) ((u8)0xaa ^ (u8)((unsigned long)(addr) & 0x7))
+#define KFENCE_CANARY_PATTERN_U8(addr) ((u8)0xaa ^ (u8)((unsigned long)(addr) & 0x7))
+
+/*
+ * Define a continuous 8-byte canary starting from a multiple of 8. The canary
+ * of each byte is only related to the lowest three bits of its address, so the
+ * canary of every 8 bytes is the same. 64-bit memory can be filled and checked
+ * at a time instead of byte by byte to improve performance.
+ */
+#define KFENCE_CANARY_PATTERN_U64 ((u64)0xaaaaaaaaaaaaaaaa ^ (u64)(0x0706050403020100))
 
 /* Maximum stack depth for reports. */
 #define KFENCE_STACK_DEPTH 64
diff --git a/mm/kfence/report.c b/mm/kfence/report.c
index 60205f1257ef..197430a5be4a 100644
--- a/mm/kfence/report.c
+++ b/mm/kfence/report.c
@@ -168,7 +168,7 @@ static void print_diff_canary(unsigned long address, size_t bytes_to_show,
 
 	pr_cont("[");
 	for (cur = (const u8 *)address; cur < end; cur++) {
-		if (*cur == KFENCE_CANARY_PATTERN(cur))
+		if (*cur == KFENCE_CANARY_PATTERN_U8(cur))
 			pr_cont(" .");
 		else if (no_hash_pointers)
 			pr_cont(" 0x%02x", *cur);