From patchwork Thu Jan 25 13:12:39 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Leone Fernando
X-Patchwork-Id: 13530812
X-Patchwork-Delegate: kuba@kernel.org
Date: Thu, 25 Jan 2024 14:12:39 +0100
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org
Subject: [RFC PATCH net-next v1 1/3] net: route: expire rt if the dst it holds is expired
Content-Language: en-US
From: Leone Fernando
To: dennis@kernel.org, tj@kernel.org, cl@linux.com, davem@davemloft.net,
	edumazet@google.com, kuba@kernel.org, pabeni@redhat.com,
	dsahern@kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org
X-Patchwork-State: RFC

The function rt_is_expired is used to verify that a cached dst is valid.
Currently, this function ignores the rth->dst.expires value.
Add a check to rt_is_expired that validates that the dst is not expired.

Signed-off-by: Leone Fernando
---
 net/ipv4/route.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 16615d107cf0..7c5e68117ee2 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -392,7 +392,8 @@ static inline int ip_rt_proc_init(void)
 
 static inline bool rt_is_expired(const struct rtable *rth)
 {
-	return rth->rt_genid != rt_genid_ipv4(dev_net(rth->dst.dev));
+	return rth->rt_genid != rt_genid_ipv4(dev_net(rth->dst.dev)) ||
+	       (rth->dst.expires && time_after(jiffies, rth->dst.expires));
 }
 
 void rt_cache_flush(struct net *net)

From patchwork Thu Jan 25 13:14:24 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Leone Fernando
X-Patchwork-Id: 13530814
X-Patchwork-Delegate: kuba@kernel.org
Message-ID: <547dd88c-f07d-4126-ae0b-bee126f23d73@gmail.com>
Date: Thu, 25 Jan 2024 14:14:24 +0100
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org
Subject: [RFC PATCH net-next v1 2/3] net: dst_cache: add input_dst_cache API
Content-Language: en-US
From: Leone Fernando
To: dennis@kernel.org, tj@kernel.org, cl@linux.com, davem@davemloft.net,
	edumazet@google.com,
	kuba@kernel.org, pabeni@redhat.com, dsahern@kernel.org,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org, netdev@vger.kernel.org
X-Patchwork-State: RFC

The input_dst_cache allows fast lookup of frequently encountered dsts.

In order to provide stable results, I implemented a simple linear
hashtable with each bucket containing a constant number of entries
(DST_CACHE_INPUT_BUCKET_SIZE).

Similarly to how the route hint is used, I defined the hashtable key to
contain the daddr and the tos of the IP header.

Lookup is performed in a straightforward manner: start at the bucket head
corresponding to the hashed key and search the following
DST_CACHE_INPUT_BUCKET_SIZE entries of the array for a matching key.

When inserting a new dst into the cache, if all the bucket entries are
full, the oldest one is deleted to make room for the new dst.

Signed-off-by: Leone Fernando
---
 include/linux/percpu.h  |   4 ++
 include/net/dst_cache.h |  56 ++++++++++++++++
 net/core/dst_cache.c    | 145 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 205 insertions(+)

diff --git a/include/linux/percpu.h b/include/linux/percpu.h
index 8c677f185901..562d846b81fe 100644
--- a/include/linux/percpu.h
+++ b/include/linux/percpu.h
@@ -141,6 +141,10 @@ extern phys_addr_t per_cpu_ptr_to_phys(void *addr);
 #define alloc_percpu_gfp(type, gfp)					\
 	(typeof(type) __percpu *)__alloc_percpu_gfp(sizeof(type),	\
 						__alignof__(type), gfp)
+#define alloc_percpu_array_gfp(type, size, gfp)				\
+	((typeof(type) __percpu *)__alloc_percpu_gfp(sizeof(type[size]),	\
+						     __alignof__(type[size]), \
+						     gfp))
 #define alloc_percpu(type)						\
 	(typeof(type) __percpu *)__alloc_percpu(sizeof(type),		\
 						__alignof__(type))
diff --git a/include/net/dst_cache.h b/include/net/dst_cache.h
index df6622a5fe98..560e7aec9347 100644
--- a/include/net/dst_cache.h
+++ b/include/net/dst_cache.h
@@ -8,11 +8,38 @@
 #include
 #endif
 
+#define DST_CACHE_INPUT_SHIFT (9)
+#define DST_CACHE_INPUT_SIZE (1 << DST_CACHE_INPUT_SHIFT)
+#define DST_CACHE_INPUT_BUCKET_SIZE (4)
+#define DST_CACHE_INPUT_HASH_MASK (~(DST_CACHE_INPUT_BUCKET_SIZE - 1))
+#define INVALID_DST_CACHE_INPUT_KEY (~(u64)(0))
+
 struct dst_cache {
 	struct dst_cache_pcpu __percpu *cache;
 	unsigned long reset_ts;
 };
 
+extern unsigned int dst_cache_net_id __read_mostly;
+
+/**
+ * idst_for_each_in_bucket - iterate over a dst cache bucket
+ * @pos: the type * to use as a loop cursor
+ * @head: the head of the cpu dst cache.
+ * @hash: the hash of the bucket
+ */
+#define idst_for_each_in_bucket(pos, head, hash) \
+	for (pos = &head[hash]; \
+	     pos < &head[hash + DST_CACHE_INPUT_BUCKET_SIZE]; \
+	     pos++)
+
+/**
+ * idst_for_each_in_cache - iterate over the dst cache
+ * @pos: the type * to use as a loop cursor
+ * @head: the head of the cpu dst cache.
+ */
+#define idst_for_each_in_cache(pos, head) \
+	for (pos = head; pos < head + DST_CACHE_INPUT_SIZE; pos++)
+
 /**
  * dst_cache_get - perform cache lookup
  * @dst_cache: the cache
@@ -106,4 +133,33 @@ int dst_cache_init(struct dst_cache *dst_cache, gfp_t gfp);
  */
 void dst_cache_destroy(struct dst_cache *dst_cache);
 
+/**
+ * dst_cache_input_get_noref - perform lookup in the input cache,
+ * return a noref dst
+ * @dst_cache: the input cache
+ * @skb: the packet according to which the dst entry will be searched
+ * local BH must be disabled.
+ */
+struct dst_entry *dst_cache_input_get_noref(struct dst_cache *dst_cache,
+					    struct sk_buff *skb);
+
+/**
+ * dst_cache_input_add - add the dst of the given skb to the input cache.
+ *
+ * in case the cache bucket is full, the oldest entry will be deleted
+ * and replaced with the new one.
+ * @dst_cache: the input cache
+ * @skb: The packet according to which the dst entry will be searched
+ *
+ * local BH must be disabled.
+ */
+void dst_cache_input_add(struct dst_cache *dst_cache,
+			 const struct sk_buff *skb);
+
+/**
+ * dst_cache_input_init - initialize the input cache,
+ * allocating the required storage
+ */
+int __init dst_cache_input_init(void);
+
 #endif
diff --git a/net/core/dst_cache.c b/net/core/dst_cache.c
index 0ccfd5fa5cb9..a635c0e52400 100644
--- a/net/core/dst_cache.c
+++ b/net/core/dst_cache.c
@@ -13,6 +13,7 @@
 #include
 #endif
 #include
+#include
 
 struct dst_cache_pcpu {
 	unsigned long refresh_ts;
@@ -21,9 +22,12 @@ struct dst_cache_pcpu {
 	union {
 		struct in_addr in_saddr;
 		struct in6_addr in6_saddr;
+		u64 key;
 	};
 };
 
+unsigned int dst_cache_net_id __read_mostly;
+
 static void dst_cache_per_cpu_dst_set(struct dst_cache_pcpu *dst_cache,
 				      struct dst_entry *dst, u32 cookie)
 {
@@ -181,3 +185,144 @@ void dst_cache_reset_now(struct dst_cache *dst_cache)
 	}
 }
 EXPORT_SYMBOL_GPL(dst_cache_reset_now);
+
+static void dst_cache_input_set(struct dst_cache_pcpu *idst,
+				struct dst_entry *dst, u64 key)
+{
+	dst_cache_per_cpu_dst_set(idst, dst, 0);
+	idst->key = key;
+	idst->refresh_ts = jiffies;
+}
+
+static struct dst_entry *__dst_cache_input_get_noref(struct dst_cache_pcpu *idst)
+{
+	struct dst_entry *dst = idst->dst;
+
+	if (unlikely(dst->obsolete && !dst->ops->check(dst, idst->cookie))) {
+		dst_cache_input_set(idst, NULL, INVALID_DST_CACHE_INPUT_KEY);
+		goto fail;
+	}
+
+	idst->refresh_ts = jiffies;
+	return dst;
+
+fail:
+	return NULL;
+}
+
+static inline u64 create_dst_cache_key_ip4(const struct sk_buff *skb)
+{
+	struct iphdr *iphdr = ip_hdr(skb);
+
+	return (((u64)iphdr->daddr) << 8) | iphdr->tos;
+}
+
+static inline u32 hash_dst_cache_key(u64 key)
+{
+	return hash_64(key, DST_CACHE_INPUT_SHIFT) & DST_CACHE_INPUT_HASH_MASK;
+}
+
+struct dst_entry *dst_cache_input_get_noref(struct dst_cache *dst_cache,
+					    struct sk_buff *skb)
+{
+	struct dst_entry *out_dst = NULL;
+	struct dst_cache_pcpu *pcpu_cache;
+	struct dst_cache_pcpu *idst;
+	u32 hash;
+	u64 key;
+
+	pcpu_cache = this_cpu_ptr(dst_cache->cache);
+	key = create_dst_cache_key_ip4(skb);
+	hash = hash_dst_cache_key(key);
+	idst_for_each_in_bucket(idst, pcpu_cache, hash) {
+		if (key == idst->key) {
+			out_dst = __dst_cache_input_get_noref(idst);
+			goto out;
+		}
+	}
+out:
+	return out_dst;
+}
+
+static void dst_cache_input_reset_now(struct dst_cache *dst_cache)
+{
+	struct dst_cache_pcpu *caches;
+	struct dst_cache_pcpu *idst;
+	struct dst_entry *dst;
+	int i;
+
+	for_each_possible_cpu(i) {
+		caches = per_cpu_ptr(dst_cache->cache, i);
+		idst_for_each_in_cache(idst, caches) {
+			idst->key = INVALID_DST_CACHE_INPUT_KEY;
+			dst = idst->dst;
+			if (dst)
+				dst_release(dst);
+		}
+	}
+}
+
+static int __net_init dst_cache_input_net_init(struct net *net)
+{
+	struct dst_cache *dst_cache = net_generic(net, dst_cache_net_id);
+
+	dst_cache->cache = alloc_percpu_array_gfp(struct dst_cache_pcpu,
+						  DST_CACHE_INPUT_SIZE,
+						  GFP_KERNEL | __GFP_ZERO);
+	if (!dst_cache->cache)
+		return -ENOMEM;
+
+	dst_cache_input_reset_now(dst_cache);
+	return 0;
+}
+
+static void __net_exit dst_cache_input_net_exit(struct net *net)
+{
+	struct dst_cache *dst_cache = net_generic(net, dst_cache_net_id);
+
+	dst_cache_input_reset_now(dst_cache);
+	free_percpu(dst_cache->cache);
+	dst_cache->cache = NULL;
+}
+
+static inline bool idst_empty(struct dst_cache_pcpu *idst)
+{
+	return idst->key == INVALID_DST_CACHE_INPUT_KEY;
+}
+
+void dst_cache_input_add(struct dst_cache *dst_cache, const struct sk_buff *skb)
+{
+	struct dst_cache_pcpu *entry = NULL;
+	struct dst_cache_pcpu *pcpu_cache;
+	struct dst_cache_pcpu *idst;
+	u32 hash;
+	u64 key;
+
+	pcpu_cache = this_cpu_ptr(dst_cache->cache);
+	key = create_dst_cache_key_ip4(skb);
+	hash = hash_dst_cache_key(key);
+	idst_for_each_in_bucket(idst, pcpu_cache, hash) {
+		if (idst_empty(idst)) {
+			entry = idst;
+			goto add_to_cache;
+		}
+		if (!entry || time_before(idst->refresh_ts, entry->refresh_ts))
+			entry = idst;
+	}
+
+add_to_cache:
+	dst_cache_input_set(entry, skb_dst(skb), key);
+}
+
+static struct pernet_operations dst_cache_input_ops __net_initdata = {
+	.init = dst_cache_input_net_init,
+	.exit = dst_cache_input_net_exit,
+	.id = &dst_cache_net_id,
+	.size = sizeof(struct dst_cache),
+};
+
+int __init dst_cache_input_init(void)
+{
+	return register_pernet_subsys(&dst_cache_input_ops);
+}
+subsys_initcall(dst_cache_input_init);

From patchwork Thu Jan 25 13:15:51 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Leone Fernando
X-Patchwork-Id: 13530815
X-Patchwork-Delegate: kuba@kernel.org
Message-ID: <301afa25-485f-460d-a06c-007f80a060d5@gmail.com>
Date: Thu, 25 Jan 2024 14:15:51 +0100
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org
Subject: [RFC PATCH net-next v1 3/3] net: route: replace route hints with input_dst_cache
Content-Language: en-US
From: Leone Fernando
To: dennis@kernel.org, tj@kernel.org, cl@linux.com, davem@davemloft.net,
	edumazet@google.com, kuba@kernel.org, pabeni@redhat.com,
	dsahern@kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org
X-Patchwork-State: RFC

Replace route hints with cached dsts: ip_rcv_finish_core first tries the
cache and only then falls back to the demux or performs a full lookup.

Newly found dsts are added to the cache only after all the checks have
passed successfully, so that a dropped packet's dst is never cached.

Multicast dsts are not added to the dst_cache, as that would require
additional checks, and multicast packets are rarer and take a slower path
anyway.

A check was added to ip_route_use_dst_cache that prevents forwarding
packets received by devices for which forwarding is disabled.

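For anyone who wants to experiment with the scheme outside the kernel, the
lookup-then-insert flow described above can be modelled in user space. This
is only a rough sketch under stated assumptions, not the code in this
series: struct entry, slow_lookup(), receive_packet(), the global tick
counter and the multiplicative hash are all hypothetical stand-ins (for
struct dst_cache_pcpu, the full route lookup, ip_rcv_finish_core, jiffies
and hash_64() respectively), and there is a single table instead of one
per CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_SHIFT 9
#define CACHE_SIZE  (1u << CACHE_SHIFT)
#define BUCKET_SIZE 4u
#define BUCKET_MASK (~(BUCKET_SIZE - 1u))
#define INVALID_KEY (~(uint64_t)0)

/* Hypothetical stand-in for one cache slot. */
struct entry {
	uint64_t key;
	unsigned long stamp;	/* last use, models refresh_ts */
	int route;		/* models the cached dst pointer */
};

static struct entry cache[CACHE_SIZE];
static unsigned long now;	/* models jiffies; bumped on every use */

static void cache_reset(void)
{
	for (size_t i = 0; i < CACHE_SIZE; i++)
		cache[i] = (struct entry){ .key = INVALID_KEY };
	now = 0;
}

/* Same key layout as the series: daddr shifted left by 8, OR-ed with tos. */
static uint64_t make_key(uint32_t daddr, uint8_t tos)
{
	return ((uint64_t)daddr << 8) | tos;
}

/* Assumed multiplicative hash; the result is rounded down to a bucket head. */
static uint32_t bucket_of(uint64_t key)
{
	uint32_t h = (uint32_t)((key * 0x9E3779B97F4A7C15ull) >> (64 - CACHE_SHIFT));

	h &= BUCKET_MASK;
	assert(h + BUCKET_SIZE <= CACHE_SIZE);
	return h;
}

/* Scan the BUCKET_SIZE slots of the bucket for an exact key match. */
static struct entry *cache_lookup(uint64_t key)
{
	uint32_t b = bucket_of(key);

	for (uint32_t i = b; i < b + BUCKET_SIZE; i++) {
		if (cache[i].key == key) {
			cache[i].stamp = ++now;
			return &cache[i];
		}
	}
	return NULL;
}

/* Take the first empty slot, otherwise overwrite the least recently used. */
static void cache_add(uint64_t key, int route)
{
	uint32_t b = bucket_of(key);
	struct entry *victim = NULL;

	for (uint32_t i = b; i < b + BUCKET_SIZE; i++) {
		struct entry *e = &cache[i];

		if (e->key == INVALID_KEY) {
			victim = e;
			break;
		}
		if (!victim || e->stamp < victim->stamp)
			victim = e;
	}
	*victim = (struct entry){ .key = key, .stamp = ++now, .route = route };
}

/* Hypothetical slow path: pretend the route is the low byte of daddr. */
static int slow_lookup(uint32_t daddr)
{
	return (int)(daddr & 0xff);
}

/* Returns 1 on a cache hit, 0 when the slow path was taken. */
static int receive_packet(uint32_t daddr, uint8_t tos, int checks_pass,
			  int *route)
{
	uint64_t key = make_key(daddr, tos);
	struct entry *e = cache_lookup(key);

	if (e) {
		*route = e->route;
		return 1;
	}
	*route = slow_lookup(daddr);
	if (checks_pass)	/* only cache routes of packets that were not dropped */
		cache_add(key, *route);
	return 0;
}
```

Calling cache_reset() first and then receive_packet() twice with the same
daddr and tos takes the slow path on the first call and hits the cache on
the second; a packet whose checks fail leaves the cache untouched.
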
Signed-off-by: Leone Fernando
---
 include/net/route.h |  6 ++---
 net/ipv4/ip_input.c | 58 ++++++++++++++++++++++++---------------------
 net/ipv4/route.c    | 36 +++++++++++++++++++++-------
 3 files changed, 61 insertions(+), 39 deletions(-)

diff --git a/include/net/route.h b/include/net/route.h
index 980ab474eabd..a5a2f55947d6 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -189,9 +189,9 @@ int ip_mc_validate_source(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 			  struct in_device *in_dev, u32 *itag);
 int ip_route_input_noref(struct sk_buff *skb, __be32 dst, __be32 src,
 			 u8 tos, struct net_device *devin);
-int ip_route_use_hint(struct sk_buff *skb, __be32 dst, __be32 src,
-		      u8 tos, struct net_device *devin,
-		      const struct sk_buff *hint);
+int ip_route_use_dst_cache(struct sk_buff *skb, __be32 daddr, __be32 saddr,
+			   u8 tos, struct net_device *dev,
+			   struct dst_entry *dst);
 
 static inline int ip_route_input(struct sk_buff *skb, __be32 dst, __be32 src,
 				 u8 tos, struct net_device *devin)
diff --git a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c
index 5e9c8156656a..35c8b122d62f 100644
--- a/net/ipv4/ip_input.c
+++ b/net/ipv4/ip_input.c
@@ -305,30 +305,44 @@ static inline bool ip_rcv_options(struct sk_buff *skb, struct net_device *dev)
 	return true;
 }
 
-static bool ip_can_use_hint(const struct sk_buff *skb, const struct iphdr *iph,
-			    const struct sk_buff *hint)
+static bool ip_can_add_dst_cache(struct sk_buff *skb, __u16 rt_type)
 {
-	return hint && !skb_dst(skb) && ip_hdr(hint)->daddr == iph->daddr &&
-	       ip_hdr(hint)->tos == iph->tos;
+	return skb_valid_dst(skb) &&
+	       rt_type != RTN_BROADCAST &&
+	       rt_type != RTN_MULTICAST &&
+	       !(IPCB(skb)->flags & IPSKB_MULTIPATH);
+}
+
+static bool ip_can_use_dst_cache(const struct net *net, struct sk_buff *skb)
+{
+	return !skb_dst(skb) && !fib4_has_custom_rules(net);
 }
 
 int tcp_v4_early_demux(struct sk_buff *skb);
 int udp_v4_early_demux(struct sk_buff *skb);
 static int ip_rcv_finish_core(struct net *net, struct sock *sk,
-			      struct sk_buff *skb, struct net_device *dev,
-			      const struct sk_buff *hint)
+			      struct sk_buff *skb, struct net_device *dev)
 {
+	struct dst_cache *dst_cache = net_generic(net, dst_cache_net_id);
 	const struct iphdr *iph = ip_hdr(skb);
+	struct dst_entry *dst;
 	int err, drop_reason;
 	struct rtable *rt;
+	bool do_cache;
 
 	drop_reason = SKB_DROP_REASON_NOT_SPECIFIED;
 
-	if (ip_can_use_hint(skb, iph, hint)) {
-		err = ip_route_use_hint(skb, iph->daddr, iph->saddr, iph->tos,
-					dev, hint);
-		if (unlikely(err))
-			goto drop_error;
+	do_cache = ip_can_use_dst_cache(net, skb);
+	if (do_cache) {
+		dst = dst_cache_input_get_noref(dst_cache, skb);
+		if (dst) {
+			err = ip_route_use_dst_cache(skb, iph->daddr,
+						     iph->saddr, iph->tos,
+						     dev, dst);
+			if (unlikely(err))
+				goto drop_error;
+			do_cache = false;
+		}
 	}
 
 	if (READ_ONCE(net->ipv4.sysctl_ip_early_demux) &&
@@ -418,6 +432,9 @@ static int ip_rcv_finish_core(struct net *net, struct sock *sk,
 		}
 	}
 
+	if (do_cache && ip_can_add_dst_cache(skb, rt->rt_type))
+		dst_cache_input_add(dst_cache, skb);
+
 	return NET_RX_SUCCESS;
 
 drop:
@@ -444,7 +461,7 @@ static int ip_rcv_finish(struct net *net, struct sock *sk, struct sk_buff *skb)
 	if (!skb)
 		return NET_RX_SUCCESS;
 
-	ret = ip_rcv_finish_core(net, sk, skb, dev, NULL);
+	ret = ip_rcv_finish_core(net, sk, skb, dev);
 	if (ret != NET_RX_DROP)
 		ret = dst_input(skb);
 	return ret;
@@ -581,21 +598,11 @@ static void ip_sublist_rcv_finish(struct list_head *head)
 	}
 }
 
-static struct sk_buff *ip_extract_route_hint(const struct net *net,
-					     struct sk_buff *skb, int rt_type)
-{
-	if (fib4_has_custom_rules(net) || rt_type == RTN_BROADCAST ||
-	    IPCB(skb)->flags & IPSKB_MULTIPATH)
-		return NULL;
-
-	return skb;
-}
-
 static void ip_list_rcv_finish(struct net *net, struct sock *sk,
 			       struct list_head *head)
 {
-	struct sk_buff *skb, *next, *hint = NULL;
 	struct dst_entry *curr_dst = NULL;
+	struct sk_buff *skb, *next;
 	struct list_head sublist;
 
 	INIT_LIST_HEAD(&sublist);
@@ -610,14 +617,11 @@ static void ip_list_rcv_finish(struct net *net, struct sock *sk,
 		skb = l3mdev_ip_rcv(skb);
 		if (!skb)
 			continue;
-		if (ip_rcv_finish_core(net, sk, skb, dev, hint) == NET_RX_DROP)
+		if (ip_rcv_finish_core(net, sk, skb, dev) == NET_RX_DROP)
 			continue;
 
 		dst = skb_dst(skb);
 		if (curr_dst != dst) {
-			hint = ip_extract_route_hint(net, skb,
-					       ((struct rtable *)dst)->rt_type);
-
 			/* dispatch old sublist */
 			if (!list_empty(&sublist))
 				ip_sublist_rcv_finish(&sublist);
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 7c5e68117ee2..3f1977f9b25c 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2157,14 +2157,14 @@ static int ip_mkroute_input(struct sk_buff *skb,
 
 /* Implements all the saddr-related checks as ip_route_input_slow(),
  * assuming daddr is valid and the destination is not a local broadcast one.
- * Uses the provided hint instead of performing a route lookup.
+ * Uses the provided dst from dst_cache instead of performing a route lookup.
  */
-int ip_route_use_hint(struct sk_buff *skb, __be32 daddr, __be32 saddr,
-		      u8 tos, struct net_device *dev,
-		      const struct sk_buff *hint)
+int ip_route_use_dst_cache(struct sk_buff *skb, __be32 daddr, __be32 saddr,
+			   u8 tos, struct net_device *dev,
+			   struct dst_entry *dst)
 {
 	struct in_device *in_dev = __in_dev_get_rcu(dev);
-	struct rtable *rt = skb_rtable(hint);
+	struct rtable *rt = (struct rtable *)dst;
 	struct net *net = dev_net(dev);
 	int err = -EINVAL;
 	u32 tag = 0;
@@ -2178,21 +2178,39 @@ int ip_route_use_hint(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 	if (ipv4_is_loopback(saddr) && !IN_DEV_NET_ROUTE_LOCALNET(in_dev, net))
 		goto martian_source;
 
-	if (rt->rt_type != RTN_LOCAL)
-		goto skip_validate_source;
+	if (ipv4_is_loopback(daddr) && !IN_DEV_NET_ROUTE_LOCALNET(in_dev, net))
+		goto martian_destination;
 
+	if (rt->rt_type != RTN_LOCAL) {
+		if (!IN_DEV_FORWARD(in_dev)) {
+			err = -EHOSTUNREACH;
+			goto out_err;
+		}
+		goto skip_validate_source;
+	}
 	tos &= IPTOS_RT_MASK;
 	err = fib_validate_source(skb, saddr, daddr, tos, 0, dev, in_dev, &tag);
 	if (err < 0)
 		goto martian_source;
 
 skip_validate_source:
-	skb_dst_copy(skb, hint);
+	skb_dst_set_noref(skb, dst);
 	return 0;
 
 martian_source:
 	ip_handle_martian_source(dev, in_dev, skb, daddr, saddr);
+out_err:
 	return err;
+
+martian_destination:
+	RT_CACHE_STAT_INC(in_martian_dst);
+#ifdef CONFIG_IP_ROUTE_VERBOSE
+	if (IN_DEV_LOG_MARTIANS(in_dev))
+		net_warn_ratelimited("martian destination %pI4 from %pI4, dev %s\n",
+				     &daddr, &saddr, dev->name);
+#endif
+	err = -EINVAL;
+	goto out_err;
 }
 
 /* get device for dst_alloc with local routes */
@@ -2213,7 +2231,7 @@ static struct net_device *ip_rt_get_dev(struct net *net,
  * addresses, because every properly looped back packet
  * must have correct destination already attached by output routine.
  * Changes in the enforced policies must be applied also to
- * ip_route_use_hint().
+ * ip_route_use_dst_cache().
  *
  * Such approach solves two big problems:
  * 1. Not simplex devices are handled properly.