From patchwork Sun Mar 31 14:03:52 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Julian Anastasov X-Patchwork-Id: 13612221 Received: from mg.ssi.bg (mg.ssi.bg [193.238.174.37]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4BDB9144D27 for ; Sun, 31 Mar 2024 14:07:17 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=193.238.174.37 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1711894039; cv=none; b=FrsOXgxpbV8mAv2EgnCV8slDb4ZZ6VlA3E9adMrEgOHs/JSCarJZXnQ/FBqfQ6Q2KWauFV4wo9LkOB9fAew2E28R1Mz6pKl+BEeBOsee8MmB5xeSQhwL28HYaSsSs8PqD9x3lkdtg/rw8ps17gOuYBIFvkHcGXuEtONJ8NilUI8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1711894039; c=relaxed/simple; bh=xLorx/8OcTkViFcaWyo9cdkI1WhH6+rwUBzwagR88JM=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=cXXY66dNeKpybxlVJfG2yopp/ZZeH1nIIJEjNkbfRb5MhvsWZv01in3KOESSNLIf5uGtnwn+YZMceL++RtxrwfdxdffDjZwOh7NCiXsuJcfZ+yVqz4rz7aAEJWXR7UvrG8Jbpgj9U5feoT+sA4x2q3Sj8GMzK7OwWfOsDEaPz28= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=ssi.bg; spf=pass smtp.mailfrom=ssi.bg; dkim=pass (1024-bit key) header.d=ssi.bg header.i=@ssi.bg header.b=de6Pg8GJ; arc=none smtp.client-ip=193.238.174.37 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=ssi.bg Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=ssi.bg Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=ssi.bg header.i=@ssi.bg header.b="de6Pg8GJ" Received: from mg.ssi.bg (localhost [127.0.0.1]) by mg.ssi.bg (Proxmox) with ESMTP id AD8CF31E81 for ; Sun, 31 Mar 2024 17:07:15 +0300 (EEST) Received: from ink.ssi.bg (ink.ssi.bg [193.238.174.40]) by mg.ssi.bg (Proxmox) with ESMTPS id ED76131D9D for ; Sun, 31 Mar 2024 17:07:14 +0300 (EEST) Received: from ja.ssi.bg (unknown [213.16.62.126]) by ink.ssi.bg (Postfix) with ESMTPSA id 45936900C3F; Sun, 31 Mar 2024 17:07:04 +0300 (EEST) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=ssi.bg; s=ink; t=1711894024; bh=xLorx/8OcTkViFcaWyo9cdkI1WhH6+rwUBzwagR88JM=; h=From:To:Cc:Subject:Date:In-Reply-To:References; b=de6Pg8GJmGI6KDWieuHyc0Xel/AVT6a7+uBzPlk7XrGG04lDNuB8TzwfT0h9fiH8h xtZswVN0ygR615JLsZzbJ2zBqeICWG1ZBDtyYl3GP9GMCD2O0vthTgJQMJjAQps8Fi Sc2+hhl4Df/XnFuxmifI8MfAjjis3t1+42UOhiKY= Received: from ja.home.ssi.bg (localhost.localdomain [127.0.0.1]) by ja.ssi.bg (8.17.1/8.17.1) with ESMTP id 42VE45PX077710; Sun, 31 Mar 2024 17:04:05 +0300 Received: (from root@localhost) by ja.home.ssi.bg (8.17.1/8.17.1/Submit) id 42VE45nk077709; Sun, 31 Mar 2024 17:04:05 +0300 From: Julian Anastasov To: Simon Horman Cc: lvs-devel@vger.kernel.org, netfilter-devel@vger.kernel.org, Dust Li , Jiejian Wu , rcu@vger.kernel.org Subject: [PATCHv3 net-next 05/14] ipvs: do not keep dest_dst after dest is removed Date: Sun, 31 Mar 2024 17:03:52 +0300 Message-ID: <20240331140401.77657-6-ja@ssi.bg> X-Mailer: git-send-email 2.44.0 In-Reply-To: <20240331140401.77657-1-ja@ssi.bg> References: <20240331140401.77657-1-ja@ssi.bg> Precedence: bulk X-Mailing-List: rcu@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Before now dest->dest_dst is not released when server is moved into dest_trash list after removal. As result, we can keep dst/dev references for long time without actively using them. It is better to avoid walking the dest_trash list when ip_vs_dst_event() receives dev events. So, make sure we do not hold dev references in dest_trash list. As packets can be flying while server is being removed, check the IP_VS_DEST_F_AVAILABLE flag in slow path to ensure we do not save new dev references to removed servers. Signed-off-by: Julian Anastasov --- net/netfilter/ipvs/ip_vs_ctl.c | 20 +++++++---------- net/netfilter/ipvs/ip_vs_xmit.c | 39 ++++++++++++++++++++++++--------- 2 files changed, 37 insertions(+), 22 deletions(-) diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c index e325e5f9d37b..7a8c7ac94194 100644 --- a/net/netfilter/ipvs/ip_vs_ctl.c +++ b/net/netfilter/ipvs/ip_vs_ctl.c @@ -810,7 +810,6 @@ static void ip_vs_dest_free(struct ip_vs_dest *dest) { struct ip_vs_service *svc = rcu_dereference_protected(dest->svc, 1); - __ip_vs_dst_cache_reset(dest); __ip_vs_svc_put(svc); call_rcu(&dest->rcu_head, ip_vs_dest_rcu_free); } @@ -1013,10 +1012,6 @@ __ip_vs_update_dest(struct ip_vs_service *svc, struct ip_vs_dest *dest, dest->af = udest->af; - spin_lock_bh(&dest->dst_lock); - __ip_vs_dst_cache_reset(dest); - spin_unlock_bh(&dest->dst_lock); - if (add) { list_add_rcu(&dest->n_list, &svc->destinations); svc->num_dests++; @@ -1024,6 +1019,10 @@ __ip_vs_update_dest(struct ip_vs_service *svc, struct ip_vs_dest *dest, if (sched && sched->add_dest) sched->add_dest(svc, dest); } else { + spin_lock_bh(&dest->dst_lock); + __ip_vs_dst_cache_reset(dest); + spin_unlock_bh(&dest->dst_lock); + sched = rcu_dereference_protected(svc->scheduler, 1); if (sched && sched->upd_dest) sched->upd_dest(svc, dest); @@ -1258,6 +1257,10 @@ static void __ip_vs_unlink_dest(struct ip_vs_service *svc, { dest->flags &= ~IP_VS_DEST_F_AVAILABLE; + spin_lock_bh(&dest->dst_lock); + __ip_vs_dst_cache_reset(dest); + spin_unlock_bh(&dest->dst_lock); + /* * Remove it from the d-linked destination list. */ @@ -1747,13 +1750,6 @@ static int ip_vs_dst_event(struct notifier_block *this, unsigned long event, } rcu_read_unlock(); - mutex_lock(&ipvs->service_mutex); - spin_lock_bh(&ipvs->dest_trash_lock); - list_for_each_entry(dest, &ipvs->dest_trash, t_list) { - ip_vs_forget_dev(dest, dev); - } - spin_unlock_bh(&ipvs->dest_trash_lock); - mutex_unlock(&ipvs->service_mutex); return NOTIFY_DONE; } diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c index 65e0259178da..4a41b36fcfb8 100644 --- a/net/netfilter/ipvs/ip_vs_xmit.c +++ b/net/netfilter/ipvs/ip_vs_xmit.c @@ -317,9 +317,11 @@ __ip_vs_get_out_rt(struct netns_ipvs *ipvs, int skb_af, struct sk_buff *skb, if (dest) { dest_dst = __ip_vs_dst_check(dest); - if (likely(dest_dst)) + if (likely(dest_dst)) { rt = (struct rtable *) dest_dst->dst_cache; - else { + if (ret_saddr) + *ret_saddr = dest_dst->dst_saddr.ip; + } else { dest_dst = ip_vs_dest_dst_alloc(); spin_lock_bh(&dest->dst_lock); if (!dest_dst) { @@ -335,14 +337,24 @@ __ip_vs_get_out_rt(struct netns_ipvs *ipvs, int skb_af, struct sk_buff *skb, ip_vs_dest_dst_free(dest_dst); goto err_unreach; } - __ip_vs_dst_set(dest, dest_dst, &rt->dst, 0); + /* It is forbidden to attach dest->dest_dst if + * server is deleted. We can see the flag going down, + * for very short period and it must be checked under + * dst_lock. + */ + if (dest->flags & IP_VS_DEST_F_AVAILABLE) + __ip_vs_dst_set(dest, dest_dst, &rt->dst, 0); + else + noref = 0; spin_unlock_bh(&dest->dst_lock); IP_VS_DBG(10, "new dst %pI4, src %pI4, refcnt=%d\n", &dest->addr.ip, &dest_dst->dst_saddr.ip, rcuref_read(&rt->dst.__rcuref)); + if (ret_saddr) + *ret_saddr = dest_dst->dst_saddr.ip; + if (!noref) + ip_vs_dest_dst_free(dest_dst); } - if (ret_saddr) - *ret_saddr = dest_dst->dst_saddr.ip; } else { __be32 saddr = htonl(INADDR_ANY); @@ -480,9 +492,11 @@ __ip_vs_get_out_rt_v6(struct netns_ipvs *ipvs, int skb_af, struct sk_buff *skb, if (dest) { dest_dst = __ip_vs_dst_check(dest); - if (likely(dest_dst)) + if (likely(dest_dst)) { rt = (struct rt6_info *) dest_dst->dst_cache; - else { + if (ret_saddr) + *ret_saddr = dest_dst->dst_saddr.in6; + } else { u32 cookie; dest_dst = ip_vs_dest_dst_alloc(); @@ -503,14 +517,19 @@ __ip_vs_get_out_rt_v6(struct netns_ipvs *ipvs, int skb_af, struct sk_buff *skb, } rt = (struct rt6_info *) dst; cookie = rt6_get_cookie(rt); - __ip_vs_dst_set(dest, dest_dst, &rt->dst, cookie); + if (dest->flags & IP_VS_DEST_F_AVAILABLE) + __ip_vs_dst_set(dest, dest_dst, &rt->dst, cookie); + else + noref = 0; spin_unlock_bh(&dest->dst_lock); IP_VS_DBG(10, "new dst %pI6, src %pI6, refcnt=%d\n", &dest->addr.in6, &dest_dst->dst_saddr.in6, rcuref_read(&rt->dst.__rcuref)); + if (ret_saddr) + *ret_saddr = dest_dst->dst_saddr.in6; + if (!noref) + ip_vs_dest_dst_free(dest_dst); } - if (ret_saddr) - *ret_saddr = dest_dst->dst_saddr.in6; } else { noref = 0; dst = __ip_vs_route_output_v6(net, daddr, ret_saddr, do_xfrm,