Message ID | 20240822043609.141992-1-takakura@valinux.co.jp (mailing list archive)
---|---
State | Not Applicable |
Delegated to: | Netdev Maintainers |
Series | netfilter: Don't track counter updates of do_add_counters()
On Thu, Aug 22, 2024 at 6:36 AM <takakura@valinux.co.jp> wrote:
>
> From: Ryo Takakura <takakura@valinux.co.jp>
>
> While adding counters in do_add_counters(), we call
> xt_write_recseq_begin/end to indicate that counters are being updated.
> Updates are being tracked so that the counters retrieved by get_counters()
> will reflect concurrent updates.
>
> However, there is no need to track the updates done by do_add_counters() as
> both do_add_counters() and get_counters() acquire per ipv4,ipv6,arp mutex
> beforehand which prevents concurrent update and retrieval between the two.
>
> Moreover, as the xt_write_recseq_begin/end is shared among ipv4,ipv6,arp,
> do_add_counters() called by one of ipv4,ipv6,arp can falsely delay the
> synchronization of concurrent get_counters() or xt_replace_table() called
> by any other than the one calling do_add_counters().
>
> So remove xt_write_recseq_begin/end from do_add_counters() for ipv4,ipv6,arp.

Completely wrong patch.

There is no way we can update pairs of 64bit counters without any
synchronization.

This is per cpu sequence, the 'shared among ipv4,ipv6,arp' part is moot.

We could use cmpxchg128 on 64bit arches, but I suspect there will be
no improvement.
Hi Eric,

Thanks for taking a look at the patch! And sorry, I see now that I was
missing the point of the synchronization.

On 2024-08-22 6:03 Eric Dumazet wrote:
>On Thu, Aug 22, 2024 at 6:36 AM <takakura@valinux.co.jp> wrote:
>>
>> From: Ryo Takakura <takakura@valinux.co.jp>
>>
>> While adding counters in do_add_counters(), we call
>> xt_write_recseq_begin/end to indicate that counters are being updated.
>> Updates are being tracked so that the counters retrieved by get_counters()
>> will reflect concurrent updates.
>>
>> However, there is no need to track the updates done by do_add_counters() as
>> both do_add_counters() and get_counters() acquire per ipv4,ipv6,arp mutex
>> beforehand which prevents concurrent update and retrieval between the two.
>>
>> Moreover, as the xt_write_recseq_begin/end is shared among ipv4,ipv6,arp,
>> do_add_counters() called by one of ipv4,ipv6,arp can falsely delay the
>> synchronization of concurrent get_counters() or xt_replace_table() called
>> by any other than the one calling do_add_counters().
>>
>> So remove xt_write_recseq_begin/end from do_add_counters() for ipv4,ipv6,arp.
>
>Completely wrong patch.
>
>There is no way we can update pairs of 64bit counters without any
>synchronization.

Yes, I was completely wrong about why the synchronization is required...

>This is per cpu sequence, the 'shared among ipv4,ipv6,arp' part is moot.
>
>We could use cmpxchg128 on 64bit arches, but I suspect there will be
>no improvement.

I see. And if we were to use cmpxchg128, we would also need to come up
with a way to handle xt_replace_table()'s synchronization, for which I
guess the current per cpu sequence is better suited.

Sincerely,
Ryo Takakura
diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c
index 14365b20f1c5..20de048d3e0c 100644
--- a/net/ipv4/netfilter/arp_tables.c
+++ b/net/ipv4/netfilter/arp_tables.c
@@ -1009,7 +1009,6 @@ static int do_add_counters(struct net *net, sockptr_t arg, unsigned int len)
 	const struct xt_table_info *private;
 	int ret = 0;
 	struct arpt_entry *iter;
-	unsigned int addend;
 
 	paddc = xt_copy_counters(arg, len, &tmp);
 	if (IS_ERR(paddc))
@@ -1029,8 +1028,6 @@ static int do_add_counters(struct net *net, sockptr_t arg, unsigned int len)
 	}
 
 	i = 0;
-
-	addend = xt_write_recseq_begin();
 	xt_entry_foreach(iter, private->entries, private->size) {
 		struct xt_counters *tmp;
 
@@ -1038,7 +1035,6 @@ static int do_add_counters(struct net *net, sockptr_t arg, unsigned int len)
 		ADD_COUNTER(*tmp, paddc[i].bcnt, paddc[i].pcnt);
 		++i;
 	}
-	xt_write_recseq_end(addend);
 unlock_up_free:
 	local_bh_enable();
 	xt_table_unlock(t);
diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
index fe89a056eb06..f54dea2a8fcd 100644
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c
@@ -1162,7 +1162,6 @@ do_add_counters(struct net *net, sockptr_t arg, unsigned int len)
 	const struct xt_table_info *private;
 	int ret = 0;
 	struct ipt_entry *iter;
-	unsigned int addend;
 
 	paddc = xt_copy_counters(arg, len, &tmp);
 	if (IS_ERR(paddc))
@@ -1182,7 +1181,6 @@ do_add_counters(struct net *net, sockptr_t arg, unsigned int len)
 	}
 
 	i = 0;
-	addend = xt_write_recseq_begin();
 	xt_entry_foreach(iter, private->entries, private->size) {
 		struct xt_counters *tmp;
 
@@ -1190,7 +1188,6 @@ do_add_counters(struct net *net, sockptr_t arg, unsigned int len)
 		ADD_COUNTER(*tmp, paddc[i].bcnt, paddc[i].pcnt);
 		++i;
 	}
-	xt_write_recseq_end(addend);
 unlock_up_free:
 	local_bh_enable();
 	xt_table_unlock(t);
diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
index 131f7bb2110d..f1d3bb74eb16 100644
--- a/net/ipv6/netfilter/ip6_tables.c
+++ b/net/ipv6/netfilter/ip6_tables.c
@@ -1179,7 +1179,6 @@ do_add_counters(struct net *net, sockptr_t arg, unsigned int len)
 	const struct xt_table_info *private;
 	int ret = 0;
 	struct ip6t_entry *iter;
-	unsigned int addend;
 
 	paddc = xt_copy_counters(arg, len, &tmp);
 	if (IS_ERR(paddc))
@@ -1198,7 +1197,6 @@ do_add_counters(struct net *net, sockptr_t arg, unsigned int len)
 	}
 
 	i = 0;
-	addend = xt_write_recseq_begin();
 	xt_entry_foreach(iter, private->entries, private->size) {
 		struct xt_counters *tmp;
 
@@ -1206,7 +1204,6 @@ do_add_counters(struct net *net, sockptr_t arg, unsigned int len)
 		ADD_COUNTER(*tmp, paddc[i].bcnt, paddc[i].pcnt);
 		++i;
 	}
-	xt_write_recseq_end(addend);
 unlock_up_free:
 	local_bh_enable();
 	xt_table_unlock(t);