
net/bridge: Optimizing read-write locks in ebtables.c

Message ID EC5AC714C75A855E+20240923091535.77865-1-yushengjin@uniontech.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Series net/bridge: Optimizing read-write locks in ebtables.c

Checks

Context Check Description
netdev/series_format warning Single patches do not need cover letters; Target tree name not specified in the subject
netdev/tree_selection success Guessed tree name to be net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 16 this patch: 16
netdev/build_tools success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers success CCed 11 of 11 maintainers
netdev/build_clang success Errors and warnings before: 16 this patch: 16
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 19 this patch: 19
netdev/checkpatch warning CHECK: Alignment should match open parenthesis CHECK: Blank lines aren't necessary after an open brace '{' CHECK: Please don't use multiple blank lines WARNING: memory barrier without comment WARNING: suspect code indent for conditional statements (8, 12)
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc fail Errors and warnings before: 0 this patch: 3
netdev/source_inline success Was 0 now: 0

Commit Message

yushengjin Sept. 23, 2024, 9:15 a.m. UTC
When running wrk tests, CPU usage on the test machine reached 100%. When
forwarding through a bridge under high network load, ebt_do_table() in the
kernel ebtables module becomes abnormally loaded, leading to excessive
softirq time and sometimes even CPU soft lockups.

After analysis, it turned out that the ebtables code has not been optimized
for a long time and still takes a read-write lock in the packet path,
whereas the arp/ip/ip6 tables were reworked long ago, after that read-write
lock was identified as a performance bottleneck.

Ref link: https://lore.kernel.org/lkml/20090428092411.5331c4a1@nehalam/

So I followed the arp/ip/ip6 tables approach to remove the read-write lock
from ebtables.c.

Test method:
1) Create bridges on the test machine:
``` bash
brctl addbr br-a
brctl addbr br-b
brctl addif br-a enp1s0f0 enp1s0f1
brctl addif br-b enp130s0f0 enp130s0f1
ifconfig br-a up
ifconfig br-b up
```
2) Testing with another machine:
``` bash
ulimit -n 2048
./wrk -t48 -c2000 -d6000 -R10000 -s request.lua http://4.4.4.2:80/4k.html &
./wrk -t48 -c2000 -d6000 -R10000 -s request.lua http://5.5.5.2:80/4k.html &
```

Signed-off-by: yushengjin <yushengjin@uniontech.com>
---
 include/linux/netfilter_bridge/ebtables.h |  47 +++++++-
 net/bridge/netfilter/ebtables.c           | 132 ++++++++++++++++------
 2 files changed, 145 insertions(+), 34 deletions(-)
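
For readers unfamiliar with the arp/ip/ip6 tables scheme referenced above, here is a minimal sketch of the pattern this patch adopts, condensed from the diff below (illustration only, not additional code in the series):

``` c
/* Condensed illustration of the scheme used by this patch: the per-packet
 * path brackets its work with a per-CPU recursive seqcount instead of
 * read_lock_bh(), and the (rare) counter reader retries until it observes
 * a consistent per-CPU snapshot.
 */

/* packet path (ebt_do_table), softirqs disabled */
local_bh_disable();
addend = ebt_write_recseq_begin();	/* odd sequence => writer active */
/* ... traverse rules, ADD_COUNTER() on this CPU's counter block ... */
ebt_write_recseq_end(addend);
local_bh_enable();

/* counter snapshot (get_counters), per remote CPU, no lock taken */
do {
	start = read_seqcount_begin(s);	/* s = &per_cpu(ebt_recseq, cpu) */
	bcnt  = counter_base[i].bcnt;
	pcnt  = counter_base[i].pcnt;
} while (read_seqcount_retry(s, start));
ADD_COUNTER(counters[i], bcnt, pcnt);
```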

Comments

Eric Dumazet Sept. 23, 2024, 9:29 a.m. UTC | #1
On Mon, Sep 23, 2024 at 11:16 AM yushengjin <yushengjin@uniontech.com> wrote:
>
> When conducting WRK testing, the CPU usage rate of the testing machine was
> 100%. forwarding through a bridge, if the network load is too high, it may
> cause abnormal load on the ebt_do_table of the kernel ebtable module, leading
> to excessive soft interrupts and sometimes even directly causing CPU soft
> deadlocks.
>
> After analysis, it was found that the code of ebtables had not been optimized
> for a long time, and the read-write locks inside still existed. However, other
> arp/ip/ip6 tables had already been optimized a lot, and performance bottlenecks
> in read-write locks had been discovered a long time ago.
>
> Ref link: https://lore.kernel.org/lkml/20090428092411.5331c4a1@nehalam/
>
> So I referred to arp/ip/ip6 modification methods to optimize the read-write
> lock in ebtables.c.
>
> test method:
> 1) Test machine creates bridge :
> ``` bash
> brctl addbr br-a
> brctl addbr br-b
> brctl addif br-a enp1s0f0 enp1s0f1
> brctl addif br-b enp130s0f0 enp130s0f1
> ifconfig br-a up
> ifconfig br-b up
> ```
> 2) Testing with another machine:
> ``` bash
> ulimit -n 2048
> ./wrk -t48 -c2000 -d6000 -R10000 -s request.lua http://4.4.4.2:80/4k.html &
> ./wrk -t48 -c2000 -d6000 -R10000 -s request.lua http://5.5.5.2:80/4k.html &
> ```
>
> Signed-off-by: yushengjin <yushengjin@uniontech.com>
> ---
>  include/linux/netfilter_bridge/ebtables.h |  47 +++++++-
>  net/bridge/netfilter/ebtables.c           | 132 ++++++++++++++++------
>  2 files changed, 145 insertions(+), 34 deletions(-)
>
> diff --git a/include/linux/netfilter_bridge/ebtables.h b/include/linux/netfilter_bridge/ebtables.h
> index fd533552a062..dd52dea20fb8 100644
> --- a/include/linux/netfilter_bridge/ebtables.h
> +++ b/include/linux/netfilter_bridge/ebtables.h
> @@ -93,7 +93,6 @@ struct ebt_table {
>         char name[EBT_TABLE_MAXNAMELEN];
>         struct ebt_replace_kernel *table;
>         unsigned int valid_hooks;
> -       rwlock_t lock;
>         /* the data used by the kernel */
>         struct ebt_table_info *private;
>         struct nf_hook_ops *ops;
> @@ -124,4 +123,50 @@ static inline bool ebt_invalid_target(int target)
>
>  int ebt_register_template(const struct ebt_table *t, int(*table_init)(struct net *net));
>  void ebt_unregister_template(const struct ebt_table *t);
> +
> +/**
> + * ebt_recseq - recursive seqcount for netfilter use
> + *
> + * Packet processing changes the seqcount only if no recursion happened
> + * get_counters() can use read_seqcount_begin()/read_seqcount_retry(),
> + * because we use the normal seqcount convention :
> + * Low order bit set to 1 if a writer is active.
> + */
> +DECLARE_PER_CPU(seqcount_t, ebt_recseq);
> +
> +/**
> + * ebt_write_recseq_begin - start of a write section
> + *
> + * Begin packet processing : all readers must wait the end
> + * 1) Must be called with preemption disabled
> + * 2) softirqs must be disabled too (or we should use this_cpu_add())
> + * Returns :
> + *  1 if no recursion on this cpu
> + *  0 if recursion detected
> + */
> +static inline unsigned int ebt_write_recseq_begin(void)
> +{
> +       unsigned int addend;
> +
> +       addend = (__this_cpu_read(ebt_recseq.sequence) + 1) & 1;
> +
> +       __this_cpu_add(ebt_recseq.sequence, addend);
> +       smp_mb();
> +
> +       return addend;
> +}
> +
> +/**
> + * ebt_write_recseq_end - end of a write section
> + * @addend: return value from previous ebt_write_recseq_begin()
> + *
> + * End packet processing : all readers can proceed
> + * 1) Must be called with preemption disabled
> + * 2) softirqs must be disabled too (or we should use this_cpu_add())
> + */
> +static inline void ebt_write_recseq_end(unsigned int addend)
> +{
> +       smp_wmb();
> +       __this_cpu_add(ebt_recseq.sequence, addend);
> +}

Why not reuse xt_recseq, xt_write_recseq_begin() and xt_write_recseq_end()
instead of copy/pasting them?

This was added in commit 7f5c6d4f665bb57a19a34ce1fb16cc708c04f219
("netfilter: get rid of atomic ops in fast path").

If this is an include mess, just move them into a separate include file.
yushengjin Sept. 23, 2024, 10:05 a.m. UTC | #2
On 23/9/2024 5:29 PM, Eric Dumazet wrote:
> On Mon, Sep 23, 2024 at 11:16 AM yushengjin <yushengjin@uniontech.com> wrote:
>> When conducting WRK testing, the CPU usage rate of the testing machine was
>> 100%. forwarding through a bridge, if the network load is too high, it may
>> cause abnormal load on the ebt_do_table of the kernel ebtable module, leading
>> to excessive soft interrupts and sometimes even directly causing CPU soft
>> deadlocks.
>>
>> After analysis, it was found that the code of ebtables had not been optimized
>> for a long time, and the read-write locks inside still existed. However, other
>> arp/ip/ip6 tables had already been optimized a lot, and performance bottlenecks
>> in read-write locks had been discovered a long time ago.
>>
>> Ref link: https://lore.kernel.org/lkml/20090428092411.5331c4a1@nehalam/
>>
>> So I referred to arp/ip/ip6 modification methods to optimize the read-write
>> lock in ebtables.c.
>>
>> test method:
>> 1) Test machine creates bridge :
>> ``` bash
>> brctl addbr br-a
>> brctl addbr br-b
>> brctl addif br-a enp1s0f0 enp1s0f1
>> brctl addif br-b enp130s0f0 enp130s0f1
>> ifconfig br-a up
>> ifconfig br-b up
>> ```
>> 2) Testing with another machine:
>> ``` bash
>> ulimit -n 2048
>> ./wrk -t48 -c2000 -d6000 -R10000 -s request.lua http://4.4.4.2:80/4k.html &
>> ./wrk -t48 -c2000 -d6000 -R10000 -s request.lua http://5.5.5.2:80/4k.html &
>> ```
>>
>> Signed-off-by: yushengjin <yushengjin@uniontech.com>
>> ---
>>   include/linux/netfilter_bridge/ebtables.h |  47 +++++++-
>>   net/bridge/netfilter/ebtables.c           | 132 ++++++++++++++++------
>>   2 files changed, 145 insertions(+), 34 deletions(-)
>>
>> diff --git a/include/linux/netfilter_bridge/ebtables.h b/include/linux/netfilter_bridge/ebtables.h
>> index fd533552a062..dd52dea20fb8 100644
>> --- a/include/linux/netfilter_bridge/ebtables.h
>> +++ b/include/linux/netfilter_bridge/ebtables.h
>> @@ -93,7 +93,6 @@ struct ebt_table {
>>          char name[EBT_TABLE_MAXNAMELEN];
>>          struct ebt_replace_kernel *table;
>>          unsigned int valid_hooks;
>> -       rwlock_t lock;
>>          /* the data used by the kernel */
>>          struct ebt_table_info *private;
>>          struct nf_hook_ops *ops;
>> @@ -124,4 +123,50 @@ static inline bool ebt_invalid_target(int target)
>>
>>   int ebt_register_template(const struct ebt_table *t, int(*table_init)(struct net *net));
>>   void ebt_unregister_template(const struct ebt_table *t);
>> +
>> +/**
>> + * ebt_recseq - recursive seqcount for netfilter use
>> + *
>> + * Packet processing changes the seqcount only if no recursion happened
>> + * get_counters() can use read_seqcount_begin()/read_seqcount_retry(),
>> + * because we use the normal seqcount convention :
>> + * Low order bit set to 1 if a writer is active.
>> + */
>> +DECLARE_PER_CPU(seqcount_t, ebt_recseq);
>> +
>> +/**
>> + * ebt_write_recseq_begin - start of a write section
>> + *
>> + * Begin packet processing : all readers must wait the end
>> + * 1) Must be called with preemption disabled
>> + * 2) softirqs must be disabled too (or we should use this_cpu_add())
>> + * Returns :
>> + *  1 if no recursion on this cpu
>> + *  0 if recursion detected
>> + */
>> +static inline unsigned int ebt_write_recseq_begin(void)
>> +{
>> +       unsigned int addend;
>> +
>> +       addend = (__this_cpu_read(ebt_recseq.sequence) + 1) & 1;
>> +
>> +       __this_cpu_add(ebt_recseq.sequence, addend);
>> +       smp_mb();
>> +
>> +       return addend;
>> +}
>> +
>> +/**
>> + * ebt_write_recseq_end - end of a write section
>> + * @addend: return value from previous ebt_write_recseq_begin()
>> + *
>> + * End packet processing : all readers can proceed
>> + * 1) Must be called with preemption disabled
>> + * 2) softirqs must be disabled too (or we should use this_cpu_add())
>> + */
>> +static inline void ebt_write_recseq_end(unsigned int addend)
>> +{
>> +       smp_wmb();
>> +       __this_cpu_add(ebt_recseq.sequence, addend);
>> +}
> Why not reusing xt_recseq, xt_write_recseq_begin(), xt_write_recseq_end(),
> instead of copy/pasting them ?
>
> This was added in
>
> commit 7f5c6d4f665bb57a19a34ce1fb16cc708c04f219    netfilter: get rid
> of atomic ops in fast path
They used different seqcounts; I'm worried it might have an impact.
>
> If this is an include mess, just move them in a separate include file.

Can I copy ebt_write_recseq_begin() and ebt_write_recseq_end() to
include/linux/netfilter/x_tables.h?

Or add a parameter to xt_write_recseq_begin() and xt_write_recseq_end() to
select whether they operate on xt_recseq or ebt_recseq.
Eric Dumazet Sept. 23, 2024, 10:16 a.m. UTC | #3
On Mon, Sep 23, 2024 at 12:06 PM yushengjin <yushengjin@uniontech.com> wrote:
>
>
> On 23/9/2024 5:29 PM, Eric Dumazet wrote:
> > On Mon, Sep 23, 2024 at 11:16 AM yushengjin <yushengjin@uniontech.com> wrote:
> >> When conducting WRK testing, the CPU usage rate of the testing machine was
> >> 100%. forwarding through a bridge, if the network load is too high, it may
> >> cause abnormal load on the ebt_do_table of the kernel ebtable module, leading
> >> to excessive soft interrupts and sometimes even directly causing CPU soft
> >> deadlocks.
> >>
> >> After analysis, it was found that the code of ebtables had not been optimized
> >> for a long time, and the read-write locks inside still existed. However, other
> >> arp/ip/ip6 tables had already been optimized a lot, and performance bottlenecks
> >> in read-write locks had been discovered a long time ago.
> >>
> >> Ref link: https://lore.kernel.org/lkml/20090428092411.5331c4a1@nehalam/
> >>
> >> So I referred to arp/ip/ip6 modification methods to optimize the read-write
> >> lock in ebtables.c.
> >>
> >> test method:
> >> 1) Test machine creates bridge :
> >> ``` bash
> >> brctl addbr br-a
> >> brctl addbr br-b
> >> brctl addif br-a enp1s0f0 enp1s0f1
> >> brctl addif br-b enp130s0f0 enp130s0f1
> >> ifconfig br-a up
> >> ifconfig br-b up
> >> ```
> >> 2) Testing with another machine:
> >> ``` bash
> >> ulimit -n 2048
> >> ./wrk -t48 -c2000 -d6000 -R10000 -s request.lua http://4.4.4.2:80/4k.html &
> >> ./wrk -t48 -c2000 -d6000 -R10000 -s request.lua http://5.5.5.2:80/4k.html &
> >> ```
> >>
> >> Signed-off-by: yushengjin <yushengjin@uniontech.com>
> >> ---
> >>   include/linux/netfilter_bridge/ebtables.h |  47 +++++++-
> >>   net/bridge/netfilter/ebtables.c           | 132 ++++++++++++++++------
> >>   2 files changed, 145 insertions(+), 34 deletions(-)
> >>
> >> diff --git a/include/linux/netfilter_bridge/ebtables.h b/include/linux/netfilter_bridge/ebtables.h
> >> index fd533552a062..dd52dea20fb8 100644
> >> --- a/include/linux/netfilter_bridge/ebtables.h
> >> +++ b/include/linux/netfilter_bridge/ebtables.h
> >> @@ -93,7 +93,6 @@ struct ebt_table {
> >>          char name[EBT_TABLE_MAXNAMELEN];
> >>          struct ebt_replace_kernel *table;
> >>          unsigned int valid_hooks;
> >> -       rwlock_t lock;
> >>          /* the data used by the kernel */
> >>          struct ebt_table_info *private;
> >>          struct nf_hook_ops *ops;
> >> @@ -124,4 +123,50 @@ static inline bool ebt_invalid_target(int target)
> >>
> >>   int ebt_register_template(const struct ebt_table *t, int(*table_init)(struct net *net));
> >>   void ebt_unregister_template(const struct ebt_table *t);
> >> +
> >> +/**
> >> + * ebt_recseq - recursive seqcount for netfilter use
> >> + *
> >> + * Packet processing changes the seqcount only if no recursion happened
> >> + * get_counters() can use read_seqcount_begin()/read_seqcount_retry(),
> >> + * because we use the normal seqcount convention :
> >> + * Low order bit set to 1 if a writer is active.
> >> + */
> >> +DECLARE_PER_CPU(seqcount_t, ebt_recseq);
> >> +
> >> +/**
> >> + * ebt_write_recseq_begin - start of a write section
> >> + *
> >> + * Begin packet processing : all readers must wait the end
> >> + * 1) Must be called with preemption disabled
> >> + * 2) softirqs must be disabled too (or we should use this_cpu_add())
> >> + * Returns :
> >> + *  1 if no recursion on this cpu
> >> + *  0 if recursion detected
> >> + */
> >> +static inline unsigned int ebt_write_recseq_begin(void)
> >> +{
> >> +       unsigned int addend;
> >> +
> >> +       addend = (__this_cpu_read(ebt_recseq.sequence) + 1) & 1;
> >> +
> >> +       __this_cpu_add(ebt_recseq.sequence, addend);
> >> +       smp_mb();
> >> +
> >> +       return addend;
> >> +}
> >> +
> >> +/**
> >> + * ebt_write_recseq_end - end of a write section
> >> + * @addend: return value from previous ebt_write_recseq_begin()
> >> + *
> >> + * End packet processing : all readers can proceed
> >> + * 1) Must be called with preemption disabled
> >> + * 2) softirqs must be disabled too (or we should use this_cpu_add())
> >> + */
> >> +static inline void ebt_write_recseq_end(unsigned int addend)
> >> +{
> >> +       smp_wmb();
> >> +       __this_cpu_add(ebt_recseq.sequence, addend);
> >> +}
> > Why not reusing xt_recseq, xt_write_recseq_begin(), xt_write_recseq_end(),
> > instead of copy/pasting them ?
> >
> > This was added in
> >
> > commit 7f5c6d4f665bb57a19a34ce1fb16cc708c04f219    netfilter: get rid
> > of atomic ops in fast path
> They used different seqcounts, I'm worried it might have an impact.
> >
> > If this is an include mess, just move them in a separate include file.
>
> Can i copy  ebt_write_recseq_begin(), ebt_write_recseq_endend to
> include/linux/netfilter/x_tables.h ?
>
> Or add a parameter in xt_write_recseq_begin() , xt_write_recseq_end()  to
> clarify whether it is xt_recseq or ebt_recseq.

I think you can reuse xt_recseq, xt_write_recseq_begin(),
xt_write_recseq_end() directly.

We use xt_recseq from three different users already.

If you think you need a separate variable, please elaborate.

We prefer to reuse code, obviously.
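
For illustration, a minimal sketch of what reusing the existing x_tables helpers in ebt_do_table() could look like, as suggested above. This is an assumption about a possible respin, not code from this series; ebtables.c already pulls in <linux/netfilter/x_tables.h> for xt_register_target(), so xt_recseq and the xt_write_recseq_*() helpers are available there.

``` c
#include <linux/netfilter.h>
#include <linux/skbuff.h>
#include <linux/netfilter/x_tables.h>	/* xt_recseq, xt_write_recseq_begin/end */

/* Hypothetical sketch: ebt_do_table() reusing the shared x_tables per-CPU
 * recursive seqcount instead of introducing a separate ebt_recseq.
 * Rule traversal is elided; only the locking changes are shown.
 */
unsigned int ebt_do_table(void *priv, struct sk_buff *skb,
			  const struct nf_hook_state *state)
{
	unsigned int verdict = NF_DROP;
	unsigned int addend;

	local_bh_disable();
	addend = xt_write_recseq_begin();	/* shared with ip/ip6/arp tables */

	/* ... rule traversal and ADD_COUNTER() updates as in the patch ... */

	xt_write_recseq_end(addend);
	local_bh_enable();
	return verdict;
}
```

With this variant, get_counters() would read &per_cpu(xt_recseq, cpu) rather than a private seqcount, and the new DEFINE_PER_CPU(seqcount_t, ebt_recseq) plus its seqcount_init() loop in ebtables_init() would no longer be needed.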

Patch

diff --git a/include/linux/netfilter_bridge/ebtables.h b/include/linux/netfilter_bridge/ebtables.h
index fd533552a062..dd52dea20fb8 100644
--- a/include/linux/netfilter_bridge/ebtables.h
+++ b/include/linux/netfilter_bridge/ebtables.h
@@ -93,7 +93,6 @@  struct ebt_table {
 	char name[EBT_TABLE_MAXNAMELEN];
 	struct ebt_replace_kernel *table;
 	unsigned int valid_hooks;
-	rwlock_t lock;
 	/* the data used by the kernel */
 	struct ebt_table_info *private;
 	struct nf_hook_ops *ops;
@@ -124,4 +123,50 @@  static inline bool ebt_invalid_target(int target)
 
 int ebt_register_template(const struct ebt_table *t, int(*table_init)(struct net *net));
 void ebt_unregister_template(const struct ebt_table *t);
+
+/**
+ * ebt_recseq - recursive seqcount for netfilter use
+ *
+ * Packet processing changes the seqcount only if no recursion happened
+ * get_counters() can use read_seqcount_begin()/read_seqcount_retry(),
+ * because we use the normal seqcount convention :
+ * Low order bit set to 1 if a writer is active.
+ */
+DECLARE_PER_CPU(seqcount_t, ebt_recseq);
+
+/**
+ * ebt_write_recseq_begin - start of a write section
+ *
+ * Begin packet processing : all readers must wait the end
+ * 1) Must be called with preemption disabled
+ * 2) softirqs must be disabled too (or we should use this_cpu_add())
+ * Returns :
+ *  1 if no recursion on this cpu
+ *  0 if recursion detected
+ */
+static inline unsigned int ebt_write_recseq_begin(void)
+{
+	unsigned int addend;
+
+	addend = (__this_cpu_read(ebt_recseq.sequence) + 1) & 1;
+
+	__this_cpu_add(ebt_recseq.sequence, addend);
+	smp_mb();
+
+	return addend;
+}
+
+/**
+ * ebt_write_recseq_end - end of a write section
+ * @addend: return value from previous ebt_write_recseq_begin()
+ *
+ * End packet processing : all readers can proceed
+ * 1) Must be called with preemption disabled
+ * 2) softirqs must be disabled too (or we should use this_cpu_add())
+ */
+static inline void ebt_write_recseq_end(unsigned int addend)
+{
+	smp_wmb();
+	__this_cpu_add(ebt_recseq.sequence, addend);
+}
 #endif
diff --git a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c
index 3e67d4aff419..9da5733a26ea 100644
--- a/net/bridge/netfilter/ebtables.c
+++ b/net/bridge/netfilter/ebtables.c
@@ -40,6 +40,9 @@ 
 #define COUNTER_BASE(c, n, cpu) ((struct ebt_counter *)(((char *)c) + \
 				 COUNTER_OFFSET(n) * cpu))
 
+DEFINE_PER_CPU(seqcount_t, ebt_recseq);
+EXPORT_PER_CPU_SYMBOL_GPL(ebt_recseq);
+
 struct ebt_pernet {
 	struct list_head tables;
 };
@@ -204,11 +207,14 @@  unsigned int ebt_do_table(void *priv, struct sk_buff *skb,
 	const char *base;
 	const struct ebt_table_info *private;
 	struct xt_action_param acpar;
+	unsigned int addend;
 
 	acpar.state   = state;
 	acpar.hotdrop = false;
 
-	read_lock_bh(&table->lock);
+	local_bh_disable();
+	addend = ebt_write_recseq_begin();
+
 	private = table->private;
 	cb_base = COUNTER_BASE(private->counters, private->nentries,
 	   smp_processor_id());
@@ -229,10 +235,8 @@  unsigned int ebt_do_table(void *priv, struct sk_buff *skb,
 
 		if (EBT_MATCH_ITERATE(point, ebt_do_match, skb, &acpar) != 0)
 			goto letscontinue;
-		if (acpar.hotdrop) {
-			read_unlock_bh(&table->lock);
-			return NF_DROP;
-		}
+		if (acpar.hotdrop)
+			goto drop_out;
 
 		ADD_COUNTER(*(counter_base + i), skb->len, 1);
 
@@ -251,13 +255,13 @@  unsigned int ebt_do_table(void *priv, struct sk_buff *skb,
 			verdict = t->u.target->target(skb, &acpar);
 		}
 		if (verdict == EBT_ACCEPT) {
-			read_unlock_bh(&table->lock);
+			ebt_write_recseq_end(addend);
+			local_bh_enable();
 			return NF_ACCEPT;
 		}
-		if (verdict == EBT_DROP) {
-			read_unlock_bh(&table->lock);
-			return NF_DROP;
-		}
+		if (verdict == EBT_DROP)
+			goto drop_out;
+
 		if (verdict == EBT_RETURN) {
 letsreturn:
 			if (WARN(sp == 0, "RETURN on base chain")) {
@@ -278,10 +282,8 @@  unsigned int ebt_do_table(void *priv, struct sk_buff *skb,
 		if (verdict == EBT_CONTINUE)
 			goto letscontinue;
 
-		if (WARN(verdict < 0, "bogus standard verdict\n")) {
-			read_unlock_bh(&table->lock);
-			return NF_DROP;
-		}
+		if (WARN(verdict < 0, "bogus standard verdict\n"))
+			goto drop_out;
 
 		/* jump to a udc */
 		cs[sp].n = i + 1;
@@ -290,10 +292,8 @@  unsigned int ebt_do_table(void *priv, struct sk_buff *skb,
 		i = 0;
 		chaininfo = (struct ebt_entries *) (base + verdict);
 
-		if (WARN(chaininfo->distinguisher, "jump to non-chain\n")) {
-			read_unlock_bh(&table->lock);
-			return NF_DROP;
-		}
+		if (WARN(chaininfo->distinguisher, "jump to non-chain\n"))
+			goto drop_out;
 
 		nentries = chaininfo->nentries;
 		point = (struct ebt_entry *)chaininfo->data;
@@ -309,10 +309,15 @@  unsigned int ebt_do_table(void *priv, struct sk_buff *skb,
 	if (chaininfo->policy == EBT_RETURN)
 		goto letsreturn;
 	if (chaininfo->policy == EBT_ACCEPT) {
-		read_unlock_bh(&table->lock);
+		ebt_write_recseq_end(addend);
+		local_bh_enable();
 		return NF_ACCEPT;
 	}
-	read_unlock_bh(&table->lock);
+
+drop_out:
+	ebt_write_recseq_end(addend);
+	local_bh_enable();
+
 	return NF_DROP;
 }
 
@@ -983,12 +988,48 @@  static int translate_table(struct net *net, const char *name,
 	return ret;
 }
 
-/* called under write_lock */
+
 static void get_counters(const struct ebt_counter *oldcounters,
 			 struct ebt_counter *counters, unsigned int nentries)
 {
 	int i, cpu;
 	struct ebt_counter *counter_base;
+	seqcount_t *s;
+
+	/* counters of cpu 0 */
+	memcpy(counters, oldcounters,
+	       sizeof(struct ebt_counter) * nentries);
+
+	/* add other counters to those of cpu 0 */
+	for_each_possible_cpu(cpu) {
+
+		if (cpu == 0)
+			continue;
+
+		s = &per_cpu(ebt_recseq, cpu);
+		counter_base = COUNTER_BASE(oldcounters, nentries, cpu);
+		for (i = 0; i < nentries; i++) {
+			u64 bcnt, pcnt;
+			unsigned int start;
+
+			do {
+				start = read_seqcount_begin(s);
+				bcnt = counter_base[i].bcnt;
+				pcnt = counter_base[i].pcnt;
+			} while (read_seqcount_retry(s, start));
+
+			ADD_COUNTER(counters[i], bcnt, pcnt);
+			cond_resched();
+		}
+	}
+}
+
+
+static void get_old_counters(const struct ebt_counter *oldcounters,
+			 struct ebt_counter *counters, unsigned int nentries)
+{
+	int i, cpu;
+	struct ebt_counter *counter_base;
 
 	/* counters of cpu 0 */
 	memcpy(counters, oldcounters,
@@ -1013,6 +1054,7 @@  static int do_replace_finish(struct net *net, struct ebt_replace *repl,
 	/* used to be able to unlock earlier */
 	struct ebt_table_info *table;
 	struct ebt_table *t;
+	unsigned int cpu;
 
 	/* the user wants counters back
 	 * the check on the size is done later, when we have the lock
@@ -1050,6 +1092,8 @@  static int do_replace_finish(struct net *net, struct ebt_replace *repl,
 		goto free_unlock;
 	}
 
+	local_bh_disable();
+
 	/* we have the mutex lock, so no danger in reading this pointer */
 	table = t->private;
 	/* make sure the table can only be rmmod'ed if it contains no rules */
@@ -1058,15 +1102,31 @@  static int do_replace_finish(struct net *net, struct ebt_replace *repl,
 		goto free_unlock;
 	} else if (table->nentries && !newinfo->nentries)
 		module_put(t->me);
-	/* we need an atomic snapshot of the counters */
-	write_lock_bh(&t->lock);
-	if (repl->num_counters)
-		get_counters(t->private->counters, counterstmp,
-		   t->private->nentries);
 
+	smp_wmb();
 	t->private = newinfo;
-	write_unlock_bh(&t->lock);
+	smp_mb();
+
+	local_bh_enable();
+
+	// wait for even ebt_recseq on all cpus
+	for_each_possible_cpu(cpu) {
+		seqcount_t *s = &per_cpu(ebt_recseq, cpu);
+		u32 seq = raw_read_seqcount(s);
+
+		if (seq & 1) {
+			do {
+				cond_resched();
+				cpu_relax();
+			} while (seq == raw_read_seqcount(s));
+		}
+	}
+
 	mutex_unlock(&ebt_mutex);
+
+	if (repl->num_counters)
+	    get_old_counters(table->counters, counterstmp, table->nentries);
+
 	/* so, a user can change the chains while having messed up her counter
 	 * allocation. Only reason why this is done is because this way the lock
 	 * is held only once, while this doesn't bring the kernel into a
@@ -1093,6 +1153,7 @@  static int do_replace_finish(struct net *net, struct ebt_replace *repl,
 	return 0;
 
 free_unlock:
+	local_bh_enable();
 	mutex_unlock(&ebt_mutex);
 free_iterate:
 	EBT_ENTRY_ITERATE(newinfo->entries, newinfo->entries_size,
@@ -1235,7 +1296,6 @@  int ebt_register_table(struct net *net, const struct ebt_table *input_table,
 		goto free_chainstack;
 
 	table->private = newinfo;
-	rwlock_init(&table->lock);
 	mutex_lock(&ebt_mutex);
 	list_for_each_entry(t, &ebt_net->tables, list) {
 		if (strcmp(t->name, table->name) == 0) {
@@ -1382,6 +1442,7 @@  static int do_update_counters(struct net *net, const char *name,
 	int i, ret;
 	struct ebt_counter *tmp;
 	struct ebt_table *t;
+	unsigned int addend;
 
 	if (num_counters == 0)
 		return -EINVAL;
@@ -1405,14 +1466,16 @@  static int do_update_counters(struct net *net, const char *name,
 		goto unlock_mutex;
 	}
 
-	/* we want an atomic add of the counters */
-	write_lock_bh(&t->lock);
+	local_bh_disable();
+	addend = ebt_write_recseq_begin();
 
 	/* we add to the counters of the first cpu */
 	for (i = 0; i < num_counters; i++)
 		ADD_COUNTER(t->private->counters[i], tmp[i].bcnt, tmp[i].pcnt);
 
-	write_unlock_bh(&t->lock);
+	ebt_write_recseq_end(addend);
+	local_bh_enable();
+
 	ret = 0;
 unlock_mutex:
 	mutex_unlock(&ebt_mutex);
@@ -1530,9 +1593,7 @@  static int copy_counters_to_user(struct ebt_table *t,
 	if (!counterstmp)
 		return -ENOMEM;
 
-	write_lock_bh(&t->lock);
 	get_counters(oldcounters, counterstmp, nentries);
-	write_unlock_bh(&t->lock);
 
 	if (copy_to_user(user, counterstmp,
 	    array_size(nentries, sizeof(struct ebt_counter))))
@@ -2568,6 +2629,11 @@  static struct pernet_operations ebt_net_ops = {
 static int __init ebtables_init(void)
 {
 	int ret;
+	unsigned int i;
+
+	for_each_possible_cpu(i) {
+	    seqcount_init(&per_cpu(ebt_recseq, i));
+	}
 
 	ret = xt_register_target(&ebt_standard_target);
 	if (ret < 0)