diff mbox series

[RFC,net-next,2/3] net: dsa: qca8k: enable assisted learning on CPU port

Message ID 20210807120726.1063225-3-dqfext@gmail.com (mailing list archive)
State RFC
Delegated to: Netdev Maintainers
Headers show
Series qca8k bridge flags offload | expand

Checks

Context Check Description
netdev/cover_letter success Link
netdev/fixes_present success Link
netdev/patch_count success Link
netdev/tree_selection success Clearly marked for net-next
netdev/subject_prefix success Link
netdev/cc_maintainers success CCed 7 of 7 maintainers
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success Link
netdev/module_param success Was 0 now: 0
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/verify_fixes success Link
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 8 lines checked
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/header_inline success Link

Commit Message

Qingfang Deng Aug. 7, 2021, 12:07 p.m. UTC
Enable assisted learning on CPU port to fix roaming issues.

Although hardware learning is available, it won't work well with
software bridging fallback or multiple CPU ports.

Signed-off-by: DENG Qingfang <dqfext@gmail.com>
---
 drivers/net/dsa/qca8k.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Vladimir Oltean Aug. 7, 2021, 10:25 p.m. UTC | #1
On Sat, Aug 07, 2021 at 08:07:25PM +0800, DENG Qingfang wrote:
> Enable assisted learning on CPU port to fix roaming issues.

'roaming issues' implies to me it suffered from blindness to MAC
addresses learned on foreign interfaces, which appears to not be true
since your previous patch removes hardware learning on the CPU port
(=> hardware learning on the CPU port was supported, so there were no
roaming issues)

> 
> Although hardware learning is available, it won't work well with
> software bridging fallback or multiple CPU ports.

This part is potentially true however, but it would need proof. I am not
familiar enough with the qca8k driver to say for sure that it suffers
from the typical problem with bridging with software LAG uppers (no FDB
isolation for standalone ports => attempt to shortcircuit the forwarding
through the CPU port and go directly towards the bridged port, which
would result in drops), and I also can't say anything about its support
for multiple CPU ports.
Qingfang Deng Aug. 8, 2021, 4:05 p.m. UTC | #2
On Sun, Aug 08, 2021 at 01:25:55AM +0300, Vladimir Oltean wrote:
> On Sat, Aug 07, 2021 at 08:07:25PM +0800, DENG Qingfang wrote:
> > Enable assisted learning on CPU port to fix roaming issues.
> 
> 'roaming issues' implies to me it suffered from blindness to MAC
> addresses learned on foreign interfaces, which appears to not be true
> since your previous patch removes hardware learning on the CPU port
> (=> hardware learning on the CPU port was supported, so there were no
> roaming issues)

The datasheet says learning is enabled by default, but if that's true,
the driver won't have to enable it manually.

Others have reported roaming issues as well:
https://github.com/Ansuel/openwrt/pull/3

As I don't have the hardware to test, I don't know what the default
value really is, so I just disable learning to make sure.

> 
> > 
> > Although hardware learning is available, it won't work well with
> > software bridging fallback or multiple CPU ports.
> 
> This part is potentially true however, but it would need proof. I am not
> familiar enough with the qca8k driver to say for sure that it suffers
> from the typical problem with bridging with software LAG uppers (no FDB
> isolation for standalone ports => attempt to shortcircuit the forwarding
> through the CPU port and go directly towards the bridged port, which
> would result in drops), and I also can't say anything about its support
> for multiple CPU ports.

QCA8337 supports disabling learning and FDB lookup on a per-VLAN basis,
so we could assign all standalone ports to a reserved VLAN (ID 0 or 4095)
with learning and FDB lookup disabled.

Ansuel has a patch set for multiple CPU ports.
Vladimir Oltean Aug. 8, 2021, 9:10 p.m. UTC | #3
On Mon, Aug 09, 2021 at 12:05:03AM +0800, DENG Qingfang wrote:
> On Sun, Aug 08, 2021 at 01:25:55AM +0300, Vladimir Oltean wrote:
> > On Sat, Aug 07, 2021 at 08:07:25PM +0800, DENG Qingfang wrote:
> > > Enable assisted learning on CPU port to fix roaming issues.
> > 
> > 'roaming issues' implies to me it suffered from blindness to MAC
> > addresses learned on foreign interfaces, which appears to not be true
> > since your previous patch removes hardware learning on the CPU port
> > (=> hardware learning on the CPU port was supported, so there were no
> > roaming issues)
> 
> The datasheet says learning is enabled by default, but if that's true,
> the driver won't have to enable it manually.
> 
> Others have reported roaming issues as well:
> https://github.com/Ansuel/openwrt/pull/3
> 
> As I don't have the hardware to test, I don't know what the default
> value really is, so I just disable learning to make sure.

That link doesn't really say more than "roaming issues" either, so I am
still not clear on what is being fixed here exactly.

Note that I can still think of 'roaming'-related issues with VLAN-aware
bridges and foreign interfaces and hardware learning on the CPU port,
but I don't want to speculate too much and just want to hear what is the
issue that is being fixed.

> > > Although hardware learning is available, it won't work well with
> > > software bridging fallback or multiple CPU ports.
> > 
> > This part is potentially true however, but it would need proof. I am not
> > familiar enough with the qca8k driver to say for sure that it suffers
> > from the typical problem with bridging with software LAG uppers (no FDB
> > isolation for standalone ports => attempt to shortcircuit the forwarding
> > through the CPU port and go directly towards the bridged port, which
> > would result in drops), and I also can't say anything about its support
> > for multiple CPU ports.
> 
> QCA8337 supports disabling learning and FDB lookup on a per-VLAN basis,
> so we could assign all standalone ports to a reserved VLAN (ID 0 or 4095)
> with learning and FDB lookup disabled.

And to follow along that idea, if you also change the tagger to send
all packets towards a standalone port using that reserved VLAN, then
even if hardware learning is enabled on the CPU port, it will be
inconsequential as long as IVL is used, because no FDB lookup will match
the VLAN in which those addresses were learned.

My point is, if you come with something functional to the table, present
the whole story. If more changes still need to be made until it works
with software bridging fallback, say that too. Otherwise, I think that
the general idea that "hardware learning on the CPU port won't work well
with software bridging fallback" is not strictly true, and that this
patch has a weak overall justification.

> 
> Ansuel has a patch set for multiple CPU ports.
Andre Valentin Aug. 10, 2021, 5:27 p.m. UTC | #4
On Sun, Aug 08, 2021 at 1805, DENG Qingfang wrote:
> On Sun, Aug 08, 2021 at 01:25:55AM +0300, Vladimir Oltean wrote:
>> On Sat, Aug 07, 2021 at 08:07:25PM +0800, DENG Qingfang wrote:
>>> Enable assisted learning on CPU port to fix roaming issues.
>>
>> 'roaming issues' implies to me it suffered from blindness to MAC
>> addresses learned on foreign interfaces, which appears to not be true
>> since your previous patch removes hardware learning on the CPU port
>> (=> hardware learning on the CPU port was supported, so there were no
>> roaming issues)

The issue is with a wifi AP bridged into dsa and previously learned
addresses.

Test setup:
We have to wifi APs a and b(with qca8k). Client is on AP a.

The qca8k switch in AP b sees also the broadcast traffic from the client
and takes the address into its fdb.

Now the client roams to AP b.
The client starts DHCP but does not get an IP. With tcpdump, I see the
packets going through the switch (ap->cpu port->ethernet port) and they
arrive at the DHCP server. It responds, the response packet reaches the
ethernet port of the qca8k, and is not forwarded.

After about 3 minutes the fdb entry in the qca8k on AP b is
"cleaned up" and the client can immediately get its IP from the DHCP server.

I hope this helps understanding the background.
Vladimir Oltean Aug. 10, 2021, 5:53 p.m. UTC | #5
On Tue, Aug 10, 2021 at 07:27:05PM +0200, Andre Valentin wrote:
> On Sun, Aug 08, 2021 at 1805, DENG Qingfang wrote:
> > On Sun, Aug 08, 2021 at 01:25:55AM +0300, Vladimir Oltean wrote:
> >> On Sat, Aug 07, 2021 at 08:07:25PM +0800, DENG Qingfang wrote:
> >>> Enable assisted learning on CPU port to fix roaming issues.
> >>
> >> 'roaming issues' implies to me it suffered from blindness to MAC
> >> addresses learned on foreign interfaces, which appears to not be true
> >> since your previous patch removes hardware learning on the CPU port
> >> (=> hardware learning on the CPU port was supported, so there were no
> >> roaming issues)
>
> The issue is with a wifi AP bridged into dsa and previously learned
> addresses.
>
> Test setup:
> We have to wifi APs a and b(with qca8k). Client is on AP a.
>
> The qca8k switch in AP b sees also the broadcast traffic from the client
> and takes the address into its fdb.
>
> Now the client roams to AP b.
> The client starts DHCP but does not get an IP. With tcpdump, I see the
> packets going through the switch (ap->cpu port->ethernet port) and they
> arrive at the DHCP server. It responds, the response packet reaches the
> ethernet port of the qca8k, and is not forwarded.
>
> After about 3 minutes the fdb entry in the qca8k on AP b is
> "cleaned up" and the client can immediately get its IP from the DHCP server.
>
> I hope this helps understanding the background.

How does this differ from what is described in commit d5f19486cee7
("net: dsa: listen for SWITCHDEV_{FDB,DEL}_ADD_TO_DEVICE on foreign
bridge neighbors")?
Andre Valentin Aug. 10, 2021, 9:09 p.m. UTC | #6
Am 10.08.21 um 19:53 schrieb Vladimir Oltean:
> On Tue, Aug 10, 2021 at 07:27:05PM +0200, Andre Valentin wrote:
>> On Sun, Aug 08, 2021 at 1805, DENG Qingfang wrote:
>>> On Sun, Aug 08, 2021 at 01:25:55AM +0300, Vladimir Oltean wrote:
>>>> On Sat, Aug 07, 2021 at 08:07:25PM +0800, DENG Qingfang wrote:
>>>>> Enable assisted learning on CPU port to fix roaming issues.
>>>>
>>>> 'roaming issues' implies to me it suffered from blindness to MAC
>>>> addresses learned on foreign interfaces, which appears to not be true
>>>> since your previous patch removes hardware learning on the CPU port
>>>> (=> hardware learning on the CPU port was supported, so there were no
>>>> roaming issues)
>>
>> The issue is with a wifi AP bridged into dsa and previously learned
>> addresses.
>>
>> Test setup:
>> We have to wifi APs a and b(with qca8k). Client is on AP a.
>>
>> The qca8k switch in AP b sees also the broadcast traffic from the client
>> and takes the address into its fdb.
>>
>> Now the client roams to AP b.
>> The client starts DHCP but does not get an IP. With tcpdump, I see the
>> packets going through the switch (ap->cpu port->ethernet port) and they
>> arrive at the DHCP server. It responds, the response packet reaches the
>> ethernet port of the qca8k, and is not forwarded.
>>
>> After about 3 minutes the fdb entry in the qca8k on AP b is
>> "cleaned up" and the client can immediately get its IP from the DHCP server.
>>
>> I hope this helps understanding the background.
> 
> How does this differ from what is described in commit d5f19486cee7
> ("net: dsa: listen for SWITCHDEV_{FDB,DEL}_ADD_TO_DEVICE on foreign
> bridge neighbors")?
> 
I lost a bit, It is a bit different.

I've been also working a bit on the ipq807x device with such a switch on
OpenWRT. There is a backport of that commit. To fix the problems described
by d5f19486cee7, I enabled assisted_learning on qca8k. And it solves the
problem.

But initially, the device was unreachable until I created traffic from the device
to a client (cpu port->ethernet). At first, I did not notice this because a wifi client
with it's traffic immediately solved the issue automatically.
Later I verified this in detail.

Hopefully DENG Qingfang patches help. But I cannot proove atm.
Vladimir Oltean Aug. 10, 2021, 11:33 p.m. UTC | #7
On Tue, Aug 10, 2021 at 11:09:07PM +0200, Andre Valentin wrote:
> Am 10.08.21 um 19:53 schrieb Vladimir Oltean:
> > On Tue, Aug 10, 2021 at 07:27:05PM +0200, Andre Valentin wrote:
> >> On Sun, Aug 08, 2021 at 1805, DENG Qingfang wrote:
> >>> On Sun, Aug 08, 2021 at 01:25:55AM +0300, Vladimir Oltean wrote:
> >>>> On Sat, Aug 07, 2021 at 08:07:25PM +0800, DENG Qingfang wrote:
> >>>>> Enable assisted learning on CPU port to fix roaming issues.
> >>>>
> >>>> 'roaming issues' implies to me it suffered from blindness to MAC
> >>>> addresses learned on foreign interfaces, which appears to not be true
> >>>> since your previous patch removes hardware learning on the CPU port
> >>>> (=> hardware learning on the CPU port was supported, so there were no
> >>>> roaming issues)
> >>
> >> The issue is with a wifi AP bridged into dsa and previously learned
> >> addresses.
> >>
> >> Test setup:
> >> We have to wifi APs a and b(with qca8k). Client is on AP a.
> >>
> >> The qca8k switch in AP b sees also the broadcast traffic from the client
> >> and takes the address into its fdb.
> >>
> >> Now the client roams to AP b.
> >> The client starts DHCP but does not get an IP. With tcpdump, I see the
> >> packets going through the switch (ap->cpu port->ethernet port) and they
> >> arrive at the DHCP server. It responds, the response packet reaches the
> >> ethernet port of the qca8k, and is not forwarded.
> >>
> >> After about 3 minutes the fdb entry in the qca8k on AP b is
> >> "cleaned up" and the client can immediately get its IP from the DHCP server.
> >>
> >> I hope this helps understanding the background.
> > 
> > How does this differ from what is described in commit d5f19486cee7
> > ("net: dsa: listen for SWITCHDEV_{FDB,DEL}_ADD_TO_DEVICE on foreign
> > bridge neighbors")?
> > 
> I lost a bit, It is a bit different.
> 
> I've been also working a bit on the ipq807x device with such a switch on
> OpenWRT. There is a backport of that commit. To fix the problems described
> by d5f19486cee7, I enabled assisted_learning on qca8k. And it solves the
> problem.
> 
> But initially, the device was unreachable until I created traffic from the device
> to a client (cpu port->ethernet). At first, I did not notice this because a wifi client
> with it's traffic immediately solved the issue automatically.
> Later I verified this in detail.
> 
> Hopefully DENG Qingfang patches help. But I cannot proove atm.

I don't understand. You're saying that when the device sends a packet
from its new position, the switch learns it on the CPU port, and that
fixes the issue?

Isn't that always how issues like that get fixed? If hardware learning
is supported on the CPU port, it is no different than a device roaming
from one switch port to another (but isn't directly connected to that
switch port, otherwise the switch might fast age the port when the
device roams) - it is inaccessible until it says something.

I still have no idea what we're talking about, and why this patch is
necessary. Does the qca8k switch support hardware learning on the CPU
port or not?
diff mbox series

Patch

diff --git a/drivers/net/dsa/qca8k.c b/drivers/net/dsa/qca8k.c
index 798bc548e5b0..de2aa7812d1c 100644
--- a/drivers/net/dsa/qca8k.c
+++ b/drivers/net/dsa/qca8k.c
@@ -1114,6 +1114,8 @@  qca8k_setup(struct dsa_switch *ds)
 	/* We don't have interrupts for link changes, so we need to poll */
 	ds->pcs_poll = true;
 
+	ds->assisted_learning_on_cpu_port = true;
+
 	return 0;
 }