diff mbox series

[RFC,v2,net-next,06/17] net: dsa: add addresses obtained from RX filtering to host addresses

Message ID 20210224114350.2791260-7-olteanv@gmail.com (mailing list archive)
State RFC
Delegated to: Netdev Maintainers
Headers show
Series RX filtering in DSA | expand

Checks

Context Check Description
netdev/cover_letter success Link
netdev/fixes_present success Link
netdev/patch_count fail Series longer than 15 patches
netdev/tree_selection success Clearly marked for net-next
netdev/subject_prefix success Link
netdev/cc_maintainers warning 2 maintainers not CCed: davem@davemloft.net kuba@kernel.org
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success Link
netdev/module_param success Was 0 now: 0
netdev/build_32bit success Errors and warnings before: 2 this patch: 2
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/verify_fixes success Link
netdev/checkpatch warning CHECK: Please don't use multiple blank lines
netdev/build_allmodconfig_warn success Errors and warnings before: 2 this patch: 2
netdev/header_inline success Link
netdev/stable success Stable not CCed

Commit Message

Vladimir Oltean Feb. 24, 2021, 11:43 a.m. UTC
From: Vladimir Oltean <vladimir.oltean@nxp.com>

In case we have ptp4l running on a bridged DSA switch interface, the PTP
traffic is classified as link-local (in the default profile, the MAC
addresses are 01:1b:19:00:00:00 and 01:80:c2:00:00:0e), which means it
isn't the responsibility of the bridge to make sure it gets trapped to
the CPU.

The solution is to implement the standard callbacks for dev_uc_add and
dev_mc_add, and behave just like any other network interface: ensure
that the user space program can see those packets.

Note that since ndo_set_rx_mode runs in atomic context, we must schedule
the dsa_slave_switchdev_event_work in order to install the FDB and MDB
entries to a DSA switch that may sleep. But since the DSA switchdev
event logic only deals with FDB entries, we must fake some events for
MDB ones.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 net/dsa/dsa_priv.h |  10 +-
 net/dsa/slave.c    | 329 ++++++++++++++++++++++++++++++++-------------
 2 files changed, 239 insertions(+), 100 deletions(-)

Comments

Tobias Waldekranz Feb. 26, 2021, 10:59 a.m. UTC | #1
On Wed, Feb 24, 2021 at 13:43, Vladimir Oltean <olteanv@gmail.com> wrote:
> From: Vladimir Oltean <vladimir.oltean@nxp.com>
>
> In case we have ptp4l running on a bridged DSA switch interface, the PTP
> traffic is classified as link-local (in the default profile, the MAC
> addresses are 01:1b:19:00:00:00 and 01:80:c2:00:00:0e), which means it
> isn't the responsibility of the bridge to make sure it gets trapped to
> the CPU.
>
> The solution is to implement the standard callbacks for dev_uc_add and
> dev_mc_add, and behave just like any other network interface: ensure
> that the user space program can see those packets.

So presumably the application would use PACKET_ADD_MEMBERSHIP to set
this up?

This is a really elegant way of solving this problem I think!

One problem I see is that this will not result in packets getting
trapped to the CPU, rather they will simply be forwarded.  I.e. with
this patch applied, once ptp4l adds the groups it is interested in, my
HW FDB will look like this:

ADDR                VID  DST   TYPE
01:1b:19:00:00:00     0  cpu0  static
01:80:c2:00:00:0e     0  cpu0  static

But this will not allow these groups to ingress on (STP) blocked
ports. AFAIK, PTP (certainly LLDP which also uses the latter group)
should be able to do that.

For mv88e6xxx (but I think this applies to most switches), there are
roughly three ways a given multicast group can reach the CPU:

1. Trap: Packet is unconditionally redirected to the CPU, independent
   of things like 802.1X or STP state on the ingressing port.
2. Mirror: Send a copy of packets that pass all other ingress policy to
   the CPU.
3. Forward: Forward packets that pass all other ingress policy to the
   CPU.

Entries are now added as "Forward", which means that the group will no
longer reach the other local ports. But the command from the application
is "I want to see these packets", it says nothing about preventing the
group from being forwarded. So I think the default ought to be
"Mirror". Additionally, we probably need some way of specifying "Trap"
to those applications that need it. E.g. ptp4l could specify
PACKET_MR_MULTICAST_TRAP in mr_action or something if it does not want
the bridge (or the switch) to forward it.

If "Forward" is desired, the existing "bridge mdb" interface seems like
the proper one, since it also affects other ports.
Vladimir Oltean Feb. 26, 2021, 1:28 p.m. UTC | #2
On Fri, Feb 26, 2021 at 11:59:36AM +0100, Tobias Waldekranz wrote:
> On Wed, Feb 24, 2021 at 13:43, Vladimir Oltean <olteanv@gmail.com> wrote:
> > From: Vladimir Oltean <vladimir.oltean@nxp.com>
> >
> > In case we have ptp4l running on a bridged DSA switch interface, the PTP
> > traffic is classified as link-local (in the default profile, the MAC
> > addresses are 01:1b:19:00:00:00 and 01:80:c2:00:00:0e), which means it
> > isn't the responsibility of the bridge to make sure it gets trapped to
> > the CPU.
> >
> > The solution is to implement the standard callbacks for dev_uc_add and
> > dev_mc_add, and behave just like any other network interface: ensure
> > that the user space program can see those packets.
>
> So presumably the application would use PACKET_ADD_MEMBERSHIP to set
> this up?
>
> This is a really elegant way of solving this problem I think!

Yes, using the unmodified *_ADD_MEMBERSHIP UAPI was the intention.
If that is not possible, the whole idea kinda loses its appeal and we'd
be better off starting from scratch and figuring out how we'd prefer for
user space to request (exclusive) address membership.

> One problem I see is that this will not result in packets getting
> trapped to the CPU, rather they will simply be forwarded.  I.e. with
> this patch applied, once ptp4l adds the groups it is interested in, my
> HW FDB will look like this:
>
> ADDR                VID  DST   TYPE
> 01:1b:19:00:00:00     0  cpu0  static
> 01:80:c2:00:00:0e     0  cpu0  static
>
> But this will not allow these groups to ingress on (STP) blocked
> ports. AFAIK, PTP (certainly LLDP which also uses the latter group)
> should be able to do that.
>
> For mv88e6xxx (but I think this applies to most switches), there are
> roughly three ways a given multicast group can reach the CPU:
>
> 1. Trap: Packet is unconditionally redirected to the CPU, independent
>    of things like 802.1X or STP state on the ingressing port.
> 2. Mirror: Send a copy of packets that pass all other ingress policy to
>    the CPU.
> 3. Forward: Forward packets that pass all other ingress policy to the
>    CPU.
>
> Entries are now added as "Forward", which means that the group will no
> longer reach the other local ports. But the command from the application
> is "I want to see these packets", it says nothing about preventing the
> group from being forwarded. So I think the default ought to be
> "Mirror". Additionally, we probably need some way of specifying "Trap"
> to those applications that need it. E.g. ptp4l could specify
> PACKET_MR_MULTICAST_TRAP in mr_action or something if it does not want
> the bridge (or the switch) to forward it.
>
> If "Forward" is desired, the existing "bridge mdb" interface seems like
> the proper one, since it also affects other ports.

I'm not sure I understand your exact requirement.

Let me try to quote from IEEE 802.1Q-2018 clause 8.13.9 "Points of
attachment and connectivity for Higher Layer Entities". For context,
this talks about about Higher Layer Entities, aka applications such as
STP, MRP, ISIS-SPB, whose world view is as though they are connected
directly to the LAN that the bridge port is connected to, i.e. they
bypass the MAC relay (forwarding) function.

The spec says:

  Controls placed in the forwarding path have no effect on the ability
  of a Higher Layer Entity to transmit and receive frames to or from a
  given LAN using a direct attachment to that LAN (e.g., from entity A
  to LAN A); they only affect the path taken by any indirect
  transmission or reception (e.g., from entity A to or from LAN B).

Then there's this drawing:

 +-----------------------+                         +-----------------------+
 | Higher Layer Entity A |                         | Higher Layer Entity B |
 +-----------------------+                         +-----------------------+
            |                                                   |
            |     +---------------------------------------+     |
            |     |             MAC Relay Entity          |     |
            |     |                                       |     |
            |     |   Port   Filtering Database   Port    |     |
            |     |  State       Information     State    |     |
            |     |                                       |     |
            |     |     /             /             /     |     |
            +-----|---x/  x---------x/  x---------x/  x---+-----+
            |     |                                       |     |
            |     +---------------------------------------+     |
            |                                                   |
 +----------------------------+                      +----------------------------+
 |          |                 |                      |          |                 |
 |          x                 |                      |          x                 |
 |    MAC    /                |                      |    MAC    /                |
 |  Entity  / MAC_Operational |                      |  Entity  / MAC_Operational |
 |          x                 |                      |          x                 |
 |          |                 |                      |          |                 |
 +----------------------------+                      +----------------------------+
            |                                                   |
            |                                                   |
            |                                                   |
   ----------------------------                        ----------------------------
  /       LAN A               /                       /       LAN B               /
 /                           /                       /                           /
/---------------------------/                       /---------------------------/

Figure 8-20: Effect of control information on the forwarding path

A one phrase conclusion from the above is that a Higher Layer Entity
should be able to accept packets from the bridge port regardless of its
STP state, as long as the MAC is operational.

In my world view, anything sent and received through the swpN DSA
interfaces, and therefore by injection/extraction through the CPU port,
represents a Higher Layer Entity.

So the fact that your Marvell switch treats the CPU port as just another
destination for forwarding should be hidden away from the externally
observable behavior. DSA does not really support the switched endpoint
use case, we should deny attaching IP interfaces to it.

But to your point: we are currently using the .port_fdb_add and
.port_mdb_add API in DSA to install the host-filtered addresses. This is
true, and potentially incorrect, since indeed it assumes that the
underlying mechanism to trap these addresses to the CPU is through the
FDB/MDB, which will violate the expectation of Higher Layer Entities
since it goes through the forwarding process. I think we could add a new
set of APIs in DSA, something like .host_uc_add and .host_mc_add. Then,
drivers that can't do any better can go ahead and internally call their
.port_fdb_add and .port_mdb_add. Question: If we add .host_uc_add and
.host_mc_add, should these APIs be per front port? And if they should,
should we leave the switch driver the task of reference counting them?


As to why does IEEE 1588 use 01-80-C2-00-00-0E for peer delay in its
default profile for the L2 transport, I could only find this (quoting
clause "F.3 Multicast MAC Addresses"):

  To ensure peer delay measurements on ports blocked by (Rapid/Multiple)
  Spanning Tree Protocols, a reserved address, 01-80-C2-00-00-0E, shall be
  used as a Destination MAC Address for PTP peer delay mechanism messages.

Basically, unlike end-to-end delay measurements, peer delay is by
definition point-to-point, which in an L2 network mean link-local.
So if your LAN uses the peer delay measurement protocol, then all your
switches must have some sort of PTP awareness. There are 2 cases really:
- You are a Peer-to-peer Transparent Clock, so you must speak the Peer
  Delay protocol yourself. Therefore you must have a Higher Layer Entity
  that consumes the PTP PDUs. So for this case, it doesn't really make
  any difference that the reserved bridge MAC address range is used.
- You are an End-to-end Transparent Clock. These are an oddity of the
  1588 standard which do not speak the Peer Delay protocol, but are
  allowed to forward Peer Delay messages, and update the correctionField
  of those Peer Delay messages that are Events (contain timestamps).
  I would think that this is where having a reserved address for the
  Peer Delay messages would make a difference.


Then, there seems to be the second part of your request, that of address
exclusivity. I think that for addresses that should explicitly be
removed from the hardware data path, the tc-trap action was created for
that exact purpose. However, the question really becomes: does PTP care
enough to know it's running on top of a switchdev interface, so that it
should claim exclusive ownership of that multicast group, or should it
just care enough to say "hey, I'm here, I'm interested in it too"?

For one thing, the 01-80-c2-00-00-0e multicast group is shared, even
IEEE 1588 says so:

  NOTE 2: At its July 17−20, 2006 meeting the IEEE 802.1 Working Group
  approved a motion that included the following text: "re-designate the
  reserved address currently identified for use by 802.1AB as an address
  that can be used by protocols that require the scope of the address to
  be limited to an individual LAN" and "The reserved multicast address
  that IEEE 1588 should use is 01-80-C2-00-00-0E." This address is not
  reserved exclusively for PTP, but rather it is a shared address.

and the bridge driver also supports this odd attribute:

  group_fwd_mask MASK - set the group forward mask. This is the bitmask
  that is applied to decide whether to forward incoming frames destined
  to link-local addresses, ie addresses of the form 01:80:C2:00:00:0X
  (defaults to 0, ie the bridge does not forward any link-local frames).

which by the way seems to be in blatant contradiction of clause
8.13.4 Reserved MAC addresses: "Any frame with a destination address
that is a reserved MAC address shall not be forwarded by a Bridge".

So if ptp4l was to install a trap action by itself, it would potentially
interact in unexpected ways with other uses of that address on the same
system.

Thoughts?
Tobias Waldekranz Feb. 26, 2021, 10:44 p.m. UTC | #3
On Fri, Feb 26, 2021 at 15:28, Vladimir Oltean <olteanv@gmail.com> wrote:
> On Fri, Feb 26, 2021 at 11:59:36AM +0100, Tobias Waldekranz wrote:
>> On Wed, Feb 24, 2021 at 13:43, Vladimir Oltean <olteanv@gmail.com> wrote:
>> > From: Vladimir Oltean <vladimir.oltean@nxp.com>
>> >
>> > In case we have ptp4l running on a bridged DSA switch interface, the PTP
>> > traffic is classified as link-local (in the default profile, the MAC
>> > addresses are 01:1b:19:00:00:00 and 01:80:c2:00:00:0e), which means it
>> > isn't the responsibility of the bridge to make sure it gets trapped to
>> > the CPU.
>> >
>> > The solution is to implement the standard callbacks for dev_uc_add and
>> > dev_mc_add, and behave just like any other network interface: ensure
>> > that the user space program can see those packets.
>>
>> So presumably the application would use PACKET_ADD_MEMBERSHIP to set
>> this up?
>>
>> This is a really elegant way of solving this problem I think!
>
> Yes, using the unmodified *_ADD_MEMBERSHIP UAPI was the intention.
> If that is not possible, the whole idea kinda loses its appeal and we'd
> be better off starting from scratch and figuring out how we'd prefer for
> user space to request (exclusive) address membership.
>
>> One problem I see is that this will not result in packets getting
>> trapped to the CPU, rather they will simply be forwarded.  I.e. with
>> this patch applied, once ptp4l adds the groups it is interested in, my
>> HW FDB will look like this:
>>
>> ADDR                VID  DST   TYPE
>> 01:1b:19:00:00:00     0  cpu0  static
>> 01:80:c2:00:00:0e     0  cpu0  static
>>
>> But this will not allow these groups to ingress on (STP) blocked
>> ports. AFAIK, PTP (certainly LLDP which also uses the latter group)
>> should be able to do that.
>>
>> For mv88e6xxx (but I think this applies to most switches), there are
>> roughly three ways a given multicast group can reach the CPU:
>>
>> 1. Trap: Packet is unconditionally redirected to the CPU, independent
>>    of things like 802.1X or STP state on the ingressing port.
>> 2. Mirror: Send a copy of packets that pass all other ingress policy to
>>    the CPU.
>> 3. Forward: Forward packets that pass all other ingress policy to the
>>    CPU.
>>
>> Entries are now added as "Forward", which means that the group will no
>> longer reach the other local ports. But the command from the application
>> is "I want to see these packets", it says nothing about preventing the
>> group from being forwarded. So I think the default ought to be
>> "Mirror". Additionally, we probably need some way of specifying "Trap"
>> to those applications that need it. E.g. ptp4l could specify
>> PACKET_MR_MULTICAST_TRAP in mr_action or something if it does not want
>> the bridge (or the switch) to forward it.
>>
>> If "Forward" is desired, the existing "bridge mdb" interface seems like
>> the proper one, since it also affects other ports.
>
> I'm not sure I understand your exact requirement.
>
> Let me try to quote from IEEE 802.1Q-2018 clause 8.13.9 "Points of
> attachment and connectivity for Higher Layer Entities". For context,
> this talks about about Higher Layer Entities, aka applications such as
> STP, MRP, ISIS-SPB, whose world view is as though they are connected
> directly to the LAN that the bridge port is connected to, i.e. they
> bypass the MAC relay (forwarding) function.
>
> The spec says:
>
>   Controls placed in the forwarding path have no effect on the ability
>   of a Higher Layer Entity to transmit and receive frames to or from a
>   given LAN using a direct attachment to that LAN (e.g., from entity A
>   to LAN A); they only affect the path taken by any indirect
>   transmission or reception (e.g., from entity A to or from LAN B).
>
> Then there's this drawing:
>
>  +-----------------------+                         +-----------------------+
>  | Higher Layer Entity A |                         | Higher Layer Entity B |
>  +-----------------------+                         +-----------------------+
>             |                                                   |
>             |     +---------------------------------------+     |
>             |     |             MAC Relay Entity          |     |
>             |     |                                       |     |
>             |     |   Port   Filtering Database   Port    |     |
>             |     |  State       Information     State    |     |
>             |     |                                       |     |
>             |     |     /             /             /     |     |
>             +-----|---x/  x---------x/  x---------x/  x---+-----+
>             |     |                                       |     |
>             |     +---------------------------------------+     |
>             |                                                   |
>  +----------------------------+                      +----------------------------+
>  |          |                 |                      |          |                 |
>  |          x                 |                      |          x                 |
>  |    MAC    /                |                      |    MAC    /                |
>  |  Entity  / MAC_Operational |                      |  Entity  / MAC_Operational |
>  |          x                 |                      |          x                 |
>  |          |                 |                      |          |                 |
>  +----------------------------+                      +----------------------------+
>             |                                                   |
>             |                                                   |
>             |                                                   |
>    ----------------------------                        ----------------------------
>   /       LAN A               /                       /       LAN B               /
>  /                           /                       /                           /
> /---------------------------/                       /---------------------------/
>
> Figure 8-20: Effect of control information on the forwarding path
>
> A one phrase conclusion from the above is that a Higher Layer Entity
> should be able to accept packets from the bridge port regardless of its
> STP state, as long as the MAC is operational.
>
> In my world view, anything sent and received through the swpN DSA
> interfaces, and therefore by injection/extraction through the CPU port,
> represents a Higher Layer Entity.

Yes, I agree.

> So the fact that your Marvell switch treats the CPU port as just another
> destination for forwarding should be hidden away from the externally
> observable behavior. DSA does not really support the switched endpoint
> use case, we should deny attaching IP interfaces to it.
>
> But to your point: we are currently using the .port_fdb_add and
> .port_mdb_add API in DSA to install the host-filtered addresses. This is
> true, and potentially incorrect, since indeed it assumes that the
> underlying mechanism to trap these addresses to the CPU is through the
> FDB/MDB, which will violate the expectation of Higher Layer Entities
> since it goes through the forwarding process.

Well, at least the way it is implemented now. E.g. mv88e6xxx can setup
mirrors/traps via the FDB (ATU), I imagine that many smaller devices
reuse their FDBs for this purpose. So it might be possible to just
extend the API a bit to signal the type of entry required.

> I think we could add a new
> set of APIs in DSA, something like .host_uc_add and .host_mc_add. Then,
> drivers that can't do any better can go ahead and internally call their
> .port_fdb_add and .port_mdb_add.

That might be an ever better way of managing it. That would also make it
easier to design in the difference between trap/mirror from the start.

> Question: If we add .host_uc_add and
> .host_mc_add, should these APIs be per front port?

I think they would have to be per port in order to work in the
standalone case. You need some context to about which FID (or
equivalent) to tie the entry to.

> And if they should,
> should we leave the switch driver the task of reference counting them?

Well, I know you do not like the DSA layer "trying to be helpful", but I
can not imagine a case where some hardware would be interested in having
N callbacks for the same address when N ports are attached to the same
bridge. So I think it would be a service that DSA can provide.

>
> As to why does IEEE 1588 use 01-80-C2-00-00-0E for peer delay in its
> default profile for the L2 transport, I could only find this (quoting
> clause "F.3 Multicast MAC Addresses"):
>
>   To ensure peer delay measurements on ports blocked by (Rapid/Multiple)
>   Spanning Tree Protocols, a reserved address, 01-80-C2-00-00-0E, shall be
>   used as a Destination MAC Address for PTP peer delay mechanism messages.

Yeah that makes sense. You want to make sure that you are already synced
with your neighbor when a topology change occurs.

> Basically, unlike end-to-end delay measurements, peer delay is by
> definition point-to-point, which in an L2 network mean link-local.
> So if your LAN uses the peer delay measurement protocol, then all your
> switches must have some sort of PTP awareness. There are 2 cases really:
> - You are a Peer-to-peer Transparent Clock, so you must speak the Peer
>   Delay protocol yourself. Therefore you must have a Higher Layer Entity
>   that consumes the PTP PDUs. So for this case, it doesn't really make
>   any difference that the reserved bridge MAC address range is used.
> - You are an End-to-end Transparent Clock. These are an oddity of the
>   1588 standard which do not speak the Peer Delay protocol, but are
>   allowed to forward Peer Delay messages, and update the correctionField
>   of those Peer Delay messages that are Events (contain timestamps).
>   I would think that this is where having a reserved address for the
>   Peer Delay messages would make a difference.
>
>
> Then, there seems to be the second part of your request, that of address
> exclusivity. I think that for addresses that should explicitly be
> removed from the hardware data path, the tc-trap action was created for
> that exact purpose.

Maybe. Tc-trap is a weird creature in that it seems to violate the rule
that in order for an offload to be accepted into the kernel, a software
implementation needs to be in place first. The exception, as I
understand it, is for things that do not make sense in software
(e.g. cut-through switching).

Trapping a packet, blocking bridging, certainly seems like something
that can be done in software. The fact that this was not done means that
it is hard to add it now without potentially breaking existing
use-cases.

Now when a tc trap filter is added, the switch will trap it to the CPU,
the switchdev driver will _not_ set OFM. The result is that the frame is
software forwarded by the bridge. Even if you hack the driver to pretend
that it was HW forwarded (setting OFM), that will not stop the bridge
from forwarding it to foreign interfaces.

One idea could be to implement the software version of "trap" to mean
that sch_handle_ingress could return a signal to
__netif_receive_skb_core to skip any rx handlers and only consider
device specific protocol handlers. A kind of "RX_HANDLER_EXACT with a
twist". Again, this is hard to add after the fact, but you should at
least be able to get the old behavior with "skip_sw".

> However, the question really becomes: does PTP care
> enough to know it's running on top of a switchdev interface, so that it
> should claim exclusive ownership of that multicast group, or should it
> just care enough to say "hey, I'm here, I'm interested in it too"?

Ideally it would not know about switchdev, but maybe it should care
whether it is running on top of a bridge, since that has implications
for how PTP frames should be forwarded (or not).

> For one thing, the 01-80-c2-00-00-0e multicast group is shared, even
> IEEE 1588 says so:
>
>   NOTE 2: At its July 17−20, 2006 meeting the IEEE 802.1 Working Group
>   approved a motion that included the following text: "re-designate the
>   reserved address currently identified for use by 802.1AB as an address
>   that can be used by protocols that require the scope of the address to
>   be limited to an individual LAN" and "The reserved multicast address
>   that IEEE 1588 should use is 01-80-C2-00-00-0E." This address is not
>   reserved exclusively for PTP, but rather it is a shared address.
>
> and the bridge driver also supports this odd attribute:
>
>   group_fwd_mask MASK - set the group forward mask. This is the bitmask
>   that is applied to decide whether to forward incoming frames destined
>   to link-local addresses, ie addresses of the form 01:80:C2:00:00:0X
>   (defaults to 0, ie the bridge does not forward any link-local frames).
>
> which by the way seems to be in blatant contradiction of clause
> 8.13.4 Reserved MAC addresses: "Any frame with a destination address
> that is a reserved MAC address shall not be forwarded by a Bridge".

Well, the set of addresses that are reserved vary based on the type of
bridge (Tables 8-1, 8-2, and 8-3). So this setting allows you to choose
the applicable set.

Additionally, there are cases where you want to emulate a $10 switch (or
even hub) which does _no_ kind of filtering at all.

> So if ptp4l was to install a trap action by itself, it would potentially
> interact in unexpected ways with other uses of that address on the same
> system.

Yeah that is a problem. The easy way out is probably "that is the
admin's problem". A middle way could be to warn/abort if the trap is not
in place. You could also go all the way and reference count filters I
suppose.
diff mbox series

Patch

diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index 4043da2bacc0..c03c67631e23 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -117,10 +117,12 @@  struct dsa_switchdev_event_work {
 	struct dsa_switch *ds;
 	int port;
 	struct work_struct work;
-	unsigned long event;
-	/* Specific for SWITCHDEV_FDB_ADD_TO_DEVICE and
-	 * SWITCHDEV_FDB_DEL_TO_DEVICE
-	 */
+	enum dsa_switchdev_event {
+		DSA_EVENT_FDB_ADD,
+		DSA_EVENT_FDB_DEL,
+		DSA_EVENT_MDB_ADD,
+		DSA_EVENT_MDB_DEL,
+	} event;
 	unsigned char addr[ETH_ALEN];
 	u16 vid;
 	bool host_addr;
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 6544a4ec69f4..65b3c1166fe7 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -171,6 +171,220 @@  static int dsa_host_fdb_del(struct dsa_port *dp, const unsigned char *addr,
 	return 0;
 }
 
+static void
+dsa_fdb_offload_notify(struct dsa_switchdev_event_work *switchdev_work)
+{
+	struct dsa_switch *ds = switchdev_work->ds;
+	struct switchdev_notifier_fdb_info info;
+	struct dsa_port *dp;
+
+	if (!dsa_is_user_port(ds, switchdev_work->port))
+		return;
+
+	info.addr = switchdev_work->addr;
+	info.vid = switchdev_work->vid;
+	info.offloaded = true;
+	dp = dsa_to_port(ds, switchdev_work->port);
+	call_switchdev_notifiers(SWITCHDEV_FDB_OFFLOADED,
+				 dp->slave, &info.info, NULL);
+}
+
+#define work_to_ctx(w) \
+	container_of((w), struct dsa_switchdev_event_work, work)
+
+static void dsa_slave_switchdev_event_work(struct work_struct *work)
+{
+	struct dsa_switchdev_event_work *switchdev_work = work_to_ctx(work);
+	struct dsa_switch *ds = switchdev_work->ds;
+	struct dsa_port *dp;
+	int err;
+
+	dp = dsa_to_port(ds, switchdev_work->port);
+
+	rtnl_lock();
+	switch (switchdev_work->event) {
+	case DSA_EVENT_FDB_ADD:
+		if (switchdev_work->host_addr)
+			err = dsa_host_fdb_add(dp, switchdev_work->addr,
+					       switchdev_work->vid);
+		else
+			err = dsa_port_fdb_add(dp, switchdev_work->addr,
+					       switchdev_work->vid);
+		if (err) {
+			dev_err(ds->dev,
+				"port %d failed to add %pM vid %d to fdb: %d\n",
+				dp->index, switchdev_work->addr,
+				switchdev_work->vid, err);
+			break;
+		}
+		dsa_fdb_offload_notify(switchdev_work);
+		break;
+
+	case DSA_EVENT_FDB_DEL:
+		if (switchdev_work->host_addr)
+			err = dsa_host_fdb_del(dp, switchdev_work->addr,
+					       switchdev_work->vid);
+		else
+			err = dsa_port_fdb_del(dp, switchdev_work->addr,
+					       switchdev_work->vid);
+		if (err) {
+			dev_err(ds->dev,
+				"port %d failed to delete %pM vid %d from fdb: %d\n",
+				dp->index, switchdev_work->addr,
+				switchdev_work->vid, err);
+		}
+
+		break;
+
+	case DSA_EVENT_MDB_ADD: {
+		struct switchdev_obj_port_mdb mdb;
+
+		ether_addr_copy(mdb.addr, switchdev_work->addr);
+		mdb.vid = switchdev_work->vid;
+
+		if (switchdev_work->host_addr)
+			err = dsa_host_mdb_add(dp, &mdb);
+		else
+			err = dsa_port_mdb_add(dp, &mdb);
+		if (err) {
+			dev_err(ds->dev,
+				"port %d failed to add %pM vid %d to mdb: %d\n",
+				dp->index, mdb.addr, mdb.vid, err);
+			break;
+		}
+		dsa_fdb_offload_notify(switchdev_work);
+		break;
+	}
+	case DSA_EVENT_MDB_DEL: {
+		struct switchdev_obj_port_mdb mdb;
+
+		ether_addr_copy(mdb.addr, switchdev_work->addr);
+		mdb.vid = switchdev_work->vid;
+
+		if (switchdev_work->host_addr)
+			err = dsa_host_mdb_del(dp, &mdb);
+		else
+			err = dsa_port_mdb_del(dp, &mdb);
+		if (err) {
+			dev_err(ds->dev,
+				"port %d failed to delete %pM vid %d from mdb: %d\n",
+				dp->index, mdb.addr, mdb.vid, err);
+		}
+		break;
+	}
+	default:
+		break;
+	}
+	rtnl_unlock();
+
+	kfree(switchdev_work);
+	dev_put(dp->slave);
+}
+
+
+static int dsa_slave_schedule_switchdev_work(struct net_device *dev,
+					     enum dsa_switchdev_event event,
+					     const unsigned char *addr, u16 vid,
+					     bool host_addr)
+{
+	struct dsa_switchdev_event_work *switchdev_work;
+	struct dsa_port *dp = dsa_slave_to_port(dev);
+
+	if (!dp->ds->ops->port_fdb_add || !dp->ds->ops->port_fdb_del)
+		return -EOPNOTSUPP;
+
+	switchdev_work = kzalloc(sizeof(*switchdev_work), GFP_ATOMIC);
+	if (!switchdev_work)
+		return -ENOMEM;
+
+	INIT_WORK(&switchdev_work->work, dsa_slave_switchdev_event_work);
+	switchdev_work->ds = dp->ds;
+	switchdev_work->port = dp->index;
+	switchdev_work->event = event;
+
+	ether_addr_copy(switchdev_work->addr, addr);
+	switchdev_work->vid = vid;
+	switchdev_work->host_addr = host_addr;
+
+	/* Hold a reference on the slave for dsa_fdb_offload_notify */
+	dev_hold(dev);
+	dsa_schedule_work(&switchdev_work->work);
+
+	return 0;
+}
+
+static int dsa_slave_sync_uc(struct net_device *dev,
+			     const unsigned char *addr)
+{
+	int err;
+
+	err = dsa_slave_schedule_switchdev_work(dev, DSA_EVENT_FDB_ADD,
+						addr, 0, true);
+	if (err == -EOPNOTSUPP) {
+		struct dsa_port *dp = dsa_slave_to_port(dev);
+
+		dev_uc_add(dp->cpu_dp->master, addr);
+
+		return 0;
+	}
+
+	return err;
+}
+
+static int dsa_slave_unsync_uc(struct net_device *dev,
+			       const unsigned char *addr)
+{
+	int err;
+
+	err = dsa_slave_schedule_switchdev_work(dev, DSA_EVENT_FDB_DEL,
+						addr, 0, true);
+	if (err == -EOPNOTSUPP) {
+		struct dsa_port *dp = dsa_slave_to_port(dev);
+
+		dev_uc_del(dp->cpu_dp->master, addr);
+
+		return 0;
+	}
+
+	return err;
+}
+
+static int dsa_slave_sync_mc(struct net_device *dev,
+			     const unsigned char *addr)
+{
+	int err;
+
+	err = dsa_slave_schedule_switchdev_work(dev, DSA_EVENT_MDB_ADD,
+						addr, 0, true);
+	if (err == -EOPNOTSUPP) {
+		struct dsa_port *dp = dsa_slave_to_port(dev);
+
+		dev_mc_add(dp->cpu_dp->master, addr);
+
+		return 0;
+	}
+
+	return err;
+}
+
+static int dsa_slave_unsync_mc(struct net_device *dev,
+			       const unsigned char *addr)
+{
+	int err;
+
+	err = dsa_slave_schedule_switchdev_work(dev, DSA_EVENT_MDB_DEL,
+						addr, 0, true);
+	if (err == -EOPNOTSUPP) {
+		struct dsa_port *dp = dsa_slave_to_port(dev);
+
+		dev_mc_del(dp->cpu_dp->master, addr);
+
+		return 0;
+	}
+
+	return err;
+}
+
 /* slave mii_bus handling ***************************************************/
 static int dsa_slave_phy_read(struct mii_bus *bus, int addr, int reg)
 {
@@ -263,8 +477,9 @@  static int dsa_slave_close(struct net_device *dev)
 
 	dsa_port_disable_rt(dp);
 
-	dev_mc_unsync(master, dev);
-	dev_uc_unsync(master, dev);
+	__dev_uc_sync(dev, dsa_slave_sync_uc, dsa_slave_unsync_uc);
+	__dev_mc_sync(dev, dsa_slave_sync_mc, dsa_slave_unsync_mc);
+
 	if (dev->flags & IFF_ALLMULTI)
 		dev_set_allmulti(master, -1);
 	if (dev->flags & IFF_PROMISC)
@@ -290,10 +505,8 @@  static void dsa_slave_change_rx_flags(struct net_device *dev, int change)
 
 static void dsa_slave_set_rx_mode(struct net_device *dev)
 {
-	struct net_device *master = dsa_slave_to_master(dev);
-
-	dev_mc_sync(master, dev);
-	dev_uc_sync(master, dev);
+	__dev_uc_sync(dev, dsa_slave_sync_uc, dsa_slave_unsync_uc);
+	__dev_mc_sync(dev, dsa_slave_sync_mc, dsa_slave_unsync_mc);
 }
 
 static int dsa_slave_set_mac_address(struct net_device *dev, void *a)
@@ -1970,6 +2183,8 @@  int dsa_slave_create(struct dsa_port *port)
 	else
 		eth_hw_addr_inherit(slave_dev, master);
 	slave_dev->priv_flags |= IFF_NO_QUEUE;
+	if (ds->ops->port_fdb_add && ds->ops->port_fdb_del)
+		slave_dev->priv_flags |= IFF_UNICAST_FLT;
 	slave_dev->netdev_ops = &dsa_slave_netdev_ops;
 	if (ds->ops->port_max_mtu)
 		slave_dev->max_mtu = ds->ops->port_max_mtu(ds, port->index);
@@ -2290,75 +2505,6 @@  static int dsa_slave_netdevice_event(struct notifier_block *nb,
 	return NOTIFY_DONE;
 }
 
-static void
-dsa_fdb_offload_notify(struct dsa_switchdev_event_work *switchdev_work)
-{
-	struct dsa_switch *ds = switchdev_work->ds;
-	struct switchdev_notifier_fdb_info info;
-	struct dsa_port *dp;
-
-	if (!dsa_is_user_port(ds, switchdev_work->port))
-		return;
-
-	info.addr = switchdev_work->addr;
-	info.vid = switchdev_work->vid;
-	info.offloaded = true;
-	dp = dsa_to_port(ds, switchdev_work->port);
-	call_switchdev_notifiers(SWITCHDEV_FDB_OFFLOADED,
-				 dp->slave, &info.info, NULL);
-}
-
-static void dsa_slave_switchdev_event_work(struct work_struct *work)
-{
-	struct dsa_switchdev_event_work *switchdev_work =
-		container_of(work, struct dsa_switchdev_event_work, work);
-	struct dsa_switch *ds = switchdev_work->ds;
-	struct dsa_port *dp;
-	int err;
-
-	dp = dsa_to_port(ds, switchdev_work->port);
-
-	rtnl_lock();
-	switch (switchdev_work->event) {
-	case SWITCHDEV_FDB_ADD_TO_DEVICE:
-		if (switchdev_work->host_addr)
-			err = dsa_host_fdb_add(dp, switchdev_work->addr,
-					       switchdev_work->vid);
-		else
-			err = dsa_port_fdb_add(dp, switchdev_work->addr,
-					       switchdev_work->vid);
-		if (err) {
-			dev_err(ds->dev,
-				"port %d failed to add %pM vid %d to fdb: %d\n",
-				dp->index, switchdev_work->addr,
-				switchdev_work->vid, err);
-			break;
-		}
-		dsa_fdb_offload_notify(switchdev_work);
-		break;
-
-	case SWITCHDEV_FDB_DEL_TO_DEVICE:
-		if (switchdev_work->host_addr)
-			err = dsa_host_fdb_del(dp, switchdev_work->addr,
-					       switchdev_work->vid);
-		else
-			err = dsa_port_fdb_del(dp, switchdev_work->addr,
-					       switchdev_work->vid);
-		if (err) {
-			dev_err(ds->dev,
-				"port %d failed to delete %pM vid %d from fdb: %d\n",
-				dp->index, switchdev_work->addr,
-				switchdev_work->vid, err);
-		}
-
-		break;
-	}
-	rtnl_unlock();
-
-	kfree(switchdev_work);
-	dev_put(dp->slave);
-}
-
 static int dsa_lower_dev_walk(struct net_device *lower_dev,
 			      struct netdev_nested_priv *priv)
 {
@@ -2387,7 +2533,7 @@  static int dsa_slave_switchdev_event(struct notifier_block *unused,
 {
 	struct net_device *dev = switchdev_notifier_info_to_dev(ptr);
 	const struct switchdev_notifier_fdb_info *fdb_info;
-	struct dsa_switchdev_event_work *switchdev_work;
+	enum dsa_switchdev_event dsa_event;
 	bool host_addr = false;
 	struct dsa_port *dp;
 	int err;
@@ -2441,28 +2587,19 @@  static int dsa_slave_switchdev_event(struct notifier_block *unused,
 				return NOTIFY_DONE;
 		}
 
-		if (!dp->ds->ops->port_fdb_add || !dp->ds->ops->port_fdb_del)
-			return NOTIFY_DONE;
-
-		switchdev_work = kzalloc(sizeof(*switchdev_work), GFP_ATOMIC);
-		if (!switchdev_work)
-			return NOTIFY_BAD;
-
-		INIT_WORK(&switchdev_work->work,
-			  dsa_slave_switchdev_event_work);
-		switchdev_work->ds = dp->ds;
-		switchdev_work->port = dp->index;
-		switchdev_work->event = event;
+		if (event == SWITCHDEV_FDB_ADD_TO_DEVICE)
+			dsa_event = DSA_EVENT_FDB_ADD;
+		else
+			dsa_event = DSA_EVENT_FDB_DEL;
 
-		ether_addr_copy(switchdev_work->addr,
-				fdb_info->addr);
-		switchdev_work->vid = fdb_info->vid;
-		switchdev_work->host_addr = host_addr;
+		err = dsa_slave_schedule_switchdev_work(dp->slave, dsa_event,
+							fdb_info->addr,
+							fdb_info->vid,
+							host_addr);
+		if (err == -EOPNOTSUPP)
+			return NOTIFY_OK;
 
-		/* Hold a reference on the slave for dsa_fdb_offload_notify */
-		dev_hold(dev);
-		dsa_schedule_work(&switchdev_work->work);
-		break;
+		return notifier_from_errno(err);
 	default:
 		return NOTIFY_DONE;
 	}