Message ID | 20240402001137.2980589-1-Joseph.Huang@garmin.com (mailing list archive) |
---|---|
Headers | show |
Series | MC Flood disable and snooping | expand |
On 4/2/24 03:10, Joseph Huang wrote: > There is a use case where one would like to enable multicast snooping > on a bridge but disable multicast flooding on all bridge ports so that > registered multicast traffic will only reach the intended recipients and > unregistered multicast traffic will be dropped. However, with existing > bridge ports' mcast_flood flag implementation, it doesn't work as desired. > > This patchset aims to make multicast snooping work even when multicast > flooding is disabled on the bridge ports, without changing the semantic of > the mcast_flood flag too much. Patches 1 to 4 attempt to address this issue. > > Also, in a network where more than one multicast snooping capable bridges > are interconnected without multicast routers being present, multicast > snooping fails if: > > 1. The source is not directly attached to the Querier > 2. The listener is beyond the mrouter port of the bridge where the > source is directly attached > 3. A hardware offloading switch is involved > > When all of the conditions are met, the listener will not receive any > multicast packets from the source. Patches 5 to 10 attempt to address this > issue. Specifically, patches 5 to 8 set up the infrastructure, patch 9 > handles unregistered multicast packets forwarding, and patch 10 handles > registered multicast packets forwarding to the mrouter port. > > The patches were developed against 5.15, and forward-ported to 6.8. > Tests were done on a Pi 4B + Marvell 6393X Eval board with a single > switch chip with no VLAN. > > V1 -> V2: > - Moved the bulk of the change from the bridge to the mv88e6xxx driver. > - Added more patches (specifically 3 and 4) to workaround some more > issues with multicast flooding being disabled. > > v1 here: > https://patchwork.kernel.org/project/netdevbpf/cover/20210504182259.5042-1-Joseph.Huang@garmin.com/ > For the bridge patches: Nacked-by: Nikolay Aleksandrov <razor@blackwall.org> You cannot break the multicast flood flag to add support for a custom use-case. This is unacceptable. The current bridge behaviour is correct your patch 02 doesn't fix anything, you should configure the bridge properly to avoid all those problems, not break protocols. Your special use case can easily be solved by a user-space helper or eBPF and nftables. You can set the mcast flood flag and bypass the bridge for these packets. I basically said the same in 2021, if this is going to be in the bridge it should be hidden behind an option that is default off. But in my opinion adding an option to solve such special cases is undesirable, they can be easily solved with what's currently available.
On Mon, Apr 01, 2024 at 08:10:59PM -0400, Joseph Huang wrote: > There is a use case where one would like to enable multicast snooping > on a bridge but disable multicast flooding on all bridge ports so that > registered multicast traffic will only reach the intended recipients and > unregistered multicast traffic will be dropped. However, with existing > bridge ports' mcast_flood flag implementation, it doesn't work as desired. > > This patchset aims to make multicast snooping work even when multicast > flooding is disabled on the bridge ports, without changing the semantic of > the mcast_flood flag too much. Patches 1 to 4 attempt to address this issue. > > Also, in a network where more than one multicast snooping capable bridges > are interconnected without multicast routers being present, multicast > snooping fails if: > > 1. The source is not directly attached to the Querier > 2. The listener is beyond the mrouter port of the bridge where the > source is directly attached > 3. A hardware offloading switch is involved I've not studied the details here, but that last point makes me think the offload driver is broken. There should not be any difference between software bridging and hardware bridging. The whole idea is that you take what Linux can do in software and accelerate it by offloading to hardware. Doing acceleration should not change the behaviour. > The patches were developed against 5.15, and forward-ported to 6.8. > Tests were done on a Pi 4B + Marvell 6393X Eval board with a single > switch chip with no VLAN. So what is the mv88e6xxx driver doing wrong? Andrew
Hi Nikolai, On Tue, Apr 02, 2024 at 12:28:38PM +0300, Nikolay Aleksandrov wrote: > For the bridge patches: > Nacked-by: Nikolay Aleksandrov <razor@blackwall.org> > > You cannot break the multicast flood flag to add support for a custom > use-case. This is unacceptable. The current bridge behaviour is correct > your patch 02 doesn't fix anything, you should configure the bridge > properly to avoid all those problems, not break protocols. > > Your special use case can easily be solved by a user-space helper or > eBPF and nftables. You can set the mcast flood flag and bypass the > bridge for these packets. I basically said the same in 2021, if this is > going to be in the bridge it should be hidden behind an option that is > default off. But in my opinion adding an option to solve such special > cases is undesirable, they can be easily solved with what's currently > available. I appreciate your time is limited, but could you please translate your suggestion, and detail your proposed alternative a bit, for those of us who are not very familiar with IP multicast snooping? Bypass the bridge for which packets? General IGMP/MLD queries? Wouldn't that break snooping? And then do what with the packets, forward them in another software layer than the bridge? I also don't quite understand the suggestion of turning on mcast flooding: isn't Joseph saying that he wants it off for the unregistered multicast data traffic?
On 4/2/24 20:43, Vladimir Oltean wrote: > Hi Nikolai, > > On Tue, Apr 02, 2024 at 12:28:38PM +0300, Nikolay Aleksandrov wrote: >> For the bridge patches: >> Nacked-by: Nikolay Aleksandrov <razor@blackwall.org> >> >> You cannot break the multicast flood flag to add support for a custom >> use-case. This is unacceptable. The current bridge behaviour is correct >> your patch 02 doesn't fix anything, you should configure the bridge >> properly to avoid all those problems, not break protocols. >> >> Your special use case can easily be solved by a user-space helper or >> eBPF and nftables. You can set the mcast flood flag and bypass the >> bridge for these packets. I basically said the same in 2021, if this is >> going to be in the bridge it should be hidden behind an option that is >> default off. But in my opinion adding an option to solve such special >> cases is undesirable, they can be easily solved with what's currently >> available. > > I appreciate your time is limited, but could you please translate your > suggestion, and detail your proposed alternative a bit, for those of us > who are not very familiar with IP multicast snooping? > My suggestion is not related to snooping really, but to the goal of patches 01-03. The bridge patches in this set are trying to forward traffic that is not supposed to be forwarded with the proposed configuration, so that can be done by a user-space helper that installs rules to bypass the bridge specifically for those packets while monitoring the bridge state to implement a policy and manage these rules in order to keep snooping working. > Bypass the bridge for which packets? General IGMP/MLD queries? Wouldn't > that break snooping? And then do what with the packets, forward them in > another software layer than the bridge? > The ones that are not supposed to be forwarded in the proposed config and are needed for this use case (control traffic and link-local). Obviously to have proper snooping you'd need to manage these bypass rules and use them only while needed. > I also don't quite understand the suggestion of turning on mcast flooding: > isn't Joseph saying that he wants it off for the unregistered multicast > data traffic? Ah my bad, I meant to turn off flooding and bypass the bridge for those packets and ports while necessary, under necessary can be any policy that the user-space helper wants to implement. In any case, if this is going to be yet another kernel solution then it must be a new option that is default off, and doesn't break current mcast flood flag behaviour. In general my opinion is that the whole snooping control must be in user-space and only have the dataplane in the kernel, but that is beyond the scope of this set.
On Tue, Apr 02, 2024 at 09:50:51PM +0300, Nikolay Aleksandrov wrote: > On 4/2/24 20:43, Vladimir Oltean wrote: > > Hi Nikolai, > > > > On Tue, Apr 02, 2024 at 12:28:38PM +0300, Nikolay Aleksandrov wrote: > > > For the bridge patches: > > > Nacked-by: Nikolay Aleksandrov <razor@blackwall.org> > > > > > > You cannot break the multicast flood flag to add support for a custom > > > use-case. This is unacceptable. The current bridge behaviour is correct > > > your patch 02 doesn't fix anything, you should configure the bridge > > > properly to avoid all those problems, not break protocols. > > > > > > Your special use case can easily be solved by a user-space helper or > > > eBPF and nftables. You can set the mcast flood flag and bypass the > > > bridge for these packets. I basically said the same in 2021, if this is > > > going to be in the bridge it should be hidden behind an option that is > > > default off. But in my opinion adding an option to solve such special > > > cases is undesirable, they can be easily solved with what's currently > > > available. > > > > I appreciate your time is limited, but could you please translate your > > suggestion, and detail your proposed alternative a bit, for those of us > > who are not very familiar with IP multicast snooping? > > > > My suggestion is not related to snooping really, but to the goal of > patches 01-03. The bridge patches in this set are trying to forward > traffic that is not supposed to be forwarded with the proposed > configuration, Correct up to a point. Reinterpreting the given user space configuration and trying to make it do something else seems like a mistake, but in principle one could also look at alternative bridge configurations like the one I described here: https://lore.kernel.org/netdev/20240402180805.yhhwj2f52sdc4dl2@skbuf/ > so that can be done by a user-space helper that installs > rules to bypass the bridge specifically for those packets while > monitoring the bridge state to implement a policy and manage these rules > in order to keep snooping working. > > > Bypass the bridge for which packets? General IGMP/MLD queries? Wouldn't > > that break snooping? And then do what with the packets, forward them in > > another software layer than the bridge? > > > > The ones that are not supposed to be forwarded in the proposed config > and are needed for this use case (control traffic and link-local). Obviously > to have proper snooping you'd need to manage these bypass > rules and use them only while needed. I think Joseph will end up in a situation where he needs IGMP control messages both in the bridge data path and outside of it :) Also, your proposal eliminates the possibility of cooperating with a hardware accelerator which can forward the IGMP messages where they need to go. As far as I understand, I don't think Joseph has a very "special" use case. Disabling flooding of unregistered multicast in the data plane sounds reasonable. There seems to be a gap in the bridge API, in that this operation also affects the control plane, which he is trying to fix with this "force flooding", because of insufficiently fine grained control. > > I also don't quite understand the suggestion of turning on mcast flooding: > > isn't Joseph saying that he wants it off for the unregistered multicast > > data traffic? > > Ah my bad, I meant to turn off flooding and bypass the bridge for those > packets and ports while necessary, under necessary can be any policy > that the user-space helper wants to implement. > > In any case, if this is going to be yet another kernel solution then it > must be a new option that is default off, and doesn't break current mcast > flood flag behaviour. Yeah, maybe something like this, simple and with clear offload semantics, as seen in existing hardware (not Marvell though): mcast_flood == off: - mcast_ipv4_ctrl_flood: don't care (maybe can force to "off") - mcast_ipv4_data_flood: don't care - mcast_ipv6_ctrl_flood: don't care - mcast_ipv6_data_flood: don't care - mcast_l2_flood: don't care mcast_flood == on: - Flood 224.0.0.x according to mcast_ipv4_ctrl_flood - Flood all other IPv4 multicast according to mcast_ipv4_data_flood - Flood ff02::/16 according to mcast_ipv6_ctrl_flood - Flood all other IPv6 multicast according to mcast_ipv6_data_flood - Flood L2 according to mcast_l2_flood
On 4/2/24 23:46, Vladimir Oltean wrote: > On Tue, Apr 02, 2024 at 09:50:51PM +0300, Nikolay Aleksandrov wrote: >> On 4/2/24 20:43, Vladimir Oltean wrote: >>> Hi Nikolai, >>> >>> On Tue, Apr 02, 2024 at 12:28:38PM +0300, Nikolay Aleksandrov wrote: >>>> For the bridge patches: >>>> Nacked-by: Nikolay Aleksandrov <razor@blackwall.org> >>>> >>>> You cannot break the multicast flood flag to add support for a custom >>>> use-case. This is unacceptable. The current bridge behaviour is correct >>>> your patch 02 doesn't fix anything, you should configure the bridge >>>> properly to avoid all those problems, not break protocols. >>>> >>>> Your special use case can easily be solved by a user-space helper or >>>> eBPF and nftables. You can set the mcast flood flag and bypass the >>>> bridge for these packets. I basically said the same in 2021, if this is >>>> going to be in the bridge it should be hidden behind an option that is >>>> default off. But in my opinion adding an option to solve such special >>>> cases is undesirable, they can be easily solved with what's currently >>>> available. >>> >>> I appreciate your time is limited, but could you please translate your >>> suggestion, and detail your proposed alternative a bit, for those of us >>> who are not very familiar with IP multicast snooping? >>> >> >> My suggestion is not related to snooping really, but to the goal of >> patches 01-03. The bridge patches in this set are trying to forward >> traffic that is not supposed to be forwarded with the proposed >> configuration, > > Correct up to a point. Reinterpreting the given user space configuration > and trying to make it do something else seems like a mistake, but in > principle one could also look at alternative bridge configurations like > the one I described here: > https://lore.kernel.org/netdev/20240402180805.yhhwj2f52sdc4dl2@skbuf/ > >> so that can be done by a user-space helper that installs >> rules to bypass the bridge specifically for those packets while >> monitoring the bridge state to implement a policy and manage these rules >> in order to keep snooping working. >> >>> Bypass the bridge for which packets? General IGMP/MLD queries? Wouldn't >>> that break snooping? And then do what with the packets, forward them in >>> another software layer than the bridge? >>> >> >> The ones that are not supposed to be forwarded in the proposed config >> and are needed for this use case (control traffic and link-local). Obviously >> to have proper snooping you'd need to manage these bypass >> rules and use them only while needed. > > I think Joseph will end up in a situation where he needs IGMP control > messages both in the bridge data path and outside of it :) > My solution does not exclude such scenario. With all unregistered mcast disabled it will be handled the same as with this proposed solution. > Also, your proposal eliminates the possibility of cooperating with a > hardware accelerator which can forward the IGMP messages where they need > to go. > > As far as I understand, I don't think Joseph has a very "special" use case. > Disabling flooding of unregistered multicast in the data plane sounds > reasonable. There seems to be a gap in the bridge API, in that this This we already have, but.. > operation also affects the control plane, which he is trying to fix with > this "force flooding", because of insufficiently fine grained control. > Right, this is the part that is more special, I'm not saying it's unreasonable. The proposition to make it optional and break it down to type of traffic sounds good to me. >>> I also don't quite understand the suggestion of turning on mcast flooding: >>> isn't Joseph saying that he wants it off for the unregistered multicast >>> data traffic? >> >> Ah my bad, I meant to turn off flooding and bypass the bridge for those >> packets and ports while necessary, under necessary can be any policy >> that the user-space helper wants to implement. >> >> In any case, if this is going to be yet another kernel solution then it >> must be a new option that is default off, and doesn't break current mcast >> flood flag behaviour. > > Yeah, maybe something like this, simple and with clear offload > semantics, as seen in existing hardware (not Marvell though): > > mcast_flood == off: > - mcast_ipv4_ctrl_flood: don't care (maybe can force to "off") > - mcast_ipv4_data_flood: don't care > - mcast_ipv6_ctrl_flood: don't care > - mcast_ipv6_data_flood: don't care > - mcast_l2_flood: don't care > mcast_flood == on: > - Flood 224.0.0.x according to mcast_ipv4_ctrl_flood > - Flood all other IPv4 multicast according to mcast_ipv4_data_flood > - Flood ff02::/16 according to mcast_ipv6_ctrl_flood > - Flood all other IPv6 multicast according to mcast_ipv6_data_flood > - Flood L2 according to mcast_l2_flood Yep, sounds good to me. I was thinking about something in these lines as well if doing a kernel solution in order to make it simpler and more generic. The ctrl flood bits need to be handled more carefully to make sure they match only control traffic and not link-local data. I think the old option can be converted to use this fine-grained control, that is if anyone sets the old flood on/off then the flood mask gets set properly so we can do just 1 & in the fast path and avoid adding more tests. It will also make it symmetric - if it can override the on case, then it will be able to override the off case. And to be more explicit you can pass a mask variable to br_multicast_rcv() which will get populated and then you can pass it down to br_flood(). That will also avoid adding new bits to the skb's bridge CB.
Hi Andrew, On 4/2/2024 8:43 AM, Andrew Lunn wrote: > On Mon, Apr 01, 2024 at 08:10:59PM -0400, Joseph Huang wrote: >> There is a use case where one would like to enable multicast snooping >> on a bridge but disable multicast flooding on all bridge ports so that >> registered multicast traffic will only reach the intended recipients and >> unregistered multicast traffic will be dropped. However, with existing >> bridge ports' mcast_flood flag implementation, it doesn't work as desired. >> >> This patchset aims to make multicast snooping work even when multicast >> flooding is disabled on the bridge ports, without changing the semantic of >> the mcast_flood flag too much. Patches 1 to 4 attempt to address this issue. >> >> Also, in a network where more than one multicast snooping capable bridges >> are interconnected without multicast routers being present, multicast >> snooping fails if: >> >> 1. The source is not directly attached to the Querier >> 2. The listener is beyond the mrouter port of the bridge where the >> source is directly attached >> 3. A hardware offloading switch is involved > > I've not studied the details here, but that last point makes me think > the offload driver is broken. There should not be any difference > between software bridging and hardware bridging. The whole idea is > that you take what Linux can do in software and accelerate it by > offloading to hardware. Doing acceleration should not change the > behaviour. > In patch 10 I gave a little more detail about the fix, but basically this is what happened. Assuming we have a soft bridge like the following: bp1 +------------+ Querier <---- | bridge | +------------+ bp2 | | bp3 | | v v MC Source MC Listener Here bp1 is the mrouter port, bp2 is connected to the multicast source, and bp3 is connected to the multicast listener who wishes to receive multicast traffic for that group. After some Query/Report exchange, the snooping code in the bridge is going to learn about the Listener from bp3, and is going to create an MDB group which includes bp3 as the sole member. When the bridge receives a multicast packet for that group from bp2, the bridge will do a look up to find the members of that group (in this case, bp3) and forward the packet to every single member in that group. At the same time, the bridge will also forward the packet to every mrouter port so that listeners beyond mrouter ports can receive that multicast packet as well. Now consider the same scenario, but with a hardware-offloaded switch: +------------+ | bridge | +------------+ ^ | | p6 (Host CPU port) p1/bp1 +------------+ Querier <---- | sw | +------------+ p2/bp2 | | p3/bp3 | | v v MC Source MC Listener Same Query/Report exchange, same MDB group, except that this time around the MDB group will be offloaded to the switch as well. So in the switch's ATU we will now have an entry for the multicast group and with p3 being the only member of that ATU. When the multicast packet arrives at the switch from p2, the switch will do an ATU lookup, and forward the packet to p3 only. This means that the Host CPU (p6) will not get a copy of the packet, and so the soft bridge will not have the opportunity to forward that packet to the mrouter port. This is what patch 10 attempts to address. One possible solution of course is to add p6 to every MDB group, however that's probably not very desirable. Besides, it will be more efficient if the packet is forwarded to the mrouter port by the switch in hardware (i.e., offload mrouter forwarding), vs. being forwarded in the bridge by software. >> The patches were developed against 5.15, and forward-ported to 6.8. >> Tests were done on a Pi 4B + Marvell 6393X Eval board with a single >> switch chip with no VLAN. > > So what is the mv88e6xxx driver doing wrong? > > Andrew > The mv88e6xxx driver did nothing differently than the other DSA drivers. I chose the mv88e6xxx driver to implement the fix simply because that's the only platform I have at hand to verify the solution.
On Thu, Apr 04, 2024 at 05:35:41PM -0400, Joseph Huang wrote: > Hi Andrew, > > On 4/2/2024 8:43 AM, Andrew Lunn wrote: > > On Mon, Apr 01, 2024 at 08:10:59PM -0400, Joseph Huang wrote: > > > There is a use case where one would like to enable multicast snooping > > > on a bridge but disable multicast flooding on all bridge ports so that > > > registered multicast traffic will only reach the intended recipients and > > > unregistered multicast traffic will be dropped. However, with existing > > > bridge ports' mcast_flood flag implementation, it doesn't work as desired. > > > > > > This patchset aims to make multicast snooping work even when multicast > > > flooding is disabled on the bridge ports, without changing the semantic of > > > the mcast_flood flag too much. Patches 1 to 4 attempt to address this issue. > > > > > > Also, in a network where more than one multicast snooping capable bridges > > > are interconnected without multicast routers being present, multicast > > > snooping fails if: > > > > > > 1. The source is not directly attached to the Querier > > > 2. The listener is beyond the mrouter port of the bridge where the > > > source is directly attached > > > 3. A hardware offloading switch is involved > > > > I've not studied the details here, but that last point makes me think > > the offload driver is broken. There should not be any difference > > between software bridging and hardware bridging. The whole idea is > > that you take what Linux can do in software and accelerate it by > > offloading to hardware. Doing acceleration should not change the > > behaviour. > > > > In patch 10 I gave a little more detail about the fix, but basically this is > what happened. > > Assuming we have a soft bridge like the following: > > bp1 +------------+ > Querier <---- | bridge | > +------------+ > bp2 | | bp3 > | | > v v > MC Source MC Listener > > Here bp1 is the mrouter port, bp2 is connected to the multicast source, and > bp3 is connected to the multicast listener who wishes to receive multicast > traffic for that group. > > After some Query/Report exchange, the snooping code in the bridge is going > to learn about the Listener from bp3, and is going to create an MDB group > which includes bp3 as the sole member. When the bridge receives a multicast > packet for that group from bp2, the bridge will do a look up to find the > members of that group (in this case, bp3) and forward the packet to every > single member in that group. At the same time, the bridge will also forward > the packet to every mrouter port so that listeners beyond mrouter ports can > receive that multicast packet as well. > > Now consider the same scenario, but with a hardware-offloaded switch: > > +------------+ > | bridge | > +------------+ > ^ > | > | p6 (Host CPU port) > p1/bp1 +------------+ > Querier <---- | sw | > +------------+ > p2/bp2 | | p3/bp3 > | | > v v > MC Source MC Listener > > Same Query/Report exchange, same MDB group, except that this time around the > MDB group will be offloaded to the switch as well. So in the switch's ATU we > will now have an entry for the multicast group and with p3 being the only > member of that ATU. When the multicast packet arrives at the switch from p2, > the switch will do an ATU lookup, and forward the packet to p3 only. This > means that the Host CPU (p6) will not get a copy of the packet, and so the > soft bridge will not have the opportunity to forward that packet to the > mrouter port. This is what patch 10 attempts to address. > > One possible solution of course is to add p6 to every MDB group, however > that's probably not very desirable. Besides, it will be more efficient if > the packet is forwarded to the mrouter port by the switch in hardware (i.e., > offload mrouter forwarding), vs. being forwarded in the bridge by software. Thanks for the explanation. So i think the key part which you said above is: At the same time, the bridge will also forward the packet to every mrouter port so that listeners beyond mrouter ports can receive that multicast packet as well. How does the bridge know about the mrouter port? It seems like the bridge needs to pass that information down to the switch. Is the mrouter itself performing a join on the group when it has members of the group on the other side of it? Andrew
On 4/2/2024 5:59 PM, Nikolay Aleksandrov wrote: > On 4/2/24 23:46, Vladimir Oltean wrote: >> On Tue, Apr 02, 2024 at 09:50:51PM +0300, Nikolay Aleksandrov wrote: >>> On 4/2/24 20:43, Vladimir Oltean wrote: >>>> Hi Nikolai, >>>> >>>> On Tue, Apr 02, 2024 at 12:28:38PM +0300, Nikolay Aleksandrov wrote: >>>>> For the bridge patches: >>>>> Nacked-by: Nikolay Aleksandrov <razor@blackwall.org> >>>>> >>>>> You cannot break the multicast flood flag to add support for a custom >>>>> use-case. This is unacceptable. The current bridge behaviour is correct >>>>> your patch 02 doesn't fix anything, you should configure the bridge >>>>> properly to avoid all those problems, not break protocols. >>>>> >>>>> Your special use case can easily be solved by a user-space helper or >>>>> eBPF and nftables. You can set the mcast flood flag and bypass the >>>>> bridge for these packets. I basically said the same in 2021, if this is >>>>> going to be in the bridge it should be hidden behind an option that is >>>>> default off. But in my opinion adding an option to solve such special >>>>> cases is undesirable, they can be easily solved with what's currently >>>>> available. >>>> >>>> I appreciate your time is limited, but could you please translate your >>>> suggestion, and detail your proposed alternative a bit, for those of us >>>> who are not very familiar with IP multicast snooping? >>>> >>> >>> My suggestion is not related to snooping really, but to the goal of >>> patches 01-03. The bridge patches in this set are trying to forward >>> traffic that is not supposed to be forwarded with the proposed >>> configuration, >> >> Correct up to a point. Reinterpreting the given user space configuration >> and trying to make it do something else seems like a mistake, but in >> principle one could also look at alternative bridge configurations like >> the one I described here: >> https://lore.kernel.org/netdev/20240402180805.yhhwj2f52sdc4dl2@skbuf/ >> >>> so that can be done by a user-space helper that installs >>> rules to bypass the bridge specifically for those packets while >>> monitoring the bridge state to implement a policy and manage these rules >>> in order to keep snooping working. >>> >>>> Bypass the bridge for which packets? General IGMP/MLD queries? Wouldn't >>>> that break snooping? And then do what with the packets, forward them in >>>> another software layer than the bridge? >>>> >>> >>> The ones that are not supposed to be forwarded in the proposed config >>> and are needed for this use case (control traffic and link-local). Obviously >>> to have proper snooping you'd need to manage these bypass >>> rules and use them only while needed. >> >> I think Joseph will end up in a situation where he needs IGMP control >> messages both in the bridge data path and outside of it :) >> > > My solution does not exclude such scenario. With all unregistered mcast > disabled it will be handled the same as with this proposed solution. > >> Also, your proposal eliminates the possibility of cooperating with a >> hardware accelerator which can forward the IGMP messages where they need >> to go. >> >> As far as I understand, I don't think Joseph has a very "special" use case. >> Disabling flooding of unregistered multicast in the data plane sounds >> reasonable. There seems to be a gap in the bridge API, in that this > > This we already have, but.. > >> operation also affects the control plane, which he is trying to fix with >> this "force flooding", because of insufficiently fine grained control. >> > > Right, this is the part that is more special, I'm not saying it's > unreasonable. The proposition to make it optional and break it down to > type of traffic sounds good to me. > >>>> I also don't quite understand the suggestion of turning on mcast flooding: >>>> isn't Joseph saying that he wants it off for the unregistered multicast >>>> data traffic? >>> >>> Ah my bad, I meant to turn off flooding and bypass the bridge for those >>> packets and ports while necessary, under necessary can be any policy >>> that the user-space helper wants to implement. >>> >>> In any case, if this is going to be yet another kernel solution then it >>> must be a new option that is default off, and doesn't break current mcast >>> flood flag behaviour. >> >> Yeah, maybe something like this, simple and with clear offload >> semantics, as seen in existing hardware (not Marvell though): >> >> mcast_flood == off: >> - mcast_ipv4_ctrl_flood: don't care (maybe can force to "off") >> - mcast_ipv4_data_flood: don't care >> - mcast_ipv6_ctrl_flood: don't care >> - mcast_ipv6_data_flood: don't care >> - mcast_l2_flood: don't care >> mcast_flood == on: >> - Flood 224.0.0.x according to mcast_ipv4_ctrl_flood >> - Flood all other IPv4 multicast according to mcast_ipv4_data_flood >> - Flood ff02::/16 according to mcast_ipv6_ctrl_flood >> - Flood all other IPv6 multicast according to mcast_ipv6_data_flood >> - Flood L2 according to mcast_l2_flood Did you mean if mcast_flood == on (meaning mcast_flood is ENABLED) - mcast_ipv4_ctrl_flood: don't care (since 224.0.0.x will be flooded anyway) ... if mcast_flood == off (meaning mcast_flood is DISABLED) - Flood 224.0.0.x according to mcast_ipv4_ctrl_flood ... ? Otherwise the problem is still not solved when mcast_flood is disabled. > > Yep, sounds good to me. I was thinking about something in these lines > as well if doing a kernel solution in order to make it simpler and more > generic. The ctrl flood bits need to be handled more carefully to make > sure they match only control traffic and not link-local data. Do we consider 224.0.0.251 (mDNS) to be control or data? What qualifies as control I guess that's my question. > I think the old option can be converted to use this fine-grained > control, that is if anyone sets the old flood on/off then the flood > mask gets set properly so we can do just 1 & in the fast path and avoid > adding more tests. It will also make it symmetric - if it can override > the on case, then it will be able to override the off case. And to be > more explicit you can pass a mask variable to br_multicast_rcv() which > will get populated and then you can pass it down to br_flood(). That > will also avoid adding new bits to the skb's bridge CB. > >
Hi Andrew, On 4/4/2024 6:11 PM, Andrew Lunn wrote: > On Thu, Apr 04, 2024 at 05:35:41PM -0400, Joseph Huang wrote: >> Hi Andrew, >> >> On 4/2/2024 8:43 AM, Andrew Lunn wrote: >>> On Mon, Apr 01, 2024 at 08:10:59PM -0400, Joseph Huang wrote: >>>> There is a use case where one would like to enable multicast snooping >>>> on a bridge but disable multicast flooding on all bridge ports so that >>>> registered multicast traffic will only reach the intended recipients and >>>> unregistered multicast traffic will be dropped. However, with existing >>>> bridge ports' mcast_flood flag implementation, it doesn't work as desired. >>>> >>>> This patchset aims to make multicast snooping work even when multicast >>>> flooding is disabled on the bridge ports, without changing the semantic of >>>> the mcast_flood flag too much. Patches 1 to 4 attempt to address this issue. >>>> >>>> Also, in a network where more than one multicast snooping capable bridges >>>> are interconnected without multicast routers being present, multicast >>>> snooping fails if: >>>> >>>> 1. The source is not directly attached to the Querier >>>> 2. The listener is beyond the mrouter port of the bridge where the >>>> source is directly attached >>>> 3. A hardware offloading switch is involved >>> >>> I've not studied the details here, but that last point makes me think >>> the offload driver is broken. There should not be any difference >>> between software bridging and hardware bridging. The whole idea is >>> that you take what Linux can do in software and accelerate it by >>> offloading to hardware. Doing acceleration should not change the >>> behaviour. >>> >> >> In patch 10 I gave a little more detail about the fix, but basically this is >> what happened. >> >> Assuming we have a soft bridge like the following: >> >> bp1 +------------+ >> Querier <---- | bridge | >> +------------+ >> bp2 | | bp3 >> | | >> v v >> MC Source MC Listener >> >> Here bp1 is the mrouter port, bp2 is connected to the multicast source, and >> bp3 is connected to the multicast listener who wishes to receive multicast >> traffic for that group. >> >> After some Query/Report exchange, the snooping code in the bridge is going >> to learn about the Listener from bp3, and is going to create an MDB group >> which includes bp3 as the sole member. When the bridge receives a multicast >> packet for that group from bp2, the bridge will do a look up to find the >> members of that group (in this case, bp3) and forward the packet to every >> single member in that group. At the same time, the bridge will also forward >> the packet to every mrouter port so that listeners beyond mrouter ports can >> receive that multicast packet as well. >> >> Now consider the same scenario, but with a hardware-offloaded switch: >> >> +------------+ >> | bridge | >> +------------+ >> ^ >> | >> | p6 (Host CPU port) >> p1/bp1 +------------+ >> Querier <---- | sw | >> +------------+ >> p2/bp2 | | p3/bp3 >> | | >> v v >> MC Source MC Listener >> >> Same Query/Report exchange, same MDB group, except that this time around the >> MDB group will be offloaded to the switch as well. So in the switch's ATU we >> will now have an entry for the multicast group and with p3 being the only >> member of that ATU. When the multicast packet arrives at the switch from p2, >> the switch will do an ATU lookup, and forward the packet to p3 only. This >> means that the Host CPU (p6) will not get a copy of the packet, and so the >> soft bridge will not have the opportunity to forward that packet to the >> mrouter port. This is what patch 10 attempts to address. >> >> One possible solution of course is to add p6 to every MDB group, however >> that's probably not very desirable. Besides, it will be more efficient if >> the packet is forwarded to the mrouter port by the switch in hardware (i.e., >> offload mrouter forwarding), vs. being forwarded in the bridge by software. > > Thanks for the explanation. So i think the key part which you said > above is: > > At the same time, the bridge will also forward the packet to every > mrouter port so that listeners beyond mrouter ports can receive that > multicast packet as well. > > How does the bridge know about the mrouter port? It seems like the The bridge learns about the existence of the Querier by the reception of Queries. The bridge will mark the port which it received Queries from as the mrouter port. > bridge needs to pass that information down to the switch. Is the The bridge does pass that information down to switchdev. Patch 5 adds DSA handling of that event as well. Patches 9 and 10 adds the support in the mv88e6xxx driver. > mrouter itself performing a join on the group when it has members of > the group on the other side of it? You hit a key point here. The Querier does not send Report (not with IGMPv2 anyway), so the bridge will never add the mrouter port to any MDB group. That's why patch 10 is needed. Prestera driver does something similar (which is what patches 6,7, and 10 are modeled after). > > Andrew
On Thu, Apr 04, 2024 at 06:16:12PM -0400, Joseph Huang wrote: > > > mcast_flood == off: > > > - mcast_ipv4_ctrl_flood: don't care (maybe can force to "off") > > > - mcast_ipv4_data_flood: don't care > > > - mcast_ipv6_ctrl_flood: don't care > > > - mcast_ipv6_data_flood: don't care > > > - mcast_l2_flood: don't care > > > mcast_flood == on: > > > - Flood 224.0.0.x according to mcast_ipv4_ctrl_flood > > > - Flood all other IPv4 multicast according to mcast_ipv4_data_flood > > > - Flood ff02::/16 according to mcast_ipv6_ctrl_flood > > > - Flood all other IPv6 multicast according to mcast_ipv6_data_flood > > > - Flood L2 according to mcast_l2_flood > > Did you mean > > if mcast_flood == on (meaning mcast_flood is ENABLED) > - mcast_ipv4_ctrl_flood: don't care (since 224.0.0.x will be flooded anyway) > ... > > if mcast_flood == off (meaning mcast_flood is DISABLED) > - Flood 224.0.0.x according to mcast_ipv4_ctrl_flood > ... > > ? Otherwise the problem is still not solved when mcast_flood is disabled. No, I mean exactly as I said. My goal was not to "solve the problem" when mcast_flood is disabled, but to give you an option to configure the bridge to achieve what you want, in a way which I think is more acceptable. AFAIU, there is not really any "problem" - the bridge behaves exactly as instructed given the limited language available to instruct it ("mcast_flood" covers all multicast). So the other knobs have the role of fine-tuning what gets flooded when mcast_flood is on. Like "yes, but..." You can't "solve the problem" when it involves changing an established behavior that somebody probably depended on to be just like that. > > Yep, sounds good to me. I was thinking about something in these lines > > as well if doing a kernel solution in order to make it simpler and more > > generic. The ctrl flood bits need to be handled more carefully to make > > sure they match only control traffic and not link-local data. > > Do we consider 224.0.0.251 (mDNS) to be control or data? What qualifies as > control I guess that's my question. Well, as I said, I'm proposing that 224.0.0.x qualifies as control and the rest of IPv4 multicast as data. Which means that, applied to your case, "mcast_flood on mcast_ipv4_ctrl_flood on mcast_ipv4_data_flood off" will "force flood" mDNS just like the IGMP traffic from your patches. I'm not aware if this could be considered problematic (I don't think so). The reason behind this proposal is that, AFAIU, endpoints may choose to join IGMP groups in the 224.0.0.x range or not, but RFC4541 says that switches shouldn't prune the destinations towards endpoints that don't join this range anyway: https://www.rfc-editor.org/rfc/rfc4541#page-6 Whereas for IP multicast traffic towards an address outside 224.0.0.x, pruning will happen as per the IGMP join tracking mechanism.
On 4/5/24 13:20, Vladimir Oltean wrote: > On Thu, Apr 04, 2024 at 06:16:12PM -0400, Joseph Huang wrote: >>>> mcast_flood == off: >>>> - mcast_ipv4_ctrl_flood: don't care (maybe can force to "off") >>>> - mcast_ipv4_data_flood: don't care >>>> - mcast_ipv6_ctrl_flood: don't care >>>> - mcast_ipv6_data_flood: don't care >>>> - mcast_l2_flood: don't care >>>> mcast_flood == on: >>>> - Flood 224.0.0.x according to mcast_ipv4_ctrl_flood >>>> - Flood all other IPv4 multicast according to mcast_ipv4_data_flood >>>> - Flood ff02::/16 according to mcast_ipv6_ctrl_flood >>>> - Flood all other IPv6 multicast according to mcast_ipv6_data_flood >>>> - Flood L2 according to mcast_l2_flood >> >> Did you mean >> >> if mcast_flood == on (meaning mcast_flood is ENABLED) >> - mcast_ipv4_ctrl_flood: don't care (since 224.0.0.x will be flooded anyway) >> ... >> >> if mcast_flood == off (meaning mcast_flood is DISABLED) >> - Flood 224.0.0.x according to mcast_ipv4_ctrl_flood >> ... >> >> ? Otherwise the problem is still not solved when mcast_flood is disabled. > > No, I mean exactly as I said. My goal was not to "solve the problem" > when mcast_flood is disabled, but to give you an option to configure the > bridge to achieve what you want, in a way which I think is more acceptable. > > AFAIU, there is not really any "problem" - the bridge behaves exactly as > instructed given the limited language available to instruct it ("mcast_flood" > covers all multicast). So the other knobs have the role of fine-tuning > what gets flooded when mcast_flood is on. Like "yes, but..." > > You can't "solve the problem" when it involves changing an established > behavior that somebody probably depended on to be just like that. > >>> Yep, sounds good to me. I was thinking about something in these lines >>> as well if doing a kernel solution in order to make it simpler and more >>> generic. The ctrl flood bits need to be handled more carefully to make >>> sure they match only control traffic and not link-local data. >> >> Do we consider 224.0.0.251 (mDNS) to be control or data? What qualifies as >> control I guess that's my question. > > Well, as I said, I'm proposing that 224.0.0.x qualifies as control and > the rest of IPv4 multicast as data. Which means that, applied to your > case, "mcast_flood on mcast_ipv4_ctrl_flood on mcast_ipv4_data_flood off" > will "force flood" mDNS just like the IGMP traffic from your patches. > I'm not aware if this could be considered problematic (I don't think so). > > The reason behind this proposal is that, AFAIU, endpoints may choose to > join IGMP groups in the 224.0.0.x range or not, but RFC4541 says that > switches shouldn't prune the destinations towards endpoints that don't > join this range anyway: https://www.rfc-editor.org/rfc/rfc4541#page-6 > > Whereas for IP multicast traffic towards an address outside 224.0.0.x, > pruning will happen as per the IGMP join tracking mechanism. +1, non-IGMP traffic to 224.0.0.x must be flooded to all anyway so this should allow for a better control over it, but perhaps the naming should be link_local instead because control usually means IGMP, maybe something like mcast_ip_link_local_flood
> > Thanks for the explanation. So i think the key part which you said > > above is: > > > > At the same time, the bridge will also forward the packet to every > > mrouter port so that listeners beyond mrouter ports can receive that > > multicast packet as well. > > > > How does the bridge know about the mrouter port? It seems like the > > The bridge learns about the existence of the Querier by the reception of > Queries. The bridge will mark the port which it received Queries from as the > mrouter port. > > > bridge needs to pass that information down to the switch. Is the > > The bridge does pass that information down to switchdev. Patch 5 adds DSA > handling of that event as well. Patches 9 and 10 adds the support in the > mv88e6xxx driver. I've not been looking at the details too much for this patchset. It does however seem that some parts are controversial, and others just seem like a bug fix. Does it make sense to split this into two patchsets? Andrew
On 4/5/2024 7:00 AM, Nikolay Aleksandrov wrote: > On 4/5/24 13:20, Vladimir Oltean wrote: >> On Thu, Apr 04, 2024 at 06:16:12PM -0400, Joseph Huang wrote: >>>>> mcast_flood == off: >>>>> - mcast_ipv4_ctrl_flood: don't care (maybe can force to "off") >>>>> - mcast_ipv4_data_flood: don't care >>>>> - mcast_ipv6_ctrl_flood: don't care >>>>> - mcast_ipv6_data_flood: don't care >>>>> - mcast_l2_flood: don't care >>>>> mcast_flood == on: >>>>> - Flood 224.0.0.x according to mcast_ipv4_ctrl_flood >>>>> - Flood all other IPv4 multicast according to mcast_ipv4_data_flood >>>>> - Flood ff02::/16 according to mcast_ipv6_ctrl_flood >>>>> - Flood all other IPv6 multicast according to mcast_ipv6_data_flood >>>>> - Flood L2 according to mcast_l2_flood >>> >>> Did you mean >>> >>> if mcast_flood == on (meaning mcast_flood is ENABLED) >>> - mcast_ipv4_ctrl_flood: don't care (since 224.0.0.x will be flooded >>> anyway) >>> ... >>> >>> if mcast_flood == off (meaning mcast_flood is DISABLED) >>> - Flood 224.0.0.x according to mcast_ipv4_ctrl_flood >>> ... >>> >>> ? Otherwise the problem is still not solved when mcast_flood is >>> disabled. >> >> No, I mean exactly as I said. My goal was not to "solve the problem" >> when mcast_flood is disabled, but to give you an option to configure the >> bridge to achieve what you want, in a way which I think is more >> acceptable. >> >> AFAIU, there is not really any "problem" - the bridge behaves exactly as >> instructed given the limited language available to instruct it >> ("mcast_flood" >> covers all multicast). So the other knobs have the role of fine-tuning >> what gets flooded when mcast_flood is on. Like "yes, but..." >> >> You can't "solve the problem" when it involves changing an established >> behavior that somebody probably depended on to be just like that. >> >>>> Yep, sounds good to me. I was thinking about something in these lines >>>> as well if doing a kernel solution in order to make it simpler and more >>>> generic. The ctrl flood bits need to be handled more carefully to make >>>> sure they match only control traffic and not link-local data. >>> >>> Do we consider 224.0.0.251 (mDNS) to be control or data? What >>> qualifies as >>> control I guess that's my question. >> >> Well, as I said, I'm proposing that 224.0.0.x qualifies as control and >> the rest of IPv4 multicast as data. Which means that, applied to your >> case, "mcast_flood on mcast_ipv4_ctrl_flood on mcast_ipv4_data_flood off" >> will "force flood" mDNS just like the IGMP traffic from your patches. >> I'm not aware if this could be considered problematic (I don't think so). >> >> The reason behind this proposal is that, AFAIU, endpoints may choose to >> join IGMP groups in the 224.0.0.x range or not, but RFC4541 says that >> switches shouldn't prune the destinations towards endpoints that don't >> join this range anyway: https://www.rfc-editor.org/rfc/rfc4541#page-6 >> >> Whereas for IP multicast traffic towards an address outside 224.0.0.x, >> pruning will happen as per the IGMP join tracking mechanism. > > +1, non-IGMP traffic to 224.0.0.x must be flooded to all anyway > so this should allow for a better control over it, but perhaps > the naming should be link_local instead because control usually > means IGMP, maybe something like mcast_ip_link_local_flood > Like this? bridge link set dev swp0 mcast_flood off - all flooding disabled bridge link set dev swp0 mcast_flood on - all flooding enabled bridge link set dev swp0 mcast_flood on mcast_ipv4_data_flood off mcast_ipv6_data_flood off - IPv4 data packets flooding disabled, IPv6 data packets flooding disabled, everything else floods (that is to say, only allow IPv4 local subnet and IPv6 link-local to flood) ? The syntax seems to be counterintuitive. Or like this? bridge link set dev swp0 mcast_flood on mcast_ipv4_ctrl_flood on - only allow IPv4 local subnet to flood, everything else off ? So basically the question is, what should the behavior be when something is omitted from the command line?
On Fri, Apr 05, 2024 at 04:22:43PM -0400, Joseph Huang wrote: > Like this? > > bridge link set dev swp0 mcast_flood off > - all flooding disabled > > bridge link set dev swp0 mcast_flood on > - all flooding enabled > > bridge link set dev swp0 mcast_flood on mcast_ipv4_data_flood off > mcast_ipv6_data_flood off > - IPv4 data packets flooding disabled, IPv6 data packets flooding > disabled, everything else floods (that is to say, only allow IPv4 local > subnet and IPv6 link-local to flood) > > ? Yeah. > The syntax seems to be counterintuitive. > > Or like this? > > bridge link set dev swp0 mcast_flood on mcast_ipv4_ctrl_flood on > - only allow IPv4 local subnet to flood, everything else off > > ? Nope. > So basically the question is, what should the behavior be when something is > omitted from the command line? The answer is always: "new options should default to behaving exactly like before". It's not just about the command line arguments, but also about the actual netlink attributes that iproute2 (and other tooling) creates when communicating with the kernel. Old user space has no idea about the existence of mcast_ipv4_ctrl_flood et. al. So, if netlink attributes specifying their value are not sent by user space, their value in the kernel must mimic the value of mcast_flood.
On 4/5/2024 5:15 PM, Vladimir Oltean wrote: > On Fri, Apr 05, 2024 at 04:22:43PM -0400, Joseph Huang wrote: >> Like this? >> >> bridge link set dev swp0 mcast_flood off >> - all flooding disabled >> >> bridge link set dev swp0 mcast_flood on >> - all flooding enabled >> >> bridge link set dev swp0 mcast_flood on mcast_ipv4_data_flood off >> mcast_ipv6_data_flood off >> - IPv4 data packets flooding disabled, IPv6 data packets flooding >> disabled, everything else floods (that is to say, only allow IPv4 local >> subnet and IPv6 link-local to flood) >> >> ? > > Yeah. > >> The syntax seems to be counterintuitive. >> >> Or like this? >> >> bridge link set dev swp0 mcast_flood on mcast_ipv4_ctrl_flood on >> - only allow IPv4 local subnet to flood, everything else off >> >> ? > > Nope. > >> So basically the question is, what should the behavior be when something is >> omitted from the command line? > > The answer is always: "new options should default to behaving exactly > like before". It's not just about the command line arguments, but also > about the actual netlink attributes that iproute2 (and other tooling) > creates when communicating with the kernel. Old user space has no idea > about the existence of mcast_ipv4_ctrl_flood et. al. So, if netlink > attributes specifying their value are not sent by user space, their > value in the kernel must mimic the value of mcast_flood. How about the following syntax? I think it satisfies all the "not breaking existing behavior" requirements (new option defaults to off, and missing user space netlink attributes does not change the existing behavior): mcast_flood off all off mcast_flood off mcast_flood_rfc4541 off all off mcast_flood off mcast_flood_rfc4541 on 224.0.0.X and ff02::1 on, the rest off mcast_flood on all on mcast_flood on mcast_flood_rfc4541 off all on (mcast_flood on overrides mcast_flood_rfc4541) mcast_flood on mcast_flood_rfc4541 on all on mcast_flood_rfc4541 off invalid (mcast_flood_rfc4541 is only valid if mcast_flood [on | off] is specified first) mcast_flood_rfc4541 on invalid (mcast_flood_rfc4541 is only valid if mcast_flood [on | off] is specified first) Think of mcast_flood_rfc4541 like a pet door if you will.
On Mon, Apr 29, 2024 at 04:14:03PM -0400, Joseph Huang wrote: > How about the following syntax? I think it satisfies all the "not breaking > existing behavior" requirements (new option defaults to off, and missing > user space netlink attributes does not change the existing behavior): > > mcast_flood off > all off > mcast_flood off mcast_flood_rfc4541 off > all off > mcast_flood off mcast_flood_rfc4541 on > 224.0.0.X and ff02::1 on, the rest off > mcast_flood on > all on > mcast_flood on mcast_flood_rfc4541 off > all on (mcast_flood on overrides mcast_flood_rfc4541) > mcast_flood on mcast_flood_rfc4541 on > all on > mcast_flood_rfc4541 off > invalid (mcast_flood_rfc4541 is only valid if mcast_flood [on | off] is > specified first) > mcast_flood_rfc4541 on > invalid (mcast_flood_rfc4541 is only valid if mcast_flood [on | off] is > specified first) A bridge port defaults to having BR_MCAST_FLOOD set - see new_nbp(). Netlink attributes are only there to _change_ the state of properties in the kernel. They don't need to be specified by user space if there's nothing to be changed. "Only valid if another netlink attribute comes first" makes no sense. You can alter 2 bridge port flags as part of the same netlink message, or as part of different netlink messages (sent over sockets of other processes). > > Think of mcast_flood_rfc4541 like a pet door if you will. Ultimately, as far as I see it, both the OR-based and the AND-based UAPI addition could be made to work in a way that's perhaps similarly backwards compatible. It needs to be worked out with the bridge maintainers. Given that I'm not doing great with my spare time, I will take a back seat on that. However, what I don't quite understand in your proposal is how many IPv4 addresses lie beyond the "224.0.0.X" notation? 256? Explain why there is such a large discrepancy in the number of IPv4 addresses you are in control of (256), vs only 1 IPv6 address with the new rfc4541 flag?
On 4/29/2024 9:21 PM, Vladimir Oltean wrote: > On Mon, Apr 29, 2024 at 04:14:03PM -0400, Joseph Huang wrote: >> How about the following syntax? I think it satisfies all the "not breaking >> existing behavior" requirements (new option defaults to off, and missing >> user space netlink attributes does not change the existing behavior): >> >> mcast_flood off >> all off >> mcast_flood off mcast_flood_rfc4541 off >> all off >> mcast_flood off mcast_flood_rfc4541 on >> 224.0.0.X and ff02::1 on, the rest off >> mcast_flood on >> all on >> mcast_flood on mcast_flood_rfc4541 off >> all on (mcast_flood on overrides mcast_flood_rfc4541) >> mcast_flood on mcast_flood_rfc4541 on >> all on >> mcast_flood_rfc4541 off >> invalid (mcast_flood_rfc4541 is only valid if mcast_flood [on | off] is >> specified first) >> mcast_flood_rfc4541 on >> invalid (mcast_flood_rfc4541 is only valid if mcast_flood [on | off] is >> specified first) > > A bridge port defaults to having BR_MCAST_FLOOD set - see new_nbp(). > Netlink attributes are only there to _change_ the state of properties in > the kernel. They don't need to be specified by user space if there's > nothing to be changed. "Only valid if another netlink attribute comes > first" makes no sense. You can alter 2 bridge port flags as part of the > same netlink message, or as part of different netlink messages (sent > over sockets of other processes). > >> >> Think of mcast_flood_rfc4541 like a pet door if you will. > > Ultimately, as far as I see it, both the OR-based and the AND-based UAPI > addition could be made to work in a way that's perhaps similarly backwards > compatible. It needs to be worked out with the bridge maintainers. Given > that I'm not doing great with my spare time, I will take a back seat on > that. Nik, do you have any objection to the following proposal? mcast_flood -> default/ off on (existing flag) missing (specified/ (specified/ (on) nlmsg) nlmsg) mcast_flood_rfc4541 (proposed new flag) | v default/ flood all no flood flood all missing (off) off flood all no flood flood all (specified/nlmsg) on flood all flood 4541 flood all (specified/nlmsg) ^^^^^^^^^^ only behavior change Basically the attributes are OR'ed together to form the final flooding decision. > > However, what I don't quite understand in your proposal is how many IPv4 > addresses lie beyond the "224.0.0.X" notation? 256? Explain why there is > such a large discrepancy in the number of IPv4 addresses you are in > control of (256), vs only 1 IPv6 address with the new rfc4541 flag? That's straight out of RFC-4541 ("coincidentally", these are also the IP addresses for which the bridge will not create mdb's): 2.1.2. 2) Packets with a destination IP (DIP) address in the 224.0.0.X range which are not IGMP must be forwarded on all ports. This recommendation is based on the fact that many host systems do not send Join IP multicast addresses in this range before sending or listening to IP multicast packets. Furthermore, since the 224.0.0.X address range is defined as link-local (not to be routed), it seems unnecessary to keep the state for each address in this range. Additionally, some routers operate in the 224.0.0.X address range without issuing IGMP Joins, and these applications would break if the switch were to prune them due to not having seen a Join Group message from the router. and 3. In IPv6, the data forwarding rules are more straight forward because MLD is mandated for addresses with scope 2 (link-scope) or greater. The only exception is the address FF02::1 which is the all hosts link-scope address for which MLD messages are never sent. Packets with the all hosts link-scope address should be forwarded on all ports.
On 30/04/2024 20:01, Joseph Huang wrote: > On 4/29/2024 9:21 PM, Vladimir Oltean wrote: >> On Mon, Apr 29, 2024 at 04:14:03PM -0400, Joseph Huang wrote: >>> How about the following syntax? I think it satisfies all the "not breaking >>> existing behavior" requirements (new option defaults to off, and missing >>> user space netlink attributes does not change the existing behavior): >>> >>> mcast_flood off >>> all off >>> mcast_flood off mcast_flood_rfc4541 off >>> all off >>> mcast_flood off mcast_flood_rfc4541 on >>> 224.0.0.X and ff02::1 on, the rest off >>> mcast_flood on >>> all on >>> mcast_flood on mcast_flood_rfc4541 off >>> all on (mcast_flood on overrides mcast_flood_rfc4541) >>> mcast_flood on mcast_flood_rfc4541 on >>> all on >>> mcast_flood_rfc4541 off >>> invalid (mcast_flood_rfc4541 is only valid if mcast_flood [on | off] is >>> specified first) >>> mcast_flood_rfc4541 on >>> invalid (mcast_flood_rfc4541 is only valid if mcast_flood [on | off] is >>> specified first) >> >> A bridge port defaults to having BR_MCAST_FLOOD set - see new_nbp(). >> Netlink attributes are only there to _change_ the state of properties in >> the kernel. They don't need to be specified by user space if there's >> nothing to be changed. "Only valid if another netlink attribute comes >> first" makes no sense. You can alter 2 bridge port flags as part of the >> same netlink message, or as part of different netlink messages (sent >> over sockets of other processes). >> >>> >>> Think of mcast_flood_rfc4541 like a pet door if you will. >> >> Ultimately, as far as I see it, both the OR-based and the AND-based UAPI >> addition could be made to work in a way that's perhaps similarly backwards >> compatible. It needs to be worked out with the bridge maintainers. Given >> that I'm not doing great with my spare time, I will take a back seat on >> that. > > Nik, do you have any objection to the following proposal? > > mcast_flood -> default/ off on > (existing flag) missing (specified/ (specified/ > (on) nlmsg) nlmsg) > > mcast_flood_rfc4541 > (proposed new flag) > | > v > default/ flood all no flood flood all > missing > (off) > > off flood all no flood flood all > (specified/nlmsg) > > on flood all flood 4541 flood all > (specified/nlmsg) ^^^^^^^^^^ > only behavior change > > > Basically the attributes are OR'ed together to form the final flooding decision. > > Looks good to me. Please make use of the boolopt uapi to avoid adding new nl attributes. Thanks, Nik