diff mbox series

[net-next,v2] bonding: add a vlan+mac tx hashing option

Message ID 20210113223548.1171655-1-jarod@redhat.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers

Checks

Context                          Check    Description
netdev/cover_letter              success  Link
netdev/fixes_present             success  Link
netdev/patch_count               success  Link
netdev/tree_selection            success  Clearly marked for net-next
netdev/subject_prefix            success  Link
netdev/cc_maintainers            warning  2 maintainers not CCed: linux-doc@vger.kernel.org corbet@lwn.net
netdev/source_inline             fail     Was 0 now: 1
netdev/verify_signedoff          success  Link
netdev/module_param              success  Was 0 now: 0
netdev/build_32bit               success  Errors and warnings before: 7151 this patch: 7151
netdev/kdoc                      success  Errors and warnings before: 0 this patch: 0
netdev/verify_fixes              success  Link
netdev/checkpatch                warning  WARNING: quoted string split across lines
netdev/build_allmodconfig_warn   success  Errors and warnings before: 7560 this patch: 7560
netdev/header_inline             success  Link
netdev/stable                    success  Stable not CCed

Commit Message

Jarod Wilson Jan. 13, 2021, 10:35 p.m. UTC
This comes from an end-user request, where they're running multiple VMs on
hosts with bonded interfaces connected to some interesting switch topologies,
where 802.3ad isn't an option. They're currently running a proprietary
solution that effectively achieves load-balancing of VMs and bandwidth
utilization improvements with a similar form of transmission algorithm.

Basically, each VM has its own vlan, so it always sends its traffic out
the same interface, unless that interface fails. Traffic gets split
between the interfaces, maintaining a consistent path, with failover still
available if an interface goes down.

This has been rudimentarily tested to provide similar results, suitable for
them to use to move off their current proprietary solution. A patch for
iproute2 is forthcoming as well, to properly support the new mode there.

Cc: Jay Vosburgh <j.vosburgh@gmail.com>
Cc: Veaceslav Falico <vfalico@gmail.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Thomas Davis <tadavis@lbl.gov>
Cc: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
---
v2: verified netlink interfaces working, added Documentation, changed
tx hash mode name to vlan+mac for consistency and clarity.

 Documentation/networking/bonding.rst | 13 +++++++++++++
 drivers/net/bonding/bond_main.c      | 27 +++++++++++++++++++++++++--
 drivers/net/bonding/bond_options.c   |  1 +
 include/linux/netdevice.h            |  1 +
 include/uapi/linux/if_bonding.h      |  1 +
 5 files changed, 41 insertions(+), 2 deletions(-)

Comments

Jakub Kicinski Jan. 14, 2021, 1:58 a.m. UTC | #1
On Wed, 13 Jan 2021 17:35:48 -0500 Jarod Wilson wrote:
> This comes from an end-user request, where they're running multiple VMs on
> hosts with bonded interfaces connected to some interesting switch topologies,
> where 802.3ad isn't an option. They're currently running a proprietary
> solution that effectively achieves load-balancing of VMs and bandwidth
> utilization improvements with a similar form of transmission algorithm.
> 
> Basically, each VM has its own vlan, so it always sends its traffic out
> the same interface, unless that interface fails. Traffic gets split
> between the interfaces, maintaining a consistent path, with failover still
> available if an interface goes down.
> 
> This has been rudimentarily tested to provide similar results, suitable for
> them to use to move off their current proprietary solution. A patch for
> iproute2 is forthcoming as well, to properly support the new mode there as
> well.

> Signed-off-by: Jarod Wilson <jarod@redhat.com>
> ---
> v2: verified netlink interfaces working, added Documentation, changed
> tx hash mode name to vlan+mac for consistency and clarity.
> 
>  Documentation/networking/bonding.rst | 13 +++++++++++++
>  drivers/net/bonding/bond_main.c      | 27 +++++++++++++++++++++++++--
>  drivers/net/bonding/bond_options.c   |  1 +
>  include/linux/netdevice.h            |  1 +
>  include/uapi/linux/if_bonding.h      |  1 +
>  5 files changed, 41 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/networking/bonding.rst b/Documentation/networking/bonding.rst
> index adc314639085..c78ceb7630a0 100644
> --- a/Documentation/networking/bonding.rst
> +++ b/Documentation/networking/bonding.rst
> @@ -951,6 +951,19 @@ xmit_hash_policy
>  		packets will be distributed according to the encapsulated
>  		flows.
>  
> +	vlan+mac
> +
> +		This policy uses a very rudimentary vlan ID and source mac
> +		ID hash to load-balance traffic per-vlan, with failover
> +		should one leg fail. The intended use case is for a bond
> +		shared by multiple virtual machines, all configured to
> +		use their own vlan, to give lacp-like functionality
> +		without requiring lacp-capable switching hardware.
> +
> +		The formula for the hash is simply
> +
> +		hash = (vlan ID) XOR (source MAC)

But in the code it's only using one byte of the MAC, currently.

I think that's fine for the particular use case but should we call out
explicitly in the commit message why it's considered sufficient?

Someone can change it later, if needed, but best if we spell out the
current motivation.
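
To make the mismatch concrete, here is a small userspace sketch of the hash as posted in v2. It is a model only: the function names, the plain byte array, and the has_vlan flag (standing in for skb_vlan_tag_present()) are assumptions for illustration, and the modulo in pick_slave() is an assumed simplification of how a hash is mapped onto a slave.

```c
#include <stdbool.h>
#include <stdint.h>

#define ETH_ALEN 6

/* Userspace model of the v2 hash as posted: only the LAST byte of the
 * source MAC is used, XORed with the VLAN ID when a tag is present.
 * has_vlan/vlan are stand-ins for the skb VLAN helpers (assumption).
 */
static uint32_t vlan_srcmac_hash_v2(const uint8_t src[ETH_ALEN],
				    bool has_vlan, uint16_t vlan)
{
	uint32_t srcmac = src[ETH_ALEN - 1];

	if (!has_vlan)
		return srcmac;

	return srcmac ^ vlan;
}

/* Assumption: the slave index is the hash modulo the usable slave count. */
static unsigned int pick_slave(uint32_t hash, unsigned int slave_count)
{
	return hash % slave_count;
}
```

With this model, two VMs on adjacent VLAN IDs land on different legs of a two-slave bond, but two untagged senders whose MACs share a last byte always hash identically, which is the case the documentation formula does not describe.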

>  	The default value is layer2.  This option was added in bonding
>  	version 2.6.3.  In earlier versions of bonding, this parameter
>  	does not exist, and the layer2 policy is the only policy.  The

> +static inline u32 bond_vlan_srcmac_hash(struct sk_buff *skb)

Can we drop the inline? It's a static function called once.

> +{
> +	struct ethhdr *mac_hdr = (struct ethhdr *)skb_mac_header(skb);

I don't see anything in the patch making sure the interface actually
has a L2 header. Should we validate somehow the ifc is Ethernet?

> +	u32 srcmac = mac_hdr->h_source[5];
> +	u16 vlan;
> +
> +	if (!skb_vlan_tag_present(skb))
> +		return srcmac;
> +
> +	vlan = skb_vlan_tag_get(skb);
> +
> +	return srcmac ^ vlan;
> +}

Jarod Wilson Jan. 14, 2021, 9:11 p.m. UTC | #2
On Wed, Jan 13, 2021 at 05:58:18PM -0800, Jakub Kicinski wrote:
> On Wed, 13 Jan 2021 17:35:48 -0500 Jarod Wilson wrote:
> > This comes from an end-user request, where they're running multiple VMs on
> > hosts with bonded interfaces connected to some interesting switch topologies,
> > where 802.3ad isn't an option. They're currently running a proprietary
> > solution that effectively achieves load-balancing of VMs and bandwidth
> > utilization improvements with a similar form of transmission algorithm.
> > 
> > Basically, each VM has its own vlan, so it always sends its traffic out
> > the same interface, unless that interface fails. Traffic gets split
> > between the interfaces, maintaining a consistent path, with failover still
> > available if an interface goes down.
> > 
> > This has been rudimentarily tested to provide similar results, suitable for
> > them to use to move off their current proprietary solution. A patch for
> > iproute2 is forthcoming as well, to properly support the new mode there as
> > well.
> 
> > Signed-off-by: Jarod Wilson <jarod@redhat.com>
> > ---
> > v2: verified netlink interfaces working, added Documentation, changed
> > tx hash mode name to vlan+mac for consistency and clarity.
> > 
> >  Documentation/networking/bonding.rst | 13 +++++++++++++
> >  drivers/net/bonding/bond_main.c      | 27 +++++++++++++++++++++++++--
> >  drivers/net/bonding/bond_options.c   |  1 +
> >  include/linux/netdevice.h            |  1 +
> >  include/uapi/linux/if_bonding.h      |  1 +
> >  5 files changed, 41 insertions(+), 2 deletions(-)
> > 
> > diff --git a/Documentation/networking/bonding.rst b/Documentation/networking/bonding.rst
> > index adc314639085..c78ceb7630a0 100644
> > --- a/Documentation/networking/bonding.rst
> > +++ b/Documentation/networking/bonding.rst
> > @@ -951,6 +951,19 @@ xmit_hash_policy
> >  		packets will be distributed according to the encapsulated
> >  		flows.
> >  
> > +	vlan+mac
> > +
> > +		This policy uses a very rudimentary vlan ID and source mac
> > +		ID hash to load-balance traffic per-vlan, with failover
> > +		should one leg fail. The intended use case is for a bond
> > +		shared by multiple virtual machines, all configured to
> > +		use their own vlan, to give lacp-like functionality
> > +		without requiring lacp-capable switching hardware.
> > +
> > +		The formula for the hash is simply
> > +
> > +		hash = (vlan ID) XOR (source MAC)
> 
> But in the code it's only using one byte of the MAC, currently.
> 
> I think that's fine for the particular use case but should we call out
> explicitly in the commit message why it's considered sufficient?
> 
> Someone can change it later, if needed, but best if we spell out the
> current motivation.

In truth, this code started out as a copy of bond_eth_hash(), which also
only uses the last byte, though of both source and destination macs. In
the typical use case for the requesting user, the bond is formed from two
onboard NICs, which typically have adjacent mac addresses, i.e.,
AA:BB:CC:DD:EE:01 and AA:BB:CC:DD:EE:02, so only the last byte is really
relevant to hash differently, but in thinking about it, a replacement NIC
because an onboard one died could have the same last byte, and maybe we
ought to just go full source mac right off the go here.

Something like this instead maybe:

static u32 bond_vlan_srcmac_hash(struct sk_buff *skb)
{
        struct ethhdr *mac_hdr = (struct ethhdr *)skb_mac_header(skb);
        u32 srcmac = 0;
        u16 vlan;
        int i;

        for (i = 0; i < ETH_ALEN; i++)
                srcmac = (srcmac << 8) | mac_hdr->h_source[i];

        if (!skb_vlan_tag_present(skb))
                return srcmac;

        vlan = skb_vlan_tag_get(skb);

        return vlan ^ srcmac;
}

Then the documentation is spot-on, and we're future-proof, though
marginally less performant in calculating the hash, which may have been a
consideration when the original function was written, but is probably
basically irrelevant w/modern systems...

> >  	The default value is layer2.  This option was added in bonding
> >  	version 2.6.3.  In earlier versions of bonding, this parameter
> >  	does not exist, and the layer2 policy is the only policy.  The
> 
> > +static inline u32 bond_vlan_srcmac_hash(struct sk_buff *skb)
> 
> Can we drop the inline? It's a static function called once.

Works for me. That was also inherited by copying bond_eth_hash(). :)

> > +{
> > +	struct ethhdr *mac_hdr = (struct ethhdr *)skb_mac_header(skb);
> 
> I don't see anything in the patch making sure the interface actually
> has a L2 header. Should we validate somehow the ifc is Ethernet?

I don't think it's necessary. There doesn't appear to be any explicit
check for BOND_XMIT_POLICY_LAYER2 either. I believe we're guaranteed to
not have anything but an ethernet header here, as the only other type I'm
aware of being supported is Infiniband, but we limit that to active-backup
only, and xmit_hash_policy isn't valid for active-backup.

Jakub Kicinski Jan. 14, 2021, 9:23 p.m. UTC | #3
On Thu, 14 Jan 2021 16:11:41 -0500 Jarod Wilson wrote:
> In truth, this code started out as a copy of bond_eth_hash(), which also
> only uses the last byte, though of both source and destination macs. In
> the typical use case for the requesting user, the bond is formed from two
> onboard NICs, which typically have adjacent mac addresses, i.e.,
> AA:BB:CC:DD:EE:01 and AA:BB:CC:DD:EE:02, so only the last byte is really
> relevant to hash differently, but in thinking about it, a replacement NIC
> because an onboard one died could have the same last byte, and maybe we
> ought to just go full source mac right off the go here.
> 
> Something like this instead maybe:
> 
> static u32 bond_vlan_srcmac_hash(struct sk_buff *skb)
> {
>         struct ethhdr *mac_hdr = (struct ethhdr *)skb_mac_header(skb);
>         u32 srcmac = 0;
>         u16 vlan;
>         int i;
> 
>         for (i = 0; i < ETH_ALEN; i++)
>                 srcmac = (srcmac << 8) | mac_hdr->h_source[i];
> 
>         if (!skb_vlan_tag_present(skb))
>                 return srcmac;
> 
>         vlan = skb_vlan_tag_get(skb);
> 
>         return vlan ^ srcmac;
> }
> 
> Then the documentation is spot-on, and we're future-proof, though
> marginally less performant in calculating the hash, which may have been a
> consideration when the original function was written, but is probably
> basically irrelevant w/modern systems...

No preference, especially if bond_eth_hash() already uses the last byte.
Just make sure the choice is explained in the commit message.

Jarod Wilson Jan. 14, 2021, 9:42 p.m. UTC | #4
On Thu, Jan 14, 2021 at 01:23:14PM -0800, Jakub Kicinski wrote:
> On Thu, 14 Jan 2021 16:11:41 -0500 Jarod Wilson wrote:
> > In truth, this code started out as a copy of bond_eth_hash(), which also
> > only uses the last byte, though of both source and destination macs. In
> > the typical use case for the requesting user, the bond is formed from two
> > onboard NICs, which typically have adjacent mac addresses, i.e.,
> > AA:BB:CC:DD:EE:01 and AA:BB:CC:DD:EE:02, so only the last byte is really
> > relevant to hash differently, but in thinking about it, a replacement NIC
> > because an onboard one died could have the same last byte, and maybe we
> > ought to just go full source mac right off the go here.
> > 
> > Something like this instead maybe:
> > 
> > static u32 bond_vlan_srcmac_hash(struct sk_buff *skb)
> > {
> >         struct ethhdr *mac_hdr = (struct ethhdr *)skb_mac_header(skb);
> >         u32 srcmac = 0;
> >         u16 vlan;
> >         int i;
> > 
> >         for (i = 0; i < ETH_ALEN; i++)
> >                 srcmac = (srcmac << 8) | mac_hdr->h_source[i];
> > 
> >         if (!skb_vlan_tag_present(skb))
> >                 return srcmac;
> > 
> >         vlan = skb_vlan_tag_get(skb);
> > 
> >         return vlan ^ srcmac;
> > }
> > 
> > Then the documentation is spot-on, and we're future-proof, though
> > marginally less performant in calculating the hash, which may have been a
> > consideration when the original function was written, but is probably
> > basically irrelevant w/modern systems...
> 
> No preference, especially if bond_eth_hash() already uses the last byte.
> Just make sure the choice is explained in the commit message.

I've sold myself on using the full MAC, because if there's no vlan tag
present, mac is the only thing used for the hash, increasing the chances
of getting the same hash for two different interfaces, which won't happen
if we've got the full MAC. Of course, I'm not sure why someone would be
using this xmit hash outside of the very particular use-case that includes
VLANs, but people do strange things...

Jay Vosburgh Jan. 14, 2021, 9:54 p.m. UTC | #5
Jarod Wilson <jarod@redhat.com> wrote:

>On Wed, Jan 13, 2021 at 05:58:18PM -0800, Jakub Kicinski wrote:
>> On Wed, 13 Jan 2021 17:35:48 -0500 Jarod Wilson wrote:
>> > This comes from an end-user request, where they're running multiple VMs on
>> > hosts with bonded interfaces connected to some interesting switch topologies,
>> > where 802.3ad isn't an option. They're currently running a proprietary
>> > solution that effectively achieves load-balancing of VMs and bandwidth
>> > utilization improvements with a similar form of transmission algorithm.
>> > 
>> > Basically, each VM has its own vlan, so it always sends its traffic out
>> > the same interface, unless that interface fails. Traffic gets split
>> > between the interfaces, maintaining a consistent path, with failover still
>> > available if an interface goes down.
>> > 
>> > This has been rudimentarily tested to provide similar results, suitable for
>> > them to use to move off their current proprietary solution. A patch for
>> > iproute2 is forthcoming as well, to properly support the new mode there as
>> > well.
>> 
>> > Signed-off-by: Jarod Wilson <jarod@redhat.com>
>> > ---
>> > v2: verified netlink interfaces working, added Documentation, changed
>> > tx hash mode name to vlan+mac for consistency and clarity.
>> > 
>> >  Documentation/networking/bonding.rst | 13 +++++++++++++
>> >  drivers/net/bonding/bond_main.c      | 27 +++++++++++++++++++++++++--
>> >  drivers/net/bonding/bond_options.c   |  1 +
>> >  include/linux/netdevice.h            |  1 +
>> >  include/uapi/linux/if_bonding.h      |  1 +
>> >  5 files changed, 41 insertions(+), 2 deletions(-)
>> > 
>> > diff --git a/Documentation/networking/bonding.rst b/Documentation/networking/bonding.rst
>> > index adc314639085..c78ceb7630a0 100644
>> > --- a/Documentation/networking/bonding.rst
>> > +++ b/Documentation/networking/bonding.rst
>> > @@ -951,6 +951,19 @@ xmit_hash_policy
>> >  		packets will be distributed according to the encapsulated
>> >  		flows.
>> >  
>> > +	vlan+mac

	I notice that the code calls it "VLAN_SRCMAC" but the
user-facing nomenclature is "vlan+mac"; I tend to lean towards having
the user visible name also be "vlan+srcmac".  Both for consistency, and
just in case someone someday wants "vlan+dstmac".  And you did ask for
preference on this in a separate email.

>> > +		This policy uses a very rudimentary vlan ID and source mac
>> > +		ID hash to load-balance traffic per-vlan, with failover
>> > +		should one leg fail. The intended use case is for a bond
>> > +		shared by multiple virtual machines, all configured to
>> > +		use their own vlan, to give lacp-like functionality
>> > +		without requiring lacp-capable switching hardware.
>> > +
>> > +		The formula for the hash is simply
>> > +
>> > +		hash = (vlan ID) XOR (source MAC)
>> 
>> But in the code it's only using one byte of the MAC, currently.
>> 
>> I think that's fine for the particular use case but should we call out
>> explicitly in the commit message why it's considered sufficient?
>> 
>> Someone can change it later, if needed, but best if we spell out the
>> current motivation.
>
>In truth, this code started out as a copy of bond_eth_hash(), which also
>only uses the last byte, though of both source and destination macs. In
>the typical use case for the requesting user, the bond is formed from two
>onboard NICs, which typically have adjacent mac addresses, i.e.,
>AA:BB:CC:DD:EE:01 and AA:BB:CC:DD:EE:02, so only the last byte is really
>relevant to hash differently, but in thinking about it, a replacement NIC
>because an onboard one died could have the same last byte, and maybe we
>ought to just go full source mac right off the go here.

	Yah, the existing L2 hash is pretty weak.  It might be possible
to squeeze this into the existing bond_xmit_hash a bit better, if the
hash is two u32s.  The first being the first 32 bits of the MAC, and the
second being the last 16 bits of the MAC combined with the 16 bit VLAN
tag.

	There's already logic at the end of bond_xmit_hash to reduce a
u32 into the final hash that perhaps could be leveraged.  

	Thinking about it, though, all the ways to combine that data
together end up being pretty vile ("*(u32 *)&ethhdr->h_source[0]" sorts
of things).
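
	For illustration, the two-u32 layout can also be assembled with explicit shifts instead of pointer casts. The sketch below is userspace C on a plain byte array, not kernel code; the function names are hypothetical, and the fold at the end is an assumed shift-and-XOR reduction of the general kind described above, not a claim about what bond_xmit_hash does.

```c
#include <stdint.h>

/* First 32 bits of the source MAC, assembled byte by byte so that no
 * unaligned or type-punned load is needed.
 */
static uint32_t mac_hi32(const uint8_t src[6])
{
	return ((uint32_t)src[0] << 24) | ((uint32_t)src[1] << 16) |
	       ((uint32_t)src[2] << 8)  |  (uint32_t)src[3];
}

/* Last 16 bits of the source MAC in the upper half, the 16-bit VLAN tag
 * in the lower half.
 */
static uint32_t mac_lo16_vlan(const uint8_t src[6], uint16_t vlan)
{
	return ((uint32_t)src[4] << 24) | ((uint32_t)src[5] << 16) | vlan;
}

/* Hypothetical reduction: XOR the two words, then fold the high bits
 * down so that every input byte can influence the low bits of the result.
 */
static uint32_t fold_hash(uint32_t a, uint32_t b)
{
	uint32_t hash = a ^ b;

	hash ^= hash >> 16;
	hash ^= hash >> 8;

	return hash >> 1;
}
```

	The shift-based packing costs a few more instructions than a cast, but it is independent of host endianness and alignment.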

>Something like this instead maybe:
>
>static u32 bond_vlan_srcmac_hash(struct sk_buff *skb)
>{
>        struct ethhdr *mac_hdr = (struct ethhdr *)skb_mac_header(skb);
>        u32 srcmac = 0;
>        u16 vlan;
>        int i;
>
>        for (i = 0; i < ETH_ALEN; i++)
>                srcmac = (srcmac << 8) | mac_hdr->h_source[i];

	I think this will shift h_source[0] and [1] into oblivion.
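
	The effect is easy to reproduce outside the kernel. A minimal sketch, assuming a plain byte array in place of mac_hdr->h_source and a hypothetical function name:

```c
#include <stdint.h>

/* Same loop as the proposal quoted above, run over a 6-byte array.
 * Each iteration shifts the accumulator left by 8 bits, so after six
 * iterations the first two bytes have been pushed out of the 32-bit
 * value and only src[2]..src[5] remain.
 */
static uint32_t fold_mac_u32(const uint8_t src[6])
{
	uint32_t srcmac = 0;
	int i;

	for (i = 0; i < 6; i++)
		srcmac = (srcmac << 8) | src[i];

	return srcmac;
}
```

	Two addresses that differ only in their first two bytes therefore produce the same value.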

>        if (!skb_vlan_tag_present(skb))
>                return srcmac;
>
>        vlan = skb_vlan_tag_get(skb);
>
>        return vlan ^ srcmac;
>}
>
>Then the documentation is spot-on, and we're future-proof, though
>marginally less performant in calculating the hash, which may have been a
>consideration when the original function was written, but is probably
>basically irrelevant w/modern systems...
>
>> >  	The default value is layer2.  This option was added in bonding
>> >  	version 2.6.3.  In earlier versions of bonding, this parameter
>> >  	does not exist, and the layer2 policy is the only policy.  The
>> 
>> > +static inline u32 bond_vlan_srcmac_hash(struct sk_buff *skb)
>> 
>> Can we drop the inline? It's a static function called once.
>
>Works for me. That was also inherited by copying bond_eth_hash(). :)
>
>> > +{
>> > +	struct ethhdr *mac_hdr = (struct ethhdr *)skb_mac_header(skb);
>> 
>> I don't see anything in the patch making sure the interface actually
>> has a L2 header. Should we validate somehow the ifc is Ethernet?
>
>I don't think it's necessary. There doesn't appear to be any explicit
>check for BOND_XMIT_POLICY_LAYER2 either. I believe we're guaranteed to
>not have anything but an ethernet header here, as the only other type I'm
>aware of being supported is Infiniband, but we limit that to active-backup
>only, and xmit_hash_policy isn't valid for active-backup.

	This is correct, interfaces in a bond other than active-backup
will all be ARPHRD_ETHER.  I'm unaware of a way to get a packet in there
without at least an Ethernet header.

	-J

---
	-Jay Vosburgh, jay.vosburgh@canonical.com

Jarod Wilson Jan. 15, 2021, 3:08 p.m. UTC | #6
On Thu, Jan 14, 2021 at 01:54:31PM -0800, Jay Vosburgh wrote:
> Jarod Wilson <jarod@redhat.com> wrote:
> 
> >On Wed, Jan 13, 2021 at 05:58:18PM -0800, Jakub Kicinski wrote:
> >> On Wed, 13 Jan 2021 17:35:48 -0500 Jarod Wilson wrote:
> >> > This comes from an end-user request, where they're running multiple VMs on
> >> > hosts with bonded interfaces connected to some interesting switch topologies,
> >> > where 802.3ad isn't an option. They're currently running a proprietary
> >> > solution that effectively achieves load-balancing of VMs and bandwidth
> >> > utilization improvements with a similar form of transmission algorithm.
> >> > 
> >> > Basically, each VM has its own vlan, so it always sends its traffic out
> >> > the same interface, unless that interface fails. Traffic gets split
> >> > between the interfaces, maintaining a consistent path, with failover still
> >> > available if an interface goes down.
> >> > 
> >> > This has been rudimentarily tested to provide similar results, suitable for
> >> > them to use to move off their current proprietary solution. A patch for
> >> > iproute2 is forthcoming as well, to properly support the new mode there as
> >> > well.
> >> 
> >> > Signed-off-by: Jarod Wilson <jarod@redhat.com>
> >> > ---
> >> > v2: verified netlink interfaces working, added Documentation, changed
> >> > tx hash mode name to vlan+mac for consistency and clarity.
> >> > 
> >> >  Documentation/networking/bonding.rst | 13 +++++++++++++
> >> >  drivers/net/bonding/bond_main.c      | 27 +++++++++++++++++++++++++--
> >> >  drivers/net/bonding/bond_options.c   |  1 +
> >> >  include/linux/netdevice.h            |  1 +
> >> >  include/uapi/linux/if_bonding.h      |  1 +
> >> >  5 files changed, 41 insertions(+), 2 deletions(-)
> >> > 
> >> > diff --git a/Documentation/networking/bonding.rst b/Documentation/networking/bonding.rst
> >> > index adc314639085..c78ceb7630a0 100644
> >> > --- a/Documentation/networking/bonding.rst
> >> > +++ b/Documentation/networking/bonding.rst
> >> > @@ -951,6 +951,19 @@ xmit_hash_policy
> >> >  		packets will be distributed according to the encapsulated
> >> >  		flows.
> >> >  
> >> > +	vlan+mac
> 
> 	I notice that the code calls it "VLAN_SRCMAC" but the
> user-facing nomenclature is "vlan+mac"; I tend to lean towards having
> the user visible name also be "vlan+srcmac".  Both for consistency, and
> just in case someone someday wants "vlan+dstmac".  And you did ask for
> preference on this in a separate email.

That's valid. I was trying to keep it short, but it does muddy the waters
a bit by not including src. I'll adjust accordingly and resend the
userspace bit too.

...
> 	Yah, the existing L2 hash is pretty weak.  It might be possible
> to squeeze this into the existing bond_xmit_hash a bit better, if the
> hash is two u32s.  The first being the first 32 bits of the MAC, and the
> second being the last 16 bits of the MAC combined with the 16 bit VLAN
> tag.
> 
> 	There's already logic at the end of bond_xmit_hash to reduce a
> u32 into the final hash that perhaps could be leveraged.  
> 
> 	Thinking about it, though, all the ways to combine that data
> together end up being pretty vile ("*(u32 *)&ethhdr->h_source[0]" sorts
> of things).

Yeah, I'd worry that bond_xmit_hash() is already getting a bit complicated
to follow and understand, and that would make it even more so.

> >Something like this instead maybe:
> >
> >static u32 bond_vlan_srcmac_hash(struct sk_buff *skb)
> >{
> >        struct ethhdr *mac_hdr = (struct ethhdr *)skb_mac_header(skb);
> >        u32 srcmac = 0;
> >        u16 vlan;
> >        int i;
> >
> >        for (i = 0; i < ETH_ALEN; i++)
> >                srcmac = (srcmac << 8) | mac_hdr->h_source[i];
> 
> 	I think this will shift h_source[0] and [1] into oblivion.

Argh, yep, 48 bits don't fit into a u32. Okay, so I'll replace that with a
u32 srcmac_vendor and u32 srcmac_dev, but they'll only have 24 bits of data
in them, then return vlan ^ srcmac_vendor ^ srcmac_dev, I think.
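
A rough userspace sketch of that split, to check the arithmetic. It works on a plain byte array rather than an skb; the function name and the has_vlan flag are assumptions for illustration, and returning srcmac_vendor ^ srcmac_dev for untagged frames is also an assumption, since only the tagged case is spelled out above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Split the 48-bit source MAC into two 24-bit halves so that neither
 * overflows a u32: bytes 0..2 (vendor/OUI part) and bytes 3..5
 * (device part). All six bytes then contribute to the hash.
 */
static uint32_t vlan_srcmac_hash_split(const uint8_t src[6],
				       bool has_vlan, uint16_t vlan)
{
	uint32_t srcmac_vendor = 0, srcmac_dev = 0;
	int i;

	for (i = 0; i < 3; i++)
		srcmac_vendor = (srcmac_vendor << 8) | src[i];

	for (i = 3; i < 6; i++)
		srcmac_dev = (srcmac_dev << 8) | src[i];

	/* Assumption: untagged frames hash on the MAC halves alone. */
	if (!has_vlan)
		return srcmac_vendor ^ srcmac_dev;

	return vlan ^ srcmac_vendor ^ srcmac_dev;
}
```

Unlike the single-u32 loop, addresses that differ only in their first two bytes now hash differently.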

Patch

diff --git a/Documentation/networking/bonding.rst b/Documentation/networking/bonding.rst
index adc314639085..c78ceb7630a0 100644
--- a/Documentation/networking/bonding.rst
+++ b/Documentation/networking/bonding.rst
@@ -951,6 +951,19 @@  xmit_hash_policy
 		packets will be distributed according to the encapsulated
 		flows.
 
+	vlan+mac
+
+		This policy uses a very rudimentary vlan ID and source mac
+		ID hash to load-balance traffic per-vlan, with failover
+		should one leg fail. The intended use case is for a bond
+		shared by multiple virtual machines, all configured to
+		use their own vlan, to give lacp-like functionality
+		without requiring lacp-capable switching hardware.
+
+		The formula for the hash is simply
+
+		hash = (vlan ID) XOR (source MAC)
+
 	The default value is layer2.  This option was added in bonding
 	version 2.6.3.  In earlier versions of bonding, this parameter
 	does not exist, and the layer2 policy is the only policy.  The
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 5fe5232cc3f3..766c09a553c1 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -164,7 +164,7 @@  module_param(xmit_hash_policy, charp, 0);
 MODULE_PARM_DESC(xmit_hash_policy, "balance-alb, balance-tlb, balance-xor, 802.3ad hashing method; "
 				   "0 for layer 2 (default), 1 for layer 3+4, "
 				   "2 for layer 2+3, 3 for encap layer 2+3, "
-				   "4 for encap layer 3+4");
+				   "4 for encap layer 3+4, 5 for vlan+mac");
 module_param(arp_interval, int, 0);
 MODULE_PARM_DESC(arp_interval, "arp interval in milliseconds");
 module_param_array(arp_ip_target, charp, NULL, 0);
@@ -1434,6 +1434,8 @@  static enum netdev_lag_hash bond_lag_hash_type(struct bonding *bond,
 		return NETDEV_LAG_HASH_E23;
 	case BOND_XMIT_POLICY_ENCAP34:
 		return NETDEV_LAG_HASH_E34;
+	case BOND_XMIT_POLICY_VLAN_SRCMAC:
+		return NETDEV_LAG_HASH_VLAN_SRCMAC;
 	default:
 		return NETDEV_LAG_HASH_UNKNOWN;
 	}
@@ -3494,6 +3496,20 @@  static bool bond_flow_ip(struct sk_buff *skb, struct flow_keys *fk,
 	return true;
 }
 
+static inline u32 bond_vlan_srcmac_hash(struct sk_buff *skb)
+{
+	struct ethhdr *mac_hdr = (struct ethhdr *)skb_mac_header(skb);
+	u32 srcmac = mac_hdr->h_source[5];
+	u16 vlan;
+
+	if (!skb_vlan_tag_present(skb))
+		return srcmac;
+
+	vlan = skb_vlan_tag_get(skb);
+
+	return srcmac ^ vlan;
+}
+
 /* Extract the appropriate headers based on bond's xmit policy */
 static bool bond_flow_dissect(struct bonding *bond, struct sk_buff *skb,
 			      struct flow_keys *fk)
@@ -3501,10 +3517,14 @@  static bool bond_flow_dissect(struct bonding *bond, struct sk_buff *skb,
 	bool l34 = bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER34;
 	int noff, proto = -1;
 
-	if (bond->params.xmit_policy > BOND_XMIT_POLICY_LAYER23) {
+	switch (bond->params.xmit_policy) {
+	case BOND_XMIT_POLICY_ENCAP23:
+	case BOND_XMIT_POLICY_ENCAP34:
 		memset(fk, 0, sizeof(*fk));
 		return __skb_flow_dissect(NULL, skb, &flow_keys_bonding,
 					  fk, NULL, 0, 0, 0, 0);
+	default:
+		break;
 	}
 
 	fk->ports.ports = 0;
@@ -3556,6 +3576,9 @@  u32 bond_xmit_hash(struct bonding *bond, struct sk_buff *skb)
 	    skb->l4_hash)
 		return skb->hash;
 
+	if (bond->params.xmit_policy == BOND_XMIT_POLICY_VLAN_SRCMAC)
+		return bond_vlan_srcmac_hash(skb);
+
 	if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER2 ||
 	    !bond_flow_dissect(bond, skb, &flow))
 		return bond_eth_hash(skb);
diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
index a4e4e15f574d..deafe3587c80 100644
--- a/drivers/net/bonding/bond_options.c
+++ b/drivers/net/bonding/bond_options.c
@@ -101,6 +101,7 @@  static const struct bond_opt_value bond_xmit_hashtype_tbl[] = {
 	{ "layer2+3", BOND_XMIT_POLICY_LAYER23, 0},
 	{ "encap2+3", BOND_XMIT_POLICY_ENCAP23, 0},
 	{ "encap3+4", BOND_XMIT_POLICY_ENCAP34, 0},
+	{ "vlan+mac", BOND_XMIT_POLICY_VLAN_SRCMAC,  0},
 	{ NULL,       -1,                       0},
 };
 
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 5b949076ed23..a94ce80a2fe1 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2615,6 +2615,7 @@  enum netdev_lag_hash {
 	NETDEV_LAG_HASH_L23,
 	NETDEV_LAG_HASH_E23,
 	NETDEV_LAG_HASH_E34,
+	NETDEV_LAG_HASH_VLAN_SRCMAC,
 	NETDEV_LAG_HASH_UNKNOWN,
 };
 
diff --git a/include/uapi/linux/if_bonding.h b/include/uapi/linux/if_bonding.h
index 45f3750aa861..e8eb4ad03cf1 100644
--- a/include/uapi/linux/if_bonding.h
+++ b/include/uapi/linux/if_bonding.h
@@ -94,6 +94,7 @@ 
 #define BOND_XMIT_POLICY_LAYER23	2 /* layer 2+3 (IP ^ MAC) */
 #define BOND_XMIT_POLICY_ENCAP23	3 /* encapsulated layer 2+3 */
 #define BOND_XMIT_POLICY_ENCAP34	4 /* encapsulated layer 3+4 */
+#define BOND_XMIT_POLICY_VLAN_SRCMAC	5 /* vlan + source MAC */
 
 /* 802.3ad port state definitions (43.4.2.2 in the 802.3ad standard) */
 #define LACP_STATE_LACP_ACTIVITY   0x1