mbox series

[RFC,0/1] Allow MAC change on up interface

Message ID 20190815185702.30937-1-prestwoj@gmail.com (mailing list archive)
Headers show
Series Allow MAC change on up interface | expand

Message

James Prestwood Aug. 15, 2019, 6:57 p.m. UTC
This is an example of how a devices MAC address could be changed while
the interface is up. Currently RTNL and mac80211 both require the
interface be down before changing the MAC.

After poking around a bit I found that some drivers can actually
change the MAC while the iface is up. Allowing user space to do this
while the iface is up would elminate a few potential race conditions
that arise when changing the MAC from user space.

This commit does a few things:
 - Adds an EXT_FEATURE that user space can check to see if the driver
   allows this MAC changing.
 - Adds a new NL80211_ATTR for including a "random mac" to     
   CMD_CONNECT. This MAC is passed down the stack and gets set to 
   the net_device's address.
 - Set this wiphy extended feature in iwlwifi (just as an example)
 - Relax checks in mac80211 which check if the interface is running
 - Set IFF_LIVE_ADDR_CHANGE on net_device. Note: I know setting this
   where I did is likely not the right way to do it, but for this
   proof-of-concept it works. With guidance I can move this around
   to a proper place.

James Prestwood (1):
  RFC: allow mac address change on up iface

 .../net/wireless/intel/iwlwifi/mvm/mac80211.c |  2 ++
 include/net/cfg80211.h                        |  1 +
 include/uapi/linux/nl80211.h                  |  3 +++
 net/mac80211/iface.c                          |  5 ++---
 net/wireless/nl80211.c                        |  9 +++++++++
 net/wireless/sme.c                            | 20 +++++++++++++++++++
 6 files changed, 37 insertions(+), 3 deletions(-)

Comments

Jeff Johnson Aug. 15, 2019, 8:48 p.m. UTC | #1
On 2019-08-15 11:57, James Prestwood wrote:
>  - Adds a new NL80211_ATTR for including a "random mac" to
>    CMD_CONNECT. This MAC is passed down the stack and gets set to
>    the net_device's address.

My initial reaction is that I'd avoid using the term "random". For some
use cases the address may truly be random, but for other use cases it
may be derived, perhaps in an AP-specific manner. Other terms which may
be more appropriate are "spoofed mac", "administered mac", etc.

It is a shame that due to backward compatibility we can't use
NL80211_ATTR_MAC while shifting the current usage over to using
NL80211_ATTR_BSSID.
Toke Høiland-Jørgensen Aug. 16, 2019, 9:56 a.m. UTC | #2
Jeff Johnson <jjohnson@codeaurora.org> writes:

> On 2019-08-15 11:57, James Prestwood wrote:
>>  - Adds a new NL80211_ATTR for including a "random mac" to
>>    CMD_CONNECT. This MAC is passed down the stack and gets set to
>>    the net_device's address.
>
> My initial reaction is that I'd avoid using the term "random".

+1 - from the description I was expecting calls to get_random_bytes() in
the setter function :)

> For some use cases the address may truly be random, but for other use
> cases it may be derived, perhaps in an AP-specific manner. Other terms
> which may be more appropriate are "spoofed mac", "administered mac",
> etc.

"custom mac"?
Johannes Berg Aug. 19, 2019, 10:14 a.m. UTC | #3
On Thu, 2019-08-15 at 11:57 -0700, James Prestwood wrote:
> This is an example of how a devices MAC address could be changed while
> the interface is up. Currently RTNL and mac80211 both require the
> interface be down before changing the MAC.
> 
> After poking around a bit I found that some drivers can actually
> change the MAC while the iface is up. Allowing user space to do this
> while the iface is up would elminate a few potential race conditions
> that arise when changing the MAC from user space.
> 
> This commit does a few things:
>  - Adds an EXT_FEATURE that user space can check to see if the driver
>    allows this MAC changing.
>  - Adds a new NL80211_ATTR for including a "random mac" to     
>    CMD_CONNECT. This MAC is passed down the stack and gets set to 
>    the net_device's address.
>  - Set this wiphy extended feature in iwlwifi (just as an example)
>  - Relax checks in mac80211 which check if the interface is running
>  - Set IFF_LIVE_ADDR_CHANGE on net_device. Note: I know setting this
>    where I did is likely not the right way to do it, but for this
>    proof-of-concept it works. With guidance I can move this around
>    to a proper place.


It actually seems wrong to set IFF_LIVE_ADDR_CHANGE at all, because you
don't actually support that - you only support setting it while not
connected?

johannes
James Prestwood Aug. 19, 2019, 3:55 p.m. UTC | #4
On Mon, 2019-08-19 at 12:14 +0200, Johannes Berg wrote:
> On Thu, 2019-08-15 at 11:57 -0700, James Prestwood wrote:
> > This is an example of how a devices MAC address could be changed
> > while
> > the interface is up. Currently RTNL and mac80211 both require the
> > interface be down before changing the MAC.
> > 
> > After poking around a bit I found that some drivers can actually
> > change the MAC while the iface is up. Allowing user space to do
> > this
> > while the iface is up would elminate a few potential race
> > conditions
> > that arise when changing the MAC from user space.
> > 
> > This commit does a few things:
> >  - Adds an EXT_FEATURE that user space can check to see if the
> > driver
> >    allows this MAC changing.
> >  - Adds a new NL80211_ATTR for including a "random mac" to     
> >    CMD_CONNECT. This MAC is passed down the stack and gets set to 
> >    the net_device's address.
> >  - Set this wiphy extended feature in iwlwifi (just as an example)
> >  - Relax checks in mac80211 which check if the interface is running
> >  - Set IFF_LIVE_ADDR_CHANGE on net_device. Note: I know setting
> > this
> >    where I did is likely not the right way to do it, but for this
> >    proof-of-concept it works. With guidance I can move this around
> >    to a proper place.
> 
> 
> It actually seems wrong to set IFF_LIVE_ADDR_CHANGE at all, because
> you
> don't actually support that - you only support setting it while not
> connected?

You are right, we only care about setting the MAC while not connected.
But, the eth_ API's that set the MAC are contingent on
IFF_LIVE_ADDR_CHANGE when the interface is running. If you follow down
'dev_set_mac_address':

dev_set_mac_address ->
   ndo_set_mac_address (ieee80211_change_mac) ->
      eth_mac_addr ->
         eth_prepare_mac_addr_change:

You see the check for:

!(dev->priv_flags & IFF_LIVE_ADDR_CHANGE) && netif_running(dev)

Like I said in my commit message, I did not think setting
IFF_LIVE_ADDR_CHANGE where I did was the correct way to do it, but
unless this eth code is changed its looking like it does need to be set
somewhere to change the MAC while 'running'.

Maybe this is a historical thing but the comment about
IFF_LIVE_ADDR_CHANGE says "device supports hardware address change when
it's running". Isn't a wireless adapter 'running' when not connected?
Or does 'running' indicate some different state than up/down?

If you have any suggestions on how I could do this without setting
IFF_LIVE_ADDR_CHANGE I am all ears.

Thanks,
James

> 
> johannes
>
Johannes Berg Aug. 19, 2019, 8:20 p.m. UTC | #5
Hi James,

> > It actually seems wrong to set IFF_LIVE_ADDR_CHANGE at all, because
> > you
> > don't actually support that - you only support setting it while not
> > connected?
> 
> You are right, we only care about setting the MAC while not connected.
> But, the eth_ API's that set the MAC are contingent on
> IFF_LIVE_ADDR_CHANGE when the interface is running. If you follow down
> 'dev_set_mac_address':
> 
> dev_set_mac_address ->
>    ndo_set_mac_address (ieee80211_change_mac) ->
>       eth_mac_addr ->
>          eth_prepare_mac_addr_change:
> 
> You see the check for:
> 
> !(dev->priv_flags & IFF_LIVE_ADDR_CHANGE) && netif_running(dev)

Right.

> Like I said in my commit message, I did not think setting
> IFF_LIVE_ADDR_CHANGE where I did was the correct way to do it, but
> unless this eth code is changed its looking like it does need to be set
> somewhere to change the MAC while 'running'.

Also right.

> Maybe this is a historical thing but the comment about
> IFF_LIVE_ADDR_CHANGE says "device supports hardware address change when
> it's running". Isn't a wireless adapter 'running' when not connected?
> Or does 'running' indicate some different state than up/down?
> 
> If you have any suggestions on how I could do this without setting
> IFF_LIVE_ADDR_CHANGE I am all ears.

I don't, short of

1) don't do that then
2) extend the network stack to have IFF_LIVE_BUT_NO_CARRIER_ADDR_CHANGE
   or something like that

TBH, I'm not really sure I see any point in this to start with, many
devices will give the address to the firmware when the interface is
brought up (even iwlwifi does - I'm not sure we'd want to take your
patch for iwlwifi even if that works today, nothing says the firmware
might not change its mind on that), and so it's quite likely only going
to be supported in few devices.

You've also not really explained what exactly is troubling you with
changing the MAC address, you just mentioned some sort of "race
condition"?

Now, one thing I can imagine would be that you'd want to optimize

 * ifdown
   - remove iface from fw/hw
   - stop fw/hw
 * change MAC
 * ifup
   - start fw/hw
   - add iface to fw/hw

to just

 * ifdown
   - remove iface from fw/hw
 * change MAC
 * ifup
   - add iface to fw/hw

i.e. not restart the firmware (which takes time) for all this, but that
seems much easier to solve by e.g. having a combined operation for all
of this that gets handled in mac80211, or more generally by having a
"please keep the firmware running" token that you can hold while you do
the operation?


Your changes are also a bit strange - you modified the "connect" path
and iwlwifi, but the connect path is not usually (other than with iw or
even iwconfig) taken for iwlwifi? And if you modify auth/assoc paths,
you get into even weirder problems - what if you use different addresses
for auth and assoc? What if the assoc (or even connect) really was a
*re*assoc, and thus must have the same MAC address? To me, the whole
thing seems like more of a problem than a solution.

johannes
Denis Kenzior Aug. 19, 2019, 8:58 p.m. UTC | #6
Hi Johannes,

> TBH, I'm not really sure I see any point in this to start with, many
> devices will give the address to the firmware when the interface is
> brought up (even iwlwifi does - I'm not sure we'd want to take your
> patch for iwlwifi even if that works today, nothing says the firmware
> might not change its mind on that), and so it's quite likely only going
> to be supported in few devices.

Hmm... I sense a pattern of you not seeing a point in doing many 
things... Do you actually use the stuff you maintain?

> 
> You've also not really explained what exactly is troubling you with
> changing the MAC address, you just mentioned some sort of "race
> condition"?

Well, one possible use case might be, oh something like this:

https://source.android.com/devices/tech/connect/wifi-mac-randomization

> 
> Now, one thing I can imagine would be that you'd want to optimize
> 
>   * ifdown
>     - remove iface from fw/hw
>     - stop fw/hw
>   * change MAC
>   * ifup
>     - start fw/hw
>     - add iface to fw/hw
> 
> to just
> 
>   * ifdown
>     - remove iface from fw/hw
>   * change MAC
>   * ifup
>     - add iface to fw/hw
> 

That would be a part of it...

> i.e. not restart the firmware (which takes time) for all this, but that
> seems much easier to solve by e.g. having a combined operation for all
> of this that gets handled in mac80211, or more generally by having a
> "please keep the firmware running" token that you can hold while you do
> the operation?

And also maybe a bunch of other optimizations like not flushing scan 
results?

> 
> Your changes are also a bit strange - you modified the "connect" path
> and iwlwifi, but the connect path is not usually (other than with iw or
> even iwconfig) taken for iwlwifi? And if you modify auth/assoc paths,
> you get into even weirder problems - what if you use different addresses
> for auth and assoc? What if the assoc (or even connect) really was a
> *re*assoc, and thus must have the same MAC address? To me, the whole
> thing seems like more of a problem than a solution.
> 

Okay, so there are some obstacles.  But can you get over the whole 
"Don't hold it like this" part and actually offer up something constructive?

Regards,
-Denis
James Prestwood Aug. 19, 2019, 9:14 p.m. UTC | #7
Hi Johannes,

Without reiterating what Denis said:

<snip>

> I don't, short of
> 
> 1) don't do that then
> 2) extend the network stack to have
> IFF_LIVE_BUT_NO_CARRIER_ADDR_CHANGE
>    or something like that

So you mean 2 is my only option... ;) I am fine with this.

> 
> TBH, I'm not really sure I see any point in this to start with, many
> devices will give the address to the firmware when the interface is
> brought up (even iwlwifi does - I'm not sure we'd want to take your
> patch for iwlwifi even if that works today, nothing says the firmware
> might not change its mind on that), and so it's quite likely only
> going
> to be supported in few devices.

The iwlwifi change was just an example. It ultimately would be up to
the maintainers of each driver to support this or not. Regardless,
doing the ground work for a driver/firmware to support this is more
valuable than continuing to neglect these quirks that make use of
nl80211 difficult/racy.

> 
> You've also not really explained what exactly is troubling you with
> changing the MAC address, you just mentioned some sort of "race
> condition"?

In order to change the MAC on a per-AP/SSID is to: ifdown, change MAC,
ifup via RTNL. The problem is that ifdown generates an RTNL link down
event and there is no way of knowing how this event was generated (by
you, hot-unplug, or some other issue in kernel?). Handling this without
a race is simply not possible. You sort of just have to pray none of
this happens (and its unlikely but it *could* happen).

Besides efficiency another obvious reason for this change is simply
ease of use. If the hardware supports doing this then why should
userspace have to jump through hoops to accomplish it?

> 
> Now, one thing I can imagine would be that you'd want to optimize
> 
>  * ifdown
>    - remove iface from fw/hw
>    - stop fw/hw
>  * change MAC
>  * ifup
>    - start fw/hw
>    - add iface to fw/hw
> to just
> 
>  * ifdown
>    - remove iface from fw/hw
>  * change MAC
>  * ifup
>    - add iface to fw/hw
> 
> i.e. not restart the firmware (which takes time) for all this, but
> that
> seems much easier to solve by e.g. having a combined operation for
> all
> of this that gets handled in mac80211, or more generally by having a
> "please keep the firmware running" token that you can hold while you
> do
> the operation?
> 
> 
> Your changes are also a bit strange - you modified the "connect" path
> and iwlwifi, but the connect path is not usually (other than with iw
> or
> even iwconfig) taken for iwlwifi? And if you modify auth/assoc paths,
> you get into even weirder problems - what if you use different
> addresses
> for auth and assoc? What if the assoc (or even connect) really was a
> *re*assoc, and thus must have the same MAC address? To me, the whole
> thing seems like more of a problem than a solution.

The connect path is just what we (IWD) use for almost all types of
connections that support it (apart from things like SAE/OWE/FT). Not
sure what you mean for "usually not taken for iwlwifi"? If you have an
iwlwifi card and you issue CMD_CONNECT thats the path it takes...

Thanks,
James

> 
> johannes
>
Johannes Berg Aug. 20, 2019, 6:59 a.m. UTC | #8
Hi James,

Thanks for staying on topic.

> > I don't, short of
> > 
> > 1) don't do that then
> > 2) extend the network stack to have
> > IFF_LIVE_BUT_NO_CARRIER_ADDR_CHANGE
> >    or something like that
> 
> So you mean 2 is my only option... ;) I am fine with this.

:-)

I thought so, but I had another thought later. It might be possible to
set LIVE_ADDR_CHANGE, but then block it in mac80211 when the interface
is already connected (or beaconing, or whatever, using the MAC address
in some way - even while scanning, remain-on-channel is active, etc.)

I still think you'd have to bake it into the mac80211<->driver API
somehow, because we normally "add_interface()" with the MAC address, and
nothing says that the driver cannot ignore the MAC address from that
point on. The fact that iwlwifi just copies it into every new MAC_CTXT
command and the firmware actually accepts the update seems rather
accidental and therefore fragile to rely on.

> The iwlwifi change was just an example. It ultimately would be up to
> the maintainers of each driver to support this or not. 

Sure. I was just trying to say what I wrote one paragraph up.

> > You've also not really explained what exactly is troubling you with
> > changing the MAC address, you just mentioned some sort of "race
> > condition"?
> 
> In order to change the MAC on a per-AP/SSID is to: ifdown, change MAC,
> ifup via RTNL. The problem is that ifdown generates an RTNL link down
> event and there is no way of knowing how this event was generated (by
> you, hot-unplug, or some other issue in kernel?). Handling this without
> a race is simply not possible. You sort of just have to pray none of
> this happens (and its unlikely but it *could* happen).

I see, at least sort of. I'm having a hard time seeing how this really
is a problem in practice, but I suppose that's because I haven't tried
implementing a fully event-driven stack.

> The connect path is just what we (IWD) use for almost all types of
> connections that support it (apart from things like SAE/OWE/FT). Not
> sure what you mean for "usually not taken for iwlwifi"? If you have an
> iwlwifi card and you issue CMD_CONNECT thats the path it takes...

Interesting. I didn't think you'd do that, since it gives you far less
control over things, and you need the other paths anyway for the
features you mention, and the implementation in cfg80211 is far less
complete than a typical firmware implementation would be.

johannes
Johannes Berg Aug. 20, 2019, 8:59 a.m. UTC | #9
On Mon, 2019-08-19 at 15:58 -0500, Denis Kenzior wrote:
> Hi Johannes,

[...]

> Hmm... I sense a pattern of you not seeing a point in doing many 
> things... Do you actually use the stuff you maintain?

[...]

> Well, one possible use case might be, oh something like this:
> 
> https://source.android.com/devices/tech/connect/wifi-mac-randomization

[...]

> And also maybe a bunch of other optimizations like not flushing scan 
> results?

Stop.

Your tone, and in particular the constant snide comments and attacks on
me are, quite frankly, getting extremely tiring.

It almost seems like you're just trying to bully me into taking your
patches by constantly trying to make me feel that I cannot know better
anyway. This is not how you should be treating anyone.

Look, I did say I don't see a point in this, but you're taking that out
of context. I also stated that I didn't understand the whole thing about
"race conditions" and all, because nobody actually explained the
reasoning behind the changes here.

James, unlike you, managed to reply on point and explain why it was
needed. If all you can do is accuse me of not using the software and
therefore not knowing how it should be used, even implying that I'm not
smart enough to understand the use cases, then I don't know why you
bother replying at all.

I can understand your frustration to some extent, and I want to give you
the benefit of doubt and want to believe this behaviour was borne out of
it, since I've been reviewing your changes relatively critically.

However, I also want to believe that I've been (trying to) keep the
discussion on a technical level, telling you why I believe some things
shouldn't be done the way you did them, rather than telling you that
you're not smart enough to be working on this. If you feel otherwise,
please do let me know and I'll go back to understand, in order to
improve my behaviour in the future.

Please help keep the discussion technical, without demeaning undertones.

Thanks,
johannes
Denis Kenzior Aug. 20, 2019, 3:40 p.m. UTC | #10
Hi Johannes,

> Stop.
> 
> Your tone, and in particular the constant snide comments and attacks on
> me are, quite frankly, getting extremely tiring.
> 

Look, I'm sorry I hit a nerve, but from where I am sitting, it had to be 
said...

Regardless.  Peace, I'm not trying to start drama here.  We just want to 
improve things and it feels like you're shutting down every idea without 
truly understanding the issues we're facing.

> It almost seems like you're just trying to bully me into taking your
> patches by constantly trying to make me feel that I cannot know better
> anyway. This is not how you should be treating anyone.
> 

Before lecturing me on my tone, can you go back and re-read your 
responses to many of the contributors here?  I mean really read them? 
Your tone is quite dismissive, whether intentional or not.  When one of 
my guys comes to me and says:
	"Johannes' response was completely useless, it feels like he didn't 
even read my cover letter"

I will come out and call you on it.  So if you don't mean to come off 
that way, great.  We'll just chalk it up to a mis-understanding.

> Look, I did say I don't see a point in this, but you're taking that out
> of context. I also stated that I didn't understand the whole thing about
> "race conditions" and all, because nobody actually explained the
> reasoning behind the changes here.

Fine.  I get that.  But how about asking what the use case is? Or say 
you don't quite understand why something is needed?  We'll happily 
explain.  When the first reaction to an RFC is: "I don't see the point" 
or "You're doing it wrong" from a maintainer who's job (by definition) 
is to encourage new contributors and improve the subsystem he 
maintains...?  Well that's kind of a downer, don't you agree?  You're 
the maintainer and you should be held to a higher standard...

I maintain 3 projects, I know the job isn't great, but you still should 
be (more) civil to people...

> 
> James, unlike you, managed to reply on point and explain why it was
> needed. If all you can do is accuse me of not using the software and
> therefore not knowing how it should be used, even implying that I'm not
> smart enough to understand the use cases, then I don't know why you
> bother replying at all.

Good on James.  I council all my guys to keep cool when dealing with 
upstream.  But that doesn't mean you should be dismissing things out of 
hand on the very first interaction you have with a new contributor.

> 
> I can understand your frustration to some extent, and I want to give you
> the benefit of doubt and want to believe this behaviour was borne out of
> it, since I've been reviewing your changes relatively critically.
> 

Good.  I want you to do that.  The changes are in very tricky areas and 
you know the code best.

> However, I also want to believe that I've been (trying to) keep the
> discussion on a technical level, telling you why I believe some things
> shouldn't be done the way you did them, rather than telling you that
> you're not smart enough to be working on this. If you feel otherwise,
> please do let me know and I'll go back to understand, in order to
> improve my behaviour in the future.

If you interpreted my rants as an assault to your intelligence, then I'm 
sorry.  They really were not meant this way.  But sometimes we really 
had to wonder if you were using the same API we were?  So the question I 
asked above was purely logical consequence of what I was seeing.

You yourself admitted that you have never implemented an event driven 
stack.  So how about listening to the guys that are?

We are using your APIs in different ways.  So instead of questioning why 
or attacking those ways, how about asking whether improvements can be made?

We are facing serious pain points with the API.  So instead of 
dismissing things out of hand, can we work together to improve things?

We are trying to make things fast.  The API is frequently not setup for 
that.  The MAC randomization is just one example.  Bringing down the 
interface (and shutting down the firmware, toggling power state, 
cleaning up resources/state) prior to every connection is just not 
acceptable.  Look at the link I sent.  The Android guys state 3 seconds 
is the typical 'hit'.  This is literally wasting everyone's time.

> 
> Please help keep the discussion technical, without demeaning undertones.

Point taken.  And I apologize again.  But please consider what I said above.

Regards,
-Denis
Dan Williams Aug. 20, 2019, 5:53 p.m. UTC | #11
On Tue, 2019-08-20 at 10:40 -0500, Denis Kenzior wrote:
> Hi Johannes,
> 
> > Stop.
> > 
> > Your tone, and in particular the constant snide comments and
> > attacks on
> > me are, quite frankly, getting extremely tiring.
> > 
> 
> Look, I'm sorry I hit a nerve, but from where I am sitting, it had to
> be 
> said...

But did it really? And in that way?  There were certainly better ways
to go about that response.

I don't recall seeing a NAK anywhere his email chain (which you'd get
with some other kernel maintainers) but instead (a) an explanation of
why the proposed solution had some problems, (b) some alternative
possibilities and (c) requests for more information so the discussion
could continue.

It does the requested changes no good to take that kind of tone. Let's
move on from here and keep things constructive to solve the problem at
hand, which is:

"Changing the MAC address of a WiFi interface takes longer than I'd
like and clears some state that I'd like it to keep."

That is a technical problem we can solve, so let's keep it at that
level.

Dan

> Regardless.  Peace, I'm not trying to start drama here.  We just want
> to 
> improve things and it feels like you're shutting down every idea
> without 
> truly understanding the issues we're facing.
> 
> > It almost seems like you're just trying to bully me into taking
> > your
> > patches by constantly trying to make me feel that I cannot know
> > better
> > anyway. This is not how you should be treating anyone.
> > 
> 
> Before lecturing me on my tone, can you go back and re-read your 
> responses to many of the contributors here?  I mean really read
> them? 
> Your tone is quite dismissive, whether intentional or not.  When one
> of 
> my guys comes to me and says:
> 	"Johannes' response was completely useless, it feels like he
> didn't 
> even read my cover letter"
> 
> I will come out and call you on it.  So if you don't mean to come
> off 
> that way, great.  We'll just chalk it up to a mis-understanding.
> 
> > Look, I did say I don't see a point in this, but you're taking that
> > out
> > of context. I also stated that I didn't understand the whole thing
> > about
> > "race conditions" and all, because nobody actually explained the
> > reasoning behind the changes here.
> 
> Fine.  I get that.  But how about asking what the use case is? Or
> say 
> you don't quite understand why something is needed?  We'll happily 
> explain.  When the first reaction to an RFC is: "I don't see the
> point" 
> or "You're doing it wrong" from a maintainer who's job (by
> definition) 
> is to encourage new contributors and improve the subsystem he 
> maintains...?  Well that's kind of a downer, don't you
> agree?  You're 
> the maintainer and you should be held to a higher standard...
> 
> I maintain 3 projects, I know the job isn't great, but you still
> should 
> be (more) civil to people...
> 
> > James, unlike you, managed to reply on point and explain why it was
> > needed. If all you can do is accuse me of not using the software
> > and
> > therefore not knowing how it should be used, even implying that I'm
> > not
> > smart enough to understand the use cases, then I don't know why you
> > bother replying at all.
> 
> Good on James.  I council all my guys to keep cool when dealing with 
> upstream.  But that doesn't mean you should be dismissing things out
> of 
> hand on the very first interaction you have with a new contributor.
> 
> > I can understand your frustration to some extent, and I want to
> > give you
> > the benefit of doubt and want to believe this behaviour was borne
> > out of
> > it, since I've been reviewing your changes relatively critically.
> > 
> 
> Good.  I want you to do that.  The changes are in very tricky areas
> and 
> you know the code best.
> 
> > However, I also want to believe that I've been (trying to) keep the
> > discussion on a technical level, telling you why I believe some
> > things
> > shouldn't be done the way you did them, rather than telling you
> > that
> > you're not smart enough to be working on this. If you feel
> > otherwise,
> > please do let me know and I'll go back to understand, in order to
> > improve my behaviour in the future.
> 
> If you interpreted my rants as an assault to your intelligence, then
> I'm 
> sorry.  They really were not meant this way.  But sometimes we
> really 
> had to wonder if you were using the same API we were?  So the
> question I 
> asked above was purely logical consequence of what I was seeing.
> 
> You yourself admitted that you have never implemented an event
> driven 
> stack.  So how about listening to the guys that are?
> 
> We are using your APIs in different ways.  So instead of questioning
> why 
> or attacking those ways, how about asking whether improvements can be
> made?
> 
> We are facing serious pain points with the API.  So instead of 
> dismissing things out of hand, can we work together to improve
> things?
> 
> We are trying to make things fast.  The API is frequently not setup
> for 
> that.  The MAC randomization is just one example.  Bringing down the 
> interface (and shutting down the firmware, toggling power state, 
> cleaning up resources/state) prior to every connection is just not 
> acceptable.  Look at the link I sent.  The Android guys state 3
> seconds 
> is the typical 'hit'.  This is literally wasting everyone's time.
> 
> > Please help keep the discussion technical, without demeaning
> > undertones.
> 
> Point taken.  And I apologize again.  But please consider what I said
> above.
> 
> Regards,
> -Denis
Denis Kenzior Aug. 20, 2019, 6:21 p.m. UTC | #12
Hi Dan,

On 8/20/19 12:53 PM, Dan Williams wrote:
> On Tue, 2019-08-20 at 10:40 -0500, Denis Kenzior wrote:
>> Hi Johannes,
>>
>>> Stop.
>>>
>>> Your tone, and in particular the constant snide comments and
>>> attacks on
>>> me are, quite frankly, getting extremely tiring.
>>>
>>
>> Look, I'm sorry I hit a nerve, but from where I am sitting, it had to
>> be
>> said...
> 
> But did it really? And in that way?  There were certainly better ways
> to go about that response.

The issue is that this isn't the first such occurrence.  There is a 
pattern here and it needs to change.  So +1 on handling this better.

> 
> I don't recall seeing a NAK anywhere his email chain (which you'd get
> with some other kernel maintainers) but instead (a) an explanation of
> why the proposed solution had some problems, (b) some alternative
> possibilities and (c) requests for more information so the discussion
> could continue.

So the cover letter states:
"Set IFF_LIVE_ADDR_CHANGE on net_device. Note: I know setting this
    where I did is likely not the right way to do it, but for this
    proof-of-concept it works. With guidance I can move this around
    to a proper place."

and I'll leave it up to you to read the first response from the maintainer.

> 
> It does the requested changes no good to take that kind of tone. Let's

Neither is:
"don't do that then"

or

"I'm not really sure I see any point in this to start with"

or

"To me, the whole thing seems like more of a problem than a solution."

> move on from here and keep things constructive to solve the problem at
> hand, which is:
> 
> "Changing the MAC address of a WiFi interface takes longer than I'd
> like and clears some state that I'd like it to keep."
> 
> That is a technical problem we can solve, so let's keep it at that
> level.
> 

I'm all for moving on and having the people that know this stuff well 
giving actual guidance, as was requested originally.

Regards,
-Denis
Toke Høiland-Jørgensen Aug. 20, 2019, 6:54 p.m. UTC | #13
Denis Kenzior <denkenz@gmail.com> writes:

> Hi Dan,
>
> On 8/20/19 12:53 PM, Dan Williams wrote:
>> On Tue, 2019-08-20 at 10:40 -0500, Denis Kenzior wrote:
>>> Hi Johannes,
>>>
>>>> Stop.
>>>>
>>>> Your tone, and in particular the constant snide comments and
>>>> attacks on
>>>> me are, quite frankly, getting extremely tiring.
>>>>
>>>
>>> Look, I'm sorry I hit a nerve, but from where I am sitting, it had to
>>> be
>>> said...
>> 
>> But did it really? And in that way?  There were certainly better ways
>> to go about that response.
>
> The issue is that this isn't the first such occurrence.  There is a 
> pattern here and it needs to change.  So +1 on handling this better.
>
>> 
>> I don't recall seeing a NAK anywhere his email chain (which you'd get
>> with some other kernel maintainers) but instead (a) an explanation of
>> why the proposed solution had some problems, (b) some alternative
>> possibilities and (c) requests for more information so the discussion
>> could continue.
>
> So the cover letter states:
> "Set IFF_LIVE_ADDR_CHANGE on net_device. Note: I know setting this
>     where I did is likely not the right way to do it, but for this
>     proof-of-concept it works. With guidance I can move this around
>     to a proper place."
>
> and I'll leave it up to you to read the first response from the
> maintainer.

I went back and re-read the whole thread, and all I see is Johannes and
James having a technical discussion, and you barging in with accusations
of bad faith. So yeah, going to agree with Dan here; you were not
"hitting a nerve", you were just being rude.

-Toke
Denis Kenzior Aug. 20, 2019, 7:22 p.m. UTC | #14
Hi Johannes,

So keeping things purely technical now :)

> I thought so, but I had another thought later. It might be possible to
> set LIVE_ADDR_CHANGE, but then block it in mac80211 when the interface
> is already connected (or beaconing, or whatever, using the MAC address
> in some way - even while scanning, remain-on-channel is active, etc.)
> 

Here's what we would like to see:

- The ability for userspace to add a 'Local Mac Address to use' 
attribute on CMD_CONNECT and CMD_AUTHENTICATE.
- It doesn't seem useful to add this to CMD_ASSOCIATE at the moment, as 
for new connections you'd always go either through CMD_AUTHENTICATE, 
CMD_ASSOCIATE sequence or use CMD_CONNECT.  This should take care of 
some (most) of your objections about specifying different addresses for 
authenticate/associate.  Feel free to correct me here.
- For Fast Transitions, Re-associations, roaming, etc userspace would 
simply omit the 'Local Mac Address to use' attribute and the currently 
set address would be used.
- Optionally (and I'm thinking of tools like NM here), add the ability 
to set the mac address via RTNL while the device is UP but has no 
carrier, and things like scanning, offchannel, etc are not in progress. 
Though really I don't see how NM could guarantee this without bringing 
the device down first, so I'll let NM guys chime in to see if this is a 
good idea.
- We definitely do not want to to mess with the device state otherwise. 
E.g. no firmware downloading, powering the adapter down, etc.  That just 
takes too much time.

The goal here is to keep things as fast as possible.  The randomization 
feature is basically standard on every modern OS.  So users expect it to 
be used on every connection.  Slowing down the connection time by up to 
3 seconds just isn't acceptable.

So tell us what you would like to see.  A new 
IFF_NO_CARRIER_ADDRESS_CHANGE or whether it is possible to use 
IFF_LIVE_ADDR_CHANGE with some additional restrictions.

> I still think you'd have to bake it into the mac80211<->driver API
> somehow, because we normally "add_interface()" with the MAC address, and
> nothing says that the driver cannot ignore the MAC address from that
> point on. The fact that iwlwifi just copies it into every new MAC_CTXT
> command and the firmware actually accepts the update seems rather
> accidental and therefore fragile to rely on.
> 

Since you seem to have a clear(er?) idea here, can you elaborate or 
possibly provide the driver interface changes you want to see?

Regards,
-Denis
Johannes Berg Aug. 20, 2019, 7:32 p.m. UTC | #15
Hi Denis,

Rather than replying to all the separate items here, just two things:

1) I'll reiterate - please keep things civil. You're taking things out
   of context a *lot* here, in what seems like an attempt to draw a
   parallel between my and your utterances.

2) I'll take your point that I've been somewhat dismissive in tone, and
   will try to change that.


I do want to reply to two things specifically though:

> Fine.  I get that.  But how about asking what the use case is? Or say 
> you don't quite understand why something is needed?  

Really, I should *not* have to ask that. You should consider it your
obligation to provide enough information for me to review patches
without having to go back and ask what it is you actually want to
achieve.

Compared to some other subsystems and maintainers I've dealt with, I
think I've actually been rather patient in trying to extract the purpose
of your changes, rather than *really* just dismissing them (which I've
felt like many times.)

> a maintainer who's job (by definition) 
> is to encourage new contributors and improve the subsystem he 
> maintains...?

This is what maybe you see as the maintainer's role, but I disagree, at
least to some extent. I see the role more of a supervisor, somebody
who's ensuring that all the changes coming in will work together. Yes, I
have my own incentive to improve things, but if you have a particular
incentive to improve a particular use case, the onus really is on you to
convince me of that, not on me to try to ferret the reasoning out of you
and make those improvements my own.


So please - come with some actual reasoning. This particular thread only
offered "would elminate a few potential race conditions", in the cover
letter, not even the patch itself, so it wasn't very useful. I was
perhaps less than courteous trying to extract the reasoning, but I
shouldn't have to in the first place.

Thanks,
johannes
Johannes Berg Aug. 20, 2019, 7:43 p.m. UTC | #16
On Tue, 2019-08-20 at 14:22 -0500, Denis Kenzior wrote:
> Hi Johannes,
> 
> So keeping things purely technical now :)
> 
> > I thought so, but I had another thought later. It might be possible to
> > set LIVE_ADDR_CHANGE, but then block it in mac80211 when the interface
> > is already connected (or beaconing, or whatever, using the MAC address
> > in some way - even while scanning, remain-on-channel is active, etc.)
> > 
> 
> Here's what we would like to see:
> 
> - The ability for userspace to add a 'Local Mac Address to use' 
> attribute on CMD_CONNECT and CMD_AUTHENTICATE.

Why here though? I don't really like this as it's extra complexity
there, and dev_set_mac_address() is really easy to call from userspace.
Yes, that's sort of a round-trip more (you wouldn't even really have to
wait for it I guess), but we have to make trade-offs like that (vs.
kernel complexity) at some point.

> - It doesn't seem useful to add this to CMD_ASSOCIATE at the moment, as 
> for new connections you'd always go either through CMD_AUTHENTICATE, 
> CMD_ASSOCIATE sequence or use CMD_CONNECT.  This should take care of 
> some (most) of your objections about specifying different addresses for 
> authenticate/associate.  Feel free to correct me here.

That wasn't really my objection, I was more wondering why James
implemented it *only* for CMD_CONNECT, when I had been under the
(apparently mistaken) impression that one would generally prefer
CMD_ASSOCIATE over CMD_CONNECT, if available.

> - Optionally (and I'm thinking of tools like NM here), add the ability 
> to set the mac address via RTNL while the device is UP but has no 
> carrier, and things like scanning, offchannel, etc are not in progress. 
> Though really I don't see how NM could guarantee this without bringing 
> the device down first, so I'll let NM guys chime in to see if this is a 
> good idea.

I'm thinking along the lines of letting you do this *instead* of the
scheme you described above. That way, we don't even need to have a
discussion over whether it makes sense at CONNECT, AUTH, ASSOC, or
whatever because you can essentially do it before/with any of these
commands.

Why would this not be sufficient?

> - We definitely do not want to to mess with the device state otherwise. 
> E.g. no firmware downloading, powering the adapter down, etc.  That just 
> takes too much time.

I can understand this part.

> So tell us what you would like to see.  A new 
> IFF_NO_CARRIER_ADDRESS_CHANGE or whether it is possible to use 
> IFF_LIVE_ADDR_CHANGE with some additional restrictions.

I don't know. This is not something I'm intimately familiar with. I
could imagine (as you quoted above) that we just set
IFF_LIVE_ADDR_CHANGE and manage the exclusions local to e.g. mac80211.

> > I still think you'd have to bake it into the mac80211<->driver API
> > somehow, because we normally "add_interface()" with the MAC address, and
> > nothing says that the driver cannot ignore the MAC address from that
> > point on. The fact that iwlwifi just copies it into every new MAC_CTXT
> > command and the firmware actually accepts the update seems rather
> > accidental and therefore fragile to rely on.
> > 
> 
> Since you seem to have a clear(er?) idea here, can you elaborate or 
> possibly provide the driver interface changes you want to see?

I can see a few ways of doing this, for example having an optional
"change_vif_addr" method in the mac80211 ops, so that drivers can do the
necessary work. I suppose iwlwifi would actually want to send a new
MAC_CONTEXT command at this time, for example, because the firmware is
notoriously finicky when it comes to command sequences.

Alternatively, and this would work with all drivers, you could just
pretend to remove/add the interface, ie. in mac80211 call

 drv_remove_interface(sdata)
 // set new mac addr
 drv_add_interface(sdata)

This has the advantage that it'll be guaranteed to work with all
drivers, at the expense of perhaps a few more firmware commands
(depending on the driver).

You can probably come up with other ways, like having a feature flag
whether this is supported and then the driver has to detect it as
certain documented points in time, etc., but all of those feel
relatively fragile to me.


My personal preference would be for

 * allow RTNL MAC address changes at "idle" times (TBD what that really
   means) with IFF_LIVE_ADDR_CHANGE, but mac80211 guarding it
 * mac80211 doing remove/add as far as the driver is concerned

Yes, you can probably shave a few more milliseconds off by adding more
complexity, but unless we measure that and find it to be significant, I
think the added complexity (in cfg80211 code) and restrictions (many
drivers not supporting it if it's opt-in) wouldn't be worth it.

johannes
Denis Kenzior Aug. 20, 2019, 7:46 p.m. UTC | #17
Hi Johannes,

On 8/20/19 2:32 PM, Johannes Berg wrote:
> Hi Denis,
> 
> Rather than replying to all the separate items here, just two things:
> 
> 1) I'll reiterate - please keep things civil. You're taking things out
>     of context a *lot* here, in what seems like an attempt to draw a
>     parallel between my and your utterances.
> 
> 2) I'll take your point that I've been somewhat dismissive in tone, and
>     will try to change that.
> 

I'll apologize for the methods I used, but you were not getting to 2) 
above via our earlier discussions.  Anyway, peace.

> 
> I do want to reply to two things specifically though:
> 
>> Fine.  I get that.  But how about asking what the use case is? Or say
>> you don't quite understand why something is needed?
> 
> Really, I should *not* have to ask that. You should consider it your
> obligation to provide enough information for me to review patches
> without having to go back and ask what it is you actually want to
> achieve.

So let me ask you this.  What do you _want_ to see when a contributor 
submits something as an RFC, which that contributor states is not ready 
for 'true' review and is really there to generate feedback?

Do you want to have that contributor use a different prefix? Every 
maintainer is differrent.  I get that.  So if we want to start an actual 
brainstorming session with you, how do we accomplish that?

> 
> Compared to some other subsystems and maintainers I've dealt with, I
> think I've actually been rather patient in trying to extract the purpose
> of your changes, rather than *really* just dismissing them (which I've
> felt like many times.)
> 

I'll admit you're not the worst I've dealt with, but you can always 
improve, right? :)

>> a maintainer who's job (by definition)
>> is to encourage new contributors and improve the subsystem he
>> maintains...?
> 
> This is what maybe you see as the maintainer's role, but I disagree, at
> least to some extent. I see the role more of a supervisor, somebody
> who's ensuring that all the changes coming in will work together. Yes, I
> have my own incentive to improve things, but if you have a particular
> incentive to improve a particular use case, the onus really is on you to
> convince me of that, not on me to try to ferret the reasoning out of you
> and make those improvements my own.
> 

So this goes back to my earlier point.  How do you want us to start a 
discussion with you such that you don't become a 'supervisor' and 
instead try to understand the pain points first?

And really, we are not expecting you to do the improvements on your own. 
  But you know the subsystem best.  So you really should consider giving 
actual guidance on how to accomplish something in the best way.

Also, look at it from the PoV of any new contributor.  So while I can 
definitely relate to what you're saying here, I think you should put 
yourself in your peer's shoes and try to be more understanding of their 
perspective.  At least make the effort to hear people out...

> 
> So please - come with some actual reasoning. This particular thread only
> offered "would elminate a few potential race conditions", in the cover
> letter, not even the patch itself, so it wasn't very useful. I was
> perhaps less than courteous trying to extract the reasoning, but I
> shouldn't have to in the first place.
> 

Okay, so we'll work on that.  But this is a two way street too.  And 
sometimes it seems like you're not actually reading the cover letters ;)

Regards,
-Denis
James Prestwood Aug. 20, 2019, 7:53 p.m. UTC | #18
On Tue, 2019-08-20 at 08:59 +0200, Johannes Berg wrote:
> Hi James,
> 
> Thanks for staying on topic.
> 
> > > I don't, short of
> > > 
> > > 1) don't do that then
> > > 2) extend the network stack to have
> > > IFF_LIVE_BUT_NO_CARRIER_ADDR_CHANGE
> > >    or something like that
> > 
> > So you mean 2 is my only option... ;) I am fine with this.
> 
> :-)
> 
> I thought so, but I had another thought later. It might be possible
> to
> set LIVE_ADDR_CHANGE, but then block it in mac80211 when the
> interface
> is already connected (or beaconing, or whatever, using the MAC
> address
> in some way - even while scanning, remain-on-channel is active, etc.)

Yeah that makes sense.

> 
> I still think you'd have to bake it into the mac80211<->driver API
> somehow, because we normally "add_interface()" with the MAC address,
> and
> nothing says that the driver cannot ignore the MAC address from that
> point on. The fact that iwlwifi just copies it into every new
> MAC_CTXT
> command and the firmware actually accepts the update seems rather
> accidental and therefore fragile to rely on.

I havent looked into the actual drivers WRT add_interface so I'll take
a look. But I think I see the separation now and why it may not work
for all drivers/firmwares the way I did it.

So are you thinking we need another driver method:
"change_mac/set_mac"?

> 
> > The iwlwifi change was just an example. It ultimately would be up
> > to
> > the maintainers of each driver to support this or not. 
> 
> Sure. I was just trying to say what I wrote one paragraph up.
> 
> > > You've also not really explained what exactly is troubling you
> > > with
> > > changing the MAC address, you just mentioned some sort of "race
> > > condition"?
> > 
> > In order to change the MAC on a per-AP/SSID is to: ifdown, change
> > MAC,
> > ifup via RTNL. The problem is that ifdown generates an RTNL link
> > down
> > event and there is no way of knowing how this event was generated
> > (by
> > you, hot-unplug, or some other issue in kernel?). Handling this
> > without
> > a race is simply not possible. You sort of just have to pray none
> > of
> > this happens (and its unlikely but it *could* happen).
> 
> I see, at least sort of. I'm having a hard time seeing how this
> really
> is a problem in practice, but I suppose that's because I haven't
> tried
> implementing a fully event-driven stack.
> 
> > The connect path is just what we (IWD) use for almost all types of
> > connections that support it (apart from things like SAE/OWE/FT).
> > Not
> > sure what you mean for "usually not taken for iwlwifi"? If you have
> > an
> > iwlwifi card and you issue CMD_CONNECT thats the path it takes...
> 
> Interesting. I didn't think you'd do that, since it gives you far
> less
> control over things, and you need the other paths anyway for the
> features you mention, and the implementation in cfg80211 is far less
> complete than a typical firmware implementation would be.

There was talk about ditching CMD_CONNECT at one point, but for the
most part it does everything we need for the majority of connections.

But ultimately yes I think we do want this feature for
CMD_AUTHENTICATE/ASSOCIATE, and those details may be a bit more
involved. CMD_CONNECT was just an easier way to get my foot in the door
:)

Thanks,
James
> 
> johannes
>
Denis Kenzior Aug. 20, 2019, 7:58 p.m. UTC | #19
Hi Johannes,

On 8/20/19 2:43 PM, Johannes Berg wrote:
> On Tue, 2019-08-20 at 14:22 -0500, Denis Kenzior wrote:
>> Hi Johannes,
>>
>> So keeping things purely technical now :)
>>
>>> I thought so, but I had another thought later. It might be possible to
>>> set LIVE_ADDR_CHANGE, but then block it in mac80211 when the interface
>>> is already connected (or beaconing, or whatever, using the MAC address
>>> in some way - even while scanning, remain-on-channel is active, etc.)
>>>
>>
>> Here's what we would like to see:
>>
>> - The ability for userspace to add a 'Local Mac Address to use'
>> attribute on CMD_CONNECT and CMD_AUTHENTICATE.
> 
> Why here though? I don't really like this as it's extra complexity
> there, and dev_set_mac_address() is really easy to call from userspace.
> Yes, that's sort of a round-trip more (you wouldn't even really have to
> wait for it I guess), but we have to make trade-offs like that (vs.
> kernel complexity) at some point.

But what actual complexity are we talking about here? If the kernel can 
do this while the CONNECT is pending, why not?  It makes things simpler 
and faster for userspace.  I don't see the downside unless you can 
somehow objectively explain 'complexity'.

> 
>> - It doesn't seem useful to add this to CMD_ASSOCIATE at the moment, as
>> for new connections you'd always go either through CMD_AUTHENTICATE,
>> CMD_ASSOCIATE sequence or use CMD_CONNECT.  This should take care of
>> some (most) of your objections about specifying different addresses for
>> authenticate/associate.  Feel free to correct me here.
> 
> That wasn't really my objection, I was more wondering why James
> implemented it *only* for CMD_CONNECT, when I had been under the
> (apparently mistaken) impression that one would generally prefer
> CMD_ASSOCIATE over CMD_CONNECT, if available.

This was an RFC.  There isn't much point for us to cross all the 't's 
and dot all the 'i's if you hate the idea in the first place.

> 
>> - Optionally (and I'm thinking of tools like NM here), add the ability
>> to set the mac address via RTNL while the device is UP but has no
>> carrier, and things like scanning, offchannel, etc are not in progress.
>> Though really I don't see how NM could guarantee this without bringing
>> the device down first, so I'll let NM guys chime in to see if this is a
>> good idea.
> 
> I'm thinking along the lines of letting you do this *instead* of the
> scheme you described above. That way, we don't even need to have a
> discussion over whether it makes sense at CONNECT, AUTH, ASSOC, or
> whatever because you can essentially do it before/with any of these
> commands.
> 
> Why would this not be sufficient?
> 

It would get the job done.  But it is still a waste of time and still 
slowing us down.  Look at it this way, even if we save 10ms here.  Take 
that and multiply by 3-4 billion devices and then by the number of 
connections one does each day.  This adds up to some serious time 
wasted.  So why not do this elegantly and faster in the first place?

>> - We definitely do not want to to mess with the device state otherwise.
>> E.g. no firmware downloading, powering the adapter down, etc.  That just
>> takes too much time.
> 
> I can understand this part.
> 
>> So tell us what you would like to see.  A new
>> IFF_NO_CARRIER_ADDRESS_CHANGE or whether it is possible to use
>> IFF_LIVE_ADDR_CHANGE with some additional restrictions.
> 
> I don't know. This is not something I'm intimately familiar with. I
> could imagine (as you quoted above) that we just set
> IFF_LIVE_ADDR_CHANGE and manage the exclusions local to e.g. mac80211.
> 

Okay, so lets operate under the assumption we can hi-jack 
IFF_LIVE_ADDR_CHANGE for this

>>> I still think you'd have to bake it into the mac80211<->driver API
>>> somehow, because we normally "add_interface()" with the MAC address, and
>>> nothing says that the driver cannot ignore the MAC address from that
>>> point on. The fact that iwlwifi just copies it into every new MAC_CTXT
>>> command and the firmware actually accepts the update seems rather
>>> accidental and therefore fragile to rely on.
>>>
>>
>> Since you seem to have a clear(er?) idea here, can you elaborate or
>> possibly provide the driver interface changes you want to see?
> 
> I can see a few ways of doing this, for example having an optional
> "change_vif_addr" method in the mac80211 ops, so that drivers can do the
> necessary work. I suppose iwlwifi would actually want to send a new
> MAC_CONTEXT command at this time, for example, because the firmware is
> notoriously finicky when it comes to command sequences.
> 
> Alternatively, and this would work with all drivers, you could just
> pretend to remove/add the interface, ie. in mac80211 call
> 
>   drv_remove_interface(sdata)
>   // set new mac addr
>   drv_add_interface(sdata)
> 
> This has the advantage that it'll be guaranteed to work with all
> drivers, at the expense of perhaps a few more firmware commands
> (depending on the driver).
> 
> You can probably come up with other ways, like having a feature flag
> whether this is supported and then the driver has to detect it as
> certain documented points in time, etc., but all of those feel
> relatively fragile to me.
> 
> 
> My personal preference would be for
> 
>   * allow RTNL MAC address changes at "idle" times (TBD what that really
>     means) with IFF_LIVE_ADDR_CHANGE, but mac80211 guarding it
>   * mac80211 doing remove/add as far as the driver is concerned

Okay, so that's really what we wanted from you.  :)

> 
> Yes, you can probably shave a few more milliseconds off by adding more
> complexity, but unless we measure that and find it to be significant, I
> think the added complexity (in cfg80211 code) and restrictions (many
> drivers not supporting it if it's opt-in) wouldn't be worth it.
> 

So we have a tool in the works that can measure some of these details. 
Also, we can simply attempt to implement both ways and see if the 
complexity is as bad as you say.  Then you can make that choice.

Regards,
-Denis
Johannes Berg Aug. 20, 2019, 8:01 p.m. UTC | #20
Hi,

> So let me ask you this.  What do you _want_ to see when a contributor 
> submits something as an RFC, which that contributor states is not ready 
> for 'true' review and is really there to generate feedback?
> 
> Do you want to have that contributor use a different prefix? Every 
> maintainer is differrent.  I get that.  So if we want to start an actual 
> brainstorming session with you, how do we accomplish that?

I'd be happy if you were to just state what you want to achieve, for
starters! Whether it's [RFC] with a cover letter saying "hey, we feel
this could be faster, and have come up with the following as an idea",
or just another email without any code changes saying "hey, we feel this
could be faster, how do you think we could do this" ... doesn't really
matter.

What you have *actually* been doing though (at least from my
perspective) is something along the lines of

"hey, here's a new way to do <X>"

without even really stating why the existing <X> doesn't work for you.
And where I questioned the need to have a new way to do <X>, well, we
got the other part of this thread.

> So this goes back to my earlier point.  How do you want us to start a 
> discussion with you such that you don't become a 'supervisor' and 
> instead try to understand the pain points first?

Just explaining the pain points would help?

johannes
Johannes Berg Aug. 20, 2019, 8:06 p.m. UTC | #21
On Tue, 2019-08-20 at 15:53 -0400, James Prestwood wrote:

> > I thought so, but I had another thought later. It might be possible
> > to
> > set LIVE_ADDR_CHANGE, but then block it in mac80211 when the
> > interface
> > is already connected (or beaconing, or whatever, using the MAC
> > address
> > in some way - even while scanning, remain-on-channel is active, etc.)
> 
> Yeah that makes sense.
> 
> > I still think you'd have to bake it into the mac80211<->driver API
> > somehow, because we normally "add_interface()" with the MAC address,
> > and
> > nothing says that the driver cannot ignore the MAC address from that
> > point on. The fact that iwlwifi just copies it into every new
> > MAC_CTXT
> > command and the firmware actually accepts the update seems rather
> > accidental and therefore fragile to rely on.
> 
> I havent looked into the actual drivers WRT add_interface so I'll take
> a look. But I think I see the separation now and why it may not work
> for all drivers/firmwares the way I did it.
> 
> So are you thinking we need another driver method:
> "change_mac/set_mac"?

Perhaps. Let's continue that in the other sub-thread, where I replied in
more detail to Denis.

johannes
Johannes Berg Aug. 20, 2019, 8:15 p.m. UTC | #22
On Tue, 2019-08-20 at 14:58 -0500, Denis Kenzior wrote:
> 
> But what actual complexity are we talking about here? If the kernel can 
> do this while the CONNECT is pending, why not?  It makes things simpler 
> and faster for userspace.  I don't see the downside unless you can 
> somehow objectively explain 'complexity'.

It's just extra code that we have to worry about. Right now you want it
for CMD_CONNECT and CMD_AUTH. Somebody will come up with a reason to do
it in CMD_ASSOC next, perhaps, who knows. Somebody else will say "oh,
this is how it's done, so let's add it to CMD_JOIN_IBSS", because of
course that's what they care about. OCB? Mesh? AP mode for tethering?
Etc.

I don't see how this will not keep proliferating, and each new thing
will come with its own dozen lines of code, a new feature flag, etc.

Relaxing and defining once and for all in which situations while the
interface is up you can actually allow changing the address, and then
having userspace do it that way is IMHO a better way to design the
system, since it forgoes entirely all those questions of when and how
and which new use cases will come up etc.

> This was an RFC.  There isn't much point for us to cross all the 't's 
> and dot all the 'i's if you hate the idea in the first place.

Sure, but I cannot distinguish between "we only want it on CMD_CONNECT"
and "we'll extend this once we agree" unless you actually say so. It'd
help to communicate which t's and i's you didn't cross or dot.

> It would get the job done.  But it is still a waste of time and still 
> slowing us down.  Look at it this way, even if we save 10ms here.  Take 
> that and multiply by 3-4 billion devices and then by the number of 
> connections one does each day.  This adds up to some serious time 
> wasted.  So why not do this elegantly and faster in the first place?

It may be a bit faster, but I don't agree with elegantly. It may be more
elegant from the point of view of that single operation, but to me it's
a lot less elegant from a system view, with all the possible extensions
to it that we'll no doubt see, and not have thought about today.

johannes
Denis Kenzior Aug. 20, 2019, 8:37 p.m. UTC | #23
Hi Johannes,

On 8/20/19 3:15 PM, Johannes Berg wrote:
> On Tue, 2019-08-20 at 14:58 -0500, Denis Kenzior wrote:
>>
>> But what actual complexity are we talking about here? If the kernel can
>> do this while the CONNECT is pending, why not?  It makes things simpler
>> and faster for userspace.  I don't see the downside unless you can
>> somehow objectively explain 'complexity'.
> 
> It's just extra code that we have to worry about. Right now you want it
> for CMD_CONNECT and CMD_AUTH. Somebody will come up with a reason to do
> it in CMD_ASSOC next, perhaps, who knows. Somebody else will say "oh,
> this is how it's done, so let's add it to CMD_JOIN_IBSS", because of
> course that's what they care about. OCB? Mesh? AP mode for tethering?
> Etc.

I don't buy the extra code argument.  If you want to do something useful 
you need to write 'extra code'.

The rest, I'm not sure why you are worried about them now?  For station 
there's a very clear & present use case.  If such a clear and present 
use case is presented for AP or Mesh, then deal with it then.

> 
> I don't see how this will not keep proliferating, and each new thing
> will come with its own dozen lines of code, a new feature flag, etc.

Such is life? :)

> 
> Relaxing and defining once and for all in which situations while the
> interface is up you can actually allow changing the address, and then
> having userspace do it that way is IMHO a better way to design the
> system, since it forgoes entirely all those questions of when and how
> and which new use cases will come up etc.
> 

That would be great in theory, but practically never works at least in 
my experience.  So maybe keep and open mind?  There is a clear need to 
make this path as fast as possible for STA.  There is no such need (yet) 
for the other cases you mentioned.

>> This was an RFC.  There isn't much point for us to cross all the 't's
>> and dot all the 'i's if you hate the idea in the first place.
> 
> Sure, but I cannot distinguish between "we only want it on CMD_CONNECT"
> and "we'll extend this once we agree" unless you actually say so. It'd
> help to communicate which t's and i's you didn't cross or dot.

Okay, I'll admit the RFC description could have been better.  But in the 
end you're human last I checked (at least I recall meeting you several 
times? ;)  How about a simple "Why do you think you need this?" first?

Regards,
-Denis
Dan Williams Aug. 20, 2019, 9:18 p.m. UTC | #24
On Tue, 2019-08-20 at 15:37 -0500, Denis Kenzior wrote:
> Hi Johannes,
> 
> On 8/20/19 3:15 PM, Johannes Berg wrote:
> > On Tue, 2019-08-20 at 14:58 -0500, Denis Kenzior wrote:
> > > But what actual complexity are we talking about here? If the
> > > kernel can
> > > do this while the CONNECT is pending, why not?  It makes things
> > > simpler
> > > and faster for userspace.  I don't see the downside unless you
> > > can
> > > somehow objectively explain 'complexity'.
> > 
> > It's just extra code that we have to worry about. Right now you
> > want it
> > for CMD_CONNECT and CMD_AUTH. Somebody will come up with a reason
> > to do
> > it in CMD_ASSOC next, perhaps, who knows. Somebody else will say
> > "oh,
> > this is how it's done, so let's add it to CMD_JOIN_IBSS", because
> > of
> > course that's what they care about. OCB? Mesh? AP mode for
> > tethering?
> > Etc.
> 
> I don't buy the extra code argument.  If you want to do something
> useful 
> you need to write 'extra code'.

Code will be written, but I'd rather it be written once rather than 3+
times for STA/AP/Mesh/etc.

> The rest, I'm not sure why you are worried about them now?  For
> station 
> there's a very clear & present use case.  If such a clear and
> present 
> use case is presented for AP or Mesh, then deal with it then.

Why would you not want to pass a random MAC for AP or Mesh modes? The
same reasons for MAC randomization apply for all those too, I'd think.

> > I don't see how this will not keep proliferating, and each new
> > thing
> > will come with its own dozen lines of code, a new feature flag,
> > etc.
> 
> Such is life? :)

Not really. It's the job of maintainers to balance all these things, to
step back and think of the bigger picture and the future rather than
just solving one particular use-case today.

Your tone leaves the impression you want a particular solution pushed
through without the normal planning/architecture discussions that 
accompany API changes. And that's not how the process typically works.

Dan

> > Relaxing and defining once and for all in which situations while
> > the
> > interface is up you can actually allow changing the address, and
> > then
> > having userspace do it that way is IMHO a better way to design the
> > system, since it forgoes entirely all those questions of when and
> > how
> > and which new use cases will come up etc.
> > 
> 
> That would be great in theory, but practically never works at least
> in 
> my experience.  So maybe keep and open mind?  There is a clear need
> to 
> make this path as fast as possible for STA.  There is no such need
> (yet) 
> for the other cases you mentioned.

> > > This was an RFC.  There isn't much point for us to cross all the
> > > 't's
> > > and dot all the 'i's if you hate the idea in the first place.
> > 
> > Sure, but I cannot distinguish between "we only want it on
> > CMD_CONNECT"
> > and "we'll extend this once we agree" unless you actually say so.
> > It'd
> > help to communicate which t's and i's you didn't cross or dot.
> 
> Okay, I'll admit the RFC description could have been better.  But in
> the 
> end you're human last I checked (at least I recall meeting you
> several 
> times? ;)  How about a simple "Why do you think you need this?"
> first?
> 
> Regards,
> -Denis
Denis Kenzior Aug. 20, 2019, 9:52 p.m. UTC | #25
Hi Dan,

On 8/20/19 4:18 PM, Dan Williams wrote:

<snip>

> 
> Code will be written, but I'd rather it be written once rather than 3+
> times for STA/AP/Mesh/etc.
> 

I'm not sure you can state that definitively just yet?  So the real 
question is whether saving the extra round-trip to the kernel is worth 
the in-kernel complexity.  Given that interleaved system calls are 
_always_ a problem, I argue that it is worth it for at least the Station 
case (and it will keep connection times even faster to boot).  Isn't 
minimizing the latency of connections the end goal here?  I get that 
there are trade offs and people have other opinions on what a good trade 
off is.

But don't misunderstand, either solution is better than what we have 
today.  My argument is: "why close the door on a particular solution 
until the costs are known?"

>> The rest, I'm not sure why you are worried about them now?  For
>> station
>> there's a very clear & present use case.  If such a clear and
>> present
>> use case is presented for AP or Mesh, then deal with it then.
> 
> Why would you not want to pass a random MAC for AP or Mesh modes? The
> same reasons for MAC randomization apply for all those too, I'd think.

Umm, I was not arguing against doing that at all?  All I said was that 
no such use case was yet presented.  For AP it isn't typically needed to 
rapidly switch between MAC addresses while keeping the device UP.  If 
you think there's such a need, I'm happy to learn something new? Same 
goes for Mesh really?

> 
>>> I don't see how this will not keep proliferating, and each new
>>> thing
>>> will come with its own dozen lines of code, a new feature flag,
>>> etc.
>>
>> Such is life? :)
> 
> Not really. It's the job of maintainers to balance all these things, to
> step back and think of the bigger picture and the future rather than
> just solving one particular use-case today.
>  > Your tone leaves the impression you want a particular solution pushed
> through without the normal planning/architecture discussions that
> accompany API changes. And that's not how the process typically works.
> 

So who's attacking who now?  We're trying to solve a long standing issue 
that nobody has bothered to fix for years in a clean way.  Something 
that one of your projects would benefit from, btw.

I have a technical opinion about how it should look like.  Johannes 
might have a different opinion.  In the end it is up to him and I can go 
pound sand.  So yes, I know how the process works ;)

Regards,
-Denis
Johannes Berg Aug. 21, 2019, 7:21 a.m. UTC | #26
On Tue, 2019-08-20 at 16:52 -0500, Denis Kenzior wrote:

> I'm not sure you can state that definitively just yet?  

That's not an argument, you also cannot state definitively that it will
not happen. But yes, I'd think it _is_ in fact likely to happen at some
point for something new, maybe it won't be IBSS but NAN, or something
new that we can't even consider yet. Why close the door on it?

> So the real 
> question is whether saving the extra round-trip to the kernel is worth 
> the in-kernel complexity.

Sort of.

>   Given that interleaved system calls are _always_ a problem

They're not interleaved, they're just serial. And seriously, if syscalls
were such a big problem, we wouldn't be using them, we'd have found a
better solution.

(and FWIW, I don't feel you're really arguing in good faith here. You're
just throwing out statements like that and dismissing all other
arguments with no good explanation.)

> , I argue that it is worth it for at least the Station 
> case (and it will keep connection times even faster to boot).  Isn't 
> minimizing the latency of connections the end goal here?  I get that 
> there are trade offs and people have other opinions on what a good trade 
> off is.

That's ridiculous. If I extrapolate that statement the logical
consequence is that you should put iwd into a kernel module. You're not
doing that, last I checked? So minimizing the latency of connections is
quite clearly not the only goal here.

I understand that you're interested in minimizing the latency of
connections. I can even agree that it's a worthwhile goal. But it cannot
be the only goal.

> But don't misunderstand, either solution is better than what we have 
> today.  My argument is: "why close the door on a particular solution 
> until the costs are known?"

That's not actually what you said.

> > Not really. It's the job of maintainers to balance all these things, to
> > step back and think of the bigger picture and the future rather than
> > just solving one particular use-case today.
> >  > Your tone leaves the impression you want a particular solution pushed
> > through without the normal planning/architecture discussions that
> > accompany API changes. And that's not how the process typically works.
> > 
> 
> So who's attacking who now?  We're trying to solve a long standing issue 
> that nobody has bothered to fix for years in a clean way.  Something 
> that one of your projects would benefit from, btw.

Seriously Denis, stop it.

johannes