mbox series

[v6,mptcp-next,0/6] add MP_CAPABLE 'C' flag

Message ID cover.1621002341.git.geliangtang@gmail.com (mailing list archive)
Headers show
Series add MP_CAPABLE 'C' flag | expand

Message

Geliang Tang May 14, 2021, 2:32 p.m. UTC
v6:
 - rename mptcp_pm_add_entry to mptcp_pm_anno_entry
 - update mptcp_pm_free_add_list
 - update code in mptcp_pm_nl_add_addr_received
 - tag: export/20210514T055902

v5:
 - add a new patch "mptcp: add add_list in mptcp_pm_data"
 - fix !CONFIG_SYSCTL case
 - tag: export/20210507T174457

v4:
 - rename mptcp_is_allow_join_id0 to mptcp_allow_join_id0
 - rename deny_join_id0 in mptcp_pm_data to remote_deny_join_id0
 - add comments for self test cases
 - apply: export/20210504T064955 +
          "data checksum support" +
          "data checksum support cleanups"

v3:
 - use 'u8 allow_join_initial_addr_port'
 - drop the spinlock in patch 3

v2:
 - rename join_denied to allow_join_id0 in mptcp_out_options
 - rename join_denied to deny_join_id0 in mptcp_options_received
 - add a new function mptcp_pm_deny_join_id0_received
 - move deny_join_id0 flag from mptcp_sock to mptcp_pm_data
 - check deny_join_id0 flag in mptcp_pm_create_subflow_or_signal_addr
instead of in mptcp_syn_options.

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/183

Geliang Tang (6):
  mptcp: add sysctl allow_join_initial_addr_port
  mptcp: add allow_join_id0 in mptcp_out_options
  mptcp: rename mptcp_pm_add_entry to mptcp_pm_anno_entry
  mptcp: add add_list in mptcp_pm_data
  mptcp: add deny_join_id0 in mptcp_options_received
  selftests: mptcp: add deny_join_id0 testcases

 Documentation/networking/mptcp-sysctl.rst     |  13 ++
 include/net/mptcp.h                           |   3 +-
 net/mptcp/ctrl.c                              |  16 ++
 net/mptcp/options.c                           |  14 +-
 net/mptcp/pm.c                                |   2 +
 net/mptcp/pm_netlink.c                        | 151 ++++++++++++++----
 net/mptcp/protocol.c                          |   1 +
 net/mptcp/protocol.h                          |  21 ++-
 net/mptcp/subflow.c                           |   3 +
 .../testing/selftests/net/mptcp/mptcp_join.sh |  56 ++++++-
 10 files changed, 238 insertions(+), 42 deletions(-)

Comments

Mat Martineau May 14, 2021, 11:36 p.m. UTC | #1
On Fri, 14 May 2021, Geliang Tang wrote:

> v6:
> - rename mptcp_pm_add_entry to mptcp_pm_anno_entry
> - update mptcp_pm_free_add_list
> - update code in mptcp_pm_nl_add_addr_received
> - tag: export/20210514T055902
>

Hi Geliang -

Thanks for the changes, v6 looks good for the export branch.

Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>


-Mat


> v5:
> - add a new patch "mptcp: add add_list in mptcp_pm_data"
> - fix !CONFIG_SYSCTL case
> - tag: export/20210507T174457
>
> v4:
> - rename mptcp_is_allow_join_id0 to mptcp_allow_join_id0
> - rename deny_join_id0 in mptcp_pm_data to remote_deny_join_id0
> - add comments for self test cases
> - apply: export/20210504T064955 +
>          "data checksum support" +
>          "data checksum support cleanups"
>
> v3:
> - use 'u8 allow_join_initial_addr_port'
> - drop the spinlock in patch 3
>
> v2:
> - rename join_denied to allow_join_id0 in mptcp_out_options
> - rename join_denied to deny_join_id0 in mptcp_options_received
> - add a new function mptcp_pm_deny_join_id0_received
> - move deny_join_id0 flag from mptcp_sock to mptcp_pm_data
> - check deny_join_id0 flag in mptcp_pm_create_subflow_or_signal_addr
> instead of in mptcp_syn_options.
>
> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/183
>
> Geliang Tang (6):
>  mptcp: add sysctl allow_join_initial_addr_port
>  mptcp: add allow_join_id0 in mptcp_out_options
>  mptcp: rename mptcp_pm_add_entry to mptcp_pm_anno_entry
>  mptcp: add add_list in mptcp_pm_data
>  mptcp: add deny_join_id0 in mptcp_options_received
>  selftests: mptcp: add deny_join_id0 testcases
>
> Documentation/networking/mptcp-sysctl.rst     |  13 ++
> include/net/mptcp.h                           |   3 +-
> net/mptcp/ctrl.c                              |  16 ++
> net/mptcp/options.c                           |  14 +-
> net/mptcp/pm.c                                |   2 +
> net/mptcp/pm_netlink.c                        | 151 ++++++++++++++----
> net/mptcp/protocol.c                          |   1 +
> net/mptcp/protocol.h                          |  21 ++-
> net/mptcp/subflow.c                           |   3 +
> .../testing/selftests/net/mptcp/mptcp_join.sh |  56 ++++++-
> 10 files changed, 238 insertions(+), 42 deletions(-)
>
> -- 
> 2.31.1
>
>
>

--
Mat Martineau
Intel
Paolo Abeni May 17, 2021, 2:17 p.m. UTC | #2
Hello,

On Fri, 2021-05-14 at 22:32 +0800, Geliang Tang wrote:
> v6:
>  - rename mptcp_pm_add_entry to mptcp_pm_anno_entry
>  - update mptcp_pm_free_add_list
>  - update code in mptcp_pm_nl_add_addr_received
>  - tag: export/20210514T055902
> 
> v5:
>  - add a new patch "mptcp: add add_list in mptcp_pm_data"
>  - fix !CONFIG_SYSCTL case
>  - tag: export/20210507T174457
> 
> v4:
>  - rename mptcp_is_allow_join_id0 to mptcp_allow_join_id0
>  - rename deny_join_id0 in mptcp_pm_data to remote_deny_join_id0
>  - add comments for self test cases
>  - apply: export/20210504T064955 +
>           "data checksum support" +
>           "data checksum support cleanups"
> 
> v3:
>  - use 'u8 allow_join_initial_addr_port'
>  - drop the spinlock in patch 3
> 
> v2:
>  - rename join_denied to allow_join_id0 in mptcp_out_options
>  - rename join_denied to deny_join_id0 in mptcp_options_received
>  - add a new function mptcp_pm_deny_join_id0_received
>  - move deny_join_id0 flag from mptcp_sock to mptcp_pm_data
>  - check deny_join_id0 flag in mptcp_pm_create_subflow_or_signal_addr
> instead of in mptcp_syn_options.
> 
> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/183
> 
> Geliang Tang (6):
>   mptcp: add sysctl allow_join_initial_addr_port
>   mptcp: add allow_join_id0 in mptcp_out_options
>   mptcp: rename mptcp_pm_add_entry to mptcp_pm_anno_entry
>   mptcp: add add_list in mptcp_pm_data
>   mptcp: add deny_join_id0 in mptcp_options_received
>   selftests: mptcp: add deny_join_id0 testcases
> 
>  Documentation/networking/mptcp-sysctl.rst     |  13 ++
>  include/net/mptcp.h                           |   3 +-
>  net/mptcp/ctrl.c                              |  16 ++
>  net/mptcp/options.c                           |  14 +-
>  net/mptcp/pm.c                                |   2 +
>  net/mptcp/pm_netlink.c                        | 151 ++++++++++++++----
>  net/mptcp/protocol.c                          |   1 +
>  net/mptcp/protocol.h                          |  21 ++-
>  net/mptcp/subflow.c                           |   3 +
>  .../testing/selftests/net/mptcp/mptcp_join.sh |  56 ++++++-
>  10 files changed, 238 insertions(+), 42 deletions(-)
> 
I'm very sorry to give feedback this late, but I was unable to review
this before.

I have some very generic/broad questions:
1) shuold 'allow_join_initial_addr_port' be by default 0 for clients?
If I read correctly, will be 1 after this patch.

2) is the additional add_addr accounting required? I mean storing the
received addrs in the msk pm add_list (patch 4/6). Can we instead
simply enforce !allow_join_initial_addr_port looking up the relevant
address into the existing subflows? (or caching first subflow dst addr
and port in some new msk fields, whatever is simpler)

if !allow_join_initial_addr_port && <exists subflow with target
ip/port> && <such subflow is mp_capable/>
	<do not create new subflow>

3) If I read correctly the series enforce the peer who could create the
new subflow to respect the 'allow_join_initial_addr_port' field, but
does not implement the corrsponding check on the other end. e.g. if a
server sets 'allow_join_initial_addr_port' to 0 and a client tries to
create additional subflows, the server will not forbit that. I
think/fear the missing complexity here is quite significant.

I think it would be nice avoid adding additional complexity for the 'C'
bit - currently even mptcp.org support it.

Perhaps we could keep 'allow_join_initial_addr_port' constant (always 0
for client, always 0 for server) - avoding the additional complexity of
point 3) above.

Again, I'm sorry for suggesting significant changes this late.

Paolo
Mat Martineau May 17, 2021, 7:59 p.m. UTC | #3
On Mon, 17 May 2021, Paolo Abeni wrote:

> Hello,
>
> On Fri, 2021-05-14 at 22:32 +0800, Geliang Tang wrote:
>> v6:
>>  - rename mptcp_pm_add_entry to mptcp_pm_anno_entry
>>  - update mptcp_pm_free_add_list
>>  - update code in mptcp_pm_nl_add_addr_received
>>  - tag: export/20210514T055902
>>
>> v5:
>>  - add a new patch "mptcp: add add_list in mptcp_pm_data"
>>  - fix !CONFIG_SYSCTL case
>>  - tag: export/20210507T174457
>>
>> v4:
>>  - rename mptcp_is_allow_join_id0 to mptcp_allow_join_id0
>>  - rename deny_join_id0 in mptcp_pm_data to remote_deny_join_id0
>>  - add comments for self test cases
>>  - apply: export/20210504T064955 +
>>           "data checksum support" +
>>           "data checksum support cleanups"
>>
>> v3:
>>  - use 'u8 allow_join_initial_addr_port'
>>  - drop the spinlock in patch 3
>>
>> v2:
>>  - rename join_denied to allow_join_id0 in mptcp_out_options
>>  - rename join_denied to deny_join_id0 in mptcp_options_received
>>  - add a new function mptcp_pm_deny_join_id0_received
>>  - move deny_join_id0 flag from mptcp_sock to mptcp_pm_data
>>  - check deny_join_id0 flag in mptcp_pm_create_subflow_or_signal_addr
>> instead of in mptcp_syn_options.
>>
>> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/183
>>
>> Geliang Tang (6):
>>   mptcp: add sysctl allow_join_initial_addr_port
>>   mptcp: add allow_join_id0 in mptcp_out_options
>>   mptcp: rename mptcp_pm_add_entry to mptcp_pm_anno_entry
>>   mptcp: add add_list in mptcp_pm_data
>>   mptcp: add deny_join_id0 in mptcp_options_received
>>   selftests: mptcp: add deny_join_id0 testcases
>>
>>  Documentation/networking/mptcp-sysctl.rst     |  13 ++
>>  include/net/mptcp.h                           |   3 +-
>>  net/mptcp/ctrl.c                              |  16 ++
>>  net/mptcp/options.c                           |  14 +-
>>  net/mptcp/pm.c                                |   2 +
>>  net/mptcp/pm_netlink.c                        | 151 ++++++++++++++----
>>  net/mptcp/protocol.c                          |   1 +
>>  net/mptcp/protocol.h                          |  21 ++-
>>  net/mptcp/subflow.c                           |   3 +
>>  .../testing/selftests/net/mptcp/mptcp_join.sh |  56 ++++++-
>>  10 files changed, 238 insertions(+), 42 deletions(-)
>>
> I'm very sorry to give feedback this late, but I was unable to review
> this before.
>
> I have some very generic/broad questions:
> 1) shuold 'allow_join_initial_addr_port' be by default 0 for clients?
> If I read correctly, will be 1 after this patch.

It should default to 1 because that is what keeps the current behavior.

The logic is flipped from the 'C' bit: C=1 in the header means "do not 
connect to initial subflow addr/port", but allow_join_initial_addr_port=1 
means "*do* connect to the initial subflow addr/port".

> 2) is the additional add_addr accounting required? I mean storing the
> received addrs in the msk pm add_list (patch 4/6). Can we instead
> simply enforce !allow_join_initial_addr_port looking up the relevant
> address into the existing subflows? (or caching first subflow dst addr
> and port in some new msk fields, whatever is simpler)
>
> if !allow_join_initial_addr_port && <exists subflow with target
> ip/port> && <such subflow is mp_capable/>
> 	<do not create new subflow>
>

Caching the first subflow dst addr/port is enough to prevent sending a 
MP_JOIN to that address, but I don't think it does quite enough.

Consider:

1. Initial connection is made between Peer A and Peer B with C=1 (both 
directions).

2. Peer A sends ADD_ADDR. Peer B currently has limits set that don't allow 
outgoing MP_JOINs.

3. Later, Peer B updates limits to allow outgoing MP_JOINs.

At this point, Peer B needs to know what to connect to. Without the 
add_list, the only active address it knows for Peer A is one it is not 
allowed to connect to.


I've expected that we would add better tracking of ADD_ADDRs - remember 
that ADD_ADDR is an advertisement, not a request to "please connect to 
this address right now". Userspace path managers will certainly have more 
persistance of the peer's advertised addresses. Any more comprehensive 
in-kernel PM will also need the add_list. The main concern is that the 
list doesn't consume excessive memory, and the add_list is not expanded 
when pm->accept_addr is false.


> 3) If I read correctly the series enforce the peer who could create the
> new subflow to respect the 'allow_join_initial_addr_port' field, but
> does not implement the corrsponding check on the other end. e.g. if a
> server sets 'allow_join_initial_addr_port' to 0 and a client tries to
> create additional subflows, the server will not forbit that. I
> think/fear the missing complexity here is quite significant.
>
> I think it would be nice avoid adding additional complexity for the 'C'
> bit - currently even mptcp.org support it.
>

Assuming you intended to say "currently even mptcp.org does not support 
it."

You're right: the incoming join address check is missing. But I'm not sure 
the complexity is so bad, the PM can already reject joins. I think we 
could pass enough information to mptcp_pm_allow_new_subflow() to implement 
this check. Am I oversimplifying - is it harder to get at the dest 
addr/port? Or is it something else?

> Perhaps we could keep 'allow_join_initial_addr_port' constant (always 0
> for client, always 0 for server) - avoding the additional complexity of
> point 3) above.
>

Given the answer to #1 is this still what you'd propose? 
allow_join_initial_addr_port=1 is unchanged behavior from the kernel 
today, and is the simpler case.

--
Mat Martineau
Intel
Geliang Tang May 18, 2021, 10:11 a.m. UTC | #4
Hi Mat, Paolo,

Mat Martineau <mathew.j.martineau@linux.intel.com> 于2021年5月18日周二 上午3:59写道:
>
> On Mon, 17 May 2021, Paolo Abeni wrote:
>
> > Hello,
> >
> > On Fri, 2021-05-14 at 22:32 +0800, Geliang Tang wrote:
> >> v6:
> >>  - rename mptcp_pm_add_entry to mptcp_pm_anno_entry
> >>  - update mptcp_pm_free_add_list
> >>  - update code in mptcp_pm_nl_add_addr_received
> >>  - tag: export/20210514T055902
> >>
> >> v5:
> >>  - add a new patch "mptcp: add add_list in mptcp_pm_data"
> >>  - fix !CONFIG_SYSCTL case
> >>  - tag: export/20210507T174457
> >>
> >> v4:
> >>  - rename mptcp_is_allow_join_id0 to mptcp_allow_join_id0
> >>  - rename deny_join_id0 in mptcp_pm_data to remote_deny_join_id0
> >>  - add comments for self test cases
> >>  - apply: export/20210504T064955 +
> >>           "data checksum support" +
> >>           "data checksum support cleanups"
> >>
> >> v3:
> >>  - use 'u8 allow_join_initial_addr_port'
> >>  - drop the spinlock in patch 3
> >>
> >> v2:
> >>  - rename join_denied to allow_join_id0 in mptcp_out_options
> >>  - rename join_denied to deny_join_id0 in mptcp_options_received
> >>  - add a new function mptcp_pm_deny_join_id0_received
> >>  - move deny_join_id0 flag from mptcp_sock to mptcp_pm_data
> >>  - check deny_join_id0 flag in mptcp_pm_create_subflow_or_signal_addr
> >> instead of in mptcp_syn_options.
> >>
> >> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/183
> >>
> >> Geliang Tang (6):
> >>   mptcp: add sysctl allow_join_initial_addr_port
> >>   mptcp: add allow_join_id0 in mptcp_out_options
> >>   mptcp: rename mptcp_pm_add_entry to mptcp_pm_anno_entry
> >>   mptcp: add add_list in mptcp_pm_data
> >>   mptcp: add deny_join_id0 in mptcp_options_received
> >>   selftests: mptcp: add deny_join_id0 testcases
> >>
> >>  Documentation/networking/mptcp-sysctl.rst     |  13 ++
> >>  include/net/mptcp.h                           |   3 +-
> >>  net/mptcp/ctrl.c                              |  16 ++
> >>  net/mptcp/options.c                           |  14 +-
> >>  net/mptcp/pm.c                                |   2 +
> >>  net/mptcp/pm_netlink.c                        | 151 ++++++++++++++----
> >>  net/mptcp/protocol.c                          |   1 +
> >>  net/mptcp/protocol.h                          |  21 ++-
> >>  net/mptcp/subflow.c                           |   3 +
> >>  .../testing/selftests/net/mptcp/mptcp_join.sh |  56 ++++++-
> >>  10 files changed, 238 insertions(+), 42 deletions(-)
> >>
> > I'm very sorry to give feedback this late, but I was unable to review
> > this before.
> >
> > I have some very generic/broad questions:
> > 1) shuold 'allow_join_initial_addr_port' be by default 0 for clients?
> > If I read correctly, will be 1 after this patch.
>
> It should default to 1 because that is what keeps the current behavior.
>
> The logic is flipped from the 'C' bit: C=1 in the header means "do not
> connect to initial subflow addr/port", but allow_join_initial_addr_port=1
> means "*do* connect to the initial subflow addr/port".
>
> > 2) is the additional add_addr accounting required? I mean storing the
> > received addrs in the msk pm add_list (patch 4/6). Can we instead
> > simply enforce !allow_join_initial_addr_port looking up the relevant
> > address into the existing subflows? (or caching first subflow dst addr
> > and port in some new msk fields, whatever is simpler)
> >
> > if !allow_join_initial_addr_port && <exists subflow with target
> > ip/port> && <such subflow is mp_capable/>
> >       <do not create new subflow>
> >
>
> Caching the first subflow dst addr/port is enough to prevent sending a
> MP_JOIN to that address, but I don't think it does quite enough.
>
> Consider:
>
> 1. Initial connection is made between Peer A and Peer B with C=1 (both
> directions).
>
> 2. Peer A sends ADD_ADDR. Peer B currently has limits set that don't allow
> outgoing MP_JOINs.
>
> 3. Later, Peer B updates limits to allow outgoing MP_JOINs.
>
> At this point, Peer B needs to know what to connect to. Without the
> add_list, the only active address it knows for Peer A is one it is not
> allowed to connect to.
>
>
> I've expected that we would add better tracking of ADD_ADDRs - remember
> that ADD_ADDR is an advertisement, not a request to "please connect to
> this address right now". Userspace path managers will certainly have more
> persistance of the peer's advertised addresses. Any more comprehensive
> in-kernel PM will also need the add_list. The main concern is that the
> list doesn't consume excessive memory, and the add_list is not expanded
> when pm->accept_addr is false.
>
>
> > 3) If I read correctly the series enforce the peer who could create the
> > new subflow to respect the 'allow_join_initial_addr_port' field, but
> > does not implement the corrsponding check on the other end. e.g. if a
> > server sets 'allow_join_initial_addr_port' to 0 and a client tries to
> > create additional subflows, the server will not forbit that. I
> > think/fear the missing complexity here is quite significant.
> >
> > I think it would be nice avoid adding additional complexity for the 'C'
> > bit - currently even mptcp.org support it.
> >
>
> Assuming you intended to say "currently even mptcp.org does not support
> it."
>
> You're right: the incoming join address check is missing. But I'm not sure
> the complexity is so bad, the PM can already reject joins. I think we
> could pass enough information to mptcp_pm_allow_new_subflow() to implement
> this check. Am I oversimplifying - is it harder to get at the dest
> addr/port? Or is it something else?

Thanks for your review.

I just sent out a patch named (Squash to "mptcp: add deny_join_id0 in
mptcp_options_received") to fix this.

-Geliang

>
> > Perhaps we could keep 'allow_join_initial_addr_port' constant (always 0
> > for client, always 0 for server) - avoding the additional complexity of
> > point 3) above.
> >
>
> Given the answer to #1 is this still what you'd propose?
> allow_join_initial_addr_port=1 is unchanged behavior from the kernel
> today, and is the simpler case.
>
> --
> Mat Martineau
> Intel
Paolo Abeni May 18, 2021, 1:56 p.m. UTC | #5
On Mon, 2021-05-17 at 12:59 -0700, Mat Martineau wrote:
> On Mon, 17 May 2021, Paolo Abeni wrote:
> 
> > Hello,
> > 
> > On Fri, 2021-05-14 at 22:32 +0800, Geliang Tang wrote:
> > > v6:
> > >  - rename mptcp_pm_add_entry to mptcp_pm_anno_entry
> > >  - update mptcp_pm_free_add_list
> > >  - update code in mptcp_pm_nl_add_addr_received
> > >  - tag: export/20210514T055902
> > > 
> > > v5:
> > >  - add a new patch "mptcp: add add_list in mptcp_pm_data"
> > >  - fix !CONFIG_SYSCTL case
> > >  - tag: export/20210507T174457
> > > 
> > > v4:
> > >  - rename mptcp_is_allow_join_id0 to mptcp_allow_join_id0
> > >  - rename deny_join_id0 in mptcp_pm_data to remote_deny_join_id0
> > >  - add comments for self test cases
> > >  - apply: export/20210504T064955 +
> > >           "data checksum support" +
> > >           "data checksum support cleanups"
> > > 
> > > v3:
> > >  - use 'u8 allow_join_initial_addr_port'
> > >  - drop the spinlock in patch 3
> > > 
> > > v2:
> > >  - rename join_denied to allow_join_id0 in mptcp_out_options
> > >  - rename join_denied to deny_join_id0 in mptcp_options_received
> > >  - add a new function mptcp_pm_deny_join_id0_received
> > >  - move deny_join_id0 flag from mptcp_sock to mptcp_pm_data
> > >  - check deny_join_id0 flag in mptcp_pm_create_subflow_or_signal_addr
> > > instead of in mptcp_syn_options.
> > > 
> > > Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/183
> > > 
> > > Geliang Tang (6):
> > >   mptcp: add sysctl allow_join_initial_addr_port
> > >   mptcp: add allow_join_id0 in mptcp_out_options
> > >   mptcp: rename mptcp_pm_add_entry to mptcp_pm_anno_entry
> > >   mptcp: add add_list in mptcp_pm_data
> > >   mptcp: add deny_join_id0 in mptcp_options_received
> > >   selftests: mptcp: add deny_join_id0 testcases
> > > 
> > >  Documentation/networking/mptcp-sysctl.rst     |  13 ++
> > >  include/net/mptcp.h                           |   3 +-
> > >  net/mptcp/ctrl.c                              |  16 ++
> > >  net/mptcp/options.c                           |  14 +-
> > >  net/mptcp/pm.c                                |   2 +
> > >  net/mptcp/pm_netlink.c                        | 151 ++++++++++++++----
> > >  net/mptcp/protocol.c                          |   1 +
> > >  net/mptcp/protocol.h                          |  21 ++-
> > >  net/mptcp/subflow.c                           |   3 +
> > >  .../testing/selftests/net/mptcp/mptcp_join.sh |  56 ++++++-
> > >  10 files changed, 238 insertions(+), 42 deletions(-)
> > > 
> > I'm very sorry to give feedback this late, but I was unable to review
> > this before.
> > 
> > I have some very generic/broad questions:
> > 1) shuold 'allow_join_initial_addr_port' be by default 0 for clients?
> > If I read correctly, will be 1 after this patch.
> 
> It should default to 1 because that is what keeps the current behavior.
> 
> The logic is flipped from the 'C' bit: C=1 in the header means "do not 
> connect to initial subflow addr/port", but allow_join_initial_addr_port=1 
> means "*do* connect to the initial subflow addr/port".
> 
> > 2) is the additional add_addr accounting required? I mean storing the
> > received addrs in the msk pm add_list (patch 4/6). Can we instead
> > simply enforce !allow_join_initial_addr_port looking up the relevant
> > address into the existing subflows? (or caching first subflow dst addr
> > and port in some new msk fields, whatever is simpler)
> > 
> > if !allow_join_initial_addr_port && <exists subflow with target
> > ip/port> && <such subflow is mp_capable/>
> > 	<do not create new subflow>
> > 
> 
> Caching the first subflow dst addr/port is enough to prevent sending a 
> MP_JOIN to that address, but I don't think it does quite enough.
> 
> Consider:
> 
> 1. Initial connection is made between Peer A and Peer B with C=1 (both 
> directions).
> 
> 2. Peer A sends ADD_ADDR. Peer B currently has limits set that don't allow 
> outgoing MP_JOINs.

Uhmmm.... Looking at the RFC, ADD_ADDR <MPC addr/MPC port> is a gray
area to me. My understanding is that with 'C' bit == 0 the peer is
allowed to create additional subflow towardst the initial subflow dst
addr/port, and such subflows are not allowed with C==1. 

But it looks like that ADD_ADDR should override that request. At the
very least an ADD_ADDR for the initial addr/port when the MPC handshake
completed with C==1 is weird.

> I've expected that we would add better tracking of ADD_ADDRs - 
> remember 
> that ADD_ADDR is an advertisement, not a request to "please connect 
> to 
> this address right now". Userspace path managers will certainly have 
> more 
> persistance of the peer's advertised addresses. Any more 
> comprehensive 
> in-kernel PM will also need the add_list. The main concern is that
> the 
> list doesn't consume excessive memory, and the add_list is not 
> expanded 
> when pm->accept_addr is false.

The in-kernel path manager is want to be as simple as possible. As soon
as we start allocating memory on peer req it's very easy to be prone to
DoS. Memory accounting and limits could save us or at least mitigate
the problem, but they will add more complexity.

> > 3) If I read correctly the series enforce the peer who could create the
> > new subflow to respect the 'allow_join_initial_addr_port' field, but
> > does not implement the corrsponding check on the other end. e.g. if a
> > server sets 'allow_join_initial_addr_port' to 0 and a client tries to
> > create additional subflows, the server will not forbit that. I
> > think/fear the missing complexity here is quite significant.
> > 
> > I think it would be nice avoid adding additional complexity for the 'C'
> > bit - currently even mptcp.org support it.
> > 
> 
> Assuming you intended to say "currently even mptcp.org does not support 
> it."
> 
> You're right: the incoming join address check is missing. But I'm not sure 
> the complexity is so bad, the PM can already reject joins. I think we 
> could pass enough information to mptcp_pm_allow_new_subflow() to implement 
> this check. Am I oversimplifying - is it harder to get at the dest 
> addr/port? Or is it something else?

What abot that setting C==0 on client sockets? does that imply that the
client will accept incoming connection on the src port/addr?

For the server case, even a list-based filter would add addtional
overhead and complexity for all setup.

AFAICS the RFC mentions this option as on optimization to avoid
connection attempts failing when something outside the endpoint
themself would prevent them from being succeful.

But this "optimization" has a cost in term of memory, complexity and
additional overhead for each connection/subflow. It will work only if
the administrator will configure the C bit properly. And that could be
non trivial, when load balancing and NAT are outside the mptcp host.

Just does not seam worthy to me, I really think we should start with
the bare minumum to respect the RFC itself, avoiding as much complexity
as possible:
- setting C always to 0 on server
- when the peer send C==1, cache the MPC addr/port, and do not attempt
other connection to such endpoint.

I'm not even sure the latter step is strictly necessary. Another option
would be passing the info to the user-space path manager - if available
- and let it deal with all the complexity. If no user-space path
manager is available just be dumb and simple ;)

> > Perhaps we could keep 'allow_join_initial_addr_port' constant (always 0
> > for client, always 0 for server) - avoding the additional complexity of
> > point 3) above.
> > 
> 
> Given the answer to #1 is this still what you'd propose? 
> allow_join_initial_addr_port=1 is unchanged behavior from the kernel 
> today, and is the simpler case.
> 
> --
> Mat Martineau
> Intel
>
Mat Martineau May 18, 2021, 7:07 p.m. UTC | #6
On Tue, 18 May 2021, Paolo Abeni wrote:

> On Mon, 2021-05-17 at 12:59 -0700, Mat Martineau wrote:
>> On Mon, 17 May 2021, Paolo Abeni wrote:
>>
>>> Hello,
>>>
>>> On Fri, 2021-05-14 at 22:32 +0800, Geliang Tang wrote:
>>>> v6:
>>>>  - rename mptcp_pm_add_entry to mptcp_pm_anno_entry
>>>>  - update mptcp_pm_free_add_list
>>>>  - update code in mptcp_pm_nl_add_addr_received
>>>>  - tag: export/20210514T055902
>>>>
>>>> v5:
>>>>  - add a new patch "mptcp: add add_list in mptcp_pm_data"
>>>>  - fix !CONFIG_SYSCTL case
>>>>  - tag: export/20210507T174457
>>>>
>>>> v4:
>>>>  - rename mptcp_is_allow_join_id0 to mptcp_allow_join_id0
>>>>  - rename deny_join_id0 in mptcp_pm_data to remote_deny_join_id0
>>>>  - add comments for self test cases
>>>>  - apply: export/20210504T064955 +
>>>>           "data checksum support" +
>>>>           "data checksum support cleanups"
>>>>
>>>> v3:
>>>>  - use 'u8 allow_join_initial_addr_port'
>>>>  - drop the spinlock in patch 3
>>>>
>>>> v2:
>>>>  - rename join_denied to allow_join_id0 in mptcp_out_options
>>>>  - rename join_denied to deny_join_id0 in mptcp_options_received
>>>>  - add a new function mptcp_pm_deny_join_id0_received
>>>>  - move deny_join_id0 flag from mptcp_sock to mptcp_pm_data
>>>>  - check deny_join_id0 flag in mptcp_pm_create_subflow_or_signal_addr
>>>> instead of in mptcp_syn_options.
>>>>
>>>> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/183
>>>>
>>>> Geliang Tang (6):
>>>>   mptcp: add sysctl allow_join_initial_addr_port
>>>>   mptcp: add allow_join_id0 in mptcp_out_options
>>>>   mptcp: rename mptcp_pm_add_entry to mptcp_pm_anno_entry
>>>>   mptcp: add add_list in mptcp_pm_data
>>>>   mptcp: add deny_join_id0 in mptcp_options_received
>>>>   selftests: mptcp: add deny_join_id0 testcases
>>>>
>>>>  Documentation/networking/mptcp-sysctl.rst     |  13 ++
>>>>  include/net/mptcp.h                           |   3 +-
>>>>  net/mptcp/ctrl.c                              |  16 ++
>>>>  net/mptcp/options.c                           |  14 +-
>>>>  net/mptcp/pm.c                                |   2 +
>>>>  net/mptcp/pm_netlink.c                        | 151 ++++++++++++++----
>>>>  net/mptcp/protocol.c                          |   1 +
>>>>  net/mptcp/protocol.h                          |  21 ++-
>>>>  net/mptcp/subflow.c                           |   3 +
>>>>  .../testing/selftests/net/mptcp/mptcp_join.sh |  56 ++++++-
>>>>  10 files changed, 238 insertions(+), 42 deletions(-)
>>>>
>>> I'm very sorry to give feedback this late, but I was unable to review
>>> this before.
>>>
>>> I have some very generic/broad questions:
>>> 1) shuold 'allow_join_initial_addr_port' be by default 0 for clients?
>>> If I read correctly, will be 1 after this patch.
>>
>> It should default to 1 because that is what keeps the current behavior.
>>
>> The logic is flipped from the 'C' bit: C=1 in the header means "do not
>> connect to initial subflow addr/port", but allow_join_initial_addr_port=1
>> means "*do* connect to the initial subflow addr/port".
>>
>>> 2) is the additional add_addr accounting required? I mean storing the
>>> received addrs in the msk pm add_list (patch 4/6). Can we instead
>>> simply enforce !allow_join_initial_addr_port looking up the relevant
>>> address into the existing subflows? (or caching first subflow dst addr
>>> and port in some new msk fields, whatever is simpler)
>>>
>>> if !allow_join_initial_addr_port && <exists subflow with target
>>> ip/port> && <such subflow is mp_capable/>
>>> 	<do not create new subflow>
>>>
>>
>> Caching the first subflow dst addr/port is enough to prevent sending a
>> MP_JOIN to that address, but I don't think it does quite enough.
>>
>> Consider:
>>
>> 1. Initial connection is made between Peer A and Peer B with C=1 (both
>> directions).
>>
>> 2. Peer A sends ADD_ADDR. Peer B currently has limits set that don't allow
>> outgoing MP_JOINs.
>
> Uhmmm.... Looking at the RFC, ADD_ADDR <MPC addr/MPC port> is a gray
> area to me. My understanding is that with 'C' bit == 0 the peer is
> allowed to create additional subflow towardst the initial subflow dst
> addr/port, and such subflows are not allowed with C==1.

That's correct. My example was unclear: my intent was to say Peer A sent 
an ADD_ADDR for a new/different address that it does want Peer B to 
connect to.

>
> But it looks like that ADD_ADDR should override that request. At the
> very least an ADD_ADDR for the initial addr/port when the MPC handshake
> completed with C==1 is weird.

This is a corner case that I hadn't considered and I agree that it's 
weird. Given the unconditional "MUST NOT" wording of the C bit 
description, I would not expect the ADD_ADDR to override. If a peer 
sends a weird ADD_ADDR they are asking for weirdness I guess, but it's ok 
to ignore them :)

> 
>> I've expected that we would add better tracking of ADD_ADDRs -
>> remember
>> that ADD_ADDR is an advertisement, not a request to "please connect
>> to
>> this address right now". Userspace path managers will certainly have
>> more
>> persistance of the peer's advertised addresses. Any more
>> comprehensive
>> in-kernel PM will also need the add_list. The main concern is that
>> the
>> list doesn't consume excessive memory, and the add_list is not
>> expanded
>> when pm->accept_addr is false.
>
> The in-kernel path manager is want to be as simple as possible. As soon
> as we start allocating memory on peer req it's very easy to be prone to
> DoS. Memory accounting and limits could save us or at least mitigate
> the problem, but they will add more complexity.

My assumption was that high-connection-count servers would probably have 
their limits set in a way that ADD_ADDRs would not be accepted anyway. 
This is the use case where clients may be behind a NAT and the server is 
accepting MP_JOINs from the client (but the server is never sending 
MP_JOINs), and the server is possibly sending ADD_ADDRs to the client. The 
configurable limit is already there from the existing in-kernel PM code.

I do agree with "as simple as possible", but the RFC requirements are 
constraining that. Even though the C bit is ignored in Linux 
implementations so far, I don't think it should stay that way. The current 
code always tries to connect to the initial subflow, so we need to do 
*something* different to remember an alternate peer address. There are 
other options (like remembering only the most recent ADD_ADDR), but given 
the configurable limits we have it doesn't seem that different 
memory-wise. What's important is implementing this RFC functionality well, 
so it's really helpful to explore the alternatives here - and your 
understanding of the resource issues and DoS tradeoffs is very useful.


>
>>> 3) If I read correctly the series enforce the peer who could create the
>>> new subflow to respect the 'allow_join_initial_addr_port' field, but
>>> does not implement the corrsponding check on the other end. e.g. if a
>>> server sets 'allow_join_initial_addr_port' to 0 and a client tries to
>>> create additional subflows, the server will not forbit that. I
>>> think/fear the missing complexity here is quite significant.
>>>
>>> I think it would be nice avoid adding additional complexity for the 'C'
>>> bit - currently even mptcp.org support it.
>>>
>>
>> Assuming you intended to say "currently even mptcp.org does not support
>> it."
>>
>> You're right: the incoming join address check is missing. But I'm not sure
>> the complexity is so bad, the PM can already reject joins. I think we
>> could pass enough information to mptcp_pm_allow_new_subflow() to implement
>> this check. Am I oversimplifying - is it harder to get at the dest
>> addr/port? Or is it something else?
>
> What abot that setting C==0 on client sockets? does that imply that the
> client will accept incoming connection on the src port/addr?

C==0 says it's valid to send MP_JOINs to the client on the src port/addr. 
It's up to the client's PM to accept or reject them.

>
> For the server case, even a list-based filter would add addtional
> overhead and complexity for all setup.
>

It's not free. The RFC wording is a little fuzzy here, since it says the 
sender of C==1 "will not accept" incoming connections on that subflow. 
There's no "MUST/MUST NOT", so maybe it's possible to get by without. It 
would be good to have some input from RFC authors on this :)


> AFAICS the RFC mentions this option as on optimization to avoid
> connection attempts failing when something outside the endpoint
> themself would prevent them from being succeful.
>
> But this "optimization" has a cost in term of memory, complexity and
> additional overhead for each connection/subflow. It will work only if
> the administrator will configure the C bit properly. And that could be
> non trivial, when load balancing and NAT are outside the mptcp host.
>
> Just does not seam worthy to me, I really think we should start with
> the bare minumum to respect the RFC itself, avoiding as much complexity
> as possible:
> - setting C always to 0 on server
> - when the peer send C==1, cache the MPC addr/port, and do not attempt
> other connection to such endpoint.
>
> I'm not even sure the latter step is strictly necessary.

For the former, it is certainly simpler to always send C==0, but sending 
C==1 from a high-connection-count server behind a load balancer is the 
primary use case.

I think the "MUST NOT" wording of the RFC does require the latter step 
(server or client), and that implies a minimally functional in-kernel PM 
would need to track some other address to connect to.


> Another option
> would be passing the info to the user-space path manager - if available
> - and let it deal with all the complexity. If no user-space path
> manager is available just be dumb and simple ;)

The C bit does need to propagate to the userspace path manager. Since the 
userspace PM is mostly a client-side thing, it is situated well to handle 
the complexity of tracking incoming ADD_ADDRs and sending/not-sending 
MP_JOINs that comply with the C bit from the server.

However, an important complexity question in this email thread is about 
the server side (without userspace PM) that may or may not need to reject 
incoming MP_JOINs that it said it won't accept (as I mentioned above, the 
RFC is fuzzy here).



Is any of that more convincing about what the minimal kernel PM needs? Do 
you think some RFC author input on the rejection of MP_JOINs by the C==1 
sender would be useful?


Thanks for the questions!


>
>>> Perhaps we could keep 'allow_join_initial_addr_port' constant (always 0
>>> for client, always 0 for server) - avoding the additional complexity of
>>> point 3) above.
>>>
>>
>> Given the answer to #1 is this still what you'd propose?
>> allow_join_initial_addr_port=1 is unchanged behavior from the kernel
>> today, and is the simpler case.

--
Mat Martineau
Intel
Paolo Abeni May 19, 2021, 9:51 a.m. UTC | #7
On Tue, 2021-05-18 at 12:07 -0700, Mat Martineau wrote:
> On Tue, 18 May 2021, Paolo Abeni wrote:
> > The in-kernel path manager is want to be as simple as possible. As soon
> > as we start allocating memory on peer req it's very easy to be prone to
> > DoS. Memory accounting and limits could save us or at least mitigate
> > the problem, but they will add more complexity.
> 
> My assumption was that high-connection-count servers would probably have 
> their limits set in a way that ADD_ADDRs would not be accepted anyway. 
> This is the use case where clients may be behind a NAT and the server is 
> accepting MP_JOINs from the client (but the server is never sending 
> MP_JOINs), and the server is possibly sending ADD_ADDRs to the client. The 
> configurable limit is already there from the existing in-kernel PM code.
> 
> I do agree with "as simple as possible", but the RFC requirements are 
> constraining that. Even though the C bit is ignored in Linux 
> implementations so far, I don't think it should stay that way. The current 
> code always tries to connect to the initial subflow, so we need to do 
> *something* different to remember an alternate peer address. There are 
> other options (like remembering only the most recent ADD_ADDR), but given 
> the configurable limits we have it doesn't seem that different 
> memory-wise. What's important is implementing this RFC functionality well, 
> so it's really helpful to explore the alternatives here - and your 
> understanding of the resource issues and DoS tradeoffs is very useful.

Uhmmm... I start feeling I misunderstand the RFC WRT to 'C' bit
handling.

"""
		 The third bit, labeled "C", is set to 1 to indicate
                 that the sender of this option will not accept
                 additional MPTCP subflows to the source address and
                 port, and therefore the receiver MUST NOT try to open
                 any additional subflows toward this address and port.
"""

To avoid creating additional subflows towards the MPC server
address/port, we don't need to store any additional information beyond
the 'C' flag: net-next uses that pair only for new subflow with
'subflow' type (yep, the type name is not very self-explaining :(

We just need to forbit them:

---
diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
index d094588afad8..2ae9fac00623 100644
--- a/net/mptcp/pm_netlink.c
+++ b/net/mptcp/pm_netlink.c
@@ -451,7 +451,7 @@ static void mptcp_pm_create_subflow_or_signal_addr(struct mptcp_sock *msk)
 
        /* check if should create a new subflow */
        if (msk->pm.local_addr_used < local_addr_max &&
-           msk->pm.subflows < subflows_max) {
+           msk->pm.subflows < subflows_max && !msk->pm.remote_deny_join_id0) {
                local = select_local_address(pernet, msk);
                if (local) {
                        struct mptcp_addr_info remote = { 0 };

---

> > Just does not seam worthy to me, I really think we should start with
> > the bare minumum to respect the RFC itself, avoiding as much complexity
> > as possible:
> > - setting C always to 0 on server
> > - when the peer send C==1, cache the MPC addr/port, and do not attempt
> > other connection to such endpoint.
> > 
> > I'm not even sure the latter step is strictly necessary.
> 
> For the former, it is certainly simpler to always send C==0, but sending 
> C==1 from a high-connection-count server behind a load balancer is the 
> primary use case.

I *think* it would be far better (and possibly not that complex!!!
@Florian: WDYT? ) implementing an MPTCP aware load balancer. The MPJ
syn pkt carries the remote token, so it's possible for the load
balancer to make the "correct" decision.

> I think the "MUST NOT" wording of the RFC does require the latter step 
> (server or client), and that implies a minimally functional in-kernel PM 
> would need to track some other address to connect to.

This part is unclear to me. Why should the kernel cache more
addresses? 

It's already capabale of using as many ADD_ADDRs as the peer provides
with the current options handling. Sure, there are some constraints.
The stricter one posed by the current implementation is that the kernel
fully processes (up to subflow creation) a given ADD_ADDR option before
using another one.

That is useful to avoid transforming an MPTCP client in a DDOS tool
controlled by an (evil) MPTCP server, and sounds like a reasonable
tread-off between complexity and flexibility.

Looks like we have some topic for tomorrow's mtg :)

Thanks!

Paolo
Mat Martineau May 19, 2021, 6:06 p.m. UTC | #8
On Wed, 19 May 2021, Paolo Abeni wrote:

> On Tue, 2021-05-18 at 12:07 -0700, Mat Martineau wrote:
>> On Tue, 18 May 2021, Paolo Abeni wrote:
>>> The in-kernel path manager is want to be as simple as possible. As soon
>>> as we start allocating memory on peer req it's very easy to be prone to
>>> DoS. Memory accounting and limits could save us or at least mitigate
>>> the problem, but they will add more complexity.
>>
>> My assumption was that high-connection-count servers would probably have
>> their limits set in a way that ADD_ADDRs would not be accepted anyway.
>> This is the use case where clients may be behind a NAT and the server is
>> accepting MP_JOINs from the client (but the server is never sending
>> MP_JOINs), and the server is possibly sending ADD_ADDRs to the client. The
>> configurable limit is already there from the existing in-kernel PM code.
>>
>> I do agree with "as simple as possible", but the RFC requirements are
>> constraining that. Even though the C bit is ignored in Linux
>> implementations so far, I don't think it should stay that way. The current
>> code always tries to connect to the initial subflow, so we need to do
>> *something* different to remember an alternate peer address. There are
>> other options (like remembering only the most recent ADD_ADDR), but given
>> the configurable limits we have it doesn't seem that different
>> memory-wise. What's important is implementing this RFC functionality well,
>> so it's really helpful to explore the alternatives here - and your
>> understanding of the resource issues and DoS tradeoffs is very useful.
>
> Uhmmm... I start feeling I misunderstand the RFC WRT to 'C' bit
> handling.
>
> """
> 		 The third bit, labeled "C", is set to 1 to indicate
>                 that the sender of this option will not accept
>                 additional MPTCP subflows to the source address and
>                 port, and therefore the receiver MUST NOT try to open
>                 any additional subflows toward this address and port.
> """
>
> To avoid creating additional subflows towards the MPC server
> address/port, we don't need to store any additional information beyond
> the 'C' flag: net-next uses that pair only for new subflow with
> 'subflow' type (yep, the type name is not very self-explaining :(
>
> We just need to forbit them:
>
> ---
> diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
> index d094588afad8..2ae9fac00623 100644
> --- a/net/mptcp/pm_netlink.c
> +++ b/net/mptcp/pm_netlink.c
> @@ -451,7 +451,7 @@ static void mptcp_pm_create_subflow_or_signal_addr(struct mptcp_sock *msk)
>
>        /* check if should create a new subflow */
>        if (msk->pm.local_addr_used < local_addr_max &&
> -           msk->pm.subflows < subflows_max) {
> +           msk->pm.subflows < subflows_max && !msk->pm.remote_deny_join_id0) {
>                local = select_local_address(pernet, msk);
>                if (local) {
>                        struct mptcp_addr_info remote = { 0 };
>
> ---
>
>>> Just does not seam worthy to me, I really think we should start with
>>> the bare minumum to respect the RFC itself, avoiding as much complexity
>>> as possible:
>>> - setting C always to 0 on server
>>> - when the peer send C==1, cache the MPC addr/port, and do not attempt
>>> other connection to such endpoint.
>>>
>>> I'm not even sure the latter step is strictly necessary.
>>
>> For the former, it is certainly simpler to always send C==0, but sending
>> C==1 from a high-connection-count server behind a load balancer is the
>> primary use case.
>
> I *think* it would be far better (and possibly not that complex!!!
> @Florian: WDYT? ) implementing an MPTCP aware load balancer. The MPJ
> syn pkt carries the remote token, so it's possible for the load
> balancer to make the "correct" decision.
>
>> I think the "MUST NOT" wording of the RFC does require the latter step
>> (server or client), and that implies a minimally functional in-kernel PM
>> would need to track some other address to connect to.
>
> This part is unclear to me. Why should the kernel cache more
> addresses?
>
> It's already capabale of using as many ADD_ADDRs as the peer provides
> with the current options handling. Sure, there are some constraints.
> The stricter one posed by the current implementation is that the kernel
> fully processes (up to subflow creation) a given ADD_ADDR option before
> using another one.
>
> That is useful to avoid transforming an MPTCP client in a DDOS tool
> controlled by an (evil) MPTCP server, and sounds like a reasonable
> tread-off between complexity and flexibility.
>
> Looks like we have some topic for tomorrow's mtg :)
>

Yeah, I think it will be far easier to discuss in that format!

--
Mat Martineau
Intel