diff mbox series

[net,v1,1/2] netlink: let len field used to parse type-not-care nested attrs

Message ID 20230731121247.3972783-1-linma@zju.edu.cn (mailing list archive)
State Rejected
Delegated to: Netdev Maintainers
Headers show
Series [net,v1,1/2] netlink: let len field used to parse type-not-care nested attrs | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 4088 this patch: 4088
netdev/cc_maintainers success CCed 6 of 6 maintainers
netdev/build_clang success Errors and warnings before: 1898 this patch: 1898
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 4258 this patch: 4258
netdev/checkpatch warning WARNING: line length of 87 exceeds 80 columns
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Lin Ma July 31, 2023, 12:12 p.m. UTC
Recently I found several manual parsing cases for nested attributes
whose fix is rather trivial. The pattern for those like below

const struct nla_policy y[...] = {
  ...
  X	= { .type = NLA_NESTED },
  ...
}

nla_for_each_nested/attr(nla, tb[X], ...) {
   // nla_type never used
   ...
   x = nla_data(nla) // directly access nla without length checking
   ....
}

One example can be found in discussion at:
https://lore.kernel.org/all/20230723074504.3706691-1-linma@zju.edu.cn/

In short, the very direct idea to fix such lengh-check-forgotten bug is
add nla_len() checks like

  if (nla_len(nla) < SOME_LEN)
    return -EINVAL;

However, this is tedious and just like Leon said: add another layer of
cabal knowledge. The better solution should leverage the nla_policy and
discard nlattr whose length is invalid when doing parsing. That is, we
should defined a nested_policy for the X above like

  X      = { NLA_POLICY_NESTED(Z) },

But unfortunately, as said above, the nla_type is never used in such
manual parsing cases, which means is difficult to defined a nested
policy Z without breaking user space (they may put weird value in type
of these nlattrs, we have no idea).

To this end, this commit uses the len field in nla_policy crafty and
allow the existing validate_nla checks such type-not-care nested attrs.
In current implementation, for attribute with type NLA_NESTED, the len
field used as the length of the nested_policy:

	{ .type = NLA_NESTED, .nested_policy = policy, .len = maxattr }

	_NLA_POLICY_NESTED(ARRAY_SIZE(policy) - 1, policy)

If one nlattr does not provide policy, like the example X, this len
field is not used. This means we can leverage this field for our end.
This commit introduces one new macro named NLA_POLICY_NESTED_NO_TYPE
and let validate_nla() to use the len field as a hint to check the
nested attributes. Therefore, such manual parsing code can also define
a nla_policy and take advantage of the validation within the existing
parsers like nla_parse_deprecated..

Signed-off-by: Lin Ma <linma@zju.edu.cn>
---
 include/net/netlink.h |  6 ++++++
 lib/nlattr.c          | 11 +++++++++++
 2 files changed, 17 insertions(+)

Comments

Jakub Kicinski July 31, 2023, 7:03 p.m. UTC | #1
On Mon, 31 Jul 2023 20:12:47 +0800 Lin Ma wrote:
> In short, the very direct idea to fix such lengh-check-forgotten bug is
> add nla_len() checks like
> 
>   if (nla_len(nla) < SOME_LEN)
>     return -EINVAL;
> 
> However, this is tedious and just like Leon said: add another layer of
> cabal knowledge. The better solution should leverage the nla_policy and
> discard nlattr whose length is invalid when doing parsing. That is, we
> should defined a nested_policy for the X above like

Hard no. Putting array index into attr type is an advanced case and the
parsing code has to be able to deal with low level netlink details.
Higher level API should remove the nla_for_each_nested() completely
which is rather hard to achieve here.

Nacked-by: Jakub Kicinski <kuba@kernel.org>
Lin Ma Aug. 1, 2023, 2 a.m. UTC | #2
Hello Jakub,

> > 
> > However, this is tedious and just like Leon said: add another layer of
> > cabal knowledge. The better solution should leverage the nla_policy and
> > discard nlattr whose length is invalid when doing parsing. That is, we
> > should defined a nested_policy for the X above like
> 
> Hard no. Putting array index into attr type is an advanced case and the
> parsing code has to be able to deal with low level netlink details.

Well, I just known that the type field for those attributes is used as array
index.
Hence, for this advanced case, could we define another NLA type, maybe 
NLA_NESTED_IDXARRAY enum? That may be much clearer against modifying existing
code.

> Higher level API should remove the nla_for_each_nested() completely
> which is rather hard to achieve here.

By investigating the code uses nla_for_each_nested macro. There are basically
two scenarios:

1. manually parse nested attributes whose type is not cared (the advance case
   use type as index here).
2. manually parse nested attributes for *one* specific type. Such code do
   nla_type check.

From the API side, to completely remove nla_for_each_nested and avoid the
manual  parsing. I think we can choose two solutions.

Solution-1: add a parsing helper that receives a function pointer as an
            argument, it will call this pointer after carefully verify the
            type and length of an attribute.

Solution-2: add a parsing helper that traverses this nested twice, the first
            time  to do counting size for allocating heap buffer (or stack
            buffer from the caller if the max count is known). The second
            time to fill this buffer with attribute pointers.

Which one is preferred? Please enlighten me about this and I can try to propose
a fix. (I personally like the solution-2 as it works like the existing parsers
like nla_parse) 

> 
> Nacked-by: Jakub Kicinski <kuba@kernel.org>

Thanks
Lin
Jakub Kicinski Aug. 1, 2023, 2:31 a.m. UTC | #3
On Tue, 1 Aug 2023 10:00:01 +0800 (GMT+08:00) Lin Ma wrote:
> > > However, this is tedious and just like Leon said: add another layer of
> > > cabal knowledge. The better solution should leverage the nla_policy and
> > > discard nlattr whose length is invalid when doing parsing. That is, we
> > > should defined a nested_policy for the X above like  
> > 
> > Hard no. Putting array index into attr type is an advanced case and the
> > parsing code has to be able to deal with low level netlink details.  
> 
> Well, I just known that the type field for those attributes is used as array
> index.
> Hence, for this advanced case, could we define another NLA type, maybe 
> NLA_NESTED_IDXARRAY enum? That may be much clearer against modifying existing
> code.

What if the value is of a complex type (nest)?  For 10th time, if
someone does "interesting things" they must know what they're doing.

> > Higher level API should remove the nla_for_each_nested() completely
> > which is rather hard to achieve here.  
> 
> By investigating the code uses nla_for_each_nested macro. There are basically
> two scenarios:
> 
> 1. manually parse nested attributes whose type is not cared (the advance case
>    use type as index here).
> 2. manually parse nested attributes for *one* specific type. Such code do
>    nla_type check.
> 
> From the API side, to completely remove nla_for_each_nested and avoid the
> manual  parsing. I think we can choose two solutions.
> 
> Solution-1: add a parsing helper that receives a function pointer as an
>             argument, it will call this pointer after carefully verify the
>             type and length of an attribute.
> 
> Solution-2: add a parsing helper that traverses this nested twice, the first
>             time  to do counting size for allocating heap buffer (or stack
>             buffer from the caller if the max count is known). The second
>             time to fill this buffer with attribute pointers.
> 
> Which one is preferred? Please enlighten me about this and I can try to propose
> a fix. (I personally like the solution-2 as it works like the existing parsers
> like nla_parse) 

Your initial fix was perfectly fine.

We do need a solution for a normal multi-attr parse, but that's not 
the case here.
Lin Ma Aug. 1, 2023, 2:40 a.m. UTC | #4
Hello Jakub,

> 
> Your initial fix was perfectly fine.
> 
> We do need a solution for a normal multi-attr parse, but that's not 
> the case here.

Cool 
Leon Romanovsky Aug. 1, 2023, 8:11 a.m. UTC | #5
On Mon, Jul 31, 2023 at 12:03:26PM -0700, Jakub Kicinski wrote:
> On Mon, 31 Jul 2023 20:12:47 +0800 Lin Ma wrote:
> > In short, the very direct idea to fix such lengh-check-forgotten bug is
> > add nla_len() checks like
> > 
> >   if (nla_len(nla) < SOME_LEN)
> >     return -EINVAL;
> > 
> > However, this is tedious and just like Leon said: add another layer of
> > cabal knowledge. The better solution should leverage the nla_policy and
> > discard nlattr whose length is invalid when doing parsing. That is, we
> > should defined a nested_policy for the X above like
> 
> Hard no. Putting array index into attr type is an advanced case and the
> parsing code has to be able to deal with low level netlink details.

Jakub,

IMHO, you are lowering too much the separation line between simple vs.
advanced use cases. 

I had no idea that my use-case of passing nested netlink array is counted
as advanced usage.

Thanks
Jakub Kicinski Aug. 1, 2023, 5:57 p.m. UTC | #6
On Tue, 1 Aug 2023 11:11:17 +0300 Leon Romanovsky wrote:
> IMHO, you are lowering too much the separation line between simple vs.
> advanced use cases. 
> 
> I had no idea that my use-case of passing nested netlink array is counted
> as advanced usage.

Agreed, that's a fair point. I'm guessing it was inspired by the
ethtool stats? (Which in hindsight were a mistake on my part.)

For the longest time there was no docs or best practices for netlink.
We have the documentation and more infrastructure in place now.
I hope if you wrote the code today the distinction would have been
clearer.

If we start adding APIs for various one-(two?)-offs from the past
we'll never dig ourselves out of the "no idea what's the normal use
of these APIs" hole..
Leon Romanovsky Aug. 1, 2023, 7:49 p.m. UTC | #7
On Tue, Aug 01, 2023 at 10:57:26AM -0700, Jakub Kicinski wrote:
> On Tue, 1 Aug 2023 11:11:17 +0300 Leon Romanovsky wrote:
> > IMHO, you are lowering too much the separation line between simple vs.
> > advanced use cases. 
> > 
> > I had no idea that my use-case of passing nested netlink array is counted
> > as advanced usage.
> 
> Agreed, that's a fair point. I'm guessing it was inspired by the
> ethtool stats? (Which in hindsight were a mistake on my part.)

I don't remember which part of kernel can be blamed for it. :)

> 
> For the longest time there was no docs or best practices for netlink.
> We have the documentation and more infrastructure in place now.
> I hope if you wrote the code today the distinction would have been
> clearer.
> 
> If we start adding APIs for various one-(two?)-offs from the past
> we'll never dig ourselves out of the "no idea what's the normal use
> of these APIs" hole..

I agree with this sentence, just afraid that it is unrealistic goal, due
to extensive flexibility in netlink UAPI toward user-space, which allows
you to shoot yourself in the foot without even noticing it.

Thanks
Lin Ma Aug. 2, 2023, 12:26 a.m. UTC | #8
Hello there,

> For the longest time there was no docs or best practices for netlink.
> We have the documentation and more infrastructure in place now.
> I hope if you wrote the code today the distinction would have been
> clearer.
> 
> If we start adding APIs for various one-(two?)-offs from the past
> we'll never dig ourselves out of the "no idea what's the normal use
> of these APIs" hole..

This is true. Actually, those check missing codes are mostly old codes and
modern netlink consumers will choose the general netlink interface which
can automatically get attributes description from YAML and never need to
do things like *manual parsing* anymore.

However, according to my practice in auditing the code, I found there are
some general netlink interface users confront other issues like choosing
GENL_DONT_VALIDATE_STRICT without thinking or forgetting add a new
nla_policy when introducing new attributes.

To this end, I'm currently writing a simple documentation about Netlink
interface best practices for the kernel developer (the newly coming docs
are mostly about the user API part). 

I guess I will put this doc in Documentation/staging :)
and I will finish the draft ASAP.

Thanks
Lin
Jakub Kicinski Aug. 2, 2023, 12:53 a.m. UTC | #9
On Wed, 2 Aug 2023 08:26:06 +0800 (GMT+08:00) Lin Ma wrote:
> This is true. Actually, those check missing codes are mostly old codes and
> modern netlink consumers will choose the general netlink interface which
> can automatically get attributes description from YAML and never need to
> do things like *manual parsing* anymore.
> 
> However, according to my practice in auditing the code, I found there are
> some general netlink interface users confront other issues like choosing
> GENL_DONT_VALIDATE_STRICT without thinking or forgetting add a new
> nla_policy when introducing new attributes.
> 
> To this end, I'm currently writing a simple documentation about Netlink
> interface best practices for the kernel developer (the newly coming docs
> are mostly about the user API part). 

Keep in mind that even most of the genetlink stuff is pretty old
at this stage. ethtool is probably the first reasonably modern family.
But do send docs, we'll review and go from there :)
diff mbox series

Patch

diff --git a/include/net/netlink.h b/include/net/netlink.h
index b12cd957abb4..d825a5672161 100644
--- a/include/net/netlink.h
+++ b/include/net/netlink.h
@@ -229,6 +229,9 @@  enum nla_policy_validation {
  *                         nested header (or empty); len field is used if
  *                         nested_policy is also used, for the max attr
  *                         number in the nested policy.
+ *                         For NLA_NESTED whose nested nlattr is not necessary,
+ *                         the len field will indicate the exptected length of
+ *                         them for checking.
  *    NLA_U8, NLA_U16,
  *    NLA_U32, NLA_U64,
  *    NLA_S8, NLA_S16,
@@ -372,6 +375,9 @@  struct nla_policy {
 	_NLA_POLICY_NESTED(ARRAY_SIZE(policy) - 1, policy)
 #define NLA_POLICY_NESTED_ARRAY(policy) \
 	_NLA_POLICY_NESTED_ARRAY(ARRAY_SIZE(policy) - 1, policy)
+/* not care about the nested attributes, just do length check */
+#define NLA_POLICY_NESTED_NO_TYPE(length) \
+	_NLA_POLICY_NESTED(length, NULL)
 #define NLA_POLICY_BITFIELD32(valid) \
 	{ .type = NLA_BITFIELD32, .bitfield32_valid = valid }
 
diff --git a/lib/nlattr.c b/lib/nlattr.c
index 489e15bde5c1..29a412b41d28 100644
--- a/lib/nlattr.c
+++ b/lib/nlattr.c
@@ -488,6 +488,18 @@  static int validate_nla(const struct nlattr *nla, int maxtype,
 				 */
 				return err;
 			}
+		} else if (pt->len) {
+			/* length set without nested_policy, the len field will
+			 * be used to check those nested attributes here,
+			 * we will not do parse here but just validation as the
+			 * consumers will do manual parsing.
+			 */
+			const struct nlattr *nla_nested;
+			int rem;
+
+			nla_for_each_attr(nla_nested, nla_data(nla), nla_len(nla), rem)
+				if (nla_len(nla_nested) < pt->len)
+					return -EINVAL;
 		}
 		break;
 	case NLA_NESTED_ARRAY: