diff mbox series

[net-next] net: sysctl data could be in .bss

Message ID 20211020083854.1101670-1-atenart@kernel.org (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series [net-next] net: sysctl data could be in .bss | expand

Checks

Context Check Description
netdev/cover_letter success Single patches do not need cover letters
netdev/fixes_present success Fixes tag not required for -next series
netdev/patch_count success Link
netdev/tree_selection success Clearly marked for net-next
netdev/subject_prefix success Link
netdev/cc_maintainers warning 5 maintainers not CCed: tglx@linutronix.de andriy.shevchenko@linux.intel.com jacob.e.keller@intel.com akpm@linux-foundation.org peterz@infradead.org
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/module_param success Was 0 now: 0
netdev/build_32bit success Errors and warnings before: 18252 this patch: 18252
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/verify_fixes success No Fixes tag
netdev/checkpatch warning CHECK: extern prototypes should be avoided in .h files
netdev/build_allmodconfig_warn success Errors and warnings before: 17670 this patch: 17670
netdev/header_inline success No static functions without inline keyword in header files

Commit Message

Antoine Tenart Oct. 20, 2021, 8:38 a.m. UTC
A check is made when registering non-init netns sysctl files to ensure
their data pointer does not point to a global data section. This works
well for modules as the check is made against the whole module address
space (is_module_address). But when built-in, the check is made against
the .data section. However global variables initialized to 0 can be in
.bss (-fzero-initialized-in-bss).

Add an extra check to make sure the sysctl data does not point to the
.bss section either.

Signed-off-by: Antoine Tenart <atenart@kernel.org>
Reviewed-by: Jonathon Reinhart <jonathon.reinhart@gmail.com>
---

Hello,

This was previously sent as an RFC[1] waiting for a problematic sysctl
to be fixed. The fix was accepted and is now in the nf tree[2].

This is not sent as a fix to avoid possible new warnings in stable
kernels. (The actual fixes of sysctl files should go).

I think this can go through the net-next tree as kernel/extable.c
doesn't seem to be under any subsystem and a conflict is unlikely to
happen.

Thanks,
Antoine

[1] https://lore.kernel.org/all/20211012155542.827631-1-atenart@kernel.org/T/
[2] https://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git/commit/?id=174c376278949c44aad89c514a6b5db6cee8db59

 include/linux/kernel.h | 1 +
 kernel/extable.c       | 8 ++++++++
 net/sysctl_net.c       | 2 +-
 3 files changed, 10 insertions(+), 1 deletion(-)

Comments

Jakub Kicinski Oct. 22, 2021, 8:01 p.m. UTC | #1
Widening the CC list a little.

On Wed, 20 Oct 2021 10:38:54 +0200 Antoine Tenart wrote:
> A check is made when registering non-init netns sysctl files to ensure
> their data pointer does not point to a global data section. This works
> well for modules as the check is made against the whole module address
> space (is_module_address). But when built-in, the check is made against
> the .data section. However global variables initialized to 0 can be in
> .bss (-fzero-initialized-in-bss).
> 
> Add an extra check to make sure the sysctl data does not point to the
> .bss section either.
> 
> Signed-off-by: Antoine Tenart <atenart@kernel.org>
> Reviewed-by: Jonathon Reinhart <jonathon.reinhart@gmail.com>
> ---
> Hello,
> 
> This was previously sent as an RFC[1] waiting for a problematic sysctl
> to be fixed. The fix was accepted and is now in the nf tree[2].
> 
> This is not sent as a fix to avoid possible new warnings in stable
> kernels. (The actual fixes of sysctl files should go).
> 
> I think this can go through the net-next tree as kernel/extable.c
> doesn't seem to be under any subsystem and a conflict is unlikely to
> happen.

> [1] https://lore.kernel.org/all/20211012155542.827631-1-atenart@kernel.org/T/
> [2] https://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git/commit/?id=174c376278949c44aad89c514a6b5db6cee8db59
> 
>  include/linux/kernel.h | 1 +
>  kernel/extable.c       | 8 ++++++++
>  net/sysctl_net.c       | 2 +-
>  3 files changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> index 2776423a587e..beb61d0ab220 100644
> --- a/include/linux/kernel.h
> +++ b/include/linux/kernel.h
> @@ -231,6 +231,7 @@ extern char *next_arg(char *args, char **param, char **val);
>  extern int core_kernel_text(unsigned long addr);
>  extern int init_kernel_text(unsigned long addr);
>  extern int core_kernel_data(unsigned long addr);
> +extern int core_kernel_bss(unsigned long addr);

Is the intention of these helpers to have strict section name semantics
or higher level "is this global kernel data" semantics? If it's the
latter we could make core_kernel_data() check bss instead, chances are
all callers will either want that or not care. Steven?

>  extern int __kernel_text_address(unsigned long addr);
>  extern int kernel_text_address(unsigned long addr);
>  extern int func_ptr_is_kernel_text(void *ptr);
> diff --git a/kernel/extable.c b/kernel/extable.c
> index b0ea5eb0c3b4..477a4b6c8f63 100644
> --- a/kernel/extable.c
> +++ b/kernel/extable.c
> @@ -100,6 +100,14 @@ int core_kernel_data(unsigned long addr)
>  	return 0;
>  }
>  
> +int core_kernel_bss(unsigned long addr)
> +{
> +	if (addr >= (unsigned long)__bss_start &&
> +	    addr < (unsigned long)__bss_stop)
> +		return 1;
> +	return 0;
> +}
> +
>  int __kernel_text_address(unsigned long addr)
>  {
>  	if (kernel_text_address(addr))
> diff --git a/net/sysctl_net.c b/net/sysctl_net.c
> index f6cb0d4d114c..d883cf65029f 100644
> --- a/net/sysctl_net.c
> +++ b/net/sysctl_net.c
> @@ -144,7 +144,7 @@ static void ensure_safe_net_sysctl(struct net *net, const char *path,
>  		addr = (unsigned long)ent->data;
>  		if (is_module_address(addr))
>  			where = "module";
> -		else if (core_kernel_data(addr))
> +		else if (core_kernel_data(addr) || core_kernel_bss(addr))
>  			where = "kernel";
>  		else
>  			continue;
Jonathon Reinhart Nov. 8, 2021, 5:24 a.m. UTC | #2
On Fri, Oct 22, 2021 at 4:01 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> Widening the CC list a little.
>
> On Wed, 20 Oct 2021 10:38:54 +0200 Antoine Tenart wrote:
> > A check is made when registering non-init netns sysctl files to ensure
> > their data pointer does not point to a global data section. This works
> > well for modules as the check is made against the whole module address
> > space (is_module_address). But when built-in, the check is made against
> > the .data section. However global variables initialized to 0 can be in
> > .bss (-fzero-initialized-in-bss).
> >
> > Add an extra check to make sure the sysctl data does not point to the
> > .bss section either.
> >
> > Signed-off-by: Antoine Tenart <atenart@kernel.org>
> > Reviewed-by: Jonathon Reinhart <jonathon.reinhart@gmail.com>
> > ---
> > Hello,
> >
> > This was previously sent as an RFC[1] waiting for a problematic sysctl
> > to be fixed. The fix was accepted and is now in the nf tree[2].
> >
> > This is not sent as a fix to avoid possible new warnings in stable
> > kernels. (The actual fixes of sysctl files should go).
> >
> > I think this can go through the net-next tree as kernel/extable.c
> > doesn't seem to be under any subsystem and a conflict is unlikely to
> > happen.
>
> > [1] https://lore.kernel.org/all/20211012155542.827631-1-atenart@kernel.org/T/
> > [2] https://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git/commit/?id=174c376278949c44aad89c514a6b5db6cee8db59
> >
> >  include/linux/kernel.h | 1 +
> >  kernel/extable.c       | 8 ++++++++
> >  net/sysctl_net.c       | 2 +-
> >  3 files changed, 10 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> > index 2776423a587e..beb61d0ab220 100644
> > --- a/include/linux/kernel.h
> > +++ b/include/linux/kernel.h
> > @@ -231,6 +231,7 @@ extern char *next_arg(char *args, char **param, char **val);
> >  extern int core_kernel_text(unsigned long addr);
> >  extern int init_kernel_text(unsigned long addr);
> >  extern int core_kernel_data(unsigned long addr);
> > +extern int core_kernel_bss(unsigned long addr);
>
> Is the intention of these helpers to have strict section name semantics
> or higher level "is this global kernel data" semantics? If it's the
> latter we could make core_kernel_data() check bss instead, chances are
> all callers will either want that or not care. Steven?

The core_kernel_data() function was introduced in a2d063ac216c1, and
the commit message says:

"It may or may not return true for RO data... This utility function is
used to determine if data is safe from ever being freed. Thus it
should return true for all RW global data that is not in a module or
has been allocated, or false otherwise."

The intent of the function seems to be more in line with the
higher-level "is this global kernel data" semantics you suggested. The
purpose seems to be to differentiate between "part of the loaded
kernel image" vs. a dynamic allocation (which would include a loaded
module image). And given that it *might* return true for RO data
(depending on the arch linker script, presumably), I think it would be
safe to include .bss -- clearly, with that caveat in place, it isn't
promising strict section semantics.

There are only two existing in-tree consumers:

1. __register_ftrace_function() [kernel/trace/ftrace.c] -- Sets
FTRACE_OPS_FL_DYNAMIC if core_kernel_data(ops) returns false, which
denotes "dynamically allocated ftrace_ops which need special care". It
would be unlikely (if not impossible) for the "ops" object in question
to be all-zero and end up in the .bss, but if it were, then the
current behavior would be wrong. IOW, it would be more correct to
include .bss.

2. ensure_safe_net_sysctl() [net/sysctl_net.c] (The subject of this
thread) -- Trying to distinguish "global kernel data" (static/global
variables) from kmalloc-allocated objects. More correct to include
.bss.

Both of these callers only seem to delineate between static and
dynamic object allocations. Put another way, if core_kernel_bss(), all
existing callers should be updated to check core_kernel_data() ||
core_kernel_bss().

Since Steven introduced it, and until I added
ensure_safe_net_sysctl(), he / tracing was the only consumer.

Thinking critically from the C language perspective, I can't come up
with any case where one would actually expect core_kernel_data() to
return true for 'int global = 1' and false for 'int global = 0'.

In conclusion, I agree with your alternative proposal Jakub, and I
think this patch is the right way forward:

diff --git a/kernel/extable.c b/kernel/extable.c
index b0ea5eb0c3b4..8b6f1d0bdaf6 100644
--- a/kernel/extable.c
+++ b/kernel/extable.c
@@ -97,6 +97,9 @@ int core_kernel_data(unsigned long addr)
        if (addr >= (unsigned long)_sdata &&
            addr < (unsigned long)_edata)
                return 1;
+       if (addr >= (unsigned long)__bss_start &&
+           addr < (unsigned long)__bss_stop)
+               return 1;
        return 0;
 }

Jonathon Reinhart
Steven Rostedt Nov. 8, 2021, 4:11 p.m. UTC | #3
On Mon, 8 Nov 2021 00:24:33 -0500
Jonathon Reinhart <jonathon.reinhart@gmail.com> wrote:

> On Fri, Oct 22, 2021 at 4:01 PM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > Widening the CC list a little.

Thanks!

[..]

> The core_kernel_data() function was introduced in a2d063ac216c1, and
> the commit message says:
> 
> "It may or may not return true for RO data... This utility function is
> used to determine if data is safe from ever being freed. Thus it
> should return true for all RW global data that is not in a module or
> has been allocated, or false otherwise."
> 
> The intent of the function seems to be more in line with the
> higher-level "is this global kernel data" semantics you suggested. The
> purpose seems to be to differentiate between "part of the loaded
> kernel image" vs. a dynamic allocation (which would include a loaded
> module image). And given that it *might* return true for RO data
> (depending on the arch linker script, presumably), I think it would be
> safe to include .bss -- clearly, with that caveat in place, it isn't
> promising strict section semantics.
> 
> There are only two existing in-tree consumers:
> 
> 1. __register_ftrace_function() [kernel/trace/ftrace.c] -- Sets
> FTRACE_OPS_FL_DYNAMIC if core_kernel_data(ops) returns false, which
> denotes "dynamically allocated ftrace_ops which need special care". It
> would be unlikely (if not impossible) for the "ops" object in question
> to be all-zero and end up in the .bss, but if it were, then the
> current behavior would be wrong. IOW, it would be more correct to
> include .bss.
> 
> 2. ensure_safe_net_sysctl() [net/sysctl_net.c] (The subject of this
> thread) -- Trying to distinguish "global kernel data" (static/global
> variables) from kmalloc-allocated objects. More correct to include
> .bss.
> 
> Both of these callers only seem to delineate between static and
> dynamic object allocations. Put another way, if core_kernel_bss(), all
> existing callers should be updated to check core_kernel_data() ||
> core_kernel_bss().
> 
> Since Steven introduced it, and until I added
> ensure_safe_net_sysctl(), he / tracing was the only consumer.

I agree with your analysis.

The intent is that allocated ftrace_ops (things that function tracer uses
to know what callbacks are called from function entry), must go through a
very slow synchronization (rcu_synchronize_tasks). But this is not needed
if the ftrace_ops is part of the core kernel (.data or .bss) as that will
never be freed, and thus does not need to worry about it disappearing while
they are still in use.

> 
> Thinking critically from the C language perspective, I can't come up
> with any case where one would actually expect core_kernel_data() to
> return true for 'int global = 1' and false for 'int global = 0'.
> 
> In conclusion, I agree with your alternative proposal Jakub, and I
> think this patch is the right way forward:
> 
> diff --git a/kernel/extable.c b/kernel/extable.c
> index b0ea5eb0c3b4..8b6f1d0bdaf6 100644
> --- a/kernel/extable.c
> +++ b/kernel/extable.c
> @@ -97,6 +97,9 @@ int core_kernel_data(unsigned long addr)
>         if (addr >= (unsigned long)_sdata &&
>             addr < (unsigned long)_edata)
>                 return 1;
> +       if (addr >= (unsigned long)__bss_start &&
> +           addr < (unsigned long)__bss_stop)
> +               return 1;
>         return 0;
>  }

Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

-- Steve
Antoine Tenart Nov. 15, 2021, 9:15 a.m. UTC | #4
Thanks Jonathon for the analysis and Steven for the review! I'll send
the patch formally then.

Antoine

Quoting Steven Rostedt (2021-11-08 17:11:14)
> On Mon, 8 Nov 2021 00:24:33 -0500
> Jonathon Reinhart <jonathon.reinhart@gmail.com> wrote:
> 
> > On Fri, Oct 22, 2021 at 4:01 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > >
> > > Widening the CC list a little.
> 
> Thanks!
> 
> [..]
> 
> > The core_kernel_data() function was introduced in a2d063ac216c1, and
> > the commit message says:
> > 
> > "It may or may not return true for RO data... This utility function is
> > used to determine if data is safe from ever being freed. Thus it
> > should return true for all RW global data that is not in a module or
> > has been allocated, or false otherwise."
> > 
> > The intent of the function seems to be more in line with the
> > higher-level "is this global kernel data" semantics you suggested. The
> > purpose seems to be to differentiate between "part of the loaded
> > kernel image" vs. a dynamic allocation (which would include a loaded
> > module image). And given that it *might* return true for RO data
> > (depending on the arch linker script, presumably), I think it would be
> > safe to include .bss -- clearly, with that caveat in place, it isn't
> > promising strict section semantics.
> > 
> > There are only two existing in-tree consumers:
> > 
> > 1. __register_ftrace_function() [kernel/trace/ftrace.c] -- Sets
> > FTRACE_OPS_FL_DYNAMIC if core_kernel_data(ops) returns false, which
> > denotes "dynamically allocated ftrace_ops which need special care". It
> > would be unlikely (if not impossible) for the "ops" object in question
> > to be all-zero and end up in the .bss, but if it were, then the
> > current behavior would be wrong. IOW, it would be more correct to
> > include .bss.
> > 
> > 2. ensure_safe_net_sysctl() [net/sysctl_net.c] (The subject of this
> > thread) -- Trying to distinguish "global kernel data" (static/global
> > variables) from kmalloc-allocated objects. More correct to include
> > .bss.
> > 
> > Both of these callers only seem to delineate between static and
> > dynamic object allocations. Put another way, if core_kernel_bss(), all
> > existing callers should be updated to check core_kernel_data() ||
> > core_kernel_bss().
> > 
> > Since Steven introduced it, and until I added
> > ensure_safe_net_sysctl(), he / tracing was the only consumer.
> 
> I agree with your analysis.
> 
> The intent is that allocated ftrace_ops (things that function tracer uses
> to know what callbacks are called from function entry), must go through a
> very slow synchronization (rcu_synchronize_tasks). But this is not needed
> if the ftrace_ops is part of the core kernel (.data or .bss) as that will
> never be freed, and thus does not need to worry about it disappearing while
> they are still in use.
> 
> > 
> > Thinking critically from the C language perspective, I can't come up
> > with any case where one would actually expect core_kernel_data() to
> > return true for 'int global = 1' and false for 'int global = 0'.
> > 
> > In conclusion, I agree with your alternative proposal Jakub, and I
> > think this patch is the right way forward:
> > 
> > diff --git a/kernel/extable.c b/kernel/extable.c
> > index b0ea5eb0c3b4..8b6f1d0bdaf6 100644
> > --- a/kernel/extable.c
> > +++ b/kernel/extable.c
> > @@ -97,6 +97,9 @@ int core_kernel_data(unsigned long addr)
> >         if (addr >= (unsigned long)_sdata &&
> >             addr < (unsigned long)_edata)
> >                 return 1;
> > +       if (addr >= (unsigned long)__bss_start &&
> > +           addr < (unsigned long)__bss_stop)
> > +               return 1;
> >         return 0;
> >  }
> 
> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> 
> -- Steve
>
diff mbox series

Patch

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 2776423a587e..beb61d0ab220 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -231,6 +231,7 @@  extern char *next_arg(char *args, char **param, char **val);
 extern int core_kernel_text(unsigned long addr);
 extern int init_kernel_text(unsigned long addr);
 extern int core_kernel_data(unsigned long addr);
+extern int core_kernel_bss(unsigned long addr);
 extern int __kernel_text_address(unsigned long addr);
 extern int kernel_text_address(unsigned long addr);
 extern int func_ptr_is_kernel_text(void *ptr);
diff --git a/kernel/extable.c b/kernel/extable.c
index b0ea5eb0c3b4..477a4b6c8f63 100644
--- a/kernel/extable.c
+++ b/kernel/extable.c
@@ -100,6 +100,14 @@  int core_kernel_data(unsigned long addr)
 	return 0;
 }
 
+int core_kernel_bss(unsigned long addr)
+{
+	if (addr >= (unsigned long)__bss_start &&
+	    addr < (unsigned long)__bss_stop)
+		return 1;
+	return 0;
+}
+
 int __kernel_text_address(unsigned long addr)
 {
 	if (kernel_text_address(addr))
diff --git a/net/sysctl_net.c b/net/sysctl_net.c
index f6cb0d4d114c..d883cf65029f 100644
--- a/net/sysctl_net.c
+++ b/net/sysctl_net.c
@@ -144,7 +144,7 @@  static void ensure_safe_net_sysctl(struct net *net, const char *path,
 		addr = (unsigned long)ent->data;
 		if (is_module_address(addr))
 			where = "module";
-		else if (core_kernel_data(addr))
+		else if (core_kernel_data(addr) || core_kernel_bss(addr))
 			where = "kernel";
 		else
 			continue;