tracing: Stop FORTIFY_SOURCE complaining about stack trace caller

Message ID 20230712105235.5fc441aa@gandalf.local.home (mailing list archive)
State Accepted
Commit bec3c25c247c4f88a33d79675a09e1644c9a3114
Series tracing: Stop FORTIFY_SOURCE complaining about stack trace caller

Commit Message

Steven Rostedt July 12, 2023, 2:52 p.m. UTC
From: "Steven Rostedt (Google)" <rostedt@goodmis.org>

The stack_trace event is an event created by the tracing subsystem to
store stack traces. It originally just contained a hard-coded array of 8
words to hold the stack, and a "size" field to know how many entries are in it.
This is exported to user space as:

name: kernel_stack
ID: 4
format:
	field:unsigned short common_type;	offset:0;	size:2;	signed:0;
	field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
	field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
	field:int common_pid;	offset:4;	size:4;	signed:1;

	field:int size;	offset:8;	size:4;	signed:1;
	field:unsigned long caller[8];	offset:16;	size:64;	signed:0;

print fmt: "\t=> %ps\n\t=> %ps\n\t=> %ps\n" "\t=> %ps\n\t=> %ps\n\t=> %ps\n" "\t=> %ps\n\t=> %ps\n",
 (void *)REC->caller[0], (void *)REC->caller[1], (void *)REC->caller[2],
 (void *)REC->caller[3], (void *)REC->caller[4], (void *)REC->caller[5],
 (void *)REC->caller[6], (void *)REC->caller[7]

This is where the user space tracers can parse the stack. The library was
updated for this specific event to look only at the size, and not at the
array. But some older users still look at the array (note, the older code
still checks to make sure the array fits inside the event that it read;
that is, if only 4 words were saved, the parser will not read the fifth
word because it can see that it is outside of the event size).
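
As a rough illustration of that guard (this is not libtraceevent's
actual code, and the names below are made up), an older parser that
still trusts the caller[8] layout could do something like:

	#include <stddef.h>
	#include <stdio.h>

	/* Simplified record layout; the real event also carries the common
	 * fields before "size", so the offsets differ from the format above. */
	struct kernel_stack_record {
		int		size;
		unsigned long	caller[8];
	};

	static void print_stack(const void *data, size_t record_len)
	{
		const struct kernel_stack_record *rec = data;
		size_t end = offsetof(struct kernel_stack_record, caller);

		for (int i = 0; i < 8; i++) {
			/* Never read past the size of the record handed in. */
			end += sizeof(rec->caller[0]);
			if (end > record_len)
				break;
			printf("\t=> 0x%lx\n", rec->caller[i]);
		}
	}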

This event was changed a while ago to be more dynamic, and would save a
full stack even if it was greater than 8 words. It does this by simply
allocating more ring buffer to hold the extra words. Then it copies in the
stack via:

	memcpy(&entry->caller, fstack->calls, size);

As the entry is a struct stack_entry, which is created by a macro that both
defines the structure and exports it to user space, the caller field of the
entry is still defined as: unsigned long caller[8].

When the stack is greater than 8 entries, the FORTIFY_SOURCE code notices
that the amount being copied is greater than the destination array and
complains about it. It has no idea that the destination points into the
ring buffer, which was allocated with the required size.

To hide this from the FORTIFY_SOURCE logic, pointer arithmetic is used:

	ptr = ring_buffer_event_data(event);
	entry = ptr;
	ptr += offsetof(typeof(*entry), caller);
	memcpy(ptr, fstack->calls, size);
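
Roughly why this works (a user-space sketch, not the kernel code; the
struct and function names are made up): a fortified memcpy() can see the
declared size of a destination struct member, but not of a pointer
computed with offsetof():

	#include <stddef.h>
	#include <string.h>

	struct fake_stack_entry {
		int		size;
		unsigned long	caller[8];
	};

	void copy_member(struct fake_stack_entry *entry,
			 const unsigned long *calls, size_t size)
	{
		/* The destination is a member with a known, fixed size
		 * (8 longs), so a fortified memcpy() complains whenever
		 * size > sizeof(entry->caller). */
		memcpy(&entry->caller, calls, size);
	}

	void copy_via_ptr(struct fake_stack_entry *entry,
			  const unsigned long *calls, size_t size)
	{
		/* The computed pointer carries no member size information,
		 * so there is nothing for the fortifier to check against. */
		void *ptr = (char *)entry + offsetof(struct fake_stack_entry, caller);

		memcpy(ptr, calls, size);
	}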

Link: https://lore.kernel.org/all/20230612160748.4082850-1-svens@linux.ibm.com/

Reported-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 kernel/trace/trace.c | 21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

Comments

Sven Schnelle July 12, 2023, 6:22 p.m. UTC | #1
Steven Rostedt <rostedt@goodmis.org> writes:

> From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
>
> The stack_trace event is an event created by the tracing subsystem to
> store stack traces. It originally just contained a hard coded array of 8
> words to hold the stack, and a "size" to know how many entries are there.
> This is exported to user space as:
>
> name: kernel_stack
> ID: 4
> format:
> 	field:unsigned short common_type;	offset:0;	size:2;	signed:0;
> 	field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
> 	field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
> 	field:int common_pid;	offset:4;	size:4;	signed:1;
>
> 	field:int size;	offset:8;	size:4;	signed:1;
> 	field:unsigned long caller[8];	offset:16;	size:64;	signed:0;
>
> print fmt: "\t=> %ps\n\t=> %ps\n\t=> %ps\n" "\t=> %ps\n\t=> %ps\n\t=> %ps\n" "\t=> %ps\n\t=> %ps\n",
>  (void *)REC->caller[0], (void *)REC->caller[1], (void *)REC->caller[2],
>  (void *)REC->caller[3], (void *)REC->caller[4], (void *)REC->caller[5],
>  (void *)REC->caller[6], (void *)REC->caller[7]
>
> Where the user space tracers could parse the stack. The library was
> updated for this specific event to only look at the size, and not the
> array. But some older users still look at the array (note, the older code
> still checks to make sure the array fits inside the event that it read.
> That is, if only 4 words were saved, the parser would not read the fifth
> word because it will see that it was outside of the event size).
>
> This event was changed a while ago to be more dynamic, and would save a
> full stack even if it was greater than 8 words. It does this by simply
> allocating more ring buffer to hold the extra words. Then it copies in the
> stack via:
>
> 	memcpy(&entry->caller, fstack->calls, size);
>
> As the entry is struct stack_entry, that is created by a macro to both
> create the structure and export this to user space, it still had the caller
> field of entry defined as: unsigned long caller[8].
>
> When the stack is greater than 8, the FORTIFY_SOURCE code notices that the
> amount being copied is greater than the destination array and complains about
> it. It has no idea that the destination is pointing to the ring buffer with the
> required allocation.
>
> To hide this from the FORTIFY_SOURCE logic, pointer arithmetic is used:
>
> 	ptr = ring_buffer_event_data(event);
> 	entry = ptr;
> 	ptr += offsetof(typeof(*entry), caller);
> 	memcpy(ptr, fstack->calls, size);
>
> Link: https://lore.kernel.org/all/20230612160748.4082850-1-svens@linux.ibm.com/
>
> Reported-by: Sven Schnelle <svens@linux.ibm.com>
> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> ---
>  kernel/trace/trace.c | 21 +++++++++++++++++++--
>  1 file changed, 19 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index 4529e264cb86..20122eeccf97 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -3118,6 +3118,7 @@ static void __ftrace_trace_stack(struct trace_buffer *buffer,
>  	struct ftrace_stack *fstack;
>  	struct stack_entry *entry;
>  	int stackidx;
> +	void *ptr;
>  
>  	/*
>  	 * Add one, for this function and the call to save_stack_trace()
> @@ -3161,9 +3162,25 @@ static void __ftrace_trace_stack(struct trace_buffer *buffer,
>  				    trace_ctx);
>  	if (!event)
>  		goto out;
> -	entry = ring_buffer_event_data(event);
> +	ptr = ring_buffer_event_data(event);
> +	entry = ptr;
> +
> +	/*
> +	 * For backward compatibility reasons, the entry->caller is an
> +	 * array of 8 slots to store the stack. This is also exported
> +	 * to user space. The amount allocated on the ring buffer actually
> +	 * holds enough for the stack specified by nr_entries. This will
> +	 * go into the location of entry->caller. Due to string fortifiers
> +	 * checking the size of the destination of memcpy() it triggers
> +	 * when it detects that size is greater than 8. To hide this from
> +	 * the fortifiers, we use "ptr" and pointer arithmetic to assign caller.
> +	 *
> +	 * The below is really just:
> +	 *   memcpy(&entry->caller, fstack->calls, size);
> +	 */
> +	ptr += offsetof(typeof(*entry), caller);
> +	memcpy(ptr, fstack->calls, size);
>  
> -	memcpy(&entry->caller, fstack->calls, size);
>  	entry->size = nr_entries;
>  
>  	if (!call_filter_check_discard(call, entry, buffer, event))

Works, so:

Tested-by: Sven Schnelle <svens@linux.ibm.com>
Kees Cook July 12, 2023, 11:36 p.m. UTC | #2
On Wed, Jul 12, 2023 at 10:52:35AM -0400, Steven Rostedt wrote:
> From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
> 
> The stack_trace event is an event created by the tracing subsystem to
> store stack traces. It originally just contained a hard coded array of 8
> words to hold the stack, and a "size" to know how many entries are there.
> This is exported to user space as:
> 
> name: kernel_stack
> ID: 4
> format:
> 	field:unsigned short common_type;	offset:0;	size:2;	signed:0;
> 	field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
> 	field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
> 	field:int common_pid;	offset:4;	size:4;	signed:1;
> 
> 	field:int size;	offset:8;	size:4;	signed:1;
> 	field:unsigned long caller[8];	offset:16;	size:64;	signed:0;
> 
> print fmt: "\t=> %ps\n\t=> %ps\n\t=> %ps\n" "\t=> %ps\n\t=> %ps\n\t=> %ps\n" "\t=> %ps\n\t=> %ps\n",
>  (void *)REC->caller[0], (void *)REC->caller[1], (void *)REC->caller[2],
>  (void *)REC->caller[3], (void *)REC->caller[4], (void *)REC->caller[5],
>  (void *)REC->caller[6], (void *)REC->caller[7]
> 
> Where the user space tracers could parse the stack. The library was
> updated for this specific event to only look at the size, and not the
> array. But some older users still look at the array (note, the older code
> still checks to make sure the array fits inside the event that it read.
> That is, if only 4 words were saved, the parser would not read the fifth
> word because it will see that it was outside of the event size).
> 
> This event was changed a while ago to be more dynamic, and would save a
> full stack even if it was greater than 8 words. It does this by simply
> allocating more ring buffer to hold the extra words. Then it copies in the
> stack via:
> 
> 	memcpy(&entry->caller, fstack->calls, size);
> 
> As the entry is struct stack_entry, that is created by a macro to both
> create the structure and export this to user space, it still had the caller
> field of entry defined as: unsigned long caller[8].
> 
> When the stack is greater than 8, the FORTIFY_SOURCE code notices that the
> amount being copied is greater than the destination array and complains about
> it. It has no idea that the destination is pointing to the ring buffer with the
> required allocation.
> 
> To hide this from the FORTIFY_SOURCE logic, pointer arithmetic is used:
> 
> 	ptr = ring_buffer_event_data(event);
> 	entry = ptr;
> 	ptr += offsetof(typeof(*entry), caller);
> 	memcpy(ptr, fstack->calls, size);

But... Why are you lying to the compiler? Why not just make this
dynamically sized for real? It's not a "struct stack_entry" if it might
be bigger.

Just create a new struct that isn't lying? (Dealing with the "minimum
size" issue for a dynamic array is usually done with unions, but
ftrace's structs are ... different. As such, I just added a one-off
union.) Here, and you can be the first user of __counted_by too:

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 4529e264cb86..40935578c365 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -3108,6 +3108,14 @@ struct ftrace_stacks {
 static DEFINE_PER_CPU(struct ftrace_stacks, ftrace_stacks);
 static DEFINE_PER_CPU(int, ftrace_stack_reserve);
 
+union stack_entry_dynamic {
+	struct stack_entry entry;
+	struct {
+		int size;
+		unsigned long caller[] __counted_by(size);
+	};
+};
+
 static void __ftrace_trace_stack(struct trace_buffer *buffer,
 				 unsigned int trace_ctx,
 				 int skip, struct pt_regs *regs)
@@ -3116,7 +3124,7 @@ static void __ftrace_trace_stack(struct trace_buffer *buffer,
 	struct ring_buffer_event *event;
 	unsigned int size, nr_entries;
 	struct ftrace_stack *fstack;
-	struct stack_entry *entry;
+	union stack_entry_dynamic *entry;
 	int stackidx;
 
 	/*
@@ -3155,16 +3163,15 @@ static void __ftrace_trace_stack(struct trace_buffer *buffer,
 		nr_entries = stack_trace_save(fstack->calls, size, skip);
 	}
 
-	size = nr_entries * sizeof(unsigned long);
 	event = __trace_buffer_lock_reserve(buffer, TRACE_STACK,
-				    (sizeof(*entry) - sizeof(entry->caller)) + size,
+				    struct_size(entry, caller, nr_entries),
 				    trace_ctx);
 	if (!event)
 		goto out;
 	entry = ring_buffer_event_data(event);
 
-	memcpy(&entry->caller, fstack->calls, size);
 	entry->size = nr_entries;
+	memcpy(entry->caller, fstack->calls, flex_array_size(entry, caller, nr_entries));
 
 	if (!call_filter_check_discard(call, entry, buffer, event))
 		__buffer_unlock_commit(buffer, event);
Steven Rostedt July 13, 2023, 12:43 a.m. UTC | #3
On Wed, 12 Jul 2023 16:36:30 -0700
Kees Cook <keescook@chromium.org> wrote:

> > 
> > To hide this from the FORTIFY_SOURCE logic, pointer arithmetic is used:
> > 
> > 	ptr = ring_buffer_event_data(event);
> > 	entry = ptr;
> > 	ptr += offsetof(typeof(*entry), caller);
> > 	memcpy(ptr, fstack->calls, size);  
> 
> But... Why are you lying to the compiler? Why not just make this
> dynamically sized for real? It's not a "struct stack_entry" if it might
> be bigger.

I was waiting for some controversy from this patch ;-)

> 
> Just create a new struct that isn't lying? (Dealing with the "minimum
> size" issue for a dynamic array is usually done with unions, but
> ftrace's structs are ... different. As such, I just added a one-off
> union.) Here, and you can be the first user of __counted_by too:
> 
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index 4529e264cb86..40935578c365 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -3108,6 +3108,14 @@ struct ftrace_stacks {
>  static DEFINE_PER_CPU(struct ftrace_stacks, ftrace_stacks);
>  static DEFINE_PER_CPU(int, ftrace_stack_reserve);
>  
> +union stack_entry_dynamic {
> +	struct stack_entry entry;
> +	struct {
> +		int size;
> +		unsigned long caller[] __counted_by(size);
> +	};
> +};

This actually makes it more likely to cause a bug in the future (and the
present!). Now we need to know that if stack_entry were to ever change, we
have to change this too. And it can change (although highly unlikely).

The problem comes from this structure being created by a macro that creates
both the structure and what is exported to user space.

From kernel/trace/trace_entries.h: The macro that creates this structure.

FTRACE_ENTRY(kernel_stack, stack_entry,

	TRACE_STACK,

	F_STRUCT(
		__field(	int,		size	)
		__array(	unsigned long,	caller,	FTRACE_STACK_ENTRIES	)
	),

	F_printk("\t=> %ps\n\t=> %ps\n\t=> %ps\n"
		 "\t=> %ps\n\t=> %ps\n\t=> %ps\n"
		 "\t=> %ps\n\t=> %ps\n",
		 (void *)__entry->caller[0], (void *)__entry->caller[1],
		 (void *)__entry->caller[2], (void *)__entry->caller[3],
		 (void *)__entry->caller[4], (void *)__entry->caller[5],
		 (void *)__entry->caller[6], (void *)__entry->caller[7])
);

Now, this event is unique, as it's the only event that has a dynamic array
that is not done the way other dynamic arrays are done, which is to
insert a field that holds an offset and a length. That is, other events
would look like this:

struct stack_entry {
	int		size;
	short		offset, length;
	[ more fields ]
	int		dynamic[];
}

Where offset would be the ((void *)(struct stack_entry *)data) + offset. As
all the dynamic size portions of an event are at the end of the event, with
these offset/length tuples to tell user space and the kernel where to look
in the event binary data for those fields.
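
For reference, such a field shows up in the format file as a "__data_loc"
entry: a 32-bit value packing the offset of the data (from the start of the
entry) in the low 16 bits and its length in bytes in the high 16 bits. A
sketch of how a parser resolves it (the names are made up):

#include <stdint.h>

/* Resolve a "__data_loc" style field into a pointer and a length. */
static const void *get_dynamic_array(const void *entry, uint32_t data_loc,
				     uint16_t *len)
{
	*len = data_loc >> 16;					/* length in bytes */
	return (const char *)entry + (data_loc & 0xffff);	/* offset in entry */
}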

But to keep backward compatibility, the caller[8] field was kept, as this
event had code specific for parsing it in libtraceevent that doesn't expect
that offset/length tuple, and instead just looked at the "caller[8]" portion
to see that it had 8 entries (this is static for all events). New code uses
the size to know how many entries were stored, and ignores the [8]. The
event meta data gives the actual size of the stored data, so the parser
knows not to go beyond that.

Note, this event existed before normal trace events that had the dynamic
arrays, which is why it didn't do the same.

The only thing this code is doing is filling in the ring buffer. The entry
structure is just a helper to know where to place the data in the ring
buffer.

So my question to you is, what's the purpose of hardening? To prevent
future bugs, right? By making a shadow struct, we are now burdened to
remember to modify the shadow struct if we ever modify this current
structure.

Oh, to further my point, your change is buggy too (and I almost didn't even
notice it!)

> +
>  static void __ftrace_trace_stack(struct trace_buffer *buffer,
>  				 unsigned int trace_ctx,
>  				 int skip, struct pt_regs *regs)
> @@ -3116,7 +3124,7 @@ static void __ftrace_trace_stack(struct trace_buffer *buffer,
>  	struct ring_buffer_event *event;
>  	unsigned int size, nr_entries;
>  	struct ftrace_stack *fstack;
> -	struct stack_entry *entry;
> +	union stack_entry_dynamic *entry;
>  	int stackidx;
>  
>  	/*
> @@ -3155,16 +3163,15 @@ static void __ftrace_trace_stack(struct trace_buffer *buffer,
>  		nr_entries = stack_trace_save(fstack->calls, size, skip);
>  	}
>  
> -	size = nr_entries * sizeof(unsigned long);
>  	event = __trace_buffer_lock_reserve(buffer, TRACE_STACK,
> -				    (sizeof(*entry) - sizeof(entry->caller)) + size,
> +				    struct_size(entry, caller, nr_entries),
>  				    trace_ctx);

Your definition of stack_entry_dynamic is wrong, because stack_entry is
really (as created by the macro, hence why this is error prone):

struct stack_entry {
	struct trace_entry	ent;
	int			size;
	long			caller[8];
};

So you didn't allocate enough, and are writing into the wrong part of the
data, corrupting it.
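
A shadow union that actually matched the generated layout would have to
carry the trace_entry header as well, something like this sketch (which is
exactly the kind of duplication that has to be kept in sync by hand):

union stack_entry_dynamic {
	struct stack_entry entry;
	struct {
		struct trace_entry ent;
		int size;
		unsigned long caller[] __counted_by(size);
	};
};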

This is why I much rather prefer the simple:

 	ptr += offsetof(typeof(*entry), caller);
 	memcpy(ptr, fstack->calls, size);

Which doesn't need to care about synchronizing with the macro magic of the
structure, which may change in the future, and this will lead to one more
location that would need to be updated, but likely forgotten.

C is fun, let's go shopping!

-- Steve


>  	if (!event)
>  		goto out;
>  	entry = ring_buffer_event_data(event);
>  
> -	memcpy(&entry->caller, fstack->calls, size);
>  	entry->size = nr_entries;
> +	memcpy(entry->caller, fstack->calls, flex_array_size(entry, caller, nr_entries));
>  
>  	if (!call_filter_check_discard(call, entry, buffer, event))
>  		__buffer_unlock_commit(buffer, event);
>
Kees Cook July 13, 2023, 5:22 a.m. UTC | #4
On Wed, Jul 12, 2023 at 08:43:58PM -0400, Steven Rostedt wrote:
> On Wed, 12 Jul 2023 16:36:30 -0700
> Kees Cook <keescook@chromium.org> wrote:
> > [...]
> > +union stack_entry_dynamic {
> > +	struct stack_entry entry;
> > +	struct {
> > +		int size;
> > +		unsigned long caller[] __counted_by(size);
> > +	};
> > +};
> 
> This actually makes it more likely to cause a bug in the future (and the
> present!). Now we need to know that if stack_entry were to ever change, we
> have to change this too. And it can change (although highly unlikely).

Yeah, I should have included a bit more, like:
BUILD_BUG_ON(offsetof(struct stack_entry, caller) != offsetof(union stack_entry_dynamic, caller))

But anyway, I think we can still do better. :)

> [...]
> Now, this event is unique, as it's the only event that has a dynamic array
> that is not done by the way other dynamic arrays are done. Which is to
> insert a field that has an offset and length to it. That is, other events
> would look like this:
> 
> struct stack_entry {
> 	int		size;
> 	short		offset; length;
> 	[ more fields ]
> 	int		dynamic[];
> }

Yeah, it won't be a "true" trace dynamic array. stack_entry is the
sockaddr of trace. :) (i.e. we want to change the size of the trailing
array without actually changing the struct, but this way leads to
madness.)

> Where offset would be the ((void *)(struct stack_entry *)data) + offset. As
> all the dynamic size portions of an event are at the end of the event, with
> these offset/length tuples to tell user space and the kernel where to look
> in the event binary data for those fields.
> 
> But to keep backward compatibility, as this event had code specific for
> parsing it in libtraceevent that doesn't expect that offset/length tuple,
> and instead just looked at the "caller[8]" portion to see that it had 8
> fields (this is static for all events). New code uses the size to know, and
> ignores the [8]. The event meta data gives the actual size of the stored
> data so the parser knows not to go beyond that.

Right -- so, it would have been much nicer to use a new struct when
caller[8] wasn't enough. :) But that ship has sailed. Now we're in
a state where some tools expect caller[8] and some expect "caller[]
__counted_by(size)". In other places where we've had a "minimum sized
flexible array" requirement, we would've used a union, effectively like:

	int size;
	union {
		some_type legacy_padding[8];
		some_type caller[] __counted_by(size);
	};

Old code ends up with the same sized struct, and new code gets a proper
flexible array with the original struct member name. (And with
__counted_by, we'll have the capacity to do run-time bounds checking on
the flexible array.)

> 
> Note, this event existed before normal trace events that had the dynamic
> arrays, which is why it didn't do the same.
> 
> The only thing this code is doing is filling in the ring buffer. The entry
> structure is just a helper to know where to place the data in the ring
> buffer.
> 
> So my question to you is, what's the purpose of hardening? To prevent
> future bugs, right? By making a shadow struct, we are now burdened to
> remember to modify the shadow struct if we ever modify this current
> structure.

Right, I'd rather avoid a shadow struct -- it's especially weird here
given the special way trace creates its structures. :P The hardening
efforts are mainly holistic protections, in the sense that much of the
kernel isn't doing this kind of careful length management, etc. But that
means we can't abuse fixed-sized arrays anymore -- the compiler needs
to believe what it's told about sizes. ;)

> [...]
> This is why I much rather prefer the simple:
> 
>  	ptr += offsetof(typeof(*entry), caller);
>  	memcpy(ptr, fstack->calls, size);
> 
> Which doesn't need to care about synchronizing with the macro magic of the
> structure, which may change in the future, and this will lead to one more
> location that would need to be updated, but likely forgotten.

I am just thinking this will kick the can down the road. The compiler
may get smart enough to see through this or who knows what next.

> C is fun, let's go shopping!

Yup. I'm just trying to take away all the foot-guns. :) Let me try
another way to solve this that is less of a bypass. For now, sure, let's
take the work-around, but I think it'll need to be fixed up soon.

-Kees
Kees Cook July 13, 2023, 6:08 a.m. UTC | #5
On Wed, Jul 12, 2023 at 10:22:35PM -0700, Kees Cook wrote:
> But anyway, I think we can still do better. :)

Okay, what about this? I'm really not sure how to test this
appropriately, but it does appear to build. (famous last words)

This creates the union, but I think it ends up only being visible for
trace.c; everything else thinks it's just __array.

Anyway, maybe I'm closer, but trace internals are hurting my head right
now...

-Kees


diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 4529e264cb86..fba49d6c590c 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -3157,7 +3157,7 @@ static void __ftrace_trace_stack(struct trace_buffer *buffer,
 
 	size = nr_entries * sizeof(unsigned long);
 	event = __trace_buffer_lock_reserve(buffer, TRACE_STACK,
-				    (sizeof(*entry) - sizeof(entry->caller)) + size,
+				    (sizeof(*entry) - sizeof(entry->__legacy_caller)) + size,
 				    trace_ctx);
 	if (!event)
 		goto out;
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index ed7906b13f09..29be88ad6227 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -83,6 +83,12 @@ enum trace_type {
 #undef __dynamic_array
 #define __dynamic_array(type, item)	type	item[];
 
+#define __min_array(type, item, size)		\
+	union {					\
+		type __legacy_ ## item[size];	\
+		DECLARE_FLEX_ARRAY(type, item);	\
+	};
+
 #undef __rel_dynamic_array
 #define __rel_dynamic_array(type, item)	type	item[];
 
diff --git a/kernel/trace/trace_entries.h b/kernel/trace/trace_entries.h
index 340b2fa98218..311a6c338385 100644
--- a/kernel/trace/trace_entries.h
+++ b/kernel/trace/trace_entries.h
@@ -190,7 +190,7 @@ FTRACE_ENTRY(kernel_stack, stack_entry,
 
 	F_STRUCT(
 		__field(	int,		size	)
-		__array(	unsigned long,	caller,	FTRACE_STACK_ENTRIES	)
+		__min_array(	unsigned long,	caller,	FTRACE_STACK_ENTRIES	)
 	),
 
 	F_printk("\t=> %ps\n\t=> %ps\n\t=> %ps\n"
diff --git a/kernel/trace/trace_export.c b/kernel/trace/trace_export.c
index 58f3946081e2..8f6dcd616d85 100644
--- a/kernel/trace/trace_export.c
+++ b/kernel/trace/trace_export.c
@@ -57,6 +57,9 @@ static int ftrace_event_register(struct trace_event_call *call,
 #undef __dynamic_array
 #define __dynamic_array(type, item)			type item[];
 
+#undef __min_array
+#define __min_array(type, item, size)			type item[size];
+
 #undef F_STRUCT
 #define F_STRUCT(args...)				args
 
@@ -123,6 +126,9 @@ static void __always_unused ____ftrace_check_##name(void)		\
 	.size = 0, .align = __alignof__(_type),				\
 	is_signed_type(_type), .filter_type = FILTER_OTHER },
 
+#undef __min_array
+#define __min_array(_type, _item, _len) __array(_type, _item, _len)
+
 #undef FTRACE_ENTRY
 #define FTRACE_ENTRY(name, struct_name, id, tstruct, print)		\
 static struct trace_event_fields ftrace_event_fields_##name[] = {	\
@@ -155,6 +161,9 @@ static struct trace_event_fields ftrace_event_fields_##name[] = {	\
 #undef __dynamic_array
 #define __dynamic_array(type, item)
 
+#undef __min_array
+#define __min_array(type, item, len)
+
 #undef F_printk
 #define F_printk(fmt, args...) __stringify(fmt) ", "  __stringify(args)
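
For struct stack_entry, the __min_array() form above would expand to
roughly the following for the kernel (a sketch; the real definition is
generated by the FTRACE_ENTRY() macros):

struct stack_entry {
	struct trace_entry	ent;
	int			size;
	union {
		unsigned long	__legacy_caller[FTRACE_STACK_ENTRIES];
		DECLARE_FLEX_ARRAY(unsigned long, caller);
	};
};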
Steven Rostedt July 13, 2023, 12:18 p.m. UTC | #6
On Wed, 12 Jul 2023 23:08:37 -0700
Kees Cook <keescook@chromium.org> wrote:

> On Wed, Jul 12, 2023 at 10:22:35PM -0700, Kees Cook wrote:
> > But anyway, I think we can still do better. :)  
> 
> Okay, what about this? I'm really not sure how to test this
> appropriately, but it does appear to build. (famous last words)
> 
> This creates the union, but I think it ends up only being visible for
> trace.c; everything else thinks it's just __array.
> 
> Anyway, maybe I'm closer, but trace internals are hurting my head right
> now...

You need to wear your Macro Wizard hat to protect from brain hurt.

Anyway, I was thinking about this too, and the correct approach is this
patch. But this is something I don't want to push into the -rc release,
and instead will wait till the next merge window. I'd rather have the
simple pointer arithmetic for now.

But you were close. Instead of doing a union, what I did was to
expose one thing to user space and, for the kernel, make the
structure use a real dynamic array.

This patch is on top of the current code, but the real patch is on top
of my pointer arithmetic change. (I did a "git diff HEAD~1")

But simple testing shows that this does work.

-- Steve

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 4529e264cb86..ca8d3aa1058a 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -3155,16 +3155,16 @@ static void __ftrace_trace_stack(struct trace_buffer *buffer,
 		nr_entries = stack_trace_save(fstack->calls, size, skip);
 	}
 
-	size = nr_entries * sizeof(unsigned long);
 	event = __trace_buffer_lock_reserve(buffer, TRACE_STACK,
-				    (sizeof(*entry) - sizeof(entry->caller)) + size,
+				    struct_size(entry, caller, nr_entries),
 				    trace_ctx);
 	if (!event)
 		goto out;
 	entry = ring_buffer_event_data(event);
 
-	memcpy(&entry->caller, fstack->calls, size);
 	entry->size = nr_entries;
+	memcpy(&entry->caller, fstack->calls,
+	       flex_array_size(entry, caller, nr_entries));
 
 	if (!call_filter_check_discard(call, entry, buffer, event))
 		__buffer_unlock_commit(buffer, event);
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index ed7906b13f09..6f09cf3888c3 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -77,6 +77,16 @@ enum trace_type {
 #undef __array
 #define __array(type, item, size)	type	item[size];
 
+/*
+ * For backward compatibility, older user space expects to see the
+ * kernel_stack event with a fixed size caller field. But today the fixed
+ * size is ignored by the kernel, and the real structure is dynamic.
+ * Expose to user space: "unsigned long caller[8];" but the real structure
+ * will be "unsigned long caller[] __counted_by(size)"
+ */
+#undef __stack_array
+#define __stack_array(type, item, size, field)		type item[] __counted_by(field);
+
 #undef __array_desc
 #define __array_desc(type, container, item, size)
 
diff --git a/kernel/trace/trace_entries.h b/kernel/trace/trace_entries.h
index 340b2fa98218..c47422b20908 100644
--- a/kernel/trace/trace_entries.h
+++ b/kernel/trace/trace_entries.h
@@ -190,7 +190,7 @@ FTRACE_ENTRY(kernel_stack, stack_entry,
 
 	F_STRUCT(
 		__field(	int,		size	)
-		__array(	unsigned long,	caller,	FTRACE_STACK_ENTRIES	)
+		__stack_array(	unsigned long,	caller,	FTRACE_STACK_ENTRIES, size)
 	),
 
 	F_printk("\t=> %ps\n\t=> %ps\n\t=> %ps\n"
diff --git a/kernel/trace/trace_export.c b/kernel/trace/trace_export.c
index 58f3946081e2..1698fc22afa0 100644
--- a/kernel/trace/trace_export.c
+++ b/kernel/trace/trace_export.c
@@ -51,6 +51,9 @@ static int ftrace_event_register(struct trace_event_call *call,
 #undef __array
 #define __array(type, item, size)			type item[size];
 
+#undef __stack_array
+#define __stack_array(type, item, size, field)		__array(type, item, size)
+
 #undef __array_desc
 #define __array_desc(type, container, item, size)	type item[size];
 
@@ -114,6 +117,9 @@ static void __always_unused ____ftrace_check_##name(void)		\
 	is_signed_type(_type), .filter_type = FILTER_OTHER,			\
 	.len = _len },
 
+#undef __stack_array
+#define __stack_array(_type, _item, _len, _field) __array(_type, _item, _len)
+
 #undef __array_desc
 #define __array_desc(_type, _container, _item, _len) __array(_type, _item, _len)
 
@@ -149,6 +155,9 @@ static struct trace_event_fields ftrace_event_fields_##name[] = {	\
 #undef __array
 #define __array(type, item, len)
 
+#undef __stack_array
+#define __stack_array(type, item, len, field)
+
 #undef __array_desc
 #define __array_desc(type, container, item, len)
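
The net effect for the kernel is roughly the following definition (a
sketch; the real one is generated by the FTRACE_ENTRY() macros), while the
format exported to user space keeps the legacy "unsigned long caller[8]"
field:

struct stack_entry {
	struct trace_entry	ent;
	int			size;
	unsigned long		caller[] __counted_by(size);
};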

Patch

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 4529e264cb86..20122eeccf97 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -3118,6 +3118,7 @@  static void __ftrace_trace_stack(struct trace_buffer *buffer,
 	struct ftrace_stack *fstack;
 	struct stack_entry *entry;
 	int stackidx;
+	void *ptr;
 
 	/*
 	 * Add one, for this function and the call to save_stack_trace()
@@ -3161,9 +3162,25 @@  static void __ftrace_trace_stack(struct trace_buffer *buffer,
 				    trace_ctx);
 	if (!event)
 		goto out;
-	entry = ring_buffer_event_data(event);
+	ptr = ring_buffer_event_data(event);
+	entry = ptr;
+
+	/*
+	 * For backward compatibility reasons, the entry->caller is an
+	 * array of 8 slots to store the stack. This is also exported
+	 * to user space. The amount allocated on the ring buffer actually
+	 * holds enough for the stack specified by nr_entries. This will
+	 * go into the location of entry->caller. Due to string fortifiers
+	 * checking the size of the destination of memcpy() it triggers
+	 * when it detects that size is greater than 8. To hide this from
+	 * the fortifiers, we use "ptr" and pointer arithmetic to assign caller.
+	 *
+	 * The below is really just:
+	 *   memcpy(&entry->caller, fstack->calls, size);
+	 */
+	ptr += offsetof(typeof(*entry), caller);
+	memcpy(ptr, fstack->calls, size);
 
-	memcpy(&entry->caller, fstack->calls, size);
 	entry->size = nr_entries;
 
 	if (!call_filter_check_discard(call, entry, buffer, event))