
sched/task_struct: Move alloc_tag to the end of the struct.

Message ID 20240621102750.oH9p1FQq@linutronix.de (mailing list archive)
State New
Series sched/task_struct: Move alloc_tag to the end of the struct.

Commit Message

Sebastian Andrzej Siewior June 21, 2024, 10:27 a.m. UTC
The alloc_tag member was added near the very beginning of task_struct.
It is a pointer, and on 64bit architectures it forces 4 byte padding
after `ptrace' and then another 4 byte padding after `on_cpu'. A few
members later, `se' requires cacheline alignment due to struct
sched_avg, resulting in a 52 byte hole before `se'.

This is the case for 64bit SMP configurations.
The 52 byte hole can be avoided by moving alloc_tag away from where it
currently resides.

Move alloc_tag to the end of task_struct. There is likely already a hole
before `thread' due to its alignment requirement, and the preceding
members are likely to be pointer-aligned already.

Fixes: 22d407b164ff7 ("lib: add allocation tagging support for memory allocation profiling")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 include/linux/sched.h | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)
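
To make the padding argument concrete, here is a small, self-contained
userspace sketch of the effect (the `head'/`middle' filler sizes are
invented for illustration and only loosely modelled on the pahole
numbers quoted later in the thread; this is not the real task_struct
layout). On a 64bit build, the early pointer forces 4 bytes of padding
after `ptrace' and pushes the cacheline-aligned member past a boundary,
opening a 52 byte hole; with the pointer at the end, the aligned member
starts right on a cacheline boundary.

#include <stdalign.h>
#include <stddef.h>
#include <stdio.h>

/* Toy layouts; filler sizes are invented for illustration and are not
 * the real task_struct. */

/* Stand-in for a cacheline-aligned member such as `se' (struct
 * sched_entity embeds a cacheline-aligned struct sched_avg). */
struct cl_aligned {
        alignas(64) long avg[8];
};

/* Hypothetical "pointer near the front" layout. */
struct front_tag {
        char head[44];          /* members before `flags' */
        unsigned int flags;     /* offset 44, as in the pahole dump below */
        unsigned int ptrace;    /* offset 48 */
        void *alloc_tag;        /* needs 8-byte alignment: 4 bytes of padding first */
        int on_cpu;
        char middle[72];        /* members between on_cpu and `se' */
        struct cl_aligned se;   /* pushed past a boundary: 52 byte hole before it */
};

/* The same members with the pointer moved to the end. */
struct end_tag {
        char head[44];
        unsigned int flags;
        unsigned int ptrace;
        int on_cpu;
        char middle[72];
        struct cl_aligned se;   /* starts right on a cacheline boundary, no hole */
        void *alloc_tag;
};

int main(void)
{
        printf("front_tag: se at %zu, sizeof %zu\n",
               offsetof(struct front_tag, se), sizeof(struct front_tag));
        printf("end_tag:   se at %zu, sizeof %zu\n",
               offsetof(struct end_tag, se), sizeof(struct end_tag));
        return 0;
}

Both toy structs end up with the same sizeof here because of trailing
alignment padding; in the real task_struct the pointer moved to the end
lands in padding that likely exists before `thread' anyway, which is
where the net saving comes from, as the commit message argues.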

Comments

Kent Overstreet June 21, 2024, 2:20 p.m. UTC | #1
On Fri, Jun 21, 2024 at 12:27:50PM +0200, Sebastian Andrzej Siewior wrote:
> The alloc_tag member has been added to task_struct at the very
> beginning. This is a pointer and on 64bit architectures it forces 4 byte
> padding after `ptrace' and then forcing another another 4 byte padding
> after `on_cpu'. A few members later, `se' requires a cacheline aligned
> due to struct sched_avg resulting in 52 hole before `se'.
> 
> This is the case on 64bit-SMP architectures.
> The 52 byte hole can be avoided by moving alloc_tag away where it
> currently resides.
> 
> Move alloc_tag to the end of task_struct. There is likely a hole before
> `thread' due to its alignment requirement and the previous members are
> likely to be already pointer-aligned.

Are we sure we want it at the end? We do want it on a hot cacheline.
Sebastian Andrzej Siewior June 21, 2024, 6:29 p.m. UTC | #2
On 2024-06-21 10:20:58 [-0400], Kent Overstreet wrote:
> On Fri, Jun 21, 2024 at 12:27:50PM +0200, Sebastian Andrzej Siewior wrote:
> > The alloc_tag member has been added to task_struct at the very
> > beginning. This is a pointer and on 64bit architectures it forces 4 byte
> > padding after `ptrace' and then forcing another another 4 byte padding
> > after `on_cpu'. A few members later, `se' requires a cacheline aligned
> > due to struct sched_avg resulting in 52 hole before `se'.
> > 
> > This is the case on 64bit-SMP architectures.
> > The 52 byte hole can be avoided by moving alloc_tag away where it
> > currently resides.
> > 
> > Move alloc_tag to the end of task_struct. There is likely a hole before
> > `thread' due to its alignment requirement and the previous members are
> > likely to be already pointer-aligned.
> 
> We sure we want it at the end? we do want it on a hot cacheline

Well, the front is bad.
Looking at pgalloc_tag_add() and its callers, there is no task_struct
access there. alloc_tag_save()/restore() might be the critical ones, and
those are called from fairly random code paths… Puh. So if the end is
too cold, what about placing it around the mm pointer?
Other suggestions?

Sebastian
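
For readers without the API in their head, a minimal userspace model of
the save/restore pattern being referred to (an assumption-laden sketch,
not the kernel helpers; the real alloc_tag_save()/alloc_tag_restore()
live in the kernel headers behind CONFIG_MEM_ALLOC_PROFILING and differ
in detail): the hot part is just swapping the current->alloc_tag pointer
around a region whose allocations should be charged to a particular tag.

#include <stdio.h>

struct alloc_tag { const char *site; };

/* Toy stand-in for the task_struct field under discussion. */
static struct toy_task { struct alloc_tag *alloc_tag; } toy_current;
#define current (&toy_current)

static struct alloc_tag *alloc_tag_save(struct alloc_tag *tag)
{
        struct alloc_tag *old = current->alloc_tag;

        current->alloc_tag = tag;       /* the hot part: one pointer store */
        return old;
}

static void alloc_tag_restore(struct alloc_tag *old)
{
        current->alloc_tag = old;       /* simplified: no sanity check here */
}

int main(void)
{
        struct alloc_tag tag = { .site = "some allocation site" };
        struct alloc_tag *old = alloc_tag_save(&tag);

        /* allocations made here would be charged to `tag' */
        printf("charging to: %s\n", current->alloc_tag->site);
        alloc_tag_restore(old);
        return 0;
}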
Kent Overstreet June 21, 2024, 6:49 p.m. UTC | #3
On Fri, Jun 21, 2024 at 08:29:15PM +0200, Sebastian Andrzej Siewior wrote:
> On 2024-06-21 10:20:58 [-0400], Kent Overstreet wrote:
> > On Fri, Jun 21, 2024 at 12:27:50PM +0200, Sebastian Andrzej Siewior wrote:
> > > The alloc_tag member has been added to task_struct at the very
> > > beginning. This is a pointer and on 64bit architectures it forces 4 byte
> > > padding after `ptrace' and then forcing another another 4 byte padding
> > > after `on_cpu'. A few members later, `se' requires a cacheline aligned
> > > due to struct sched_avg resulting in 52 hole before `se'.
> > > 
> > > This is the case on 64bit-SMP architectures.
> > > The 52 byte hole can be avoided by moving alloc_tag away where it
> > > currently resides.
> > > 
> > > Move alloc_tag to the end of task_struct. There is likely a hole before
> > > `thread' due to its alignment requirement and the previous members are
> > > likely to be already pointer-aligned.
> > 
> > We sure we want it at the end? we do want it on a hot cacheline
> 
> Well, the front is bad.
> Looking at pgalloc_tag_add() and its callers, there is no task_struct
> touching. alloc_tag_save()/restore might be the critical one. This is
> random code… Puh. So if the end is too cold, what about around the mm
> pointer?

Not there, that's not actually that hot. It needs to be by
task_struct->flags; that's used in the same paths.
Sebastian Andrzej Siewior June 21, 2024, 7:07 p.m. UTC | #4
On 2024-06-21 14:49:23 [-0400], Kent Overstreet wrote:
> On Fri, Jun 21, 2024 at 08:29:15PM +0200, Sebastian Andrzej Siewior wrote:
> > On 2024-06-21 10:20:58 [-0400], Kent Overstreet wrote:
> > > On Fri, Jun 21, 2024 at 12:27:50PM +0200, Sebastian Andrzej Siewior wrote:
> > > > The alloc_tag member has been added to task_struct at the very
> > > > beginning. This is a pointer and on 64bit architectures it forces 4 byte
> > > > padding after `ptrace' and then forcing another another 4 byte padding
> > > > after `on_cpu'. A few members later, `se' requires a cacheline aligned
> > > > due to struct sched_avg resulting in 52 hole before `se'.
> > > > 
> > > > This is the case on 64bit-SMP architectures.
> > > > The 52 byte hole can be avoided by moving alloc_tag away where it
> > > > currently resides.
> > > > 
> > > > Move alloc_tag to the end of task_struct. There is likely a hole before
> > > > `thread' due to its alignment requirement and the previous members are
> > > > likely to be already pointer-aligned.
> > > 
> > > We sure we want it at the end? we do want it on a hot cacheline
> > 
> > Well, the front is bad.
> > Looking at pgalloc_tag_add() and its callers, there is no task_struct
> > touching. alloc_tag_save()/restore might be the critical one. This is
> > random code… Puh. So if the end is too cold, what about around the mm
> > pointer?
> 
> Not there, that's not actually that hot. It needs to be by
> task_struct->flags; that's used in the same paths.

But there is no space there without bringing back the additional 52
bytes of padding. Was that placement by any chance on purpose?

Sebastian
Kent Overstreet June 21, 2024, 7:13 p.m. UTC | #5
On Fri, Jun 21, 2024 at 09:07:19PM +0200, Sebastian Andrzej Siewior wrote:
> On 2024-06-21 14:49:23 [-0400], Kent Overstreet wrote:
> > On Fri, Jun 21, 2024 at 08:29:15PM +0200, Sebastian Andrzej Siewior wrote:
> > > On 2024-06-21 10:20:58 [-0400], Kent Overstreet wrote:
> > > > On Fri, Jun 21, 2024 at 12:27:50PM +0200, Sebastian Andrzej Siewior wrote:
> > > > > The alloc_tag member has been added to task_struct at the very
> > > > > beginning. This is a pointer and on 64bit architectures it forces 4 byte
> > > > > padding after `ptrace' and then forcing another another 4 byte padding
> > > > > after `on_cpu'. A few members later, `se' requires a cacheline aligned
> > > > > due to struct sched_avg resulting in 52 hole before `se'.
> > > > > 
> > > > > This is the case on 64bit-SMP architectures.
> > > > > The 52 byte hole can be avoided by moving alloc_tag away where it
> > > > > currently resides.
> > > > > 
> > > > > Move alloc_tag to the end of task_struct. There is likely a hole before
> > > > > `thread' due to its alignment requirement and the previous members are
> > > > > likely to be already pointer-aligned.
> > > > 
> > > > We sure we want it at the end? we do want it on a hot cacheline
> > > 
> > > Well, the front is bad.
> > > Looking at pgalloc_tag_add() and its callers, there is no task_struct
> > > touching. alloc_tag_save()/restore might be the critical one. This is
> > > random code… Puh. So if the end is too cold, what about around the mm
> > > pointer?
> > 
> > Not there, that's not actually that hot. It needs to be by
> > task_struct->flags; that's used in the same paths.
> 
> But there is no space without the additional 52 bytes. Was this by any
> chance on purpose?

No, that wasn't intentional, and it doesn't have to be on the exact same
cacheline, but we do want it near current->flags; the PF_MEMALLOC* flags
stored there are converted to gfp flags and used exactly where
current->alloc_tag is used.
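
A rough, self-contained model of why the two fields are read on the same
path (the masks, values and the toy_alloc() entry point below are
illustrative stand-ins, not the kernel's definitions; the masking mirrors
what the kernel's current_gfp_context() does with PF_MEMALLOC_NOIO and
PF_MEMALLOC_NOFS): the allocator entry reads current->flags to trim the
gfp mask and, with allocation profiling enabled, reads current->alloc_tag
right next to it.

#include <stdio.h>

/* Illustrative stand-ins for the kernel's PF_MEMALLOC_* and __GFP_* bits. */
#define PF_MEMALLOC_NOFS        0x00040000u
#define PF_MEMALLOC_NOIO        0x00080000u
#define __GFP_IO                0x40u
#define __GFP_FS                0x80u

struct alloc_tag { const char *site; };

static struct toy_task {
        unsigned int flags;             /* read to adjust the gfp mask ... */
        struct alloc_tag *alloc_tag;    /* ... and read on the same path */
} toy_current;
#define current (&toy_current)

/* Trim gfp flags according to the task's PF_MEMALLOC_* state, in the
 * spirit of current_gfp_context(). */
static unsigned int gfp_from_task(unsigned int gfp)
{
        if (current->flags & PF_MEMALLOC_NOIO)
                gfp &= ~(__GFP_IO | __GFP_FS);
        else if (current->flags & PF_MEMALLOC_NOFS)
                gfp &= ~__GFP_FS;
        return gfp;
}

/* Hypothetical allocation entry point: both task fields are consulted
 * here, which is the argument for keeping them close together. */
static void toy_alloc(unsigned long size, unsigned int gfp)
{
        gfp = gfp_from_task(gfp);
        printf("size=%lu gfp=%#x tag=%s\n", size, gfp,
               current->alloc_tag ? current->alloc_tag->site : "none");
}

int main(void)
{
        struct alloc_tag tag = { .site = "example site" };

        toy_current.flags = PF_MEMALLOC_NOFS;
        toy_current.alloc_tag = &tag;
        toy_alloc(4096, __GFP_IO | __GFP_FS);
        return 0;
}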
Sebastian Andrzej Siewior June 21, 2024, 7:22 p.m. UTC | #6
On 2024-06-21 15:13:19 [-0400], Kent Overstreet wrote:
> > > > random code… Puh. So if the end is too cold, what about around the mm
> > > > pointer?
> > > 
> > > Not there, that's not actually that hot. It needs to be by
> > > task_struct->flags; that's used in the same paths.
> > 
> > But there is no space without the additional 52 bytes. Was this by any
> > chance on purpose?
> 
> No, that wasn't, and it doesn't have to be the exact same cacheline, but
> we do want it near current->flags; we store PF_MEMALLOC flags there that
> are converted to gfp flags and used exactly where we're using
> current->alloc_tag.

Hmm. `stack' and `usage' are the only two members that you would have to
move (away) in order to stash the conditional member there. `ptrace' is
used in the same paths as `flags', so it wouldn't make sense to move
that one.

Sebastian
Sebastian Andrzej Siewior June 28, 2024, 9:49 a.m. UTC | #7
On 2024-06-21 12:27:52 [+0200], To linux-mm@kvack.org wrote:
> The alloc_tag member has been added to task_struct at the very
> beginning. This is a pointer and on 64bit architectures it forces 4 byte
> padding after `ptrace' and then forcing another another 4 byte padding
> after `on_cpu'. A few members later, `se' requires a cacheline aligned
> due to struct sched_avg resulting in 52 hole before `se'.
> 
> This is the case on 64bit-SMP architectures.
> The 52 byte hole can be avoided by moving alloc_tag away where it
> currently resides.
> 
> Move alloc_tag to the end of task_struct. There is likely a hole before
> `thread' due to its alignment requirement and the previous members are
> likely to be already pointer-aligned.
> 
> Fixes: 22d407b164ff7 ("lib: add allocation tagging support for memory allocation profiling")
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Could we please get this merged and worry about a possible performance
regression later? Or revisit it once there is a test case, or an idea of
where this pointer might fit better? Clearly the current situation is
worse than what this patch proposes.

Sebastian
Andrew Morton June 28, 2024, 7:20 p.m. UTC | #8
On Fri, 28 Jun 2024 11:49:44 +0200 Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:

> On 2024-06-21 12:27:52 [+0200], To linux-mm@kvack.org wrote:
> > The alloc_tag member has been added to task_struct at the very
> > beginning. This is a pointer and on 64bit architectures it forces 4 byte
> > padding after `ptrace' and then forcing another another 4 byte padding
> > after `on_cpu'. A few members later, `se' requires a cacheline aligned
> > due to struct sched_avg resulting in 52 hole before `se'.
> > 
> > This is the case on 64bit-SMP architectures.
> > The 52 byte hole can be avoided by moving alloc_tag away where it
> > currently resides.
> > 
> > Move alloc_tag to the end of task_struct. There is likely a hole before
> > `thread' due to its alignment requirement and the previous members are
> > likely to be already pointer-aligned.
> > 
> > Fixes: 22d407b164ff7 ("lib: add allocation tagging support for memory allocation profiling")
> > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> 
> Could we please get this merged and worry about possible performance
> regression later? Or once there is a test case or an idea where this
> pointer might fit better but clearly the current situation is worse.
> 

All in favor of saving 56 bytes from the task_struct, but we can do
that by moving various things around.  Was alloc_tag the best choice?
Kent Overstreet June 28, 2024, 7:35 p.m. UTC | #9
On Fri, Jun 28, 2024 at 11:49:44AM +0200, Sebastian Andrzej Siewior wrote:
> On 2024-06-21 12:27:52 [+0200], To linux-mm@kvack.org wrote:
> > The alloc_tag member has been added to task_struct at the very
> > beginning. This is a pointer and on 64bit architectures it forces 4 byte
> > padding after `ptrace' and then forcing another another 4 byte padding
> > after `on_cpu'. A few members later, `se' requires a cacheline aligned
> > due to struct sched_avg resulting in 52 hole before `se'.
> > 
> > This is the case on 64bit-SMP architectures.
> > The 52 byte hole can be avoided by moving alloc_tag away where it
> > currently resides.
> > 
> > Move alloc_tag to the end of task_struct. There is likely a hole before
> > `thread' due to its alignment requirement and the previous members are
> > likely to be already pointer-aligned.
> > 
> > Fixes: 22d407b164ff7 ("lib: add allocation tagging support for memory allocation profiling")
> > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> 
> Could we please get this merged and worry about possible performance
> regression later? Or once there is a test case or an idea where this
> pointer might fit better but clearly the current situation is worse.

Sebastian, I gave you review feedback on your patch; if you can
incorporate it into a new version your patch will sail right in.
Sebastian Andrzej Siewior June 28, 2024, 7:55 p.m. UTC | #10
On 2024-06-28 15:35:38 [-0400], Kent Overstreet wrote:
> On Fri, Jun 28, 2024 at 11:49:44AM +0200, Sebastian Andrzej Siewior wrote:
> > On 2024-06-21 12:27:52 [+0200], To linux-mm@kvack.org wrote:
> > > The alloc_tag member has been added to task_struct at the very
> > > beginning. This is a pointer and on 64bit architectures it forces 4 byte
> > > padding after `ptrace' and then forcing another another 4 byte padding
> > > after `on_cpu'. A few members later, `se' requires a cacheline aligned
> > > due to struct sched_avg resulting in 52 hole before `se'.
> > > 
> > > This is the case on 64bit-SMP architectures.
> > > The 52 byte hole can be avoided by moving alloc_tag away where it
> > > currently resides.
> > > 
> > > Move alloc_tag to the end of task_struct. There is likely a hole before
> > > `thread' due to its alignment requirement and the previous members are
> > > likely to be already pointer-aligned.
> > > 
> > > Fixes: 22d407b164ff7 ("lib: add allocation tagging support for memory allocation profiling")
> > > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> > 
> > Could we please get this merged and worry about possible performance
> > regression later? Or once there is a test case or an idea where this
> > pointer might fit better but clearly the current situation is worse.
> 
> Sebastian, I gave you review feedback on your patch; if you can
> incorporate it into a new version your patch will sail right in.

Kent, you said you didn't want it where it currently is. Fine. You said
you want it at the front, next to `flags'. That isn't going to work
since there is no space left there. You didn't make another suggestion
or say how to make room.

I'm not insisting on this version, I just don't see a better way right now.

Sebastian
Kent Overstreet June 28, 2024, 8:20 p.m. UTC | #11
On Fri, Jun 28, 2024 at 09:55:53PM +0200, Sebastian Andrzej Siewior wrote:
> On 2024-06-28 15:35:38 [-0400], Kent Overstreet wrote:
> > On Fri, Jun 28, 2024 at 11:49:44AM +0200, Sebastian Andrzej Siewior wrote:
> > > On 2024-06-21 12:27:52 [+0200], To linux-mm@kvack.org wrote:
> > > > The alloc_tag member has been added to task_struct at the very
> > > > beginning. This is a pointer and on 64bit architectures it forces 4 byte
> > > > padding after `ptrace' and then forcing another another 4 byte padding
> > > > after `on_cpu'. A few members later, `se' requires a cacheline aligned
> > > > due to struct sched_avg resulting in 52 hole before `se'.
> > > > 
> > > > This is the case on 64bit-SMP architectures.
> > > > The 52 byte hole can be avoided by moving alloc_tag away where it
> > > > currently resides.
> > > > 
> > > > Move alloc_tag to the end of task_struct. There is likely a hole before
> > > > `thread' due to its alignment requirement and the previous members are
> > > > likely to be already pointer-aligned.
> > > > 
> > > > Fixes: 22d407b164ff7 ("lib: add allocation tagging support for memory allocation profiling")
> > > > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> > > 
> > > Could we please get this merged and worry about possible performance
> > > regression later? Or once there is a test case or an idea where this
> > > pointer might fit better but clearly the current situation is worse.
> > 
> > Sebastian, I gave you review feedback on your patch; if you can
> > incorporate it into a new version your patch will sail right in.
> 
> Kent, you said you didn't want it where it currently is. Fine. You said
> you want it at the front next to `flags'. This isn't going to work since
> there is no space left. You didn't make another suggestion or say how to
> make room.

It doesn't need to be on the exact same cacheline, just as near as you
can get it.
Sebastian Andrzej Siewior June 30, 2024, 9:11 p.m. UTC | #12
On 2024-06-28 16:20:27 [-0400], Kent Overstreet wrote:
> > Kent, you said you didn't want it where it currently is. Fine. You said
> > you want it at the front next to `flags'. This isn't going to work since
> > there is no space left. You didn't make another suggestion or say how to
> > make room.
> 
> It doesn't need to be on the exact same cacheline, just as near as you
> can get it.

The first possible spot would be somewhere after the scheduler members.
However, what difference does it make if it is two cache lines later or
more? I don't understand the "closer" requirement.

Sebastian
Kent Overstreet June 30, 2024, 9:23 p.m. UTC | #13
On Sun, Jun 30, 2024 at 11:11:42PM GMT, Sebastian Andrzej Siewior wrote:
> On 2024-06-28 16:20:27 [-0400], Kent Overstreet wrote:
> > > Kent, you said you didn't want it where it currently is. Fine. You said
> > > you want it at the front next to `flags'. This isn't going to work since
> > > there is no space left. You didn't make another suggestion or say how to
> > > make room.
> > 
> > It doesn't need to be on the exact same cacheline, just as near as you
> > can get it.
> 
> the first possible thing would be somewhere after the scheduler.
> However, what difference does it make if it s two cache lines later or
> more?  I don't understand the requirement "closer".

To take advantage of CPU prefetching: CPUs will bring in more than just
the cacheline you touched, because 64 bytes is small and it's cheap to
fetch from the same DRAM bank while it's open.
Sebastian Andrzej Siewior July 1, 2024, 8:05 a.m. UTC | #14
On 2024-06-30 17:23:36 [-0400], Kent Overstreet wrote:
> On Sun, Jun 30, 2024 at 11:11:42PM GMT, Sebastian Andrzej Siewior wrote:
> > On 2024-06-28 16:20:27 [-0400], Kent Overstreet wrote:
> > > > Kent, you said you didn't want it where it currently is. Fine. You said
> > > > you want it at the front next to `flags'. This isn't going to work since
> > > > there is no space left. You didn't make another suggestion or say how to
> > > > make room.
> > > 
> > > It doesn't need to be on the exact same cacheline, just as near as you
> > > can get it.
> > 
> > the first possible thing would be somewhere after the scheduler.
> > However, what difference does it make if it s two cache lines later or
> > more?  I don't understand the requirement "closer".
> 
> take advantage of CPU prefetching; CPUs will bring in more than just the
> cacheline you touched because 64 bytes is small and it's cheap to fetch
> from the same DRAM bank while it's open.

Looking at the layout:
|        unsigned int               flags;                /*    44     4 */
|        unsigned int               ptrace;               /*    48     4 */
|        int                        on_cpu;               /*    52     4 */
|        struct __call_single_node  wake_entry;           /*    56    16 */
|        /* --- cacheline 1 boundary (64 bytes) was 8 bytes ago --- */
…
Starting with sched
…
|        struct sched_statistics    stats __attribute__((__aligned__(64))); /*   704   256 */
|
|        /* XXX last struct has 32 bytes of padding */
sched end, earliest spot imho

|        /* --- cacheline 15 boundary (960 bytes) --- */
|        unsigned int               btrace_seq;           /*   960     4 */

If I add this before `btrace_seq', right after `stats', then it will be
14 cache lines later, or 912 bytes further in. How big is this prefetch
going to be?

Sebastian
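
For reference, layout dumps like the one quoted above come from pahole
(part of the dwarves tools) run against a kernel image built with debug
info; a typical invocation, assuming your build produced a vmlinux with
DWARF data, is:

  $ pahole -C task_struct vmlinux

The per-member offset/size annotations and the
`/* --- cacheline N boundary --- */' markers in the quoted output are
what pahole prints for each member.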

Patch

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 61591ac6eab6d..d76c61510ef1d 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -770,10 +770,6 @@  struct task_struct {
 	unsigned int			flags;
 	unsigned int			ptrace;
 
-#ifdef CONFIG_MEM_ALLOC_PROFILING
-	struct alloc_tag		*alloc_tag;
-#endif
-
 #ifdef CONFIG_SMP
 	int				on_cpu;
 	struct __call_single_node	wake_entry;
@@ -1553,6 +1549,9 @@  struct task_struct {
 #ifdef CONFIG_USER_EVENTS
 	struct user_event_mm		*user_event_mm;
 #endif
+#ifdef CONFIG_MEM_ALLOC_PROFILING
+	struct alloc_tag		*alloc_tag;
+#endif
 
 	/*
 	 * New fields for task_struct should be added above here, so that