diff mbox series

mm: append __GFP_COMP flag for trace_malloc

Message ID 1619491400-1904-1-git-send-email-sxwjean@me.com (mailing list archive)
State New, archived
Headers show
Series mm: append __GFP_COMP flag for trace_malloc | expand

Commit Message

Xiongwei Song April 27, 2021, 2:43 a.m. UTC
From: Xiongwei Song <sxwjean@gmail.com>

When calling kmalloc_order, the flags should include __GFP_COMP here,
so that trace_malloc can trace the precise flags.

Signed-off-by: Xiongwei Song <sxwjean@gmail.com>
---
 mm/slab_common.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Matthew Wilcox April 27, 2021, 2:53 a.m. UTC | #1
On Tue, Apr 27, 2021 at 10:43:20AM +0800, Xiongwei Song wrote:
> From: Xiongwei Song <sxwjean@gmail.com>
> 
> When calling kmalloc_order, the flags should include __GFP_COMP here,
> so that trace_malloc can trace the precise flags.

I suppose that depends on your point of view.  Should we report the
flags used by the caller, or the flags that we used to allocate memory?
And why does it matter?
Xiongwei Song April 27, 2021, 3:29 a.m. UTC | #2
On Tue, Apr 27, 2021 at 10:54 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Tue, Apr 27, 2021 at 10:43:20AM +0800, Xiongwei Song wrote:
> > From: Xiongwei Song <sxwjean@gmail.com>
> >
> > When calling kmalloc_order, the flags should include __GFP_COMP here,
> > so that trace_malloc can trace the precise flags.
>
> I suppose that depends on your point of view.
Correct.

Should we report the
> flags used by the caller, or the flags that we used to allocate memory?
> And why does it matter?
When I capture kmem:kmalloc events on my env with perf:
(perf record -p my_pid -e kmem:kmalloc)
I got the result below:
     0.08%  call_site=ffffffff851d0cb0 ptr=0xffff8c04a4ca0000
bytes_req=10176 bytes_alloc=16384
gfp_flags=GFP_ATOMIC|__GFP_NOWARN|__GFP_NOMEMALLOC
     0.08%  call_site=ffffffff851d0cb0 ptr=0xffff8c04a4ca4000
bytes_req=10176 bytes_alloc=16384
gfp_flags=GFP_ATOMIC|__GFP_NOWARN|__GFP_NOMEMALLOC
     0.08%  call_site=ffffffff851d0cb0 ptr=0xffff8c04a4ca8000
bytes_req=10176 bytes_alloc=16384
gfp_flags=GFP_ATOMIC|__GFP_NOWARN|__GFP_NOMEMALLOC
     0.08%  call_site=ffffffff851d0cb0 ptr=0xffff8c04a6f80000
bytes_req=10176 bytes_alloc=16384
gfp_flags=GFP_ATOMIC|__GFP_NOWARN|__GFP_NOMEMALLOC
     0.08%  call_site=ffffffff851d0cb0 ptr=0xffff8c04a6f84000
bytes_req=10176 bytes_alloc=16384
gfp_flags=GFP_ATOMIC|__GFP_NOWARN|__GFP_NOMEMALLOC
     0.08%  call_site=ffffffff851d0cb0 ptr=0xffff8c04a6f88000
bytes_req=10176 bytes_alloc=16384
gfp_flags=GFP_ATOMIC|__GFP_NOWARN|__GFP_NOMEMALLOC
     0.08%  call_site=ffffffff851d0cb0 ptr=0xffff8c04a6f8c000
bytes_req=10176 bytes_alloc=16384
gfp_flags=GFP_ATOMIC|__GFP_NOWARN|__GFP_NOMEMALLOC
     0.07%  call_site=ffffffff851d0cb0 ptr=0xffff8c04a4c80000
bytes_req=10176 bytes_alloc=16384
gfp_flags=GFP_ATOMIC|__GFP_NOWARN|__GFP_NOMEMALLOC

The value of gfp_flags made me confused, I spent some time to find out
which trace_malloc
is here. So I think we should append __GFP_COMP.

Regards
Matthew Wilcox April 27, 2021, 3:36 a.m. UTC | #3
On Tue, Apr 27, 2021 at 11:29:32AM +0800, Xiongwei Song wrote:
> On Tue, Apr 27, 2021 at 10:54 AM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Tue, Apr 27, 2021 at 10:43:20AM +0800, Xiongwei Song wrote:
> > > From: Xiongwei Song <sxwjean@gmail.com>
> > >
> > > When calling kmalloc_order, the flags should include __GFP_COMP here,
> > > so that trace_malloc can trace the precise flags.
> >
> > I suppose that depends on your point of view.
> Correct.
> 
> Should we report the
> > flags used by the caller, or the flags that we used to allocate memory?
> > And why does it matter?
> When I capture kmem:kmalloc events on my env with perf:
> (perf record -p my_pid -e kmem:kmalloc)
> I got the result below:
>      0.08%  call_site=ffffffff851d0cb0 ptr=0xffff8c04a4ca0000
> bytes_req=10176 bytes_alloc=16384
> gfp_flags=GFP_ATOMIC|__GFP_NOWARN|__GFP_NOMEMALLOC

Hmm ... if you have a lot of allocations about this size, that would
argue in favour of adding a kmem_cache of 10880 [*] bytes.  That way,
we'd get 3 allocations per 32kB instead of 2.

[*] 32768 / 3, rounded down to a 64 byte cacheline

But I don't understand why this confused you.  Your caller at
ffffffff851d0cb0 didn't specify __GFP_COMP.  I'd be more confused if
this did report __GFP_COMP.
Xiongwei Song April 27, 2021, 4:11 a.m. UTC | #4
On Tue, Apr 27, 2021 at 11:36 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Tue, Apr 27, 2021 at 11:29:32AM +0800, Xiongwei Song wrote:
> > On Tue, Apr 27, 2021 at 10:54 AM Matthew Wilcox <willy@infradead.org> wrote:
> > >
> > > On Tue, Apr 27, 2021 at 10:43:20AM +0800, Xiongwei Song wrote:
> > > > From: Xiongwei Song <sxwjean@gmail.com>
> > > >
> > > > When calling kmalloc_order, the flags should include __GFP_COMP here,
> > > > so that trace_malloc can trace the precise flags.
> > >
> > > I suppose that depends on your point of view.
> > Correct.
> >
> > Should we report the
> > > flags used by the caller, or the flags that we used to allocate memory?
> > > And why does it matter?
> > When I capture kmem:kmalloc events on my env with perf:
> > (perf record -p my_pid -e kmem:kmalloc)
> > I got the result below:
> >      0.08%  call_site=ffffffff851d0cb0 ptr=0xffff8c04a4ca0000
> > bytes_req=10176 bytes_alloc=16384
> > gfp_flags=GFP_ATOMIC|__GFP_NOWARN|__GFP_NOMEMALLOC
>
> Hmm ... if you have a lot of allocations about this size, that would
> argue in favour of adding a kmem_cache of 10880 [*] bytes.  That way,
> we'd get 3 allocations per 32kB instead of 2.
I understand you. But I don't think our process needs this size. This size
may be a bug in our code or somewhere, I don't know the RC for now.

> [*] 32768 / 3, rounded down to a 64 byte cacheline
>
> But I don't understand why this confused you.  Your caller at
> ffffffff851d0cb0 didn't specify __GFP_COMP.  I'd be more confused if
> this did report __GFP_COMP.
>
I just wanted to save some time when debugging.

Regards
Xiongwei Song April 27, 2021, 5:30 a.m. UTC | #5
Hi Mattew,

One more thing I should explain, the kmalloc_order() appends the
__GFP_COMP flags,
not by the caller.

void *kmalloc_order(size_t size, gfp_t flags, unsigned int order)
{
...........................................................

flags |= __GFP_COMP;
page = alloc_pages(flags, order);
...........................................................
return ret;
}
EXPORT_SYMBOL(kmalloc_order);

#ifdef CONFIG_TRACING
void *kmalloc_order_trace(size_t size, gfp_t flags, unsigned int order)
{
void *ret = kmalloc_order(size, flags, order);
trace_kmalloc(_RET_IP_, ret, size, PAGE_SIZE << order, flags);
return ret;
}
EXPORT_SYMBOL(kmalloc_order_trace);
#endif


Regards,
Xiongwei

On Tue, Apr 27, 2021 at 12:11 PM Xiongwei Song <sxwjean@gmail.com> wrote:
>
> On Tue, Apr 27, 2021 at 11:36 AM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Tue, Apr 27, 2021 at 11:29:32AM +0800, Xiongwei Song wrote:
> > > On Tue, Apr 27, 2021 at 10:54 AM Matthew Wilcox <willy@infradead.org> wrote:
> > > >
> > > > On Tue, Apr 27, 2021 at 10:43:20AM +0800, Xiongwei Song wrote:
> > > > > From: Xiongwei Song <sxwjean@gmail.com>
> > > > >
> > > > > When calling kmalloc_order, the flags should include __GFP_COMP here,
> > > > > so that trace_malloc can trace the precise flags.
> > > >
> > > > I suppose that depends on your point of view.
> > > Correct.
> > >
> > > Should we report the
> > > > flags used by the caller, or the flags that we used to allocate memory?
> > > > And why does it matter?
> > > When I capture kmem:kmalloc events on my env with perf:
> > > (perf record -p my_pid -e kmem:kmalloc)
> > > I got the result below:
> > >      0.08%  call_site=ffffffff851d0cb0 ptr=0xffff8c04a4ca0000
> > > bytes_req=10176 bytes_alloc=16384
> > > gfp_flags=GFP_ATOMIC|__GFP_NOWARN|__GFP_NOMEMALLOC
> >
> > Hmm ... if you have a lot of allocations about this size, that would
> > argue in favour of adding a kmem_cache of 10880 [*] bytes.  That way,
> > we'd get 3 allocations per 32kB instead of 2.
> I understand you. But I don't think our process needs this size. This size
> may be a bug in our code or somewhere, I don't know the RC for now.
>
> > [*] 32768 / 3, rounded down to a 64 byte cacheline
> >
> > But I don't understand why this confused you.  Your caller at
> > ffffffff851d0cb0 didn't specify __GFP_COMP.  I'd be more confused if
> > this did report __GFP_COMP.
> >
> I just wanted to save some time when debugging.
>
> Regards
Matthew Wilcox April 27, 2021, 11:25 a.m. UTC | #6
On Tue, Apr 27, 2021 at 01:30:48PM +0800, Xiongwei Song wrote:
> Hi Mattew,
> 
> One more thing I should explain, the kmalloc_order() appends the
> __GFP_COMP flags,
> not by the caller.
> 
> void *kmalloc_order(size_t size, gfp_t flags, unsigned int order)
> {
> ...........................................................
> 
> flags |= __GFP_COMP;
> page = alloc_pages(flags, order);
> ...........................................................
> return ret;
> }
> EXPORT_SYMBOL(kmalloc_order);
> 
> #ifdef CONFIG_TRACING
> void *kmalloc_order_trace(size_t size, gfp_t flags, unsigned int order)
> {
> void *ret = kmalloc_order(size, flags, order);
> trace_kmalloc(_RET_IP_, ret, size, PAGE_SIZE << order, flags);
> return ret;
> }
> EXPORT_SYMBOL(kmalloc_order_trace);
> #endif

Yes, I understood that.  What I don't understand is why appending the
__GFP_COMP to the trace would have been less confusing for you.

Suppose I have some code which calls:

	kmalloc(10 * 1024, GFP_ATOMIC|__GFP_NOWARN|__GFP_NOMEMALLOC);

and I see in my logs 

     0.08%  call_site=ffffffff851d0cb0 ptr=0xffff8c04a4ca0000 bytes_req=10176 bytes_alloc=16384 gfp_flags=GFP_ATOMIC|__GFP_NOWARN|__GFP_NOMEMALLOC|__GFP_COMP

That seems to me _more_ confusing because I would wonder "Where did that
__GFP_COMP come from?"

> 
> Regards,
> Xiongwei
> 
> On Tue, Apr 27, 2021 at 12:11 PM Xiongwei Song <sxwjean@gmail.com> wrote:
> >
> > On Tue, Apr 27, 2021 at 11:36 AM Matthew Wilcox <willy@infradead.org> wrote:
> > >
> > > On Tue, Apr 27, 2021 at 11:29:32AM +0800, Xiongwei Song wrote:
> > > > On Tue, Apr 27, 2021 at 10:54 AM Matthew Wilcox <willy@infradead.org> wrote:
> > > > >
> > > > > On Tue, Apr 27, 2021 at 10:43:20AM +0800, Xiongwei Song wrote:
> > > > > > From: Xiongwei Song <sxwjean@gmail.com>
> > > > > >
> > > > > > When calling kmalloc_order, the flags should include __GFP_COMP here,
> > > > > > so that trace_malloc can trace the precise flags.
> > > > >
> > > > > I suppose that depends on your point of view.
> > > > Correct.
> > > >
> > > > Should we report the
> > > > > flags used by the caller, or the flags that we used to allocate memory?
> > > > > And why does it matter?
> > > > When I capture kmem:kmalloc events on my env with perf:
> > > > (perf record -p my_pid -e kmem:kmalloc)
> > > > I got the result below:
> > > >      0.08%  call_site=ffffffff851d0cb0 ptr=0xffff8c04a4ca0000
> > > > bytes_req=10176 bytes_alloc=16384
> > > > gfp_flags=GFP_ATOMIC|__GFP_NOWARN|__GFP_NOMEMALLOC
> > >
> > > Hmm ... if you have a lot of allocations about this size, that would
> > > argue in favour of adding a kmem_cache of 10880 [*] bytes.  That way,
> > > we'd get 3 allocations per 32kB instead of 2.
> > I understand you. But I don't think our process needs this size. This size
> > may be a bug in our code or somewhere, I don't know the RC for now.
> >
> > > [*] 32768 / 3, rounded down to a 64 byte cacheline
> > >
> > > But I don't understand why this confused you.  Your caller at
> > > ffffffff851d0cb0 didn't specify __GFP_COMP.  I'd be more confused if
> > > this did report __GFP_COMP.
> > >
> > I just wanted to save some time when debugging.
> >
> > Regards
Xiongwei Song April 28, 2021, 3:05 a.m. UTC | #7
On Tue, Apr 27, 2021 at 7:26 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Tue, Apr 27, 2021 at 01:30:48PM +0800, Xiongwei Song wrote:
> > Hi Mattew,
> >
> > One more thing I should explain, the kmalloc_order() appends the
> > __GFP_COMP flags,
> > not by the caller.
> >
> > void *kmalloc_order(size_t size, gfp_t flags, unsigned int order)
> > {
> > ...........................................................
> >
> > flags |= __GFP_COMP;
> > page = alloc_pages(flags, order);
> > ...........................................................
> > return ret;
> > }
> > EXPORT_SYMBOL(kmalloc_order);
> >
> > #ifdef CONFIG_TRACING
> > void *kmalloc_order_trace(size_t size, gfp_t flags, unsigned int order)
> > {
> > void *ret = kmalloc_order(size, flags, order);
> > trace_kmalloc(_RET_IP_, ret, size, PAGE_SIZE << order, flags);
> > return ret;
> > }
> > EXPORT_SYMBOL(kmalloc_order_trace);
> > #endif
>
> Yes, I understood that.  What I don't understand is why appending the
> __GFP_COMP to the trace would have been less confusing for you.
>
> Suppose I have some code which calls:
>
>         kmalloc(10 * 1024, GFP_ATOMIC|__GFP_NOWARN|__GFP_NOMEMALLOC);
>
> and I see in my logs
>
>      0.08%  call_site=ffffffff851d0cb0 ptr=0xffff8c04a4ca0000 bytes_req=10176 bytes_alloc=16384 gfp_flags=GFP_ATOMIC|__GFP_NOWARN|__GFP_NOMEMALLOC|__GFP_COMP
>
> That seems to me _more_ confusing because I would wonder "Where did that
> __GFP_COMP come from?"

Thank you for the comments. But I disagree.

When I use trace, I hope I can get the precise data rather than something
changed that I don't know , then I can get the correct conclusion or
direction on my issue.

Here my question is what the trace events are for if they don't provide the
real situation? I think that's not graceful and friendly.

From my perspective, it'd be better to know my flags changed before checking
code lines one by one. In other words, I need a warning to reminder me on this,
then I can know quickly my process might do some incorrect things.

Regards,
Xiongwei
Vlastimil Babka May 3, 2021, 12:35 p.m. UTC | #8
On 4/28/21 5:05 AM, Xiongwei Song wrote:
> On Tue, Apr 27, 2021 at 7:26 PM Matthew Wilcox <willy@infradead.org> wrote:
>>
>> On Tue, Apr 27, 2021 at 01:30:48PM +0800, Xiongwei Song wrote:
>> > Hi Mattew,
>> >
>> > One more thing I should explain, the kmalloc_order() appends the
>> > __GFP_COMP flags,
>> > not by the caller.
>> >
>> > void *kmalloc_order(size_t size, gfp_t flags, unsigned int order)
>> > {
>> > ...........................................................
>> >
>> > flags |= __GFP_COMP;
>> > page = alloc_pages(flags, order);
>> > ...........................................................
>> > return ret;
>> > }
>> > EXPORT_SYMBOL(kmalloc_order);
>> >
>> > #ifdef CONFIG_TRACING
>> > void *kmalloc_order_trace(size_t size, gfp_t flags, unsigned int order)
>> > {
>> > void *ret = kmalloc_order(size, flags, order);
>> > trace_kmalloc(_RET_IP_, ret, size, PAGE_SIZE << order, flags);
>> > return ret;
>> > }
>> > EXPORT_SYMBOL(kmalloc_order_trace);
>> > #endif
>>
>> Yes, I understood that.  What I don't understand is why appending the
>> __GFP_COMP to the trace would have been less confusing for you.
>>
>> Suppose I have some code which calls:
>>
>>         kmalloc(10 * 1024, GFP_ATOMIC|__GFP_NOWARN|__GFP_NOMEMALLOC);
>>
>> and I see in my logs
>>
>>      0.08%  call_site=ffffffff851d0cb0 ptr=0xffff8c04a4ca0000 bytes_req=10176 bytes_alloc=16384 gfp_flags=GFP_ATOMIC|__GFP_NOWARN|__GFP_NOMEMALLOC|__GFP_COMP
>>
>> That seems to me _more_ confusing because I would wonder "Where did that
>> __GFP_COMP come from?"
> 
> Thank you for the comments. But I disagree.

FTR, I agree with Matthew. This is a tracepoint for kmalloc() so I would expect
to see what flags were passed to kmalloc().
If I wanted to see how the flags translated to page allocator's flags, I would
have used a page allocator's tracepoint which would show me that.

> When I use trace, I hope I can get the precise data rather than something
> changed that I don't know , then I can get the correct conclusion or
> direction on my issue.

It's precise from the point of the caller.

> Here my question is what the trace events are for if they don't provide the
> real situation? I think that's not graceful and friendly.
> 
> From my perspective, it'd be better to know my flags changed before checking
> code lines one by one. In other words, I need a warning to reminder me on this,
> then I can know quickly my process might do some incorrect things.

Your process should not care about __GFP_COMP if you use properly
kmalloc()+kfree(). Once you start caring about __GFP_COMP, you should be using
page allocator's API, not kmalloc().

> Regards,
> Xiongwei
>
Xiongwei Song May 7, 2021, 5:41 a.m. UTC | #9
On Mon, May 3, 2021 at 8:35 PM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> On 4/28/21 5:05 AM, Xiongwei Song wrote:
> > On Tue, Apr 27, 2021 at 7:26 PM Matthew Wilcox <willy@infradead.org> wrote:
> >>
> >> On Tue, Apr 27, 2021 at 01:30:48PM +0800, Xiongwei Song wrote:
> >> > Hi Mattew,
> >> >
> >> > One more thing I should explain, the kmalloc_order() appends the
> >> > __GFP_COMP flags,
> >> > not by the caller.
> >> >
> >> > void *kmalloc_order(size_t size, gfp_t flags, unsigned int order)
> >> > {
> >> > ...........................................................
> >> >
> >> > flags |= __GFP_COMP;
> >> > page = alloc_pages(flags, order);
> >> > ...........................................................
> >> > return ret;
> >> > }
> >> > EXPORT_SYMBOL(kmalloc_order);
> >> >
> >> > #ifdef CONFIG_TRACING
> >> > void *kmalloc_order_trace(size_t size, gfp_t flags, unsigned int order)
> >> > {
> >> > void *ret = kmalloc_order(size, flags, order);
> >> > trace_kmalloc(_RET_IP_, ret, size, PAGE_SIZE << order, flags);
> >> > return ret;
> >> > }
> >> > EXPORT_SYMBOL(kmalloc_order_trace);
> >> > #endif
> >>
> >> Yes, I understood that.  What I don't understand is why appending the
> >> __GFP_COMP to the trace would have been less confusing for you.
> >>
> >> Suppose I have some code which calls:
> >>
> >>         kmalloc(10 * 1024, GFP_ATOMIC|__GFP_NOWARN|__GFP_NOMEMALLOC);
> >>
> >> and I see in my logs
> >>
> >>      0.08%  call_site=ffffffff851d0cb0 ptr=0xffff8c04a4ca0000 bytes_req=10176 bytes_alloc=16384 gfp_flags=GFP_ATOMIC|__GFP_NOWARN|__GFP_NOMEMALLOC|__GFP_COMP
> >>
> >> That seems to me _more_ confusing because I would wonder "Where did that
> >> __GFP_COMP come from?"
> >
> > Thank you for the comments. But I disagree.
>
> FTR, I agree with Matthew. This is a tracepoint for kmalloc() so I would expect
> to see what flags were passed to kmalloc().
> If I wanted to see how the flags translated to page allocator's flags, I would
> have used a page allocator's tracepoint which would show me that.

Make sense. Thank you.

> > When I use trace, I hope I can get the precise data rather than something
> > changed that I don't know , then I can get the correct conclusion or
> > direction on my issue.
>
> It's precise from the point of the caller.
>
> > Here my question is what the trace events are for if they don't provide the
> > real situation? I think that's not graceful and friendly.
> >
> > From my perspective, it'd be better to know my flags changed before checking
> > code lines one by one. In other words, I need a warning to reminder me on this,
> > then I can know quickly my process might do some incorrect things.
>
> Your process should not care about __GFP_COMP if you use properly
> kmalloc()+kfree(). Once you start caring about __GFP_COMP, you should be using
> page allocator's API, not kmalloc().
>
> > Regards,
> > Xiongwei
> >
>
diff mbox series

Patch

diff --git a/mm/slab_common.c b/mm/slab_common.c
index 585a6f9..c23e1e8 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -928,7 +928,7 @@  EXPORT_SYMBOL(kmalloc_order);
 void *kmalloc_order_trace(size_t size, gfp_t flags, unsigned int order)
 {
 	void *ret = kmalloc_order(size, flags, order);
-	trace_kmalloc(_RET_IP_, ret, size, PAGE_SIZE << order, flags);
+	trace_kmalloc(_RET_IP_, ret, size, PAGE_SIZE << order, flags | __GFP_COMP);
 	return ret;
 }
 EXPORT_SYMBOL(kmalloc_order_trace);