mm, slab: Fix sign conversion problem in memcg_uncharge_slab()

Message ID 20200620184719.10994-1-longman@redhat.com (mailing list archive)
State New, archived
Series mm, slab: Fix sign conversion problem in memcg_uncharge_slab()

Commit Message

Waiman Long June 20, 2020, 6:47 p.m. UTC
It was found that running the LTP test on a PowerPC system could produce
erroneous values in /proc/meminfo, like:

  MemTotal:       531915072 kB
  MemFree:        507962176 kB
  MemAvailable:   1100020596352 kB

Using bisection, the problem was tracked down to commit 9c315e4d7d8c
("mm: memcg/slab: cache page number in memcg_(un)charge_slab()").

In memcg_uncharge_slab(), which takes an "int order" argument:

  unsigned int nr_pages = 1 << order;
    :
  mod_lruvec_state(lruvec, cache_vmstat_idx(s), -nr_pages);

The mod_lruvec_state() function will eventually call
__mod_zone_page_state(), which accepts a long argument.  Depending on
the compiler and how inlining is done, "-nr_pages" may be treated as
a negative number or a very large positive number. Apparently, it was
treated as a large positive number on that PowerPC system, leading to
incorrect stat counts. This problem hasn't been seen on x86-64 yet;
perhaps gcc there behaves slightly differently.

It is fixed by making nr_pages a signed value. For consistency, a
similar change is applied to memcg_charge_slab() as well.
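
As an illustration only, here is a minimal userspace sketch of the same
conversion (not the kernel code itself, and assuming an LP64 target where
long is 64 bits): the negated unsigned value reaches the long parameter
as a huge positive number, matching the bad counts above, while the
signed variant reaches it as -1.

  #include <stdio.h>

  /* stand-in for __mod_zone_page_state(), which takes a long delta */
  static void mod_state(long delta)
  {
  	printf("delta = %ld\n", delta);
  }

  int main(void)
  {
  	int order = 0;
  	unsigned int nr_pages_u = 1 << order;	/* before the fix */
  	int nr_pages_s = 1 << order;		/* after the fix  */

  	/*
  	 * -nr_pages_u has type unsigned int and value 4294967295 here;
  	 * widening it to long keeps that huge positive value.
  	 */
  	mod_state(-nr_pages_u);

  	/* -nr_pages_s is plain int -1 and sign-extends to long -1 */
  	mod_state(-nr_pages_s);

  	return 0;
  }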

Fixes: 9c315e4d7d8c ("mm: memcg/slab: cache page number in memcg_(un)charge_slab()").
Signed-off-by: Waiman Long <longman@redhat.com>
---
 mm/slab.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Roman Gushchin June 20, 2020, 7:59 p.m. UTC | #1
On Sat, Jun 20, 2020 at 02:47:19PM -0400, Waiman Long wrote:
> It was found that running the LTP test on a PowerPC system could produce
> erroneous values in /proc/meminfo, like:
> 
>   MemTotal:       531915072 kB
>   MemFree:        507962176 kB
>   MemAvailable:   1100020596352 kB
> 
> Using bisection, the problem is tracked down to commit 9c315e4d7d8c
> ("mm: memcg/slab: cache page number in memcg_(un)charge_slab()").
> 
> In memcg_uncharge_slab() with a "int order" argument:
> 
>   unsigned int nr_pages = 1 << order;
>     :
>   mod_lruvec_state(lruvec, cache_vmstat_idx(s), -nr_pages);
> 
> The mod_lruvec_state() function will eventually call the
> __mod_zone_page_state() which accepts a long argument.  Depending on
> the compiler and how inlining is done, "-nr_pages" may be treated as
> a negative number or a very large positive number. Apparently, it was
> treated as a large positive number in that PowerPC system leading to
> incorrect stat counts. This problem hasn't been seen in x86-64 yet,
> perhaps the gcc compiler there has some slight difference in behavior.
> 
> It is fixed by making nr_pages a signed value. For consistency, a
> similar change is applied to memcg_charge_slab() as well.
> 
> Fixes: 9c315e4d7d8c ("mm: memcg/slab: cache page number in memcg_(un)charge_slab()").
> Signed-off-by: Waiman Long <longman@redhat.com>

Good catch!

Interesting that I haven't seen it on x86-64, but it's reproducible on Power.

Acked-by: Roman Gushchin <guro@fb.com>

Thanks!

> ---
>  mm/slab.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/slab.h b/mm/slab.h
> index 207c83ef6e06..74f7e09a7cfd 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -348,7 +348,7 @@ static __always_inline int memcg_charge_slab(struct page *page,
>  					     gfp_t gfp, int order,
>  					     struct kmem_cache *s)
>  {
> -	unsigned int nr_pages = 1 << order;
> +	int nr_pages = 1 << order;
>  	struct mem_cgroup *memcg;
>  	struct lruvec *lruvec;
>  	int ret;
> @@ -388,7 +388,7 @@ static __always_inline int memcg_charge_slab(struct page *page,
>  static __always_inline void memcg_uncharge_slab(struct page *page, int order,
>  						struct kmem_cache *s)
>  {
> -	unsigned int nr_pages = 1 << order;
> +	int nr_pages = 1 << order;
>  	struct mem_cgroup *memcg;
>  	struct lruvec *lruvec;
>  
> -- 
> 2.18.1
>
Waiman Long June 20, 2020, 8:48 p.m. UTC | #2
On 6/20/20 3:59 PM, Roman Gushchin wrote:
> On Sat, Jun 20, 2020 at 02:47:19PM -0400, Waiman Long wrote:
>> It was found that running the LTP test on a PowerPC system could produce
>> erroneous values in /proc/meminfo, like:
>>
>>    MemTotal:       531915072 kB
>>    MemFree:        507962176 kB
>>    MemAvailable:   1100020596352 kB
>>
>> Using bisection, the problem is tracked down to commit 9c315e4d7d8c
>> ("mm: memcg/slab: cache page number in memcg_(un)charge_slab()").
>>
>> In memcg_uncharge_slab() with a "int order" argument:
>>
>>    unsigned int nr_pages = 1 << order;
>>      :
>>    mod_lruvec_state(lruvec, cache_vmstat_idx(s), -nr_pages);
>>
>> The mod_lruvec_state() function will eventually call the
>> __mod_zone_page_state() which accepts a long argument.  Depending on
>> the compiler and how inlining is done, "-nr_pages" may be treated as
>> a negative number or a very large positive number. Apparently, it was
>> treated as a large positive number in that PowerPC system leading to
>> incorrect stat counts. This problem hasn't been seen in x86-64 yet,
>> perhaps the gcc compiler there has some slight difference in behavior.
>>
>> It is fixed by making nr_pages a signed value. For consistency, a
>> similar change is applied to memcg_charge_slab() as well.
>>
>> Fixes: 9c315e4d7d8c ("mm: memcg/slab: cache page number in memcg_(un)charge_slab()").
>> Signed-off-by: Waiman Long <longman@redhat.com>
> Good catch!
>
> Interesting that I haven't seen it on x86-64, but it's reproducible on Power.
>
> Acked-by: Roman Gushchin <guro@fb.com>
>
I think it is probably related to the level of inlining that is being
done. Besides, the interpretation of -nr_pages is ambiguous if nr_pages
is an unsigned value, and different compilers may handle it differently
when sign extension is needed.

Cheers,
Longman
Andrew Morton June 20, 2020, 9 p.m. UTC | #3
On Sat, 20 Jun 2020 14:47:19 -0400 Waiman Long <longman@redhat.com> wrote:

> It was found that running the LTP test on a PowerPC system could produce
> erroneous values in /proc/meminfo, like:
> 
>   MemTotal:       531915072 kB
>   MemFree:        507962176 kB
>   MemAvailable:   1100020596352 kB
> 
> Using bisection, the problem is tracked down to commit 9c315e4d7d8c
> ("mm: memcg/slab: cache page number in memcg_(un)charge_slab()").
> 
> In memcg_uncharge_slab() with a "int order" argument:
> 
>   unsigned int nr_pages = 1 << order;
>     :
>   mod_lruvec_state(lruvec, cache_vmstat_idx(s), -nr_pages);
> 
> The mod_lruvec_state() function will eventually call the
> __mod_zone_page_state() which accepts a long argument.  Depending on
> the compiler and how inlining is done, "-nr_pages" may be treated as
> a negative number or a very large positive number. Apparently, it was
> treated as a large positive number in that PowerPC system leading to
> incorrect stat counts. This problem hasn't been seen in x86-64 yet,
> perhaps the gcc compiler there has some slight difference in behavior.
> 
> It is fixed by making nr_pages a signed value. For consistency, a
> similar change is applied to memcg_charge_slab() as well.

This is somewhat disturbing.

> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -348,7 +348,7 @@ static __always_inline int memcg_charge_slab(struct page *page,
>  					     gfp_t gfp, int order,
>  					     struct kmem_cache *s)
>  {
> -	unsigned int nr_pages = 1 << order;
> +	int nr_pages = 1 << order;
>  	struct mem_cgroup *memcg;
>  	struct lruvec *lruvec;
>  	int ret;
> @@ -388,7 +388,7 @@ static __always_inline int memcg_charge_slab(struct page *page,
>  static __always_inline void memcg_uncharge_slab(struct page *page, int order,
>  						struct kmem_cache *s)
>  {
> -	unsigned int nr_pages = 1 << order;
> +	int nr_pages = 1 << order;
>  	struct mem_cgroup *memcg;
>  	struct lruvec *lruvec;
>  

I grabbed the patch, but Roman's "mm: memcg/slab: charge individual
slab objects instead of pages"
(http://lkml.kernel.org/r/20200608230654.828134-10-guro@fb.com) deletes
both these functions.

It replaces the offending code with, afaict,


static inline void memcg_slab_free_hook(struct kmem_cache *s, struct page *page,
					void *p)
{
	struct obj_cgroup *objcg;
	unsigned int off;

	if (!memcg_kmem_enabled() || is_root_cache(s))
		return;

	off = obj_to_index(s, page, p);
	objcg = page_obj_cgroups(page)[off];
	page_obj_cgroups(page)[off] = NULL;

	obj_cgroup_uncharge(objcg, obj_full_size(s));
	mod_objcg_state(objcg, page_pgdat(page), cache_vmstat_idx(s),
>>>			-obj_full_size(s));

	obj_cgroup_put(objcg);
}

-obj_full_size() returns size_t, so I guess that's OK.
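
A quick userspace sketch of why that is OK, with hypothetical values
standing in for obj_full_size() and the old nr_pages, and assuming an
LP64, two's-complement system: negating a 64-bit size_t wraps around and
converts back to the intended negative long, whereas negating a 32-bit
unsigned int does not.

  #include <stdio.h>
  #include <stddef.h>

  /* stand-in for a stats helper taking a long delta, as discussed above */
  static void mod_state(long delta)
  {
  	printf("delta = %ld\n", delta);
  }

  int main(void)
  {
  	size_t obj_size = 128;		/* hypothetical obj_full_size(s) */
  	unsigned int nr_pages = 4;	/* the problematic 32-bit case   */

  	mod_state(-obj_size);	/* prints -128: same width as long  */
  	mod_state(-nr_pages);	/* prints 4294967292: zero-extended */

  	return 0;
  }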



Also


static __always_inline void uncharge_slab_page(struct page *page, int order,
					       struct kmem_cache *s)
{
#ifdef CONFIG_MEMCG_KMEM
	if (memcg_kmem_enabled() && !is_root_cache(s)) {
		memcg_free_page_obj_cgroups(page);
		percpu_ref_put_many(&s->memcg_params.refcnt, 1 << order);
	}
#endif
	mod_node_page_state(page_pgdat(page), cache_vmstat_idx(s),
>>>			    -(PAGE_SIZE << order));
}

PAGE_SIZE is unsigned long, so I guess that's OK as well.


Still, perhaps both could be improved.  Negating an unsigned scalar is
a pretty ugly thing to do.

Am I wrong in thinking that all those mod_foo() functions need careful
review?
Waiman Long June 20, 2020, 9:07 p.m. UTC | #4
On 6/20/20 5:00 PM, Andrew Morton wrote:
> On Sat, 20 Jun 2020 14:47:19 -0400 Waiman Long <longman@redhat.com> wrote:
>
>> It was found that running the LTP test on a PowerPC system could produce
>> erroneous values in /proc/meminfo, like:
>>
>>    MemTotal:       531915072 kB
>>    MemFree:        507962176 kB
>>    MemAvailable:   1100020596352 kB
>>
>> Using bisection, the problem is tracked down to commit 9c315e4d7d8c
>> ("mm: memcg/slab: cache page number in memcg_(un)charge_slab()").
>>
>> In memcg_uncharge_slab() with a "int order" argument:
>>
>>    unsigned int nr_pages = 1 << order;
>>      :
>>    mod_lruvec_state(lruvec, cache_vmstat_idx(s), -nr_pages);
>>
>> The mod_lruvec_state() function will eventually call the
>> __mod_zone_page_state() which accepts a long argument.  Depending on
>> the compiler and how inlining is done, "-nr_pages" may be treated as
>> a negative number or a very large positive number. Apparently, it was
>> treated as a large positive number in that PowerPC system leading to
>> incorrect stat counts. This problem hasn't been seen in x86-64 yet,
>> perhaps the gcc compiler there has some slight difference in behavior.
>>
>> It is fixed by making nr_pages a signed value. For consistency, a
>> similar change is applied to memcg_charge_slab() as well.
> This is somewhat disturbing.
>
>> --- a/mm/slab.h
>> +++ b/mm/slab.h
>> @@ -348,7 +348,7 @@ static __always_inline int memcg_charge_slab(struct page *page,
>>   					     gfp_t gfp, int order,
>>   					     struct kmem_cache *s)
>>   {
>> -	unsigned int nr_pages = 1 << order;
>> +	int nr_pages = 1 << order;
>>   	struct mem_cgroup *memcg;
>>   	struct lruvec *lruvec;
>>   	int ret;
>> @@ -388,7 +388,7 @@ static __always_inline int memcg_charge_slab(struct page *page,
>>   static __always_inline void memcg_uncharge_slab(struct page *page, int order,
>>   						struct kmem_cache *s)
>>   {
>> -	unsigned int nr_pages = 1 << order;
>> +	int nr_pages = 1 << order;
>>   	struct mem_cgroup *memcg;
>>   	struct lruvec *lruvec;
>>   
> I grabbed the patch, but Roman's "mm: memcg/slab: charge individual
> slab objects instead of pages"
> (http://lkml.kernel.org/r/20200608230654.828134-10-guro@fb.com) deletes
> both these functions.
>
> It replaces the offending code with, afaict,
>
>
> static inline void memcg_slab_free_hook(struct kmem_cache *s, struct page *page,
> 					void *p)
> {
> 	struct obj_cgroup *objcg;
> 	unsigned int off;
>
> 	if (!memcg_kmem_enabled() || is_root_cache(s))
> 		return;
>
> 	off = obj_to_index(s, page, p);
> 	objcg = page_obj_cgroups(page)[off];
> 	page_obj_cgroups(page)[off] = NULL;
>
> 	obj_cgroup_uncharge(objcg, obj_full_size(s));
> 	mod_objcg_state(objcg, page_pgdat(page), cache_vmstat_idx(s),
>>>> 			-obj_full_size(s));
> 	obj_cgroup_put(objcg);
> }
>
> -obj_full_size() returns size_t so I guess that's OK.

size_t is unsigned long. So there is no sign extension. It should be 
fine then.


>
>
>
> Also
>
>
> static __always_inline void uncharge_slab_page(struct page *page, int order,
> 					       struct kmem_cache *s)
> {
> #ifdef CONFIG_MEMCG_KMEM
> 	if (memcg_kmem_enabled() && !is_root_cache(s)) {
> 		memcg_free_page_obj_cgroups(page);
> 		percpu_ref_put_many(&s->memcg_params.refcnt, 1 << order);
> 	}
> #endif
> 	mod_node_page_state(page_pgdat(page), cache_vmstat_idx(s),
>>>> 			    -(PAGE_SIZE << order));
> }
>
> PAGE_SIZE is unsigned long so I guess that's OK as well.
>
>
> Still, perhaps both could be improved.  Negating an unsigned scalar is
> a pretty ugly thing to do.
>
> Am I wrong in thinking that all those mod_foo() functions need careful
> review?
>
Yes, we may need to review code that involves sign extension to make sure
such an error does not happen again.
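
One possible defensive pattern for such call sites (just a sketch of the
idea, not something from this series) is to widen to long before
negating, so the delta no longer depends on the width of the local
count variable:

  /* hypothetical helper; 'nr' may come from int, unsigned int or size_t */
  static inline long neg_delta(unsigned long nr)
  {
  	return -(long)nr;	/* negate in long, not in the narrow type */
  }

  /* e.g.  mod_lruvec_state(lruvec, cache_vmstat_idx(s), neg_delta(nr_pages)); */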

BTW, is it possible to place my patch before Roman's patch and modify
his patch accordingly? This patch needs to be backported to 5.7-stable.

Cheers,
Longman
Roman Gushchin June 20, 2020, 9:12 p.m. UTC | #5
On Sat, Jun 20, 2020 at 02:00:18PM -0700, Andrew Morton wrote:
> On Sat, 20 Jun 2020 14:47:19 -0400 Waiman Long <longman@redhat.com> wrote:
> 
> > It was found that running the LTP test on a PowerPC system could produce
> > erroneous values in /proc/meminfo, like:
> > 
> >   MemTotal:       531915072 kB
> >   MemFree:        507962176 kB
> >   MemAvailable:   1100020596352 kB
> > 
> > Using bisection, the problem is tracked down to commit 9c315e4d7d8c
> > ("mm: memcg/slab: cache page number in memcg_(un)charge_slab()").
> > 
> > In memcg_uncharge_slab() with a "int order" argument:
> > 
> >   unsigned int nr_pages = 1 << order;
> >     :
> >   mod_lruvec_state(lruvec, cache_vmstat_idx(s), -nr_pages);
> > 
> > The mod_lruvec_state() function will eventually call the
> > __mod_zone_page_state() which accepts a long argument.  Depending on
> > the compiler and how inlining is done, "-nr_pages" may be treated as
> > a negative number or a very large positive number. Apparently, it was
> > treated as a large positive number in that PowerPC system leading to
> > incorrect stat counts. This problem hasn't been seen in x86-64 yet,
> > perhaps the gcc compiler there has some slight difference in behavior.
> > 
> > It is fixed by making nr_pages a signed value. For consistency, a
> > similar change is applied to memcg_charge_slab() as well.
> 
> This is somewhat disturbing.
> 
> > --- a/mm/slab.h
> > +++ b/mm/slab.h
> > @@ -348,7 +348,7 @@ static __always_inline int memcg_charge_slab(struct page *page,
> >  					     gfp_t gfp, int order,
> >  					     struct kmem_cache *s)
> >  {
> > -	unsigned int nr_pages = 1 << order;
> > +	int nr_pages = 1 << order;
> >  	struct mem_cgroup *memcg;
> >  	struct lruvec *lruvec;
> >  	int ret;
> > @@ -388,7 +388,7 @@ static __always_inline int memcg_charge_slab(struct page *page,
> >  static __always_inline void memcg_uncharge_slab(struct page *page, int order,
> >  						struct kmem_cache *s)
> >  {
> > -	unsigned int nr_pages = 1 << order;
> > +	int nr_pages = 1 << order;
> >  	struct mem_cgroup *memcg;
> >  	struct lruvec *lruvec;
> >  
> 
> I grabbed the patch, but Roman's "mm: memcg/slab: charge individual
> slab objects instead of pages"
> (http://lkml.kernel.org/r/20200608230654.828134-10-guro@fb.com) deletes
> both these functions.

It looks like Waiman's patch should be backported to stable.
So if you can queue it before my series, that would be nice.

> 
> It replaces the offending code with, afaict,
> 
> 
> static inline void memcg_slab_free_hook(struct kmem_cache *s, struct page *page,
> 					void *p)
> {
> 	struct obj_cgroup *objcg;
> 	unsigned int off;
> 
> 	if (!memcg_kmem_enabled() || is_root_cache(s))
> 		return;
> 
> 	off = obj_to_index(s, page, p);
> 	objcg = page_obj_cgroups(page)[off];
> 	page_obj_cgroups(page)[off] = NULL;
> 
> 	obj_cgroup_uncharge(objcg, obj_full_size(s));
> 	mod_objcg_state(objcg, page_pgdat(page), cache_vmstat_idx(s),
> >>>			-obj_full_size(s));
> 
> 	obj_cgroup_put(objcg);
> }
> 
> -obj_full_size() returns size_t so I guess that's OK.
> 
> 
> 
> Also
> 
> 
> static __always_inline void uncharge_slab_page(struct page *page, int order,
> 					       struct kmem_cache *s)
> {
> #ifdef CONFIG_MEMCG_KMEM
> 	if (memcg_kmem_enabled() && !is_root_cache(s)) {
> 		memcg_free_page_obj_cgroups(page);
> 		percpu_ref_put_many(&s->memcg_params.refcnt, 1 << order);
> 	}
> #endif
> 	mod_node_page_state(page_pgdat(page), cache_vmstat_idx(s),
> >>>			    -(PAGE_SIZE << order));
> }
> 
> PAGE_SIZE is unsigned long so I guess that's OK as well.
> 
> 
> Still, perhaps both could be improved.  Negating an unsigned scalar is
> a pretty ugly thing to do.
> 
> Am I wrong in thinking that all those mod_foo() functions need careful
> review?
> 

I'll take a look too.


Thanks!

Patch

diff --git a/mm/slab.h b/mm/slab.h
index 207c83ef6e06..74f7e09a7cfd 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -348,7 +348,7 @@  static __always_inline int memcg_charge_slab(struct page *page,
 					     gfp_t gfp, int order,
 					     struct kmem_cache *s)
 {
-	unsigned int nr_pages = 1 << order;
+	int nr_pages = 1 << order;
 	struct mem_cgroup *memcg;
 	struct lruvec *lruvec;
 	int ret;
@@ -388,7 +388,7 @@  static __always_inline int memcg_charge_slab(struct page *page,
 static __always_inline void memcg_uncharge_slab(struct page *page, int order,
 						struct kmem_cache *s)
 {
-	unsigned int nr_pages = 1 << order;
+	int nr_pages = 1 << order;
 	struct mem_cgroup *memcg;
 	struct lruvec *lruvec;