diff mbox series

zsmalloc: fix linking bug in init_zspage

Message ID 20180810002817.2667-1-zhouxianrong@tom.com (mailing list archive)
State New, archived

Commit Message

zhou xianrong Aug. 10, 2018, 12:28 a.m. UTC
From: zhouxianrong <zhouxianrong@huawei.com>

The last partial object in the last subpage of a zspage should not be
linked into the allocation list. Otherwise it can trigger the explicit
BUG_ON in zs_map_object, although this happens rarely.

Signed-off-by: zhouxianrong <zhouxianrong@huawei.com>
---
 mm/zsmalloc.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Minchan Kim Aug. 13, 2018, 6:05 a.m. UTC | #1
Hi,

On Thu, Aug 09, 2018 at 08:28:17PM -0400, zhouxianrong wrote:
> From: zhouxianrong <zhouxianrong@huawei.com>
> 
> The last partial object in last subpage of zspage should not be linked
> in allocation list. Otherwise it could trigger BUG_ON explicitly at
> function zs_map_object. But it happened rarely.

Could you be more specific? In what case did you see the problem?
Is it a real problem or one found by review?

Thanks.

> 
> Signed-off-by: zhouxianrong <zhouxianrong@huawei.com>
> ---
>  mm/zsmalloc.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> index 8d87e973a4f5..24dd8da0aa59 100644
> --- a/mm/zsmalloc.c
> +++ b/mm/zsmalloc.c
> @@ -1040,6 +1040,8 @@ static void init_zspage(struct size_class *class, struct zspage *zspage)
>  			 * Reset OBJ_TAG_BITS bit to last link to tell
>  			 * whether it's allocated object or not.
>  			 */
> +			if (off > PAGE_SIZE)
> +				link -= class->size / sizeof(*link);
>  			link->next = -1UL << OBJ_TAG_BITS;
>  		}
>  		kunmap_atomic(vaddr);
> -- 
> 2.13.6
>
Sergey Senozhatsky Aug. 13, 2018, 10:55 a.m. UTC | #2
On (08/13/18 15:05), Minchan Kim wrote:
> > From: zhouxianrong <zhouxianrong@huawei.com>
> > 
> > The last partial object in last subpage of zspage should not be linked
> > in allocation list. Otherwise it could trigger BUG_ON explicitly at
> > function zs_map_object. But it happened rarely.
> 
> Could you be more specific? In what case did you see the problem?
> Is it a real problem or one found by review?
[..]
> > Signed-off-by: zhouxianrong <zhouxianrong@huawei.com>
> > ---
> >  mm/zsmalloc.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> > index 8d87e973a4f5..24dd8da0aa59 100644
> > --- a/mm/zsmalloc.c
> > +++ b/mm/zsmalloc.c
> > @@ -1040,6 +1040,8 @@ static void init_zspage(struct size_class *class, struct zspage *zspage)
> >  			 * Reset OBJ_TAG_BITS bit to last link to tell
> >  			 * whether it's allocated object or not.
> >  			 */
> > +			if (off > PAGE_SIZE)
> > +				link -= class->size / sizeof(*link);
> >  			link->next = -1UL << OBJ_TAG_BITS;
> >  		}
> >  		kunmap_atomic(vaddr);

Hmm. This can be a real issue. Unless I'm missing something.

So... I might be wrong, but the way I see the bug report is:

When we link objects during zspage init, we do the following:

	while ((off += class->size) < PAGE_SIZE) {
		link->next = freeobj++ << OBJ_TAG_BITS;
		link += class->size / sizeof(*link);
	}

Note that we increment the link first, link += class->size / sizeof(*link),
and check for the offset only afterwards. So by the time we break out of
the while-loop the link *might* point to the partial object which starts at
the last page of zspage, but *never* ends, because we don't have next_page
in current zspage. So that's why that object should not be linked in,
because it's not a valid allocated object - we simply don't have space
for it anymore.

zspage [      page 1     ][      page 2      ]
        ...............................link
	                                   [..###]

therefore the last object must be "link - 1" for such cases.
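
To make the pointer walk concrete, here is a small user-space model of the
linking loop. It is only a sketch: it tracks byte offsets instead of touching
memory, assumes 4 KiB pages, and every name in it (sim_init_zspage,
struct sim_result, SIM_PAGE_SIZE) is made up for illustration and does not
exist in mm/zsmalloc.c. The 3000-byte class size in the note below is
hypothetical as well.

```c
#include <assert.h>
#include <stdbool.h>

#define SIM_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */

/* Hypothetical summary of what the linking loop produced. */
struct sim_result {
	unsigned long linked;		/* free-list entries written */
	unsigned long objs_per_zspage;	/* whole objects that fit */
	unsigned long last_start;	/* zspage byte offset of the last entry */
	bool last_is_partial;		/* last entry runs past the zspage end */
};

/* Replay the offset arithmetic of the loop quoted above, page by page. */
static struct sim_result sim_init_zspage(unsigned long size, unsigned long pages)
{
	struct sim_result r = { 0 };
	unsigned long off = 0;

	assert(size >= 8 && size % 8 == 0 && size <= SIM_PAGE_SIZE && pages > 0);

	for (unsigned long page = 0; page < pages; page++) {
		unsigned long link = off;	/* offset of link inside this page */

		while ((off += size) < SIM_PAGE_SIZE) {
			r.linked++;		/* link->next = freeobj++ ... */
			link += size;		/* link += class->size / sizeof(*link) */
		}

		/* The entry link points at now also gets linked (or terminated). */
		r.linked++;
		r.last_start = page * SIM_PAGE_SIZE + link;
		off %= SIM_PAGE_SIZE;
	}

	r.objs_per_zspage = pages * SIM_PAGE_SIZE / size;
	r.last_is_partial = r.last_start + size > pages * SIM_PAGE_SIZE;
	return r;
}
```

With a hypothetical 3000-byte class on a 2-page zspage,
sim_init_zspage(3000, 2) reports three linked entries although only two
whole objects fit: the last entry starts at byte 6000 and would run to
byte 9000, past the end of the 8192-byte zspage.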

I think, the following change can also do the trick:

	while ((off + class->size) < PAGE_SIZE) {
		link->next = freeobj++ << OBJ_TAG_BITS;
		link += class->size / sizeof(*link);
		off += class->size;
	}

Once again, I might be wrong on this.
Any thoughts?

	-ss
Minchan Kim Aug. 14, 2018, 12:24 a.m. UTC | #3
Hi Sergey,

On Mon, Aug 13, 2018 at 07:55:36PM +0900, Sergey Senozhatsky wrote:
> On (08/13/18 15:05), Minchan Kim wrote:
> > > From: zhouxianrong <zhouxianrong@huawei.com>
> > > 
> > > The last partial object in last subpage of zspage should not be linked
> > > in allocation list. Otherwise it could trigger BUG_ON explicitly at
> > > function zs_map_object. But it happened rarely.
> > 
> > Could you be more specific? In what case did you see the problem?
> > Is it a real problem or one found by review?
> [..]
> > > Signed-off-by: zhouxianrong <zhouxianrong@huawei.com>
> > > ---
> > >  mm/zsmalloc.c | 2 ++
> > >  1 file changed, 2 insertions(+)
> > > 
> > > diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> > > index 8d87e973a4f5..24dd8da0aa59 100644
> > > --- a/mm/zsmalloc.c
> > > +++ b/mm/zsmalloc.c
> > > @@ -1040,6 +1040,8 @@ static void init_zspage(struct size_class *class, struct zspage *zspage)
> > >  			 * Reset OBJ_TAG_BITS bit to last link to tell
> > >  			 * whether it's allocated object or not.
> > >  			 */
> > > +			if (off > PAGE_SIZE)
> > > +				link -= class->size / sizeof(*link);
> > >  			link->next = -1UL << OBJ_TAG_BITS;
> > >  		}
> > >  		kunmap_atomic(vaddr);
> 
> Hmm. This can be a real issue. Unless I'm missing something.
> 
> So... I might be wrong, but the way I see the bug report is:
> 
> When we link objects during zspage init, we do the following:
> 
> 	while ((off += class->size) < PAGE_SIZE) {
> 		link->next = freeobj++ << OBJ_TAG_BITS;
> 		link += class->size / sizeof(*link);
> 	}
> 
> Note that we increment the link first, link += class->size / sizeof(*link),
> and check for the offset only afterwards. So by the time we break out of
> the while-loop the link *might* point to the partial object which starts at
> the last page of zspage, but *never* ends, because we don't have next_page
> in current zspage. So that's why that object should not be linked in,
> because it's not a valid allocated object - we simply don't have space
> for it anymore.
> 
> zspage [      page 1     ][      page 2      ]
>         ...............................link
> 	                                   [..###]
> 
> therefore the last object must be "link - 1" for such cases.
> 
> I think, the following change can also do the trick:
> 
> 	while ((off + class->size) < PAGE_SIZE) {
> 		link->next = freeobj++ << OBJ_TAG_BITS;
> 		link += class->size / sizeof(*link);
> 		off += class->size;
> 	}
> 
> Once again, I might be wrong on this.
> Any thoughts?

If we want a refactoring, I'm not against it, but the description said it
triggered the BUG_ON in zs_map_object rarely. That means it should be stable
material and needs more description to be understood. Please be more specific,
with an example. The reason I'm hesitating is that zsmalloc moves a zspage to
the ZS_FULL group when zspage->inuse is equal to class->objs_per_zspage, so I
thought it shouldn't allocate the last partial object.
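
That reasoning can be modelled in a few lines. The sketch below is a toy,
not the kernel code: it assumes that allocation pops the head of the free
list, that freeing pushes the object back onto the head, and that a zspage
whose inuse count has reached objs_per_zspage is never picked for
allocation. All names (struct sim_zspage, sim_init, sim_alloc, sim_free,
SIM_END) are hypothetical.

```c
#include <assert.h>

#define SIM_MAX_OBJS 64
#define SIM_END (-1)

/* Toy zspage: a free list of object indices plus the two counters. */
struct sim_zspage {
	int next[SIM_MAX_OBJS];	/* next[i]: free object that follows i */
	int freeobj;		/* head of the free list */
	int inuse;		/* objects currently allocated */
	int objs_per_zspage;	/* whole objects that fit */
};

/* Link entries 0 -> 1 -> ... -> linked-1 -> END, as the init loop does. */
static void sim_init(struct sim_zspage *z, int linked, int objs_per_zspage)
{
	assert(linked > 0 && linked <= SIM_MAX_OBJS);
	assert(objs_per_zspage <= linked);

	for (int i = 0; i < linked; i++)
		z->next[i] = (i + 1 < linked) ? i + 1 : SIM_END;
	z->freeobj = 0;
	z->inuse = 0;
	z->objs_per_zspage = objs_per_zspage;
}

/* Pop the head of the free list, refusing once the zspage counts as full. */
static int sim_alloc(struct sim_zspage *z)
{
	int obj;

	if (z->inuse == z->objs_per_zspage)
		return SIM_END;	/* modelled as sitting in the ZS_FULL group */

	obj = z->freeobj;
	z->freeobj = z->next[obj];
	z->inuse++;
	return obj;
}

/* Push a freed object back onto the head of the free list. */
static void sim_free(struct sim_zspage *z, int obj)
{
	z->next[obj] = z->freeobj;
	z->freeobj = obj;
	z->inuse--;
}
```

With three linked entries but objs_per_zspage == 2, the extra entry
(index 2) stays at the tail of the list in this model: every sequence of
sim_alloc() and sim_free() calls hands out only indices 0 and 1. Whether
the real allocator can break one of the assumptions above is exactly what
the backtrace has to show.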

Thanks.
Sergey Senozhatsky Aug. 14, 2018, 12:51 a.m. UTC | #4
Hi Minchan,

On (08/14/18 09:24), Minchan Kim wrote:
> > Any thoughts?
>
> If we want a refactoring, I'm not against it, but the description said it
> triggered the BUG_ON in zs_map_object rarely. That means it should be stable
> material and needs more description to be understood. Please be more specific,
> with an example.

I don't have any BUG_ON at hand. It would be great if zhouxianrong could
post some backtraces or more info/explanation.

> The reason I'm hesitating is that zsmalloc moves a zspage to the ZS_FULL
> group when zspage->inuse is equal to class->objs_per_zspage, so I thought
> it shouldn't allocate the last partial object.

Maybe. Allocating the last partial object does look a bit hacky - it's not
a valid object anyway - but I'm not suggesting that we need to change it.
Let's hear from zhouxianrong.

	-ss
zhou xianrong Aug. 16, 2018, 12:10 a.m. UTC | #5
Hi:

  Sorry for replying to this message so late.

This is the backtrace we hit (edited by me).

[pid:3471,cpu4,thread-3]------------[ cut here ]------------
[pid:3471,cpu4,thread-3]kernel bug at ../../../../../../mm/zsmalloc.c:1455!
[pid:3471,cpu4,thread-3]internal error: oops - bug: 0 [#1] preempt smp
[pid:3471,cpu4,thread-3]modules linked in:
[pid:3471,cpu4,thread-3]cpu: 4 pid: 3471 comm: thread-3 tainted: g        w       4.9.84 #1
[pid:3471,cpu4,thread-3]tgid: 715 comm: proc-a
[pid:3471,cpu4,thread-3]task: ffffffcc83ba1d00 task.stack: ffffffcad99b0000
[pid:3471,cpu4,thread-3]pc is at zs_map_object+0x1e0/0x1f0
[pid:3471,cpu4,thread-3]lr is at zs_map_object+0x9c/0x1f0
[pid:3471,cpu4,thread-3]pc : [] lr : [] pstate: 20000145
[pid:3471,cpu4,thread-3]sp : ffffffcad99b3530
[pid:3471,cpu4,thread-3]x29: ffffffcad99b3530 x28: ffffffcc97533c40
[pid:3471,cpu4,thread-3]x27: ffffffcc974dd720 x26: ffffffcad99b0000
[pid:3471,cpu4,thread-3]x25: 0000000001fa9f80 x24: 0000000000000002
[pid:3471,cpu4,thread-3]x23: ffffff89c3a27000 x22: ffffff89c30e6000
[pid:3471,cpu4,thread-3]x21: ffffff89c354f000 x20: ffffff89c3234720
[pid:3471,cpu4,thread-3]x19: 0000000000000f90 x18: 0000000000000008
[pid:3471,cpu4,thread-3]x17: 00000000bbb877ff x16: 00000000ffdba560
[pid:3471,cpu4,thread-3]x15: ffffffcaeab13ff5 x14: 000000009e3779b1
[pid:3471,cpu4,thread-3]x13: 0000000000000ff4 x12: ffffffcaeab13fd9
[pid:3471,cpu4,thread-3]x11: ffffffcaeab13ffa x10: ffffffcaeab13ff8
[pid:3471,cpu4,thread-3]x9 : ffffffca8cc201b8 x8 : ffffffca8cc20190
[pid:3471,cpu4,thread-3]x7 : 000000000000008e x6 : 000000000000009b
[pid:3471,cpu4,thread-3]x5 : 0000000000000000 x4 : 0000000000000001
[pid:3471,cpu4,thread-3]x3 : 00000042d42a9000 x2 : 00000000000009d0
[pid:3471,cpu4,thread-3]x1 : ffffffcc994ddbc0 x0 : 0000000000000000

[pid:3471,cpu4,thread-3] zs_map_object+0x1e0/0x1f0
[pid:3471,cpu4,thread-3] zs_zpool_map+0x44/0x54
[pid:3471,cpu4,thread-3] zpool_map_handle+0x44/0x58
[pid:3471,cpu4,thread-3] zram_bvec_write+0x22c/0x76c
[pid:3471,cpu4,thread-3] zram_bvec_rw+0x288/0x488
[pid:3471,cpu4,thread-3] zram_rw_page+0x124/0x1a4
[pid:3471,cpu4,thread-3] bdev_write_page+0x8c/0xd8
[pid:3471,cpu4,thread-3] __swap_writepage+0x1c0/0x3a8
[pid:3471,cpu4,thread-3] swap_writepage+0x3c/0x64
[pid:3471,cpu4,thread-3] shrink_page_list+0x844/0xd84
[pid:3471,cpu4,thread-3] reclaim_pages_from_list+0xf4/0x1bc
[pid:3471,cpu4,thread-3] reclaim_pte_range+0x208/0x2a0
[pid:3471,cpu4,thread-3] walk_pgd_range+0xe8/0x238
[pid:3471,cpu4,thread-3] walk_page_range+0x7c/0x164
[pid:3471,cpu4,thread-3] reclaim_write+0x208/0x608
[pid:3471,cpu4,thread-3] __vfs_write+0x50/0x88
[pid:3471,cpu4,thread-3] vfs_write+0xbc/0x2b0
[pid:3471,cpu4,thread-3] sys_write+0x60/0xc4
[pid:3471,cpu4,thread-3] el0_svc_naked+0x34/0x38
[pid:3471,cpu4,thread-3]code: 17ffffdd d4210000 97ffff1f 97ffff83 (d4210000)
[pid:3471,cpu4,thread-3]---[ end trace 652caafc4c4b6d06 ]---

> Hi Sergey,
>
> On Mon, Aug 13, 2018 at 07:55:36PM +0900, Sergey Senozhatsky wrote:
> [..]
>
> If we want a refactoring, I'm not against it, but the description said it
> triggered the BUG_ON in zs_map_object rarely. That means it should be stable
> material and needs more description to be understood. Please be more specific,
> with an example. The reason I'm hesitating is that zsmalloc moves a zspage to
> the ZS_FULL group when zspage->inuse is equal to class->objs_per_zspage, so I
> thought it shouldn't allocate the last partial object.
>
> Thanks.
Minchan Kim Aug. 16, 2018, 3:46 a.m. UTC | #6
Hi zhouxianrong,

Could you please be more specific about the case in which we can hit the BUG below?
(Please use plain text.)
With what zs_class size did you see this problem?
Could you say how that can happen?

As I wrote in my other reply, zsmalloc should never allocate the last partial
object as far as I can tell from the source code, so we need to understand what
specific case we are missing if it's a real zsmalloc bug.

Please explain how that can happen, with a real example.

Thanks.

On Thu, Aug 16, 2018 at 08:10:42AM +0800, zhouxianrong wrote:
> Hi:
>
>   Sorry for replying to this message so late.
>
> This is the backtrace we hit (edited by me).
>
> [pid:3471,cpu4,thread-3]------------[ cut here ]------------
> [pid:3471,cpu4,thread-3]kernel bug at ../../../../../../mm/zsmalloc.c:1455!
> [pid:3471,cpu4,thread-3]internal error: oops - bug: 0 [#1] preempt smp
> [pid:3471,cpu4,thread-3]modules linked in:
> [pid:3471,cpu4,thread-3]cpu: 4 pid: 3471 comm: thread-3 tainted: g        w       4.9.84 #1
> [pid:3471,cpu4,thread-3]tgid: 715 comm: proc-a
> [pid:3471,cpu4,thread-3]task: ffffffcc83ba1d00 task.stack: ffffffcad99b0000
> [pid:3471,cpu4,thread-3]pc is at zs_map_object+0x1e0/0x1f0
> [pid:3471,cpu4,thread-3]lr is at zs_map_object+0x9c/0x1f0
> [pid:3471,cpu4,thread-3]pc : [] lr : [] pstate: 20000145
> [pid:3471,cpu4,thread-3]sp : ffffffcad99b3530
> [pid:3471,cpu4,thread-3]x29: ffffffcad99b3530 x28: ffffffcc97533c40
> [pid:3471,cpu4,thread-3]x27: ffffffcc974dd720 x26: ffffffcad99b0000
> [pid:3471,cpu4,thread-3]x25: 0000000001fa9f80 x24: 0000000000000002
> [pid:3471,cpu4,thread-3]x23: ffffff89c3a27000 x22: ffffff89c30e6000
> [pid:3471,cpu4,thread-3]x21: ffffff89c354f000 x20: ffffff89c3234720
> [pid:3471,cpu4,thread-3]x19: 0000000000000f90 x18: 0000000000000008
> [pid:3471,cpu4,thread-3]x17: 00000000bbb877ff x16: 00000000ffdba560
> [pid:3471,cpu4,thread-3]x15: ffffffcaeab13ff5 x14: 000000009e3779b1
> [pid:3471,cpu4,thread-3]x13: 0000000000000ff4 x12: ffffffcaeab13fd9
> [pid:3471,cpu4,thread-3]x11: ffffffcaeab13ffa x10: ffffffcaeab13ff8
> [pid:3471,cpu4,thread-3]x9 : ffffffca8cc201b8 x8 : ffffffca8cc20190
> [pid:3471,cpu4,thread-3]x7 : 000000000000008e x6 : 000000000000009b
> [pid:3471,cpu4,thread-3]x5 : 0000000000000000 x4 : 0000000000000001
> [pid:3471,cpu4,thread-3]x3 : 00000042d42a9000 x2 : 00000000000009d0
> [pid:3471,cpu4,thread-3]x1 : ffffffcc994ddbc0 x0 : 0000000000000000
>
> [pid:3471,cpu4,thread-3] zs_map_object+0x1e0/0x1f0
> [pid:3471,cpu4,thread-3] zs_zpool_map+0x44/0x54
> [pid:3471,cpu4,thread-3] zpool_map_handle+0x44/0x58
> [pid:3471,cpu4,thread-3] zram_bvec_write+0x22c/0x76c
> [pid:3471,cpu4,thread-3] zram_bvec_rw+0x288/0x488
> [pid:3471,cpu4,thread-3] zram_rw_page+0x124/0x1a4
> [pid:3471,cpu4,thread-3] bdev_write_page+0x8c/0xd8
> [pid:3471,cpu4,thread-3] __swap_writepage+0x1c0/0x3a8
> [pid:3471,cpu4,thread-3] swap_writepage+0x3c/0x64
> [pid:3471,cpu4,thread-3] shrink_page_list+0x844/0xd84
> [pid:3471,cpu4,thread-3] reclaim_pages_from_list+0xf4/0x1bc
> [pid:3471,cpu4,thread-3] reclaim_pte_range+0x208/0x2a0
> [pid:3471,cpu4,thread-3] walk_pgd_range+0xe8/0x238
> [pid:3471,cpu4,thread-3] walk_page_range+0x7c/0x164
> [pid:3471,cpu4,thread-3] reclaim_write+0x208/0x608
> [pid:3471,cpu4,thread-3] __vfs_write+0x50/0x88
> [pid:3471,cpu4,thread-3] vfs_write+0xbc/0x2b0
> [pid:3471,cpu4,thread-3] sys_write+0x60/0xc4
> [pid:3471,cpu4,thread-3] el0_svc_naked+0x34/0x38
> [pid:3471,cpu4,thread-3]code: 17ffffdd d4210000 97ffff1f 97ffff83 (d4210000)
> [pid:3471,cpu4,thread-3]---[ end trace 652caafc4c4b6d06 ]---
>
> [..]
Patch

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 8d87e973a4f5..24dd8da0aa59 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -1040,6 +1040,8 @@  static void init_zspage(struct size_class *class, struct zspage *zspage)
 			 * Reset OBJ_TAG_BITS bit to last link to tell
 			 * whether it's allocated object or not.
 			 */
+			if (off > PAGE_SIZE)
+				link -= class->size / sizeof(*link);
 			link->next = -1UL << OBJ_TAG_BITS;
 		}
 		kunmap_atomic(vaddr);