diff mbox

[1/2] btrfs: drop trace_btrfs_all_work_done() from normal_work_helper()

Message ID 20161214140530.6534-1-bigeasy@linutronix.de (mailing list archive)
State New, archived
Headers show

Commit Message

Sebastian Andrzej Siewior Dec. 14, 2016, 2:05 p.m. UTC
For btrfs_scrubparity_helper() the ->func() is set to
scrub_parity_bio_endio_worker(). This functions invokes
scrub_free_parity() which kfrees() the `work' object. All is good as
long as trace events are not enabled because we boom with a backtrace
like this:
| Workqueue: btrfs-endio btrfs_endio_helper
| RIP: 0010:[<ffffffff812f81ae>]  [<ffffffff812f81ae>] trace_event_raw_event_btrfs__work__done+0x4e/0xa0
| Call Trace:
|  [<ffffffff8136497d>] btrfs_scrubparity_helper+0x59d/0x780
|  [<ffffffff81364c49>] btrfs_endio_helper+0x9/0x10
|  [<ffffffff8108af8e>] process_one_work+0x26e/0x7b0
|  [<ffffffff8108b516>] worker_thread+0x46/0x560
|  [<ffffffff81091c4e>] kthread+0xee/0x110
|  [<ffffffff818e166a>] ret_from_fork+0x2a/0x40

So in order to avoid this, I remove the trace point.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 fs/btrfs/async-thread.c | 2 --
 1 file changed, 2 deletions(-)

Comments

David Sterba Dec. 20, 2016, 5:26 p.m. UTC | #1
Adding Qu to CC,

On Wed, Dec 14, 2016 at 03:05:29PM +0100, Sebastian Andrzej Siewior wrote:
> For btrfs_scrubparity_helper() the ->func() is set to
> scrub_parity_bio_endio_worker(). This functions invokes
> scrub_free_parity() which kfrees() the `work' object. All is good as
> long as trace events are not enabled because we boom with a backtrace
> like this:
> | Workqueue: btrfs-endio btrfs_endio_helper
> | RIP: 0010:[<ffffffff812f81ae>]  [<ffffffff812f81ae>] trace_event_raw_event_btrfs__work__done+0x4e/0xa0
> | Call Trace:
> |  [<ffffffff8136497d>] btrfs_scrubparity_helper+0x59d/0x780
> |  [<ffffffff81364c49>] btrfs_endio_helper+0x9/0x10
> |  [<ffffffff8108af8e>] process_one_work+0x26e/0x7b0
> |  [<ffffffff8108b516>] worker_thread+0x46/0x560
> |  [<ffffffff81091c4e>] kthread+0xee/0x110
> |  [<ffffffff818e166a>] ret_from_fork+0x2a/0x40
> 
> So in order to avoid this, I remove the trace point.
> 
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
>  fs/btrfs/async-thread.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/fs/btrfs/async-thread.c b/fs/btrfs/async-thread.c
> index e0f071f6b5a7..d0dfc3d2e199 100644
> --- a/fs/btrfs/async-thread.c
> +++ b/fs/btrfs/async-thread.c
> @@ -318,8 +318,6 @@ static void normal_work_helper(struct btrfs_work *work)
>  		set_bit(WORK_DONE_BIT, &work->flags);
>  		run_ordered_work(wq);
>  	}
> -	if (!need_order)
> -		trace_btrfs_all_work_done(work);

The comment in the function says we can't touch 'work' after the
callbacks. I don't see any way to use it in a tracepoint here. The
"all_work_done" pairs with a preceding trace_btrfs_work_sched in the
same function or from within run_ordered_work, also called after the
free callback.

So I think we should either remove the tracepoint completely or change
the arguments to take something else than a potentially freed 'work'.

I'm a bit puzzled by the comment in trace/events/btrfs.h

http://lxr.free-electrons.com/source/include/trace/events/btrfs.h#L1165

/* For situiations that the work is freed */
DECLARE_EVENT_CLASS(btrfs__work__done,

so we're expecing a freed pointer anyway? That sounds wrong.

I'll queue the patch for 4.10 as it fixes a crash.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Qu Wenruo Dec. 21, 2016, 12:33 a.m. UTC | #2
At 12/21/2016 01:26 AM, David Sterba wrote:
> Adding Qu to CC,
>
> On Wed, Dec 14, 2016 at 03:05:29PM +0100, Sebastian Andrzej Siewior wrote:
>> For btrfs_scrubparity_helper() the ->func() is set to
>> scrub_parity_bio_endio_worker(). This functions invokes
>> scrub_free_parity() which kfrees() the `work' object. All is good as
>> long as trace events are not enabled because we boom with a backtrace
>> like this:
>> | Workqueue: btrfs-endio btrfs_endio_helper
>> | RIP: 0010:[<ffffffff812f81ae>]  [<ffffffff812f81ae>] trace_event_raw_event_btrfs__work__done+0x4e/0xa0
>> | Call Trace:
>> |  [<ffffffff8136497d>] btrfs_scrubparity_helper+0x59d/0x780
>> |  [<ffffffff81364c49>] btrfs_endio_helper+0x9/0x10
>> |  [<ffffffff8108af8e>] process_one_work+0x26e/0x7b0
>> |  [<ffffffff8108b516>] worker_thread+0x46/0x560
>> |  [<ffffffff81091c4e>] kthread+0xee/0x110
>> |  [<ffffffff818e166a>] ret_from_fork+0x2a/0x40
>>
>> So in order to avoid this, I remove the trace point.
>>
>> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
>> ---
>>  fs/btrfs/async-thread.c | 2 --
>>  1 file changed, 2 deletions(-)
>>
>> diff --git a/fs/btrfs/async-thread.c b/fs/btrfs/async-thread.c
>> index e0f071f6b5a7..d0dfc3d2e199 100644
>> --- a/fs/btrfs/async-thread.c
>> +++ b/fs/btrfs/async-thread.c
>> @@ -318,8 +318,6 @@ static void normal_work_helper(struct btrfs_work *work)
>>  		set_bit(WORK_DONE_BIT, &work->flags);
>>  		run_ordered_work(wq);
>>  	}
>> -	if (!need_order)
>> -		trace_btrfs_all_work_done(work);
>
> The comment in the function says we can't touch 'work' after the
> callbacks. I don't see any way to use it in a tracepoint here. The
> "all_work_done" pairs with a preceding trace_btrfs_work_sched in the
> same function or from within run_ordered_work, also called after the
> free callback.

The trace point only uses the pointer, and this helps us to pair with 
btrfs_work_queued/sched.

But I still don't understand why backtrace is triggered.
Since we're just recording a pointer, not touching it.

Would you please explain the problem with more details on how it trigger 
the problem?

>
> So I think we should either remove the tracepoint completely or change
> the arguments to take something else than a potentially freed 'work'.

I'm mostly OK to remove the tracepoint, but such all_workd_done() trace 
should still help to determine if it's a workqueue stalled.

Thanks,
Qu

>
> I'm a bit puzzled by the comment in trace/events/btrfs.h
>
> http://lxr.free-electrons.com/source/include/trace/events/btrfs.h#L1165
>
> /* For situiations that the work is freed */
> DECLARE_EVENT_CLASS(btrfs__work__done,
>
> so we're expecing a freed pointer anyway? That sounds wrong.
>
> I'll queue the patch for 4.10 as it fixes a crash.
>
>


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sebastian Andrzej Siewior Dec. 21, 2016, 8:28 a.m. UTC | #3
On 2016-12-21 08:33:03 [+0800], Qu Wenruo wrote:
> The trace point only uses the pointer, and this helps us to pair with
> btrfs_work_queued/sched.

| /* For situiations that the work is freed */
| DECLARE_EVENT_CLASS(btrfs__work__done,
| 
|         TP_PROTO(struct btrfs_work *work),
| 
|         TP_ARGS(work),
| 
|         TP_STRUCT__entry_btrfs(
|                 __field(        void *, work                    )
|         ),
| 
|         TP_fast_assign_btrfs(btrfs_work_owner(work),
|                 __entry->work           = work;
|         ),
| 
|         TP_printk_btrfs("work->%p", __entry->work)
| );

and btrfs_work_owner exapnds to:

| struct btrfs_fs_info *
| btrfs_work_owner(struct btrfs_work *work)
| {
|         return work->wq->fs_info;
| }
 
voilà

> But I still don't understand why backtrace is triggered.
> Since we're just recording a pointer, not touching it.
> 
> Would you please explain the problem with more details on how it trigger the
> problem?

enabled all events played with the fs which was just an upgrade and git
tree sync + checkout so nothing special.

> > 
> > So I think we should either remove the tracepoint completely or change
> > the arguments to take something else than a potentially freed 'work'.
> 
> I'm mostly OK to remove the tracepoint, but such all_workd_done() trace
> should still help to determine if it's a workqueue stalled.
> 
> Thanks,
> Qu

Sebastian
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Qu Wenruo Dec. 21, 2016, 8:45 a.m. UTC | #4
At 12/21/2016 04:28 PM, Sebastian Andrzej Siewior wrote:
> On 2016-12-21 08:33:03 [+0800], Qu Wenruo wrote:
>> The trace point only uses the pointer, and this helps us to pair with
>> btrfs_work_queued/sched.
>
> | /* For situiations that the work is freed */
> | DECLARE_EVENT_CLASS(btrfs__work__done,
> |
> |         TP_PROTO(struct btrfs_work *work),
> |
> |         TP_ARGS(work),
> |
> |         TP_STRUCT__entry_btrfs(
> |                 __field(        void *, work                    )
> |         ),
> |
> |         TP_fast_assign_btrfs(btrfs_work_owner(work),
> |                 __entry->work           = work;
> |         ),
> |
> |         TP_printk_btrfs("work->%p", __entry->work)
> | );
>
> and btrfs_work_owner exapnds to:
>
> | struct btrfs_fs_info *
> | btrfs_work_owner(struct btrfs_work *work)
> | {
> |         return work->wq->fs_info;
> | }
>
> voilà

Oh I got it, thanks very much.

The btrfs_work_owner() is newly introduced, no wonder I didn't know that.


I think we can fix it by extracting fs_info pointer before running the 
work, and using the extracted one in the trace point.

Thanks,
Qu


>
>> But I still don't understand why backtrace is triggered.
>> Since we're just recording a pointer, not touching it.
>>
>> Would you please explain the problem with more details on how it trigger the
>> problem?
>
> enabled all events played with the fs which was just an upgrade and git
> tree sync + checkout so nothing special.
>
>>>
>>> So I think we should either remove the tracepoint completely or change
>>> the arguments to take something else than a potentially freed 'work'.
>>
>> I'm mostly OK to remove the tracepoint, but such all_workd_done() trace
>> should still help to determine if it's a workqueue stalled.
>>
>> Thanks,
>> Qu
>
> Sebastian
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sebastian Andrzej Siewior Dec. 21, 2016, 8:55 a.m. UTC | #5
On 2016-12-21 16:45:06 [+0800], Qu Wenruo wrote:
> > | DECLARE_EVENT_CLASS(btrfs__work__done,
> > |
> > |         TP_PROTO(struct btrfs_work *work),
> > |
> > |         TP_ARGS(work),
> > |
> > |         TP_STRUCT__entry_btrfs(
> > |                 __field(        void *, work                    )
> > |         ),
> > |
> > |         TP_fast_assign_btrfs(btrfs_work_owner(work),
> > |                 __entry->work           = work;
> > |         ),
> > |
> > |         TP_printk_btrfs("work->%p", __entry->work)
> > | );
> > 
> > and btrfs_work_owner exapnds to:
> > 
> > | struct btrfs_fs_info *
> > | btrfs_work_owner(struct btrfs_work *work)
> > | {
> > |         return work->wq->fs_info;
> > | }
> > 
> > voilà
> 
> Oh I got it, thanks very much.
> 
> The btrfs_work_owner() is newly introduced, no wonder I didn't know that.

It was introduced in cb001095ca70 ("btrfs: plumb fs_info into
btrfs_work") which is v4.8-rc1. The usage in trace points started in
bc074524e123 ("btrfs: prefix fsid to all trace events") which is also
v4.8-rc1. So whatever is done to get it fixed, it should be pushed
stable for v4.8+.

> Thanks,
> Qu

Sebastian
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Sterba Dec. 21, 2016, 2:08 p.m. UTC | #6
On Wed, Dec 21, 2016 at 04:45:06PM +0800, Qu Wenruo wrote:
> 
> 
> At 12/21/2016 04:28 PM, Sebastian Andrzej Siewior wrote:
> > On 2016-12-21 08:33:03 [+0800], Qu Wenruo wrote:
> >> The trace point only uses the pointer, and this helps us to pair with
> >> btrfs_work_queued/sched.
> >
> > | /* For situiations that the work is freed */
> > | DECLARE_EVENT_CLASS(btrfs__work__done,
> > |
> > |         TP_PROTO(struct btrfs_work *work),
> > |
> > |         TP_ARGS(work),
> > |
> > |         TP_STRUCT__entry_btrfs(
> > |                 __field(        void *, work                    )
> > |         ),
> > |
> > |         TP_fast_assign_btrfs(btrfs_work_owner(work),
> > |                 __entry->work           = work;
> > |         ),
> > |
> > |         TP_printk_btrfs("work->%p", __entry->work)
> > | );
> >
> > and btrfs_work_owner exapnds to:
> >
> > | struct btrfs_fs_info *
> > | btrfs_work_owner(struct btrfs_work *work)
> > | {
> > |         return work->wq->fs_info;
> > | }
> >
> > voilà
> 
> Oh I got it, thanks very much.
> 
> The btrfs_work_owner() is newly introduced, no wonder I didn't know that.
> 
> 
> I think we can fix it by extracting fs_info pointer before running the 
> work, and using the extracted one in the trace point.

That sounds like an easy fix to preserve the tracepoint.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/btrfs/async-thread.c b/fs/btrfs/async-thread.c
index e0f071f6b5a7..d0dfc3d2e199 100644
--- a/fs/btrfs/async-thread.c
+++ b/fs/btrfs/async-thread.c
@@ -318,8 +318,6 @@  static void normal_work_helper(struct btrfs_work *work)
 		set_bit(WORK_DONE_BIT, &work->flags);
 		run_ordered_work(wq);
 	}
-	if (!need_order)
-		trace_btrfs_all_work_done(work);
 }
 
 void btrfs_init_work(struct btrfs_work *work, btrfs_work_func_t uniq_func,