diff mbox

task btrfs-cleaner:770 blocked for more than 120 seconds.

Message ID 20160211191741.GA4762@localhost.localdomain (mailing list archive)
State New, archived
Headers show

Commit Message

Liu Bo Feb. 11, 2016, 7:18 p.m. UTC
On Wed, Feb 10, 2016 at 02:23:00PM -0700, Chris Murphy wrote:
> On Wed, Feb 10, 2016 at 1:39 PM, ?????? ????????
> <mikhail.v.gavrilov@gmail.com> wrote:
> 
> 
> >
> > Here full log: http://btrfs.sy24.ru/kernel-sysrqw-btrfscleaner770blocked-2.txt
> >
> > I am so sorry if this log is useless.
> 
> Looks good to me. The blocked task happens out of no where with
> nothing reported for almost an hour before the blocking. And I see the
> sysrq: SysRq : Show Blocked State was issued and lots of information
> is in the file.
> 
> > If "sysrq" is needed enabled before hang then I need set this
> > permanently because as I said I not having exactly reproducing this.
> 
> echo 1 > /proc/sys/kernel/sysrq can happen anytime, it just enables
> sysrq triggering functions which on Fedora kernels is not enabled by
> default. The main thing is that the echo w to the sysrq trigger needs
> to happen at the time of the problem to show the state. You did that.
> Let's see what Liu Bo has to say about it.

Really appreciate for collecting these, it should be helpful.

Unfortunately I still could not figure out who's holding fs tree's root WRITE_LOCK so that others are blocked.

A possible bug in log code (the follwing patch addressed it),

- log_new_dir_dentries() is holding log tree's leaf READ_LOCK and may try
  to get fs tree's READ_LOCK via btrfs_iget() -> btrfs_lookup().
  (This is shown in the backtrac)

- btrfs_log_inode() can call btrfs_search_forward() to get fs tree's
  leaf READ_LOCK and then call copy_items() -> btrfs_insert_empty_items()
  to acquire WRITE_LOCK of log tree's leaf and leaf's parent.  
  (In the backtrace, this is blocked by item 1 because log_new_dir_dentries is
  holding a log tree leaf's READ_LOCK() which happens to be sibling to
  the leaf that btrfs_insert_empty_items() is accessing, when doing
  split_leaf() it needs to get the sibling's WRITE_LOCK(). )

Thanks,

-liubo



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Mikhail Gavrilov Feb. 11, 2016, 9:03 p.m. UTC | #1
Thanks guys, I appreciate your's work.
In which kernel this patch would landed?

--
Best Regards,
Mike Gavrilov.


2016-02-12 0:18 GMT+05:00 Liu Bo <bo.li.liu@oracle.com>:
>
> Really appreciate for collecting these, it should be helpful.
>
> Unfortunately I still could not figure out who's holding fs tree's root WRITE_LOCK so that others are blocked.
>
> A possible bug in log code (the follwing patch addressed it),
>
> - log_new_dir_dentries() is holding log tree's leaf READ_LOCK and may try
>   to get fs tree's READ_LOCK via btrfs_iget() -> btrfs_lookup().
>   (This is shown in the backtrac)
>
> - btrfs_log_inode() can call btrfs_search_forward() to get fs tree's
>   leaf READ_LOCK and then call copy_items() -> btrfs_insert_empty_items()
>   to acquire WRITE_LOCK of log tree's leaf and leaf's parent.
>   (In the backtrace, this is blocked by item 1 because log_new_dir_dentries is
>   holding a log tree leaf's READ_LOCK() which happens to be sibling to
>   the leaf that btrfs_insert_empty_items() is accessing, when doing
>   split_leaf() it needs to get the sibling's WRITE_LOCK(). )
>
> Thanks,
>
> -liubo
>
> diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
> index 323e12c..4a64fdd 100644
> --- a/fs/btrfs/tree-log.c
> +++ b/fs/btrfs/tree-log.c
> @@ -4956,6 +4956,7 @@ process_leaf:
>                         if (di_key.type == BTRFS_ROOT_ITEM_KEY)
>                                 continue;
>
> +                       btrfs_release_path(path);
>                         di_inode = btrfs_iget(root->fs_info->sb, &di_key,
>                                               root, NULL);
>                         if (IS_ERR(di_inode)) {
> @@ -4971,7 +4972,6 @@ process_leaf:
>                         ctx->log_new_dentries = false;
>                         if (type == BTRFS_FT_DIR)
>                                 log_mode = LOG_INODE_ALL;
> -                       btrfs_release_path(path);
>                         ret = btrfs_log_inode(trans, root, di_inode,
>                                               log_mode, 0, LLONG_MAX, ctx);
>                         iput(di_inode);
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Liu Bo Feb. 12, 2016, 3:22 a.m. UTC | #2
On Fri, Feb 12, 2016 at 02:03:15AM +0500, ?????? ???????? wrote:
> Thanks guys, I appreciate your's work.
> In which kernel this patch would landed?

You can try it on your 4.2.3 kernel or the latest 4.5, but I guess it
doesn't not fix the real deadlock you're hitting...

Thanks,

-liubo

> 
> --
> Best Regards,
> Mike Gavrilov.
> 
> 
> 2016-02-12 0:18 GMT+05:00 Liu Bo <bo.li.liu@oracle.com>:
> >
> > Really appreciate for collecting these, it should be helpful.
> >
> > Unfortunately I still could not figure out who's holding fs tree's root WRITE_LOCK so that others are blocked.
> >
> > A possible bug in log code (the follwing patch addressed it),
> >
> > - log_new_dir_dentries() is holding log tree's leaf READ_LOCK and may try
> >   to get fs tree's READ_LOCK via btrfs_iget() -> btrfs_lookup().
> >   (This is shown in the backtrac)
> >
> > - btrfs_log_inode() can call btrfs_search_forward() to get fs tree's
> >   leaf READ_LOCK and then call copy_items() -> btrfs_insert_empty_items()
> >   to acquire WRITE_LOCK of log tree's leaf and leaf's parent.
> >   (In the backtrace, this is blocked by item 1 because log_new_dir_dentries is
> >   holding a log tree leaf's READ_LOCK() which happens to be sibling to
> >   the leaf that btrfs_insert_empty_items() is accessing, when doing
> >   split_leaf() it needs to get the sibling's WRITE_LOCK(). )
> >
> > Thanks,
> >
> > -liubo
> >
> > diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
> > index 323e12c..4a64fdd 100644
> > --- a/fs/btrfs/tree-log.c
> > +++ b/fs/btrfs/tree-log.c
> > @@ -4956,6 +4956,7 @@ process_leaf:
> >                         if (di_key.type == BTRFS_ROOT_ITEM_KEY)
> >                                 continue;
> >
> > +                       btrfs_release_path(path);
> >                         di_inode = btrfs_iget(root->fs_info->sb, &di_key,
> >                                               root, NULL);
> >                         if (IS_ERR(di_inode)) {
> > @@ -4971,7 +4972,6 @@ process_leaf:
> >                         ctx->log_new_dentries = false;
> >                         if (type == BTRFS_FT_DIR)
> >                                 log_mode = LOG_INODE_ALL;
> > -                       btrfs_release_path(path);
> >                         ret = btrfs_log_inode(trans, root, di_inode,
> >                                               log_mode, 0, LLONG_MAX, ctx);
> >                         iput(di_inode);
> >
> >
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Chris Murphy Feb. 12, 2016, 3:38 a.m. UTC | #3
On Thu, Feb 11, 2016 at 8:22 PM, Liu Bo <bo.li.liu@oracle.com> wrote:
> On Fri, Feb 12, 2016 at 02:03:15AM +0500, ?????? ???????? wrote:
>> Thanks guys, I appreciate your's work.
>> In which kernel this patch would landed?
>
> You can try it on your 4.2.3 kernel or the latest 4.5, but I guess it
> doesn't not fix the real deadlock you're hitting...

I think the question is whether the patch is already in a stable
kernel, or mainline, or not yet merged?
Mikhail Gavrilov Feb. 12, 2016, 7:15 p.m. UTC | #4
2016-02-12 8:22 GMT+05:00 Liu Bo <bo.li.liu@oracle.com>:
> You can try it on your 4.2.3 kernel or the latest 4.5, but I guess it
> doesn't not fix the real deadlock you're hitting...

It means it is not final patch? you continue investigate problem? Can
I help you?

--
Best Regards,
Mike Gavrilov.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Liu Bo Feb. 12, 2016, 8:34 p.m. UTC | #5
On Sat, Feb 13, 2016 at 12:15:08AM +0500, ?????? ???????? wrote:
> 2016-02-12 8:22 GMT+05:00 Liu Bo <bo.li.liu@oracle.com>:
> > You can try it on your 4.2.3 kernel or the latest 4.5, but I guess it
> > doesn't not fix the real deadlock you're hitting...
> 
> It means it is not final patch? you continue investigate problem? Can
> I help you?

Yeah, it's not the final patch, you can apply it and see if the deadlock
will happen again.

Unless we can get a way to reproduce it, I'm afraid there's little things we can  do here.

Thanks,

-liubo
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Liu Bo Feb. 12, 2016, 8:35 p.m. UTC | #6
On Thu, Feb 11, 2016 at 08:38:36PM -0700, Chris Murphy wrote:
> On Thu, Feb 11, 2016 at 8:22 PM, Liu Bo <bo.li.liu@oracle.com> wrote:
> > On Fri, Feb 12, 2016 at 02:03:15AM +0500, ?????? ???????? wrote:
> >> Thanks guys, I appreciate your's work.
> >> In which kernel this patch would landed?
> >
> > You can try it on your 4.2.3 kernel or the latest 4.5, but I guess it
> > doesn't not fix the real deadlock you're hitting...
> 
> I think the question is whether the patch is already in a stable
> kernel, or mainline, or not yet merged?

Oh, it's not merged, just a quick patch to see if it can fix the deadlock.

Thanks,

-liubo
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mikhail Gavrilov Feb. 13, 2016, 7:23 p.m. UTC | #7
Sorry, I have not yet had time to apply your patch.

And get hang again when launch web browser.

Here new logs: http://btrfs.sy24.ru/kernel-sysrqw-btrfscleaner770blocked-3.txt

Addition info: when btrfs partition hangs also hang GUI it is normal?
For grab logs I switch to tty3 (Ctrl - Alt - F3)



--
Best Regards,
Mike Gavrilov.


2016-02-13 1:34 GMT+05:00 Liu Bo <bo.li.liu@oracle.com>:
> On Sat, Feb 13, 2016 at 12:15:08AM +0500, ?????? ???????? wrote:
>> 2016-02-12 8:22 GMT+05:00 Liu Bo <bo.li.liu@oracle.com>:
>> > You can try it on your 4.2.3 kernel or the latest 4.5, but I guess it
>> > doesn't not fix the real deadlock you're hitting...
>>
>> It means it is not final patch? you continue investigate problem? Can
>> I help you?
>
> Yeah, it's not the final patch, you can apply it and see if the deadlock
> will happen again.
>
> Unless we can get a way to reproduce it, I'm afraid there's little things we can  do here.
>
> Thanks,
>
> -liubo
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Liu Bo Feb. 14, 2016, 9:32 p.m. UTC | #8
Hi Mike,
On Sun, Feb 14, 2016 at 12:23:16AM +0500, ?????? ???????? wrote:
> Sorry, I have not yet had time to apply your patch.
> 
> And get hang again when launch web browser.
> 
> Here new logs: http://btrfs.sy24.ru/kernel-sysrqw-btrfscleaner770blocked-3.txt

The logs show,
every hung process is waiting for fs tree root's WRITE_LOCK, but someone who's
holding the WRITE_LOCK is not reflected here.

Only one suspicious process is 
"Chrome_FileThre D ffff8807fe1d7a18 10360 22486   2826 0x00000000"
it's also waiting on btrfs_tree_lock(), but I can't tell whether it
waits for the fs tree root node's lock or the children node's lock.

Hmm, sorry for not being helpful, but I was wondering if this also
occurs on the latest btrfs(4.5) or is it possible to give it a shot?

The best result is that it's been fixed by one commit in 4.5.

> 
> Addition info: when btrfs partition hangs also hang GUI it is normal?
> For grab logs I switch to tty3 (Ctrl - Alt - F3)

I'm afraid the answer is yes because GUI needs to read files (like
config files) from the underlying partition.

Thanks,

-liubo

> 
> 
> 
> --
> Best Regards,
> Mike Gavrilov.
> 
> 
> 2016-02-13 1:34 GMT+05:00 Liu Bo <bo.li.liu@oracle.com>:
> > On Sat, Feb 13, 2016 at 12:15:08AM +0500, ?????? ???????? wrote:
> >> 2016-02-12 8:22 GMT+05:00 Liu Bo <bo.li.liu@oracle.com>:
> >> > You can try it on your 4.2.3 kernel or the latest 4.5, but I guess it
> >> > doesn't not fix the real deadlock you're hitting...
> >>
> >> It means it is not final patch? you continue investigate problem? Can
> >> I help you?
> >
> > Yeah, it's not the final patch, you can apply it and see if the deadlock
> > will happen again.
> >
> > Unless we can get a way to reproduce it, I'm afraid there's little things we can  do here.
> >
> > Thanks,
> >
> > -liubo
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Roman Mamedov Feb. 14, 2016, 10:42 p.m. UTC | #9
On Sun, 14 Feb 2016 13:32:24 -0800
Liu Bo <bo.li.liu@oracle.com> wrote:

> Hi Mike,
> On Sun, Feb 14, 2016 at 12:23:16AM +0500, ?????? ???????? wrote:
> > Sorry, I have not yet had time to apply your patch.
> > 
> > And get hang again when launch web browser.
> > 
> > Here new logs: http://btrfs.sy24.ru/kernel-sysrqw-btrfscleaner770blocked-3.txt
> 
> The logs show,
> every hung process is waiting for fs tree root's WRITE_LOCK, but someone who's
> holding the WRITE_LOCK is not reflected here.
> 
> Only one suspicious process is 
> "Chrome_FileThre D ffff8807fe1d7a18 10360 22486   2826 0x00000000"
> it's also waiting on btrfs_tree_lock(), but I can't tell whether it
> waits for the fs tree root node's lock or the children node's lock.
> 
> Hmm, sorry for not being helpful, but I was wondering if this also
> occurs on the latest btrfs(4.5) or is it possible to give it a shot?
> 
> The best result is that it's been fixed by one commit in 4.5.

FWIW I had a persistently repeating deadlock on 4.1 and 4.3, but after upgrade
to 4.4 it no longer happens. Not sure if it might be related or not, the
backtrace seems similar: http://www.spinics.net/lists/linux-btrfs/msg50951.html

My advice would be to try 4.4, but don't forget to manually add the
nocow/enospc patch: https://patchwork.kernel.org/patch/7967161/

None of the vanilla 4.1, 4.3 or 4.4 should be considered usable with Btrfs at
the moment. The latest kernel series which works out of the box was 3.18;
patched 4.4 also works fine so far, but I wonder if that patch makes it into
4.4.2 or gets forgotten/etc again.
Christian Rohmann Feb. 18, 2016, 12:35 p.m. UTC | #10
On 02/14/2016 11:42 PM, Roman Mamedov wrote:
> FWIW I had a persistently repeating deadlock on 4.1 and 4.3, but
> after upgrade to 4.4 it no longer happens.


Apparently also with 4.4 there is some sort of blocking happening ...
just at 38580:

---- cut ----

[Wed Feb 17 16:43:48 2016] INFO: task btrfs-cleaner:38580 blocked for
more than 120 seconds.
[Wed Feb 17 16:43:48 2016]       Not tainted 4.4.0-customkernel #1
[Wed Feb 17 16:43:48 2016] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Wed Feb 17 16:43:48 2016] btrfs-cleaner   D ffff882c27295dc0     0
38580      2 0x00000000
[Wed Feb 17 16:43:48 2016]  ffff882c16fa6480 ffff88161a980280
ffff882a3d744000 ffff882a3d743df8
[Wed Feb 17 16:43:48 2016]  ffff8815fead7104 ffff882c16fa6480
00000000ffffffff ffff8815fead7108
[Wed Feb 17 16:43:48 2016]  ffffffff81559a31 ffff8815fead7100
ffffffff81559cba ffffffff8155b5a0
[Wed Feb 17 16:43:48 2016] Call Trace:
[Wed Feb 17 16:43:48 2016]  [<ffffffff81559a31>] ? schedule+0x31/0x80
[Wed Feb 17 16:43:48 2016]  [<ffffffff81559cba>] ?
schedule_preempt_disabled+0xa/0x10
[Wed Feb 17 16:43:48 2016]  [<ffffffff8155b5a0>] ?
__mutex_lock_slowpath+0x90/0x110
[Wed Feb 17 16:43:48 2016]  [<ffffffff8155b63b>] ? mutex_lock+0x1b/0x30
[Wed Feb 17 16:43:48 2016]  [<ffffffffa01bf51e>] ?
btrfs_delete_unused_bgs+0xee/0x3f0 [btrfs]
[Wed Feb 17 16:43:48 2016]  [<ffffffff81559396>] ? __schedule+0x286/0x8f0
[Wed Feb 17 16:43:48 2016]  [<ffffffffa01c6ff7>] ?
cleaner_kthread+0x1a7/0x200 [btrfs]
[Wed Feb 17 16:43:48 2016]  [<ffffffffa01c6e50>] ?
check_leaf+0x340/0x340 [btrfs]
[Wed Feb 17 16:43:48 2016]  [<ffffffff8108df6f>] ? kthread+0xcf/0xf0
[Wed Feb 17 16:43:48 2016]  [<ffffffff8108dea0>] ? kthread_park+0x50/0x50
[Wed Feb 17 16:43:48 2016]  [<ffffffff8155d74f>] ? ret_from_fork+0x3f/0x70
[Wed Feb 17 16:43:48 2016]  [<ffffffff8108dea0>] ? kthread_park+0x50/0x50

[Wed Feb 17 17:23:48 2016] INFO: task btrfs-cleaner:38580 blocked for
more than 120 seconds.
[Wed Feb 17 17:23:48 2016]       Not tainted 4.4.0-customkernel #1
[Wed Feb 17 17:23:48 2016] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Wed Feb 17 17:23:48 2016] btrfs-cleaner   D ffff881627a35dc0     0
38580      2 0x00000000
[Wed Feb 17 17:23:48 2016]  ffff882c16fa6480 ffff88161a956f00
ffff882a3d744000 ffff882a3d743df8
[Wed Feb 17 17:23:48 2016]  ffff8815fead7104 ffff882c16fa6480
00000000ffffffff ffff8815fead7108
[Wed Feb 17 17:23:48 2016]  ffffffff81559a31 ffff8815fead7100
ffffffff81559cba ffffffff8155b5a0
[Wed Feb 17 17:23:48 2016] Call Trace:
[Wed Feb 17 17:23:48 2016]  [<ffffffff81559a31>] ? schedule+0x31/0x80
[Wed Feb 17 17:23:48 2016]  [<ffffffff81559cba>] ?
schedule_preempt_disabled+0xa/0x10
[Wed Feb 17 17:23:48 2016]  [<ffffffff8155b5a0>] ?
__mutex_lock_slowpath+0x90/0x110
[Wed Feb 17 17:23:48 2016]  [<ffffffff8155b63b>] ? mutex_lock+0x1b/0x30
[Wed Feb 17 17:23:48 2016]  [<ffffffffa01bf51e>] ?
btrfs_delete_unused_bgs+0xee/0x3f0 [btrfs]
[Wed Feb 17 17:23:48 2016]  [<ffffffff81559396>] ? __schedule+0x286/0x8f0
[Wed Feb 17 17:23:48 2016]  [<ffffffffa01c6ff7>] ?
cleaner_kthread+0x1a7/0x200 [btrfs]
[Wed Feb 17 17:23:48 2016]  [<ffffffffa01c6e50>] ?
check_leaf+0x340/0x340 [btrfs]
[Wed Feb 17 17:23:48 2016]  [<ffffffff8108df6f>] ? kthread+0xcf/0xf0
[Wed Feb 17 17:23:48 2016]  [<ffffffff8108dea0>] ? kthread_park+0x50/0x50
[Wed Feb 17 17:23:48 2016]  [<ffffffff8155d74f>] ? ret_from_fork+0x3f/0x70
[Wed Feb 17 17:23:48 2016]  [<ffffffff8108dea0>] ? kthread_park+0x50/0x50

[Wed Feb 17 17:57:48 2016] INFO: task btrfs-cleaner:38580 blocked for
more than 120 seconds.
[Wed Feb 17 17:57:48 2016]       Not tainted 4.4.0-customkernel #1
[Wed Feb 17 17:57:48 2016] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Wed Feb 17 17:57:48 2016] btrfs-cleaner   D ffff881627a95dc0     0
38580      2 0x00000000
[Wed Feb 17 17:57:48 2016]  ffff882c16fa6480 ffff88161a980fc0
ffff882a3d744000 ffff882a3d743df8
[Wed Feb 17 17:57:48 2016]  ffff8815fead7104 ffff882c16fa6480
00000000ffffffff ffff8815fead7108
[Wed Feb 17 17:57:48 2016]  ffffffff81559a31 ffff8815fead7100
ffffffff81559cba ffffffff8155b5a0
[Wed Feb 17 17:57:48 2016] Call Trace:
[Wed Feb 17 17:57:48 2016]  [<ffffffff81559a31>] ? schedule+0x31/0x80
[Wed Feb 17 17:57:48 2016]  [<ffffffff81559cba>] ?
schedule_preempt_disabled+0xa/0x10
[Wed Feb 17 17:57:48 2016]  [<ffffffff8155b5a0>] ?
__mutex_lock_slowpath+0x90/0x110
[Wed Feb 17 17:57:48 2016]  [<ffffffff8155b63b>] ? mutex_lock+0x1b/0x30
[Wed Feb 17 17:57:48 2016]  [<ffffffffa01bf51e>] ?
btrfs_delete_unused_bgs+0xee/0x3f0 [btrfs]
[Wed Feb 17 17:57:48 2016]  [<ffffffff81559396>] ? __schedule+0x286/0x8f0
[Wed Feb 17 17:57:48 2016]  [<ffffffffa01c6ff7>] ?
cleaner_kthread+0x1a7/0x200 [btrfs]
[Wed Feb 17 17:57:48 2016]  [<ffffffffa01c6e50>] ?
check_leaf+0x340/0x340 [btrfs]
[Wed Feb 17 17:57:48 2016]  [<ffffffff8108df6f>] ? kthread+0xcf/0xf0
[Wed Feb 17 17:57:48 2016]  [<ffffffff8108dea0>] ? kthread_park+0x50/0x50
[Wed Feb 17 17:57:48 2016]  [<ffffffff8155d74f>] ? ret_from_fork+0x3f/0x70
[Wed Feb 17 17:57:48 2016]  [<ffffffff8108dea0>] ? kthread_park+0x50/0x50


---- cut ----
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Liu Bo Feb. 18, 2016, 4:59 p.m. UTC | #11
On Thu, Feb 18, 2016 at 01:35:24PM +0100, Christian Rohmann wrote:
> 
> 
> On 02/14/2016 11:42 PM, Roman Mamedov wrote:
> > FWIW I had a persistently repeating deadlock on 4.1 and 4.3, but
> > after upgrade to 4.4 it no longer happens.
> 
> 
> Apparently also with 4.4 there is some sort of blocking happening ...
> just at 38580:

OK, what does 'sysrq-w' say?

Thanks,

-liubo

> 
> ---- cut ----
> 
> [Wed Feb 17 16:43:48 2016] INFO: task btrfs-cleaner:38580 blocked for
> more than 120 seconds.
> [Wed Feb 17 16:43:48 2016]       Not tainted 4.4.0-customkernel #1
> [Wed Feb 17 16:43:48 2016] "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [Wed Feb 17 16:43:48 2016] btrfs-cleaner   D ffff882c27295dc0     0
> 38580      2 0x00000000
> [Wed Feb 17 16:43:48 2016]  ffff882c16fa6480 ffff88161a980280
> ffff882a3d744000 ffff882a3d743df8
> [Wed Feb 17 16:43:48 2016]  ffff8815fead7104 ffff882c16fa6480
> 00000000ffffffff ffff8815fead7108
> [Wed Feb 17 16:43:48 2016]  ffffffff81559a31 ffff8815fead7100
> ffffffff81559cba ffffffff8155b5a0
> [Wed Feb 17 16:43:48 2016] Call Trace:
> [Wed Feb 17 16:43:48 2016]  [<ffffffff81559a31>] ? schedule+0x31/0x80
> [Wed Feb 17 16:43:48 2016]  [<ffffffff81559cba>] ?
> schedule_preempt_disabled+0xa/0x10
> [Wed Feb 17 16:43:48 2016]  [<ffffffff8155b5a0>] ?
> __mutex_lock_slowpath+0x90/0x110
> [Wed Feb 17 16:43:48 2016]  [<ffffffff8155b63b>] ? mutex_lock+0x1b/0x30
> [Wed Feb 17 16:43:48 2016]  [<ffffffffa01bf51e>] ?
> btrfs_delete_unused_bgs+0xee/0x3f0 [btrfs]
> [Wed Feb 17 16:43:48 2016]  [<ffffffff81559396>] ? __schedule+0x286/0x8f0
> [Wed Feb 17 16:43:48 2016]  [<ffffffffa01c6ff7>] ?
> cleaner_kthread+0x1a7/0x200 [btrfs]
> [Wed Feb 17 16:43:48 2016]  [<ffffffffa01c6e50>] ?
> check_leaf+0x340/0x340 [btrfs]
> [Wed Feb 17 16:43:48 2016]  [<ffffffff8108df6f>] ? kthread+0xcf/0xf0
> [Wed Feb 17 16:43:48 2016]  [<ffffffff8108dea0>] ? kthread_park+0x50/0x50
> [Wed Feb 17 16:43:48 2016]  [<ffffffff8155d74f>] ? ret_from_fork+0x3f/0x70
> [Wed Feb 17 16:43:48 2016]  [<ffffffff8108dea0>] ? kthread_park+0x50/0x50
> 
> [Wed Feb 17 17:23:48 2016] INFO: task btrfs-cleaner:38580 blocked for
> more than 120 seconds.
> [Wed Feb 17 17:23:48 2016]       Not tainted 4.4.0-customkernel #1
> [Wed Feb 17 17:23:48 2016] "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [Wed Feb 17 17:23:48 2016] btrfs-cleaner   D ffff881627a35dc0     0
> 38580      2 0x00000000
> [Wed Feb 17 17:23:48 2016]  ffff882c16fa6480 ffff88161a956f00
> ffff882a3d744000 ffff882a3d743df8
> [Wed Feb 17 17:23:48 2016]  ffff8815fead7104 ffff882c16fa6480
> 00000000ffffffff ffff8815fead7108
> [Wed Feb 17 17:23:48 2016]  ffffffff81559a31 ffff8815fead7100
> ffffffff81559cba ffffffff8155b5a0
> [Wed Feb 17 17:23:48 2016] Call Trace:
> [Wed Feb 17 17:23:48 2016]  [<ffffffff81559a31>] ? schedule+0x31/0x80
> [Wed Feb 17 17:23:48 2016]  [<ffffffff81559cba>] ?
> schedule_preempt_disabled+0xa/0x10
> [Wed Feb 17 17:23:48 2016]  [<ffffffff8155b5a0>] ?
> __mutex_lock_slowpath+0x90/0x110
> [Wed Feb 17 17:23:48 2016]  [<ffffffff8155b63b>] ? mutex_lock+0x1b/0x30
> [Wed Feb 17 17:23:48 2016]  [<ffffffffa01bf51e>] ?
> btrfs_delete_unused_bgs+0xee/0x3f0 [btrfs]
> [Wed Feb 17 17:23:48 2016]  [<ffffffff81559396>] ? __schedule+0x286/0x8f0
> [Wed Feb 17 17:23:48 2016]  [<ffffffffa01c6ff7>] ?
> cleaner_kthread+0x1a7/0x200 [btrfs]
> [Wed Feb 17 17:23:48 2016]  [<ffffffffa01c6e50>] ?
> check_leaf+0x340/0x340 [btrfs]
> [Wed Feb 17 17:23:48 2016]  [<ffffffff8108df6f>] ? kthread+0xcf/0xf0
> [Wed Feb 17 17:23:48 2016]  [<ffffffff8108dea0>] ? kthread_park+0x50/0x50
> [Wed Feb 17 17:23:48 2016]  [<ffffffff8155d74f>] ? ret_from_fork+0x3f/0x70
> [Wed Feb 17 17:23:48 2016]  [<ffffffff8108dea0>] ? kthread_park+0x50/0x50
> 
> [Wed Feb 17 17:57:48 2016] INFO: task btrfs-cleaner:38580 blocked for
> more than 120 seconds.
> [Wed Feb 17 17:57:48 2016]       Not tainted 4.4.0-customkernel #1
> [Wed Feb 17 17:57:48 2016] "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [Wed Feb 17 17:57:48 2016] btrfs-cleaner   D ffff881627a95dc0     0
> 38580      2 0x00000000
> [Wed Feb 17 17:57:48 2016]  ffff882c16fa6480 ffff88161a980fc0
> ffff882a3d744000 ffff882a3d743df8
> [Wed Feb 17 17:57:48 2016]  ffff8815fead7104 ffff882c16fa6480
> 00000000ffffffff ffff8815fead7108
> [Wed Feb 17 17:57:48 2016]  ffffffff81559a31 ffff8815fead7100
> ffffffff81559cba ffffffff8155b5a0
> [Wed Feb 17 17:57:48 2016] Call Trace:
> [Wed Feb 17 17:57:48 2016]  [<ffffffff81559a31>] ? schedule+0x31/0x80
> [Wed Feb 17 17:57:48 2016]  [<ffffffff81559cba>] ?
> schedule_preempt_disabled+0xa/0x10
> [Wed Feb 17 17:57:48 2016]  [<ffffffff8155b5a0>] ?
> __mutex_lock_slowpath+0x90/0x110
> [Wed Feb 17 17:57:48 2016]  [<ffffffff8155b63b>] ? mutex_lock+0x1b/0x30
> [Wed Feb 17 17:57:48 2016]  [<ffffffffa01bf51e>] ?
> btrfs_delete_unused_bgs+0xee/0x3f0 [btrfs]
> [Wed Feb 17 17:57:48 2016]  [<ffffffff81559396>] ? __schedule+0x286/0x8f0
> [Wed Feb 17 17:57:48 2016]  [<ffffffffa01c6ff7>] ?
> cleaner_kthread+0x1a7/0x200 [btrfs]
> [Wed Feb 17 17:57:48 2016]  [<ffffffffa01c6e50>] ?
> check_leaf+0x340/0x340 [btrfs]
> [Wed Feb 17 17:57:48 2016]  [<ffffffff8108df6f>] ? kthread+0xcf/0xf0
> [Wed Feb 17 17:57:48 2016]  [<ffffffff8108dea0>] ? kthread_park+0x50/0x50
> [Wed Feb 17 17:57:48 2016]  [<ffffffff8155d74f>] ? ret_from_fork+0x3f/0x70
> [Wed Feb 17 17:57:48 2016]  [<ffffffff8108dea0>] ? kthread_park+0x50/0x50
> 
> 
> ---- cut ----
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christian Rohmann Feb. 19, 2016, 4 p.m. UTC | #12
Hey liubo,

thanks for the quick response.

On 02/18/2016 05:59 PM, Liu Bo wrote:
>> Apparently also with 4.4 there is some sort of blocking happening ...
>> > just at 38580:
> OK, what does 'sysrq-w' say?

The problem has not appeared again for some time. Do I need to catch it
right when it happens? If so, what evidence should I collect and how?



Regards

Christian
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Chris Murphy Feb. 19, 2016, 4:42 p.m. UTC | #13
On Fri, Feb 19, 2016 at 9:00 AM, Christian Rohmann
<crohmann@netcologne.de> wrote:
> Hey liubo,
>
> thanks for the quick response.
>
> On 02/18/2016 05:59 PM, Liu Bo wrote:
>>> Apparently also with 4.4 there is some sort of blocking happening ...
>>> > just at 38580:
>> OK, what does 'sysrq-w' say?
>
> The problem has not appeared again for some time. Do I need to catch it
> right when it happens? If so, what evidence should I collect and how?

It's uncertain how your kernel is built so full documentation is here:
https://www.kernel.org/doc/Documentation/sysrq.txt

Anytime before or during the problem:
echo 1 >/proc/sys/kernel/sysrq

Once blocked tasks show up:
echo w > /proc/sysrq-trigger

journalctl -k -o short-monotonic -l > journal_sysrqw.txt    ## example

If you're not on a systemd system, all of this should get dumped into
rsyslogd messages, so you can go in there and extract the current boot
into a file to attach. It should also appear in dmesg of course but
sometimes the output is bigger than the default kernel message buffer.
Roman Mamedov March 16, 2016, 9:01 a.m. UTC | #14
On Tue, 15 Mar 2016 09:18:14 +0500
?????? ???????? <mikhail.v.gavrilov@gmail.com> wrote:

> Yesterday btrfs again hangs on kernel 4.4.5
> It's the same or another problem?

Seems different to me, as there's also defrag involved. Try removing and not
using the defrag mount option.

> This messages say that restore failed?

So you can't mount the FS anymore? Did you try btrfsck, what does it report?
Mikhail Gavrilov March 16, 2016, 7:54 p.m. UTC | #15
2016-03-16 14:01 GMT+05:00 Roman Mamedov <rm@romanrm.net>:
> On Tue, 15 Mar 2016 09:18:14 +0500
> Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> wrote:
>
>> Yesterday btrfs again hangs on kernel 4.4.5
>> It's the same or another problem?
>
> Seems different to me, as there's also defrag involved. Try removing and not
> using the defrag mount option.

How then defragment FS?

>
>> This messages say that restore failed?
>
> So you can't mount the FS anymore? Did you try btrfsck, what does it report?
>

FS is mounted, but after failure mount takes a long time (more than 4
min) and Fedora  stops boot  by timeout (system suggest enter under
root for maintance) after reboot again all fine.

I'm just afraid of losing data after this messages:
[  449.949206] BTRFS warning (device sdc1): failed to load free space
cache for block group 2286562574336, rebuilding it now
[  452.619318] BTRFS: open_ctree failed

or it does not mean anything terrible?

--
Best Regards,
Mike Gavrilov.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 323e12c..4a64fdd 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -4956,6 +4956,7 @@  process_leaf:
 			if (di_key.type == BTRFS_ROOT_ITEM_KEY)
 				continue;
 
+			btrfs_release_path(path);
 			di_inode = btrfs_iget(root->fs_info->sb, &di_key,
 					      root, NULL);
 			if (IS_ERR(di_inode)) {
@@ -4971,7 +4972,6 @@  process_leaf:
 			ctx->log_new_dentries = false;
 			if (type == BTRFS_FT_DIR)
 				log_mode = LOG_INODE_ALL;
-			btrfs_release_path(path);
 			ret = btrfs_log_inode(trans, root, di_inode,
 					      log_mode, 0, LLONG_MAX, ctx);
 			iput(di_inode);