diff mbox

hang on 3.9, 3.10-rc5

Message ID 20130618165952.9494.8953@localhost.localdomain (mailing list archive)
State New, archived
Headers show

Commit Message

Chris Mason June 18, 2013, 4:59 p.m. UTC
Quoting Josef Bacik (2013-06-18 12:37:06)
> On Tue, Jun 11, 2013 at 11:43:30AM -0400, Sage Weil wrote:
> > I'm also seeing this hang regularly with both 3.9 and 3.10-rc5.  Is this 
> > is a known problem?  In this case there is no powercycling; just a regular 
> > ceph-osd workload.
> > 
> 
> Have you gotten sysrq+w?  Can you tell me where

He attached it last week.

> 
> log_one_extent.isra.22+0x485
> 
> is on your box?  Thanks,

It's very suspect that both of the times he logged this
log_one_extent popped up.  That should be: 

                wait_event(ordered->wait, ordered->csum_bytes_left == 0);

But Sage it would definitely help if you could confirm.

If we follow log_one_extent all the way up to btrfs_log_inode:

               } else if (test_and_clear_bit(BTRFS_INODE_COPY_EVERYTHING,
                                              &BTRFS_I(inode)->runtime_flags)) { 
                        if (inode_only == LOG_INODE_ALL)
                                fast_search = true;
                        max_key.type = BTRFS_XATTR_ITEM_KEY;
                        ret = drop_objectid_items(trans, log, path, ino,
                                                  max_key.type);

Now fast_search is true, but we don't jump directly to logging the
extent.  The while loop runs, we hit the first break.  ins_nr is zero.

Then we:
       if (fast_search) {
                btrfs_release_path(dst_path);
                ret = btrfs_log_changed_extents(trans, root, inode, dst_path);
                if (ret) {
                        err = ret;
                        goto out_unlock;
                }


Very long way of saying I think we're one release_path short.  Sage, I
haven't tested this at all yet, I was hoping to trigger it first.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Sage Weil June 20, 2013, 3:44 a.m. UTC | #1
Hi Chris,

On Tue, 18 Jun 2013, Chris Mason wrote:
> [...]
> Very long way of saying I think we're one release_path short.  Sage, I
> haven't tested this at all yet, I was hoping to trigger it first.
> 
> diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
> index c276ac9..c1954b3 100644
> --- a/fs/btrfs/tree-log.c
> +++ b/fs/btrfs/tree-log.c
> @@ -3730,6 +3730,7 @@ next_slot:
>  log_extents:
>  	if (fast_search) {
>  		btrfs_release_path(dst_path);
> +		btrfs_release_path(path);
>  		ret = btrfs_log_changed_extents(trans, root, inode, dst_path);
>  		if (ret) {
>  			err = ret;

This seems to be doing the trick.  I'll keep testing overnight, but so far 
so good!

Thanks-
sage

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sage Weil June 20, 2013, 9:56 p.m. UTC | #2
On Wed, 19 Jun 2013, Sage Weil wrote:
> Hi Chris,
> 
> On Tue, 18 Jun 2013, Chris Mason wrote:
> > [...]
> > Very long way of saying I think we're one release_path short.  Sage, I
> > haven't tested this at all yet, I was hoping to trigger it first.
> > 
> > diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
> > index c276ac9..c1954b3 100644
> > --- a/fs/btrfs/tree-log.c
> > +++ b/fs/btrfs/tree-log.c
> > @@ -3730,6 +3730,7 @@ next_slot:
> >  log_extents:
> >  	if (fast_search) {
> >  		btrfs_release_path(dst_path);
> > +		btrfs_release_path(path);
> >  		ret = btrfs_log_changed_extents(trans, root, inode, dst_path);
> >  		if (ret) {
> >  			err = ret;
> 
> This seems to be doing the trick.  I'll keep testing overnight, but so far 
> so good!

...and it's still holding up well in QA.

Thanks, Chris!
sage
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Chris Mason June 21, 2013, 12:42 a.m. UTC | #3
Quoting Sage Weil (2013-06-20 17:56:19)
> On Wed, 19 Jun 2013, Sage Weil wrote:
> > Hi Chris,
> > 
> > On Tue, 18 Jun 2013, Chris Mason wrote:
> > > [...]
> > > Very long way of saying I think we're one release_path short.  Sage, I
> > > haven't tested this at all yet, I was hoping to trigger it first.
> > > 
> > > diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
> > > index c276ac9..c1954b3 100644
> > > --- a/fs/btrfs/tree-log.c
> > > +++ b/fs/btrfs/tree-log.c
> > > @@ -3730,6 +3730,7 @@ next_slot:
> > >  log_extents:
> > >     if (fast_search) {
> > >             btrfs_release_path(dst_path);
> > > +           btrfs_release_path(path);
> > >             ret = btrfs_log_changed_extents(trans, root, inode, dst_path);
> > >             if (ret) {
> > >                     err = ret;
> > 
> > This seems to be doing the trick.  I'll keep testing overnight, but so far 
> > so good!
> 
> ...and it's still holding up well in QA.

Awesome, thanks for getting the traces for us.  Looks like this one has
been around since v3.7, so I'm not going to try and sneak it into the
3.10 final.  I'll have it in the next merge window and for stable.

-chris

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sage Weil June 21, 2013, 1 a.m. UTC | #4
On Thu, 20 Jun 2013, Chris Mason wrote:
> Quoting Sage Weil (2013-06-20 17:56:19)
> > On Wed, 19 Jun 2013, Sage Weil wrote:
> > > Hi Chris,
> > > 
> > > On Tue, 18 Jun 2013, Chris Mason wrote:
> > > > [...]
> > > > Very long way of saying I think we're one release_path short.  Sage, I
> > > > haven't tested this at all yet, I was hoping to trigger it first.
> > > > 
> > > > diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
> > > > index c276ac9..c1954b3 100644
> > > > --- a/fs/btrfs/tree-log.c
> > > > +++ b/fs/btrfs/tree-log.c
> > > > @@ -3730,6 +3730,7 @@ next_slot:
> > > >  log_extents:
> > > >     if (fast_search) {
> > > >             btrfs_release_path(dst_path);
> > > > +           btrfs_release_path(path);
> > > >             ret = btrfs_log_changed_extents(trans, root, inode, dst_path);
> > > >             if (ret) {
> > > >                     err = ret;
> > > 
> > > This seems to be doing the trick.  I'll keep testing overnight, but so far 
> > > so good!
> > 
> > ...and it's still holding up well in QA.
> 
> Awesome, thanks for getting the traces for us.  Looks like this one has
> been around since v3.7, so I'm not going to try and sneak it into the
> 3.10 final.  I'll have it in the next merge window and for stable.

Weird, these same tests have been running on it nightly for ages and it 
seems like these failures just started with 3.9.  Perhaps some other 
change made it hang when it didn't before?

In any case, thanks!
sage
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Chris Mason June 21, 2013, 1:04 a.m. UTC | #5
Quoting Sage Weil (2013-06-20 21:00:21)
> On Thu, 20 Jun 2013, Chris Mason wrote:
> > Awesome, thanks for getting the traces for us.  Looks like this one has
> > been around since v3.7, so I'm not going to try and sneak it into the
> > 3.10 final.  I'll have it in the next merge window and for stable.
> 
> Weird, these same tests have been running on it nightly for ages and it 
> seems like these failures just started with 3.9.  Perhaps some other 
> change made it hang when it didn't before?
> 

It's always possible, there are a ton of moving pieces here.  The
wait_event you were hung on was waiting for crcs to finish, and that
part at least isn't new.

Somewhat unrelated, but are you still using notreelog?

-chris

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sage Weil June 21, 2013, 1:09 a.m. UTC | #6
On Thu, 20 Jun 2013, Chris Mason wrote:
> Quoting Sage Weil (2013-06-20 21:00:21)
> > On Thu, 20 Jun 2013, Chris Mason wrote:
> > > Awesome, thanks for getting the traces for us.  Looks like this one has
> > > been around since v3.7, so I'm not going to try and sneak it into the
> > > 3.10 final.  I'll have it in the next merge window and for stable.
> > 
> > Weird, these same tests have been running on it nightly for ages and it 
> > seems like these failures just started with 3.9.  Perhaps some other 
> > change made it hang when it didn't before?
> > 
> 
> It's always possible, there are a ton of moving pieces here.  The
> wait_event you were hung on was waiting for crcs to finish, and that
> part at least isn't new.

K.  There was also a shift of writes to leveldb (which does the mmap 
thing), so that may explain the change in behavior.

> Somewhat unrelated, but are you still using notreelog?

Nope, just noatime.

Thanks-
sage
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sage Weil Aug. 9, 2013, 6:04 p.m. UTC | #7
Hi Chris,

On Thu, 20 Jun 2013, Chris Mason wrote:
> Quoting Sage Weil (2013-06-20 17:56:19)
> > On Wed, 19 Jun 2013, Sage Weil wrote:
> > > Hi Chris,
> > > 
> > > On Tue, 18 Jun 2013, Chris Mason wrote:
> > > > [...]
> > > > Very long way of saying I think we're one release_path short.  Sage, I
> > > > haven't tested this at all yet, I was hoping to trigger it first.
> > > > 
> > > > diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
> > > > index c276ac9..c1954b3 100644
> > > > --- a/fs/btrfs/tree-log.c
> > > > +++ b/fs/btrfs/tree-log.c
> > > > @@ -3730,6 +3730,7 @@ next_slot:
> > > >  log_extents:
> > > >     if (fast_search) {
> > > >             btrfs_release_path(dst_path);
> > > > +           btrfs_release_path(path);
> > > >             ret = btrfs_log_changed_extents(trans, root, inode, dst_path);
> > > >             if (ret) {
> > > >                     err = ret;
> > > 
> > > This seems to be doing the trick.  I'll keep testing overnight, but so far 
> > > so good!
> > 
> > ...and it's still holding up well in QA.
> 
> Awesome, thanks for getting the traces for us.  Looks like this one has
> been around since v3.7, so I'm not going to try and sneak it into the
> 3.10 final.  I'll have it in the next merge window and for stable.

I was just pulling in the current -rc for testing and realized that this 
isn't upstream yet.  Was this missed?  Sorry I didn't check earlier.

Thanks!
sage
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Chris Mason Aug. 9, 2013, 6:08 p.m. UTC | #8
Quoting Sage Weil (2013-08-09 14:04:15)
> Hi Chris,
> 
> On Thu, 20 Jun 2013, Chris Mason wrote:
> > Quoting Sage Weil (2013-06-20 17:56:19)
> > > On Wed, 19 Jun 2013, Sage Weil wrote:
> > > > Hi Chris,
> > > > 
> > > > On Tue, 18 Jun 2013, Chris Mason wrote:
> > > > > [...]
> > > > > Very long way of saying I think we're one release_path short.  Sage, I
> > > > > haven't tested this at all yet, I was hoping to trigger it first.
> > > > > 
> > > > > diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
> > > > > index c276ac9..c1954b3 100644
> > > > > --- a/fs/btrfs/tree-log.c
> > > > > +++ b/fs/btrfs/tree-log.c
> > > > > @@ -3730,6 +3730,7 @@ next_slot:
> > > > >  log_extents:
> > > > >     if (fast_search) {
> > > > >             btrfs_release_path(dst_path);
> > > > > +           btrfs_release_path(path);
> > > > >             ret = btrfs_log_changed_extents(trans, root, inode, dst_path);
> > > > >             if (ret) {
> > > > >                     err = ret;
> > > > 
> > > > This seems to be doing the trick.  I'll keep testing overnight, but so far 
> > > > so good!
> > > 
> > > ...and it's still holding up well in QA.
> > 
> > Awesome, thanks for getting the traces for us.  Looks like this one has
> > been around since v3.7, so I'm not going to try and sneak it into the
> > 3.10 final.  I'll have it in the next merge window and for stable.
> 
> I was just pulling in the current -rc for testing and realized that this 
> isn't upstream yet.  Was this missed?  Sorry I didn't check earlier.

Ack, thanks Sage.  It'll go into the next pull (cc stable)

-chris

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index c276ac9..c1954b3 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -3730,6 +3730,7 @@  next_slot:
 log_extents:
 	if (fast_search) {
 		btrfs_release_path(dst_path);
+		btrfs_release_path(path);
 		ret = btrfs_log_changed_extents(trans, root, inode, dst_path);
 		if (ret) {
 			err = ret;