Message ID | 20180719145006.17532-17-josef@toxicpanda.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Thu, Jul 19, 2018 at 3:50 PM, Josef Bacik <josef@toxicpanda.com> wrote: > Since we've changed the fsync() path to always run ordered extents > before doing the tree log we no longer need to take the dio_sem in the > tree log path. This gets rid of a lockdep splat that I was seeing with > the AIO tests. The dio_sem can be removed completely, it was added to synchronize lockless dio writes with fsync to avoid races. I.e., just removing it here in the tree-log makes it useless. thanks! > > Signed-off-by: Josef Bacik <josef@toxicpanda.com> > --- > fs/btrfs/tree-log.c | 2 -- > 1 file changed, 2 deletions(-) > > diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c > index f8220ec02036..aa06e1954b84 100644 > --- a/fs/btrfs/tree-log.c > +++ b/fs/btrfs/tree-log.c > @@ -4439,7 +4439,6 @@ static int btrfs_log_changed_extents(struct btrfs_trans_handle *trans, > > INIT_LIST_HEAD(&extents); > > - down_write(&inode->dio_sem); > write_lock(&tree->lock); > test_gen = root->fs_info->last_trans_committed; > logged_start = start; > @@ -4520,7 +4519,6 @@ static int btrfs_log_changed_extents(struct btrfs_trans_handle *trans, > } > WARN_ON(!list_empty(&extents)); > write_unlock(&tree->lock); > - up_write(&inode->dio_sem); > > btrfs_release_path(path); > if (!ret) > -- > 2.14.3 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Jul 19, 2018 at 3:50 PM, Josef Bacik <josef@toxicpanda.com> wrote: > Since we've changed the fsync() path to always run ordered extents > before doing the tree log we no longer need to take the dio_sem in the > tree log path. This gets rid of a lockdep splat that I was seeing with > the AIO tests. So actually, we still need it (or some other means of sync). Because even after the recent changes to fsync, the fast path still logs extent items based on the extent maps, and the dio write path creates first the extent map and then the ordered extent. So the old problem can still happen between concurrent fsync and lockless dio write, where fsync logs an extent item for an extent map whose ordered extent we never waited for. The solution prior to the introduction of dio_sem solved this - make the dio write create first the ordered extent, and, only after it, create the extent map. > > Signed-off-by: Josef Bacik <josef@toxicpanda.com> > --- > fs/btrfs/tree-log.c | 2 -- > 1 file changed, 2 deletions(-) > > diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c > index f8220ec02036..aa06e1954b84 100644 > --- a/fs/btrfs/tree-log.c > +++ b/fs/btrfs/tree-log.c > @@ -4439,7 +4439,6 @@ static int btrfs_log_changed_extents(struct btrfs_trans_handle *trans, > > INIT_LIST_HEAD(&extents); > > - down_write(&inode->dio_sem); > write_lock(&tree->lock); > test_gen = root->fs_info->last_trans_committed; > logged_start = start; > @@ -4520,7 +4519,6 @@ static int btrfs_log_changed_extents(struct btrfs_trans_handle *trans, > } > WARN_ON(!list_empty(&extents)); > write_unlock(&tree->lock); > - up_write(&inode->dio_sem); > > btrfs_release_path(path); > if (!ret) > -- > 2.14.3 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Jul 19, 2018 at 04:21:58PM +0100, Filipe Manana wrote: > On Thu, Jul 19, 2018 at 3:50 PM, Josef Bacik <josef@toxicpanda.com> wrote: > > Since we've changed the fsync() path to always run ordered extents > > before doing the tree log we no longer need to take the dio_sem in the > > tree log path. This gets rid of a lockdep splat that I was seeing with > > the AIO tests. > > So actually, we still need it (or some other means of sync). > Because even after the recent changes to fsync, the fast path still > logs extent items based on the extent maps, and the dio write path > creates first the extent map and then the ordered extent. > So the old problem can still happen between concurrent fsync and > lockless dio write, where fsync logs an extent item for an extent map > whose ordered extent we never waited for. > The solution prior to the introduction of dio_sem solved this - make > the dio write create first the ordered extent, and, only after it, > create the extent map. > Oooh balls I see. This is still a problem even if we add the ordered extent first, because we can easily just start the lockless dio write after we've waited for ordered extents, so the order we create the extent map and ordered extent don't actually matter. We still have this lockdep thing, I think we just move the dio_sem to the start of fsync, if we're fsyncing you just don't get to do lockless dio writes while we're doing the fsync. What do you think? Thanks, Josef -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Jul 19, 2018 at 4:54 PM, Josef Bacik <josef@toxicpanda.com> wrote: > On Thu, Jul 19, 2018 at 04:21:58PM +0100, Filipe Manana wrote: >> On Thu, Jul 19, 2018 at 3:50 PM, Josef Bacik <josef@toxicpanda.com> wrote: >> > Since we've changed the fsync() path to always run ordered extents >> > before doing the tree log we no longer need to take the dio_sem in the >> > tree log path. This gets rid of a lockdep splat that I was seeing with >> > the AIO tests. >> >> So actually, we still need it (or some other means of sync). >> Because even after the recent changes to fsync, the fast path still >> logs extent items based on the extent maps, and the dio write path >> creates first the extent map and then the ordered extent. >> So the old problem can still happen between concurrent fsync and >> lockless dio write, where fsync logs an extent item for an extent map >> whose ordered extent we never waited for. >> The solution prior to the introduction of dio_sem solved this - make >> the dio write create first the ordered extent, and, only after it, >> create the extent map. >> > > Oooh balls I see. This is still a problem even if we add the ordered extent > first, because we can easily just start the lockless dio write after we've > waited for ordered extents, so the order we create the extent map and ordered > extent don't actually matter. We still have this lockdep thing, I think we just > move the dio_sem to the start of fsync, if we're fsyncing you just don't get to > do lockless dio writes while we're doing the fsync. What do you think? Thanks, On first though, it seems to be the simplest and sanest solution :) Thanks. > > Josef
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c index f8220ec02036..aa06e1954b84 100644 --- a/fs/btrfs/tree-log.c +++ b/fs/btrfs/tree-log.c @@ -4439,7 +4439,6 @@ static int btrfs_log_changed_extents(struct btrfs_trans_handle *trans, INIT_LIST_HEAD(&extents); - down_write(&inode->dio_sem); write_lock(&tree->lock); test_gen = root->fs_info->last_trans_committed; logged_start = start; @@ -4520,7 +4519,6 @@ static int btrfs_log_changed_extents(struct btrfs_trans_handle *trans, } WARN_ON(!list_empty(&extents)); write_unlock(&tree->lock); - up_write(&inode->dio_sem); btrfs_release_path(path); if (!ret)
Since we've changed the fsync() path to always run ordered extents before doing the tree log we no longer need to take the dio_sem in the tree log path. This gets rid of a lockdep splat that I was seeing with the AIO tests. Signed-off-by: Josef Bacik <josef@toxicpanda.com> --- fs/btrfs/tree-log.c | 2 -- 1 file changed, 2 deletions(-)