Message ID | 1405496225-22416-1-git-send-email-bo.li.liu@oracle.com (mailing list archive) |
---|---|
State | Superseded, archived |
Headers | show |
On Wed, 16 Jul 2014 15:37:05 +0800, Liu Bo wrote: > xfstests generic/127 detected this problem. > > With commit 7fc34a62ca4434a79c68e23e70ed26111b7a4cf8, now fsync will only flush > data within the passed range. This is the cause of the above problem, > -- btrfs's fsync has a stage called 'sync log' which will wait for all the > ordered extents it've recorded to finish. > > In xfstests/generic/127, with mixed operations such as truncate, fallocate, > punch hole, and mapwrite, we get some pre-allocated extents, and mapwrite will > mmap, and then msync. And I find that msync will wait for quite a long time > (about 20s in my case), thanks to ftrace, it turns out that the previous > fallocate calls 'btrfs_wait_ordered_range()' to flush dirty pages, but as the > range of dirty pages may be larger than 'btrfs_wait_ordered_range()' wants, > there can be some ordered extents created but not getting corresponding pages > flushed, then they're left in memory until we fsync which runs into the > stage 'sync log', and fsync will just wait for the system writeback thread > to flush those pages and get ordered extents finished, so the latency is > inevitable. > > This adds a flush similar to btrfs_start_ordered_extent() in > btrfs_wait_logged_extents() to fix that. > > Signed-off-by: Liu Bo <bo.li.liu@oracle.com> > --- > v2: > Move flush part into btrfs_wait_logged_extents() to get the flush range > more precise. > > fs/btrfs/ordered-data.c | 8 ++++++++ > 1 file changed, 8 insertions(+) > > diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c > index e12441c..3b52a76 100644 > --- a/fs/btrfs/ordered-data.c > +++ b/fs/btrfs/ordered-data.c > @@ -484,8 +484,16 @@ void btrfs_wait_logged_extents(struct btrfs_root *log, u64 transid) > log_list); > list_del_init(&ordered->log_list); > spin_unlock_irq(&log->log_extents_lock[index]); > + > + WARN_ON(!ordered->inode); > + if (!test_bit(BTRFS_ORDERED_DIRECT, &ordered->flags)) > + filemap_fdatawrite_range(ordered->inode->i_mapping, > + ordered->file_offset, > + ordered->file_offset + ordered->len - 1); I can use bytes_left to filter the ordered extents that have been written out. The other is OK. Reviewed-by: Miao Xie <miaox@cn.fujitsu.com> > + > wait_event(ordered->wait, test_bit(BTRFS_ORDERED_IO_DONE, > &ordered->flags)); > + > btrfs_put_ordered_extent(ordered); > spin_lock_irq(&log->log_extents_lock[index]); > } > -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Jul 16, 2014 at 05:36:07PM +0800, Miao Xie wrote: > On Wed, 16 Jul 2014 15:37:05 +0800, Liu Bo wrote: > > xfstests generic/127 detected this problem. > > > > With commit 7fc34a62ca4434a79c68e23e70ed26111b7a4cf8, now fsync will only flush > > data within the passed range. This is the cause of the above problem, > > -- btrfs's fsync has a stage called 'sync log' which will wait for all the > > ordered extents it've recorded to finish. > > > > In xfstests/generic/127, with mixed operations such as truncate, fallocate, > > punch hole, and mapwrite, we get some pre-allocated extents, and mapwrite will > > mmap, and then msync. And I find that msync will wait for quite a long time > > (about 20s in my case), thanks to ftrace, it turns out that the previous > > fallocate calls 'btrfs_wait_ordered_range()' to flush dirty pages, but as the > > range of dirty pages may be larger than 'btrfs_wait_ordered_range()' wants, > > there can be some ordered extents created but not getting corresponding pages > > flushed, then they're left in memory until we fsync which runs into the > > stage 'sync log', and fsync will just wait for the system writeback thread > > to flush those pages and get ordered extents finished, so the latency is > > inevitable. > > > > This adds a flush similar to btrfs_start_ordered_extent() in > > btrfs_wait_logged_extents() to fix that. > > > > Signed-off-by: Liu Bo <bo.li.liu@oracle.com> > > --- > > v2: > > Move flush part into btrfs_wait_logged_extents() to get the flush range > > more precise. > > > > fs/btrfs/ordered-data.c | 8 ++++++++ > > 1 file changed, 8 insertions(+) > > > > diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c > > index e12441c..3b52a76 100644 > > --- a/fs/btrfs/ordered-data.c > > +++ b/fs/btrfs/ordered-data.c > > @@ -484,8 +484,16 @@ void btrfs_wait_logged_extents(struct btrfs_root *log, u64 transid) > > log_list); > > list_del_init(&ordered->log_list); > > spin_unlock_irq(&log->log_extents_lock[index]); > > + > > + WARN_ON(!ordered->inode); > > + if (!test_bit(BTRFS_ORDERED_DIRECT, &ordered->flags)) > > + filemap_fdatawrite_range(ordered->inode->i_mapping, > > + ordered->file_offset, > > + ordered->file_offset + ordered->len - 1); > > I can use bytes_left to filter the ordered extents that have been written out. I prefer to use BTRFS_ORDERED_IO_DONE, but no big difference with bytes_left :) > > The other is OK. > > Reviewed-by: Miao Xie <miaox@cn.fujitsu.com> thanks, -liubo > > > + > > wait_event(ordered->wait, test_bit(BTRFS_ORDERED_IO_DONE, > > &ordered->flags)); > > + > > btrfs_put_ordered_extent(ordered); > > spin_lock_irq(&log->log_extents_lock[index]); > > } > > > -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c index e12441c..3b52a76 100644 --- a/fs/btrfs/ordered-data.c +++ b/fs/btrfs/ordered-data.c @@ -484,8 +484,16 @@ void btrfs_wait_logged_extents(struct btrfs_root *log, u64 transid) log_list); list_del_init(&ordered->log_list); spin_unlock_irq(&log->log_extents_lock[index]); + + WARN_ON(!ordered->inode); + if (!test_bit(BTRFS_ORDERED_DIRECT, &ordered->flags)) + filemap_fdatawrite_range(ordered->inode->i_mapping, + ordered->file_offset, + ordered->file_offset + ordered->len - 1); + wait_event(ordered->wait, test_bit(BTRFS_ORDERED_IO_DONE, &ordered->flags)); + btrfs_put_ordered_extent(ordered); spin_lock_irq(&log->log_extents_lock[index]); }
xfstests generic/127 detected this problem. With commit 7fc34a62ca4434a79c68e23e70ed26111b7a4cf8, now fsync will only flush data within the passed range. This is the cause of the above problem, -- btrfs's fsync has a stage called 'sync log' which will wait for all the ordered extents it've recorded to finish. In xfstests/generic/127, with mixed operations such as truncate, fallocate, punch hole, and mapwrite, we get some pre-allocated extents, and mapwrite will mmap, and then msync. And I find that msync will wait for quite a long time (about 20s in my case), thanks to ftrace, it turns out that the previous fallocate calls 'btrfs_wait_ordered_range()' to flush dirty pages, but as the range of dirty pages may be larger than 'btrfs_wait_ordered_range()' wants, there can be some ordered extents created but not getting corresponding pages flushed, then they're left in memory until we fsync which runs into the stage 'sync log', and fsync will just wait for the system writeback thread to flush those pages and get ordered extents finished, so the latency is inevitable. This adds a flush similar to btrfs_start_ordered_extent() in btrfs_wait_logged_extents() to fix that. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> --- v2: Move flush part into btrfs_wait_logged_extents() to get the flush range more precise. fs/btrfs/ordered-data.c | 8 ++++++++ 1 file changed, 8 insertions(+)