Message ID | 20200702113220.163855-1-fdmanana@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [1/4] btrfs: only commit the delayed inode when doing a full fsync | expand |
On 7/2/20 7:32 AM, fdmanana@kernel.org wrote: > From: Filipe Manana <fdmanana@suse.com> > > When logging an inode we are committing its delayed items if either the > inode is a directory or if it is a new inode, created in the current > transaction. > > We need to do it for directories, since new directory indexes are stored > as delayed items of the inode and when logging a directory we need to be > able to access all indexes from the fs/subvolume tree in order to figure > out which index ranges need to be logged. > > However for new inodes that are not directories, we do not need to do it > because the only type of delayed item they can have is the inode item, and > we are guaranteed to always log an up to date version of the inode item: > > *) for a full fsync we do it by committing the delayed inode and then > copying the item from the fs/subvolume tree with > copy_inode_items_to_log(); > > *) for a fast fsync we always log the inode item based on the contents of > the in-memory struct btrfs_inode. We guarantee this is always done since > commit e4545de5b035c7 ("Btrfs: fix fsync data loss after append write"). > > So stop running delayed items for a new inodes that are not directories, > since that forces committing the delayed inode into the fs/subvolume tree, > wasting time and adding contention to the tree when a full fsync is not > required. We will only do it in case a fast fsync is needed. > > This patch is part of a series that has the following patches: > > 1/4 btrfs: only commit the delayed inode when doing a full fsync > 2/4 btrfs: only commit delayed items at fsync if we are logging a directory > 3/4 btrfs: stop incremening log_batch for the log root tree when syncing log > 4/4 btrfs: remove no longer needed use of log_writers for the log root tree > > After the entire patchset applied I saw about 12% decrease on max latency > reported by dbench. The test was done on a qemu vm, with 8 cores, 16Gb of > ram, using kvm and using a raw NVMe device directly (no intermediary fs on > the host). The test was invoked like the following: > > mkfs.btrfs -f /dev/sdk > mount -o ssd -o nospace_cache /dev/sdk /mnt/sdk > dbench -D /mnt/sdk -t 300 8 > umount /mnt/dsk > > Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Thanks, Josef
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c index 44bbf8919883..7c325451d47f 100644 --- a/fs/btrfs/tree-log.c +++ b/fs/btrfs/tree-log.c @@ -5123,7 +5123,6 @@ static int btrfs_log_inode(struct btrfs_trans_handle *trans, const loff_t end, struct btrfs_log_ctx *ctx) { - struct btrfs_fs_info *fs_info = root->fs_info; struct btrfs_path *path; struct btrfs_path *dst_path; struct btrfs_key min_key; @@ -5166,15 +5165,17 @@ static int btrfs_log_inode(struct btrfs_trans_handle *trans, max_key.offset = (u64)-1; /* - * Only run delayed items if we are a dir or a new file. + * Only run delayed items if we are a directory. We want to make sure + * all directory indexes hit the fs/subvolume tree so we can find them + * and figure out which index ranges have to be logged. + * * Otherwise commit the delayed inode only if the full sync flag is set, * as we want to make sure an up to date version is in the subvolume * tree so copy_inode_items_to_log() / copy_items() can find it and copy * it to the log tree. For a non full sync, we always log the inode item * based on the in-memory struct btrfs_inode which is always up to date. */ - if (S_ISDIR(inode->vfs_inode.i_mode) || - inode->generation > fs_info->last_trans_committed) + if (S_ISDIR(inode->vfs_inode.i_mode)) ret = btrfs_commit_inode_delayed_items(trans, inode); else if (test_bit(BTRFS_INODE_NEEDS_FULL_SYNC, &inode->runtime_flags)) ret = btrfs_commit_inode_delayed_inode(inode);