diff mbox series

[2/3] btrfs: do not commit logs and transactions during link and rename operations

Message ID 20200811114348.692115-1-fdmanana@kernel.org (mailing list archive)
State New, archived
Headers show
Series btrfs: a few performance improvements for fsync and rename/link | expand

Commit Message

Filipe Manana Aug. 11, 2020, 11:43 a.m. UTC
From: Filipe Manana <fdmanana@suse.com>

Since commit d4682ba03ef618 ("Btrfs: sync log after logging new name") we
started to commit logs, and fallback to transaction commits when we failed
to log the new names or commit the logs, after link and rename operations
when the target inodes (or their parents) were previously logged in the
current transaction. This was to avoid losing directories despite an
explicit fsync on them when they are ancestors of some inode that got a
new named logged, due to a link or rename operation. However that adds the
cost of starting IO and waiting for it to complete, which can cause higher
latencies for applications.

Instead of doing that, just make sure that when we log a new name for an
inode we don't mark any of its ancestors as logged, so that if any one
does an fsync against any of them, without doing any other change on them,
the fsync commits the log. This way we only pay the cost of a log commit
(or a transaction commit if something goes wrong or a new block group was
created) if the application explicitly asks to fsync any of the parent
directories.

Using dbench, which mixes several filesystems operations including renames,
revealed some significant latency gains. The following script that uses
dbench was used to test this:

  #!/bin/bash

  DEV=/dev/nvme0n1
  MNT=/mnt/btrfs
  MOUNT_OPTIONS="-o ssd -o space_cache=v2"
  MKFS_OPTIONS="-m single -d single"
  THREADS=16

  echo "performance" | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
  mkfs.btrfs -f $MKFS_OPTIONS $DEV
  mount $MOUNT_OPTIONS $DEV $MNT

  dbench -t 300 -D $MNT $THREADS

  umount $MNT

The test was run on bare metal, no virtualization, on a box with 12 cores
(Intel i7-8700), 64Gb of RAM and using a NVMe device, with a kernel
configuration that is the default of typical distributions (debian in this
case), without debug options enabled (kasan, kmemleak, slub debug, debug
of page allocations, lock debugging, etc).

Results before this patch:

 Operation      Count    AvgLat    MaxLat
 ----------------------------------------
 NTCreateX    10750455     0.011   155.088
 Close        7896674     0.001     0.243
 Rename        455222     2.158  1101.947
 Unlink       2171189     0.067   121.638
 Deltree          256     2.425     7.816
 Mkdir            128     0.002     0.003
 Qpathinfo    9744323     0.006    21.370
 Qfileinfo    1707092     0.001     0.146
 Qfsinfo      1786756     0.001    11.228
 Sfileinfo     875612     0.003    21.263
 Find         3767281     0.025     9.617
 WriteX       5356924     0.011   211.390
 ReadX        16852694     0.003    9.442
 LockX          35008     0.002     0.119
 UnlockX        35008     0.001     0.138
 Flush         753458     4.252  1102.249

Throughput 1128.35 MB/sec  16 clients  16 procs  max_latency=1102.255 ms

Results after this patch:

16 clients, after

 Operation      Count    AvgLat    MaxLat
 ----------------------------------------
 NTCreateX    11471098     0.012   448.281
 Close        8426396     0.001     0.925
 Rename        485746     0.123   267.183
 Unlink       2316477     0.080    63.433
 Deltree          288     2.830    11.144
 Mkdir            144     0.003     0.010
 Qpathinfo    10397420     0.006    10.288
 Qfileinfo    1822039     0.001     0.169
 Qfsinfo      1906497     0.002    14.039
 Sfileinfo     934433     0.004     2.438
 Find         4019879     0.026    10.200
 WriteX       5718932     0.011   200.985
 ReadX        17981671     0.003    10.036
 LockX          37352     0.002     0.076
 UnlockX        37352     0.001     0.109
 Flush         804018     5.015   778.033

Throughput 1201.98 MB/sec  16 clients  16 procs  max_latency=778.036 ms
(+6.5% throughput, -29.4% max latency, -75.8% rename latency)

Test case generic/498 from fstests tests the scenario that the previously
mentioned commit fixed.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/inode.c    | 115 +++++---------------------------------------
 fs/btrfs/tree-log.c | 100 +++++++++++++++++---------------------
 fs/btrfs/tree-log.h |  14 ++----
 3 files changed, 60 insertions(+), 169 deletions(-)

Comments

Josef Bacik Aug. 11, 2020, 6:33 p.m. UTC | #1
On 8/11/20 7:43 AM, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> Since commit d4682ba03ef618 ("Btrfs: sync log after logging new name") we
> started to commit logs, and fallback to transaction commits when we failed
> to log the new names or commit the logs, after link and rename operations
> when the target inodes (or their parents) were previously logged in the
> current transaction. This was to avoid losing directories despite an
> explicit fsync on them when they are ancestors of some inode that got a
> new named logged, due to a link or rename operation. However that adds the
> cost of starting IO and waiting for it to complete, which can cause higher
> latencies for applications.
> 
> Instead of doing that, just make sure that when we log a new name for an
> inode we don't mark any of its ancestors as logged, so that if any one
> does an fsync against any of them, without doing any other change on them,
> the fsync commits the log. This way we only pay the cost of a log commit
> (or a transaction commit if something goes wrong or a new block group was
> created) if the application explicitly asks to fsync any of the parent
> directories.
> 
> Using dbench, which mixes several filesystems operations including renames,
> revealed some significant latency gains. The following script that uses
> dbench was used to test this:
> 
>    #!/bin/bash
> 
>    DEV=/dev/nvme0n1
>    MNT=/mnt/btrfs
>    MOUNT_OPTIONS="-o ssd -o space_cache=v2"
>    MKFS_OPTIONS="-m single -d single"
>    THREADS=16
> 
>    echo "performance" | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
>    mkfs.btrfs -f $MKFS_OPTIONS $DEV
>    mount $MOUNT_OPTIONS $DEV $MNT
> 
>    dbench -t 300 -D $MNT $THREADS
> 
>    umount $MNT
> 
> The test was run on bare metal, no virtualization, on a box with 12 cores
> (Intel i7-8700), 64Gb of RAM and using a NVMe device, with a kernel
> configuration that is the default of typical distributions (debian in this
> case), without debug options enabled (kasan, kmemleak, slub debug, debug
> of page allocations, lock debugging, etc).
> 
> Results before this patch:
> 
>   Operation      Count    AvgLat    MaxLat
>   ----------------------------------------
>   NTCreateX    10750455     0.011   155.088
>   Close        7896674     0.001     0.243
>   Rename        455222     2.158  1101.947
>   Unlink       2171189     0.067   121.638
>   Deltree          256     2.425     7.816
>   Mkdir            128     0.002     0.003
>   Qpathinfo    9744323     0.006    21.370
>   Qfileinfo    1707092     0.001     0.146
>   Qfsinfo      1786756     0.001    11.228
>   Sfileinfo     875612     0.003    21.263
>   Find         3767281     0.025     9.617
>   WriteX       5356924     0.011   211.390
>   ReadX        16852694     0.003    9.442
>   LockX          35008     0.002     0.119
>   UnlockX        35008     0.001     0.138
>   Flush         753458     4.252  1102.249
> 
> Throughput 1128.35 MB/sec  16 clients  16 procs  max_latency=1102.255 ms
> 
> Results after this patch:
> 
> 16 clients, after
> 
>   Operation      Count    AvgLat    MaxLat
>   ----------------------------------------
>   NTCreateX    11471098     0.012   448.281
>   Close        8426396     0.001     0.925
>   Rename        485746     0.123   267.183
>   Unlink       2316477     0.080    63.433
>   Deltree          288     2.830    11.144
>   Mkdir            144     0.003     0.010
>   Qpathinfo    10397420     0.006    10.288
>   Qfileinfo    1822039     0.001     0.169
>   Qfsinfo      1906497     0.002    14.039
>   Sfileinfo     934433     0.004     2.438
>   Find         4019879     0.026    10.200
>   WriteX       5718932     0.011   200.985
>   ReadX        17981671     0.003    10.036
>   LockX          37352     0.002     0.076
>   UnlockX        37352     0.001     0.109
>   Flush         804018     5.015   778.033
> 
> Throughput 1201.98 MB/sec  16 clients  16 procs  max_latency=778.036 ms
> (+6.5% throughput, -29.4% max latency, -75.8% rename latency)
> 
> Test case generic/498 from fstests tests the scenario that the previously
> mentioned commit fixed.
> 
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

Man this took me a while to grok, but I think I've got it.  The only thing is 
this patch doesn't apply to current (as of today) misc-next.  Thanks,

Josef
Filipe Manana Aug. 12, 2020, 8:29 a.m. UTC | #2
On Tue, Aug 11, 2020 at 7:33 PM Josef Bacik <josef@toxicpanda.com> wrote:
>
> On 8/11/20 7:43 AM, fdmanana@kernel.org wrote:
> > From: Filipe Manana <fdmanana@suse.com>
> >
> > Since commit d4682ba03ef618 ("Btrfs: sync log after logging new name") we
> > started to commit logs, and fallback to transaction commits when we failed
> > to log the new names or commit the logs, after link and rename operations
> > when the target inodes (or their parents) were previously logged in the
> > current transaction. This was to avoid losing directories despite an
> > explicit fsync on them when they are ancestors of some inode that got a
> > new named logged, due to a link or rename operation. However that adds the
> > cost of starting IO and waiting for it to complete, which can cause higher
> > latencies for applications.
> >
> > Instead of doing that, just make sure that when we log a new name for an
> > inode we don't mark any of its ancestors as logged, so that if any one
> > does an fsync against any of them, without doing any other change on them,
> > the fsync commits the log. This way we only pay the cost of a log commit
> > (or a transaction commit if something goes wrong or a new block group was
> > created) if the application explicitly asks to fsync any of the parent
> > directories.
> >
> > Using dbench, which mixes several filesystems operations including renames,
> > revealed some significant latency gains. The following script that uses
> > dbench was used to test this:
> >
> >    #!/bin/bash
> >
> >    DEV=/dev/nvme0n1
> >    MNT=/mnt/btrfs
> >    MOUNT_OPTIONS="-o ssd -o space_cache=v2"
> >    MKFS_OPTIONS="-m single -d single"
> >    THREADS=16
> >
> >    echo "performance" | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
> >    mkfs.btrfs -f $MKFS_OPTIONS $DEV
> >    mount $MOUNT_OPTIONS $DEV $MNT
> >
> >    dbench -t 300 -D $MNT $THREADS
> >
> >    umount $MNT
> >
> > The test was run on bare metal, no virtualization, on a box with 12 cores
> > (Intel i7-8700), 64Gb of RAM and using a NVMe device, with a kernel
> > configuration that is the default of typical distributions (debian in this
> > case), without debug options enabled (kasan, kmemleak, slub debug, debug
> > of page allocations, lock debugging, etc).
> >
> > Results before this patch:
> >
> >   Operation      Count    AvgLat    MaxLat
> >   ----------------------------------------
> >   NTCreateX    10750455     0.011   155.088
> >   Close        7896674     0.001     0.243
> >   Rename        455222     2.158  1101.947
> >   Unlink       2171189     0.067   121.638
> >   Deltree          256     2.425     7.816
> >   Mkdir            128     0.002     0.003
> >   Qpathinfo    9744323     0.006    21.370
> >   Qfileinfo    1707092     0.001     0.146
> >   Qfsinfo      1786756     0.001    11.228
> >   Sfileinfo     875612     0.003    21.263
> >   Find         3767281     0.025     9.617
> >   WriteX       5356924     0.011   211.390
> >   ReadX        16852694     0.003    9.442
> >   LockX          35008     0.002     0.119
> >   UnlockX        35008     0.001     0.138
> >   Flush         753458     4.252  1102.249
> >
> > Throughput 1128.35 MB/sec  16 clients  16 procs  max_latency=1102.255 ms
> >
> > Results after this patch:
> >
> > 16 clients, after
> >
> >   Operation      Count    AvgLat    MaxLat
> >   ----------------------------------------
> >   NTCreateX    11471098     0.012   448.281
> >   Close        8426396     0.001     0.925
> >   Rename        485746     0.123   267.183
> >   Unlink       2316477     0.080    63.433
> >   Deltree          288     2.830    11.144
> >   Mkdir            144     0.003     0.010
> >   Qpathinfo    10397420     0.006    10.288
> >   Qfileinfo    1822039     0.001     0.169
> >   Qfsinfo      1906497     0.002    14.039
> >   Sfileinfo     934433     0.004     2.438
> >   Find         4019879     0.026    10.200
> >   WriteX       5718932     0.011   200.985
> >   ReadX        17981671     0.003    10.036
> >   LockX          37352     0.002     0.076
> >   UnlockX        37352     0.001     0.109
> >   Flush         804018     5.015   778.033
> >
> > Throughput 1201.98 MB/sec  16 clients  16 procs  max_latency=778.036 ms
> > (+6.5% throughput, -29.4% max latency, -75.8% rename latency)
> >
> > Test case generic/498 from fstests tests the scenario that the previously
> > mentioned commit fixed.
> >
> > Signed-off-by: Filipe Manana <fdmanana@suse.com>
>
> Man this took me a while to grok, but I think I've got it.  The only thing is
> this patch doesn't apply to current (as of today) misc-next.  Thanks,

It was based on misc-next from last week.
It conflicts with current misc-next due to a trivial patch that fixes
typos and grammar on comments:

https://github.com/kdave/btrfs-devel/commit/f993ba65484d364e30d7aee0293afafadf565f25

I can resend if needed.

Thanks.

>
> Josef
diff mbox series

Patch

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 96064eb41d55..a6eaaa039447 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6373,7 +6373,6 @@  static int btrfs_link(struct dentry *old_dentry, struct inode *dir,
 		drop_inode = 1;
 	} else {
 		struct dentry *parent = dentry->d_parent;
-		int ret;
 
 		err = btrfs_update_inode(trans, root, inode);
 		if (err)
@@ -6388,12 +6387,7 @@  static int btrfs_link(struct dentry *old_dentry, struct inode *dir,
 				goto fail;
 		}
 		d_instantiate(dentry, inode);
-		ret = btrfs_log_new_name(trans, BTRFS_I(inode), NULL, parent,
-					 true, NULL);
-		if (ret == BTRFS_NEED_TRANS_COMMIT) {
-			err = btrfs_commit_transaction(trans);
-			trans = NULL;
-		}
+		btrfs_log_new_name(trans, BTRFS_I(inode), NULL, parent);
 	}
 
 fail:
@@ -8775,27 +8769,19 @@  static int btrfs_rename_exchange(struct inode *old_dir,
 	struct inode *new_inode = new_dentry->d_inode;
 	struct inode *old_inode = old_dentry->d_inode;
 	struct timespec64 ctime = current_time(old_inode);
-	struct dentry *parent;
 	u64 old_ino = btrfs_ino(BTRFS_I(old_inode));
 	u64 new_ino = btrfs_ino(BTRFS_I(new_inode));
 	u64 old_idx = 0;
 	u64 new_idx = 0;
 	int ret;
+	int ret2;
 	bool root_log_pinned = false;
 	bool dest_log_pinned = false;
-	struct btrfs_log_ctx ctx_root;
-	struct btrfs_log_ctx ctx_dest;
-	bool sync_log_root = false;
-	bool sync_log_dest = false;
-	bool commit_transaction = false;
 
 	/* we only allow rename subvolume link between subvolumes */
 	if (old_ino != BTRFS_FIRST_FREE_OBJECTID && root != dest)
 		return -EXDEV;
 
-	btrfs_init_log_ctx(&ctx_root, old_inode);
-	btrfs_init_log_ctx(&ctx_dest, new_inode);
-
 	/* close the race window with snapshot create/destroy ioctl */
 	if (old_ino == BTRFS_FIRST_FREE_OBJECTID ||
 	    new_ino == BTRFS_FIRST_FREE_OBJECTID)
@@ -8937,30 +8923,14 @@  static int btrfs_rename_exchange(struct inode *old_dir,
 		BTRFS_I(new_inode)->dir_index = new_idx;
 
 	if (root_log_pinned) {
-		parent = new_dentry->d_parent;
-		ret = btrfs_log_new_name(trans, BTRFS_I(old_inode),
-					 BTRFS_I(old_dir), parent,
-					 false, &ctx_root);
-		if (ret == BTRFS_NEED_LOG_SYNC)
-			sync_log_root = true;
-		else if (ret == BTRFS_NEED_TRANS_COMMIT)
-			commit_transaction = true;
-		ret = 0;
+		btrfs_log_new_name(trans, BTRFS_I(old_inode), BTRFS_I(old_dir),
+				   new_dentry->d_parent);
 		btrfs_end_log_trans(root);
 		root_log_pinned = false;
 	}
 	if (dest_log_pinned) {
-		if (!commit_transaction) {
-			parent = old_dentry->d_parent;
-			ret = btrfs_log_new_name(trans, BTRFS_I(new_inode),
-						 BTRFS_I(new_dir), parent,
-						 false, &ctx_dest);
-			if (ret == BTRFS_NEED_LOG_SYNC)
-				sync_log_dest = true;
-			else if (ret == BTRFS_NEED_TRANS_COMMIT)
-				commit_transaction = true;
-			ret = 0;
-		}
+		btrfs_log_new_name(trans, BTRFS_I(new_inode), BTRFS_I(new_dir),
+				   old_dentry->d_parent);
 		btrfs_end_log_trans(dest);
 		dest_log_pinned = false;
 	}
@@ -8993,46 +8963,13 @@  static int btrfs_rename_exchange(struct inode *old_dir,
 			dest_log_pinned = false;
 		}
 	}
-	if (!ret && sync_log_root && !commit_transaction) {
-		ret = btrfs_sync_log(trans, BTRFS_I(old_inode)->root,
-				     &ctx_root);
-		if (ret)
-			commit_transaction = true;
-	}
-	if (!ret && sync_log_dest && !commit_transaction) {
-		ret = btrfs_sync_log(trans, BTRFS_I(new_inode)->root,
-				     &ctx_dest);
-		if (ret)
-			commit_transaction = true;
-	}
-	if (commit_transaction) {
-		/*
-		 * We may have set commit_transaction when logging the new name
-		 * in the destination root, in which case we left the source
-		 * root context in the list of log contextes. So make sure we
-		 * remove it to avoid invalid memory accesses, since the context
-		 * was allocated in our stack frame.
-		 */
-		if (sync_log_root) {
-			mutex_lock(&root->log_mutex);
-			list_del_init(&ctx_root.list);
-			mutex_unlock(&root->log_mutex);
-		}
-		ret = btrfs_commit_transaction(trans);
-	} else {
-		int ret2;
-
-		ret2 = btrfs_end_transaction(trans);
-		ret = ret ? ret : ret2;
-	}
+	ret2 = btrfs_end_transaction(trans);
+	ret = ret ? ret : ret2;
 out_notrans:
 	if (new_ino == BTRFS_FIRST_FREE_OBJECTID ||
 	    old_ino == BTRFS_FIRST_FREE_OBJECTID)
 		up_read(&fs_info->subvol_sem);
 
-	ASSERT(list_empty(&ctx_root.list));
-	ASSERT(list_empty(&ctx_dest.list));
-
 	return ret;
 }
 
@@ -9100,11 +9037,9 @@  static int btrfs_rename(struct inode *old_dir, struct dentry *old_dentry,
 	struct inode *old_inode = d_inode(old_dentry);
 	u64 index = 0;
 	int ret;
+	int ret2;
 	u64 old_ino = btrfs_ino(BTRFS_I(old_inode));
 	bool log_pinned = false;
-	struct btrfs_log_ctx ctx;
-	bool sync_log = false;
-	bool commit_transaction = false;
 
 	if (btrfs_ino(BTRFS_I(new_dir)) == BTRFS_EMPTY_SUBVOL_DIR_OBJECTID)
 		return -EPERM;
@@ -9254,17 +9189,8 @@  static int btrfs_rename(struct inode *old_dir, struct dentry *old_dentry,
 		BTRFS_I(old_inode)->dir_index = index;
 
 	if (log_pinned) {
-		struct dentry *parent = new_dentry->d_parent;
-
-		btrfs_init_log_ctx(&ctx, old_inode);
-		ret = btrfs_log_new_name(trans, BTRFS_I(old_inode),
-					 BTRFS_I(old_dir), parent,
-					 false, &ctx);
-		if (ret == BTRFS_NEED_LOG_SYNC)
-			sync_log = true;
-		else if (ret == BTRFS_NEED_TRANS_COMMIT)
-			commit_transaction = true;
-		ret = 0;
+		btrfs_log_new_name(trans, BTRFS_I(old_inode), BTRFS_I(old_dir),
+				   new_dentry->d_parent);
 		btrfs_end_log_trans(root);
 		log_pinned = false;
 	}
@@ -9301,23 +9227,8 @@  static int btrfs_rename(struct inode *old_dir, struct dentry *old_dentry,
 		btrfs_end_log_trans(root);
 		log_pinned = false;
 	}
-	if (!ret && sync_log) {
-		ret = btrfs_sync_log(trans, BTRFS_I(old_inode)->root, &ctx);
-		if (ret)
-			commit_transaction = true;
-	} else if (sync_log) {
-		mutex_lock(&root->log_mutex);
-		list_del(&ctx.list);
-		mutex_unlock(&root->log_mutex);
-	}
-	if (commit_transaction) {
-		ret = btrfs_commit_transaction(trans);
-	} else {
-		int ret2;
-
-		ret2 = btrfs_end_transaction(trans);
-		ret = ret ? ret : ret2;
-	}
+	ret2 = btrfs_end_transaction(trans);
+	ret = ret ? ret : ret2;
 out_notrans:
 	if (old_ino == BTRFS_FIRST_FREE_OBJECTID)
 		up_read(&fs_info->subvol_sem);
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 4e7ef3fd5288..1cc7ab169993 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -176,7 +176,7 @@  static int start_log_trans(struct btrfs_trans_handle *trans,
 
 	atomic_inc(&root->log_batch);
 	atomic_inc(&root->log_writers);
-	if (ctx) {
+	if (ctx && !ctx->logging_new_name) {
 		int index = root->log_transid % 2;
 		list_add_tail(&ctx->list, &root->log_ctxs[index]);
 		ctx->log_transid = root->log_transid;
@@ -5335,19 +5335,34 @@  static int btrfs_log_inode(struct btrfs_trans_handle *trans,
 	}
 
 	/*
-	 * Don't update last_log_commit if we logged that an inode exists after
-	 * it was loaded to memory (full_sync bit set).
-	 * This is to prevent data loss when we do a write to the inode, then
-	 * the inode gets evicted after all delalloc was flushed, then we log
-	 * it exists (due to a rename for example) and then fsync it. This last
-	 * fsync would do nothing (not logging the extents previously written).
+	 * If we are logging that an ancestor inode exists as part of logging a
+	 * new name from a link or rename operation, don't mark the inode as
+	 * logged - otherwise if an explicit fsync is made against an ancestor,
+	 * the fsync considers the inode in the log and doesn't sync the log,
+	 * resulting in the ancestor missing after a power failure unless the
+	 * log was synced as part of an fsync against any other unrelated inode.
+	 * So keep it simple for this case and just don't flag the ancestors as
+	 * logged.
 	 */
-	spin_lock(&inode->lock);
-	inode->logged_trans = trans->transid;
-	if (inode_only != LOG_INODE_EXISTS ||
-	    !test_bit(BTRFS_INODE_NEEDS_FULL_SYNC, &inode->runtime_flags))
-		inode->last_log_commit = inode->last_sub_trans;
-	spin_unlock(&inode->lock);
+	if (!ctx ||
+	    !(S_ISDIR(inode->vfs_inode.i_mode) && ctx->logging_new_name &&
+	      &inode->vfs_inode != ctx->inode)) {
+		spin_lock(&inode->lock);
+		inode->logged_trans = trans->transid;
+		/*
+		 * Don't update last_log_commit if we logged that an inode exists
+		 * after it was loaded to memory (full_sync bit set).
+		 * This is to prevent data loss when we do a write to the inode,
+		 * then the inode gets evicted after all delalloc was flushed,
+		 * then we log it exists (due to a rename for example) and then
+		 * fsync it. This last fsync would do nothing (not logging the
+		 * extents previously written).
+		 */
+		if (inode_only != LOG_INODE_EXISTS ||
+		    !test_bit(BTRFS_INODE_NEEDS_FULL_SYNC, &inode->runtime_flags))
+			inode->last_log_commit = inode->last_sub_trans;
+		spin_unlock(&inode->lock);
+	}
 out_unlock:
 	mutex_unlock(&inode->log_mutex);
 
@@ -6367,26 +6382,13 @@  void btrfs_record_snapshot_destroy(struct btrfs_trans_handle *trans,
 /*
  * Call this after adding a new name for a file and it will properly
  * update the log to reflect the new name.
- *
- * @ctx can not be NULL when @sync_log is false, and should be NULL when it's
- * true (because it's not used).
- *
- * Return value depends on whether @sync_log is true or false.
- * When true: returns BTRFS_NEED_TRANS_COMMIT if the transaction needs to be
- *            committed by the caller, and BTRFS_DONT_NEED_TRANS_COMMIT
- *            otherwise.
- * When false: returns BTRFS_DONT_NEED_LOG_SYNC if the caller does not need to
- *             to sync the log, BTRFS_NEED_LOG_SYNC if it needs to sync the log,
- *             or BTRFS_NEED_TRANS_COMMIT if the transaction needs to be
- *             committed (without attempting to sync the log).
  */
-int btrfs_log_new_name(struct btrfs_trans_handle *trans,
+void btrfs_log_new_name(struct btrfs_trans_handle *trans,
 			struct btrfs_inode *inode, struct btrfs_inode *old_dir,
-			struct dentry *parent,
-			bool sync_log, struct btrfs_log_ctx *ctx)
+			struct dentry *parent)
 {
 	struct btrfs_fs_info *fs_info = trans->fs_info;
-	int ret;
+	struct btrfs_log_ctx ctx;
 
 	/*
 	 * this will force the logging code to walk the dentry chain
@@ -6401,34 +6403,18 @@  int btrfs_log_new_name(struct btrfs_trans_handle *trans,
 	 */
 	if (inode->logged_trans <= fs_info->last_trans_committed &&
 	    (!old_dir || old_dir->logged_trans <= fs_info->last_trans_committed))
-		return sync_log ? BTRFS_DONT_NEED_TRANS_COMMIT :
-			BTRFS_DONT_NEED_LOG_SYNC;
-
-	if (sync_log) {
-		struct btrfs_log_ctx ctx2;
-
-		btrfs_init_log_ctx(&ctx2, &inode->vfs_inode);
-		ret = btrfs_log_inode_parent(trans, inode, parent, 0, LLONG_MAX,
-					     LOG_INODE_EXISTS, &ctx2);
-		if (ret == BTRFS_NO_LOG_SYNC)
-			return BTRFS_DONT_NEED_TRANS_COMMIT;
-		else if (ret)
-			return BTRFS_NEED_TRANS_COMMIT;
-
-		ret = btrfs_sync_log(trans, inode->root, &ctx2);
-		if (ret)
-			return BTRFS_NEED_TRANS_COMMIT;
-		return BTRFS_DONT_NEED_TRANS_COMMIT;
-	}
-
-	ASSERT(ctx);
-	ret = btrfs_log_inode_parent(trans, inode, parent, 0, LLONG_MAX,
-				     LOG_INODE_EXISTS, ctx);
-	if (ret == BTRFS_NO_LOG_SYNC)
-		return BTRFS_DONT_NEED_LOG_SYNC;
-	else if (ret)
-		return BTRFS_NEED_TRANS_COMMIT;
+		return;
 
-	return BTRFS_NEED_LOG_SYNC;
+	btrfs_init_log_ctx(&ctx, &inode->vfs_inode);
+	ctx.logging_new_name = true;
+	/*
+	 * We don't care about the return value. If we fail to log the new name
+	 * then we know the next attempt to sync the log will fallback to a full
+	 * transaction commit (due to a call to btrfs_set_log_full_commit()), so
+	 * we don't need to worry about getting a log committed that has an
+	 * inconsistent state after a rename operation.
+	 */
+	btrfs_log_inode_parent(trans, inode, parent, 0, LLONG_MAX,
+			       LOG_INODE_EXISTS, &ctx);
 }
 
diff --git a/fs/btrfs/tree-log.h b/fs/btrfs/tree-log.h
index 132e43d29034..ddfc6789d9bf 100644
--- a/fs/btrfs/tree-log.h
+++ b/fs/btrfs/tree-log.h
@@ -16,6 +16,7 @@  struct btrfs_log_ctx {
 	int log_ret;
 	int log_transid;
 	bool log_new_dentries;
+	bool logging_new_name;
 	struct inode *inode;
 	struct list_head list;
 };
@@ -26,6 +27,7 @@  static inline void btrfs_init_log_ctx(struct btrfs_log_ctx *ctx,
 	ctx->log_ret = 0;
 	ctx->log_transid = 0;
 	ctx->log_new_dentries = false;
+	ctx->logging_new_name = false;
 	ctx->inode = inode;
 	INIT_LIST_HEAD(&ctx->list);
 }
@@ -67,16 +69,8 @@  void btrfs_record_unlink_dir(struct btrfs_trans_handle *trans,
 			     int for_rename);
 void btrfs_record_snapshot_destroy(struct btrfs_trans_handle *trans,
 				   struct btrfs_inode *dir);
-/* Return values for btrfs_log_new_name() */
-enum {
-	BTRFS_DONT_NEED_TRANS_COMMIT,
-	BTRFS_NEED_TRANS_COMMIT,
-	BTRFS_DONT_NEED_LOG_SYNC,
-	BTRFS_NEED_LOG_SYNC,
-};
-int btrfs_log_new_name(struct btrfs_trans_handle *trans,
+void btrfs_log_new_name(struct btrfs_trans_handle *trans,
 			struct btrfs_inode *inode, struct btrfs_inode *old_dir,
-			struct dentry *parent,
-			bool sync_log, struct btrfs_log_ctx *ctx);
+			struct dentry *parent);
 
 #endif