diff mbox series

ext4/jbd2: drop jbd2_transaction_committed()

Message ID 20240513072119.2335346-1-yi.zhang@huaweicloud.com (mailing list archive)
State New
Headers show
Series ext4/jbd2: drop jbd2_transaction_committed() | expand

Commit Message

Zhang Yi May 13, 2024, 7:21 a.m. UTC
From: Zhang Yi <yi.zhang@huawei.com>

jbd2_transaction_committed() is used to check whether a transaction with
the given tid has already committed, it hold j_state_lock in read mode
and check the tid of current running transaction and committing
transaction, but holding the j_state_lock is expensive.

We have already stored the sequence number of the most recently
committed transaction in journal t->j_commit_sequence, we could do this
check by comparing it with the given tid instead. If the given tid isn't
smaller than j_commit_sequence, we can ensure that the given transaction
has been committed. That way we could drop the expensive lock and
achieve about 10% ~ 20% performance gains in concurrent DIOs on may
virtual machine with 100G ramdisk.

fio -filename=/mnt/foo -direct=1 -iodepth=10 -rw=$rw -ioengine=libaio \
    -bs=4k -size=10G -numjobs=10 -runtime=60 -overwrite=1 -name=test \
    -group_reporting

Before:
  overwrite       IOPS=88.2k, BW=344MiB/s
  read            IOPS=95.7k, BW=374MiB/s
  rand overwrite  IOPS=98.7k, BW=386MiB/s
  randread        IOPS=102k, BW=397MiB/s

After:
  verwrite:       IOPS=105k, BW=410MiB/s
  read:           IOPS=112k, BW=436MiB/s
  rand overwrite: IOPS=104k, BW=404MiB/s
  randread:       IOPS=111k, BW=432MiB/s

CC: Dave Chinner <david@fromorbit.com>
Suggested-by: Dave Chinner <david@fromorbit.com>
Link: https://lore.kernel.org/linux-ext4/493ab4c5-505c-a351-eefa-7d2677cdf800@huaweicloud.com/T/#m6a14df5d085527a188c5a151191e87a3252dc4e2
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
 fs/ext4/inode.c      |  4 ++--
 fs/jbd2/journal.c    | 17 -----------------
 include/linux/jbd2.h |  1 -
 3 files changed, 2 insertions(+), 20 deletions(-)

Comments

Jan Kara May 15, 2024, 12:25 a.m. UTC | #1
On Mon 13-05-24 15:21:19, Zhang Yi wrote:
> From: Zhang Yi <yi.zhang@huawei.com>
> 
> jbd2_transaction_committed() is used to check whether a transaction with
> the given tid has already committed, it hold j_state_lock in read mode
> and check the tid of current running transaction and committing
> transaction, but holding the j_state_lock is expensive.
> 
> We have already stored the sequence number of the most recently
> committed transaction in journal t->j_commit_sequence, we could do this
> check by comparing it with the given tid instead. If the given tid isn't
> smaller than j_commit_sequence, we can ensure that the given transaction
> has been committed. That way we could drop the expensive lock and
> achieve about 10% ~ 20% performance gains in concurrent DIOs on may
> virtual machine with 100G ramdisk.
> 
> fio -filename=/mnt/foo -direct=1 -iodepth=10 -rw=$rw -ioengine=libaio \
>     -bs=4k -size=10G -numjobs=10 -runtime=60 -overwrite=1 -name=test \
>     -group_reporting
> 
> Before:
>   overwrite       IOPS=88.2k, BW=344MiB/s
>   read            IOPS=95.7k, BW=374MiB/s
>   rand overwrite  IOPS=98.7k, BW=386MiB/s
>   randread        IOPS=102k, BW=397MiB/s
> 
> After:
>   verwrite:       IOPS=105k, BW=410MiB/s
>   read:           IOPS=112k, BW=436MiB/s
>   rand overwrite: IOPS=104k, BW=404MiB/s
>   randread:       IOPS=111k, BW=432MiB/s
> 
> CC: Dave Chinner <david@fromorbit.com>
> Suggested-by: Dave Chinner <david@fromorbit.com>
> Link: https://lore.kernel.org/linux-ext4/493ab4c5-505c-a351-eefa-7d2677cdf800@huaweicloud.com/T/#m6a14df5d085527a188c5a151191e87a3252dc4e2
> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>

I agree this is workable solution and the performance benefits are nice. But
I have some comments regarding the implementation:

> @@ -3199,8 +3199,8 @@ static bool ext4_inode_datasync_dirty(struct inode *inode)
>  	journal_t *journal = EXT4_SB(inode->i_sb)->s_journal;
>  
>  	if (journal) {
> -		if (jbd2_transaction_committed(journal,
> -			EXT4_I(inode)->i_datasync_tid))
> +		if (tid_geq(journal->j_commit_sequence,
> +			    EXT4_I(inode)->i_datasync_tid))

Please leave the helper jbd2_transaction_committed(), just make the
implementation more efficient. Also accessing j_commit_sequence without any
lock is theoretically problematic wrt compiler optimization. You should have
READ_ONCE() there and the places modifying j_commit_sequence need to use
WRITE_ONCE().

								Honza
Zhang Yi May 16, 2024, 8:27 a.m. UTC | #2
On 2024/5/15 8:25, Jan Kara wrote:
> On Mon 13-05-24 15:21:19, Zhang Yi wrote:
>> From: Zhang Yi <yi.zhang@huawei.com>
>>
>> jbd2_transaction_committed() is used to check whether a transaction with
>> the given tid has already committed, it hold j_state_lock in read mode
>> and check the tid of current running transaction and committing
>> transaction, but holding the j_state_lock is expensive.
>>
>> We have already stored the sequence number of the most recently
>> committed transaction in journal t->j_commit_sequence, we could do this
>> check by comparing it with the given tid instead. If the given tid isn't
>> smaller than j_commit_sequence, we can ensure that the given transaction
>> has been committed. That way we could drop the expensive lock and
>> achieve about 10% ~ 20% performance gains in concurrent DIOs on may
>> virtual machine with 100G ramdisk.
>>
>> fio -filename=/mnt/foo -direct=1 -iodepth=10 -rw=$rw -ioengine=libaio \
>>     -bs=4k -size=10G -numjobs=10 -runtime=60 -overwrite=1 -name=test \
>>     -group_reporting
>>
>> Before:
>>   overwrite       IOPS=88.2k, BW=344MiB/s
>>   read            IOPS=95.7k, BW=374MiB/s
>>   rand overwrite  IOPS=98.7k, BW=386MiB/s
>>   randread        IOPS=102k, BW=397MiB/s
>>
>> After:
>>   verwrite:       IOPS=105k, BW=410MiB/s
>>   read:           IOPS=112k, BW=436MiB/s
>>   rand overwrite: IOPS=104k, BW=404MiB/s
>>   randread:       IOPS=111k, BW=432MiB/s
>>
>> CC: Dave Chinner <david@fromorbit.com>
>> Suggested-by: Dave Chinner <david@fromorbit.com>
>> Link: https://lore.kernel.org/linux-ext4/493ab4c5-505c-a351-eefa-7d2677cdf800@huaweicloud.com/T/#m6a14df5d085527a188c5a151191e87a3252dc4e2
>> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
> 
> I agree this is workable solution and the performance benefits are nice. But
> I have some comments regarding the implementation:
> 
>> @@ -3199,8 +3199,8 @@ static bool ext4_inode_datasync_dirty(struct inode *inode)
>>  	journal_t *journal = EXT4_SB(inode->i_sb)->s_journal;
>>  
>>  	if (journal) {
>> -		if (jbd2_transaction_committed(journal,
>> -			EXT4_I(inode)->i_datasync_tid))
>> +		if (tid_geq(journal->j_commit_sequence,
>> +			    EXT4_I(inode)->i_datasync_tid))
> 
> Please leave the helper jbd2_transaction_committed(), just make the
> implementation more efficient. 

Sure.

> Also accessing j_commit_sequence without any
> lock is theoretically problematic wrt compiler optimization. You should have
> READ_ONCE() there and the places modifying j_commit_sequence need to use
> WRITE_ONCE().
> 

Thanks for pointing this out, but I'm not sure if we have to need READ_ONCE()
here. IIUC, if we add READ_ONCE(), we could make sure to get the latest
j_commit_sequence, if not, there is a window (it might becomes larger) that
we could get the old value and jbd2_transaction_committed() could return false
even if the given transaction was just committed, but I think the window is
always there, so it looks like it is not a big problem, is that right?

Thanks,
Yi.
Jan Kara May 20, 2024, 8:49 a.m. UTC | #3
On Thu 16-05-24 16:27:25, Zhang Yi wrote:
> On 2024/5/15 8:25, Jan Kara wrote:
> > On Mon 13-05-24 15:21:19, Zhang Yi wrote:
> > Also accessing j_commit_sequence without any
> > lock is theoretically problematic wrt compiler optimization. You should have
> > READ_ONCE() there and the places modifying j_commit_sequence need to use
> > WRITE_ONCE().
> > 
> 
> Thanks for pointing this out, but I'm not sure if we have to need READ_ONCE()
> here. IIUC, if we add READ_ONCE(), we could make sure to get the latest
> j_commit_sequence, if not, there is a window (it might becomes larger) that
> we could get the old value and jbd2_transaction_committed() could return false
> even if the given transaction was just committed, but I think the window is
> always there, so it looks like it is not a big problem, is that right?

Well, all accesses to any memory should use READ_ONCE(), be protected by a
lock, or use types that handle atomicity on assembly level (like atomic_t,
or atomic bit operations and similar). Otherwise the compiler is free to
assume the underlying memory cannot change and generate potentionally
invalid code. In this case, I don't think realistically any compiler will
do it but still it is a good practice and also it saves us from KCSAN
warnings. If you want to know more details about possible problems, see

  tools/memory-model/Documentation/explanation.txt

chapter "PLAIN ACCESSES AND DATA RACES".

								Honza
Zhang Yi May 20, 2024, 12:39 p.m. UTC | #4
On 2024/5/20 16:49, Jan Kara wrote:
> On Thu 16-05-24 16:27:25, Zhang Yi wrote:
>> On 2024/5/15 8:25, Jan Kara wrote:
>>> On Mon 13-05-24 15:21:19, Zhang Yi wrote:
>>> Also accessing j_commit_sequence without any
>>> lock is theoretically problematic wrt compiler optimization. You should have
>>> READ_ONCE() there and the places modifying j_commit_sequence need to use
>>> WRITE_ONCE().
>>>
>>
>> Thanks for pointing this out, but I'm not sure if we have to need READ_ONCE()
>> here. IIUC, if we add READ_ONCE(), we could make sure to get the latest
>> j_commit_sequence, if not, there is a window (it might becomes larger) that
>> we could get the old value and jbd2_transaction_committed() could return false
>> even if the given transaction was just committed, but I think the window is
>> always there, so it looks like it is not a big problem, is that right?
> 
> Well, all accesses to any memory should use READ_ONCE(), be protected by a
> lock, or use types that handle atomicity on assembly level (like atomic_t,
> or atomic bit operations and similar). Otherwise the compiler is free to
> assume the underlying memory cannot change and generate potentionally
> invalid code. In this case, I don't think realistically any compiler will
> do it but still it is a good practice and also it saves us from KCSAN
> warnings. If you want to know more details about possible problems, see
> 
>   tools/memory-model/Documentation/explanation.txt
> 
> chapter "PLAIN ACCESSES AND DATA RACES".
> 

Sure, this document is really helpful, I'll add READ_ONCE() and
WRITE_ONCE() here, thanks a lot.

Yi.
diff mbox series

Patch

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 537803250ca9..e8e2865bf9ac 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3199,8 +3199,8 @@  static bool ext4_inode_datasync_dirty(struct inode *inode)
 	journal_t *journal = EXT4_SB(inode->i_sb)->s_journal;
 
 	if (journal) {
-		if (jbd2_transaction_committed(journal,
-			EXT4_I(inode)->i_datasync_tid))
+		if (tid_geq(journal->j_commit_sequence,
+			    EXT4_I(inode)->i_datasync_tid))
 			return false;
 		if (test_opt2(inode->i_sb, JOURNAL_FAST_COMMIT))
 			return !list_empty(&EXT4_I(inode)->i_fc_list);
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index b6c114c11b97..73737cd1106f 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -786,23 +786,6 @@  int jbd2_fc_end_commit_fallback(journal_t *journal)
 }
 EXPORT_SYMBOL(jbd2_fc_end_commit_fallback);
 
-/* Return 1 when transaction with given tid has already committed. */
-int jbd2_transaction_committed(journal_t *journal, tid_t tid)
-{
-	int ret = 1;
-
-	read_lock(&journal->j_state_lock);
-	if (journal->j_running_transaction &&
-	    journal->j_running_transaction->t_tid == tid)
-		ret = 0;
-	if (journal->j_committing_transaction &&
-	    journal->j_committing_transaction->t_tid == tid)
-		ret = 0;
-	read_unlock(&journal->j_state_lock);
-	return ret;
-}
-EXPORT_SYMBOL(jbd2_transaction_committed);
-
 /*
  * When this function returns the transaction corresponding to tid
  * will be completed.  If the transaction has currently running, start
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index 971f3e826e15..e15ae324169d 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -1643,7 +1643,6 @@  extern void	jbd2_clear_buffer_revoked_flags(journal_t *journal);
 int jbd2_log_start_commit(journal_t *journal, tid_t tid);
 int jbd2_journal_start_commit(journal_t *journal, tid_t *tid);
 int jbd2_log_wait_commit(journal_t *journal, tid_t tid);
-int jbd2_transaction_committed(journal_t *journal, tid_t tid);
 int jbd2_complete_transaction(journal_t *journal, tid_t tid);
 int jbd2_log_do_checkpoint(journal_t *journal);
 int jbd2_trans_will_send_data_barrier(journal_t *journal, tid_t tid);