xfs: Remove i_rwsem lock in buffered read

Message ID: 20241226061602.2222985-1-chizhiling@163.com (mailing list archive)
State: New
Received: from m16.mail.163.com (m16.mail.163.com [117.135.210.4])
	by smtp.subspace.kernel.org (Postfix) with ESMTP id 08F5D17591;
	Thu, 26 Dec 2024 06:16:30 +0000 (UTC)
From: Chi Zhiling <chizhiling@163.com>
To: djwong@kernel.org, cem@kernel.org
Cc: linux-xfs@vger.kernel.org, linux-kernel@vger.kernel.org,
	Chi Zhiling <chizhiling@kylinos.cn>
Subject: [PATCH] xfs: Remove i_rwsem lock in buffered read
Date: Thu, 26 Dec 2024 14:16:02 +0800
Message-ID: <20241226061602.2222985-1-chizhiling@163.com>
Precedence: bulk
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Series: xfs: Remove i_rwsem lock in buffered read

Commit Message

Chi Zhiling Dec. 26, 2024, 6:16 a.m. UTC

From: Chi Zhiling <chizhiling@kylinos.cn>

Using an rwsem to protect file data ensures that we can always obtain a
completed modification. But due to the lock, we need to wait for the
write process to release the rwsem before we can read it, even if we are
reading a different region of the file. This could take a lot of time
when many processes need to write and read this file.

On the other hand, ext4 and other filesystems do not hold the lock
during buffered reads, which gives ext4 better performance in that
case. Therefore, I think it will be fine to remove the lock in XFS,
as most applications can handle this situation.

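Without this lock, mixed multi-threaded reads and writes improve
greatly: in the test below, aggregate bandwidth rises from about
351MiB/s to about 1960MiB/s (roughly 5.6x) for both reads and writes.
The following command was used to test:

  fio -ioengine=libaio -filename=testfile -bs=4k -rw=randrw -numjobs=16 -name="randrw"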

Before this patch:
   READ: bw=351MiB/s (368MB/s), 21.8MiB/s-22.0MiB/s (22.9MB/s-23.1MB/s), io=8185MiB (8582MB), run=23206-23347msec
  WRITE: bw=351MiB/s (368MB/s), 21.9MiB/s-22.1MiB/s (23.0MB/s-23.2MB/s), io=8199MiB (8597MB), run=23206-23347msec

After this patch:
   READ: bw=1961MiB/s (2056MB/s), 122MiB/s-125MiB/s (128MB/s-131MB/s), io=8185MiB (8582MB), run=4097-4174msec
  WRITE: bw=1964MiB/s (2060MB/s), 123MiB/s-125MiB/s (129MB/s-131MB/s), io=8199MiB (8597MB), run=4097-4174msec

Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn>
---
 fs/xfs/xfs_file.c | 11 +----------
 1 file changed, 1 insertion(+), 10 deletions(-)

Comments

Dave Chinner Dec. 26, 2024, 9:50 p.m. UTC | #1

On Thu, Dec 26, 2024 at 02:16:02PM +0800, Chi Zhiling wrote:
> From: Chi Zhiling <chizhiling@kylinos.cn>
> 
> Using an rwsem to protect file data ensures that we can always obtain a
> completed modification. But due to the lock, we need to wait for the
> write process to release the rwsem before we can read it, even if we are
> reading a different region of the file. This could take a lot of time
> when many processes need to write and read this file.
> 
> On the other hand, The ext4 filesystem and others do not hold the lock
> during buffered reading, which make the ext4 have better performance in
> that case. Therefore, I think it will be fine if we remove the lock in
> xfs, as most applications can handle this situation.

Nope.

This means that XFS loses high level serialisation of incoming IO
against operations like truncate, fallocate, pnfs operations, etc.

We've been through this multiple times before; the solution lies in
doing the work to make buffered writes use shared locking, not
removing shared locking from buffered reads.

A couple of old discussions from the list:

https://lore.kernel.org/linux-xfs/CAOQ4uxi0pGczXBX7GRAFs88Uw0n1ERJZno3JSeZR71S1dXg+2w@mail.gmail.com/
https://lore.kernel.org/linux-xfs/20190404165737.30889-1-amir73il@gmail.com/

There are likely others - you can search for them yourself to get
more background information.

Fundamentally, though, removing locking from the read side is not
the answer to this buffered write IO exclusion problem....

-Dave.

Chi Zhiling Dec. 28, 2024, 7:37 a.m. UTC | #2

On 2024/12/27 05:50, Dave Chinner wrote:
> On Thu, Dec 26, 2024 at 02:16:02PM +0800, Chi Zhiling wrote:
>> From: Chi Zhiling <chizhiling@kylinos.cn>
>>
>> Using an rwsem to protect file data ensures that we can always obtain a
>> completed modification. But due to the lock, we need to wait for the
>> write process to release the rwsem before we can read it, even if we are
>> reading a different region of the file. This could take a lot of time
>> when many processes need to write and read this file.
>>
>> On the other hand, The ext4 filesystem and others do not hold the lock
>> during buffered reading, which make the ext4 have better performance in
>> that case. Therefore, I think it will be fine if we remove the lock in
>> xfs, as most applications can handle this situation.
> 
> Nope.
> 
> This means that XFS loses high level serialisation of incoming IO
> against operations like truncate, fallocate, pnfs operations, etc.
> 
> We've been through this multiple times before; the solution lies in
> doing the work to make buffered writes use shared locking, not
> removing shared locking from buffered reads.

You mean using shared locking for buffered reads and writes, right?

I think it's a great idea. In theory, write operations can be performed
simultaneously if they write to different ranges.

So we should track all the ranges we are reading or writing,
and check whether the new read or write operations can be performed
concurrently with the current operations.
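
The range tracking suggested here could look roughly like the sketch
below. Everything in it is hypothetical (struct range_tracker,
range_try_start(), range_finish(), the MAX_ACTIVE cap); it is not an
existing kernel interface, and it is not thread-safe on its own: a real
version would need its own lock around the table. It only shows the
conflict rule: two I/Os conflict when their byte ranges overlap and at
least one of them is a write.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_ACTIVE 16	/* hypothetical cap, chosen only for this sketch */

enum io_kind { IO_READ, IO_WRITE };

struct io_range {
	unsigned long long start;	/* first byte of the I/O */
	unsigned long long end;		/* one past the last byte */
	enum io_kind kind;
	bool in_use;
};

struct range_tracker {
	struct io_range active[MAX_ACTIVE];
};

/* Half-open ranges overlap iff each one starts before the other ends. */
static bool ranges_overlap(const struct io_range *r,
			   unsigned long long start, unsigned long long end)
{
	return r->start < end && start < r->end;
}

/*
 * Try to register a new I/O over [start, end). Returns a slot index on
 * success, or -1 if it conflicts with an in-flight I/O or the table is
 * full. Overlapping reads are allowed; any overlap involving a write
 * is a conflict.
 */
int range_try_start(struct range_tracker *t, unsigned long long start,
		    unsigned long long end, enum io_kind kind)
{
	int free_slot = -1;

	assert(start < end);
	for (int i = 0; i < MAX_ACTIVE; i++) {
		struct io_range *r = &t->active[i];

		if (!r->in_use) {
			if (free_slot < 0)
				free_slot = i;
			continue;
		}
		if (ranges_overlap(r, start, end) &&
		    (kind == IO_WRITE || r->kind == IO_WRITE))
			return -1;
	}
	if (free_slot < 0)
		return -1;
	t->active[free_slot] = (struct io_range){ start, end, kind, true };
	return free_slot;
}

/* Mark the I/O in the given slot as finished. */
void range_finish(struct range_tracker *t, int slot)
{
	t->active[slot].in_use = false;
}
```

A caller would retry or wait when range_try_start() returns -1. The
linear scan is the simplest possible lookup; the cost of this kind of
bookkeeping on every I/O is one reason the approach needs careful
evaluation.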

Do we have any plans to use shared locking for buffered writes?


> 
> A couple of old discussions from the list:
> 
> https://lore.kernel.org/linux-xfs/CAOQ4uxi0pGczXBX7GRAFs88Uw0n1ERJZno3JSeZR71S1dXg+2w@mail.gmail.com/
> https://lore.kernel.org/linux-xfs/20190404165737.30889-1-amir73il@gmail.com/
> 
> There are likely others - you can search for them yourself to get
> more background information.

Sorry, I didn't find those discussions earlier.

> 
> Fundamentally, though, removing locking from the read side is not
> the answer to this buffered write IO exclusion problem....
> 
> -Dave.

Best regards,
Chi Zhiling

Dave Chinner Dec. 28, 2024, 10:17 p.m. UTC | #3

On Sat, Dec 28, 2024 at 03:37:41PM +0800, Chi Zhiling wrote:
> 
> 
> On 2024/12/27 05:50, Dave Chinner wrote:
> > On Thu, Dec 26, 2024 at 02:16:02PM +0800, Chi Zhiling wrote:
> > > From: Chi Zhiling <chizhiling@kylinos.cn>
> > > 
> > > Using an rwsem to protect file data ensures that we can always obtain a
> > > completed modification. But due to the lock, we need to wait for the
> > > write process to release the rwsem before we can read it, even if we are
> > > reading a different region of the file. This could take a lot of time
> > > when many processes need to write and read this file.
> > > 
> > > On the other hand, The ext4 filesystem and others do not hold the lock
> > > during buffered reading, which make the ext4 have better performance in
> > > that case. Therefore, I think it will be fine if we remove the lock in
> > > xfs, as most applications can handle this situation.
> > 
> > Nope.
> > 
> > This means that XFS loses high level serialisation of incoming IO
> > against operations like truncate, fallocate, pnfs operations, etc.
> > 
> > We've been through this multiple times before; the solution lies in
> > doing the work to make buffered writes use shared locking, not
> > removing shared locking from buffered reads.
> 
> You mean using shared locking for buffered reads and writes, right?
> 
> I think it's a great idea. In theory, write operations can be performed
> simultaneously if they write to different ranges.

Even if they overlap, the folio locks will prevent concurrent writes
to the same range.

Now that we have atomic write support as native functionality (i.e.
RWF_ATOMIC), we really should not have to care that much about
normal buffered IO being atomic. i.e. if the application wants
atomic writes, it can now specify that it wants atomic writes and so
we can relax the constraints we have on existing IO...

> So we should track all the ranges we are reading or writing,
> and check whether the new read or write operations can be performed
> concurrently with the current operations.

That is all discussed in detail in the discussions I linked.

> Do we have any plans to use shared locking for buffered writes?

All we are waiting for is someone to put the resources into making
the changes and testing it properly...

-Dave.

Chi Zhiling Dec. 30, 2024, 2:42 a.m. UTC | #4

On 2024/12/29 06:17, Dave Chinner wrote:
> On Sat, Dec 28, 2024 at 03:37:41PM +0800, Chi Zhiling wrote:
>>
>>
>> On 2024/12/27 05:50, Dave Chinner wrote:
>>> On Thu, Dec 26, 2024 at 02:16:02PM +0800, Chi Zhiling wrote:
>>>> From: Chi Zhiling <chizhiling@kylinos.cn>
>>>>
>>>> Using an rwsem to protect file data ensures that we can always obtain a
>>>> completed modification. But due to the lock, we need to wait for the
>>>> write process to release the rwsem before we can read it, even if we are
>>>> reading a different region of the file. This could take a lot of time
>>>> when many processes need to write and read this file.
>>>>
>>>> On the other hand, The ext4 filesystem and others do not hold the lock
>>>> during buffered reading, which make the ext4 have better performance in
>>>> that case. Therefore, I think it will be fine if we remove the lock in
>>>> xfs, as most applications can handle this situation.
>>>
>>> Nope.
>>>
>>> This means that XFS loses high level serialisation of incoming IO
>>> against operations like truncate, fallocate, pnfs operations, etc.
>>>
>>> We've been through this multiple times before; the solution lies in
>>> doing the work to make buffered writes use shared locking, not
>>> removing shared locking from buffered reads.
>>
>> You mean using shared locking for buffered reads and writes, right?
>>
>> I think it's a great idea. In theory, write operations can be performed
>> simultaneously if they write to different ranges.
> 
> Even if they overlap, the folio locks will prevent concurrent writes
> to the same range.
> 
> Now that we have atomic write support as native functionality (i.e.
> RWF_ATOMIC), we really should not have to care that much about
> normal buffered IO being atomic. i.e. if the application wants
> atomic writes, it can now specify that it wants atomic writes and so
> we can relax the constraints we have on existing IO...

Yes, I'm not particularly concerned about whether buffered I/O is
atomic. I'm more concerned about the multithreaded performance of
buffered I/O.

Last week, you mentioned that removing i_rwsem would affect truncate,
fallocate, and pNFS operations.

(I'm not familiar with pNFS, so please correct me if I'm wrong.)

My understanding is that i_rwsem currently protects both the file's
data and its size. Operations like truncate, fallocate, and pNFS take
i_rwsem because they modify both the file's data and its size. So I'm
wondering whether it's possible to use i_rwsem to protect only the
file's size, without protecting the file's data.

Operations that modify the file's size would then need to be executed
sequentially. For example, buffered writes that extend EOF, fallocate
operations without the "keep size" flag, and truncate operations would
all need to hold the exclusive lock.

Other operations would take the shared lock, because they only need to
read the file's size without modifying it.
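
The rule proposed above can be written down as a small decision
function. This is a sketch of the proposal as I read it, not of current
XFS behaviour: enum file_op, struct op_desc and choose_lock_mode() are
hypothetical names, and treating "buffered writes to the EOF" as
"writes whose end lies beyond the current size" is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum lock_mode { LOCK_SHARED, LOCK_EXCL };

enum file_op {
	OP_BUFFERED_READ,
	OP_BUFFERED_WRITE,
	OP_FALLOCATE,
	OP_TRUNCATE,
};

struct op_desc {
	enum file_op op;
	unsigned long long pos;	/* starting offset */
	unsigned long long len;	/* length in bytes */
	bool keep_size;		/* only meaningful for OP_FALLOCATE */
};

/*
 * Pick the lock mode under the "lock protects only the file size" idea:
 * anything that may change the size takes the lock exclusive, everything
 * else takes it shared. isize is the current file size.
 */
enum lock_mode choose_lock_mode(const struct op_desc *d,
				unsigned long long isize)
{
	assert(d != NULL);

	switch (d->op) {
	case OP_TRUNCATE:
		return LOCK_EXCL;
	case OP_FALLOCATE:
		/* Without "keep size" the operation may change the size. */
		return d->keep_size ? LOCK_SHARED : LOCK_EXCL;
	case OP_BUFFERED_WRITE:
		/* Assumption: only writes ending beyond EOF change the size. */
		return d->pos + d->len > isize ? LOCK_EXCL : LOCK_SHARED;
	case OP_BUFFERED_READ:
	default:
		return LOCK_SHARED;
	}
}
```

For example, with isize = 8192 a 4096-byte write at offset 0 would take
the lock shared, while an 8192-byte write at offset 4096 would take it
exclusive. The sketch says nothing about whether dropping data
protection from the lock is safe for truncate or pNFS.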

> 
>> So we should track all the ranges we are reading or writing,
>> and check whether the new read or write operations can be performed
>> concurrently with the current operations.
> 
> That is all discussed in detail in the discussions I linked.

Sorry, I overlooked some details in the old discussions last time.
It seems that you were not satisfied with the effectiveness of
range locks.

Best regards,
Chi Zhiling

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 9a435b1ff264..7d039cc3ae9e 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -279,18 +279,9 @@  xfs_file_buffered_read(
 	struct kiocb		*iocb,
 	struct iov_iter		*to)
 {
-	struct xfs_inode	*ip = XFS_I(file_inode(iocb->ki_filp));
-	ssize_t			ret;
-
 	trace_xfs_file_buffered_read(iocb, to);
 
-	ret = xfs_ilock_iocb(iocb, XFS_IOLOCK_SHARED);
-	if (ret)
-		return ret;
-	ret = generic_file_read_iter(iocb, to);
-	xfs_iunlock(ip, XFS_IOLOCK_SHARED);
-
-	return ret;
+	return generic_file_read_iter(iocb, to);
 }
 
 STATIC ssize_t
