Message ID | 20240827051028.1751933-2-hch@lst.de (mailing list archive)
---|---
State | Superseded, archived
Series | [01/10] iomap: handle a post-direct I/O invalidate race in iomap_write_delalloc_release
On Tue, Aug 27, 2024 at 07:09:48AM +0200, Christoph Hellwig wrote:
> When direct I/O completion invalidates the page cache it holds neither the
> i_rwsem nor the invalidate_lock, so it can be racing with
> iomap_write_delalloc_release.  If the search for the end of the region that
> contains data returns the start offset we hit such a race and just need to
> look for the end of the newly created hole instead.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/iomap/buffered-io.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index f420c53d86acc5..69a931de1979b9 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -1241,7 +1241,15 @@ static int iomap_write_delalloc_release(struct inode *inode,
>  			error = data_end;
>  			goto out_unlock;
>  		}
> -		WARN_ON_ONCE(data_end <= start_byte);
> +
> +		/*
> +		 * If we race with post-direct I/O invalidation of the page cache,
> +		 * there might be no data left at start_byte.
> +		 */
> +		if (data_end == start_byte)
> +			continue;

Is there any chance that we could get stuck in a loop here?  I think
it's the case that if SEEK_HOLE returns data_end == start_byte, then the
next time through the loop, the SEEK_DATA will return something that
is > start_byte.  Unless someone is very rapidly writing and punching
the page cache?

Hmm but then if *xfs* is punching delalloc then we're holding the
iolock so who else could be doing that?

If the answers are 'no' and 'nobody' then
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> +
> +		WARN_ON_ONCE(data_end < start_byte);
>  		WARN_ON_ONCE(data_end > scan_end_byte);
> 
>  		error = iomap_write_delalloc_scan(inode, &punch_start_byte,
> -- 
> 2.43.0
> 
> 
On Tue, Aug 27, 2024 at 09:14:16AM -0700, Darrick J. Wong wrote:
> Is there any chance that we could get stuck in a loop here?  I think
> it's the case that if SEEK_HOLE returns data_end == start_byte, then
> the next time through the loop, the SEEK_DATA will return something
> that is > start_byte.

Yes.

> Unless someone is very rapidly writing and punching the page cache?
> 
> Hmm but then if *xfs* is punching delalloc then we're holding the
> iolock so who else could be doing that?

Yes.  It's only the async direct I/O completions that punch without
the lock.
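[For reference, a condensed sketch of the loop being discussed, paraphrased
from my reading of iomap_write_delalloc_release() in fs/iomap/buffered-io.c
with this patch applied.  Error handling is trimmed, surrounding declarations
(inode, punch, punch_start_byte, scan_end_byte, error) are omitted, and the
iomap_write_delalloc_scan() call is abbreviated from the pre-patch code, so
treat it as an illustration of the control flow rather than the upstream code:

	while (start_byte < scan_end_byte) {
		loff_t data_end;

		/* Find the next range of cached data; advances start_byte. */
		start_byte = mapping_seek_hole_data(inode->i_mapping,
				start_byte, scan_end_byte, SEEK_DATA);
		if (start_byte == -ENXIO || start_byte == scan_end_byte)
			break;		/* nothing but holes left */

		/* Find where that data range ends. */
		data_end = mapping_seek_hole_data(inode->i_mapping, start_byte,
				scan_end_byte, SEEK_HOLE);

		/*
		 * New in this patch: a racing post-direct I/O invalidation
		 * may already have turned start_byte back into a hole.
		 * Going back to the top re-runs the SEEK_DATA search, which
		 * skips past the freshly created hole, so the loop always
		 * makes forward progress.
		 */
		if (data_end == start_byte)
			continue;

		/* Process the range [start_byte, data_end) that has data. */
		error = iomap_write_delalloc_scan(inode, &punch_start_byte,
				start_byte, data_end, punch);
		if (error)
			break;

		/* The next data search starts where this data range ended. */
		start_byte = data_end;
	}
]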
On Wed, Aug 28, 2024 at 06:48:48AM +0200, Christoph Hellwig wrote:
> On Tue, Aug 27, 2024 at 09:14:16AM -0700, Darrick J. Wong wrote:
> > Is there any chance that we could get stuck in a loop here?  I think
> > it's the case that if SEEK_HOLE returns data_end == start_byte, then
> > the next time through the loop, the SEEK_DATA will return something
> > that is > start_byte.
> 
> Yes.
> 
> > Unless someone is very rapidly writing and punching the page cache?
> > 
> > Hmm but then if *xfs* is punching delalloc then we're holding the
> > iolock so who else could be doing that?
> 
> Yes.  It's only the async direct I/O completions that punch without
> the lock.

Heh, I forgot that I'd asked three questions.  Anyway I'm satisfied with
the answers I got, so

Reviewed-by: Darrick J. Wong <djwong@kernel.org>

Though we might have to revisit this for filesystems that don't take
i_rwsem exclusively when writing -- is that a problem?  I guess if you
had two threads both writing and punching the pagecache they could get
into trouble, but that might be a case of "if it hurts don't do that".

--D
On Wed, Aug 28, 2024 at 09:13:38AM -0700, Darrick J. Wong wrote:
> Though we might have to revisit this for filesystems that don't take
> i_rwsem exclusively when writing -- is that a problem?  I guess if you
> had two threads both writing and punching the pagecache they could get
> into trouble, but that might be a case of "if it hurts don't do that".

No i_rwsem for buffered writes?  You can't really do that without hell
breaking loose.  At least not without another exclusive lock.
On Thu, Aug 29, 2024 at 05:46:26AM +0200, Christoph Hellwig wrote:
> On Wed, Aug 28, 2024 at 09:13:38AM -0700, Darrick J. Wong wrote:
> > Though we might have to revisit this for filesystems that don't take
> > i_rwsem exclusively when writing -- is that a problem?  I guess if you
> > had two threads both writing and punching the pagecache they could get
> > into trouble, but that might be a case of "if it hurts don't do that".
> 
> No i_rwsem for buffered writes?  You can't really do that without hell
> breaking loose.  At least not without another exclusive lock.

Well, i_rwsem in shared mode.  ISTR ext4 does inode_lock_shared and
serializes on the folio lock, at least for non-extending writes.

--D
On Thu, Aug 29, 2024 at 07:22:19AM -0700, Darrick J. Wong wrote:
> On Thu, Aug 29, 2024 at 05:46:26AM +0200, Christoph Hellwig wrote:
> > On Wed, Aug 28, 2024 at 09:13:38AM -0700, Darrick J. Wong wrote:
> > > Though we might have to revisit this for filesystems that don't take
> > > i_rwsem exclusively when writing -- is that a problem?  I guess if you
> > > had two threads both writing and punching the pagecache they could get
> > > into trouble, but that might be a case of "if it hurts don't do that".
> > 
> > No i_rwsem for buffered writes?  You can't really do that without hell
> > breaking loose.  At least not without another exclusive lock.
> 
> Well, i_rwsem in shared mode.  ISTR ext4 does inode_lock_shared and
> serializes on the folio lock, at least for non-extending writes.

ext4 uses plain inode lock/unlock, which is an exclusive i_rwsem.  Given
that POSIX requires the entire write to be synchronized vs. other writes,
applications would break otherwise.
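[To make the "plain inode lock/unlock" pattern concrete, here is a minimal
sketch of a buffered write path that serializes writers on an exclusive
i_rwsem.  It is illustrative only, not a copy of ext4_buffered_write_iter()
or any other filesystem's code; example_buffered_write_iter() is a made-up
name, and generic_perform_write() (the kiocb-taking form from recent kernels)
stands in for the filesystem's real write loop:

	#include <linux/fs.h>
	#include <linux/uio.h>

	/*
	 * Sketch: hold i_rwsem exclusively across the whole buffered write
	 * so each write is atomic with respect to other writes (the POSIX
	 * requirement mentioned above).
	 */
	static ssize_t example_buffered_write_iter(struct kiocb *iocb,
			struct iov_iter *from)
	{
		struct inode *inode = file_inode(iocb->ki_filp);
		ssize_t ret;

		inode_lock(inode);		/* exclusive i_rwsem */
		ret = generic_write_checks(iocb, from);
		if (ret > 0)
			ret = generic_perform_write(iocb, from);
		inode_unlock(inode);

		if (ret > 0)
			ret = generic_write_sync(iocb, ret);
		return ret;
	}

A shared-mode variant would take inode_lock_shared() instead and need some
other mechanism (e.g. per-folio locking) to keep concurrent writes from
interleaving, which is the case being discussed above.]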
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index f420c53d86acc5..69a931de1979b9 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1241,7 +1241,15 @@ static int iomap_write_delalloc_release(struct inode *inode,
 			error = data_end;
 			goto out_unlock;
 		}
-		WARN_ON_ONCE(data_end <= start_byte);
+
+		/*
+		 * If we race with post-direct I/O invalidation of the page cache,
+		 * there might be no data left at start_byte.
+		 */
+		if (data_end == start_byte)
+			continue;
+
+		WARN_ON_ONCE(data_end < start_byte);
 		WARN_ON_ONCE(data_end > scan_end_byte);
 
 		error = iomap_write_delalloc_scan(inode, &punch_start_byte,
When direct I/O completion invalidates the page cache it holds neither
the i_rwsem nor the invalidate_lock, so it can be racing with
iomap_write_delalloc_release.  If the search for the end of the region
that contains data returns the start offset, we hit such a race and
just need to look for the end of the newly created hole instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap/buffered-io.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)