diff mbox series

fs: allow inode time modification with IOCB_NOWAIT

Message ID 39f8b446-dce3-373f-eb86-e3333b31122c@kernel.dk (mailing list archive)
State New, archived
Headers show
Series fs: allow inode time modification with IOCB_NOWAIT | expand

Commit Message

Jens Axboe July 1, 2022, 8:09 p.m. UTC
generic/471 complains because it expects any write done with RWF_NOWAIT
to succeed as long as the blocks for the write are already instantiated.
This isn't necessarily a correct assumption, as there are other conditions
that can cause an RWF_NOWAIT write to fail with -EAGAIN even if the range
is already there.

Since the risk of blocking off this path is minor, just allow inode
time updates with IOCB_NOWAIT set. Then we can later decide if we should
catch this further down the stack.

Fixes: 4faa13bd5d3b ("fs: Add async write file modification handling.")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 fs/inode.c | 2 --
 1 file changed, 2 deletions(-)

Comments

Christoph Hellwig July 2, 2022, 6:05 a.m. UTC | #1
On Fri, Jul 01, 2022 at 02:09:32PM -0600, Jens Axboe wrote:
> generic/471 complains because it expects any write done with RWF_NOWAIT
> to succeed as long as the blocks for the write are already instantiated.
> This isn't necessarily a correct assumption, as there are other conditions
> that can cause an RWF_NOWAIT write to fail with -EAGAIN even if the range
> is already there.
> 
> Since the risk of blocking off this path is minor, just allow inode
> time updates with IOCB_NOWAIT set. Then we can later decide if we should
> catch this further down the stack.

I think this is broken.  Please just drop the test, the non-blocking
behavior here makes a lot of sense.  At least for XFS, the update
will end up allocating and commit a transaction which involves memory
allocation, a blocking lock and possibly waiting for lock space.
Jens Axboe July 2, 2022, 3:45 p.m. UTC | #2
On 7/2/22 12:05 AM, Christoph Hellwig wrote:
> On Fri, Jul 01, 2022 at 02:09:32PM -0600, Jens Axboe wrote:
>> generic/471 complains because it expects any write done with RWF_NOWAIT
>> to succeed as long as the blocks for the write are already instantiated.
>> This isn't necessarily a correct assumption, as there are other conditions
>> that can cause an RWF_NOWAIT write to fail with -EAGAIN even if the range
>> is already there.
>>
>> Since the risk of blocking off this path is minor, just allow inode
>> time updates with IOCB_NOWAIT set. Then we can later decide if we should
>> catch this further down the stack.
> 
> I think this is broken.  Please just drop the test, the non-blocking
> behavior here makes a lot of sense.  At least for XFS, the update
> will end up allocating and commit a transaction which involves memory
> allocation, a blocking lock and possibly waiting for lock space.

I'm fine with that, I've made my opinions on that test case clear in
previous emails. I'll drop the patch for now.

I will say though that even in low memory testing, I never saw XFS block
off the inode time update. So at least we have room for future
improvements here, it's wasteful to return -EAGAIN here when the vast
majority of time updates don't end up blocking.

One issue there too is that, by default, XFS uses a high granularity
threshold for when the time should be updated, making the problem worse.
Dave Chinner July 3, 2022, 12:06 a.m. UTC | #3
On Sat, Jul 02, 2022 at 09:45:23AM -0600, Jens Axboe wrote:
> On 7/2/22 12:05 AM, Christoph Hellwig wrote:
> > On Fri, Jul 01, 2022 at 02:09:32PM -0600, Jens Axboe wrote:
> >> generic/471 complains because it expects any write done with RWF_NOWAIT
> >> to succeed as long as the blocks for the write are already instantiated.
> >> This isn't necessarily a correct assumption, as there are other conditions
> >> that can cause an RWF_NOWAIT write to fail with -EAGAIN even if the range
> >> is already there.
> >>
> >> Since the risk of blocking off this path is minor, just allow inode
> >> time updates with IOCB_NOWAIT set. Then we can later decide if we should
> >> catch this further down the stack.
> > 
> > I think this is broken.  Please just drop the test, the non-blocking
> > behavior here makes a lot of sense.  At least for XFS, the update
> > will end up allocating and commit a transaction which involves memory
> > allocation, a blocking lock and possibly waiting for lock space.
> 
> I'm fine with that, I've made my opinions on that test case clear in
> previous emails. I'll drop the patch for now.
> 
> I will say though that even in low memory testing, I never saw XFS block
> off the inode time update. So at least we have room for future
> improvements here, it's wasteful to return -EAGAIN here when the vast
> majority of time updates don't end up blocking.

It's not low memory testing that you should be concerned about -
it's when the journal runs out of space that you'll get long,
unbound latencies waiting for timestamp updates. Waiting for journal
space to become available could, in the worst case, entail waiting
for tens of thousands of small random metadata IOs to be submitted
and completed....

> One issue there too is that, by default, XFS uses a high granularity
> threshold for when the time should be updated, making the problem worse.

That's not an XFS issue - we're just following the VFS rules for
when mtime needs to be changed. If you want to avoid frequent
transactional (on-disk) mtime updates, then use the lazytime mount
option to limit on-disk mtime updates to once per day.

Cheers,

Dave.
Jens Axboe July 3, 2022, 12:45 p.m. UTC | #4
On 7/2/22 6:06 PM, Dave Chinner wrote:
> On Sat, Jul 02, 2022 at 09:45:23AM -0600, Jens Axboe wrote:
>> On 7/2/22 12:05 AM, Christoph Hellwig wrote:
>>> On Fri, Jul 01, 2022 at 02:09:32PM -0600, Jens Axboe wrote:
>>>> generic/471 complains because it expects any write done with RWF_NOWAIT
>>>> to succeed as long as the blocks for the write are already instantiated.
>>>> This isn't necessarily a correct assumption, as there are other conditions
>>>> that can cause an RWF_NOWAIT write to fail with -EAGAIN even if the range
>>>> is already there.
>>>>
>>>> Since the risk of blocking off this path is minor, just allow inode
>>>> time updates with IOCB_NOWAIT set. Then we can later decide if we should
>>>> catch this further down the stack.
>>>
>>> I think this is broken.  Please just drop the test, the non-blocking
>>> behavior here makes a lot of sense.  At least for XFS, the update
>>> will end up allocating and commit a transaction which involves memory
>>> allocation, a blocking lock and possibly waiting for lock space.
>>
>> I'm fine with that, I've made my opinions on that test case clear in
>> previous emails. I'll drop the patch for now.
>>
>> I will say though that even in low memory testing, I never saw XFS block
>> off the inode time update. So at least we have room for future
>> improvements here, it's wasteful to return -EAGAIN here when the vast
>> majority of time updates don't end up blocking.
> 
> It's not low memory testing that you should be concerned about -
> it's when the journal runs out of space that you'll get long,
> unbound latencies waiting for timestamp updates. Waiting for journal
> space to become available could, in the worst case, entail waiting
> for tens of thousands of small random metadata IOs to be submitted
> and completed....

Right, I know it's not a specifically targeted test. But testing
blocking on a new journal space would certainly be interesting in
itself, if someone where to make xfs_vn_update_time() honor IOCB_NOWAIT
for that.

More realistic is probably just checking SB_LAZYTIME and allowing
IOCB_NOWAIT for that.

>> One issue there too is that, by default, XFS uses a high granularity
>> threshold for when the time should be updated, making the problem worse.
> 
> That's not an XFS issue - we're just following the VFS rules for
> when mtime needs to be changed. If you want to avoid frequent
> transactional (on-disk) mtime updates, then use the lazytime mount
> option to limit on-disk mtime updates to once per day.

Seems like that would be a better default (for anyone, not just XFS),
but that's more likely a bigger can of worms I don't have any interest
in opening :-)
diff mbox series

Patch

diff --git a/fs/inode.c b/fs/inode.c
index 259ebf438893..98a48fbfa0ad 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -2150,8 +2150,6 @@  static int file_modified_flags(struct file *file, int flags)
 	ret = inode_needs_update_time(inode, &now);
 	if (ret <= 0)
 		return ret;
-	if (flags & IOCB_NOWAIT)
-		return -EAGAIN;
 
 	return __file_update_time(file, &now, ret);
 }