block: add filemap_invalidate_lock_killable()

Message ID 3392d41c-5477-118a-677f-5780f9cedf95@I-love.SAKURA.ne.jp (mailing list archive)
State New, archived
Series block: add filemap_invalidate_lock_killable()

Commit Message

Tetsuo Handa Jan. 3, 2022, 10:49 a.m. UTC
syzbot is reporting hung task at blkdev_fallocate() [1], for it can take
minutes with mapping->invalidate_lock held. Since fallocate() has to accept
size > MAX_RW_COUNT bytes, we can't predict how long it will take. Thus,
mitigate this problem by using killable wait where possible.

  ----------
  #define _GNU_SOURCE
  #include <sys/types.h>
  #include <sys/stat.h>
  #include <fcntl.h>
  #include <unistd.h>

  int main(int argc, char *argv[])
  {
    fork();
    fallocate(open("/dev/nullb0", O_RDWR), 0x11ul, 0ul, 0x7ffffffffffffffful);
    return 0;
  }
  ----------

Link: https://syzkaller.appspot.com/bug?extid=39b75c02b8be0a061bfc [1]
Reported-by: syzbot <syzbot+39b75c02b8be0a061bfc@syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
 block/fops.c       | 4 +++-
 include/linux/fs.h | 5 +++++
 2 files changed, 8 insertions(+), 1 deletion(-)

Comments

Christoph Hellwig Jan. 4, 2022, 7:14 a.m. UTC | #1
On Mon, Jan 03, 2022 at 07:49:11PM +0900, Tetsuo Handa wrote:
> syzbot is reporting hung task at blkdev_fallocate() [1], for it can take
> minutes with mapping->invalidate_lock held. Since fallocate() has to accept
> size > MAX_RW_COUNT bytes, we can't predict how long it will take. Thus,
> mitigate this problem by using killable wait where possible.

Well, but that also means we want all other users of the invalidate_lock
to be killable, as fallocate vs fallocate synchronization is probably
not the interesting case.

Or we should limit the locked batch size of block device fallocates that
actually do write zeroes, which never really was the intent of the
fallocate interface to start with..
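
A minimal userspace sketch of the batching idea above, assuming the range can be processed in independent pieces. Every name here (model_lock, model_zero_range(), zero_in_batches(), max_batch) is hypothetical and only models the locking pattern; none of it is kernel code or part of this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for mapping->invalidate_lock; it only tracks state. */
struct model_lock {
	int held;
	unsigned long acquisitions;
};

static void model_lock_acquire(struct model_lock *l)
{
	assert(!l->held);
	l->held = 1;
	l->acquisitions++;
}

static void model_lock_release(struct model_lock *l)
{
	assert(l->held);
	l->held = 0;
}

/* Hypothetical per-batch worker: pretends to zero [start, start + len). */
static void model_zero_range(uint64_t start, uint64_t len, uint64_t *zeroed)
{
	(void)start;
	*zeroed += len;
}

/*
 * Process [start, start + len) in batches of at most max_batch bytes,
 * dropping the lock between batches so that other waiters can get in.
 * Returns the total number of bytes processed.
 */
static uint64_t zero_in_batches(struct model_lock *l, uint64_t start,
				uint64_t len, uint64_t max_batch)
{
	uint64_t zeroed = 0;

	assert(max_batch > 0);
	while (len > 0) {
		uint64_t batch = len < max_batch ? len : max_batch;

		model_lock_acquire(l);
		model_zero_range(start, batch, &zeroed);
		model_lock_release(l);

		start += batch;
		len -= batch;
	}
	return zeroed;
}
```

In this model the lock hold time is bounded by the cost of one batch rather than by the length the caller passed in. Whether the range may safely be split this way in the real block layer is a separate question that the sketch does not answer.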

Tetsuo Handa Jan. 4, 2022, 1:26 p.m. UTC | #2
On 2022/01/04 16:14, Christoph Hellwig wrote:
> On Mon, Jan 03, 2022 at 07:49:11PM +0900, Tetsuo Handa wrote:
>> syzbot is reporting hung task at blkdev_fallocate() [1], for it can take
>> minutes with mapping->invalidate_lock held. Since fallocate() has to accept
>> size > MAX_RW_COUNT bytes, we can't predict how long it will take. Thus,
>> mitigate this problem by using killable wait where possible.
> 
> Well, but that also means we want all other users of the invalidate_lock
> to be killable, as fallocate vs fallocate synchronization is probably
> not the interesting case.

Right. But being responsive to SIGKILL is generally preferable.

syzbot (and other syzkaller-based fuzzing) is producing many hung task reports,
but many of these reports are simply the result of overstressing the system.

We can't use a killable lock wait for the release operation because it is a
"void" function. But we can use a killable lock wait for the majority of
operations, which are not "void" functions. Using a killable lock wait where
possible can improve the situation.
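
The void versus non-void distinction can be sketched in userspace as follows. This is a single-threaded model under stated assumptions: model_lock, model_lock_killable(), model_unlock() and model_fallocate() are hypothetical names, and because the model has no scheduler it returns -EBUSY where a real waiter would sleep.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for mapping->invalidate_lock. */
struct model_lock {
	int held;
};

/*
 * Model of a killable acquire: returns 0 with the lock taken, or -EINTR
 * if the lock is contended and a fatal signal is pending. A real waiter
 * would sleep when contended; this model reports -EBUSY instead.
 */
static int model_lock_killable(struct model_lock *l, int fatal_signal_pending)
{
	if (!l->held) {
		l->held = 1;
		return 0;
	}
	return fatal_signal_pending ? -EINTR : -EBUSY;
}

static void model_unlock(struct model_lock *l)
{
	assert(l->held);
	l->held = 0;
}

/*
 * A non-void operation can hand the failure back to its caller.
 * A void operation (such as release) has no return value to carry
 * the error, so it cannot use the killable variant this way.
 */
static long model_fallocate(struct model_lock *l, int fatal_signal_pending)
{
	int err = model_lock_killable(l, fatal_signal_pending);

	if (err)
		return err;
	/* ... work that needs the lock would go here ... */
	model_unlock(l);
	return 0;
}
```

Note that on the failure path the operation returns without touching the lock, so whoever currently holds it is unaffected.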

Tetsuo Handa Feb. 12, 2022, 8:28 a.m. UTC | #3
On 2022/01/04 22:26, Tetsuo Handa wrote:
> On 2022/01/04 16:14, Christoph Hellwig wrote:
>> On Mon, Jan 03, 2022 at 07:49:11PM +0900, Tetsuo Handa wrote:
>>> syzbot is reporting hung task at blkdev_fallocate() [1], for it can take
>>> minutes with mapping->invalidate_lock held. Since fallocate() has to accept
>>> size > MAX_RW_COUNT bytes, we can't predict how long it will take. Thus,
>>> mitigate this problem by using killable wait where possible.
>>
>> Well, but that also means we want all other users of the invalidate_lock
>> to be killable, as fallocate vs fallocate synchronization is probably
>> not the interesting case.
> 
> Right. But being responsive to SIGKILL is generally preferable.
> 
> syzbot (and other syzkaller based fuzzing) is reporting many hung task reports,
> but many of such reports are simply overstressing.
> 
> We can't use killable lock wait for release operation because it is a "void"
> function. But we can use killable lock wait for majority of operations which
> are not "void" functions. Use of killable lock wait where possible can improve
> situation.
> 

If there is no alternative, can we apply this patch?

Christoph Hellwig Feb. 14, 2022, 8:17 a.m. UTC | #4
On Sat, Feb 12, 2022 at 05:28:09PM +0900, Tetsuo Handa wrote:
> On 2022/01/04 22:26, Tetsuo Handa wrote:
> > On 2022/01/04 16:14, Christoph Hellwig wrote:
> >> On Mon, Jan 03, 2022 at 07:49:11PM +0900, Tetsuo Handa wrote:
> >>> syzbot is reporting hung task at blkdev_fallocate() [1], for it can take
> >>> minutes with mapping->invalidate_lock held. Since fallocate() has to accept
> >>> size > MAX_RW_COUNT bytes, we can't predict how long it will take. Thus,
> >>> mitigate this problem by using killable wait where possible.
> >>
> >> Well, but that also means we want all other users of the invalidate_lock
> >> to be killable, as fallocate vs fallocate synchronization is probably
> >> not the interesting case.
> > 
> > Right. But being responsive to SIGKILL is generally preferable.
> > 
> > syzbot (and other syzkaller based fuzzing) is reporting many hung task reports,
> > but many of such reports are simply overstressing.
> > 
> > We can't use killable lock wait for release operation because it is a "void"
> > function. But we can use killable lock wait for majority of operations which
> > are not "void" functions. Use of killable lock wait where possible can improve
> > situation.
> > 
> 
> If there is no alternative, can we apply this patch?

As mentioned before, I do not think this patch makes sense.  If running
fallocate on the block device under the invalidate lock creates a problem
with long hold times, we must get it out from under the lock, not turn one
random caller into a killable lock acquisition.

Patch

diff --git a/block/fops.c b/block/fops.c
index 0da147edbd18..a87050db4670 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -622,7 +622,9 @@  static long blkdev_fallocate(struct file *file, int mode, loff_t start,
 	if ((start | len) & (bdev_logical_block_size(bdev) - 1))
 		return -EINVAL;
 
-	filemap_invalidate_lock(inode->i_mapping);
+	/* fallocate() might take minutes with lock held. */
+	if (filemap_invalidate_lock_killable(inode->i_mapping))
+		return -EINTR;
 
 	/* Invalidate the page cache, including dirty pages. */
 	error = truncate_bdev_range(bdev, file->f_mode, start, end);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index bbf812ce89a8..27b3d36bb73c 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -828,6 +828,11 @@  static inline void filemap_invalidate_lock(struct address_space *mapping)
 	down_write(&mapping->invalidate_lock);
 }
 
+static inline int filemap_invalidate_lock_killable(struct address_space *mapping)
+{
+	return down_write_killable(&mapping->invalidate_lock);
+}
+
 static inline void filemap_invalidate_unlock(struct address_space *mapping)
 {
 	up_write(&mapping->invalidate_lock);