diff mbox

[GIT,PULL] aio: a couple of fixes for 4.4

Message ID 20160109220826.GA11174@kvack.org (mailing list archive)
State New, archived
Headers show

Commit Message

Benjamin LaHaise Jan. 9, 2016, 10:08 p.m. UTC
Hello Linus et al,

Please consider pulling the following changes to fix a couple of issues
reported by Dmitry from git://git.kvack.org/~bcrl/aio-fixes.git .  Thanks!

		-ben

Benjamin LaHaise (1):
  aio: handle integer overflow in io_getevents() timespec usage

Jan Kara (1):
  aio: Fix freeze protection of aio writes

 fs/aio.c           | 33 ++++++++++++++++++++++++++++++---
 include/linux/fs.h |  1 +
 2 files changed, 31 insertions(+), 3 deletions(-)

Comments

Linus Torvalds Jan. 9, 2016, 10:43 p.m. UTC | #1
On Sat, Jan 9, 2016 at 2:08 PM, Benjamin LaHaise <bcrl@kvack.org> wrote:
>
> Please consider pulling the following changes to fix a couple of issues
> reported by Dmitry from git://git.kvack.org/~bcrl/aio-fixes.git .

No. This is much too late for this kind of hackery. That second patch
in particular is both subtle and ugly, and is messing with lockdep.

No way will I take something like this the last fay before a release.

It's not even a regression, nor did you send me anything at all for
this release. Trying to sneak something in just before 4.4 is not ok.

                     Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Benjamin LaHaise Jan. 9, 2016, 11 p.m. UTC | #2
On Sat, Jan 09, 2016 at 02:43:44PM -0800, Linus Torvalds wrote:
> On Sat, Jan 9, 2016 at 2:08 PM, Benjamin LaHaise <bcrl@kvack.org> wrote:
> >
> > Please consider pulling the following changes to fix a couple of issues
> > reported by Dmitry from git://git.kvack.org/~bcrl/aio-fixes.git .
> 
> No. This is much too late for this kind of hackery. That second patch
> in particular is both subtle and ugly, and is messing with lockdep.

I wasn't particularly fond of how Jan implemented that.  Any ideas on a 
better way to avoid lockdep complaining about this?  My initial feedback 
to Jan was exactly that this should be handled within the file_start_write() 
and file_end_write() APIs -- AIO really shouldn't need to be mucking in what 
ought to be hidden behind that API.

> No way will I take something like this the last fay before a release.
> 
> It's not even a regression, nor did you send me anything at all for
> this release. Trying to sneak something in just before 4.4 is not ok.

Okay, no worries.

		-ben

>                      Linus
Al Viro Jan. 9, 2016, 11:08 p.m. UTC | #3
On Sat, Jan 09, 2016 at 02:43:44PM -0800, Linus Torvalds wrote:
> On Sat, Jan 9, 2016 at 2:08 PM, Benjamin LaHaise <bcrl@kvack.org> wrote:
> >
> > Please consider pulling the following changes to fix a couple of issues
> > reported by Dmitry from git://git.kvack.org/~bcrl/aio-fixes.git .
> 
> No. This is much too late for this kind of hackery. That second patch
> in particular is both subtle and ugly, and is messing with lockdep.
> 
> No way will I take something like this the last fay before a release.
> 
> It's not even a regression, nor did you send me anything at all for
> this release. Trying to sneak something in just before 4.4 is not ok.

BTW, right now vfs.git#for-linus contains minimal compat_ioctl patches; unless
you want those in right now, I'm holding them back until Monday.  It's
two patches Jann has posted (with trivial fix in the first one folded in) +
not passing fd around where it's not needed (basically, those calls of
do_ioctl() in there are guaranteed to go into vfs_ioctl() - the paths in
do_vfs_ioctl() that care about the descriptor numbers are not reachable
with the arguments do_ioctl() is getting).

If for some reason you want those in before -final - yell and I'll send
a pull request, otherwise they'll be in one of the first pull requests
on Monday.
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Linus Torvalds Jan. 10, 2016, 4:36 a.m. UTC | #4
On Sat, Jan 9, 2016 at 3:08 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> If for some reason you want those in before -final - yell and I'll send
> a pull request, otherwise they'll be in one of the first pull requests
> on Monday.

No, I definitely don't want anything now unless it's a major
regression or security issue. Other stuff can wait until the merge
window and perhaps be marked for stable if required. That way they'll
get testing.

              Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jan Kara Jan. 11, 2016, 4:59 p.m. UTC | #5
On Sat 09-01-16 18:00:36, Benjamin LaHaise wrote:
> On Sat, Jan 09, 2016 at 02:43:44PM -0800, Linus Torvalds wrote:
> > On Sat, Jan 9, 2016 at 2:08 PM, Benjamin LaHaise <bcrl@kvack.org> wrote:
> > >
> > > Please consider pulling the following changes to fix a couple of issues
> > > reported by Dmitry from git://git.kvack.org/~bcrl/aio-fixes.git .
> > 
> > No. This is much too late for this kind of hackery. That second patch
> > in particular is both subtle and ugly, and is messing with lockdep.
> 
> I wasn't particularly fond of how Jan implemented that.  Any ideas on a 
> better way to avoid lockdep complaining about this?  My initial feedback 
> to Jan was exactly that this should be handled within the file_start_write() 
> and file_end_write() APIs -- AIO really shouldn't need to be mucking in what 
> ought to be hidden behind that API.

[To introduce Peter to what we are talking about: This is about filesystem
freezing protection implemented effectively as a RW semaphore which must be
held for reading whenever we modify filesystem and held for writing when we
want to freeze the filesystem. With AIO, we need to return to userspace
with the lock held and thus lockdep complains.]

Thinking about this some more maybe we could improve lockdep in the following
way:

We could flag a locking class that it is OK to return with this lock to
userspace. When returning to userspace with such lock, lockdep would just
silently discard the information about lock being held. When freeing such
lock without lockdep knowing we acquired it (i.e., we were called by
userspace / IRQ when holding the lock), lockdep would just silently ignore
such request.

This way we could avoid the hacks with telling lockdep we dropped the lock
when we actually did not and we would also get improved coverage to the
current state. Peter, would something like the above be doable?

Bonus would be if we verified that all locks held when releasing lock we
"inherited" rank below the lock we are releasing (that way we would
simulate acquiring the lock when entering kernel from userspace). However
we would have to be careful about handling locks acquired via trylock
(those cannot create deadlocks) or when release happens from softirq /
irq - there we care only about locks acquired since softirq started - this
happens e.g. when we release the lock on IO completion like in the AIO
case.

								Honza
diff mbox

Patch

diff --git a/fs/aio.c b/fs/aio.c
index 155f842..e0d5398 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1269,6 +1269,8 @@  static long read_events(struct kioctx *ctx, long min_nr, long nr,
 
 		if (unlikely(copy_from_user(&ts, timeout, sizeof(ts))))
 			return -EFAULT;
+		if (!timespec_valid(&ts))
+			return -EINVAL;
 
 		until = timespec_to_ktime(ts);
 	}
-- 
2.5.0

From 3b9688ff1e083a3c981bbc795f823fb0b0f2aacc Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Thu, 7 Jan 2016 16:03:04 +0100
Subject: [PATCH 2/2] aio: Fix freeze protection of aio writes

Currently we dropped freeze protection of aio writes just after IO was
submitted. Thus aio write could be in flight while the filesystem was
frozen and that could result in unexpected situation like aio completion
wanting to convert extent type on frozen filesystem. Testcase from
Dmitry triggering this is like:

for ((i=0;i<60;i++));do fsfreeze -f /mnt ;sleep 1;fsfreeze -u /mnt;done &
fio --bs=4k --ioengine=libaio --iodepth=128 --size=1g --direct=1 \
    --runtime=60 --filename=/mnt/file --name=rand-write --rw=randwrite

Fix the problem by dropping freeze protection only once IO is completed
in aio_complete().

Reported-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>

diff --git a/fs/aio.c b/fs/aio.c
index e0d5398..a574944 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1065,6 +1065,19 @@  static void aio_complete(struct kiocb *kiocb, long res, long res2)
 	unsigned tail, pos, head;
 	unsigned long	flags;
 
+	if (kiocb->ki_flags & IOCB_WRITE) {
+		struct file *f = kiocb->ki_filp;
+
+		/*
+		 * Tell lockdep we inherited freeze protection from submission
+		 * thread.
+		 */
+		percpu_rwsem_acquire(
+			&f->f_inode->i_sb->s_writers.rw_sem[SB_FREEZE_WRITE-1],
+			1, _THIS_IP_);
+		file_end_write(f);
+	}
+
 	/*
 	 * Special case handling for sync iocbs:
 	 *  - events go directly into the iocb for fast handling
@@ -1451,13 +1464,25 @@  rw_common:
 
 		len = ret;
 
-		if (rw == WRITE)
+		if (rw == WRITE) {
 			file_start_write(file);
+			req->ki_flags |= IOCB_WRITE;
+		}
 
 		ret = iter_op(req, &iter);
 
-		if (rw == WRITE)
-			file_end_write(file);
+		if (rw == WRITE) {
+			/*
+			 * We release freeze protection in aio_complete(). Fool
+			 * lockdep by telling it the lock got released so that
+			 * it doesn't complain about held lock when we return
+			 * to userspace.
+			 */
+			percpu_rwsem_release(
+				&file->f_inode->i_sb->s_writers.rw_sem[SB_FREEZE_WRITE-1],
+				1, _THIS_IP_);
+		}
+
 		kfree(iovec);
 		break;
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 3aa5142..54af40e 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -319,6 +319,7 @@  struct writeback_control;
 #define IOCB_EVENTFD		(1 << 0)
 #define IOCB_APPEND		(1 << 1)
 #define IOCB_DIRECT		(1 << 2)
+#define IOCB_WRITE		(1 << 3)
 
 struct kiocb {
 	struct file		*ki_filp;