Message ID | 1477727070-18806-2-git-send-email-hch@lst.de (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Sat, Oct 29, 2016 at 09:44:29AM +0200, Christoph Hellwig wrote: > - if (rw == WRITE) > + if (rw == WRITE) { > file_start_write(file); > + req->ki_flags |= IOCB_WRITE; > + } > + if (rw == WRITE) { > + /* > + * We release freeze protection in aio_complete(). Fool > + * lockdep by telling it the lock got released so that > + * it doesn't complain about held lock when we return > + * to userspace. > + */ > + __sb_writers_release(file_inode(file)->i_sb, > + SB_FREEZE_WRITE); > + } How about taking this chunk (i.e. telling lockdep that we are not holding this thing) past the iter_op() call, where file_end_write() used to be? As it is, you risk hiding the lock dependencies the current mainline would've caught. Other than that I see no problems with the patch... -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat, Oct 29, 2016 at 01:24:51PM +0100, Al Viro wrote: > How about taking this chunk (i.e. telling lockdep that we are not holding this > thing) past the iter_op() call, where file_end_write() used to be? We can't as that would not fix the use after free (at least for the lockdep case - otherwise the call is a no-op). Once iter_op returns aio_complete might have dropped our reference to the file, and another thread might have closed the fd so that the fput from aio_complete was the last one. This is something that xfstests/323 can reproduce under the right conditions. -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat, Oct 29, 2016 at 05:20:17PM +0200, Christoph Hellwig wrote: > On Sat, Oct 29, 2016 at 01:24:51PM +0100, Al Viro wrote: > > How about taking this chunk (i.e. telling lockdep that we are not holding this > > thing) past the iter_op() call, where file_end_write() used to be? > > We can't as that would not fix the use after free (at least for the lockdep > case - otherwise the call is a no-op). Once iter_op returns aio_complete > might have dropped our reference to the file, and another thread might > have closed the fd so that the fput from aio_complete was the last one. > > This is something that xfstests/323 can reproduce under the right conditions. Eww... IOW, as soon as we'd submitted an async iocb, nobody can so much as look at struct file *or* iocb, right? Or underlying inode, or any fs-private data structures attached to it... I certainly agree that it's a bug, but IMO you are just papering over it. Just look at e.g. written = mapping->a_ops->direct_IO(iocb, &data); /* * Finally, try again to invalidate clean pages which might have been * cached by non-direct readahead, or faulted in by get_user_pages() * if the source of the write was an mmap'ed region of the file * we're writing. Either one is a pretty crazy thing to do, * so we don't support it 100%. If this invalidation * fails, tough, the write still worked... */ if (mapping->nrpages) { invalidate_inode_pages2_range(mapping, pos >> PAGE_SHIFT, end); } in generic_file_direct_write() - who said that mapping doesn't point into freed memory by that point? Or that inode_lock(inode); ret = generic_write_checks(iocb, from); if (ret > 0) ret = __generic_file_write_iter(iocb, from); inode_unlock(inode); in generic_file_write_iter() won't blow up on inode_unlock() of a freed inode? Or that anything of that sort won't happen in other ->write_iter instances, for that matter? NAK, with apologies for not having looked at that earlier. The bug is real, all right, but this is not a solution - both incomplete and far too brittle. Why do we play that kind of insane games, anyway? Why not e.g. refcount the (async) iocb and have kiocb_free() drop the reference, with io_submit_one() holding one across the call of aio_run_iocb()? Cacheline bouncing issues? Anything more subtle? -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat, Oct 29, 2016 at 05:12:30PM +0100, Al Viro wrote: > NAK, with apologies for not having looked at that earlier. The bug is real, > all right, but this is not a solution - both incomplete and far too brittle. > > Why do we play that kind of insane games, anyway? Why not e.g. refcount > the (async) iocb and have kiocb_free() drop the reference, with io_submit_one() > holding one across the call of aio_run_iocb()? Cacheline bouncing issues? > Anything more subtle? PS: I'm not saying that refcounting kiocb is the best solution - grabbing an extra reference to struct file might be better (we have just dirtied that cacheline, so the fact that struct file is shared more than kiocb shouldn't matter much), but I really think that "file and everything attached to it might disappear as soon as you get async IO started" is insanely brittle - e.g. xfs_rw_iunlock(ip, iolock) in xfs_file_dio_aio_write() is also unsafe <checks -next - yup, still there> If struct file might be gone, so might struct inode and everything behind it. Which means that we either are not allowed to hold any locks across __blockdev_direct_IO(), or need have end_io() callback taking care of dropping those (and adjust the callers accordingly). It might be not impossible, but... *ouch* -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat, Oct 29, 2016 at 05:12:30PM +0100, Al Viro wrote: > Eww... IOW, as soon as we'd submitted an async iocb, nobody can so much as > look at struct file *or* iocb, right? Or underlying inode, or any fs-private > data structures attached to it... Yeah. > I certainly agree that it's a bug, but IMO you are just papering over it. > Just look at e.g. > written = mapping->a_ops->direct_IO(iocb, &data); > > /* > * Finally, try again to invalidate clean pages which might have been > * cached by non-direct readahead, or faulted in by get_user_pages() > * if the source of the write was an mmap'ed region of the file > * we're writing. Either one is a pretty crazy thing to do, > * so we don't support it 100%. If this invalidation > * fails, tough, the write still worked... > */ > if (mapping->nrpages) { > invalidate_inode_pages2_range(mapping, > pos >> PAGE_SHIFT, end); > } > in generic_file_direct_write() - who said that mapping doesn't point into > freed memory by that point? True, somewhat unlike due to inode caching, but for sure possible and something that needs to be deal with. > Why do we play that kind of insane games, anyway? Why not e.g. refcount > the (async) iocb and have kiocb_free() drop the reference, with io_submit_one() > holding one across the call of aio_run_iocb()? Cacheline bouncing issues? > Anything more subtle? There shouldn't really be a need to refcount the iocb itself - nothing worth looking at. The one we consider was struct file, and I didn't like it because of the cacheline bouncing if we decrement if from both the submitter and completion context. But it starts to sounds like the sane version more and more. -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/fs/aio.c b/fs/aio.c index 1157e13..bf315cd 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -1078,6 +1078,17 @@ static void aio_complete(struct kiocb *kiocb, long res, long res2) unsigned tail, pos, head; unsigned long flags; + if (kiocb->ki_flags & IOCB_WRITE) { + struct file *file = kiocb->ki_filp; + + /* + * Tell lockdep we inherited freeze protection from submission + * thread. + */ + __sb_writers_acquired(file_inode(file)->i_sb, SB_FREEZE_WRITE); + file_end_write(file); + } + /* * Special case handling for sync iocbs: * - events go directly into the iocb for fast handling @@ -1460,13 +1471,24 @@ static ssize_t aio_run_iocb(struct kiocb *req, unsigned opcode, return ret; } - if (rw == WRITE) + if (rw == WRITE) { file_start_write(file); + req->ki_flags |= IOCB_WRITE; + } + + if (rw == WRITE) { + /* + * We release freeze protection in aio_complete(). Fool + * lockdep by telling it the lock got released so that + * it doesn't complain about held lock when we return + * to userspace. + */ + __sb_writers_release(file_inode(file)->i_sb, + SB_FREEZE_WRITE); + } ret = iter_op(req, &iter); - if (rw == WRITE) - file_end_write(file); kfree(iovec); break; diff --git a/include/linux/fs.h b/include/linux/fs.h index 16d2b6e..db600e9 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -321,6 +321,7 @@ struct writeback_control; #define IOCB_HIPRI (1 << 3) #define IOCB_DSYNC (1 << 4) #define IOCB_SYNC (1 << 5) +#define IOCB_WRITE (1 << 6) struct kiocb { struct file *ki_filp;