Message ID | 20160922123143.GO2834@quack2.suse.cz (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Thu, Sep 22, 2016 at 02:31:43PM +0200, Jan Kara wrote: > So I think what Christoph meant in this case is something like attached > patch. That achieves more than your dirty hack in a much cleaner way. > Beware, the patch is only compile-tested. Yes, that's exactly what I had in mind. -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Sep 22, 2016 at 02:31:43PM +0200, Jan Kara wrote: > > So I think what Christoph meant in this case is something like attached > patch. That achieves more than your dirty hack in a much cleaner way. > Beware, the patch is only compile-tested. Your patch also disables dioread_nolock (which is what I think Christoph was asking about because it's the rest of the dioread nolock support code which causes the eye-bleeding complexity on the write path). > Then there is the case of unlocked direct IO overwrites which we allow to > run without inode_lock in dioread_nolock mode as well and that is more > difficult to resolve (there lay the problems with blocksize < pagesize you > speak about). Right, by disabling dioread_nolock, it means we lose the feature that dioread_nolock doesn't require blocking versus _any_ direct I/O writes (because of the post-write uninit->init conversion) --- not just DIO overwrites. But we should be able to support dioread_nolock as well as by only taking inode_lock_shared() in the non-dioread_nolock case, I think. Thanks for the prototype patch; I agree it's a cleaner way to go. - Ted -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
I've been looking at your patch and testing it, and it looks like works fine as-is. The other thing I like about this patch is it allows us to drop ext4_inode_{block,resume}_unlocked_dio() and EXT4_STATE_DIOREAD_LOCK. Thanks!! - Ted -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri 30-09-16 01:22:33, Ted Tso wrote: > I've been looking at your patch and testing it, and it looks like > works fine as-is. Glad to hear that! > The other thing I like about this patch is it allows us to drop > ext4_inode_{block,resume}_unlocked_dio() and EXT4_STATE_DIOREAD_LOCK. Well, not completely - we still need to deal with unlocked writes that happen in ext4_direct_IO_write() when overwrite == 1. But that should be doable in a similar way. We could just demote inode_lock to a shared mode instead of dropping it in ext4_direct_IO_write() and then we could drop the bits you mention above. I can look into that when I'll be looking into converting ext4 into the new iomap infrastructure but that's definitely material for the next merge window. Honza
From 7de4ca30e0c897bbdd49eae0fdec5132744c105a Mon Sep 17 00:00:00 2001 From: Jan Kara <jack@suse.cz> Date: Thu, 22 Sep 2016 14:20:15 +0200 Subject: [PATCH] ext4: Allow parallel DIO reads We can easily support parallel direct IO reads. We only have to make sure we cannot expose uninitialized data by reading allocated block to which data was not written yet, or which was already truncated. That is easily achieved by holding inode_lock in shared mode - that excludes all writes, truncates, hole punches. We also have to guard against page writeback allocating blocks for delay-allocated pages - that race is handled by the fact that we writeback all the pages in the affected range and the lock protects us from new pages being created there. Signed-off-by: Jan Kara <jack@suse.cz> --- fs/ext4/inode.c | 40 ++++++++++++++++++---------------------- 1 file changed, 18 insertions(+), 22 deletions(-) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index c6ea25a190f8..0af52f012bfb 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -3526,35 +3526,31 @@ out: static ssize_t ext4_direct_IO_read(struct kiocb *iocb, struct iov_iter *iter) { - int unlocked = 0; - struct inode *inode = iocb->ki_filp->f_mapping->host; + struct address_space *mapping = iocb->ki_filp->f_mapping; + struct inode *inode = mapping->host; ssize_t ret; - if (ext4_should_dioread_nolock(inode)) { - /* - * Nolock dioread optimization may be dynamically disabled - * via ext4_inode_block_unlocked_dio(). Check inode's state - * while holding extra i_dio_count ref. - */ - inode_dio_begin(inode); - smp_mb(); - if (unlikely(ext4_test_inode_state(inode, - EXT4_STATE_DIOREAD_LOCK))) - inode_dio_end(inode); - else - unlocked = 1; - } + /* + * Shared inode_lock is enough for us - it protects against concurrent + * writes & truncates and since we take care of writing back page cache, + * we are protected against page writeback as well. + */ + inode_lock_shared(inode); if (IS_DAX(inode)) { - ret = dax_do_io(iocb, inode, iter, ext4_dio_get_block, - NULL, unlocked ? 0 : DIO_LOCKING); + ret = dax_do_io(iocb, inode, iter, ext4_dio_get_block, NULL, 0); } else { + size_t count = iov_iter_count(iter); + + ret = filemap_write_and_wait_range(mapping, iocb->ki_pos, + iocb->ki_pos + count); + if (ret) + goto out_unlock; ret = __blockdev_direct_IO(iocb, inode, inode->i_sb->s_bdev, iter, ext4_dio_get_block, - NULL, NULL, - unlocked ? 0 : DIO_LOCKING); + NULL, NULL, 0); } - if (unlocked) - inode_dio_end(inode); +out_unlock: + inode_unlock_shared(inode); return ret; } -- 2.6.6