Message ID | 1499959036-9275-1-git-send-email-lczerner@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Hi Lukas,

```
[auto build test WARNING on linus/master]
[also build test WARNING on v4.12 next-20170713]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Lukas-Czerner/fs-Fix-page-cache-inconsistency-when-mixing-buffered-and-AIO-DIO/20170714-181130
config: x86_64-randconfig-x010-201728 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64
```

Note: it may well be a FALSE warning. FWIW you are at least aware of it now.
http://gcc.gnu.org/wiki/Better_Uninitialized_Warnings

All warnings (new ones prefixed by >>):

```
   fs/iomap.c: In function 'iomap_dio_complete':
>> fs/iomap.c:629:25: warning: 'ret' may be used uninitialized in this function [-Wmaybe-uninitialized]
        (offset + dio->size + ret - 1) >> PAGE_SHIFT);
        ~~~~~~~~~~~~~~~~~~~^~~~~
```

vim +/ret +629 fs/iomap.c

```
   618	
   619	static ssize_t iomap_dio_complete(struct iomap_dio *dio)
   620	{
   621		struct kiocb *iocb = dio->iocb;
   622		loff_t offset = iocb->ki_pos;
   623		struct inode *inode = file_inode(iocb->ki_filp);
   624		ssize_t ret;
   625	
   626		if ((dio->flags & IOMAP_DIO_WRITE) && inode->i_mapping->nrpages)
   627			invalidate_inode_pages2_range(inode->i_mapping,
   628					offset >> PAGE_SHIFT,
 > 629					(offset + dio->size + ret - 1) >> PAGE_SHIFT);
   630	
   631		if (dio->end_io) {
   632			ret = dio->end_io(iocb,
   633					dio->error ? dio->error : dio->size,
   634					dio->flags);
   635		} else {
   636			ret = dio->error;
   637		}
   638	
   639		if (likely(!ret)) {
   640			ret = dio->size;
   641			/* check for short read */
   642			if (iocb->ki_pos + ret > dio->i_size &&
   643			    !(dio->flags & IOMAP_DIO_WRITE))
   644				ret = dio->i_size - iocb->ki_pos;
   645			iocb->ki_pos += ret;
   646		}
   647	
   648		inode_dio_end(file_inode(iocb->ki_filp));
   649		kfree(dio);
   650	
   651		return ret;
   652	}
   653	
```

---

0-DAY kernel test infrastructure, Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all, Intel Corporation
On Fri, Jul 14, 2017 at 06:41:52PM +0800, kbuild test robot wrote:

```
> fs/iomap.c: In function 'iomap_dio_complete':
> >> fs/iomap.c:629:25: warning: 'ret' may be used uninitialized in this function [-Wmaybe-uninitialized]
>         (offset + dio->size + ret - 1) >> PAGE_SHIFT);
>         ~~~~~~~~~~~~~~~~~~~^~~~~
```

Oops, right, this is obviously wrong.

Thanks!
-Lukas
```diff
diff --git a/fs/direct-io.c b/fs/direct-io.c
index 08cf278..2db9ada 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -258,6 +258,11 @@ static ssize_t dio_complete(struct dio *dio, ssize_t ret, bool is_async)
 	if (ret == 0)
 		ret = transferred;
 
+	if ((dio->op == REQ_OP_WRITE && dio->inode->i_mapping->nrpages))
+		invalidate_inode_pages2_range(dio->inode->i_mapping,
+					offset >> PAGE_SHIFT,
+					(offset + ret - 1) >> PAGE_SHIFT);
+
 	if (dio->end_io) {
 		int err;
 
@@ -304,6 +309,7 @@ static void dio_bio_end_aio(struct bio *bio)
 	struct dio *dio = bio->bi_private;
 	unsigned long remaining;
 	unsigned long flags;
+	bool defer_completion = false;
 
 	/* cleanup the bio */
 	dio_bio_complete(dio, bio);
@@ -315,7 +321,19 @@ static void dio_bio_end_aio(struct bio *bio)
 	spin_unlock_irqrestore(&dio->bio_lock, flags);
 
 	if (remaining == 0) {
-		if (dio->result && dio->defer_completion) {
+		/*
+		 * Defer completion when defer_completion is set or
+		 * when the inode has pages mapped and this is AIO write.
+		 * We need to invalidate those pages because there is a
+		 * chance they contain stale data in the case buffered IO
+		 * went in between AIO submission and completion into the
+		 * same region.
+		 */
+		if (dio->result)
+			defer_completion = dio->defer_completion ||
+					   (dio->op == REQ_OP_WRITE &&
+					    dio->inode->i_mapping->nrpages);
+		if (defer_completion) {
 			INIT_WORK(&dio->complete_work, dio_aio_complete_work);
 			queue_work(dio->inode->i_sb->s_dio_done_wq,
 				   &dio->complete_work);
@@ -1210,10 +1228,13 @@ do_blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
 	 * For AIO O_(D)SYNC writes we need to defer completions to a workqueue
 	 * so that we can call ->fsync.
 	 */
-	if (dio->is_async && iov_iter_rw(iter) == WRITE &&
-	    ((iocb->ki_filp->f_flags & O_DSYNC) ||
-	     IS_SYNC(iocb->ki_filp->f_mapping->host))) {
-		retval = dio_set_defer_completion(dio);
+	if (dio->is_async && iov_iter_rw(iter) == WRITE) {
+		retval = 0;
+		if ((iocb->ki_filp->f_flags & O_DSYNC) ||
+		    IS_SYNC(iocb->ki_filp->f_mapping->host))
+			retval = dio_set_defer_completion(dio);
+		else if (!dio->inode->i_sb->s_dio_done_wq)
+			retval = sb_init_dio_done_wq(dio->inode->i_sb);
 		if (retval) {
 			/*
 			 * We grab i_mutex only for reads so we don't have
diff --git a/fs/iomap.c b/fs/iomap.c
index 1732228..a1ad4ca 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -713,8 +713,15 @@ struct iomap_dio {
 static ssize_t iomap_dio_complete(struct iomap_dio *dio)
 {
 	struct kiocb *iocb = dio->iocb;
+	loff_t offset = iocb->ki_pos;
+	struct inode *inode = file_inode(iocb->ki_filp);
 	ssize_t ret;
 
+	if ((dio->flags & IOMAP_DIO_WRITE) && inode->i_mapping->nrpages)
+		invalidate_inode_pages2_range(inode->i_mapping,
+				offset >> PAGE_SHIFT,
+				(offset + dio->size + ret - 1) >> PAGE_SHIFT);
+
 	if (dio->end_io) {
 		ret = dio->end_io(iocb,
 				dio->error ? dio->error : dio->size,
```
Currently, when mixing buffered reads and asynchronous direct writes, it is possible to end up in a situation where the page cache holds stale data while the new data has already been written to disk. This persists until the affected pages are flushed away. Although mixing buffered and direct IO is ill-advised, it does pose a threat to data integrity, is unexpected, and should be fixed.

Fix this by deferring completion of asynchronous direct writes to process context when there are mapped pages to be found in the inode. Later, before completion, invalidate the pages in question in dio_complete(). This ensures that after completion the pages in the written area are either unmapped or populated with up-to-date data. Do the same for the iomap case, which uses iomap_dio_complete() instead.

This has the side effect of deferring completion to process context for every AIO DIO write that happens on an inode that has pages mapped. However, since the consensus is that this is an ill-advised practice, the performance implication should not be a problem.

This was based on a proposal from Jeff Moyer, thanks!

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Cc: Jeff Moyer <jmoyer@redhat.com>

---

```
 fs/direct-io.c | 31 ++++++++++++++++++++++++++-----
 fs/iomap.c     |  7 +++++++
 2 files changed, 33 insertions(+), 5 deletions(-)
```