Message ID | 20191126075719.1046485-1-damien.lemoal@wdc.com (mailing list archive) |
---|---|
State | New, archived |
Series | f2fs: Fix direct IO handling |
Hello Damien,

IIUC, you are trying to fix a stale data read by DIO read for the case
you explained in your patch w.r.t. DIO-write forced to write as buffIO.

Coincidentally I was just looking at the same code path just now.
So I do have a query to you/f2fs group. Below could be silly one, as I
don't understand F2FS in great detail.

How is the stale data by DIO read, is protected against a mmap
writes via f2fs_vm_page_mkwrite?

f2fs_vm_page_mkwrite()                  f2fs_direct_IO (read)
                                        filemap_write_and_wait_range()
-> f2fs_get_blocks()
                                        -> submit_bio()

-> set_page_dirty()

Is above race possible with current f2fs code?
i.e. f2fs_direct_IO could read the stale data from the blocks
which were allocated due to mmap fault?

Am I missing something here?

-ritesh

On 11/26/19 1:27 PM, Damien Le Moal wrote:
> f2fs_preallocate_blocks() identifies direct IOs using the IOCB_DIRECT
> flag for a kiocb structure. However, the file system direct IO handler
> function f2fs_direct_IO() may have decided that a direct IO has to be
> exececuted as a buffered IO using the function f2fs_force_buffered_io().
> This is the case for instance for volumes including zoned block device
> and for unaligned write IOs with LFS mode enabled.
>
> These 2 different methods of identifying direct IOs can result in
> inconsistencies generating stale data access for direct reads after a
> direct IO write that is treated as a buffered write. Fix this
> inconsistency by combining the IOCB_DIRECT flag test with the result
> of f2fs_force_buffered_io().
>
> Reported-by: Javier Gonzalez <javier@javigon.com>
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> ---
>  fs/f2fs/data.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> index 5755e897a5f0..8ac2d3b70022 100644
> --- a/fs/f2fs/data.c
> +++ b/fs/f2fs/data.c
> @@ -1073,6 +1073,8 @@ int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *from)
>  	int flag;
>  	int err = 0;
>  	bool direct_io = iocb->ki_flags & IOCB_DIRECT;
> +	bool do_direct_io = direct_io &&
> +		!f2fs_force_buffered_io(inode, iocb, from);
>
>  	/* convert inline data for Direct I/O*/
>  	if (direct_io) {
> @@ -1081,7 +1083,7 @@ int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *from)
>  			return err;
>  	}
>
> -	if (direct_io && allow_outplace_dio(inode, iocb, from))
> +	if (do_direct_io && allow_outplace_dio(inode, iocb, from))
>  		return 0;
>
>  	if (is_inode_flag_set(inode, FI_NO_PREALLOC))
>
On Tue, Nov 26, 2019 at 04:57:19PM +0900, Damien Le Moal wrote:
> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> index 5755e897a5f0..8ac2d3b70022 100644
> --- a/fs/f2fs/data.c
> +++ b/fs/f2fs/data.c
> @@ -1073,6 +1073,8 @@ int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *from)
>  	int flag;
>  	int err = 0;
>  	bool direct_io = iocb->ki_flags & IOCB_DIRECT;
> +	bool do_direct_io = direct_io &&
> +		!f2fs_force_buffered_io(inode, iocb, from);

I don't think this is the right fix. The proper fix is to clear
IOCB_DIRECT when falling back to buffered I/O, preferably in the
filemap.c helpers as well.
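For readers following along, the suggestion above (clear IOCB_DIRECT at the point where the fallback is decided, so later code that tests the flag sees a buffered request) can be sketched in user space roughly as follows. This is only an illustration under stated assumptions: the struct is a minimal stand-in for the kernel's struct kiocb, the flag value is a mock, and fallback_to_buffered() and is_direct() are hypothetical helper names, not functions from f2fs or filemap.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Mock flag value for illustration only; the real one is defined by the kernel. */
#define IOCB_DIRECT (1 << 0)

/* Minimal stand-in for the kernel's struct kiocb (assumption). */
struct kiocb {
	int ki_flags;
};

/*
 * Hypothetical helper: once the file system decides that a direct write
 * must be done as buffered I/O, drop IOCB_DIRECT so that every later
 * test of the flag agrees with that decision.
 */
static void fallback_to_buffered(struct kiocb *iocb)
{
	iocb->ki_flags &= ~IOCB_DIRECT;
}

/* Hypothetical helper: the single test later code would rely on. */
static bool is_direct(const struct kiocb *iocb)
{
	return (iocb->ki_flags & IOCB_DIRECT) != 0;
}
```

With this shape there is one source of truth (the flag itself) rather than two predicates that can disagree, which is the inconsistency the original patch description points at.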
On 11/26, Damien Le Moal wrote:
> f2fs_preallocate_blocks() identifies direct IOs using the IOCB_DIRECT
> flag for a kiocb structure. However, the file system direct IO handler
> function f2fs_direct_IO() may have decided that a direct IO has to be
> exececuted as a buffered IO using the function f2fs_force_buffered_io().
> This is the case for instance for volumes including zoned block device
> and for unaligned write IOs with LFS mode enabled.
>
> These 2 different methods of identifying direct IOs can result in
> inconsistencies generating stale data access for direct reads after a
> direct IO write that is treated as a buffered write. Fix this
> inconsistency by combining the IOCB_DIRECT flag test with the result
> of f2fs_force_buffered_io().
>
> Reported-by: Javier Gonzalez <javier@javigon.com>
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> ---
>  fs/f2fs/data.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> index 5755e897a5f0..8ac2d3b70022 100644
> --- a/fs/f2fs/data.c
> +++ b/fs/f2fs/data.c
> @@ -1073,6 +1073,8 @@ int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *from)
>  	int flag;
>  	int err = 0;
>  	bool direct_io = iocb->ki_flags & IOCB_DIRECT;
> +	bool do_direct_io = direct_io &&
> +		!f2fs_force_buffered_io(inode, iocb, from);
>
>  	/* convert inline data for Direct I/O*/
>  	if (direct_io) {
> @@ -1081,7 +1083,7 @@ int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *from)
>  			return err;
>  	}
>
> -	if (direct_io && allow_outplace_dio(inode, iocb, from))
> +	if (do_direct_io && allow_outplace_dio(inode, iocb, from))

It seems f2fs_force_buffered_io() includes allow_outplace_dio().

How about this?

---
 fs/f2fs/data.c | 13 -------------
 fs/f2fs/file.c | 35 +++++++++++++++++++++++++----------
 2 files changed, 25 insertions(+), 23 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index a034cd0ce021..fc40a72f7827 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -1180,19 +1180,6 @@ int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *from)
 	int err = 0;
 	bool direct_io = iocb->ki_flags & IOCB_DIRECT;

-	/* convert inline data for Direct I/O*/
-	if (direct_io) {
-		err = f2fs_convert_inline_inode(inode);
-		if (err)
-			return err;
-	}
-
-	if (direct_io && allow_outplace_dio(inode, iocb, from))
-		return 0;
-
-	if (is_inode_flag_set(inode, FI_NO_PREALLOC))
-		return 0;
-
 	map.m_lblk = F2FS_BLK_ALIGN(iocb->ki_pos);
 	map.m_len = F2FS_BYTES_TO_BLK(iocb->ki_pos + iov_iter_count(from));
 	if (map.m_len > map.m_lblk)
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index c0560d62dbee..6b32ac6c3382 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -3386,18 +3386,33 @@ static ssize_t f2fs_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 				ret = -EAGAIN;
 				goto out;
 			}
-		} else {
-			preallocated = true;
-			target_size = iocb->ki_pos + iov_iter_count(from);
+			goto write;
+		}

-			err = f2fs_preallocate_blocks(iocb, from);
-			if (err) {
-				clear_inode_flag(inode, FI_NO_PREALLOC);
-				inode_unlock(inode);
-				ret = err;
-				goto out;
-			}
+		if (is_inode_flag_set(inode, FI_NO_PREALLOC))
+			goto write;
+
+		if (iocb->ki_flags & IOCB_DIRECT) {
+			/* convert inline data for Direct I/O*/
+			err = f2fs_convert_inline_inode(inode);
+			if (err)
+				goto out_err;
+
+			if (!f2fs_force_buffered_io(inode, iocb, from))
+				goto write;
+		}
+		preallocated = true;
+		target_size = iocb->ki_pos + iov_iter_count(from);
+
+		err = f2fs_preallocate_blocks(iocb, from);
+		if (err) {
+out_err:
+			clear_inode_flag(inode, FI_NO_PREALLOC);
+			inode_unlock(inode);
+			ret = err;
+			goto out;
 		}
+write:
 		ret = __generic_file_write_iter(iocb, from);
 		clear_inode_flag(inode, FI_NO_PREALLOC);
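Read as plain control flow, the proposal above moves the "should we preallocate?" decision out of f2fs_preallocate_blocks() and into f2fs_file_write_iter(). A rough user-space model of that decision, for the path that does not take the IOCB_NOWAIT branch, might look like the sketch below. The struct and function names are hypothetical; the three booleans stand in for is_inode_flag_set(inode, FI_NO_PREALLOC), the IOCB_DIRECT test and f2fs_force_buffered_io() respectively.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical condensed view of the inputs the proposal looks at. */
struct write_req {
	bool no_prealloc;    /* FI_NO_PREALLOC is set on the inode */
	bool direct;         /* IOCB_DIRECT is set in iocb->ki_flags */
	bool force_buffered; /* f2fs_force_buffered_io() returned true */
};

/*
 * Mirrors the goto chain of the proposed hunk: jump straight to the
 * write when preallocation is disabled or when the request stays a
 * real direct I/O; otherwise preallocate first.
 */
static bool proposal_preallocates(const struct write_req *req)
{
	if (req->no_prealloc)
		return false;
	if (req->direct && !req->force_buffered)
		return false;
	return true;
}
```

Note that allow_outplace_dio() does not appear in this version of the decision at all, which follows from the remark that f2fs_force_buffered_io() seems to include it.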
On 2019/11/26 17:34, Ritesh Harjani wrote:
> Hello Damien,
>
> IIUC, you are trying to fix a stale data read by DIO read for the case
> you explained in your patch w.r.t. DIO-write forced to write as buffIO.
>
> Coincidentally I was just looking at the same code path just now.
> So I do have a query to you/f2fs group. Below could be silly one, as I
> don't understand F2FS in great detail.
>
> How is the stale data by DIO read, is protected against a mmap
> writes via f2fs_vm_page_mkwrite?
>
> f2fs_vm_page_mkwrite()                  f2fs_direct_IO (read)
>                                         filemap_write_and_wait_range()
> -> f2fs_get_blocks()
>                                         -> submit_bio()
>
> -> set_page_dirty()
>
> Is above race possible with current f2fs code?
> i.e. f2fs_direct_IO could read the stale data from the blocks
> which were allocated due to mmap fault?

The faulted page is locked until the fault is fully processed so direct
IO has to wait for that to complete first.

>
> Am I missing something here?
>
> -ritesh
>
> On 11/26/19 1:27 PM, Damien Le Moal wrote:
>> f2fs_preallocate_blocks() identifies direct IOs using the IOCB_DIRECT
>> flag for a kiocb structure. However, the file system direct IO handler
>> function f2fs_direct_IO() may have decided that a direct IO has to be
>> exececuted as a buffered IO using the function f2fs_force_buffered_io().
>> This is the case for instance for volumes including zoned block device
>> and for unaligned write IOs with LFS mode enabled.
>>
>> These 2 different methods of identifying direct IOs can result in
>> inconsistencies generating stale data access for direct reads after a
>> direct IO write that is treated as a buffered write. Fix this
>> inconsistency by combining the IOCB_DIRECT flag test with the result
>> of f2fs_force_buffered_io().
>>
>> Reported-by: Javier Gonzalez <javier@javigon.com>
>> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
>> ---
>>  fs/f2fs/data.c | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
>> index 5755e897a5f0..8ac2d3b70022 100644
>> --- a/fs/f2fs/data.c
>> +++ b/fs/f2fs/data.c
>> @@ -1073,6 +1073,8 @@ int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *from)
>>  	int flag;
>>  	int err = 0;
>>  	bool direct_io = iocb->ki_flags & IOCB_DIRECT;
>> +	bool do_direct_io = direct_io &&
>> +		!f2fs_force_buffered_io(inode, iocb, from);
>>
>>  	/* convert inline data for Direct I/O*/
>>  	if (direct_io) {
>> @@ -1081,7 +1083,7 @@ int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *from)
>>  			return err;
>>  	}
>>
>> -	if (direct_io && allow_outplace_dio(inode, iocb, from))
>> +	if (do_direct_io && allow_outplace_dio(inode, iocb, from))
>>  		return 0;
>>
>>  	if (is_inode_flag_set(inode, FI_NO_PREALLOC))
>>
>
>
On Nov 26, 2019 / 15:44, Jaegeuk Kim wrote:
> On 11/26, Damien Le Moal wrote:
> > f2fs_preallocate_blocks() identifies direct IOs using the IOCB_DIRECT
> > flag for a kiocb structure. However, the file system direct IO handler
> > function f2fs_direct_IO() may have decided that a direct IO has to be
> > exececuted as a buffered IO using the function f2fs_force_buffered_io().
> > This is the case for instance for volumes including zoned block device
> > and for unaligned write IOs with LFS mode enabled.
> >
> > These 2 different methods of identifying direct IOs can result in
> > inconsistencies generating stale data access for direct reads after a
> > direct IO write that is treated as a buffered write. Fix this
> > inconsistency by combining the IOCB_DIRECT flag test with the result
> > of f2fs_force_buffered_io().
> >
> > Reported-by: Javier Gonzalez <javier@javigon.com>
> > Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> > ---
> >  fs/f2fs/data.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> > index 5755e897a5f0..8ac2d3b70022 100644
> > --- a/fs/f2fs/data.c
> > +++ b/fs/f2fs/data.c
> > @@ -1073,6 +1073,8 @@ int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *from)
> >  	int flag;
> >  	int err = 0;
> >  	bool direct_io = iocb->ki_flags & IOCB_DIRECT;
> > +	bool do_direct_io = direct_io &&
> > +		!f2fs_force_buffered_io(inode, iocb, from);
> >
> >  	/* convert inline data for Direct I/O*/
> >  	if (direct_io) {
> > @@ -1081,7 +1083,7 @@ int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *from)
> >  			return err;
> >  	}
> >
> > -	if (direct_io && allow_outplace_dio(inode, iocb, from))
> > +	if (do_direct_io && allow_outplace_dio(inode, iocb, from))
>
> It seems f2fs_force_buffered_io() includes allow_outplace_dio().
>
> How about this?

Thanks. I confirmed that the issue is gone with your patch.

Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>

> ---
>  fs/f2fs/data.c | 13 -------------
>  fs/f2fs/file.c | 35 +++++++++++++++++++++++++----------
>  2 files changed, 25 insertions(+), 23 deletions(-)
>
> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> index a034cd0ce021..fc40a72f7827 100644
> --- a/fs/f2fs/data.c
> +++ b/fs/f2fs/data.c
> @@ -1180,19 +1180,6 @@ int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *from)
>  	int err = 0;
>  	bool direct_io = iocb->ki_flags & IOCB_DIRECT;
>
> -	/* convert inline data for Direct I/O*/
> -	if (direct_io) {
> -		err = f2fs_convert_inline_inode(inode);
> -		if (err)
> -			return err;
> -	}
> -
> -	if (direct_io && allow_outplace_dio(inode, iocb, from))
> -		return 0;
> -
> -	if (is_inode_flag_set(inode, FI_NO_PREALLOC))
> -		return 0;
> -
>  	map.m_lblk = F2FS_BLK_ALIGN(iocb->ki_pos);
>  	map.m_len = F2FS_BYTES_TO_BLK(iocb->ki_pos + iov_iter_count(from));
>  	if (map.m_len > map.m_lblk)
> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> index c0560d62dbee..6b32ac6c3382 100644
> --- a/fs/f2fs/file.c
> +++ b/fs/f2fs/file.c
> @@ -3386,18 +3386,33 @@ static ssize_t f2fs_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
>  				ret = -EAGAIN;
>  				goto out;
>  			}
> -		} else {
> -			preallocated = true;
> -			target_size = iocb->ki_pos + iov_iter_count(from);
> +			goto write;
> +		}
>
> -			err = f2fs_preallocate_blocks(iocb, from);
> -			if (err) {
> -				clear_inode_flag(inode, FI_NO_PREALLOC);
> -				inode_unlock(inode);
> -				ret = err;
> -				goto out;
> -			}
> +		if (is_inode_flag_set(inode, FI_NO_PREALLOC))
> +			goto write;
> +
> +		if (iocb->ki_flags & IOCB_DIRECT) {
> +			/* convert inline data for Direct I/O*/
> +			err = f2fs_convert_inline_inode(inode);
> +			if (err)
> +				goto out_err;
> +
> +			if (!f2fs_force_buffered_io(inode, iocb, from))
> +				goto write;
> +		}
> +		preallocated = true;
> +		target_size = iocb->ki_pos + iov_iter_count(from);
> +
> +		err = f2fs_preallocate_blocks(iocb, from);
> +		if (err) {
> +out_err:
> +			clear_inode_flag(inode, FI_NO_PREALLOC);
> +			inode_unlock(inode);
> +			ret = err;
> +			goto out;
>  		}
> +write:
>  		ret = __generic_file_write_iter(iocb, from);
>  		clear_inode_flag(inode, FI_NO_PREALLOC);
>
> --
> 2.19.0.605.g01d371f741-goog
>

--
Best Regards,
Shin'ichiro Kawasaki
On 11/28/19 7:40 AM, Damien Le Moal wrote:
> On 2019/11/26 17:34, Ritesh Harjani wrote:
>> Hello Damien,
>>
>> IIUC, you are trying to fix a stale data read by DIO read for the case
>> you explained in your patch w.r.t. DIO-write forced to write as buffIO.
>>
>> Coincidentally I was just looking at the same code path just now.
>> So I do have a query to you/f2fs group. Below could be silly one, as I
>> don't understand F2FS in great detail.
>>
>> How is the stale data by DIO read, is protected against a mmap
>> writes via f2fs_vm_page_mkwrite?
>>
>> f2fs_vm_page_mkwrite()                  f2fs_direct_IO (read)
>>                                         filemap_write_and_wait_range()
>> -> f2fs_get_blocks()
>>                                         -> submit_bio()
>>
>> -> set_page_dirty()
>>
>> Is above race possible with current f2fs code?
>> i.e. f2fs_direct_IO could read the stale data from the blocks
>> which were allocated due to mmap fault?
>
> The faulted page is locked until the fault is fully processed so direct
> IO has to wait for that to complete first.

How about below parallelism?

f2fs_vm_page_mkwrite()                  f2fs_direct_IO (read)
                                        filemap_write_and_wait_range()
-> down_read(->i_mmap_sem);
-> lock_page()
-> f2fs_get_blocks()
                                        -> submit_bio()
-> set_page_dirty()

Can above DIO read not expose the stale data from block which was
allocated in f2fs_vm_page_mkwrite path?

>
>>
>> Am I missing something here?
>>
>> -ritesh
>>
>> On 11/26/19 1:27 PM, Damien Le Moal wrote:
>>> f2fs_preallocate_blocks() identifies direct IOs using the IOCB_DIRECT
>>> flag for a kiocb structure. However, the file system direct IO handler
>>> function f2fs_direct_IO() may have decided that a direct IO has to be
>>> exececuted as a buffered IO using the function f2fs_force_buffered_io().
>>> This is the case for instance for volumes including zoned block device
>>> and for unaligned write IOs with LFS mode enabled.
>>>
>>> These 2 different methods of identifying direct IOs can result in
>>> inconsistencies generating stale data access for direct reads after a
>>> direct IO write that is treated as a buffered write. Fix this
>>> inconsistency by combining the IOCB_DIRECT flag test with the result
>>> of f2fs_force_buffered_io().
>>>
>>> Reported-by: Javier Gonzalez <javier@javigon.com>
>>> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
>>> ---
>>>  fs/f2fs/data.c | 4 +++-
>>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
>>> index 5755e897a5f0..8ac2d3b70022 100644
>>> --- a/fs/f2fs/data.c
>>> +++ b/fs/f2fs/data.c
>>> @@ -1073,6 +1073,8 @@ int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *from)
>>>  	int flag;
>>>  	int err = 0;
>>>  	bool direct_io = iocb->ki_flags & IOCB_DIRECT;
>>> +	bool do_direct_io = direct_io &&
>>> +		!f2fs_force_buffered_io(inode, iocb, from);
>>>
>>>  	/* convert inline data for Direct I/O*/
>>>  	if (direct_io) {
>>> @@ -1081,7 +1083,7 @@ int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *from)
>>>  			return err;
>>>  	}
>>>
>>> -	if (direct_io && allow_outplace_dio(inode, iocb, from))
>>> +	if (do_direct_io && allow_outplace_dio(inode, iocb, from))
>>>  		return 0;
>>>
>>>  	if (is_inode_flag_set(inode, FI_NO_PREALLOC))
>>>
>>
>>
>
>
On 26.11.2019 15:44, Jaegeuk Kim wrote:
>On 11/26, Damien Le Moal wrote:
>> f2fs_preallocate_blocks() identifies direct IOs using the IOCB_DIRECT
>> flag for a kiocb structure. However, the file system direct IO handler
>> function f2fs_direct_IO() may have decided that a direct IO has to be
>> exececuted as a buffered IO using the function f2fs_force_buffered_io().
>> This is the case for instance for volumes including zoned block device
>> and for unaligned write IOs with LFS mode enabled.
>>
>> These 2 different methods of identifying direct IOs can result in
>> inconsistencies generating stale data access for direct reads after a
>> direct IO write that is treated as a buffered write. Fix this
>> inconsistency by combining the IOCB_DIRECT flag test with the result
>> of f2fs_force_buffered_io().
>>
>> Reported-by: Javier Gonzalez <javier@javigon.com>
>> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
>> ---
>>  fs/f2fs/data.c | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
>> index 5755e897a5f0..8ac2d3b70022 100644
>> --- a/fs/f2fs/data.c
>> +++ b/fs/f2fs/data.c
>> @@ -1073,6 +1073,8 @@ int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *from)
>>  	int flag;
>>  	int err = 0;
>>  	bool direct_io = iocb->ki_flags & IOCB_DIRECT;
>> +	bool do_direct_io = direct_io &&
>> +		!f2fs_force_buffered_io(inode, iocb, from);
>>
>>  	/* convert inline data for Direct I/O*/
>>  	if (direct_io) {
>> @@ -1081,7 +1083,7 @@ int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *from)
>>  			return err;
>>  	}
>>
>> -	if (direct_io && allow_outplace_dio(inode, iocb, from))
>> +	if (do_direct_io && allow_outplace_dio(inode, iocb, from))
>
>It seems f2fs_force_buffered_io() includes allow_outplace_dio().
>
>How about this?
>
>---
> fs/f2fs/data.c | 13 -------------
> fs/f2fs/file.c | 35 +++++++++++++++++++++++++----------
> 2 files changed, 25 insertions(+), 23 deletions(-)
>
>diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
>index a034cd0ce021..fc40a72f7827 100644
>--- a/fs/f2fs/data.c
>+++ b/fs/f2fs/data.c
>@@ -1180,19 +1180,6 @@ int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *from)
> 	int err = 0;
> 	bool direct_io = iocb->ki_flags & IOCB_DIRECT;
>
>-	/* convert inline data for Direct I/O*/
>-	if (direct_io) {
>-		err = f2fs_convert_inline_inode(inode);
>-		if (err)
>-			return err;
>-	}
>-
>-	if (direct_io && allow_outplace_dio(inode, iocb, from))
>-		return 0;
>-
>-	if (is_inode_flag_set(inode, FI_NO_PREALLOC))
>-		return 0;
>-
> 	map.m_lblk = F2FS_BLK_ALIGN(iocb->ki_pos);
> 	map.m_len = F2FS_BYTES_TO_BLK(iocb->ki_pos + iov_iter_count(from));
> 	if (map.m_len > map.m_lblk)
>diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
>index c0560d62dbee..6b32ac6c3382 100644
>--- a/fs/f2fs/file.c
>+++ b/fs/f2fs/file.c
>@@ -3386,18 +3386,33 @@ static ssize_t f2fs_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
> 				ret = -EAGAIN;
> 				goto out;
> 			}
>-		} else {
>-			preallocated = true;
>-			target_size = iocb->ki_pos + iov_iter_count(from);
>+			goto write;
>+		}
>
>-			err = f2fs_preallocate_blocks(iocb, from);
>-			if (err) {
>-				clear_inode_flag(inode, FI_NO_PREALLOC);
>-				inode_unlock(inode);
>-				ret = err;
>-				goto out;
>-			}
>+		if (is_inode_flag_set(inode, FI_NO_PREALLOC))
>+			goto write;
>+
>+		if (iocb->ki_flags & IOCB_DIRECT) {
>+			/* convert inline data for Direct I/O*/
>+			err = f2fs_convert_inline_inode(inode);
>+			if (err)
>+				goto out_err;
>+
>+			if (!f2fs_force_buffered_io(inode, iocb, from))
>+				goto write;
>+		}
>+		preallocated = true;
>+		target_size = iocb->ki_pos + iov_iter_count(from);
>+
>+		err = f2fs_preallocate_blocks(iocb, from);
>+		if (err) {
>+out_err:
>+			clear_inode_flag(inode, FI_NO_PREALLOC);
>+			inode_unlock(inode);
>+			ret = err;
>+			goto out;
> 		}
>+write:
> 		ret = __generic_file_write_iter(iocb, from);
> 		clear_inode_flag(inode, FI_NO_PREALLOC);
>
>--
>2.19.0.605.g01d371f741-goog
>

This also addresses the original problem.

Tested-by: Javier González <javier@javigon.com>
On 2019/11/27 7:44, Jaegeuk Kim wrote:
> On 11/26, Damien Le Moal wrote:
>> f2fs_preallocate_blocks() identifies direct IOs using the IOCB_DIRECT
>> flag for a kiocb structure. However, the file system direct IO handler
>> function f2fs_direct_IO() may have decided that a direct IO has to be
>> exececuted as a buffered IO using the function f2fs_force_buffered_io().
>> This is the case for instance for volumes including zoned block device
>> and for unaligned write IOs with LFS mode enabled.
>>
>> These 2 different methods of identifying direct IOs can result in
>> inconsistencies generating stale data access for direct reads after a
>> direct IO write that is treated as a buffered write. Fix this
>> inconsistency by combining the IOCB_DIRECT flag test with the result
>> of f2fs_force_buffered_io().
>>
>> Reported-by: Javier Gonzalez <javier@javigon.com>
>> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
>> ---
>>  fs/f2fs/data.c | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
>> index 5755e897a5f0..8ac2d3b70022 100644
>> --- a/fs/f2fs/data.c
>> +++ b/fs/f2fs/data.c
>> @@ -1073,6 +1073,8 @@ int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *from)
>>  	int flag;
>>  	int err = 0;
>>  	bool direct_io = iocb->ki_flags & IOCB_DIRECT;
>> +	bool do_direct_io = direct_io &&
>> +		!f2fs_force_buffered_io(inode, iocb, from);
>>
>>  	/* convert inline data for Direct I/O*/
>>  	if (direct_io) {
>> @@ -1081,7 +1083,7 @@ int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *from)
>>  			return err;
>>  	}
>>
>> -	if (direct_io && allow_outplace_dio(inode, iocb, from))
>> +	if (do_direct_io && allow_outplace_dio(inode, iocb, from))
>
> It seems f2fs_force_buffered_io() includes allow_outplace_dio().
>
> How about this?
>
> ---
>  fs/f2fs/data.c | 13 -------------
>  fs/f2fs/file.c | 35 +++++++++++++++++++++++++----------
>  2 files changed, 25 insertions(+), 23 deletions(-)
>
> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> index a034cd0ce021..fc40a72f7827 100644
> --- a/fs/f2fs/data.c
> +++ b/fs/f2fs/data.c
> @@ -1180,19 +1180,6 @@ int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *from)
>  	int err = 0;
>  	bool direct_io = iocb->ki_flags & IOCB_DIRECT;
>
> -	/* convert inline data for Direct I/O*/
> -	if (direct_io) {
> -		err = f2fs_convert_inline_inode(inode);
> -		if (err)
> -			return err;
> -	}
> -
> -	if (direct_io && allow_outplace_dio(inode, iocb, from))
> -		return 0;
> -
> -	if (is_inode_flag_set(inode, FI_NO_PREALLOC))
> -		return 0;
> -
>  	map.m_lblk = F2FS_BLK_ALIGN(iocb->ki_pos);
>  	map.m_len = F2FS_BYTES_TO_BLK(iocb->ki_pos + iov_iter_count(from));
>  	if (map.m_len > map.m_lblk)
> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> index c0560d62dbee..6b32ac6c3382 100644
> --- a/fs/f2fs/file.c
> +++ b/fs/f2fs/file.c
> @@ -3386,18 +3386,33 @@ static ssize_t f2fs_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
>  				ret = -EAGAIN;
>  				goto out;
>  			}
> -		} else {
> -			preallocated = true;
> -			target_size = iocb->ki_pos + iov_iter_count(from);
> +			goto write;
> +		}
>
> -			err = f2fs_preallocate_blocks(iocb, from);
> -			if (err) {
> -				clear_inode_flag(inode, FI_NO_PREALLOC);
> -				inode_unlock(inode);
> -				ret = err;
> -				goto out;
> -			}
> +		if (is_inode_flag_set(inode, FI_NO_PREALLOC))
> +			goto write;
> +
> +		if (iocb->ki_flags & IOCB_DIRECT) {
> +			/* convert inline data for Direct I/O*/

Minor thing.

I/O */

> +			err = f2fs_convert_inline_inode(inode);
> +			if (err)
> +				goto out_err;
> +
> +			if (!f2fs_force_buffered_io(inode, iocb, from))
> +				goto write;

We can call f2fs_convert_inline_inode() here to avoid unneeded inline
conversion.

Thanks,

> +		}
> +		preallocated = true;
> +		target_size = iocb->ki_pos + iov_iter_count(from);
> +
> +		err = f2fs_preallocate_blocks(iocb, from);
> +		if (err) {
> +out_err:
> +			clear_inode_flag(inode, FI_NO_PREALLOC);
> +			inode_unlock(inode);
> +			ret = err;
> +			goto out;
>  		}
> +write:
>  		ret = __generic_file_write_iter(iocb, from);
>  		clear_inode_flag(inode, FI_NO_PREALLOC);
>
>
Thank you for checking the patch.
I found some regressions in xfstests, so want to follow the Damien's one
like below.

Thanks,

===
From 9df6f09e3a09ed804aba4b56ff7cd9524c002e69 Mon Sep 17 00:00:00 2001
From: Jaegeuk Kim <jaegeuk@kernel.org>
Date: Tue, 26 Nov 2019 15:01:42 -0800
Subject: [PATCH] f2fs: preallocate DIO blocks when forcing buffered_io

The previous preallocation and DIO decision like below.

                         allow_outplace_dio              !allow_outplace_dio
f2fs_force_buffered_io   (*) No_Prealloc / Buffered_IO   Prealloc / Buffered_IO
!f2fs_force_buffered_io  No_Prealloc / DIO               Prealloc / DIO

But, Javier reported Case (*) where zoned device bypassed preallocation but
fell back to buffered writes in f2fs_direct_IO(), resulting in stale data
being read.

In order to fix the issue, actually we need to preallocate blocks whenever
we fall back to buffered IO like this. No change is made in the other cases.

                         allow_outplace_dio              !allow_outplace_dio
f2fs_force_buffered_io   (*) Prealloc / Buffered_IO      Prealloc / Buffered_IO
!f2fs_force_buffered_io  No_Prealloc / DIO               Prealloc / DIO

Reported-and-tested-by: Javier Gonzalez <javier@javigon.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
---
 fs/f2fs/data.c | 13 -------------
 fs/f2fs/file.c | 43 +++++++++++++++++++++++++++++++++----------
 2 files changed, 33 insertions(+), 23 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index a034cd0ce021..fc40a72f7827 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -1180,19 +1180,6 @@ int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *from)
 	int err = 0;
 	bool direct_io = iocb->ki_flags & IOCB_DIRECT;

-	/* convert inline data for Direct I/O*/
-	if (direct_io) {
-		err = f2fs_convert_inline_inode(inode);
-		if (err)
-			return err;
-	}
-
-	if (direct_io && allow_outplace_dio(inode, iocb, from))
-		return 0;
-
-	if (is_inode_flag_set(inode, FI_NO_PREALLOC))
-		return 0;
-
 	map.m_lblk = F2FS_BLK_ALIGN(iocb->ki_pos);
 	map.m_len = F2FS_BYTES_TO_BLK(iocb->ki_pos + iov_iter_count(from));
 	if (map.m_len > map.m_lblk)
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index c0560d62dbee..0e1b12a4a4d6 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -3386,18 +3386,41 @@ static ssize_t f2fs_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 				ret = -EAGAIN;
 				goto out;
 			}
-		} else {
-			preallocated = true;
-			target_size = iocb->ki_pos + iov_iter_count(from);
+			goto write;
+		}

-			err = f2fs_preallocate_blocks(iocb, from);
-			if (err) {
-				clear_inode_flag(inode, FI_NO_PREALLOC);
-				inode_unlock(inode);
-				ret = err;
-				goto out;
-			}
+		if (is_inode_flag_set(inode, FI_NO_PREALLOC))
+			goto write;
+
+		if (iocb->ki_flags & IOCB_DIRECT) {
+			/*
+			 * Convert inline data for Direct I/O before entering
+			 * f2fs_direct_IO().
+			 */
+			err = f2fs_convert_inline_inode(inode);
+			if (err)
+				goto out_err;
+			/*
+			 * If force_buffere_io() is true, we have to allocate
+			 * blocks all the time, since f2fs_direct_IO will fall
+			 * back to buffered IO.
+			 */
+			if (!f2fs_force_buffered_io(inode, iocb, from) &&
+					allow_outplace_dio(inode, iocb, from))
+				goto write;
+		}
+		preallocated = true;
+		target_size = iocb->ki_pos + iov_iter_count(from);
+
+		err = f2fs_preallocate_blocks(iocb, from);
+		if (err) {
+out_err:
+			clear_inode_flag(inode, FI_NO_PREALLOC);
+			inode_unlock(inode);
+			ret = err;
+			goto out;
 		}
+write:
 		ret = __generic_file_write_iter(iocb, from);
 		clear_inode_flag(inode, FI_NO_PREALLOC);
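The second decision table in the commit message above can be checked against the new condition in f2fs_file_write_iter() with a small user-space model. The sketch below is an illustration only: it covers a write that has IOCB_DIRECT set and FI_NO_PREALLOC clear, and struct decision, enum io_path and dio_write_decision() are hypothetical names. The two booleans stand in for the results of f2fs_force_buffered_io() and allow_outplace_dio().

```c
#include <assert.h>
#include <stdbool.h>

enum io_path { BUFFERED_IO, DIRECT_IO };

struct decision {
	bool prealloc;     /* f2fs_preallocate_blocks() is called before the write */
	enum io_path path; /* how the write is eventually carried out */
};

/*
 * Model of the patched logic for a write with IOCB_DIRECT set and
 * FI_NO_PREALLOC clear: preallocation is skipped only when the write
 * stays a real direct I/O and out-of-place DIO is allowed.
 */
static struct decision dio_write_decision(bool force_buffered, bool allow_outplace)
{
	struct decision d;

	d.path = force_buffered ? BUFFERED_IO : DIRECT_IO;
	d.prealloc = !(!force_buffered && allow_outplace);
	return d;
}
```

Evaluating the four input combinations reproduces the four cells of the table, including the (*) case, which now preallocates before falling back to buffered IO.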
On 2019/12/4 1:33, Jaegeuk Kim wrote:
> Thank you for checking the patch.
> I found some regressions in xfstests, so want to follow the Damien's one
> like below.
>
> Thanks,
>
> ===
> From 9df6f09e3a09ed804aba4b56ff7cd9524c002e69 Mon Sep 17 00:00:00 2001
> From: Jaegeuk Kim <jaegeuk@kernel.org>
> Date: Tue, 26 Nov 2019 15:01:42 -0800
> Subject: [PATCH] f2fs: preallocate DIO blocks when forcing buffered_io
>
> The previous preallocation and DIO decision like below.
>
>                          allow_outplace_dio              !allow_outplace_dio
> f2fs_force_buffered_io   (*) No_Prealloc / Buffered_IO   Prealloc / Buffered_IO
> !f2fs_force_buffered_io  No_Prealloc / DIO               Prealloc / DIO
>
> But, Javier reported Case (*) where zoned device bypassed preallocation but
> fell back to buffered writes in f2fs_direct_IO(), resulting in stale data
> being read.
>
> In order to fix the issue, actually we need to preallocate blocks whenever
> we fall back to buffered IO like this. No change is made in the other cases.
>
>                          allow_outplace_dio              !allow_outplace_dio
> f2fs_force_buffered_io   (*) Prealloc / Buffered_IO      Prealloc / Buffered_IO
> !f2fs_force_buffered_io  No_Prealloc / DIO               Prealloc / DIO
>
> Reported-and-tested-by: Javier Gonzalez <javier@javigon.com>
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

Reviewed-by: Chao Yu <yuchao0@huawei.com>

Thanks,
On Dec 03, 2019 / 09:33, Jaegeuk Kim wrote:
> Thank you for checking the patch.
> I found some regressions in xfstests, so want to follow the Damien's one
> like below.
>
> Thanks,
>
> ===
> From 9df6f09e3a09ed804aba4b56ff7cd9524c002e69 Mon Sep 17 00:00:00 2001
> From: Jaegeuk Kim <jaegeuk@kernel.org>
> Date: Tue, 26 Nov 2019 15:01:42 -0800
> Subject: [PATCH] f2fs: preallocate DIO blocks when forcing buffered_io
>
> The previous preallocation and DIO decision like below.
>
>                          allow_outplace_dio              !allow_outplace_dio
> f2fs_force_buffered_io   (*) No_Prealloc / Buffered_IO   Prealloc / Buffered_IO
> !f2fs_force_buffered_io  No_Prealloc / DIO               Prealloc / DIO
>
> But, Javier reported Case (*) where zoned device bypassed preallocation but
> fell back to buffered writes in f2fs_direct_IO(), resulting in stale data
> being read.
>
> In order to fix the issue, actually we need to preallocate blocks whenever
> we fall back to buffered IO like this. No change is made in the other cases.
>
>                          allow_outplace_dio              !allow_outplace_dio
> f2fs_force_buffered_io   (*) Prealloc / Buffered_IO      Prealloc / Buffered_IO
> !f2fs_force_buffered_io  No_Prealloc / DIO               Prealloc / DIO
>
> Reported-and-tested-by: Javier Gonzalez <javier@javigon.com>
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

Using SMR disks, I reconfirmed that the reported problem goes away with this
modified patch also. Thanks.

> ---
>  fs/f2fs/data.c | 13 -------------
>  fs/f2fs/file.c | 43 +++++++++++++++++++++++++++++++++----------
>  2 files changed, 33 insertions(+), 23 deletions(-)
>
> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> index a034cd0ce021..fc40a72f7827 100644
> --- a/fs/f2fs/data.c
> +++ b/fs/f2fs/data.c
> @@ -1180,19 +1180,6 @@ int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *from)
>  	int err = 0;
>  	bool direct_io = iocb->ki_flags & IOCB_DIRECT;
>
> -	/* convert inline data for Direct I/O*/
> -	if (direct_io) {
> -		err = f2fs_convert_inline_inode(inode);
> -		if (err)
> -			return err;
> -	}
> -
> -	if (direct_io && allow_outplace_dio(inode, iocb, from))
> -		return 0;
> -
> -	if (is_inode_flag_set(inode, FI_NO_PREALLOC))
> -		return 0;
> -
>  	map.m_lblk = F2FS_BLK_ALIGN(iocb->ki_pos);
>  	map.m_len = F2FS_BYTES_TO_BLK(iocb->ki_pos + iov_iter_count(from));
>  	if (map.m_len > map.m_lblk)
> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> index c0560d62dbee..0e1b12a4a4d6 100644
> --- a/fs/f2fs/file.c
> +++ b/fs/f2fs/file.c
> @@ -3386,18 +3386,41 @@ static ssize_t f2fs_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
>  				ret = -EAGAIN;
>  				goto out;
>  			}
> -		} else {
> -			preallocated = true;
> -			target_size = iocb->ki_pos + iov_iter_count(from);
> +			goto write;
> +		}
>
> -			err = f2fs_preallocate_blocks(iocb, from);
> -			if (err) {
> -				clear_inode_flag(inode, FI_NO_PREALLOC);
> -				inode_unlock(inode);
> -				ret = err;
> -				goto out;
> -			}
> +		if (is_inode_flag_set(inode, FI_NO_PREALLOC))
> +			goto write;
> +
> +		if (iocb->ki_flags & IOCB_DIRECT) {
> +			/*
> +			 * Convert inline data for Direct I/O before entering
> +			 * f2fs_direct_IO().
> +			 */
> +			err = f2fs_convert_inline_inode(inode);
> +			if (err)
> +				goto out_err;
> +			/*
> +			 * If force_buffere_io() is true, we have to allocate
> +			 * blocks all the time, since f2fs_direct_IO will fall
> +			 * back to buffered IO.
> +			 */
> +			if (!f2fs_force_buffered_io(inode, iocb, from) &&
> +					allow_outplace_dio(inode, iocb, from))
> +				goto write;
> +		}
> +		preallocated = true;
> +		target_size = iocb->ki_pos + iov_iter_count(from);
> +
> +		err = f2fs_preallocate_blocks(iocb, from);
> +		if (err) {
> +out_err:
> +			clear_inode_flag(inode, FI_NO_PREALLOC);
> +			inode_unlock(inode);
> +			ret = err;
> +			goto out;
>  		}
> +write:
>  		ret = __generic_file_write_iter(iocb, from);
>  		clear_inode_flag(inode, FI_NO_PREALLOC);
>
> --
> 2.19.0.605.g01d371f741-goog
>
>

--
Best Regards,
Shin'ichiro Kawasaki
On 03.12.2019 09:33, Jaegeuk Kim wrote:
>Thank you for checking the patch.
>I found some regressions in xfstests, so want to follow the Damien's one
>like below.
>
>Thanks,
>
>===
>From 9df6f09e3a09ed804aba4b56ff7cd9524c002e69 Mon Sep 17 00:00:00 2001
>From: Jaegeuk Kim <jaegeuk@kernel.org>
>Date: Tue, 26 Nov 2019 15:01:42 -0800
>Subject: [PATCH] f2fs: preallocate DIO blocks when forcing buffered_io
>
>The previous preallocation and DIO decision like below.
>
>                         allow_outplace_dio              !allow_outplace_dio
>f2fs_force_buffered_io   (*) No_Prealloc / Buffered_IO   Prealloc / Buffered_IO
>!f2fs_force_buffered_io  No_Prealloc / DIO               Prealloc / DIO
>
>But, Javier reported Case (*) where zoned device bypassed preallocation but
>fell back to buffered writes in f2fs_direct_IO(), resulting in stale data
>being read.
>
>In order to fix the issue, actually we need to preallocate blocks whenever
>we fall back to buffered IO like this. No change is made in the other cases.
>
>                         allow_outplace_dio              !allow_outplace_dio
>f2fs_force_buffered_io   (*) Prealloc / Buffered_IO      Prealloc / Buffered_IO
>!f2fs_force_buffered_io  No_Prealloc / DIO               Prealloc / DIO
>
>Reported-and-tested-by: Javier Gonzalez <javier@javigon.com>
>Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
>Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
>Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
>---
> fs/f2fs/data.c | 13 -------------
> fs/f2fs/file.c | 43 +++++++++++++++++++++++++++++++++----------
> 2 files changed, 33 insertions(+), 23 deletions(-)
>
>diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
>index a034cd0ce021..fc40a72f7827 100644
>--- a/fs/f2fs/data.c
>+++ b/fs/f2fs/data.c
>@@ -1180,19 +1180,6 @@ int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *from)
> 	int err = 0;
> 	bool direct_io = iocb->ki_flags & IOCB_DIRECT;
>
>-	/* convert inline data for Direct I/O*/
>-	if (direct_io) {
>-		err = f2fs_convert_inline_inode(inode);
>-		if (err)
>-			return err;
>-	}
>-
>-	if (direct_io && allow_outplace_dio(inode, iocb, from))
>-		return 0;
>-
>-	if (is_inode_flag_set(inode, FI_NO_PREALLOC))
>-		return 0;
>-
> 	map.m_lblk = F2FS_BLK_ALIGN(iocb->ki_pos);
> 	map.m_len = F2FS_BYTES_TO_BLK(iocb->ki_pos + iov_iter_count(from));
> 	if (map.m_len > map.m_lblk)
>diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
>index c0560d62dbee..0e1b12a4a4d6 100644
>--- a/fs/f2fs/file.c
>+++ b/fs/f2fs/file.c
>@@ -3386,18 +3386,41 @@ static ssize_t f2fs_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
> 				ret = -EAGAIN;
> 				goto out;
> 			}
>-		} else {
>-			preallocated = true;
>-			target_size = iocb->ki_pos + iov_iter_count(from);
>+			goto write;
>+		}
>
>-			err = f2fs_preallocate_blocks(iocb, from);
>-			if (err) {
>-				clear_inode_flag(inode, FI_NO_PREALLOC);
>-				inode_unlock(inode);
>-				ret = err;
>-				goto out;
>-			}
>+		if (is_inode_flag_set(inode, FI_NO_PREALLOC))
>+			goto write;
>+
>+		if (iocb->ki_flags & IOCB_DIRECT) {
>+			/*
>+			 * Convert inline data for Direct I/O before entering
>+			 * f2fs_direct_IO().
>+			 */
>+			err = f2fs_convert_inline_inode(inode);
>+			if (err)
>+				goto out_err;
>+			/*
>+			 * If force_buffere_io() is true, we have to allocate
>+			 * blocks all the time, since f2fs_direct_IO will fall
>+			 * back to buffered IO.
>+			 */
>+			if (!f2fs_force_buffered_io(inode, iocb, from) &&
>+					allow_outplace_dio(inode, iocb, from))
>+				goto write;
>+		}
>+		preallocated = true;
>+		target_size = iocb->ki_pos + iov_iter_count(from);
>+
>+		err = f2fs_preallocate_blocks(iocb, from);
>+		if (err) {
>+out_err:
>+			clear_inode_flag(inode, FI_NO_PREALLOC);
>+			inode_unlock(inode);
>+			ret = err;
>+			goto out;
> 		}
>+write:
> 		ret = __generic_file_write_iter(iocb, from);
> 		clear_inode_flag(inode, FI_NO_PREALLOC);
>
>--
>2.19.0.605.g01d371f741-goog
>
>

Looks good to me. It also fixes the problem we see on our end.

Reviewed-by: Javier González <javier@javigon.com>
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 5755e897a5f0..8ac2d3b70022 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -1073,6 +1073,8 @@ int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *from)
 	int flag;
 	int err = 0;
 	bool direct_io = iocb->ki_flags & IOCB_DIRECT;
+	bool do_direct_io = direct_io &&
+		!f2fs_force_buffered_io(inode, iocb, from);

 	/* convert inline data for Direct I/O*/
 	if (direct_io) {
@@ -1081,7 +1083,7 @@ int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *from)
 			return err;
 	}

-	if (direct_io && allow_outplace_dio(inode, iocb, from))
+	if (do_direct_io && allow_outplace_dio(inode, iocb, from))
 		return 0;

 	if (is_inode_flag_set(inode, FI_NO_PREALLOC))
f2fs_preallocate_blocks() identifies direct IOs using the IOCB_DIRECT
flag for a kiocb structure. However, the file system direct IO handler
function f2fs_direct_IO() may have decided that a direct IO has to be
executed as a buffered IO using the function f2fs_force_buffered_io().
This is the case for instance for volumes including zoned block devices
and for unaligned write IOs with LFS mode enabled.

These two different methods of identifying direct IOs can result in
inconsistencies generating stale data access for direct reads after a
direct IO write that is treated as a buffered write. Fix this
inconsistency by combining the IOCB_DIRECT flag test with the result
of f2fs_force_buffered_io().

Reported-by: Javier Gonzalez <javier@javigon.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 fs/f2fs/data.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)