diff mbox

[1/2] cephfs: ignore error from invalidate_inode_pages2_range() in direct write.

Message ID 871t15ej9y.fsf@notabene.neil.brown.name (mailing list archive)
State New, archived
Headers show

Commit Message

NeilBrown Aug. 31, 2016, 2:58 a.m. UTC
This call can fail if there are dirty pages.  The preceding call to
filemap_write_and_wait_range() will normally remove dirty pages, but
as inode_lock() is not held over calls to ceph_direct_read_write(), it
could race with non-direct writes and pages could be dirtied
immediately after filemap_write_and_wait_range() returns

If there are dirty pages, they will be removed by the subsequent call
to truncate_inode_pages_range(), so having them here is not a problem.

If the 'ret' value is left holding an error, then in the async IO case
(aio_req is not NULL) the loop that would normally call
ceph_osdc_start_request() will see the error in 'ret' and abort all
requests.  This doesn't seem like correct behaviour.

So clear 'ret' and ignore the error (other than the dout() message).

Signed-off-by: NeilBrown <neilb@suse.com>
---
 fs/ceph/file.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

Comments

Jeff Layton Aug. 31, 2016, 1:47 p.m. UTC | #1
On Wed, 2016-08-31 at 12:58 +1000, NeilBrown wrote:
> This call can fail if there are dirty pages.  The preceding call to
> filemap_write_and_wait_range() will normally remove dirty pages, but
> as inode_lock() is not held over calls to ceph_direct_read_write(), it
> could race with non-direct writes and pages could be dirtied
> immediately after filemap_write_and_wait_range() returns
> 
> If there are dirty pages, they will be removed by the subsequent call
> to truncate_inode_pages_range(), so having them here is not a problem.
> 
> If the 'ret' value is left holding an error, then in the async IO case
> (aio_req is not NULL) the loop that would normally call
> ceph_osdc_start_request() will see the error in 'ret' and abort all
> requests.  This doesn't seem like correct behaviour.
> 
> So clear 'ret' and ignore the error (other than the dout() message).
> 
> > Signed-off-by: NeilBrown <neilb@suse.com>
> ---
>  fs/ceph/file.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> index 0f5375d8e030..1ca6e29edcc9 100644
> --- a/fs/ceph/file.c
> +++ b/fs/ceph/file.c
> @@ -905,8 +905,14 @@ ceph_direct_read_write(struct kiocb *iocb, struct iov_iter *iter,
> >  		ret = invalidate_inode_pages2_range(inode->i_mapping,
> >  					pos >> PAGE_SHIFT,
> >  					(pos + count) >> PAGE_SHIFT);
> > -		if (ret < 0)
> > +		if (ret < 0) {
> >  			dout("invalidate_inode_pages2_range returned %d\n", ret);
> > +			/*
> > +			 * Error is not fatal as we truncate_inode_pages_range()
> > +			 * below.
> > +			 */
> > +			ret = 0;
> > +		}
>  
> >  		flags = CEPH_OSD_FLAG_ORDERSNAP |
> >  			CEPH_OSD_FLAG_ONDISK |


Good catch. Even better might be to just declare a int ret2 and not
clobber "ret" at all.

Clearly, mixing buffered and direct I/O is gross, but I suppose you
could hit the occasional problem here with a real workload
occasionally.

Should this go to stable? The patch seems safe enough.

Reviewed-by: Jeff Layton <jlayton@redhat.com>
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 0f5375d8e030..1ca6e29edcc9 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -905,8 +905,14 @@  ceph_direct_read_write(struct kiocb *iocb, struct iov_iter *iter,
 		ret = invalidate_inode_pages2_range(inode->i_mapping,
 					pos >> PAGE_SHIFT,
 					(pos + count) >> PAGE_SHIFT);
-		if (ret < 0)
+		if (ret < 0) {
 			dout("invalidate_inode_pages2_range returned %d\n", ret);
+			/*
+			 * Error is not fatal as we truncate_inode_pages_range()
+			 * below.
+			 */
+			ret = 0;
+		}
 
 		flags = CEPH_OSD_FLAG_ORDERSNAP |
 			CEPH_OSD_FLAG_ONDISK |