diff mbox

[RFC,v2] fs: __generic_file_splice_read retry lookup on AOP_TRUNCATED_PAGE

Message ID 1450194205-44587-1-git-send-email-adas@redhat.com (mailing list archive)
State New, archived
Headers show

Commit Message

Abhi Das Dec. 15, 2015, 3:43 p.m. UTC
During testing, I discovered that __generic_file_splice_read() returns
0 (EOF) when aops->readpage fails with AOP_TRUNCATED_PAGE on the first
page of a single/multi-page splice read operation. This EOF return code
causes the userspace test to (correctly) report a zero-length read error
when it was expecting otherwise.

The current strategy of returning a partial non-zero read when ->readpage
returns AOP_TRUNCATED_PAGE works only when the failed page is not the
first of the lot being processed.

This patch attempts to retry lookup and call ->readpage again on pages
that had previously failed with AOP_TRUNCATED_PAGE. With this patch, my
tests pass and I haven't noticed any unwanted side effects.

This version fixes a return code issue pointed out by Bob Peterson.

Signed-off-by: Abhi Das <adas@redhat.com>
Cc: Bob Peterson <rpeterso@redhat.com>
---
 fs/splice.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

Comments

Jan Kara Dec. 16, 2015, 3:34 p.m. UTC | #1
On Tue 15-12-15 09:43:25, Abhi Das wrote:
> During testing, I discovered that __generic_file_splice_read() returns
> 0 (EOF) when aops->readpage fails with AOP_TRUNCATED_PAGE on the first
> page of a single/multi-page splice read operation. This EOF return code
> causes the userspace test to (correctly) report a zero-length read error
> when it was expecting otherwise.
> 
> The current strategy of returning a partial non-zero read when ->readpage
> returns AOP_TRUNCATED_PAGE works only when the failed page is not the
> first of the lot being processed.
> 
> This patch attempts to retry lookup and call ->readpage again on pages
> that had previously failed with AOP_TRUNCATED_PAGE. With this patch, my
> tests pass and I haven't noticed any unwanted side effects.
> 
> This version fixes a return code issue pointed out by Bob Peterson.
> 
> Signed-off-by: Abhi Das <adas@redhat.com>
> Cc: Bob Peterson <rpeterso@redhat.com>
> ---
>  fs/splice.c | 13 +++++++------
>  1 file changed, 7 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/splice.c b/fs/splice.c
> index 801c21c..365cd2a 100644
> --- a/fs/splice.c
> +++ b/fs/splice.c
> @@ -387,6 +387,7 @@ __generic_file_splice_read(struct file *in, loff_t *ppos,
>  	spd.nr_pages = 0;
>  	for (page_nr = 0; page_nr < nr_pages; page_nr++) {
>  		unsigned int this_len;
> +		int retries = 0;
>  
>  		if (!len)
>  			break;
> @@ -415,6 +416,7 @@ __generic_file_splice_read(struct file *in, loff_t *ppos,
>  			 */
>  			if (!page->mapping) {
>  				unlock_page(page);
> +retry_lookup:
>  				page = find_or_create_page(mapping, index,
>  						mapping_gfp_mask(mapping));
>  
> @@ -439,14 +441,13 @@ __generic_file_splice_read(struct file *in, loff_t *ppos,
>  			error = mapping->a_ops->readpage(in, page);
>  			if (unlikely(error)) {
>  				/*
> -				 * We really should re-lookup the page here,
> -				 * but it complicates things a lot. Instead
> -				 * lets just do what we already stored, and
> -				 * we'll get it the next time we are called.
> +				 * Re-lookup the page
>  				 */
> -				if (error == AOP_TRUNCATED_PAGE)
> +				if (error == AOP_TRUNCATED_PAGE) {
>  					error = 0;
> -
> +					if (retries++ < 3)
> +						goto retry_lookup;
> +				}

I don't like this retry-three-times loop. That is still leaving the
possibility of 0 return just much less likely (so it will lead to even
weirder and harded to debug failures). IMO we should just terminate the
loop like we did previously if spd.nr_pages > 0 and we retry indefinitely
if it is the first page that failed to read.

								Honza
Abhi Das Dec. 16, 2015, 3:55 p.m. UTC | #2
----- Original Message -----
> From: "Jan Kara" <jack@suse.cz>
> To: "Abhi Das" <adas@redhat.com>
> Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, "Bob Peterson" <rpeterso@redhat.com>
> Sent: Wednesday, December 16, 2015 9:34:04 AM
> Subject: Re: [RFC v2 PATCH] fs: __generic_file_splice_read retry lookup on AOP_TRUNCATED_PAGE
> 
> On Tue 15-12-15 09:43:25, Abhi Das wrote:
> > During testing, I discovered that __generic_file_splice_read() returns
> > 0 (EOF) when aops->readpage fails with AOP_TRUNCATED_PAGE on the first
> > page of a single/multi-page splice read operation. This EOF return code
> > causes the userspace test to (correctly) report a zero-length read error
> > when it was expecting otherwise.
> > 
> > The current strategy of returning a partial non-zero read when ->readpage
> > returns AOP_TRUNCATED_PAGE works only when the failed page is not the
> > first of the lot being processed.
> > 
> > This patch attempts to retry lookup and call ->readpage again on pages
> > that had previously failed with AOP_TRUNCATED_PAGE. With this patch, my
> > tests pass and I haven't noticed any unwanted side effects.
> > 
> > This version fixes a return code issue pointed out by Bob Peterson.
> > 
> > Signed-off-by: Abhi Das <adas@redhat.com>
> > Cc: Bob Peterson <rpeterso@redhat.com>
> > ---
> >  fs/splice.c | 13 +++++++------
> >  1 file changed, 7 insertions(+), 6 deletions(-)
> > 
> > diff --git a/fs/splice.c b/fs/splice.c
> > index 801c21c..365cd2a 100644
> > --- a/fs/splice.c
> > +++ b/fs/splice.c
> > @@ -387,6 +387,7 @@ __generic_file_splice_read(struct file *in, loff_t
> > *ppos,
> >  	spd.nr_pages = 0;
> >  	for (page_nr = 0; page_nr < nr_pages; page_nr++) {
> >  		unsigned int this_len;
> > +		int retries = 0;
> >  
> >  		if (!len)
> >  			break;
> > @@ -415,6 +416,7 @@ __generic_file_splice_read(struct file *in, loff_t
> > *ppos,
> >  			 */
> >  			if (!page->mapping) {
> >  				unlock_page(page);
> > +retry_lookup:
> >  				page = find_or_create_page(mapping, index,
> >  						mapping_gfp_mask(mapping));
> >  
> > @@ -439,14 +441,13 @@ __generic_file_splice_read(struct file *in, loff_t
> > *ppos,
> >  			error = mapping->a_ops->readpage(in, page);
> >  			if (unlikely(error)) {
> >  				/*
> > -				 * We really should re-lookup the page here,
> > -				 * but it complicates things a lot. Instead
> > -				 * lets just do what we already stored, and
> > -				 * we'll get it the next time we are called.
> > +				 * Re-lookup the page
> >  				 */
> > -				if (error == AOP_TRUNCATED_PAGE)
> > +				if (error == AOP_TRUNCATED_PAGE) {
> >  					error = 0;
> > -
> > +					if (retries++ < 3)
> > +						goto retry_lookup;
> > +				}
> 
> I don't like this retry-three-times loop. That is still leaving the
> possibility of 0 return just much less likely (so it will lead to even
> weirder and harded to debug failures). IMO we should just terminate the
> loop like we did previously if spd.nr_pages > 0 and we retry indefinitely
> if it is the first page that failed to read.
> 
> 								Honza
> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR
> 

The 3-retry loop was the thing I was unsure about too. With regards to the
indefinite retry, I was wondering if there's some corner case where we might
get into an infinite retry loop...

If we're doing the retry for the first page, why not for other pages too? Is
it because we'd potentially be increasing the odds for an infinite loop and/or
affecting performance by doing more lookups?

Cheers!
--Abhi
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jan Kara Dec. 16, 2015, 4:47 p.m. UTC | #3
On Wed 16-12-15 10:55:25, Abhijith Das wrote:
> > > @@ -439,14 +441,13 @@ __generic_file_splice_read(struct file *in, loff_t
> > > *ppos,
> > >  			error = mapping->a_ops->readpage(in, page);
> > >  			if (unlikely(error)) {
> > >  				/*
> > > -				 * We really should re-lookup the page here,
> > > -				 * but it complicates things a lot. Instead
> > > -				 * lets just do what we already stored, and
> > > -				 * we'll get it the next time we are called.
> > > +				 * Re-lookup the page
> > >  				 */
> > > -				if (error == AOP_TRUNCATED_PAGE)
> > > +				if (error == AOP_TRUNCATED_PAGE) {
> > >  					error = 0;
> > > -
> > > +					if (retries++ < 3)
> > > +						goto retry_lookup;
> > > +				}
> > 
> > I don't like this retry-three-times loop. That is still leaving the
> > possibility of 0 return just much less likely (so it will lead to even
> > weirder and harded to debug failures). IMO we should just terminate the
> > loop like we did previously if spd.nr_pages > 0 and we retry indefinitely
> > if it is the first page that failed to read.
> 
> The 3-retry loop was the thing I was unsure about too. With regards to the
> indefinite retry, I was wondering if there's some corner case where we might
> get into an infinite retry loop...

Well you have two options:

1) Return incorrect value from splice_read()
2) Retry indefinitely

Option two looks better to me. Also do_generic_file_read() behaves the same
way so mirroring the behavior in __generic_file_splice_read() makes sense.

> If we're doing the retry for the first page, why not for other pages too?
> Is it because we'd potentially be increasing the odds for an infinite
> loop and/or affecting performance by doing more lookups?

Yes, that was my thought. But seeing now that do_generic_file_read()
actually retries indefinitely for every page, I'd just do the same in 
__generic_file_splice_read().

								Honza
diff mbox

Patch

diff --git a/fs/splice.c b/fs/splice.c
index 801c21c..365cd2a 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -387,6 +387,7 @@  __generic_file_splice_read(struct file *in, loff_t *ppos,
 	spd.nr_pages = 0;
 	for (page_nr = 0; page_nr < nr_pages; page_nr++) {
 		unsigned int this_len;
+		int retries = 0;
 
 		if (!len)
 			break;
@@ -415,6 +416,7 @@  __generic_file_splice_read(struct file *in, loff_t *ppos,
 			 */
 			if (!page->mapping) {
 				unlock_page(page);
+retry_lookup:
 				page = find_or_create_page(mapping, index,
 						mapping_gfp_mask(mapping));
 
@@ -439,14 +441,13 @@  __generic_file_splice_read(struct file *in, loff_t *ppos,
 			error = mapping->a_ops->readpage(in, page);
 			if (unlikely(error)) {
 				/*
-				 * We really should re-lookup the page here,
-				 * but it complicates things a lot. Instead
-				 * lets just do what we already stored, and
-				 * we'll get it the next time we are called.
+				 * Re-lookup the page
 				 */
-				if (error == AOP_TRUNCATED_PAGE)
+				if (error == AOP_TRUNCATED_PAGE) {
 					error = 0;
-
+					if (retries++ < 3)
+						goto retry_lookup;
+				}
 				break;
 			}
 		}