mm: Make sendfile(2) killable

Message ID 1444653923-22111-1-git-send-email-jack@suse.com
State New

Commit Message

Jan Kara Oct. 12, 2015, 12:45 p.m. UTC
Currently a simple program below issues a sendfile(2) system call which
takes about 62 days to complete in my test KVM instance.

        int fd;
        off_t off = 0;

        fd = open("file", O_RDWR | O_TRUNC | O_SYNC | O_CREAT, 0644);
        ftruncate(fd, 2);
        lseek(fd, 0, SEEK_END);
        sendfile(fd, fd, &off, 0xfffffff);

Now you should not ask the kernel to do stupid stuff like copying 256MB in
2-byte chunks and calling fsync(2) after each chunk, but if you do, the
sysadmin should have a way to stop you.

We actually do have a check for fatal_signal_pending() in
generic_perform_write() which triggers in this path. However, because we
always succeed in writing something before the check is done, we return
a value > 0 from generic_perform_write() and thus the information about
the signal gets lost.

Fix the problem by doing the signal check before writing anything. That
way generic_perform_write() returns -EINTR, the error gets propagated up
and the sendfile loop terminates early.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Jan Kara <jack@suse.com>
---
 mm/filemap.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

Comments

Andrew Morton Oct. 15, 2015, 8:46 p.m. UTC | #1
On Mon, 12 Oct 2015 14:45:23 +0200 Jan Kara <jack@suse.com> wrote:

> Currently a simple program below issues a sendfile(2) system call which
> takes about 62 days to complete in my test KVM instance.

Geeze some people are impatient.

>         int fd;
>         off_t off = 0;
> 
>         fd = open("file", O_RDWR | O_TRUNC | O_SYNC | O_CREAT, 0644);
>         ftruncate(fd, 2);
>         lseek(fd, 0, SEEK_END);
>         sendfile(fd, fd, &off, 0xfffffff);
> 
> Now you should not ask the kernel to do stupid stuff like copying 256MB in
> 2-byte chunks and calling fsync(2) after each chunk, but if you do, the
> sysadmin should have a way to stop you.
> 
> We actually do have a check for fatal_signal_pending() in
> generic_perform_write() which triggers in this path. However, because we
> always succeed in writing something before the check is done, we return
> a value > 0 from generic_perform_write() and thus the information about
> the signal gets lost.

ah.

> Fix the problem by doing the signal check before writing anything. That
> way generic_perform_write() returns -EINTR, the error gets propagated up
> and the sendfile loop terminates early.
>
> ...
>
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -2488,6 +2488,11 @@ again:
>  			break;
>  		}
>  
> +		if (fatal_signal_pending(current)) {
> +			status = -EINTR;
> +			break;
> +		}
> +
>  		status = a_ops->write_begin(file, mapping, pos, bytes, flags,
>  						&page, &fsdata);
>  		if (unlikely(status < 0))
> @@ -2525,10 +2530,6 @@ again:
>  		written += copied;
>  
>  		balance_dirty_pages_ratelimited(mapping);
> -		if (fatal_signal_pending(current)) {
> -			status = -EINTR;
> -			break;
> -		}
>  	} while (iov_iter_count(i));
>  
>  	return written ? written : status;

This won't work, will it?  If user hits ^C after we've written a few
pages, `written' is non-zero and the same thing happens?

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jan Kara Oct. 16, 2015, 6:40 a.m. UTC | #2
On Thu 15-10-15 13:46:44, Andrew Morton wrote:
> On Mon, 12 Oct 2015 14:45:23 +0200 Jan Kara <jack@suse.com> wrote:
> 
> > Currently a simple program below issues a sendfile(2) system call which
> > takes about 62 days to complete in my test KVM instance.
> 
> Geeze some people are impatient.
> 
> >         int fd;
> >         off_t off = 0;
> > 
> >         fd = open("file", O_RDWR | O_TRUNC | O_SYNC | O_CREAT, 0644);
> >         ftruncate(fd, 2);
> >         lseek(fd, 0, SEEK_END);
> >         sendfile(fd, fd, &off, 0xfffffff);
> > 
> > Now you should not ask the kernel to do stupid stuff like copying 256MB in
> > 2-byte chunks and calling fsync(2) after each chunk, but if you do, the
> > sysadmin should have a way to stop you.
> > 
> > We actually do have a check for fatal_signal_pending() in
> > generic_perform_write() which triggers in this path. However, because we
> > always succeed in writing something before the check is done, we return
> > a value > 0 from generic_perform_write() and thus the information about
> > the signal gets lost.
> 
> ah.
> 
> > Fix the problem by doing the signal check before writing anything. That
> > way generic_perform_write() returns -EINTR, the error gets propagated up
> > and the sendfile loop terminates early.
> >
> > ...
> >
> > --- a/mm/filemap.c
> > +++ b/mm/filemap.c
> > @@ -2488,6 +2488,11 @@ again:
> >  			break;
> >  		}
> >  
> > +		if (fatal_signal_pending(current)) {
> > +			status = -EINTR;
> > +			break;
> > +		}
> > +
> >  		status = a_ops->write_begin(file, mapping, pos, bytes, flags,
> >  						&page, &fsdata);
> >  		if (unlikely(status < 0))
> > @@ -2525,10 +2530,6 @@ again:
> >  		written += copied;
> >  
> >  		balance_dirty_pages_ratelimited(mapping);
> > -		if (fatal_signal_pending(current)) {
> > -			status = -EINTR;
> > -			break;
> > -		}
> >  	} while (iov_iter_count(i));
> >  
> >  	return written ? written : status;
> 
> This won't work, will it?  If user hits ^C after we've written a few
> pages, `written' is non-zero and the same thing happens?

It does work - I've tested it :). Sure, the generic_perform_write() call
that is running when the signal is delivered will return with value > 0.
But the interesting thing is what happens after that: Either we return to
userspace (and then we are fine) or generic_perform_write() gets called
again because there's more to write and *that* call will return -EINTR
which ends up terminating the whole sendfile syscall.

Actually there is one general lesson to be learned here: When you check for
a fatal signal and bail out, it's better to do it before doing any work. That
way things keep working even if the function is called in a loop.

								Honza
Andrew Morton Oct. 16, 2015, 9:05 p.m. UTC | #3
On Fri, 16 Oct 2015 08:40:27 +0200 Jan Kara <jack@suse.cz> wrote:

> > >  		balance_dirty_pages_ratelimited(mapping);
> > > -		if (fatal_signal_pending(current)) {
> > > -			status = -EINTR;
> > > -			break;
> > > -		}
> > >  	} while (iov_iter_count(i));
> > >  
> > >  	return written ? written : status;
> > 
> > This won't work, will it?  If user hits ^C after we've written a few
> > pages, `written' is non-zero and the same thing happens?
> 
> It does work - I've tested it :). Sure, the generic_perform_write() call
> that is running when the signal is delivered will return with value > 0.
> But the interesting thing is what happens after that: Either we return to
> userspace (and then we are fine) or generic_perform_write() gets called
> again because there's more to write and *that* call will return -EINTR
> which ends up terminating the whole sendfile syscall.

OK.  I guess that's better behaviour than overwriting a non-zero
`written' when signalled.

I'm going to tag this one for -stable.  It's a bit of a DoS.

Patch

diff --git a/mm/filemap.c b/mm/filemap.c
index 1cc5467cf36c..327910c2400c 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2488,6 +2488,11 @@  again:
 			break;
 		}
 
+		if (fatal_signal_pending(current)) {
+			status = -EINTR;
+			break;
+		}
+
 		status = a_ops->write_begin(file, mapping, pos, bytes, flags,
 						&page, &fsdata);
 		if (unlikely(status < 0))
@@ -2525,10 +2530,6 @@  again:
 		written += copied;
 
 		balance_dirty_pages_ratelimited(mapping);
-		if (fatal_signal_pending(current)) {
-			status = -EINTR;
-			break;
-		}
 	} while (iov_iter_count(i));
 
 	return written ? written : status;