block: allow for_each_bvec to support zero len bvec

Message ID: 20200810031915.2209658-1-ming.lei@redhat.com
State: New, archived
Series: block: allow for_each_bvec to support zero len bvec

Commit Message

Ming Lei Aug. 10, 2020, 3:19 a.m. UTC
Block layer usually doesn't support or allow zero-length bvec. Since
commit 1bdc76aea115 ("iov_iter: use bvec iterator to implement
iterate_bvec()"), iterate_bvec() switches to bvec iterator. However,
Al mentioned that 'Zero-length segments are not disallowed' in iov_iter.

Fixes for_each_bvec() so that it can move on after seeing one zero
length bvec.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg2262077.html
Fixes: 1bdc76aea115 ("iov_iter: use bvec iterator to implement iterate_bvec()")
Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Tested-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org>
---
 include/linux/bvec.h | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)
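
Why the unpatched loop cannot move on can be modelled in userspace. The following is a minimal sketch under stated assumptions, not the kernel code: `struct seg`, `struct iter`, `advance()` and `unpatched_walk_stalls()` are hypothetical stand-ins, and `advance()` only mirrors the property that matters here, namely that advancing by zero bytes leaves the index unchanged.

```c
/* Hypothetical, simplified stand-ins for struct bio_vec and struct bvec_iter. */
struct seg {
	unsigned int len;	/* models bv_len */
};

struct iter {
	unsigned int size;	/* models bi_size: bytes left to walk */
	unsigned int idx;	/* models bi_idx: current segment */
	unsigned int done;	/* models bi_bvec_done: bytes consumed in segment */
};

/* Byte-driven advance (assumed model): with bytes == 0 nothing moves. */
static void advance(const struct seg *sv, struct iter *it, unsigned int bytes)
{
	unsigned int idx = it->idx;

	it->size -= bytes;
	bytes += it->done;
	while (bytes && bytes >= sv[idx].len) {
		bytes -= sv[idx].len;
		idx++;
	}
	it->idx = idx;
	it->done = bytes;
}

/*
 * Walk the segments using only advance(), as the loop did before the patch.
 * Returns 1 if bytes are still pending after `limit` iterations (no forward
 * progress), 0 if the walk finished. The caller passes total == sum of lengths.
 */
static int unpatched_walk_stalls(const struct seg *sv, unsigned int total,
				 unsigned int limit)
{
	struct iter it = { .size = total };
	unsigned int steps;

	for (steps = 0; it.size && steps < limit; steps++) {
		unsigned int len = sv[it.idx].len - it.done;

		if (len > it.size)
			len = it.size;
		advance(sv, &it, len);	/* len == 0 changes nothing */
	}
	return it.size != 0;
}
```

With segment lengths {4096, 0, 4096} the model stays on the zero-length entry until the iteration limit is reached, while {4096, 4096} completes in two iterations.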

Comments

Matthew Wilcox Aug. 10, 2020, 3:33 a.m. UTC | #1
On Mon, Aug 10, 2020 at 11:19:15AM +0800, Ming Lei wrote:
> +++ b/include/linux/bvec.h
> @@ -117,11 +117,18 @@ static inline bool bvec_iter_advance(const struct bio_vec *bv,
>  	return true;
>  }
>  
> +static inline void bvec_iter_skip_zero_bvec(struct bvec_iter *iter)
> +{
> +	iter->bi_bvec_done = 0;
> +	iter->bi_idx++;
> +}
> +
>  #define for_each_bvec(bvl, bio_vec, iter, start)			\
>  	for (iter = (start);						\
>  	     (iter).bi_size &&						\
>  		((bvl = bvec_iter_bvec((bio_vec), (iter))), 1);	\
> -	     bvec_iter_advance((bio_vec), &(iter), (bvl).bv_len))
> +	     (bvl).bv_len ? bvec_iter_advance((bio_vec), &(iter),	\
> +		     (bvl).bv_len) : bvec_iter_skip_zero_bvec(&(iter)))
>  

What if you have two zero-length bvecs in a row?  Won't this just skip
the first one?

It would seem better to me to put the bv_len test in bvec_iter_advance()
instead of making the macro more complicated.

Ming Lei Aug. 10, 2020, 4:02 a.m. UTC | #2
On Mon, Aug 10, 2020 at 04:33:09AM +0100, Matthew Wilcox wrote:
> On Mon, Aug 10, 2020 at 11:19:15AM +0800, Ming Lei wrote:
> > +++ b/include/linux/bvec.h
> > @@ -117,11 +117,18 @@ static inline bool bvec_iter_advance(const struct bio_vec *bv,
> >  	return true;
> >  }
> >  
> > +static inline void bvec_iter_skip_zero_bvec(struct bvec_iter *iter)
> > +{
> > +	iter->bi_bvec_done = 0;
> > +	iter->bi_idx++;
> > +}
> > +
> >  #define for_each_bvec(bvl, bio_vec, iter, start)			\
> >  	for (iter = (start);						\
> >  	     (iter).bi_size &&						\
> >  		((bvl = bvec_iter_bvec((bio_vec), (iter))), 1);	\
> > -	     bvec_iter_advance((bio_vec), &(iter), (bvl).bv_len))
> > +	     (bvl).bv_len ? bvec_iter_advance((bio_vec), &(iter),	\
> > +		     (bvl).bv_len) : bvec_iter_skip_zero_bvec(&(iter)))
> >  
> 
> What if you have two zero-length bvecs in a row?  Won't this just skip
> the first one?

The second one will be skipped too, when it is observed in the next loop iteration.

> 
> It would seem better to me to put the bv_len test in bvec_iter_advance()
> instead of making the macro more complicated.

The reason is that the block layer doesn't support zero-length bvecs, so I'd
rather not burden bvec_iter_advance() with this check.


Thanks,
Ming
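
The exchange above about two zero-length bvecs in a row can be checked with a small userspace model. This is only a sketch under stated assumptions: `struct seg`, `struct iter`, `advance()`, `skip_zero()` and `walk()` are hypothetical stand-ins rather than the kernel's definitions, and the loop in `walk()` imitates the ternary in the patched macro.

```c
/* Hypothetical, simplified stand-ins for struct bio_vec and struct bvec_iter. */
struct seg {
	unsigned int len;	/* models bv_len */
};

struct iter {
	unsigned int size;	/* models bi_size: bytes left to walk */
	unsigned int idx;	/* models bi_idx: current segment */
	unsigned int done;	/* models bi_bvec_done: bytes consumed in segment */
};

/* Byte-driven advance (assumed model): with bytes == 0 nothing moves. */
static void advance(const struct seg *sv, struct iter *it, unsigned int bytes)
{
	unsigned int idx = it->idx;

	it->size -= bytes;
	bytes += it->done;
	while (bytes && bytes >= sv[idx].len) {
		bytes -= sv[idx].len;
		idx++;
	}
	it->idx = idx;
	it->done = bytes;
}

/* Models bvec_iter_skip_zero_bvec() from the patch. */
static void skip_zero(struct iter *it)
{
	it->done = 0;
	it->idx++;
}

/*
 * Walk the segments the way the patched loop would. Returns the number of
 * loop iterations and stores how many zero-length segments were skipped.
 * The caller must pass total == sum of all segment lengths.
 */
static unsigned int walk(const struct seg *sv, unsigned int total,
			 unsigned int *zeros)
{
	struct iter it = { .size = total };
	unsigned int steps = 0;

	*zeros = 0;
	while (it.size) {
		unsigned int len = sv[it.idx].len - it.done;

		if (len > it.size)
			len = it.size;
		if (len) {
			advance(sv, &it, len);
		} else {
			skip_zero(&it);	/* one zero-length segment per iteration */
			(*zeros)++;
		}
		steps++;
	}
	return steps;
}
```

For segment lengths {0, 0, 4096, 0, 4096} and a total of 8192 bytes, `walk()` takes 5 iterations and skips 3 zero-length segments, so consecutive zero-length entries are each skipped on their own iteration.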
Tetsuo Handa Aug. 10, 2020, 7:52 a.m. UTC | #3
On 2020/08/10 12:19, Ming Lei wrote:
> Block layer usually doesn't support or allow zero-length bvec. Since
> commit 1bdc76aea115 ("iov_iter: use bvec iterator to implement
> iterate_bvec()"), iterate_bvec() switches to bvec iterator. However,
> Al mentioned that 'Zero-length segments are not disallowed' in iov_iter.
> 
> Fixes for_each_bvec() so that it can move on after seeing one zero
> length bvec.
> 
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> Link: https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg2262077.html
> Fixes: 1bdc76aea115 ("iov_iter: use bvec iterator to implement iterate_bvec()")

Is this Fixes: correct? That commit should be in RHEL8's 4.18 kernel but that kernel
does not hit this bug.

Moreover, maybe nobody cares, but the behavior of splice() differs when there are only
zero-length pages. With this fix, splice() returns 0 even though there are still pipe writers.
The man page seems to say that splice() returns 0 when there are no pipe writers...

    A return value of 0 means end of input.  If fd_in refers to a pipe,
    then this means that there was no data to transfer, and it would not
    make sense to block because there are no writers connected to the
    write end of the pipe.

----- test case -----
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>

int main(int argc, char *argv[])
{
        static char buffer[4096];
        const int fd = open("/tmp/testfile", O_WRONLY | O_CREAT, 0600);
        int pipe_fd[2] = { EOF, EOF };
        pipe(pipe_fd);
        write(pipe_fd[1], NULL, sizeof(buffer));
        write(pipe_fd[1], NULL, sizeof(buffer));
        memset(buffer, 'a', sizeof(buffer));
        //write(pipe_fd[1], buffer, sizeof(buffer));
        write(pipe_fd[1], NULL, sizeof(buffer));
        write(pipe_fd[1], NULL, sizeof(buffer));
        memset(buffer, 'b', sizeof(buffer));
        //write(pipe_fd[1], buffer, sizeof(buffer));
        write(pipe_fd[1], NULL, sizeof(buffer));
        write(pipe_fd[1], NULL, sizeof(buffer));
        memset(buffer, 'c', sizeof(buffer));
        //write(pipe_fd[1], buffer, sizeof(buffer));
        write(pipe_fd[1], NULL, sizeof(buffer));
        write(pipe_fd[1], NULL, sizeof(buffer));
        memset(buffer, 'd', sizeof(buffer));
        //write(pipe_fd[1], buffer, sizeof(buffer));
        write(pipe_fd[1], NULL, sizeof(buffer));
        write(pipe_fd[1], NULL, sizeof(buffer));
        splice(pipe_fd[0], NULL, fd, NULL, 65536, 0);
        return 0;
}

----- 4.18.0-193.14.2.el8_2.x86_64 -----
openat(AT_FDCWD, "/tmp/testfile", O_WRONLY|O_CREAT, 0600) = 3
pipe([4, 5])                            = 0
write(5, NULL, 4096)                    = -1 EFAULT (Bad address)
write(5, NULL, 4096)                    = -1 EFAULT (Bad address)
write(5, NULL, 4096)                    = -1 EFAULT (Bad address)
write(5, NULL, 4096)                    = -1 EFAULT (Bad address)
write(5, NULL, 4096)                    = -1 EFAULT (Bad address)
write(5, NULL, 4096)                    = -1 EFAULT (Bad address)
write(5, NULL, 4096)                    = -1 EFAULT (Bad address)
write(5, NULL, 4096)                    = -1 EFAULT (Bad address)
write(5, NULL, 4096)                    = -1 EFAULT (Bad address)
write(5, NULL, 4096)                    = -1 EFAULT (Bad address)
splice(4, NULL, 3, NULL, 65536, 0

^C)      = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
strace: Process 1486 detached

----- linux.git + this fix -----
open("/tmp/testfile", O_WRONLY|O_CREAT, 0600) = 3
pipe([4, 5])                            = 0
write(5, NULL, 4096)                    = -1 EFAULT (Bad address)
write(5, NULL, 4096)                    = -1 EFAULT (Bad address)
write(5, NULL, 4096)                    = -1 EFAULT (Bad address)
write(5, NULL, 4096)                    = -1 EFAULT (Bad address)
write(5, NULL, 4096)                    = -1 EFAULT (Bad address)
write(5, NULL, 4096)                    = -1 EFAULT (Bad address)
write(5, NULL, 4096)                    = -1 EFAULT (Bad address)
write(5, NULL, 4096)                    = -1 EFAULT (Bad address)
write(5, NULL, 4096)                    = -1 EFAULT (Bad address)
write(5, NULL, 4096)                    = -1 EFAULT (Bad address)
splice(4, NULL, 3, NULL, 65536, 0)      = 0
exit_group(0)                           = ?
+++ exited with 0 +++

> Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>

I just forwarded syzbot's report. Thus, credit goes to

Reported-by: syzbot <syzbot+61acc40a49a3e46e25ea@syzkaller.appspotmail.com>

> Tested-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
> Cc: Al Viro <viro@zeniv.linux.org.uk>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: <stable@vger.kernel.org>

Ming Lei Aug. 10, 2020, 4:23 p.m. UTC | #4
On Mon, Aug 10, 2020 at 04:52:17PM +0900, Tetsuo Handa wrote:
> On 2020/08/10 12:19, Ming Lei wrote:
> > Block layer usually doesn't support or allow zero-length bvec. Since
> > commit 1bdc76aea115 ("iov_iter: use bvec iterator to implement
> > iterate_bvec()"), iterate_bvec() switches to bvec iterator. However,
> > Al mentioned that 'Zero-length segments are not disallowed' in iov_iter.
> > 
> > Fixes for_each_bvec() so that it can move on after seeing one zero
> > length bvec.
> > 
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > Link: https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg2262077.html
> > Fixes: 1bdc76aea115 ("iov_iter: use bvec iterator to implement iterate_bvec()")
> 
> Is this Fixes: correct? That commit should be in RHEL8's 4.18 kernel but that kernel
> does not hit this bug.

Yeah, it is correct, see the following link:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v5.8&id=1bdc76aea1159a750846c2fc98e404403eb7d51c

Commit 1bdc76aea115 was merged to v4.8, so it is definitely in both RHEL8's
4.18 based kernel and upstream kernel.

> 
> Moreover, maybe nobody cares, but the behavior of splice() differs when there are only
> zero-length pages. With this fix, splice() returns 0 even though there are still pipe writers.

That is another, new issue, which isn't related to commit 1bdc76aea115;
see below.

> The man page seems to say that splice() returns 0 when there are no pipe writers...
> 
>     A return value of 0 means end of input.  If fd_in refers to a pipe,
>     then this means that there was no data to transfer, and it would not
>     make sense to block because there are no writers connected to the
>     write end of the pipe.
> 
> ----- test case -----
> #define _GNU_SOURCE
> #include <stdio.h>
> #include <sys/types.h>
> #include <sys/stat.h>
> #include <fcntl.h>
> #include <unistd.h>
> #include <string.h>
> 
> int main(int argc, char *argv[])
> {
>         static char buffer[4096];
>         const int fd = open("/tmp/testfile", O_WRONLY | O_CREAT, 0600);
>         int pipe_fd[2] = { EOF, EOF };
>         pipe(pipe_fd);
>         write(pipe_fd[1], NULL, sizeof(buffer));
>         write(pipe_fd[1], NULL, sizeof(buffer));
>         memset(buffer, 'a', sizeof(buffer));
>         //write(pipe_fd[1], buffer, sizeof(buffer));
>         write(pipe_fd[1], NULL, sizeof(buffer));
>         write(pipe_fd[1], NULL, sizeof(buffer));
>         memset(buffer, 'b', sizeof(buffer));
>         //write(pipe_fd[1], buffer, sizeof(buffer));
>         write(pipe_fd[1], NULL, sizeof(buffer));
>         write(pipe_fd[1], NULL, sizeof(buffer));
>         memset(buffer, 'c', sizeof(buffer));
>         //write(pipe_fd[1], buffer, sizeof(buffer));
>         write(pipe_fd[1], NULL, sizeof(buffer));
>         write(pipe_fd[1], NULL, sizeof(buffer));
>         memset(buffer, 'd', sizeof(buffer));
>         //write(pipe_fd[1], buffer, sizeof(buffer));
>         write(pipe_fd[1], NULL, sizeof(buffer));
>         write(pipe_fd[1], NULL, sizeof(buffer));
>         splice(pipe_fd[0], NULL, fd, NULL, 65536, 0);
>         return 0;
> }

The above test doesn't trigger the reported lockup issue, so this patch
isn't related to the new issue you described.

> 
> ----- 4.18.0-193.14.2.el8_2.x86_64 -----
> openat(AT_FDCWD, "/tmp/testfile", O_WRONLY|O_CREAT, 0600) = 3
> pipe([4, 5])                            = 0
> write(5, NULL, 4096)                    = -1 EFAULT (Bad address)
> write(5, NULL, 4096)                    = -1 EFAULT (Bad address)
> write(5, NULL, 4096)                    = -1 EFAULT (Bad address)
> write(5, NULL, 4096)                    = -1 EFAULT (Bad address)
> write(5, NULL, 4096)                    = -1 EFAULT (Bad address)
> write(5, NULL, 4096)                    = -1 EFAULT (Bad address)
> write(5, NULL, 4096)                    = -1 EFAULT (Bad address)
> write(5, NULL, 4096)                    = -1 EFAULT (Bad address)
> write(5, NULL, 4096)                    = -1 EFAULT (Bad address)
> write(5, NULL, 4096)                    = -1 EFAULT (Bad address)
> splice(4, NULL, 3, NULL, 65536, 0
> 
> ^C)      = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
> strace: Process 1486 detached

The same behavior can be observed on v4.8 too; both v4.8 and v4.18
include 1bdc76aea115. If you apply the fix against v4.8, you can
observe the same behavior too.

> 
> ----- linux.git + this fix -----

It should have been linux.git, :-)

I think this new issue may have been introduced between v4.18 and v5.8.

> open("/tmp/testfile", O_WRONLY|O_CREAT, 0600) = 3
> pipe([4, 5])                            = 0
> write(5, NULL, 4096)                    = -1 EFAULT (Bad address)
> write(5, NULL, 4096)                    = -1 EFAULT (Bad address)
> write(5, NULL, 4096)                    = -1 EFAULT (Bad address)
> write(5, NULL, 4096)                    = -1 EFAULT (Bad address)
> write(5, NULL, 4096)                    = -1 EFAULT (Bad address)
> write(5, NULL, 4096)                    = -1 EFAULT (Bad address)
> write(5, NULL, 4096)                    = -1 EFAULT (Bad address)
> write(5, NULL, 4096)                    = -1 EFAULT (Bad address)
> write(5, NULL, 4096)                    = -1 EFAULT (Bad address)
> write(5, NULL, 4096)                    = -1 EFAULT (Bad address)
> splice(4, NULL, 3, NULL, 65536, 0)      = 0
> exit_group(0)                           = ?
> +++ exited with 0 +++
> 
> > Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
> 
> I just forwarded syzbot's report. Thus, credit goes to
> 
> Reported-by: syzbot <syzbot+61acc40a49a3e46e25ea@syzkaller.appspotmail.com>

OK.


Thanks,
Ming

Ming Lei Aug. 12, 2020, 9 a.m. UTC | #5
On Tue, Aug 11, 2020 at 12:23:31AM +0800, Ming Lei wrote:
> On Mon, Aug 10, 2020 at 04:52:17PM +0900, Tetsuo Handa wrote:
> > On 2020/08/10 12:19, Ming Lei wrote:
> > > Block layer usually doesn't support or allow zero-length bvec. Since
> > > commit 1bdc76aea115 ("iov_iter: use bvec iterator to implement
> > > iterate_bvec()"), iterate_bvec() switches to bvec iterator. However,
> > > Al mentioned that 'Zero-length segments are not disallowed' in iov_iter.
> > > 
> > > Fixes for_each_bvec() so that it can move on after seeing one zero
> > > length bvec.
> > > 
> > > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > > Link: https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg2262077.html
> > > Fixes: 1bdc76aea115 ("iov_iter: use bvec iterator to implement iterate_bvec()")
> > 
> > Is this Fixes: correct? That commit should be in RHEL8's 4.18 kernel but that kernel
> > does not hit this bug.
> 
> Yeah, it is correct, see the following link:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v5.8&id=1bdc76aea1159a750846c2fc98e404403eb7d51c
> 
> Commit 1bdc76aea115 was merged to v4.8, so it is definitely in both RHEL8's
> 4.18 based kernel and upstream kernel.
> 
> > 
> > Moreover, maybe nobody cares, but the behavior of splice() differs when there are only
> > zero-length pages. With this fix, splice() returns 0 even though there are still pipe writers.
> 
> That is another, new issue, which isn't related to commit 1bdc76aea115;
> see below.
> 
> > The man page seems to say that splice() returns 0 when there are no pipe writers...
> > 
> >     A return value of 0 means end of input.  If fd_in refers to a pipe,
> >     then this means that there was no data to transfer, and it would not
> >     make sense to block because there are no writers connected to the
> >     write end of the pipe.
> > 
> > ----- test case -----
> > #define _GNU_SOURCE
> > #include <stdio.h>
> > #include <sys/types.h>
> > #include <sys/stat.h>
> > #include <fcntl.h>
> > #include <unistd.h>
> > #include <string.h>
> > 
> > int main(int argc, char *argv[])
> > {
> >         static char buffer[4096];
> >         const int fd = open("/tmp/testfile", O_WRONLY | O_CREAT, 0600);
> >         int pipe_fd[2] = { EOF, EOF };
> >         pipe(pipe_fd);
> >         write(pipe_fd[1], NULL, sizeof(buffer));
> >         write(pipe_fd[1], NULL, sizeof(buffer));
> >         memset(buffer, 'a', sizeof(buffer));
> >         //write(pipe_fd[1], buffer, sizeof(buffer));
> >         write(pipe_fd[1], NULL, sizeof(buffer));
> >         write(pipe_fd[1], NULL, sizeof(buffer));
> >         memset(buffer, 'b', sizeof(buffer));
> >         //write(pipe_fd[1], buffer, sizeof(buffer));
> >         write(pipe_fd[1], NULL, sizeof(buffer));
> >         write(pipe_fd[1], NULL, sizeof(buffer));
> >         memset(buffer, 'c', sizeof(buffer));
> >         //write(pipe_fd[1], buffer, sizeof(buffer));
> >         write(pipe_fd[1], NULL, sizeof(buffer));
> >         write(pipe_fd[1], NULL, sizeof(buffer));
> >         memset(buffer, 'd', sizeof(buffer));
> >         //write(pipe_fd[1], buffer, sizeof(buffer));
> >         write(pipe_fd[1], NULL, sizeof(buffer));
> >         write(pipe_fd[1], NULL, sizeof(buffer));
> >         splice(pipe_fd[0], NULL, fd, NULL, 65536, 0);
> >         return 0;
> > }
> 
> The above test doesn't trigger the reported lockup issue, so this patch
> isn't related to the new issue you described.

BTW, for_each_bvec won't be called in the above splice test code.


Thanks,
Ming

Tetsuo Handa Aug. 12, 2020, 10:03 a.m. UTC | #6
On 2020/08/12 18:00, Ming Lei wrote:
> BTW, for_each_bvec won't be called in the above splice test code.

Please uncomment the // lines when testing for_each_bvec() case.
This is a test case for testing all empty pages.

Ming Lei Aug. 12, 2020, 12:47 p.m. UTC | #7
On Wed, Aug 12, 2020 at 07:03:59PM +0900, Tetsuo Handa wrote:
> On 2020/08/12 18:00, Ming Lei wrote:
> > BTW, for_each_bvec won't be called in the above splice test code.
> 
> Please uncomment the // lines when testing for_each_bvec() case.

What are the '//' lines?

> This is a test case for testing all empty pages.

But the case for testing all empty pages is not related to this patch,
is it?


Thanks,
Ming

Matthew Wilcox Aug. 12, 2020, 12:51 p.m. UTC | #8
On Wed, Aug 12, 2020 at 08:47:12PM +0800, Ming Lei wrote:
> On Wed, Aug 12, 2020 at 07:03:59PM +0900, Tetsuo Handa wrote:
> > On 2020/08/12 18:00, Ming Lei wrote:
> > > BTW, for_each_bvec won't be called in the above splice test code.
> > 
> > Please uncomment the // lines when testing for_each_bvec() case.
> 
> What are the '//' lines?

The lines in the test-case which begin with the sequence '//'.

> > This is a test case for testing all empty pages.
> 
> But the case for testing all empty pages is not related to this patch,
> is it?
> 
> 
> Thanks,
> Ming
>

Tetsuo Handa Aug. 13, 2020, 1:13 a.m. UTC | #9
On 2020/08/11 1:23, Ming Lei wrote:
> The same behavior can be observed on v4.8 too; both v4.8 and v4.18
> include 1bdc76aea115. If you apply the fix against v4.8, you can
> observe the same behavior too.

(...snipped...)

> I think this new issue may have been introduced between v4.18 and v5.8.

Bisection reported that both problems ("infinite busy loop lockup" and "premature splice() return") became
visible with commit a194dfe6e6f6f720 ("pipe: Rearrange sequence in pipe_write() to preallocate slot").

Therefore, although the bug might have existed since commit 1bdc76aea115 ("iov_iter: use bvec iterator
to implement iterate_bvec()"), we need to apply your patch to 5.5+ only.

----- test case -----

#define _GNU_SOURCE
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>

int main(int argc, char *argv[])
{
        static char buffer[4096];
        const int fd = open("/tmp/testfile", O_WRONLY | O_CREAT, 0600);
        int pipe_fd[2] = { EOF, EOF };
        pipe(pipe_fd);
        write(pipe_fd[1], NULL, sizeof(buffer));
        write(pipe_fd[1], NULL, sizeof(buffer));
        memset(buffer, 'a', sizeof(buffer));
        if (argc > 1)
                write(pipe_fd[1], buffer, sizeof(buffer));
        write(pipe_fd[1], NULL, sizeof(buffer));
        write(pipe_fd[1], NULL, sizeof(buffer));
        memset(buffer, 'b', sizeof(buffer));
        if (argc > 1)
                write(pipe_fd[1], buffer, sizeof(buffer));
        write(pipe_fd[1], NULL, sizeof(buffer));
        write(pipe_fd[1], NULL, sizeof(buffer));
        memset(buffer, 'c', sizeof(buffer));
        if (argc > 1)
                write(pipe_fd[1], buffer, sizeof(buffer));
        write(pipe_fd[1], NULL, sizeof(buffer));
        write(pipe_fd[1], NULL, sizeof(buffer));
        memset(buffer, 'd', sizeof(buffer));
        if (argc > 1)
                write(pipe_fd[1], buffer, sizeof(buffer));
        write(pipe_fd[1], NULL, sizeof(buffer));
        write(pipe_fd[1], NULL, sizeof(buffer));
        splice(pipe_fd[0], NULL, fd, NULL, 65536, 0);
        return 0;
}

----- bisect log -----

# bad: [e42617b825f8073569da76dc4510bfa019b1c35a] Linux 5.5-rc1
# good: [219d54332a09e8d8741c1e1982f5eae56099de85] Linux 5.4
# good: [4d856f72c10ecb060868ed10ff1b1453943fc6c8] Linux 5.3
# good: [0ecfebd2b52404ae0c54a878c872bb93363ada36] Linux 5.2
# good: [e93c9c99a629c61837d5a7fc2120cd2b6c70dbdd] Linux 5.1
# good: [1c163f4c7b3f621efff9b28a47abb36f7378d783] Linux 5.0
git bisect start 'v5.5-rc1' 'v5.4' 'v5.3' 'v5.2' 'v5.1' 'v5.0' '--' 'fs/splice.c' 'fs/pipe.c' 'include/linux/splice.h' 'include/linux/pipe_fs_i.h'
# bad: [8f868d68d335a17923dffb6858f8e9b656424699] pipe: Fix missing mask update after pipe_wait()
git bisect bad 8f868d68d335a17923dffb6858f8e9b656424699
# bad: [a194dfe6e6f6f7205eea850a420f2bc6a1541209] pipe: Rearrange sequence in pipe_write() to preallocate slot
git bisect bad a194dfe6e6f6f7205eea850a420f2bc6a1541209
# good: [6718b6f855a0b4962d54bd625be2718cb820cec6] pipe: Allow pipes to have kernel-reserved slots
git bisect good 6718b6f855a0b4962d54bd625be2718cb820cec6
# good: [8446487feba988a92e7649c60367510f0b0445a8] pipe: Conditionalise wakeup in pipe_read()
git bisect good 8446487feba988a92e7649c60367510f0b0445a8
# first bad commit: [a194dfe6e6f6f7205eea850a420f2bc6a1541209] pipe: Rearrange sequence in pipe_write() to preallocate slot

Tetsuo Handa Aug. 27, 2020, 1:27 p.m. UTC | #10
Jens or Al, will you pick up
"[PATCH V2] block: allow for_each_bvec to support zero len bvec"
( https://lkml.kernel.org/r/20200817100055.2495905-1-ming.lei@redhat.com ),
which needs to be backported to 5.5+ kernels in order to avoid a DoS attack
by a local unprivileged user?

David, is the patch shown below (which should be backported to 5.5+ kernels)
correct? Is splice_from_pipe_next() the better location to check?
Are there other consumers which need to do the same thing?

From 60c3e828f9d8279752865d80411c9b19dbe5c35c Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Thu, 27 Aug 2020 22:17:02 +0900
Subject: [PATCH] splice: fix premature end of input detection

splice() from pipe should return 0 when there is no pipe writer. However,
since commit a194dfe6e6f6f720 ("pipe: Rearrange sequence in pipe_write()
to preallocate slot") started inserting empty pages, splice() from pipe
also returns 0 when all ready buffers are empty pages. Since such behavior
might confuse splice() users, let's fix it by waiting for non-empty pages
before building the vector.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Fixes: a194dfe6e6f6f720 ("pipe: Rearrange sequence in pipe_write() to preallocate slot")
Cc: stable@vger.kernel.org # 5.5+
---
 fs/splice.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/fs/splice.c b/fs/splice.c
index d7c8a7c4db07..52daa5fea879 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -724,6 +724,19 @@ iter_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
 		tail = pipe->tail;
 		mask = pipe->ring_size - 1;
 
+		/* dismiss the empty buffers */
+		while (!pipe_empty(head, tail)) {
+			struct pipe_buffer *buf = &pipe->bufs[tail & mask];
+
+			if (likely(buf->len))
+				break;
+			pipe_buf_release(pipe, buf);
+			pipe->tail = ++tail;
+		}
+		/* wait again if all buffers were empty */
+		if (unlikely(pipe_empty(head, tail)))
+			continue;
+
 		/* build the vector */
 		left = sd.total_len;
 		for (n = 0; !pipe_empty(head, tail) && left && n < nbufs; tail++, n++) {
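
The "dismiss the empty buffers" step in the hunk above can be modelled outside the kernel. This is a minimal sketch under stated assumptions: `struct ring`, `RING_SIZE` and `dismiss_empty()` are hypothetical names, the ring is reduced to head/tail counters plus a length per slot, and releasing the buffer (`pipe_buf_release()` in the patch) is left out.

```c
#define RING_SIZE 8	/* must be a power of two, like pipe->ring_size */

/* Hypothetical, simplified model of the pipe ring used by the patch. */
struct ring {
	unsigned int head;		/* next slot a writer would fill */
	unsigned int tail;		/* next slot a reader would consume */
	unsigned int len[RING_SIZE];	/* models buf->len for each slot */
};

/*
 * Drop leading zero-length slots by moving tail forward.
 * Returns 1 if a non-empty slot is now at the tail, 0 if the ring became
 * empty (the case where the patch waits again instead of returning 0).
 */
static int dismiss_empty(struct ring *r)
{
	unsigned int mask = RING_SIZE - 1;

	while (r->head != r->tail) {	/* models !pipe_empty(head, tail) */
		if (r->len[r->tail & mask])
			break;
		r->tail++;
	}
	return r->head != r->tail;
}
```

In this model a ring holding slot lengths {0, 0, 4096} ends with the tail on the 4096-byte slot, while a ring holding only {0, 0} ends up empty, which corresponds to the "wait again if all buffers were empty" branch.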
Patch

diff --git a/include/linux/bvec.h b/include/linux/bvec.h
index ac0c7299d5b8..9c4fab5f22a7 100644
--- a/include/linux/bvec.h
+++ b/include/linux/bvec.h
@@ -117,11 +117,18 @@  static inline bool bvec_iter_advance(const struct bio_vec *bv,
 	return true;
 }
 
+static inline void bvec_iter_skip_zero_bvec(struct bvec_iter *iter)
+{
+	iter->bi_bvec_done = 0;
+	iter->bi_idx++;
+}
+
 #define for_each_bvec(bvl, bio_vec, iter, start)			\
 	for (iter = (start);						\
 	     (iter).bi_size &&						\
 		((bvl = bvec_iter_bvec((bio_vec), (iter))), 1);	\
-	     bvec_iter_advance((bio_vec), &(iter), (bvl).bv_len))
+	     (bvl).bv_len ? bvec_iter_advance((bio_vec), &(iter),	\
+		     (bvl).bv_len) : bvec_iter_skip_zero_bvec(&(iter)))
 
 /* for iterating one bio from start to end */
 #define BVEC_ITER_ALL_INIT (struct bvec_iter)				\