diff mbox series

SUNRPC: Fix svc_flush_dcache()

Message ID 160063136387.1537.11599713172507546412.stgit@klimt.1015granger.net
State New
Headers show
Series SUNRPC: Fix svc_flush_dcache() | expand

Commit Message

Chuck Lever Sept. 20, 2020, 7:51 p.m. UTC
On platforms that implement flush_dcache_page(), a large NFS WRITE
triggers the WARN_ONCE in bvec_iter_advance():

Sep 20 14:01:05 klimt.1015granger.net kernel: Attempted to advance past end of bvec iter
Sep 20 14:01:05 klimt.1015granger.net kernel: WARNING: CPU: 0 PID: 1032 at include/linux/bvec.h:101 bvec_iter_advance.isra.0+0xa7/0x158 [sunrpc]

Sep 20 14:01:05 klimt.1015granger.net kernel: Call Trace:
Sep 20 14:01:05 klimt.1015granger.net kernel:  svc_tcp_recvfrom+0x60c/0x12c7 [sunrpc]
Sep 20 14:01:05 klimt.1015granger.net kernel:  ? bvec_iter_advance.isra.0+0x158/0x158 [sunrpc]
Sep 20 14:01:05 klimt.1015granger.net kernel:  ? del_timer_sync+0x4b/0x55
Sep 20 14:01:05 klimt.1015granger.net kernel:  ? test_bit+0x1d/0x27 [sunrpc]
Sep 20 14:01:05 klimt.1015granger.net kernel:  svc_recv+0x1193/0x15e4 [sunrpc]
Sep 20 14:01:05 klimt.1015granger.net kernel:  ? try_to_freeze.isra.0+0x6f/0x6f [sunrpc]
Sep 20 14:01:05 klimt.1015granger.net kernel:  ? refcount_sub_and_test.constprop.0+0x13/0x40 [sunrpc]
Sep 20 14:01:05 klimt.1015granger.net kernel:  ? svc_xprt_put+0x1e/0x29f [sunrpc]
Sep 20 14:01:05 klimt.1015granger.net kernel:  ? svc_send+0x39f/0x3c1 [sunrpc]
Sep 20 14:01:05 klimt.1015granger.net kernel:  nfsd+0x282/0x345 [nfsd]
Sep 20 14:01:05 klimt.1015granger.net kernel:  ? __kthread_parkme+0x74/0xba
Sep 20 14:01:05 klimt.1015granger.net kernel:  kthread+0x2ad/0x2bc
Sep 20 14:01:05 klimt.1015granger.net kernel:  ? nfsd_destroy+0x124/0x124 [nfsd]
Sep 20 14:01:05 klimt.1015granger.net kernel:  ? test_bit+0x1d/0x27
Sep 20 14:01:05 klimt.1015granger.net kernel:  ? kthread_mod_delayed_work+0x115/0x115
Sep 20 14:01:05 klimt.1015granger.net kernel:  ret_from_fork+0x22/0x30

Reported-by: He Zhe <zhe.he@windriver.com>
Fixes: ca07eda33e01 ("SUNRPC: Refactor svc_recvfrom()")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 net/sunrpc/svcsock.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Hi Zhe-

If you confirm this fixes your issue and there are no other
objections or regressions, I can submit this for v5.9-rc.

Comments

He Zhe Sept. 22, 2020, 7:13 a.m. UTC | #1
On 9/21/20 3:51 AM, Chuck Lever wrote:
> On platforms that implement flush_dcache_page(), a large NFS WRITE
> triggers the WARN_ONCE in bvec_iter_advance():
>
> Sep 20 14:01:05 klimt.1015granger.net kernel: Attempted to advance past end of bvec iter
> Sep 20 14:01:05 klimt.1015granger.net kernel: WARNING: CPU: 0 PID: 1032 at include/linux/bvec.h:101 bvec_iter_advance.isra.0+0xa7/0x158 [sunrpc]
>
> Sep 20 14:01:05 klimt.1015granger.net kernel: Call Trace:
> Sep 20 14:01:05 klimt.1015granger.net kernel:  svc_tcp_recvfrom+0x60c/0x12c7 [sunrpc]
> Sep 20 14:01:05 klimt.1015granger.net kernel:  ? bvec_iter_advance.isra.0+0x158/0x158 [sunrpc]
> Sep 20 14:01:05 klimt.1015granger.net kernel:  ? del_timer_sync+0x4b/0x55
> Sep 20 14:01:05 klimt.1015granger.net kernel:  ? test_bit+0x1d/0x27 [sunrpc]
> Sep 20 14:01:05 klimt.1015granger.net kernel:  svc_recv+0x1193/0x15e4 [sunrpc]
> Sep 20 14:01:05 klimt.1015granger.net kernel:  ? try_to_freeze.isra.0+0x6f/0x6f [sunrpc]
> Sep 20 14:01:05 klimt.1015granger.net kernel:  ? refcount_sub_and_test.constprop.0+0x13/0x40 [sunrpc]
> Sep 20 14:01:05 klimt.1015granger.net kernel:  ? svc_xprt_put+0x1e/0x29f [sunrpc]
> Sep 20 14:01:05 klimt.1015granger.net kernel:  ? svc_send+0x39f/0x3c1 [sunrpc]
> Sep 20 14:01:05 klimt.1015granger.net kernel:  nfsd+0x282/0x345 [nfsd]
> Sep 20 14:01:05 klimt.1015granger.net kernel:  ? __kthread_parkme+0x74/0xba
> Sep 20 14:01:05 klimt.1015granger.net kernel:  kthread+0x2ad/0x2bc
> Sep 20 14:01:05 klimt.1015granger.net kernel:  ? nfsd_destroy+0x124/0x124 [nfsd]
> Sep 20 14:01:05 klimt.1015granger.net kernel:  ? test_bit+0x1d/0x27
> Sep 20 14:01:05 klimt.1015granger.net kernel:  ? kthread_mod_delayed_work+0x115/0x115
> Sep 20 14:01:05 klimt.1015granger.net kernel:  ret_from_fork+0x22/0x30
>
> Reported-by: He Zhe <zhe.he@windriver.com>
> Fixes: ca07eda33e01 ("SUNRPC: Refactor svc_recvfrom()")
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>  net/sunrpc/svcsock.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> Hi Zhe-
>
> If you confirm this fixes your issue and there are no other
> objections or regressions, I can submit this for v5.9-rc.

I don't quite get why we add "seek" to "size". It seems this action does not
reflect the actual scenario and forcedly neutralizes the WARN_ONCE check in
bvec_iter_advance, so that it may "advance past end of bvec iter" and thus
introduces overflow.

Why don't we avoid this problem at the very begginning like my v1? That is, call
svc_flush_bvec only when we have received more than we want to seek.

        len = sock_recvmsg(svsk->sk_sock, &msg, MSG_DONTWAIT);
-       if (len > 0)
+       if (len > 0 && (size_t)len > (seek & PAGE_MASK))
                svc_flush_bvec(bvec, len, seek);


Regards,
Zhe

>
>
> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> index d5805fa1d066..c2752e2b9ce3 100644
> --- a/net/sunrpc/svcsock.c
> +++ b/net/sunrpc/svcsock.c
> @@ -228,7 +228,7 @@ static int svc_one_sock_name(struct svc_sock *svsk, char *buf, int remaining)
>  static void svc_flush_bvec(const struct bio_vec *bvec, size_t size, size_t seek)
>  {
>  	struct bvec_iter bi = {
> -		.bi_size	= size,
> +		.bi_size	= size + seek,
>  	};
>  	struct bio_vec bv;
>  
>
>
>
Chuck Lever Sept. 22, 2020, 2:14 p.m. UTC | #2
> On Sep 22, 2020, at 3:13 AM, He Zhe <zhe.he@windriver.com> wrote:
> 
> 
> 
> On 9/21/20 3:51 AM, Chuck Lever wrote:
>> On platforms that implement flush_dcache_page(), a large NFS WRITE
>> triggers the WARN_ONCE in bvec_iter_advance():
>> 
>> Sep 20 14:01:05 klimt.1015granger.net kernel: Attempted to advance past end of bvec iter
>> Sep 20 14:01:05 klimt.1015granger.net kernel: WARNING: CPU: 0 PID: 1032 at include/linux/bvec.h:101 bvec_iter_advance.isra.0+0xa7/0x158 [sunrpc]
>> 
>> Sep 20 14:01:05 klimt.1015granger.net kernel: Call Trace:
>> Sep 20 14:01:05 klimt.1015granger.net kernel:  svc_tcp_recvfrom+0x60c/0x12c7 [sunrpc]
>> Sep 20 14:01:05 klimt.1015granger.net kernel:  ? bvec_iter_advance.isra.0+0x158/0x158 [sunrpc]
>> Sep 20 14:01:05 klimt.1015granger.net kernel:  ? del_timer_sync+0x4b/0x55
>> Sep 20 14:01:05 klimt.1015granger.net kernel:  ? test_bit+0x1d/0x27 [sunrpc]
>> Sep 20 14:01:05 klimt.1015granger.net kernel:  svc_recv+0x1193/0x15e4 [sunrpc]
>> Sep 20 14:01:05 klimt.1015granger.net kernel:  ? try_to_freeze.isra.0+0x6f/0x6f [sunrpc]
>> Sep 20 14:01:05 klimt.1015granger.net kernel:  ? refcount_sub_and_test.constprop.0+0x13/0x40 [sunrpc]
>> Sep 20 14:01:05 klimt.1015granger.net kernel:  ? svc_xprt_put+0x1e/0x29f [sunrpc]
>> Sep 20 14:01:05 klimt.1015granger.net kernel:  ? svc_send+0x39f/0x3c1 [sunrpc]
>> Sep 20 14:01:05 klimt.1015granger.net kernel:  nfsd+0x282/0x345 [nfsd]
>> Sep 20 14:01:05 klimt.1015granger.net kernel:  ? __kthread_parkme+0x74/0xba
>> Sep 20 14:01:05 klimt.1015granger.net kernel:  kthread+0x2ad/0x2bc
>> Sep 20 14:01:05 klimt.1015granger.net kernel:  ? nfsd_destroy+0x124/0x124 [nfsd]
>> Sep 20 14:01:05 klimt.1015granger.net kernel:  ? test_bit+0x1d/0x27
>> Sep 20 14:01:05 klimt.1015granger.net kernel:  ? kthread_mod_delayed_work+0x115/0x115
>> Sep 20 14:01:05 klimt.1015granger.net kernel:  ret_from_fork+0x22/0x30
>> 
>> Reported-by: He Zhe <zhe.he@windriver.com>
>> Fixes: ca07eda33e01 ("SUNRPC: Refactor svc_recvfrom()")
>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>> ---
>> net/sunrpc/svcsock.c |    2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> Hi Zhe-
>> 
>> If you confirm this fixes your issue and there are no other
>> objections or regressions, I can submit this for v5.9-rc.
> 
> I don't quite get why we add "seek" to "size". It seems this action does not
> reflect the actual scenario and forcedly neutralizes the WARN_ONCE check in
> bvec_iter_advance, so that it may "advance past end of bvec iter" and thus
> introduces overflow.

> Why don't we avoid this problem at the very begginning like my v1? That is, call
> svc_flush_bvec only when we have received more than we want to seek.
> 
>         len = sock_recvmsg(svsk->sk_sock, &msg, MSG_DONTWAIT);
> -       if (len > 0)
> +       if (len > 0 && (size_t)len > (seek & PAGE_MASK))
>                 svc_flush_bvec(bvec, len, seek);

Because this doesn't fix the underlying bug that triggered the
WARN_ONCE.

svc_tcp_recvfrom() attempts to assemble a possibly large RPC Call
from a sequence of sock_recvmsg's.

@seek is the running number of bytes that has been received so
far for the RPC Call we are assembling. @size is the number of
bytes that was just received in the most recent sock_recvmsg.

We want svc_flush_bvec to flush just the area of @bvec that
hasn't been flushed yet.

Thus: the current size of the partial Call message in @bvec is
@seek + @size. The starting location of the flush is
@seek & PAGE_MASK. This aligns the flush so it starts on a page
boundary.

This:

 230         struct bvec_iter bi = {
 231                 .bi_size        = size + seek,
 232         };

 235         bvec_iter_advance(bvec, &bi, seek & PAGE_MASK);

advances the bvec_iter to the part of @bvec that hasn't been
flushed yet.

This loop:

 236         for_each_bvec(bv, bvec, bi, bi)
 237                 flush_dcache_page(bv.bv_page);

flushes each page starting at that point to the end of the bytes
that have been received so far

In other words, ca07eda33e01 was wrong because it always flushed
the first section of @bvec, never the later parts of it.


> Regards,
> Zhe
> 
>> 
>> 
>> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
>> index d5805fa1d066..c2752e2b9ce3 100644
>> --- a/net/sunrpc/svcsock.c
>> +++ b/net/sunrpc/svcsock.c
>> @@ -228,7 +228,7 @@ static int svc_one_sock_name(struct svc_sock *svsk, char *buf, int remaining)
>> static void svc_flush_bvec(const struct bio_vec *bvec, size_t size, size_t seek)
>> {
>> 	struct bvec_iter bi = {
>> -		.bi_size	= size,
>> +		.bi_size	= size + seek,
>> 	};
>> 	struct bio_vec bv;

--
Chuck Lever
He Zhe Sept. 23, 2020, 3:30 a.m. UTC | #3
On 9/22/20 10:14 PM, Chuck Lever wrote:
>
>> On Sep 22, 2020, at 3:13 AM, He Zhe <zhe.he@windriver.com> wrote:
>>
>>
>>
>> On 9/21/20 3:51 AM, Chuck Lever wrote:
>>> On platforms that implement flush_dcache_page(), a large NFS WRITE
>>> triggers the WARN_ONCE in bvec_iter_advance():
>>>
>>> Sep 20 14:01:05 klimt.1015granger.net kernel: Attempted to advance past end of bvec iter
>>> Sep 20 14:01:05 klimt.1015granger.net kernel: WARNING: CPU: 0 PID: 1032 at include/linux/bvec.h:101 bvec_iter_advance.isra.0+0xa7/0x158 [sunrpc]
>>>
>>> Sep 20 14:01:05 klimt.1015granger.net kernel: Call Trace:
>>> Sep 20 14:01:05 klimt.1015granger.net kernel:  svc_tcp_recvfrom+0x60c/0x12c7 [sunrpc]
>>> Sep 20 14:01:05 klimt.1015granger.net kernel:  ? bvec_iter_advance.isra.0+0x158/0x158 [sunrpc]
>>> Sep 20 14:01:05 klimt.1015granger.net kernel:  ? del_timer_sync+0x4b/0x55
>>> Sep 20 14:01:05 klimt.1015granger.net kernel:  ? test_bit+0x1d/0x27 [sunrpc]
>>> Sep 20 14:01:05 klimt.1015granger.net kernel:  svc_recv+0x1193/0x15e4 [sunrpc]
>>> Sep 20 14:01:05 klimt.1015granger.net kernel:  ? try_to_freeze.isra.0+0x6f/0x6f [sunrpc]
>>> Sep 20 14:01:05 klimt.1015granger.net kernel:  ? refcount_sub_and_test.constprop.0+0x13/0x40 [sunrpc]
>>> Sep 20 14:01:05 klimt.1015granger.net kernel:  ? svc_xprt_put+0x1e/0x29f [sunrpc]
>>> Sep 20 14:01:05 klimt.1015granger.net kernel:  ? svc_send+0x39f/0x3c1 [sunrpc]
>>> Sep 20 14:01:05 klimt.1015granger.net kernel:  nfsd+0x282/0x345 [nfsd]
>>> Sep 20 14:01:05 klimt.1015granger.net kernel:  ? __kthread_parkme+0x74/0xba
>>> Sep 20 14:01:05 klimt.1015granger.net kernel:  kthread+0x2ad/0x2bc
>>> Sep 20 14:01:05 klimt.1015granger.net kernel:  ? nfsd_destroy+0x124/0x124 [nfsd]
>>> Sep 20 14:01:05 klimt.1015granger.net kernel:  ? test_bit+0x1d/0x27
>>> Sep 20 14:01:05 klimt.1015granger.net kernel:  ? kthread_mod_delayed_work+0x115/0x115
>>> Sep 20 14:01:05 klimt.1015granger.net kernel:  ret_from_fork+0x22/0x30
>>>
>>> Reported-by: He Zhe <zhe.he@windriver.com>
>>> Fixes: ca07eda33e01 ("SUNRPC: Refactor svc_recvfrom()")
>>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>>> ---
>>> net/sunrpc/svcsock.c |    2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> Hi Zhe-
>>>
>>> If you confirm this fixes your issue and there are no other
>>> objections or regressions, I can submit this for v5.9-rc.
>> I don't quite get why we add "seek" to "size". It seems this action does not
>> reflect the actual scenario and forcedly neutralizes the WARN_ONCE check in
>> bvec_iter_advance, so that it may "advance past end of bvec iter" and thus
>> introduces overflow.
>> Why don't we avoid this problem at the very begginning like my v1? That is, call
>> svc_flush_bvec only when we have received more than we want to seek.
>>
>>         len = sock_recvmsg(svsk->sk_sock, &msg, MSG_DONTWAIT);
>> -       if (len > 0)
>> +       if (len > 0 && (size_t)len > (seek & PAGE_MASK))
>>                 svc_flush_bvec(bvec, len, seek);
> Because this doesn't fix the underlying bug that triggered the
> WARN_ONCE.
>
> svc_tcp_recvfrom() attempts to assemble a possibly large RPC Call
> from a sequence of sock_recvmsg's.
>
> @seek is the running number of bytes that has been received so
> far for the RPC Call we are assembling. @size is the number of
> bytes that was just received in the most recent sock_recvmsg.
>
> We want svc_flush_bvec to flush just the area of @bvec that
> hasn't been flushed yet.
>
> Thus: the current size of the partial Call message in @bvec is
> @seek + @size. The starting location of the flush is
> @seek & PAGE_MASK. This aligns the flush so it starts on a page
> boundary.
>
> This:
>
>  230         struct bvec_iter bi = {
>  231                 .bi_size        = size + seek,
>  232         };
>
>  235         bvec_iter_advance(bvec, &bi, seek & PAGE_MASK);
>
> advances the bvec_iter to the part of @bvec that hasn't been
> flushed yet.
>
> This loop:
>
>  236         for_each_bvec(bv, bvec, bi, bi)
>  237                 flush_dcache_page(bv.bv_page);
>
> flushes each page starting at that point to the end of the bytes
> that have been received so far
>
> In other words, ca07eda33e01 was wrong because it always flushed
> the first section of @bvec, never the later parts of it.

Thanks for clarification. I just tested the patch. It works well.

Zhe

>
>
>> Regards,
>> Zhe
>>
>>>
>>> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
>>> index d5805fa1d066..c2752e2b9ce3 100644
>>> --- a/net/sunrpc/svcsock.c
>>> +++ b/net/sunrpc/svcsock.c
>>> @@ -228,7 +228,7 @@ static int svc_one_sock_name(struct svc_sock *svsk, char *buf, int remaining)
>>> static void svc_flush_bvec(const struct bio_vec *bvec, size_t size, size_t seek)
>>> {
>>> 	struct bvec_iter bi = {
>>> -		.bi_size	= size,
>>> +		.bi_size	= size + seek,
>>> 	};
>>> 	struct bio_vec bv;
> --
> Chuck Lever
>
>
>
diff mbox series

Patch

diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index d5805fa1d066..c2752e2b9ce3 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -228,7 +228,7 @@  static int svc_one_sock_name(struct svc_sock *svsk, char *buf, int remaining)
 static void svc_flush_bvec(const struct bio_vec *bvec, size_t size, size_t seek)
 {
 	struct bvec_iter bi = {
-		.bi_size	= size,
+		.bi_size	= size + seek,
 	};
 	struct bio_vec bv;