Message ID | 20230609131817.712867-2-xianting.tian@linux.alibaba.com (mailing list archive) |
---|---|
State | Accepted |
Commit | 7a5103b81a9628b6b66fc710d9ccdd2f2d27a58c |
Series | fixup potential cpu stall |
Context | Check | Description |
---|---|---|
tedd_an/pre-ci_am | success | Success |
tedd_an/CheckPatch | warning | WARNING: From:/Signed-off-by: email address mismatch: 'From: Xianting Tian <tianxianting.txt@alibaba-inc.com>' != 'Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>' total: 0 errors, 1 warnings, 7 lines checked NOTE: For some of the reported defects, checkpatch may be able to mechanically convert to the typical style using --fix or --fix-inplace. /github/workspace/src/src/13273948.patch has style problems, please review. NOTE: Ignored message types: UNKNOWN_COMMIT_ID NOTE: If any of the errors are false positives, please report them to the maintainer, see CHECKPATCH in MAINTAINERS. |
tedd_an/GitLint | success | Gitlint PASS |
tedd_an/SubjectPrefix | fail | "Bluetooth: " prefix is not specified in the subject |
tedd_an/BuildKernel | success | BuildKernel PASS |
tedd_an/CheckAllWarning | success | CheckAllWarning PASS |
tedd_an/CheckSparse | success | CheckSparse PASS |
tedd_an/CheckSmatch | success | CheckSparse PASS |
tedd_an/BuildKernel32 | success | BuildKernel32 PASS |
tedd_an/TestRunnerSetup | success | TestRunnerSetup PASS |
tedd_an/TestRunner_l2cap-tester | success | TestRunner PASS |
tedd_an/TestRunner_iso-tester | success | TestRunner PASS |
tedd_an/TestRunner_bnep-tester | success | TestRunner PASS |
tedd_an/TestRunner_mgmt-tester | success | TestRunner PASS |
tedd_an/TestRunner_rfcomm-tester | success | TestRunner PASS |
tedd_an/TestRunner_sco-tester | success | TestRunner PASS |
tedd_an/TestRunner_ioctl-tester | success | TestRunner PASS |
tedd_an/TestRunner_mesh-tester | success | TestRunner PASS |
tedd_an/TestRunner_smp-tester | success | TestRunner PASS |
tedd_an/TestRunner_userchan-tester | success | TestRunner PASS |
tedd_an/IncrementalBuild | success | Incremental Build PASS |
On Fri, Jun 09, 2023 at 09:18:15PM +0800, Xianting Tian wrote:
> From: Xianting Tian <tianxianting.txt@alibaba-inc.com>
>
> Cpu stall issue may happen if device is configured with multi queues
> and large queue depth, so fix it.
>
> Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
> ---
>  drivers/crypto/virtio/virtio_crypto_core.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/crypto/virtio/virtio_crypto_core.c b/drivers/crypto/virtio/virtio_crypto_core.c
> index 1198bd306365..94849fa3bd74 100644
> --- a/drivers/crypto/virtio/virtio_crypto_core.c
> +++ b/drivers/crypto/virtio/virtio_crypto_core.c
> @@ -480,6 +480,7 @@ static void virtcrypto_free_unused_reqs(struct virtio_crypto *vcrypto)
>  			kfree(vc_req->req_data);
>  			kfree(vc_req->sgs);
>  		}
> +		cond_resched();

that's not "fixing a stall", it is "call the scheduler because we are
taking too long".  The CPU isn't stalled at all, just busy.

Are you sure this isn't just a bug in the code?  Why is this code taking
so long that you have to force the scheduler to run?  This is almost
always a sign that something else needs to be fixed instead.

thanks,

greg k-h
On Fri, Jun 09, 2023 at 03:39:24PM +0200, Greg KH wrote:
> On Fri, Jun 09, 2023 at 09:18:15PM +0800, Xianting Tian wrote:
> > From: Xianting Tian <tianxianting.txt@alibaba-inc.com>
> >
> > Cpu stall issue may happen if device is configured with multi queues
> > and large queue depth, so fix it.
> >
> > Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
> > ---
> >  drivers/crypto/virtio/virtio_crypto_core.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/drivers/crypto/virtio/virtio_crypto_core.c b/drivers/crypto/virtio/virtio_crypto_core.c
> > index 1198bd306365..94849fa3bd74 100644
> > --- a/drivers/crypto/virtio/virtio_crypto_core.c
> > +++ b/drivers/crypto/virtio/virtio_crypto_core.c
> > @@ -480,6 +480,7 @@ static void virtcrypto_free_unused_reqs(struct virtio_crypto *vcrypto)
> >  			kfree(vc_req->req_data);
> >  			kfree(vc_req->sgs);
> >  		}
> > +		cond_resched();
>
> that's not "fixing a stall", it is "call the scheduler because we are
> taking too long".  The CPU isn't stalled at all, just busy.
>
> Are you sure this isn't just a bug in the code?  Why is this code taking
> so long that you have to force the scheduler to run?  This is almost
> always a sign that something else needs to be fixed instead.

And same comment on the other 2 patches, please fix this properly.

Also, this is a tight loop that is just freeing memory, why is it taking
so long?  Why do you want it to take longer (which is what you are doing
here), ideally it would be faster, not slower, so you are now slowing
down the system overall with this patchset, right?

thanks,

greg k-h
On 2023/6/9 9:41 PM, Greg KH wrote:
> On Fri, Jun 09, 2023 at 03:39:24PM +0200, Greg KH wrote:
>> On Fri, Jun 09, 2023 at 09:18:15PM +0800, Xianting Tian wrote:
>>> From: Xianting Tian <tianxianting.txt@alibaba-inc.com>
>>>
>>> Cpu stall issue may happen if device is configured with multi queues
>>> and large queue depth, so fix it.
>>>
>>> Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
>>> ---
>>>  drivers/crypto/virtio/virtio_crypto_core.c | 1 +
>>>  1 file changed, 1 insertion(+)
>>>
>>> diff --git a/drivers/crypto/virtio/virtio_crypto_core.c b/drivers/crypto/virtio/virtio_crypto_core.c
>>> index 1198bd306365..94849fa3bd74 100644
>>> --- a/drivers/crypto/virtio/virtio_crypto_core.c
>>> +++ b/drivers/crypto/virtio/virtio_crypto_core.c
>>> @@ -480,6 +480,7 @@ static void virtcrypto_free_unused_reqs(struct virtio_crypto *vcrypto)
>>>  			kfree(vc_req->req_data);
>>>  			kfree(vc_req->sgs);
>>>  		}
>>> +		cond_resched();
>> that's not "fixing a stall", it is "call the scheduler because we are
>> taking too long".  The CPU isn't stalled at all, just busy.
>>
>> Are you sure this isn't just a bug in the code?  Why is this code taking
>> so long that you have to force the scheduler to run?  This is almost
>> always a sign that something else needs to be fixed instead.
> And same comment on the other 2 patches, please fix this properly.
>
> Also, this is a tight loop that is just freeing memory, why is it taking
> so long?  Why do you want it to take longer (which is what you are doing
> here), ideally it would be faster, not slower, so you are now slowing
> down the system overall with this patchset, right?

Yes, it is a similar fix to the one for virtio-net:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/net/virtio_net.c?h=v6.4-rc5&id=f8bb5104394560e29017c25bcade4c6b7aabd108

>
> thanks,
>
> greg k-h
This is automated email and please do not reply to this email!

Dear submitter,

Thank you for submitting the patches to the linux bluetooth mailing list.
This is a CI test results with your patch series:
PW Link:https://patchwork.kernel.org/project/bluetooth/list/?series=755724

---Test result---

Test Summary:
CheckPatch                    FAIL      2.07 seconds
GitLint                       PASS      0.89 seconds
SubjectPrefix                 FAIL      0.60 seconds
BuildKernel                   PASS      31.52 seconds
CheckAllWarning               PASS      35.09 seconds
CheckSparse                   PASS      39.28 seconds
CheckSmatch                   PASS      110.54 seconds
BuildKernel32                 PASS      30.66 seconds
TestRunnerSetup               PASS      441.30 seconds
TestRunner_l2cap-tester       PASS      16.46 seconds
TestRunner_iso-tester         PASS      21.93 seconds
TestRunner_bnep-tester        PASS      5.33 seconds
TestRunner_mgmt-tester        PASS      112.39 seconds
TestRunner_rfcomm-tester      PASS      8.67 seconds
TestRunner_sco-tester         PASS      7.87 seconds
TestRunner_ioctl-tester       PASS      9.21 seconds
TestRunner_mesh-tester        PASS      6.80 seconds
TestRunner_smp-tester         PASS      7.86 seconds
TestRunner_userchan-tester    PASS      5.59 seconds
IncrementalBuild              PASS      37.02 seconds

Details
##############################
Test: CheckPatch - FAIL
Desc: Run checkpatch.pl script
Output:
[1/3] virtio-crypto: fixup potential cpu stall when free unused bufs
WARNING: From:/Signed-off-by: email address mismatch: 'From: Xianting Tian <tianxianting.txt@alibaba-inc.com>' != 'Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>'

total: 0 errors, 1 warnings, 7 lines checked

NOTE: For some of the reported defects, checkpatch may be able to
      mechanically convert to the typical style using --fix or --fix-inplace.

/github/workspace/src/src/13273948.patch has style problems, please review.

NOTE: Ignored message types: UNKNOWN_COMMIT_ID

NOTE: If any of the errors are false positives, please report
      them to the maintainer, see CHECKPATCH in MAINTAINERS.

##############################
Test: SubjectPrefix - FAIL
Desc: Check subject contains "Bluetooth" prefix
Output:
"Bluetooth: " prefix is not specified in the subject
"Bluetooth: " prefix is not specified in the subject
"Bluetooth: " prefix is not specified in the subject

---
Regards,
Linux Bluetooth
On Fri, Jun 09, 2023 at 09:49:39PM +0800, Xianting Tian wrote:
>
> On 2023/6/9 9:41 PM, Greg KH wrote:
> > On Fri, Jun 09, 2023 at 03:39:24PM +0200, Greg KH wrote:
> > > On Fri, Jun 09, 2023 at 09:18:15PM +0800, Xianting Tian wrote:
> > > > From: Xianting Tian <tianxianting.txt@alibaba-inc.com>
> > > >
> > > > Cpu stall issue may happen if device is configured with multi queues
> > > > and large queue depth, so fix it.
> > > >
> > > > Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
> > > > ---
> > > >  drivers/crypto/virtio/virtio_crypto_core.c | 1 +
> > > >  1 file changed, 1 insertion(+)
> > > >
> > > > diff --git a/drivers/crypto/virtio/virtio_crypto_core.c b/drivers/crypto/virtio/virtio_crypto_core.c
> > > > index 1198bd306365..94849fa3bd74 100644
> > > > --- a/drivers/crypto/virtio/virtio_crypto_core.c
> > > > +++ b/drivers/crypto/virtio/virtio_crypto_core.c
> > > > @@ -480,6 +480,7 @@ static void virtcrypto_free_unused_reqs(struct virtio_crypto *vcrypto)
> > > >  			kfree(vc_req->req_data);
> > > >  			kfree(vc_req->sgs);
> > > >  		}
> > > > +		cond_resched();
> > > that's not "fixing a stall", it is "call the scheduler because we are
> > > taking too long".  The CPU isn't stalled at all, just busy.
> > >
> > > Are you sure this isn't just a bug in the code?  Why is this code taking
> > > so long that you have to force the scheduler to run?  This is almost
> > > always a sign that something else needs to be fixed instead.
> > And same comment on the other 2 patches, please fix this properly.
> >
> > Also, this is a tight loop that is just freeing memory, why is it taking
> > so long?  Why do you want it to take longer (which is what you are doing
> > here), ideally it would be faster, not slower, so you are now slowing
> > down the system overall with this patchset, right?
>
> Yes, it is a similar fix to the one for virtio-net:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/net/virtio_net.c?h=v6.4-rc5&id=f8bb5104394560e29017c25bcade4c6b7aabd108

I would argue that this too is incorrect, because why does freeing
memory take so long?  And again, you are making it take longer, is that
ok?

thanks,

greg k-h
On 2023/6/9 10:05 PM, Greg KH wrote:
> On Fri, Jun 09, 2023 at 09:49:39PM +0800, Xianting Tian wrote:
>> On 2023/6/9 9:41 PM, Greg KH wrote:
>>> On Fri, Jun 09, 2023 at 03:39:24PM +0200, Greg KH wrote:
>>>> On Fri, Jun 09, 2023 at 09:18:15PM +0800, Xianting Tian wrote:
>>>>> From: Xianting Tian <tianxianting.txt@alibaba-inc.com>
>>>>>
>>>>> Cpu stall issue may happen if device is configured with multi queues
>>>>> and large queue depth, so fix it.
>>>>>
>>>>> Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
>>>>> ---
>>>>>  drivers/crypto/virtio/virtio_crypto_core.c | 1 +
>>>>>  1 file changed, 1 insertion(+)
>>>>>
>>>>> diff --git a/drivers/crypto/virtio/virtio_crypto_core.c b/drivers/crypto/virtio/virtio_crypto_core.c
>>>>> index 1198bd306365..94849fa3bd74 100644
>>>>> --- a/drivers/crypto/virtio/virtio_crypto_core.c
>>>>> +++ b/drivers/crypto/virtio/virtio_crypto_core.c
>>>>> @@ -480,6 +480,7 @@ static void virtcrypto_free_unused_reqs(struct virtio_crypto *vcrypto)
>>>>>  			kfree(vc_req->req_data);
>>>>>  			kfree(vc_req->sgs);
>>>>>  		}
>>>>> +		cond_resched();
>>>> that's not "fixing a stall", it is "call the scheduler because we are
>>>> taking too long".  The CPU isn't stalled at all, just busy.
>>>>
>>>> Are you sure this isn't just a bug in the code?  Why is this code taking
>>>> so long that you have to force the scheduler to run?  This is almost
>>>> always a sign that something else needs to be fixed instead.
>>> And same comment on the other 2 patches, please fix this properly.
>>>
>>> Also, this is a tight loop that is just freeing memory, why is it taking
>>> so long?  Why do you want it to take longer (which is what you are doing
>>> here), ideally it would be faster, not slower, so you are now slowing
>>> down the system overall with this patchset, right?
>> Yes, it is a similar fix to the one for virtio-net:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/net/virtio_net.c?h=v6.4-rc5&id=f8bb5104394560e29017c25bcade4c6b7aabd108
> I would argue that this too is incorrect, because why does freeing
> memory take so long?  And again, you are making it take longer, is that
> ok?

Yes, it may take longer, but I think it does no harm.
The number of queues and the depth of each queue are not fixed; they
depend on the user's configuration. Freeing all queues without
scheduling may therefore keep the CPU in kernel space for a long time,
which risks starving other tasks.

>
> thanks,
>
> greg k-h
On Fri, Jun 09, 2023 at 09:18:15PM +0800, Xianting Tian wrote:
> From: Xianting Tian <tianxianting.txt@alibaba-inc.com>
>
> Cpu stall issue may happen if device is configured with multi queues
> and large queue depth, so fix it.

What does "may happen" imply exactly?
was this observed?

> Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
> ---
>  drivers/crypto/virtio/virtio_crypto_core.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/crypto/virtio/virtio_crypto_core.c b/drivers/crypto/virtio/virtio_crypto_core.c
> index 1198bd306365..94849fa3bd74 100644
> --- a/drivers/crypto/virtio/virtio_crypto_core.c
> +++ b/drivers/crypto/virtio/virtio_crypto_core.c
> @@ -480,6 +480,7 @@ static void virtcrypto_free_unused_reqs(struct virtio_crypto *vcrypto)
>  			kfree(vc_req->req_data);
>  			kfree(vc_req->sgs);
>  		}
> +		cond_resched();
>  	}
>  }
>
> --
> 2.17.1
On Fri, Jun 09, 2023 at 09:18:15PM +0800, Xianting Tian wrote:
> From: Xianting Tian <tianxianting.txt@alibaba-inc.com>
>
> Cpu stall issue may happen if device is configured with multi queues
> and large queue depth, so fix it.
>
> Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>

include a Fixes tag?

> ---
>  drivers/crypto/virtio/virtio_crypto_core.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/crypto/virtio/virtio_crypto_core.c b/drivers/crypto/virtio/virtio_crypto_core.c
> index 1198bd306365..94849fa3bd74 100644
> --- a/drivers/crypto/virtio/virtio_crypto_core.c
> +++ b/drivers/crypto/virtio/virtio_crypto_core.c
> @@ -480,6 +480,7 @@ static void virtcrypto_free_unused_reqs(struct virtio_crypto *vcrypto)
>  			kfree(vc_req->req_data);
>  			kfree(vc_req->sgs);
>  		}
> +		cond_resched();
>  	}
>  }
>
> --
> 2.17.1
On Fri, Jun 09, 2023 at 04:05:57PM +0200, Greg KH wrote:
> On Fri, Jun 09, 2023 at 09:49:39PM +0800, Xianting Tian wrote:
> >
> > On 2023/6/9 9:41 PM, Greg KH wrote:
> > > On Fri, Jun 09, 2023 at 03:39:24PM +0200, Greg KH wrote:
> > > > On Fri, Jun 09, 2023 at 09:18:15PM +0800, Xianting Tian wrote:
> > > > > From: Xianting Tian <tianxianting.txt@alibaba-inc.com>
> > > > >
> > > > > Cpu stall issue may happen if device is configured with multi queues
> > > > > and large queue depth, so fix it.
> > > > >
> > > > > Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
> > > > > ---
> > > > >  drivers/crypto/virtio/virtio_crypto_core.c | 1 +
> > > > >  1 file changed, 1 insertion(+)
> > > > >
> > > > > diff --git a/drivers/crypto/virtio/virtio_crypto_core.c b/drivers/crypto/virtio/virtio_crypto_core.c
> > > > > index 1198bd306365..94849fa3bd74 100644
> > > > > --- a/drivers/crypto/virtio/virtio_crypto_core.c
> > > > > +++ b/drivers/crypto/virtio/virtio_crypto_core.c
> > > > > @@ -480,6 +480,7 @@ static void virtcrypto_free_unused_reqs(struct virtio_crypto *vcrypto)
> > > > >  			kfree(vc_req->req_data);
> > > > >  			kfree(vc_req->sgs);
> > > > >  		}
> > > > > +		cond_resched();
> > > > that's not "fixing a stall", it is "call the scheduler because we are
> > > > taking too long".  The CPU isn't stalled at all, just busy.
> > > >
> > > > Are you sure this isn't just a bug in the code?  Why is this code taking
> > > > so long that you have to force the scheduler to run?  This is almost
> > > > always a sign that something else needs to be fixed instead.
> > > And same comment on the other 2 patches, please fix this properly.
> > >
> > > Also, this is a tight loop that is just freeing memory, why is it taking
> > > so long?  Why do you want it to take longer (which is what you are doing
> > > here), ideally it would be faster, not slower, so you are now slowing
> > > down the system overall with this patchset, right?
> >
> > Yes, it is a similar fix to the one for virtio-net:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/net/virtio_net.c?h=v6.4-rc5&id=f8bb5104394560e29017c25bcade4c6b7aabd108

Well that one actually at least describes the configuration:

    For multi-queue and large ring-size use case, the following error
    occurred when free_unused_bufs:
    rcu: INFO: rcu_sched self-detected stall on CPU.

So a similar fix but not a similar commit log: this one lacks a Fixes
tag and a description of what the problem is and when it triggers.

> I would argue that this too is incorrect, because why does freeing
> memory take so long?

You are correct that even that one lacks a detailed explanation of why
the patch helps. And the explanation of why it takes so long is exactly
that we have very deep queues and a very large number of queues. What
the patch does is give the scheduler a chance to do some work between
the queues.

> And again, you are making it take longer, is that
> ok?
>
> thanks,
>
> greg k-h
On 2023/6/9 11:57 PM, Michael S. Tsirkin wrote:
> On Fri, Jun 09, 2023 at 09:18:15PM +0800, Xianting Tian wrote:
>> From: Xianting Tian <tianxianting.txt@alibaba-inc.com>
>>
>> Cpu stall issue may happen if device is configured with multi queues
>> and large queue depth, so fix it.
> What does "may happen" imply exactly?
> was this observed?

I have not hit this issue myself; this patch set is just a theoretical fix.

>
>> Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
>> ---
>>  drivers/crypto/virtio/virtio_crypto_core.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/crypto/virtio/virtio_crypto_core.c b/drivers/crypto/virtio/virtio_crypto_core.c
>> index 1198bd306365..94849fa3bd74 100644
>> --- a/drivers/crypto/virtio/virtio_crypto_core.c
>> +++ b/drivers/crypto/virtio/virtio_crypto_core.c
>> @@ -480,6 +480,7 @@ static void virtcrypto_free_unused_reqs(struct virtio_crypto *vcrypto)
>>  			kfree(vc_req->req_data);
>>  			kfree(vc_req->sgs);
>>  		}
>> +		cond_resched();
>>  	}
>>  }
>>
>> --
>> 2.17.1
On Sat, Jun 10, 2023 at 11:20:49AM +0800, Xianting Tian wrote:
>
> On 2023/6/9 11:57 PM, Michael S. Tsirkin wrote:
> > On Fri, Jun 09, 2023 at 09:18:15PM +0800, Xianting Tian wrote:
> > > From: Xianting Tian <tianxianting.txt@alibaba-inc.com>
> > >
> > > Cpu stall issue may happen if device is configured with multi queues
> > > and large queue depth, so fix it.
> > What does "may happen" imply exactly?
> > was this observed?
> I have not hit this issue myself; this patch set is just a theoretical fix.

Then I would not recommend adding it at this time, as you just slowed
down the kernel for something that no one has reported :(

thanks,

greg k-h
diff --git a/drivers/crypto/virtio/virtio_crypto_core.c b/drivers/crypto/virtio/virtio_crypto_core.c
index 1198bd306365..94849fa3bd74 100644
--- a/drivers/crypto/virtio/virtio_crypto_core.c
+++ b/drivers/crypto/virtio/virtio_crypto_core.c
@@ -480,6 +480,7 @@ static void virtcrypto_free_unused_reqs(struct virtio_crypto *vcrypto)
 			kfree(vc_req->req_data);
 			kfree(vc_req->sgs);
 		}
+		cond_resched();
 	}
 }
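
For readers who want to see the shape of the loop this one-line change touches, here is a minimal userspace sketch, not the driver code. It assumes a simplified model: `struct request`, `struct queue`, `detach_unused_buf()` and `fill_queue()` are hypothetical stand-ins invented for illustration, and POSIX `sched_yield()` stands in for `cond_resched()`. The two are not equivalent: `cond_resched()` only reschedules when a reschedule is already pending, while `sched_yield()` always gives up the CPU. The sketch only shows where the yield point sits, once per queue rather than once per buffer.

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a pending crypto request. */
struct request {
	void *req_data;
	void *sgs;
	struct request *next;
};

/* Hypothetical stand-in for a virtqueue holding unused buffers. */
struct queue {
	struct request *head;
};

/* Plays the role of virtqueue_detach_unused_buf(): pop one request, or NULL. */
static struct request *detach_unused_buf(struct queue *q)
{
	struct request *r = q->head;

	if (r)
		q->head = r->next;
	return r;
}

/* Test helper: push `depth` requests; 0 on success, -1 if calloc fails. */
static int fill_queue(struct queue *q, size_t depth)
{
	for (size_t i = 0; i < depth; i++) {
		struct request *r = calloc(1, sizeof(*r));

		if (!r)
			return -1;
		r->req_data = malloc(16);
		r->sgs = malloc(16);
		r->next = q->head;
		q->head = r;
	}
	return 0;
}

/*
 * Drain every queue. Total work is the sum of all queue depths, so it
 * grows with both the number of queues and the depth of each one.
 * Returns the number of requests freed.
 */
static size_t free_unused_reqs(struct queue *queues, size_t nr_queues)
{
	size_t freed = 0;

	for (size_t i = 0; i < nr_queues; i++) {
		struct request *r;

		while ((r = detach_unused_buf(&queues[i])) != NULL) {
			free(r->req_data);
			free(r->sgs);
			free(r); /* this model owns the request itself too */
			freed++;
		}
		/* Yield point between queues (userspace analog of cond_resched()). */
		sched_yield();
	}
	return freed;
}
```

Calling `free_unused_reqs()` on three queues holding 4, 0 and 5 requests returns 9 and leaves every queue empty; the yield runs three times, once after each queue, regardless of how deep the queues are.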