Message ID | 20230808104239.146085-2-ming.lei@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Series | blk-mq: fix wrong queue mapping for kdump kernel |
I'm starting to sound like a broken record, but we can't just do random is_kdump checks, and it's not going to get better by resending it again and again. If kdump kernels limit the number of possible CPUs, it needs to be reflected in cpu_possible_map and we need to use that information.
On Wed, Aug 09, 2023 at 03:44:01PM +0200, Christoph Hellwig wrote:
> I'm starting to sound like a broken record, but we can't just do random
> is_kdump checks, and it's not going to get better by resending it again and
> again. If kdump kernels limit the number of possible CPUs, it needs to be
> reflected in cpu_possible_map and we need to use that information.
>

Can you look at previous kdump/arch guys' comment about kdump usage &
num_possible_cpus?

https://lore.kernel.org/linux-block/CAF+s44RuqswbosY9kMDx35crviQnxOeuvgNsuE75Bb0Y2Jg2uw@mail.gmail.com/
https://lore.kernel.org/linux-block/ZKz912KyFQ7q9qwL@MiWiFi-R3L-srv/

The point is that kdump kernels do not limit the number of possible CPUs.

1) some archs support 'nr_cpus=1' for kdump kernel, which is fine, since
num_possible_cpus becomes 1.

2) some archs do not support 'nr_cpus=1', and have to rely on
'maxcpus=1', so num_possible_cpus isn't changed, and kernel just boots
with single online cpu. That causes trouble because blk-mq limits itself
to a single queue.

Documentation/admin-guide/kdump/kdump.rst

Thanks,
Ming
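The two cases above can be sketched in userspace C. This is only a minimal model under stated assumptions, not kernel code: `struct cpu_counts` and `model_boot()` are hypothetical names, `possible` stands in for num_possible_cpus(), `online` stands in for the number of CPUs brought up at boot, and a parameter value of 0 means "not given" in this model.

```c
#include <assert.h>

/* Hypothetical model of the CPU counts a booted kernel ends up with. */
struct cpu_counts {
	unsigned int possible;	/* stands in for num_possible_cpus() */
	unsigned int online;	/* CPUs brought up during boot */
};

/*
 * hw_cpus: CPUs physically present in the machine.
 * nr_cpus: modeled "nr_cpus=" value, 0 if not given (model assumption).
 * maxcpus: modeled "maxcpus=" value, 0 if not given (model assumption).
 */
struct cpu_counts model_boot(unsigned int hw_cpus, unsigned int nr_cpus,
			     unsigned int maxcpus)
{
	struct cpu_counts c;

	assert(hw_cpus > 0);

	/* "nr_cpus=" caps the number of possible CPUs. */
	c.possible = hw_cpus;
	if (nr_cpus && nr_cpus < c.possible)
		c.possible = nr_cpus;

	/* "maxcpus=" only caps how many CPUs are brought up at boot. */
	c.online = c.possible;
	if (maxcpus && maxcpus < c.online)
		c.online = maxcpus;

	return c;
}
```

In this model, `model_boot(8, 1, 0)` corresponds to case 1 (possible and online are both 1), while `model_boot(8, 0, 1)` corresponds to case 2 (possible stays at 8 with a single online CPU), which is the situation where the driver and blk-mq can disagree about the queue count.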
On 08/10/23 at 08:09am, Ming Lei wrote:
> On Wed, Aug 09, 2023 at 03:44:01PM +0200, Christoph Hellwig wrote:
> > I'm starting to sound like a broken record, but we can't just do random
> > is_kdump checks, and it's not going to get better by resending it again and
> > again. If kdump kernels limit the number of possible CPUs, it needs to be
> > reflected in cpu_possible_map and we need to use that information.
> >
>
> Can you look at previous kdump/arch guys' comment about kdump usage &
> num_possible_cpus?
>
> https://lore.kernel.org/linux-block/CAF+s44RuqswbosY9kMDx35crviQnxOeuvgNsuE75Bb0Y2Jg2uw@mail.gmail.com/
> https://lore.kernel.org/linux-block/ZKz912KyFQ7q9qwL@MiWiFi-R3L-srv/
>
> The point is that kdump kernels do not limit the number of possible CPUs.
>
> 1) some archs support 'nr_cpus=1' for kdump kernel, which is fine, since
> num_possible_cpus becomes 1.

Yes, "nr_cpus=" is strongly suggested in kdump kernel because "nr_cpus="
limits the possible cpu numbers, while "maxcpus=" only limits the cpu
number which can be brought up during bootup. We noticed this difference
because a large number of possible cpus will cost more memory in kdump
kernel, e.g. percpu initialization, even though kdump kernel has set
"maxcpus=1".

Currently x86 and arm64 all support "nr_cpus=". Pingfan once spent much
effort to make patches to add "nr_cpus=" support to ppc64, seems ppc64
dev and maintainers do not care about it. Finally the patches were not
accepted, and the work was not continued.

Now, I am wondering what is the barrier to add "nr_cpus=" to power arch.
Can we reconsider adding 'nr_cpus=' to power arch since real issue
occurred in kdump kernel?

As for this patchset, can it be accepted so that no failure in kdump
kernel is seen on ARCHes w/o "nr_cpus=" support? My personal opinion.

>
> 2) some archs do not support 'nr_cpus=1', and have to rely on
> 'maxcpus=1', so num_possible_cpus isn't changed, and kernel just boots
> with single online cpu. That causes trouble because blk-mq limits itself
> to a single queue.
>
> Documentation/admin-guide/kdump/kdump.rst
>
> Thanks,
> Ming
>
On Thu, Aug 10, 2023 at 09:18:27AM +0800, Baoquan He wrote:
> On 08/10/23 at 08:09am, Ming Lei wrote:
> > On Wed, Aug 09, 2023 at 03:44:01PM +0200, Christoph Hellwig wrote:
> > > I'm starting to sound like a broken record, but we can't just do random
> > > is_kdump checks, and it's not going to get better by resending it again and
> > > again. If kdump kernels limit the number of possible CPUs, it needs to be
> > > reflected in cpu_possible_map and we need to use that information.
> > >
> >
> > Can you look at previous kdump/arch guys' comment about kdump usage &
> > num_possible_cpus?
> >
> > https://lore.kernel.org/linux-block/CAF+s44RuqswbosY9kMDx35crviQnxOeuvgNsuE75Bb0Y2Jg2uw@mail.gmail.com/
> > https://lore.kernel.org/linux-block/ZKz912KyFQ7q9qwL@MiWiFi-R3L-srv/
> >
> > The point is that kdump kernels do not limit the number of possible CPUs.
> >
> > 1) some archs support 'nr_cpus=1' for kdump kernel, which is fine, since
> > num_possible_cpus becomes 1.
>
> Yes, "nr_cpus=" is strongly suggested in kdump kernel because "nr_cpus="
> limits the possible cpu numbers, while "maxcpus=" only limits the cpu
> number which can be brought up during bootup. We noticed this difference
> because a large number of possible cpus will cost more memory in kdump
> kernel, e.g. percpu initialization, even though kdump kernel has set
> "maxcpus=1".
>
> Currently x86 and arm64 all support "nr_cpus=". Pingfan once spent much
> effort to make patches to add "nr_cpus=" support to ppc64, seems ppc64
> dev and maintainers do not care about it. Finally the patches were not
> accepted, and the work was not continued.
>
> Now, I am wondering what is the barrier to add "nr_cpus=" to power arch.
> Can we reconsider adding 'nr_cpus=' to power arch since real issue
> occurred in kdump kernel?

If 'nr_cpus=' can be supported on ppc64, this patchset isn't needed.

>
> As for this patchset, can it be accepted so that no failure in kdump
> kernel is seen on ARCHes w/o "nr_cpus=" support? My personal opinion.

IMO 'nr_cpus=' support should be preferred, given it is annoying to
maintain two kinds of implementation for kdump kernel from driver
viewpoint. I guess kdump things can be simplified too with supporting
'nr_cpus=' only.

thanks,
Ming
On 08/10/23 at 10:06am, Ming Lei wrote:
> On Thu, Aug 10, 2023 at 09:18:27AM +0800, Baoquan He wrote:
> > On 08/10/23 at 08:09am, Ming Lei wrote:
> > > On Wed, Aug 09, 2023 at 03:44:01PM +0200, Christoph Hellwig wrote:
> > > > I'm starting to sound like a broken record, but we can't just do random
> > > > is_kdump checks, and it's not going to get better by resending it again and
> > > > again. If kdump kernels limit the number of possible CPUs, it needs to be
> > > > reflected in cpu_possible_map and we need to use that information.
> > > >
> > >
> > > Can you look at previous kdump/arch guys' comment about kdump usage &
> > > num_possible_cpus?
> > >
> > > https://lore.kernel.org/linux-block/CAF+s44RuqswbosY9kMDx35crviQnxOeuvgNsuE75Bb0Y2Jg2uw@mail.gmail.com/
> > > https://lore.kernel.org/linux-block/ZKz912KyFQ7q9qwL@MiWiFi-R3L-srv/
> > >
> > > The point is that kdump kernels do not limit the number of possible CPUs.
> > >
> > > 1) some archs support 'nr_cpus=1' for kdump kernel, which is fine, since
> > > num_possible_cpus becomes 1.
> >
> > Yes, "nr_cpus=" is strongly suggested in kdump kernel because "nr_cpus="
> > limits the possible cpu numbers, while "maxcpus=" only limits the cpu
> > number which can be brought up during bootup. We noticed this difference
> > because a large number of possible cpus will cost more memory in kdump
> > kernel, e.g. percpu initialization, even though kdump kernel has set
> > "maxcpus=1".
> >
> > Currently x86 and arm64 all support "nr_cpus=". Pingfan once spent much
> > effort to make patches to add "nr_cpus=" support to ppc64, seems ppc64
> > dev and maintainers do not care about it. Finally the patches were not
> > accepted, and the work was not continued.
> >
> > Now, I am wondering what is the barrier to add "nr_cpus=" to power arch.
> > Can we reconsider adding 'nr_cpus=' to power arch since real issue
> > occurred in kdump kernel?
>
> If 'nr_cpus=' can be supported on ppc64, this patchset isn't needed.
>
> >
> > As for this patchset, can it be accepted so that no failure in kdump
> > kernel is seen on ARCHes w/o "nr_cpus=" support? My personal opinion.
>
> IMO 'nr_cpus=' support should be preferred, given it is annoying to
> maintain two kinds of implementation for kdump kernel from driver
> viewpoint. I guess kdump things can be simplified too with supporting
> 'nr_cpus=' only.

Yes, 'nr_cpus=' is ideal. Not sure if there are some underlying concerns so
that power people decided to not support it.
On 10/08/23 8:31 am, Baoquan He wrote:
> On 08/10/23 at 10:06am, Ming Lei wrote:
>> On Thu, Aug 10, 2023 at 09:18:27AM +0800, Baoquan He wrote:
>>> On 08/10/23 at 08:09am, Ming Lei wrote:
>>>> On Wed, Aug 09, 2023 at 03:44:01PM +0200, Christoph Hellwig wrote:
>>>>> I'm starting to sound like a broken record, but we can't just do random
>>>>> is_kdump checks, and it's not going to get better by resending it again and
>>>>> again. If kdump kernels limit the number of possible CPUs, it needs to be
>>>>> reflected in cpu_possible_map and we need to use that information.
>>>>>
>>>>
>>>> Can you look at previous kdump/arch guys' comment about kdump usage &
>>>> num_possible_cpus?
>>>>
>>>> https://lore.kernel.org/linux-block/CAF+s44RuqswbosY9kMDx35crviQnxOeuvgNsuE75Bb0Y2Jg2uw@mail.gmail.com/
>>>> https://lore.kernel.org/linux-block/ZKz912KyFQ7q9qwL@MiWiFi-R3L-srv/
>>>>
>>>> The point is that kdump kernels do not limit the number of possible CPUs.
>>>>
>>>> 1) some archs support 'nr_cpus=1' for kdump kernel, which is fine, since
>>>> num_possible_cpus becomes 1.
>>>
>>> Yes, "nr_cpus=" is strongly suggested in kdump kernel because "nr_cpus="
>>> limits the possible cpu numbers, while "maxcpus=" only limits the cpu
>>> number which can be brought up during bootup. We noticed this difference
>>> because a large number of possible cpus will cost more memory in kdump
>>> kernel, e.g. percpu initialization, even though kdump kernel has set
>>> "maxcpus=1".
>>>
>>> Currently x86 and arm64 all support "nr_cpus=". Pingfan once spent much
>>> effort to make patches to add "nr_cpus=" support to ppc64, seems ppc64
>>> dev and maintainers do not care about it. Finally the patches were not
>>> accepted, and the work was not continued.
>>>
>>> Now, I am wondering what is the barrier to add "nr_cpus=" to power arch.
>>> Can we reconsider adding 'nr_cpus=' to power arch since real issue
>>> occurred in kdump kernel?
>>
>> If 'nr_cpus=' can be supported on ppc64, this patchset isn't needed.
>>
>>>
>>> As for this patchset, can it be accepted so that no failure in kdump
>>> kernel is seen on ARCHes w/o "nr_cpus=" support? My personal opinion.
>>
>> IMO 'nr_cpus=' support should be preferred, given it is annoying to
>> maintain two kinds of implementation for kdump kernel from driver
>> viewpoint. I guess kdump things can be simplified too with supporting
>> 'nr_cpus=' only.
>
> Yes, 'nr_cpus=' is ideal. Not sure if there are some underlying concerns so
> that power people decided to not support it.

Though "nr_cpus=1" is an ideal solution, the maintainer was not happy with
the patch, as the code changes have an impact on the regular boot path and
are likely to cause breakages. So, even if "nr_cpus=1" support for
ppc64 is revived, the change is going to take time to be accepted
upstream.

Also, I see is_kdump_kernel() being used irrespective of "nr_cpus=1"
support for other optimizations in the driver for the special dump
capture environment kdump is.

If there is no other downside for driver code to using is_kdump_kernel(),
other than the maintainability aspect, I think the above changes are
worth considering.

Thanks
Hari
On Thu, Aug 10, 2023 at 08:09:27AM +0800, Ming Lei wrote:
> 1) some archs support 'nr_cpus=1' for kdump kernel, which is fine, since
> num_possible_cpus becomes 1.
>
> 2) some archs do not support 'nr_cpus=1', and have to rely on
> 'maxcpus=1', so num_possible_cpus isn't changed, and kernel just boots
> with single online cpu. That causes trouble because blk-mq limits itself
> to a single queue.

And we need to fix case 2. We need to drop the is_kdump support, and
if they want to force fewer cpus they need to make nr_cpus=1 work.
Hi Hari, Michael

On 08/11/23 at 01:23pm, Hari Bathini wrote:
>
>
> On 10/08/23 8:31 am, Baoquan He wrote:
> > On 08/10/23 at 10:06am, Ming Lei wrote:
> > > On Thu, Aug 10, 2023 at 09:18:27AM +0800, Baoquan He wrote:
> > > > On 08/10/23 at 08:09am, Ming Lei wrote:
> > > > > On Wed, Aug 09, 2023 at 03:44:01PM +0200, Christoph Hellwig wrote:
> > > > > > I'm starting to sound like a broken record, but we can't just do random
> > > > > > is_kdump checks, and it's not going to get better by resending it again and
> > > > > > again. If kdump kernels limit the number of possible CPUs, it needs to be
> > > > > > reflected in cpu_possible_map and we need to use that information.
> > > > > >
> > > > >
> > > > > Can you look at previous kdump/arch guys' comment about kdump usage &
> > > > > num_possible_cpus?
> > > > >
> > > > > https://lore.kernel.org/linux-block/CAF+s44RuqswbosY9kMDx35crviQnxOeuvgNsuE75Bb0Y2Jg2uw@mail.gmail.com/
> > > > > https://lore.kernel.org/linux-block/ZKz912KyFQ7q9qwL@MiWiFi-R3L-srv/
> > > > >
> > > > > The point is that kdump kernels do not limit the number of possible CPUs.
> > > > >
> > > > > 1) some archs support 'nr_cpus=1' for kdump kernel, which is fine, since
> > > > > num_possible_cpus becomes 1.
> > > >
> > > > Yes, "nr_cpus=" is strongly suggested in kdump kernel because "nr_cpus="
> > > > limits the possible cpu numbers, while "maxcpus=" only limits the cpu
> > > > number which can be brought up during bootup. We noticed this difference
> > > > because a large number of possible cpus will cost more memory in kdump
> > > > kernel, e.g. percpu initialization, even though kdump kernel has set
> > > > "maxcpus=1".
> > > >
> > > > Currently x86 and arm64 all support "nr_cpus=". Pingfan once spent much
> > > > effort to make patches to add "nr_cpus=" support to ppc64, seems ppc64
> > > > dev and maintainers do not care about it. Finally the patches were not
> > > > accepted, and the work was not continued.
> > > >
> > > > Now, I am wondering what is the barrier to add "nr_cpus=" to power arch.
> > > > Can we reconsider adding 'nr_cpus=' to power arch since real issue
> > > > occurred in kdump kernel?
> > >
> > > If 'nr_cpus=' can be supported on ppc64, this patchset isn't needed.
> > >
> > > >
> > > > As for this patchset, can it be accepted so that no failure in kdump
> > > > kernel is seen on ARCHes w/o "nr_cpus=" support? My personal opinion.
> > >
> > > IMO 'nr_cpus=' support should be preferred, given it is annoying to
> > > maintain two kinds of implementation for kdump kernel from driver
> > > viewpoint. I guess kdump things can be simplified too with supporting
> > > 'nr_cpus=' only.
> >
> > Yes, 'nr_cpus=' is ideal. Not sure if there are some underlying concerns so
> > that power people decided to not support it.
>
> Though "nr_cpus=1" is an ideal solution, the maintainer was not happy with
> the patch, as the code changes have an impact on the regular boot path and
> are likely to cause breakages. So, even if "nr_cpus=1" support for
> ppc64 is revived, the change is going to take time to be accepted
> upstream.

I talked to Pingfan recently. He said he posted patches to add 'nr_cpus='
support to powerpc in order to reduce the memory needed by the kdump kernel.
His patches were rejected because the maintainer thought that reason was
not sufficient. So up to now, among the architectures for which Fedora/RHEL
provides a default crashkernel reservation value, powerpc costs the
most. Now with this emerging issue, can we reconsider supporting
'nr_cpus=' in powerpc?

>
> Also, I see is_kdump_kernel() being used irrespective of "nr_cpus=1"
> support for other optimizations in the driver for the special dump
> capture environment kdump is.
>
> If there is no other downside for driver code to using is_kdump_kernel(),
> other than the maintainability aspect, I think the above changes are
> worth considering.

Hi Hari,

By the way, will you use the ppc specific is_kdump_kernel() and
is_crashdump_kernel() in your patches to fix this issue?

Thanks
Baoquan
diff --git a/block/blk-mq.c b/block/blk-mq.c
index b04ff6f56926..617d6f849a7b 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -140,6 +140,22 @@ void blk_mq_freeze_queue_wait(struct request_queue *q)
 }
 EXPORT_SYMBOL_GPL(blk_mq_freeze_queue_wait);
 
+/*
+ * Return the max supported nr_hw_queues for each hw queue type
+ *
+ * blk_mq_alloc_tag_set() may change nr_hw_queues for kdump kernel, so
+ * driver has to take blk-mq max supported nr_hw_queues into account
+ * when figuring out nr_hw_queues from hardware info, for avoiding
+ * inconsistency between driver and blk-mq.
+ */
+unsigned int blk_mq_max_nr_hw_queues(void)
+{
+	if (is_kdump_kernel())
+		return 1;
+	return nr_cpu_ids;
+}
+EXPORT_SYMBOL_GPL(blk_mq_max_nr_hw_queues);
+
 int blk_mq_freeze_queue_wait_timeout(struct request_queue *q,
 				     unsigned long timeout)
 {
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 495ca198775f..4c0cfd1f9e52 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -711,6 +711,7 @@ int blk_mq_alloc_sq_tag_set(struct blk_mq_tag_set *set,
 		const struct blk_mq_ops *ops, unsigned int queue_depth,
 		unsigned int set_flags);
 void blk_mq_free_tag_set(struct blk_mq_tag_set *set);
+unsigned int blk_mq_max_nr_hw_queues(void);
 
 void blk_mq_free_request(struct request *rq);
 int blk_rq_poll(struct request *rq, struct io_comp_batch *iob,
blk_mq_alloc_tag_set() may override set->nr_hw_queues to 1 in a kdump
kernel. This causes trouble for drivers, because blk-mq and the driver
then see different queue mappings.

In particular, when 'maxcpus=1' is passed on the kernel command line, the
only online CPU in the kdump kernel may not be CPU 0, so the driver may
map hctx0 to an inactive real hw queue whose cpu affinity is CPU 0
(offline).

The issue exists in all drivers that use managed irqs and support
multiple hw queues.

Prepare for fixing this kind of issue by adding a helper, so that drivers
can take blk-mq's max nr_hw_queues into account when calculating io queues.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq.c         | 16 ++++++++++++++++
 include/linux/blk-mq.h |  1 +
 2 files changed, 17 insertions(+)
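How a driver might take the helper into account can be sketched in userspace C. This is only an illustration under stated assumptions, not code from this series: `struct sys_state`, `model_max_nr_hw_queues()` and `model_driver_nr_queues()` are hypothetical names, with the struct fields standing in for is_kdump_kernel() and nr_cpu_ids.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state; not kernel APIs. */
struct sys_state {
	bool kdump;			/* models is_kdump_kernel() */
	unsigned int nr_cpu_ids;	/* models nr_cpu_ids */
};

/* Mirrors the logic of the helper added by the patch, on modeled state. */
unsigned int model_max_nr_hw_queues(const struct sys_state *s)
{
	if (s->kdump)
		return 1;
	return s->nr_cpu_ids;
}

/*
 * Hypothetical driver-side calculation: clamp the queue count reported
 * by the hardware to what blk-mq will actually use, so both sides agree
 * on the queue mapping.
 */
unsigned int model_driver_nr_queues(const struct sys_state *s,
				    unsigned int hw_queues)
{
	unsigned int max = model_max_nr_hw_queues(s);

	assert(hw_queues > 0);
	return hw_queues < max ? hw_queues : max;
}
```

With this clamp, a modeled kdump kernel always yields a single queue no matter how many queues the hardware offers, while a regular boot is limited only by the modeled nr_cpu_ids.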