Message ID: 1561776155-38975-1-git-send-email-wangxiongfeng2@huawei.com (mailing list archive)
Series: Support CPU hotplug for ARM64
Hi guys,

(CC: +kvmarm list)

On 29/06/2019 03:42, Xiongfeng Wang wrote:
> This patchset marks all the GICC nodes in the MADT as possible CPUs, even those that
> are disabled. Only the enabled GICC nodes are marked as present CPUs, so the kernel
> will initialise some CPU-related data structures in advance, before the CPU is
> actually hot-added into the system. This patchset also implements
> 'acpi_(un)map_cpu()' and 'arch_(un)register_cpu()' for ARM64. These functions are
> needed to enable CPU hotplug.
>
> To support CPU hotplug, we need to add all the possible GICC nodes to the MADT,
> including those CPUs that are not present but may be hot-added later. Those CPUs
> are marked as disabled in their GICC nodes.

... what do you need this for?

(The term cpu-hotplug in the arm world almost never means hot-adding a new package/die
to the platform; we usually mean taking CPUs online/offline for power management, e.g.
cpuhp_offline_cpu_device().)

It looks like you're adding support for hot-adding a new package/die to the platform ...
but only for virtualisation.

I don't see why this is needed for virtualisation. The in-kernel irqchip needs to know
these vcpus exist before you can enter the guest for the first time. You can't create
them late. At best you're saving the host scheduling a vcpu that is offline. Is this
really a problem?

If we moved PSCI support to user-space, you could avoid creating host vcpu threads
until the guest brings the vcpu online, which would solve that problem and save the
host the resources for the thread too. (And it's acpi/dt agnostic.)

I don't see the difference here between booting the guest with 'maxcpus=1' and bringing
the vcpu online later. The only real difference seems to be moving the can-be-online
policy into the hypervisor/VMM...

I think physical package/die hot-add is a much bigger, uglier problem than doing the
same under virtualisation. It's best to do this on real hardware first so we don't miss
something (cpu-topology, numa, memory, errata, timers?). I'm worried that doing
virtualisation first means the firmware requirements for physical hot-add become
"whatever Qemu does".

Thanks,

James
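To make the possible/present split described above concrete, here is a minimal sketch.
It is not the actual patch code: every MADT GICC entry claims a logical CPU slot
("possible") so per-CPU data can be initialised early, but only entries with the
Enabled flag set become "present". The structure and helper names are invented for the
illustration.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NR_CPUS_MAX 8

struct madt_gicc_entry {
	uint32_t cpu_interface_num;
	uint64_t mpidr;
	bool enabled;			/* the GICC "Enabled" flag in the MADT */
};

struct cpu_masks {
	bool possible[NR_CPUS_MAX];	/* may ever exist on this platform */
	bool present[NR_CPUS_MAX];	/* populated right now */
};

static void parse_madt_gicc(const struct madt_gicc_entry *e, int n,
			    struct cpu_masks *m)
{
	for (int i = 0; i < n && i < NR_CPUS_MAX; i++) {
		/* A disabled entry still reserves a logical CPU slot... */
		m->possible[i] = true;
		/* ...but only enabled entries are present and bootable now. */
		m->present[i] = e[i].enabled;
	}
}

int main(void)
{
	struct madt_gicc_entry madt[] = {
		{ 0, 0x000, true  },	/* boot CPU */
		{ 1, 0x001, true  },
		{ 2, 0x100, false },	/* disabled: may be hot-added later */
	};
	struct cpu_masks m = { { false } };

	parse_madt_gicc(madt, 3, &m);
	for (int i = 0; i < 3; i++)
		printf("cpu%d: possible=%d present=%d\n",
		       i, m.possible[i], m.present[i]);
	return 0;
}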
On 7/5/2019 3:12 AM, James Morse wrote:
> Hi guys,
>
> (CC: +kvmarm list)
>
[...]
> I don't see the difference here between booting the guest with 'maxcpus=1' and bringing
> the vcpu online later. The only real difference seems to be moving the can-be-online
> policy into the hypervisor/VMM...

Isn't that an important distinction from a cloud service provider's perspective?

As far as I understand it, you also need CPU hotplug capabilities to support things like
the Kata runtime under Kubernetes, i.e. when implementing your containers in the form of
lightweight VMs for the additional security ... and the orchestration layer cannot
determine ahead of time how much CPU/memory resource is going to be needed to run the
pod(s).

Thanks,
-Maran
On 09/07/2019 20:06, Maran Wilson wrote:
> On 7/5/2019 3:12 AM, James Morse wrote:
[...]
>> I don't see the difference here between booting the guest with 'maxcpus=1' and bringing
>> the vcpu online later. The only real difference seems to be moving the can-be-online
>> policy into the hypervisor/VMM...
>
> Isn't that an important distinction from a cloud service provider's perspective?
>
> As far as I understand it, you also need CPU hotplug capabilities to support things like
> the Kata runtime under Kubernetes, i.e. when implementing your containers in the form of
> lightweight VMs for the additional security ... and the orchestration layer cannot
> determine ahead of time how much CPU/memory resource is going to be needed to run the
> pod(s).

Why would it be any different? You can pre-allocate your vcpus and leave them parked
until some external agent decides to signal the container that it can use another bunch
of CPUs. At that point, the container must actively boot these vcpus (they aren't going
to come up by magic).

Given that you must have sized your virtual platform to deal with the maximum set of
resources you anticipate (think of the GIC redistributors, for example), I really wonder
what you gain here.

>> I think physical package/die hot-add is a much bigger, uglier problem than doing the
>> same under virtualisation. It's best to do this on real hardware first so we don't miss
>> something (cpu-topology, numa, memory, errata, timers?). I'm worried that doing
>> virtualisation first means the firmware requirements for physical hot-add become
>> "whatever Qemu does".

For sure, I want to model the virtualization side after the actual HW, and not the other
way around. Live reconfiguration of the interrupt topology (and thus the whole memory
map) will certainly be challenging.

Thanks,

M.
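A minimal sketch of the "pre-allocate and park" approach Marc describes, seen from a
VMM's main loop. It creates all M vcpus with KVM up front (so the in-kernel irqchip
knows about every one of them before the guest first runs) but only spawns host threads
for the vcpus that start online; the rest stay parked until the guest boots them. Error
handling, arm64 vcpu init and the actual KVM_RUN loop are omitted; run_vcpu() and the
two CPU counts are placeholders for this example.

#include <fcntl.h>
#include <pthread.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

#define MAX_VCPUS	8	/* M: sized for the largest config anticipated */
#define BOOT_VCPUS	2	/* N: online at boot */

static void *run_vcpu(void *arg)
{
	int vcpu_fd = (int)(long)arg;
	/* ... the KVM_RUN loop for this vcpu would live here ... */
	(void)vcpu_fd;
	return NULL;
}

int main(void)
{
	int kvm = open("/dev/kvm", O_RDWR);
	int vm = ioctl(kvm, KVM_CREATE_VM, 0);
	int vcpu_fd[MAX_VCPUS];
	pthread_t thread[BOOT_VCPUS];

	/* Create every possible vcpu now: the irqchip must know about all of
	 * them before the guest is entered for the first time. */
	for (int i = 0; i < MAX_VCPUS; i++)
		vcpu_fd[i] = ioctl(vm, KVM_CREATE_VCPU, (unsigned long)i);

	/* Only the initially-online vcpus get a host thread; the others cost
	 * little until the guest brings them up (e.g. via PSCI CPU_ON). */
	for (int i = 0; i < BOOT_VCPUS; i++)
		pthread_create(&thread[i], NULL, run_vcpu,
			       (void *)(long)vcpu_fd[i]);

	for (int i = 0; i < BOOT_VCPUS; i++)
		pthread_join(thread[i], NULL);
	return 0;
}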
On 7/10/2019 2:15 AM, Marc Zyngier wrote:
> On 09/07/2019 20:06, Maran Wilson wrote:
[...]
>> As far as I understand it, you also need CPU hotplug capabilities to support things like
>> the Kata runtime under Kubernetes, i.e. when implementing your containers in the form of
>> lightweight VMs for the additional security ... and the orchestration layer cannot
>> determine ahead of time how much CPU/memory resource is going to be needed to run the
>> pod(s).
>
> Why would it be any different? You can pre-allocate your vcpus and leave them parked
> until some external agent decides to signal the container that it can use another bunch
> of CPUs. At that point, the container must actively boot these vcpus (they aren't going
> to come up by magic).
>
> Given that you must have sized your virtual platform to deal with the maximum set of
> resources you anticipate (think of the GIC redistributors, for example), I really wonder
> what you gain here.

Maybe I'm not following the alternative proposal completely, but wouldn't a guest VM
(which happens to be in control of its own OS) be able to add/online vCPU resources
without approval from the VMM this way?

Thanks,
-Maran
Hi Maran,

On 10/07/2019 17:05, Maran Wilson wrote:
> On 7/10/2019 2:15 AM, Marc Zyngier wrote:
>> On 09/07/2019 20:06, Maran Wilson wrote:
[...]
>>> Isn't that an important distinction from a cloud service provider's perspective?

Host cpu-time is. Describing this as guest vcpus is a bit weird. I'd expect the
statement to be something like "you're paying for 50% of one Xeon v-whatever". It
shouldn't make a difference whether I run 8 vcpus or 2; the amount of cpu-time would
still be constrained by the cloud provider.

>>> As far as I understand it, you also need CPU hotplug capabilities to support things
>>> like the Kata runtime under Kubernetes, i.e. when implementing your containers in the
>>> form of lightweight VMs for the additional security ... and the orchestration layer
>>> cannot determine ahead of time how much CPU/memory resource is going to be needed to
>>> run the pod(s).
>> Why would it be any different? You can pre-allocate your vcpus and leave them parked
>> until some external agent decides to signal the container that it can use another
>> bunch of CPUs. At that point, the container must actively boot these vcpus (they
>> aren't going to come up by magic).
>>
>> Given that you must have sized your virtual platform to deal with the maximum set of
>> resources you anticipate (think of the GIC redistributors, for example), I really
>> wonder what you gain here.
> Maybe I'm not following the alternative proposal completely, but wouldn't a guest VM
> (which happens to be in control of its own OS) be able to add/online vCPU resources
> without approval from the VMM this way?

The in-kernel PSCI implementation allows all CPUs to be brought online/offline. If we
moved that support to the VMM, it could apply some policy as to whether a cpu-online
call succeeds or fails.

Thanks,

James
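A hypothetical sketch of the policy point James makes. KVM handles PSCI in-kernel
today, so this assumes a mechanism (which does not currently exist) that forwards the
guest's PSCI calls to the VMM; vmm_allows_online() and unpark_vcpu() are invented
stand-ins for the VMM's policy and vcpu management. The function ID and return codes
are those defined by PSCI 0.2.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define PSCI_0_2_FN64_CPU_ON	0xc4000003UL	/* SMC64 CPU_ON */
#define PSCI_RET_SUCCESS	0
#define PSCI_RET_NOT_SUPPORTED	(-1)
#define PSCI_RET_DENIED		(-3)

static bool vmm_allows_online(uint64_t target_mpidr)
{
	(void)target_mpidr;
	return true;		/* stand-in for the VMM's can-be-online policy */
}

static void unpark_vcpu(uint64_t mpidr, uint64_t entry, uint64_t ctx)
{
	/* would wake the parked host thread and start the vcpu at 'entry'
	 * with 'ctx' in x0, per the PSCI CPU_ON contract */
	printf("unpark vcpu mpidr=%#lx entry=%#lx ctx=%#lx\n",
	       (unsigned long)mpidr, (unsigned long)entry, (unsigned long)ctx);
}

static long handle_guest_psci(uint64_t fn, uint64_t target_mpidr,
			      uint64_t entry_point, uint64_t context_id)
{
	if (fn != PSCI_0_2_FN64_CPU_ON)
		return PSCI_RET_NOT_SUPPORTED;	/* other functions elided */

	/* The can-be-online decision now lives in the VMM, not the kernel. */
	if (!vmm_allows_online(target_mpidr))
		return PSCI_RET_DENIED;

	unpark_vcpu(target_mpidr, entry_point, context_id);
	return PSCI_RET_SUCCESS;
}

int main(void)
{
	long ret = handle_guest_psci(PSCI_0_2_FN64_CPU_ON, 0x1, 0x40080000, 0);
	printf("CPU_ON returned %ld\n", ret);
	return 0;
}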
On 2019/7/5 18:12, James Morse wrote:
> Hi guys,
>
> (CC: +kvmarm list)
>
[...]
> It looks like you're adding support for hot-adding a new package/die to the platform ...
> but only for virtualisation.

I have been reading the GIC driver these days. It is a lot of work to configure the GIC
at runtime, and this patchset doesn't support that. Actually, my original idea is
hot-adding cores to the platform, and it is only for virtualisation. These cores need to
be on the same physical package. The GIC is initialized when the kernel boots, and the
GICR is initialized when the core is hot-added and brought up.
Hi Marc,

On 2019/7/10 17:15, Marc Zyngier wrote:
> On 09/07/2019 20:06, Maran Wilson wrote:
[...]
>> As far as I understand it, you also need CPU hotplug capabilities to support things
>> like the Kata runtime under Kubernetes, i.e. when implementing your containers in the
>> form of lightweight VMs for the additional security ... and the orchestration layer
>> cannot determine ahead of time how much CPU/memory resource is going to be needed to
>> run the pod(s).
> Why would it be any different? You can pre-allocate your vcpus and leave them parked
> until some external agent decides to signal the container that it can use another bunch
> of CPUs. At that point, the container must actively boot these vcpus (they aren't going
> to come up by magic).
>
> Given that you must have sized your virtual platform to deal with the maximum set of
> resources you anticipate (think of the GIC redistributors, for example), I really
> wonder what you gain here.

I agree with your point on the GIC aspect. Making the GIC resources hotpluggable in qemu
would mess things up. But it would still be better if the VMM only started a limited
number of vcpu threads.

How about:

1. qemu starts only N vcpu threads (-smp N,maxcpus=M)

2. qemu reserves the GIC resources for a maximum of M vcpus

3. when the qmp cpu hotplug-add command is triggered, send a GED event to the guest
   kernel

4. the guest kernel receives it and triggers the ACPI hot-plug process.

Currently ACPI_CPU_HOTPLUG is enabled in Kconfig but not workable at all.

---
Cheers,
Jia
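A rough model of the flow Jia proposes, seen from the VMM side. This is not QEMU code
(QEMU's arm 'virt' machine has no vcpu hot-add support) and every name here is invented
for the illustration: interrupt-controller resources are reserved for M vcpus at machine
creation, only N vcpus are present at boot, and a later hot-add request marks a slot
present and notifies the guest (e.g. via a GED ACPI event) so the guest can bring the
new CPU up itself.

#include <stdbool.h>
#include <stdio.h>

#define MAX_VCPUS	8	/* M: fixed at machine creation (GIC sized for this) */
#define BOOT_VCPUS	2	/* N: present and running at boot */

struct vcpu_slot {
	bool present;		/* described to the guest as an enabled CPU */
	bool thread_running;	/* a host thread exists for this vcpu */
};

static struct vcpu_slot slots[MAX_VCPUS];

static void start_vcpu_thread(int i)
{
	slots[i].thread_running = true;	/* stand-in for spawning the thread */
}

static void notify_guest_cpu_check(int i)
{
	/* stand-in for injecting the ACPI (GED) "device check" event */
	printf("notify guest: device-check cpu %d\n", i);
}

static int vcpu_hotplug_add(int i)
{
	if (i < 0 || i >= MAX_VCPUS || slots[i].present)
		return -1;	/* outside the reserved range, or already present */
	slots[i].present = true;
	start_vcpu_thread(i);		/* host-side resources appear only now */
	notify_guest_cpu_check(i);	/* guest still has to online the CPU itself */
	return 0;
}

int main(void)
{
	/* Boot: all M slots are reserved, but only the first N are present. */
	for (int i = 0; i < BOOT_VCPUS; i++) {
		slots[i].present = true;
		start_vcpu_thread(i);
	}
	/* Later, an external request (e.g. a QMP command) hot-adds vcpu 2. */
	return vcpu_hotplug_add(2);
}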
Hi Jia,

On 16/07/2019 08:59, Jia He wrote:
> Hi Marc,
>
> On 2019/7/10 17:15, Marc Zyngier wrote:
[...]
>> Given that you must have sized your virtual platform to deal with the maximum set of
>> resources you anticipate (think of the GIC redistributors, for example), I really
>> wonder what you gain here.
>
> I agree with your point on the GIC aspect. Making the GIC resources hotpluggable in
> qemu would mess things up.

It is far worse than just a mess. You'd need to come up with a way to place your
redistributors in memory, and tell the running guest where these redistributors are.
Currently, there is no method to describe such changes to the address space, and I
certainly don't want QEMU to invent one. This needs to be modeled after what would
happen on real HW.

> But it would still be better if the VMM only started a limited number of vcpu threads.
>
> How about:
>
> 1. qemu starts only N vcpu threads (-smp N,maxcpus=M)
>
> 2. qemu reserves the GIC resources for a maximum of M vcpus

Note that this implies actually initializing M vcpus in the VM. You may not have created
the corresponding (M - N) threads, but the vcpus will exist. Can you please quantify how
much you'd save by doing that?

> 3. when the qmp cpu hotplug-add command is triggered, send a GED event to the guest
>    kernel
>
> 4. the guest kernel receives it and triggers the ACPI hot-plug process.
>
> Currently ACPI_CPU_HOTPLUG is enabled in Kconfig but not workable at all.

Well, there is so far *zero* CPU_HOTPLUG support in the arm64 kernel other than getting
CPUs in and out of PSCI.

Thanks,

M.