diff mbox series

kvm_host: bump KVM_MAX_IRQ_ROUTE to 128k

Message ID 20240321082442.195631-1-d-tatianin@yandex-team.ru (mailing list archive)
State New, archived
Headers show
Series kvm_host: bump KVM_MAX_IRQ_ROUTE to 128k | expand

Commit Message

Daniil Tatianin March 21, 2024, 8:24 a.m. UTC
We would like to be able to create large VMs (up to 224 vCPUs atm) with
up to 128 virtio-net cards, where each card needs a TX+RX queue per vCPU
for optimal performance (as well as config & control interrupts per
card). Adding in extra virtio-blk controllers with a queue per vCPU (up
to 192 disks) yields a total of about ~100k IRQ routes, rounded up to
128k for extra headroom and flexibility.

The current limit of 4096 was set in 2018 and is too low for modern
demands. It also seems to be there for no good reason as routes are
allocated lazily by the kernel anyway (depending on the largest GSI
requested by the VM).

Signed-off-by: Daniil Tatianin <d-tatianin@yandex-team.ru>
---
 include/linux/kvm_host.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Daniil Tatianin April 15, 2024, 6:36 a.m. UTC | #1
ping :)

On 3/21/24 11:24 AM, Daniil Tatianin wrote:
> We would like to be able to create large VMs (up to 224 vCPUs atm) with
> up to 128 virtio-net cards, where each card needs a TX+RX queue per vCPU
> for optimal performance (as well as config & control interrupts per
> card). Adding in extra virtio-blk controllers with a queue per vCPU (up
> to 192 disks) yields a total of about ~100k IRQ routes, rounded up to
> 128k for extra headroom and flexibility.
>
> The current limit of 4096 was set in 2018 and is too low for modern
> demands. It also seems to be there for no good reason as routes are
> allocated lazily by the kernel anyway (depending on the largest GSI
> requested by the VM).
>
> Signed-off-by: Daniil Tatianin <d-tatianin@yandex-team.ru>
> ---
>   include/linux/kvm_host.h | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 48f31dcd318a..10a141add2a8 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -2093,7 +2093,7 @@ static inline bool mmu_invalidate_retry_gfn_unsafe(struct kvm *kvm,
>   
>   #ifdef CONFIG_HAVE_KVM_IRQ_ROUTING
>   
> -#define KVM_MAX_IRQ_ROUTES 4096 /* might need extension/rework in the future */
> +#define KVM_MAX_IRQ_ROUTES 131072 /* might need extension/rework in the future */
>   
>   bool kvm_arch_can_set_irq_routing(struct kvm *kvm);
>   int kvm_set_irq_routing(struct kvm *kvm,
Daniil Tatianin June 17, 2024, 8:29 a.m. UTC | #2
Ping :)

Is anyone interested in this change?
Our internal testing has shown no problems with this, so I think it 
should be fine.

Thanks!

On 3/21/24 11:24 AM, Daniil Tatianin wrote:
> We would like to be able to create large VMs (up to 224 vCPUs atm) with
> up to 128 virtio-net cards, where each card needs a TX+RX queue per vCPU
> for optimal performance (as well as config & control interrupts per
> card). Adding in extra virtio-blk controllers with a queue per vCPU (up
> to 192 disks) yields a total of about ~100k IRQ routes, rounded up to
> 128k for extra headroom and flexibility.
>
> The current limit of 4096 was set in 2018 and is too low for modern
> demands. It also seems to be there for no good reason as routes are
> allocated lazily by the kernel anyway (depending on the largest GSI
> requested by the VM).
>
> Signed-off-by: Daniil Tatianin <d-tatianin@yandex-team.ru>
> ---
>   include/linux/kvm_host.h | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 48f31dcd318a..10a141add2a8 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -2093,7 +2093,7 @@ static inline bool mmu_invalidate_retry_gfn_unsafe(struct kvm *kvm,
>   
>   #ifdef CONFIG_HAVE_KVM_IRQ_ROUTING
>   
> -#define KVM_MAX_IRQ_ROUTES 4096 /* might need extension/rework in the future */
> +#define KVM_MAX_IRQ_ROUTES 131072 /* might need extension/rework in the future */
>   
>   bool kvm_arch_can_set_irq_routing(struct kvm *kvm);
>   int kvm_set_irq_routing(struct kvm *kvm,
Igor Mammedov June 18, 2024, 12:28 p.m. UTC | #3
On Thu, 21 Mar 2024 11:24:42 +0300
Daniil Tatianin <d-tatianin@yandex-team.ru> wrote:

> We would like to be able to create large VMs (up to 224 vCPUs atm) with
> up to 128 virtio-net cards, where each card needs a TX+RX queue per vCPU
> for optimal performance (as well as config & control interrupts per
> card). Adding in extra virtio-blk controllers with a queue per vCPU (up
> to 192 disks) yields a total of about ~100k IRQ routes, rounded up to
> 128k for extra headroom and flexibility.
> 
> The current limit of 4096 was set in 2018 and is too low for modern
> demands. It also seems to be there for no good reason as routes are
> allocated lazily by the kernel anyway (depending on the largest GSI
> requested by the VM).
> 
> Signed-off-by: Daniil Tatianin <d-tatianin@yandex-team.ru>

LGTM

Acked-by: Igor Mammedov <imammedo@redhat.com>

> ---
>  include/linux/kvm_host.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 48f31dcd318a..10a141add2a8 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -2093,7 +2093,7 @@ static inline bool mmu_invalidate_retry_gfn_unsafe(struct kvm *kvm,
>  
>  #ifdef CONFIG_HAVE_KVM_IRQ_ROUTING
>  
> -#define KVM_MAX_IRQ_ROUTES 4096 /* might need extension/rework in the future */
> +#define KVM_MAX_IRQ_ROUTES 131072 /* might need extension/rework in the future */
>  
>  bool kvm_arch_can_set_irq_routing(struct kvm *kvm);
>  int kvm_set_irq_routing(struct kvm *kvm,
Daniil Tatianin July 24, 2024, 10:23 a.m. UTC | #4
On 6/18/24 3:28 PM, Igor Mammedov wrote:

> On Thu, 21 Mar 2024 11:24:42 +0300
> Daniil Tatianin <d-tatianin@yandex-team.ru> wrote:
>
>> We would like to be able to create large VMs (up to 224 vCPUs atm) with
>> up to 128 virtio-net cards, where each card needs a TX+RX queue per vCPU
>> for optimal performance (as well as config & control interrupts per
>> card). Adding in extra virtio-blk controllers with a queue per vCPU (up
>> to 192 disks) yields a total of about ~100k IRQ routes, rounded up to
>> 128k for extra headroom and flexibility.
>>
>> The current limit of 4096 was set in 2018 and is too low for modern
>> demands. It also seems to be there for no good reason as routes are
>> allocated lazily by the kernel anyway (depending on the largest GSI
>> requested by the VM).
>>
>> Signed-off-by: Daniil Tatianin <d-tatianin@yandex-team.ru>
> LGTM
>
> Acked-by: Igor Mammedov <imammedo@redhat.com>

Thank you!

I want to ping everyone once again to take a look at this, I think this 
patch is quite trivial and unlocks larger VMs for free, would really 
appreciate a review from anyone interested!

>> ---
>>   include/linux/kvm_host.h | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
>> index 48f31dcd318a..10a141add2a8 100644
>> --- a/include/linux/kvm_host.h
>> +++ b/include/linux/kvm_host.h
>> @@ -2093,7 +2093,7 @@ static inline bool mmu_invalidate_retry_gfn_unsafe(struct kvm *kvm,
>>   
>>   #ifdef CONFIG_HAVE_KVM_IRQ_ROUTING
>>   
>> -#define KVM_MAX_IRQ_ROUTES 4096 /* might need extension/rework in the future */
>> +#define KVM_MAX_IRQ_ROUTES 131072 /* might need extension/rework in the future */
>>   
>>   bool kvm_arch_can_set_irq_routing(struct kvm *kvm);
>>   int kvm_set_irq_routing(struct kvm *kvm,
Sean Christopherson Aug. 22, 2024, 7:05 p.m. UTC | #5
KVM: for the scope

On Thu, Mar 21, 2024, Daniil Tatianin wrote:
> We would like to be able to create large VMs (up to 224 vCPUs atm) with
> up to 128 virtio-net cards, where each card needs a TX+RX queue per vCPU
> for optimal performance (as well as config & control interrupts per
> card). Adding in extra virtio-blk controllers with a queue per vCPU (up
> to 192 disks) yields a total of about ~100k IRQ routes, rounded up to
> 128k for extra headroom and flexibility.
> 
> The current limit of 4096 was set in 2018 and is too low for modern
> demands. It also seems to be there for no good reason as routes are
> allocated lazily by the kernel anyway (depending on the largest GSI
> requested by the VM).
> 
> Signed-off-by: Daniil Tatianin <d-tatianin@yandex-team.ru>
> ---
>  include/linux/kvm_host.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 48f31dcd318a..10a141add2a8 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -2093,7 +2093,7 @@ static inline bool mmu_invalidate_retry_gfn_unsafe(struct kvm *kvm,
>  
>  #ifdef CONFIG_HAVE_KVM_IRQ_ROUTING
>  
> -#define KVM_MAX_IRQ_ROUTES 4096 /* might need extension/rework in the future */
> +#define KVM_MAX_IRQ_ROUTES 131072 /* might need extension/rework in the future */

I am not comfortable simply bumping the max.  Yeah, it's allocated on-demand, but
if my math is correct, the means a max of ~8MiB for the table, plus another 8MiB
for the tnries.   And when handling KVM_SET_GSI_ROUTING, KVM will have 2 tables
(old and new), and another 4MiB for duplicating the userspace array. Those allocations
are accounted, but that's still a lot of potential thrash.

And KVM's handling is also grossly inefficient, e.g. reallocating everything just
to change one routing entry is awful.  Maybe painfully slow updates are fine for
your use case, but some OSes have a bad habit of round-robining IRQ destinations
on a regular basis.

So it might be "free" in the sense that it costs you nothing to get your use case
working, but there's very much a cost for KVM in the form of technical debt that
someone will have to eventually pay for.

I don't have any concrete thoughts on how to make KVM's implementation less sucky,
but I do think we need to give it some attention before increasing the maximum
number of IRQ routes, especially before increasing it by 32x.
diff mbox series

Patch

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 48f31dcd318a..10a141add2a8 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2093,7 +2093,7 @@  static inline bool mmu_invalidate_retry_gfn_unsafe(struct kvm *kvm,
 
 #ifdef CONFIG_HAVE_KVM_IRQ_ROUTING
 
-#define KVM_MAX_IRQ_ROUTES 4096 /* might need extension/rework in the future */
+#define KVM_MAX_IRQ_ROUTES 131072 /* might need extension/rework in the future */
 
 bool kvm_arch_can_set_irq_routing(struct kvm *kvm);
 int kvm_set_irq_routing(struct kvm *kvm,