diff mbox series

[v2] i386: Add ratelimit for bus locks acquired in guest

Message ID 20210420093736.17613-1-chenyi.qiang@intel.com (mailing list archive)
State New
Headers show
Series [v2] i386: Add ratelimit for bus locks acquired in guest | expand

Commit Message

Chenyi Qiang April 20, 2021, 9:37 a.m. UTC
Virtual Machines can exploit bus locks to degrade the performance of
system. To address this kind of performance DOS attack, bus lock VM exit
is introduced in KVM and it will report the bus locks detected in guest,
which can help userspace to enforce throttling policies.

The availability of bus lock VM exit can be detected through the
KVM_CAP_X86_BUS_LOCK_EXIT. The returned bitmap contains the potential
policies supported by KVM. The field KVM_BUS_LOCK_DETECTION_EXIT in
bitmap is the only supported strategy at present. It indicates that KVM
will exit to userspace to handle the bus locks.

This patch adds a ratelimit on the bus locks acquired in guest as a
mitigation policy.

Introduce a new field "bld" to record the limited speed of bus locks in
target VM. The user can specify it through the "bus-lock-detection"
as a machine property. In current implementation, the default value of
the speed is 0 per second, which means no restriction on the bus locks.

Ratelimit enforced in data transmission uses a time slice of 100ms to
get smooth output during regular operations in block jobs. As for
ratelimit on bus lock detection, simply set the ratelimit interval to 1s
and restrict the quota of bus lock occurrence to the value of "bld". A
potential alternative is to introduce the time slice as a property
which can help the user achieve more precise control.

The detail of Bus lock VM exit can be found in spec:
https://software.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html

Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>

---
Changes from RFC v1:
  - Remove the rip info output, as the rip can't reflect the bus lock
    position correctly.
  - RFC v1: https://lore.kernel.org/qemu-devel/20210317084709.15605-1-chenyi.qiang@intel.com/
---
 hw/i386/x86.c         |  6 ++++++
 include/hw/i386/x86.h |  7 +++++++
 target/i386/kvm/kvm.c | 42 ++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 55 insertions(+)

Comments

Eduardo Habkost April 20, 2021, 4:34 p.m. UTC | #1
Hello,

Thanks for the patch.  Comments below:

On Tue, Apr 20, 2021 at 05:37:36PM +0800, Chenyi Qiang wrote:
> Virtual Machines can exploit bus locks to degrade the performance of
> system. To address this kind of performance DOS attack, bus lock VM exit
> is introduced in KVM and it will report the bus locks detected in guest,
> which can help userspace to enforce throttling policies.
> 

Is there anything today that would protect the system from
similar attacks from userspace with access to /dev/kvm?


> The availability of bus lock VM exit can be detected through the
> KVM_CAP_X86_BUS_LOCK_EXIT. The returned bitmap contains the potential
> policies supported by KVM. The field KVM_BUS_LOCK_DETECTION_EXIT in
> bitmap is the only supported strategy at present. It indicates that KVM
> will exit to userspace to handle the bus locks.
> 
> This patch adds a ratelimit on the bus locks acquired in guest as a
> mitigation policy.
> 
> Introduce a new field "bld" to record the limited speed of bus locks in
> target VM. The user can specify it through the "bus-lock-detection"
> as a machine property. In current implementation, the default value of
> the speed is 0 per second, which means no restriction on the bus locks.
> 
> Ratelimit enforced in data transmission uses a time slice of 100ms to
> get smooth output during regular operations in block jobs. As for
> ratelimit on bus lock detection, simply set the ratelimit interval to 1s
> and restrict the quota of bus lock occurrence to the value of "bld". A
> potential alternative is to introduce the time slice as a property
> which can help the user achieve more precise control.
> 
> The detail of Bus lock VM exit can be found in spec:
> https://software.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html
> 
> Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
> 
> ---
> Changes from RFC v1:
>   - Remove the rip info output, as the rip can't reflect the bus lock
>     position correctly.
>   - RFC v1: https://lore.kernel.org/qemu-devel/20210317084709.15605-1-chenyi.qiang@intel.com/
> ---
>  hw/i386/x86.c         |  6 ++++++
>  include/hw/i386/x86.h |  7 +++++++
>  target/i386/kvm/kvm.c | 42 ++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 55 insertions(+)
> 
> diff --git a/hw/i386/x86.c b/hw/i386/x86.c
> index ed796fe6ba..42d10857a6 100644
> --- a/hw/i386/x86.c
> +++ b/hw/i386/x86.c
> @@ -1256,6 +1256,12 @@ static void x86_machine_initfn(Object *obj)
>      x86ms->pci_irq_mask = ACPI_BUILD_PCI_IRQS;
>      x86ms->oem_id = g_strndup(ACPI_BUILD_APPNAME6, 6);
>      x86ms->oem_table_id = g_strndup(ACPI_BUILD_APPNAME8, 8);
> +    x86ms->bld = 0;
> +
> +    object_property_add_uint64_ptr(obj, "bus-lock-detection",
> +                                   &x86ms->bld, OBJ_PROP_FLAG_READWRITE);
> +    object_property_set_description(obj, "bus-lock-detection",
> +            "Bus lock detection ratelimit");

I suggest using a name that indicates this is a rate limit (e.g.
"bus-lock-rate-limit").  "bus-lock-detection" sounds like a
boolean option to just enable/disable detection.

Please register a class property at x86_machine_class_init()
instead.  The plan is to eventually stop using instance
properties wherever possible, as class properties make property
introspection simpler.

See machine_class_init() for some examples of integer class
properties.  Unfortunately object_class_property_add_uint64_ptr()
is not very useful currently, so you'll need to write your own
getter/setter function.


>  }
>  
>  static void x86_machine_class_init(ObjectClass *oc, void *data)
> diff --git a/include/hw/i386/x86.h b/include/hw/i386/x86.h
> index c09b648dff..d6e198b228 100644
> --- a/include/hw/i386/x86.h
> +++ b/include/hw/i386/x86.h
> @@ -74,6 +74,13 @@ struct X86MachineState {
>       * will be translated to MSI messages in the address space.
>       */
>      AddressSpace *ioapic_as;
> +
> +    /*
> +     * ratelimit enforced on detected bus locks, the default value
> +     * is 0 per second
> +     */

I suggest documenting here that 0 means no limit.

> +    uint64_t bld;
> +    RateLimit bld_limit;
>  };
>  
>  #define X86_MACHINE_SMM              "smm"
> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> index 7fe9f52710..a75fac0404 100644
> --- a/target/i386/kvm/kvm.c
> +++ b/target/i386/kvm/kvm.c
> @@ -130,6 +130,8 @@ static bool has_msr_mcg_ext_ctl;
>  static struct kvm_cpuid2 *cpuid_cache;
>  static struct kvm_msr_list *kvm_feature_msrs;
>  
> +#define SLICE_TIME 1000000000ULL /* ns */
> +

"slice time" is a very generic name.  I suggest "BLD_SLICE_TIME"
or "BUS_LOCK_SLICE_TIME".

>  int kvm_has_pit_state2(void)
>  {
>      return has_pit_state2;
> @@ -2267,6 +2269,27 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
>          }
>      }
>  
> +    if (object_dynamic_cast(OBJECT(ms), TYPE_X86_MACHINE)) {
> +        X86MachineState *x86ms = X86_MACHINE(ms);
> +
> +        if (x86ms->bld > 0) {
> +            ret = kvm_check_extension(s, KVM_CAP_X86_BUS_LOCK_EXIT);
> +            if (!(ret & KVM_BUS_LOCK_DETECTION_EXIT)) {
> +                error_report("kvm: bus lock detection unsupported");
> +                return -ENOTSUP;
> +            }
> +            ret = kvm_vm_enable_cap(s, KVM_CAP_X86_BUS_LOCK_EXIT, 0,
> +                                    KVM_BUS_LOCK_DETECTION_EXIT);
> +            if (ret < 0) {
> +                error_report("kvm: Failed to enable bus lock detection cap: %s",
> +                             strerror(-ret));
> +                return ret;
> +            }
> +
> +            ratelimit_set_speed(&x86ms->bld_limit, x86ms->bld, SLICE_TIME);
> +        }
> +    }
> +
>      return 0;
>  }
>  
> @@ -4221,6 +4244,18 @@ void kvm_arch_pre_run(CPUState *cpu, struct kvm_run *run)
>      }
>  }
>  
> +static void kvm_rate_limit_on_bus_lock(void)
> +{
> +    MachineState *ms = MACHINE(qdev_get_machine());
> +    X86MachineState *x86ms = X86_MACHINE(ms);
> +
> +    uint64_t delay_ns = ratelimit_calculate_delay(&x86ms->bld_limit, 1);
> +

This doesn't look thread safe.  Isn't this going to run from the
VCPU thread with no locks acquired?

> +    if (delay_ns) {
> +        g_usleep(delay_ns / SCALE_US);
> +    }
> +}
> +
>  MemTxAttrs kvm_arch_post_run(CPUState *cpu, struct kvm_run *run)
>  {
>      X86CPU *x86_cpu = X86_CPU(cpu);
> @@ -4236,6 +4271,9 @@ MemTxAttrs kvm_arch_post_run(CPUState *cpu, struct kvm_run *run)
>      } else {
>          env->eflags &= ~IF_MASK;
>      }
> +    if (run->flags & KVM_RUN_X86_BUS_LOCK) {
> +        kvm_rate_limit_on_bus_lock();
> +    }
>  
>      /* We need to protect the apic state against concurrent accesses from
>       * different threads in case the userspace irqchip is used. */
> @@ -4594,6 +4632,10 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
>          ioapic_eoi_broadcast(run->eoi.vector);
>          ret = 0;
>          break;
> +    case KVM_EXIT_X86_BUS_LOCK:
> +        /* already handled in kvm_arch_post_run */
> +        ret = 0;
> +        break;
>      default:
>          fprintf(stderr, "KVM: unknown exit reason %d\n", run->exit_reason);
>          ret = -1;
> -- 
> 2.17.1
>
Chenyi Qiang April 21, 2021, 6:26 a.m. UTC | #2
Hi, Eduardo, thanks for your comments!


On 4/21/2021 12:34 AM, Eduardo Habkost wrote:
> Hello,
> 
> Thanks for the patch.  Comments below:
> 
> On Tue, Apr 20, 2021 at 05:37:36PM +0800, Chenyi Qiang wrote:
>> Virtual Machines can exploit bus locks to degrade the performance of
>> system. To address this kind of performance DOS attack, bus lock VM exit
>> is introduced in KVM and it will report the bus locks detected in guest,
>> which can help userspace to enforce throttling policies.
>>
> 
> Is there anything today that would protect the system from
> similar attacks from userspace with access to /dev/kvm?
> 

I can't fully understand your meaning for "similar attack with access to 
/dev/kvm". But there are some similar associated detection features on 
bare metal.

1. Split lock 
detection:https://lore.kernel.org/lkml/158031147976.396.8941798847364718785.tip-bot2@tip-bot2/
Some CPUs can raise an #AC trap when a split lock is attempted.

2. Bus lock Debug Exception: 
https://lore.kernel.org/lkml/20210322135325.682257-1-fenghua.yu@intel.com/
The kernel can be notified by an #DB trap after a user instruction 
acquires a bus lock and is executed.

> 
>> The availability of bus lock VM exit can be detected through the
>> KVM_CAP_X86_BUS_LOCK_EXIT. The returned bitmap contains the potential
>> policies supported by KVM. The field KVM_BUS_LOCK_DETECTION_EXIT in
>> bitmap is the only supported strategy at present. It indicates that KVM
>> will exit to userspace to handle the bus locks.
>>
>> This patch adds a ratelimit on the bus locks acquired in guest as a
>> mitigation policy.
>>
>> Introduce a new field "bld" to record the limited speed of bus locks in
>> target VM. The user can specify it through the "bus-lock-detection"
>> as a machine property. In current implementation, the default value of
>> the speed is 0 per second, which means no restriction on the bus locks.
>>
>> Ratelimit enforced in data transmission uses a time slice of 100ms to
>> get smooth output during regular operations in block jobs. As for
>> ratelimit on bus lock detection, simply set the ratelimit interval to 1s
>> and restrict the quota of bus lock occurrence to the value of "bld". A
>> potential alternative is to introduce the time slice as a property
>> which can help the user achieve more precise control.
>>
>> The detail of Bus lock VM exit can be found in spec:
>> https://software.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html
>>
>> Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
>>
>> ---
>> Changes from RFC v1:
>>    - Remove the rip info output, as the rip can't reflect the bus lock
>>      position correctly.
>>    - RFC v1: https://lore.kernel.org/qemu-devel/20210317084709.15605-1-chenyi.qiang@intel.com/
>> ---
>>   hw/i386/x86.c         |  6 ++++++
>>   include/hw/i386/x86.h |  7 +++++++
>>   target/i386/kvm/kvm.c | 42 ++++++++++++++++++++++++++++++++++++++++++
>>   3 files changed, 55 insertions(+)
>>
>> diff --git a/hw/i386/x86.c b/hw/i386/x86.c
>> index ed796fe6ba..42d10857a6 100644
>> --- a/hw/i386/x86.c
>> +++ b/hw/i386/x86.c
>> @@ -1256,6 +1256,12 @@ static void x86_machine_initfn(Object *obj)
>>       x86ms->pci_irq_mask = ACPI_BUILD_PCI_IRQS;
>>       x86ms->oem_id = g_strndup(ACPI_BUILD_APPNAME6, 6);
>>       x86ms->oem_table_id = g_strndup(ACPI_BUILD_APPNAME8, 8);
>> +    x86ms->bld = 0;
>> +
>> +    object_property_add_uint64_ptr(obj, "bus-lock-detection",
>> +                                   &x86ms->bld, OBJ_PROP_FLAG_READWRITE);
>> +    object_property_set_description(obj, "bus-lock-detection",
>> +            "Bus lock detection ratelimit");
> 
> I suggest using a name that indicates this is a rate limit (e.g.
> "bus-lock-rate-limit").  "bus-lock-detection" sounds like a
> boolean option to just enable/disable detection.
> 

Fair enough.

> Please register a class property at x86_machine_class_init()
> instead.  The plan is to eventually stop using instance
> properties wherever possible, as class properties make property
> introspection simpler.
> 
> See machine_class_init() for some examples of integer class
> properties.  Unfortunately object_class_property_add_uint64_ptr()
> is not very useful currently, so you'll need to write your own
> getter/setter function.
> 

Yeah, will do the change.

> 
>>   }
>>   
>>   static void x86_machine_class_init(ObjectClass *oc, void *data)
>> diff --git a/include/hw/i386/x86.h b/include/hw/i386/x86.h
>> index c09b648dff..d6e198b228 100644
>> --- a/include/hw/i386/x86.h
>> +++ b/include/hw/i386/x86.h
>> @@ -74,6 +74,13 @@ struct X86MachineState {
>>        * will be translated to MSI messages in the address space.
>>        */
>>       AddressSpace *ioapic_as;
>> +
>> +    /*
>> +     * ratelimit enforced on detected bus locks, the default value
>> +     * is 0 per second
>> +     */
> 
> I suggest documenting here that 0 means no limit.
> 
>> +    uint64_t bld;
>> +    RateLimit bld_limit;
>>   };
>>   
>>   #define X86_MACHINE_SMM              "smm"
>> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
>> index 7fe9f52710..a75fac0404 100644
>> --- a/target/i386/kvm/kvm.c
>> +++ b/target/i386/kvm/kvm.c
>> @@ -130,6 +130,8 @@ static bool has_msr_mcg_ext_ctl;
>>   static struct kvm_cpuid2 *cpuid_cache;
>>   static struct kvm_msr_list *kvm_feature_msrs;
>>   
>> +#define SLICE_TIME 1000000000ULL /* ns */
>> +
> 
> "slice time" is a very generic name.  I suggest "BLD_SLICE_TIME"
> or "BUS_LOCK_SLICE_TIME".
> 

Make sense. I'll change to use the BUS_LOCK_SLICE_TIME.

>>   int kvm_has_pit_state2(void)
>>   {
>>       return has_pit_state2;
>> @@ -2267,6 +2269,27 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
>>           }
>>       }
>>   
>> +    if (object_dynamic_cast(OBJECT(ms), TYPE_X86_MACHINE)) {
>> +        X86MachineState *x86ms = X86_MACHINE(ms);
>> +
>> +        if (x86ms->bld > 0) {
>> +            ret = kvm_check_extension(s, KVM_CAP_X86_BUS_LOCK_EXIT);
>> +            if (!(ret & KVM_BUS_LOCK_DETECTION_EXIT)) {
>> +                error_report("kvm: bus lock detection unsupported");
>> +                return -ENOTSUP;
>> +            }
>> +            ret = kvm_vm_enable_cap(s, KVM_CAP_X86_BUS_LOCK_EXIT, 0,
>> +                                    KVM_BUS_LOCK_DETECTION_EXIT);
>> +            if (ret < 0) {
>> +                error_report("kvm: Failed to enable bus lock detection cap: %s",
>> +                             strerror(-ret));
>> +                return ret;
>> +            }
>> +
>> +            ratelimit_set_speed(&x86ms->bld_limit, x86ms->bld, SLICE_TIME);
>> +        }
>> +    }
>> +
>>       return 0;
>>   }
>>   
>> @@ -4221,6 +4244,18 @@ void kvm_arch_pre_run(CPUState *cpu, struct kvm_run *run)
>>       }
>>   }
>>   
>> +static void kvm_rate_limit_on_bus_lock(void)
>> +{
>> +    MachineState *ms = MACHINE(qdev_get_machine());
>> +    X86MachineState *x86ms = X86_MACHINE(ms);
>> +
>> +    uint64_t delay_ns = ratelimit_calculate_delay(&x86ms->bld_limit, 1);
>> +
> 
> This doesn't look thread safe.  Isn't this going to run from the
> VCPU thread with no locks acquired?
> 
Yes, lock should be added here to fix the race for ratelimit state. I'll 
add it in next version.

>> +    if (delay_ns) {
>> +        g_usleep(delay_ns / SCALE_US);
>> +    }
>> +}
>> +
>>   MemTxAttrs kvm_arch_post_run(CPUState *cpu, struct kvm_run *run)
>>   {
>>       X86CPU *x86_cpu = X86_CPU(cpu);
>> @@ -4236,6 +4271,9 @@ MemTxAttrs kvm_arch_post_run(CPUState *cpu, struct kvm_run *run)
>>       } else {
>>           env->eflags &= ~IF_MASK;
>>       }
>> +    if (run->flags & KVM_RUN_X86_BUS_LOCK) {
>> +        kvm_rate_limit_on_bus_lock();
>> +    }
>>   
>>       /* We need to protect the apic state against concurrent accesses from
>>        * different threads in case the userspace irqchip is used. */
>> @@ -4594,6 +4632,10 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
>>           ioapic_eoi_broadcast(run->eoi.vector);
>>           ret = 0;
>>           break;
>> +    case KVM_EXIT_X86_BUS_LOCK:
>> +        /* already handled in kvm_arch_post_run */
>> +        ret = 0;
>> +        break;
>>       default:
>>           fprintf(stderr, "KVM: unknown exit reason %d\n", run->exit_reason);
>>           ret = -1;
>> -- 
>> 2.17.1
>>
>
Eduardo Habkost April 21, 2021, 2:12 p.m. UTC | #3
On Wed, Apr 21, 2021 at 02:26:42PM +0800, Chenyi Qiang wrote:
> Hi, Eduardo, thanks for your comments!
> 
> 
> On 4/21/2021 12:34 AM, Eduardo Habkost wrote:
> > Hello,
> > 
> > Thanks for the patch.  Comments below:
> > 
> > On Tue, Apr 20, 2021 at 05:37:36PM +0800, Chenyi Qiang wrote:
> > > Virtual Machines can exploit bus locks to degrade the performance of
> > > system. To address this kind of performance DOS attack, bus lock VM exit
> > > is introduced in KVM and it will report the bus locks detected in guest,
> > > which can help userspace to enforce throttling policies.
> > > 
> > 
> > Is there anything today that would protect the system from
> > similar attacks from userspace with access to /dev/kvm?
> > 
> 
> I can't fully understand your meaning for "similar attack with access to
> /dev/kvm". But there are some similar associated detection features on bare
> metal.

What I mean is: you say guests can make a performance DoS attack
on the host, and your patch mitigates that.

What would be the available methods to prevent untrusted
userspace running on the host with access to /dev/kvm from making
a similar DoS attack on the host?

> 
> 1. Split lock detection:https://lore.kernel.org/lkml/158031147976.396.8941798847364718785.tip-bot2@tip-bot2/
> Some CPUs can raise an #AC trap when a split lock is attempted.

Would split_lock_detect=fatal be enough to prevent the above attacks?

Is split_lock_detect=fatal the only available way to prevent them?

> 
> 2. Bus lock Debug Exception:
> https://lore.kernel.org/lkml/20210322135325.682257-1-fenghua.yu@intel.com/
> The kernel can be notified by an #DB trap after a user instruction acquires
> a bus lock and is executed.

I see a rate limit option mentioned at the above URL.  Would a
host kernel bus lock rate limit option make this QEMU patch
redundant?
Xiaoyao Li April 21, 2021, 2:50 p.m. UTC | #4
On 4/21/2021 10:12 PM, Eduardo Habkost wrote:
> On Wed, Apr 21, 2021 at 02:26:42PM +0800, Chenyi Qiang wrote:
>> Hi, Eduardo, thanks for your comments!
>>
>>
>> On 4/21/2021 12:34 AM, Eduardo Habkost wrote:
>>> Hello,
>>>
>>> Thanks for the patch.  Comments below:
>>>
>>> On Tue, Apr 20, 2021 at 05:37:36PM +0800, Chenyi Qiang wrote:
>>>> Virtual Machines can exploit bus locks to degrade the performance of
>>>> system. To address this kind of performance DOS attack, bus lock VM exit
>>>> is introduced in KVM and it will report the bus locks detected in guest,
>>>> which can help userspace to enforce throttling policies.
>>>>
>>>
>>> Is there anything today that would protect the system from
>>> similar attacks from userspace with access to /dev/kvm?
>>>
>>
>> I can't fully understand your meaning for "similar attack with access to
>> /dev/kvm". But there are some similar associated detection features on bare
>> metal.
> 
> What I mean is: you say guests can make a performance DoS attack
> on the host, and your patch mitigates that.
> 
> What would be the available methods to prevent untrusted
> userspace running on the host with access to /dev/kvm from making
> a similar DoS attack on the host?
> 
>>
>> 1. Split lock detection:https://lore.kernel.org/lkml/158031147976.396.8941798847364718785.tip-bot2@tip-bot2/
>> Some CPUs can raise an #AC trap when a split lock is attempted.
> 
> Would split_lock_detect=fatal be enough to prevent the above attacks?

NO.

There are two types bus lock:
1. split lock - lock on cacheable memory while the memory across two 
cache lines.
2. non-wb lock - lock on non-writableback memory (you can find it on 
Intel ISE chapter 8, 
https://software.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html)

split lock detection can only prevent 1)

> Is split_lock_detect=fatal the only available way to prevent them?

as above, 2) non-wb lock can be prevented by "non-wb lock disable" feature

> 
>>
>> 2. Bus lock Debug Exception:
>> https://lore.kernel.org/lkml/20210322135325.682257-1-fenghua.yu@intel.com/
>> The kernel can be notified by an #DB trap after a user instruction acquires
>> a bus lock and is executed.
> 
> I see a rate limit option mentioned at the above URL.  Would a
> host kernel bus lock rate limit option make this QEMU patch
> redundant?
> 

No. Bus lock Debug exception cannot be used to detect the bus lock 
happens in guest (vmx non-root mode).

We have patch to virtualize this feature for guest 
https://lore.kernel.org/kvm/20210202090433.13441-1-chenyi.qiang@intel.com/

that guest will have its own setting of bus lock debug exception on or off.

What's more important is that, even we force set the 
MSR_DEBUGCTL.BUS_LOCK_DETECT for guest, guest still can escape from it.
Because bus lock #DB is a trap which is delivered after the instruction 
completes. If the instruction acquires bus lock subsequently faults 
e.g., #PF, then no bus lock #DB generated. But the bus lock does happen.

But with bus lock VM exit, even the instruction faults, it will cause a 
BUS LOCK VM exit.
Eduardo Habkost April 21, 2021, 3:18 p.m. UTC | #5
On Wed, Apr 21, 2021 at 10:50:10PM +0800, Xiaoyao Li wrote:
> On 4/21/2021 10:12 PM, Eduardo Habkost wrote:
> > On Wed, Apr 21, 2021 at 02:26:42PM +0800, Chenyi Qiang wrote:
> > > Hi, Eduardo, thanks for your comments!
> > > 
> > > 
> > > On 4/21/2021 12:34 AM, Eduardo Habkost wrote:
> > > > Hello,
> > > > 
> > > > Thanks for the patch.  Comments below:
> > > > 
> > > > On Tue, Apr 20, 2021 at 05:37:36PM +0800, Chenyi Qiang wrote:
> > > > > Virtual Machines can exploit bus locks to degrade the performance of
> > > > > system. To address this kind of performance DOS attack, bus lock VM exit
> > > > > is introduced in KVM and it will report the bus locks detected in guest,
> > > > > which can help userspace to enforce throttling policies.
> > > > > 
> > > > 
> > > > Is there anything today that would protect the system from
> > > > similar attacks from userspace with access to /dev/kvm?
> > > > 
> > > 
> > > I can't fully understand your meaning for "similar attack with access to
> > > /dev/kvm". But there are some similar associated detection features on bare
> > > metal.
> > 
> > What I mean is: you say guests can make a performance DoS attack
> > on the host, and your patch mitigates that.
> > 
> > What would be the available methods to prevent untrusted
> > userspace running on the host with access to /dev/kvm from making
> > a similar DoS attack on the host?

Thanks for all the clarifications below.  Considering them,
what's the answer to the question above?

> > 
> > > 
> > > 1. Split lock detection:https://lore.kernel.org/lkml/158031147976.396.8941798847364718785.tip-bot2@tip-bot2/
> > > Some CPUs can raise an #AC trap when a split lock is attempted.
> > 
> > Would split_lock_detect=fatal be enough to prevent the above attacks?
> 
> NO.
> 
> There are two types bus lock:
> 1. split lock - lock on cacheable memory while the memory across two cache
> lines.
> 2. non-wb lock - lock on non-writableback memory (you can find it on Intel
> ISE chapter 8, https://software.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html)
> 
> split lock detection can only prevent 1)
> 
> > Is split_lock_detect=fatal the only available way to prevent them?
> 
> as above, 2) non-wb lock can be prevented by "non-wb lock disable" feature

Bus lock VM exit applies to both 1 and 2, correct?

> 
> > 
> > > 
> > > 2. Bus lock Debug Exception:
> > > https://lore.kernel.org/lkml/20210322135325.682257-1-fenghua.yu@intel.com/
> > > The kernel can be notified by an #DB trap after a user instruction acquires
> > > a bus lock and is executed.
> > 
> > I see a rate limit option mentioned at the above URL.  Would a
> > host kernel bus lock rate limit option make this QEMU patch
> > redundant?
> > 
> 
> No. Bus lock Debug exception cannot be used to detect the bus lock happens
> in guest (vmx non-root mode).
> 
> We have patch to virtualize this feature for guest
> https://lore.kernel.org/kvm/20210202090433.13441-1-chenyi.qiang@intel.com/
> 
> that guest will have its own setting of bus lock debug exception on or off.
> 
> What's more important is that, even we force set the
> MSR_DEBUGCTL.BUS_LOCK_DETECT for guest, guest still can escape from it.
> Because bus lock #DB is a trap which is delivered after the instruction
> completes. If the instruction acquires bus lock subsequently faults e.g.,
> #PF, then no bus lock #DB generated. But the bus lock does happen.
> 
> But with bus lock VM exit, even the instruction faults, it will cause a BUS
> LOCK VM exit.
> 
>
Xiaoyao Li April 21, 2021, 3:33 p.m. UTC | #6
On 4/21/2021 11:18 PM, Eduardo Habkost wrote:
> On Wed, Apr 21, 2021 at 10:50:10PM +0800, Xiaoyao Li wrote:
>> On 4/21/2021 10:12 PM, Eduardo Habkost wrote:
>>> On Wed, Apr 21, 2021 at 02:26:42PM +0800, Chenyi Qiang wrote:
>>>> Hi, Eduardo, thanks for your comments!
>>>>
>>>>
>>>> On 4/21/2021 12:34 AM, Eduardo Habkost wrote:
>>>>> Hello,
>>>>>
>>>>> Thanks for the patch.  Comments below:
>>>>>
>>>>> On Tue, Apr 20, 2021 at 05:37:36PM +0800, Chenyi Qiang wrote:
>>>>>> Virtual Machines can exploit bus locks to degrade the performance of
>>>>>> system. To address this kind of performance DOS attack, bus lock VM exit
>>>>>> is introduced in KVM and it will report the bus locks detected in guest,
>>>>>> which can help userspace to enforce throttling policies.
>>>>>>
>>>>>
>>>>> Is there anything today that would protect the system from
>>>>> similar attacks from userspace with access to /dev/kvm?
>>>>>
>>>>
>>>> I can't fully understand your meaning for "similar attack with access to
>>>> /dev/kvm". But there are some similar associated detection features on bare
>>>> metal.
>>>
>>> What I mean is: you say guests can make a performance DoS attack
>>> on the host, and your patch mitigates that.
>>>
>>> What would be the available methods to prevent untrusted
>>> userspace running on the host with access to /dev/kvm from making
>>> a similar DoS attack on the host?
> 
> Thanks for all the clarifications below.  Considering them,
> what's the answer to the question above?

One choice would be enabling BUS LOCK VM exit by default/unconditionally 
in KVM. (Our original attempt is to enable it unconditionally, but 
people want it to be opted in by user)

Or we can use split_lock_detect=fatal to prevent all the split lock in 
guest. For non-wb lock, all the memory should be cacheable as long as no 
device passthrough to guest. More aggressively, we can enable non-wb 
lock disable for guest in KVM.

>>>
>>>>
>>>> 1. Split lock detection:https://lore.kernel.org/lkml/158031147976.396.8941798847364718785.tip-bot2@tip-bot2/
>>>> Some CPUs can raise an #AC trap when a split lock is attempted.
>>>
>>> Would split_lock_detect=fatal be enough to prevent the above attacks?
>>
>> NO.
>>
>> There are two types bus lock:
>> 1. split lock - lock on cacheable memory while the memory across two cache
>> lines.
>> 2. non-wb lock - lock on non-writableback memory (you can find it on Intel
>> ISE chapter 8, https://software.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html)
>>
>> split lock detection can only prevent 1)
>>
>>> Is split_lock_detect=fatal the only available way to prevent them?
>>
>> as above, 2) non-wb lock can be prevented by "non-wb lock disable" feature
> 
> Bus lock VM exit applies to both 1 and 2, correct?

yes!

>>
>>>
>>>>
>>>> 2. Bus lock Debug Exception:
>>>> https://lore.kernel.org/lkml/20210322135325.682257-1-fenghua.yu@intel.com/
>>>> The kernel can be notified by an #DB trap after a user instruction acquires
>>>> a bus lock and is executed.
>>>
>>> I see a rate limit option mentioned at the above URL.  Would a
>>> host kernel bus lock rate limit option make this QEMU patch
>>> redundant?
>>>
>>
>> No. Bus lock Debug exception cannot be used to detect the bus lock happens
>> in guest (vmx non-root mode).
>>
>> We have patch to virtualize this feature for guest
>> https://lore.kernel.org/kvm/20210202090433.13441-1-chenyi.qiang@intel.com/
>>
>> that guest will have its own setting of bus lock debug exception on or off.
>>
>> What's more important is that, even we force set the
>> MSR_DEBUGCTL.BUS_LOCK_DETECT for guest, guest still can escape from it.
>> Because bus lock #DB is a trap which is delivered after the instruction
>> completes. If the instruction acquires bus lock subsequently faults e.g.,
>> #PF, then no bus lock #DB generated. But the bus lock does happen.
>>
>> But with bus lock VM exit, even the instruction faults, it will cause a BUS
>> LOCK VM exit.
>>
>>
>
Chenyi Qiang April 23, 2021, 1:48 a.m. UTC | #7
On 4/21/2021 11:18 PM, Eduardo Habkost wrote:
> On Wed, Apr 21, 2021 at 10:50:10PM +0800, Xiaoyao Li wrote:
>> On 4/21/2021 10:12 PM, Eduardo Habkost wrote:
>>> On Wed, Apr 21, 2021 at 02:26:42PM +0800, Chenyi Qiang wrote:
>>>> Hi, Eduardo, thanks for your comments!
>>>>
>>>>
>>>> On 4/21/2021 12:34 AM, Eduardo Habkost wrote:
>>>>> Hello,
>>>>>
>>>>> Thanks for the patch.  Comments below:
>>>>>
>>>>> On Tue, Apr 20, 2021 at 05:37:36PM +0800, Chenyi Qiang wrote:
>>>>>> Virtual Machines can exploit bus locks to degrade the performance of
>>>>>> system. To address this kind of performance DOS attack, bus lock VM exit
>>>>>> is introduced in KVM and it will report the bus locks detected in guest,
>>>>>> which can help userspace to enforce throttling policies.
>>>>>>
>>>>>
>>>>> Is there anything today that would protect the system from
>>>>> similar attacks from userspace with access to /dev/kvm?
>>>>>
>>>>
>>>> I can't fully understand your meaning for "similar attack with access to
>>>> /dev/kvm". But there are some similar associated detection features on bare
>>>> metal.
>>>
>>> What I mean is: you say guests can make a performance DoS attack
>>> on the host, and your patch mitigates that.
>>>
>>> What would be the available methods to prevent untrusted
>>> userspace running on the host with access to /dev/kvm from making
>>> a similar DoS attack on the host?
> 
> Thanks for all the clarifications below.  Considering them,
> what's the answer to the question above?
> 

Hi Eduardo,

Just make it more clear.

Bus lock detection contains two sub-features. One is bus lock debug 
exception, and the other is bus lock VM exit.

Bus lock #DB exception can help detect the bus locks acquired in user 
space and bus lock VM exit detects the bus locks insides VMs. To address 
the attacks from non-VM userspace attackers against VM, Bus lock #DB 
exception can help.

The Bus lock #DB exception support 
(https://lore.kernel.org/lkml/20210322135325.682257-3-fenghua.yu@intel.com/) 
extends the  existing kernel command line parameter "split_lock_detect=" 
also applying to non-wb bus lock.
For example, split_lock_detect=fatal will send SIGBUS to the attackers 
once this kind of #DB is detected.

>>>
>>>>
>>>> 1. Split lock detection:https://lore.kernel.org/lkml/158031147976.396.8941798847364718785.tip-bot2@tip-bot2/
>>>> Some CPUs can raise an #AC trap when a split lock is attempted.
>>>
>>> Would split_lock_detect=fatal be enough to prevent the above attacks?
>>
>> NO.
>>
>> There are two types bus lock:
>> 1. split lock - lock on cacheable memory while the memory across two cache
>> lines.
>> 2. non-wb lock - lock on non-writableback memory (you can find it on Intel
>> ISE chapter 8, https://software.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html)
>>
>> split lock detection can only prevent 1)
>>
>>> Is split_lock_detect=fatal the only available way to prevent them?
>>
>> as above, 2) non-wb lock can be prevented by "non-wb lock disable" feature
> 
> Bus lock VM exit applies to both 1 and 2, correct?
> 
>>
>>>
>>>>
>>>> 2. Bus lock Debug Exception:
>>>> https://lore.kernel.org/lkml/20210322135325.682257-1-fenghua.yu@intel.com/
>>>> The kernel can be notified by an #DB trap after a user instruction acquires
>>>> a bus lock and is executed.
>>>
>>> I see a rate limit option mentioned at the above URL.  Would a
>>> host kernel bus lock rate limit option make this QEMU patch
>>> redundant?
>>>
>>
>> No. Bus lock Debug exception cannot be used to detect the bus lock happens
>> in guest (vmx non-root mode).
>>
>> We have patch to virtualize this feature for guest
>> https://lore.kernel.org/kvm/20210202090433.13441-1-chenyi.qiang@intel.com/
>>
>> that guest will have its own setting of bus lock debug exception on or off.
>>
>> What's more important is that, even we force set the
>> MSR_DEBUGCTL.BUS_LOCK_DETECT for guest, guest still can escape from it.
>> Because bus lock #DB is a trap which is delivered after the instruction
>> completes. If the instruction acquires bus lock subsequently faults e.g.,
>> #PF, then no bus lock #DB generated. But the bus lock does happen.
>>
>> But with bus lock VM exit, even the instruction faults, it will cause a BUS
>> LOCK VM exit.
>>
>>
>
diff mbox series

Patch

diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index ed796fe6ba..42d10857a6 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -1256,6 +1256,12 @@  static void x86_machine_initfn(Object *obj)
     x86ms->pci_irq_mask = ACPI_BUILD_PCI_IRQS;
     x86ms->oem_id = g_strndup(ACPI_BUILD_APPNAME6, 6);
     x86ms->oem_table_id = g_strndup(ACPI_BUILD_APPNAME8, 8);
+    x86ms->bld = 0;
+
+    object_property_add_uint64_ptr(obj, "bus-lock-detection",
+                                   &x86ms->bld, OBJ_PROP_FLAG_READWRITE);
+    object_property_set_description(obj, "bus-lock-detection",
+            "Bus lock detection ratelimit");
 }
 
 static void x86_machine_class_init(ObjectClass *oc, void *data)
diff --git a/include/hw/i386/x86.h b/include/hw/i386/x86.h
index c09b648dff..d6e198b228 100644
--- a/include/hw/i386/x86.h
+++ b/include/hw/i386/x86.h
@@ -74,6 +74,13 @@  struct X86MachineState {
      * will be translated to MSI messages in the address space.
      */
     AddressSpace *ioapic_as;
+
+    /*
+     * ratelimit enforced on detected bus locks, the default value
+     * is 0 per second
+     */
+    uint64_t bld;
+    RateLimit bld_limit;
 };
 
 #define X86_MACHINE_SMM              "smm"
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 7fe9f52710..a75fac0404 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -130,6 +130,8 @@  static bool has_msr_mcg_ext_ctl;
 static struct kvm_cpuid2 *cpuid_cache;
 static struct kvm_msr_list *kvm_feature_msrs;
 
+#define SLICE_TIME 1000000000ULL /* ns */
+
 int kvm_has_pit_state2(void)
 {
     return has_pit_state2;
@@ -2267,6 +2269,27 @@  int kvm_arch_init(MachineState *ms, KVMState *s)
         }
     }
 
+    if (object_dynamic_cast(OBJECT(ms), TYPE_X86_MACHINE)) {
+        X86MachineState *x86ms = X86_MACHINE(ms);
+
+        if (x86ms->bld > 0) {
+            ret = kvm_check_extension(s, KVM_CAP_X86_BUS_LOCK_EXIT);
+            if (!(ret & KVM_BUS_LOCK_DETECTION_EXIT)) {
+                error_report("kvm: bus lock detection unsupported");
+                return -ENOTSUP;
+            }
+            ret = kvm_vm_enable_cap(s, KVM_CAP_X86_BUS_LOCK_EXIT, 0,
+                                    KVM_BUS_LOCK_DETECTION_EXIT);
+            if (ret < 0) {
+                error_report("kvm: Failed to enable bus lock detection cap: %s",
+                             strerror(-ret));
+                return ret;
+            }
+
+            ratelimit_set_speed(&x86ms->bld_limit, x86ms->bld, SLICE_TIME);
+        }
+    }
+
     return 0;
 }
 
@@ -4221,6 +4244,18 @@  void kvm_arch_pre_run(CPUState *cpu, struct kvm_run *run)
     }
 }
 
+static void kvm_rate_limit_on_bus_lock(void)
+{
+    MachineState *ms = MACHINE(qdev_get_machine());
+    X86MachineState *x86ms = X86_MACHINE(ms);
+
+    uint64_t delay_ns = ratelimit_calculate_delay(&x86ms->bld_limit, 1);
+
+    if (delay_ns) {
+        g_usleep(delay_ns / SCALE_US);
+    }
+}
+
 MemTxAttrs kvm_arch_post_run(CPUState *cpu, struct kvm_run *run)
 {
     X86CPU *x86_cpu = X86_CPU(cpu);
@@ -4236,6 +4271,9 @@  MemTxAttrs kvm_arch_post_run(CPUState *cpu, struct kvm_run *run)
     } else {
         env->eflags &= ~IF_MASK;
     }
+    if (run->flags & KVM_RUN_X86_BUS_LOCK) {
+        kvm_rate_limit_on_bus_lock();
+    }
 
     /* We need to protect the apic state against concurrent accesses from
      * different threads in case the userspace irqchip is used. */
@@ -4594,6 +4632,10 @@  int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
         ioapic_eoi_broadcast(run->eoi.vector);
         ret = 0;
         break;
+    case KVM_EXIT_X86_BUS_LOCK:
+        /* already handled in kvm_arch_post_run */
+        ret = 0;
+        break;
     default:
         fprintf(stderr, "KVM: unknown exit reason %d\n", run->exit_reason);
         ret = -1;