diff mbox series

[v3] arm/kvm: Enable support for KVM_ARM_VCPU_PMU_V3_FILTER

Message ID 20231129030827.2657755-1-shahuang@redhat.com (mailing list archive)
State New, archived
Headers show
Series [v3] arm/kvm: Enable support for KVM_ARM_VCPU_PMU_V3_FILTER | expand

Commit Message

Shaoqin Huang Nov. 29, 2023, 3:08 a.m. UTC
The KVM_ARM_VCPU_PMU_V3_FILTER provide the ability to let the VMM decide
which PMU events are provided to the guest. Add a new option
`pmu-filter` as -accel sub-option to set the PMU Event Filtering.
Without the filter, the KVM will expose all events from the host to
guest by default.

The `pmu-filter` has such format:

  pmu-filter="{A,D}:start-end[;{A,D}:start-end...]"

The A means "allow" and D means "deny", start is the first event of the
range and the end is the last one. The first registered range defines
the global policy(global ALLOW if the first @action is DENY, global DENY
if the first @action is ALLOW). The start and end only support hex
format now. For example:

  pmu-filter="A:0x11-0x11;A:0x23-0x3a;D:0x30-0x30"

Since the first action is allow, we have a global deny policy. It
will allow event 0x11 (The cycle counter), events 0x23 to 0x3a is
also allowed except the event 0x30 is denied, and all the other events
are disallowed.

Here is an real example shows how to use the PMU Event Filtering, when
we launch a guest by use kvm, add such command line:

  # qemu-system-aarch64 \
	-accel kvm,pmu-filter="D:0x11-0x11"

Since the first action is deny, we have a global allow policy. This
disables the filtering of the cycle counter (event 0x11 being CPU_CYCLES).

And then in guest, use the perf to count the cycle:

  # perf stat sleep 1

   Performance counter stats for 'sleep 1':

              1.22 msec task-clock                       #    0.001 CPUs utilized
                 1      context-switches                 #  820.695 /sec
                 0      cpu-migrations                   #    0.000 /sec
                55      page-faults                      #   45.138 K/sec
   <not supported>      cycles
           1128954      instructions
            227031      branches                         #  186.323 M/sec
              8686      branch-misses                    #    3.83% of all branches

       1.002492480 seconds time elapsed

       0.001752000 seconds user
       0.000000000 seconds sys

As we can see, the cycle counter has been disabled in the guest, but
other pmu events are still work.

Signed-off-by: Shaoqin Huang <shahuang@redhat.com>
---
v2->v3:
  - Improve commits message, use kernel doc wording, add more explaination on
    filter example, fix some typo error.                [Eric]
  - Add g_free() in kvm_arch_set_pmu_filter() to prevent memory leak. [Eric]
  - Add more precise error message report.              [Eric]
  - In options doc, add pmu-filter rely on KVM_ARM_VCPU_PMU_V3_FILTER support in
    KVM.                                                [Eric]

v1->v2:
  - Add more description for allow and deny meaning in 
    commit message.                                     [Sebastian]
  - Small improvement.                                  [Sebastian]

v2: https://lore.kernel.org/all/20231117060838.39723-1-shahuang@redhat.com/
v1: https://lore.kernel.org/all/20231113081713.153615-1-shahuang@redhat.com/
---
 include/sysemu/kvm_int.h |  1 +
 qemu-options.hx          | 21 +++++++++++++
 target/arm/kvm.c         | 23 ++++++++++++++
 target/arm/kvm64.c       | 68 ++++++++++++++++++++++++++++++++++++++++
 4 files changed, 113 insertions(+)

Comments

Sebastian Ott Nov. 30, 2023, 4:55 p.m. UTC | #1
On Tue, 28 Nov 2023, Shaoqin Huang wrote:
> +static void kvm_arm_pmu_filter_init(CPUState *cs)
> +{
> +    static bool pmu_filter_init = false;
> +    struct kvm_pmu_event_filter filter;
> +    struct kvm_device_attr attr = {
> +        .group      = KVM_ARM_VCPU_PMU_V3_CTRL,
> +        .attr       = KVM_ARM_VCPU_PMU_V3_FILTER,
> +        .addr       = (uint64_t)&filter,
> +    };
> +    KVMState *kvm_state = cs->kvm_state;
> +    char *tmp;
> +    char *str, act;
> +
> +    if (!kvm_state->kvm_pmu_filter)
> +        return;
> +
> +    if (kvm_vcpu_ioctl(cs, KVM_HAS_DEVICE_ATTR, attr)) {
> +        error_report("The kernel doesn't support the pmu event filter!\n");
> +        abort();
> +    }
> +
> +    /* The filter only needs to be initialized for 1 vcpu. */
> +    if (!pmu_filter_init)
> +        pmu_filter_init = true;

Imho this is missing an else to bail out. Or the shorter version

 	if (pmu_filter_init)
 		return;

 	pmu_filter_init = true;

which could also move above the other tests.

Sebastian
Gavin Shan Dec. 1, 2023, 5:37 a.m. UTC | #2
Hi Shaoqin,

On 11/29/23 14:08, Shaoqin Huang wrote:
> The KVM_ARM_VCPU_PMU_V3_FILTER provide the ability to let the VMM decide
> which PMU events are provided to the guest. Add a new option
> `pmu-filter` as -accel sub-option to set the PMU Event Filtering.
> Without the filter, the KVM will expose all events from the host to
> guest by default.
> 
> The `pmu-filter` has such format:
> 
>    pmu-filter="{A,D}:start-end[;{A,D}:start-end...]"
> 
> The A means "allow" and D means "deny", start is the first event of the
> range and the end is the last one. The first registered range defines
> the global policy(global ALLOW if the first @action is DENY, global DENY
> if the first @action is ALLOW). The start and end only support hex
> format now. For example:
> 
>    pmu-filter="A:0x11-0x11;A:0x23-0x3a;D:0x30-0x30"
> 
> Since the first action is allow, we have a global deny policy. It
> will allow event 0x11 (The cycle counter), events 0x23 to 0x3a is
> also allowed except the event 0x30 is denied, and all the other events
> are disallowed.
> 
> Here is an real example shows how to use the PMU Event Filtering, when
> we launch a guest by use kvm, add such command line:
> 
>    # qemu-system-aarch64 \
> 	-accel kvm,pmu-filter="D:0x11-0x11"
> 
> Since the first action is deny, we have a global allow policy. This
> disables the filtering of the cycle counter (event 0x11 being CPU_CYCLES).
> 
> And then in guest, use the perf to count the cycle:
> 
>    # perf stat sleep 1
> 
>     Performance counter stats for 'sleep 1':
> 
>                1.22 msec task-clock                       #    0.001 CPUs utilized
>                   1      context-switches                 #  820.695 /sec
>                   0      cpu-migrations                   #    0.000 /sec
>                  55      page-faults                      #   45.138 K/sec
>     <not supported>      cycles
>             1128954      instructions
>              227031      branches                         #  186.323 M/sec
>                8686      branch-misses                    #    3.83% of all branches
> 
>         1.002492480 seconds time elapsed
> 
>         0.001752000 seconds user
>         0.000000000 seconds sys
> 
> As we can see, the cycle counter has been disabled in the guest, but
> other pmu events are still work.
> 
> Signed-off-by: Shaoqin Huang <shahuang@redhat.com>
> ---
> v2->v3:
>    - Improve commits message, use kernel doc wording, add more explaination on
>      filter example, fix some typo error.                [Eric]
>    - Add g_free() in kvm_arch_set_pmu_filter() to prevent memory leak. [Eric]
>    - Add more precise error message report.              [Eric]
>    - In options doc, add pmu-filter rely on KVM_ARM_VCPU_PMU_V3_FILTER support in
>      KVM.                                                [Eric]
> 
> v1->v2:
>    - Add more description for allow and deny meaning in
>      commit message.                                     [Sebastian]
>    - Small improvement.                                  [Sebastian]
> 
> v2: https://lore.kernel.org/all/20231117060838.39723-1-shahuang@redhat.com/
> v1: https://lore.kernel.org/all/20231113081713.153615-1-shahuang@redhat.com/
> ---
>   include/sysemu/kvm_int.h |  1 +
>   qemu-options.hx          | 21 +++++++++++++
>   target/arm/kvm.c         | 23 ++++++++++++++
>   target/arm/kvm64.c       | 68 ++++++++++++++++++++++++++++++++++++++++
>   4 files changed, 113 insertions(+)
> 
> diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h
> index fd846394be..8f4601474f 100644
> --- a/include/sysemu/kvm_int.h
> +++ b/include/sysemu/kvm_int.h
> @@ -120,6 +120,7 @@ struct KVMState
>       uint32_t xen_caps;
>       uint16_t xen_gnttab_max_frames;
>       uint16_t xen_evtchn_max_pirq;
> +    char *kvm_pmu_filter;
>   };
>   
>   void kvm_memory_listener_register(KVMState *s, KVMMemoryListener *kml,
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 42fd09e4de..8b721d6668 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -187,6 +187,7 @@ DEF("accel", HAS_ARG, QEMU_OPTION_accel,
>       "                tb-size=n (TCG translation block cache size)\n"
>       "                dirty-ring-size=n (KVM dirty ring GFN count, default 0)\n"
>       "                eager-split-size=n (KVM Eager Page Split chunk size, default 0, disabled. ARM only)\n"
> +    "                pmu-filter={A,D}:start-end[;...] (KVM PMU Event Filter, default no filter. ARM only)\n"
   ^^^^^^^

Potential alignment issue, or the email isn't shown for me correctly.
Besides, why not follow the pattern in the commit log, which is nicer
than what's of being:

pmu-filter={A,D}:start-end[;...]

to

pmu-filter="{A,D}:start-end[;{A,D}:start-end...]

>       "                notify-vmexit=run|internal-error|disable,notify-window=n (enable notify VM exit and set notify window, x86 only)\n"
>       "                thread=single|multi (enable multi-threaded TCG)\n", QEMU_ARCH_ALL)
>   SRST
> @@ -259,6 +260,26 @@ SRST
>           impact on the memory. By default, this feature is disabled
>           (eager-split-size=0).
>   
> +    ``pmu-filter={A,D}:start-end[;...]``
> +        KVM implements pmu event filtering to prevent a guest from being able to
        ^^^^               ^^^^^^^^^^^^^^^^^^^
        Alignment          "PMU Event Filtering" to be consistent

> +	sample certain events. It depends on the KVM_ARM_VCPU_PMU_V3_FILTER attr
                                                                             ^^^^
                                                                             attribute
> +	supported in KVM. It has the following format:
> +
> +	pmu-filter="{A,D}:start-end[;{A,D}:start-end...]"
> +
> +	The A means "allow" and D means "deny", start is the first event of the
> +	range and the end is the last one. The first registered range defines
> +	the global policy(global ALLOW if the first @action is DENY, global DENY
> +	if the first @action is ALLOW). The start and end only support hex
> +	format now. For example:
> +
> +	pmu-filter="A:0x11-0x11;A:0x23-0x3a;D:0x30-0x30"
> +
> +	Since the first action is allow, we have a global deny policy. It
> +	will allow event 0x11 (The cycle counter), events 0x23 to 0x3a is
> +	also allowed except the event 0x30 is denied, and all the other events
> +	are disallowed.
> +
>       ``notify-vmexit=run|internal-error|disable,notify-window=n``
>           Enables or disables notify VM exit support on x86 host and specify
>           the corresponding notify window to trigger the VM exit if enabled.
> diff --git a/target/arm/kvm.c b/target/arm/kvm.c
> index 7903e2ddde..116a0d3d2b 100644
> --- a/target/arm/kvm.c
> +++ b/target/arm/kvm.c
> @@ -1108,6 +1108,22 @@ static void kvm_arch_set_eager_split_size(Object *obj, Visitor *v,
>       s->kvm_eager_split_size = value;
>   }
>   
> +static char *kvm_arch_get_pmu_filter(Object *obj, Error **errp)
> +{
> +    KVMState *s = KVM_STATE(obj);
> +
> +    return g_strdup(s->kvm_pmu_filter);
> +}
> +
> +static void kvm_arch_set_pmu_filter(Object *obj, const char *pmu_filter,
> +                                    Error **errp)
> +{
> +    KVMState *s = KVM_STATE(obj);
> +
> +    g_free(s->kvm_pmu_filter);
> +    s->kvm_pmu_filter = g_strdup(pmu_filter);
> +}
> +
>   void kvm_arch_accel_class_init(ObjectClass *oc)
>   {
>       object_class_property_add(oc, "eager-split-size", "size",
> @@ -1116,4 +1132,11 @@ void kvm_arch_accel_class_init(ObjectClass *oc)
>   
>       object_class_property_set_description(oc, "eager-split-size",
>           "Eager Page Split chunk size for hugepages. (default: 0, disabled)");
> +
> +    object_class_property_add_str(oc, "pmu-filter",
> +                                  kvm_arch_get_pmu_filter,
> +                                  kvm_arch_set_pmu_filter);
> +
> +    object_class_property_set_description(oc, "pmu-filter",
> +        "PMU Event Filtering description for guest pmu. (default: NULL, disabled)");
                                                       ^^^
                                                       PMU
>   }
> diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
> index 3c175c93a7..7947b83b36 100644
> --- a/target/arm/kvm64.c
> +++ b/target/arm/kvm64.c
> @@ -10,6 +10,7 @@
>    */
>   
>   #include "qemu/osdep.h"
> +#include <asm-arm64/kvm.h>
>   #include <sys/ioctl.h>
>   #include <sys/ptrace.h>
>   
> @@ -131,6 +132,70 @@ static bool kvm_arm_set_device_attr(CPUState *cs, struct kvm_device_attr *attr,
>       return true;
>   }
>   
> +static void kvm_arm_pmu_filter_init(CPUState *cs)
> +{
> +    static bool pmu_filter_init = false;
> +    struct kvm_pmu_event_filter filter;
> +    struct kvm_device_attr attr = {
> +        .group      = KVM_ARM_VCPU_PMU_V3_CTRL,
> +        .attr       = KVM_ARM_VCPU_PMU_V3_FILTER,
> +        .addr       = (uint64_t)&filter,
> +    };
> +    KVMState *kvm_state = cs->kvm_state;

I would move @kvm_state to the beginning of the function since it's the container
to everything else.

> +    char *tmp;
> +    char *str, act;
> +
> +    if (!kvm_state->kvm_pmu_filter)
> +        return;
> +
> +    if (kvm_vcpu_ioctl(cs, KVM_HAS_DEVICE_ATTR, attr)) {
> +        error_report("The kernel doesn't support the pmu event filter!\n");
> +        abort();
> +    }
> +

s/attr/&attr ?

The connection between vCPU and attribute query was set up in Linux v4.10 by
commit f577f6c2a6a ("arm64: KVM: Introduce per-vcpu kvm device controls"), and
the capability depends on KVM_CAP_VCPU_ATTRIBUTES. I think KVM_CAP_VCPU_ATTRIBUTES
needs to be checked prior to kvm_vcpu_ioctl(cs, KVM_HAS_DEVICE_ATTR, ...)

Besides, the PMU Event Filtering was introduced to Linux v4.10. It means the user
can crash qemu when "pmu-filter" is provided on Linux v4.9. So the correct behavior
would be warning and ignore "pmu-filter" since it's an add-on and best-effort
feature.


> +    /* The filter only needs to be initialized for 1 vcpu. */
> +    if (!pmu_filter_init)
> +        pmu_filter_init = true;
> +

{ } has been missed. QEMU needs it even for the block with single line of code.

> +    tmp = g_strdup(kvm_state->kvm_pmu_filter);
> +
> +    for (str = strtok(tmp, ";"); str != NULL; str = strtok(NULL, ";")) {
> +        unsigned short start = 0, end = 0;
> +
> +        sscanf(str, "%c:%hx-%hx", &act, &start, &end);
> +        if ((act != 'A' && act != 'D') || (!start && !end)) {
> +            error_report("skipping invalid filter %s\n", str);
> +            continue;
> +        }
> +
> +        filter = (struct kvm_pmu_event_filter) {
> +            .base_event     = start,
> +            .nevents        = end - start + 1,
> +            .action         = act == 'A' ? KVM_PMU_EVENT_ALLOW :
> +                                           KVM_PMU_EVENT_DENY,
> +        };
> +
> +        if (!kvm_arm_set_device_attr(cs, &attr, "PMU Event Filter")) {
> +            if (errno == EINVAL)
> +                error_report("Invalid filter range [0x%x-0x%x]. "
> +                             "ARMv8.0 support 10 bits event space, "
> +                             "ARMv8.1 support 16 bits event space",
> +                             start, end);
> +            else if (errno == ENODEV)
> +                error_report("GIC not initialized");
> +            else if (errno == ENXIO)
> +                error_report("PMUv3 not properly configured or in-kernel irqchip "
> +                             "not configured.");
> +            else if (errno == EBUSY)
> +                error_report("PMUv3 already initialized or a VCPU has already run");
> +
> +            abort();
> +        }
> +    }
> +
> +    g_free(tmp);
> +}
> +

{ } has been missed.

g_strsplit() may be good fit to parse "pmu-filter". cpu-target.c::parse_cpu_option()
is the example for its usage.

As I explained above, it wouldn't a "abort()" since "pmu-filter" is an add-on and
best-effort attempt. We probably just warn done by warn_report() instead of raising
error if the PMU Event Filter fails to be set.

>   void kvm_arm_pmu_init(CPUState *cs)
>   {
>       struct kvm_device_attr attr = {
> @@ -141,6 +206,9 @@ void kvm_arm_pmu_init(CPUState *cs)
>       if (!ARM_CPU(cs)->has_pmu) {
>           return;
>       }
> +
> +    kvm_arm_pmu_filter_init(cs);
> +
>       if (!kvm_arm_set_device_attr(cs, &attr, "PMU")) {
>           error_report("failed to init PMU");
>           abort();

Thanks,
Gavin
Shaoqin Huang Dec. 7, 2023, 6:07 a.m. UTC | #3
On 12/1/23 00:55, Sebastian Ott wrote:
> On Tue, 28 Nov 2023, Shaoqin Huang wrote:
>> +static void kvm_arm_pmu_filter_init(CPUState *cs)
>> +{
>> +    static bool pmu_filter_init = false;
>> +    struct kvm_pmu_event_filter filter;
>> +    struct kvm_device_attr attr = {
>> +        .group      = KVM_ARM_VCPU_PMU_V3_CTRL,
>> +        .attr       = KVM_ARM_VCPU_PMU_V3_FILTER,
>> +        .addr       = (uint64_t)&filter,
>> +    };
>> +    KVMState *kvm_state = cs->kvm_state;
>> +    char *tmp;
>> +    char *str, act;
>> +
>> +    if (!kvm_state->kvm_pmu_filter)
>> +        return;
>> +
>> +    if (kvm_vcpu_ioctl(cs, KVM_HAS_DEVICE_ATTR, attr)) {
>> +        error_report("The kernel doesn't support the pmu event 
>> filter!\n");
>> +        abort();
>> +    }
>> +
>> +    /* The filter only needs to be initialized for 1 vcpu. */
>> +    if (!pmu_filter_init)
>> +        pmu_filter_init = true;
> 
> Imho this is missing an else to bail out. Or the shorter version
> 
>      if (pmu_filter_init)
>          return;
> 
>      pmu_filter_init = true;
> 

Yes. This is what I want to do. Thanks for fixing it.

> which could also move above the other tests.
> 
> Sebastian
>
Shaoqin Huang Dec. 7, 2023, 6:34 a.m. UTC | #4
Hi Gavin,

On 12/1/23 13:37, Gavin Shan wrote:
> Hi Shaoqin,
> 
> On 11/29/23 14:08, Shaoqin Huang wrote:
>> The KVM_ARM_VCPU_PMU_V3_FILTER provide the ability to let the VMM decide
>> which PMU events are provided to the guest. Add a new option
>> `pmu-filter` as -accel sub-option to set the PMU Event Filtering.
>> Without the filter, the KVM will expose all events from the host to
>> guest by default.
>>
>> The `pmu-filter` has such format:
>>
>>    pmu-filter="{A,D}:start-end[;{A,D}:start-end...]"
>>
>> The A means "allow" and D means "deny", start is the first event of the
>> range and the end is the last one. The first registered range defines
>> the global policy(global ALLOW if the first @action is DENY, global DENY
>> if the first @action is ALLOW). The start and end only support hex
>> format now. For example:
>>
>>    pmu-filter="A:0x11-0x11;A:0x23-0x3a;D:0x30-0x30"
>>
>> Since the first action is allow, we have a global deny policy. It
>> will allow event 0x11 (The cycle counter), events 0x23 to 0x3a is
>> also allowed except the event 0x30 is denied, and all the other events
>> are disallowed.
>>
>> Here is an real example shows how to use the PMU Event Filtering, when
>> we launch a guest by use kvm, add such command line:
>>
>>    # qemu-system-aarch64 \
>>     -accel kvm,pmu-filter="D:0x11-0x11"
>>
>> Since the first action is deny, we have a global allow policy. This
>> disables the filtering of the cycle counter (event 0x11 being 
>> CPU_CYCLES).
>>
>> And then in guest, use the perf to count the cycle:
>>
>>    # perf stat sleep 1
>>
>>     Performance counter stats for 'sleep 1':
>>
>>                1.22 msec task-clock                       #    0.001 
>> CPUs utilized
>>                   1      context-switches                 #  820.695 /sec
>>                   0      cpu-migrations                   #    0.000 /sec
>>                  55      page-faults                      #   45.138 
>> K/sec
>>     <not supported>      cycles
>>             1128954      instructions
>>              227031      branches                         #  186.323 
>> M/sec
>>                8686      branch-misses                    #    3.83% 
>> of all branches
>>
>>         1.002492480 seconds time elapsed
>>
>>         0.001752000 seconds user
>>         0.000000000 seconds sys
>>
>> As we can see, the cycle counter has been disabled in the guest, but
>> other pmu events are still work.
>>
>> Signed-off-by: Shaoqin Huang <shahuang@redhat.com>
>> ---
>> v2->v3:
>>    - Improve commits message, use kernel doc wording, add more 
>> explaination on
>>      filter example, fix some typo error.                [Eric]
>>    - Add g_free() in kvm_arch_set_pmu_filter() to prevent memory leak. 
>> [Eric]
>>    - Add more precise error message report.              [Eric]
>>    - In options doc, add pmu-filter rely on KVM_ARM_VCPU_PMU_V3_FILTER 
>> support in
>>      KVM.                                                [Eric]
>>
>> v1->v2:
>>    - Add more description for allow and deny meaning in
>>      commit message.                                     [Sebastian]
>>    - Small improvement.                                  [Sebastian]
>>
>> v2: 
>> https://lore.kernel.org/all/20231117060838.39723-1-shahuang@redhat.com/
>> v1: 
>> https://lore.kernel.org/all/20231113081713.153615-1-shahuang@redhat.com/
>> ---
>>   include/sysemu/kvm_int.h |  1 +
>>   qemu-options.hx          | 21 +++++++++++++
>>   target/arm/kvm.c         | 23 ++++++++++++++
>>   target/arm/kvm64.c       | 68 ++++++++++++++++++++++++++++++++++++++++
>>   4 files changed, 113 insertions(+)
>>
>> diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h
>> index fd846394be..8f4601474f 100644
>> --- a/include/sysemu/kvm_int.h
>> +++ b/include/sysemu/kvm_int.h
>> @@ -120,6 +120,7 @@ struct KVMState
>>       uint32_t xen_caps;
>>       uint16_t xen_gnttab_max_frames;
>>       uint16_t xen_evtchn_max_pirq;
>> +    char *kvm_pmu_filter;
>>   };
>>   void kvm_memory_listener_register(KVMState *s, KVMMemoryListener *kml,
>> diff --git a/qemu-options.hx b/qemu-options.hx
>> index 42fd09e4de..8b721d6668 100644
>> --- a/qemu-options.hx
>> +++ b/qemu-options.hx
>> @@ -187,6 +187,7 @@ DEF("accel", HAS_ARG, QEMU_OPTION_accel,
>>       "                tb-size=n (TCG translation block cache size)\n"
>>       "                dirty-ring-size=n (KVM dirty ring GFN count, 
>> default 0)\n"
>>       "                eager-split-size=n (KVM Eager Page Split chunk 
>> size, default 0, disabled. ARM only)\n"
>> +    "                pmu-filter={A,D}:start-end[;...] (KVM PMU Event 
>> Filter, default no filter. ARM only)\n"
>    ^^^^^^^
> 
> Potential alignment issue, or the email isn't shown for me correctly.
> Besides, why not follow the pattern in the commit log, which is nicer
> than what's of being:
> 
> pmu-filter={A,D}:start-end[;...]
> 
> to
> 
> pmu-filter="{A,D}:start-end[;{A,D}:start-end...]
> 

Ok. I can replace it with the better format.

>>       "                
>> notify-vmexit=run|internal-error|disable,notify-window=n (enable 
>> notify VM exit and set notify window, x86 only)\n"
>>       "                thread=single|multi (enable multi-threaded 
>> TCG)\n", QEMU_ARCH_ALL)
>>   SRST
>> @@ -259,6 +260,26 @@ SRST
>>           impact on the memory. By default, this feature is disabled
>>           (eager-split-size=0).
>> +    ``pmu-filter={A,D}:start-end[;...]``
>> +        KVM implements pmu event filtering to prevent a guest from 
>> being able to
>         ^^^^               ^^^^^^^^^^^^^^^^^^^
>         Alignment          "PMU Event Filtering" to be consistent
> 

Thanks for pointing it out. It should be an alignment issue. I will fix it.

>> +    sample certain events. It depends on the 
>> KVM_ARM_VCPU_PMU_V3_FILTER attr
>                                                                              ^^^^
>                                                                              attribute
>> +    supported in KVM. It has the following format:
>> +
>> +    pmu-filter="{A,D}:start-end[;{A,D}:start-end...]"
>> +
>> +    The A means "allow" and D means "deny", start is the first event 
>> of the
>> +    range and the end is the last one. The first registered range 
>> defines
>> +    the global policy(global ALLOW if the first @action is DENY, 
>> global DENY
>> +    if the first @action is ALLOW). The start and end only support hex
>> +    format now. For example:
>> +
>> +    pmu-filter="A:0x11-0x11;A:0x23-0x3a;D:0x30-0x30"
>> +
>> +    Since the first action is allow, we have a global deny policy. It
>> +    will allow event 0x11 (The cycle counter), events 0x23 to 0x3a is
>> +    also allowed except the event 0x30 is denied, and all the other 
>> events
>> +    are disallowed.
>> +
>>       ``notify-vmexit=run|internal-error|disable,notify-window=n``
>>           Enables or disables notify VM exit support on x86 host and 
>> specify
>>           the corresponding notify window to trigger the VM exit if 
>> enabled.
>> diff --git a/target/arm/kvm.c b/target/arm/kvm.c
>> index 7903e2ddde..116a0d3d2b 100644
>> --- a/target/arm/kvm.c
>> +++ b/target/arm/kvm.c
>> @@ -1108,6 +1108,22 @@ static void 
>> kvm_arch_set_eager_split_size(Object *obj, Visitor *v,
>>       s->kvm_eager_split_size = value;
>>   }
>> +static char *kvm_arch_get_pmu_filter(Object *obj, Error **errp)
>> +{
>> +    KVMState *s = KVM_STATE(obj);
>> +
>> +    return g_strdup(s->kvm_pmu_filter);
>> +}
>> +
>> +static void kvm_arch_set_pmu_filter(Object *obj, const char *pmu_filter,
>> +                                    Error **errp)
>> +{
>> +    KVMState *s = KVM_STATE(obj);
>> +
>> +    g_free(s->kvm_pmu_filter);
>> +    s->kvm_pmu_filter = g_strdup(pmu_filter);
>> +}
>> +
>>   void kvm_arch_accel_class_init(ObjectClass *oc)
>>   {
>>       object_class_property_add(oc, "eager-split-size", "size",
>> @@ -1116,4 +1132,11 @@ void kvm_arch_accel_class_init(ObjectClass *oc)
>>       object_class_property_set_description(oc, "eager-split-size",
>>           "Eager Page Split chunk size for hugepages. (default: 0, 
>> disabled)");
>> +
>> +    object_class_property_add_str(oc, "pmu-filter",
>> +                                  kvm_arch_get_pmu_filter,
>> +                                  kvm_arch_set_pmu_filter);
>> +
>> +    object_class_property_set_description(oc, "pmu-filter",
>> +        "PMU Event Filtering description for guest pmu. (default: 
>> NULL, disabled)");
>                                                        ^^^
>                                                        PMU

Got it.

>>   }
>> diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
>> index 3c175c93a7..7947b83b36 100644
>> --- a/target/arm/kvm64.c
>> +++ b/target/arm/kvm64.c
>> @@ -10,6 +10,7 @@
>>    */
>>   #include "qemu/osdep.h"
>> +#include <asm-arm64/kvm.h>
>>   #include <sys/ioctl.h>
>>   #include <sys/ptrace.h>
>> @@ -131,6 +132,70 @@ static bool kvm_arm_set_device_attr(CPUState *cs, 
>> struct kvm_device_attr *attr,
>>       return true;
>>   }
>> +static void kvm_arm_pmu_filter_init(CPUState *cs)
>> +{
>> +    static bool pmu_filter_init = false;
>> +    struct kvm_pmu_event_filter filter;
>> +    struct kvm_device_attr attr = {
>> +        .group      = KVM_ARM_VCPU_PMU_V3_CTRL,
>> +        .attr       = KVM_ARM_VCPU_PMU_V3_FILTER,
>> +        .addr       = (uint64_t)&filter,
>> +    };
>> +    KVMState *kvm_state = cs->kvm_state;
> 
> I would move @kvm_state to the beginning of the function since it's the 
> container
> to everything else.
> 

Ok. I can move it to the first.

>> +    char *tmp;
>> +    char *str, act;
>> +
>> +    if (!kvm_state->kvm_pmu_filter)
>> +        return;
>> +
>> +    if (kvm_vcpu_ioctl(cs, KVM_HAS_DEVICE_ATTR, attr)) {
>> +        error_report("The kernel doesn't support the pmu event 
>> filter!\n");
>> +        abort();
>> +    }
>> +
> 
> s/attr/&attr ?
> 

It should be &attr.

> The connection between vCPU and attribute query was set up in Linux 
> v4.10 by
> commit f577f6c2a6a ("arm64: KVM: Introduce per-vcpu kvm device 
> controls"), and
> the capability depends on KVM_CAP_VCPU_ATTRIBUTES. I think 
> KVM_CAP_VCPU_ATTRIBUTES
> needs to be checked prior to kvm_vcpu_ioctl(cs, KVM_HAS_DEVICE_ATTR, ...)
> 

I searched the KVM_CAP_VCPU_ATTRIBUTES, no where check the capability. 
And the current usage of KVM_HAS_DEVICE_ATTR never check the 
KVM_CAP_VCPU_ATTRIBUTES. I guess we don't need it.

> Besides, the PMU Event Filtering was introduced to Linux v4.10. It means 
> the user
> can crash qemu when "pmu-filter" is provided on Linux v4.9. So the 
> correct behavior
> would be warning and ignore "pmu-filter" since it's an add-on and 
> best-effort
> feature.

Thanks for pointing this out, I think it's reasonable. I can replace all 
error_report with wran_report and replace delete all abort().

> 
> 
>> +    /* The filter only needs to be initialized for 1 vcpu. */
>> +    if (!pmu_filter_init)
>> +        pmu_filter_init = true;
>> +
> 
> { } has been missed. QEMU needs it even for the block with single line 
> of code.
> 

Ok. I will add it.

>> +    tmp = g_strdup(kvm_state->kvm_pmu_filter);
>> +
>> +    for (str = strtok(tmp, ";"); str != NULL; str = strtok(NULL, ";")) {
>> +        unsigned short start = 0, end = 0;
>> +
>> +        sscanf(str, "%c:%hx-%hx", &act, &start, &end);
>> +        if ((act != 'A' && act != 'D') || (!start && !end)) {
>> +            error_report("skipping invalid filter %s\n", str);
>> +            continue;
>> +        }
>> +
>> +        filter = (struct kvm_pmu_event_filter) {
>> +            .base_event     = start,
>> +            .nevents        = end - start + 1,
>> +            .action         = act == 'A' ? KVM_PMU_EVENT_ALLOW :
>> +                                           KVM_PMU_EVENT_DENY,
>> +        };
>> +
>> +        if (!kvm_arm_set_device_attr(cs, &attr, "PMU Event Filter")) {
>> +            if (errno == EINVAL)
>> +                error_report("Invalid filter range [0x%x-0x%x]. "
>> +                             "ARMv8.0 support 10 bits event space, "
>> +                             "ARMv8.1 support 16 bits event space",
>> +                             start, end);
>> +            else if (errno == ENODEV)
>> +                error_report("GIC not initialized");
>> +            else if (errno == ENXIO)
>> +                error_report("PMUv3 not properly configured or 
>> in-kernel irqchip "
>> +                             "not configured.");
>> +            else if (errno == EBUSY)
>> +                error_report("PMUv3 already initialized or a VCPU has 
>> already run");
>> +
>> +            abort();
>> +        }
>> +    }
>> +
>> +    g_free(tmp);
>> +}
>> +
> 
> { } has been missed.
> 
> g_strsplit() may be good fit to parse "pmu-filter". 
> cpu-target.c::parse_cpu_option()
> is the example for its usage.
> 

Ok. I can follow the implementation. It actually much simpler.

> As I explained above, it wouldn't a "abort()" since "pmu-filter" is an 
> add-on and
> best-effort attempt. We probably just warn done by warn_report() instead 
> of raising
> error if the PMU Event Filter fails to be set.
> 

Will do that.

Thanks,
Shaoqin

>>   void kvm_arm_pmu_init(CPUState *cs)
>>   {
>>       struct kvm_device_attr attr = {
>> @@ -141,6 +206,9 @@ void kvm_arm_pmu_init(CPUState *cs)
>>       if (!ARM_CPU(cs)->has_pmu) {
>>           return;
>>       }
>> +
>> +    kvm_arm_pmu_filter_init(cs);
>> +
>>       if (!kvm_arm_set_device_attr(cs, &attr, "PMU")) {
>>           error_report("failed to init PMU");
>>           abort();
> 
> Thanks,
> Gavin
>
diff mbox series

Patch

diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h
index fd846394be..8f4601474f 100644
--- a/include/sysemu/kvm_int.h
+++ b/include/sysemu/kvm_int.h
@@ -120,6 +120,7 @@  struct KVMState
     uint32_t xen_caps;
     uint16_t xen_gnttab_max_frames;
     uint16_t xen_evtchn_max_pirq;
+    char *kvm_pmu_filter;
 };
 
 void kvm_memory_listener_register(KVMState *s, KVMMemoryListener *kml,
diff --git a/qemu-options.hx b/qemu-options.hx
index 42fd09e4de..8b721d6668 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -187,6 +187,7 @@  DEF("accel", HAS_ARG, QEMU_OPTION_accel,
     "                tb-size=n (TCG translation block cache size)\n"
     "                dirty-ring-size=n (KVM dirty ring GFN count, default 0)\n"
     "                eager-split-size=n (KVM Eager Page Split chunk size, default 0, disabled. ARM only)\n"
+    "                pmu-filter={A,D}:start-end[;...] (KVM PMU Event Filter, default no filter. ARM only)\n"
     "                notify-vmexit=run|internal-error|disable,notify-window=n (enable notify VM exit and set notify window, x86 only)\n"
     "                thread=single|multi (enable multi-threaded TCG)\n", QEMU_ARCH_ALL)
 SRST
@@ -259,6 +260,26 @@  SRST
         impact on the memory. By default, this feature is disabled
         (eager-split-size=0).
 
+    ``pmu-filter={A,D}:start-end[;...]``
+        KVM implements pmu event filtering to prevent a guest from being able to
+	sample certain events. It depends on the KVM_ARM_VCPU_PMU_V3_FILTER attr
+	supported in KVM. It has the following format:
+
+	pmu-filter="{A,D}:start-end[;{A,D}:start-end...]"
+
+	The A means "allow" and D means "deny", start is the first event of the
+	range and the end is the last one. The first registered range defines
+	the global policy(global ALLOW if the first @action is DENY, global DENY
+	if the first @action is ALLOW). The start and end only support hex
+	format now. For example:
+
+	pmu-filter="A:0x11-0x11;A:0x23-0x3a;D:0x30-0x30"
+
+	Since the first action is allow, we have a global deny policy. It
+	will allow event 0x11 (The cycle counter), events 0x23 to 0x3a is
+	also allowed except the event 0x30 is denied, and all the other events
+	are disallowed.
+
     ``notify-vmexit=run|internal-error|disable,notify-window=n``
         Enables or disables notify VM exit support on x86 host and specify
         the corresponding notify window to trigger the VM exit if enabled.
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index 7903e2ddde..116a0d3d2b 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -1108,6 +1108,22 @@  static void kvm_arch_set_eager_split_size(Object *obj, Visitor *v,
     s->kvm_eager_split_size = value;
 }
 
+static char *kvm_arch_get_pmu_filter(Object *obj, Error **errp)
+{
+    KVMState *s = KVM_STATE(obj);
+
+    return g_strdup(s->kvm_pmu_filter);
+}
+
+static void kvm_arch_set_pmu_filter(Object *obj, const char *pmu_filter,
+                                    Error **errp)
+{
+    KVMState *s = KVM_STATE(obj);
+
+    g_free(s->kvm_pmu_filter);
+    s->kvm_pmu_filter = g_strdup(pmu_filter);
+}
+
 void kvm_arch_accel_class_init(ObjectClass *oc)
 {
     object_class_property_add(oc, "eager-split-size", "size",
@@ -1116,4 +1132,11 @@  void kvm_arch_accel_class_init(ObjectClass *oc)
 
     object_class_property_set_description(oc, "eager-split-size",
         "Eager Page Split chunk size for hugepages. (default: 0, disabled)");
+
+    object_class_property_add_str(oc, "pmu-filter",
+                                  kvm_arch_get_pmu_filter,
+                                  kvm_arch_set_pmu_filter);
+
+    object_class_property_set_description(oc, "pmu-filter",
+        "PMU Event Filtering description for guest pmu. (default: NULL, disabled)");
 }
diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
index 3c175c93a7..7947b83b36 100644
--- a/target/arm/kvm64.c
+++ b/target/arm/kvm64.c
@@ -10,6 +10,7 @@ 
  */
 
 #include "qemu/osdep.h"
+#include <asm-arm64/kvm.h>
 #include <sys/ioctl.h>
 #include <sys/ptrace.h>
 
@@ -131,6 +132,70 @@  static bool kvm_arm_set_device_attr(CPUState *cs, struct kvm_device_attr *attr,
     return true;
 }
 
+static void kvm_arm_pmu_filter_init(CPUState *cs)
+{
+    static bool pmu_filter_init = false;
+    struct kvm_pmu_event_filter filter;
+    struct kvm_device_attr attr = {
+        .group      = KVM_ARM_VCPU_PMU_V3_CTRL,
+        .attr       = KVM_ARM_VCPU_PMU_V3_FILTER,
+        .addr       = (uint64_t)&filter,
+    };
+    KVMState *kvm_state = cs->kvm_state;
+    char *tmp;
+    char *str, act;
+
+    if (!kvm_state->kvm_pmu_filter)
+        return;
+
+    if (kvm_vcpu_ioctl(cs, KVM_HAS_DEVICE_ATTR, attr)) {
+        error_report("The kernel doesn't support the pmu event filter!\n");
+        abort();
+    }
+
+    /* The filter only needs to be initialized for 1 vcpu. */
+    if (!pmu_filter_init)
+        pmu_filter_init = true;
+
+    tmp = g_strdup(kvm_state->kvm_pmu_filter);
+
+    for (str = strtok(tmp, ";"); str != NULL; str = strtok(NULL, ";")) {
+        unsigned short start = 0, end = 0;
+
+        sscanf(str, "%c:%hx-%hx", &act, &start, &end);
+        if ((act != 'A' && act != 'D') || (!start && !end)) {
+            error_report("skipping invalid filter %s\n", str);
+            continue;
+        }
+
+        filter = (struct kvm_pmu_event_filter) {
+            .base_event     = start,
+            .nevents        = end - start + 1,
+            .action         = act == 'A' ? KVM_PMU_EVENT_ALLOW :
+                                           KVM_PMU_EVENT_DENY,
+        };
+
+        if (!kvm_arm_set_device_attr(cs, &attr, "PMU Event Filter")) {
+            if (errno == EINVAL)
+                error_report("Invalid filter range [0x%x-0x%x]. "
+                             "ARMv8.0 support 10 bits event space, "
+                             "ARMv8.1 support 16 bits event space",
+                             start, end);
+            else if (errno == ENODEV)
+                error_report("GIC not initialized");
+            else if (errno == ENXIO)
+                error_report("PMUv3 not properly configured or in-kernel irqchip "
+                             "not configured.");
+            else if (errno == EBUSY)
+                error_report("PMUv3 already initialized or a VCPU has already run");
+
+            abort();
+        }
+    }
+
+    g_free(tmp);
+}
+
 void kvm_arm_pmu_init(CPUState *cs)
 {
     struct kvm_device_attr attr = {
@@ -141,6 +206,9 @@  void kvm_arm_pmu_init(CPUState *cs)
     if (!ARM_CPU(cs)->has_pmu) {
         return;
     }
+
+    kvm_arm_pmu_filter_init(cs);
+
     if (!kvm_arm_set_device_attr(cs, &attr, "PMU")) {
         error_report("failed to init PMU");
         abort();