diff mbox series

[v2,1/3] x86/tdx: Add TDX Guest event notify interrupt support

Message ID 20230413034108.1902712-2-sathyanarayanan.kuppuswamy@linux.intel.com (mailing list archive)
State New
Headers show
Series TDX Guest Quote generation support | expand

Commit Message

Kuppuswamy Sathyanarayanan April 13, 2023, 3:41 a.m. UTC
Host-guest event notification via configured interrupt vector is useful
in cases where a guest makes an asynchronous request and needs a
callback from the host to indicate the completion or to let the host
notify the guest about events like device removal. One usage example is,
callback requirement of GetQuote asynchronous hypercall.

In TDX guest, SetupEventNotifyInterrupt hypercall can be used by the
guest to specify which interrupt vector to use as an event-notify
vector from the VMM. Details about the SetupEventNotifyInterrupt
hypercall can be found in TDX Guest-Host Communication Interface
(GHCI) Specification, section "VP.VMCALL<SetupEventNotifyInterrupt>".

As per design, VMM will post the event completion IRQ using the same
CPU on which SetupEventNotifyInterrupt hypercall request is received.
So allocate an IRQ vector from "x86_vector_domain", and set the CPU
affinity of the IRQ vector to the CPU on which
SetupEventNotifyInterrupt hypercall is made.

Add tdx_register_event_irq_cb()/tdx_unregister_event_irq_cb()
interfaces to allow drivers register/unregister event noficiation
handlers.

Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Wander Lairson Costa <wander@redhat.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
---

Changes since v1:
 * Used early_initcall() instead of arch_initcall() to trigger
   tdx_event_irq_init().
 * Removed unused headers and included headers for spinlock and list
   explicitly.
 * Since during the early_initcall() only one CPU would be enabled, remove
   CPU locking logic (like using set_cpus_allowed_ptr() or get_cpu())

 arch/x86/coco/tdx/tdx.c    | 156 +++++++++++++++++++++++++++++++++++++
 arch/x86/include/asm/tdx.h |   6 ++
 2 files changed, 162 insertions(+)

Comments

Huang, Kai April 14, 2023, 1:34 p.m. UTC | #1
On Wed, 2023-04-12 at 20:41 -0700, Kuppuswamy Sathyanarayanan wrote:
> Host-guest event notification via configured interrupt vector is useful
> in cases where a guest makes an asynchronous request and needs a
> callback from the host to indicate the completion or to let the host
> notify the guest about events like device removal. One usage example is,
> callback requirement of GetQuote asynchronous hypercall.
> 
> In TDX guest, SetupEventNotifyInterrupt hypercall can be used by the
> guest to specify which interrupt vector to use as an event-notify
> vector from the VMM. Details about the SetupEventNotifyInterrupt
> hypercall can be found in TDX Guest-Host Communication Interface
> (GHCI) Specification, section "VP.VMCALL<SetupEventNotifyInterrupt>".
> 
> As per design, VMM will post the event completion IRQ using the same
> CPU on which SetupEventNotifyInterrupt hypercall request is received.
> So allocate an IRQ vector from "x86_vector_domain", and set the CPU
> affinity of the IRQ vector to the CPU on which
> SetupEventNotifyInterrupt hypercall is made.
> 
> Add tdx_register_event_irq_cb()/tdx_unregister_event_irq_cb()
> interfaces to allow drivers register/unregister event noficiation
			      ^
			      to register/unregister
> handlers.
> 
> 

[...]

> +/* Reserve an IRQ from x86_vector_domain for TD event notification */
> +static int __init tdx_event_irq_init(void)
> +{
> +	struct irq_alloc_info info;
> +	struct irq_cfg *cfg;
> +	int irq;
> +
> +	if (!cpu_feature_enabled(X86_FEATURE_TDX_GUEST))
> +		return 0;
> +
> +	init_irq_alloc_info(&info, NULL);
> +
> +	/*
> +	 * Event notification vector will be delivered to the CPU
> +	 * in which TDVMCALL_SETUP_NOTIFY_INTR hypercall is requested.
	   ^
	   on which (to be consistent with the changelog)

> +	 * So set the IRQ affinity to the current CPU (CPU 0).
> +	 */
> +	info.mask = cpumask_of(0);

I think we should use smp_processor_id() to get the CPU id even for BSP.

> +
> +	irq = irq_domain_alloc_irqs(x86_vector_domain, 1, cpu_to_node(0), &info);
> +	if (irq <= 0) {
> +		pr_err("Event notification IRQ allocation failed %d\n", irq);
> +		return -EIO;
> +	}
> +
> +	irq_set_handler(irq, handle_edge_irq);
> +
> +	/* Since the IRQ affinity is set, it cannot be balanced */
> +	if (request_irq(irq, tdx_event_irq_handler, IRQF_NOBALANCING,
> +			"tdx_event_irq", NULL)) {
> +		pr_err("Event notification IRQ request failed\n");
> +		goto err_free_domain_irqs;
> +	}

Firstly, this comment isn't right.  The @info->mask is only used to allocate the
vector of the IRQ, but it doesn't have anything to do with IRQ affinity. 
irq_domain_alloc_irqs() will set the IRQ to have the default affinity in fact.  

The comment should be something like below instead:

	/*
	 * The IRQ cannot be migrated because VMM always injects the vector
	 * event to the cpu on which the SetupEventNotifyInterrupt TDVMCALL
	 * is called.  Set the IRQ with IRQF_NOBALANCING to prevent its
affinity 
	 * from being changed.
	 */

That also being said, you actually haven't set IRQ's affinity to the BSP yet
before request_irq().  IIUC you can either:

1) Explicitly call irq_set_affinity() after irq_domain_alloc_irqs() to set
affinity to BSP; or

2) Use __irq_domain_alloc_irqs(), which allows you to set the affinity directly,
to allocate the IRQ instead of irq_domain_alloc_irqs().

	struct irq_affinity_desc desc;

	cpumask_set_cpu(smp_processor_id(), &desc.mask);

	irq = __irq_domain_alloc_irqs(..., &desc);

Personally I think 2) is more straightforward.

Using 2) also allows you to alternatively set desc.is_managed to 1 to set the
IRQ as kernel managed.  This will prevent userspace from changing IRQ affinity.
Kernel can still change the affinity, though, but no other kernel code will do
that.

So both kernel managed affinity IRQ and IRQF_NOBALANCING should work.  I have no
opinion on this.

And IIUC if you already set IRQ's affinity to BSP before request_irq(), then you
don't need to do:

	info.mask = cpumask_of(0);

before allocating the IRQ as the vector will be allocated in request_irq() on
the BSP anyway.


> +
> +	cfg = irq_cfg(irq);
> +
> +	/*
> +	 * Since tdx_event_irq_init() is triggered via early_initcall(),
> +	 * it will called before secondary CPUs bringup. Since there is
> +	 * only one CPU, it complies with the requirement of executing
> +	 * the TDVMCALL_SETUP_NOTIFY_INTR hypercall on the same CPU where
> +	 * the IRQ vector is allocated.

IMHO this is unnecessary complicated.

In fact, IMHO we can just have one simple comment at the very beginning to
explain the whole thing:

	/*
	 * TDX guest uses SetupEventNotifyInterrupt TDVMCALL to allow the
	 * hypervisor to notify the TDX guest when needed, for instance,
	 * when VMM finishes the GetQuote request from the TDX guest.  
	 *
	 * The VMM always notifies the TDX guest via the vector specified in
the
	 * SetupEventNotifyInterrupt TDVMCALL to the cpu on which the TDVMCALL
	 * is called.  For simplicity, just allocate an IRQ (and a vector) 
	 * directly from x86_vector_domain for such notification and pin the
	 * IRQ to the BSP.
	 */

And then all the code follows.

> +	 *
> +	 * Register callback vector address with VMM. More details
> +	 * about the ABI can be found in TDX Guest-Host-Communication
> +	 * Interface (GHCI), sec titled
> +	 * "TDG.VP.VMCALL<SetupEventNotifyInterrupt>".
> +	 */
> +	if (_tdx_hypercall(TDVMCALL_SETUP_NOTIFY_INTR, cfg->vector, 0, 0, 0)) {
> +		pr_err("Event notification hypercall failed\n");
> +		goto err_free_irqs;
> +	}
> +
> +	tdx_event_irq = irq;
> +
> +	return 0;
> +
> +err_free_irqs:
> +	free_irq(irq, NULL);
> +err_free_domain_irqs:
> +	irq_domain_free_irqs(irq, 1);
> +
> +	return -EIO;
> +}
> +early_initcall(tdx_event_irq_init)
> +
> 

[...]
Kuppuswamy Sathyanarayanan April 25, 2023, 11:47 p.m. UTC | #2
Hi Kai,

On 4/14/23 6:34 AM, Huang, Kai wrote:
> On Wed, 2023-04-12 at 20:41 -0700, Kuppuswamy Sathyanarayanan wrote:
>> Host-guest event notification via configured interrupt vector is useful
>> in cases where a guest makes an asynchronous request and needs a
>> callback from the host to indicate the completion or to let the host
>> notify the guest about events like device removal. One usage example is,
>> callback requirement of GetQuote asynchronous hypercall.
>>
>> In TDX guest, SetupEventNotifyInterrupt hypercall can be used by the
>> guest to specify which interrupt vector to use as an event-notify
>> vector from the VMM. Details about the SetupEventNotifyInterrupt
>> hypercall can be found in TDX Guest-Host Communication Interface
>> (GHCI) Specification, section "VP.VMCALL<SetupEventNotifyInterrupt>".
>>
>> As per design, VMM will post the event completion IRQ using the same
>> CPU on which SetupEventNotifyInterrupt hypercall request is received.
>> So allocate an IRQ vector from "x86_vector_domain", and set the CPU
>> affinity of the IRQ vector to the CPU on which
>> SetupEventNotifyInterrupt hypercall is made.
>>
>> Add tdx_register_event_irq_cb()/tdx_unregister_event_irq_cb()
>> interfaces to allow drivers register/unregister event noficiation
> 			      ^
> 			      to register/unregister
>> handlers.
>>
>>
> 
> [...]
> 

With suggested changes, the final version looks like below.

+/**
+ * tdx_event_irq_init() - Register IRQ for event notification from the VMM to
+ *                       the TDX Guest.
+ *
+ * Use SetupEventNotifyInterrupt TDVMCALL to register the event notification
+ * IRQ with the VMM, which is used by the VMM to notify the TDX guest when
+ * needed, for instance, when VMM finishes the GetQuote request from the TDX
+ * guest. The VMM always notifies the TDX guest via the same CPU on which the
+ * SetupEventNotifyInterrupt TDVMCALL is called. For simplicity, just allocate
+ * an IRQ (and a vector) directly from x86_vector_domain for such notification
+ * and pin the IRQ to the same CPU on which TDVMCALL is called.
+ *
+ * Since tdx_event_irq_init() is triggered via early_initcall(), it will be
+ * called before secondary CPUs bring up, so no special logic is required to
+ * ensure that the same CPU is used for SetupEventNotifyInterrupt TDVMCALL and
+ * IRQ allocation.
+ */
+static int __init tdx_event_irq_init(void)
+{
+       struct irq_affinity_desc desc;
+       struct irq_alloc_info info;
+       struct irq_cfg *cfg;
+       int irq;
+
+       if (!cpu_feature_enabled(X86_FEATURE_TDX_GUEST))
+               return 0;
+
+       init_irq_alloc_info(&info, NULL);
+
+       cpumask_set_cpu(smp_processor_id(), &desc.mask);
+
+       irq = __irq_domain_alloc_irqs(x86_vector_domain, -1, 1, cpu_to_node(0),
+                                     &info, false, &desc);
+       if (irq <= 0) {
+               pr_err("Event notification IRQ allocation failed %d\n", irq);
+               return -EIO;
+       }
+
+       irq_set_handler(irq, handle_edge_irq);
+
+       /*
+        * The IRQ cannot be migrated because VMM always notifies the TDX
+        * guest on the same CPU on which the SetupEventNotifyInterrupt
+        * TDVMCALL is called. Set the IRQ with IRQF_NOBALANCING to prevent
+        * its affinity from being changed.
+        */
+       if (request_irq(irq, tdx_event_irq_handler, IRQF_NOBALANCING,
+                       "tdx_event_irq", NULL)) {
+               pr_err("Event notification IRQ request failed\n");
+               goto err_free_domain_irqs;
+       }
+
+       cfg = irq_cfg(irq);
+
+       if (_tdx_hypercall(TDVMCALL_SETUP_NOTIFY_INTR, cfg->vector, 0, 0, 0)) {
+               pr_err("Event notification hypercall failed\n");
+               goto err_free_irqs;
+       }
+
+       tdx_event_irq = irq;
+
+       return 0;
+
+err_free_irqs:
+       free_irq(irq, NULL);
+err_free_domain_irqs:
+       irq_domain_free_irqs(irq, 1);
+
+       return -EIO;
+}
+early_initcall(tdx_event_irq_init)
Huang, Kai April 26, 2023, 1:59 a.m. UTC | #3
On Tue, 2023-04-25 at 16:47 -0700, Sathyanarayanan Kuppuswamy wrote:
> Hi Kai,
> 
> On 4/14/23 6:34 AM, Huang, Kai wrote:
> > On Wed, 2023-04-12 at 20:41 -0700, Kuppuswamy Sathyanarayanan wrote:
> > > Host-guest event notification via configured interrupt vector is useful
> > > in cases where a guest makes an asynchronous request and needs a
> > > callback from the host to indicate the completion or to let the host
> > > notify the guest about events like device removal. One usage example is,
> > > callback requirement of GetQuote asynchronous hypercall.
> > > 
> > > In TDX guest, SetupEventNotifyInterrupt hypercall can be used by the
> > > guest to specify which interrupt vector to use as an event-notify
> > > vector from the VMM. Details about the SetupEventNotifyInterrupt
> > > hypercall can be found in TDX Guest-Host Communication Interface
> > > (GHCI) Specification, section "VP.VMCALL<SetupEventNotifyInterrupt>".
> > > 
> > > As per design, VMM will post the event completion IRQ using the same
> > > CPU on which SetupEventNotifyInterrupt hypercall request is received.
> > > So allocate an IRQ vector from "x86_vector_domain", and set the CPU
> > > affinity of the IRQ vector to the CPU on which
> > > SetupEventNotifyInterrupt hypercall is made.
> > > 
> > > Add tdx_register_event_irq_cb()/tdx_unregister_event_irq_cb()
> > > interfaces to allow drivers register/unregister event noficiation
> > 			      ^
> > 			      to register/unregister
> > > handlers.
> > > 
> > > 
> > 
> > [...]
> > 
> 
> With suggested changes, the final version looks like below.
> 
> +/**
> + * tdx_event_irq_init() - Register IRQ for event notification from the VMM to
> + *                       the TDX Guest.
> + *
> + * Use SetupEventNotifyInterrupt TDVMCALL to register the event notification
> + * IRQ with the VMM, which is used by the VMM to notify the TDX guest when
> + * needed, for instance, when VMM finishes the GetQuote request from the TDX
> + * guest. The VMM always notifies the TDX guest via the same CPU on which the
> + * SetupEventNotifyInterrupt TDVMCALL is called. For simplicity, just allocate
> + * an IRQ (and a vector) directly from x86_vector_domain for such notification
> + * and pin the IRQ to the same CPU on which TDVMCALL is called.

I think "for simplicity" applies to allocate IRQ/vector "from BSP using
early_initcall()" (so TDVMCALL is easily guaranteed to be called on the same cpu
where vector is allocated), but doesn't apply to allocate IRQ/vector from
x86_vector_domain and "pin the IRQ to the same CPU on which TDVMCALAL is
called".  The latter is something you must do (otherwise you need to allocate
the same vector on all cpus), but not something that you do "for simplicity".

> + *
> + * Since tdx_event_irq_init() is triggered via early_initcall(), it will be
> + * called before secondary CPUs bring up, so no special logic is required to
> + * ensure that the same CPU is used for SetupEventNotifyInterrupt TDVMCALL and
> + * IRQ allocation.

IMHO the second paragraph is obvious and no need to mention.

As explained above, I guess you just need to at somewhere simply mention
something like: "for simplicity use early_initcall() to allocate and pin the
IRQ/vector on BSP and also call the TDVMCALL on BSP".  Or probably "also call
the TDVMCALL on BSP" can also be omitted as it's kinda already explained in the
nature of the TDVMCALL.

> + */
> +static int __init tdx_event_irq_init(void)
> +{
> +       struct irq_affinity_desc desc;
> +       struct irq_alloc_info info;
> +       struct irq_cfg *cfg;
> +       int irq;
> +
> +       if (!cpu_feature_enabled(X86_FEATURE_TDX_GUEST))
> +               return 0;
> +
> +       init_irq_alloc_info(&info, NULL);
> +
> +       cpumask_set_cpu(smp_processor_id(), &desc.mask);
> +
> +       irq = __irq_domain_alloc_irqs(x86_vector_domain, -1, 1, cpu_to_node(0),

cpu_to_node(smp_processor_id())?

> +                                     &info, false, &desc);
> +       if (irq <= 0) {
> +               pr_err("Event notification IRQ allocation failed %d\n", irq);
> +               return -EIO;
> +       }
> +
> +       irq_set_handler(irq, handle_edge_irq);
> +
> +       /*
> +        * The IRQ cannot be migrated because VMM always notifies the TDX
> +        * guest on the same CPU on which the SetupEventNotifyInterrupt
> +        * TDVMCALL is called. Set the IRQ with IRQF_NOBALANCING to prevent
> +        * its affinity from being changed.
> +        */
> +       if (request_irq(irq, tdx_event_irq_handler, IRQF_NOBALANCING,
> +                       "tdx_event_irq", NULL)) {
> +               pr_err("Event notification IRQ request failed\n");
> +               goto err_free_domain_irqs;
> +       }
> +
> +       cfg = irq_cfg(irq);
> +
> +       if (_tdx_hypercall(TDVMCALL_SETUP_NOTIFY_INTR, cfg->vector, 0, 0, 0)) {
> +               pr_err("Event notification hypercall failed\n");
> +               goto err_free_irqs;
> +       }
> +
> +       tdx_event_irq = irq;
> +
> +       return 0;
> +
> +err_free_irqs:
> +       free_irq(irq, NULL);
> +err_free_domain_irqs:
> +       irq_domain_free_irqs(irq, 1);
> +
> +       return -EIO;
> +}
> +early_initcall(tdx_event_irq_init)
> 
> 
> 
> 
> -- 
> Sathyanarayanan Kuppuswamy
> Linux Kernel Developer
Kuppuswamy Sathyanarayanan April 26, 2023, 6:07 a.m. UTC | #4
On 4/25/23 6:59 PM, Huang, Kai wrote:
> On Tue, 2023-04-25 at 16:47 -0700, Sathyanarayanan Kuppuswamy wrote:
>> Hi Kai,
>>
>> On 4/14/23 6:34 AM, Huang, Kai wrote:
>>> On Wed, 2023-04-12 at 20:41 -0700, Kuppuswamy Sathyanarayanan wrote:
>>>> Host-guest event notification via configured interrupt vector is useful
>>>> in cases where a guest makes an asynchronous request and needs a
>>>> callback from the host to indicate the completion or to let the host
>>>> notify the guest about events like device removal. One usage example is,
>>>> callback requirement of GetQuote asynchronous hypercall.
>>>>
>>>> In TDX guest, SetupEventNotifyInterrupt hypercall can be used by the
>>>> guest to specify which interrupt vector to use as an event-notify
>>>> vector from the VMM. Details about the SetupEventNotifyInterrupt
>>>> hypercall can be found in TDX Guest-Host Communication Interface
>>>> (GHCI) Specification, section "VP.VMCALL<SetupEventNotifyInterrupt>".
>>>>
>>>> As per design, VMM will post the event completion IRQ using the same
>>>> CPU on which SetupEventNotifyInterrupt hypercall request is received.
>>>> So allocate an IRQ vector from "x86_vector_domain", and set the CPU
>>>> affinity of the IRQ vector to the CPU on which
>>>> SetupEventNotifyInterrupt hypercall is made.
>>>>
>>>> Add tdx_register_event_irq_cb()/tdx_unregister_event_irq_cb()
>>>> interfaces to allow drivers register/unregister event noficiation
>>> 			      ^
>>> 			      to register/unregister
>>>> handlers.
>>>>
>>>>
>>>
>>> [...]
>>>
>>
>> With suggested changes, the final version looks like below.
>>
>> +/**
>> + * tdx_event_irq_init() - Register IRQ for event notification from the VMM to
>> + *                       the TDX Guest.
>> + *
>> + * Use SetupEventNotifyInterrupt TDVMCALL to register the event notification
>> + * IRQ with the VMM, which is used by the VMM to notify the TDX guest when
>> + * needed, for instance, when VMM finishes the GetQuote request from the TDX
>> + * guest. The VMM always notifies the TDX guest via the same CPU on which the
>> + * SetupEventNotifyInterrupt TDVMCALL is called. For simplicity, just allocate
>> + * an IRQ (and a vector) directly from x86_vector_domain for such notification
>> + * and pin the IRQ to the same CPU on which TDVMCALL is called.
> 
> I think "for simplicity" applies to allocate IRQ/vector "from BSP using
> early_initcall()" (so TDVMCALL is easily guaranteed to be called on the same cpu
> where vector is allocated), but doesn't apply to allocate IRQ/vector from
> x86_vector_domain and "pin the IRQ to the same CPU on which TDVMCALAL is
> called".  The latter is something you must do (otherwise you need to allocate
> the same vector on all cpus), but not something that you do "for simplicity".
> 
>> + *
>> + * Since tdx_event_irq_init() is triggered via early_initcall(), it will be
>> + * called before secondary CPUs bring up, so no special logic is required to
>> + * ensure that the same CPU is used for SetupEventNotifyInterrupt TDVMCALL and
>> + * IRQ allocation.
> 
> IMHO the second paragraph is obvious and no need to mention.
> 
> As explained above, I guess you just need to at somewhere simply mention
> something like: "for simplicity use early_initcall() to allocate and pin the
> IRQ/vector on BSP and also call the TDVMCALL on BSP".  Or probably "also call
> the TDVMCALL on BSP" can also be omitted as it's kinda already explained in the
> nature of the TDVMCALL.

How about the following?

Use SetupEventNotifyInterrupt TDVMCALL to register the event notification
IRQ with the VMM, which is used by the VMM to notify the TDX guest when
needed, for instance, when VMM finishes the GetQuote request from the TDX
guest. The VMM always notifies the TDX guest via the same CPU that calls the
SetupEventNotifyInterrupt TDVMCALL. Allocate an IRQ/vector from the
x86_vector_domain and pin it on the same CPU on which TDVMCALL is called.
For simplicity, use early_initcall() to allow both IRQ allocation and
TDVMCALL to use BSP.

> 
>> + */
>> +static int __init tdx_event_irq_init(void)
>> +{
>> +       struct irq_affinity_desc desc;
>> +       struct irq_alloc_info info;
>> +       struct irq_cfg *cfg;
>> +       int irq;
>> +
>> +       if (!cpu_feature_enabled(X86_FEATURE_TDX_GUEST))
>> +               return 0;
>> +
>> +       init_irq_alloc_info(&info, NULL);
>> +
>> +       cpumask_set_cpu(smp_processor_id(), &desc.mask);
>> +
>> +       irq = __irq_domain_alloc_irqs(x86_vector_domain, -1, 1, cpu_to_node(0),
> 
> cpu_to_node(smp_processor_id())?
> 
>> +                                     &info, false, &desc);
>> +       if (irq <= 0) {
>> +               pr_err("Event notification IRQ allocation failed %d\n", irq);
>> +               return -EIO;
>> +       }
>> +
>> +       irq_set_handler(irq, handle_edge_irq);
>> +
>> +       /*
>> +        * The IRQ cannot be migrated because VMM always notifies the TDX
>> +        * guest on the same CPU on which the SetupEventNotifyInterrupt
>> +        * TDVMCALL is called. Set the IRQ with IRQF_NOBALANCING to prevent
>> +        * its affinity from being changed.
>> +        */
>> +       if (request_irq(irq, tdx_event_irq_handler, IRQF_NOBALANCING,
>> +                       "tdx_event_irq", NULL)) {
>> +               pr_err("Event notification IRQ request failed\n");
>> +               goto err_free_domain_irqs;
>> +       }
>> +
>> +       cfg = irq_cfg(irq);
>> +
>> +       if (_tdx_hypercall(TDVMCALL_SETUP_NOTIFY_INTR, cfg->vector, 0, 0, 0)) {
>> +               pr_err("Event notification hypercall failed\n");
>> +               goto err_free_irqs;
>> +       }
>> +
>> +       tdx_event_irq = irq;
>> +
>> +       return 0;
>> +
>> +err_free_irqs:
>> +       free_irq(irq, NULL);
>> +err_free_domain_irqs:
>> +       irq_domain_free_irqs(irq, 1);
>> +
>> +       return -EIO;
>> +}
>> +early_initcall(tdx_event_irq_init)
>>
>>
>>
>>
>> -- 
>> Sathyanarayanan Kuppuswamy
>> Linux Kernel Developer
> 
>
Huang, Kai April 28, 2023, 1:50 p.m. UTC | #5
On Tue, 2023-04-25 at 23:07 -0700, Sathyanarayanan Kuppuswamy wrote:
> > > +/**
> > > + * tdx_event_irq_init() - Register IRQ for event notification from the
> > > VMM to
> > > + *                       the TDX Guest.
> > > + *
> > > + * Use SetupEventNotifyInterrupt TDVMCALL to register the event
> > > notification
> > > + * IRQ with the VMM, which is used by the VMM to notify the TDX guest
> > > when
> > > + * needed, for instance, when VMM finishes the GetQuote request from the
> > > TDX
> > > + * guest. The VMM always notifies the TDX guest via the same CPU on which
> > > the
> > > + * SetupEventNotifyInterrupt TDVMCALL is called. For simplicity, just
> > > allocate
> > > + * an IRQ (and a vector) directly from x86_vector_domain for such
> > > notification
> > > + * and pin the IRQ to the same CPU on which TDVMCALL is called.
> > 
> > I think "for simplicity" applies to allocate IRQ/vector "from BSP using
> > early_initcall()" (so TDVMCALL is easily guaranteed to be called on the same
> > cpu
> > where vector is allocated), but doesn't apply to allocate IRQ/vector from
> > x86_vector_domain and "pin the IRQ to the same CPU on which TDVMCALAL is
> > called".  The latter is something you must do (otherwise you need to
> > allocate
> > the same vector on all cpus), but not something that you do "for
> > simplicity".
> > 
> > > + *
> > > + * Since tdx_event_irq_init() is triggered via early_initcall(), it will
> > > be
> > > + * called before secondary CPUs bring up, so no special logic is required
> > > to
> > > + * ensure that the same CPU is used for SetupEventNotifyInterrupt
> > > TDVMCALL and
> > > + * IRQ allocation.
> > 
> > IMHO the second paragraph is obvious and no need to mention.
> > 
> > As explained above, I guess you just need to at somewhere simply mention
> > something like: "for simplicity use early_initcall() to allocate and pin the
> > IRQ/vector on BSP and also call the TDVMCALL on BSP".  Or probably "also
> > call
> > the TDVMCALL on BSP" can also be omitted as it's kinda already explained in
> > the
> > nature of the TDVMCALL.
> 
> How about the following?
> 
> Use SetupEventNotifyInterrupt TDVMCALL to register the event notification
> IRQ with the VMM, which is used by the VMM to notify the TDX guest when
> needed, for instance, when VMM finishes the GetQuote request from the TDX
> guest. The VMM always notifies the TDX guest via the same CPU that calls the
> SetupEventNotifyInterrupt TDVMCALL. Allocate an IRQ/vector from the
> x86_vector_domain and pin it on the same CPU on which TDVMCALL is called.
> For simplicity, use early_initcall() to allow both IRQ allocation and
> TDVMCALL to use BSP.

Sorry I missed your reply.  OK to me.
diff mbox series

Patch

diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index 055300e08fb3..26f6e2eaf5c8 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -7,12 +7,17 @@ 
 #include <linux/cpufeature.h>
 #include <linux/export.h>
 #include <linux/io.h>
+#include <linux/spinlock.h>
+#include <linux/list.h>
+#include <linux/interrupt.h>
+#include <linux/irq.h>
 #include <asm/coco.h>
 #include <asm/tdx.h>
 #include <asm/vmx.h>
 #include <asm/insn.h>
 #include <asm/insn-eval.h>
 #include <asm/pgtable.h>
+#include <asm/irqdomain.h>
 
 /* TDX module Call Leaf IDs */
 #define TDX_GET_INFO			1
@@ -27,6 +32,7 @@ 
 /* TDX hypercall Leaf IDs */
 #define TDVMCALL_MAP_GPA		0x10001
 #define TDVMCALL_REPORT_FATAL_ERROR	0x10003
+#define TDVMCALL_SETUP_NOTIFY_INTR	0x10004
 
 /* MMIO direction */
 #define EPT_READ	0
@@ -51,6 +57,16 @@ 
 
 #define TDREPORT_SUBTYPE_0	0
 
+struct event_irq_entry {
+	tdx_event_irq_cb_t handler;
+	void *data;
+	struct list_head head;
+};
+
+static int tdx_event_irq __ro_after_init;
+static LIST_HEAD(event_irq_cb_list);
+static DEFINE_SPINLOCK(event_irq_cb_lock);
+
 /*
  * Wrapper for standard use of __tdx_hypercall with no output aside from
  * return code.
@@ -873,3 +889,143 @@  void __init tdx_early_init(void)
 
 	pr_info("Guest detected\n");
 }
+
+static irqreturn_t tdx_event_irq_handler(int irq, void *dev_id)
+{
+	struct event_irq_entry *entry;
+
+	spin_lock(&event_irq_cb_lock);
+	list_for_each_entry(entry, &event_irq_cb_list, head) {
+		if (entry->handler)
+			entry->handler(entry->data);
+	}
+	spin_unlock(&event_irq_cb_lock);
+
+	return IRQ_HANDLED;
+}
+
+/* Reserve an IRQ from x86_vector_domain for TD event notification */
+static int __init tdx_event_irq_init(void)
+{
+	struct irq_alloc_info info;
+	struct irq_cfg *cfg;
+	int irq;
+
+	if (!cpu_feature_enabled(X86_FEATURE_TDX_GUEST))
+		return 0;
+
+	init_irq_alloc_info(&info, NULL);
+
+	/*
+	 * Event notification vector will be delivered to the CPU
+	 * in which TDVMCALL_SETUP_NOTIFY_INTR hypercall is requested.
+	 * So set the IRQ affinity to the current CPU (CPU 0).
+	 */
+	info.mask = cpumask_of(0);
+
+	irq = irq_domain_alloc_irqs(x86_vector_domain, 1, cpu_to_node(0), &info);
+	if (irq <= 0) {
+		pr_err("Event notification IRQ allocation failed %d\n", irq);
+		return -EIO;
+	}
+
+	irq_set_handler(irq, handle_edge_irq);
+
+	/* Since the IRQ affinity is set, it cannot be balanced */
+	if (request_irq(irq, tdx_event_irq_handler, IRQF_NOBALANCING,
+			"tdx_event_irq", NULL)) {
+		pr_err("Event notification IRQ request failed\n");
+		goto err_free_domain_irqs;
+	}
+
+	cfg = irq_cfg(irq);
+
+	/*
+	 * Since tdx_event_irq_init() is triggered via early_initcall(),
+	 * it will called before secondary CPUs bringup. Since there is
+	 * only one CPU, it complies with the requirement of executing
+	 * the TDVMCALL_SETUP_NOTIFY_INTR hypercall on the same CPU where
+	 * the IRQ vector is allocated.
+	 *
+	 * Register callback vector address with VMM. More details
+	 * about the ABI can be found in TDX Guest-Host-Communication
+	 * Interface (GHCI), sec titled
+	 * "TDG.VP.VMCALL<SetupEventNotifyInterrupt>".
+	 */
+	if (_tdx_hypercall(TDVMCALL_SETUP_NOTIFY_INTR, cfg->vector, 0, 0, 0)) {
+		pr_err("Event notification hypercall failed\n");
+		goto err_free_irqs;
+	}
+
+	tdx_event_irq = irq;
+
+	return 0;
+
+err_free_irqs:
+	free_irq(irq, NULL);
+err_free_domain_irqs:
+	irq_domain_free_irqs(irq, 1);
+
+	return -EIO;
+}
+early_initcall(tdx_event_irq_init)
+
+/**
+ * tdx_register_event_irq_cb() - Register TDX event IRQ callback handler.
+ * @handler: Address of driver specific event IRQ callback handler. Handler
+ *           will be called in IRQ context and hence cannot sleep.
+ * @data: Context data to be passed to the callback handler.
+ *
+ * Return: 0 on success or standard error code on other failures.
+ */
+int tdx_register_event_irq_cb(tdx_event_irq_cb_t handler, void *data)
+{
+	struct event_irq_entry *entry;
+	unsigned long flags;
+
+	if (tdx_event_irq <= 0)
+		return -EIO;
+
+	entry = kzalloc(sizeof(*entry), GFP_KERNEL);
+	if (!entry)
+		return -ENOMEM;
+
+	entry->data = data;
+	entry->handler = handler;
+
+	spin_lock_irqsave(&event_irq_cb_lock, flags);
+	list_add_tail(&entry->head, &event_irq_cb_list);
+	spin_unlock_irqrestore(&event_irq_cb_lock, flags);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(tdx_register_event_irq_cb);
+
+/**
+ * tdx_unregister_event_irq_cb() - Unregister TDX event IRQ callback handler.
+ * @handler: Address of driver specific event IRQ callback handler.
+ * @data: Context data to be passed to the callback handler.
+ *
+ * Return: 0 on success or -EIO if event IRQ is not allocated.
+ */
+int tdx_unregister_event_irq_cb(tdx_event_irq_cb_t handler, void *data)
+{
+	struct event_irq_entry *entry;
+	unsigned long flags;
+
+	if (tdx_event_irq <= 0)
+		return -EIO;
+
+	spin_lock_irqsave(&event_irq_cb_lock, flags);
+	list_for_each_entry(entry, &event_irq_cb_list, head) {
+		if (entry->handler == handler && entry->data == data) {
+			list_del(&entry->head);
+			kfree(entry);
+			break;
+		}
+	}
+	spin_unlock_irqrestore(&event_irq_cb_lock, flags);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(tdx_unregister_event_irq_cb);
diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 28d889c9aa16..8807fe1b1f3f 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -53,6 +53,8 @@  struct ve_info {
 
 #ifdef CONFIG_INTEL_TDX_GUEST
 
+typedef int (*tdx_event_irq_cb_t)(void *);
+
 void __init tdx_early_init(void);
 
 /* Used to communicate with the TDX module */
@@ -69,6 +71,10 @@  bool tdx_early_handle_ve(struct pt_regs *regs);
 
 int tdx_mcall_get_report0(u8 *reportdata, u8 *tdreport);
 
+int tdx_register_event_irq_cb(tdx_event_irq_cb_t handler, void *data);
+
+int tdx_unregister_event_irq_cb(tdx_event_irq_cb_t handler, void *data);
+
 #else
 
 static inline void tdx_early_init(void) { };