diff mbox

[RFC] KVM: arm/arm64: vgic: change condition for level interrupt resampling

Message ID 1520492490-7943-1-git-send-email-shunyong.yang@hxt-semitech.com (mailing list archive)
State New, archived
Headers show

Commit Message

Shunyong Yang March 8, 2018, 7:01 a.m. UTC
When resampling irqfds is enabled, level interrupt should be
de-asserted when resampling happens. On page 4-47 of GIC v3
specification IHI0069D, it said,
"When the PE acknowledges an SGI, a PPI, or an SPI at the CPU
interface, the IRI changes the status of the interrupt to active
and pending if:
• It is an edge-triggered interrupt, and another edge has been
detected since the interrupt was acknowledged.
• It is a level-sensitive interrupt, and the level has not been
deasserted since the interrupt was acknowledged."

GIC v2 specification IHI0048B.b has similar description on page
3-42 for state machine transition.

When some VFIO device, like mtty(8250 VFIO mdev emulation driver
in samples/vfio-mdev) triggers a level interrupt, the status
transition in LR is pending-->active-->active and pending.
Then it will wait resampling to de-assert the interrupt.

Current design of lr_signals_eoi_mi() will return false if state
in LR is not invalid(Inactive). It causes resampling will not happen
in mtty case.

This will cause interrupt fired continuously to guest even 8250 IIR
has no interrupt. When 8250's interrupt is configured in shared mode,
it will pass interrupt to other drivers to handle. However, there
is no other driver involved. Then, a "nobody cared" kernel complaint
occurs.

/ # cat /dev/ttyS0
[    4.826836] random: crng init done
[    6.373620] irq 41: nobody cared (try booting with the "irqpoll"
option)
[    6.376414] CPU: 0 PID: 1307 Comm: cat Not tainted 4.16.0-rc4 #4
[    6.378927] Hardware name: linux,dummy-virt (DT)
[    6.380876] Call trace:
[    6.381937]  dump_backtrace+0x0/0x180
[    6.383495]  show_stack+0x14/0x1c
[    6.384902]  dump_stack+0x90/0xb4
[    6.386312]  __report_bad_irq+0x38/0xe0
[    6.387944]  note_interrupt+0x1f4/0x2b8
[    6.389568]  handle_irq_event_percpu+0x54/0x7c
[    6.391433]  handle_irq_event+0x44/0x74
[    6.393056]  handle_fasteoi_irq+0x9c/0x154
[    6.394784]  generic_handle_irq+0x24/0x38
[    6.396483]  __handle_domain_irq+0x60/0xb4
[    6.398207]  gic_handle_irq+0x98/0x1b0
[    6.399796]  el1_irq+0xb0/0x128
[    6.401138]  _raw_spin_unlock_irqrestore+0x18/0x40
[    6.403149]  __setup_irq+0x41c/0x678
[    6.404669]  request_threaded_irq+0xe0/0x190
[    6.406474]  univ8250_setup_irq+0x208/0x234
[    6.408250]  serial8250_do_startup+0x1b4/0x754
[    6.410123]  serial8250_startup+0x20/0x28
[    6.411826]  uart_startup.part.21+0x78/0x144
[    6.413633]  uart_port_activate+0x50/0x68
[    6.415328]  tty_port_open+0x84/0xd4
[    6.416851]  uart_open+0x34/0x44
[    6.418229]  tty_open+0xec/0x3c8
[    6.419610]  chrdev_open+0xb0/0x198
[    6.421093]  do_dentry_open+0x200/0x310
[    6.422714]  vfs_open+0x54/0x84
[    6.424054]  path_openat+0x2dc/0xf04
[    6.425569]  do_filp_open+0x68/0xd8
[    6.427044]  do_sys_open+0x16c/0x224
[    6.428563]  SyS_openat+0x10/0x18
[    6.429972]  el0_svc_naked+0x30/0x34
[    6.431494] handlers:
[    6.432479] [<000000000e9fb4bb>] serial8250_interrupt
[    6.434597] Disabling IRQ #41

This patch changes the lr state condition in lr_signals_eoi_mi() from
invalid(Inactive) to active and pending to avoid this.

I am not sure about the original design of the condition of
invalid(active). So, This RFC is sent out for comments.

Cc: Joey Zheng <yu.zheng@hxt-semitech.com>
Signed-off-by: Shunyong Yang <shunyong.yang@hxt-semitech.com>
---
 virt/kvm/arm/vgic/vgic-v2.c | 4 ++--
 virt/kvm/arm/vgic/vgic-v3.c | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

Comments

Eric Auger March 8, 2018, 8:57 a.m. UTC | #1
Hi,

On 08/03/18 08:01, Shunyong Yang wrote:
> When resampling irqfds is enabled, level interrupt should be
> de-asserted when resampling happens. On page 4-47 of GIC v3
> specification IHI0069D, it said,
> "When the PE acknowledges an SGI, a PPI, or an SPI at the CPU
> interface, the IRI changes the status of the interrupt to active
> and pending if:
> • It is an edge-triggered interrupt, and another edge has been
> detected since the interrupt was acknowledged.
> • It is a level-sensitive interrupt, and the level has not been
> deasserted since the interrupt was acknowledged."
> 
> GIC v2 specification IHI0048B.b has similar description on page
> 3-42 for state machine transition.
> 
> When some VFIO device, like mtty(8250 VFIO mdev emulation driver
> in samples/vfio-mdev) triggers a level interrupt, the status
> transition in LR is pending-->active-->active and pending.
> Then it will wait resampling to de-assert the interrupt.
> 
> Current design of lr_signals_eoi_mi() will return false if state
> in LR is not invalid(Inactive). It causes resampling will not happen
> in mtty case.
> 
> This will cause interrupt fired continuously to guest even 8250 IIR
> has no interrupt. When 8250's interrupt is configured in shared mode,
> it will pass interrupt to other drivers to handle. However, there
> is no other driver involved. Then, a "nobody cared" kernel complaint
> occurs.
> 
> / # cat /dev/ttyS0
> [    4.826836] random: crng init done
> [    6.373620] irq 41: nobody cared (try booting with the "irqpoll"
> option)
> [    6.376414] CPU: 0 PID: 1307 Comm: cat Not tainted 4.16.0-rc4 #4
> [    6.378927] Hardware name: linux,dummy-virt (DT)
> [    6.380876] Call trace:
> [    6.381937]  dump_backtrace+0x0/0x180
> [    6.383495]  show_stack+0x14/0x1c
> [    6.384902]  dump_stack+0x90/0xb4
> [    6.386312]  __report_bad_irq+0x38/0xe0
> [    6.387944]  note_interrupt+0x1f4/0x2b8
> [    6.389568]  handle_irq_event_percpu+0x54/0x7c
> [    6.391433]  handle_irq_event+0x44/0x74
> [    6.393056]  handle_fasteoi_irq+0x9c/0x154
> [    6.394784]  generic_handle_irq+0x24/0x38
> [    6.396483]  __handle_domain_irq+0x60/0xb4
> [    6.398207]  gic_handle_irq+0x98/0x1b0
> [    6.399796]  el1_irq+0xb0/0x128
> [    6.401138]  _raw_spin_unlock_irqrestore+0x18/0x40
> [    6.403149]  __setup_irq+0x41c/0x678
> [    6.404669]  request_threaded_irq+0xe0/0x190
> [    6.406474]  univ8250_setup_irq+0x208/0x234
> [    6.408250]  serial8250_do_startup+0x1b4/0x754
> [    6.410123]  serial8250_startup+0x20/0x28
> [    6.411826]  uart_startup.part.21+0x78/0x144
> [    6.413633]  uart_port_activate+0x50/0x68
> [    6.415328]  tty_port_open+0x84/0xd4
> [    6.416851]  uart_open+0x34/0x44
> [    6.418229]  tty_open+0xec/0x3c8
> [    6.419610]  chrdev_open+0xb0/0x198
> [    6.421093]  do_dentry_open+0x200/0x310
> [    6.422714]  vfs_open+0x54/0x84
> [    6.424054]  path_openat+0x2dc/0xf04
> [    6.425569]  do_filp_open+0x68/0xd8
> [    6.427044]  do_sys_open+0x16c/0x224
> [    6.428563]  SyS_openat+0x10/0x18
> [    6.429972]  el0_svc_naked+0x30/0x34
> [    6.431494] handlers:
> [    6.432479] [<000000000e9fb4bb>] serial8250_interrupt
> [    6.434597] Disabling IRQ #41
> 
> This patch changes the lr state condition in lr_signals_eoi_mi() from
> invalid(Inactive) to active and pending to avoid this.
> 
> I am not sure about the original design of the condition of
> invalid(active). So, This RFC is sent out for comments.
> 
> Cc: Joey Zheng <yu.zheng@hxt-semitech.com>
> Signed-off-by: Shunyong Yang <shunyong.yang@hxt-semitech.com>
> ---
>  virt/kvm/arm/vgic/vgic-v2.c | 4 ++--
>  virt/kvm/arm/vgic/vgic-v3.c | 4 ++--
>  2 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/virt/kvm/arm/vgic/vgic-v2.c b/virt/kvm/arm/vgic/vgic-v2.c
> index e9d840a75e7b..740ee9a5f551 100644
> --- a/virt/kvm/arm/vgic/vgic-v2.c
> +++ b/virt/kvm/arm/vgic/vgic-v2.c
> @@ -46,8 +46,8 @@ void vgic_v2_set_underflow(struct kvm_vcpu *vcpu)
>  
>  static bool lr_signals_eoi_mi(u32 lr_val)
>  {
> -	return !(lr_val & GICH_LR_STATE) && (lr_val & GICH_LR_EOI) &&
> -	       !(lr_val & GICH_LR_HW);
> +	return !((lr_val & GICH_LR_STATE) ^ GICH_LR_STATE) &&
> +	       (lr_val & GICH_LR_EOI) && !(lr_val & GICH_LR_HW);
>  }
>  
>  /*
> diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c
> index 6b329414e57a..43111bba7af9 100644
> --- a/virt/kvm/arm/vgic/vgic-v3.c
> +++ b/virt/kvm/arm/vgic/vgic-v3.c
> @@ -35,8 +35,8 @@ void vgic_v3_set_underflow(struct kvm_vcpu *vcpu)
>  
>  static bool lr_signals_eoi_mi(u64 lr_val)
>  {
> -	return !(lr_val & ICH_LR_STATE) && (lr_val & ICH_LR_EOI) &&
> -	       !(lr_val & ICH_LR_HW);
> +	return !((lr_val & ICH_LR_STATE) ^ ICH_LR_STATE) &&
> +	       (lr_val & ICH_LR_EOI) && !(lr_val & ICH_LR_HW);


In general don't we have this state transition

inactive -> pending -> pending + active (1) -> active -> inactive.

In that case won't we lower the virt irq level when folding the LR on
Pending + Active state, which is not was we want?

Thanks

Eric

>  }
>  
>  void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu)
>
Shunyong Yang March 8, 2018, 9:31 a.m. UTC | #2
Hi, Eric,

First, please let me change Christoffer's email to cdall@kernel.org. I
add more information about my test below, please check.

On Thu, 2018-03-08 at 09:57 +0100, Auger Eric wrote:
> Hi,

> 

> On 08/03/18 08:01, Shunyong Yang wrote:

> > 

> > When resampling irqfds is enabled, level interrupt should be

> > de-asserted when resampling happens. On page 4-47 of GIC v3

> > specification IHI0069D, it said,

> > "When the PE acknowledges an SGI, a PPI, or an SPI at the CPU

> > interface, the IRI changes the status of the interrupt to active

> > and pending if:

> > • It is an edge-triggered interrupt, and another edge has been

> > detected since the interrupt was acknowledged.

> > • It is a level-sensitive interrupt, and the level has not been

> > deasserted since the interrupt was acknowledged."

> > 

> > GIC v2 specification IHI0048B.b has similar description on page

> > 3-42 for state machine transition.

> > 

> > When some VFIO device, like mtty(8250 VFIO mdev emulation driver

> > in samples/vfio-mdev) triggers a level interrupt, the status

> > transition in LR is pending-->active-->active and pending.

> > Then it will wait resampling to de-assert the interrupt.

> > 

> > Current design of lr_signals_eoi_mi() will return false if state

> > in LR is not invalid(Inactive). It causes resampling will not

> > happen

> > in mtty case.

> > 

> > This will cause interrupt fired continuously to guest even 8250 IIR

> > has no interrupt. When 8250's interrupt is configured in shared

> > mode,

> > it will pass interrupt to other drivers to handle. However, there

> > is no other driver involved. Then, a "nobody cared" kernel

> > complaint

> > occurs.

> > 

> > / # cat /dev/ttyS0

> > [    4.826836] random: crng init done

> > [    6.373620] irq 41: nobody cared (try booting with the "irqpoll"

> > option)

> > [    6.376414] CPU: 0 PID: 1307 Comm: cat Not tainted 4.16.0-rc4 #4

> > [    6.378927] Hardware name: linux,dummy-virt (DT)

> > [    6.380876] Call trace:

> > [    6.381937]  dump_backtrace+0x0/0x180

> > [    6.383495]  show_stack+0x14/0x1c

> > [    6.384902]  dump_stack+0x90/0xb4

> > [    6.386312]  __report_bad_irq+0x38/0xe0

> > [    6.387944]  note_interrupt+0x1f4/0x2b8

> > [    6.389568]  handle_irq_event_percpu+0x54/0x7c

> > [    6.391433]  handle_irq_event+0x44/0x74

> > [    6.393056]  handle_fasteoi_irq+0x9c/0x154

> > [    6.394784]  generic_handle_irq+0x24/0x38

> > [    6.396483]  __handle_domain_irq+0x60/0xb4

> > [    6.398207]  gic_handle_irq+0x98/0x1b0

> > [    6.399796]  el1_irq+0xb0/0x128

> > [    6.401138]  _raw_spin_unlock_irqrestore+0x18/0x40

> > [    6.403149]  __setup_irq+0x41c/0x678

> > [    6.404669]  request_threaded_irq+0xe0/0x190

> > [    6.406474]  univ8250_setup_irq+0x208/0x234

> > [    6.408250]  serial8250_do_startup+0x1b4/0x754

> > [    6.410123]  serial8250_startup+0x20/0x28

> > [    6.411826]  uart_startup.part.21+0x78/0x144

> > [    6.413633]  uart_port_activate+0x50/0x68

> > [    6.415328]  tty_port_open+0x84/0xd4

> > [    6.416851]  uart_open+0x34/0x44

> > [    6.418229]  tty_open+0xec/0x3c8

> > [    6.419610]  chrdev_open+0xb0/0x198

> > [    6.421093]  do_dentry_open+0x200/0x310

> > [    6.422714]  vfs_open+0x54/0x84

> > [    6.424054]  path_openat+0x2dc/0xf04

> > [    6.425569]  do_filp_open+0x68/0xd8

> > [    6.427044]  do_sys_open+0x16c/0x224

> > [    6.428563]  SyS_openat+0x10/0x18

> > [    6.429972]  el0_svc_naked+0x30/0x34

> > [    6.431494] handlers:

> > [    6.432479] [<000000000e9fb4bb>] serial8250_interrupt

> > [    6.434597] Disabling IRQ #41

> > 

> > This patch changes the lr state condition in lr_signals_eoi_mi()

> > from

> > invalid(Inactive) to active and pending to avoid this.

> > 

> > I am not sure about the original design of the condition of

> > invalid(active). So, This RFC is sent out for comments.

> > 

> > Cc: Joey Zheng <yu.zheng@hxt-semitech.com>

> > Signed-off-by: Shunyong Yang <shunyong.yang@hxt-semitech.com>

> > ---

> >  virt/kvm/arm/vgic/vgic-v2.c | 4 ++--

> >  virt/kvm/arm/vgic/vgic-v3.c | 4 ++--

> >  2 files changed, 4 insertions(+), 4 deletions(-)

> > 

> > diff --git a/virt/kvm/arm/vgic/vgic-v2.c b/virt/kvm/arm/vgic/vgic-

> > v2.c

> > index e9d840a75e7b..740ee9a5f551 100644

> > --- a/virt/kvm/arm/vgic/vgic-v2.c

> > +++ b/virt/kvm/arm/vgic/vgic-v2.c

> > @@ -46,8 +46,8 @@ void vgic_v2_set_underflow(struct kvm_vcpu *vcpu)

> >  

> >  static bool lr_signals_eoi_mi(u32 lr_val)

> >  {

> > -	return !(lr_val & GICH_LR_STATE) && (lr_val & GICH_LR_EOI)

> > &&

> > -	       !(lr_val & GICH_LR_HW);

> > +	return !((lr_val & GICH_LR_STATE) ^ GICH_LR_STATE) &&

> > +	       (lr_val & GICH_LR_EOI) && !(lr_val & GICH_LR_HW);

> >  }

> >  

> >  /*

> > diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-

> > v3.c

> > index 6b329414e57a..43111bba7af9 100644

> > --- a/virt/kvm/arm/vgic/vgic-v3.c

> > +++ b/virt/kvm/arm/vgic/vgic-v3.c

> > @@ -35,8 +35,8 @@ void vgic_v3_set_underflow(struct kvm_vcpu *vcpu)

> >  

> >  static bool lr_signals_eoi_mi(u64 lr_val)

> >  {

> > -	return !(lr_val & ICH_LR_STATE) && (lr_val & ICH_LR_EOI)

> > &&

> > -	       !(lr_val & ICH_LR_HW);

> > +	return !((lr_val & ICH_LR_STATE) ^ ICH_LR_STATE) &&

> > +	       (lr_val & ICH_LR_EOI) && !(lr_val & ICH_LR_HW);

> 

> In general don't we have this state transition

> 

> inactive -> pending -> pending + active (1) -> active -> inactive.

> 

> In that case won't we lower the virt irq level when folding the LR on

> Pending + Active state, which is not was we want?

> 

> Thanks

> 

> Eric


In current code, in my test, when I output LR value of the mtty IRQ 41
(hwirq = 36) in vgic_v3_fold_lr_state(). The LR's transition starts
like following,

0-->50a0020000000024-->90a0020000000024-->d0a0020000000024

That is inactive-->pending-->active-->pending + active.
Then it keep running cyclic pending-->active-->pending + active.

The level interrupt de-assert should happen in following code
/* Notify fds when the guest EOI'ed a level-triggered IRQ */
	if (lr_signals_eoi_mi(val) && vgic_valid_spi(vcpu->kvm, intid))
		kvm_notify_acked_irq(vcpu->kvm, 0,
				     intid - VGIC_NR_PRIVATE_IRQS);

But as addressed in commit message, lr_signals_eoi_mi() will return
false if state in LR is not invalid(inactive), so it has no chance to
de-assert the level interrupt in my test. 

Thanks.
Shunyong.

> 

> > 

> >  }

> >  

> >  void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu)

> >
Marc Zyngier March 8, 2018, 9:49 a.m. UTC | #3
[updated Christoffer's email address]

Hi Shunyong,

On 08/03/18 07:01, Shunyong Yang wrote:
> When resampling irqfds is enabled, level interrupt should be
> de-asserted when resampling happens. On page 4-47 of GIC v3
> specification IHI0069D, it said,
> "When the PE acknowledges an SGI, a PPI, or an SPI at the CPU
> interface, the IRI changes the status of the interrupt to active
> and pending if:
> • It is an edge-triggered interrupt, and another edge has been
> detected since the interrupt was acknowledged.
> • It is a level-sensitive interrupt, and the level has not been
> deasserted since the interrupt was acknowledged."
> 
> GIC v2 specification IHI0048B.b has similar description on page
> 3-42 for state machine transition.
> 
> When some VFIO device, like mtty(8250 VFIO mdev emulation driver
> in samples/vfio-mdev) triggers a level interrupt, the status
> transition in LR is pending-->active-->active and pending.
> Then it will wait resampling to de-assert the interrupt.
> 
> Current design of lr_signals_eoi_mi() will return false if state
> in LR is not invalid(Inactive). It causes resampling will not happen
> in mtty case.

Let me rephrase this, and tell me if I understood it correctly:

- A level interrupt is injected, activated by the guest (LR state=active)
- guest exits, re-enters, (LR state=pending+active)
- guest EOIs the interrupt (LR state=pending)
- maintenance interrupt
- we don't signal the resampling because we're not in an invalid state

Is that correct?

That's an interesting case, because it seems to invalidate some of the 
optimization that went in over a year ago.

096f31c4360f KVM: arm/arm64: vgic: Get rid of MISR and EISR fields
b6095b084d87 KVM: arm/arm64: vgic: Get rid of unnecessary save_maint_int_state
af0614991ab6 KVM: arm/arm64: vgic: Get rid of unnecessary process_maintenance operation

We could compare the value of the LR before the guest entry with
the value at exit time, but we still could miss it if we have a
transition such as P+A -> P -> A and assume a long enough propagation
delay for the maintenance interrupt (which is very likely).

In essence, we have lost the benefit of EISR, which was to give us a
way to deal with asynchronous signalling.

> 
> This will cause interrupt fired continuously to guest even 8250 IIR
> has no interrupt. When 8250's interrupt is configured in shared mode,
> it will pass interrupt to other drivers to handle. However, there
> is no other driver involved. Then, a "nobody cared" kernel complaint
> occurs.
> 
> / # cat /dev/ttyS0
> [    4.826836] random: crng init done
> [    6.373620] irq 41: nobody cared (try booting with the "irqpoll"
> option)
> [    6.376414] CPU: 0 PID: 1307 Comm: cat Not tainted 4.16.0-rc4 #4
> [    6.378927] Hardware name: linux,dummy-virt (DT)
> [    6.380876] Call trace:
> [    6.381937]  dump_backtrace+0x0/0x180
> [    6.383495]  show_stack+0x14/0x1c
> [    6.384902]  dump_stack+0x90/0xb4
> [    6.386312]  __report_bad_irq+0x38/0xe0
> [    6.387944]  note_interrupt+0x1f4/0x2b8
> [    6.389568]  handle_irq_event_percpu+0x54/0x7c
> [    6.391433]  handle_irq_event+0x44/0x74
> [    6.393056]  handle_fasteoi_irq+0x9c/0x154
> [    6.394784]  generic_handle_irq+0x24/0x38
> [    6.396483]  __handle_domain_irq+0x60/0xb4
> [    6.398207]  gic_handle_irq+0x98/0x1b0
> [    6.399796]  el1_irq+0xb0/0x128
> [    6.401138]  _raw_spin_unlock_irqrestore+0x18/0x40
> [    6.403149]  __setup_irq+0x41c/0x678
> [    6.404669]  request_threaded_irq+0xe0/0x190
> [    6.406474]  univ8250_setup_irq+0x208/0x234
> [    6.408250]  serial8250_do_startup+0x1b4/0x754
> [    6.410123]  serial8250_startup+0x20/0x28
> [    6.411826]  uart_startup.part.21+0x78/0x144
> [    6.413633]  uart_port_activate+0x50/0x68
> [    6.415328]  tty_port_open+0x84/0xd4
> [    6.416851]  uart_open+0x34/0x44
> [    6.418229]  tty_open+0xec/0x3c8
> [    6.419610]  chrdev_open+0xb0/0x198
> [    6.421093]  do_dentry_open+0x200/0x310
> [    6.422714]  vfs_open+0x54/0x84
> [    6.424054]  path_openat+0x2dc/0xf04
> [    6.425569]  do_filp_open+0x68/0xd8
> [    6.427044]  do_sys_open+0x16c/0x224
> [    6.428563]  SyS_openat+0x10/0x18
> [    6.429972]  el0_svc_naked+0x30/0x34
> [    6.431494] handlers:
> [    6.432479] [<000000000e9fb4bb>] serial8250_interrupt
> [    6.434597] Disabling IRQ #41
> 
> This patch changes the lr state condition in lr_signals_eoi_mi() from
> invalid(Inactive) to active and pending to avoid this.
> 
> I am not sure about the original design of the condition of
> invalid(active). So, This RFC is sent out for comments.
> 
> Cc: Joey Zheng <yu.zheng@hxt-semitech.com>
> Signed-off-by: Shunyong Yang <shunyong.yang@hxt-semitech.com>
> ---
>  virt/kvm/arm/vgic/vgic-v2.c | 4 ++--
>  virt/kvm/arm/vgic/vgic-v3.c | 4 ++--
>  2 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/virt/kvm/arm/vgic/vgic-v2.c b/virt/kvm/arm/vgic/vgic-v2.c
> index e9d840a75e7b..740ee9a5f551 100644
> --- a/virt/kvm/arm/vgic/vgic-v2.c
> +++ b/virt/kvm/arm/vgic/vgic-v2.c
> @@ -46,8 +46,8 @@ void vgic_v2_set_underflow(struct kvm_vcpu *vcpu)
>  
>  static bool lr_signals_eoi_mi(u32 lr_val)
>  {
> -	return !(lr_val & GICH_LR_STATE) && (lr_val & GICH_LR_EOI) &&
> -	       !(lr_val & GICH_LR_HW);
> +	return !((lr_val & GICH_LR_STATE) ^ GICH_LR_STATE) &&

That feels very wrong. You're now signalling the resampling in both
invalid and pending+active, and the latter state doesn't mean you've
EOIed anything. You're now over-signalling, and signalling the
wrong event.

> +	       (lr_val & GICH_LR_EOI) && !(lr_val & GICH_LR_HW);
>  }
>  
>  /*
> diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c
> index 6b329414e57a..43111bba7af9 100644
> --- a/virt/kvm/arm/vgic/vgic-v3.c
> +++ b/virt/kvm/arm/vgic/vgic-v3.c
> @@ -35,8 +35,8 @@ void vgic_v3_set_underflow(struct kvm_vcpu *vcpu)
>  
>  static bool lr_signals_eoi_mi(u64 lr_val)
>  {
> -	return !(lr_val & ICH_LR_STATE) && (lr_val & ICH_LR_EOI) &&
> -	       !(lr_val & ICH_LR_HW);
> +	return !((lr_val & ICH_LR_STATE) ^ ICH_LR_STATE) &&
> +	       (lr_val & ICH_LR_EOI) && !(lr_val & ICH_LR_HW);
>  }
>  
>  void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu)
> 

Assuming I understand the issue correctly, I cannot really see how
to solve this without reintroducing EISR, which sucks majorly.

I'll try to cook something shortly and we can all have a good
fight about how crap this is.

Thanks,

	M.
Marc Zyngier March 8, 2018, 11:01 a.m. UTC | #4
On 08/03/18 09:31, Yang, Shunyong wrote:
> Hi, Eric,
> 
> First, please let me change Christoffer's email to cdall@kernel.org. I
> add more information about my test below, please check.
> 
> On Thu, 2018-03-08 at 09:57 +0100, Auger Eric wrote:
>> Hi,
>>
>> On 08/03/18 08:01, Shunyong Yang wrote:
>>>
>>> When resampling irqfds is enabled, level interrupt should be
>>> de-asserted when resampling happens. On page 4-47 of GIC v3
>>> specification IHI0069D, it said,
>>> "When the PE acknowledges an SGI, a PPI, or an SPI at the CPU
>>> interface, the IRI changes the status of the interrupt to active
>>> and pending if:
>>> • It is an edge-triggered interrupt, and another edge has been
>>> detected since the interrupt was acknowledged.
>>> • It is a level-sensitive interrupt, and the level has not been
>>> deasserted since the interrupt was acknowledged."
>>>
>>> GIC v2 specification IHI0048B.b has similar description on page
>>> 3-42 for state machine transition.
>>>
>>> When some VFIO device, like mtty(8250 VFIO mdev emulation driver
>>> in samples/vfio-mdev) triggers a level interrupt, the status
>>> transition in LR is pending-->active-->active and pending.
>>> Then it will wait resampling to de-assert the interrupt.
>>>
>>> Current design of lr_signals_eoi_mi() will return false if state
>>> in LR is not invalid(Inactive). It causes resampling will not
>>> happen
>>> in mtty case.
>>>
>>> This will cause interrupt fired continuously to guest even 8250 IIR
>>> has no interrupt. When 8250's interrupt is configured in shared
>>> mode,
>>> it will pass interrupt to other drivers to handle. However, there
>>> is no other driver involved. Then, a "nobody cared" kernel
>>> complaint
>>> occurs.
>>>
>>> / # cat /dev/ttyS0
>>> [    4.826836] random: crng init done
>>> [    6.373620] irq 41: nobody cared (try booting with the "irqpoll"
>>> option)
>>> [    6.376414] CPU: 0 PID: 1307 Comm: cat Not tainted 4.16.0-rc4 #4
>>> [    6.378927] Hardware name: linux,dummy-virt (DT)
>>> [    6.380876] Call trace:
>>> [    6.381937]  dump_backtrace+0x0/0x180
>>> [    6.383495]  show_stack+0x14/0x1c
>>> [    6.384902]  dump_stack+0x90/0xb4
>>> [    6.386312]  __report_bad_irq+0x38/0xe0
>>> [    6.387944]  note_interrupt+0x1f4/0x2b8
>>> [    6.389568]  handle_irq_event_percpu+0x54/0x7c
>>> [    6.391433]  handle_irq_event+0x44/0x74
>>> [    6.393056]  handle_fasteoi_irq+0x9c/0x154
>>> [    6.394784]  generic_handle_irq+0x24/0x38
>>> [    6.396483]  __handle_domain_irq+0x60/0xb4
>>> [    6.398207]  gic_handle_irq+0x98/0x1b0
>>> [    6.399796]  el1_irq+0xb0/0x128
>>> [    6.401138]  _raw_spin_unlock_irqrestore+0x18/0x40
>>> [    6.403149]  __setup_irq+0x41c/0x678
>>> [    6.404669]  request_threaded_irq+0xe0/0x190
>>> [    6.406474]  univ8250_setup_irq+0x208/0x234
>>> [    6.408250]  serial8250_do_startup+0x1b4/0x754
>>> [    6.410123]  serial8250_startup+0x20/0x28
>>> [    6.411826]  uart_startup.part.21+0x78/0x144
>>> [    6.413633]  uart_port_activate+0x50/0x68
>>> [    6.415328]  tty_port_open+0x84/0xd4
>>> [    6.416851]  uart_open+0x34/0x44
>>> [    6.418229]  tty_open+0xec/0x3c8
>>> [    6.419610]  chrdev_open+0xb0/0x198
>>> [    6.421093]  do_dentry_open+0x200/0x310
>>> [    6.422714]  vfs_open+0x54/0x84
>>> [    6.424054]  path_openat+0x2dc/0xf04
>>> [    6.425569]  do_filp_open+0x68/0xd8
>>> [    6.427044]  do_sys_open+0x16c/0x224
>>> [    6.428563]  SyS_openat+0x10/0x18
>>> [    6.429972]  el0_svc_naked+0x30/0x34
>>> [    6.431494] handlers:
>>> [    6.432479] [<000000000e9fb4bb>] serial8250_interrupt
>>> [    6.434597] Disabling IRQ #41
>>>
>>> This patch changes the lr state condition in lr_signals_eoi_mi()
>>> from
>>> invalid(Inactive) to active and pending to avoid this.
>>>
>>> I am not sure about the original design of the condition of
>>> invalid(active). So, This RFC is sent out for comments.
>>>
>>> Cc: Joey Zheng <yu.zheng@hxt-semitech.com>
>>> Signed-off-by: Shunyong Yang <shunyong.yang@hxt-semitech.com>
>>> ---
>>>  virt/kvm/arm/vgic/vgic-v2.c | 4 ++--
>>>  virt/kvm/arm/vgic/vgic-v3.c | 4 ++--
>>>  2 files changed, 4 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/virt/kvm/arm/vgic/vgic-v2.c b/virt/kvm/arm/vgic/vgic-
>>> v2.c
>>> index e9d840a75e7b..740ee9a5f551 100644
>>> --- a/virt/kvm/arm/vgic/vgic-v2.c
>>> +++ b/virt/kvm/arm/vgic/vgic-v2.c
>>> @@ -46,8 +46,8 @@ void vgic_v2_set_underflow(struct kvm_vcpu *vcpu)
>>>  
>>>  static bool lr_signals_eoi_mi(u32 lr_val)
>>>  {
>>> -	return !(lr_val & GICH_LR_STATE) && (lr_val & GICH_LR_EOI)
>>> &&
>>> -	       !(lr_val & GICH_LR_HW);
>>> +	return !((lr_val & GICH_LR_STATE) ^ GICH_LR_STATE) &&
>>> +	       (lr_val & GICH_LR_EOI) && !(lr_val & GICH_LR_HW);
>>>  }
>>>  
>>>  /*
>>> diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-
>>> v3.c
>>> index 6b329414e57a..43111bba7af9 100644
>>> --- a/virt/kvm/arm/vgic/vgic-v3.c
>>> +++ b/virt/kvm/arm/vgic/vgic-v3.c
>>> @@ -35,8 +35,8 @@ void vgic_v3_set_underflow(struct kvm_vcpu *vcpu)
>>>  
>>>  static bool lr_signals_eoi_mi(u64 lr_val)
>>>  {
>>> -	return !(lr_val & ICH_LR_STATE) && (lr_val & ICH_LR_EOI)
>>> &&
>>> -	       !(lr_val & ICH_LR_HW);
>>> +	return !((lr_val & ICH_LR_STATE) ^ ICH_LR_STATE) &&
>>> +	       (lr_val & ICH_LR_EOI) && !(lr_val & ICH_LR_HW);
>>
>> In general don't we have this state transition
>>
>> inactive -> pending -> pending + active (1) -> active -> inactive.
>>
>> In that case won't we lower the virt irq level when folding the LR on
>> Pending + Active state, which is not was we want?
>>
>> Thanks
>>
>> Eric
> 
> In current code, in my test, when I output LR value of the mtty IRQ 41
> (hwirq = 36) in vgic_v3_fold_lr_state(). The LR's transition starts
> like following,
> 
> 0-->50a0020000000024-->90a0020000000024-->d0a0020000000024
> 
> That is inactive-->pending-->active-->pending + active.
> Then it keep running cyclic pending-->active-->pending + active.
> 
> The level interrupt de-assert should happen in following code
> /* Notify fds when the guest EOI'ed a level-triggered IRQ */
> 	if (lr_signals_eoi_mi(val) && vgic_valid_spi(vcpu->kvm, intid))
> 		kvm_notify_acked_irq(vcpu->kvm, 0,
> 				     intid - VGIC_NR_PRIVATE_IRQS);
> 
> But as addressed in commit message, lr_signals_eoi_mi() will return
> false if state in LR is not invalid(inactive), so it has no chance to
> de-assert the level interrupt in my test. 

The problem is that pending+active is not an indication that the guest
has actually EOI'd anything. In only indicates that it has been activated.

Note that there is a bit of vocabulary discrepancy between KVM and the
ARM architecture: KVM uses "acked" where ARM uses EOI. ARM uses "ACK" or
"Activate" for something entirely different. Maybe the confusion stems
from this difference.

Thanks,

	M.
Eric Auger March 8, 2018, 3:29 p.m. UTC | #5
Hi Shunyong,
On 08/03/18 10:31, Yang, Shunyong wrote:
> Hi, Eric,
> 
> First, please let me change Christoffer's email to cdall@kernel.org. I
> add more information about my test below, please check.
> 
> On Thu, 2018-03-08 at 09:57 +0100, Auger Eric wrote:
>> Hi,
>>
>> On 08/03/18 08:01, Shunyong Yang wrote:
>>>
>>> When resampling irqfds is enabled, level interrupt should be
>>> de-asserted when resampling happens. On page 4-47 of GIC v3
>>> specification IHI0069D, it said,
>>> "When the PE acknowledges an SGI, a PPI, or an SPI at the CPU
>>> interface, the IRI changes the status of the interrupt to active
>>> and pending if:
>>> • It is an edge-triggered interrupt, and another edge has been
>>> detected since the interrupt was acknowledged.
>>> • It is a level-sensitive interrupt, and the level has not been
>>> deasserted since the interrupt was acknowledged."
>>>
>>> GIC v2 specification IHI0048B.b has similar description on page
>>> 3-42 for state machine transition.
>>>
>>> When some VFIO device, like mtty(8250 VFIO mdev emulation driver
>>> in samples/vfio-mdev) triggers a level interrupt, the status
>>> transition in LR is pending-->active-->active and pending.
>>> Then it will wait resampling to de-assert the interrupt.
>>>
>>> Current design of lr_signals_eoi_mi() will return false if state
>>> in LR is not invalid(Inactive). It causes resampling will not
>>> happen
>>> in mtty case.
>>>
>>> This will cause interrupt fired continuously to guest even 8250 IIR
>>> has no interrupt. When 8250's interrupt is configured in shared
>>> mode,
>>> it will pass interrupt to other drivers to handle. However, there
>>> is no other driver involved. Then, a "nobody cared" kernel
>>> complaint
>>> occurs.
>>>
>>> / # cat /dev/ttyS0
>>> [    4.826836] random: crng init done
>>> [    6.373620] irq 41: nobody cared (try booting with the "irqpoll"
>>> option)
>>> [    6.376414] CPU: 0 PID: 1307 Comm: cat Not tainted 4.16.0-rc4 #4
>>> [    6.378927] Hardware name: linux,dummy-virt (DT)
>>> [    6.380876] Call trace:
>>> [    6.381937]  dump_backtrace+0x0/0x180
>>> [    6.383495]  show_stack+0x14/0x1c
>>> [    6.384902]  dump_stack+0x90/0xb4
>>> [    6.386312]  __report_bad_irq+0x38/0xe0
>>> [    6.387944]  note_interrupt+0x1f4/0x2b8
>>> [    6.389568]  handle_irq_event_percpu+0x54/0x7c
>>> [    6.391433]  handle_irq_event+0x44/0x74
>>> [    6.393056]  handle_fasteoi_irq+0x9c/0x154
>>> [    6.394784]  generic_handle_irq+0x24/0x38
>>> [    6.396483]  __handle_domain_irq+0x60/0xb4
>>> [    6.398207]  gic_handle_irq+0x98/0x1b0
>>> [    6.399796]  el1_irq+0xb0/0x128
>>> [    6.401138]  _raw_spin_unlock_irqrestore+0x18/0x40
>>> [    6.403149]  __setup_irq+0x41c/0x678
>>> [    6.404669]  request_threaded_irq+0xe0/0x190
>>> [    6.406474]  univ8250_setup_irq+0x208/0x234
>>> [    6.408250]  serial8250_do_startup+0x1b4/0x754
>>> [    6.410123]  serial8250_startup+0x20/0x28
>>> [    6.411826]  uart_startup.part.21+0x78/0x144
>>> [    6.413633]  uart_port_activate+0x50/0x68
>>> [    6.415328]  tty_port_open+0x84/0xd4
>>> [    6.416851]  uart_open+0x34/0x44
>>> [    6.418229]  tty_open+0xec/0x3c8
>>> [    6.419610]  chrdev_open+0xb0/0x198
>>> [    6.421093]  do_dentry_open+0x200/0x310
>>> [    6.422714]  vfs_open+0x54/0x84
>>> [    6.424054]  path_openat+0x2dc/0xf04
>>> [    6.425569]  do_filp_open+0x68/0xd8
>>> [    6.427044]  do_sys_open+0x16c/0x224
>>> [    6.428563]  SyS_openat+0x10/0x18
>>> [    6.429972]  el0_svc_naked+0x30/0x34
>>> [    6.431494] handlers:
>>> [    6.432479] [<000000000e9fb4bb>] serial8250_interrupt
>>> [    6.434597] Disabling IRQ #41
>>>
>>> This patch changes the lr state condition in lr_signals_eoi_mi()
>>> from
>>> invalid(Inactive) to active and pending to avoid this.
>>>
>>> I am not sure about the original design of the condition of
>>> invalid(active). So, This RFC is sent out for comments.
>>>
>>> Cc: Joey Zheng <yu.zheng@hxt-semitech.com>
>>> Signed-off-by: Shunyong Yang <shunyong.yang@hxt-semitech.com>
>>> ---
>>>  virt/kvm/arm/vgic/vgic-v2.c | 4 ++--
>>>  virt/kvm/arm/vgic/vgic-v3.c | 4 ++--
>>>  2 files changed, 4 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/virt/kvm/arm/vgic/vgic-v2.c b/virt/kvm/arm/vgic/vgic-
>>> v2.c
>>> index e9d840a75e7b..740ee9a5f551 100644
>>> --- a/virt/kvm/arm/vgic/vgic-v2.c
>>> +++ b/virt/kvm/arm/vgic/vgic-v2.c
>>> @@ -46,8 +46,8 @@ void vgic_v2_set_underflow(struct kvm_vcpu *vcpu)
>>>  
>>>  static bool lr_signals_eoi_mi(u32 lr_val)
>>>  {
>>> -	return !(lr_val & GICH_LR_STATE) && (lr_val & GICH_LR_EOI)
>>> &&
>>> -	       !(lr_val & GICH_LR_HW);
>>> +	return !((lr_val & GICH_LR_STATE) ^ GICH_LR_STATE) &&
>>> +	       (lr_val & GICH_LR_EOI) && !(lr_val & GICH_LR_HW);
>>>  }
>>>  
>>>  /*
>>> diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-
>>> v3.c
>>> index 6b329414e57a..43111bba7af9 100644
>>> --- a/virt/kvm/arm/vgic/vgic-v3.c
>>> +++ b/virt/kvm/arm/vgic/vgic-v3.c
>>> @@ -35,8 +35,8 @@ void vgic_v3_set_underflow(struct kvm_vcpu *vcpu)
>>>  
>>>  static bool lr_signals_eoi_mi(u64 lr_val)
>>>  {
>>> -	return !(lr_val & ICH_LR_STATE) && (lr_val & ICH_LR_EOI)
>>> &&
>>> -	       !(lr_val & ICH_LR_HW);
>>> +	return !((lr_val & ICH_LR_STATE) ^ ICH_LR_STATE) &&
>>> +	       (lr_val & ICH_LR_EOI) && !(lr_val & ICH_LR_HW);
>>
>> In general don't we have this state transition
>>
>> inactive -> pending -> pending + active (1) -> active -> inactive.
>>
>> In that case won't we lower the virt irq level when folding the LR on
>> Pending + Active state, which is not was we want?
>>
>> Thanks
>>
>> Eric
> 
> In current code, in my test, when I output LR value of the mtty IRQ 41
> (hwirq = 36) in vgic_v3_fold_lr_state(). The LR's transition starts
> like following,
> 
> 0-->50a0020000000024-->90a0020000000024-->d0a0020000000024
> 
> That is inactive-->pending-->active-->pending + active.
Yes sorry I did a big mixture of virt line level and LR pending state.

I had below case in mind: P -> guest IAR -> A -> exit/entry -> P+A  -> exit

in which case you shouldn't call the resampler.

Thanks

Eric
> Then it keep running cyclic pending-->active-->pending + active.
> 
> The level interrupt de-assert should happen in following code
> /* Notify fds when the guest EOI'ed a level-triggered IRQ */
> 	if (lr_signals_eoi_mi(val) && vgic_valid_spi(vcpu->kvm, intid))
> 		kvm_notify_acked_irq(vcpu->kvm, 0,
> 				     intid - VGIC_NR_PRIVATE_IRQS);
> 
> But as addressed in commit message, lr_signals_eoi_mi() will return
> false if state in LR is not invalid(inactive), so it has no chance to
> de-assert the level interrupt in my test. 
> 
> Thanks.
> Shunyong.
> 
>>
>>>
>>>  }
>>>  
>>>  void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu)
>>>
Christoffer Dall March 8, 2018, 4:10 p.m. UTC | #6
On Thu, Mar 08, 2018 at 09:49:43AM +0000, Marc Zyngier wrote:
> [updated Christoffer's email address]
> 
> Hi Shunyong,
> 
> On 08/03/18 07:01, Shunyong Yang wrote:
> > When resampling irqfds is enabled, level interrupt should be
> > de-asserted when resampling happens. On page 4-47 of GIC v3
> > specification IHI0069D, it said,
> > "When the PE acknowledges an SGI, a PPI, or an SPI at the CPU
> > interface, the IRI changes the status of the interrupt to active
> > and pending if:
> > • It is an edge-triggered interrupt, and another edge has been
> > detected since the interrupt was acknowledged.
> > • It is a level-sensitive interrupt, and the level has not been
> > deasserted since the interrupt was acknowledged."
> > 
> > GIC v2 specification IHI0048B.b has similar description on page
> > 3-42 for state machine transition.
> > 
> > When some VFIO device, like mtty(8250 VFIO mdev emulation driver
> > in samples/vfio-mdev) triggers a level interrupt, the status
> > transition in LR is pending-->active-->active and pending.
> > Then it will wait resampling to de-assert the interrupt.
> > 
> > Current design of lr_signals_eoi_mi() will return false if state
> > in LR is not invalid(Inactive). It causes resampling will not happen
> > in mtty case.
> 
> Let me rephrase this, and tell me if I understood it correctly:
> 
> - A level interrupt is injected, activated by the guest (LR state=active)
> - guest exits, re-enters, (LR state=pending+active)
> - guest EOIs the interrupt (LR state=pending)
> - maintenance interrupt
> - we don't signal the resampling because we're not in an invalid state
> 
> Is that correct?
> 
> That's an interesting case, because it seems to invalidate some of the 
> optimization that went in over a year ago.
> 
> 096f31c4360f KVM: arm/arm64: vgic: Get rid of MISR and EISR fields
> b6095b084d87 KVM: arm/arm64: vgic: Get rid of unnecessary save_maint_int_state
> af0614991ab6 KVM: arm/arm64: vgic: Get rid of unnecessary process_maintenance operation
> 
> We could compare the value of the LR before the guest entry with
> the value at exit time, but we still could miss it if we have a
> transition such as P+A -> P -> A and assume a long enough propagation
> delay for the maintenance interrupt (which is very likely).
> 
> In essence, we have lost the benefit of EISR, which was to give us a
> way to deal with asynchronous signalling.
> 

I don't understand why EISR gives us anything beyond looking at the LR
and evaluating if the state is 00.  My reading of the spec is that the
EISR is merely a shortcut to knowing the state of the LRs but contains
not record or information beyond what you can read from the LRs.

What am I missing?

Thanks,
-Christoffer
diff mbox

Patch

diff --git a/virt/kvm/arm/vgic/vgic-v2.c b/virt/kvm/arm/vgic/vgic-v2.c
index e9d840a75e7b..740ee9a5f551 100644
--- a/virt/kvm/arm/vgic/vgic-v2.c
+++ b/virt/kvm/arm/vgic/vgic-v2.c
@@ -46,8 +46,8 @@  void vgic_v2_set_underflow(struct kvm_vcpu *vcpu)
 
 static bool lr_signals_eoi_mi(u32 lr_val)
 {
-	return !(lr_val & GICH_LR_STATE) && (lr_val & GICH_LR_EOI) &&
-	       !(lr_val & GICH_LR_HW);
+	return !((lr_val & GICH_LR_STATE) ^ GICH_LR_STATE) &&
+	       (lr_val & GICH_LR_EOI) && !(lr_val & GICH_LR_HW);
 }
 
 /*
diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c
index 6b329414e57a..43111bba7af9 100644
--- a/virt/kvm/arm/vgic/vgic-v3.c
+++ b/virt/kvm/arm/vgic/vgic-v3.c
@@ -35,8 +35,8 @@  void vgic_v3_set_underflow(struct kvm_vcpu *vcpu)
 
 static bool lr_signals_eoi_mi(u64 lr_val)
 {
-	return !(lr_val & ICH_LR_STATE) && (lr_val & ICH_LR_EOI) &&
-	       !(lr_val & ICH_LR_HW);
+	return !((lr_val & ICH_LR_STATE) ^ ICH_LR_STATE) &&
+	       (lr_val & ICH_LR_EOI) && !(lr_val & ICH_LR_HW);
 }
 
 void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu)