Message ID | 5CD2D545020000780022CD3D@prv1-mh.provo.novell.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | x86: IRQ management adjustments | expand |
On Wed, May 08, 2019 at 07:10:29AM -0600, Jan Beulich wrote: > Mixed meaning was implied so far by different pieces of code - > disagreement was in particular about whether to expect offline CPUs' > bits to possibly be set. Switch to a mostly consistent meaning > (exception being high priority interrupts, which would perhaps better > be switched to the same model as well in due course). Use the field to > record the vector allocation mask, i.e. potentially including bits of > offline (parked) CPUs. This implies that before passing the mask to > certain functions (most notably cpu_mask_to_apicid()) it needs to be > further reduced to the online subset. > > The exception of high priority interrupts is also why for the moment > _bind_irq_vector() is left as is, despite looking wrong: It's used > exclusively for IRQ0, which isn't supposed to move off CPU0 at any time. > > The prior lack of restricting to online CPUs in set_desc_affinity() > before calling cpu_mask_to_apicid() in particular allowed (in x2APIC > clustered mode) offlined CPUs to end up enabled in an IRQ's destination > field. (I wonder whether vector_allocation_cpumask_flat() shouldn't > follow a similar model, using cpu_present_map in favor of > cpu_online_map.) > > For IO-APIC code it was definitely wrong to potentially store, as a > fallback, TARGET_CPUS (i.e. all online ones) into the field, as that > would have caused problems when determining on which CPUs to release > vectors when they've gone out of use. Disable interrupts instead when > no valid target CPU can be established (which code elsewhere should > guarantee to never happen), and log a message in such an unlikely event. > > Signed-off-by: Jan Beulich <jbeulich@suse.com> Thanks. Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Some comments below. > --- > v2: New. > > --- a/xen/arch/x86/io_apic.c > +++ b/xen/arch/x86/io_apic.c > @@ -680,7 +680,7 @@ void /*__init*/ setup_ioapic_dest(void) > continue; > irq = pin_2_irq(irq_entry, ioapic, pin); > desc = irq_to_desc(irq); > - BUG_ON(cpumask_empty(desc->arch.cpu_mask)); > + BUG_ON(!cpumask_intersects(desc->arch.cpu_mask, &cpu_online_map)); I wonder if maybe you could instead do: if ( cpumask_intersects(desc->arch.cpu_mask, &cpu_online_map) ) set_ioapic_affinity_irq(desc, desc->arch.cpu_mask); else ASSERT_UNREACHABLE(); I guess if the IRQ is in use by Xen itself the failure ought to be fatal. > set_ioapic_affinity_irq(desc, desc->arch.cpu_mask); > } > > @@ -2197,7 +2197,6 @@ int io_apic_set_pci_routing (int ioapic, > { > struct irq_desc *desc = irq_to_desc(irq); > struct IO_APIC_route_entry entry; > - cpumask_t mask; > unsigned long flags; > int vector; > > @@ -2232,11 +2231,17 @@ int io_apic_set_pci_routing (int ioapic, > return vector; > entry.vector = vector; > > - cpumask_copy(&mask, TARGET_CPUS); > - /* Don't chance ending up with an empty mask. */ > - if (cpumask_intersects(&mask, desc->arch.cpu_mask)) > - cpumask_and(&mask, &mask, desc->arch.cpu_mask); > - SET_DEST(entry, logical, cpu_mask_to_apicid(&mask)); > + if (cpumask_intersects(desc->arch.cpu_mask, TARGET_CPUS)) { > + cpumask_t *mask = this_cpu(scratch_cpumask); > + > + cpumask_and(mask, desc->arch.cpu_mask, TARGET_CPUS); > + SET_DEST(entry, logical, cpu_mask_to_apicid(mask)); > + } else { > + printk(XENLOG_ERR "IRQ%d: no target CPU (%*pb vs %*pb)\n", > + irq, nr_cpu_ids, cpumask_bits(desc->arch.cpu_mask), > + nr_cpu_ids, cpumask_bits(TARGET_CPUS)); > + desc->status |= IRQ_DISABLED; > + } Hm, part of this file doesn't seem to use Xen coding style, but the chunk you add below does use it. And there are functions (like mask_and_ack_level_ioapic_irq that seem to use a mix of coding styles). I'm not sure what's the policy here, should new chunks follow Xen's coding style? > > apic_printk(APIC_DEBUG, KERN_DEBUG "IOAPIC[%d]: Set PCI routing entry " > "(%d-%d -> %#x -> IRQ %d Mode:%i Active:%i)\n", ioapic, > @@ -2422,7 +2427,21 @@ int ioapic_guest_write(unsigned long phy > /* Set the vector field to the real vector! */ > rte.vector = desc->arch.vector; > > - SET_DEST(rte, logical, cpu_mask_to_apicid(desc->arch.cpu_mask)); > + if ( cpumask_intersects(desc->arch.cpu_mask, TARGET_CPUS) ) > + { > + cpumask_t *mask = this_cpu(scratch_cpumask); > + > + cpumask_and(mask, desc->arch.cpu_mask, TARGET_CPUS); > + SET_DEST(rte, logical, cpu_mask_to_apicid(mask)); > + } > + else > + { > + gprintk(XENLOG_ERR, "IRQ%d: no target CPU (%*pb vs %*pb)\n", > + irq, nr_cpu_ids, cpumask_bits(desc->arch.cpu_mask), > + nr_cpu_ids, cpumask_bits(TARGET_CPUS)); > + desc->status |= IRQ_DISABLED; > + rte.mask = 1; > + } > > __ioapic_write_entry(apic, pin, 0, rte); > > --- a/xen/arch/x86/irq.c > +++ b/xen/arch/x86/irq.c > @@ -471,11 +471,13 @@ static int __assign_irq_vector( > */ > static int current_vector = FIRST_DYNAMIC_VECTOR, current_offset = 0; > int cpu, err, old_vector; > - cpumask_t tmp_mask; > vmask_t *irq_used_vectors = NULL; > > old_vector = irq_to_vector(irq); > - if (old_vector > 0) { > + if ( old_vector > 0 ) Another candidate to switch to valid_irq_vector or at least make an explicit comparison with IRQ_VECTOR_UNASSIGNED. Seeing your reply to my comment in that direction on a previous patch this can be done as a follow up. Roger.
>>> On 13.05.19 at 13:32, <roger.pau@citrix.com> wrote: > On Wed, May 08, 2019 at 07:10:29AM -0600, Jan Beulich wrote: >> --- a/xen/arch/x86/io_apic.c >> +++ b/xen/arch/x86/io_apic.c >> @@ -680,7 +680,7 @@ void /*__init*/ setup_ioapic_dest(void) >> continue; >> irq = pin_2_irq(irq_entry, ioapic, pin); >> desc = irq_to_desc(irq); >> - BUG_ON(cpumask_empty(desc->arch.cpu_mask)); >> + BUG_ON(!cpumask_intersects(desc->arch.cpu_mask, &cpu_online_map)); > > I wonder if maybe you could instead do: > > if ( cpumask_intersects(desc->arch.cpu_mask, &cpu_online_map) ) > set_ioapic_affinity_irq(desc, desc->arch.cpu_mask); > else > ASSERT_UNREACHABLE(); > > I guess if the IRQ is in use by Xen itself the failure ought to be > fatal. And imo also when it's another one (used by Dom0). Iirc we get here only during Dom0 boot (the commented out __init serving as a hint). Hence I think BUG_ON() is better in this case than any for of assertion. >> @@ -2232,11 +2231,17 @@ int io_apic_set_pci_routing (int ioapic, >> return vector; >> entry.vector = vector; >> >> - cpumask_copy(&mask, TARGET_CPUS); >> - /* Don't chance ending up with an empty mask. */ >> - if (cpumask_intersects(&mask, desc->arch.cpu_mask)) >> - cpumask_and(&mask, &mask, desc->arch.cpu_mask); >> - SET_DEST(entry, logical, cpu_mask_to_apicid(&mask)); >> + if (cpumask_intersects(desc->arch.cpu_mask, TARGET_CPUS)) { >> + cpumask_t *mask = this_cpu(scratch_cpumask); >> + >> + cpumask_and(mask, desc->arch.cpu_mask, TARGET_CPUS); >> + SET_DEST(entry, logical, cpu_mask_to_apicid(mask)); >> + } else { >> + printk(XENLOG_ERR "IRQ%d: no target CPU (%*pb vs %*pb)\n", >> + irq, nr_cpu_ids, cpumask_bits(desc->arch.cpu_mask), >> + nr_cpu_ids, cpumask_bits(TARGET_CPUS)); >> + desc->status |= IRQ_DISABLED; >> + } > > Hm, part of this file doesn't seem to use Xen coding style, but the > chunk you add below does use it. And there are functions (like > mask_and_ack_level_ioapic_irq that seem to use a mix of coding > styles). > > I'm not sure what's the policy here, should new chunks follow Xen's > coding style? Well, I've decided to match surrounding code's style, until the file gets morphed into consistent shape. Jan
--- a/xen/arch/x86/io_apic.c +++ b/xen/arch/x86/io_apic.c @@ -680,7 +680,7 @@ void /*__init*/ setup_ioapic_dest(void) continue; irq = pin_2_irq(irq_entry, ioapic, pin); desc = irq_to_desc(irq); - BUG_ON(cpumask_empty(desc->arch.cpu_mask)); + BUG_ON(!cpumask_intersects(desc->arch.cpu_mask, &cpu_online_map)); set_ioapic_affinity_irq(desc, desc->arch.cpu_mask); } @@ -2197,7 +2197,6 @@ int io_apic_set_pci_routing (int ioapic, { struct irq_desc *desc = irq_to_desc(irq); struct IO_APIC_route_entry entry; - cpumask_t mask; unsigned long flags; int vector; @@ -2232,11 +2231,17 @@ int io_apic_set_pci_routing (int ioapic, return vector; entry.vector = vector; - cpumask_copy(&mask, TARGET_CPUS); - /* Don't chance ending up with an empty mask. */ - if (cpumask_intersects(&mask, desc->arch.cpu_mask)) - cpumask_and(&mask, &mask, desc->arch.cpu_mask); - SET_DEST(entry, logical, cpu_mask_to_apicid(&mask)); + if (cpumask_intersects(desc->arch.cpu_mask, TARGET_CPUS)) { + cpumask_t *mask = this_cpu(scratch_cpumask); + + cpumask_and(mask, desc->arch.cpu_mask, TARGET_CPUS); + SET_DEST(entry, logical, cpu_mask_to_apicid(mask)); + } else { + printk(XENLOG_ERR "IRQ%d: no target CPU (%*pb vs %*pb)\n", + irq, nr_cpu_ids, cpumask_bits(desc->arch.cpu_mask), + nr_cpu_ids, cpumask_bits(TARGET_CPUS)); + desc->status |= IRQ_DISABLED; + } apic_printk(APIC_DEBUG, KERN_DEBUG "IOAPIC[%d]: Set PCI routing entry " "(%d-%d -> %#x -> IRQ %d Mode:%i Active:%i)\n", ioapic, @@ -2422,7 +2427,21 @@ int ioapic_guest_write(unsigned long phy /* Set the vector field to the real vector! */ rte.vector = desc->arch.vector; - SET_DEST(rte, logical, cpu_mask_to_apicid(desc->arch.cpu_mask)); + if ( cpumask_intersects(desc->arch.cpu_mask, TARGET_CPUS) ) + { + cpumask_t *mask = this_cpu(scratch_cpumask); + + cpumask_and(mask, desc->arch.cpu_mask, TARGET_CPUS); + SET_DEST(rte, logical, cpu_mask_to_apicid(mask)); + } + else + { + gprintk(XENLOG_ERR, "IRQ%d: no target CPU (%*pb vs %*pb)\n", + irq, nr_cpu_ids, cpumask_bits(desc->arch.cpu_mask), + nr_cpu_ids, cpumask_bits(TARGET_CPUS)); + desc->status |= IRQ_DISABLED; + rte.mask = 1; + } __ioapic_write_entry(apic, pin, 0, rte); --- a/xen/arch/x86/irq.c +++ b/xen/arch/x86/irq.c @@ -471,11 +471,13 @@ static int __assign_irq_vector( */ static int current_vector = FIRST_DYNAMIC_VECTOR, current_offset = 0; int cpu, err, old_vector; - cpumask_t tmp_mask; vmask_t *irq_used_vectors = NULL; old_vector = irq_to_vector(irq); - if (old_vector > 0) { + if ( old_vector > 0 ) + { + cpumask_t tmp_mask; + cpumask_and(&tmp_mask, mask, &cpu_online_map); if (cpumask_intersects(&tmp_mask, desc->arch.cpu_mask)) { desc->arch.vector = old_vector; @@ -498,7 +500,9 @@ static int __assign_irq_vector( else irq_used_vectors = irq_get_used_vector_mask(irq); - for_each_cpu(cpu, mask) { + for_each_cpu(cpu, mask) + { + const cpumask_t *vec_mask; int new_cpu; int vector, offset; @@ -506,8 +510,7 @@ static int __assign_irq_vector( if (!cpu_online(cpu)) continue; - cpumask_and(&tmp_mask, vector_allocation_cpumask(cpu), - &cpu_online_map); + vec_mask = vector_allocation_cpumask(cpu); vector = current_vector; offset = current_offset; @@ -528,7 +531,7 @@ next: && test_bit(vector, irq_used_vectors) ) goto next; - for_each_cpu(new_cpu, &tmp_mask) + for_each_cpu(new_cpu, vec_mask) if (per_cpu(vector_irq, new_cpu)[vector] >= 0) goto next; /* Found one! */ @@ -547,12 +550,12 @@ next: release_old_vec(desc); } - trace_irq_mask(TRC_HW_IRQ_ASSIGN_VECTOR, irq, vector, &tmp_mask); + trace_irq_mask(TRC_HW_IRQ_ASSIGN_VECTOR, irq, vector, vec_mask); - for_each_cpu(new_cpu, &tmp_mask) + for_each_cpu(new_cpu, vec_mask) per_cpu(vector_irq, new_cpu)[vector] = irq; desc->arch.vector = vector; - cpumask_copy(desc->arch.cpu_mask, &tmp_mask); + cpumask_copy(desc->arch.cpu_mask, vec_mask); desc->arch.used = IRQ_USED; ASSERT((desc->arch.used_vectors == NULL) @@ -783,6 +786,7 @@ unsigned int set_desc_affinity(struct ir cpumask_copy(desc->affinity, mask); cpumask_and(&dest_mask, mask, desc->arch.cpu_mask); + cpumask_and(&dest_mask, &dest_mask, &cpu_online_map); return cpu_mask_to_apicid(&dest_mask); } --- a/xen/include/asm-x86/irq.h +++ b/xen/include/asm-x86/irq.h @@ -32,6 +32,12 @@ struct irq_desc; struct arch_irq_desc { s16 vector; /* vector itself is only 8 bits, */ s16 old_vector; /* but we use -1 for unassigned */ + /* + * Except for high priority interrupts @cpu_mask may have bits set for + * offline CPUs. Consumers need to be careful to mask this down to + * online ones as necessary. There is supposed to always be a non- + * empty intersection with cpu_online_map. + */ cpumask_var_t cpu_mask; cpumask_var_t old_cpu_mask; cpumask_var_t pending_mask;
Mixed meaning was implied so far by different pieces of code - disagreement was in particular about whether to expect offline CPUs' bits to possibly be set. Switch to a mostly consistent meaning (exception being high priority interrupts, which would perhaps better be switched to the same model as well in due course). Use the field to record the vector allocation mask, i.e. potentially including bits of offline (parked) CPUs. This implies that before passing the mask to certain functions (most notably cpu_mask_to_apicid()) it needs to be further reduced to the online subset. The exception of high priority interrupts is also why for the moment _bind_irq_vector() is left as is, despite looking wrong: It's used exclusively for IRQ0, which isn't supposed to move off CPU0 at any time. The prior lack of restricting to online CPUs in set_desc_affinity() before calling cpu_mask_to_apicid() in particular allowed (in x2APIC clustered mode) offlined CPUs to end up enabled in an IRQ's destination field. (I wonder whether vector_allocation_cpumask_flat() shouldn't follow a similar model, using cpu_present_map in favor of cpu_online_map.) For IO-APIC code it was definitely wrong to potentially store, as a fallback, TARGET_CPUS (i.e. all online ones) into the field, as that would have caused problems when determining on which CPUs to release vectors when they've gone out of use. Disable interrupts instead when no valid target CPU can be established (which code elsewhere should guarantee to never happen), and log a message in such an unlikely event. Signed-off-by: Jan Beulich <jbeulich@suse.com> --- v2: New.