diff mbox

AVIC warning - what does it mean?

Message ID 20170908133100.GA7160@flask (mailing list archive)
State New, archived
Delegated to: Bjorn Helgaas
Headers show

Commit Message

Radim Krčmář Sept. 8, 2017, 1:31 p.m. UTC
2017-09-08 13:23+0200, Andreas Hartmann:
> On 09/08/2017 at 12:58 PM Mason wrote:
> > On 08/09/2017 11:58, Andreas Hartmann wrote:
> > 
> >> If AVIC is enabled, I'm usually getting one warning like the one below a
> >> few minutes after the VM has been started.
> >>
> >> What does it mean?
> >>
> >> There isn't any problem with that VM - it's just working as expected. If
> >> AVIC is disabled, the warning doesn't come up and the VM works fine, too.
> >>
> >> Base is a Ryzen system (RMA'd CPU w/o gcc segfaults meanwhile and X370
> >> chipset (ASUS Prime X370-PRO with Bios 0810)). Linux is 4.9.48.
> >>
> >>
> >> [ 1603.482692] ------------[ cut here ]------------
> >> [ 1603.482702] WARNING: CPU: 1 PID: 2936 at ../arch/x86/kvm/svm.c:1484 avic_vcpu_load+0x162/0x190 [kvm_amd]
> >> [ 1603.482703] Modules linked in: sr_mod cdrom uas usb_storage vhost_net tun vhost macvtap macvlan igb dca nf_log_ipv4 nf_log_common xt_LOG ipt_REJECT nf_reject_ipv4 xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables x_tables vfio_pci vfio_iommu_type1 vfio_virqfd vfio br_netfilter bridge stp llc iscsi_ibft iscsi_boot_sysfs snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep kvm_amd snd_pcm eeepc_wmi asus_wmi kvm sparse_keymap snd_seq rfkill video snd_seq_device mxm_wmi irqbypass snd_timer sp5100_tco i2c_piix4 snd pcspkr e1000e soundcore ptp pps_core fjes pinctrl_amd gpio_amdpt acpi_cpufreq gpio_generic 8250_dw tpm_tis shpchp i2c_designware_platform wmi i2c_designware_core tpm_tis_core button tpm nfsd auth_rp
> >>   cgss nfs_acl
> >> [ 1603.482740]  lockd grace sunrpc xfs libcrc32c dm_crypt algif_skcipher af_alg hid_generic usbhid raid1 md_mod amdkfd amd_iommu_v2 crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel radeon i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm serio_raw drm ccp xhci_pci xhci_hcd usbcore aesni_intel aes_x86_64 glue_helper lrw ablk_helper cryptd ata_generic pata_atiixp dm_mirror dm_region_hash dm_log sg thermal dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua
> >> [ 1603.482766] CPU: 1 PID: 2936 Comm: CPU 2/KVM Not tainted 4.9.48-1.1-default #1
> >> [ 1603.482767] Hardware name: System manufacturer System Product Name/PRIME X370-PRO, BIOS 0810 08/01/2017
> >> [ 1603.482768]  ffffab614aa4fcb8 ffffffff893c14fa 0000000000000000 0000000000000000
> >> [ 1603.482772]  ffffab614aa4fcf8 ffffffff89085da1 000005cc004b6f7f 0000000000000001
> >> [ 1603.482774]  c0000007ea4bc001 0000000000000001 000001753dd1e17d ffff9e84ea7e0140
> >> [ 1603.482776] Call Trace:
> >> [ 1603.482798]  [<ffffffff893c14fa>] dump_stack+0x63/0x89
> >> [ 1603.482804]  [<ffffffff89085da1>] __warn+0xd1/0xf0
> >> [ 1603.482809]  [<ffffffff89085e8d>] warn_slowpath_null+0x1d/0x20
> >> [ 1603.482813]  [<ffffffffc15ade42>] avic_vcpu_load+0x162/0x190 [kvm_amd]
> >> [ 1603.482849]  [<ffffffffc0950da4>] ? kvm_cpu_has_interrupt+0x44/0x50 [kvm]
> >> [ 1603.482852]  [<ffffffffc15ae138>] svm_vcpu_unblocking+0x18/0x20 [kvm_amd]
> >> [ 1603.482868]  [<ffffffffc091b424>] kvm_vcpu_block+0xc4/0x310 [kvm]
> >> [ 1603.482889]  [<ffffffffc0937fac>] kvm_arch_vcpu_ioctl_run+0x19c/0x400 [kvm]
> >> [ 1603.482906]  [<ffffffffc091cc42>] kvm_vcpu_ioctl+0x312/0x5e0 [kvm]
> >> [ 1603.482913]  [<ffffffff8910d10f>] ? do_futex+0x10f/0x510
> >> [ 1603.482916]  [<ffffffff8923ec06>] do_vfs_ioctl+0x96/0x5b0
> >> [ 1603.482921]  [<ffffffff89249767>] ? __fget+0x77/0xb0
> >> [ 1603.482922]  [<ffffffff8923f199>] SyS_ioctl+0x79/0x90
> >> [ 1603.482927]  [<ffffffff8900393b>] do_syscall_64+0x5b/0xd0
> >> [ 1603.482932]  [<ffffffff89719c2f>] entry_SYSCALL64_slow_path+0x25/0x25
> >> [ 1603.484799] ---[ end trace d24b045704bb6ae7 ]---
> > 
> > For what it's worth...
> > 
> > http://elixir.free-electrons.com/linux/v4.9.48/source/arch/x86/kvm/svm.c#L1484
> > 
> >     entry = READ_ONCE(*(svm->avic_physical_id_cache));
> >     WARN_ON(entry & AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK);
> > 
> > Added by commit 8221c13700561
> > 
> >     svm: Manage vcpu load/unload when enable AVIC
> >         When a vcpu is loaded/unloaded to a physical core, we need to update
> >     host physical APIC ID information in the Physical APIC-ID table
> >     accordingly.
> >         Also, when vCPU is blocking/un-blocking (due to halt instruction),
> >     we need to make sure that the is-running bit in set accordingly in the
> >     physical APIC-ID table.
> 
> Thanks - I already found this - but I don't know how to estimate it. 
> 
> Should I worry about? Or is it just informative? As it is working 
> afterwards the same as before, it probably isn't critical? 
> Could the WARN_ON be removed?

I'm thinking this happens if the VCPU is preempted in avic_set_running()
after 'svm->avic_is_running = true' and before avic_vcpu_load().
avic_vcpu_load() is then called from both avic_set_running() and
svm_vcpu_load().

The warning does not reveal any harmful bug.  Disabling preemption in
avic_set_running() would allow us to keep the protective warning.
(It seems to be slightly better than remembeging and restoring the avic
 state on sched out/in.)

Can you try the following patch?

Thanks.

Comments

Andreas Hartmann Sept. 8, 2017, 4:46 p.m. UTC | #1
On 09/08/2017 at 03:31 PM Radim Krcmar wrote:
> 2017-09-08 13:23+0200, Andreas Hartmann:
>>> On 08/09/2017 11:58, Andreas Hartmann wrote:
>>>
>>>> If AVIC is enabled, I'm usually getting one warning like the one below a
>>>> few minutes after the VM has been started.
>>>>
>>>> What does it mean?
>>>>
>>>> There isn't any problem with that VM - it's just working as expected. If
>>>> AVIC is disabled, the warning doesn't come up and the VM works fine, too.
>>>>
>>>> Base is a Ryzen system (RMA'd CPU w/o gcc segfaults meanwhile and X370
>>>> chipset (ASUS Prime X370-PRO with Bios 0810)). Linux is 4.9.48.
>>>>
>>>>
>>>> [ 1603.482692] ------------[ cut here ]------------
>>>> [ 1603.482702] WARNING: CPU: 1 PID: 2936 at ../arch/x86/kvm/svm.c:1484 avic_vcpu_load+0x162/0x190 [kvm_amd]
>>>> [ 1603.482703] Modules linked in: sr_mod cdrom uas usb_storage vhost_net tun vhost macvtap macvlan igb dca nf_log_ipv4 nf_log_common xt_LOG ipt_REJECT nf_reject_ipv4 xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables x_tables vfio_pci vfio_iommu_type1 vfio_virqfd vfio br_netfilter bridge stp llc iscsi_ibft iscsi_boot_sysfs snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep kvm_amd snd_pcm eeepc_wmi asus_wmi kvm sparse_keymap snd_seq rfkill video snd_seq_device mxm_wmi irqbypass snd_timer sp5100_tco i2c_piix4 snd pcspkr e1000e soundcore ptp pps_core fjes pinctrl_amd gpio_amdpt acpi_cpufreq gpio_generic 8250_dw tpm_tis shpchp i2c_designware_platform wmi i2c_designware_core tpm_tis_core button tpm nfsd auth_rp
>>>>   cgss nfs_acl
>>>> [ 1603.482740]  lockd grace sunrpc xfs libcrc32c dm_crypt algif_skcipher af_alg hid_generic usbhid raid1 md_mod amdkfd amd_iommu_v2 crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel radeon i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm serio_raw drm ccp xhci_pci xhci_hcd usbcore aesni_intel aes_x86_64 glue_helper lrw ablk_helper cryptd ata_generic pata_atiixp dm_mirror dm_region_hash dm_log sg thermal dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua
>>>> [ 1603.482766] CPU: 1 PID: 2936 Comm: CPU 2/KVM Not tainted 4.9.48-1.1-default #1
>>>> [ 1603.482767] Hardware name: System manufacturer System Product Name/PRIME X370-PRO, BIOS 0810 08/01/2017
>>>> [ 1603.482768]  ffffab614aa4fcb8 ffffffff893c14fa 0000000000000000 0000000000000000
>>>> [ 1603.482772]  ffffab614aa4fcf8 ffffffff89085da1 000005cc004b6f7f 0000000000000001
>>>> [ 1603.482774]  c0000007ea4bc001 0000000000000001 000001753dd1e17d ffff9e84ea7e0140
>>>> [ 1603.482776] Call Trace:
>>>> [ 1603.482798]  [<ffffffff893c14fa>] dump_stack+0x63/0x89
>>>> [ 1603.482804]  [<ffffffff89085da1>] __warn+0xd1/0xf0
>>>> [ 1603.482809]  [<ffffffff89085e8d>] warn_slowpath_null+0x1d/0x20
>>>> [ 1603.482813]  [<ffffffffc15ade42>] avic_vcpu_load+0x162/0x190 [kvm_amd]
>>>> [ 1603.482849]  [<ffffffffc0950da4>] ? kvm_cpu_has_interrupt+0x44/0x50 [kvm]
>>>> [ 1603.482852]  [<ffffffffc15ae138>] svm_vcpu_unblocking+0x18/0x20 [kvm_amd]
>>>> [ 1603.482868]  [<ffffffffc091b424>] kvm_vcpu_block+0xc4/0x310 [kvm]
>>>> [ 1603.482889]  [<ffffffffc0937fac>] kvm_arch_vcpu_ioctl_run+0x19c/0x400 [kvm]
>>>> [ 1603.482906]  [<ffffffffc091cc42>] kvm_vcpu_ioctl+0x312/0x5e0 [kvm]
>>>> [ 1603.482913]  [<ffffffff8910d10f>] ? do_futex+0x10f/0x510
>>>> [ 1603.482916]  [<ffffffff8923ec06>] do_vfs_ioctl+0x96/0x5b0
>>>> [ 1603.482921]  [<ffffffff89249767>] ? __fget+0x77/0xb0
>>>> [ 1603.482922]  [<ffffffff8923f199>] SyS_ioctl+0x79/0x90
>>>> [ 1603.482927]  [<ffffffff8900393b>] do_syscall_64+0x5b/0xd0
>>>> [ 1603.482932]  [<ffffffff89719c2f>] entry_SYSCALL64_slow_path+0x25/0x25
>>>> [ 1603.484799] ---[ end trace d24b045704bb6ae7 ]---

[...]

> I'm thinking this happens if the VCPU is preempted in avic_set_running()
> after 'svm->avic_is_running = true' and before avic_vcpu_load().
> avic_vcpu_load() is then called from both avic_set_running() and
> svm_vcpu_load().
> 
> The warning does not reveal any harmful bug.  Disabling preemption in
> avic_set_running() would allow us to keep the protective warning.
> (It seems to be slightly better than remembeging and restoring the avic
>  state on sched out/in.)
> 
> Can you try the following patch?

I'm currently testing. I'll be back after I'm sure that the patch fixes 
the problem (or not).


Thanks,
regards,
Andreas

> 
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 968f38dcb864..6b123f375d66 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -1572,11 +1572,15 @@ static void avic_set_running(struct kvm_vcpu *vcpu, bool is_run)
>  {
>  	struct vcpu_svm *svm = to_svm(vcpu);
>  
> +	preempt_disable();
> +
>  	svm->avic_is_running = is_run;
>  	if (is_run)
>  		avic_vcpu_load(vcpu, vcpu->cpu);
>  	else
>  		avic_vcpu_put(vcpu);
> +
> +	preempt_enable();
>  }
>  
>  static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
>
Andreas Hartmann Sept. 9, 2017, 11:09 a.m. UTC | #2
Hello Radim,

I tested your patch against 4.12.11 and 4.9.47. The warnings are gone.

But there is a side effect I faced during testing with netperf: The
tests with a passed through GB Ethernet interface showed massive
performance reduction (until 50% during a test period of about 10
minutes - each run repeatedly 10 seconds: TCP_STREAM / TCP_MAERTS /
TCP_SENDFILE).

I even could see rx overruns.

W/o the patch, the performance is flaky, too, but not as bad as with the patch.

Best is to disable AVIC at all. There are no overruns and throughput is
always between 105 and 112 MBytes/s as expected.

The big question therefore is: why should I use AVIC?

Passed through interface (as seen by VM):

00:06.0 Ethernet controller: Intel Corporation I211 Gigabit Network Connection (rev 03)
        Subsystem: ASUSTeK Computer Inc. Device 85f0
        Physical Slot: 6
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 11
        Region 0: Memory at febc0000 (32-bit, non-prefetchable) [size=128K]
        Region 2: I/O ports at c0a0 [size=32]
        Region 3: Memory at febf0000 (32-bit, non-prefetchable) [size=16K]
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
                Address: 0000000000000000  Data: 0000
                Masking: 00000000  Pending: 00000000
        Capabilities: [70] MSI-X: Enable+ Count=5 Masked-
                Vector table: BAR=3 offset=00000000
                PBA: BAR=3 offset=00002000
        Capabilities: [a0] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
                LnkCap: Port #6, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <2us, L1 <16us
                        ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Kernel driver in use: igb
        Kernel modules: igb


VM: Centos 7


Thanks,
kind regards,
Andreas


On 09/08/2017 at 03:31 PM Radim Krcmar wrote:
> 2017-09-08 13:23+0200, Andreas Hartmann:
>> On 09/08/2017 at 12:58 PM Mason wrote:
>>> On 08/09/2017 11:58, Andreas Hartmann wrote:
>>>
>>>> If AVIC is enabled, I'm usually getting one warning like the one below a
>>>> few minutes after the VM has been started.
>>>>
>>>> What does it mean?
>>>>
>>>> There isn't any problem with that VM - it's just working as expected. If
>>>> AVIC is disabled, the warning doesn't come up and the VM works fine, too.
>>>>
>>>> Base is a Ryzen system (RMA'd CPU w/o gcc segfaults meanwhile and X370
>>>> chipset (ASUS Prime X370-PRO with Bios 0810)). Linux is 4.9.48.
>>>>
>>>>
>>>> [ 1603.482692] ------------[ cut here ]------------
>>>> [ 1603.482702] WARNING: CPU: 1 PID: 2936 at ../arch/x86/kvm/svm.c:1484 avic_vcpu_load+0x162/0x190 [kvm_amd]
>>>> [ 1603.482703] Modules linked in: sr_mod cdrom uas usb_storage vhost_net tun vhost macvtap macvlan igb dca nf_log_ipv4 nf_log_common xt_LOG ipt_REJECT nf_reject_ipv4 xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables x_tables vfio_pci vfio_iommu_type1 vfio_virqfd vfio br_netfilter bridge stp llc iscsi_ibft iscsi_boot_sysfs snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep kvm_amd snd_pcm eeepc_wmi asus_wmi kvm sparse_keymap snd_seq rfkill video snd_seq_device mxm_wmi irqbypass snd_timer sp5100_tco i2c_piix4 snd pcspkr e1000e soundcore ptp pps_core fjes pinctrl_amd gpio_amdpt acpi_cpufreq gpio_generic 8250_dw tpm_tis shpchp i2c_designware_platform wmi i2c_designware_core tpm_tis_core button tpm nfsd auth_rp
>>>>   cgss nfs_acl
>>>> [ 1603.482740]  lockd grace sunrpc xfs libcrc32c dm_crypt algif_skcipher af_alg hid_generic usbhid raid1 md_mod amdkfd amd_iommu_v2 crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel radeon i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm serio_raw drm ccp xhci_pci xhci_hcd usbcore aesni_intel aes_x86_64 glue_helper lrw ablk_helper cryptd ata_generic pata_atiixp dm_mirror dm_region_hash dm_log sg thermal dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua
>>>> [ 1603.482766] CPU: 1 PID: 2936 Comm: CPU 2/KVM Not tainted 4.9.48-1.1-default #1
>>>> [ 1603.482767] Hardware name: System manufacturer System Product Name/PRIME X370-PRO, BIOS 0810 08/01/2017
>>>> [ 1603.482768]  ffffab614aa4fcb8 ffffffff893c14fa 0000000000000000 0000000000000000
>>>> [ 1603.482772]  ffffab614aa4fcf8 ffffffff89085da1 000005cc004b6f7f 0000000000000001
>>>> [ 1603.482774]  c0000007ea4bc001 0000000000000001 000001753dd1e17d ffff9e84ea7e0140
>>>> [ 1603.482776] Call Trace:
>>>> [ 1603.482798]  [<ffffffff893c14fa>] dump_stack+0x63/0x89
>>>> [ 1603.482804]  [<ffffffff89085da1>] __warn+0xd1/0xf0
>>>> [ 1603.482809]  [<ffffffff89085e8d>] warn_slowpath_null+0x1d/0x20
>>>> [ 1603.482813]  [<ffffffffc15ade42>] avic_vcpu_load+0x162/0x190 [kvm_amd]
>>>> [ 1603.482849]  [<ffffffffc0950da4>] ? kvm_cpu_has_interrupt+0x44/0x50 [kvm]
>>>> [ 1603.482852]  [<ffffffffc15ae138>] svm_vcpu_unblocking+0x18/0x20 [kvm_amd]
>>>> [ 1603.482868]  [<ffffffffc091b424>] kvm_vcpu_block+0xc4/0x310 [kvm]
>>>> [ 1603.482889]  [<ffffffffc0937fac>] kvm_arch_vcpu_ioctl_run+0x19c/0x400 [kvm]
>>>> [ 1603.482906]  [<ffffffffc091cc42>] kvm_vcpu_ioctl+0x312/0x5e0 [kvm]
>>>> [ 1603.482913]  [<ffffffff8910d10f>] ? do_futex+0x10f/0x510
>>>> [ 1603.482916]  [<ffffffff8923ec06>] do_vfs_ioctl+0x96/0x5b0
>>>> [ 1603.482921]  [<ffffffff89249767>] ? __fget+0x77/0xb0
>>>> [ 1603.482922]  [<ffffffff8923f199>] SyS_ioctl+0x79/0x90
>>>> [ 1603.482927]  [<ffffffff8900393b>] do_syscall_64+0x5b/0xd0
>>>> [ 1603.482932]  [<ffffffff89719c2f>] entry_SYSCALL64_slow_path+0x25/0x25
>>>> [ 1603.484799] ---[ end trace d24b045704bb6ae7 ]---
>>>
>>> For what it's worth...
>>>
>>> http://elixir.free-electrons.com/linux/v4.9.48/source/arch/x86/kvm/svm.c#L1484
>>>
>>>     entry = READ_ONCE(*(svm->avic_physical_id_cache));
>>>     WARN_ON(entry & AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK);
>>>
>>> Added by commit 8221c13700561
>>>
>>>     svm: Manage vcpu load/unload when enable AVIC
>>>         When a vcpu is loaded/unloaded to a physical core, we need to update
>>>     host physical APIC ID information in the Physical APIC-ID table
>>>     accordingly.
>>>         Also, when vCPU is blocking/un-blocking (due to halt instruction),
>>>     we need to make sure that the is-running bit in set accordingly in the
>>>     physical APIC-ID table.
>>
>> Thanks - I already found this - but I don't know how to estimate it. 
>>
>> Should I worry about? Or is it just informative? As it is working 
>> afterwards the same as before, it probably isn't critical? 
>> Could the WARN_ON be removed?
> 
> I'm thinking this happens if the VCPU is preempted in avic_set_running()
> after 'svm->avic_is_running = true' and before avic_vcpu_load().
> avic_vcpu_load() is then called from both avic_set_running() and
> svm_vcpu_load().
> 
> The warning does not reveal any harmful bug.  Disabling preemption in
> avic_set_running() would allow us to keep the protective warning.
> (It seems to be slightly better than remembeging and restoring the avic
>  state on sched out/in.)
> 
> Can you try the following patch?
> 
> Thanks.
> 
> 
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 968f38dcb864..6b123f375d66 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -1572,11 +1572,15 @@ static void avic_set_running(struct kvm_vcpu *vcpu, bool is_run)
>  {
>  	struct vcpu_svm *svm = to_svm(vcpu);
>  
> +	preempt_disable();
> +
>  	svm->avic_is_running = is_run;
>  	if (is_run)
>  		avic_vcpu_load(vcpu, vcpu->cpu);
>  	else
>  		avic_vcpu_put(vcpu);
> +
> +	preempt_enable();
>  }
>  
>  static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
>
Andreas Hartmann Dec. 24, 2017, 4:15 p.m. UTC | #3
On 09/09/2017 at 01:09 PM Andreas Hartmann wrote:
> Hello Radim,
> 
> I tested your patch against 4.12.11 and 4.9.47. The warnings are gone.
> 
> But there is a side effect I faced during testing with netperf: The
> tests with a passed through GB Ethernet interface showed massive
> performance reduction (until 50% during a test period of about 10
> minutes - each run repeatedly 10 seconds: TCP_STREAM / TCP_MAERTS /
> TCP_SENDFILE).
> 
> I even could see rx overruns.
> 
> W/o the patch, the performance is flaky, too, but not as bad as with the patch.
> 
> Best is to disable AVIC at all. There are no overruns and throughput is
> always between 105 and 112 MBytes/s as expected.

I retested w/ linux 4.14.8: no more problems. It's just working as
expected. Throughput is stable and maximal (113 MBytes/s) in both
directions.


Regards,
Andreas

> 
> The big question therefore is: why should I use AVIC?
> 
> Passed through interface (as seen by VM):
> 
> 00:06.0 Ethernet controller: Intel Corporation I211 Gigabit Network Connection (rev 03)
>         Subsystem: ASUSTeK Computer Inc. Device 85f0
>         Physical Slot: 6
>         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>         Latency: 0, Cache Line Size: 64 bytes
>         Interrupt: pin A routed to IRQ 11
>         Region 0: Memory at febc0000 (32-bit, non-prefetchable) [size=128K]
>         Region 2: I/O ports at c0a0 [size=32]
>         Region 3: Memory at febf0000 (32-bit, non-prefetchable) [size=16K]
>         Capabilities: [40] Power Management version 3
>                 Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
>                 Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
>         Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
>                 Address: 0000000000000000  Data: 0000
>                 Masking: 00000000  Pending: 00000000
>         Capabilities: [70] MSI-X: Enable+ Count=5 Masked-
>                 Vector table: BAR=3 offset=00000000
>                 PBA: BAR=3 offset=00002000
>         Capabilities: [a0] Express (v2) Endpoint, MSI 00
>                 DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
>                         ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
>                 DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
>                         RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
>                         MaxPayload 128 bytes, MaxReadReq 512 bytes
>                 DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
>                 LnkCap: Port #6, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <2us, L1 <16us
>                         ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
>                 LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
>                         ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>                 LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
>                 DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported
>                 DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
>                 LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
>                          EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
>         Kernel driver in use: igb
>         Kernel modules: igb
> 
> 
> VM: Centos 7
> 
> 
> Thanks,
> kind regards,
> Andreas
> 
> 
> On 09/08/2017 at 03:31 PM Radim Krcmar wrote:
>> 2017-09-08 13:23+0200, Andreas Hartmann:
>>> On 09/08/2017 at 12:58 PM Mason wrote:
>>>> On 08/09/2017 11:58, Andreas Hartmann wrote:
>>>>
>>>>> If AVIC is enabled, I'm usually getting one warning like the one below a
>>>>> few minutes after the VM has been started.
>>>>>
>>>>> What does it mean?
>>>>>
>>>>> There isn't any problem with that VM - it's just working as expected. If
>>>>> AVIC is disabled, the warning doesn't come up and the VM works fine, too.
>>>>>
>>>>> Base is a Ryzen system (RMA'd CPU w/o gcc segfaults meanwhile and X370
>>>>> chipset (ASUS Prime X370-PRO with Bios 0810)). Linux is 4.9.48.
>>>>>
>>>>>
>>>>> [ 1603.482692] ------------[ cut here ]------------
>>>>> [ 1603.482702] WARNING: CPU: 1 PID: 2936 at ../arch/x86/kvm/svm.c:1484 avic_vcpu_load+0x162/0x190 [kvm_amd]
>>>>> [ 1603.482703] Modules linked in: sr_mod cdrom uas usb_storage vhost_net tun vhost macvtap macvlan igb dca nf_log_ipv4 nf_log_common xt_LOG ipt_REJECT nf_reject_ipv4 xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables x_tables vfio_pci vfio_iommu_type1 vfio_virqfd vfio br_netfilter bridge stp llc iscsi_ibft iscsi_boot_sysfs snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep kvm_amd snd_pcm eeepc_wmi asus_wmi kvm sparse_keymap snd_seq rfkill video snd_seq_device mxm_wmi irqbypass snd_timer sp5100_tco i2c_piix4 snd pcspkr e1000e soundcore ptp pps_core fjes pinctrl_amd gpio_amdpt acpi_cpufreq gpio_generic 8250_dw tpm_tis shpchp i2c_designware_platform wmi i2c_designware_core tpm_tis_core button tpm nfsd auth_rp
>>>>>   cgss nfs_acl
>>>>> [ 1603.482740]  lockd grace sunrpc xfs libcrc32c dm_crypt algif_skcipher af_alg hid_generic usbhid raid1 md_mod amdkfd amd_iommu_v2 crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel radeon i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm serio_raw drm ccp xhci_pci xhci_hcd usbcore aesni_intel aes_x86_64 glue_helper lrw ablk_helper cryptd ata_generic pata_atiixp dm_mirror dm_region_hash dm_log sg thermal dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua
>>>>> [ 1603.482766] CPU: 1 PID: 2936 Comm: CPU 2/KVM Not tainted 4.9.48-1.1-default #1
>>>>> [ 1603.482767] Hardware name: System manufacturer System Product Name/PRIME X370-PRO, BIOS 0810 08/01/2017
>>>>> [ 1603.482768]  ffffab614aa4fcb8 ffffffff893c14fa 0000000000000000 0000000000000000
>>>>> [ 1603.482772]  ffffab614aa4fcf8 ffffffff89085da1 000005cc004b6f7f 0000000000000001
>>>>> [ 1603.482774]  c0000007ea4bc001 0000000000000001 000001753dd1e17d ffff9e84ea7e0140
>>>>> [ 1603.482776] Call Trace:
>>>>> [ 1603.482798]  [<ffffffff893c14fa>] dump_stack+0x63/0x89
>>>>> [ 1603.482804]  [<ffffffff89085da1>] __warn+0xd1/0xf0
>>>>> [ 1603.482809]  [<ffffffff89085e8d>] warn_slowpath_null+0x1d/0x20
>>>>> [ 1603.482813]  [<ffffffffc15ade42>] avic_vcpu_load+0x162/0x190 [kvm_amd]
>>>>> [ 1603.482849]  [<ffffffffc0950da4>] ? kvm_cpu_has_interrupt+0x44/0x50 [kvm]
>>>>> [ 1603.482852]  [<ffffffffc15ae138>] svm_vcpu_unblocking+0x18/0x20 [kvm_amd]
>>>>> [ 1603.482868]  [<ffffffffc091b424>] kvm_vcpu_block+0xc4/0x310 [kvm]
>>>>> [ 1603.482889]  [<ffffffffc0937fac>] kvm_arch_vcpu_ioctl_run+0x19c/0x400 [kvm]
>>>>> [ 1603.482906]  [<ffffffffc091cc42>] kvm_vcpu_ioctl+0x312/0x5e0 [kvm]
>>>>> [ 1603.482913]  [<ffffffff8910d10f>] ? do_futex+0x10f/0x510
>>>>> [ 1603.482916]  [<ffffffff8923ec06>] do_vfs_ioctl+0x96/0x5b0
>>>>> [ 1603.482921]  [<ffffffff89249767>] ? __fget+0x77/0xb0
>>>>> [ 1603.482922]  [<ffffffff8923f199>] SyS_ioctl+0x79/0x90
>>>>> [ 1603.482927]  [<ffffffff8900393b>] do_syscall_64+0x5b/0xd0
>>>>> [ 1603.482932]  [<ffffffff89719c2f>] entry_SYSCALL64_slow_path+0x25/0x25
>>>>> [ 1603.484799] ---[ end trace d24b045704bb6ae7 ]---
>>>>
>>>> For what it's worth...
>>>>
>>>> http://elixir.free-electrons.com/linux/v4.9.48/source/arch/x86/kvm/svm.c#L1484
>>>>
>>>>     entry = READ_ONCE(*(svm->avic_physical_id_cache));
>>>>     WARN_ON(entry & AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK);
>>>>
>>>> Added by commit 8221c13700561
>>>>
>>>>     svm: Manage vcpu load/unload when enable AVIC
>>>>         When a vcpu is loaded/unloaded to a physical core, we need to update
>>>>     host physical APIC ID information in the Physical APIC-ID table
>>>>     accordingly.
>>>>         Also, when vCPU is blocking/un-blocking (due to halt instruction),
>>>>     we need to make sure that the is-running bit in set accordingly in the
>>>>     physical APIC-ID table.
>>>
>>> Thanks - I already found this - but I don't know how to estimate it. 
>>>
>>> Should I worry about? Or is it just informative? As it is working 
>>> afterwards the same as before, it probably isn't critical? 
>>> Could the WARN_ON be removed?
>>
>> I'm thinking this happens if the VCPU is preempted in avic_set_running()
>> after 'svm->avic_is_running = true' and before avic_vcpu_load().
>> avic_vcpu_load() is then called from both avic_set_running() and
>> svm_vcpu_load().
>>
>> The warning does not reveal any harmful bug.  Disabling preemption in
>> avic_set_running() would allow us to keep the protective warning.
>> (It seems to be slightly better than remembeging and restoring the avic
>>  state on sched out/in.)
>>
>> Can you try the following patch?
>>
>> Thanks.
>>
>>
>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>> index 968f38dcb864..6b123f375d66 100644
>> --- a/arch/x86/kvm/svm.c
>> +++ b/arch/x86/kvm/svm.c
>> @@ -1572,11 +1572,15 @@ static void avic_set_running(struct kvm_vcpu *vcpu, bool is_run)
>>  {
>>  	struct vcpu_svm *svm = to_svm(vcpu);
>>  
>> +	preempt_disable();
>> +
>>  	svm->avic_is_running = is_run;
>>  	if (is_run)
>>  		avic_vcpu_load(vcpu, vcpu->cpu);
>>  	else
>>  		avic_vcpu_put(vcpu);
>> +
>> +	preempt_enable();
>>  }
>>  
>>  static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
>>
Bjorn Helgaas Dec. 28, 2017, 9:33 p.m. UTC | #4
[+cc maintainers, lists (not sure why linux-pci was the original recipient)]

On Sun, Dec 24, 2017 at 05:15:59PM +0100, Andreas Hartmann wrote:
> On 09/09/2017 at 01:09 PM Andreas Hartmann wrote:
> > Hello Radim,
> > 
> > I tested your patch against 4.12.11 and 4.9.47. The warnings are gone.
> > 
> > But there is a side effect I faced during testing with netperf: The
> > tests with a passed through GB Ethernet interface showed massive
> > performance reduction (until 50% during a test period of about 10
> > minutes - each run repeatedly 10 seconds: TCP_STREAM / TCP_MAERTS /
> > TCP_SENDFILE).
> > 
> > I even could see rx overruns.
> > 
> > W/o the patch, the performance is flaky, too, but not as bad as with the patch.
> > 
> > Best is to disable AVIC at all. There are no overruns and throughput is
> > always between 105 and 112 MBytes/s as expected.
> 
> I retested w/ linux 4.14.8: no more problems. It's just working as
> expected. Throughput is stable and maximal (113 MBytes/s) in both
> directions.
> 
> 
> Regards,
> Andreas
> 
> > 
> > The big question therefore is: why should I use AVIC?
> > 
> > Passed through interface (as seen by VM):
> > 
> > 00:06.0 Ethernet controller: Intel Corporation I211 Gigabit Network Connection (rev 03)
> >         Subsystem: ASUSTeK Computer Inc. Device 85f0
> >         Physical Slot: 6
> >         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
> >         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> >         Latency: 0, Cache Line Size: 64 bytes
> >         Interrupt: pin A routed to IRQ 11
> >         Region 0: Memory at febc0000 (32-bit, non-prefetchable) [size=128K]
> >         Region 2: I/O ports at c0a0 [size=32]
> >         Region 3: Memory at febf0000 (32-bit, non-prefetchable) [size=16K]
> >         Capabilities: [40] Power Management version 3
> >                 Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
> >                 Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
> >         Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
> >                 Address: 0000000000000000  Data: 0000
> >                 Masking: 00000000  Pending: 00000000
> >         Capabilities: [70] MSI-X: Enable+ Count=5 Masked-
> >                 Vector table: BAR=3 offset=00000000
> >                 PBA: BAR=3 offset=00002000
> >         Capabilities: [a0] Express (v2) Endpoint, MSI 00
> >                 DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
> >                         ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
> >                 DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
> >                         RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
> >                         MaxPayload 128 bytes, MaxReadReq 512 bytes
> >                 DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
> >                 LnkCap: Port #6, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <2us, L1 <16us
> >                         ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
> >                 LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
> >                         ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> >                 LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> >                 DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported
> >                 DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
> >                 LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
> >                          EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
> >         Kernel driver in use: igb
> >         Kernel modules: igb
> > 
> > 
> > VM: Centos 7
> > 
> > 
> > Thanks,
> > kind regards,
> > Andreas
> > 
> > 
> > On 09/08/2017 at 03:31 PM Radim Krcmar wrote:
> >> 2017-09-08 13:23+0200, Andreas Hartmann:
> >>> On 09/08/2017 at 12:58 PM Mason wrote:
> >>>> On 08/09/2017 11:58, Andreas Hartmann wrote:
> >>>>
> >>>>> If AVIC is enabled, I'm usually getting one warning like the one below a
> >>>>> few minutes after the VM has been started.
> >>>>>
> >>>>> What does it mean?
> >>>>>
> >>>>> There isn't any problem with that VM - it's just working as expected. If
> >>>>> AVIC is disabled, the warning doesn't come up and the VM works fine, too.
> >>>>>
> >>>>> Base is a Ryzen system (RMA'd CPU w/o gcc segfaults meanwhile and X370
> >>>>> chipset (ASUS Prime X370-PRO with Bios 0810)). Linux is 4.9.48.
> >>>>>
> >>>>>
> >>>>> [ 1603.482692] ------------[ cut here ]------------
> >>>>> [ 1603.482702] WARNING: CPU: 1 PID: 2936 at ../arch/x86/kvm/svm.c:1484 avic_vcpu_load+0x162/0x190 [kvm_amd]
> >>>>> [ 1603.482703] Modules linked in: sr_mod cdrom uas usb_storage vhost_net tun vhost macvtap macvlan igb dca nf_log_ipv4 nf_log_common xt_LOG ipt_REJECT nf_reject_ipv4 xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables x_tables vfio_pci vfio_iommu_type1 vfio_virqfd vfio br_netfilter bridge stp llc iscsi_ibft iscsi_boot_sysfs snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep kvm_amd snd_pcm eeepc_wmi asus_wmi kvm sparse_keymap snd_seq rfkill video snd_seq_device mxm_wmi irqbypass snd_timer sp5100_tco i2c_piix4 snd pcspkr e1000e soundcore ptp pps_core fjes pinctrl_amd gpio_amdpt acpi_cpufreq gpio_generic 8250_dw tpm_tis shpchp i2c_designware_platform wmi i2c_designware_core tpm_tis_core button tpm nfsd auth_rp
> >>>>>   cgss nfs_acl
> >>>>> [ 1603.482740]  lockd grace sunrpc xfs libcrc32c dm_crypt algif_skcipher af_alg hid_generic usbhid raid1 md_mod amdkfd amd_iommu_v2 crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel radeon i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm serio_raw drm ccp xhci_pci xhci_hcd usbcore aesni_intel aes_x86_64 glue_helper lrw ablk_helper cryptd ata_generic pata_atiixp dm_mirror dm_region_hash dm_log sg thermal dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua
> >>>>> [ 1603.482766] CPU: 1 PID: 2936 Comm: CPU 2/KVM Not tainted 4.9.48-1.1-default #1
> >>>>> [ 1603.482767] Hardware name: System manufacturer System Product Name/PRIME X370-PRO, BIOS 0810 08/01/2017
> >>>>> [ 1603.482768]  ffffab614aa4fcb8 ffffffff893c14fa 0000000000000000 0000000000000000
> >>>>> [ 1603.482772]  ffffab614aa4fcf8 ffffffff89085da1 000005cc004b6f7f 0000000000000001
> >>>>> [ 1603.482774]  c0000007ea4bc001 0000000000000001 000001753dd1e17d ffff9e84ea7e0140
> >>>>> [ 1603.482776] Call Trace:
> >>>>> [ 1603.482798]  [<ffffffff893c14fa>] dump_stack+0x63/0x89
> >>>>> [ 1603.482804]  [<ffffffff89085da1>] __warn+0xd1/0xf0
> >>>>> [ 1603.482809]  [<ffffffff89085e8d>] warn_slowpath_null+0x1d/0x20
> >>>>> [ 1603.482813]  [<ffffffffc15ade42>] avic_vcpu_load+0x162/0x190 [kvm_amd]
> >>>>> [ 1603.482849]  [<ffffffffc0950da4>] ? kvm_cpu_has_interrupt+0x44/0x50 [kvm]
> >>>>> [ 1603.482852]  [<ffffffffc15ae138>] svm_vcpu_unblocking+0x18/0x20 [kvm_amd]
> >>>>> [ 1603.482868]  [<ffffffffc091b424>] kvm_vcpu_block+0xc4/0x310 [kvm]
> >>>>> [ 1603.482889]  [<ffffffffc0937fac>] kvm_arch_vcpu_ioctl_run+0x19c/0x400 [kvm]
> >>>>> [ 1603.482906]  [<ffffffffc091cc42>] kvm_vcpu_ioctl+0x312/0x5e0 [kvm]
> >>>>> [ 1603.482913]  [<ffffffff8910d10f>] ? do_futex+0x10f/0x510
> >>>>> [ 1603.482916]  [<ffffffff8923ec06>] do_vfs_ioctl+0x96/0x5b0
> >>>>> [ 1603.482921]  [<ffffffff89249767>] ? __fget+0x77/0xb0
> >>>>> [ 1603.482922]  [<ffffffff8923f199>] SyS_ioctl+0x79/0x90
> >>>>> [ 1603.482927]  [<ffffffff8900393b>] do_syscall_64+0x5b/0xd0
> >>>>> [ 1603.482932]  [<ffffffff89719c2f>] entry_SYSCALL64_slow_path+0x25/0x25
> >>>>> [ 1603.484799] ---[ end trace d24b045704bb6ae7 ]---
> >>>>
> >>>> For what it's worth...
> >>>>
> >>>> http://elixir.free-electrons.com/linux/v4.9.48/source/arch/x86/kvm/svm.c#L1484
> >>>>
> >>>>     entry = READ_ONCE(*(svm->avic_physical_id_cache));
> >>>>     WARN_ON(entry & AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK);
> >>>>
> >>>> Added by commit 8221c13700561
> >>>>
> >>>>     svm: Manage vcpu load/unload when enable AVIC
> >>>>         When a vcpu is loaded/unloaded to a physical core, we need to update
> >>>>     host physical APIC ID information in the Physical APIC-ID table
> >>>>     accordingly.
> >>>>         Also, when vCPU is blocking/un-blocking (due to halt instruction),
> >>>>     we need to make sure that the is-running bit in set accordingly in the
> >>>>     physical APIC-ID table.
> >>>
> >>> Thanks - I already found this - but I don't know how to estimate it. 
> >>>
> >>> Should I worry about? Or is it just informative? As it is working 
> >>> afterwards the same as before, it probably isn't critical? 
> >>> Could the WARN_ON be removed?
> >>
> >> I'm thinking this happens if the VCPU is preempted in avic_set_running()
> >> after 'svm->avic_is_running = true' and before avic_vcpu_load().
> >> avic_vcpu_load() is then called from both avic_set_running() and
> >> svm_vcpu_load().
> >>
> >> The warning does not reveal any harmful bug.  Disabling preemption in
> >> avic_set_running() would allow us to keep the protective warning.
> >> (It seems to be slightly better than remembeging and restoring the avic
> >>  state on sched out/in.)
> >>
> >> Can you try the following patch?
> >>
> >> Thanks.
> >>
> >>
> >> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> >> index 968f38dcb864..6b123f375d66 100644
> >> --- a/arch/x86/kvm/svm.c
> >> +++ b/arch/x86/kvm/svm.c
> >> @@ -1572,11 +1572,15 @@ static void avic_set_running(struct kvm_vcpu *vcpu, bool is_run)
> >>  {
> >>  	struct vcpu_svm *svm = to_svm(vcpu);
> >>  
> >> +	preempt_disable();
> >> +
> >>  	svm->avic_is_running = is_run;
> >>  	if (is_run)
> >>  		avic_vcpu_load(vcpu, vcpu->cpu);
> >>  	else
> >>  		avic_vcpu_put(vcpu);
> >> +
> >> +	preempt_enable();
> >>  }
> >>  
> >>  static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
> >>
diff mbox

Patch

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 968f38dcb864..6b123f375d66 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1572,11 +1572,15 @@  static void avic_set_running(struct kvm_vcpu *vcpu, bool is_run)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 
+	preempt_disable();
+
 	svm->avic_is_running = is_run;
 	if (is_run)
 		avic_vcpu_load(vcpu, vcpu->cpu);
 	else
 		avic_vcpu_put(vcpu);
+
+	preempt_enable();
 }
 
 static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)