[GIT,PULL] virtio: features, fixes

Message ID 20220812114250-mutt-send-email-mst@kernel.org (mailing list archive)
State New, archived

Pull-request

https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus

Message

Michael S. Tsirkin Aug. 12, 2022, 3:42 p.m. UTC
The following changes since commit 3d7cb6b04c3f3115719235cc6866b10326de34cd:

  Linux 5.19 (2022-07-31 14:03:01 -0700)

are available in the Git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus

for you to fetch changes up to 93e530d2a1c4c0fcce45e01ae6c5c6287a08d3e3:

  vdpa/mlx5: Fix possible uninitialized return value (2022-08-11 10:00:36 -0400)

----------------------------------------------------------------
virtio: features, fixes

A huge patchset supporting vq resize using the
new vq reset capability.
Features, fixes, cleanups all over the place.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

----------------------------------------------------------------
Alvaro Karsz (1):
      net: virtio_net: notifications coalescing support

Bo Liu (3):
      virtio: Check dev_set_name() return value
      vhost-vdpa: Call ida_simple_remove() when failed
      virtio_vdpa: support the arg sizes of find_vqs()

Colin Ian King (1):
      vDPA/ifcvf: remove duplicated assignment to pointer cfg

David Hildenbrand (1):
      drivers/virtio: Clarify CONFIG_VIRTIO_MEM for unsupported architectures

Eli Cohen (3):
      vdpa/mlx5: Implement susupend virtqueue callback
      vdpa/mlx5: Support different address spaces for control and data
      vdpa/mlx5: Fix possible uninitialized return value

Eugenio Pérez (4):
      vdpa: Add suspend operation
      vhost-vdpa: introduce SUSPEND backend feature bit
      vhost-vdpa: uAPI to suspend the device
      vdpa_sim: Implement suspend vdpa op

Jason Wang (2):
      virtio_pmem: initialize provider_data through nd_region_desc
      virtio_pmem: set device ready in probe()

Michael S. Tsirkin (1):
      virtio: VIRTIO_HARDEN_NOTIFICATION is broken

Mike Christie (2):
      vhost-scsi: Fix max number of virtqueues
      vhost scsi: Allow user to control num virtqueues

Minghao Xue (2):
      dt-bindings: virtio: mmio: add optional wakeup-source property
      virtio_mmio: add support to set IRQ of a virtio device as wakeup source

Robin Murphy (1):
      vdpa: Use device_iommu_capable()

Shigeru Yoshida (1):
      virtio-blk: Avoid use-after-free on suspend/resume

Stefano Garzarella (11):
      vringh: iterate on iotlb_translate to handle large translations
      vdpa_sim_blk: use dev_dbg() to print errors
      vdpa_sim_blk: limit the number of request handled per batch
      vdpa_sim_blk: call vringh_complete_iotlb() also in the error path
      vdpa_sim_blk: set number of address spaces and virtqueue groups
      vdpa_sim: use max_iotlb_entries as a limit in vhost_iotlb_init
      tools/virtio: fix build
      vdpa_sim_blk: check if sector is 0 for commands other than read or write
      vdpa_sim_blk: make vdpasim_blk_check_range usable by other requests
      vdpa_sim_blk: add support for VIRTIO_BLK_T_FLUSH
      vdpa_sim_blk: add support for discard and write-zeroes

Xie Yongji (5):
      vduse: Remove unnecessary spin lock protection
      vduse: Use memcpy_{to,from}_page() in do_bounce()
      vduse: Support using userspace pages as bounce buffer
      vduse: Support registering userspace memory for IOVA regions
      vduse: Support querying information of IOVA regions

Xu Qiang (1):
      vdpa/mlx5: Use eth_broadcast_addr() to assign broadcast address

Xuan Zhuo (44):
      remoteproc: rename len of rpoc_vring to num
      virtio_ring: remove the arg vq of vring_alloc_desc_extra()
      virtio: record the maximum queue num supported by the device.
      virtio: struct virtio_config_ops add callbacks for queue_reset
      virtio_ring: update the document of the virtqueue_detach_unused_buf for queue reset
      virtio_ring: extract the logic of freeing vring
      virtio_ring: split vring_virtqueue
      virtio_ring: introduce virtqueue_init()
      virtio_ring: split: stop __vring_new_virtqueue as export symbol
      virtio_ring: split: __vring_new_virtqueue() accept struct vring_virtqueue_split
      virtio_ring: split: introduce vring_free_split()
      virtio_ring: split: extract the logic of alloc queue
      virtio_ring: split: extract the logic of alloc state and extra
      virtio_ring: split: extract the logic of vring init
      virtio_ring: split: extract the logic of attach vring
      virtio_ring: split: introduce virtqueue_reinit_split()
      virtio_ring: split: reserve vring_align, may_reduce_num
      virtio_ring: split: introduce virtqueue_resize_split()
      virtio_ring: packed: introduce vring_free_packed
      virtio_ring: packed: extract the logic of alloc queue
      virtio_ring: packed: extract the logic of alloc state and extra
      virtio_ring: packed: extract the logic of vring init
      virtio_ring: packed: extract the logic of attach vring
      virtio_ring: packed: introduce virtqueue_reinit_packed()
      virtio_ring: packed: introduce virtqueue_resize_packed()
      virtio_ring: introduce virtqueue_resize()
      virtio_pci: struct virtio_pci_common_cfg add queue_notify_data
      virtio: allow to unbreak/break virtqueue individually
      virtio: queue_reset: add VIRTIO_F_RING_RESET
      virtio_ring: struct virtqueue introduce reset
      virtio_pci: struct virtio_pci_common_cfg add queue_reset
      virtio_pci: introduce helper to get/set queue reset
      virtio_pci: extract the logic of active vq for modern pci
      virtio_pci: support VIRTIO_F_RING_RESET
      virtio: find_vqs() add arg sizes
      virtio_pci: support the arg sizes of find_vqs()
      virtio_mmio: support the arg sizes of find_vqs()
      virtio: add helper virtio_find_vqs_ctx_size()
      virtio_net: set the default max ring size by find_vqs()
      virtio_net: get ringparam by virtqueue_get_vring_max_size()
      virtio_net: split free_unused_bufs()
      virtio_net: support rx queue resize
      virtio_net: support tx queue resize
      virtio_net: support set_ringparam

Zhang Jiaming (1):
      vdpa: ifcvf: Fix spelling mistake in comments

Zhu Lingshan (4):
      vDPA/ifcvf: get_config_size should return a value no greater than dev implementation
      vDPA/ifcvf: support userspace to query features and MQ of a management device
      vDPA: !FEATURES_OK should not block querying device config space
      vDPA: fix 'cast to restricted le16' warnings in vdpa.c

 Documentation/devicetree/bindings/virtio/mmio.yaml |   4 +
 arch/um/drivers/virtio_uml.c                       |   3 +-
 drivers/block/virtio_blk.c                         |  24 +-
 drivers/net/virtio_net.c                           | 325 +++++++-
 drivers/nvdimm/virtio_pmem.c                       |   9 +-
 drivers/platform/mellanox/mlxbf-tmfifo.c           |   3 +
 drivers/remoteproc/remoteproc_core.c               |   4 +-
 drivers/remoteproc/remoteproc_virtio.c             |  13 +-
 drivers/s390/virtio/virtio_ccw.c                   |   4 +
 drivers/vdpa/ifcvf/ifcvf_base.c                    |  14 +-
 drivers/vdpa/ifcvf/ifcvf_base.h                    |   2 +
 drivers/vdpa/ifcvf/ifcvf_main.c                    | 144 ++--
 drivers/vdpa/mlx5/core/mlx5_vdpa.h                 |  11 +
 drivers/vdpa/mlx5/net/mlx5_vnet.c                  | 175 ++++-
 drivers/vdpa/vdpa.c                                |  14 +-
 drivers/vdpa/vdpa_sim/vdpa_sim.c                   |  18 +-
 drivers/vdpa/vdpa_sim/vdpa_sim.h                   |   1 +
 drivers/vdpa/vdpa_sim/vdpa_sim_blk.c               | 176 ++++-
 drivers/vdpa/vdpa_sim/vdpa_sim_net.c               |   3 +
 drivers/vdpa/vdpa_user/iova_domain.c               | 102 ++-
 drivers/vdpa/vdpa_user/iova_domain.h               |   8 +
 drivers/vdpa/vdpa_user/vduse_dev.c                 | 180 +++++
 drivers/vhost/scsi.c                               |  85 ++-
 drivers/vhost/vdpa.c                               |  38 +-
 drivers/vhost/vringh.c                             |  78 +-
 drivers/virtio/Kconfig                             |  11 +-
 drivers/virtio/virtio.c                            |   4 +-
 drivers/virtio/virtio_mmio.c                       |  14 +-
 drivers/virtio/virtio_pci_common.c                 |  32 +-
 drivers/virtio/virtio_pci_common.h                 |   3 +-
 drivers/virtio/virtio_pci_legacy.c                 |   8 +-
 drivers/virtio/virtio_pci_modern.c                 | 153 +++-
 drivers/virtio/virtio_pci_modern_dev.c             |  39 +
 drivers/virtio/virtio_ring.c                       | 814 +++++++++++++++------
 drivers/virtio/virtio_vdpa.c                       |  18 +-
 include/linux/mlx5/mlx5_ifc_vdpa.h                 |   8 +
 include/linux/remoteproc.h                         |   4 +-
 include/linux/vdpa.h                               |   4 +
 include/linux/virtio.h                             |  10 +
 include/linux/virtio_config.h                      |  40 +-
 include/linux/virtio_pci_modern.h                  |   9 +
 include/linux/virtio_ring.h                        |  10 -
 include/uapi/linux/vduse.h                         |  47 ++
 include/uapi/linux/vhost.h                         |   9 +
 include/uapi/linux/vhost_types.h                   |   2 +
 include/uapi/linux/virtio_config.h                 |   7 +-
 include/uapi/linux/virtio_net.h                    |  34 +-
 include/uapi/linux/virtio_pci.h                    |   2 +
 tools/virtio/linux/kernel.h                        |   2 +-
 tools/virtio/linux/vringh.h                        |   1 +
 tools/virtio/virtio_test.c                         |   4 +-
 51 files changed, 2171 insertions(+), 556 deletions(-)

Comments

pr-tracker-bot@kernel.org Aug. 12, 2022, 4:59 p.m. UTC | #1
The pull request you sent on Fri, 12 Aug 2022 11:42:50 -0400:

> https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/7a53e17accce9d310d2e522dfc701d8da7ccfa65

Thank you!

Andres Freund Aug. 14, 2022, 12:45 a.m. UTC | #2
Hi,

On 2022-08-12 11:42:50 -0400, Michael S. Tsirkin wrote:
> The following changes since commit 3d7cb6b04c3f3115719235cc6866b10326de34cd:
> 
>   Linux 5.19 (2022-07-31 14:03:01 -0700)
> 
> are available in the Git repository at:
> 
>   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus
> 
> for you to fetch changes up to 93e530d2a1c4c0fcce45e01ae6c5c6287a08d3e3:
> 
>   vdpa/mlx5: Fix possible uninitialized return value (2022-08-11 10:00:36 -0400)
> ----------------------------------------------------------------
> virtio: features, fixes
> 
> A huge patchset supporting vq resize using the
> new vq reset capability.
> Features, fixes, cleanups all over the place.
> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> 
> ----------------------------------------------------------------

I have a script [1] that builds Google Cloud VM images daily with a fresh vanilla
kernel for Postgres CI testing. The last successful image creation was at
7ebfc85e2cd7b08f518b526173e9a33b56b3913b
and the first failing one was at
69dac8e431af26173ca0a1ebc87054e01c585bcc

Since then, an image built with the new kernel boots, but the network does not come up.

Looking at the merges between those commits makes me suspect this merge:

69dac8e431af Merge tag 'riscv-for-linus-5.20-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
6c833c0581f1 Merge tag 'devicetree-fixes-for-6.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
3d076fec5a0c Merge tag 'rtc-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux
4a9350597aff Merge tag 'sound-fix-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
7a53e17accce Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
999324f58c41 Merge tag 'loongarch-5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
f7cdaeeab8ca Merge tag 'for-v6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply
d16b418fac3d Merge tag 'vfio-v6.0-rc1pt2' of https://github.com/awilliam/linux-vfio
9801002f76c6 perf: riscv_pmu{,_sbi}: Miscallenous improvement & fixes
c3adefb5baf3 Merge tag 'for-6.0/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
7ce2aa6d7fe1 Merge tag 'drm-next-2022-08-12-1' of git://anongit.freedesktop.org/drm/drm
7ab52f75a9cf RISC-V: Add Sstc extension support
36fa1cb56ac5 Merge tag 'drm-misc-next-fixes-2022-08-10' of git://anongit.freedesktop.org/drm/drm-misc into drm-next
da06cc5bb600 RISC-V: fixups to work with crash tool
6de9eb21cd36 Merge 'irq/loongarch', 'pci/ctrl/loongson' and 'pci/header-cleanup-immutable'
3aefb2ee5bdd riscv: implement Zicbom-based CMO instructions + the t-head variant
8f2f74b4b6e6 RISC-V: Canaan devicetree fixes
f94ba7039fb4 Merge tag 'at91-reset-sama7g5-signed' into psy-next

All the drivers/net changes in that commit range were part of this pull
request.
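
One way to check the statement above against a local clone of torvalds/linux.git is to restrict the commit range to drivers/net. This is a minimal sketch, assuming `git` is on the PATH and the caller supplies the clone path; the helper names `range_log_cmd` and `commits_touching` are hypothetical and not part of the original report.

```python
import subprocess

# Commit IDs quoted in the report above.
GOOD = "7ebfc85e2cd7b08f518b526173e9a33b56b3913b"  # last successful image build
BAD = "69dac8e431af26173ca0a1ebc87054e01c585bcc"   # first failing image build


def range_log_cmd(good, bad, path):
    """Build a `git log` command listing commits in good..bad that touch path."""
    return ["git", "log", "--oneline", f"{good}..{bad}", "--", path]


def commits_touching(repo, good, bad, path):
    """Run the command inside the clone at `repo`; return one line per commit."""
    result = subprocess.run(
        range_log_cmd(good, bad, path),
        cwd=repo,
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.splitlines()
```

Calling `commits_touching("/path/to/linux", GOOD, BAD, "drivers/net")` (the path is a placeholder) would list the commits to compare against the shortlog of this pull request.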


Excerpt from the serial log for the Debian sid kernel (sorry for the interspersed logs):

Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 cloud-ifupdown-helper: Generated configuration for ens4
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 kern[  OK  ] Finished Raise network interfaces.
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 systemd[1]: Found device Virtio network device.
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 systemd[1]: Commit a transient machine-id on disk was skipped because of a failed condition check (ConditionPathIsMountPoint[  OK  ] Reached target Network.
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 systemd[1]: Started ifup for ens4.
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 kernel: [    0.354044] x86: [  OK  ] Reached target Network is Online.
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 dhclient[354]: Internet Systems Consortium DHCP Client 4.4.3
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 sh[354]: Internet Systems Consortium DHCP Client 4.4.3
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 sh[354]: Copyright 2004-2022 Internet Systems Consortium.
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 sh[354]: For info, please visit https://www.isc.org/software/dhcp/
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 dhclient[354]: Copyright 2004-2022 Internet Systems Consortium.
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 dhclient[354]: For info, please visit https://www.isc.org/software/dhcp/
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 systemd[1]: Starting Raise network interfaces...
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 ifup[356]: ifup: waiting for lock on /run/network/ifstate.ens4
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 dhclient[354]: Listening on LPF/ens4/42:01:0a:a8:00:07
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 sh[354]: Listening on LPF/ens4/42:01:0a:a8:00:07
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 sh[354]: Sending on   LPF/ens4/42:01:0a:a8:00:07
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 dhclient[354]: Sending on   LPF/ens4/42:01:0a:a8:00:07
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 sh[354]: DHCPDISCOVER on ens4 to 255.255.255.255 port 67 interval 7
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 kernel: [    0.400657] NET: Registered PF_NETLINK/PF_ROUTE protocol family
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 dhclient[354]: DHCPDISCOVER on ens4 to 255.255.255.255 port 67 interval 7
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 kernel: [    0.408289] audit: initializing netlink subsys (disabled)
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 dhclient[354]: DHCPOFFER of 10.168.0.7 from 169.254.169.254
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 sh[354]: DHCPOFFER of 10.168.0.7 from 169.254.169.254
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 sh[354]: DHCPREQUEST for 10.168.0.7 on ens4 to 255.255.255.255 port 67
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 dhclient[354]: DHCPREQUEST for 10.168.0.7 on ens4 to 255.255.255.255 port 67
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 dhclient[354]: DHCPACK of 10.168.0.7 from 169.254.169.254
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 sh[354]: DHCPACK of 10.168.0.7 from 169.254.169.254
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 kernel: [    0.549954] NetLabel: Initializing
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 kernel: [    0.550736] NetLabel:  domain hash size = 128
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 kernel: [    0.551480] NetLabel:  protocols = UNLABELED CIPSOv4 CALIPSO
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 kernel: [    0.552303] NetLabel:  unlabeled traffic allowed by default
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 kernel: [    0.570445] NET: Registered PF_INET protocol family
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 kernel: [    0.586842] NET: Registered PF_UNIX/PF_LOCAL protocol family
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 kernel: [    0.587916] NET: Registered PF_XDP protocol family
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 kernel: [    0.865585] NET: Registered PF_INET6 protocol family
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 kernel: [    0.872235] NET: Registered PF_PACKET protocol family
rnel: [    1.153962] virtio_net virtio1 ens4: renamed from eth0
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 sh[474]: ens4=ens4
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 systemd[1]: Finished Raise network interfaces.
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 systemd[1]: Reached target Network.
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 systemd[1]: Reached target Network is Online.

Rebooting into the new kernel:

[    0.475837] NET: Registered PF_NETLINK/PF_ROUTE protocol family
[    0.476558] audit: initializing netlink subsys (disabled)
[    0.630598] NetLabel: Initializing
[    0.631503] NetLabel:  domain hash size = 128
[    0.632409] NetLabel:  protocols = UNLABELED CIPSOv4 CALIPSO
[    0.632515] NetLabel:  unlabeled traffic allowed by default
[    0.654654] NET: Registered PF_INET protocol family
[    0.672514] NET: Registered PF_UNIX/PF_LOCAL protocol family
[    0.871362] Initializing XFRM netlink socket
[    0.872171] NET: Registered PF_INET6 protocol family
[    0.875791] NET: Registered PF_PACKET protocol family
[    0.876932] 9pnet: Installing 9P2000 support
[    0.887570] printk: console [netcon0] enabled
[    0.888339] netconsole: network logging started
[    0.943112] virtio_net virtio1 enp0s4: renamed from eth0
         Starting Raise network interfaces...
[  OK  ] Found device Virtio network device.
[    1.876517] IPv6: ADDRCONF(NETDEV_CHANGE): enp0s4: link becomes ready
Aug 13 22:51:16 debian systemd[1]: Starting Raise network interfaces...
Aug 13 22:51:16 debian dhclient[349]: Internet Systems Consortium DHCP Client 4.4.3
Aug 13 22:51:16 debian ifup[349]: Internet Systems Consortium DHCP Client 4.4.3
Aug 13 22:51:16 debian ifup[349]: Copyright 2004-2022 Internet Systems Consortium.
Aug 13 22:51:16 debian ifup[349]: For info, please visit https://www.isc.org/software/dhcp/
Aug 13 22:51:16 debian dhclient[349]: Copyright 2004-2022 Internet Systems Consortium.
Aug 13 22:51:16 debian dhclient[349]: For info, please visit https://www.isc.org/software/dhcp/
Aug 13 22:51:16 debian kernel: [    0.475837] NET: Registered PF_NETLINK/PF_ROUTE protocol family
Aug 13 22:51:16 debian kernel: [    0.476558] audit: initializing netlink subsys (disabled)
Aug 13 22:51:16 debian systemd[1]: Found device Virtio network device.
Aug 13 22:51:16 debian ifup[349]: DHCPDISCOVER on enp0s4 to 255.255.255.255 port 67 interval 6
Aug 13 22:51:16 debian dhclient[349]: DHCPDISCOVER on enp0s4 to 255.255.255.255 port 67 interval 6
Aug 13 22:51:16 debian sh[356]: ifup: waiting for lock on /run/network/ifstate.enp0s4
Aug 13 22:51:16 debian kernel: [    0.630598] NetLabel: Initializing
Aug 13 22:51:16 debian kernel: [    0.631503] NetLabel:  domain hash size = 128
Aug 13 22:51:16 debian kernel: [    0.632409] NetLabel:  protocols = UNLABELED CIPSOv4 CALIPSO
Aug 13 22:51:16 debian kernel: [    0.632515] NetLabel:  unlabeled traffic allowed by default
Aug 13 22:51:16 debian kernel: [    0.654654] NET: Registered PF_INET protocol family
Aug 13 22:51:16 debian kernel: [    0.672514] NET: Registered PF_UNIX/PF_LOCAL protocol family
Aug 13 22:51:16 debian kernel: [    0.871362] Initializing XFRM netlink socket
Aug 13 22:51:16 debian kernel: [    0.872171] NET: Registered PF_INET6 protocol family
Aug 13 22:51:16 debian kernel: [    0.875791] NET: Registered PF_PACKET protocol family
Aug 13 22:51:16 debian kernel: [    0.876932] 9pnet: Installing 9P2000 support
Aug 13 22:51:16 debian kernel: [    0.887570] printk: console [netcon0] enabled
Aug 13 22:51:16 debian kernel: [    0.888339] netconsole: network logging started
Aug 13 22:51:16 debian kernel: [    0.943112] virtio_net virtio1 enp0s4: renamed from eth0
Aug 13 22:51:16 debian kernel: [    1.876517] IPv6: ADDRCONF(NETDEV_CHANGE): enp0s4: link becomes ready
[ ***  ] A start job is running for Raise network interfaces (6s / 5min)
Aug 13 22:51:22 debian dhclient[349]: DHCPDISCOVER on enp0s4 to 255.255.255.255 port 67 interval 13
[***   ] A start job is running for Raise network interfaces (19s / 5min)
Aug 13 22:51:35 debian dhclient[349]: DHCPDISCOVER on enp0s4 to 255.255.255.255 port 67 interval 14
[***   ] A start job is running for Raise network interfaces (33s / 5min)
...
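
Because the two logs are interleaved with kernel and systemd output, the difference is easier to see once only the dhclient messages are kept. The sketch below is an illustration rather than part of the original report: the helper names are hypothetical, and the sample strings are dhclient lines copied from the two excerpts above.

```python
import re

# Message types as dhclient logs them, e.g. "dhclient[354]: DHCPACK of ...".
DHCP_RE = re.compile(r"dhclient\[\d+\]: (DHCP[A-Z]+)\b")


def dhcp_events(log_text):
    """Return the DHCP message types logged by dhclient, in log order."""
    events = []
    for line in log_text.splitlines():
        match = DHCP_RE.search(line)
        if match:
            events.append(match.group(1))
    return events


def lease_obtained(log_text):
    """A lease was obtained only if the server's DHCPACK shows up in the log."""
    return "DHCPACK" in dhcp_events(log_text)


# dhclient lines from the working boot (Debian sid kernel).
OLD_KERNEL_LOG = """\
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 dhclient[354]: DHCPDISCOVER on ens4 to 255.255.255.255 port 67 interval 7
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 dhclient[354]: DHCPOFFER of 10.168.0.7 from 169.254.169.254
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 dhclient[354]: DHCPREQUEST for 10.168.0.7 on ens4 to 255.255.255.255 port 67
Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 dhclient[354]: DHCPACK of 10.168.0.7 from 169.254.169.254
"""

# dhclient lines from the failing boot (new kernel).
NEW_KERNEL_LOG = """\
Aug 13 22:51:16 debian dhclient[349]: DHCPDISCOVER on enp0s4 to 255.255.255.255 port 67 interval 6
Aug 13 22:51:22 debian dhclient[349]: DHCPDISCOVER on enp0s4 to 255.255.255.255 port 67 interval 13
Aug 13 22:51:35 debian dhclient[349]: DHCPDISCOVER on enp0s4 to 255.255.255.255 port 67 interval 14
"""

print(dhcp_events(OLD_KERNEL_LOG))
print(dhcp_events(NEW_KERNEL_LOG))
```

With the old kernel the exchange runs through DISCOVER, OFFER, REQUEST and ACK; with the new kernel the excerpt shows only repeated DHCPDISCOVER messages and no DHCPOFFER.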


Greetings,

Andres Freund


[1] https://github.com/anarazel/pg-vm-images/blob/main/packer/linux_debian.pkr.hcl#L225

Xuan Zhuo Aug. 14, 2022, 1:50 a.m. UTC | #3
On Sat, 13 Aug 2022 17:45:22 -0700, Andres Freund <andres@anarazel.de> wrote:
> Hi,
>
> On 2022-08-12 11:42:50 -0400, Michael S. Tsirkin wrote:
> > The following changes since commit 3d7cb6b04c3f3115719235cc6866b10326de34cd:
> >
> >   Linux 5.19 (2022-07-31 14:03:01 -0700)
> >
> > are available in the Git repository at:
> >
> >   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus
> >
> > for you to fetch changes up to 93e530d2a1c4c0fcce45e01ae6c5c6287a08d3e3:
> >
> >   vdpa/mlx5: Fix possible uninitialized return value (2022-08-11 10:00:36 -0400)
> > ----------------------------------------------------------------
> > virtio: features, fixes
> >
> > A huge patchset supporting vq resize using the
> > new vq reset capability.
> > Features, fixes, cleanups all over the place.
> >
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> >
> > ----------------------------------------------------------------
>
> I have a script [1] that daily builds google cloud VM images with a fresh vanilla
> kernel for postgres CI testing. The last successful image creation was
> 7ebfc85e2cd7b08f518b526173e9a33b56b3913b
> and the first failing was
> 69dac8e431af26173ca0a1ebc87054e01c585bcc
>
> Since then, an image built with the new kernel boots, but the network does not come up.
>
> Looking at the merges between those commits makes me suspect this merge:
>
> 69dac8e431af Merge tag 'riscv-for-linus-5.20-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
> 6c833c0581f1 Merge tag 'devicetree-fixes-for-6.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
> 3d076fec5a0c Merge tag 'rtc-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux
> 4a9350597aff Merge tag 'sound-fix-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
> 7a53e17accce Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
> 999324f58c41 Merge tag 'loongarch-5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
> f7cdaeeab8ca Merge tag 'for-v6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply
> d16b418fac3d Merge tag 'vfio-v6.0-rc1pt2' of https://github.com/awilliam/linux-vfio
> 9801002f76c6 perf: riscv_pmu{,_sbi}: Miscallenous improvement & fixes
> c3adefb5baf3 Merge tag 'for-6.0/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
> 7ce2aa6d7fe1 Merge tag 'drm-next-2022-08-12-1' of git://anongit.freedesktop.org/drm/drm
> 7ab52f75a9cf RISC-V: Add Sstc extension support
> 36fa1cb56ac5 Merge tag 'drm-misc-next-fixes-2022-08-10' of git://anongit.freedesktop.org/drm/drm-misc into drm-next
> da06cc5bb600 RISC-V: fixups to work with crash tool
> 6de9eb21cd36 Merge 'irq/loongarch', 'pci/ctrl/loongson' and 'pci/header-cleanup-immutable'
> 3aefb2ee5bdd riscv: implement Zicbom-based CMO instructions + the t-head variant
> 8f2f74b4b6e6 RISC-V: Canaan devicetree fixes
> f94ba7039fb4 Merge tag 'at91-reset-sama7g5-signed' into psy-next
>
> all the drivers/net changes in that commit range were part of this pull
> request.
>
>
> excerpt from serial log for debian sid kernel (sorry for the interspersed logs):
>
> Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 cloud-ifupdown-helper: Generated configuration for ens4
> Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 kern[  OK  ] Finished Raise network interfaces.
> Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 systemd[1]: Found device Virtio network device.
> Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 systemd[1]: Commit a transient machine-id on disk was skipped because of a failed condition check (ConditionPathIsMountPoint[  OK  ] Reached target Network.
> Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 systemd[1]: Started ifup for ens4.
> Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 kernel: [    0.354044] x86: [  OK  ] Reached target Network is Online.
> Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 dhclient[354]: Internet Systems Consortium DHCP Client 4.4.3
> Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 sh[354]: Internet Systems Consortium DHCP Client 4.4.3
> Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 sh[354]: Copyright 2004-2022 Internet Systems Consortium.
> Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 sh[354]: For info, please visit https://www.isc.org/software/dhcp/
> Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 dhclient[354]: Copyright 2004-2022 Internet Systems Consortium.
> Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 dhclient[354]: For info, please visit https://www.isc.org/software/dhcp/
> Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 systemd[1]: Starting Raise network interfaces...
> Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 ifup[356]: ifup: waiting for lock on /run/network/ifstate.ens4
> Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 dhclient[354]: Listening on LPF/ens4/42:01:0a:a8:00:07
> Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 sh[354]: Listening on LPF/ens4/42:01:0a:a8:00:07
> Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 sh[354]: Sending on   LPF/ens4/42:01:0a:a8:00:07
> Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 dhclient[354]: Sending on   LPF/ens4/42:01:0a:a8:00:07
> Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 sh[354]: DHCPDISCOVER on ens4 to 255.255.255.255 port 67 interval 7
> Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 kernel: [    0.400657] NET: Registered PF_NETLINK/PF_ROUTE protocol family
> Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 dhclient[354]: DHCPDISCOVER on ens4 to 255.255.255.255 port 67 interval 7
> Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 kernel: [    0.408289] audit: initializing netlink subsys (disabled)
> Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 dhclient[354]: DHCPOFFER of 10.168.0.7 from 169.254.169.254
> Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 sh[354]: DHCPOFFER of 10.168.0.7 from 169.254.169.254
> Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 sh[354]: DHCPREQUEST for 10.168.0.7 on ens4 to 255.255.255.255 port 67
> Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 dhclient[354]: DHCPREQUEST for 10.168.0.7 on ens4 to 255.255.255.255 port 67
> Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 dhclient[354]: DHCPACK of 10.168.0.7 from 169.254.169.254
> Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 sh[354]: DHCPACK of 10.168.0.7 from 169.254.169.254
> Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 kernel: [    0.549954] NetLabel: Initializing
> Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 kernel: [    0.550736] NetLabel:  domain hash size = 128
> Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 kernel: [    0.551480] NetLabel:  protocols = UNLABELED CIPSOv4 CALIPSO
> Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 kernel: [    0.552303] NetLabel:  unlabeled traffic allowed by default
> Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 kernel: [    0.570445] NET: Registered PF_INET protocol family
> Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 kernel: [    0.586842] NET: Registered PF_UNIX/PF_LOCAL protocol family
> Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 kernel: [    0.587916] NET: Registered PF_XDP protocol family
> Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 kernel: [    0.865585] NET: Registered PF_INET6 protocol family
> Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 kernel: [    0.872235] NET: Registered PF_PACKET protocol family
> rnel: [    1.153962] virtio_net virtio1 ens4: renamed from eth0
> Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 sh[474]: ens4=ens4
> Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 systemd[1]: Finished Raise network interfaces.
> Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 systemd[1]: Reached target Network.
> Aug 13 22:44:15 build-sid-newkernel-2022-08-13t22-41 systemd[1]: Reached target Network is Online.
>
> rebooting into the new kernel:
>
> [    0.475837] NET: Registered PF_NETLINK/PF_ROUTE protocol family
> [    0.476558] audit: initializing netlink subsys (disabled)
> [    0.630598] NetLabel: Initializing
> [    0.631503] NetLabel:  domain hash size = 128
> [    0.632409] NetLabel:  protocols = UNLABELED CIPSOv4 CALIPSO
> [    0.632515] NetLabel:  unlabeled traffic allowed by default
> [    0.654654] NET: Registered PF_INET protocol family
> [    0.672514] NET: Registered PF_UNIX/PF_LOCAL protocol family
> [    0.871362] Initializing XFRM netlink socket
> [    0.872171] NET: Registered PF_INET6 protocol family
> [    0.875791] NET: Registered PF_PACKET protocol family
> [    0.876932] 9pnet: Installing 9P2000 support
> [    0.887570] printk: console [netcon0] enabled
> [    0.888339] netconsole: network logging started
> [    0.943112] virtio_net virtio1 enp0s4: renamed from eth0
>          Starting Raise network interfaces...
> [  OK  ] Found device Virtio network device.
> [    1.876517] IPv6: ADDRCONF(NETDEV_CHANGE): enp0s4: link becomes ready
> Aug 13 22:51:16 debian systemd[1]: Starting Raise network interfaces...
> Aug 13 22:51:16 debian dhclient[349]: Internet Systems Consortium DHCP Client 4.4.3
> Aug 13 22:51:16 debian ifup[349]: Internet Systems Consortium DHCP Client 4.4.3
> Aug 13 22:51:16 debian ifup[349]: Copyright 2004-2022 Internet Systems Consortium.
> Aug 13 22:51:16 debian ifup[349]: For info, please visit https://www.isc.org/software/dhcp/
> Aug 13 22:51:16 debian dhclient[349]: Copyright 2004-2022 Internet Systems Consortium.
> Aug 13 22:51:16 debian dhclient[349]: For info, please visit https://www.isc.org/software/dhcp/
> Aug 13 22:51:16 debian kernel: [    0.475837] NET: Registered PF_NETLINK/PF_ROUTE protocol family
> Aug 13 22:51:16 debian kernel: [    0.476558] audit: initializing netlink subsys (disabled)
> Aug 13 22:51:16 debian systemd[1]: Found device Virtio network device.
> Aug 13 22:51:16 debian ifup[349]: DHCPDISCOVER on enp0s4 to 255.255.255.255 port 67 interval 6
> Aug 13 22:51:16 debian dhclient[349]: DHCPDISCOVER on enp0s4 to 255.255.255.255 port 67 interval 6
> Aug 13 22:51:16 debian sh[356]: ifup: waiting for lock on /run/network/ifstate.enp0s4
> Aug 13 22:51:16 debian kernel: [    0.630598] NetLabel: Initializing
> Aug 13 22:51:16 debian kernel: [    0.631503] NetLabel:  domain hash size = 128
> Aug 13 22:51:16 debian kernel: [    0.632409] NetLabel:  protocols = UNLABELED CIPSOv4 CALIPSO
> Aug 13 22:51:16 debian kernel: [    0.632515] NetLabel:  unlabeled traffic allowed by default
> Aug 13 22:51:16 debian kernel: [    0.654654] NET: Registered PF_INET protocol family
> Aug 13 22:51:16 debian kernel: [    0.672514] NET: Registered PF_UNIX/PF_LOCAL protocol family
> Aug 13 22:51:16 debian kernel: [    0.871362] Initializing XFRM netlink socket
> Aug 13 22:51:16 debian kernel: [    0.872171] NET: Registered PF_INET6 protocol family
> Aug 13 22:51:16 debian kernel: [    0.875791] NET: Registered PF_PACKET protocol family
> Aug 13 22:51:16 debian kernel: [    0.876932] 9pnet: Installing 9P2000 support
> Aug 13 22:51:16 debian kernel: [    0.887570] printk: console [netcon0] enabled
> Aug 13 22:51:16 debian kernel: [    0.888339] netconsole: network logging started
> Aug 13 22:51:16 debian kernel: [    0.943112] virtio_net virtio1 enp0s4: renamed from eth0
> Aug 13 22:51:16 debian kernel: [    1.876517] IPv6: ADDRCONF(NETDEV_CHANGE): enp0s4: link becomes ready
> [ ***  ] A start job is running for Raise network interfaces (6s / 5min)
> Aug 13 22:51:22 debian dhclient[349]: DHCPDISCOVER on enp0s4 to 255.255.255.255 port 67 interval 13
> [***   ] A start job is running for Raise network interfaces (19s / 5min)
> Aug 13 22:51:35 debian dhclient[349]: DHCPDISCOVER on enp0s4 to 255.255.255.255 port 67 interval 14
> [***   ] A start job is running for Raise network interfaces (33s / 5min)
> ...
>


Hi,

Sorry, I didn't get any valuable information from the logs, can you tell me how
to get such an image? Or how your [1] script is executed.

Thanks.


>
> Greetings,
>
> Andres Freund
>
>
> [1] https://github.com/anarazel/pg-vm-images/blob/main/packer/linux_debian.pkr.hcl#L225
Andres Freund Aug. 14, 2022, 3:52 a.m. UTC | #4
Hi,

On 2022-08-14 09:50:35 +0800, Xuan Zhuo wrote:
> Sorry, I didn't get any valuable information from the logs, can you tell me how
> to get such an image? Or how your [1] script is executed.

Is there specific information you'd like from the VM? I just recreated the
problem and can extract.


The last image that was built successfully is publicly available, so you
could create a gcp VM for that, go to /usr/src/linux, git pull, make & install
the new kernel and reproduce the problem that way.  The git pull will take a
bit because it's a shallow clone...

gcloud compute instances create myvm --preemptible --project your-gcp-project --image-project pg-ci-images --image pg-ci-sid-newkernel-2022-08-12t06-52 --zone us-west1-a --custom-cpu=4 --custom-memory=4 --metadata=serial-port-enable=true

If you want to log in via serial console, you'd have to set a password before
rebooting.

gcloud compute connect-to-serial-port --zone us-west1-a --project=pg-ci-images-dev myvm


Executing the script requires a gcp key with the right to create instances and
images. Here's how to invoke it:

PACKER_LOG=1 GOOGLE_APPLICATION_CREDENTIALS=~/image-builder@pg-ci-images-dev.iam.gserviceaccount.com.json \
  packer build \
    -var gcp_project=pg-ci-images-dev \
    -var "image_date=$(date --utc +'%Y-%m-%dt%H-%M')" \
    -var "task_name=sid-newkernel" \
    -only 'linux.googlecompute.sid-newkernel' \
    -on-error=ask \
    packer/linux_debian.pkr.hcl

Of course you'd need to change the gcp_project= variable to point to a
project you have access to and GOOGLE_APPLICATION_CREDENTIALS to point to your
gcp key.

Initially (package upgrades, kernel builds) the VM would be SSH
accessible. After building the kernel it's only accessible via serial console.


I can probably also get you the image in some other form that you prefer,
although I don't know if the problem will reproduce outside gcp. If helpful I
could upload a "broken" gcp image that you could use to


> > [1] https://github.com/anarazel/pg-vm-images/blob/main/packer/linux_debian.pkr.hcl#L225

Greetings,

Andres Freund
Andres Freund Aug. 14, 2022, 4:39 a.m. UTC | #5
Hi,

On 2022-08-13 20:52:39 -0700, Andres Freund wrote:
> Is there specific information you'd like from the VM? I just recreated the
> problem and can extract.

Actually, after reproducing I seem to now hit a likely different issue. I
guess I should have checked exactly the revision I had a problem with earlier,
rather than doing a git pull (up to aea23e7c464b)

[    0.727199] scsi host0: Virtio SCSI HBA
[    0.732257] scsi 0:0:1:0: Direct-Access     Google   PersistentDisk   1    PQ: 0 ANSI: 6
[    0.736259] Freeing initrd memory: 7236K
[    0.741743] sd 0:0:1:0: Attached scsi generic sg0 type 0
[    0.742569] sd 0:0:1:0: [sda] 52428800 512-byte logical blocks: (26.8 GB/25.0 GiB)
[    0.742628] tun: Universal TUN/TAP device driver, 1.6
[    0.743730] sd 0:0:1:0: [sda] 4096-byte physical blocks
[    0.748026] sd 0:0:1:0: [sda] Write Protect is off
[    0.750684] sd 0:0:1:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    0.795519] BUG: unable to handle page fault for address: ffffa3107bd80008
[    0.795753] sky2: driver version 1.30
[    0.796500] #PF: supervisor read access in kernel mode
[    0.797252] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
[    0.796500] #PF: error_code(0x0000) - not-present page
[    0.796500] PGD 100001067 P4D 100001067 PUD 0
[    0.796500] Oops: 0000 [#1] PREEMPT SMP PTI
[    0.796500] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.19.0-origin-14013-gaea23e7c464b #2
[    0.798728] ehci-pci: EHCI PCI platform driver
[    0.796500] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/29/2022
[    0.800112] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
[    0.796500] RIP: 0010:kmem_cache_free+0x155/0x3e0
[    0.801875] ohci-pci: OHCI PCI platform driver
[    0.796500] Code: 02 00 00 65 48 ff 08 e8 e9 cd e6 ff 66 90 8b 45 28 48 c7 04 03 00 00 00 00 48 85 db 74 38 48 8b 45 00 65 48 03 05 fb 13 34 6d <48> 8b 50 08 4c 39 60 10 0f 85 da 01 00 00 8b 4d 28 48 8b 00 48 89
[    0.803798] uhci_hcd: USB Universal Host Controller Interface driver
[    0.796500] RSP: 0000:ffffa29cc0134e80 EFLAGS: 00010286
[    0.805319] RAX: ffffa3107bd80000 RBX: ffff998840b253c0 RCX: ffff029c00000000
[    0.805319] RDX: 0000000000000000 RSI: ffffc8f280000000 RDI: ffff998840ab2300
[    0.805319] RBP: ffff998840ab2300 R08: fffffffffff0bddf R09: 0000000000000008
[    0.805319] R10: ffffffff93e060c0 R11: ffffa29cc0134ff8 R12: ffffc8f28402c940
[    0.805319] R13: ffffffff92f17edd R14: 0000000000001000 R15: 0000000000001000
[    0.805319] FS:  0000000000000000(0000) GS:ffff99887bd80000(0000) knlGS:0000000000000000
[    0.805319] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.805319] CR2: ffffa3107bd80008 CR3: 000000002720c001 CR4: 00000000003706e0
[    0.805319] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    0.805319] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    0.805319] Call Trace:
[    0.805319]  <IRQ>
[    0.805319]  blk_update_request+0xfd/0x3d0
[    0.805319]  ? detach_buf_split+0x6a/0x150
[    0.805319]  scsi_end_request+0x22/0x1b0
[    0.805319]  scsi_io_completion+0x3c/0x750
[    0.805319]  blk_complete_reqs+0x38/0x50
[    0.805319]  __do_softirq+0xe1/0x2ed
[    0.805319]  ? handle_edge_irq+0x9a/0x230
[    0.805319]  __irq_exit_rcu+0xa6/0x100
[    0.805319]  common_interrupt+0xa5/0xc0
[    0.805319]  </IRQ>
[    0.805319]  <TASK>
[    0.805319]  asm_common_interrupt+0x22/0x40
[    0.805319] RIP: 0010:acpi_idle_do_entry+0x46/0x60
[    0.805319] Code: 75 08 48 8b 15 2f 1a 19 01 ed c3 cc cc cc cc 65 48 8b 04 25 00 ad 01 00 48 8b 00 a8 08 75 eb 66 90 0f 00 2d 9c 0d 5b 00 fb f4 <fa> c3 cc cc cc cc e9 2f fd ff ff 66 66 2e 0f 1f 84 00 00 00 00 00
[    0.805319] RSP: 0000:ffffa29cc00a7e68 EFLAGS: 00000246
[    0.805319] RAX: 0000000000004000 RBX: 0000000000000001 RCX: 000000000000098d
[    0.805319] RDX: ffff99887bd80000 RSI: ffff998840b2c000 RDI: ffff998840b2c064
[    0.805319] RBP: ffff998841a2a400 R08: fffffffffff0be0e R09: 0000000157c1aaba
[    0.805319] R10: 0000000000000018 R11: 0000000000000c27 R12: ffffffff93fc46a0
[    0.805319] R13: ffff998840b2c064 R14: 0000000000000001 R15: 0000000000000000
[    0.805319]  acpi_idle_enter+0x9f/0x100
[    0.805319]  cpuidle_enter_state+0x84/0x400
[    0.805319]  cpuidle_enter+0x24/0x40
[    0.805319]  do_idle+0x1df/0x260
[    0.805319]  cpu_startup_entry+0x14/0x20
[    0.805319]  start_secondary+0xe8/0xf0
[    0.805319]  secondary_startup_64_no_verify+0xe0/0xeb
[    0.805319]  </TASK>
[    0.805319] Modules linked in:
[    0.805319] CR2: ffffa3107bd80008
[    0.805319] ---[ end trace 0000000000000000 ]---

Regards,

Andres
Michael S. Tsirkin Aug. 14, 2022, 8:59 a.m. UTC | #6
On Sat, Aug 13, 2022 at 09:39:06PM -0700, Andres Freund wrote:
> Hi,
> 
> On 2022-08-13 20:52:39 -0700, Andres Freund wrote:
> > Is there specific information you'd like from the VM? I just recreated the
> > problem and can extract.
> 
> Actually, after reproducing I seem to now hit a likely different issue. I
> guess I should have checked exactly the revision I had a problem with earlier,
> rather than doing a git pull (up to aea23e7c464b)

Looks like there's a generic memory corruption so it crashes
in random places. Would bisect be possible for you?
Andres Freund Aug. 14, 2022, 7:40 p.m. UTC | #7
Hi,

On 2022-08-14 04:59:48 -0400, Michael S. Tsirkin wrote:
> On Sat, Aug 13, 2022 at 09:39:06PM -0700, Andres Freund wrote:
> > Hi,
> > 
> > On 2022-08-13 20:52:39 -0700, Andres Freund wrote:
> > > Is there specific information you'd like from the VM? I just recreated the
> > > problem and can extract.
> > 
> > Actually, after reproducing I seem to now hit a likely different issue. I
> > guess I should have checked exactly the revision I had a problem with earlier,
> > rather than doing a git pull (up to aea23e7c464b)
> 
> Looks like there's a generic memory corruption so it crashes
> in random places.

Either a generic memory corruption, or something wrong with IO.

> Would bisect be possible for you?

I'll give it a go.

Greetings,

Andres Freund
Andres Freund Aug. 15, 2022, 7:02 a.m. UTC | #8
Hi,

On 2022-08-14 12:40:31 -0700, Andres Freund wrote:
> On 2022-08-14 04:59:48 -0400, Michael S. Tsirkin wrote:
> > On Sat, Aug 13, 2022 at 09:39:06PM -0700, Andres Freund wrote:
> > > Hi,
> > >
> > > On 2022-08-13 20:52:39 -0700, Andres Freund wrote:
> > > > Is there specific information you'd like from the VM? I just recreated the
> > > > problem and can extract.
> > >
> > > Actually, after reproducing I seem to now hit a likely different issue. I
> > > guess I should have checked exactly the revision I had a problem with earlier,
> > > rather than doing a git pull (up to aea23e7c464b)
> >
> > Looks like there's a generic memory corruption so it crashes
> > in random places.
>
> Either a generic memory corruption, or something wrong with IO.
>
> > Would bisect be possible for you?
>
> I'll give it a go.

Bisect points to

commit 762faee5a2678559d3dc09d95f8f2c54cd0466a7 (refs/bisect/bad)
Author: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Date:   Mon Aug 1 14:38:57 2022 +0800

    virtio_net: set the default max ring size by find_vqs()

    Use virtio_find_vqs_ctx_size() to specify the maximum ring size of tx,
    rx at the same time.

                             | rx/tx ring size
    -------------------------------------------
    speed == UNKNOWN or < 10G| 1024
    speed < 40G              | 4096
    speed >= 40G             | 8192

    Call virtnet_update_settings() once before calling init_vqs() to update
    speed.

    Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
    Acked-by: Jason Wang <jasowang@redhat.com>
    Message-Id: <20220801063902.129329-38-xuanzhuo@linux.alibaba.com>
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
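
To make the table in that commit message concrete, here is a minimal userspace
sketch in C of the speed-to-ring-size mapping it describes. This is not the
driver code: the function name, the constant names and the assumption that
speed is given in Mb/s with -1 meaning "unknown" are all made up for
illustration. Only the thresholds and the resulting sizes come from the table.

```c
#include <assert.h>

/* Hypothetical constants for this sketch, in Mb/s; not the kernel's names. */
#define SKETCH_SPEED_UNKNOWN (-1)
#define SKETCH_SPEED_10G     10000
#define SKETCH_SPEED_40G     40000

/*
 * Map a link speed to the maximum rx/tx ring size, following the table in
 * the commit message:
 *   speed == UNKNOWN or < 10G -> 1024
 *   speed < 40G               -> 4096
 *   speed >= 40G              -> 8192
 */
unsigned int sketch_max_ring_size(int speed_mbps)
{
	if (speed_mbps == SKETCH_SPEED_UNKNOWN || speed_mbps < SKETCH_SPEED_10G)
		return 1024;
	if (speed_mbps < SKETCH_SPEED_40G)
		return 4096;
	return 8192;
}

/* Spot-check one value per row of the table, plus the two boundaries. */
void sketch_check_table(void)
{
	assert(sketch_max_ring_size(SKETCH_SPEED_UNKNOWN) == 1024);
	assert(sketch_max_ring_size(1000) == 1024);
	assert(sketch_max_ring_size(SKETCH_SPEED_10G) == 4096);
	assert(sketch_max_ring_size(25000) == 4096);
	assert(sketch_max_ring_size(SKETCH_SPEED_40G) == 8192);
}
```

Under these assumptions, a device that reports no speed or a speed below 10G
would keep a 1024-entry ring, so a larger ring is only requested when a speed
of 10G or more is reported.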


I'm not 100% confident yet, because the likelihood of encountering problems
was not uniform across the versions, with one of them showing the problem only
in 1/3 boots, whereas some of the others showed it 100% of the time. But I've
rebooted enough times to be fairly confident.

With 762faee5a267 I reliably see network not connecting, with
762faee5a267^=fe3dc04e31aa I haven't seen a problem yet.


I did see some other types of crashes in commits nearby, so this might not be
the only problematic bit. See also the discussion around
https://lore.kernel.org/all/CAHk-=wikzU4402P-FpJRK_QwfVOS+t-3p1Wx5awGHTvr-s_0Ew@mail.gmail.com/

Greetings,

Andres Freund