diff mbox

[1/2] mmc: bcm2835: reset host on timeout

Message ID 97593d6e1a41af1baff61f7d9e6e68a450fc9da6.1518619058.git.msuchanek@suse.de (mailing list archive)
State New, archived
Headers show

Commit Message

Michal Suchánek Feb. 14, 2018, 2:38 p.m. UTC
The bcm2835 mmc host tends to lock up for unknown reason so reset it on
timeout. The upper mmc block layer tries retransimitting with single
blocks which tends to work out after a long wait.

This is better than giving up and leaving the machine broken for no
obvious reason.

Signed-off-by: Michal Suchanek <msuchanek@suse.de>
---
 drivers/mmc/host/bcm2835.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Stefan Wahren Feb. 14, 2018, 2:58 p.m. UTC | #1
Hi Michal,

Am 14.02.2018 um 15:38 schrieb Michal Suchanek:
> The bcm2835 mmc host tends to lock up for unknown reason so reset it on
> timeout. The upper mmc block layer tries retransimitting with single
> blocks which tends to work out after a long wait.
>
> This is better than giving up and leaving the machine broken for no
> obvious reason.

could you please provide more information about this issue (affected
hardware, kernel config, version, dmesg, reproducible scenario)?
Michal Suchánek Feb. 14, 2018, 3:36 p.m. UTC | #2
On Wed, 14 Feb 2018 15:58:31 +0100
Stefan Wahren <stefan.wahren@i2se.com> wrote:

> Hi Michal,
> 
> Am 14.02.2018 um 15:38 schrieb Michal Suchanek:
> > The bcm2835 mmc host tends to lock up for unknown reason so reset
> > it on timeout. The upper mmc block layer tries retransimitting with
> > single blocks which tends to work out after a long wait.
> >
> > This is better than giving up and leaving the machine broken for no
> > obvious reason.  
> 
> could you please provide more information about this issue (affected
> hardware, kernel config, version, dmesg, reproducible scenario)?
> 

The RPi3 is known to not work with some SD cards. You can find some
wiki pages with large tables of known-working and known-broken cards. I
have a couple of RPi3 boards and a card that works and card that does
not. I tried debugging the issue but did not find anything I can do
about it - AFAICT the issue happens somewhere inside the MMC controller
IP. 

I have no inside knowledge of the controller in question but during
testing I tried to reset the controller whenever the issue happens so I
can continue running the test system for a longer time until it gets
unusable. While I did not find any solution to the problem the
workaround with resetting the controller works quite reliably for me.

So I am posting it in the hope that people with the wrong combination
of RPi3 and SD card will not get a blank screen but rather a system
that boots but tends to lock up for half a minute occasionally.

Thanks

Michal
Michal Suchánek Feb. 14, 2018, 4:13 p.m. UTC | #3
On Wed, 14 Feb 2018 16:36:49 +0100
Michal Suchánek <msuchanek@suse.de> wrote:

> On Wed, 14 Feb 2018 15:58:31 +0100
> Stefan Wahren <stefan.wahren@i2se.com> wrote:
> 
> > Hi Michal,
> > 
> > Am 14.02.2018 um 15:38 schrieb Michal Suchanek:  
> > > The bcm2835 mmc host tends to lock up for unknown reason so reset
> > > it on timeout. The upper mmc block layer tries retransimitting
> > > with single blocks which tends to work out after a long wait.
> > >
> > > This is better than giving up and leaving the machine broken for
> > > no obvious reason.    
> > 
> > could you please provide more information about this issue (affected
> > hardware, kernel config, version, dmesg, reproducible scenario)?
> >   
> 

It tends to reproduce when upgrading a few packages with zypper and
otherwise at random during system operation. It seems that for my card
it worsens with age to some degree so perhaps it depends on the
fragmentation of the internal card flash.

Attaching dmesg and kernel config.

Thanks

Michal
[    0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd034]
[    0.000000] Linux version 4.15.0-rc4-3.g809d3f7-default (geeko@buildhost) (gcc version 7.2.1 20171020 [gcc-7-branch revision 253932] (SUSE Linux)) #1 SMP Fri Dec 22 10:26:00 UTC 2017 (809d3f7)
[    0.000000] Machine model: Raspberry Pi 3 Model B
[    0.000000] efi: Getting EFI parameters from FDT:
[    0.000000] efi: EFI v2.05 by Das U-boot
[    0.000000] efi:  SMBIOS=0x35b0e000 
[    0.000000] cma: Reserved 256 MiB at 0x0000000015400000
[    0.000000] NUMA: No NUMA configuration found
[    0.000000] NUMA: Faking a node at [mem 0x0000000000000000-0x0000000036ffffff]
[    0.000000] NUMA: NODE_DATA [mem 0x36fe0300-0x36fe1dff]
[    0.000000] Zone ranges:
[    0.000000]   DMA      [mem 0x0000000000000000-0x0000000036ffffff]
[    0.000000]   Normal   empty
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000000000000-0x0000000000000fff]
[    0.000000]   node   0: [mem 0x0000000000001000-0x0000000035b0dfff]
[    0.000000]   node   0: [mem 0x0000000035b0e000-0x0000000035b0efff]
[    0.000000]   node   0: [mem 0x0000000035b0f000-0x0000000036f86fff]
[    0.000000]   node   0: [mem 0x0000000036f87000-0x0000000036f87fff]
[    0.000000]   node   0: [mem 0x0000000036f88000-0x0000000036ffffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000000000000-0x0000000036ffffff]
[    0.000000] On node 0 totalpages: 225280
[    0.000000]   DMA zone: 3520 pages used for memmap
[    0.000000]   DMA zone: 0 pages reserved
[    0.000000]   DMA zone: 225280 pages, LIFO batch:31
[    0.000000] random: fast init done
[    0.000000] percpu: Embedded 25 pages/cpu @        (ptrval) s62616 r8192 d31592 u102400
[    0.000000] pcpu-alloc: s62616 r8192 d31592 u102400 alloc=25*4096
[    0.000000] pcpu-alloc: [0] 0 [0] 1 [0] 2 [0] 3 
[    0.000000] Detected VIPT I-cache on CPU0
[    0.000000] CPU features: enabling workaround for ARM erratum 845719
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 221760
[    0.000000] Policy zone: DMA
[    0.000000] Kernel command line: BOOT_IMAGE=/boot/Image-4.15.0-rc4-3.g809d3f7-default root=UUID=e70a1a15-cf94-46da-9a9c-5cb5abc296ad root=/dev/disk/by-id/mmc-00000_0x55353082-part2 disk=/dev/disk/by-id/mmc-00000_0x55353082 resume=/dev/disk/by-id/mmc-00000_0x55353082-part3 quiet splash=silent plymouth.enable=0 console=ttyS0,115200n8 console=tty cma=256MB
[    0.000000] Memory: 592660K/901120K available (9148K kernel code, 1596K rwdata, 4032K rodata, 5056K init, 1028K bss, 46316K reserved, 262144K cma-reserved)
[    0.000000] Virtual kernel memory layout:
[    0.000000]     modules : 0xffff000000000000 - 0xffff000008000000   (   128 MB)
[    0.000000]     vmalloc : 0xffff000008000000 - 0xffff7dffbfff0000   (129022 GB)
[    0.000000]       .text : 0x        (ptrval) - 0x        (ptrval)   (  9152 KB)
[    0.000000]     .rodata : 0x        (ptrval) - 0x        (ptrval)   (  4096 KB)
[    0.000000]       .init : 0x        (ptrval) - 0x        (ptrval)   (  5056 KB)
[    0.000000]       .data : 0x        (ptrval) - 0x        (ptrval)   (  1597 KB)
[    0.000000]        .bss : 0x        (ptrval) - 0x        (ptrval)   (  1029 KB)
[    0.000000]     fixed   : 0xffff7dfffe7fb000 - 0xffff7dfffec00000   (  4116 KB)
[    0.000000]     PCI I/O : 0xffff7dfffee00000 - 0xffff7dffffe00000   (    16 MB)
[    0.000000]     vmemmap : 0xffff7e0000000000 - 0xffff800000000000   (  2048 GB maximum)
[    0.000000]               0xffff7e0000000000 - 0xffff7e0000dc0000   (    13 MB actual)
[    0.000000]     memory  : 0xffff800000000000 - 0xffff800037000000   (   880 MB)
[    0.000000] ftrace: allocating 34715 entries in 136 pages
[    0.000000] Hierarchical RCU implementation.
[    0.000000] 	RCU restricting CPUs from NR_CPUS=128 to nr_cpu_ids=4.
[    0.000000] 	Tasks RCU enabled.
[    0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=4
[    0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
[    0.000000] arch_timer: WARNING: Invalid trigger for IRQ2, assuming level low
[    0.000000] arch_timer: WARNING: Please fix your firmware
[    0.000000] arch_timer: cp15 timer(s) running at 19.20MHz (phys).
[    0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0x46d987e47, max_idle_ns: 440795202767 ns
[    0.000003] sched_clock: 56 bits at 19MHz, resolution 52ns, wraps every 4398046511078ns
[    0.000406] Console: colour dummy device 80x25
[    0.000419] console [tty0] enabled
[    0.000481] Calibrating delay loop (skipped), value calculated using timer frequency.. 38.40 BogoMIPS (lpj=192000)
[    0.000490] pid_max: default: 32768 minimum: 301
[    0.000616] Security Framework initialized
[    0.000689] AppArmor: AppArmor initialized
[    0.001776] Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes)
[    0.002273] Inode-cache hash table entries: 65536 (order: 7, 524288 bytes)
[    0.002324] Mount-cache hash table entries: 2048 (order: 2, 16384 bytes)
[    0.002346] Mountpoint-cache hash table entries: 2048 (order: 2, 16384 bytes)
[    0.003650] ASID allocator initialised with 65536 entries
[    0.003733] Hierarchical SRCU implementation.
[    0.005343] Remapping and enabling EFI services.
[    0.005387]   EFI remap 0x0000000035b0e000 =>         (ptrval)
[    0.005392]   EFI remap 0x0000000036f87000 =>         (ptrval)
[    0.005396]   EFI remap 0x000000003f100000 =>         (ptrval)
[    0.005710] smp: Bringing up secondary CPUs ...
[    0.006310] Detected VIPT I-cache on CPU1
[    0.006356] arch_timer: WARNING: Invalid trigger for IRQ2, assuming level low
[    0.006358] arch_timer: WARNING: Please fix your firmware
[    0.006367] CPU1: Booted secondary processor 0x0000000001 [0x410fd034]
[    0.007088] Detected VIPT I-cache on CPU2
[    0.007114] arch_timer: WARNING: Invalid trigger for IRQ2, assuming level low
[    0.007116] arch_timer: WARNING: Please fix your firmware
[    0.007123] CPU2: Booted secondary processor 0x0000000002 [0x410fd034]
[    0.007767] Detected VIPT I-cache on CPU3
[    0.007787] arch_timer: WARNING: Invalid trigger for IRQ2, assuming level low
[    0.007788] arch_timer: WARNING: Please fix your firmware
[    0.007796] CPU3: Booted secondary processor 0x0000000003 [0x410fd034]
[    0.007909] smp: Brought up 1 node, 4 CPUs
[    0.007914] SMP: Total of 4 processors activated.
[    0.007919] CPU features: detected feature: 32-bit EL0 Support
[    0.008136] CPU: All CPU(s) started at EL2
[    0.008160] alternatives: patching kernel code
[    0.008953] devtmpfs: initialized
[    0.013445] Registered cp15_barrier emulation handler
[    0.013457] Registered setend emulation handler
[    0.013777] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
[    0.013827] futex hash table entries: 1024 (order: 5, 131072 bytes)
[    0.021417] pinctrl core: initialized pinctrl subsystem
[    0.022352] SMBIOS 3.0 present.
[    0.022365] DMI: raspberrypi rpi/rpi, BIOS 2017.05 05/08/2017
[    0.022733] NET: Registered protocol family 16
[    0.024312] audit: initializing netlink subsys (disabled)
[    0.024490] audit: type=2000 audit(0.020:1): state=initialized audit_enabled=0 res=1
[    0.025368] cpuidle: using governor ladder
[    0.025466] cpuidle: using governor menu
[    0.025929] vdso: 2 pages (1 code @ 000000003ae86378, 1 data @ 000000000cb2f34c)
[    0.025944] hw-breakpoint: found 6 breakpoint and 4 watchpoint registers.
[    0.027455] DMA: preallocated 256 KiB pool for atomic allocations
[    0.027909] Serial: AMBA PL011 UART driver
[    0.030644] uart-pl011 3f201000.serial: could not find pctldev for node /soc/gpio@7e200000/uart0_ctsrts_gpio32, deferring probe
[    0.040637] HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages
[    0.041650] ACPI: Interpreter disabled.
[    0.042097] vgaarb: loaded
[    0.042495] SCSI subsystem initialized
[    0.042653] libata version 3.00 loaded.
[    0.042974] EDAC MC: Ver: 3.0.0
[    0.043086] EDAC DEBUG: edac_mc_sysfs_init: device mc created
[    0.043275] Registered efivars operations
[    0.043291] efivars: get_next_variable: status=8000000000000007
[    0.044025] NetLabel: Initializing
[    0.044029] NetLabel:  domain hash size = 128
[    0.044032] NetLabel:  protocols = UNLABELED CIPSOv4 CALIPSO
[    0.044095] NetLabel:  unlabeled traffic allowed by default
[    0.044397] clocksource: Switched to clocksource arch_sys_counter
[    0.131160] VFS: Disk quotas dquot_6.6.0
[    0.131284] VFS: Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[    0.132170] AppArmor: AppArmor Filesystem Enabled
[    0.132267] pnp: PnP ACPI: disabled
[    0.145293] NET: Registered protocol family 2
[    0.146114] TCP established hash table entries: 8192 (order: 4, 65536 bytes)
[    0.146286] TCP bind hash table entries: 8192 (order: 5, 131072 bytes)
[    0.146494] TCP: Hash tables configured (established 8192 bind 8192)
[    0.146775] UDP hash table entries: 512 (order: 2, 16384 bytes)
[    0.146834] UDP-Lite hash table entries: 512 (order: 2, 16384 bytes)
[    0.147174] NET: Registered protocol family 1
[    0.147217] PCI: CLS 0 bytes, default 128
[    0.147622] Unpacking initramfs...
[    3.601894] Freeing initrd memory: 6356K
[    3.602549] kvm [1]: 8-bit VMID
[    3.602553] kvm [1]: IDMAP page: 962000
[    3.602557] kvm [1]: HYP VA range: 800000000000:ffffffffffff
[    3.603309] kvm [1]: Invalid trigger for IRQ4, assuming level low
[    3.603351] kvm [1]: kvm_arch_timer: error setting vcpu affinity
[    3.604867] Initialise system trusted keyrings
[    3.605060] workingset: timestamp_bits=44 max_order=18 bucket_order=0
[    3.605244] zbud: loaded
[    3.609652] Key type asymmetric registered
[    3.609661] Asymmetric key parser 'x509' registered
[    3.609785] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 248)
[    3.609906] io scheduler noop registered
[    3.609910] io scheduler deadline registered
[    3.609964] io scheduler cfq registered (default)
[    3.609968] io scheduler kyber registered
[    3.609992] io scheduler bfq registered
[    3.614293] efifb: probing for efifb
[    3.614484] efifb: framebuffer at 0x3d513000, using 7012k, total 7011k
[    3.614489] efifb: mode is 1824x984x32, linelength=7296, pages=1
[    3.614491] efifb: scrolling: redraw
[    3.614496] efifb: Truecolor: size=8:8:8:8, shift=24:0:8:16
[    3.636406] Console: switching to colour frame buffer device 228x61
[    3.657761] fb0: EFI VGA frame buffer device
[    3.662864] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
[    3.664499] console [ttyS0] disabled
[    3.664530] 3f215040.serial: ttyS0 at MMIO 0x0 (irq = 61, base_baud = 31249999) is a 16550
[    3.677483] console [ttyS0] enabled
[    3.678666] Serial: AMBA driver
[    3.679257] msm_serial: driver initialized
[    3.680541] cacheinfo: Unable to detect cache hierarchy for CPU 0
[    3.681565] libphy: Fixed MDIO Bus: probed
[    3.682183] mousedev: PS/2 mouse device common for all mice
[    3.685033] ledtrig-cpu: registered to indicate activity on CPUs
[    3.685388] EFI Variables Facility v0.08 2004-May-17
[    3.685417] efivars: get_next_variable: status=8000000000000007
[    3.685519] hidraw: raw HID events driver (C) Jiri Kosina
[    3.685905] bcm2835-mbox 3f00b880.mailbox: mailbox enabled
[    3.687973] NET: Registered protocol family 10
[    3.713254] Segment Routing with IPv6
[    3.714810] registered taskstats version 1
[    3.714833] Loading compiled-in X.509 certificates
[    3.714965] zswap: loaded using pool lzo/zbud
[    3.737457] Key type big_key registered
[    3.748614] Key type encrypted registered
[    3.748635] AppArmor: AppArmor sha1 policy hashing enabled
[    3.753655] 3f201000.serial: ttyAMA0 at MMIO 0x3f201000 (irq = 72, base_baud = 0) is a PL011 rev2
[    3.756119] raspberrypi-firmware soc:firmware: Attached to firmware from 2016-09-02 11:49
[    3.758185] hctosys: unable to open rtc device (rtc0)
[    3.786151] Freeing unused kernel memory: 5056K
[    3.811907] efivars: get_next_variable: status=8000000000000007
[    3.820125] systemd[1]: systemd 228 running in system mode. (+PAM -AUDIT +SELINUX -IMA +APPARMOR -SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT -GNUTLS +ACL +XZ -LZ4 +SECCOMP +BLKID -ELFUTILS +KMOD -IDN)
[    3.821298] systemd[1]: Detected architecture arm64.
[    3.821318] systemd[1]: Running in initial RAM disk.
[    3.821467] systemd[1]: Set hostname to <linux>.
[    4.097464] systemd[1]: Listening on Journal Socket.
[    4.097821] systemd[1]: Reached target Timers.
[    4.098344] systemd[1]: Listening on Journal Audit Socket.
[    4.098635] systemd[1]: Listening on udev Control Socket.
[    4.098976] systemd[1]: Started Dispatch Password Requests to Console Directory Watch.
[    4.099046] systemd[1]: Reached target Paths.
[    5.004609] sdhost-bcm2835 3f202000.sdhost: unable to initialise DMA channel. Falling back to PIO
[    5.154518] sdhost-bcm2835 3f202000.sdhost: loaded - DMA disabled
[    5.155215] sdhost-bcm2835 3f202000.sdhost: resetting
[    5.281198] mmc0: host does not support reading read-only switch, assuming write-enable
[    5.282971] mmc0: new high speed SDHC card at address 0001
[    5.292431] bounce: isa pool size: 16 pages
[    5.292649] mmcblk0: mmc0:0001 00000 14.6 GiB 
[    5.295681]  mmcblk0: p1 p2 p3
[    5.578269] PM: Starting manual resume from disk
[    5.579100] PM: Image not found (code -22)
[    5.801684] random: crng init done
[    5.807429] EXT4-fs (mmcblk0p2): mounted filesystem with ordered data mode. Opts: (null)
[    6.422364] systemd-journald[150]: Received SIGTERM from PID 1 (systemd).
[    6.514245] systemd: 16 output lines suppressed due to ratelimiting
[    7.933350] EXT4-fs (mmcblk0p2): re-mounted. Opts: nobarrier
[    8.269253] systemd-journald[327]: Received request to flush runtime journal from PID 1
[    9.872387] sdhci: Secure Digital Host Controller Interface driver
[    9.872402] sdhci: Copyright(c) Pierre Ossman
[    9.886280] sdhci-pltfm: SDHCI platform and OF driver helper
[    9.984829] mmc1: SDHCI controller on 3f300000.sdhci [3f300000.sdhci] using PIO
[    9.988368] i2c-bcm2835 3f805000.i2c: Could not read clock-frequency property
[   10.082207] usbcore: registered new interface driver usbfs
[   10.082322] usbcore: registered new interface driver hub
[   10.082551] usbcore: registered new device driver usb
[   10.966182] bcm2835-rng 3f104000.rng: hwrng registered
[   10.967827] bcm2835-wdt 3f100000.watchdog: Broadcom BCM2835 watchdog timer
[   10.969164] dwc2 3f980000.usb: 3f980000.usb supply vusb_d not found, using dummy regulator
[   10.969285] dwc2 3f980000.usb: 3f980000.usb supply vusb_a not found, using dummy regulator
[   11.020617] mmc1: queuing unknown CIS tuple 0x80 (2 bytes)
[   11.041623] mmc1: queuing unknown CIS tuple 0x80 (3 bytes)
[   11.045995] mmc1: queuing unknown CIS tuple 0x80 (3 bytes)
[   11.057082] Adding 505172k swap on /dev/mmcblk0p3.  Priority:-2 extents:1 across:505172k SSFS
[   11.058565] mmc1: queuing unknown CIS tuple 0x80 (7 bytes)
[   11.118179] FAT-fs (mmcblk0p1): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
[   11.268064] mmc1: new high speed SDIO card at address 0001
[   11.434521] dwc2 3f980000.usb: dwc2_check_params: Invalid parameter g_np_tx_fifo_size=1024
[   11.434579] dwc2 3f980000.usb: EPs: 8, dedicated fifos, 4080 entries in SPRAM
[   11.436247] dwc2 3f980000.usb: DWC OTG Controller
[   11.436311] dwc2 3f980000.usb: new USB bus registered, assigned bus number 1
[   11.436387] dwc2 3f980000.usb: irq 41, io mem 0x3f980000
[   11.437127] usb usb1: New USB device found, idVendor=1d6b, idProduct=0002
[   11.437140] usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[   11.437150] usb usb1: Product: DWC OTG Controller
[   11.437160] usb usb1: Manufacturer: Linux 4.15.0-rc4-3.g809d3f7-default dwc2_hsotg
[   11.437169] usb usb1: SerialNumber: 3f980000.usb
[   11.438243] hub 1-0:1.0: USB hub found
[   11.438318] hub 1-0:1.0: 1 port detected
[   11.538017] cfg80211: Loading compiled-in X.509 certificates for regulatory database
[   11.544560] Registered IR keymap rc-cec
[   11.544897] rc rc0: RC for vc4 as /devices/platform/soc/3f902000.hdmi/rc/rc0
[   11.545210] input: RC for vc4 as /devices/platform/soc/3f902000.hdmi/rc/rc0/input0
[   11.545926] vc4_hdmi 3f902000.hdmi: 'dmas' DT property is missing, no HDMI audio
[   11.546111] vc4-drm soc:gpu: bound 3f902000.hdmi (ops vc4_hdmi_ops [vc4])
[   11.546309] vc4-drm soc:gpu: bound 3f400000.hvs (ops vc4_hvs_ops [vc4])
[   11.546792] vc4-drm soc:gpu: bound 3f206000.pixelvalve (ops vc4_crtc_ops [vc4])
[   11.547140] vc4-drm soc:gpu: bound 3f207000.pixelvalve (ops vc4_crtc_ops [vc4])
[   11.547478] vc4-drm soc:gpu: bound 3f807000.pixelvalve (ops vc4_crtc_ops [vc4])
[   11.569784] cfg80211: Loaded X.509 cert 'sforshee: 00b28ddf47aef9cea7'
[   11.578534] vc4-drm soc:gpu: bound 3fc00000.v3d (ops vc4_v3d_ops [vc4])
[   11.578555] checking generic (3d513000 6d9000) vs hw (0 ffffffffffffffff)
[   11.578563] fb: switching to vc4drmfb from EFI VGA
[   11.578697] Console: switching to colour dummy device 80x25
[   11.580626] [drm] Initialized vc4 0.0.0 20140616 for soc:gpu on minor 0
[   11.580647] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[   11.580654] [drm] Driver supports precise vblank timestamp query.
[   11.586754] platform regulatory.0: Direct firmware load for regulatory.db failed with error -2
[   11.586779] cfg80211: failed to load regulatory.db
[   11.627445] Console: switching to colour frame buffer device 240x67
[   11.696464] vc4-drm soc:gpu: fb0:  frame buffer device
[   11.762819] brcmfmac: F1 signature read @0x18000000=0x1541a9a6
[   11.787963] brcmfmac: brcmf_fw_map_chip_to_name: using brcm/brcmfmac43430-sdio.bin for chip 0x00a9a6(43430) rev 0x000001
[   11.799858] usbcore: registered new interface driver brcmfmac
[   11.894493] usb 1-1: new high-speed USB device number 2 using dwc2
[   12.149147] usb 1-1: New USB device found, idVendor=0424, idProduct=9514
[   12.149168] usb 1-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[   12.152659] hub 1-1:1.0: USB hub found
[   12.176530] hub 1-1:1.0: 5 ports detected
[   12.181770] brcmfmac mmc1:0001:1: Direct firmware load for brcm/brcmfmac43430-sdio.clm_blob failed with error -2
[   12.182737] brcmfmac: brcmf_c_preinit_dcmds: Firmware version = wl0: Aug 29 2016 20:48:16 version 7.45.41.26 (r640327) FWID 01-4527cfab
[   12.544537] usb 1-1.1: new high-speed USB device number 3 using dwc2
[   12.703680] usb 1-1.1: New USB device found, idVendor=0424, idProduct=ec00
[   12.703709] usb 1-1.1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[   12.824488] usb 1-1.4: new low-speed USB device number 4 using dwc2
[   12.987449] usb 1-1.4: New USB device found, idVendor=0b38, idProduct=0003
[   12.987472] usb 1-1.4: New USB device strings: Mfr=0, Product=1, SerialNumber=0
[   12.987481] usb 1-1.4: Product: USB-compliant keyboard
[   13.090170] smsc95xx v1.0.6
[   13.308320] smsc95xx 1-1.1:1.0 eth0: register 'smsc95xx' at usb-3f980000.usb-1.1, smsc95xx USB 2.0 Ethernet, b8:27:eb:69:dc:65
[   13.308696] usbcore: registered new interface driver smsc95xx
[   13.357161] usbcore: registered new interface driver usbhid
[   13.357178] usbhid: USB HID core driver
[   13.426562] input: USB-compliant keyboard as /devices/platform/soc/3f980000.usb/usb1/1-1/1-1.4/1-1.4:1.0/0003:0B38:0003.0001/input/input1
[   13.505918] hid-generic 0003:0B38:0003.0001: input,hidraw0: USB HID v1.10 Keyboard [USB-compliant keyboard] on usb-3f980000.usb-1.4/input0
[   13.509938] input: USB-compliant keyboard as /devices/platform/soc/3f980000.usb/usb1/1-1/1-1.4/1-1.4:1.1/0003:0B38:0003.0002/input/input2
[   13.575767] hid-generic 0003:0B38:0003.0002: input,hiddev96,hidraw1: USB HID v1.10 Mouse [USB-compliant keyboard] on usb-3f980000.usb-1.4/input1
[   14.004312] smsc95xx 1-1.1:1.0 eth0: hardware isn't capable of remote wakeup
[   14.004737] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[   15.424561] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[   15.426627] smsc95xx 1-1.1:1.0 eth0: link up, 100Mbps, full-duplex, lpa 0x45E1
[   15.809012] NET: Registered protocol family 17
[   34.501723] device-mapper: uevent: version 1.0.3
[   34.502130] device-mapper: ioctl: 4.37.0-ioctl (2017-09-20) initialised: dm-devel@redhat.com
[193799.841782] sdhost-bcm2835 3f202000.sdhost: timeout waiting for hardware interrupt.
[193799.849662] sdhost-bcm2835 3f202000.sdhost: resetting
[193810.081869] sdhost-bcm2835 3f202000.sdhost: timeout waiting for hardware interrupt.
[193810.089748] sdhost-bcm2835 3f202000.sdhost: resetting
[193810.161913] mmcblk0: error -110 sending stop command, original cmd response 0x900, card status 0xe00
[194410.176784] mmc0: Card stuck in programming state! mmcblk0 card_busy_detect
[194424.413063] systemd-coredump[16444]: Detected coredump of the journal daemon itself, diverted to /var/lib/systemd/coredump/core.systemd-journal.0.23d8a3f87c19473e93c439146450854f.327.1514410590000000.xz.
[194427.366715] systemd-journald[16449]: File /var/log/journal/4e6350529148997bd46d32525977db97/system.journal corrupted or uncleanly shut down, renaming and replacing.
[194456.487192] sdhost-bcm2835 3f202000.sdhost: timeout waiting for hardware interrupt.
[194456.495070] sdhost-bcm2835 3f202000.sdhost: resetting
[194466.727276] sdhost-bcm2835 3f202000.sdhost: timeout waiting for hardware interrupt.
[194466.735156] sdhost-bcm2835 3f202000.sdhost: resetting
[194466.807341] mmcblk0: error -110 sending stop command, original cmd response 0x900, card status 0xe00
[195066.822193] mmc0: Card stuck in programming state! mmcblk0 card_busy_detect
[195077.932349] sdhost-bcm2835 3f202000.sdhost: timeout waiting for hardware interrupt.
[195077.940232] sdhost-bcm2835 3f202000.sdhost: resetting
[195088.172423] sdhost-bcm2835 3f202000.sdhost: timeout waiting for hardware interrupt.
[195088.180303] sdhost-bcm2835 3f202000.sdhost: resetting
[195088.252458] mmcblk0: error -110 sending stop command, original cmd response 0x900, card status 0xe00
[195688.267335] mmc0: Card stuck in programming state! mmcblk0 card_busy_detect
[195698.737457] sdhost-bcm2835 3f202000.sdhost: timeout waiting for hardware interrupt.
[195698.745333] sdhost-bcm2835 3f202000.sdhost: resetting
[195708.977536] sdhost-bcm2835 3f202000.sdhost: timeout waiting for hardware interrupt.
[195708.985415] sdhost-bcm2835 3f202000.sdhost: resetting
[195709.057590] mmcblk0: error -110 sending stop command, original cmd response 0x900, card status 0xe00
[196309.072491] mmc0: Card stuck in programming state! mmcblk0 card_busy_detect
[196310.712454] systemd-journald[16675]: File /var/log/journal/4e6350529148997bd46d32525977db97/system.journal corrupted or uncleanly shut down, renaming and replacing.
[196310.798076] systemd-journald[16675]: Received SIGTERM from PID 1 (systemd).
[196418.743407] sdhost-bcm2835 3f202000.sdhost: timeout waiting for hardware interrupt.
[196418.751286] sdhost-bcm2835 3f202000.sdhost: resetting
[196428.983516] sdhost-bcm2835 3f202000.sdhost: timeout waiting for hardware interrupt.
[196428.991396] sdhost-bcm2835 3f202000.sdhost: resetting
[196429.063555] mmcblk0: error -110 sending stop command, original cmd response 0x900, card status 0xe00
[197029.078433] mmc0: Card stuck in programming state! mmcblk0 card_busy_detect
[197047.094867] systemd-journald[17382]: File /var/log/journal/4e6350529148997bd46d32525977db97/system.journal corrupted or uncleanly shut down, renaming and replacing.
[197092.668986] sdhost-bcm2835 3f202000.sdhost: timeout waiting for hardware interrupt.
[197092.676866] sdhost-bcm2835 3f202000.sdhost: resetting
[197102.909072] sdhost-bcm2835 3f202000.sdhost: timeout waiting for hardware interrupt.
[197102.916950] sdhost-bcm2835 3f202000.sdhost: resetting
[197102.989141] mmcblk0: error -110 sending stop command, original cmd response 0x900, card status 0xe00
[197703.003876] mmc0: Card stuck in programming state! mmcblk0 card_busy_detect
[197720.968609] systemd-journald[17435]: File /var/log/journal/4e6350529148997bd46d32525977db97/system.journal corrupted or uncleanly shut down, renaming and replacing.
[197770.434450] sdhost-bcm2835 3f202000.sdhost: timeout waiting for hardware interrupt.
[197770.442330] sdhost-bcm2835 3f202000.sdhost: resetting
[197780.674532] sdhost-bcm2835 3f202000.sdhost: timeout waiting for hardware interrupt.
[197780.682413] sdhost-bcm2835 3f202000.sdhost: resetting
[197780.754588] mmcblk0: error -110 sending stop command, original cmd response 0x900, card status 0xe00
[198380.769309] mmc0: Card stuck in programming state! mmcblk0 card_busy_detect
[198386.484664] systemd-journald[17528]: File /var/log/journal/4e6350529148997bd46d32525977db97/system.journal corrupted or uncleanly shut down, renaming and replacing.
[198414.279613] sdhost-bcm2835 3f202000.sdhost: timeout waiting for hardware interrupt.
[198414.287495] sdhost-bcm2835 3f202000.sdhost: resetting
[198424.519689] sdhost-bcm2835 3f202000.sdhost: timeout waiting for hardware interrupt.
[198424.527565] sdhost-bcm2835 3f202000.sdhost: resetting
[198424.599743] mmcblk0: error -110 sending stop command, original cmd response 0x900, card status 0xe00
[199024.614283] mmc0: Card stuck in programming state! mmcblk0 card_busy_detect
[199027.830987] systemd-journald[17580]: File /var/log/journal/4e6350529148997bd46d32525977db97/system.journal corrupted or uncleanly shut down, renaming and replacing.
[199038.887620] systemd-coredump[17576]: Detected coredump of the journal daemon itself, diverted to /var/lib/systemd/coredump/core.systemd-journal.0.23d8a3f87c19473e93c439146450854f.17528.1514415208000000.xz.
[199147.085257] sdhost-bcm2835 3f202000.sdhost: timeout waiting for hardware interrupt.
[199147.093138] sdhost-bcm2835 3f202000.sdhost: resetting
[199157.325316] sdhost-bcm2835 3f202000.sdhost: timeout waiting for hardware interrupt.
[199157.333193] sdhost-bcm2835 3f202000.sdhost: resetting
[199157.405398] mmcblk0: error -110 sending stop command, original cmd response 0x900, card status 0xe00
[199757.419866] mmc0: Card stuck in programming state! mmcblk0 card_busy_detect
[199778.770075] sdhost-bcm2835 3f202000.sdhost: timeout waiting for hardware interrupt.
[199778.777955] sdhost-bcm2835 3f202000.sdhost: resetting
[199789.010157] sdhost-bcm2835 3f202000.sdhost: timeout waiting for hardware interrupt.
[199789.018037] sdhost-bcm2835 3f202000.sdhost: resetting
[199789.090191] mmcblk0: error -110 sending stop command, original cmd response 0x900, card status 0xe00
[200389.104663] mmc0: Card stuck in programming state! mmcblk0 card_busy_detect
[200392.241350] systemd-coredump[17677]: Detected coredump of the journal daemon itself, diverted to /var/lib/systemd/coredump/core.systemd-journal.0.23d8a3f87c19473e93c439146450854f.17580.1514415942000000.xz.
[200392.673512] systemd-journald[17701]: File /var/log/journal/4e6350529148997bd46d32525977db97/system.journal corrupted or uncleanly shut down, renaming and replacing.
[200494.295507] sdhost-bcm2835 3f202000.sdhost: timeout waiting for hardware interrupt.
[200494.303419] sdhost-bcm2835 3f202000.sdhost: resetting
[200504.535606] sdhost-bcm2835 3f202000.sdhost: timeout waiting for hardware interrupt.
[200504.543502] sdhost-bcm2835 3f202000.sdhost: resetting
[200504.615629] mmcblk0: error -110 sending stop command, original cmd response 0x900, card status 0xe00
[201104.630169] mmc0: Card stuck in programming state! mmcblk0 card_busy_detect
[201111.953630] systemd-journald[18099]: File /var/log/journal/4e6350529148997bd46d32525977db97/system.journal corrupted or uncleanly shut down, renaming and replacing.
[201138.780453] sdhost-bcm2835 3f202000.sdhost: timeout waiting for hardware interrupt.
[201138.788333] sdhost-bcm2835 3f202000.sdhost: resetting
[201149.020530] sdhost-bcm2835 3f202000.sdhost: timeout waiting for hardware interrupt.
[201149.028408] sdhost-bcm2835 3f202000.sdhost: resetting
[201149.100592] mmcblk0: error -110 sending stop command, original cmd response 0x900, card status 0xe00
[201749.115168] mmc0: Card stuck in programming state! mmcblk0 card_busy_detect
[201760.225312] sdhost-bcm2835 3f202000.sdhost: timeout waiting for hardware interrupt.
[201760.233191] sdhost-bcm2835 3f202000.sdhost: resetting
[201770.465375] sdhost-bcm2835 3f202000.sdhost: timeout waiting for hardware interrupt.
[201770.473255] sdhost-bcm2835 3f202000.sdhost: resetting
[201770.545422] mmcblk0: error -110 sending stop command, original cmd response 0x900, card status 0xe00
[202370.559771] mmc0: Card stuck in programming state! mmcblk0 card_busy_detect
[202371.553562] systemd-journald[18396]: File /var/log/journal/4e6350529148997bd46d32525977db97/system.journal corrupted or uncleanly shut down, renaming and replacing.
[202391.909940] sdhost-bcm2835 3f202000.sdhost: timeout waiting for hardware interrupt.
[202391.917818] sdhost-bcm2835 3f202000.sdhost: resetting
[202402.150013] sdhost-bcm2835 3f202000.sdhost: timeout waiting for hardware interrupt.
[202402.157891] sdhost-bcm2835 3f202000.sdhost: resetting
[202402.230081] mmcblk0: error -110 sending stop command, original cmd response 0x900, card status 0xe00
[203002.244336] mmc0: Card stuck in programming state! mmcblk0 card_busy_detect
[203006.584746] systemd-journald[19961]: File /var/log/journal/4e6350529148997bd46d32525977db97/system.journal corrupted or uncleanly shut down, renaming and replacing.
[203007.740753] systemd-coredump[19957]: Detected coredump of the journal daemon itself, diverted to /var/lib/systemd/coredump/core.systemd-journal.0.23d8a3f87c19473e93c439146450854f.18396.1514419187000000.xz.
[203508.078390] fuse init (API version 7.26)
Stefan Wahren Feb. 14, 2018, 4:49 p.m. UTC | #4
Hi Michal,

[add Phil]

Am 14.02.2018 um 17:13 schrieb Michal Suchánek:
> On Wed, 14 Feb 2018 16:36:49 +0100
> Michal Suchánek <msuchanek@suse.de> wrote:
>
>> On Wed, 14 Feb 2018 15:58:31 +0100
>> Stefan Wahren <stefan.wahren@i2se.com> wrote:
>>
>>> Hi Michal,
>>>
>>> Am 14.02.2018 um 15:38 schrieb Michal Suchanek:  
>>>> The bcm2835 mmc host tends to lock up for unknown reason so reset
>>>> it on timeout. The upper mmc block layer tries retransimitting
>>>> with single blocks which tends to work out after a long wait.
>>>>
>>>> This is better than giving up and leaving the machine broken for
>>>> no obvious reason.    
>>> could you please provide more information about this issue (affected
>>> hardware, kernel config, version, dmesg, reproducible scenario)?
>>>   
> It tends to reproduce when upgrading a few packages with zypper and
> otherwise at random during system operation. It seems that for my card
> it worsens with age to some degree so perhaps it depends on the
> fragmentation of the internal card flash.
>
> Attaching dmesg and kernel config.

do you noticed this issue before 4.15-rc4?

Could you please test with 4.15 final again?

What kind of SD card (name) triggers the issue?

Thanks Stefan

>
> Thanks
>
> Michal
Michal Suchánek Feb. 14, 2018, 7:24 p.m. UTC | #5
On Wed, 14 Feb 2018 17:49:31 +0100
Stefan Wahren <stefan.wahren@i2se.com> wrote:

> Hi Michal,
> 
> [add Phil]
> 
> Am 14.02.2018 um 17:13 schrieb Michal Suchánek:
> > On Wed, 14 Feb 2018 16:36:49 +0100
> > Michal Suchánek <msuchanek@suse.de> wrote:
> >  
> >> On Wed, 14 Feb 2018 15:58:31 +0100
> >> Stefan Wahren <stefan.wahren@i2se.com> wrote:
> >>  
> >>> Hi Michal,
> >>>
> >>> Am 14.02.2018 um 15:38 schrieb Michal Suchanek:    
> >>>> The bcm2835 mmc host tends to lock up for unknown reason so reset
> >>>> it on timeout. The upper mmc block layer tries retransimitting
> >>>> with single blocks which tends to work out after a long wait.
> >>>>
> >>>> This is better than giving up and leaving the machine broken for
> >>>> no obvious reason.      
> >>> could you please provide more information about this issue
> >>> (affected hardware, kernel config, version, dmesg, reproducible
> >>> scenario)? 
> > It tends to reproduce when upgrading a few packages with zypper and
> > otherwise at random during system operation. It seems that for my
> > card it worsens with age to some degree so perhaps it depends on the
> > fragmentation of the internal card flash.
> >
> > Attaching dmesg and kernel config.  
> 
> do you noticed this issue before 4.15-rc4?

I initially noticed it with 4.4 kernel with some backports to make it
bootable on RPi.
> 
> Could you please test with 4.15 final again?

Right, I can apply the patches on something more recent.

> 
> What kind of SD card (name) triggers the issue?

Samsung EVO MB-MP16D

Also see https://elinux.org/RPi_SD_cards#Which_SD_card.3F

Thanks

Michal
Stefan Wahren Feb. 14, 2018, 8:30 p.m. UTC | #6
Hi Michal,

> Michal Suchánek <msuchanek@suse.de> hat am 14. Februar 2018 um 20:24 geschrieben:
> 
> 
> On Wed, 14 Feb 2018 17:49:31 +0100
> Stefan Wahren <stefan.wahren@i2se.com> wrote:
> 
> > Hi Michal,
> > 
> > [add Phil]
> > 
> > Am 14.02.2018 um 17:13 schrieb Michal Suchánek:
> > > On Wed, 14 Feb 2018 16:36:49 +0100
> > > Michal Suchánek <msuchanek@suse.de> wrote:
> > >  
> > >> On Wed, 14 Feb 2018 15:58:31 +0100
> > >> Stefan Wahren <stefan.wahren@i2se.com> wrote:
> > >>  
> > >>> Hi Michal,
> > >>>
> > >>> Am 14.02.2018 um 15:38 schrieb Michal Suchanek:    
> > >>>> The bcm2835 mmc host tends to lock up for unknown reason so reset
> > >>>> it on timeout. The upper mmc block layer tries retransimitting
> > >>>> with single blocks which tends to work out after a long wait.
> > >>>>
> > >>>> This is better than giving up and leaving the machine broken for
> > >>>> no obvious reason.      
> > >>> could you please provide more information about this issue
> > >>> (affected hardware, kernel config, version, dmesg, reproducible
> > >>> scenario)? 
> > > It tends to reproduce when upgrading a few packages with zypper and
> > > otherwise at random during system operation. It seems that for my
> > > card it worsens with age to some degree so perhaps it depends on the
> > > fragmentation of the internal card flash.
> > >
> > > Attaching dmesg and kernel config.  
> > 
> > do you noticed this issue before 4.15-rc4?
> 
> I initially noticed it with 4.4 kernel with some backports to make it
> bootable on RPi.

this confuses me. Gerd and i ported this driver from downstream and finally it's got merged in 4.12. 

So do you mean that you backported the mainline version to 4.4 or the downstream version of 4.4?

On a quick look they seems identical, but they aren't.

> > 
> > Could you please test with 4.15 final again?
> 
> Right, I can apply the patches on something more recent.
> 
> > 
> > What kind of SD card (name) triggers the issue?
> 
> Samsung EVO MB-MP16D

Thanks

> 
> Also see https://elinux.org/RPi_SD_cards#Which_SD_card.3F

I'm very sceptical about this list. The card above is listed as OK and NOK. The experienced issues doesn't need to be direct related to the card (unproperly umounted, bad driver, ...).

Stefan

> 
> Thanks
> 
> Michal
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Stefan Wahren March 3, 2018, 1:58 p.m. UTC | #7
Hi Michal,

[add Stefan to CC]

> Michal Suchánek <msuchanek@suse.de> hat am 14. Februar 2018 um 20:24 geschrieben:
> 
> 
> On Wed, 14 Feb 2018 17:49:31 +0100
> Stefan Wahren <stefan.wahren@i2se.com> wrote:
> 
> > Hi Michal,
> > 
> > [add Phil]
> > 
> > Am 14.02.2018 um 17:13 schrieb Michal Suchánek:
> > > On Wed, 14 Feb 2018 16:36:49 +0100
> > > Michal Suchánek <msuchanek@suse.de> wrote:
> > >  
> > >> On Wed, 14 Feb 2018 15:58:31 +0100
> > >> Stefan Wahren <stefan.wahren@i2se.com> wrote:
> > >>  
> > >>> Hi Michal,
> > >>>
> > >>> Am 14.02.2018 um 15:38 schrieb Michal Suchanek:    
> > >>>> The bcm2835 mmc host tends to lock up for unknown reason so reset
> > >>>> it on timeout. The upper mmc block layer tries retransimitting
> > >>>> with single blocks which tends to work out after a long wait.
> > >>>>
> > >>>> This is better than giving up and leaving the machine broken for
> > >>>> no obvious reason.      
> > >>> could you please provide more information about this issue
> > >>> (affected hardware, kernel config, version, dmesg, reproducible
> > >>> scenario)? 
> > > It tends to reproduce when upgrading a few packages with zypper and
> > > otherwise at random during system operation. It seems that for my
> > > card it worsens with age to some degree so perhaps it depends on the
> > > fragmentation of the internal card flash.
> > >
> > > Attaching dmesg and kernel config.  
> > 
> > do you noticed this issue before 4.15-rc4?
> 
> I initially noticed it with 4.4 kernel with some backports to make it
> bootable on RPi.
> > 
> > Could you please test with 4.15 final again?
> 
> Right, I can apply the patches on something more recent.
> 
> > 
> > What kind of SD card (name) triggers the issue?
> 
> Samsung EVO MB-MP16D
> 
> Also see https://elinux.org/RPi_SD_cards#Which_SD_card.3F
> 
> Thanks
> 
> Michal
> 

yesterday i finished my stress tests with Raspberry Pi 3.

Scenario:
- copy Tumbleweed on SD card (openSUSE-Tumbleweed-ARM-JeOS-raspberrypi3.aarch64-2018.02.02-Build1.2.raw, Linux 4.14.15)
- setup locales with yast
- run zypper update
- reboot
- install and remove java 1.8 in a loop for at least 1 hour

Results of the different SD cards:
Toshiba uSDHC Class 10 UHS-1 32 GB: PASS
BASETech uSDHC Class 10 16 GB: PASS
Samsung uSDHC EVO+ UHS-1 16 GB: PASS
Samsung uSDHC Class 6 32 GB: PASS
SanDisk Edge Class 4 16 GB: PASS
Kingston uSDHC Class 10 UHS-1 32 GB: PASS
QUMOX uSDHC Class 10 UHS-1 16 GB: FAIL (zypper segfaulted permantently)
Transcend uSDHC Class 10 UHS-1 32 GB: PASS

I was never able to reproduce this timeout. So i still need the feedback about the 4.15 and i a reliable test scenario.

In a github issue, i've read that badblocks could reproduce the issue more likely.

Regards
Stefan
Michal Suchánek March 4, 2018, 3:57 p.m. UTC | #8
On Wed, 14 Feb 2018 21:30:16 +0100 (CET)
Stefan Wahren <stefan.wahren@i2se.com> wrote:

> Hi Michal,
> 
> > Michal Suchánek <msuchanek@suse.de> hat am 14. Februar 2018 um
> > 20:24 geschrieben:
> > 
> > 
> > On Wed, 14 Feb 2018 17:49:31 +0100
> > Stefan Wahren <stefan.wahren@i2se.com> wrote:
> >   
> > > Hi Michal,
> > > 
> > > [add Phil]
> > > 
> > > Am 14.02.2018 um 17:13 schrieb Michal Suchánek:  
> > > > On Wed, 14 Feb 2018 16:36:49 +0100
> > > > Michal Suchánek <msuchanek@suse.de> wrote:
> > > >    
> > > >> On Wed, 14 Feb 2018 15:58:31 +0100
> > > >> Stefan Wahren <stefan.wahren@i2se.com> wrote:
> > > >>    
> > > >>> Hi Michal,
> > > >>>
> > > >>> Am 14.02.2018 um 15:38 schrieb Michal Suchanek:      
> > > >>>> The bcm2835 mmc host tends to lock up for unknown reason so
> > > >>>> reset it on timeout. The upper mmc block layer tries
> > > >>>> retransimitting with single blocks which tends to work out
> > > >>>> after a long wait.
> > > >>>>
> > > >>>> This is better than giving up and leaving the machine broken
> > > >>>> for no obvious reason.        
> > > >>> could you please provide more information about this issue
> > > >>> (affected hardware, kernel config, version, dmesg,
> > > >>> reproducible scenario)?   
> > > > It tends to reproduce when upgrading a few packages with zypper
> > > > and otherwise at random during system operation. It seems that
> > > > for my card it worsens with age to some degree so perhaps it
> > > > depends on the fragmentation of the internal card flash.
> > > >
> > > > Attaching dmesg and kernel config.    
> > > 
> > > do you noticed this issue before 4.15-rc4?  
> > 
> > I initially noticed it with 4.4 kernel with some backports to make
> > it bootable on RPi.  
> 
> this confuses me. Gerd and i ported this driver from downstream and
> finally it's got merged in 4.12. 
> 
> So do you mean that you backported the mainline version to 4.4 or the
> downstream version of 4.4?

I did not backport it but looking at the changelog it is backport of
the 4.12 driver. It does not look as the 4.15 driver though. Looks like
there was some reorganization of the bcm mmc since then.

> 
> On a quick look they seems identical, but they aren't.
> 
> > > 
> > > Could you please test with 4.15 final again?  
> >

I tried upgrading to the current master (4.16-rc3+) and the issue is
still reproducible although less frequent. I did full upgrade from the
install image which installs over 300 packages and the issue triggered
somewhere around 200th while before installing a half dozen packages
would usually trigger it.

 
> > Right, I can apply the patches on something more recent.
> >   
> > > 
> > > What kind of SD card (name) triggers the issue?  
> > 
> > Samsung EVO MB-MP16D  
> 
> Thanks
> 
> > 
> > Also see https://elinux.org/RPi_SD_cards#Which_SD_card.3F  
> 
> I'm very sceptical about this list. The card above is listed as OK
> and NOK. The experienced issues doesn't need to be direct related to
> the card (unproperly umounted, bad driver, ...).

Right, it just shows that this is not an isolated problem. Not all test
results are reliable, of course. Some include interesting details,
though.

Thanks

Michal
[    0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd034]
[    0.000000] Linux version 4.16.0-rc3-6.g005d1b3-default (geeko@buildhost) (gcc version 7.3.0 (SUSE Linux)) #1 SMP Sun Mar 4 10:10:16 UTC 2018 (005d1b3)
[    0.000000] Machine model: Raspberry Pi 3 Model B
[    0.000000] efi: Getting EFI parameters from FDT:
[    0.000000] efi: EFI v2.05 by Das U-boot
[    0.000000] efi:  SMBIOS=0x35b0e000 
[    0.000000] NUMA: No NUMA configuration found
[    0.000000] NUMA: Faking a node at [mem 0x0000000000000000-0x0000000036ffffff]
[    0.000000] NUMA: NODE_DATA [mem 0x36fdf300-0x36fe0dff]
[    0.000000] Zone ranges:
[    0.000000]   DMA32    [mem 0x0000000000000000-0x0000000036ffffff]
[    0.000000]   Normal   empty
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000000000000-0x0000000000000fff]
[    0.000000]   node   0: [mem 0x0000000000001000-0x0000000035b0dfff]
[    0.000000]   node   0: [mem 0x0000000035b0e000-0x0000000035b0efff]
[    0.000000]   node   0: [mem 0x0000000035b0f000-0x0000000036f86fff]
[    0.000000]   node   0: [mem 0x0000000036f87000-0x0000000036f87fff]
[    0.000000]   node   0: [mem 0x0000000036f88000-0x0000000036ffffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000000000000-0x0000000036ffffff]
[    0.000000] On node 0 totalpages: 225280
[    0.000000]   DMA32 zone: 3520 pages used for memmap
[    0.000000]   DMA32 zone: 0 pages reserved
[    0.000000]   DMA32 zone: 225280 pages, LIFO batch:31
[    0.000000] random: fast init done
[    0.000000] percpu: Embedded 25 pages/cpu @        (ptrval) s65048 r8192 d29160 u102400
[    0.000000] pcpu-alloc: s65048 r8192 d29160 u102400 alloc=25*4096
[    0.000000] pcpu-alloc: [0] 0 [0] 1 [0] 2 [0] 3 
[    0.000000] Detected VIPT I-cache on CPU0
[    0.000000] CPU features: enabling workaround for ARM erratum 845719
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 221760
[    0.000000] Policy zone: DMA32
[    0.000000] Kernel command line: BOOT_IMAGE=/boot/Image-4.16.0-rc3-6.g005d1b3-default root=UUID=e70a1a15-cf94-46da-9a9c-5cb5abc296ad root=/dev/disk/by-id/mmc-00000_0x55353082-part2 disk=/dev/disk/by-id/mmc-00000_0x55353082 resume=/dev/disk/by-id/mmc-00000_0x55353082-part3 quiet splash=silent plymouth.enable=0 console=ttyS0,115200n8 console=tty quiet
[    0.000000] Memory: 844284K/901120K available (9404K kernel code, 1642K rwdata, 4168K rodata, 5248K init, 1031K bss, 56836K reserved, 0K cma-reserved)
[    0.000000] ftrace: allocating 35374 entries in 139 pages
[    0.000000] Hierarchical RCU implementation.
[    0.000000] 	RCU restricting CPUs from NR_CPUS=128 to nr_cpu_ids=4.
[    0.000000] 	Tasks RCU enabled.
[    0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=4
[    0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
[    0.000000] arch_timer: WARNING: Invalid trigger for IRQ75, assuming level low
[    0.000000] arch_timer: WARNING: Please fix your firmware
[    0.000000] arch_timer: cp15 timer(s) running at 19.20MHz (phys).
[    0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0x46d987e47, max_idle_ns: 440795202767 ns
[    0.000007] sched_clock: 56 bits at 19MHz, resolution 52ns, wraps every 4398046511078ns
[    0.000528] Console: colour dummy device 80x25
[    0.000550] console [tty0] enabled
[    0.000647] Calibrating delay loop (skipped), value calculated using timer frequency.. 38.40 BogoMIPS (lpj=192000)
[    0.000664] pid_max: default: 32768 minimum: 301
[    0.000876] Security Framework initialized
[    0.001000] AppArmor: AppArmor initialized
[    0.002588] Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes)
[    0.003281] Inode-cache hash table entries: 65536 (order: 7, 524288 bytes)
[    0.003368] Mount-cache hash table entries: 2048 (order: 2, 16384 bytes)
[    0.003401] Mountpoint-cache hash table entries: 2048 (order: 2, 16384 bytes)
[    0.005538] ASID allocator initialised with 32768 entries
[    0.005688] Hierarchical SRCU implementation.
[    0.009053] Remapping and enabling EFI services.
[    0.009116]   EFI remap 0x0000000035b0e000 =>         (ptrval)
[    0.009126]   EFI remap 0x0000000036f87000 =>         (ptrval)
[    0.009134]   EFI remap 0x000000003f100000 =>         (ptrval)
[    0.009733] smp: Bringing up secondary CPUs ...
[    0.010846] Detected VIPT I-cache on CPU1
[    0.010911] arch_timer: WARNING: Invalid trigger for IRQ75, assuming level low
[    0.010914] arch_timer: WARNING: Please fix your firmware
[    0.010930] CPU1: Booted secondary processor 0x0000000001 [0x410fd034]
[    0.012168] Detected VIPT I-cache on CPU2
[    0.012214] arch_timer: WARNING: Invalid trigger for IRQ75, assuming level low
[    0.012217] arch_timer: WARNING: Please fix your firmware
[    0.012229] CPU2: Booted secondary processor 0x0000000002 [0x410fd034]
[    0.013385] Detected VIPT I-cache on CPU3
[    0.013421] arch_timer: WARNING: Invalid trigger for IRQ75, assuming level low
[    0.013424] arch_timer: WARNING: Please fix your firmware
[    0.013435] CPU3: Booted secondary processor 0x0000000003 [0x410fd034]
[    0.013643] smp: Brought up 1 node, 4 CPUs
[    0.013652] SMP: Total of 4 processors activated.
[    0.013661] CPU features: detected feature: 32-bit EL0 Support
[    0.013668] CPU features: detected feature: Kernel page table isolation (KPTI)
[    0.019389] CPU: All CPU(s) started at EL2
[    0.019418] alternatives: patching kernel code
[    0.021061] devtmpfs: initialized
[    0.028700] evm: security.selinux
[    0.028710] evm: security.apparmor
[    0.028715] evm: security.ima
[    0.028721] evm: security.capability
[    0.029302] Registered cp15_barrier emulation handler
[    0.029321] Registered setend emulation handler
[    0.029875] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
[    0.029966] futex hash table entries: 1024 (order: 5, 131072 bytes)
[    0.030430] pinctrl core: initialized pinctrl subsystem
[    0.031873] SMBIOS 3.0 present.
[    0.031894] DMI: raspberrypi rpi/rpi, BIOS 2017.05 05/08/2017
[    0.032741] NET: Registered protocol family 16
[    0.035560] audit: initializing netlink subsys (disabled)
[    0.035814] audit: type=2000 audit(0.030:1): state=initialized audit_enabled=0 res=1
[    0.037373] cpuidle: using governor ladder
[    0.037557] cpuidle: using governor menu
[    0.038012] vdso: 2 pages (1 code @ 00000000bc8db256, 1 data @ 00000000238b0543)
[    0.038034] hw-breakpoint: found 6 breakpoint and 4 watchpoint registers.
[    0.038562] DMA: preallocated 256 KiB pool for atomic allocations
[    0.039381] Serial: AMBA PL011 UART driver
[    0.044609] uart-pl011 3f201000.serial: could not find pctldev for node /soc/gpio@7e200000/uart0_ctsrts_gpio32, deferring probe
[    0.063474] HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages
[    0.065311] ACPI: Interpreter disabled.
[    0.066207] vgaarb: loaded
[    0.066922] SCSI subsystem initialized
[    0.067185] libata version 3.00 loaded.
[    0.067753] EDAC MC: Ver: 3.0.0
[    0.067954] EDAC DEBUG: edac_mc_sysfs_init: device mc created
[    0.068375] Registered efivars operations
[    0.068404] efivars: get_next_variable: status=8000000000000007
[    0.069788] NetLabel: Initializing
[    0.069795] NetLabel:  domain hash size = 128
[    0.069801] NetLabel:  protocols = UNLABELED CIPSOv4 CALIPSO
[    0.069926] NetLabel:  unlabeled traffic allowed by default
[    0.070560] clocksource: Switched to clocksource arch_sys_counter
[    0.173668] VFS: Disk quotas dquot_6.6.0
[    0.173796] VFS: Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[    0.174666] AppArmor: AppArmor Filesystem Enabled
[    0.174769] pnp: PnP ACPI: disabled
[    0.187310] NET: Registered protocol family 2
[    0.188138] tcp_listen_portaddr_hash hash table entries: 512 (order: 1, 8192 bytes)
[    0.188213] TCP established hash table entries: 8192 (order: 4, 65536 bytes)
[    0.188385] TCP bind hash table entries: 8192 (order: 5, 131072 bytes)
[    0.188592] TCP: Hash tables configured (established 8192 bind 8192)
[    0.188875] UDP hash table entries: 512 (order: 2, 16384 bytes)
[    0.188935] UDP-Lite hash table entries: 512 (order: 2, 16384 bytes)
[    0.189282] NET: Registered protocol family 1
[    0.189323] PCI: CLS 0 bytes, default 128
[    0.189824] Unpacking initramfs...
[    9.062128] Freeing initrd memory: 16252K
[    9.063128] kvm [1]: 8-bit VMID
[    9.063137] kvm [1]: IDMAP page: 9a4000
[    9.063144] kvm [1]: HYP VA range: 800000000000:ffffffffffff
[    9.064354] kvm [1]: Invalid trigger for IRQ76, assuming level low
[    9.064418] kvm [1]: virtual timer IRQ76
[    9.064594] kvm [1]: Hyp mode initialized successfully
[    9.067299] Initialise system trusted keyrings
[    9.067574] workingset: timestamp_bits=44 max_order=18 bucket_order=0
[    9.067848] zbud: loaded
[    9.076403] Key type asymmetric registered
[    9.076417] Asymmetric key parser 'x509' registered
[    9.076601] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 247)
[    9.076799] io scheduler noop registered
[    9.076807] io scheduler deadline registered
[    9.076901] io scheduler cfq registered (default)
[    9.076909] io scheduler kyber registered
[    9.076956] io scheduler bfq registered
[    9.085523] efifb: probing for efifb
[    9.085726] efifb: framebuffer at 0x3d513000, using 7012k, total 7011k
[    9.085735] efifb: mode is 1824x984x32, linelength=7296, pages=1
[    9.085740] efifb: scrolling: redraw
[    9.085750] efifb: Truecolor: size=8:8:8:8, shift=24:0:8:16
[    9.129154] Console: switching to colour frame buffer device 228x61
[    9.171661] fb0: EFI VGA frame buffer device
[    9.181531] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
[    9.184389] console [ttyS0] disabled
[    9.184439] 3f215040.serial: ttyS0 at MMIO 0x0 (irq = 61, base_baud = 31249999) is a 16550
[    9.190836] console [ttyS0] enabled
[    9.191987] Serial: AMBA driver
[    9.192584] msm_serial: driver initialized
[    9.194032] cacheinfo: Unable to detect cache hierarchy for CPU 0
[    9.195127] libphy: Fixed MDIO Bus: probed
[    9.195753] mousedev: PS/2 mouse device common for all mice
[    9.198724] ledtrig-cpu: registered to indicate activity on CPUs
[    9.199069] EFI Variables Facility v0.08 2004-May-17
[    9.199096] efivars: get_next_variable: status=8000000000000007
[    9.199199] hidraw: raw HID events driver (C) Jiri Kosina
[    9.199575] bcm2835-mbox 3f00b880.mailbox: mailbox enabled
[    9.201882] NET: Registered protocol family 10
[    9.227523] Segment Routing with IPv6
[    9.228961] registered taskstats version 1
[    9.228985] Loading compiled-in X.509 certificates
[    9.242732] Loaded X.509 cert 'home:michals OBS Project: 544f3f8a786dbbef2bc5f439f86f094f2347e991'
[    9.242883] zswap: loaded using pool lzo/zbud
[    9.266738] Key type big_key registered
[    9.278414] Key type encrypted registered
[    9.278437] AppArmor: AppArmor sha1 policy hashing enabled
[    9.278452] ima: No TPM chip found, activating TPM-bypass! (rc=-19)
[    9.278605] evm: HMAC attrs: 0x1
[    9.284131] 3f201000.serial: ttyAMA0 at MMIO 0x3f201000 (irq = 66, base_baud = 0) is a PL011 rev2
[    9.286607] raspberrypi-firmware soc:firmware: Attached to firmware from 2016-09-02 11:49
[    9.288723] hctosys: unable to open rtc device (rtc0)
[    9.315948] Freeing unused kernel memory: 5248K
[    9.345959] efivars: get_next_variable: status=8000000000000007
[    9.355500] systemd[1]: systemd 228 running in system mode. (+PAM -AUDIT +SELINUX -IMA +APPARMOR -SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT -GNUTLS +ACL +XZ -LZ4 +SECCOMP +BLKID -ELFUTILS +KMOD -IDN)
[    9.356743] systemd[1]: Detected architecture arm64.
[    9.356763] systemd[1]: Running in initial RAM disk.
[    9.356917] systemd[1]: Set hostname to <linux>.
[    9.638640] systemd[1]: Reached target Timers.
[    9.642236] systemd[1]: Created slice System Slice.
[    9.642504] systemd[1]: Listening on udev Kernel Socket.
[    9.643411] systemd[1]: Created slice system-systemd\x2dhibernate\x2dresume.slice.
[    9.643693] systemd[1]: Listening on Journal Socket (/dev/log).
[    9.643973] systemd[1]: Listening on udev Control Socket.
[   10.483945] sdhost-bcm2835 3f202000.sdhost: unable to initialise DMA channel. Falling back to PIO
[   10.498917] sdhci: Secure Digital Host Controller Interface driver
[   10.498931] sdhci: Copyright(c) Pierre Ossman
[   10.504426] sdhci-pltfm: SDHCI platform and OF driver helper
[   10.586429] sdhost-bcm2835 3f202000.sdhost: loaded - DMA disabled
[   10.587211] sdhost-bcm2835 3f202000.sdhost: resetting
[   10.618898] mmc1: SDHCI controller on 3f300000.sdhci [3f300000.sdhci] using PIO
[   10.643482] mmc1: queuing unknown CIS tuple 0x80 (2 bytes)
[   10.645234] mmc1: queuing unknown CIS tuple 0x80 (3 bytes)
[   10.646980] mmc1: queuing unknown CIS tuple 0x80 (3 bytes)
[   10.650113] mmc1: queuing unknown CIS tuple 0x80 (7 bytes)
[   10.694224] mmc0: host does not support reading read-only switch, assuming write-enable
[   10.695999] mmc0: new high speed SDHC card at address 0001
[   10.710758] mmcblk0: mmc0:0001 00000 14.6 GiB 
[   10.715732]  mmcblk0: p1 p2 p3
[   10.805776] mmc1: new high speed SDIO card at address 0001
[   10.866739] usbcore: registered new interface driver usbfs
[   10.866834] usbcore: registered new interface driver hub
[   10.867021] usbcore: registered new device driver usb
[   10.988211] cfg80211: Loading compiled-in X.509 certificates for regulatory database
[   10.989936] cfg80211: Loaded X.509 cert 'sforshee: 00b28ddf47aef9cea7'
[   10.990224] platform regulatory.0: Direct firmware load for regulatory.db failed with error -2
[   10.990238] cfg80211: failed to load regulatory.db
[   11.123585] PM: Starting manual resume from disk
[   11.124345] PM: Image not found (code -22)
[   11.335132] random: crng init done
[   11.359854] EXT4-fs (mmcblk0p2): mounted filesystem with ordered data mode. Opts: (null)
[   12.100186] systemd-journald[157]: Received SIGTERM from PID 1 (systemd).
[   12.238771] systemd: 16 output lines suppressed due to ratelimiting
[   13.595204] EXT4-fs (mmcblk0p2): re-mounted. Opts: nobarrier
[   13.892099] systemd-journald[363]: Received request to flush runtime journal from PID 1
[   15.397135] bcm2835-wdt 3f100000.watchdog: Broadcom BCM2835 watchdog timer
[   16.413335] bcm2835-rng 3f104000.rng: hwrng registered
[   16.416059] i2c-bcm2835 3f805000.i2c: Could not read clock-frequency property
[   16.417899] dwc2 3f980000.usb: 3f980000.usb supply vusb_d not found, using dummy regulator
[   16.418029] dwc2 3f980000.usb: 3f980000.usb supply vusb_a not found, using dummy regulator
[   16.484647] Adding 505172k swap on /dev/mmcblk0p3.  Priority:-2 extents:1 across:505172k SSFS
[   16.633867] brcmfmac: probe of mmc1:0001:1 failed with error -110
[   16.634398] brcmfmac: probe of mmc1:0001:2 failed with error -110
[   16.634626] usbcore: registered new interface driver brcmfmac
[   16.716968] FAT-fs (mmcblk0p1): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
[   16.920804] dwc2 3f980000.usb: dwc2_check_params: Invalid parameter g_np_tx_fifo_size=1024
[   16.920872] dwc2 3f980000.usb: EPs: 8, dedicated fifos, 4080 entries in SPRAM
[   16.921404] dwc2 3f980000.usb: DWC OTG Controller
[   16.921443] dwc2 3f980000.usb: new USB bus registered, assigned bus number 1
[   16.921500] dwc2 3f980000.usb: irq 41, io mem 0x3f980000
[   16.921979] usb usb1: New USB device found, idVendor=1d6b, idProduct=0002
[   16.921993] usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[   16.922003] usb usb1: Product: DWC OTG Controller
[   16.922013] usb usb1: Manufacturer: Linux 4.16.0-rc3-6.g005d1b3-default dwc2_hsotg
[   16.922023] usb usb1: SerialNumber: 3f980000.usb
[   16.924951] hub 1-0:1.0: USB hub found
[   16.925579] hub 1-0:1.0: 1 port detected
[   17.163362] rc_core: IR Remote Control driver registered, major 239
[   17.370705] usb 1-1: new high-speed USB device number 2 using dwc2
[   17.640963] usb 1-1: New USB device found, idVendor=0424, idProduct=9514
[   17.640980] usb 1-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[   17.641982] hub 1-1:1.0: USB hub found
[   17.642079] hub 1-1:1.0: 5 ports detected
[   17.930635] Registered IR keymap rc-cec
[   17.930977] rc rc0: RC for vc4 as /devices/platform/soc/3f902000.hdmi/rc/rc0
[   17.931295] input: RC for vc4 as /devices/platform/soc/3f902000.hdmi/rc/rc0/input0
[   17.932069] vc4_hdmi 3f902000.hdmi: 'dmas' DT property is missing, no HDMI audio
[   17.932233] vc4-drm soc:gpu: bound 3f902000.hdmi (ops vc4_hdmi_ops [vc4])
[   17.932606] vc4-drm soc:gpu: bound 3f400000.hvs (ops vc4_hvs_ops [vc4])
[   17.933015] vc4-drm soc:gpu: bound 3f206000.pixelvalve (ops vc4_crtc_ops [vc4])
[   17.933313] vc4-drm soc:gpu: bound 3f207000.pixelvalve (ops vc4_crtc_ops [vc4])
[   17.933615] vc4-drm soc:gpu: bound 3f807000.pixelvalve (ops vc4_crtc_ops [vc4])
[   17.933730] vc4-drm soc:gpu: swiotlb: coherent allocation failed, size=16777216
[   17.933747] CPU: 2 PID: 460 Comm: systemd-udevd Not tainted 4.16.0-rc3-6.g005d1b3-default #1
[   17.933752] Hardware name: raspberrypi rpi/rpi, BIOS 2017.05 05/08/2017
[   17.933759] Call trace:
[   17.933779]  dump_backtrace+0x0/0x1d0
[   17.933788]  show_stack+0x24/0x30
[   17.933800]  dump_stack+0xbc/0xf8
[   17.933811]  swiotlb_alloc_buffer+0x174/0x178
[   17.933818]  swiotlb_alloc+0x64/0x78
[   17.933829]  __dma_alloc+0xe0/0x1d8
[   17.934156]  drm_gem_cma_create+0xa8/0x198 [drm]
[   17.934242]  vc4_bo_create+0x134/0x4e0 [vc4]
[   17.934329]  vc4_allocate_bin_bo+0x78/0x128 [vc4]
[   17.934416]  vc4_v3d_bind+0x100/0x278 [vc4]
[   17.934432]  component_bind_all+0x108/0x2c8
[   17.934518]  vc4_drm_bind+0xac/0x188 [vc4]
[   17.934529]  try_to_bring_up_master+0x214/0x300
[   17.934539]  component_master_add_with_match+0xcc/0x120
[   17.934624]  vc4_platform_drm_probe+0xc8/0x108 [vc4]
[   17.934636]  platform_drv_probe+0x60/0xc0
[   17.934645]  driver_probe_device+0x334/0x498
[   17.934653]  __driver_attach+0x124/0x128
[   17.934661]  bus_for_each_dev+0x70/0xa8
[   17.934668]  driver_attach+0x30/0x40
[   17.934676]  bus_add_driver+0x240/0x2b8
[   17.934684]  driver_register+0x6c/0x118
[   17.934692]  __platform_driver_register+0x54/0x60
[   17.934780]  vc4_drm_register+0x40/0x68 [vc4]
[   17.934789]  do_one_initcall+0x58/0x168
[   17.934802]  do_init_module+0x64/0x1dc
[   17.934811]  load_module+0x14c8/0x1678
[   17.934820]  SyS_finit_module+0xd8/0xf0
[   17.934827]  el0_svc_naked+0x30/0x34
[   17.934887] vc4-drm soc:gpu: swiotlb: coherent allocation failed, size=16777216
[   17.934899] CPU: 2 PID: 460 Comm: systemd-udevd Not tainted 4.16.0-rc3-6.g005d1b3-default #1
[   17.934904] Hardware name: raspberrypi rpi/rpi, BIOS 2017.05 05/08/2017
[   17.934907] Call trace:
[   17.934919]  dump_backtrace+0x0/0x1d0
[   17.934925]  show_stack+0x24/0x30
[   17.934937]  dump_stack+0xbc/0xf8
[   17.934946]  swiotlb_alloc_buffer+0x174/0x178
[   17.934953]  swiotlb_alloc+0x64/0x78
[   17.934962]  __dma_alloc+0xe0/0x1d8
[   17.935247]  drm_gem_cma_create+0xa8/0x198 [drm]
[   17.935332]  vc4_bo_create+0x1a4/0x4e0 [vc4]
[   17.935419]  vc4_allocate_bin_bo+0x78/0x128 [vc4]
[   17.935506]  vc4_v3d_bind+0x100/0x278 [vc4]
[   17.935516]  component_bind_all+0x108/0x2c8
[   17.935602]  vc4_drm_bind+0xac/0x188 [vc4]
[   17.935612]  try_to_bring_up_master+0x214/0x300
[   17.935622]  component_master_add_with_match+0xcc/0x120
[   17.935708]  vc4_platform_drm_probe+0xc8/0x108 [vc4]
[   17.935718]  platform_drv_probe+0x60/0xc0
[   17.935726]  driver_probe_device+0x334/0x498
[   17.935734]  __driver_attach+0x124/0x128
[   17.935741]  bus_for_each_dev+0x70/0xa8
[   17.935749]  driver_attach+0x30/0x40
[   17.935756]  bus_add_driver+0x240/0x2b8
[   17.935764]  driver_register+0x6c/0x118
[   17.935773]  __platform_driver_register+0x54/0x60
[   17.935860]  vc4_drm_register+0x40/0x68 [vc4]
[   17.935867]  do_one_initcall+0x58/0x168
[   17.935879]  do_init_module+0x64/0x1dc
[   17.935888]  load_module+0x14c8/0x1678
[   17.935897]  SyS_finit_module+0xd8/0xf0
[   17.935904]  el0_svc_naked+0x30/0x34
[   17.935943] vc4-drm soc:gpu: swiotlb: coherent allocation failed, size=16777216
[   17.935954] CPU: 2 PID: 460 Comm: systemd-udevd Not tainted 4.16.0-rc3-6.g005d1b3-default #1
[   17.935958] Hardware name: raspberrypi rpi/rpi, BIOS 2017.05 05/08/2017
[   17.935963] Call trace:
[   17.935973]  dump_backtrace+0x0/0x1d0
[   17.935979]  show_stack+0x24/0x30
[   17.935987]  dump_stack+0xbc/0xf8
[   17.935995]  swiotlb_alloc_buffer+0x174/0x178
[   17.936001]  swiotlb_alloc+0x64/0x78
[   17.936011]  __dma_alloc+0xe0/0x1d8
[   17.936271]  drm_gem_cma_create+0xa8/0x198 [drm]
[   17.936356]  vc4_bo_create+0x36c/0x4e0 [vc4]
[   17.936443]  vc4_allocate_bin_bo+0x78/0x128 [vc4]
[   17.936531]  vc4_v3d_bind+0x100/0x278 [vc4]
[   17.936541]  component_bind_all+0x108/0x2c8
[   17.936629]  vc4_drm_bind+0xac/0x188 [vc4]
[   17.936639]  try_to_bring_up_master+0x214/0x300
[   17.936648]  component_master_add_with_match+0xcc/0x120
[   17.936735]  vc4_platform_drm_probe+0xc8/0x108 [vc4]
[   17.936745]  platform_drv_probe+0x60/0xc0
[   17.936754]  driver_probe_device+0x334/0x498
[   17.936761]  __driver_attach+0x124/0x128
[   17.936770]  bus_for_each_dev+0x70/0xa8
[   17.936777]  driver_attach+0x30/0x40
[   17.936784]  bus_add_driver+0x240/0x2b8
[   17.936792]  driver_register+0x6c/0x118
[   17.936801]  __platform_driver_register+0x54/0x60
[   17.936889]  vc4_drm_register+0x40/0x68 [vc4]
[   17.936897]  do_one_initcall+0x58/0x168
[   17.936908]  do_init_module+0x64/0x1dc
[   17.936917]  load_module+0x14c8/0x1678
[   17.936926]  SyS_finit_module+0xd8/0xf0
[   17.936933]  el0_svc_naked+0x30/0x34
[   17.937033] [drm:vc4_bo_create [vc4]] *ERROR* Failed to allocate from CMA:
[   17.944470] vc4_v3d 3fc00000.v3d: Failed to allocate memory for tile binning: -12. You may need to enable CMA or give it more memory.
[   17.957408] vc4-drm soc:gpu: failed to bind 3fc00000.v3d (ops vc4_v3d_ops [vc4]): -12
[   18.020719] usb 1-1.1: new high-speed USB device number 3 using dwc2
[   18.081881] vc4-drm soc:gpu: master bind failed: -12
[   18.081938] vc4-drm: probe of soc:gpu failed with error -12
[   18.171767] usb 1-1.1: New USB device found, idVendor=0424, idProduct=ec00
[   18.171784] usb 1-1.1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[   18.300755] usb 1-1.4: new low-speed USB device number 4 using dwc2
[   18.595642] smsc95xx v1.0.6
[   18.787944] usb 1-1.4: New USB device found, idVendor=0b38, idProduct=0003
[   18.787964] usb 1-1.4: New USB device strings: Mfr=0, Product=1, SerialNumber=0
[   18.787974] usb 1-1.4: Product: USB-compliant keyboard
[   18.844022] smsc95xx 1-1.1:1.0 eth0: register 'smsc95xx' at usb-3f980000.usb-1.1, smsc95xx USB 2.0 Ethernet, b8:27:eb:69:dc:65
[   18.844415] usbcore: registered new interface driver smsc95xx
[   19.549884] smsc95xx 1-1.1:1.0 eth0: hardware isn't capable of remote wakeup
[   19.550277] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[   19.706690] usbcore: registered new interface driver usbhid
[   19.706703] usbhid: USB HID core driver
[   19.724124] input: USB-compliant keyboard as /devices/platform/soc/3f980000.usb/usb1/1-1/1-1.4/1-1.4:1.0/0003:0B38:0003.0001/input/input1
[   19.792063] hid-generic 0003:0B38:0003.0001: input,hidraw0: USB HID v1.10 Keyboard [USB-compliant keyboard] on usb-3f980000.usb-1.4/input0
[   19.795138] input: USB-compliant keyboard as /devices/platform/soc/3f980000.usb/usb1/1-1/1-1.4/1-1.4:1.1/0003:0B38:0003.0002/input/input2
[   19.863347] hid-generic 0003:0B38:0003.0002: input,hiddev96,hidraw1: USB HID v1.10 Mouse [USB-compliant keyboard] on usb-3f980000.usb-1.4/input1
[   21.009199] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[   21.011461] smsc95xx 1-1.1:1.0 eth0: link up, 100Mbps, full-duplex, lpa 0x45E1
[   21.136118] NET: Registered protocol family 17
<3>[  731.358420] sdhost-bcm2835 3f202000.sdhost: timeout waiting for hardware interrupt.
<6>[  731.366507] sdhost-bcm2835 3f202000.sdhost: resetting
<3>[  741.598499] sdhost-bcm2835 3f202000.sdhost: timeout waiting for hardware interrupt.
<6>[  741.606583] sdhost-bcm2835 3f202000.sdhost: resetting
<3>[  744.688543] mmc0: Card stuck in wrong state! mmcblk0 card_busy_detect status: 0xe00
<3>[  799.198941] sdhost-bcm2835 3f202000.sdhost: timeout waiting for hardware interrupt.
<6>[  799.207027] sdhost-bcm2835 3f202000.sdhost: resetting
<3>[  809.439021] sdhost-bcm2835 3f202000.sdhost: timeout waiting for hardware interrupt.
<6>[  809.447109] sdhost-bcm2835 3f202000.sdhost: resetting
<3>[  812.529050] mmc0: Card stuck in wrong state! mmcblk0 card_busy_detect status: 0xe00
Michal Suchánek March 4, 2018, 4:04 p.m. UTC | #9
Hello,

On Sat, 3 Mar 2018 14:58:45 +0100 (CET)
Stefan Wahren <stefan.wahren@i2se.com> wrote:

> Hi Michal,

> yesterday i finished my stress tests with Raspberry Pi 3.
> 
> Scenario:
> - copy Tumbleweed on SD card
> (openSUSE-Tumbleweed-ARM-JeOS-raspberrypi3.aarch64-2018.02.02-Build1.2.raw,
> Linux 4.14.15)
> - setup locales with yast
> - run zypper update
> - reboot
> - install and remove java 1.8 in a loop for at least 1 hour

How many cycles does that perform?

> 
> Results of the different SD cards:
> Toshiba uSDHC Class 10 UHS-1 32 GB: PASS
> BASETech uSDHC Class 10 16 GB: PASS
> Samsung uSDHC EVO+ UHS-1 16 GB: PASS
> Samsung uSDHC Class 6 32 GB: PASS
> SanDisk Edge Class 4 16 GB: PASS
> Kingston uSDHC Class 10 UHS-1 32 GB: PASS
> QUMOX uSDHC Class 10 UHS-1 16 GB: FAIL (zypper segfaulted
> permantently) 

It may very well segfault because the card disconnected due to the
timeout and it is missing some pages of its code. It did for me as well
without the workaround.

> Transcend uSDHC Class 10 UHS-1 32 GB: PASS
> 
> I was never able to reproduce this timeout. So i still need the
> feedback about the 4.15 and i a reliable test scenario.

You may very well never get any. AFAICT this is some kind of heisenbug
in the hardware.

Thanks

Michal
Michal Suchánek March 4, 2018, 4:36 p.m. UTC | #10
On Sat, 3 Mar 2018 14:58:45 +0100 (CET)
Stefan Wahren <stefan.wahren@i2se.com> wrote:

> In a github issue, i've read that badblocks could reproduce the issue
> more likely.

How many iterations of badblocks? It doe snot reproduce the issue for
me.

Thanks

Michal
Stefan Wahren March 4, 2018, 6:11 p.m. UTC | #11
Hi Michal,

> Michal Suchánek <msuchanek@suse.de> hat am 4. März 2018 um 16:57 geschrieben:
> 
> 
> On Wed, 14 Feb 2018 21:30:16 +0100 (CET)
> Stefan Wahren <stefan.wahren@i2se.com> wrote:
> 
> > Hi Michal,
> > 
> > > Michal Suchánek <msuchanek@suse.de> hat am 14. Februar 2018 um
> > > 20:24 geschrieben:
> > > 
> > > 
> > > On Wed, 14 Feb 2018 17:49:31 +0100
> > > Stefan Wahren <stefan.wahren@i2se.com> wrote:
> > >   
> > > > Hi Michal,
> > > > 
> > > > [add Phil]
> > > > 
> > > > Am 14.02.2018 um 17:13 schrieb Michal Suchánek:  
> > > > > On Wed, 14 Feb 2018 16:36:49 +0100
> > > > > Michal Suchánek <msuchanek@suse.de> wrote:
> > > > >    
> > > > >> On Wed, 14 Feb 2018 15:58:31 +0100
> > > > >> Stefan Wahren <stefan.wahren@i2se.com> wrote:
> > > > >>    
> > > > >>> Hi Michal,
> > > > >>>
> > > > >>> Am 14.02.2018 um 15:38 schrieb Michal Suchanek:      
> > > > >>>> The bcm2835 mmc host tends to lock up for unknown reason so
> > > > >>>> reset it on timeout. The upper mmc block layer tries
> > > > >>>> retransimitting with single blocks which tends to work out
> > > > >>>> after a long wait.
> > > > >>>>
> > > > >>>> This is better than giving up and leaving the machine broken
> > > > >>>> for no obvious reason.        
> > > > >>> could you please provide more information about this issue
> > > > >>> (affected hardware, kernel config, version, dmesg,
> > > > >>> reproducible scenario)?   
> > > > > It tends to reproduce when upgrading a few packages with zypper
> > > > > and otherwise at random during system operation. It seems that
> > > > > for my card it worsens with age to some degree so perhaps it
> > > > > depends on the fragmentation of the internal card flash.
> > > > >
> > > > > Attaching dmesg and kernel config.    
> > > > 
> > > > do you noticed this issue before 4.15-rc4?  
> > > 
> > > I initially noticed it with 4.4 kernel with some backports to make
> > > it bootable on RPi.  
> > 
> > this confuses me. Gerd and i ported this driver from downstream and
> > finally it's got merged in 4.12. 
> > 
> > So do you mean that you backported the mainline version to 4.4 or the
> > downstream version of 4.4?
> 
> I did not backport it but looking at the changelog it is backport of
> the 4.12 driver. It does not look as the 4.15 driver though. Looks like
> there was some reorganization of the bcm mmc since then.
> 
> > 
> > On a quick look they seems identical, but they aren't.
> > 
> > > > 
> > > > Could you please test with 4.15 final again?  
> > >
> 
> I tried upgrading to the current master (4.16-rc3+) and the issue is
> still reproducible although less frequent. I did full upgrade from the
> install image which installs over 300 packages and the issue triggered
> somewhere around 200th while before installing a half dozen packages
> would usually trigger it.
> 

this is the same what i did during my stress tests. The step installed 475 packages. The timeout never occured.

Stefan
Stefan Wahren March 4, 2018, 6:15 p.m. UTC | #12
Hi Michal,

> Michal Suchánek <msuchanek@suse.de> hat am 4. März 2018 um 17:36 geschrieben:
> 
> 
> On Sat, 3 Mar 2018 14:58:45 +0100 (CET)
> Stefan Wahren <stefan.wahren@i2se.com> wrote:
> 
> > In a github issue, i've read that badblocks could reproduce the issue
> > more likely.
> 
> How many iterations of badblocks? It doe snot reproduce the issue for
> me.

i asked for the parameters here:

https://github.com/raspberrypi/linux/issues/2392

Stefan

> 
> Thanks
> 
> Michal
Michal Suchánek March 4, 2018, 6:44 p.m. UTC | #13
On Sun, 4 Mar 2018 19:11:49 +0100 (CET)
Stefan Wahren <stefan.wahren@i2se.com> wrote:

> Hi Michal,
> 
> > Michal Suchánek <msuchanek@suse.de> hat am 4. März 2018 um 16:57
> > geschrieben:
> > 
> > 
> > On Wed, 14 Feb 2018 21:30:16 +0100 (CET)
> > Stefan Wahren <stefan.wahren@i2se.com> wrote:
> >   
> > > Hi Michal,
> > >   
> > > > Michal Suchánek <msuchanek@suse.de> hat am 14. Februar 2018 um
> > > > 20:24 geschrieben:
> > > > 
> > > > 
> > > > On Wed, 14 Feb 2018 17:49:31 +0100
> > > > Stefan Wahren <stefan.wahren@i2se.com> wrote:
> > > >     
> > > > > Hi Michal,
> > > > > 
> > > > > [add Phil]
> > > > > 
> > > > > Am 14.02.2018 um 17:13 schrieb Michal Suchánek:    
> > > > > > On Wed, 14 Feb 2018 16:36:49 +0100
> > > > > > Michal Suchánek <msuchanek@suse.de> wrote:
> > > > > >      
> > > > > >> On Wed, 14 Feb 2018 15:58:31 +0100
> > > > > >> Stefan Wahren <stefan.wahren@i2se.com> wrote:
> > > > > >>      
> > > > > >>> Hi Michal,
> > > > > >>>
> > > > > >>> Am 14.02.2018 um 15:38 schrieb Michal Suchanek:        
> > > > > >>>> The bcm2835 mmc host tends to lock up for unknown reason
> > > > > >>>> so reset it on timeout. The upper mmc block layer tries
> > > > > >>>> retransimitting with single blocks which tends to work
> > > > > >>>> out after a long wait.
> > > > > >>>>
> > > > > >>>> This is better than giving up and leaving the machine
> > > > > >>>> broken for no obvious reason.          
> > > > > >>> could you please provide more information about this issue
> > > > > >>> (affected hardware, kernel config, version, dmesg,
> > > > > >>> reproducible scenario)?     
> > > > > > It tends to reproduce when upgrading a few packages with
> > > > > > zypper and otherwise at random during system operation. It
> > > > > > seems that for my card it worsens with age to some degree
> > > > > > so perhaps it depends on the fragmentation of the internal
> > > > > > card flash.
> > > > > >
> > > > > > Attaching dmesg and kernel config.      
> > > > > 
> > > > > do you noticed this issue before 4.15-rc4?    
> > > > 
> > > > I initially noticed it with 4.4 kernel with some backports to
> > > > make it bootable on RPi.    
> > > 
> > > this confuses me. Gerd and i ported this driver from downstream
> > > and finally it's got merged in 4.12. 
> > > 
> > > So do you mean that you backported the mainline version to 4.4 or
> > > the downstream version of 4.4?  
> > 
> > I did not backport it but looking at the changelog it is backport of
> > the 4.12 driver. It does not look as the 4.15 driver though. Looks
> > like there was some reorganization of the bcm mmc since then.
> >   
> > > 
> > > On a quick look they seems identical, but they aren't.
> > >   
> > > > > 
> > > > > Could you please test with 4.15 final again?    
> > > >  
> > 
> > I tried upgrading to the current master (4.16-rc3+) and the issue is
> > still reproducible although less frequent. I did full upgrade from
> > the install image which installs over 300 packages and the issue
> > triggered somewhere around 200th while before installing a half
> > dozen packages would usually trigger it.
> >   
> 
> this is the same what i did during my stress tests. The step
> installed 475 packages. The timeout never occured.

First off, you did your testing with Tumbleweed image which probably
uses btrfs for / while I use Leap 42.3 image which uses ext4.

I was not able to reproduce the issue with installing packages so far -
installed GNOME which is over 700 packages and the issue did not
trigger. However, the upgrades also unlink the old files as does
removing packages - removing GNOME removed over 200 packages and the
issue triggered. However, re-installing and removing GNOME again did
not trigger the issue. So nothing so far reproduces the issue reliably.
With the new kernel the issue even reproduces less frequently than the
4.4 and 4.15 kernel - probably some i/o scheduler change affects the
disk i/o patterns.

Thanks

Michal
Michal Suchánek March 6, 2018, 7:21 p.m. UTC | #14
Hello,

On Sat, 3 Mar 2018 14:58:45 +0100 (CET)
Stefan Wahren <stefan.wahren@i2se.com> wrote:

> Hi Michal,
> 

> yesterday i finished my stress tests with Raspberry Pi 3.
> 
> Scenario:
> - copy Tumbleweed on SD card
> (openSUSE-Tumbleweed-ARM-JeOS-raspberrypi3.aarch64-2018.02.02-Build1.2.raw,
> Linux 4.14.15)
> - setup locales with yast
> - run zypper update
> - reboot
> - install and remove java 1.8 in a loop for at least 1 hour
> 
> Results of the different SD cards:
> Toshiba uSDHC Class 10 UHS-1 32 GB: PASS
> BASETech uSDHC Class 10 16 GB: PASS
> Samsung uSDHC EVO+ UHS-1 16 GB: PASS
> Samsung uSDHC Class 6 32 GB: PASS
> SanDisk Edge Class 4 16 GB: PASS
> Kingston uSDHC Class 10 UHS-1 32 GB: PASS
> QUMOX uSDHC Class 10 UHS-1 16 GB: FAIL (zypper segfaulted
> permantently) Transcend uSDHC Class 10 UHS-1 32 GB: PASS
> 
> I was never able to reproduce this timeout. 

For the record I was not able to reproduce the issue installing and
removing openjdk-1.7.0 in a loop for an hour on Leap 42.3 system which
is affected by the issue. This was a loop of repeatedly installing and
removing four packages and was executed 68 times. I enabled zypper
package caching to avoid spending time in download.

Thanks

Michal
diff mbox

Patch

diff --git a/drivers/mmc/host/bcm2835.c b/drivers/mmc/host/bcm2835.c
index 229dc18f0581..ce05fe72f865 100644
--- a/drivers/mmc/host/bcm2835.c
+++ b/drivers/mmc/host/bcm2835.c
@@ -286,6 +286,7 @@  static void bcm2835_reset(struct mmc_host *mmc)
 
 	if (host->dma_chan)
 		dmaengine_terminate_sync(host->dma_chan);
+	host->dma_chan = NULL;
 	bcm2835_reset_internal(host);
 }
 
@@ -837,6 +838,8 @@  static void bcm2835_timeout(struct work_struct *work)
 		dev_err(dev, "timeout waiting for hardware interrupt.\n");
 		bcm2835_dumpregs(host);
 
+		bcm2835_reset(host->mmc);
+
 		if (host->data) {
 			host->data->error = -ETIMEDOUT;
 			bcm2835_finish_data(host);