From patchwork Thu Sep 22 08:51:37 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: =?utf-8?q?Krzysztof_Ha=C5=82asa?= X-Patchwork-Id: 9344785 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id BC336601C2 for ; Thu, 22 Sep 2016 08:51:47 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id ADC642A8E3 for ; Thu, 22 Sep 2016 08:51:47 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id A1D4E2A8E7; Thu, 22 Sep 2016 08:51:47 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00, RCVD_IN_DNSWL_HI, WEIRD_PORT autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 9D14C2A8E3 for ; Thu, 22 Sep 2016 08:51:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932335AbcIVIvn (ORCPT ); Thu, 22 Sep 2016 04:51:43 -0400 Received: from ni.piap.pl ([195.187.100.4]:32992 "EHLO ni.piap.pl" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754916AbcIVIvm (ORCPT ); Thu, 22 Sep 2016 04:51:42 -0400 Received: from t19.piap.pl (OSB1819.piap.pl [10.0.9.19]) by ni.piap.pl (Postfix) with ESMTP id E520D4421E1; Thu, 22 Sep 2016 10:51:37 +0200 (CEST) From: khalasa@piap.pl (Krzysztof =?utf-8?Q?Ha=C5=82asa?=) To: Andrey Utkin Cc: Hans Verkuil , Andrey Utkin , linux-kernel@vger.kernel.org, linux-media@vger.kernel.org, Mauro Carvalho Chehab , Hans Verkuil , Ismael Luceno , Bluecherry Maintainers Subject: Re: solo6010 modprobe lockup since e1ceb25a (v4.3 regression) References: <20160915130441.ji3f3jiiebsnsbct@acer> <9cbb2079-f705-5312-d295-34bc3c8dadb9@xs4all.nl> <20160921134554.s3tdolyej6r2w5wh@zver> Date: Thu, 22 Sep 2016 10:51:37 +0200 In-Reply-To: <20160921134554.s3tdolyej6r2w5wh@zver> (Andrey Utkin's message of "Wed, 21 Sep 2016 16:45:54 +0300") Message-ID: MIME-Version: 1.0 X-KLMS-Rule-ID: 1 X-KLMS-Message-Action: clean X-KLMS-AntiSpam-Lua-Profiles: 103156 [Sep 22 2016] X-KLMS-AntiSpam-Version: 5.6.0.28 X-KLMS-AntiSpam-Envelope-From: khalasa@piap.pl X-KLMS-AntiSpam-Rate: 0 X-KLMS-AntiSpam-Status: not_detected X-KLMS-AntiSpam-Method: none X-KLMS-AntiSpam-Info: LuaCore: 112 112 a2b0103ce64cfdf8d45a470d2c6a59d52eef415f, {Dosetcrawler: probable amspam}, Auth:dkim=none X-KLMS-AntiSpam-Interceptor-Info: scan successful X-KLMS-AntiPhishing: Clean, 2016/09/21 14:56:51 X-KLMS-AntiVirus: Kaspersky Security 8.0 for Linux Mail Server, version 8.0.1.721, bases: 2016/09/22 02:29:00 #8014189 X-KLMS-AntiVirus-Status: Clean, skipped Sender: linux-media-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-media@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Andrey Utkin writes: > It happens in solo_disp_init at uploading default motion thresholds > array. > > I've got a prints trace with solo6010-fix-lockup branch > https://github.com/bluecherrydvr/linux/tree/solo6010-fix-lockup/drivers/media/pci/solo6x10 > the trace itself in jpg: > https://decent.im:5281/upload/3793f393-e285-4514-83dd-bf08d1c8b4a2/e7ad898b-515b-4522-86a9-553daaeb0860.jpg solo_motion_config() uses BM DMA and thus generates IRQ, this may be indeed the ISR problem. BTW the IRQ debugging ("kernel hacking") should catch it. OTOH programming the DMA can be guilty as well. I wonder if the following fixes the problem (completely untested). > Indeed, targeted fixing would be more reasonable than making register > r/w routines follow blocking fashion. But the driver is already complete > and was known to be working, and I seems all places in code assume the > blocking fashion of reg r/w, and changing that assumption may lead to > covert bugs anywhere else, not just at probing, which may be hard to > nail down. The driver code doesn't have to assume anything about posted writes - except at very specific places (as explained by Alan). Normally, a CPU write to a register doesn't have to be flushed right away. It would be much slower, especially if used extensively. Nobody does anything alike since the end of the ISA bus. The driver (and the card) can still see all operations in correct order, in both cases. The potential problem is a write being held in a buffer (and not making it to the actual hardware). This may happen in ISR since the actual write is deactivates the physical IRQ line. Otherwise the ISR terminates and is immediately requested again - though this second call should bring the IRQ down by reading the register (thus flushing the write buffer) - so, while not very effective, it shouldn't lock up (but it's a real bug worth fixing). Also, I imagine a write to the DMA registers can be posted and the DMA may not start in time. This shouldn't end in a lock up, either. Perhaps a different bug is involved. The other thing is BM DMA (card->RAM). All DMA transfers (initiated by the card) are completed with an IRQ (either with success or failure). This is potentially a problem as well, though it has nothing to do with the patch in question. I guess the SOLO reads some descriptors or something, and such writes are flushed this way. > For now, I'll try setting pci_read_config_word() back instead of full > revert. Does it need to be just in reg_write? No need for it in > reg_read, right? Sure, reg_read() doesn't write to the device. It the patch doesn't fix the problem, what CPU and chipset are used by the computer which exhibits the issue? Perhaps I have something similar here and can reproduce it. diff --git a/drivers/media/pci/solo6x10/solo6x10-core.c b/drivers/media/pci/solo6x10/solo6x10-core.c index f50d072..2d4900e 100644 --- a/drivers/media/pci/solo6x10/solo6x10-core.c +++ b/drivers/media/pci/solo6x10/solo6x10-core.c @@ -99,6 +99,7 @@ static irqreturn_t solo_isr(int irq, void *data) { struct solo_dev *solo_dev = data; u32 status; + u16 tmp; int i; status = solo_reg_read(solo_dev, SOLO_IRQ_STAT); @@ -129,6 +130,7 @@ static irqreturn_t solo_isr(int irq, void *data) if (status & SOLO_IRQ_G723) solo_g723_isr(solo_dev); + pci_read_config_word(solo_dev->pdev, PCI_STATUS, &tmp) // flush write to SOLO_IRQ_STAT return IRQ_HANDLED; } diff --git a/drivers/media/pci/solo6x10/solo6x10-p2m.c b/drivers/media/pci/solo6x10/solo6x10-p2m.c index 07c4e07..8a51d45 100644 --- a/drivers/media/pci/solo6x10/solo6x10-p2m.c +++ b/drivers/media/pci/solo6x10/solo6x10-p2m.c @@ -70,6 +70,7 @@ int solo_p2m_dma_desc(struct solo_dev *solo_dev, unsigned int config = 0; int ret = 0; int p2m_id = 0; + u16 tmp; /* Get next ID. According to Softlogic, 6110 has problems on !=0 P2M */ if (solo_dev->type != SOLO_DEV_6110 && multi_p2m) { @@ -111,6 +112,7 @@ int solo_p2m_dma_desc(struct solo_dev *solo_dev, desc[1].ctrl); } + pci_read_config_word(solo_dev->pdev, PCI_STATUS, &tmp); // flush writes timeout = wait_for_completion_timeout(&p2m_dev->completion, solo_dev->p2m_jiffies);