From patchwork Fri Feb 20 17:19:33 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Robert Jarzmik X-Patchwork-Id: 5857471 Return-Path: X-Original-To: patchwork-linux-arm@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork1.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.136]) by patchwork1.web.kernel.org (Postfix) with ESMTP id 525D99F269 for ; Fri, 20 Feb 2015 17:22:22 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 6FDD820460 for ; Fri, 20 Feb 2015 17:22:20 +0000 (UTC) Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.9]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id E3E2820456 for ; Fri, 20 Feb 2015 17:22:17 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.80.1 #2 (Red Hat Linux)) id 1YOrFK-0007T6-VI; Fri, 20 Feb 2015 17:20:10 +0000 Received: from smtp08.smtpout.orange.fr ([80.12.242.130] helo=smtp.smtpout.orange.fr) by bombadil.infradead.org with esmtp (Exim 4.80.1 #2 (Red Hat Linux)) id 1YOrF8-0006vh-En for linux-arm-kernel@lists.infradead.org; Fri, 20 Feb 2015 17:20:02 +0000 Received: from beldin ([90.16.210.142]) by mwinf5d31 with ME id uhKZ1p00V34tC3e03hKZ2E; Fri, 20 Feb 2015 18:19:35 +0100 X-ME-Helo: beldin X-ME-Date: Fri, 20 Feb 2015 18:19:35 +0100 X-ME-IP: 90.16.210.142 From: Robert Jarzmik To: Daniel Mack , Vinod Koul , Robert Jarzmik Subject: [RFC] Redesign drivers/dma/mmp_pdma X-URL: http://belgarath.falguerolles.org/ Date: Fri, 20 Feb 2015 18:19:33 +0100 Message-ID: <8761aw8pqy.fsf@free.fr> User-Agent: Gnus/5.130008 (Ma Gnus v0.8) Emacs/24.3.92 (gnu/linux) MIME-Version: 1.0 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20150220_091958_986728_8E257CAC X-CRM114-Status: GOOD ( 23.59 ) X-Spam-Score: 0.0 (/) Cc: dmaengine@vger.kernel.org, Arnd Bergmann , linux-arm-kernel@lists.infradead.org X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org X-Spam-Status: No, score=-5.2 required=5.0 tests=BAYES_00,FREEMAIL_FROM, RCVD_IN_DNSWL_MED, RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Hi Daniel and Vinod, This is what I have in mind for the next evolution of mmp_pdma. Aside from the first parts of this document, I have questions in the second part I wish I had comments for. This is a pretty thick piece to injest, so take your time, I won't begin submitting anything around mmp_pdma anymore until we have a good discussion about it, at least with Daniel. The root of this goes back to 2013 (references in [1]), where we had a discussion between Daniel, Vinod and me, and where the conclusion was "DmaEngine slave API permits pxa_camera to work as before", it's just I didn't take the time to write down what was needed in the pxa dma driver to shift pxa_camera to dmaengine, given my knowledge of pxa_camera dma handling. Any comments are welcome, before I begin the actual work. Cheers. --- Robert ---8>--- From 1c0b943a0f58fb7eb1aac9c58a61abe21cb13b95 Mon Sep 17 00:00:00 2001 From: Robert Jarzmik Date: Fri, 20 Feb 2015 17:50:32 +0100 Subject: [PATCH] Documentation: dmaengine: mmp_pdma design Document the new design of the mmp_pdma dma driver. Signed-off-by: Robert Jarzmik --- Documentation/dmaengine/mmp_pdma.txt | 147 +++++++++++++++++++++++++++++++++++ 1 file changed, 147 insertions(+) create mode 100644 Documentation/dmaengine/mmp_pdma.txt + + 4) DmaEngine providing channels with properties + I need that the provided channel matches bandwidth criteria. It's not + doable to assign the most bandwidth capable channel to the first + requestor, but rather have the requestor ask for a "minimum" level of + priority (between low/medium/high). + + How is this supposed to be done ? diff --git a/Documentation/dmaengine/mmp_pdma.txt b/Documentation/dmaengine/mmp_pdma.txt new file mode 100644 index 0000000..78ce486 --- /dev/null +++ b/Documentation/dmaengine/mmp_pdma.txt @@ -0,0 +1,147 @@ +PXA/MMP - DMA Slave controller +============================== + +Constraints +----------- + a) Transfers hot queuing + A driver submitting a transfer and issuing it should be granted the transfer + is queued even on a running DMA channel. + This implies that the queuing doesn't wait for the previous transfer end, + and that the descriptor chaining is not only done in the irq/tasklet code + triggered by the end of the transfer. + + b) All transfers having asked for confirmation should be signaled + Any issued transfer with DMA_PREP_INTERRUPT should trigger a callback call. + This implies that even if an irq/tasklet is triggered by end of tx1, but + at the time of irq/dma tx2 is already finished, tx1->complete() and + tx2->complete() should be called. + + c) Channel residue calculation + A channel should be able to report how much advanced is a transfer. The + granularity is still [TBD]. + + d) Channel running state + A driver should be able to query if a channel is running or not. For the + multimedia case, such as video capture, if a transfer is submitted and then + a check of the DMA channel reports a "stopped channel", the transfer should + not be issued until the next "start of frame interrupt", hence the need to + know if a channel is in running or stopped state. + + e) Bandwidth guarantee + The PXA architecture has 3 levels of DMAs priorities : high, normal, low. + The high prorities get twice as much bandwith as the normal, which get twice + as much as the low priorities. + A driver should be able to request a priority, especially the real-time + ones such as pxa_camera with (big) throughputs. + + f) Transfer reusability + An issued and finished transfer should be "reusable". The choice of + "DMA_CTRL_ACK" should be left to the client, not the dma driver. + +Design +------ + a) Virtual channels + Same concept as in sa11x0 driver, ie. a driver was assigned a "virtual + channel" linked to the requestor line, and the physical DMA channel is + assigned on the fly when the transfer is issued. + + b) Transfer anatomy for a scatter-gather transfer + +------------+-----+---------------+----------------+-----------------+ + | desc-sg[0] | ... | desc-sg[last] | status updater | finisher/linker | + +------------+-----+---------------+----------------+-----------------+ + + This structure is pointed by dma->sg_cpu. + The descriptors are used as follows : + - desc-sg[i]: i-th descriptor, transferring the i-th sg + element to the video buffer scatter gather + - status updater + Transfers a single u32 to a well known dma coherent memory to leave + a trace that this transfer is done. The "well known" is unique per + physical channel, meaning that a read of this value will tell which + is the last finished transfer at that point in time. + - finisher: has ddadr=DADDR_STOP, dcmd=ENDIRQEN + - linker: has ddadr= desc-sg[0] of next transfer, dcmd=0 + + b) Transfers hot-chaining + Suppose the running chain is : + Buffer 1 Buffer 2 + +---------+----+---+ +----+----+----+---+ + | d0 | .. | dN | l | | d0 | .. | dN | f | + +---------+----+-|-+ ^----+----+----+---+ + | | + +----+ + + After a call to dma_async_issue_pending(), the chain will look like : + Buffer 1 Buffer 2 Buffer 3 + +---------+----+---+ +----+----+----+---+ +----+----+----+---+ + | d0 | .. | dN | l | | d0 | .. | dN | l | | d0 | .. | dN | f | + +---------+----+-|-+ ^----+----+----+-|-+ ^----+----+----+---+ + | | | | + +----+ +----+ + new_link + + If while new_link was created the DMA channel stopped, it is reactivated + with DDADR = Buffer3.d0. To know if it stopped because Buffer3 was already + taken care of, see below "transfers completion updater". + + c) Transfers completion updater + Each time a transfer is completed on a channel, an interrupt might be + generated or not, up to the client's request. But in each case, the last + descriptor of a transfer, the "status updater", will write the latest + transfer being completed into the physical channel's completion mark. + + This will speed up residue calculation, for large transfers such as video + buffers which hold around 6k descriptors or more. This also allows without + any lock to find out what is the latest completed transfer in a running + DMA chain. + + d) Transfers completion, irq and tasklet + When a transfer flagged as "DMA_PREP_INTERRUPT" is finished, the dma irq + is raised. Upon this interrupt, a tasklet is scheduled for the physical + channel. + The tasklet is responsible for : + - reading the physical channel last updater mark + - calling all the transfer callbacks of finished transfers, based on + that mark, and each transfer flags. + If a transfer is completed while this handling is done, a dma irq will + be raised, and the tasklet will be scheduled once again, having a new + updater mark. + + e) Residue + Residue granularity will be descriptor based. The issued but not completed + transfers will be scanned for all of their descriptors against the + currently running descriptor. + +--- +This below will be dropped from the submission when the RFC will become a true patch. + +Remaining questions: +------------------- + 1) Given the number of things to be changed, is it still worth patching + existing mmp_pdma or develop a new driver : the main changes will come + from virt-dma usage, and the irq/tasklet/hot chaining revamp ? + I'm expecting around 50% of code reuse, future will tell ... + + 2) Cyclic transfers and residue calculation + For this iteration, I'll leave cyclic transfers as they are. But I was + wondering if it wouldn't make sense to allocate dma descriptors for + the _unique_ running cyclic transfer in a page, so that all the descriptors + are contiguous. + Of course, this would limit the span of a cyclic transfer to : + 4096 / (32 * 4) * 4096 = 2097152 bytes + On the other hand, the residue calculation changes from a chained list + traversal to a simple (DDADR - desc0) / (32 * 4) * 4096, assuming all + scatter gather elements have the same 4k length ... + + Now the question is, will it be worth it ? + + 3) DmaEngine and DMA_CTRL_ACK : this one is for Dan/Vinod + In Documentation/dmaengine/provider.txt, it is written that "DMA_CTRL_ACK": + "No one really has an idea of what it's about". + Actually that fits exactly one of my requirements : being able to reuse a + transfer without having to recalculate all the descriptors for it. The + rationale behind is that for video buffers, of 6k descriptors, the chain + is reused when the transfer is submitted/issued again, without having to + re-allocate and compose the whole descirptor links. + + Is my understanding of what DMA_CTRL_ACK should mean correct ?