mbox series

[RFC,0/6] mmc: renesas_sdhi: add support for DMA end irqs

Message ID 20221006190452.5316-1-wsa+renesas@sang-engineering.com (mailing list archive)
Headers show
Series mmc: renesas_sdhi: add support for DMA end irqs | expand

Message

Wolfram Sang Oct. 6, 2022, 7:04 p.m. UTC
Motivation for this series from patch 5:

===
So far, we have been relying on access_end interrupts only to mark DMA
transfers as done implying that DMA end interrupts have occurred by then
anyhow. On some SoCs under some conditions, this turned out to be not
enough. So, we enable DMA interrupts as well and make sure that both
events, DMA irq and access_end irq, have happened before finishing the
DMA transfer.
===

The first two patches are cleanups. For the rest, the basis were BSP
patches, but they have been refactored heavily, so they look very
different now:

* self-contained
  except for the new dma_irq callback which is for the TMIO core, all
  other code is self-contained in renesas_sdhi_internal_dmac now.
* concise
  Less new members for the existing structs, also code duplication was
  avoided by moving code to more suitable locations
* replaced the spinlock with atomic bit operators
* used quirk mechanism for the old INFO1 layout

Tested on a Salvator-X board with M3-W (r8a77960) and a Salvator-XS
board with M3-N (r8a77965). No regression encountered in functionality
and performance. A branch can be found here:

git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux.git renesas/sdhi/upport_dmaend


Here are excerpts of a log when booting the M3-N with patch 6 applied to
show that all combinations of incoming irqs are handled:

=== DMA first, Access second ===

          <idle>-0       [000] d.h..     0.505454: renesas_sdhi_internal_dmac_dma_irq: DMA end
          <idle>-0       [000] d.h..     0.505496: renesas_sdhi_internal_dmac_dataend_dma: Access end: 0
          <idle>-0       [000] ..s..     0.505498: renesas_sdhi_internal_dmac_complete_tasklet_fn: Tasklet

=== Access first, DMA second ===

     kworker/0:2-42      [000] d.h..     0.510603: renesas_sdhi_internal_dmac_dataend_dma: Access end: 0
     kworker/0:2-42      [000] d.h..     0.510605: renesas_sdhi_internal_dmac_dma_irq: DMA end
     kworker/0:2-42      [000] ..s..     0.510606: renesas_sdhi_internal_dmac_complete_tasklet_fn: Tasklet

=== Access first, Simulated DMA second ===

          <idle>-0       [000] d.H..     0.510635: renesas_sdhi_internal_dmac_dataend_dma: Access end: 0
          <idle>-0       [000] ..s..     0.510637: renesas_sdhi_internal_dmac_issue_tasklet_fn: Simulated DMA end
          <idle>-0       [000] ..s..     0.510638: renesas_sdhi_internal_dmac_complete_tasklet_fn: Tasklet

(I have never seen Simulated DMA (= CMD error happened) first, but it
should be handled like regular DMA end as well(tm).)

=== Access first, no DMA end needed because of DATA error (EILSEQ) ===

          <idle>-0       [000] d.H..     0.510894: renesas_sdhi_internal_dmac_dataend_dma: Access end: -84
          <idle>-0       [000] ..s..     0.510896: renesas_sdhi_internal_dmac_complete_tasklet_fn: Tasklet

===

I think this is as far as I can reach alone. I'd love to get review
comments and further testing. Shimoda-san, can you kindly ask the BSP
team to do further testing?

Thank you everyone and happy hacking,

   Wolfram


Wolfram Sang (6):
  mmc: renesas_sdhi: remove accessor function for internal_dmac
  mmc: renesas_sdhi: improve naming of DMA struct
  mmc: tmio: add callback for dma irq
  mmc: renesas_sdhi: add quirk for broken register layout
  mmc: renesas_sdhi: take DMA end interrupts into account
  DEBUG mmc: renesas_sdhi: debug end_flags for DMA

 drivers/mmc/host/renesas_sdhi.h               | 14 ++-
 drivers/mmc/host/renesas_sdhi_core.c          |  2 +-
 drivers/mmc/host/renesas_sdhi_internal_dmac.c | 86 ++++++++++++-------
 drivers/mmc/host/tmio_mmc.h                   |  1 +
 drivers/mmc/host/tmio_mmc_core.c              |  3 +
 5 files changed, 72 insertions(+), 34 deletions(-)

Comments

Ulf Hansson Oct. 25, 2022, 10:51 a.m. UTC | #1
Hi Wolfram,

On Thu, 6 Oct 2022 at 21:05, Wolfram Sang
<wsa+renesas@sang-engineering.com> wrote:
>
> Motivation for this series from patch 5:
>
> ===
> So far, we have been relying on access_end interrupts only to mark DMA
> transfers as done implying that DMA end interrupts have occurred by then
> anyhow. On some SoCs under some conditions, this turned out to be not
> enough. So, we enable DMA interrupts as well and make sure that both
> events, DMA irq and access_end irq, have happened before finishing the
> DMA transfer.

The tmio/sdhi core still relies on using tasklets. I think we should
strive to move away from tasklets for all mmc host drivers and to use
threaded irqs instead.

That said, I am worried that it might be harder to move away from
tasklets beyond $subject series, for tmio/sdhi, but I might be wrong?
So, I am wondering if it perhaps would be better to make that
modernization/conversion as the first step instead?

Kind regards
Uffe


> ===
>
> The first two patches are cleanups. For the rest, the basis were BSP
> patches, but they have been refactored heavily, so they look very
> different now:
>
> * self-contained
>   except for the new dma_irq callback which is for the TMIO core, all
>   other code is self-contained in renesas_sdhi_internal_dmac now.
> * concise
>   Less new members for the existing structs, also code duplication was
>   avoided by moving code to more suitable locations
> * replaced the spinlock with atomic bit operators
> * used quirk mechanism for the old INFO1 layout
>
> Tested on a Salvator-X board with M3-W (r8a77960) and a Salvator-XS
> board with M3-N (r8a77965). No regression encountered in functionality
> and performance. A branch can be found here:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux.git renesas/sdhi/upport_dmaend
>
>
> Here are excerpts of a log when booting the M3-N with patch 6 applied to
> show that all combinations of incoming irqs are handled:
>
> === DMA first, Access second ===
>
>           <idle>-0       [000] d.h..     0.505454: renesas_sdhi_internal_dmac_dma_irq: DMA end
>           <idle>-0       [000] d.h..     0.505496: renesas_sdhi_internal_dmac_dataend_dma: Access end: 0
>           <idle>-0       [000] ..s..     0.505498: renesas_sdhi_internal_dmac_complete_tasklet_fn: Tasklet
>
> === Access first, DMA second ===
>
>      kworker/0:2-42      [000] d.h..     0.510603: renesas_sdhi_internal_dmac_dataend_dma: Access end: 0
>      kworker/0:2-42      [000] d.h..     0.510605: renesas_sdhi_internal_dmac_dma_irq: DMA end
>      kworker/0:2-42      [000] ..s..     0.510606: renesas_sdhi_internal_dmac_complete_tasklet_fn: Tasklet
>
> === Access first, Simulated DMA second ===
>
>           <idle>-0       [000] d.H..     0.510635: renesas_sdhi_internal_dmac_dataend_dma: Access end: 0
>           <idle>-0       [000] ..s..     0.510637: renesas_sdhi_internal_dmac_issue_tasklet_fn: Simulated DMA end
>           <idle>-0       [000] ..s..     0.510638: renesas_sdhi_internal_dmac_complete_tasklet_fn: Tasklet
>
> (I have never seen Simulated DMA (= CMD error happened) first, but it
> should be handled like regular DMA end as well(tm).)
>
> === Access first, no DMA end needed because of DATA error (EILSEQ) ===
>
>           <idle>-0       [000] d.H..     0.510894: renesas_sdhi_internal_dmac_dataend_dma: Access end: -84
>           <idle>-0       [000] ..s..     0.510896: renesas_sdhi_internal_dmac_complete_tasklet_fn: Tasklet
>
> ===
>
> I think this is as far as I can reach alone. I'd love to get review
> comments and further testing. Shimoda-san, can you kindly ask the BSP
> team to do further testing?
>
> Thank you everyone and happy hacking,
>
>    Wolfram
>
>
> Wolfram Sang (6):
>   mmc: renesas_sdhi: remove accessor function for internal_dmac
>   mmc: renesas_sdhi: improve naming of DMA struct
>   mmc: tmio: add callback for dma irq
>   mmc: renesas_sdhi: add quirk for broken register layout
>   mmc: renesas_sdhi: take DMA end interrupts into account
>   DEBUG mmc: renesas_sdhi: debug end_flags for DMA
>
>  drivers/mmc/host/renesas_sdhi.h               | 14 ++-
>  drivers/mmc/host/renesas_sdhi_core.c          |  2 +-
>  drivers/mmc/host/renesas_sdhi_internal_dmac.c | 86 ++++++++++++-------
>  drivers/mmc/host/tmio_mmc.h                   |  1 +
>  drivers/mmc/host/tmio_mmc_core.c              |  3 +
>  5 files changed, 72 insertions(+), 34 deletions(-)
>
> --
> 2.35.1
>
Arnd Bergmann Oct. 25, 2022, 11:04 a.m. UTC | #2
On Tue, Oct 25, 2022, at 12:51, Ulf Hansson wrote:
> Hi Wolfram,
>
> On Thu, 6 Oct 2022 at 21:05, Wolfram Sang
> <wsa+renesas@sang-engineering.com> wrote:
>>
>> Motivation for this series from patch 5:
>>
>> ===
>> So far, we have been relying on access_end interrupts only to mark DMA
>> transfers as done implying that DMA end interrupts have occurred by then
>> anyhow. On some SoCs under some conditions, this turned out to be not
>> enough. So, we enable DMA interrupts as well and make sure that both
>> events, DMA irq and access_end irq, have happened before finishing the
>> DMA transfer.
>
> The tmio/sdhi core still relies on using tasklets. I think we should
> strive to move away from tasklets for all mmc host drivers and to use
> threaded irqs instead.
>
> That said, I am worried that it might be harder to move away from
> tasklets beyond $subject series, for tmio/sdhi, but I might be wrong?
> So, I am wondering if it perhaps would be better to make that
> modernization/conversion as the first step instead?

Moving away from tasklets is probably a good idea overall, but I'm
not sure that MMC actually needs a custom IRQ deferral mechanism
in addition to the existing BLOCK_SOFTIRQ. I would expect that block
drivers usually operate in the context of the blk_mq caller, and
adding in another thread context can add substantial latency.

    Arnd
Ulf Hansson Oct. 25, 2022, 12:32 p.m. UTC | #3
On Tue, 25 Oct 2022 at 13:04, Arnd Bergmann <arnd@arndb.de> wrote:
>
> On Tue, Oct 25, 2022, at 12:51, Ulf Hansson wrote:
> > Hi Wolfram,
> >
> > On Thu, 6 Oct 2022 at 21:05, Wolfram Sang
> > <wsa+renesas@sang-engineering.com> wrote:
> >>
> >> Motivation for this series from patch 5:
> >>
> >> ===
> >> So far, we have been relying on access_end interrupts only to mark DMA
> >> transfers as done implying that DMA end interrupts have occurred by then
> >> anyhow. On some SoCs under some conditions, this turned out to be not
> >> enough. So, we enable DMA interrupts as well and make sure that both
> >> events, DMA irq and access_end irq, have happened before finishing the
> >> DMA transfer.
> >
> > The tmio/sdhi core still relies on using tasklets. I think we should
> > strive to move away from tasklets for all mmc host drivers and to use
> > threaded irqs instead.
> >
> > That said, I am worried that it might be harder to move away from
> > tasklets beyond $subject series, for tmio/sdhi, but I might be wrong?
> > So, I am wondering if it perhaps would be better to make that
> > modernization/conversion as the first step instead?
>
> Moving away from tasklets is probably a good idea overall, but I'm
> not sure that MMC actually needs a custom IRQ deferral mechanism
> in addition to the existing BLOCK_SOFTIRQ. I would expect that block
> drivers usually operate in the context of the blk_mq caller, and
> adding in another thread context can add substantial latency.

Well, it's not that simple as it depends on what the MMC controller
supports too (and whether the MMC/SD card really conforms to the
specs). We can't poll for busy completions in IRQ context, for
example. And in some cases, we simply need to poll with a CMD13.

Don't get me wrong, I am not promoting a custom deferral mechanism for
MMC, but just a regular threaded IRQ handler (per host driver of
course) when that is needed. And most mmc host drivers already have
that today.

For more sophisticated HWs, we have the mmc hsq interface, that host
drivers can hook into, to potentially avoid some unnecessary context
switchings.

>
>     Arnd

Kind regards
Uffe
Wolfram Sang Nov. 2, 2022, 7:14 p.m. UTC | #4
Hi Ulf,

> The tmio/sdhi core still relies on using tasklets. I think we should
> strive to move away from tasklets for all mmc host drivers and to use
> threaded irqs instead.

Ooookay, noted. I'll put it on my todo-list. But frankly, I can't give
this priority for a while :/ SDHI is used on a lot of platforms, even on
different architectures. Regression testing will be a big one here.

> That said, I am worried that it might be harder to move away from
> tasklets beyond $subject series, for tmio/sdhi, but I might be wrong?
> So, I am wondering if it perhaps would be better to make that
> modernization/conversion as the first step instead?

As said above, I can't do this in the foreseeable future. However, this
series fixes an issue we see. So, my vote goes for applying this now.
Even if it costs some extra cycles on the tasklet-removal-work.

D'accord?

   Wolfram
Ulf Hansson Nov. 3, 2022, 3:04 p.m. UTC | #5
On Wed, 2 Nov 2022 at 20:14, Wolfram Sang
<wsa+renesas@sang-engineering.com> wrote:
>
> Hi Ulf,
>
> > The tmio/sdhi core still relies on using tasklets. I think we should
> > strive to move away from tasklets for all mmc host drivers and to use
> > threaded irqs instead.
>
> Ooookay, noted. I'll put it on my todo-list. But frankly, I can't give
> this priority for a while :/ SDHI is used on a lot of platforms, even on
> different architectures. Regression testing will be a big one here.
>
> > That said, I am worried that it might be harder to move away from
> > tasklets beyond $subject series, for tmio/sdhi, but I might be wrong?
> > So, I am wondering if it perhaps would be better to make that
> > modernization/conversion as the first step instead?
>
> As said above, I can't do this in the foreseeable future. However, this
> series fixes an issue we see. So, my vote goes for applying this now.
> Even if it costs some extra cycles on the tasklet-removal-work.
>
> D'accord?

Sure, I am fine with applying this. I just wanted to point out my long
term thoughts around using the tasklets.

Do you intend to send a new version to drop the RFC? Or can I apply as is?

Kind regards
Uffe
Wolfram Sang Nov. 3, 2022, 6:23 p.m. UTC | #6
> Sure, I am fine with applying this. I just wanted to point out my long
> term thoughts around using the tasklets.

Yes, I made an action item for that.

> Do you intend to send a new version to drop the RFC? Or can I apply as is?

It should have actually been named RFT. I'd like the BSP team to test it
if they have time. I will ping them.

Thanks, Ulf!
Yoshihiro Shimoda Nov. 18, 2022, 12:49 a.m. UTC | #7
Hi Wolfram-san,

> From: Wolfram Sang, Sent: Friday, November 4, 2022 3:24 AM
> 
> > Do you intend to send a new version to drop the RFC? Or can I apply as is?
> 
> It should have actually been named RFT. I'd like the BSP team to test it
> if they have time. I will ping them.

I'm sorry for delayed the response.
Our test team tested this patch series on R-Car E3 board and they didn't observe
any regression. So,

Tested-by: Duy Nguyen <duy.nguyen.rh@renesas.com>

And, I tested this on R-Car H3 board. So,

Tested-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>

Best regards,
Yoshihiro Shimoda

> Thanks, Ulf!
Ulf Hansson Nov. 18, 2022, 9:45 a.m. UTC | #8
On Thu, 6 Oct 2022 at 21:05, Wolfram Sang
<wsa+renesas@sang-engineering.com> wrote:
>
> Motivation for this series from patch 5:
>
> ===
> So far, we have been relying on access_end interrupts only to mark DMA
> transfers as done implying that DMA end interrupts have occurred by then
> anyhow. On some SoCs under some conditions, this turned out to be not
> enough. So, we enable DMA interrupts as well and make sure that both
> events, DMA irq and access_end irq, have happened before finishing the
> DMA transfer.
> ===
>
> The first two patches are cleanups. For the rest, the basis were BSP
> patches, but they have been refactored heavily, so they look very
> different now:
>
> * self-contained
>   except for the new dma_irq callback which is for the TMIO core, all
>   other code is self-contained in renesas_sdhi_internal_dmac now.
> * concise
>   Less new members for the existing structs, also code duplication was
>   avoided by moving code to more suitable locations
> * replaced the spinlock with atomic bit operators
> * used quirk mechanism for the old INFO1 layout
>
> Tested on a Salvator-X board with M3-W (r8a77960) and a Salvator-XS
> board with M3-N (r8a77965). No regression encountered in functionality
> and performance. A branch can be found here:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux.git renesas/sdhi/upport_dmaend
>
>
> Here are excerpts of a log when booting the M3-N with patch 6 applied to
> show that all combinations of incoming irqs are handled:
>
> === DMA first, Access second ===
>
>           <idle>-0       [000] d.h..     0.505454: renesas_sdhi_internal_dmac_dma_irq: DMA end
>           <idle>-0       [000] d.h..     0.505496: renesas_sdhi_internal_dmac_dataend_dma: Access end: 0
>           <idle>-0       [000] ..s..     0.505498: renesas_sdhi_internal_dmac_complete_tasklet_fn: Tasklet
>
> === Access first, DMA second ===
>
>      kworker/0:2-42      [000] d.h..     0.510603: renesas_sdhi_internal_dmac_dataend_dma: Access end: 0
>      kworker/0:2-42      [000] d.h..     0.510605: renesas_sdhi_internal_dmac_dma_irq: DMA end
>      kworker/0:2-42      [000] ..s..     0.510606: renesas_sdhi_internal_dmac_complete_tasklet_fn: Tasklet
>
> === Access first, Simulated DMA second ===
>
>           <idle>-0       [000] d.H..     0.510635: renesas_sdhi_internal_dmac_dataend_dma: Access end: 0
>           <idle>-0       [000] ..s..     0.510637: renesas_sdhi_internal_dmac_issue_tasklet_fn: Simulated DMA end
>           <idle>-0       [000] ..s..     0.510638: renesas_sdhi_internal_dmac_complete_tasklet_fn: Tasklet
>
> (I have never seen Simulated DMA (= CMD error happened) first, but it
> should be handled like regular DMA end as well(tm).)
>
> === Access first, no DMA end needed because of DATA error (EILSEQ) ===
>
>           <idle>-0       [000] d.H..     0.510894: renesas_sdhi_internal_dmac_dataend_dma: Access end: -84
>           <idle>-0       [000] ..s..     0.510896: renesas_sdhi_internal_dmac_complete_tasklet_fn: Tasklet
>
> ===
>
> I think this is as far as I can reach alone. I'd love to get review
> comments and further testing. Shimoda-san, can you kindly ask the BSP
> team to do further testing?
>
> Thank you everyone and happy hacking,
>
>    Wolfram
>
>
> Wolfram Sang (6):
>   mmc: renesas_sdhi: remove accessor function for internal_dmac
>   mmc: renesas_sdhi: improve naming of DMA struct
>   mmc: tmio: add callback for dma irq
>   mmc: renesas_sdhi: add quirk for broken register layout
>   mmc: renesas_sdhi: take DMA end interrupts into account
>   DEBUG mmc: renesas_sdhi: debug end_flags for DMA
>
>  drivers/mmc/host/renesas_sdhi.h               | 14 ++-
>  drivers/mmc/host/renesas_sdhi_core.c          |  2 +-
>  drivers/mmc/host/renesas_sdhi_internal_dmac.c | 86 ++++++++++++-------
>  drivers/mmc/host/tmio_mmc.h                   |  1 +
>  drivers/mmc/host/tmio_mmc_core.c              |  3 +
>  5 files changed, 72 insertions(+), 34 deletions(-)
>

Patch 1->5 applied for next, thanks!

Kind regards
Uffe
Wolfram Sang Nov. 18, 2022, 8:55 p.m. UTC | #9
> Our test team tested this patch series on R-Car E3 board and they didn't observe
> any regression. So,
> 
> Tested-by: Duy Nguyen <duy.nguyen.rh@renesas.com>
> 
> And, I tested this on R-Car H3 board. So,
> 
> Tested-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>

Thank you very much, Shimoda-san! Have a nice weekend.