[V17,2/3] dmaengine: qcom_hidma: add debugfs hooks

Message ID 1460384473-5775-3-git-send-email-okaya@codeaurora.org (mailing list archive)
State New, archived

Commit Message

Sinan Kaya April 11, 2016, 2:21 p.m. UTC
Add debugfs hooks for debugging the execution behavior of the DMA
channel. The debugfs hooks get initialized by the probe function and
uninitialized by the remove function.

A stats file is created in debugfs. The stats file will show the
information about each HIDMA channel as well as each asynchronous job
queued and completed at a given time.
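
As a rough illustration of the resulting layout, the userspace sketch below
builds the path of a per-channel stats file. It assumes debugfs is mounted at
the conventional /sys/kernel/debug. hidma_chan_stats_path() is an illustrative
helper and not part of this patch; the patch itself names the top-level
directory after dev_name() of the DMA device and each channel directory
"chan<N>".

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Conventional debugfs mount point (an assumption; it can be mounted elsewhere). */
#define DEBUGFS_ROOT "/sys/kernel/debug"

/*
 * Build "<debugfs>/<device name>/chan<N>/stats" into buf.
 * Returns 0 on success, -1 if buf is too small.
 */
static int hidma_chan_stats_path(char *buf, size_t len,
				 const char *dev_name, int chidx)
{
	int n;

	assert(buf && dev_name);
	n = snprintf(buf, len, "%s/%s/chan%d/stats",
		     DEBUGFS_ROOT, dev_name, chidx);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

For a device whose dev_name() is "hidma0" (a hypothetical name), channel 2
would map to /sys/kernel/debug/hidma0/chan2/stats.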

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/dma/qcom/Makefile    |   2 +-
 drivers/dma/qcom/hidma.c     |   5 +-
 drivers/dma/qcom/hidma.h     |   2 +
 drivers/dma/qcom/hidma_dbg.c | 219 +++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 226 insertions(+), 2 deletions(-)
 create mode 100644 drivers/dma/qcom/hidma_dbg.c

Comments

Vinod Koul April 26, 2016, 3:30 a.m. UTC | #1
On Mon, Apr 11, 2016 at 10:21:12AM -0400, Sinan Kaya wrote:

> +static int hidma_chan_stats(struct seq_file *s, void *unused)
> +{
> +	struct hidma_chan *mchan = s->private;
> +	struct hidma_desc *mdesc;
> +	struct hidma_dev *dmadev = mchan->dmadev;
> +
> +	pm_runtime_get_sync(dmadev->ddev.dev);

debug shouldn't power up device, why do you want to do that
Sinan Kaya April 26, 2016, 12:08 p.m. UTC | #2
On 2016-04-25 23:30, Vinod Koul wrote:
> On Mon, Apr 11, 2016 at 10:21:12AM -0400, Sinan Kaya wrote:
> 
>> +static int hidma_chan_stats(struct seq_file *s, void *unused)
>> +{
>> +	struct hidma_chan *mchan = s->private;
>> +	struct hidma_desc *mdesc;
>> +	struct hidma_dev *dmadev = mchan->dmadev;
>> +
>> +	pm_runtime_get_sync(dmadev->ddev.dev);
> 
> debug shouldn't power up device, why do you want to do that


Clocks are turned off while the hw is idle. I can’t reach hw registers 
without restoring power.
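
The get/put bracket being discussed can be modelled in plain userspace C.
This is only a toy sketch: struct toy_dev and the toy_* helpers are invented
for illustration and are not kernel APIs, and unlike
pm_runtime_put_autosuspend() the toy "put" powers down immediately instead of
after a delay.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a runtime-PM managed device (not the kernel's struct device). */
struct toy_dev {
	int usage_count;		/* outstanding "get" references */
	bool powered;			/* true while the clocks are on */
	unsigned int status_reg;	/* only readable while powered */
};

/* Take a reference and power up on the 0 -> 1 transition. */
static void toy_pm_get_sync(struct toy_dev *d)
{
	if (d->usage_count++ == 0)
		d->powered = true;
}

/* Drop a reference and power down once nobody holds one. */
static void toy_pm_put(struct toy_dev *d)
{
	assert(d->usage_count > 0);
	if (--d->usage_count == 0)
		d->powered = false;
}

/* Register read that fails (returns false) while the clocks are off. */
static bool toy_read_status(const struct toy_dev *d, unsigned int *val)
{
	if (!d->powered)
		return false;
	*val = d->status_reg;
	return true;
}

/* Debug dump bracketed by get/put, the same shape as hidma_chan_stats(). */
static bool toy_debug_dump(struct toy_dev *d, unsigned int *val)
{
	bool ok;

	toy_pm_get_sync(d);
	ok = toy_read_status(d, val);
	toy_pm_put(d);
	return ok;
}
```

Reading the register without the bracket fails on an idle device, while the
bracketed dump succeeds and leaves the device idle again afterwards.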
Vinod Koul April 26, 2016, 4:25 p.m. UTC | #3
On Tue, Apr 26, 2016 at 08:08:16AM -0400, okaya@codeaurora.org wrote:
> Clocks are turned off while the hw is idle. I can’t reach hw
> registers without restoring power.

Hmm, have you thought about using regmap?
Sinan Kaya April 26, 2016, 4:55 p.m. UTC | #4
On 4/26/2016 12:25 PM, Vinod Koul wrote:
> Hmm, have you thought about using regmap?
> 

To be honest, I didn't know what regmap is but I just read some code
and looked at how it is used. Feel free to correct me if I got it 
wrong. 

Regmap seems to be designed for *slow* speed peripherals to improve frequent
accesses by the SW. It looks like it is used by MFD, SPI and I2C drivers.

It seems to cache the register contents and flush/invalidate them only when
needed.

The MMIO version seems to be assuming the presence of device-tree like CLK
API which doesn't exist on ACPI systems and is not portable.

My reaction is that it is a lot of code with no added functionality to what
HIDMA driver is trying to achieve. 

Given that the use case here is only for debug purposes; I think it is OK 
to keep this runtime call here. I don't want to add any overhead into the
existing code just to support the debug use case.  

None of my register read/writes are slow. This file will only be used to 
troubleshoot customer issues.
Vinod Koul April 27, 2016, 8:15 a.m. UTC | #5
On Tue, Apr 26, 2016 at 12:55:18PM -0400, Sinan Kaya wrote:
> To be honest, I didn't know what regmap is but I just read some code
> and looked at how it is used. Feel free to correct me if I got it 
> wrong. 
> 
> Regmap seems to be designed for *slow* speed peripherals to improve frequent
> accesses by the SW. It looks like it is used by MFD, SPI and I2C drivers.
> 
> It seems to cache the register contents and flush/invalidate them only when
> needed.
> 
> The MMIO version seems to be assuming the presence of device-tree like CLK
> API which doesn't exist on ACPI systems and is not portable.
> 
> My reaction is that it is a lot of code with no added functionality to what
> HIDMA driver is trying to achieve. 
> 
> Given that the use case here is only for debug purposes; I think it is OK 
> to keep this runtime call here. I don't want to add any overhead into the
> existing code just to support the debug use case.  
> 
> None of my register read/writes are slow. This file will only be used to 
> troubleshoot customer issues.

$ is always faster than MMIO. This way you can give reg contents to users
without waking up hw.

Also we at Intel use regmap on ACPI systems without CLK API
Marc Zyngier April 27, 2016, 8:47 a.m. UTC | #6
On 27/04/16 09:15, Vinod Koul wrote:
> On Tue, Apr 26, 2016 at 12:55:18PM -0400, Sinan Kaya wrote:
>> To be honest, I didn't know what regmap is but I just read some code
>> and looked at how it is used. Feel free to correct me if I got it 
>> wrong. 
>>
>> Regmap seems to be designed for *slow* speed peripherals to improve frequent
>> accesses by the SW. It looks like it is used by MFD, SPI and I2C drivers.
>>
>> It seems to cache the register contents and flush/invalidate them only when
>> needed.
>>
>> The MMIO version seems to be assuming the presence of device-tree like CLK
>> API which doesn't exist on ACPI systems and is not portable.
>>
>> My reaction is that it is a lot of code with no added functionality to what
>> HIDMA driver is trying to achieve. 
>>
>> Given that the use case here is only for debug purposes; I think it is OK 
>> to keep this runtime call here. I don't want to add any overhead into the
>> existing code just to support the debug use case.  
>>
>> None of my register read/writes are slow. This file will only be used to 
>> troubleshoot customer issues.

I'd recommend you actually run perf on a few of your MMIO accesses. I
believe the result will be eye opening. On the KVM side, we've trimmed
our MMIO access as much as possible, using a memory-based cache (similar
to regmap in concept). This has made some code paths about 40% faster.

> $ is always faster than MMIO. This way you can give reg contents to users
> without waking up hw.

Indeed. MMIO access sucks rocks, even on a very fast box. Actually, the
faster the box is, the slower MMIO feels (compared to memory).

Thanks,

	M.
Sinan Kaya April 27, 2016, 12:51 p.m. UTC | #7
On 2016-04-27 04:15, Vinod Koul wrote:
> $ is always faster than MMIO. This way you can give reg contents to 
> users
> without waking up hw.
> 
> Also we at Intel use regmap on ACPI systems without CLK API

I can try it and see what the performance impact is. What happens to
registers that the hw updates, like status registers? Those will be the
most interesting during debug. How does regmap get updated for those? Is
there a way to tell it not to cache certain registers?
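
The caching question can be illustrated with a small userspace model. This is
a sketch, not regmap code: the toy_* names and the register numbers are
invented for the example. regmap itself exposes a comparable hook, the
volatile_reg callback in struct regmap_config, which marks registers that
must always be read from the hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NREGS	4
#define TOY_REG_CTRL	0	/* written by software only: safe to cache */
#define TOY_REG_STATUS	1	/* updated by hardware: must not be cached */

struct toy_regs {
	uint32_t hw[TOY_NREGS];		/* stands in for the MMIO registers */
	uint32_t cache[TOY_NREGS];	/* software copy */
	bool valid[TOY_NREGS];		/* which cache entries are populated */
	unsigned int hw_reads;		/* reads that actually reached "hardware" */
};

/* Predicate naming the registers that bypass the cache. */
static bool toy_volatile_reg(unsigned int reg)
{
	return reg == TOY_REG_STATUS;
}

static uint32_t toy_reg_read(struct toy_regs *r, unsigned int reg)
{
	assert(reg < TOY_NREGS);
	if (!toy_volatile_reg(reg) && r->valid[reg])
		return r->cache[reg];	/* served from memory, hardware untouched */

	r->hw_reads++;
	if (!toy_volatile_reg(reg)) {
		r->cache[reg] = r->hw[reg];
		r->valid[reg] = true;
	}
	return r->hw[reg];
}

static void toy_reg_write(struct toy_regs *r, unsigned int reg, uint32_t val)
{
	assert(reg < TOY_NREGS);
	r->hw[reg] = val;
	if (!toy_volatile_reg(reg)) {
		r->cache[reg] = val;	/* write-through keeps the cache coherent */
		r->valid[reg] = true;
	}
}
```

In this model a control register written by software is read back without
touching the hardware, while every status read still goes to the hardware,
which for HIDMA would still require the clocks to be on.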
Sinan Kaya April 27, 2016, 1:25 p.m. UTC | #8
On 2016-04-27 04:47, Marc Zyngier wrote:
> On 27/04/16 09:15, Vinod Koul wrote:
> I'd recommend you actually run perf on a few of your MMIO accesses. I
> believe the result will be eye opening. On the KVM side, we've trimmed
> our MMIO access as much as possible, using a memory-based cache (similar
> to regmap in concept). This has made some code paths about 40% faster.
> 
>> $ is always faster than MMIO. This way you can give reg contents to users
>> without waking up hw.
> 
> Indeed. MMIO access sucks rocks, even on a very fast box. Actually, the
> faster the box is, the slower MMIO feels (compared to memory).
> 
> Thanks,
> 
> 	M.


Agreed. However, I need to understand how regmap really works under the 
covers and whether it is compatible with the hardware.
Sinan Kaya May 1, 2016, 4:35 a.m. UTC | #9
On 4/27/2016 8:51 AM, okaya@codeaurora.org wrote:
> I can try it and see what the performance impact is. What happens to
> registers that the hw updates, like status registers? Those will be the
> most interesting during debug. How does regmap get updated for those? Is
> there a way to tell it not to cache certain registers?

My evaluation turned out negative. The regmap code is nice for bus-like
peripherals such as I2C and SPI, where everything is bitwise accessed. That
is not the case in this code.

Regmap is a nice tool if used properly, but that doesn't mean it will work
in every single case. It doesn't match the goal of this driver.

As soon as I abstract register accesses, the regmap code accesses all MMIO
registers with the readl/writel variant functions.

Barriers are really expensive on ARM. I paid very special attention in the
code to decide when to use the relaxed version vs. the readl version. I
would lose all of this optimization.

Since the clocks are restored only in the debug case, I don't see any
problems here. It is not worth the effort to redo the whole thing and
introduce errors, as I see a lot of tripping points like the regmap_sync
variants.
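
The relaxed vs. ordered distinction can be sketched with C11 atomics in
userspace. This is only an analogy under stated assumptions: the toy_*
helpers are invented here, and a C11 acquire fence on normal memory is not
the same thing as the I/O barrier the kernel's readl() adds on top of
readl_relaxed(). The sketch just shows the shape of the trade-off, where the
ordered read is the relaxed read plus a barrier.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Relaxed read: just the load, no ordering against later accesses. */
static inline uint32_t toy_read_relaxed(_Atomic uint32_t *reg)
{
	return atomic_load_explicit(reg, memory_order_relaxed);
}

/* Ordered read: the same load followed by a barrier, which is the costly part. */
static inline uint32_t toy_read_ordered(_Atomic uint32_t *reg)
{
	uint32_t v = atomic_load_explicit(reg, memory_order_relaxed);

	atomic_thread_fence(memory_order_acquire);
	return v;
}

/*
 * Read n words, paying for one barrier at the end instead of one per word.
 * Choosing where that single barrier goes is the kind of hand tuning that
 * is lost when every access goes through the ordered accessor.
 */
static uint32_t toy_sum_regs(_Atomic uint32_t *regs, unsigned int n)
{
	uint32_t sum = 0;
	unsigned int i;

	assert(regs);
	for (i = 0; i < n; i++)
		sum += toy_read_relaxed(&regs[i]);
	atomic_thread_fence(memory_order_acquire);
	return sum;
}
```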
Vinod Koul May 2, 2016, 9:25 a.m. UTC | #10
On Sun, May 01, 2016 at 12:35:37AM -0400, Sinan Kaya wrote:
> My evaluation turned out negative. The regmap code is nice for bus like peripherals
> like I2C and SPI where everything is bitwise accessed. This is not the case
> in this code. 

I do not entirely agree with the statements here; it does give a big benefit
on our systems with MMIO. I am going to ask Mark to comment on this; he
knows better and understands ARM.

I am probably going to be okay with this not using regmap, since it is debug,
but you should give it a try in the future for better performance, and of
course you can add to regmap to get a better model for your device.

> Regmap is a nice tool if used properly but it doesn't mean that it will work
> in every single case. It doesn't match with the goal of this driver. 
> 
> As soon as I abstract register accesses, the regmap code writes all MMIO registers
> with the readl variant functions.
> 
> Barriers are really expensive on ARM. I paid very special attention in the 
> code to decide when to use relaxed version vs. the readl version. I lose 
> all of this optimization. 
> 
> Since the clocks are restored only during the debug case, I don't see any
> problems here. It is not worth the effort to redo the whole thing and 
> introduce errors as I see a lot of tripping points like regmap_sync variants.
Mark Brown May 2, 2016, 10:40 a.m. UTC | #11
On Mon, May 02, 2016 at 02:55:52PM +0530, Vinod Koul wrote:
> On Sun, May 01, 2016 at 12:35:37AM -0400, Sinan Kaya wrote:

> > My evaluation turned out negative. The regmap code is nice for bus like peripherals
> > like I2C and SPI where everything is bitwise accessed. This is not the case
> > in this code. 

> I do not entirely agree with the statements here, it does give big benefit
> on our systems with MMIO. I am going to ask Mark to comment on this, he
> knows better and understands ARM.

> I am probably going to be okay with this not using regmap and it is debug
> but you should give that a try in future for better performance and of course
> you can add to regmap to get a better model for your device

I've no idea what "this" is, sorry.  All I've got here is an enormous
backtrace with a bunch of the messages not even word wrapped.

Please be aware that I get CCed on so much irrelevant crap that copying
me into the middle of a thread about some other subsystem is likely to
get missed - almost always it's review of some patch I've got no
interest in.

> > Barriers are really expensive on ARM. I paid very special attention in the 
> > code to decide when to use relaxed version vs. the readl version. I lose 
> > all of this optimization. 

Drivers should not be using the relaxed accessors, they are there for
the generic code to build on not for drivers and they really need the
cache flush operations.  Getting the cache flush operations right,
especially on ARM, isn't easy and needs detailed review.
Patch

diff --git a/drivers/dma/qcom/Makefile b/drivers/dma/qcom/Makefile
index 6bf9267..4bfc38b 100644
--- a/drivers/dma/qcom/Makefile
+++ b/drivers/dma/qcom/Makefile
@@ -2,4 +2,4 @@  obj-$(CONFIG_QCOM_BAM_DMA) += bam_dma.o
 obj-$(CONFIG_QCOM_HIDMA_MGMT) += hdma_mgmt.o
 hdma_mgmt-objs	 := hidma_mgmt.o hidma_mgmt_sys.o
 obj-$(CONFIG_QCOM_HIDMA) +=  hdma.o
-hdma-objs        := hidma_ll.o hidma.o
+hdma-objs        := hidma_ll.o hidma.o hidma_dbg.o
diff --git a/drivers/dma/qcom/hidma.c b/drivers/dma/qcom/hidma.c
index f8960f1..8972508 100644
--- a/drivers/dma/qcom/hidma.c
+++ b/drivers/dma/qcom/hidma.c
@@ -1,7 +1,7 @@ 
 /*
  * Qualcomm Technologies HIDMA DMA engine interface
  *
- * Copyright (c) 2015, The Linux Foundation. All rights reserved.
+ * Copyright (c) 2015-2016, The Linux Foundation. All rights reserved.
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 and
@@ -681,6 +681,7 @@  static int hidma_probe(struct platform_device *pdev)
 
 	dmadev->irq = chirq;
 	tasklet_init(&dmadev->task, hidma_issue_task, (unsigned long)dmadev);
+	hidma_debug_init(dmadev);
 	dev_info(&pdev->dev, "HI-DMA engine driver registration complete\n");
 	platform_set_drvdata(pdev, dmadev);
 	pm_runtime_mark_last_busy(dmadev->ddev.dev);
@@ -689,6 +690,7 @@  static int hidma_probe(struct platform_device *pdev)
 	return 0;
 
 uninit:
+	hidma_debug_uninit(dmadev);
 	hidma_ll_uninit(dmadev->lldev);
 dmafree:
 	if (dmadev)
@@ -706,6 +708,7 @@  static int hidma_remove(struct platform_device *pdev)
 	pm_runtime_get_sync(dmadev->ddev.dev);
 	dma_async_device_unregister(&dmadev->ddev);
 	devm_free_irq(dmadev->ddev.dev, dmadev->irq, dmadev->lldev);
+	hidma_debug_uninit(dmadev);
 	hidma_ll_uninit(dmadev->lldev);
 	hidma_free(dmadev);
 
diff --git a/drivers/dma/qcom/hidma.h b/drivers/dma/qcom/hidma.h
index c5eea65..22806a2 100644
--- a/drivers/dma/qcom/hidma.h
+++ b/drivers/dma/qcom/hidma.h
@@ -157,4 +157,6 @@  int hidma_ll_uninit(struct hidma_lldev *llhndl);
 irqreturn_t hidma_ll_inthandler(int irq, void *arg);
 void hidma_cleanup_pending_tre(struct hidma_lldev *llhndl, u8 err_info,
 				u8 err_code);
+int hidma_debug_init(struct hidma_dev *dmadev);
+void hidma_debug_uninit(struct hidma_dev *dmadev);
 #endif
diff --git a/drivers/dma/qcom/hidma_dbg.c b/drivers/dma/qcom/hidma_dbg.c
new file mode 100644
index 0000000..68e779e
--- /dev/null
+++ b/drivers/dma/qcom/hidma_dbg.c
@@ -0,0 +1,219 @@ 
+/*
+ * Qualcomm Technologies HIDMA debug file
+ *
+ * Copyright (c) 2015-2016, The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/debugfs.h>
+#include <linux/device.h>
+#include <linux/list.h>
+#include <linux/pm_runtime.h>
+
+#include "hidma.h"
+
+static void hidma_ll_chstats(struct seq_file *s, void *llhndl, u32 tre_ch)
+{
+	struct hidma_lldev *lldev = llhndl;
+	struct hidma_tre *tre;
+	u32 length;
+	dma_addr_t src_start;
+	dma_addr_t dest_start;
+	u32 *tre_local;
+
+	if (tre_ch >= lldev->nr_tres) {
+		dev_err(lldev->dev, "invalid TRE number in chstats:%u\n", tre_ch);
+		return;
+	}
+	tre = &lldev->trepool[tre_ch];
+	seq_printf(s, "------Channel %u -----\n", tre_ch);
+	seq_printf(s, "allocated=%d\n", atomic_read(&tre->allocated));
+	seq_printf(s, "queued = 0x%x\n", tre->queued);
+	seq_printf(s, "err_info = 0x%x\n",
+		   lldev->tx_status_list[tre->idx].err_info);
+	seq_printf(s, "err_code = 0x%x\n",
+		   lldev->tx_status_list[tre->idx].err_code);
+	seq_printf(s, "status = 0x%x\n", tre->status);
+	seq_printf(s, "idx = 0x%x\n", tre->idx);
+	seq_printf(s, "dma_sig = 0x%x\n", tre->dma_sig);
+	seq_printf(s, "dev_name=%s\n", tre->dev_name);
+	seq_printf(s, "callback=%p\n", tre->callback);
+	seq_printf(s, "data=%p\n", tre->data);
+	seq_printf(s, "tre_index = 0x%x\n", tre->tre_index);
+
+	tre_local = &tre->tre_local[0];
+	src_start = tre_local[HIDMA_TRE_SRC_LOW_IDX];
+	src_start = ((u64) (tre_local[HIDMA_TRE_SRC_HI_IDX]) << 32) + src_start;
+	dest_start = tre_local[HIDMA_TRE_DEST_LOW_IDX];
+	dest_start += ((u64) (tre_local[HIDMA_TRE_DEST_HI_IDX]) << 32);
+	length = tre_local[HIDMA_TRE_LEN_IDX];
+
+	seq_printf(s, "src=%pad\n", &src_start);
+	seq_printf(s, "dest=%pad\n", &dest_start);
+	seq_printf(s, "length = 0x%x\n", length);
+}
+
+static void hidma_ll_devstats(struct seq_file *s, void *llhndl)
+{
+	struct hidma_lldev *lldev = llhndl;
+
+	seq_puts(s, "------Device -----\n");
+	seq_printf(s, "lldev init = 0x%x\n", lldev->initialized);
+	seq_printf(s, "trch_state = 0x%x\n", lldev->trch_state);
+	seq_printf(s, "evch_state = 0x%x\n", lldev->evch_state);
+	seq_printf(s, "chidx = 0x%x\n", lldev->chidx);
+	seq_printf(s, "nr_tres = 0x%x\n", lldev->nr_tres);
+	seq_printf(s, "trca=%p\n", lldev->trca);
+	seq_printf(s, "tre_ring=%p\n", lldev->tre_ring);
+	seq_printf(s, "tre_ring_handle=%pap\n", &lldev->tre_ring_handle);
+	seq_printf(s, "tre_ring_size = 0x%x\n", lldev->tre_ring_size);
+	seq_printf(s, "tre_processed_off = 0x%x\n", lldev->tre_processed_off);
+	seq_printf(s, "pending_tre_count=%d\n", lldev->pending_tre_count);
+	seq_printf(s, "evca=%p\n", lldev->evca);
+	seq_printf(s, "evre_ring=%p\n", lldev->evre_ring);
+	seq_printf(s, "evre_ring_handle=%pap\n", &lldev->evre_ring_handle);
+	seq_printf(s, "evre_ring_size = 0x%x\n", lldev->evre_ring_size);
+	seq_printf(s, "evre_processed_off = 0x%x\n", lldev->evre_processed_off);
+	seq_printf(s, "tre_write_offset = 0x%x\n", lldev->tre_write_offset);
+}
+
+/*
+ * hidma_chan_stats: display HIDMA channel statistics
+ *
+ * Display the statistics for the current HIDMA virtual channel device.
+ */
+static int hidma_chan_stats(struct seq_file *s, void *unused)
+{
+	struct hidma_chan *mchan = s->private;
+	struct hidma_desc *mdesc;
+	struct hidma_dev *dmadev = mchan->dmadev;
+
+	pm_runtime_get_sync(dmadev->ddev.dev);
+	seq_printf(s, "paused=%u\n", mchan->paused);
+	seq_printf(s, "dma_sig=%u\n", mchan->dma_sig);
+	seq_puts(s, "prepared\n");
+	list_for_each_entry(mdesc, &mchan->prepared, node)
+		hidma_ll_chstats(s, mchan->dmadev->lldev, mdesc->tre_ch);
+
+	seq_puts(s, "active\n");
+	list_for_each_entry(mdesc, &mchan->active, node)
+		hidma_ll_chstats(s, mchan->dmadev->lldev, mdesc->tre_ch);
+
+	seq_puts(s, "completed\n");
+	list_for_each_entry(mdesc, &mchan->completed, node)
+		hidma_ll_chstats(s, mchan->dmadev->lldev, mdesc->tre_ch);
+
+	hidma_ll_devstats(s, mchan->dmadev->lldev);
+	pm_runtime_mark_last_busy(dmadev->ddev.dev);
+	pm_runtime_put_autosuspend(dmadev->ddev.dev);
+	return 0;
+}
+
+/*
+ * hidma_dma_info: display HIDMA device info
+ *
+ * Display the info for the current HIDMA device.
+ */
+static int hidma_dma_info(struct seq_file *s, void *unused)
+{
+	struct hidma_dev *dmadev = s->private;
+	resource_size_t sz;
+
+	seq_printf(s, "nr_descriptors=%d\n", dmadev->nr_descriptors);
+	seq_printf(s, "dev_trca=%p\n", dmadev->dev_trca);
+	seq_printf(s, "dev_trca_phys=%pa\n", &dmadev->trca_resource->start);
+	sz = resource_size(dmadev->trca_resource);
+	seq_printf(s, "dev_trca_size=%pa\n", &sz);
+	seq_printf(s, "dev_evca=%p\n", dmadev->dev_evca);
+	seq_printf(s, "dev_evca_phys=%pa\n", &dmadev->evca_resource->start);
+	sz = resource_size(dmadev->evca_resource);
+	seq_printf(s, "dev_evca_size=%pa\n", &sz);
+	return 0;
+}
+
+static int hidma_chan_stats_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, hidma_chan_stats, inode->i_private);
+}
+
+static int hidma_dma_info_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, hidma_dma_info, inode->i_private);
+}
+
+static const struct file_operations hidma_chan_fops = {
+	.open = hidma_chan_stats_open,
+	.read = seq_read,
+	.llseek = seq_lseek,
+	.release = single_release,
+};
+
+static const struct file_operations hidma_dma_fops = {
+	.open = hidma_dma_info_open,
+	.read = seq_read,
+	.llseek = seq_lseek,
+	.release = single_release,
+};
+
+void hidma_debug_uninit(struct hidma_dev *dmadev)
+{
+	/* removes dmadev->stats and the channel entries as well */
+	debugfs_remove_recursive(dmadev->debugfs);
+}
+
+int hidma_debug_init(struct hidma_dev *dmadev)
+{
+	int rc = 0;
+	int chidx = 0;
+	struct list_head *position = NULL;
+
+	dmadev->debugfs = debugfs_create_dir(dev_name(dmadev->ddev.dev), NULL);
+	if (!dmadev->debugfs) {
+		rc = -ENODEV;
+		return rc;
+	}
+
+	/* walk through the virtual channel list */
+	list_for_each(position, &dmadev->ddev.channels) {
+		struct hidma_chan *chan;
+
+		chan = list_entry(position, struct hidma_chan,
+				  chan.device_node);
+		sprintf(chan->dbg_name, "chan%d", chidx);
+		chan->debugfs = debugfs_create_dir(chan->dbg_name,
+						   dmadev->debugfs);
+		if (!chan->debugfs) {
+			rc = -ENOMEM;
+			goto cleanup;
+		}
+		chan->stats = debugfs_create_file("stats", S_IRUGO,
+						  chan->debugfs, chan,
+						  &hidma_chan_fops);
+		if (!chan->stats) {
+			rc = -ENOMEM;
+			goto cleanup;
+		}
+		chidx++;
+	}
+
+	dmadev->stats = debugfs_create_file("stats", S_IRUGO,
+					    dmadev->debugfs, dmadev,
+					    &hidma_dma_fops);
+	if (!dmadev->stats) {
+		rc = -ENOMEM;
+		goto cleanup;
+	}
+
+	return 0;
+cleanup:
+	hidma_debug_uninit(dmadev);
+	return rc;
+}
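
For reference, hidma_ll_chstats() in the patch above rebuilds the source and
destination addresses from two 32-bit TRE words each. A minimal userspace
sketch of that arithmetic follows; toy_combine_addr() and toy_split_addr()
are illustrative names, not driver functions. The cast to a 64-bit type
before the shift is the important detail, since shifting a 32-bit value by
32 is undefined in C.

```c
#include <assert.h>
#include <stdint.h>

/* Combine the low and high 32-bit words into one 64-bit address. */
static uint64_t toy_combine_addr(uint32_t lo, uint32_t hi)
{
	return ((uint64_t)hi << 32) | lo;
}

/* Inverse operation: split a 64-bit address into its two 32-bit words. */
static void toy_split_addr(uint64_t addr, uint32_t *lo, uint32_t *hi)
{
	assert(lo && hi);
	*lo = (uint32_t)addr;
	*hi = (uint32_t)(addr >> 32);
}
```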