diff mbox

[08/20] mmc: host: pxamci: switch over to dmaengine use

Message ID 1386667657-26355-1-git-send-email-ynvich@gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Sergey Yanovich Dec. 10, 2013, 9:27 a.m. UTC
Tested this patch rebased onto v3.13-rc2

In-kernel DMA driver was not modified.

Below is the full diff against Daniel's version based on v3.11-rc5

My changes are at the line 572 to fix build failure and at the lines
630 and 780 to receive DMA channels from device tree.

The device works in general, but it is slower than with the old DMA
and it reports sporadic failures like that
--->8---
[ 2848.451688] mmc0: DMA error on rx channel, status 1
[ 2848.458326] mmcblk0: error -5 transferring data, sector 7037462, nr 2, cmd response 0x900, card status 0x0
[ 2848.469245] end_request: I/O error, dev mmcblk0, sector 7037462
[ 2883.228720] mmc0: DMA error on rx channel, status 1
[ 2883.263844] mmcblk0: error -5 transferring data, sector 7163928, nr 8, cmd response 0x900, card status 0xb00
[ 2883.274240] mmcblk0: retrying using single block read
--->8---

Comments

Sergey Yanovich Dec. 10, 2013, 10:25 a.m. UTC | #1
On Tue, 2013-12-10 at 13:27 +0400, Sergei Ianovich wrote:
> The device works in general, but it is slower than with the old DMA
> and it reports sporadic failures like that

This took 40 minutes on my ARM device:
--->8---
# fsck.ext3 -cvf /dev/mmcblk0p1 
e2fsck 1.42.5 (29-Jul-2012)
Checking for bad blocks (read-only test): done                                                 
/dev/mmcblk0p1: Updating bad block inode.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information

/dev/mmcblk0p1: ***** FILE SYSTEM WAS MODIFIED *****
/dev/mmcblk0p1: ***** REBOOT LINUX *****

        8649 inodes used (3.56%, out of 242880)
           7 non-contiguous files (0.1%)
           7 non-contiguous directories (0.1%)
             # of inodes with ind/dind/tind blocks: 540/3/0
       91255 blocks used (9.40%, out of 970752)
           0 bad blocks
           1 large file

        6502 regular files
         860 directories
           0 character device files
           0 block device files
           1 fifo
           0 links
        1277 symbolic links (1277 fast symbolic links)
           0 sockets
------------
        8640 files
--->8---

The same card immediately after on my x86_64 PC, in 2 minutes:
--->8---
$ sudo fsck -cfv /dev/mmcblk0p1
fsck from util-linux 2.20.1
e2fsck 1.42.8 (20-Jun-2013)
Checking for bad blocks (read-only test):
done                                                 
/dev/mmcblk0p1: Updating bad block inode.
Pass 1: Checking inodes, blocks, and sizes
Inodes that were part of a corrupted orphan linked list found.  Fix<y>?
yes
Inode 24388 was part of the orphaned inode list.  FIXED.
Inode 24511 was part of the orphaned inode list.  FIXED.
Inode 24727 was part of the orphaned inode list.  FIXED.
Inode 32410 was part of the orphaned inode list.  FIXED.
Inode 32413 was part of the orphaned inode list.  FIXED.
Inode 57554 was part of the orphaned inode list.  FIXED.
Inode 57680 was part of the orphaned inode list.  FIXED.
Inode 57681 was part of the orphaned inode list.  FIXED.
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information

/dev/mmcblk0p1: ***** FILE SYSTEM WAS MODIFIED *****

        8649 inodes used (3.56%, out of 242880)
           7 non-contiguous files (0.1%)
           7 non-contiguous directories (0.1%)
             # of inodes with ind/dind/tind blocks: 540/3/0
       91255 blocks used (9.40%, out of 970752)
           0 bad blocks
           1 large file

        6502 regular files
         860 directories
           0 character device files
           0 block device files
           1 fifo
           0 links
        1277 symbolic links (1277 fast symbolic links)
           0 sockets
------------
        8640 files
--->8---

The same command with old DMA driver took 23 minutes. There was no DMA
errors.

When retested the card on PC, there was:
Inodes that were part of a corrupted orphan linked list found.  Fix<y>? yes
Inode 57554 was part of the orphaned inode list.  FIXED.

The card was mounted as readonly rootfs while fsck run on ARM, so there
may be some background noise.

Conclusions:
1. The card itself is fine.

2. There are issues with new DMA driver. I am ready to test patches,
without questions.

3. There are issues with the MMC stack in my ARM device. Please drop a
pointer where to start digging.
Daniel Mack Dec. 10, 2013, 10:48 a.m. UTC | #2
On 12/10/2013 11:25 AM, Sergei Ianovich wrote:
> On Tue, 2013-12-10 at 13:27 +0400, Sergei Ianovich wrote:
>> The device works in general, but it is slower than with the old DMA
>> and it reports sporadic failures like that
> 
> This took 40 minutes on my ARM device:

[...]

> The same card immediately after on my x86_64 PC, in 2 minutes:

[...]

> The same command with old DMA driver took 23 minutes. There was no DMA
> errors.

I'd say something like bonnie++ is a more deterministic test for such
things, considering that the first fsck might have fixed things that the
2nd didn't have to touch.

Apart from that, your findings are interesting. I only tested pxamci
with a Wifi card connected via SDIO, and didn't see such a big
difference in performance.

> Conclusions:
> 1. The card itself is fine.
> 
> 2. There are issues with new DMA driver. I am ready to test patches,
> without questions.
> 
> 3. There are issues with the MMC stack in my ARM device. Please drop a
> pointer where to start digging.

Alright, thanks a lot for offering help! You could start having a look
at the DMA descriptors and see whether they differ in size, compared to
the old implementation. Taking time stamps to measure the turnaround
cycle of the transfers would be another thing that you can try.

I'm not aware of any code in the mmp_pdma driver that would burn lots of
CPU cycles, but that doesn't mean there aren't any.


Daniel
Sergey Yanovich Dec. 10, 2013, 3:04 p.m. UTC | #3
On Tue, 2013-12-10 at 11:48 +0100, Daniel Mack wrote:
> I'd say something like bonnie++ is a more deterministic test for such
> things, considering that the first fsck might have fixed things that the
> 2nd didn't have to touch.

The old driver:
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
pac26          300M    41  81   280   1   246   2   278  96  3056  10  99.2  23
Latency              1424ms    1054ms     510ms   95978us   86277us    3510ms
Version  1.96       ------Sequential Create------ --------Random Create--------
pac26               -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16   354  42 16254  97   325  10   490  30 +++++ +++  1320  37
Latency              2354ms   48094us    6165ms    2979ms     586us    1116us
1.96,1.96,pac26,1,1386675098,300M,,41,81,280,1,246,2,278,96,3056,10,99.2,23,16,,,,,354,42,16254,97,325,10,490,30,+++++,+++,1320,37,1424ms,1054ms,510ms,95978us,86277us,3510ms,2354ms,48094us,6165ms,2979ms,586us,1116us

The new driver:
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
pac26          300M    38  77     0   0   218   2   267  93  1762   6  84.0  19
Latency              2522ms 138668225     792ms   67960us     141ms   16313ms
Version  1.96       ------Sequential Create------ --------Random Create--------
pac26               -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16   304  38 14930  95   383  13   525  31 +++++ +++  1042  28
Latency             13481ms   49916us    7144ms    4075ms     624us     993us
1.96,1.96,pac26,1,1639,300M,,38,77,0,0,218,2,267,93,1762,6,84.0,19,16,,,,,304,38,14930,95,383,13,525,31,+++++,+++,1042,28,2522ms,138668225,792ms,67960us,141ms,16313ms,13481ms,49916us,7144ms,4075ms,624us,993us

I forgot to run 'time bonnie++', so there is no hard numbers about total
time. My impression is that the old and new driver tests have run for 30
and 60 minutes respectively.

> Apart from that, your findings are interesting. I only tested pxamci
> with a Wifi card connected via SDIO, and didn't see such a big
> difference in performance.
> 
> > Conclusions:
> > 1. The card itself is fine.
> > 
> > 2. There are issues with new DMA driver. I am ready to test patches,
> > without questions.
> > 
> > 3. There are issues with the MMC stack in my ARM device. Please drop a
> > pointer where to start digging.
> 
> Alright, thanks a lot for offering help! You could start having a look
> at the DMA descriptors and see whether they differ in size, compared to
> the old implementation. Taking time stamps to measure the turnaround
> cycle of the transfers would be another thing that you can try.
> 
> I'm not aware of any code in the mmp_pdma driver that would burn lots of
> CPU cycles, but that doesn't mean there aren't any.

I can test patches, but I am not sure I can provide any meaningful help
with performance. Sorry.

In point 3 I was talking about non-DMA related performance. The device
has been busy running bonnie++, so I haven't had a change to verify. It
looks like my device is working with 1-bit-wide bus instead of
4-bit-wide (Bad block checking ran at 19500000 bits per second which
matches bus clock speed).
diff mbox

Patch

diff --git a/drivers/mmc/host/pxamci.c b/drivers/mmc/host/pxamci.c
index 7aa97eb..e5bafd2 100644
--- a/drivers/mmc/host/pxamci.c
+++ b/drivers/mmc/host/pxamci.c
@@ -85,7 +85,7 @@  struct pxamci_host {
 static inline void pxamci_init_ocr(struct pxamci_host *host)
 {
 #ifdef CONFIG_REGULATOR
-	host->vcc = regulator_get(mmc_dev(host->mmc), "vmmc");
+	host->vcc = regulator_get_optional(mmc_dev(host->mmc), "vmmc");
 
 	if (IS_ERR(host->vcc))
 		host->vcc = NULL;
@@ -555,6 +555,12 @@  static void pxamci_dma_irq(void *param)
 	struct dma_tx_state state;
 	enum dma_status status;
 	struct dma_chan *chan;
+	unsigned long flags;
+
+	spin_lock_irqsave(&host->lock, flags);
+
+	if (!host->data)
+		goto out_unlock;
 
 	if (host->data->flags & MMC_DATA_READ)
 		chan = host->dma_chan_rx;
@@ -563,14 +569,17 @@  static void pxamci_dma_irq(void *param)
 
 	status = dmaengine_tx_status(chan, host->dma_cookie, &state);
 
-	if (likely(status == DMA_SUCCESS)) {
+	if (likely(status == DMA_COMPLETE)) {
 		writel(BUF_PART_FULL, host->base + MMC_PRTBUF);
 	} else {
-		pr_err("%s: DMA error on %s channel\n", mmc_hostname(host->mmc),
-			host->data->flags & MMC_DATA_READ ? "rx" : "tx");
+		pr_err("%s: DMA error on %s channel, status %i\n", mmc_hostname(host->mmc),
+			host->data->flags & MMC_DATA_READ ? "rx" : "tx", status);
 		host->data->error = -EIO;
 		pxamci_data_done(host, 0);
 	}
+
+out_unlock:
+	spin_unlock_irqrestore(&host->lock, flags);
 }
 
 static irqreturn_t pxamci_detect_irq(int irq, void *devid)
@@ -618,11 +627,46 @@  static int pxamci_of_init(struct platform_device *pdev)
 
         return 0;
 }
+
+static int pxamci_of_init_dma(struct platform_device *pdev,
+		struct pxamci_host *host)
+{
+	struct device_node *np = pdev->dev.of_node;
+	u32 tmp;
+	int i;
+	int ret;
+
+	i = of_property_match_string(np, "dma-names", "rx");
+	if (i < 0)
+		return i;
+
+	ret = of_property_read_u32_index(np, "dmas", 2 * i + 1, &tmp);
+	if (ret < 0)
+		return ret;
+	host->dma_drcmrrx = tmp;
+
+	i = of_property_match_string(np, "dma-names", "tx");
+	if (i < 0)
+		return i;
+
+	ret = of_property_read_u32_index(np, "dmas", 2 * i + 1, &tmp);
+	if (ret < 0)
+		return ret;
+	host->dma_drcmrtx = tmp;
+
+	return 0;
+}
 #else
 static int pxamci_of_init(struct platform_device *pdev)
 {
         return 0;
 }
+
+static int pxamci_of_init_dma(struct platform_device *pdev,
+		struct pxamci_host *host)
+{
+	return -ENODATA;
+}
 #endif
 
 static int pxamci_probe(struct platform_device *pdev)
@@ -733,7 +777,7 @@  static int pxamci_probe(struct platform_device *pdev)
 
 	platform_set_drvdata(pdev, mmc);
 
-	if (!pdev->dev.of_node) {
+	if (pxamci_of_init_dma(pdev, host) < 0) {
 		dmarx = platform_get_resource(pdev, IORESOURCE_DMA, 0);
 		if (!dmarx) {
 			ret = -ENXIO;
@@ -896,35 +940,6 @@  static int pxamci_remove(struct platform_device *pdev)
 	return 0;
 }
 
-#ifdef CONFIG_PM
-static int pxamci_suspend(struct device *dev)
-{
-	struct mmc_host *mmc = dev_get_drvdata(dev);
-	int ret = 0;
-
-	if (mmc)
-		ret = mmc_suspend_host(mmc);
-
-	return ret;
-}
-
-static int pxamci_resume(struct device *dev)
-{
-	struct mmc_host *mmc = dev_get_drvdata(dev);
-	int ret = 0;
-
-	if (mmc)
-		ret = mmc_resume_host(mmc);
-
-	return ret;
-}
-
-static const struct dev_pm_ops pxamci_pm_ops = {
-	.suspend	= pxamci_suspend,
-	.resume		= pxamci_resume,
-};
-#endif
-
 static struct platform_driver pxamci_driver = {
 	.probe		= pxamci_probe,
 	.remove		= pxamci_remove,
@@ -932,9 +947,6 @@  static struct platform_driver pxamci_driver = {
 		.name	= DRIVER_NAME,
 		.owner	= THIS_MODULE,
 		.of_match_table = of_match_ptr(pxa_mmc_dt_ids),
-#ifdef CONFIG_PM
-		.pm	= &pxamci_pm_ops,
-#endif
 	},
 };