mmc: core: complete/wait_for_completion performance

Message ID 1481809814.29241.2.camel@embedded.rocks (mailing list archive)
State New, archived

Commit Message

Jörg Krause Dec. 15, 2016, 1:50 p.m. UTC
Hi Stefan,

On Wed, 2016-12-14 at 19:57 +0100, Stefan Wahren wrote:
> Hi Jörg,
> 

[snip]

> > > 
> > > did you try cyclictest [1]?
> > 
> > Not yet. Not sure what to measure and which values to compare here.
> 
> I thought you had the vendor kernel and the mainline kernel available
> for your platform.
> 
> So you could compare both kernels.

Yes, that's right. I will have a look at this tool.

> > 
> > > 
> > > Besides the time per request, the number of requests for the
> > > complete iperf test would be interesting. Maybe there are retries.
> > > 
> > > I'm still interested in your PIO mode patches for mxs-mmc even
> > > without clean up.
> > 
> > Actually, the patch does not implement a PIO mode, but drops DMA
> > and
> > uses polling instead. I've attached the patch.
> 
> Thanks. I applied it, but unfortunately this breaks SD card support
> for my Duckbill and the kernel isn't able to mount the rootfs:
> 
> [    2.267073] mxs-mmc 80010000.ssp: initialized
> [    2.272624] mxs-mmc 80010000.ssp: AC command error 0xffffff92

Sorry, I messed up the branches. I attached the correct patch which is
working for me on Linux v4.9.

Jörg

Comments

Stefan Wahren Dec. 15, 2016, 6:51 p.m. UTC | #1
Hi Jörg,

> Jörg Krause <joerg.krause@embedded.rocks> wrote on December 15, 2016 at 14:50:
> 
> 
> Hi Stefan,
> 
> On Wed, 2016-12-14 at 19:57 +0100, Stefan Wahren wrote:
> > Hi Jörg,
> > 
> 
> [snip]
> 
> > > > 
> > > > did you try cyclictest [1]?
> > > 
> > > Not yet. Not sure what to measure and which values to compare here.
> > 
> > I thought you had the vendor kernel and the mainline kernel available
> > for your platform.
> > 
> > So you could compare both kernels.
> 
> Yes, that's right. I will have a look at this tool.
> 
> > > 
> > > > 
> > > > Besides the time per request, the number of requests for the
> > > > complete iperf test would be interesting. Maybe there are retries.
> > > > 
> > > > I'm still interested in your PIO mode patches for mxs-mmc even
> > > > without clean up.
> > > 
> > > Actually, the patch does not implement a PIO mode, but drops DMA
> > > and
> > > uses polling instead. I've attached the patch.
> > 
> > Thanks. I applied it, but unfortunately this breaks SD card support
> > for my Duckbill and the kernel isn't able to mount the rootfs:
> > 
> > [    2.267073] mxs-mmc 80010000.ssp: initialized
> > [    2.272624] mxs-mmc 80010000.ssp: AC command error 0xffffff92
> 
> Sorry, I messed up the branches. I attached the correct patch which is
> working for me on Linux v4.9.

I tested the second version, but there isn't any performance gain with the patch.

Duckbill with class 10 SD card
Linux 4.8 without patch

dd if=/dev/zero of=test bs=1k count=10000
10000+0 records in
10000+0 records out
10240000 bytes (10 MB) copied, 2.68934 s, 3.8 MB/s

dd if=/dev/zero of=test bs=8k count=10000
10000+0 records in
10000+0 records out
81920000 bytes (82 MB) copied, 8.24305 s, 9.9 MB/s


Duckbill with class 10 SD card
Linux 4.8 with patch

dd if=/dev/zero of=test bs=1k count=10000
10000+0 records in
10000+0 records out
10240000 bytes (10 MB) copied, 3.41193 s, 3.0 MB/s

dd if=/dev/zero of=test bs=8k count=10000
10000+0 records in
10000+0 records out
81920000 bytes (82 MB) copied, 14.4564 s, 5.7 MB/s
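
The MB/s figures that dd prints are decimal megabytes (bytes / seconds / 10^6). A minimal C sketch, using only the four runs quoted above, recomputes them and the relative change between the unpatched and patched kernel. The function names are illustrative and not part of any tool:

```c
#include <stdio.h>

/* Throughput in decimal MB/s (1 MB = 1,000,000 bytes), as dd reports it.
 * Returns 0.0 for a non-positive duration instead of dividing by zero. */
double throughput_mb_s(unsigned long long bytes, double seconds)
{
	if (seconds <= 0.0)
		return 0.0;
	return (double)bytes / seconds / 1e6;
}

/* Relative change from baseline to patched, in percent (negative = slower). */
double change_percent(double baseline, double patched)
{
	if (baseline == 0.0)
		return 0.0;
	return (patched - baseline) / baseline * 100.0;
}

/* Print the comparison for the dd write runs quoted above. */
void print_dd_comparison(void)
{
	double base_1k  = throughput_mb_s(10240000ULL, 2.68934);
	double patch_1k = throughput_mb_s(10240000ULL, 3.41193);
	double base_8k  = throughput_mb_s(81920000ULL, 8.24305);
	double patch_8k = throughput_mb_s(81920000ULL, 14.4564);

	printf("bs=1k: %.1f -> %.1f MB/s (%+.0f%%)\n",
	       base_1k, patch_1k, change_percent(base_1k, patch_1k));
	printf("bs=8k: %.1f -> %.1f MB/s (%+.0f%%)\n",
	       base_8k, patch_8k, change_percent(base_8k, patch_8k));
}
```

Calling print_dd_comparison() from a main() reproduces the 3.8/3.0 and 9.9/5.7 MB/s figures and shows that the polling patch is roughly 21% and 43% slower in the two write tests.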

Additionally, I get these warnings during boot:

[    2.278445] mxs-mmc 80010000.ssp: initialized
[    2.283996] mxs-mmc 80010000.ssp: AC command error -110
[    2.305158] mxs-mmc 80010000.ssp: AC command error -110
[    2.322975] mxs-mmc 80010000.ssp: AC command error -110
[    2.338660] mxs-mmc 80010000.ssp: AC command error -110
[    2.344289] mxs-mmc 80010000.ssp: AC command error -110
[    2.365653] mxs-mmc 80010000.ssp: AC command error -110
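
The -110 in these messages is -ETIMEDOUT, the value that mxs_mmc_cmd_error() in the patch returns for the timeout status bits. The 0xffffff92 printed with the first patch version is the same number shown as an unsigned 32-bit hex value. A small C sketch of that conversion follows; the helper names are hypothetical, and the constant 110 is the Linux errno value on ARM and other asm-generic architectures, hard-coded here as an assumption so the sketch does not depend on the build host:

```c
#include <stdint.h>

/* Linux value of ETIMEDOUT on asm-generic architectures (assumption:
 * hard-coded rather than taken from the host's <errno.h>). */
#define LINUX_ETIMEDOUT 110

/* Reinterpret a 32-bit value that was printed with %x as the signed
 * errno-style code it encodes (two's complement), without relying on
 * implementation-defined integer conversions. */
int decode_hex_errno(uint32_t raw)
{
	if (raw > (uint32_t)INT32_MAX)
		return -(int)(UINT32_MAX - raw) - 1;
	return (int)raw;
}

/* True if the raw value encodes -ETIMEDOUT. */
int is_timeout(uint32_t raw)
{
	return decode_hex_errno(raw) == -LINUX_ETIMEDOUT;
}
```

With this, decode_hex_errno(0xffffff92) yields -110, so both boot logs report the same command timeout.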

Regards
Stefan

> 
> Jörg

Jörg Krause Dec. 16, 2016, 10:06 a.m. UTC | #2
Hi Stefan,

On Thu, 2016-12-15 at 19:51 +0100, Stefan Wahren wrote:
> Hi Jörg,
> 
> > Jörg Krause <joerg.krause@embedded.rocks> wrote on December 15, 2016
> > at 14:50:
> > 
> > 
> > Hi Stefan,
> > 
> > On Wed, 2016-12-14 at 19:57 +0100, Stefan Wahren wrote:
> > > Hi Jörg,
> > > 
> > 
> > [snip]
> > 
> > > > > 
> > > > > did you try cyclictest [1]?
> > > > 
> > > > Not yet. Not sure what to measure and which values to compare
> > > > here.
> > > 
> > > I thought you had the vendor kernel and the mainline kernel
> > > available for your platform.
> > > 
> > > So you could compare both kernels.
> > 
> > Yes, that's right. I will have a look at this tool.
> > 
> > > > 
> > > > > 
> > > > > Besides the time per request, the number of requests for the
> > > > > complete iperf test would be interesting. Maybe there are retries.
> > > > > 
> > > > > I'm still interested in your PIO mode patches for mxs-mmc
> > > > > even
> > > > > without clean up.
> > > > 
> > > > Actually, the patch does not implement a PIO mode, but drops
> > > > DMA
> > > > and
> > > > uses polling instead. I've attached the patch.
> > > 
> > > Thanks. I applied it, but unfortunately this breaks SD card
> > > support
> > > for my Duckbill and the kernel isn't able to mount the rootfs:
> > > 
> > > [    2.267073] mxs-mmc 80010000.ssp: initialized
> > > [    2.272624] mxs-mmc 80010000.ssp: AC command error 0xffffff92
> > 
> > Sorry, I messed up the branches. I attached the correct patch which
> > is
> > working for me on Linux v4.9.
> 
> I tested the second version, but there isn't any performance gain with
> the patch.

In the vendor kernel the polling is used only for small chunks of <=
1024 bytes to save the context switches when using DMA. This patch does
not use DMA at all, but only polling.

As I said before, I guess the limitation in the mxs-mmc driver is the
time needed to return the mmc request to the mmc core driver.

I have a Cubietruck with the same wifi chipset as on my i.MX28 target
where I get ~20Mbps throughput. Furthermore, I've found a benchmark in
an NXP thread [1] measuring about 30Mbps for an i.MX6 target and a
similar wifi chip.

Looking at the sunxi-mmc driver shows that it calls mmc_request_done()
in an interrupt context and does not use the dmaengine driver at all.

For now, I would drop the polling mode and look at how to optimize the
control flow between the DMA controller and the MMC host.
Unfortunately, this will need some time...

> Duckbill with class 10 SD card
> Linux 4.8 without patch
> 
> dd if=/dev/zero of=test bs=1k count=10000
> 10000+0 records in
> 10000+0 records out
> 10240000 bytes (10 MB) copied, 2.68934 s, 3.8 MB/s
> 
> dd if=/dev/zero of=test bs=8k count=10000
> 10000+0 records in
> 10000+0 records out
> 81920000 bytes (82 MB) copied, 8.24305 s, 9.9 MB/s
> 
> 
> Duckbill with class 10 SD card
> Linux 4.8 with patch
> 
> dd if=/dev/zero of=test bs=1k count=10000
> 10000+0 records in
> 10000+0 records out
> 10240000 bytes (10 MB) copied, 3.41193 s, 3.0 MB/s
> 
> dd if=/dev/zero of=test bs=8k count=10000
> 10000+0 records in
> 10000+0 records out
> 81920000 bytes (82 MB) copied, 14.4564 s, 5.7 MB/s
> 
> Additionally, I get these warnings during boot:
> 
> [    2.278445] mxs-mmc 80010000.ssp: initialized
> [    2.283996] mxs-mmc 80010000.ssp: AC command error -110
> [    2.305158] mxs-mmc 80010000.ssp: AC command error -110
> [    2.322975] mxs-mmc 80010000.ssp: AC command error -110
> [    2.338660] mxs-mmc 80010000.ssp: AC command error -110
> [    2.344289] mxs-mmc 80010000.ssp: AC command error -110
> [    2.365653] mxs-mmc 80010000.ssp: AC command error -110

I get these errors, too. The MMC host is sending some commands and the
MMC client is not (yet) responding to those commands. I haven't looked
any closer at this.

[1] https://community.nxp.com/thread/317396

Stefan Wahren Dec. 26, 2016, 11:03 p.m. UTC | #3
Hi Jörg,

> Jörg Krause <joerg.krause@embedded.rocks> wrote on December 16, 2016 at 11:06:
> 
> 
> Hi Stefan,
> 
> On Thu, 2016-12-15 at 19:51 +0100, Stefan Wahren wrote:
> > Hi Jörg,
> > 
> > > Jörg Krause <joerg.krause@embedded.rocks> wrote on December 15, 2016
> > > at 14:50:
> > > 
> > > 
> > > Hi Stefan,
> > > 
> > > On Wed, 2016-12-14 at 19:57 +0100, Stefan Wahren wrote:
> > > > Hi Jörg,
> > > > 
> > > 
> > > [snip]
> > > 
> > > > > > 
> > > > > > did you try cyclictest [1]?
> > > > > 
> > > > > Not yet. Not sure what to measure and which values to compare
> > > > > here.
> > > > 
> > > > I thought you had the vendor kernel and the mainline kernel
> > > > available for your platform.
> > > > 
> > > > So you could compare both kernels.
> > > 
> > > Yes, that's right. I will have a look at this tool.
> > > 
> > > > > 
> > > > > > 
> > > > > > Besides the time per request, the number of requests for the
> > > > > > complete iperf test would be interesting. Maybe there are retries.
> > > > > > 
> > > > > > I'm still interested in your PIO mode patches for mxs-mmc
> > > > > > even
> > > > > > without clean up.
> > > > > 
> > > > > Actually, the patch does not implement a PIO mode, but drops
> > > > > DMA
> > > > > and
> > > > > uses polling instead. I've attached the patch.
> > > > 
> > > > Thanks. I applied it, but unfortunately this breaks SD card
> > > > support
> > > > for my Duckbill and the kernel isn't able to mount the rootfs:
> > > > 
> > > > [    2.267073] mxs-mmc 80010000.ssp: initialized
> > > > [    2.272624] mxs-mmc 80010000.ssp: AC command error 0xffffff92
> > > 
> > > Sorry, I messed up the branches. I attached the correct patch which
> > > is
> > > working for me on Linux v4.9.
> > 
> > I tested the second version, but there isn't any performance gain with
> > the patch.
> 
> In the vendor kernel the polling is used only for small chunks of <=
> 1024 bytes to save the context switches when using DMA. This patch does
> not use DMA at all, but only polling.

The vendor kernel also uses polling for AC and BC commands. I tried this approach (polling for AC/BC/BCR commands and DMA for all ADTC commands) [1] on a Duckbill with an SD card, but the resulting read and write performance stays the same. Maybe you want to give it a try with wifi over SDIO.
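
The two policies discussed here (polling for every non-data command, and optionally also for small data transfers as in the vendor kernel) can be written as one decision function. This is only a sketch under assumptions: the enum and function names are hypothetical and do not come from the mxs-mmc driver, and the threshold is a plain parameter rather than a driver setting.

```c
#include <stddef.h>

/* MMC command classes as named in the thread. */
enum cmd_class { CMD_BC, CMD_BCR, CMD_AC, CMD_ADTC };

enum xfer_mode { XFER_POLLING, XFER_DMA };

/*
 * Pick the transfer mode for one command.
 *
 * poll_threshold: ADTC transfers of at most this many bytes are polled.
 *   0    -> DMA for all ADTC commands (the approach tested in [1])
 *   1024 -> mirrors the "<= 1024 bytes" limit quoted for the vendor kernel
 */
enum xfer_mode choose_xfer_mode(enum cmd_class cls, size_t data_len,
				size_t poll_threshold)
{
	/* Commands without a data phase are always polled. */
	if (cls != CMD_ADTC)
		return XFER_POLLING;

	/* Small data transfers avoid the DMA setup and context switches. */
	if (poll_threshold > 0 && data_len <= poll_threshold)
		return XFER_POLLING;

	return XFER_DMA;
}
```

For example, choose_xfer_mode(CMD_ADTC, 512, 0) selects DMA, while the same call with a threshold of 1024 selects polling.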

Here are some read performance values with Duckbill (Kernel 4.8, class 10 microSD card):

dd if=/dev/mmcblk0p2 of=/dev/null
64260+0 records in
64260+0 records out
32901120 bytes (33 MB) copied, 1.68618 s, 19.5 MB/s

> 
> As I said before, I guess the limitation in the mxs-mmc driver is the
> time needed to return the mmc request to the mmc core driver.

I don't think this is the problem. I added some GPIO handling to the mxs-mmc driver, and I couldn't see any big delay between the mmc requests with a logic analyzer.

> 
> I have a Cubietruck with the same wifi chipset as on my i.MX28 target
> where I get ~20Mbps throughput. Furthermore, I've found a benchmark in
> an NXP thread [1] measuring about 30Mbps for an i.MX6 target and a
> similar wifi chip.
> 
> Looking at the sunxi-mmc driver shows that it calls mmc_request_done()
> in an interrupt context and does not use the dmaengine driver at all.
> 
> For now, I would drop the polling mode and look at how to optimize the
> control flow between the DMA controller and the MMC host.
> Unfortunately, this will need some time...

I also rebased an old patch from Shawn Guo [2] with pre_req and post_req support, tried to call the DMA channel callback from the interrupt context instead of scheduling the tasklet within the DMA engine driver, and implemented CMD23 support [3]. But none of them shows any measurable performance improvement.

By the way, here are some performance-critical kernel config options that really need to be disabled:

# CONFIG_DEBUG_RT_MUTEXES is not set
# CONFIG_DEBUG_SPINLOCK is not set
# CONFIG_DEBUG_MUTEXES is not set
# CONFIG_DEBUG_WW_MUTEX_SLOWPATH is not set
# CONFIG_DEBUG_LOCK_ALLOC is not set
# CONFIG_PROVE_LOCKING is not set
# CONFIG_LOCK_STAT is not set
# CONFIG_DEBUG_ATOMIC_SLEEP is not set
# CONFIG_DEBUG_LOCKING_API_SELFTESTS is not set
# CONFIG_LOCK_TORTURE_TEST is not set
# CONFIG_PROVE_RCU is not set

[1] - https://github.com/lategoodbye/linux-mxs-power/commit/beb341ed948ae9b8afe7378cff6b9d50144fd0b9
[2] - https://github.com/lategoodbye/linux-mxs-power/commit/e96b28e8730ccfcfecb7ec286102bc6969aa1ee0
[3] - https://github.com/lategoodbye/linux-mxs-power/commit/e53a3c9169a63eb61f9e67ff88724972acf312a9

Stefan Wahren Jan. 15, 2017, 9:08 p.m. UTC | #4
Hi Jörg,

> Jörg Krause <joerg.krause@embedded.rocks> wrote on January 15, 2017 at 21:42:
> 
> 
> Hi Stefan,
> 
> On Tue, 2016-12-27 at 00:03 +0100, Stefan Wahren wrote:
> > Hi Jörg,
> > ...
> > I also rebased an old patch from Shawn Guo [2] with pre_req and
> > post_req support, tried to call the DMA channel callback from the
> > interrupt context instead of scheduling the tasklet within the DMA
> > engine driver and implemented CMD23 support [3]. But none of them show
> > any measurable performance improvement.
> 
> I tested the three patches after disabling any debugging options in the
> config. There is no performance gain, but the timings have changed.
> Please have a look at the attached graphs. The time between complete()
> and wait_for_completion() is reduced to 15us whereas the time from
> return from wait_for_completion() to return to sdio_readsb() increases
> to 23us (from maybe 2us before).

That confirms my results. I think you are searching at the wrong layer. You had better step back and look at the whole transfer instead of single blocks. Maybe there are some bigger delays.

Stefan

> 
> Jörg

Patch

From c321c217836fb08e31b026035a71b8ddad513a52 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?J=C3=B6rg=20Krause?= <joerg.krause@embedded.rocks>
Date: Tue, 1 Nov 2016 18:02:46 +0100
Subject: [PATCH] mmc: mxs-mmc: use PIO mode
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: Jörg Krause <joerg.krause@embedded.rocks>
---
 drivers/mmc/host/mxs-mmc.c  | 365 ++++++++++++++++++++++++++++++++++----------
 include/linux/spi/mxs-spi.h |  12 ++
 2 files changed, 300 insertions(+), 77 deletions(-)

diff --git a/drivers/mmc/host/mxs-mmc.c b/drivers/mmc/host/mxs-mmc.c
index d839147..958c073 100644
--- a/drivers/mmc/host/mxs-mmc.c
+++ b/drivers/mmc/host/mxs-mmc.c
@@ -47,14 +47,22 @@ 
 
 #define DRIVER_NAME	"mxs-mmc"
 
-#define MXS_MMC_IRQ_BITS	(BM_SSP_CTRL1_SDIO_IRQ		| \
-				 BM_SSP_CTRL1_RESP_ERR_IRQ	| \
-				 BM_SSP_CTRL1_RESP_TIMEOUT_IRQ	| \
-				 BM_SSP_CTRL1_DATA_TIMEOUT_IRQ	| \
-				 BM_SSP_CTRL1_DATA_CRC_IRQ	| \
-				 BM_SSP_CTRL1_FIFO_UNDERRUN_IRQ	| \
-				 BM_SSP_CTRL1_RECV_TIMEOUT_IRQ  | \
-				 BM_SSP_CTRL1_FIFO_OVERRUN_IRQ)
+#define MXS_MMC_ERR_IRQ_BITS  (BM_SSP_CTRL1_RESP_ERR_IRQ	| \
+				BM_SSP_CTRL1_RESP_TIMEOUT_IRQ	| \
+				BM_SSP_CTRL1_DATA_TIMEOUT_IRQ	| \
+				BM_SSP_CTRL1_DATA_CRC_IRQ	| \
+				BM_SSP_CTRL1_FIFO_UNDERRUN_IRQ	| \
+				BM_SSP_CTRL1_RECV_TIMEOUT_IRQ   | \
+				BM_SSP_CTRL1_FIFO_OVERRUN_IRQ)
+
+#define MXS_MMC_IRQ_BITS  (BM_SSP_CTRL1_SDIO_IRQ		| \
+				MXS_MMC_ERR_IRQ_BITS)
+
+#define MXS_MMC_ERR_BITS (BM_SSP_CTRL1_RESP_ERR_IRQ       	| \
+				BM_SSP_CTRL1_RESP_TIMEOUT_IRQ   | \
+				BM_SSP_CTRL1_DATA_TIMEOUT_IRQ   | \
+				BM_SSP_CTRL1_DATA_CRC_IRQ       | \
+				BM_SSP_CTRL1_RECV_TIMEOUT_IRQ)
 
 /* card detect polling timeout */
 #define MXS_MMC_DETECT_TIMEOUT			(HZ/2)
@@ -71,6 +79,10 @@  struct mxs_mmc_host {
 	spinlock_t			lock;
 	int				sdio_irq_en;
 	bool				broken_cd;
+
+	u32				status;
+	int				pio_size;
+	struct completion		dma_done;
 };
 
 static int mxs_mmc_get_cd(struct mmc_host *mmc)
@@ -256,38 +268,81 @@  static struct dma_async_tx_descriptor *mxs_mmc_prep_dma(
 	return desc;
 }
 
+/*
+ * Check for MMC command errors
+ * Returns error code or zero if no errors
+ */
+static inline int mxs_mmc_cmd_error(u32 status)
+{
+	int err = 0;
+
+	if (status & BM_SSP_STATUS_TIMEOUT)
+		err = -ETIMEDOUT;
+	else if (status & BM_SSP_STATUS_RESP_TIMEOUT)
+		err = -ETIMEDOUT;
+	else if (status & BM_SSP_STATUS_RESP_CRC_ERR)
+		err = -EILSEQ;
+	else if (status & BM_SSP_STATUS_RESP_ERR)
+		err = -EIO;
+
+	return err;
+}
+
 static void mxs_mmc_bc(struct mxs_mmc_host *host)
 {
 	struct mxs_ssp *ssp = &host->ssp;
 	struct mmc_command *cmd = host->cmd;
 	struct dma_async_tx_descriptor *desc;
-	u32 ctrl0, cmd0, cmd1;
+	u32 ctrl0, ctrl1, cmd0, cmd1, c1;
 
 	ctrl0 = BM_SSP_CTRL0_ENABLE | BM_SSP_CTRL0_IGNORE_CRC;
+	ctrl1 = BM_SSP_CTRL1_DMA_ENABLE |
+		BM_SSP_CTRL1_RECV_TIMEOUT_IRQ_EN |
+		BM_SSP_CTRL1_DATA_CRC_IRQ_EN |
+		BM_SSP_CTRL1_DATA_TIMEOUT_IRQ_EN |
+		BM_SSP_CTRL1_RESP_TIMEOUT_IRQ_EN |
+		BM_SSP_CTRL1_RESP_ERR_IRQ_EN;
 	cmd0 = BF_SSP(cmd->opcode, CMD0_CMD) | BM_SSP_CMD0_APPEND_8CYC;
-	cmd1 = cmd->arg;
+	cmd1 = BF_SSP(cmd->arg, CMD1_CMD);
 
 	if (host->sdio_irq_en) {
 		ctrl0 |= BM_SSP_CTRL0_SDIO_IRQ_CHECK;
 		cmd0 |= BM_SSP_CMD0_CONT_CLKING_EN | BM_SSP_CMD0_SLOW_CLKING_EN;
 	}
 
-	ssp->ssp_pio_words[0] = ctrl0;
-	ssp->ssp_pio_words[1] = cmd0;
-	ssp->ssp_pio_words[2] = cmd1;
-	ssp->dma_dir = DMA_NONE;
-	ssp->slave_dirn = DMA_TRANS_NONE;
-	desc = mxs_mmc_prep_dma(host, DMA_CTRL_ACK);
-	if (!desc)
-		goto out;
-
-	dmaengine_submit(desc);
-	dma_async_issue_pending(ssp->dmach);
-	return;
-
-out:
-	dev_warn(mmc_dev(host->mmc),
-		 "%s: failed to prep dma\n", __func__);
+	/* following IO operations */
+	writel(ctrl0, ssp->base + HW_SSP_CTRL0);
+	writel(cmd0, ssp->base + HW_SSP_CMD0);
+	writel(cmd1, ssp->base + HW_SSP_CMD1);
+
+	/* clear these bits */
+	writel(ctrl1, ssp->base + HW_SSP_CTRL1(ssp) + STMP_OFFSET_REG_CLR);
+	init_completion(&host->dma_done);
+	writel(BM_SSP_CTRL0_RUN,
+	       ssp->base + HW_SSP_CTRL0 + STMP_OFFSET_REG_SET);
+
+	/* wait here as long as SSP is running and busy */
+	while (readl(ssp->base + HW_SSP_CTRL0) & BM_SSP_CTRL0_RUN)
+		continue;
+	while (readl(ssp->base + HW_SSP_STATUS(ssp)) & BM_SSP_STATUS_BUSY)
+		continue;
+
+	host->status = readl(ssp->base + HW_SSP_STATUS(ssp));
+	c1 = readl(ssp->base + HW_SSP_CTRL1(ssp));
+
+	/* reset interrupt request status bits */
+	writel(c1 & MXS_MMC_ERR_IRQ_BITS,
+	       ssp->base + HW_SSP_CTRL1(ssp) + STMP_OFFSET_REG_CLR);
+
+	/* reenable these bits */
+	writel(ctrl1, ssp->base + HW_SSP_CTRL1(ssp) + STMP_OFFSET_REG_SET);
+	/* end IO operations */
+
+	cmd->error = mxs_mmc_cmd_error(host->status);
+
+	if (cmd->error) {
+		dev_warn(mmc_dev(host->mmc), "BC command error %d\n", cmd->error);
+	}
 }
 
 static void mxs_mmc_ac(struct mxs_mmc_host *host)
@@ -296,7 +351,7 @@  static void mxs_mmc_ac(struct mxs_mmc_host *host)
 	struct mmc_command *cmd = host->cmd;
 	struct dma_async_tx_descriptor *desc;
 	u32 ignore_crc, get_resp, long_resp;
-	u32 ctrl0, cmd0, cmd1;
+	u32 ctrl0, ctrl1, cmd0, cmd1, c1;
 
 	ignore_crc = (mmc_resp_type(cmd) & MMC_RSP_CRC) ?
 			0 : BM_SSP_CTRL0_IGNORE_CRC;
@@ -306,30 +361,76 @@  static void mxs_mmc_ac(struct mxs_mmc_host *host)
 			BM_SSP_CTRL0_LONG_RESP : 0;
 
 	ctrl0 = BM_SSP_CTRL0_ENABLE | ignore_crc | get_resp | long_resp;
+	ctrl1 = BM_SSP_CTRL1_DMA_ENABLE |
+		BM_SSP_CTRL1_RECV_TIMEOUT_IRQ_EN |
+		BM_SSP_CTRL1_DATA_CRC_IRQ_EN |
+		BM_SSP_CTRL1_DATA_TIMEOUT_IRQ_EN |
+		BM_SSP_CTRL1_RESP_TIMEOUT_IRQ_EN |
+		BM_SSP_CTRL1_RESP_ERR_IRQ_EN;
 	cmd0 = BF_SSP(cmd->opcode, CMD0_CMD);
-	cmd1 = cmd->arg;
+	cmd1 = BF_SSP(cmd->arg, CMD1_CMD);
 
 	if (host->sdio_irq_en) {
 		ctrl0 |= BM_SSP_CTRL0_SDIO_IRQ_CHECK;
 		cmd0 |= BM_SSP_CMD0_CONT_CLKING_EN | BM_SSP_CMD0_SLOW_CLKING_EN;
 	}
 
-	ssp->ssp_pio_words[0] = ctrl0;
-	ssp->ssp_pio_words[1] = cmd0;
-	ssp->ssp_pio_words[2] = cmd1;
-	ssp->dma_dir = DMA_NONE;
-	ssp->slave_dirn = DMA_TRANS_NONE;
-	desc = mxs_mmc_prep_dma(host, DMA_CTRL_ACK);
-	if (!desc)
-		goto out;
-
-	dmaengine_submit(desc);
-	dma_async_issue_pending(ssp->dmach);
-	return;
-
-out:
-	dev_warn(mmc_dev(host->mmc),
-		 "%s: failed to prep dma\n", __func__);
+	/* following IO operations */
+	writel(ctrl0, ssp->base + HW_SSP_CTRL0);
+	writel(cmd0, ssp->base + HW_SSP_CMD0);
+	writel(cmd1, ssp->base + HW_SSP_CMD1);
+
+	/* clear these bits */
+	writel(ctrl1, ssp->base + HW_SSP_CTRL1(ssp) + STMP_OFFSET_REG_CLR);
+	init_completion(&host->dma_done);
+	writel(BM_SSP_CTRL0_RUN,
+	       ssp->base + HW_SSP_CTRL0 + STMP_OFFSET_REG_SET);
+
+	/* wait here as long as SSP is running and busy */
+	while (readl(ssp->base + HW_SSP_CTRL0) & BM_SSP_CTRL0_RUN)
+		continue;
+	while (readl(ssp->base + HW_SSP_STATUS(ssp)) & BM_SSP_STATUS_BUSY)
+		continue;
+
+	host->status = readl(ssp->base + HW_SSP_STATUS(ssp));
+	c1 = readl(ssp->base + HW_SSP_CTRL1(ssp));
+
+	/* reset interrupt request status bits */
+	writel(c1 & MXS_MMC_ERR_IRQ_BITS,
+	       ssp->base + HW_SSP_CTRL1(ssp) + STMP_OFFSET_REG_CLR);
+
+	/* reenable these bits */
+	writel(ctrl1, ssp->base + HW_SSP_CTRL1(ssp) + STMP_OFFSET_REG_SET);
+	/* end IO operations */
+
+	switch (mmc_resp_type(cmd)) {
+	case MMC_RSP_NONE:
+		while (readl(ssp->base + HW_SSP_CTRL0) & BM_SSP_CTRL0_RUN)
+			continue;
+		break;
+	case MMC_RSP_R1:
+	case MMC_RSP_R1B:
+	case MMC_RSP_R3:
+		cmd->resp[0] = readl(ssp->base + HW_SSP_SDRESP0(ssp));
+		break;
+	case MMC_RSP_R2:
+		cmd->resp[3] = readl(ssp->base + HW_SSP_SDRESP0(ssp));
+		cmd->resp[2] = readl(ssp->base + HW_SSP_SDRESP1(ssp));
+		cmd->resp[1] = readl(ssp->base + HW_SSP_SDRESP2(ssp));
+		cmd->resp[0] = readl(ssp->base + HW_SSP_SDRESP3(ssp));
+		break;
+	default:
+		dev_warn(mmc_dev(host->mmc), "Unsupported response type 0x%x\n",
+			 mmc_resp_type(cmd));
+		BUG();
+		break;
+	}
+
+	cmd->error = mxs_mmc_cmd_error(host->status);
+
+	if (cmd->error) {
+		dev_warn(mmc_dev(host->mmc), "AC command error %d\n", cmd->error);
+	}
 }
 
 static unsigned short mxs_ns_to_ssp_ticks(unsigned clock_rate, unsigned ns)
@@ -357,6 +458,15 @@  static void mxs_mmc_adtc(struct mxs_mmc_host *host)
 	unsigned int sg_len = data->sg_len;
 	unsigned int i;
 
+	int is_reading = 0;
+	int index = 0;
+	int len;
+	struct scatterlist *_sg;
+	int size;
+	char *sgbuf;
+	u8 *p;
+	u32 _data, status;
+
 	unsigned short dma_data_dir, timeout;
 	enum dma_transfer_direction slave_dirn;
 	unsigned int data_size = 0, log2_blksz;
@@ -365,7 +475,7 @@  static void mxs_mmc_adtc(struct mxs_mmc_host *host)
 	struct mxs_ssp *ssp = &host->ssp;
 
 	u32 ignore_crc, get_resp, long_resp, read;
-	u32 ctrl0, cmd0, cmd1, val;
+	u32 ctrl0, ctrl1, cmd0, cmd1, val, c1;
 
 	ignore_crc = (mmc_resp_type(cmd) & MMC_RSP_CRC) ?
 			0 : BM_SSP_CTRL0_IGNORE_CRC;
@@ -378,10 +488,12 @@  static void mxs_mmc_adtc(struct mxs_mmc_host *host)
 		dma_data_dir = DMA_TO_DEVICE;
 		slave_dirn = DMA_MEM_TO_DEV;
 		read = 0;
+		is_reading = 0;
 	} else {
 		dma_data_dir = DMA_FROM_DEVICE;
 		slave_dirn = DMA_DEV_TO_MEM;
 		read = BM_SSP_CTRL0_READ;
+		is_reading = 1;
 	}
 
 	ctrl0 = BF_SSP(host->bus_width, CTRL0_BUS_WIDTH) |
@@ -428,38 +540,129 @@  static void mxs_mmc_adtc(struct mxs_mmc_host *host)
 		cmd0 |= BM_SSP_CMD0_CONT_CLKING_EN | BM_SSP_CMD0_SLOW_CLKING_EN;
 	}
 
-	/* set the timeout count */
-	timeout = mxs_ns_to_ssp_ticks(ssp->clk_rate, data->timeout_ns);
-	val = readl(ssp->base + HW_SSP_TIMING(ssp));
-	val &= ~(BM_SSP_TIMING_TIMEOUT);
-	val |= BF_SSP(timeout, TIMING_TIMEOUT);
-	writel(val, ssp->base + HW_SSP_TIMING(ssp));
-
-	/* pio */
-	ssp->ssp_pio_words[0] = ctrl0;
-	ssp->ssp_pio_words[1] = cmd0;
-	ssp->ssp_pio_words[2] = cmd1;
-	ssp->dma_dir = DMA_NONE;
-	ssp->slave_dirn = DMA_TRANS_NONE;
-	desc = mxs_mmc_prep_dma(host, 0);
-	if (!desc)
-		goto out;
-
-	/* append data sg */
-	WARN_ON(host->data != NULL);
-	host->data = data;
-	ssp->dma_dir = dma_data_dir;
-	ssp->slave_dirn = slave_dirn;
-	desc = mxs_mmc_prep_dma(host, DMA_PREP_INTERRUPT | DMA_CTRL_ACK);
-	if (!desc)
-		goto out;
-
-	dmaengine_submit(desc);
-	dma_async_issue_pending(ssp->dmach);
-	return;
-out:
-	dev_warn(mmc_dev(host->mmc),
-		 "%s: failed to prep dma\n", __func__);
+	data_size = cmd->data->blksz * cmd->data->blocks;
+	u32 transfer_size = data_size;
+
+	_sg = host->cmd->data->sg;
+	len = host->cmd->data->sg_len;
+
+	ctrl1 =	BM_SSP_CTRL1_DMA_ENABLE |
+		BM_SSP_CTRL1_RECV_TIMEOUT_IRQ_EN |
+		BM_SSP_CTRL1_DATA_CRC_IRQ_EN |
+		BM_SSP_CTRL1_DATA_TIMEOUT_IRQ_EN |
+		BM_SSP_CTRL1_RESP_TIMEOUT_IRQ_EN |
+		BM_SSP_CTRL1_RESP_ERR_IRQ_EN;
+
+	writel(ctrl0, ssp->base + HW_SSP_CTRL0);
+	writel(cmd0, ssp->base + HW_SSP_CMD0);
+	writel(cmd1, ssp->base + HW_SSP_CMD1);
+	/* clear these bits */
+	writel(ctrl1, ssp->base + HW_SSP_CTRL1(ssp) + STMP_OFFSET_REG_CLR);
+	init_completion(&host->dma_done);
+	writel(BM_SSP_CTRL0_RUN, ssp->base + HW_SSP_CTRL0 + STMP_OFFSET_REG_SET);
+
+	while (readl(ssp->base + HW_SSP_STATUS(ssp)) & BM_SSP_STATUS_CMD_BUSY)
+		continue;
+
+	while (transfer_size) {
+		sgbuf = kmap_atomic(sg_page(&_sg[index])) + _sg[index].offset;
+
+		p = (u8 *)sgbuf;
+		size = transfer_size < _sg[index].length ? transfer_size : _sg[index].length;
+
+		if (is_reading) {
+			while (size) {
+				status = readl(ssp->base + HW_SSP_STATUS(ssp));
+				if (status & BM_SSP_STATUS_FIFO_EMPTY)
+					continue;
+				_data = readl(ssp->base + HW_SSP_DATA(ssp));
+				if ((u32)p & 0x3) {
+					*p++ = _data & 0xff;
+					*p++ = (_data >> 8) & 0xff;
+					*p++ = (_data >> 16) & 0xff;
+					*p++ = (_data >> 24) & 0xff;
+				} else {
+					*(u32 *)p = _data;
+					p += 4;
+				}
+				transfer_size -= 4;
+				size -= 4;
+			}
+		} else {
+			while (size) {
+				status = readl(ssp->base + HW_SSP_STATUS(ssp));
+				if (status & BM_SSP_STATUS_FIFO_FULL)
+					continue;
+				if ((u32)p & 0x3)
+					_data = p[0] | \
+						(p[1] << 8) | \
+						(p[2] << 16) | \
+						(p[3] << 24);
+				else
+					_data = *(u32 *)p;
+
+				writel(_data, ssp->base + HW_SSP_DATA(ssp));
+				transfer_size -= 4;
+				size -= 4;
+				p += 4;
+			}
+		}
+		kunmap_atomic(sgbuf);
+		index++;
+	}
+
+	while (readl(ssp->base + HW_SSP_STATUS(ssp)) & (BM_SSP_STATUS_BUSY |
+						   BM_SSP_STATUS_DATA_BUSY |
+						   BM_SSP_STATUS_CMD_BUSY))
+		continue;
+
+	cmd->data->bytes_xfered = data_size;
+
+	host->status = readl(ssp->base + HW_SSP_STATUS(ssp));
+
+	c1 = readl(ssp->base + HW_SSP_CTRL1(ssp));
+
+	/* reset interrupt request status bits */
+	writel(c1 & MXS_MMC_ERR_IRQ_BITS,
+	       ssp->base + HW_SSP_CTRL1(ssp) + STMP_OFFSET_REG_CLR);
+
+	/* reenable these bits */
+	writel(ctrl1, ssp->base + HW_SSP_CTRL1(ssp) + STMP_OFFSET_REG_SET);
+	/* end IO operations */
+
+	switch (mmc_resp_type(cmd)) {
+	case MMC_RSP_NONE:
+		break;
+	case MMC_RSP_R1:
+	case MMC_RSP_R3:
+		cmd->resp[0] =
+		    readl(ssp->base + HW_SSP_SDRESP0(ssp));
+		break;
+	case MMC_RSP_R2:
+		cmd->resp[3] =
+		    readl(ssp->base + HW_SSP_SDRESP0(ssp));
+		cmd->resp[2] =
+		    readl(ssp->base + HW_SSP_SDRESP1(ssp));
+		cmd->resp[1] =
+		    readl(ssp->base + HW_SSP_SDRESP2(ssp));
+		cmd->resp[0] =
+		    readl(ssp->base + HW_SSP_SDRESP3(ssp));
+		break;
+	default:
+		dev_warn(mmc_dev(host->mmc), "Unsupported response type 0x%x\n",
+			 mmc_resp_type(cmd));
+		BUG();
+		break;
+	}
+
+	cmd->error = mxs_mmc_cmd_error(host->status);
+
+	if (cmd->error) {
+		dev_warn(mmc_dev(host->mmc), "ADTC command error %d\n", cmd->error);
+	} else {
+		dev_dbg(mmc_dev(host->mmc), "Transferred %u bytes\n",
+			cmd->data->bytes_xfered);
+	}
 }
 
 static void mxs_mmc_start_cmd(struct mxs_mmc_host *host,
@@ -489,11 +692,18 @@  static void mxs_mmc_start_cmd(struct mxs_mmc_host *host,
 
 static void mxs_mmc_request(struct mmc_host *mmc, struct mmc_request *mrq)
 {
+	int done;
 	struct mxs_mmc_host *host = mmc_priv(mmc);
 
 	WARN_ON(host->mrq != NULL);
 	host->mrq = mrq;
 	mxs_mmc_start_cmd(host, mrq->cmd);
+
+	if (mrq->data && mrq->data->stop)
+		mxs_mmc_start_cmd(host, mrq->data->stop);
+
+	host->mrq = NULL;
+	mmc_request_done(mmc, mrq);
 }
 
 static void mxs_mmc_set_ios(struct mmc_host *mmc, struct mmc_ios *ios)
@@ -603,6 +813,7 @@  static int mxs_mmc_probe(struct platform_device *pdev)
 
 	host->mmc = mmc;
 	host->sdio_irq_en = 0;
+	host->pio_size = 0;
 
 	reg_vmmc = devm_regulator_get(&pdev->dev, "vmmc");
 	if (!IS_ERR(reg_vmmc)) {
diff --git a/include/linux/spi/mxs-spi.h b/include/linux/spi/mxs-spi.h
index 381d368..903c1c7 100644
--- a/include/linux/spi/mxs-spi.h
+++ b/include/linux/spi/mxs-spi.h
@@ -53,6 +53,8 @@ 
 #define  BP_SSP_CMD0_CMD			0
 #define  BM_SSP_CMD0_CMD			0xff
 #define HW_SSP_CMD1				0x020
+#define  BP_SSP_CMD1_CMD			0
+#define  BM_SSP_CMD1_CMD			0xFFFFFFFF
 #define HW_SSP_XFER_SIZE			0x030
 #define HW_SSP_BLOCK_SIZE			0x040
 #define  BP_SSP_BLOCK_SIZE_BLOCK_COUNT		4
@@ -116,6 +118,16 @@ 
 #define  BM_SSP_STATUS_CARD_DETECT		(1 << 28)
 #define  BM_SSP_STATUS_SDIO_IRQ			(1 << 17)
 #define  BM_SSP_STATUS_FIFO_EMPTY		(1 << 5)
+#define  BM_SSP_STATUS_RESP_CRC_ERR		0x00010000
+#define  BM_SSP_STATUS_RESP_ERR			0x00008000
+#define  BM_SSP_STATUS_RESP_TIMEOUT		0x00004000
+#define  BM_SSP_STATUS_DATA_CRC_ERR		0x00002000
+#define  BM_SSP_STATUS_TIMEOUT			0x00001000
+#define  BM_SSP_STATUS_FIFO_FULL		0x00000100
+#define  BM_SSP_STATUS_CMD_BUSY			0x00000008
+#define  BM_SSP_STATUS_DATA_BUSY		0x00000004
+#define  BM_SSP_STATUS_RSVD0			0x00000002
+#define  BM_SSP_STATUS_BUSY			0x00000001
 
 #define BF_SSP(value, field)	(((value) << BP_SSP_##field) & BM_SSP_##field)
 
-- 
2.10.2
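
The polling loops in the patch spin on the SSP control and status registers without an upper bound, so a card that never answers would keep the CPU in the loop indefinitely. A user-space sketch of a bounded poll is given here as an illustration only: the register is simulated by a callback, and every name in it is hypothetical. In the kernel, the readl_poll_timeout() helper from <linux/iopoll.h> serves a similar purpose.

```c
#include <errno.h>
#include <stdint.h>

/* Callback that returns the current value of a (simulated) register. */
typedef uint32_t (*reg_read_fn)(void *ctx);

/*
 * Poll until (register & mask) == 0 or until max_polls reads were made.
 * Returns 0 once the bits are clear, -ETIMEDOUT otherwise.
 */
int poll_until_clear(reg_read_fn read_reg, void *ctx, uint32_t mask,
		     unsigned int max_polls)
{
	while (max_polls--) {
		if (!(read_reg(ctx) & mask))
			return 0;
	}
	return -ETIMEDOUT;
}

/* Simulated status register: reports "busy" (bit 0) for a fixed
 * number of reads and then reads back as idle. */
struct fake_reg {
	unsigned int busy_reads;
};

uint32_t fake_reg_read(void *ctx)
{
	struct fake_reg *reg = ctx;

	if (reg->busy_reads) {
		reg->busy_reads--;
		return 0x1;
	}
	return 0;
}
```

A caller would map a -ETIMEDOUT result to cmd->error instead of spinning forever; the iteration limit stands in for a real time-based timeout.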