diff mbox

[RFT] mmc: block: implement REQ_OP_WRITE_ZEROES

Message ID 1527493766-15487-1-git-send-email-shawn.lin@rock-chips.com (mailing list archive)
State New, archived
Headers show

Commit Message

Shawn Lin May 28, 2018, 7:49 a.m. UTC
card->erased_byte read from SCR(for SD cards) or EXT_CSD[181](for eMMC)
indicates whether the TRIM or ERASE make the erased data content zeros,
but DISCARD doesn't. Use the fact to implement REQ_OP_WRITE_ZEROES.

Cc: Faiz Abbas <faiz_abbas@ti.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
---
Hi Faiz,
Would you like to test this patch to see if it could
solve your performance drop on the first write after
filesystem creation?

 drivers/mmc/core/block.c | 10 +++++++++-
 drivers/mmc/core/queue.c |  3 +++
 2 files changed, 12 insertions(+), 1 deletion(-)

Comments

Christoph Hellwig May 28, 2018, 7:57 a.m. UTC | #1
On Mon, May 28, 2018 at 03:49:26PM +0800, Shawn Lin wrote:
> card->erased_byte read from SCR(for SD cards) or EXT_CSD[181](for eMMC)
> indicates whether the TRIM or ERASE make the erased data content zeros,
> but DISCARD doesn't. Use the fact to implement REQ_OP_WRITE_ZEROES.

I don't have the mmc spec in front of my, but the ATA/SCSI and NVMe
equivalents all have the problem that if the device decideds to not
deallocate the blocks it can leave the old data in place.  Can you
confirm that MMC gurantees that this is not the case?
--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Shawn Lin May 28, 2018, 8:05 a.m. UTC | #2
On 2018/5/28 15:57, Christoph Hellwig wrote:
> On Mon, May 28, 2018 at 03:49:26PM +0800, Shawn Lin wrote:
>> card->erased_byte read from SCR(for SD cards) or EXT_CSD[181](for eMMC)
>> indicates whether the TRIM or ERASE make the erased data content zeros,
>> but DISCARD doesn't. Use the fact to implement REQ_OP_WRITE_ZEROES.
> 
> I don't have the mmc spec in front of my, but the ATA/SCSI and NVMe
> equivalents all have the problem that if the device decideds to not
> deallocate the blocks it can leave the old data in place.  Can you
> confirm that MMC gurantees that this is not the case?

Yes. Quoting from JESD85-B51 spec for eMMC, section 6.6.10 TRIM,
"The contents of a write block where the trim function has been applied
shall be '0' or '1' depending on different memory technology. This value
is defined in the EXT_CSD.".  But 6.6.12 Discard says "The
contents of a write block where the discard function has been applied
shall be ‘don’t care’After discard operation, the original data may
be remained partially or fully accessible to the host dependent on
device. The portions of data that are no longer accessible by the host
may be removed or unmapped just as in the case of TRIM. The device will
decide the contents of discarded write block."

Apart from the mandatory requirement in spec, I confirmed this with
several device vendors as well, in the past.

> 
> 
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Hellwig May 28, 2018, 8:51 a.m. UTC | #3
On Mon, May 28, 2018 at 04:05:38PM +0800, Shawn Lin wrote:
> On 2018/5/28 15:57, Christoph Hellwig wrote:
>> On Mon, May 28, 2018 at 03:49:26PM +0800, Shawn Lin wrote:
>>> card->erased_byte read from SCR(for SD cards) or EXT_CSD[181](for eMMC)
>>> indicates whether the TRIM or ERASE make the erased data content zeros,
>>> but DISCARD doesn't. Use the fact to implement REQ_OP_WRITE_ZEROES.
>>
>> I don't have the mmc spec in front of my, but the ATA/SCSI and NVMe
>> equivalents all have the problem that if the device decideds to not
>> deallocate the blocks it can leave the old data in place.  Can you
>> confirm that MMC gurantees that this is not the case?
>
> Yes. Quoting from JESD85-B51 spec for eMMC, section 6.6.10 TRIM,
> "The contents of a write block where the trim function has been applied
> shall be '0' or '1' depending on different memory technology. This value
> is defined in the EXT_CSD.".  But 6.6.12 Discard says "The
> contents of a write block where the discard function has been applied
> shall be ‘don’t care’After discard operation, the original data may
> be remained partially or fully accessible to the host dependent on
> device. The portions of data that are no longer accessible by the host
> may be removed or unmapped just as in the case of TRIM. The device will
> decide the contents of discarded write block."
>
> Apart from the mandatory requirement in spec, I confirmed this with
> several device vendors as well, in the past.

So don't we need to check that the device actually supports the TRIM
operation (that is call mmc_can_trim()) before setting
max_write_zeroes_sectors?  Also you probably need a function different
from mmc_calc_max_discard to calculate max_write_zeroes_sectors given
that the different methods seem to have different limits.
--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Shawn Lin May 28, 2018, 9:03 a.m. UTC | #4
On 2018/5/28 16:51, Christoph Hellwig wrote:
> On Mon, May 28, 2018 at 04:05:38PM +0800, Shawn Lin wrote:
>> On 2018/5/28 15:57, Christoph Hellwig wrote:
>>> On Mon, May 28, 2018 at 03:49:26PM +0800, Shawn Lin wrote:
>>>> card->erased_byte read from SCR(for SD cards) or EXT_CSD[181](for eMMC)
>>>> indicates whether the TRIM or ERASE make the erased data content zeros,
>>>> but DISCARD doesn't. Use the fact to implement REQ_OP_WRITE_ZEROES.
>>>
>>> I don't have the mmc spec in front of my, but the ATA/SCSI and NVMe
>>> equivalents all have the problem that if the device decideds to not
>>> deallocate the blocks it can leave the old data in place.  Can you
>>> confirm that MMC gurantees that this is not the case?
>>
>> Yes. Quoting from JESD85-B51 spec for eMMC, section 6.6.10 TRIM,
>> "The contents of a write block where the trim function has been applied
>> shall be '0' or '1' depending on different memory technology. This value
>> is defined in the EXT_CSD.".  But 6.6.12 Discard says "The
>> contents of a write block where the discard function has been applied
>> shall be ‘don’t care’After discard operation, the original data may
>> be remained partially or fully accessible to the host dependent on
>> device. The portions of data that are no longer accessible by the host
>> may be removed or unmapped just as in the case of TRIM. The device will
>> decide the contents of discarded write block."
>>
>> Apart from the mandatory requirement in spec, I confirmed this with
>> several device vendors as well, in the past.
> 
> So don't we need to check that the device actually supports the TRIM
> operation (that is call mmc_can_trim()) before setting

mmc_can_erase() could help, as it, ERASE command, also guarantees the
data to be zeros after erased. And that checking is done before we set
max_write_zeroes_sectors.

drivers/mmc/core/queue.c
364         if (mmc_can_erase(card))
365                 mmc_queue_setup_discard(mq->queue, card);


> max_write_zeroes_sectors?  Also you probably need a function different
> from mmc_calc_max_discard to calculate max_write_zeroes_sectors given
> that the different methods seem to have different limits.

At this point, we needn't. The limitation of ERASE/TRIM/DISCARD for mmc
are the same. And mmc_calc_max_discard already do the right thing IMHO.

> 
> 
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Faiz Abbas May 30, 2018, 6:05 a.m. UTC | #5
Hi Shawn, Christoph,
On Monday 28 May 2018 01:19 PM, Shawn Lin wrote:
> card->erased_byte read from SCR(for SD cards) or EXT_CSD[181](for eMMC)
> indicates whether the TRIM or ERASE make the erased data content zeros,
> but DISCARD doesn't. Use the fact to implement REQ_OP_WRITE_ZEROES.
> 
> Cc: Faiz Abbas <faiz_abbas@ti.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
> ---
> Hi Faiz,
> Would you like to test this patch to see if it could
> solve your performance drop on the first write after
> filesystem creation?
> 

This patch doesn't fix the issue for me. I tested it on the latest
master. I should get to further analyzing this by early next week.

Log (notice first throughput lower than others):
https://pastebin.ubuntu.com/p/NmZqzwfNjz/

Thanks,
Faiz
--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index a0b9102..7d59887 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -1132,7 +1132,14 @@  static void mmc_blk_issue_discard_rq(struct mmc_queue *mq, struct request *req)
 	from = blk_rq_pos(req);
 	nr = blk_rq_sectors(req);
 
-	if (mmc_can_discard(card))
+	/*
+	 * DISCARD says the contents of a write block where the discard
+	 * function has been applied shold be 'don't care'. But TRIM and
+	 * ERASE should have explicit '1' or '0' indicated by
+	 * card->erased_byte. So REQ_OP_WRITE_ZEROES should use TRIM/ERASE
+	 * instead.
+	 */
+	if (mmc_can_discard(card) && req_op(req) != REQ_OP_WRITE_ZEROES)
 		arg = MMC_DISCARD_ARG;
 	else if (mmc_can_trim(card))
 		arg = MMC_TRIM_ARG;
@@ -2228,6 +2235,7 @@  enum mmc_issued mmc_blk_mq_issue_rq(struct mmc_queue *mq, struct request *req)
 			mmc_blk_issue_drv_op(mq, req);
 			break;
 		case REQ_OP_DISCARD:
+		case REQ_OP_WRITE_ZEROES:
 			mmc_blk_issue_discard_rq(mq, req);
 			break;
 		case REQ_OP_SECURE_ERASE:
diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
index 56e9a80..2657a46 100644
--- a/drivers/mmc/core/queue.c
+++ b/drivers/mmc/core/queue.c
@@ -51,6 +51,7 @@  static enum mmc_issue_type mmc_cqe_issue_type(struct mmc_host *host,
 	case REQ_OP_DRV_OUT:
 	case REQ_OP_DISCARD:
 	case REQ_OP_SECURE_ERASE:
+	case REQ_OP_WRITE_ZEROES:
 		return MMC_ISSUE_SYNC;
 	case REQ_OP_FLUSH:
 		return mmc_cqe_can_dcmd(host) ? MMC_ISSUE_DCMD : MMC_ISSUE_SYNC;
@@ -193,6 +194,8 @@  static void mmc_queue_setup_discard(struct request_queue *q,
 		q->limits.discard_granularity = 0;
 	if (mmc_can_secure_erase_trim(card))
 		blk_queue_flag_set(QUEUE_FLAG_SECERASE, q);
+	if (!card->erased_byte)
+		q->limits.max_write_zeroes_sectors = max_discard;
 }
 
 /**