From: "Martin K. Petersen"
Organization: Oracle Corporation
To: Tom Yan
Cc: "Martin K. Petersen", linux-scsi@vger.kernel.org, linux-ide@vger.kernel.org
Subject: Re: configurable discard parameters
Date: Tue, 23 Jun 2015 22:55:57 -0400
In-Reply-To: (Tom Yan's message of "Wed, 24 Jun 2015 05:25:20 +0800")

>>>>> "Tom" == Tom Yan writes:

Tom> total untrimmed sectors: (310635 - 1 - 2410) / 32 = 9632
Tom> total ranges: (43 GiB * (1024 ^ 3) / 512) / 65535 ~= 1376
Tom> average untrimmed sectors per range: (9632 / 1376) = 7 = (65535 % 8)

Every type of drive has its own internal restrictions. Unless the drive
guarantees zero after TRIM it is perfectly normal for heads, tails or
random pieces in-between to be left untouched.

I am surprised that the common case of contiguous range entries was not
handled properly by your drive. Most of them deal with that just fine
(regardless of their internal granularity and alignment constraints). It
is normally only the partial block tracking between command invocations
that causes grief.
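For what it's worth, the arithmetic quoted above can be reproduced with a
quick userspace sketch (plain C). The 8-sector round-down behavior is
purely an assumption about this particular drive, not something the ATA
Command Set promises:

#include <stdio.h>

int main(void)
{
	/*
	 * Numbers from the report above: a ~43 GiB discard, 512-byte
	 * sectors, TRIM range entries capped at 65535 sectors, and a
	 * drive that (assumed!) rounds each range down to an 8-sector
	 * boundary instead of tracking the 7-sector tail.
	 */
	unsigned long long sectors = 43ULL * 1024 * 1024 * 1024 / 512;
	unsigned int per_range = 65535;
	unsigned int granularity = 8;

	unsigned long long ranges = sectors / per_range;	/* ~1376 */
	unsigned int tail = per_range % granularity;		/* 65535 % 8 = 7 */

	printf("full ranges: %llu, tail per range: %u, untrimmed total: %llu\n",
	       ranges, tail, ranges * tail);			/* ~9632 */
	return 0;
}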
In any case: unless discard_zeroes_data is 1 (and that requires
guarantees above and beyond what can be discovered via the ATA Command
Set) all bets are off wrt. dependable behavior.

The code below is an untested proof of concept to see what it would take
to alleviate your particular issue. I am, however, not at all convinced
that introducing such a change is worth the risk. I know of at least a
couple of devices that would blow up as a result of this patch...

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 3131adcc1f87..9c270a303f0b 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -2238,8 +2238,24 @@ static unsigned int ata_scsiop_inq_b0(struct ata_scsi_args *args, u8 *rbuf)
 	 * with the unmap bit set.
 	 */
 	if (ata_id_has_trim(args->id)) {
-		put_unaligned_be64(65535 * 512 / 8, &rbuf[36]);
-		put_unaligned_be32(1, &rbuf[28]);
+		/*
+		 * If the drive supports reliable zero after trim we set
+		 * the granularity to 1 and the blocks per range to
+		 * 0xffff. Otherwise we set the granularity to 8 and
+		 * restrict the blocks per range to 0xfff8. This is done
+		 * to accommodate older drives which prefer 4K alignment.
+		 */
+
+		if (ata_id_has_zero_after_trim(args->id) &&
+		    args->dev->horkage & ATA_HORKAGE_ZERO_AFTER_TRIM) {
+			put_unaligned_be64(ATA_MAX_TRIM_RANGE * 512 / 8,
+					   &rbuf[36]);
+			put_unaligned_be32(1, &rbuf[28]);
+		} else {
+			put_unaligned_be64(ATA_ALIGNED_TRIM_RANGE * 512 / 8,
+					   &rbuf[36]);
+			put_unaligned_be32(8, &rbuf[28]);
+		}
 	}
 
 	return 0;
@@ -3168,7 +3184,14 @@ static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
 		goto invalid_fld;
 
 	buf = page_address(sg_page(scsi_sglist(scmd)));
-	size = ata_set_lba_range_entries(buf, 512, block, n_block);
+
+	if (ata_id_has_zero_after_trim(dev->id) &&
+	    dev->horkage & ATA_HORKAGE_ZERO_AFTER_TRIM)
+		size = ata_set_lba_range_entries(buf, 512, block, n_block,
+						 ATA_MAX_TRIM_RANGE);
+	else
+		size = ata_set_lba_range_entries(buf, 512, block, n_block,
+						 ATA_ALIGNED_TRIM_RANGE);
 
 	if (ata_ncq_enabled(dev) && ata_fpdma_dsm_supported(dev)) {
 		/* Newer devices support queued TRIM commands */
diff --git a/include/linux/ata.h b/include/linux/ata.h
index fed36418dd1c..8a17fa22cdbe 100644
--- a/include/linux/ata.h
+++ b/include/linux/ata.h
@@ -47,6 +47,8 @@ enum {
 	ATA_MAX_SECTORS		= 256,
 	ATA_MAX_SECTORS_LBA48	= 65535,/* TODO: 65536? */
 	ATA_MAX_SECTORS_TAPE	= 65535,
+	ATA_MAX_TRIM_RANGE	= 0xffff,
+	ATA_ALIGNED_TRIM_RANGE	= 0xfff8,
 
 	ATA_ID_WORDS		= 256,
 	ATA_ID_CONFIG		= 0,
@@ -1012,19 +1014,20 @@ static inline void ata_id_to_hd_driveid(u16 *id)
  * TO NV CACHE PINNED SET.
  */
 static inline unsigned ata_set_lba_range_entries(void *_buffer,
-		unsigned buf_size, u64 sector, unsigned long count)
+		unsigned buf_size, u64 sector, unsigned long count,
+		u16 bpe)
 {
 	__le64 *buffer = _buffer;
 	unsigned i = 0, used_bytes;
 
 	while (i < buf_size / 8 ) { /* 6-byte LBA + 2-byte range per entry */
 		u64 entry = sector |
-			((u64)(count > 0xffff ? 0xffff : count) << 48);
+			((u64)(count > bpe ? bpe : count) << 48);
 		buffer[i++] = __cpu_to_le64(entry);
-		if (count <= 0xffff)
+		if (count <= bpe)
 			break;
-		count -= 0xffff;
-		sector += 0xffff;
+		count -= bpe;
+		sector += bpe;
 	}
 
 	used_bytes = ALIGN(i * 8, 512);
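And for completeness, an untested userspace sketch (separate from the
patch itself) that mimics the range splitting above and shows why capping
each range at 0xfff8 rather than 0xffff should leave nothing for an
8-sector-granularity drive to round away. It assumes the starting LBA is
already aligned and that the drive rounds each range down, as discussed:

#include <stdio.h>

/*
 * Count the sectors a drive with an (assumed) 8-sector TRIM granularity
 * would round away when a request is chopped into range entries of at
 * most 'bpe' sectors, the way ata_set_lba_range_entries() does.
 */
static unsigned long long untrimmed(unsigned long long count, unsigned bpe)
{
	unsigned long long lost = 0;

	while (count) {
		unsigned long long chunk = count > bpe ? bpe : count;

		lost += chunk % 8;	/* tail rounded down by the drive */
		count -= chunk;
	}
	return lost;
}

int main(void)
{
	unsigned long long sectors = 43ULL * 1024 * 1024 * 1024 / 512;

	printf("bpe 0xffff: %llu sectors left untrimmed\n",
	       untrimmed(sectors, 0xffff));	/* ~9632 */
	printf("bpe 0xfff8: %llu sectors left untrimmed\n",
	       untrimmed(sectors, 0xfff8));	/* 0 */
	return 0;
}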