From patchwork Wed May 4 15:10:26 2011 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Martin K. Petersen" X-Patchwork-Id: 754062 Received: from mx4-phx2.redhat.com (mx4-phx2.redhat.com [209.132.183.25]) by demeter1.kernel.org (8.14.4/8.14.3) with ESMTP id p44FDbxG030198 for ; Wed, 4 May 2011 15:14:03 GMT Received: from lists01.pubmisc.prod.ext.phx2.redhat.com (lists01.pubmisc.prod.ext.phx2.redhat.com [10.5.19.33]) by mx4-phx2.redhat.com (8.13.8/8.13.8) with ESMTP id p44FAwce018329; Wed, 4 May 2011 11:10:59 -0400 Received: from int-mx09.intmail.prod.int.phx2.redhat.com (int-mx09.intmail.prod.int.phx2.redhat.com [10.5.11.22]) by lists01.pubmisc.prod.ext.phx2.redhat.com (8.13.8/8.13.8) with ESMTP id p44FAu6B025935; Wed, 4 May 2011 11:10:56 -0400 Received: from mx1.redhat.com (ext-mx11.extmail.prod.ext.phx2.redhat.com [10.5.110.16]) by int-mx09.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id p44FApSx001391; Wed, 4 May 2011 11:10:51 -0400 Received: from rcsinet10.oracle.com (rcsinet10.oracle.com [148.87.113.121]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id p44FAlrc003686; Wed, 4 May 2011 11:10:47 -0400 Received: from rtcsinet22.oracle.com (rtcsinet22.oracle.com [66.248.204.30]) by rcsinet10.oracle.com (Switch-3.4.2/Switch-3.4.2) with ESMTP id p44FAbpk026387 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Wed, 4 May 2011 15:10:39 GMT Received: from acsmt358.oracle.com (acsmt358.oracle.com [141.146.40.158]) by rtcsinet22.oracle.com (8.14.4+Sun/8.14.4) with ESMTP id p44FAauY018606 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Wed, 4 May 2011 15:10:36 GMT Received: from abhmt006.oracle.com (abhmt006.oracle.com [141.146.116.15]) by acsmt358.oracle.com (8.12.11.20060308/8.12.11) with ESMTP id p44FAUXc028694; Wed, 4 May 2011 10:10:30 -0500 Received: from groovelator.mkp.net (/209.217.122.111) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Wed, 04 May 2011 08:10:27 -0700 To: Lukas Czerner From: "Martin K. Petersen" Organization: Oracle References: <20110413224025.GA18589@redhat.com> <20110413234854.GA19793@redhat.com> <20110426173213.GA19604@redhat.com> <20110428001912.GA14659@redhat.com> <20110428075355.GA2190@infradead.org> <20110428205935.GA24979@redhat.com> <20110429122454.GL32370@agk-dp.fab.redhat.com> <20110502081308.GC8642@agk-dp.fab.redhat.com> <20110502081925.GA11312@infradead.org> Date: Wed, 04 May 2011 11:10:26 -0400 In-Reply-To: (Lukas Czerner's message of "Tue, 3 May 2011 10:57:19 +0200 (CEST)") Message-ID: User-Agent: Gnus/5.110013 (No Gnus v0.13) Emacs/23.2 (gnu/linux) MIME-Version: 1.0 X-Source-IP: rtcsinet22.oracle.com [66.248.204.30] X-CT-RefId: str=0001.0A090207.4DC16C70.0055:SCFSTAT5015188,ss=1,fgs=0 X-RedHat-Spam-Score: -102.309 (RCVD_IN_DNSWL_MED, T_RP_MATCHES_RCVD, UNPARSEABLE_RELAY, USER_IN_WHITELIST) X-Scanned-By: MIMEDefang 2.68 on 10.5.11.22 X-Scanned-By: MIMEDefang 2.68 on 10.5.110.16 Cc: sandeen@redhat.com, "Martin K. Petersen" , Mike Snitzer , Christoph Hellwig , device-mapper development , DarkNovaNick@gmail.com, linux-lvm@redhat.com, linux-ext4@vger.kernel.org, Alasdair G Kergon Subject: Re: [dm-devel] do not disable ext4 discards on first discard failure? [was: Re: dm snapshot: ignore discards issued to the snapshot-origin target] X-BeenThere: dm-devel@redhat.com X-Mailman-Version: 2.1.12 Precedence: junk Reply-To: device-mapper development List-Id: device-mapper development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: dm-devel-bounces@redhat.com Errors-To: dm-devel-bounces@redhat.com X-Greylist: IP, sender and recipient auto-whitelisted, not delayed by milter-greylist-4.2.6 (demeter1.kernel.org [140.211.167.41]); Wed, 04 May 2011 15:14:03 +0000 (UTC) >>>>> "Lukas" == Lukas Czerner writes: Lukas> Nevertheless there is something weird going on, because even when Lukas> I create striped volume I get this: Could you please try the following patch? It has a bunch of small tweaks to the discard stack in it. I'll split it up before posting for real but I'd like to know if it fixes your issue... block/libata/scsi: Various logical block provisioning fixes - Add sysfs documentation for the discard topology parameters - Fix discard stacking problem - Switch our libata SAT over to using the WRITE SAME limits - UNMAP alignment needs to be converted to bytes - Only report alignment and zeroes_data if the device supports discard Reported-by: Lukas Czerner Signed-off-by: Martin K. Petersen --- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel diff --git a/Documentation/ABI/testing/sysfs-block b/Documentation/ABI/testing/sysfs-block index 4873c75..c1eb41c 100644 --- a/Documentation/ABI/testing/sysfs-block +++ b/Documentation/ABI/testing/sysfs-block @@ -142,3 +142,67 @@ Description: with the previous I/O request are enabled. When set to 2, all merge tries are disabled. The default value is 0 - which enables all types of merge tries. + +What: /sys/block//discard_alignment +Date: May 2011 +Contact: Martin K. Petersen +Description: + Devices that support discard functionality may + internally allocate space in units that are bigger than + the exported logical block size. The discard_alignment + parameter indicates how many bytes the beginning of the + device is offset from the internal allocation unit's + natural alignment. + +What: /sys/block///discard_alignment +Date: May 2011 +Contact: Martin K. Petersen +Description: + Devices that support discard functionality may + internally allocate space in units that are bigger than + the exported logical block size. The discard_alignment + parameter indicates how many bytes the beginning of the + partition is offset from the internal allocation unit's + natural alignment. + +What: /sys/block//queue/discard_granularity +Date: May 2011 +Contact: Martin K. Petersen +Description: + Devices that support discard functionality may + internally allocate space using units that are bigger + than the logical block size. The discard_granularity + parameter indicates the size of the internal allocation + unit in bytes if reported by the device. Otherwise the + discard_granularity will be set to match the device's + physical block size. A discard_granularity of 0 means + that the device does not support discard functionality. + +What: /sys/block//queue/discard_max_bytes +Date: May 2011 +Contact: Martin K. Petersen +Description: + Devices that support discard functionality may have + internal limits on the number of bytes that can be + trimmed or unmapped in a single operation. Some storage + protocols also have inherent limits on the number of + blocks that can be described in a single command. The + discard_max_bytes parameter is set by the device driver + to the maximum number of bytes that can be discarded in + a single operation. Discard requests issued to the + device must not exceed this limit. A discard_max_bytes + value of 0 means that the device does not support + discard functionality. + +What: /sys/block//queue/discard_zeroes_data +Date: May 2011 +Contact: Martin K. Petersen +Description: + Devices that support discard functionality may return + stale or random data when a previously discarded block + is read back. This can cause problems if the filesystem + expects discarded blocks to be explicitly cleared. If a + device reports that it deterministically returns zeroes + when a discarded area is read the discard_zeroes_data + parameter will be set to one. Otherwise it will be 0 and + the result of reading a discarded area is undefined. diff --git a/block/blk-settings.c b/block/blk-settings.c index 1fa7692..42d3bf5 100644 --- a/block/blk-settings.c +++ b/block/blk-settings.c @@ -120,7 +120,7 @@ void blk_set_default_limits(struct queue_limits *lim) lim->discard_granularity = 0; lim->discard_alignment = 0; lim->discard_misaligned = 0; - lim->discard_zeroes_data = -1; + lim->discard_zeroes_data = 1; lim->logical_block_size = lim->physical_block_size = lim->io_min = 512; lim->bounce_pfn = (unsigned long)(BLK_BOUNCE_ANY >> PAGE_SHIFT); lim->alignment_offset = 0; @@ -166,6 +166,7 @@ void blk_queue_make_request(struct request_queue *q, make_request_fn *mfn) blk_set_default_limits(&q->limits); blk_queue_max_hw_sectors(q, BLK_SAFE_MAX_SECTORS); + q->limits.discard_zeroes_data = 0; /* * by default assume old behaviour and bounce for any highmem page diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c index e2f57e9e..30ea95f 100644 --- a/drivers/ata/libata-scsi.c +++ b/drivers/ata/libata-scsi.c @@ -2138,7 +2138,7 @@ static unsigned int ata_scsiop_inq_b0(struct ata_scsi_args *args, u8 *rbuf) * with the unmap bit set. */ if (ata_id_has_trim(args->id)) { - put_unaligned_be32(65535 * 512 / 8, &rbuf[20]); + put_unaligned_be64(65535 * 512 / 8, &rbuf[36]); put_unaligned_be32(1, &rbuf[28]); } diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index bd0806e..cc081dd 100644 --- a/drivers/scsi/sd.c +++ b/drivers/scsi/sd.c @@ -490,7 +490,8 @@ static void sd_config_discard(struct scsi_disk *sdkp, unsigned int mode) unsigned int max_blocks = 0; q->limits.discard_zeroes_data = sdkp->lbprz; - q->limits.discard_alignment = sdkp->unmap_alignment; + q->limits.discard_alignment = sdkp->unmap_alignment * + (logical_block_size >> 9); q->limits.discard_granularity = max(sdkp->physical_block_size, sdkp->unmap_granularity * logical_block_size); diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 2ad95fa..1416e3c 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -257,7 +257,7 @@ struct queue_limits { unsigned char misaligned; unsigned char discard_misaligned; unsigned char cluster; - signed char discard_zeroes_data; + unsigned char discard_zeroes_data; }; struct request_queue @@ -1066,13 +1066,16 @@ static inline int queue_limit_discard_alignment(struct queue_limits *lim, sector { unsigned int alignment = (sector << 9) & (lim->discard_granularity - 1); + if (!lim->max_discard_sectors) + return 0; + return (lim->discard_granularity + lim->discard_alignment - alignment) & (lim->discard_granularity - 1); } static inline unsigned int queue_discard_zeroes_data(struct request_queue *q) { - if (q->limits.discard_zeroes_data == 1) + if (q->limits.max_discard_sectors && q->limits.discard_zeroes_data == 1) return 1; return 0;