From patchwork Wed Aug 16 13:19:41 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Lukas Czerner
X-Patchwork-Id: 9903757
Return-Path:
Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125])
	by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 35AEA60231
	for ; Wed, 16 Aug 2017 13:19:57 +0000 (UTC)
Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1])
	by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 25B20289EE
	for ; Wed, 16 Aug 2017 13:19:57 +0000 (UTC)
Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486)
	id 177F5289F0; Wed, 16 Aug 2017 13:19:57 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	pdx-wl-mail.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00,RCVD_IN_DNSWL_HI
	autolearn=ham version=3.3.1
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 650AF289EE
	for ; Wed, 16 Aug 2017 13:19:56 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751851AbdHPNTz (ORCPT ); Wed, 16 Aug 2017 09:19:55 -0400
Received: from mx1.redhat.com ([209.132.183.28]:39190 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751654AbdHPNTy (ORCPT ); Wed, 16 Aug 2017 09:19:54 -0400
Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11])
	(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by mx1.redhat.com (Postfix) with ESMTPS id 6EC927ACD4;
	Wed, 16 Aug 2017 13:19:54 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com 6EC927ACD4
Authentication-Results: ext-mx02.extmail.prod.ext.phx2.redhat.com; dmarc=none (p=none dis=none) header.from=redhat.com
Authentication-Results: ext-mx02.extmail.prod.ext.phx2.redhat.com; spf=fail smtp.mailfrom=lczerner@redhat.com
Received: from rh_laptop.brq.redhat.com (unknown [10.43.17.67])
	by smtp.corp.redhat.com (Postfix) with ESMTP id C95367DE0C;
	Wed, 16 Aug 2017 13:19:52 +0000 (UTC)
From: Lukas Czerner
To: linux-fsdevel@vger.kernel.org
Cc: axboe@kernel.dk, martin.petersen@oracle.com, hch@lst.de,
	Lukas Czerner
Subject: [PATCH] block: reintroduce discard_zeroes_data sysfs file and
	BLKDISCARDZEROES
Date: Wed, 16 Aug 2017 15:19:41 +0200
Message-Id: <1502889581-19483-1-git-send-email-lczerner@redhat.com>
X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.26]); Wed, 16 Aug 2017 13:19:54 +0000 (UTC)
Sender: linux-fsdevel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-fsdevel@vger.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

The discard and zeroout code has been significantly rewritten recently,
and as part of the rewrite we got rid of the discard_zeroes_data flag.
With commit 48920ff2a5a9 ("block: remove the discard_zeroes_data flag"),
the discard_zeroes_data sysfs file and the BLKDISCARDZEROES ioctl now
always return zero, regardless of what the device actually supports.
This has broken userspace utilities: they no longer take advantage of
this functionality even if the device supports it.

Now the only way for a user to figure out whether the device supports
deterministic read zeroes after discard, without actually running
fallocate, is to check for discard support (discard_max_bytes) and
zeroout hw offload (write_zeroes_max_bytes).

However, we still have the discard_zeroes_data sysfs file and the
BLKDISCARDZEROES ioctl, so I do not see any reason not to do this check
in the kernel and provide a convenient and compatible way to continue
exporting this information to user space.

With this patch, both the BLKDISCARDZEROES ioctl and discard_zeroes_data
will return 1 if both discard and hw offload for write zeroes are
supported. Otherwise they will return 0.

Signed-off-by: Lukas Czerner
---
 Documentation/ABI/testing/sysfs-block | 11 +++++++++--
 Documentation/block/queue-sysfs.txt   |  5 +++++
 block/blk-sysfs.c                     |  5 ++++-
 block/ioctl.c                         |  6 +++++-
 4 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-block b/Documentation/ABI/testing/sysfs-block
index dea212d..6ea0d03 100644
--- a/Documentation/ABI/testing/sysfs-block
+++ b/Documentation/ABI/testing/sysfs-block
@@ -213,8 +213,15 @@ What:		/sys/block//queue/discard_zeroes_data
 Date:		May 2011
 Contact:	Martin K. Petersen
 Description:
-		Will always return 0.  Don't rely on any specific behavior
-		for discards, and don't read this file.
+		Devices that support discard functionality may return
+		stale or random data when a previously discarded block
+		is read back. This can cause problems if the filesystem
+		expects discarded blocks to be explicitly cleared. If a
+		device reports that it deterministically returns zeroes
+		when a discarded area is read the discard_zeroes_data
+		parameter will be set to one. Otherwise it will be 0 and
+		the result of reading a discarded area is undefined.
+
 
 What:		/sys/block//queue/write_same_max_bytes
 Date:		January 2012
diff --git a/Documentation/block/queue-sysfs.txt b/Documentation/block/queue-sysfs.txt
index 2c1e670..b7f6bdc 100644
--- a/Documentation/block/queue-sysfs.txt
+++ b/Documentation/block/queue-sysfs.txt
@@ -43,6 +43,11 @@ large discards are issued, setting this value lower will make Linux issue
 smaller discards and potentially help reduce latencies induced by large
 discard operations.
 
+discard_zeroes_data (RO)
+------------------------
+When read, this file will show if the discarded blocks are zeroed by the
+device or not. If its value is '1' the blocks are zeroed, otherwise not.
+
 hw_sector_size (RO)
 -------------------
 This is the hardware sector size of the device, in bytes.
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 27aceab..5b41ad0 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -209,7 +209,10 @@ static ssize_t queue_discard_max_store(struct request_queue *q,
 
 static ssize_t queue_discard_zeroes_data_show(struct request_queue *q, char *page)
 {
-	return queue_var_show(0, page);
+	if (blk_queue_discard(q) && q->limits.max_write_zeroes_sectors)
+		return queue_var_show(1, page);
+	else
+		return queue_var_show(0, page);
 }
 
 static ssize_t queue_write_same_max_show(struct request_queue *q, char *page)
diff --git a/block/ioctl.c b/block/ioctl.c
index 0de02ee..faecd44 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -508,6 +508,7 @@ int blkdev_ioctl(struct block_device *bdev, fmode_t mode, unsigned cmd,
 	void __user *argp = (void __user *)arg;
 	loff_t size;
 	unsigned int max_sectors;
+	struct request_queue *q = bdev_get_queue(bdev);
 
 	switch (cmd) {
 	case BLKFLSBUF:
@@ -547,7 +548,10 @@ int blkdev_ioctl(struct block_device *bdev, fmode_t mode, unsigned cmd,
 	case BLKALIGNOFF:
 		return put_int(arg, bdev_alignment_offset(bdev));
 	case BLKDISCARDZEROES:
-		return put_uint(arg, 0);
+		if (blk_queue_discard(q) && q->limits.max_write_zeroes_sectors)
+			return put_uint(arg, 1);
+		else
+			return put_uint(arg, 0);
 	case BLKSECTGET:
 		max_sectors = min_t(unsigned int, USHRT_MAX,
 				    queue_max_sectors(bdev_get_queue(bdev)));
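
For comparison, the userspace-side check that the commit message describes (looking at discard_max_bytes and write_zeroes_max_bytes) could be sketched roughly as below. This is only an illustration, not part of the patch: the helper names read_sysfs_u64(), discard_zeroes_data() and disk_discard_zeroes_data() are hypothetical, and treating a non-zero discard_max_bytes as equivalent to blk_queue_discard(q) is an assumption that only approximates the in-kernel test.

```c
#include <stdio.h>

/*
 * Read one unsigned decimal value from a sysfs attribute.
 * Returns 0 on success, -1 if the file cannot be opened or parsed.
 * (Hypothetical helper, not an existing API.)
 */
int read_sysfs_u64(const char *path, unsigned long long *val)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = (fscanf(f, "%llu", val) == 1);
	fclose(f);
	return ok ? 0 : -1;
}

/*
 * Userspace approximation of the condition the patch evaluates in the
 * kernel: discard is supported and write zeroes is offloaded to hardware.
 * Assumption: discard_max_bytes != 0 stands in for blk_queue_discard(q).
 */
int discard_zeroes_data(unsigned long long discard_max_bytes,
			unsigned long long write_zeroes_max_bytes)
{
	return discard_max_bytes != 0 && write_zeroes_max_bytes != 0;
}

/*
 * Evaluate the condition for a disk name such as "sda".
 * Returns 1 or 0, or -1 if either attribute could not be read.
 */
int disk_discard_zeroes_data(const char *disk)
{
	char path[256];
	unsigned long long discard_max, wz_max;
	int n;

	n = snprintf(path, sizeof(path),
		     "/sys/block/%s/queue/discard_max_bytes", disk);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	if (read_sysfs_u64(path, &discard_max))
		return -1;

	n = snprintf(path, sizeof(path),
		     "/sys/block/%s/queue/write_zeroes_max_bytes", disk);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	if (read_sysfs_u64(path, &wz_max))
		return -1;

	return discard_zeroes_data(discard_max, wz_max);
}
```

A caller would use disk_discard_zeroes_data("sda") and treat -1 as "unknown". With the patch applied, the same answer should be available from a single read of discard_zeroes_data or from the BLKDISCARDZEROES ioctl, so existing utilities would not need a helper like this.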