[v5,2/2] loop: Better discard support for block devices

Series

loop: Better discard for block devices
[v5,0/2] loop: Better discard for block devices
[v5,1/2] loop: Report EOPNOTSUPP properly
[v5,2/2] loop: Better discard support for block devices

Message ID

20190506182736.21064-3-evgreen@chromium.org (mailing list archive)

State

New, archived

Headers

From: Evan Green <evgreen@chromium.org>
To: Jens Axboe <axboe@kernel.dk>,
        Martin K Petersen <martin.petersen@oracle.com>
Cc: Bart Van Assche <bvanassche@acm.org>,
        Gwendal Grignou <gwendal@chromium.org>,
        Alexis Savery <asavery@chromium.org>,
        Ming Lei <ming.lei@redhat.com>,
        Evan Green <evgreen@chromium.org>, linux-block@vger.kernel.org,
        linux-kernel@vger.kernel.org
Subject: [PATCH v5 2/2] loop: Better discard support for block devices
Date: Mon,  6 May 2019 11:27:36 -0700
Message-Id: <20190506182736.21064-3-evgreen@chromium.org>
In-Reply-To: <20190506182736.21064-1-evgreen@chromium.org>
References: <20190506182736.21064-1-evgreen@chromium.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Sender: linux-block-owner@vger.kernel.org
Precedence: bulk

Commit Message

Evan Green May 6, 2019, 6:27 p.m. UTC

If the backing device for a loop device is a block device,
then mirror the "write zeroes" capabilities of the underlying
block device into the loop device. Copy this capability into both
max_write_zeroes_sectors and max_discard_sectors of the loop device.

The reason for this is that REQ_OP_DISCARD on a loop device translates
into blkdev_issue_zeroout(), rather than blkdev_issue_discard(). This
presents a consistent interface for loop devices (that discarded data
is zeroed), regardless of the backing device type of the loop device.
There should be no behavior change for loop devices backed by regular
files.
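
The configuration rule described above can be modelled in plain userspace C. This is only a sketch, not the kernel code: struct model_limits, struct model_backing and model_config_discard() are hypothetical names invented for illustration, and the model starts from zeroed limits, whereas the real request queue keeps whatever values it held before.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the loop device's queue limits. */
struct model_limits {
	unsigned int max_discard_sectors;
	unsigned int max_write_zeroes_sectors;
	unsigned int discard_granularity;
	bool discard_enabled;		/* models QUEUE_FLAG_DISCARD */
};

/* Hypothetical stand-in for what the patch inspects on the backing file. */
struct model_backing {
	bool is_blkdev;			/* S_ISBLK(inode->i_mode) */
	bool has_fallocate;		/* file->f_op->fallocate != NULL */
	bool encrypted;			/* lo->lo_encrypt_key_size != 0 */
	unsigned int fs_blocksize;	/* inode->i_sb->s_blocksize */
	unsigned int blkdev_max_write_zeroes_sectors;
};

struct model_limits model_config_discard(const struct model_backing *b)
{
	struct model_limits lim = {0};

	assert(b != NULL);

	if (b->is_blkdev && !b->encrypted) {
		/* Mirror the backing device's zeroing limit into both fields. */
		lim.max_discard_sectors = b->blkdev_max_write_zeroes_sectors;
		lim.max_write_zeroes_sectors = b->blkdev_max_write_zeroes_sectors;
	} else if (!b->has_fallocate || b->encrypted) {
		/* No way to punch holes, or encryption in use: leave all zero. */
	} else {
		/* Regular file with fallocate: same limits as before the patch. */
		lim.discard_granularity = b->fs_blocksize;
		lim.max_discard_sectors = UINT_MAX >> 9;
		lim.max_write_zeroes_sectors = UINT_MAX >> 9;
	}

	/* The discard flag now follows the write-zeroes limit. */
	lim.discard_enabled = lim.max_write_zeroes_sectors != 0;
	return lim;
}
```

In this model, a backing block device that reports a write-zeroes limit of 0 yields a loop device with discard disabled, while a regular file on a filesystem with fallocate support gets the same limits as before.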

While here, differentiate between REQ_OP_DISCARD and
REQ_OP_WRITE_ZEROES, which are different for block devices,
but which the loop device had been lumping together, since
they're largely the same for files.
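
As a rough illustration of that split, the sketch below maps each operation to the fallocate mode the patch passes to lo_discard(). The enum and the function model_fallocate_mode() are hypothetical userspace names, not kernel identifiers; only the FALLOC_FL_* flags come from the real <linux/falloc.h> header.

```c
#include <linux/falloc.h>

/* Hypothetical stand-ins for the kernel's REQ_OP_* values. */
enum model_req_op {
	MODEL_REQ_OP_DISCARD,
	MODEL_REQ_OP_WRITE_ZEROES,
	MODEL_REQ_OP_OTHER,
};

/* Return the fallocate mode used for an operation, or -1 if none applies. */
int model_fallocate_mode(enum model_req_op op)
{
	switch (op) {
	case MODEL_REQ_OP_DISCARD:
		/* Deallocate the range; later reads of it return zeroes. */
		return FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE;
	case MODEL_REQ_OP_WRITE_ZEROES:
		/* Zero the range; the space may stay allocated. */
		return FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE;
	default:
		return -1;
	}
}
```

Both modes keep the file size unchanged. The difference is whether the backing storage is released or merely zeroed, which is the distinction that matters once the backing file is itself a block device.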

This change fixes blktests block/003, and removes an extraneous
error print in block/013 when testing on a loop device backed
by a block device that does not support discard.

Signed-off-by: Evan Green <evgreen@chromium.org>
---

Changes in v5:
- Don't mirror discard if lo_encrypt_key_size is non-zero (Gwendal)

Changes in v4:
- Mirror blkdev's write_zeroes into loopdev's discard_sectors.

Changes in v3:
- Updated commit description

Changes in v2: None

 drivers/block/loop.c | 57 ++++++++++++++++++++++++++++----------------
 1 file changed, 37 insertions(+), 20 deletions(-)

Comments

Gwendal Grignou May 7, 2019, 4:47 p.m. UTC | #1

Reviewed-by: Gwendal Grignou <gwendal@chromium.org>

Chaitanya Kulkarni May 9, 2019, 4:15 p.m. UTC | #2

Looks good to me.

Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index bbf21ebeccd3..a147210ed009 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -417,19 +417,14 @@  static int lo_read_transfer(struct loop_device *lo, struct request *rq,
 	return ret;
 }
 
-static int lo_discard(struct loop_device *lo, struct request *rq, loff_t pos)
+static int lo_discard(struct loop_device *lo, struct request *rq,
+		int mode, loff_t pos)
 {
-	/*
-	 * We use punch hole to reclaim the free space used by the
-	 * image a.k.a. discard. However we do not support discard if
-	 * encryption is enabled, because it may give an attacker
-	 * useful information.
-	 */
 	struct file *file = lo->lo_backing_file;
-	int mode = FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE;
+	struct request_queue *q = lo->lo_queue;
 	int ret;
 
-	if ((!file->f_op->fallocate) || lo->lo_encrypt_key_size) {
+	if (!blk_queue_discard(q)) {
 		ret = -EOPNOTSUPP;
 		goto out;
 	}
@@ -599,8 +594,13 @@  static int do_req_filebacked(struct loop_device *lo, struct request *rq)
 	case REQ_OP_FLUSH:
 		return lo_req_flush(lo, rq);
 	case REQ_OP_DISCARD:
+		return lo_discard(lo, rq,
+			FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, pos);
+
 	case REQ_OP_WRITE_ZEROES:
-		return lo_discard(lo, rq, pos);
+		return lo_discard(lo, rq,
+			FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE, pos);
+
 	case REQ_OP_WRITE:
 		if (lo->transfer)
 			return lo_write_transfer(lo, rq, pos);
@@ -854,6 +854,21 @@  static void loop_config_discard(struct loop_device *lo)
 	struct file *file = lo->lo_backing_file;
 	struct inode *inode = file->f_mapping->host;
 	struct request_queue *q = lo->lo_queue;
+	struct request_queue *backingq;
+
+	/*
+	 * If the backing device is a block device, mirror its zeroing
+	 * capability. REQ_OP_DISCARD translates to a zero-out even when backed
+	 * by block devices to keep consistent behavior with file-backed loop
+	 * devices.
+	 */
+	if (S_ISBLK(inode->i_mode) && !lo->lo_encrypt_key_size) {
+		backingq = bdev_get_queue(inode->i_bdev);
+		blk_queue_max_discard_sectors(q,
+			backingq->limits.max_write_zeroes_sectors);
+
+		blk_queue_max_write_zeroes_sectors(q,
+			backingq->limits.max_write_zeroes_sectors);
 
 	/*
 	 * We use punch hole to reclaim the free space used by the
@@ -861,22 +876,24 @@  static void loop_config_discard(struct loop_device *lo)
 	 * encryption is enabled, because it may give an attacker
 	 * useful information.
 	 */
-	if ((!file->f_op->fallocate) ||
-	    lo->lo_encrypt_key_size) {
+	} else if ((!file->f_op->fallocate) || lo->lo_encrypt_key_size) {
 		q->limits.discard_granularity = 0;
 		q->limits.discard_alignment = 0;
 		blk_queue_max_discard_sectors(q, 0);
 		blk_queue_max_write_zeroes_sectors(q, 0);
-		blk_queue_flag_clear(QUEUE_FLAG_DISCARD, q);
-		return;
-	}
 
-	q->limits.discard_granularity = inode->i_sb->s_blocksize;
-	q->limits.discard_alignment = 0;
+	} else {
+		q->limits.discard_granularity = inode->i_sb->s_blocksize;
+		q->limits.discard_alignment = 0;
+
+		blk_queue_max_discard_sectors(q, UINT_MAX >> 9);
+		blk_queue_max_write_zeroes_sectors(q, UINT_MAX >> 9);
+	}
 
-	blk_queue_max_discard_sectors(q, UINT_MAX >> 9);
-	blk_queue_max_write_zeroes_sectors(q, UINT_MAX >> 9);
-	blk_queue_flag_set(QUEUE_FLAG_DISCARD, q);
+	if (q->limits.max_write_zeroes_sectors)
+		blk_queue_flag_set(QUEUE_FLAG_DISCARD, q);
+	else
+		blk_queue_flag_clear(QUEUE_FLAG_DISCARD, q);
 }
 
 static void loop_unprepare_queue(struct loop_device *lo)
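
A note on the UINT_MAX >> 9 limit used for file-backed loop devices in the patch: the shift by 9 converts a byte count into 512-byte sectors, so one reading is that this is the largest sector count whose size in bytes still fits in an unsigned int. The sketch below checks that arithmetic in userspace; the helper names and MODEL_SECTOR_SHIFT are hypothetical, and the 512-byte sector size is assumed from the shift.

```c
#include <limits.h>
#include <stdint.h>

#define MODEL_SECTOR_SHIFT 9	/* assumed: 512-byte sectors */

/* Convert a sector count to bytes without overflowing. */
uint64_t model_sectors_to_bytes(unsigned int sectors)
{
	return (uint64_t)sectors << MODEL_SECTOR_SHIFT;
}

/* The limit the patch sets for regular-file backing. */
unsigned int model_file_backed_max_sectors(void)
{
	return UINT_MAX >> MODEL_SECTOR_SHIFT;
}
```

With a 32-bit unsigned int this comes to 8388607 sectors, just under 4 GiB per request. One sector more would need a byte count larger than UINT_MAX.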
