diff mbox

[v4] DAX: enable iostat for read/write

Message ID 20170111001122.10826-1-toshi.kani@hpe.com (mailing list archive)
State New, archived
Headers show

Commit Message

Kani, Toshi Jan. 11, 2017, 12:11 a.m. UTC
DAX IO path does not support iostat, but its metadata IO path does.
Therefore, iostat shows metadata IO statistics only, which has been
confusing to users.

Add iostat support to the DAX read/write path.

Note, iostat still does not support the DAX mmap path as it allows
user applications to access directly.

Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
---
v4: Rebased to 4.10, applied the v3 change to new dax_iomap_rw().
---
 fs/dax.c |   13 +++++++++++++
 1 file changed, 13 insertions(+)

Comments

Dan Williams Jan. 10, 2017, 11:41 p.m. UTC | #1
On Tue, Jan 10, 2017 at 4:11 PM, Toshi Kani <toshi.kani@hpe.com> wrote:
> DAX IO path does not support iostat, but its metadata IO path does.
> Therefore, iostat shows metadata IO statistics only, which has been
> confusing to users.
>
> Add iostat support to the DAX read/write path.
>
> Note, iostat still does not support the DAX mmap path as it allows
> user applications to access directly.
>
> Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> Cc: Dave Chinner <david@fromorbit.com>
> Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
> ---
> v4: Rebased to 4.10, applied the v3 change to new dax_iomap_rw().
> ---
>  fs/dax.c |   13 +++++++++++++
>  1 file changed, 13 insertions(+)
>
> diff --git a/fs/dax.c b/fs/dax.c
> index 5c74f60..4d5f4c0 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -1058,12 +1058,22 @@ dax_iomap_rw(struct kiocb *iocb, struct iov_iter *iter,
>  {
>         struct address_space *mapping = iocb->ki_filp->f_mapping;
>         struct inode *inode = mapping->host;
> +       struct gendisk *disk = inode->i_sb->s_bdev->bd_disk;
>         loff_t pos = iocb->ki_pos, ret = 0, done = 0;
>         unsigned flags = 0;
> +       unsigned long start = 0;
>
>         if (iov_iter_rw(iter) == WRITE)
>                 flags |= IOMAP_WRITE;
>
> +       if (blk_queue_io_stat(disk->queue)) {
> +               int sec = iov_iter_count(iter) >> 9;
> +
> +               start = jiffies;
> +               generic_start_io_acct(iov_iter_rw(iter),
> +                                     (!sec) ? 1 : sec, &disk->part0);
> +       }
> +
>         while (iov_iter_count(iter)) {
>                 ret = iomap_apply(inode, pos, iov_iter_count(iter), flags, ops,
>                                 iter, dax_iomap_actor);
> @@ -1073,6 +1083,9 @@ dax_iomap_rw(struct kiocb *iocb, struct iov_iter *iter,
>                 done += ret;
>         }
>
> +       if (start)
> +               generic_end_io_acct(iov_iter_rw(iter), &disk->part0, start);
> +

I think we can afford to add a separate flag that indicates whether we
called generic_start_io_acct(). Just in case 'start' is '0' after
'jiffies' rolls over.
Kani, Toshi Jan. 10, 2017, 11:54 p.m. UTC | #2
On Tue, 2017-01-10 at 15:41 -0800, Dan Williams wrote:
> On Tue, Jan 10, 2017 at 4:11 PM, Toshi Kani <toshi.kani@hpe.com>

> wrote:

 :
> > 

> > +       if (blk_queue_io_stat(disk->queue)) {

> > +               int sec = iov_iter_count(iter) >> 9;

> > +

> > +               start = jiffies;

> > +               generic_start_io_acct(iov_iter_rw(iter),

> > +                                     (!sec) ? 1 : sec, &disk-

> > >part0);

> > +       }

> > +

> >         while (iov_iter_count(iter)) {

> >                 ret = iomap_apply(inode, pos, iov_iter_count(iter),

> > flags, ops,

> >                                 iter, dax_iomap_actor);

> > @@ -1073,6 +1083,9 @@ dax_iomap_rw(struct kiocb *iocb, struct

> > iov_iter *iter,

> >                 done += ret;

> >         }

> > 

> > +       if (start)

> > +               generic_end_io_acct(iov_iter_rw(iter), &disk-

> > >part0, start);

> > +

> 

> I think we can afford to add a separate flag that indicates whether

> we called generic_start_io_acct(). Just in case 'start' is '0' after

> 'jiffies' rolls over.


Good point. I will add a flag to account such a case.

Thanks,
-Toshi
Joe Perches Jan. 11, 2017, 4:36 a.m. UTC | #3
On Tue, 2017-01-10 at 17:11 -0700, Toshi Kani wrote:
> DAX IO path does not support iostat, but its metadata IO path does.
> Therefore, iostat shows metadata IO statistics only, which has been
> confusing to users.
[]
> diff --git a/fs/dax.c b/fs/dax.c
[]
> @@ -1058,12 +1058,22 @@ dax_iomap_rw(struct kiocb *iocb, struct iov_iter *iter,
[]
> +	if (blk_queue_io_stat(disk->queue)) {
> +		int sec = iov_iter_count(iter) >> 9;
> +
> +		start = jiffies;
> +		generic_start_io_acct(iov_iter_rw(iter),
> +				      (!sec) ? 1 : sec, &disk->part0);
> +	}

There is a signed/unsigned conversion of sec
It may be better to use something like:

		size_t sec  = iov_iter_count(iter) >> 9;
		[...]
		generic_start_io_acct(iov_iter_rw(iter),
				      min_t(unsigned long, 1, sec),
				      &disk->part0);
>
Kani, Toshi Jan. 11, 2017, 4:29 p.m. UTC | #4
On Tue, 2017-01-10 at 20:36 -0800, Joe Perches wrote:
> > 

On Tue, 2017-01-10 at 17:11 -0700, Toshi Kani wrote:
> > DAX IO path does not support iostat, but its metadata IO path does.

> > Therefore, iostat shows metadata IO statistics only, which has been

> > confusing to users.

> 

> []

> > diff --git a/fs/dax.c b/fs/dax.c

> 

> []

> > @@ -1058,12 +1058,22 @@ dax_iomap_rw(struct kiocb *iocb, struct

> > iov_iter *iter,

> 

> []

> > +	if (blk_queue_io_stat(disk->queue)) {

> > +		int sec = iov_iter_count(iter) >> 9;

> > +

> > +		start = jiffies;

> > +		generic_start_io_acct(iov_iter_rw(iter),

> > +				      (!sec) ? 1 : sec, &disk-

> > >part0);

> > +	}

> 

> There is a signed/unsigned conversion of sec

> It may be better to use something like:

> 

> 		size_t sec  = iov_iter_count(iter) >> 9;

> 		[...]

> 		generic_start_io_acct(iov_iter_rw(iter),

> 				      min_t(unsigned long, 1, sec),

> 				      &disk->part0);


Good catch. I will change as you suggested, and use 'sector_t' for
'sec'. 

Thanks,
-Toshi
diff mbox

Patch

diff --git a/fs/dax.c b/fs/dax.c
index 5c74f60..4d5f4c0 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1058,12 +1058,22 @@  dax_iomap_rw(struct kiocb *iocb, struct iov_iter *iter,
 {
 	struct address_space *mapping = iocb->ki_filp->f_mapping;
 	struct inode *inode = mapping->host;
+	struct gendisk *disk = inode->i_sb->s_bdev->bd_disk;
 	loff_t pos = iocb->ki_pos, ret = 0, done = 0;
 	unsigned flags = 0;
+	unsigned long start = 0;
 
 	if (iov_iter_rw(iter) == WRITE)
 		flags |= IOMAP_WRITE;
 
+	if (blk_queue_io_stat(disk->queue)) {
+		int sec = iov_iter_count(iter) >> 9;
+
+		start = jiffies;
+		generic_start_io_acct(iov_iter_rw(iter),
+				      (!sec) ? 1 : sec, &disk->part0);
+	}
+
 	while (iov_iter_count(iter)) {
 		ret = iomap_apply(inode, pos, iov_iter_count(iter), flags, ops,
 				iter, dax_iomap_actor);
@@ -1073,6 +1083,9 @@  dax_iomap_rw(struct kiocb *iocb, struct iov_iter *iter,
 		done += ret;
 	}
 
+	if (start)
+		generic_end_io_acct(iov_iter_rw(iter), &disk->part0, start);
+
 	iocb->ki_pos += done;
 	return done ? done : ret;
 }