
[RFC,v2,2/2] pmem: device flush over VIRTIO

Message ID 20180425112415.12327-3-pagupta@redhat.com (mailing list archive)
State New, archived

Commit Message

Pankaj Gupta April 25, 2018, 11:24 a.m. UTC
This patch adds functionality to perform a
flush from guest to host over VIRTIO
when the 'ND_REGION_VIRTIO' flag is set on
the nd_region. The flag is set by the
'virtio-pmem' driver.

Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
---
 drivers/nvdimm/region_devs.c | 7 +++++++
 1 file changed, 7 insertions(+)

Comments

Dan Williams April 25, 2018, 2:23 p.m. UTC | #1
On Wed, Apr 25, 2018 at 4:24 AM, Pankaj Gupta <pagupta@redhat.com> wrote:
> This patch adds functionality to perform a
> flush from guest to host over VIRTIO
> when the 'ND_REGION_VIRTIO' flag is set on
> the nd_region. The flag is set by the
> 'virtio-pmem' driver.
>
> Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
> ---
>  drivers/nvdimm/region_devs.c | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
> index a612be6..6c6454e 100644
> --- a/drivers/nvdimm/region_devs.c
> +++ b/drivers/nvdimm/region_devs.c
> @@ -20,6 +20,7 @@
>  #include <linux/nd.h>
>  #include "nd-core.h"
>  #include "nd.h"
> +#include <linux/virtio_pmem.h>
>
>  /*
>   * For readq() and writeq() on 32-bit builds, the hi-lo, lo-hi order is
> @@ -1074,6 +1075,12 @@ void nvdimm_flush(struct nd_region *nd_region)
>         struct nd_region_data *ndrd = dev_get_drvdata(&nd_region->dev);
>         int i, idx;
>
> +       /* call PV device flush */
> +       if (test_bit(ND_REGION_VIRTIO, &nd_region->flags)) {
> +               virtio_pmem_flush(&nd_region->dev);
> +               return;
> +       }
> +

I'd rather introduce a ->flush() operation hanging off of 'struct
nd_region' so that this multiplexing can be a static setting.
Pankaj Gupta April 25, 2018, 2:47 p.m. UTC | #2
> 
> On Wed, Apr 25, 2018 at 4:24 AM, Pankaj Gupta <pagupta@redhat.com> wrote:
> > This patch adds functionality to perform a
> > flush from guest to host over VIRTIO
> > when the 'ND_REGION_VIRTIO' flag is set on
> > the nd_region. The flag is set by the
> > 'virtio-pmem' driver.
> >
> > Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
> > ---
> >  drivers/nvdimm/region_devs.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> >
> > diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
> > index a612be6..6c6454e 100644
> > --- a/drivers/nvdimm/region_devs.c
> > +++ b/drivers/nvdimm/region_devs.c
> > @@ -20,6 +20,7 @@
> >  #include <linux/nd.h>
> >  #include "nd-core.h"
> >  #include "nd.h"
> > +#include <linux/virtio_pmem.h>
> >
> >  /*
> >   * For readq() and writeq() on 32-bit builds, the hi-lo, lo-hi order is
> > @@ -1074,6 +1075,12 @@ void nvdimm_flush(struct nd_region *nd_region)
> >         struct nd_region_data *ndrd = dev_get_drvdata(&nd_region->dev);
> >         int i, idx;
> >
> > +       /* call PV device flush */
> > +       if (test_bit(ND_REGION_VIRTIO, &nd_region->flags)) {
> > +               virtio_pmem_flush(&nd_region->dev);
> > +               return;
> > +       }
> > +
> 
> I'd rather introduce a ->flush() operation hanging off of 'struct
> nd_region' so that this multiplexing can be a static setting.

Sure! will make the change.

Thanks,
Pankaj
Pankaj Gupta April 26, 2018, 4:40 p.m. UTC | #3
> 
> On Wed, Apr 25, 2018 at 04:54:14PM +0530, Pankaj Gupta wrote:
> > This patch adds functionality to perform a
> > flush from guest to host over VIRTIO
> > when the 'ND_REGION_VIRTIO' flag is set on
> > the nd_region. The flag is set by the
> > 'virtio-pmem' driver.
> > 
> > Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
> > ---
> >  drivers/nvdimm/region_devs.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> > 
> > diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
> > index a612be6..6c6454e 100644
> > --- a/drivers/nvdimm/region_devs.c
> > +++ b/drivers/nvdimm/region_devs.c
> > @@ -20,6 +20,7 @@
> >  #include <linux/nd.h>
> >  #include "nd-core.h"
> >  #include "nd.h"
> > +#include <linux/virtio_pmem.h>
> >  
> >  /*
> >   * For readq() and writeq() on 32-bit builds, the hi-lo, lo-hi order is
> > @@ -1074,6 +1075,12 @@ void nvdimm_flush(struct nd_region *nd_region)
> >  	struct nd_region_data *ndrd = dev_get_drvdata(&nd_region->dev);
> >  	int i, idx;
> >  
> > +       /* call PV device flush */
> > +	if (test_bit(ND_REGION_VIRTIO, &nd_region->flags)) {
> > +		virtio_pmem_flush(&nd_region->dev);
> > +		return;
> > +	}
> 
> How does libnvdimm know when flush has completed?
> 
> Callers expect the flush to be finished when nvdimm_flush() returns but
> the virtio driver has only queued the request, it hasn't waited for
> completion!

I tried to implement what nvdimm does right now. It just writes to
the flush hint address to make sure data persists.

I just did not want to block guest write requests until the host-side
fsync completes.

Operations (write/fsync) on the same file would block on the guest side, and
the wait time could be worse for operations on different guest files, because
all of these operations ultimately land on the same backing file at the host.

With the current approach we get an asynchronous queuing mechanism, at the
cost of not knowing exactly when the fsync will complete, though it is assured
that it will happen. Also, it is an entire-block flush.

I am open to suggestions here; this is my current thought and implementation.

Thanks,
Pankaj
Dan Williams April 26, 2018, 4:57 p.m. UTC | #4
On Thu, Apr 26, 2018 at 9:40 AM, Pankaj Gupta <pagupta@redhat.com> wrote:
>
>>
>> On Wed, Apr 25, 2018 at 04:54:14PM +0530, Pankaj Gupta wrote:
>> > This patch adds functionality to perform a
>> > flush from guest to host over VIRTIO
>> > when the 'ND_REGION_VIRTIO' flag is set on
>> > the nd_region. The flag is set by the
>> > 'virtio-pmem' driver.
>> >
>> > Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
>> > ---
>> >  drivers/nvdimm/region_devs.c | 7 +++++++
>> >  1 file changed, 7 insertions(+)
>> >
>> > diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
>> > index a612be6..6c6454e 100644
>> > --- a/drivers/nvdimm/region_devs.c
>> > +++ b/drivers/nvdimm/region_devs.c
>> > @@ -20,6 +20,7 @@
>> >  #include <linux/nd.h>
>> >  #include "nd-core.h"
>> >  #include "nd.h"
>> > +#include <linux/virtio_pmem.h>
>> >
>> >  /*
>> >   * For readq() and writeq() on 32-bit builds, the hi-lo, lo-hi order is
>> > @@ -1074,6 +1075,12 @@ void nvdimm_flush(struct nd_region *nd_region)
>> >     struct nd_region_data *ndrd = dev_get_drvdata(&nd_region->dev);
>> >     int i, idx;
>> >
>> > +       /* call PV device flush */
>> > +   if (test_bit(ND_REGION_VIRTIO, &nd_region->flags)) {
>> > +           virtio_pmem_flush(&nd_region->dev);
>> > +           return;
>> > +   }
>>
>> How does libnvdimm know when flush has completed?
>>
>> Callers expect the flush to be finished when nvdimm_flush() returns but
>> the virtio driver has only queued the request, it hasn't waited for
>> completion!
>
> I tried to implement what nvdimm does right now. It just writes to
> the flush hint address to make sure data persists.

nvdimm_flush() is currently expected to be synchronous. Currently it
is sfence(); write to special address; sfence(). By the time the
second sfence returns, the data is flushed. So you would need to make
this virtio flush interface synchronous as well, but it appears
problematic to stop the guest for unbounded amounts of time. Instead,
you need to rework nvdimm_flush() and the pmem driver to make these
flush requests asynchronous and add the plumbing for completion
callbacks via bio_endio().

> I just did not want to block guest write requests until the host-side
> fsync completes.

You must complete the flush before bio_endio(), otherwise you're
violating the expectations of the guest filesystem/block-layer.

>
> be worse for operations on different guest files, because all of these
> operations ultimately land on the same backing file at the host.
>
> With the current approach we get an asynchronous queuing mechanism, at the
> cost of not knowing exactly when the fsync will complete, though it is
> assured that it will happen. Also, it is an entire-block flush.

No, again, that's broken. We need to add the plumbing for
communicating the fsync() completion relative to the WRITE_{FLUSH,FUA}
bio in the guest.
Pankaj Gupta April 26, 2018, 5:13 p.m. UTC | #5
> >
> >>
> >> On Wed, Apr 25, 2018 at 04:54:14PM +0530, Pankaj Gupta wrote:
> >> > This patch adds functionality to perform a
> >> > flush from guest to host over VIRTIO
> >> > when the 'ND_REGION_VIRTIO' flag is set on
> >> > the nd_region. The flag is set by the
> >> > 'virtio-pmem' driver.
> >> >
> >> > Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
> >> > ---
> >> >  drivers/nvdimm/region_devs.c | 7 +++++++
> >> >  1 file changed, 7 insertions(+)
> >> >
> >> > diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
> >> > index a612be6..6c6454e 100644
> >> > --- a/drivers/nvdimm/region_devs.c
> >> > +++ b/drivers/nvdimm/region_devs.c
> >> > @@ -20,6 +20,7 @@
> >> >  #include <linux/nd.h>
> >> >  #include "nd-core.h"
> >> >  #include "nd.h"
> >> > +#include <linux/virtio_pmem.h>
> >> >
> >> >  /*
> >> >   * For readq() and writeq() on 32-bit builds, the hi-lo, lo-hi order is
> >> > @@ -1074,6 +1075,12 @@ void nvdimm_flush(struct nd_region *nd_region)
> >> >     struct nd_region_data *ndrd = dev_get_drvdata(&nd_region->dev);
> >> >     int i, idx;
> >> >
> >> > +       /* call PV device flush */
> >> > +   if (test_bit(ND_REGION_VIRTIO, &nd_region->flags)) {
> >> > +           virtio_pmem_flush(&nd_region->dev);
> >> > +           return;
> >> > +   }
> >>
> >> How does libnvdimm know when flush has completed?
> >>
> >> Callers expect the flush to be finished when nvdimm_flush() returns but
> >> the virtio driver has only queued the request, it hasn't waited for
> >> completion!
> >
> > I tried to implement what nvdimm does right now. It just writes to
> > the flush hint address to make sure data persists.
> 
> nvdimm_flush() is currently expected to be synchronous. Currently it
> is sfence(); write to special address; sfence(). By the time the
> second sfence returns, the data is flushed. So you would need to make
> this virtio flush interface synchronous as well, but it appears
> problematic to stop the guest for unbounded amounts of time. Instead,
> you need to rework nvdimm_flush() and the pmem driver to make these
> flush requests asynchronous and add the plumbing for completion
> callbacks via bio_endio().

o.k. 

> 
> > I just did not want to block guest write requests until the host-side
> > fsync completes.
> 
> You must complete the flush before bio_endio(), otherwise you're
> violating the expectations of the guest filesystem/block-layer.

sure!

> 
> >
> > be worse for operations on different guest files, because all of these
> > operations ultimately land on the same backing file at the host.
> >
> > With the current approach we get an asynchronous queuing mechanism, at
> > the cost of not knowing exactly when the fsync will complete, though it
> > is assured that it will happen. Also, it is an entire-block flush.
> 
> No, again, that's broken. We need to add the plumbing for
> communicating the fsync() completion relative to the WRITE_{FLUSH,FUA}
> bio in the guest.

Sure. Thanks Dan & Stefan for the explanation and review. 

Best regards,
Pankaj

Patch

diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index a612be6..6c6454e 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -20,6 +20,7 @@ 
 #include <linux/nd.h>
 #include "nd-core.h"
 #include "nd.h"
+#include <linux/virtio_pmem.h>
 
 /*
  * For readq() and writeq() on 32-bit builds, the hi-lo, lo-hi order is
@@ -1074,6 +1075,12 @@  void nvdimm_flush(struct nd_region *nd_region)
 	struct nd_region_data *ndrd = dev_get_drvdata(&nd_region->dev);
 	int i, idx;
 
+       /* call PV device flush */
+	if (test_bit(ND_REGION_VIRTIO, &nd_region->flags)) {
+		virtio_pmem_flush(&nd_region->dev);
+		return;
+	}
+
 	/*
 	 * Try to encourage some diversity in flush hint addresses
 	 * across cpus assuming a limited number of flush hints.