diff mbox series

[v5,2/7] dax: introduce dax device flag DAXDEV_RECOVERY

Message ID 20220128213150.1333552-3-jane.chu@oracle.com (mailing list archive)
State Superseded
Headers show
Series DAX poison recovery | expand

Commit Message

Jane Chu Jan. 28, 2022, 9:31 p.m. UTC
Introduce dax device flag DAXDEV_RECOVERY to indicate a device
that is capable of recoverying from media poison.
For MD raid DAX devices, the capability is allowed for partial
device as oppose to the entire device. And the final poison
detection and repair rely on the provisioning base drivers.

Signed-off-by: Jane Chu <jane.chu@oracle.com>
---
 drivers/dax/super.c   | 24 ++++++++++++++++++++++++
 drivers/nvdimm/pmem.c |  1 +
 include/linux/dax.h   | 14 ++++++++++++++
 3 files changed, 39 insertions(+)

Comments

Christoph Hellwig Feb. 2, 2022, 1:23 p.m. UTC | #1
On Fri, Jan 28, 2022 at 02:31:45PM -0700, Jane Chu wrote:
> +int dax_prep_recovery(struct dax_device *dax_dev, void **kaddr)
> +{
> +	if (dax_recovery_capable(dax_dev)) {
> +		set_bit(DAXDEV_RECOVERY, (unsigned long *)kaddr);
> +		return 0;
> +	}
> +	return -EINVAL;

Setting a random bit on a passed in memory address looks a little
dangerous to me.

Also I'd return early for the EINVAL case to make the flow a little
more clear.
Jane Chu Feb. 2, 2022, 9:27 p.m. UTC | #2
On 2/2/2022 5:23 AM, Christoph Hellwig wrote:
> On Fri, Jan 28, 2022 at 02:31:45PM -0700, Jane Chu wrote:
>> +int dax_prep_recovery(struct dax_device *dax_dev, void **kaddr)
>> +{
>> +	if (dax_recovery_capable(dax_dev)) {
>> +		set_bit(DAXDEV_RECOVERY, (unsigned long *)kaddr);
>> +		return 0;
>> +	}
>> +	return -EINVAL;
> 
> Setting a random bit on a passed in memory address looks a little
> dangerous to me.

Yeah, I see.  Would you suggest a way to pass the indication from
dax_iomap_iter to dax_direct_access that the caller intends the
callee to ignore poison in the range because the caller intends
to do recovery_write? We tried adding a flag to dax_direct_access, and 
that wasn't liked if I recall.

> 
> Also I'd return early for the EINVAL case to make the flow a little
> more clear.

Agreed, will do.

thanks!
-jane
Christoph Hellwig Feb. 3, 2022, 1:43 p.m. UTC | #3
On Wed, Feb 02, 2022 at 09:27:42PM +0000, Jane Chu wrote:
> Yeah, I see.  Would you suggest a way to pass the indication from
> dax_iomap_iter to dax_direct_access that the caller intends the
> callee to ignore poison in the range because the caller intends
> to do recovery_write? We tried adding a flag to dax_direct_access, and 
> that wasn't liked if I recall.

To me a flag seems cleaner than this magic, but let's wait for Dan to
chime in.
Dan Williams Feb. 4, 2022, 5:17 a.m. UTC | #4
On Thu, Feb 3, 2022 at 5:43 AM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Wed, Feb 02, 2022 at 09:27:42PM +0000, Jane Chu wrote:
> > Yeah, I see.  Would you suggest a way to pass the indication from
> > dax_iomap_iter to dax_direct_access that the caller intends the
> > callee to ignore poison in the range because the caller intends
> > to do recovery_write? We tried adding a flag to dax_direct_access, and
> > that wasn't liked if I recall.
>
> To me a flag seems cleaner than this magic, but let's wait for Dan to
> chime in.

So back in November I suggested modifying the kaddr, mainly to avoid
touching all the dax_direct_access() call sites [1]. However, now
seeing the code and Chrisoph's comment I think this either wants type
safety (e.g. 'dax_addr_t *'), or just add a new flag. Given both of
those options involve touching all dax_direct_access() call sites and
a @flags operation is more extensible if any other scenarios arrive
lets go ahead and plumb a flag and skip the magic.
Dan Williams Feb. 4, 2022, 5:32 a.m. UTC | #5
On Thu, Feb 3, 2022 at 9:17 PM Dan Williams <dan.j.williams@intel.com> wrote:
>
> On Thu, Feb 3, 2022 at 5:43 AM Christoph Hellwig <hch@infradead.org> wrote:
> >
> > On Wed, Feb 02, 2022 at 09:27:42PM +0000, Jane Chu wrote:
> > > Yeah, I see.  Would you suggest a way to pass the indication from
> > > dax_iomap_iter to dax_direct_access that the caller intends the
> > > callee to ignore poison in the range because the caller intends
> > > to do recovery_write? We tried adding a flag to dax_direct_access, and
> > > that wasn't liked if I recall.
> >
> > To me a flag seems cleaner than this magic, but let's wait for Dan to
> > chime in.
>
> So back in November I suggested modifying the kaddr, mainly to avoid
> touching all the dax_direct_access() call sites [1]. However, now
> seeing the code and Chrisoph's comment I think this either wants type
> safety (e.g. 'dax_addr_t *'), or just add a new flag. Given both of
> those options involve touching all dax_direct_access() call sites and
> a @flags operation is more extensible if any other scenarios arrive
> lets go ahead and plumb a flag and skip the magic.

Just to be clear we are talking about a flow like:

        flags = 0;
        rc = dax_direct_access(..., &kaddr, flags, ...);
        if (unlikely(rc)) {
                flags |= DAX_RECOVERY;
                dax_direct_access(..., &kaddr, flags, ...);
                return dax_recovery_{read,write}(..., kaddr, ...);
        }
        return copy_{mc_to_iter,from_iter_flushcache}(...);
Jane Chu Feb. 6, 2022, 8:29 a.m. UTC | #6
On 2/3/2022 9:32 PM, Dan Williams wrote:
> On Thu, Feb 3, 2022 at 9:17 PM Dan Williams <dan.j.williams@intel.com> wrote:
>>
>> On Thu, Feb 3, 2022 at 5:43 AM Christoph Hellwig <hch@infradead.org> wrote:
>>>
>>> On Wed, Feb 02, 2022 at 09:27:42PM +0000, Jane Chu wrote:
>>>> Yeah, I see.  Would you suggest a way to pass the indication from
>>>> dax_iomap_iter to dax_direct_access that the caller intends the
>>>> callee to ignore poison in the range because the caller intends
>>>> to do recovery_write? We tried adding a flag to dax_direct_access, and
>>>> that wasn't liked if I recall.
>>>
>>> To me a flag seems cleaner than this magic, but let's wait for Dan to
>>> chime in.
>>
>> So back in November I suggested modifying the kaddr, mainly to avoid
>> touching all the dax_direct_access() call sites [1]. However, now
>> seeing the code and Chrisoph's comment I think this either wants type
>> safety (e.g. 'dax_addr_t *'), or just add a new flag. Given both of
>> those options involve touching all dax_direct_access() call sites and
>> a @flags operation is more extensible if any other scenarios arrive
>> lets go ahead and plumb a flag and skip the magic.
> 
> Just to be clear we are talking about a flow like:
> 
>          flags = 0;
>          rc = dax_direct_access(..., &kaddr, flags, ...);
>          if (unlikely(rc)) {
>                  flags |= DAX_RECOVERY;
>                  dax_direct_access(..., &kaddr, flags, ...);
>                  return dax_recovery_{read,write}(..., kaddr, ...);
>          }
>          return copy_{mc_to_iter,from_iter_flushcache}(...);

Okay, will go with a flag.

thanks!
-jane
diff mbox series

Patch

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index e3029389d809..f4f607d9698b 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -109,6 +109,8 @@  enum dax_device_flags {
 	DAXDEV_NOCACHE,
 	/* handle CPU fetch exceptions during reads */
 	DAXDEV_NOMC,
+	/* flag to indicate device capable of poison recovery */
+	DAXDEV_RECOVERY,
 };
 
 /**
@@ -311,6 +313,28 @@  static void dax_destroy_inode(struct inode *inode)
 			"kill_dax() must be called before final iput()\n");
 }
 
+void set_dax_recovery(struct dax_device *dax_dev)
+{
+	set_bit(DAXDEV_RECOVERY, &dax_dev->flags);
+}
+EXPORT_SYMBOL_GPL(set_dax_recovery);
+
+bool dax_recovery_capable(struct dax_device *dax_dev)
+{
+	return test_bit(DAXDEV_RECOVERY, &dax_dev->flags);
+}
+EXPORT_SYMBOL_GPL(dax_recovery_capable);
+
+int dax_prep_recovery(struct dax_device *dax_dev, void **kaddr)
+{
+	if (dax_recovery_capable(dax_dev)) {
+		set_bit(DAXDEV_RECOVERY, (unsigned long *)kaddr);
+		return 0;
+	}
+	return -EINVAL;
+}
+EXPORT_SYMBOL_GPL(dax_prep_recovery);
+
 static const struct super_operations dax_sops = {
 	.statfs = simple_statfs,
 	.alloc_inode = dax_alloc_inode,
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 58d95242a836..e8823813a8df 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -487,6 +487,7 @@  static int pmem_attach_disk(struct device *dev,
 	if (rc)
 		goto out_cleanup_dax;
 	dax_write_cache(dax_dev, nvdimm_has_cache(nd_region));
+	set_dax_recovery(dax_dev);
 	pmem->dax_dev = dax_dev;
 
 	rc = device_add_disk(dev, disk, pmem_attribute_groups);
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 9fc5f99a0ae2..2fc776653c6e 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -40,6 +40,9 @@  void dax_write_cache(struct dax_device *dax_dev, bool wc);
 bool dax_write_cache_enabled(struct dax_device *dax_dev);
 bool dax_synchronous(struct dax_device *dax_dev);
 void set_dax_synchronous(struct dax_device *dax_dev);
+void set_dax_recovery(struct dax_device *dax_dev);
+bool dax_recovery_capable(struct dax_device *dax_dev);
+int dax_prep_recovery(struct dax_device *dax_dev, void **kaddr);
 /*
  * Check if given mapping is supported by the file / underlying device.
  */
@@ -87,6 +90,17 @@  static inline bool daxdev_mapping_supported(struct vm_area_struct *vma,
 {
 	return !(vma->vm_flags & VM_SYNC);
 }
+static inline void set_dax_recovery(struct dax_device *dax_dev)
+{
+}
+static inline bool dax_recovery_capable(struct dax_device *dax_dev)
+{
+	return false;
+}
+static inline int dax_prep_recovery(struct dax_device *dax_dev, void **kaddr)
+{
+	return -EINVAL;
+}
 #endif
 
 void set_dax_nocache(struct dax_device *dax_dev);