| Field | Value |
|---|---|
| Message ID | 151407700353.38751.17609507477918854876.stgit@dwillia2-desk3.amr.corp.intel.com (mailing list archive) |
| State | New, archived |
| Headers | show |
On Sat, Dec 23, 2017 at 04:56:43PM -0800, Dan Williams wrote: > In support of testing truncate colliding with dma add a mechanism that > delays the completion of block I/O requests by a programmable number of > seconds. This allows a truncate operation to be issued while page > references are held for direct-I/O. > > Signed-off-by: Dan Williams <dan.j.williams@intel.com> > @@ -387,4 +389,64 @@ union acpi_object * __wrap_acpi_evaluate_dsm(acpi_handle handle, const guid_t *g > } > EXPORT_SYMBOL(__wrap_acpi_evaluate_dsm); > > +static DEFINE_SPINLOCK(bio_lock); > +static struct bio *biolist; > +int bio_do_queue; > + > +static void run_bio(struct work_struct *work) > +{ > + struct delayed_work *dw = container_of(work, typeof(*dw), work); > + struct bio *bio, *next; > + > + pr_info("%s\n", __func__); Did you mean to leave this print in, or was it part of your debug while developing? I don't see any other prints in the rest of the nvdimm testing code? > + spin_lock(&bio_lock); > + bio_do_queue = 0; > + bio = biolist; > + biolist = NULL; > + spin_unlock(&bio_lock); > + > + while (bio) { > + next = bio->bi_next; > + bio->bi_next = NULL; > + bio_endio(bio); > + bio = next; > + } > + kfree(dw); > +} > + > +void nfit_test_inject_bio_delay(int sec) > +{ > + struct delayed_work *dw = kzalloc(sizeof(*dw), GFP_KERNEL); > + > + spin_lock(&bio_lock); > + if (!bio_do_queue) { > + pr_info("%s: %d seconds\n", __func__, sec); Ditto with this print - did you mean to leave it in? > + INIT_DELAYED_WORK(dw, run_bio); > + bio_do_queue = 1; > + schedule_delayed_work(dw, sec * HZ); > + dw = NULL; Why set dw = NULL here? In the else case we leak dw - was this dw=NULL meant to allow a kfree(dw) after we get out of the if() (and probably after we drop the spinlock)? 
> + } > + spin_unlock(&bio_lock); > +} > +EXPORT_SYMBOL_GPL(nfit_test_inject_bio_delay); > + > diff --git a/tools/testing/nvdimm/test/nfit.c b/tools/testing/nvdimm/test/nfit.c > index 7217b2b953b5..9362b01e9a8f 100644 > --- a/tools/testing/nvdimm/test/nfit.c > +++ b/tools/testing/nvdimm/test/nfit.c > @@ -872,6 +872,39 @@ static const struct attribute_group *nfit_test_dimm_attribute_groups[] = { > NULL, > }; > > +static ssize_t bio_delay_show(struct device_driver *drv, char *buf) > +{ > + return sprintf(buf, "0\n"); > +} It doesn't seem like this _show() routine adds much? We could have it print out the value of 'bio_do_queue' so we can see if we are currently queueing bios in a workqueue element, but that suffers pretty badly from a TOCTOU race. Otherwise we could just omit the _show() altogether and just use DRIVER_ATTR_WO(bio_delay). > + > +static ssize_t bio_delay_store(struct device_driver *drv, const char *buf, > + size_t count) > +{ > + unsigned long delay; > + int rc = kstrtoul(buf, 0, &delay); > + > + if (rc < 0) > + return rc; > + > + nfit_test_inject_bio_delay(delay); > + return count; > +} > +DRIVER_ATTR_RW(bio_delay); DRIVER_ATTR_WO(bio_delay); ?
On Wed, Dec 27, 2017 at 10:08 AM, Ross Zwisler <ross.zwisler@linux.intel.com> wrote: > On Sat, Dec 23, 2017 at 04:56:43PM -0800, Dan Williams wrote: >> In support of testing truncate colliding with dma add a mechanism that >> delays the completion of block I/O requests by a programmable number of >> seconds. This allows a truncate operation to be issued while page >> references are held for direct-I/O. >> >> Signed-off-by: Dan Williams <dan.j.williams@intel.com> > >> @@ -387,4 +389,64 @@ union acpi_object * __wrap_acpi_evaluate_dsm(acpi_handle handle, const guid_t *g >> } >> EXPORT_SYMBOL(__wrap_acpi_evaluate_dsm); >> >> +static DEFINE_SPINLOCK(bio_lock); >> +static struct bio *biolist; >> +int bio_do_queue; >> + >> +static void run_bio(struct work_struct *work) >> +{ >> + struct delayed_work *dw = container_of(work, typeof(*dw), work); >> + struct bio *bio, *next; >> + >> + pr_info("%s\n", __func__); > > Did you mean to leave this print in, or was it part of your debug while > developing? I don't see any other prints in the rest of the nvdimm testing > code? > >> + spin_lock(&bio_lock); >> + bio_do_queue = 0; >> + bio = biolist; >> + biolist = NULL; >> + spin_unlock(&bio_lock); >> + >> + while (bio) { >> + next = bio->bi_next; >> + bio->bi_next = NULL; >> + bio_endio(bio); >> + bio = next; >> + } >> + kfree(dw); >> +} >> + >> +void nfit_test_inject_bio_delay(int sec) >> +{ >> + struct delayed_work *dw = kzalloc(sizeof(*dw), GFP_KERNEL); >> + >> + spin_lock(&bio_lock); >> + if (!bio_do_queue) { >> + pr_info("%s: %d seconds\n", __func__, sec); > > Ditto with this print - did you mean to leave it in? Yes, this one plus the previous one are in there deliberately so that I can see the injection / completion of the delay relative to when the test is performing direct-i/o. > >> + INIT_DELAYED_WORK(dw, run_bio); >> + bio_do_queue = 1; >> + schedule_delayed_work(dw, sec * HZ); >> + dw = NULL; > > Why set dw = NULL here? 
In the else case we leak dw - was this dw=NULL meant > to allow a kfree(dw) after we get out of the if() (and probably after we drop > the spinlock)? Something like that, but now it's just a leftover from an initial version of the code, will delete. >> + } >> + spin_unlock(&bio_lock); >> +} >> +EXPORT_SYMBOL_GPL(nfit_test_inject_bio_delay); >> + > >> diff --git a/tools/testing/nvdimm/test/nfit.c b/tools/testing/nvdimm/test/nfit.c >> index 7217b2b953b5..9362b01e9a8f 100644 >> --- a/tools/testing/nvdimm/test/nfit.c >> +++ b/tools/testing/nvdimm/test/nfit.c >> @@ -872,6 +872,39 @@ static const struct attribute_group *nfit_test_dimm_attribute_groups[] = { >> NULL, >> }; >> >> +static ssize_t bio_delay_show(struct device_driver *drv, char *buf) >> +{ >> + return sprintf(buf, "0\n"); >> +} > > It doesn't seem like this _show() routine adds much? We could have it print > out the value of 'bio_do_queue' so we can see if we are currently queueing > bios in a workqueue element, but that suffers pretty badly from a TOCTOU race. > > Otherwise we could just omit the _show() altogether and just use > DRIVER_ATTR_WO(bio_delay). Sure.
On Sat, Dec 23, 2017 at 04:56:43PM -0800, Dan Williams wrote: > In support of testing truncate colliding with dma add a mechanism that > delays the completion of block I/O requests by a programmable number of > seconds. This allows a truncate operation to be issued while page > references are held for direct-I/O. > > Signed-off-by: Dan Williams <dan.j.williams@intel.com> Why not put this in the generic bio layer code and then write a generic fstest to exercise this truncate vs direct IO completion race condition on all types of storage and filesystems? i.e. if it sits in a nvdimm test suite, it's never going to be run by filesystem developers.... Cheers, Dave.
On Tue, Jan 2, 2018 at 1:44 PM, Dave Chinner <david@fromorbit.com> wrote: > On Sat, Dec 23, 2017 at 04:56:43PM -0800, Dan Williams wrote: >> In support of testing truncate colliding with dma add a mechanism that >> delays the completion of block I/O requests by a programmable number of >> seconds. This allows a truncate operation to be issued while page >> references are held for direct-I/O. >> >> Signed-off-by: Dan Williams <dan.j.williams@intel.com> > > Why not put this in the generic bio layer code and then write a > generic fstest to exercise this truncate vs direct IO completion > race condition on all types of storage and filesystems? > > i.e. if it sits in a nvdimm test suite, it's never going to be run > by filesystem developers.... I do want to get it into xfstests eventually. I picked the nvdimm infrastructure for expediency of getting the fix developed. Also, I consider the collision in the non-dax case a solved problem since the core mm will keep the page out of circulation indefinitely.
On Tue 02-01-18 13:51:49, Dan Williams wrote: > On Tue, Jan 2, 2018 at 1:44 PM, Dave Chinner <david@fromorbit.com> wrote: > > On Sat, Dec 23, 2017 at 04:56:43PM -0800, Dan Williams wrote: > >> In support of testing truncate colliding with dma add a mechanism that > >> delays the completion of block I/O requests by a programmable number of > >> seconds. This allows a truncate operation to be issued while page > >> references are held for direct-I/O. > >> > >> Signed-off-by: Dan Williams <dan.j.williams@intel.com> > > > > Why not put this in the generic bio layer code and then write a > > generic fstest to exercise this truncate vs direct IO completion > > race condition on all types of storage and filesystems? > > > > i.e. if it sits in a nvdimm test suite, it's never going to be run > > by filesystem developers.... > > I do want to get it into xfstests eventually. I picked the nvdimm > infrastructure for expediency of getting the fix developed. Also, I > consider the collision in the non-dax case a solved problem since the > core mm will keep the page out of circulation indefinitely. Yes, but there are different races that could happen even for regular page cache pages. So I also think it would be worthwhile to have this inside the block layer possibly as part of the generic fault-injection framework which is already there for fail_make_request. That already supports various filtering, frequency, and other options that could be useful. Honza
Jan Kara <jack@suse.cz> writes: > On Tue 02-01-18 13:51:49, Dan Williams wrote: >> On Tue, Jan 2, 2018 at 1:44 PM, Dave Chinner <david@fromorbit.com> wrote: >> > On Sat, Dec 23, 2017 at 04:56:43PM -0800, Dan Williams wrote: >> >> In support of testing truncate colliding with dma add a mechanism that >> >> delays the completion of block I/O requests by a programmable number of >> >> seconds. This allows a truncate operation to be issued while page >> >> references are held for direct-I/O. >> >> >> >> Signed-off-by: Dan Williams <dan.j.williams@intel.com> >> > >> > Why not put this in the generic bio layer code and then write a >> > generic fstest to exercise this truncate vs direct IO completion >> > race condition on all types of storage and filesystems? >> > >> > i.e. if it sits in a nvdimm test suite, it's never going to be run >> > by filesystem developers.... >> >> I do want to get it into xfstests eventually. I picked the nvdimm >> infrastructure for expediency of getting the fix developed. Also, I >> consider the collision in the non-dax case a solved problem since the >> core mm will keep the page out of circulation indefinitely. > > Yes, but there are different races that could happen even for regular page > cache pages. So I also think it would be worthwhile to have this inside the > block layer possibly as part of the generic fault-injection framework which > is already there for fail_make_request. That already supports various > filtering, frequency, and other options that could be useful. Or consider extending the dm-delay target (which delays the queuing of bios) to support delaying the completions. I'm not sure I'm a fan of sticking all sorts of debug code into the generic I/O submission path. Cheers, Jeff
diff --git a/tools/testing/nvdimm/Kbuild b/tools/testing/nvdimm/Kbuild index db33b28c5ef3..afc070c961cd 100644 --- a/tools/testing/nvdimm/Kbuild +++ b/tools/testing/nvdimm/Kbuild @@ -16,6 +16,7 @@ ldflags-y += --wrap=insert_resource ldflags-y += --wrap=remove_resource ldflags-y += --wrap=acpi_evaluate_object ldflags-y += --wrap=acpi_evaluate_dsm +ldflags-y += --wrap=bio_endio DRIVERS := ../../../drivers NVDIMM_SRC := $(DRIVERS)/nvdimm diff --git a/tools/testing/nvdimm/test/iomap.c b/tools/testing/nvdimm/test/iomap.c index ff9d3a5825e1..dd90060c0004 100644 --- a/tools/testing/nvdimm/test/iomap.c +++ b/tools/testing/nvdimm/test/iomap.c @@ -10,6 +10,7 @@ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * General Public License for more details. */ +#include <linux/workqueue.h> #include <linux/memremap.h> #include <linux/rculist.h> #include <linux/export.h> @@ -18,6 +19,7 @@ #include <linux/types.h> #include <linux/pfn_t.h> #include <linux/acpi.h> +#include <linux/bio.h> #include <linux/io.h> #include <linux/mm.h> #include "nfit_test.h" @@ -387,4 +389,64 @@ union acpi_object * __wrap_acpi_evaluate_dsm(acpi_handle handle, const guid_t *g } EXPORT_SYMBOL(__wrap_acpi_evaluate_dsm); +static DEFINE_SPINLOCK(bio_lock); +static struct bio *biolist; +int bio_do_queue; + +static void run_bio(struct work_struct *work) +{ + struct delayed_work *dw = container_of(work, typeof(*dw), work); + struct bio *bio, *next; + + pr_info("%s\n", __func__); + spin_lock(&bio_lock); + bio_do_queue = 0; + bio = biolist; + biolist = NULL; + spin_unlock(&bio_lock); + + while (bio) { + next = bio->bi_next; + bio->bi_next = NULL; + bio_endio(bio); + bio = next; + } + kfree(dw); +} + +void nfit_test_inject_bio_delay(int sec) +{ + struct delayed_work *dw = kzalloc(sizeof(*dw), GFP_KERNEL); + + spin_lock(&bio_lock); + if (!bio_do_queue) { + pr_info("%s: %d seconds\n", __func__, sec); + INIT_DELAYED_WORK(dw, run_bio); + bio_do_queue = 1; + schedule_delayed_work(dw, sec * HZ); + dw = 
NULL; + } + spin_unlock(&bio_lock); +} +EXPORT_SYMBOL_GPL(nfit_test_inject_bio_delay); + +void __wrap_bio_endio(struct bio *bio) +{ + int did_q = 0; + + spin_lock(&bio_lock); + if (bio_do_queue) { + bio->bi_next = biolist; + biolist = bio; + did_q = 1; + } + spin_unlock(&bio_lock); + + if (did_q) + return; + + bio_endio(bio); +} +EXPORT_SYMBOL_GPL(__wrap_bio_endio); + MODULE_LICENSE("GPL v2"); diff --git a/tools/testing/nvdimm/test/nfit.c b/tools/testing/nvdimm/test/nfit.c index 7217b2b953b5..9362b01e9a8f 100644 --- a/tools/testing/nvdimm/test/nfit.c +++ b/tools/testing/nvdimm/test/nfit.c @@ -872,6 +872,39 @@ static const struct attribute_group *nfit_test_dimm_attribute_groups[] = { NULL, }; +static ssize_t bio_delay_show(struct device_driver *drv, char *buf) +{ + return sprintf(buf, "0\n"); +} + +static ssize_t bio_delay_store(struct device_driver *drv, const char *buf, + size_t count) +{ + unsigned long delay; + int rc = kstrtoul(buf, 0, &delay); + + if (rc < 0) + return rc; + + nfit_test_inject_bio_delay(delay); + return count; +} +DRIVER_ATTR_RW(bio_delay); + +static struct attribute *nfit_test_driver_attributes[] = { + &driver_attr_bio_delay.attr, + NULL, +}; + +static struct attribute_group nfit_test_driver_attribute_group = { + .attrs = nfit_test_driver_attributes, +}; + +static const struct attribute_group *nfit_test_driver_attribute_groups[] = { + &nfit_test_driver_attribute_group, + NULL, +}; + static int nfit_test0_alloc(struct nfit_test *t) { size_t nfit_size = sizeof(struct acpi_nfit_system_address) * NUM_SPA @@ -2151,6 +2184,7 @@ static struct platform_driver nfit_test_driver = { .remove = nfit_test_remove, .driver = { .name = KBUILD_MODNAME, + .groups = nfit_test_driver_attribute_groups, }, .id_table = nfit_test_id, }; diff --git a/tools/testing/nvdimm/test/nfit_test.h b/tools/testing/nvdimm/test/nfit_test.h index 113b44675a71..744740a76dee 100644 --- a/tools/testing/nvdimm/test/nfit_test.h +++ b/tools/testing/nvdimm/test/nfit_test.h @@ -98,4 +98,5 
@@ void nfit_test_setup(nfit_test_lookup_fn lookup, nfit_test_evaluate_dsm_fn evaluate); void nfit_test_teardown(void); struct nfit_test_resource *get_nfit_res(resource_size_t resource); +void nfit_test_inject_bio_delay(int sec); #endif
In support of testing truncate colliding with dma add a mechanism that delays the completion of block I/O requests by a programmable number of seconds. This allows a truncate operation to be issued while page references are held for direct-I/O. Signed-off-by: Dan Williams <dan.j.williams@intel.com> --- tools/testing/nvdimm/Kbuild | 1 + tools/testing/nvdimm/test/iomap.c | 62 +++++++++++++++++++++++++++++++++ tools/testing/nvdimm/test/nfit.c | 34 ++++++++++++++++++ tools/testing/nvdimm/test/nfit_test.h | 1 + 4 files changed, 98 insertions(+)