diff mbox

[v5] fault-injection: introduce kvmalloc fallback options

Message ID alpine.LRH.2.02.1804251656300.9428@file01.intranet.prod.int.rdu2.redhat.com (mailing list archive)
State Not Applicable, archived
Delegated to: Mike Snitzer
Headers show

Commit Message

Mikulas Patocka April 25, 2018, 8:57 p.m. UTC
On Wed, 25 Apr 2018, Randy Dunlap wrote:

> On 04/25/2018 01:02 PM, Mikulas Patocka wrote:
> > 
> > 
> > From: Mikulas Patocka <mpatocka@redhat.com>
> > Subject: [PATCH v4] fault-injection: introduce kvmalloc fallback options
> > 
> > This patch introduces a fault-injection option "kvmalloc_fallback". This
> > option makes kvmalloc randomly fall back to vmalloc.
> > 
> > Unfortunatelly, some kernel code has bugs - it uses kvmalloc and then
> 
>   Unfortunately,

OK - here I fixed the typos:


From: Mikulas Patocka <mpatocka@redhat.com>
Subject: [PATCH] fault-injection: introduce kvmalloc fallback options

This patch introduces a fault-injection option "kvmalloc_fallback". This
option makes kvmalloc randomly fall back to vmalloc.

Unfortunately, some kernel code has bugs - it uses kvmalloc and then
uses DMA-API on the returned memory or frees it with kfree. Such bugs were
found in the virtio-net driver, dm-integrity or RHEL7 powerpc-specific
code. This options helps to test for these bugs.

The patch introduces a config option FAIL_KVMALLOC_FALLBACK_PROBABILITY.
It can be enabled in distribution debug kernels, so that kvmalloc abuse
can be tested by the users. The default can be overridden with
"kvmalloc_fallback" parameter or in /sys/kernel/debug/kvmalloc_fallback/.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>

---
 Documentation/fault-injection/fault-injection.txt |    7 +++++
 include/linux/fault-inject.h                      |    9 +++---
 kernel/futex.c                                    |    2 -
 lib/Kconfig.debug                                 |   15 +++++++++++
 mm/failslab.c                                     |    2 -
 mm/page_alloc.c                                   |    2 -
 mm/util.c                                         |   30 ++++++++++++++++++++++
 7 files changed, 60 insertions(+), 7 deletions(-)


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

Comments

Randy Dunlap April 25, 2018, 9:11 p.m. UTC | #1
On 04/25/2018 01:57 PM, Mikulas Patocka wrote:
> 
> 
> On Wed, 25 Apr 2018, Randy Dunlap wrote:
> 
>> On 04/25/2018 01:02 PM, Mikulas Patocka wrote:
>>>
>>>
>>> From: Mikulas Patocka <mpatocka@redhat.com>
>>> Subject: [PATCH v4] fault-injection: introduce kvmalloc fallback options
>>>
>>> This patch introduces a fault-injection option "kvmalloc_fallback". This
>>> option makes kvmalloc randomly fall back to vmalloc.
>>>
>>> Unfortunatelly, some kernel code has bugs - it uses kvmalloc and then
>>
>>   Unfortunately,
> 
> OK - here I fixed the typos:
> 
> 
> From: Mikulas Patocka <mpatocka@redhat.com>
> Subject: [PATCH] fault-injection: introduce kvmalloc fallback options
> 
> This patch introduces a fault-injection option "kvmalloc_fallback". This
> option makes kvmalloc randomly fall back to vmalloc.
> 
> Unfortunately, some kernel code has bugs - it uses kvmalloc and then
> uses DMA-API on the returned memory or frees it with kfree. Such bugs were
> found in the virtio-net driver, dm-integrity or RHEL7 powerpc-specific
> code. This options helps to test for these bugs.
> 
> The patch introduces a config option FAIL_KVMALLOC_FALLBACK_PROBABILITY.
> It can be enabled in distribution debug kernels, so that kvmalloc abuse
> can be tested by the users. The default can be overridden with
> "kvmalloc_fallback" parameter or in /sys/kernel/debug/kvmalloc_fallback/.
> 
> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> 
> ---
>  Documentation/fault-injection/fault-injection.txt |    7 +++++
>  include/linux/fault-inject.h                      |    9 +++---
>  kernel/futex.c                                    |    2 -
>  lib/Kconfig.debug                                 |   15 +++++++++++
>  mm/failslab.c                                     |    2 -
>  mm/page_alloc.c                                   |    2 -
>  mm/util.c                                         |   30 ++++++++++++++++++++++
>  7 files changed, 60 insertions(+), 7 deletions(-)

Acked-by: Randy Dunlap <rdunlap@infradead.org> # Documentation and Kconfig only

thanks.
David Rientjes April 25, 2018, 9:18 p.m. UTC | #2
On Wed, 25 Apr 2018, Mikulas Patocka wrote:

> From: Mikulas Patocka <mpatocka@redhat.com>
> Subject: [PATCH] fault-injection: introduce kvmalloc fallback options
> 
> This patch introduces a fault-injection option "kvmalloc_fallback". This
> option makes kvmalloc randomly fall back to vmalloc.
> 
> Unfortunately, some kernel code has bugs - it uses kvmalloc and then
> uses DMA-API on the returned memory or frees it with kfree. Such bugs were
> found in the virtio-net driver, dm-integrity or RHEL7 powerpc-specific
> code. This options helps to test for these bugs.
> 
> The patch introduces a config option FAIL_KVMALLOC_FALLBACK_PROBABILITY.
> It can be enabled in distribution debug kernels, so that kvmalloc abuse
> can be tested by the users. The default can be overridden with
> "kvmalloc_fallback" parameter or in /sys/kernel/debug/kvmalloc_fallback/.
> 

Do we really need the new config option?  This could just be manually 
tunable via fault injection IIUC.

> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> 
> ---
>  Documentation/fault-injection/fault-injection.txt |    7 +++++
>  include/linux/fault-inject.h                      |    9 +++---
>  kernel/futex.c                                    |    2 -
>  lib/Kconfig.debug                                 |   15 +++++++++++
>  mm/failslab.c                                     |    2 -
>  mm/page_alloc.c                                   |    2 -
>  mm/util.c                                         |   30 ++++++++++++++++++++++
>  7 files changed, 60 insertions(+), 7 deletions(-)
> 
> Index: linux-2.6/Documentation/fault-injection/fault-injection.txt
> ===================================================================
> --- linux-2.6.orig/Documentation/fault-injection/fault-injection.txt	2018-04-16 21:08:34.000000000 +0200
> +++ linux-2.6/Documentation/fault-injection/fault-injection.txt	2018-04-25 21:36:36.000000000 +0200
> @@ -15,6 +15,12 @@ o fail_page_alloc
>  
>    injects page allocation failures. (alloc_pages(), get_free_pages(), ...)
>  
> +o kvmalloc_fallback
> +
> +  makes the function kvmalloc randomly fall back to vmalloc. This could be used
> +  to detects bugs such as using DMA-API on the result of kvmalloc or freeing
> +  the result of kvmalloc with free.
> +
>  o fail_futex
>  
>    injects futex deadlock and uaddr fault errors.
> @@ -167,6 +173,7 @@ use the boot option:
>  
>  	failslab=
>  	fail_page_alloc=
> +	kvmalloc_fallback=
>  	fail_make_request=
>  	fail_futex=
>  	mmc_core.fail_request=<interval>,<probability>,<space>,<times>
> Index: linux-2.6/include/linux/fault-inject.h
> ===================================================================
> --- linux-2.6.orig/include/linux/fault-inject.h	2018-04-16 21:08:36.000000000 +0200
> +++ linux-2.6/include/linux/fault-inject.h	2018-04-25 21:38:22.000000000 +0200
> @@ -31,17 +31,18 @@ struct fault_attr {
>  	struct dentry *dname;
>  };
>  
> -#define FAULT_ATTR_INITIALIZER {					\
> +#define FAULT_ATTR_INITIALIZER(p) {					\
> +		.probability = (p),					\
>  		.interval = 1,						\
> -		.times = ATOMIC_INIT(1),				\
> +		.times = ATOMIC_INIT((p) ? -1 : 1),			\
> +		.verbose = (p) ? 0 : 2,					\
>  		.require_end = ULONG_MAX,				\
>  		.stacktrace_depth = 32,					\
>  		.ratelimit_state = RATELIMIT_STATE_INIT_DISABLED,	\
> -		.verbose = 2,						\
>  		.dname = NULL,						\
>  	}
>  
> -#define DECLARE_FAULT_ATTR(name) struct fault_attr name = FAULT_ATTR_INITIALIZER
> +#define DECLARE_FAULT_ATTR(name) struct fault_attr name = FAULT_ATTR_INITIALIZER(0)
>  int setup_fault_attr(struct fault_attr *attr, char *str);
>  bool should_fail(struct fault_attr *attr, ssize_t size);
>  
> Index: linux-2.6/lib/Kconfig.debug
> ===================================================================
> --- linux-2.6.orig/lib/Kconfig.debug	2018-04-25 15:56:16.000000000 +0200
> +++ linux-2.6/lib/Kconfig.debug	2018-04-25 21:39:45.000000000 +0200
> @@ -1527,6 +1527,21 @@ config FAIL_PAGE_ALLOC
>  	help
>  	  Provide fault-injection capability for alloc_pages().
>  
> +config FAIL_KVMALLOC_FALLBACK_PROBABILITY
> +	int "Default kvmalloc fallback probability"
> +	depends on FAULT_INJECTION
> +	range 0 100
> +	default "0"
> +	help
> +	  This option will make kvmalloc randomly fall back to vmalloc.
> +	  Normally, kvmalloc falls back to vmalloc only rarely, if memory
> +	  is fragmented.
> +
> +	  This option helps to detect hard-to-reproduce driver bugs, for
> +	  example using DMA API on the result of kvmalloc.
> +
> +	  The default may be overridden with the kvmalloc_fallback parameter.
> +
>  config FAIL_MAKE_REQUEST
>  	bool "Fault-injection capability for disk IO"
>  	depends on FAULT_INJECTION && BLOCK
> Index: linux-2.6/mm/util.c
> ===================================================================
> --- linux-2.6.orig/mm/util.c	2018-04-25 15:48:39.000000000 +0200
> +++ linux-2.6/mm/util.c	2018-04-25 21:43:31.000000000 +0200
> @@ -14,6 +14,7 @@
>  #include <linux/hugetlb.h>
>  #include <linux/vmalloc.h>
>  #include <linux/userfaultfd_k.h>
> +#include <linux/fault-inject.h>
>  
>  #include <asm/sections.h>
>  #include <linux/uaccess.h>
> @@ -377,6 +378,29 @@ unsigned long vm_mmap(struct file *file,
>  }
>  EXPORT_SYMBOL(vm_mmap);
>  
> +#ifdef CONFIG_FAULT_INJECTION
> +
> +static struct fault_attr kvmalloc_fallback =
> +	FAULT_ATTR_INITIALIZER(CONFIG_FAIL_KVMALLOC_FALLBACK_PROBABILITY);
> +
> +static int __init setup_kvmalloc_fallback(char *str)
> +{
> +	return setup_fault_attr(&kvmalloc_fallback, str);
> +}
> +
> +__setup("kvmalloc_fallback=", setup_kvmalloc_fallback);
> +
> +#ifdef CONFIG_FAULT_INJECTION_DEBUG_FS
> +static int __init kvmalloc_fallback_debugfs_init(void)
> +{
> +	fault_create_debugfs_attr("kvmalloc_fallback", NULL, &kvmalloc_fallback);
> +	return 0;
> +}
> +late_initcall(kvmalloc_fallback_debugfs_init);
> +#endif
> +
> +#endif
> +
>  /**
>   * kvmalloc_node - attempt to allocate physically contiguous memory, but upon
>   * failure, fall back to non-contiguous (vmalloc) allocation.
> @@ -404,6 +428,11 @@ void *kvmalloc_node(size_t size, gfp_t f
>  	 */
>  	WARN_ON_ONCE((flags & GFP_KERNEL) != GFP_KERNEL);
>  
> +#ifdef CONFIG_FAULT_INJECTION
> +	if (should_fail(&kvmalloc_fallback, size))
> +		goto do_vmalloc;
> +#endif
> +
>  	/*
>  	 * We want to attempt a large physically contiguous block first because
>  	 * it is less likely to fragment multiple larger blocks and therefore
> @@ -427,6 +456,7 @@ void *kvmalloc_node(size_t size, gfp_t f
>  	if (ret || size <= PAGE_SIZE)
>  		return ret;
>  
> +do_vmalloc: __maybe_unused
>  	return __vmalloc_node_flags_caller(size, node, flags,
>  			__builtin_return_address(0));
>  }
> Index: linux-2.6/kernel/futex.c
> ===================================================================
> --- linux-2.6.orig/kernel/futex.c	2018-02-14 20:24:42.000000000 +0100
> +++ linux-2.6/kernel/futex.c	2018-04-25 21:11:33.000000000 +0200
> @@ -288,7 +288,7 @@ static struct {
>  
>  	bool ignore_private;
>  } fail_futex = {
> -	.attr = FAULT_ATTR_INITIALIZER,
> +	.attr = FAULT_ATTR_INITIALIZER(0),
>  	.ignore_private = false,
>  };
>  
> Index: linux-2.6/mm/failslab.c
> ===================================================================
> --- linux-2.6.orig/mm/failslab.c	2018-04-16 21:08:36.000000000 +0200
> +++ linux-2.6/mm/failslab.c	2018-04-25 21:11:40.000000000 +0200
> @@ -9,7 +9,7 @@ static struct {
>  	bool ignore_gfp_reclaim;
>  	bool cache_filter;
>  } failslab = {
> -	.attr = FAULT_ATTR_INITIALIZER,
> +	.attr = FAULT_ATTR_INITIALIZER(0),
>  	.ignore_gfp_reclaim = true,
>  	.cache_filter = false,
>  };
> Index: linux-2.6/mm/page_alloc.c
> ===================================================================
> --- linux-2.6.orig/mm/page_alloc.c	2018-04-16 21:08:36.000000000 +0200
> +++ linux-2.6/mm/page_alloc.c	2018-04-25 21:11:47.000000000 +0200
> @@ -3055,7 +3055,7 @@ static struct {
>  	bool ignore_gfp_reclaim;
>  	u32 min_order;
>  } fail_page_alloc = {
> -	.attr = FAULT_ATTR_INITIALIZER,
> +	.attr = FAULT_ATTR_INITIALIZER(0),
>  	.ignore_gfp_reclaim = true,
>  	.ignore_gfp_highmem = true,
>  	.min_order = 1,
> 
> 

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Mikulas Patocka April 25, 2018, 9:22 p.m. UTC | #3
On Wed, 25 Apr 2018, David Rientjes wrote:

> On Wed, 25 Apr 2018, Mikulas Patocka wrote:
> 
> > From: Mikulas Patocka <mpatocka@redhat.com>
> > Subject: [PATCH] fault-injection: introduce kvmalloc fallback options
> > 
> > This patch introduces a fault-injection option "kvmalloc_fallback". This
> > option makes kvmalloc randomly fall back to vmalloc.
> > 
> > Unfortunately, some kernel code has bugs - it uses kvmalloc and then
> > uses DMA-API on the returned memory or frees it with kfree. Such bugs were
> > found in the virtio-net driver, dm-integrity or RHEL7 powerpc-specific
> > code. This options helps to test for these bugs.
> > 
> > The patch introduces a config option FAIL_KVMALLOC_FALLBACK_PROBABILITY.
> > It can be enabled in distribution debug kernels, so that kvmalloc abuse
> > can be tested by the users. The default can be overridden with
> > "kvmalloc_fallback" parameter or in /sys/kernel/debug/kvmalloc_fallback/.
> > 
> 
> Do we really need the new config option?  This could just be manually 
> tunable via fault injection IIUC.

We do, because we want to enable it in RHEL and Fedora debugging kernels, 
so that it will be tested by the users.

The users won't use some extra magic kernel options or debugfs files.

Mikulas


> > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> > 
> > ---
> >  Documentation/fault-injection/fault-injection.txt |    7 +++++
> >  include/linux/fault-inject.h                      |    9 +++---
> >  kernel/futex.c                                    |    2 -
> >  lib/Kconfig.debug                                 |   15 +++++++++++
> >  mm/failslab.c                                     |    2 -
> >  mm/page_alloc.c                                   |    2 -
> >  mm/util.c                                         |   30 ++++++++++++++++++++++
> >  7 files changed, 60 insertions(+), 7 deletions(-)
> > 
> > Index: linux-2.6/Documentation/fault-injection/fault-injection.txt
> > ===================================================================
> > --- linux-2.6.orig/Documentation/fault-injection/fault-injection.txt	2018-04-16 21:08:34.000000000 +0200
> > +++ linux-2.6/Documentation/fault-injection/fault-injection.txt	2018-04-25 21:36:36.000000000 +0200
> > @@ -15,6 +15,12 @@ o fail_page_alloc
> >  
> >    injects page allocation failures. (alloc_pages(), get_free_pages(), ...)
> >  
> > +o kvmalloc_fallback
> > +
> > +  makes the function kvmalloc randomly fall back to vmalloc. This could be used
> > +  to detects bugs such as using DMA-API on the result of kvmalloc or freeing
> > +  the result of kvmalloc with free.
> > +
> >  o fail_futex
> >  
> >    injects futex deadlock and uaddr fault errors.
> > @@ -167,6 +173,7 @@ use the boot option:
> >  
> >  	failslab=
> >  	fail_page_alloc=
> > +	kvmalloc_fallback=
> >  	fail_make_request=
> >  	fail_futex=
> >  	mmc_core.fail_request=<interval>,<probability>,<space>,<times>
> > Index: linux-2.6/include/linux/fault-inject.h
> > ===================================================================
> > --- linux-2.6.orig/include/linux/fault-inject.h	2018-04-16 21:08:36.000000000 +0200
> > +++ linux-2.6/include/linux/fault-inject.h	2018-04-25 21:38:22.000000000 +0200
> > @@ -31,17 +31,18 @@ struct fault_attr {
> >  	struct dentry *dname;
> >  };
> >  
> > -#define FAULT_ATTR_INITIALIZER {					\
> > +#define FAULT_ATTR_INITIALIZER(p) {					\
> > +		.probability = (p),					\
> >  		.interval = 1,						\
> > -		.times = ATOMIC_INIT(1),				\
> > +		.times = ATOMIC_INIT((p) ? -1 : 1),			\
> > +		.verbose = (p) ? 0 : 2,					\
> >  		.require_end = ULONG_MAX,				\
> >  		.stacktrace_depth = 32,					\
> >  		.ratelimit_state = RATELIMIT_STATE_INIT_DISABLED,	\
> > -		.verbose = 2,						\
> >  		.dname = NULL,						\
> >  	}
> >  
> > -#define DECLARE_FAULT_ATTR(name) struct fault_attr name = FAULT_ATTR_INITIALIZER
> > +#define DECLARE_FAULT_ATTR(name) struct fault_attr name = FAULT_ATTR_INITIALIZER(0)
> >  int setup_fault_attr(struct fault_attr *attr, char *str);
> >  bool should_fail(struct fault_attr *attr, ssize_t size);
> >  
> > Index: linux-2.6/lib/Kconfig.debug
> > ===================================================================
> > --- linux-2.6.orig/lib/Kconfig.debug	2018-04-25 15:56:16.000000000 +0200
> > +++ linux-2.6/lib/Kconfig.debug	2018-04-25 21:39:45.000000000 +0200
> > @@ -1527,6 +1527,21 @@ config FAIL_PAGE_ALLOC
> >  	help
> >  	  Provide fault-injection capability for alloc_pages().
> >  
> > +config FAIL_KVMALLOC_FALLBACK_PROBABILITY
> > +	int "Default kvmalloc fallback probability"
> > +	depends on FAULT_INJECTION
> > +	range 0 100
> > +	default "0"
> > +	help
> > +	  This option will make kvmalloc randomly fall back to vmalloc.
> > +	  Normally, kvmalloc falls back to vmalloc only rarely, if memory
> > +	  is fragmented.
> > +
> > +	  This option helps to detect hard-to-reproduce driver bugs, for
> > +	  example using DMA API on the result of kvmalloc.
> > +
> > +	  The default may be overridden with the kvmalloc_fallback parameter.
> > +
> >  config FAIL_MAKE_REQUEST
> >  	bool "Fault-injection capability for disk IO"
> >  	depends on FAULT_INJECTION && BLOCK
> > Index: linux-2.6/mm/util.c
> > ===================================================================
> > --- linux-2.6.orig/mm/util.c	2018-04-25 15:48:39.000000000 +0200
> > +++ linux-2.6/mm/util.c	2018-04-25 21:43:31.000000000 +0200
> > @@ -14,6 +14,7 @@
> >  #include <linux/hugetlb.h>
> >  #include <linux/vmalloc.h>
> >  #include <linux/userfaultfd_k.h>
> > +#include <linux/fault-inject.h>
> >  
> >  #include <asm/sections.h>
> >  #include <linux/uaccess.h>
> > @@ -377,6 +378,29 @@ unsigned long vm_mmap(struct file *file,
> >  }
> >  EXPORT_SYMBOL(vm_mmap);
> >  
> > +#ifdef CONFIG_FAULT_INJECTION
> > +
> > +static struct fault_attr kvmalloc_fallback =
> > +	FAULT_ATTR_INITIALIZER(CONFIG_FAIL_KVMALLOC_FALLBACK_PROBABILITY);
> > +
> > +static int __init setup_kvmalloc_fallback(char *str)
> > +{
> > +	return setup_fault_attr(&kvmalloc_fallback, str);
> > +}
> > +
> > +__setup("kvmalloc_fallback=", setup_kvmalloc_fallback);
> > +
> > +#ifdef CONFIG_FAULT_INJECTION_DEBUG_FS
> > +static int __init kvmalloc_fallback_debugfs_init(void)
> > +{
> > +	fault_create_debugfs_attr("kvmalloc_fallback", NULL, &kvmalloc_fallback);
> > +	return 0;
> > +}
> > +late_initcall(kvmalloc_fallback_debugfs_init);
> > +#endif
> > +
> > +#endif
> > +
> >  /**
> >   * kvmalloc_node - attempt to allocate physically contiguous memory, but upon
> >   * failure, fall back to non-contiguous (vmalloc) allocation.
> > @@ -404,6 +428,11 @@ void *kvmalloc_node(size_t size, gfp_t f
> >  	 */
> >  	WARN_ON_ONCE((flags & GFP_KERNEL) != GFP_KERNEL);
> >  
> > +#ifdef CONFIG_FAULT_INJECTION
> > +	if (should_fail(&kvmalloc_fallback, size))
> > +		goto do_vmalloc;
> > +#endif
> > +
> >  	/*
> >  	 * We want to attempt a large physically contiguous block first because
> >  	 * it is less likely to fragment multiple larger blocks and therefore
> > @@ -427,6 +456,7 @@ void *kvmalloc_node(size_t size, gfp_t f
> >  	if (ret || size <= PAGE_SIZE)
> >  		return ret;
> >  
> > +do_vmalloc: __maybe_unused
> >  	return __vmalloc_node_flags_caller(size, node, flags,
> >  			__builtin_return_address(0));
> >  }
> > Index: linux-2.6/kernel/futex.c
> > ===================================================================
> > --- linux-2.6.orig/kernel/futex.c	2018-02-14 20:24:42.000000000 +0100
> > +++ linux-2.6/kernel/futex.c	2018-04-25 21:11:33.000000000 +0200
> > @@ -288,7 +288,7 @@ static struct {
> >  
> >  	bool ignore_private;
> >  } fail_futex = {
> > -	.attr = FAULT_ATTR_INITIALIZER,
> > +	.attr = FAULT_ATTR_INITIALIZER(0),
> >  	.ignore_private = false,
> >  };
> >  
> > Index: linux-2.6/mm/failslab.c
> > ===================================================================
> > --- linux-2.6.orig/mm/failslab.c	2018-04-16 21:08:36.000000000 +0200
> > +++ linux-2.6/mm/failslab.c	2018-04-25 21:11:40.000000000 +0200
> > @@ -9,7 +9,7 @@ static struct {
> >  	bool ignore_gfp_reclaim;
> >  	bool cache_filter;
> >  } failslab = {
> > -	.attr = FAULT_ATTR_INITIALIZER,
> > +	.attr = FAULT_ATTR_INITIALIZER(0),
> >  	.ignore_gfp_reclaim = true,
> >  	.cache_filter = false,
> >  };
> > Index: linux-2.6/mm/page_alloc.c
> > ===================================================================
> > --- linux-2.6.orig/mm/page_alloc.c	2018-04-16 21:08:36.000000000 +0200
> > +++ linux-2.6/mm/page_alloc.c	2018-04-25 21:11:47.000000000 +0200
> > @@ -3055,7 +3055,7 @@ static struct {
> >  	bool ignore_gfp_reclaim;
> >  	u32 min_order;
> >  } fail_page_alloc = {
> > -	.attr = FAULT_ATTR_INITIALIZER,
> > +	.attr = FAULT_ATTR_INITIALIZER(0),
> >  	.ignore_gfp_reclaim = true,
> >  	.ignore_gfp_highmem = true,
> >  	.min_order = 1,
> > 
> > 
> 

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
James Bottomley April 25, 2018, 10:17 p.m. UTC | #4
On Wed, 2018-04-25 at 17:22 -0400, Mikulas Patocka wrote:
> 
> On Wed, 25 Apr 2018, David Rientjes wrote:
> 
> > On Wed, 25 Apr 2018, Mikulas Patocka wrote:
> > 
> > > From: Mikulas Patocka <mpatocka@redhat.com>
> > > Subject: [PATCH] fault-injection: introduce kvmalloc fallback
> > > options
> > > 
> > > This patch introduces a fault-injection option
> > > "kvmalloc_fallback". This option makes kvmalloc randomly fall
> > > back to vmalloc.
> > > 
> > > Unfortunately, some kernel code has bugs - it uses kvmalloc and
> > > then uses DMA-API on the returned memory or frees it with kfree.
> > > Such bugs were found in the virtio-net driver, dm-integrity or
> > > RHEL7 powerpc-specific code. This options helps to test for these
> > > bugs.
> > > 
> > > The patch introduces a config option
> > > FAIL_KVMALLOC_FALLBACK_PROBABILITY.
> > > It can be enabled in distribution debug kernels, so that kvmalloc
> > > abuse can be tested by the users. The default can be overridden
> > > with "kvmalloc_fallback" parameter or in
> > > /sys/kernel/debug/kvmalloc_fallback/.
> > > 
> > 
> > Do we really need the new config option?  This could just be
> > manually  tunable via fault injection IIUC.
> 
> We do, because we want to enable it in RHEL and Fedora debugging
> kernels, so that it will be tested by the users.
> 
> The users won't use some extra magic kernel options or debugfs files.

If it can be enabled via a tunable, then the distro can turn it on
without the user having to do anything.  If you want to present the
user with a different boot option, you can (just have the tunable set
on the command line), but being tunable driven means that you don't
have to choose that option, you could automatically enable it under a
range of circumstances.  I think most sane distributions would want
that flexibility.

Kconfig proliferation, conversely, is a bit of a nightmare from both
the user and the tester's point of view, so we're trying to avoid it
unless absolutely necessary.

James

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Mikulas Patocka April 25, 2018, 10:42 p.m. UTC | #5
On Wed, 25 Apr 2018, James Bottomley wrote:

> On Wed, 2018-04-25 at 17:22 -0400, Mikulas Patocka wrote:
> > 
> > On Wed, 25 Apr 2018, David Rientjes wrote:
> > > 
> > > Do we really need the new config option?  This could just be
> > > manually  tunable via fault injection IIUC.
> > 
> > We do, because we want to enable it in RHEL and Fedora debugging
> > kernels, so that it will be tested by the users.
> > 
> > The users won't use some extra magic kernel options or debugfs files.
> 
> If it can be enabled via a tunable, then the distro can turn it on
> without the user having to do anything.

You need to enable it on boot. Enabling it when the kernel starts to 
execute userspace code is already too late (because you would miss 
kvmalloc calls in the kernel boot path).

These are files in the kernel-debug rpm package. Where would you put the 
extra kernel parameter to enable this feature? None of these files contain 
kernel parameters.

kernel-debug              /boot/.vmlinuz-3.10.0-693.21.1.el7.x86_64.debug.hmac
kernel-debug              /boot/System.map-3.10.0-693.21.1.el7.x86_64.debug
kernel-debug              /boot/config-3.10.0-693.21.1.el7.x86_64.debug
kernel-debug              /boot/initramfs-3.10.0-693.21.1.el7.x86_64.debug.img
kernel-debug              /boot/symvers-3.10.0-693.21.1.el7.x86_64.debug.gz
kernel-debug              /boot/vmlinuz-3.10.0-693.21.1.el7.x86_64.debug
kernel-debug              /etc/ld.so.conf.d/kernel-3.10.0-693.21.1.el7.x86_64.debug.conf
kernel-debug              /lib/modules/3.10.0-693.21.1.el7.x86_64.debug

> If you want to present the user with a different boot option, you can 
> (just have the tunable set on the command line), but being tunable 
> driven means that you don't have to choose that option, you could 
> automatically enable it under a range of circumstances.  I think most 
> sane distributions would want that flexibility.
> 
> Kconfig proliferation, conversely, is a bit of a nightmare from both
> the user and the tester's point of view, so we're trying to avoid it
> unless absolutely necessary.
> 
> James

I already offered that we don't need to introduce a new kernel option and 
we can bind this feature to any other kernel option, that is enabled in 
the debug kernel, for example CONFIG_DEBUG_SG. Michal said no and he said 
that he wants a new kernel option instead.

Mikulas
--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
David Rientjes April 25, 2018, 10:49 p.m. UTC | #6
On Wed, 25 Apr 2018, Mikulas Patocka wrote:

> You need to enable it on boot. Enabling it when the kernel starts to 
> execute userspace code is already too late (because you would miss 
> kvmalloc calls in the kernel boot path).
> 

Is your motivation that since kvmalloc() never falls back to vmalloc() on 
boot because fragmentation is not be an issue at boot that we should catch 
bugs where it would matter if it had fallen back?  If we are worrying 
about falling back to vmalloc before even initscripts have run I think we 
have bigger problems.

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Mikulas Patocka April 25, 2018, 10:56 p.m. UTC | #7
On Wed, 25 Apr 2018, David Rientjes wrote:

> On Wed, 25 Apr 2018, Mikulas Patocka wrote:
> 
> > You need to enable it on boot. Enabling it when the kernel starts to 
> > execute userspace code is already too late (because you would miss 
> > kvmalloc calls in the kernel boot path).
> 
> Is your motivation that since kvmalloc() never falls back to vmalloc() on 
> boot because fragmentation is not be an issue at boot that we should catch 
> bugs where it would matter if it had fallen back?  If we are worrying 
> about falling back to vmalloc before even initscripts have run I think we 
> have bigger problems.

The same driver can be compiled directly into the kernel or be loaded as a 
module. If the user (or the person preparing distro kernel) compiles the 
driver directly into the kernel, kvmalloc should be tested on that driver, 
because a different user or distribution can compile that driver as a 
module.

Mikulas

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Mikulas Patocka April 25, 2018, 11 p.m. UTC | #8
On Wed, 25 Apr 2018, James Bottomley wrote:

> > > Do we really need the new config option?  This could just be
> > > manually  tunable via fault injection IIUC.
> > 
> > We do, because we want to enable it in RHEL and Fedora debugging
> > kernels, so that it will be tested by the users.
> > 
> > The users won't use some extra magic kernel options or debugfs files.
> 
> If it can be enabled via a tunable, then the distro can turn it on
> without the user having to do anything.  If you want to present the
> user with a different boot option, you can (just have the tunable set
> on the command line), but being tunable driven means that you don't
> have to choose that option, you could automatically enable it under a
> range of circumstances.  I think most sane distributions would want
> that flexibility.
> 
> Kconfig proliferation, conversely, is a bit of a nightmare from both
> the user and the tester's point of view, so we're trying to avoid it
> unless absolutely necessary.
> 
> James

BTW. even developers who compile their own kernel should have this enabled 
by a CONFIG option - because if the developer sees the option when 
browsing through menuconfig, he may enable it. If he doesn't see the 
option, he won't even know that such an option exists.

Mikulas
--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
James Bottomley April 25, 2018, 11:08 p.m. UTC | #9
On Wed, 2018-04-25 at 19:00 -0400, Mikulas Patocka wrote:
> 
> On Wed, 25 Apr 2018, James Bottomley wrote:
> 
> > > > Do we really need the new config option?  This could just be
> > > > manually  tunable via fault injection IIUC.
> > > 
> > > We do, because we want to enable it in RHEL and Fedora debugging
> > > kernels, so that it will be tested by the users.
> > > 
> > > The users won't use some extra magic kernel options or debugfs
> files.
> > 
> > If it can be enabled via a tunable, then the distro can turn it on
> > without the user having to do anything.  If you want to present the
> > user with a different boot option, you can (just have the tunable
> set
> > on the command line), but being tunable driven means that you don't
> > have to choose that option, you could automatically enable it under
> a
> > range of circumstances.  I think most sane distributions would want
> > that flexibility.
> > 
> > Kconfig proliferation, conversely, is a bit of a nightmare from
> both
> > the user and the tester's point of view, so we're trying to avoid
> it
> > unless absolutely necessary.
> > 
> > James
> 
> BTW. even developers who compile their own kernel should have this
> enabled by a CONFIG option - because if the developer sees the option
> when browsing through menuconfig, he may enable it. If he doesn't see
> the option, he won't even know that such an option exists.

I may be an atypical developer but I'd rather have a root canal than
browse through menuconfig options.  The way to get people to learn
about new debugging options is to blog about it (or write an lwn.net
article) which google will find the next time I ask it how I debug XXX.
 Google (probably as a service to humanity) rarely turns up Kconfig
options in response to a query.

James

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Michal Hocko April 26, 2018, 12:58 p.m. UTC | #10
On Wed 25-04-18 18:42:57, Mikulas Patocka wrote:
> 
> 
> On Wed, 25 Apr 2018, James Bottomley wrote:
[...]
> > Kconfig proliferation, conversely, is a bit of a nightmare from both
> > the user and the tester's point of view, so we're trying to avoid it
> > unless absolutely necessary.
> > 
> > James
> 
> I already offered that we don't need to introduce a new kernel option and 
> we can bind this feature to any other kernel option, that is enabled in 
> the debug kernel, for example CONFIG_DEBUG_SG. Michal said no and he said 
> that he wants a new kernel option instead.

Just for the record. I didn't say I _want_ a config option. Do not
misinterpret my words. I've said that a config option would be
acceptable if there is no way to deliver the functionality via kernel
package automatically. You haven't provided any argument that would
explain why the kernel package cannot add a boot option. Maybe there are
some but I do not see them right now.
Mikulas Patocka April 26, 2018, 2:28 p.m. UTC | #11
On Thu, 26 Apr 2018, Michal Hocko wrote:

> On Wed 25-04-18 18:42:57, Mikulas Patocka wrote:
> > 
> > 
> > On Wed, 25 Apr 2018, James Bottomley wrote:
> [...]
> > > Kconfig proliferation, conversely, is a bit of a nightmare from both
> > > the user and the tester's point of view, so we're trying to avoid it
> > > unless absolutely necessary.
> > > 
> > > James
> > 
> > I already offered that we don't need to introduce a new kernel option and 
> > we can bind this feature to any other kernel option, that is enabled in 
> > the debug kernel, for example CONFIG_DEBUG_SG. Michal said no and he said 
> > that he wants a new kernel option instead.
> 
> Just for the record. I didn't say I _want_ a config option. Do not
> misinterpret my words. I've said that a config option would be
> acceptable if there is no way to deliver the functionality via kernel
> package automatically. You haven't provided any argument that would
> explain why the kernel package cannot add a boot option. Maybe there are
> some but I do not see them right now.

AFAIK Grub doesn't load per-kernel options from a per-kernel file. Even if 
we hacked grub scripts to add this option, other distributions won't.

Mikulas

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
James Bottomley April 26, 2018, 2:45 p.m. UTC | #12
On Thu, 2018-04-26 at 10:28 -0400, Mikulas Patocka wrote:
> 
> On Thu, 26 Apr 2018, Michal Hocko wrote:
> 
> > On Wed 25-04-18 18:42:57, Mikulas Patocka wrote:
> > > 
> > > 
> > > On Wed, 25 Apr 2018, James Bottomley wrote:
> > [...]
> > > > Kconfig proliferation, conversely, is a bit of a nightmare from
> both
> > > > the user and the tester's point of view, so we're trying to
> avoid it
> > > > unless absolutely necessary.
> > > > 
> > > > James
> > > 
> > > I already offered that we don't need to introduce a new kernel
> option and 
> > > we can bind this feature to any other kernel option, that is
> enabled in 
> > > the debug kernel, for example CONFIG_DEBUG_SG. Michal said no and
> he said 
> > > that he wants a new kernel option instead.
> > 
> > Just for the record. I didn't say I _want_ a config option. Do not
> > misinterpret my words. I've said that a config option would be
> > acceptable if there is no way to deliver the functionality via
> kernel
> > package automatically. You haven't provided any argument that would
> > explain why the kernel package cannot add a boot option. Maybe
> there are
> > some but I do not see them right now.
> 
> AFAIK Grub doesn't load per-kernel options from a per-kernel file.
> Even if we hacked grub scripts to add this option, other
> distributions won't.

Perhaps find out beforehand instead of insisting on an approach without
knowing.  On openSUSE the grub config is built from the files in
/etc/grub.d/ so any package can add a kernel option (and various
conditions around activating it) simply by adding a new file.  The
config files are quite sophisticated, so you can add what looks to be a
new kernel, but is really an existing kernel with different options
this way.

James

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Mikulas Patocka April 26, 2018, 2:55 p.m. UTC | #13
On Wed, 25 Apr 2018, James Bottomley wrote:

> > BTW. even developers who compile their own kernel should have this
> > enabled by a CONFIG option - because if the developer sees the option
> > when browsing through menuconfig, he may enable it. If he doesn't see
> > the option, he won't even know that such an option exists.
> 
> I may be an atypical developer but I'd rather have a root canal than
> browse through menuconfig options.  The way to get people to learn
> about new debugging options is to blog about it (or write an lwn.net
> article) which google will find the next time I ask it how I debug XXX.
>  Google (probably as a service to humanity) rarely turns up Kconfig
> options in response to a query.

>From my point of view, this feature should be as little disruptive to the 
developer as possible. It should work automatically behind the scenes 
without the developer or the tester even knowing that it is working. From 
this point of view, binding it to CONFIG_DEBUG_SG (or any other commonly 
used debugging option) would be ideal, because driver developers already 
enable CONFIG_DEBUG_SG, so they'll get this kvmalloc test for free.

>From your point of view, you should introduce a sysfs file and a kernel 
parameter that no one knows about - and then start blogging about it - to 
let people know. Why would you bother people with this knowledge? They'll 
forget about it anyway and won't turn it on.

Mikulas
--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Mikulas Patocka April 26, 2018, 3:05 p.m. UTC | #14
On Thu, 26 Apr 2018, James Bottomley wrote:

> On Thu, 2018-04-26 at 10:28 -0400, Mikulas Patocka wrote:
> > 
> > On Thu, 26 Apr 2018, Michal Hocko wrote:
> > 
> > > On Wed 25-04-18 18:42:57, Mikulas Patocka wrote:
> > > > 
> > > > 
> > > > On Wed, 25 Apr 2018, James Bottomley wrote:
> > > [...]
> > > > > Kconfig proliferation, conversely, is a bit of a nightmare from
> > both
> > > > > the user and the tester's point of view, so we're trying to
> > avoid it
> > > > > unless absolutely necessary.
> > > > > 
> > > > > James
> > > > 
> > > > I already offered that we don't need to introduce a new kernel
> > option and 
> > > > we can bind this feature to any other kernel option, that is
> > enabled in 
> > > > the debug kernel, for example CONFIG_DEBUG_SG. Michal said no and
> > he said 
> > > > that he wants a new kernel option instead.
> > > 
> > > Just for the record. I didn't say I _want_ a config option. Do not
> > > misinterpret my words. I've said that a config option would be
> > > acceptable if there is no way to deliver the functionality via
> > kernel
> > > package automatically. You haven't provided any argument that would
> > > explain why the kernel package cannot add a boot option. Maybe
> > there are
> > > some but I do not see them right now.
> > 
> > AFAIK Grub doesn't load per-kernel options from a per-kernel file.
> > Even if we hacked grub scripts to add this option, other
> > distributions won't.
> 
> Perhaps find out beforehand instead of insisting on an approach without
> knowing.  On openSUSE the grub config is built from the files in
> /etc/grub.d/ so any package can add a kernel option (and various
> conditions around activating it) simply by adding a new file.

And then, different versions of the debug kernel will clash when 
attempting to create the same file.

And what about other distributions? What about people who the RHEL kernel 
from source with "make"?

The problem with this approach that you are trying to bother more and more 
people with this little silly feature.

> The config files are quite sophisticated, so you can add what looks to 
> be a new kernel, but is really an existing kernel with different options 
> this way.
> 
> James

Mikulas
--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Mikulas Patocka April 26, 2018, 3:22 p.m. UTC | #15
On Thu, 26 Apr 2018, Mikulas Patocka wrote:

> 
> 
> On Wed, 25 Apr 2018, James Bottomley wrote:
> 
> > > BTW. even developers who compile their own kernel should have this
> > > enabled by a CONFIG option - because if the developer sees the option
> > > when browsing through menuconfig, he may enable it. If he doesn't see
> > > the option, he won't even know that such an option exists.
> > 
> > I may be an atypical developer but I'd rather have a root canal than
> > browse through menuconfig options.  The way to get people to learn
> > about new debugging options is to blog about it (or write an lwn.net
> > article) which google will find the next time I ask it how I debug XXX.
> >  Google (probably as a service to humanity) rarely turns up Kconfig
> > options in response to a query.
> 
> From my point of view, this feature should be as little disruptive to the 
> developer as possible. It should work automatically behind the scenes 
> without the developer or the tester even knowing that it is working. From 
> this point of view, binding it to CONFIG_DEBUG_SG (or any other commonly 
> used debugging option) would be ideal, because driver developers already 
> enable CONFIG_DEBUG_SG, so they'll get this kvmalloc test for free.
> 
> From your point of view, you should introduce a sysfs file and a kernel 
> parameter that no one knows about - and then start blogging about it - to 
> let people know. Why would you bother people with this knowledge? They'll 
> forget about it anyway and won't turn it on.

BTW. try to think about - how many total lines of code should this feature 
consume in the whole Linux ecosystem?

I made a 10-line patch. I got pushback.

I remade it to a 53-line patch. And you try to suggest that 53 lines is 
not enough and we must also change kernel packaging scripts in distro 
kernels, because the kernel just cannot enable this feature on its own.

If we hack kernel packaging scripts in most distros, how many lines of 
code would that be? What's your target?

Mikulas
--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
James Bottomley April 26, 2018, 3:24 p.m. UTC | #16
On Thu, 2018-04-26 at 11:05 -0400, Mikulas Patocka wrote:
> 
> On Thu, 26 Apr 2018, James Bottomley wrote:
[...]
> > Perhaps find out beforehand instead of insisting on an approach
> without
> > knowing.  On openSUSE the grub config is built from the files in
> > /etc/grub.d/ so any package can add a kernel option (and various
> > conditions around activating it) simply by adding a new file.
> 
> And then, different versions of the debug kernel will clash when 
> attempting to create the same file.

Don't be silly ... there are many ways of coping with that in rpm/dpkg.
 However, I take it the fact you're now trying to get me to explain
them means you take the point that a kernel dynamic option can be
activated in a variety of easy ways in a distribution including through
the boot menu; so if you want this to appear in the boot menu you don't
need a Kconfig option to achieve it.

> And what about other distributions? What about people who the RHEL
> kernel from source with "make"?

Well, if you build your own kernel and we have a dynamic option, it
will "just work" without you having to muck about trying to re-Kconfig
it, so I'd see that as a win.

> The problem with this approach that you are trying to bother more and
> more people with this little silly feature.

So you're shifting your argument from "I have to do it as a Kconfig
option because the distros require it" to "distributions will build
separate kernel packages for this, but won't do enabling in a non
kernel package"?  To be honest, I think the argument is nuts but I
don't really care.  From my point of view it's usually me explaining to
people how to debug stuff and "you have to build your own kernel with
this Kconfig option" compared to "add this to the kernel command line
and reboot" is much more effort for the debugger.

James

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Mikulas Patocka April 26, 2018, 3:44 p.m. UTC | #17
On Thu, 26 Apr 2018, James Bottomley wrote:

> On Thu, 2018-04-26 at 11:05 -0400, Mikulas Patocka wrote:
> > 
> > On Thu, 26 Apr 2018, James Bottomley wrote:
> [...]
> > > Perhaps find out beforehand instead of insisting on an approach
> > without
> > > knowing.  On openSUSE the grub config is built from the files in
> > > /etc/grub.d/ so any package can add a kernel option (and various
> > > conditions around activating it) simply by adding a new file.
> > 
> > And then, different versions of the debug kernel will clash when 
> > attempting to create the same file.
> 
> Don't be silly ... there are many ways of coping with that in rpm/dpkg.

I know you can deal with it - but how many lines of code will that 
consume? Multiplied by the total number of rpm-based distros.

Mikulas
--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Mikulas Patocka April 26, 2018, 3:55 p.m. UTC | #18
On Thu, 26 Apr 2018, James Bottomley wrote:

> So you're shifting your argument from "I have to do it as a Kconfig
> option because the distros require it" to "distributions will build
> separate kernel packages for this, but won't do enabling in a non
> kernel package"?  To be honest, I think the argument is nuts but I
> don't really care.  From my point of view it's usually me explaining to
> people how to debug stuff and "you have to build your own kernel with
> this Kconfig option" compared to "add this to the kernel command line
> and reboot" is much more effort for the debugger.
> 
> James

If you have to explain to the user that he needs to turn it on, it is 
already wrong.

In order to find the kvmalloc abuses, it should be tested by as many users 
as possible. And it could be tested by as many users as possible, if it 
can be enabled in a VISIBLE place (i.e. menuconfig) - or (in my opinion 
even better) it should be bound to an CONFIG_ option that is already 
enabled for debugging kernel - then you won't have to explain anything to 
the user at all.

Hardly anyone - except for people who read this thread - will know about 
the new commandline parameters or debugfs files.

I'm not arguing that the commandline parameter or debugfs files are wrong. 
They are OK to overridde the default settings for advanced users. But they 
are useless for common users because common users won't know about them.

Mikulas

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Michael S. Tsirkin April 26, 2018, 3:59 p.m. UTC | #19
On Thu, Apr 26, 2018 at 11:44:21AM -0400, Mikulas Patocka wrote:
> 
> 
> On Thu, 26 Apr 2018, James Bottomley wrote:
> 
> > On Thu, 2018-04-26 at 11:05 -0400, Mikulas Patocka wrote:
> > > 
> > > On Thu, 26 Apr 2018, James Bottomley wrote:
> > [...]
> > > > Perhaps find out beforehand instead of insisting on an approach
> > > without
> > > > knowing.  On openSUSE the grub config is built from the files in
> > > > /etc/grub.d/ so any package can add a kernel option (and various
> > > > conditions around activating it) simply by adding a new file.
> > > 
> > > And then, different versions of the debug kernel will clash when 
> > > attempting to create the same file.
> > 
> > Don't be silly ... there are many ways of coping with that in rpm/dpkg.
> 
> I know you can deal with it - but how many lines of code will that 
> consume? Multiplied by the total number of rpm-based distros.
> 
> Mikulas

I don't think debug kernels should inject faults by default.

IIUC debug kernels mainly exist so people who experience e.g. memory
corruption can try and debug the failure. In this case, CONFIG_DEBUG_SG
will *already* catch a failure early. Nothing special needs to be done.

There is a much smaller class of people like QA who go actively looking
for trouble. That's the kind of thing fault injection is good for, and
IMO you do not want your QA team to test a separate kernel - otherwise
you are never quite sure whatever was tested will work in the field.
So a config option won't help them either.

How do you make sure QA tests a specific corner case? Add it to
the test plan :)

I don't speak for Red Hat, etc.
Mikulas Patocka April 26, 2018, 4:07 p.m. UTC | #20
On Thu, 26 Apr 2018, Michael S. Tsirkin wrote:

> On Thu, Apr 26, 2018 at 11:44:21AM -0400, Mikulas Patocka wrote:
> > 
> > 
> > On Thu, 26 Apr 2018, James Bottomley wrote:
> > 
> > > On Thu, 2018-04-26 at 11:05 -0400, Mikulas Patocka wrote:
> > > > 
> > > > On Thu, 26 Apr 2018, James Bottomley wrote:
> > > [...]
> > > > > Perhaps find out beforehand instead of insisting on an approach
> > > > without
> > > > > knowing.  On openSUSE the grub config is built from the files in
> > > > > /etc/grub.d/ so any package can add a kernel option (and various
> > > > > conditions around activating it) simply by adding a new file.
> > > > 
> > > > And then, different versions of the debug kernel will clash when 
> > > > attempting to create the same file.
> > > 
> > > Don't be silly ... there are many ways of coping with that in rpm/dpkg.
> > 
> > I know you can deal with it - but how many lines of code will that 
> > consume? Multiplied by the total number of rpm-based distros.
> > 
> > Mikulas
> 
> I don't think debug kernels should inject faults by default.

But it is not injecting any faults. It is using the fault-injection 
framework because Michal said that it should use it. The original version 
of the patch didn't use the fault-injection framework at all.

The kvmalloc function can take two branches, the kmalloc branch and 
vmalloc branch. The vmalloc branch is taken rarely, so the kernel contains 
bugs regarding this path - and the patch increases the probability that 
the vmalloc patch is taken, to discover the bugs early and make them 
reproducible.

> IIUC debug kernels mainly exist so people who experience e.g. memory
> corruption can try and debug the failure. In this case, CONFIG_DEBUG_SG
> will *already* catch a failure early. Nothing special needs to be done.

The patch helps people debug such memory coprruptions (such as using DMA 
API on the result of kvmalloc).

> There is a much smaller class of people like QA who go actively looking
> for trouble. That's the kind of thing fault injection is good for, and
> IMO you do not want your QA team to test a separate kernel - otherwise
> you are never quite sure whatever was tested will work in the field.
> So a config option won't help them either.
> 
> How do you make sure QA tests a specific corner case? Add it to
> the test plan :)
> 
> I don't speak for Red Hat, etc.
> 
> -- 
> MST

Mikulas
--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Michael S. Tsirkin April 26, 2018, 6:49 p.m. UTC | #21
On Thu, Apr 26, 2018 at 12:07:25PM -0400, Mikulas Patocka wrote:
> > IIUC debug kernels mainly exist so people who experience e.g. memory
> > corruption can try and debug the failure. In this case, CONFIG_DEBUG_SG
> > will *already* catch a failure early. Nothing special needs to be done.
> 
> The patch helps people debug such memory coprruptions (such as using DMA 
> API on the result of kvmalloc).

That's my point.  I don't think your patch helps debug any memory
corruptions.  With CONFIG_DEBUG_SG using DMA API already causes a
BUG_ON, that's before any memory can get corrupted.
Mikulas Patocka April 26, 2018, 6:54 p.m. UTC | #22
On Thu, 26 Apr 2018, Michael S. Tsirkin wrote:

> On Thu, Apr 26, 2018 at 12:07:25PM -0400, Mikulas Patocka wrote:
> > > IIUC debug kernels mainly exist so people who experience e.g. memory
> > > corruption can try and debug the failure. In this case, CONFIG_DEBUG_SG
> > > will *already* catch a failure early. Nothing special needs to be done.
> > 
> > The patch helps people debug such memory coprruptions (such as using DMA 
> > API on the result of kvmalloc).
> 
> That's my point.  I don't think your patch helps debug any memory
> corruptions.  With CONFIG_DEBUG_SG using DMA API already causes a
> BUG_ON, that's before any memory can get corrupted.

The patch turns a hard-to-reproduce bug into an easy-to-reproduce bug. 

Obviously we don't want this in production kernels, but in the debug 
kernels it should be done.

Mikulas

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Mikulas Patocka April 26, 2018, 6:58 p.m. UTC | #23
On Thu, 26 Apr 2018, Michael S. Tsirkin wrote:

> How do you make sure QA tests a specific corner case? Add it to
> the test plan :)

BTW. how many "lines of code" of corporate bureaucracy would that take? :-)

> I don't speak for Red Hat, etc.
> 
> -- 
> MST

Mikulas

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
John Stoffel April 26, 2018, 6:58 p.m. UTC | #24
>>>>> "James" == James Bottomley <James.Bottomley@HansenPartnership.com> writes:

James> On Wed, 2018-04-25 at 19:00 -0400, Mikulas Patocka wrote:
>> 
>> On Wed, 25 Apr 2018, James Bottomley wrote:
>> 
>> > > > Do we really need the new config option?  This could just be
>> > > > manually  tunable via fault injection IIUC.
>> > > 
>> > > We do, because we want to enable it in RHEL and Fedora debugging
>> > > kernels, so that it will be tested by the users.
>> > > 
>> > > The users won't use some extra magic kernel options or debugfs
>> files.
>> > 
>> > If it can be enabled via a tunable, then the distro can turn it on
>> > without the user having to do anything.  If you want to present the
>> > user with a different boot option, you can (just have the tunable
>> set
>> > on the command line), but being tunable driven means that you don't
>> > have to choose that option, you could automatically enable it under
>> a
>> > range of circumstances.  I think most sane distributions would want
>> > that flexibility.
>> > 
>> > Kconfig proliferation, conversely, is a bit of a nightmare from
>> both
>> > the user and the tester's point of view, so we're trying to avoid
>> it
>> > unless absolutely necessary.
>> > 
>> > James
>> 
>> BTW. even developers who compile their own kernel should have this
>> enabled by a CONFIG option - because if the developer sees the option
>> when browsing through menuconfig, he may enable it. If he doesn't see
>> the option, he won't even know that such an option exists.

James> I may be an atypical developer but I'd rather have a root canal
James> than browse through menuconfig options.  The way to get people
James> to learn about new debugging options is to blog about it (or
James> write an lwn.net article) which google will find the next time
James> I ask it how I debug XXX.  Google (probably as a service to
James> humanity) rarely turns up Kconfig options in response to a
James> query.

I agree with James here.  Looking at the SLAB vs SLUB Kconfig entries
tells me *nothing* about why I should pick one or the other, as an
example.

John

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Michael S. Tsirkin April 26, 2018, 7:05 p.m. UTC | #25
On Thu, Apr 26, 2018 at 02:58:08PM -0400, Mikulas Patocka wrote:
> 
> 
> On Thu, 26 Apr 2018, Michael S. Tsirkin wrote:
> 
> > How do you make sure QA tests a specific corner case? Add it to
> > the test plan :)
> 
> BTW. how many "lines of code" of corporate bureaucracy would that take? :-)

It's pretty easy at least here at Red Hat.

> > I don't speak for Red Hat, etc.
> > 
> > -- 
> > MST
> 
> Mikulas

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Michael S. Tsirkin April 26, 2018, 7:14 p.m. UTC | #26
On Thu, Apr 26, 2018 at 02:54:26PM -0400, Mikulas Patocka wrote:
> 
> 
> On Thu, 26 Apr 2018, Michael S. Tsirkin wrote:
> 
> > On Thu, Apr 26, 2018 at 12:07:25PM -0400, Mikulas Patocka wrote:
> > > > IIUC debug kernels mainly exist so people who experience e.g. memory
> > > > corruption can try and debug the failure. In this case, CONFIG_DEBUG_SG
> > > > will *already* catch a failure early. Nothing special needs to be done.
> > > 
> > > The patch helps people debug such memory coprruptions (such as using DMA 
> > > API on the result of kvmalloc).
> > 
> > That's my point.  I don't think your patch helps debug any memory
> > corruptions.  With CONFIG_DEBUG_SG using DMA API already causes a
> > BUG_ON, that's before any memory can get corrupted.
> 
> The patch turns a hard-to-reproduce bug into an easy-to-reproduce bug. 

It's still not a memory corruption. It's a BUG_ON the source of which -
should it trigger - can be typically found using grep.

> Obviously we don't want this in production kernels, but in the debug 
> kernels it should be done.
> 
> Mikulas

I'm not so sure. debug kernels should make debugging easier,
definitely.

Unfortunately they are already slower so some races don't trigger.

If they also start crashing more because we are injecting
memory allocation errors, people are even less likely to
be able to use them.

Just add a comment near the BUG_ON within DMA API telling people how
they can inject this error some more if the bug does not
reproduce, and leave it at that.
Mikulas Patocka April 26, 2018, 7:36 p.m. UTC | #27
On Thu, 26 Apr 2018, Michael S. Tsirkin wrote:

> On Thu, Apr 26, 2018 at 02:54:26PM -0400, Mikulas Patocka wrote:
> > 
> > 
> > On Thu, 26 Apr 2018, Michael S. Tsirkin wrote:
> > 
> > > On Thu, Apr 26, 2018 at 12:07:25PM -0400, Mikulas Patocka wrote:
> > > > > IIUC debug kernels mainly exist so people who experience e.g. memory
> > > > > corruption can try and debug the failure. In this case, CONFIG_DEBUG_SG
> > > > > will *already* catch a failure early. Nothing special needs to be done.
> > > > 
> > > > The patch helps people debug such memory coprruptions (such as using DMA 
> > > > API on the result of kvmalloc).
> > > 
> > > That's my point.  I don't think your patch helps debug any memory
> > > corruptions.  With CONFIG_DEBUG_SG using DMA API already causes a
> > > BUG_ON, that's before any memory can get corrupted.
> > 
> > The patch turns a hard-to-reproduce bug into an easy-to-reproduce bug. 
> 
> It's still not a memory corruption. It's a BUG_ON the source of which -
> should it trigger - can be typically found using grep.
> 
> > Obviously we don't want this in production kernels, but in the debug 
> > kernels it should be done.
> > 
> > Mikulas
> 
> I'm not so sure. debug kernels should make debugging easier,
> definitely.
> 
> Unfortunately they are already slower so some races don't trigger.
> 
> If they also start crashing more because we are injecting
> memory allocation errors, people are even less likely to
> be able to use them.

I've actually already pushed this patch to RHEL-7 (just before 7.5 was 
released) and it found out some powerpc issues. See the commit 
ea376cc55bc3 in the RHEL-7 git. It was reverted just before RHEL-7.5 was 
released with the intention that it will be reinstated just after RHEL-7.5 
release, so that these issues could be found and eliminated in the 
7.5->7.6 development cycle. Jeff Moyer asked me to put it upstream because 
they want to follow upstream and they don't like RHEL-specific patches. 
There's clear incentive to put this patch to RHEL-7, that's why I'm 
posting it here.

> Just add a comment near the BUG_ON within DMA API telling people how
> they can inject this error some more if the bug does not
> reproduce, and leave it at that.

But the problem is that the powerpc bug only triggers with this patch. It 
doesn't trigger without it. So, we have a potential random-crashing bug in 
the codebase (and perhaps more others) and we want to eliminate them - 
that's why we need the patch.

People on this list argue "this should be a kernel parameter". But the 
testers won't enable the kernel parameter, the crashes won't happen 
without the kernel parameter and the bugs will stay unreported and 
uncorrected. That's why it needs to be the default.

Mikulas

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Michael S. Tsirkin April 26, 2018, 7:45 p.m. UTC | #28
On Thu, Apr 26, 2018 at 03:36:14PM -0400, Mikulas Patocka wrote:
> People on this list argue "this should be a kernel parameter".

How about making it a writeable attribute, so it's easy to turn on/off
after boot. Then you can keep it deterministic, userspace can play with
the attribute at random if it wants to.
Mikulas Patocka April 26, 2018, 8:05 p.m. UTC | #29
On Thu, 26 Apr 2018, Michael S. Tsirkin wrote:

> On Thu, Apr 26, 2018 at 03:36:14PM -0400, Mikulas Patocka wrote:
> > People on this list argue "this should be a kernel parameter".
> 
> How about making it a writeable attribute, so it's easy to turn on/off
> after boot. Then you can keep it deterministic, userspace can play with
> the attribute at random if it wants to.
> 
> -- 
> MST

It is already controllable by an attribute in debugfs.

Will you email all the testers about this attribute? How many of them will 
remember to set it? How many of them will remember to set it a year after? 
Will you write a userspace program that manages it and introduce it into 
the distributon?

This is a little feature.

Mikulas

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Mikulas Patocka April 26, 2018, 9:50 p.m. UTC | #30
On Thu, 26 Apr 2018, John Stoffel wrote:

> >>>>> "James" == James Bottomley <James.Bottomley@HansenPartnership.com> writes:
> 
> James> I may be an atypical developer but I'd rather have a root canal
> James> than browse through menuconfig options.  The way to get people
> James> to learn about new debugging options is to blog about it (or
> James> write an lwn.net article) which google will find the next time
> James> I ask it how I debug XXX.  Google (probably as a service to
> James> humanity) rarely turns up Kconfig options in response to a
> James> query.
> 
> I agree with James here.  Looking at the SLAB vs SLUB Kconfig entries
> tells me *nothing* about why I should pick one or the other, as an
> example.
> 
> John

I see your point - and I think the misunderstanding is this.

This patch is not really helping people to debug existing crashes. It is 
not like "you get a crash" - "you google for some keywords" - "you get a 
page that suggests to turn this option on" - "you turn it on and solve the 
crash".

What this patch really does is that - it makes the kernel deliberately 
crash in a situation when the code violates the specification, but it 
would not crash otherwise or it would crash very rarely. It helps to 
detect specification violations.

If the kernel developer (or tester) doesn't use this option, his buggy 
code won't crash - and if it won't crash, he won't fix the bug or report 
it. How is the user or developer supposed to learn about this option, if 
he gets no crash at all?

Mikulas

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Michael S. Tsirkin April 26, 2018, 10:21 p.m. UTC | #31
On Thu, Apr 26, 2018 at 05:50:20PM -0400, Mikulas Patocka wrote:
> How is the user or developer supposed to learn about this option, if 
> he gets no crash at all?

Look in /sys/kernel/debug/fail* ? That actually lets you
filter by module, process etc.

I think this patch conflates two things:

1. Make kvmalloc use the vmalloc path.
    This seems a bit narrow.
    What is special about kvmalloc? IMHO nothing - it's yet another user
    of __GFP_NORETRY or __GFP_RETRY_MAYFAIL. As any such
    user, it either recovers correctly or not.
    So IMHO it's just a case of
    making __GFP_NORETRY, __GFP_RETRY_MAYFAIL, or both
    fail once in a while.
    Seems like a better extension to me than focusing on vmalloc.
    I think you will find more bugs this way.

2. Ability to control this from a separate config
   option.

   It's still not that clear to me why is this such a
   hard requirement.  If a distro wants to force specific
   boot time options, why isn't CONFIG_CMDLINE sufficient?

   But assuming it's important to control this kind of
   fault injection to be controlled from
   a dedicated menuconfig option, why not the rest of
   faults?

IMHO if you split 1/2 up, and generalize, the path upstream
will be much smoother.

Hope this helps.
Mikulas Patocka April 26, 2018, 10:52 p.m. UTC | #32
On Fri, 27 Apr 2018, Michael S. Tsirkin wrote:

> On Thu, Apr 26, 2018 at 05:50:20PM -0400, Mikulas Patocka wrote:
> > How is the user or developer supposed to learn about this option, if 
> > he gets no crash at all?
> 
> Look in /sys/kernel/debug/fail* ? That actually lets you
> filter by module, process etc.
> 
> I think this patch conflates two things:
> 
> 1. Make kvmalloc use the vmalloc path.
>     This seems a bit narrow.
>     What is special about kvmalloc? IMHO nothing - it's yet another user
>     of __GFP_NORETRY or __GFP_RETRY_MAYFAIL. As any such

__GFP_RETRY_MAYFAIL makes the allocator retry the costly_order allocations

>     user, it either recovers correctly or not.
>     So IMHO it's just a case of
>     making __GFP_NORETRY, __GFP_RETRY_MAYFAIL, or both
>     fail once in a while.
>     Seems like a better extension to me than focusing on vmalloc.
>     I think you will find more bugs this way.

If the array is <= PAGE_SIZE, vmalloc will not use __GFP_NORETRY. So it 
still hides some bugs - such as, if a structure grows above 4k, it would 
start randomly crashing due to memory fragmentation.

> 2. Ability to control this from a separate config
>    option.
> 
>    It's still not that clear to me why is this such a
>    hard requirement.  If a distro wants to force specific
>    boot time options, why isn't CONFIG_CMDLINE sufficient?

There are 489 kernel options declared with the __setup keyword. Hardly any 
kernel developer notices that a new one was added and selects it when 
testing his code.

>    But assuming it's important to control this kind of
>    fault injection to be controlled from
>    a dedicated menuconfig option, why not the rest of
>    faults?

The injected faults cause damage to the user, so there's no point to 
enable them by default. vmalloc fallback should not cause any damage 
(assuming that the code is correctly written).

> IMHO if you split 1/2 up, and generalize, the path upstream
> will be much smoother.

This seems like a lost case. So, let's not care about code correctness and 
let's solve crashes only after they are reported. If the upstream wants to 
work this way, there's nothing that can be done about it.

I'm wondering if I can still push it to RHEL or not.

> Hope this helps.
> 
> -- 
> MST

Mikulas

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Michal Hocko April 27, 2018, 8:25 a.m. UTC | #33
On Thu 26-04-18 18:52:05, Mikulas Patocka wrote:
> 
> 
> On Fri, 27 Apr 2018, Michael S. Tsirkin wrote:
[...]
> >    But assuming it's important to control this kind of
> >    fault injection to be controlled from
> >    a dedicated menuconfig option, why not the rest of
> >    faults?
> 
> The injected faults cause damage to the user, so there's no point to 
> enable them by default. vmalloc fallback should not cause any damage 
> (assuming that the code is correctly written).

But you want to find those bugs which would BUG_ON easier, so there is a
risk of harm IIUC and this is not much different than other fault
injecting paths.
Mikulas Patocka April 27, 2018, 10:20 a.m. UTC | #34
On Fri, 27 Apr 2018, Michal Hocko wrote:

> On Thu 26-04-18 18:52:05, Mikulas Patocka wrote:
> > 
> > 
> > On Fri, 27 Apr 2018, Michael S. Tsirkin wrote:
> [...]
> > >    But assuming it's important to control this kind of
> > >    fault injection to be controlled from
> > >    a dedicated menuconfig option, why not the rest of
> > >    faults?
> > 
> > The injected faults cause damage to the user, so there's no point to 
> > enable them by default. vmalloc fallback should not cause any damage 
> > (assuming that the code is correctly written).
> 
> But you want to find those bugs which would BUG_ON easier, so there is a
> risk of harm IIUC

Yes, I want to harm them, but I only want to harm the users using the 
debugging kernel. Testers should be "harmed" by crashes - so that the 
users of production kernels are harmed less.

If someone hits this, he should report it, use the kernel parameter to 
turn it off and continue with the testing.

> and this is not much different than other fault injecting paths.

Fault injections causes misbehavior even on completely bug-free code (for 
example, syscalls randomly returning -ENOMEM). This won't cause 
misbehavior on bug-free code.

Mikulas

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
John Stoffel April 30, 2018, 6:27 p.m. UTC | #35
>>>>> "Mikulas" == Mikulas Patocka <mpatocka@redhat.com> writes:

Mikulas> On Thu, 26 Apr 2018, John Stoffel wrote:

>> >>>>> "James" == James Bottomley <James.Bottomley@HansenPartnership.com> writes:
>> 
James> I may be an atypical developer but I'd rather have a root canal
James> than browse through menuconfig options.  The way to get people
James> to learn about new debugging options is to blog about it (or
James> write an lwn.net article) which google will find the next time
James> I ask it how I debug XXX.  Google (probably as a service to
James> humanity) rarely turns up Kconfig options in response to a
James> query.
>> 
>> I agree with James here.  Looking at the SLAB vs SLUB Kconfig entries
>> tells me *nothing* about why I should pick one or the other, as an
>> example.
>> 
>> John

Mikulas> I see your point - and I think the misunderstanding is this.

Thanks.

Mikulas> This patch is not really helping people to debug existing crashes. It is 
Mikulas> not like "you get a crash" - "you google for some keywords" - "you get a 
Mikulas> page that suggests to turn this option on" - "you turn it on and solve the 
Mikulas> crash".

Mikulas> What this patch really does is that - it makes the kernel deliberately 
Mikulas> crash in a situation when the code violates the specification, but it 
Mikulas> would not crash otherwise or it would crash very rarely. It helps to 
Mikulas> detect specification violations.

Mikulas> If the kernel developer (or tester) doesn't use this option, his buggy 
Mikulas> code won't crash - and if it won't crash, he won't fix the bug or report 
Mikulas> it. How is the user or developer supposed to learn about this option, if 
Mikulas> he gets no crash at all?

So why do we make this a KConfig option at all?  Just turn it on and
let it rip.  Now I also think that Linus has the right idea to not
just sprinkle BUG_ONs into the code, just dump and oops and keep going
if you can.  If it's a filesystem or a device, turn it read only so
that people notice right away.

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Mikulas Patocka April 30, 2018, 9:07 p.m. UTC | #36
On Mon, 30 Apr 2018, John Stoffel wrote:

> >>>>> "Mikulas" == Mikulas Patocka <mpatocka@redhat.com> writes:
> 
> Mikulas> On Thu, 26 Apr 2018, John Stoffel wrote:
> 
> Mikulas> I see your point - and I think the misunderstanding is this.
> 
> Thanks.
> 
> Mikulas> This patch is not really helping people to debug existing crashes. It is 
> Mikulas> not like "you get a crash" - "you google for some keywords" - "you get a 
> Mikulas> page that suggests to turn this option on" - "you turn it on and solve the 
> Mikulas> crash".
> 
> Mikulas> What this patch really does is that - it makes the kernel deliberately 
> Mikulas> crash in a situation when the code violates the specification, but it 
> Mikulas> would not crash otherwise or it would crash very rarely. It helps to 
> Mikulas> detect specification violations.
> 
> Mikulas> If the kernel developer (or tester) doesn't use this option, his buggy 
> Mikulas> code won't crash - and if it won't crash, he won't fix the bug or report 
> Mikulas> it. How is the user or developer supposed to learn about this option, if 
> Mikulas> he gets no crash at all?
> 
> So why do we make this a KConfig option at all?

Because other people see the KConfig option (so, they may enable it) and 
they don't see the kernel parameter (so, they won't enable it).

Close your eyes and say how many kernel parameters do you remember :-)

> Just turn it on and let it rip.

I can't test if all the networking drivers use kvmalloc properly, because 
I don't have the hardware. You can't test it neither. No one has all the 
hardware that is supported by Linux.

Driver issues can only be tested by a mass of users. And if the users 
don't know about the debugging option, they won't enable it.

> >> I agree with James here.  Looking at the SLAB vs SLUB Kconfig entries
> >> tells me *nothing* about why I should pick one or the other, as an
> >> example.

BTW. You can enable slub debugging either with CONFIG_SLUB_DEBUG_ON or 
with the kernel parameter "slub_debug" - and most users who compile their 
own kernel use CONFIG_SLUB_DEBUG_ON - just because it is visible.

> Now I also think that Linus has the right idea to not just sprinkle 
> BUG_ONs into the code, just dump and oops and keep going if you can.  
> If it's a filesystem or a device, turn it read only so that people 
> notice right away.

This vmalloc fallback is similar to CONFIG_DEBUG_KOBJECT_RELEASE. 
CONFIG_DEBUG_KOBJECT_RELEASE changes the behavior of kobject_put in order 
to cause deliberate crashes (that wouldn't happen otherwise) in drivers 
that misuse kobject_put. In the same sense, we want to cause deliberate 
crashes (that wouldn't happen otherwise) in drivers that misuse kvmalloc.

The crashes will only happen in debugging kernels, not in production 
kernels.

Mikulas

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
John Stoffel May 2, 2018, 1:38 p.m. UTC | #37
>>>>> "Mikulas" == Mikulas Patocka <mpatocka@redhat.com> writes:

Mikulas> On Mon, 30 Apr 2018, John Stoffel wrote:

>> >>>>> "Mikulas" == Mikulas Patocka <mpatocka@redhat.com> writes:
>> 
Mikulas> On Thu, 26 Apr 2018, John Stoffel wrote:
>> 
Mikulas> I see your point - and I think the misunderstanding is this.
>> 
>> Thanks.
>> 
Mikulas> This patch is not really helping people to debug existing crashes. It is 
Mikulas> not like "you get a crash" - "you google for some keywords" - "you get a 
Mikulas> page that suggests to turn this option on" - "you turn it on and solve the 
Mikulas> crash".
>> 
Mikulas> What this patch really does is that - it makes the kernel deliberately 
Mikulas> crash in a situation when the code violates the specification, but it 
Mikulas> would not crash otherwise or it would crash very rarely. It helps to 
Mikulas> detect specification violations.
>> 
Mikulas> If the kernel developer (or tester) doesn't use this option, his buggy 
Mikulas> code won't crash - and if it won't crash, he won't fix the bug or report 
Mikulas> it. How is the user or developer supposed to learn about this option, if 
Mikulas> he gets no crash at all?
>> 
>> So why do we make this a KConfig option at all?

Mikulas> Because other people see the KConfig option (so, they may enable it) and 
Mikulas> they don't see the kernel parameter (so, they won't enable it).

Mikulas> Close your eyes and say how many kernel parameters do you remember :-)

>> Just turn it on and let it rip.

Mikulas> I can't test if all the networking drivers use kvmalloc properly, because 
Mikulas> I don't have the hardware. You can't test it neither. No one has all the 
Mikulas> hardware that is supported by Linux.

Mikulas> Driver issues can only be tested by a mass of users. And if the users 
Mikulas> don't know about the debugging option, they won't enable it.

>> >> I agree with James here.  Looking at the SLAB vs SLUB Kconfig entries
>> >> tells me *nothing* about why I should pick one or the other, as an
>> >> example.

Mikulas> BTW. You can enable slub debugging either with CONFIG_SLUB_DEBUG_ON or 
Mikulas> with the kernel parameter "slub_debug" - and most users who compile their 
Mikulas> own kernel use CONFIG_SLUB_DEBUG_ON - just because it is visible.

You miss my point, which is that there's no explanation of what the
difference is between SLAB and SLUB and which I should choose.  The
same goes here.  If the KConfig option doesn't give useful info, it's
useless.

>> Now I also think that Linus has the right idea to not just sprinkle 
>> BUG_ONs into the code, just dump and oops and keep going if you can.  
>> If it's a filesystem or a device, turn it read only so that people 
>> notice right away.

Mikulas> This vmalloc fallback is similar to
Mikulas> CONFIG_DEBUG_KOBJECT_RELEASE.  CONFIG_DEBUG_KOBJECT_RELEASE
Mikulas> changes the behavior of kobject_put in order to cause
Mikulas> deliberate crashes (that wouldn't happen otherwise) in
Mikulas> drivers that misuse kobject_put. In the same sense, we want
Mikulas> to cause deliberate crashes (that wouldn't happen otherwise)
Mikulas> in drivers that misuse kvmalloc.

Mikulas> The crashes will only happen in debugging kernels, not in
Mikulas> production kernels.

Says you.  What about people or distros that enable it
unconditionally?  They're going to get all kinds of reports and then
turn it off again.  Crashing the system isn't the answer here.  

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Mikulas Patocka May 3, 2018, 5:40 p.m. UTC | #38
On Wed, 2 May 2018, John Stoffel wrote:

> You miss my point, which is that there's no explanation of what the
> difference is between SLAB and SLUB and which I should choose.  The
> same goes here.  If the KConfig option doesn't give useful info, it's
> useless.

So what, we could write explamantion of that option.

> >> Now I also think that Linus has the right idea to not just sprinkle 
> >> BUG_ONs into the code, just dump and oops and keep going if you can.  
> >> If it's a filesystem or a device, turn it read only so that people 
> >> notice right away.
> 
> Mikulas> This vmalloc fallback is similar to
> Mikulas> CONFIG_DEBUG_KOBJECT_RELEASE.  CONFIG_DEBUG_KOBJECT_RELEASE
> Mikulas> changes the behavior of kobject_put in order to cause
> Mikulas> deliberate crashes (that wouldn't happen otherwise) in
> Mikulas> drivers that misuse kobject_put. In the same sense, we want
> Mikulas> to cause deliberate crashes (that wouldn't happen otherwise)
> Mikulas> in drivers that misuse kvmalloc.
> 
> Mikulas> The crashes will only happen in debugging kernels, not in
> Mikulas> production kernels.
> 
> Says you.  What about people or distros that enable it
> unconditionally?  They're going to get all kinds of reports and then
> turn it off again.  Crashing the system isn't the answer here.  

I've made that kvmalloc bug too (in the function 
dm_integrity_free_journal_scatterlist). I'd much rather like if the kernel 
crashed (because then - I would fix the bug). The kernel didn't crash and 
the bug sneaked into the official linux tree, where may be causing random 
crashes for other users.

Mikulas

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
diff mbox

Patch

Index: linux-2.6/Documentation/fault-injection/fault-injection.txt
===================================================================
--- linux-2.6.orig/Documentation/fault-injection/fault-injection.txt	2018-04-16 21:08:34.000000000 +0200
+++ linux-2.6/Documentation/fault-injection/fault-injection.txt	2018-04-25 21:36:36.000000000 +0200
@@ -15,6 +15,12 @@  o fail_page_alloc
 
   injects page allocation failures. (alloc_pages(), get_free_pages(), ...)
 
+o kvmalloc_fallback
+
+  makes the function kvmalloc randomly fall back to vmalloc. This could be used
+  to detects bugs such as using DMA-API on the result of kvmalloc or freeing
+  the result of kvmalloc with free.
+
 o fail_futex
 
   injects futex deadlock and uaddr fault errors.
@@ -167,6 +173,7 @@  use the boot option:
 
 	failslab=
 	fail_page_alloc=
+	kvmalloc_fallback=
 	fail_make_request=
 	fail_futex=
 	mmc_core.fail_request=<interval>,<probability>,<space>,<times>
Index: linux-2.6/include/linux/fault-inject.h
===================================================================
--- linux-2.6.orig/include/linux/fault-inject.h	2018-04-16 21:08:36.000000000 +0200
+++ linux-2.6/include/linux/fault-inject.h	2018-04-25 21:38:22.000000000 +0200
@@ -31,17 +31,18 @@  struct fault_attr {
 	struct dentry *dname;
 };
 
-#define FAULT_ATTR_INITIALIZER {					\
+#define FAULT_ATTR_INITIALIZER(p) {					\
+		.probability = (p),					\
 		.interval = 1,						\
-		.times = ATOMIC_INIT(1),				\
+		.times = ATOMIC_INIT((p) ? -1 : 1),			\
+		.verbose = (p) ? 0 : 2,					\
 		.require_end = ULONG_MAX,				\
 		.stacktrace_depth = 32,					\
 		.ratelimit_state = RATELIMIT_STATE_INIT_DISABLED,	\
-		.verbose = 2,						\
 		.dname = NULL,						\
 	}
 
-#define DECLARE_FAULT_ATTR(name) struct fault_attr name = FAULT_ATTR_INITIALIZER
+#define DECLARE_FAULT_ATTR(name) struct fault_attr name = FAULT_ATTR_INITIALIZER(0)
 int setup_fault_attr(struct fault_attr *attr, char *str);
 bool should_fail(struct fault_attr *attr, ssize_t size);
 
Index: linux-2.6/lib/Kconfig.debug
===================================================================
--- linux-2.6.orig/lib/Kconfig.debug	2018-04-25 15:56:16.000000000 +0200
+++ linux-2.6/lib/Kconfig.debug	2018-04-25 21:39:45.000000000 +0200
@@ -1527,6 +1527,21 @@  config FAIL_PAGE_ALLOC
 	help
 	  Provide fault-injection capability for alloc_pages().
 
+config FAIL_KVMALLOC_FALLBACK_PROBABILITY
+	int "Default kvmalloc fallback probability"
+	depends on FAULT_INJECTION
+	range 0 100
+	default "0"
+	help
+	  This option will make kvmalloc randomly fall back to vmalloc.
+	  Normally, kvmalloc falls back to vmalloc only rarely, if memory
+	  is fragmented.
+
+	  This option helps to detect hard-to-reproduce driver bugs, for
+	  example using DMA API on the result of kvmalloc.
+
+	  The default may be overridden with the kvmalloc_fallback parameter.
+
 config FAIL_MAKE_REQUEST
 	bool "Fault-injection capability for disk IO"
 	depends on FAULT_INJECTION && BLOCK
Index: linux-2.6/mm/util.c
===================================================================
--- linux-2.6.orig/mm/util.c	2018-04-25 15:48:39.000000000 +0200
+++ linux-2.6/mm/util.c	2018-04-25 21:43:31.000000000 +0200
@@ -14,6 +14,7 @@ 
 #include <linux/hugetlb.h>
 #include <linux/vmalloc.h>
 #include <linux/userfaultfd_k.h>
+#include <linux/fault-inject.h>
 
 #include <asm/sections.h>
 #include <linux/uaccess.h>
@@ -377,6 +378,29 @@  unsigned long vm_mmap(struct file *file,
 }
 EXPORT_SYMBOL(vm_mmap);
 
+#ifdef CONFIG_FAULT_INJECTION
+
+static struct fault_attr kvmalloc_fallback =
+	FAULT_ATTR_INITIALIZER(CONFIG_FAIL_KVMALLOC_FALLBACK_PROBABILITY);
+
+static int __init setup_kvmalloc_fallback(char *str)
+{
+	return setup_fault_attr(&kvmalloc_fallback, str);
+}
+
+__setup("kvmalloc_fallback=", setup_kvmalloc_fallback);
+
+#ifdef CONFIG_FAULT_INJECTION_DEBUG_FS
+static int __init kvmalloc_fallback_debugfs_init(void)
+{
+	fault_create_debugfs_attr("kvmalloc_fallback", NULL, &kvmalloc_fallback);
+	return 0;
+}
+late_initcall(kvmalloc_fallback_debugfs_init);
+#endif
+
+#endif
+
 /**
  * kvmalloc_node - attempt to allocate physically contiguous memory, but upon
  * failure, fall back to non-contiguous (vmalloc) allocation.
@@ -404,6 +428,11 @@  void *kvmalloc_node(size_t size, gfp_t f
 	 */
 	WARN_ON_ONCE((flags & GFP_KERNEL) != GFP_KERNEL);
 
+#ifdef CONFIG_FAULT_INJECTION
+	if (should_fail(&kvmalloc_fallback, size))
+		goto do_vmalloc;
+#endif
+
 	/*
 	 * We want to attempt a large physically contiguous block first because
 	 * it is less likely to fragment multiple larger blocks and therefore
@@ -427,6 +456,7 @@  void *kvmalloc_node(size_t size, gfp_t f
 	if (ret || size <= PAGE_SIZE)
 		return ret;
 
+do_vmalloc: __maybe_unused
 	return __vmalloc_node_flags_caller(size, node, flags,
 			__builtin_return_address(0));
 }
Index: linux-2.6/kernel/futex.c
===================================================================
--- linux-2.6.orig/kernel/futex.c	2018-02-14 20:24:42.000000000 +0100
+++ linux-2.6/kernel/futex.c	2018-04-25 21:11:33.000000000 +0200
@@ -288,7 +288,7 @@  static struct {
 
 	bool ignore_private;
 } fail_futex = {
-	.attr = FAULT_ATTR_INITIALIZER,
+	.attr = FAULT_ATTR_INITIALIZER(0),
 	.ignore_private = false,
 };
 
Index: linux-2.6/mm/failslab.c
===================================================================
--- linux-2.6.orig/mm/failslab.c	2018-04-16 21:08:36.000000000 +0200
+++ linux-2.6/mm/failslab.c	2018-04-25 21:11:40.000000000 +0200
@@ -9,7 +9,7 @@  static struct {
 	bool ignore_gfp_reclaim;
 	bool cache_filter;
 } failslab = {
-	.attr = FAULT_ATTR_INITIALIZER,
+	.attr = FAULT_ATTR_INITIALIZER(0),
 	.ignore_gfp_reclaim = true,
 	.cache_filter = false,
 };
Index: linux-2.6/mm/page_alloc.c
===================================================================
--- linux-2.6.orig/mm/page_alloc.c	2018-04-16 21:08:36.000000000 +0200
+++ linux-2.6/mm/page_alloc.c	2018-04-25 21:11:47.000000000 +0200
@@ -3055,7 +3055,7 @@  static struct {
 	bool ignore_gfp_reclaim;
 	u32 min_order;
 } fail_page_alloc = {
-	.attr = FAULT_ATTR_INITIALIZER,
+	.attr = FAULT_ATTR_INITIALIZER(0),
 	.ignore_gfp_reclaim = true,
 	.ignore_gfp_highmem = true,
 	.min_order = 1,