diff mbox

[v3] blkcg: allocate struct blkcg_gq outside request queue spinlock

Message ID 20170304014005.23773-1-tahsin@google.com (mailing list archive)
State New, archived
Headers show

Commit Message

Tahsin Erdogan March 4, 2017, 1:40 a.m. UTC
blkg_conf_prep() currently calls blkg_lookup_create() while holding
request queue spinlock. This means allocating memory for struct
blkcg_gq has to be made non-blocking. This causes occasional -ENOMEM
failures in call paths like below:

  pcpu_alloc+0x68f/0x710
  __alloc_percpu_gfp+0xd/0x10
  __percpu_counter_init+0x55/0xc0
  cfq_pd_alloc+0x3b2/0x4e0
  blkg_alloc+0x187/0x230
  blkg_create+0x489/0x670
  blkg_lookup_create+0x9a/0x230
  blkg_conf_prep+0x1fb/0x240
  __cfqg_set_weight_device.isra.105+0x5c/0x180
  cfq_set_weight_on_dfl+0x69/0xc0
  cgroup_file_write+0x39/0x1c0
  kernfs_fop_write+0x13f/0x1d0
  __vfs_write+0x23/0x120
  vfs_write+0xc2/0x1f0
  SyS_write+0x44/0xb0
  entry_SYSCALL_64_fastpath+0x18/0xad

In the code path above, percpu allocator cannot call vmalloc() due to
queue spinlock.

A failure in this call path gives grief to tools which are trying to
configure io weights. We see occasional failures happen shortly after
reboots even when system is not under any memory pressure. Machines
with a lot of cpus are more vulnerable to this condition.

Update blkg_create() function to temporarily drop the rcu and queue
locks when it is allowed by gfp mask.

Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Tahsin Erdogan <tahsin@google.com>
---
v3:
  Pushed down all blkg allocations into blkg_create()

v2:
  Moved blkg creation into blkg_lookup_create() to avoid duplicating
  blkg_lookup_create() logic.

 block/blk-cgroup.c         | 96 +++++++++++++++++++++++++++++-----------------
 include/linux/blk-cgroup.h | 20 ++++++++--
 2 files changed, 76 insertions(+), 40 deletions(-)

Comments

Tejun Heo March 4, 2017, 7:23 p.m. UTC | #1
Hello, Tahsin.

Generally looks good.  Please see below.

> @@ -184,24 +185,48 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg,
> +	if (unlikely(!wb_congested)) {
>  		ret = -ENOMEM;
>  		goto err_put_css;
> +	} else if (unlikely(!blkg)) {
> +		ret = -ENOMEM;
> +		goto err_put_congested;
>  	}

I'm not sure this error handling logic is correct.  We can have any
combo of alloc failure here, right?

> @@ -694,9 +695,20 @@ static inline bool blkcg_bio_issue_check(struct request_queue *q,
>  	blkg = blkg_lookup(blkcg, q);
>  	if (unlikely(!blkg)) {
>  		spin_lock_irq(q->queue_lock);
> -		blkg = blkg_lookup_create(blkcg, q);
> -		if (IS_ERR(blkg))
> -			blkg = NULL;
> +
> +		/*
> +		 * This could be the first entry point of blkcg implementation
> +		 * and we shouldn't allow anything to go through for a bypassing
> +		 * queue.
> +		 */
> +		if (likely(!blk_queue_bypass(q))) {
> +			blkg = blkg_lookup_create(blkcg, q,
> +						  GFP_NOWAIT | __GFP_NOWARN,
> +						  NULL);
> +			if (IS_ERR(blkg))
> +				blkg = NULL;
> +		}

Heh, this seems a bit too fragile.  I get that this covers both call
paths into lookup_create which needs the checking, but it seems a bit
too nasty to do it this way.  Would it be ugly if we factor out both
policy enabled and bypass tests into a helper and have inner
__blkg_lookup_create() which skips it?  We can call the same helper
from blkg_create() when @pol.  Sorry that this is involving so much
bikeshedding.

Thanks!
Tahsin Erdogan March 9, 2017, 7:59 a.m. UTC | #2
This is a good catch!

I will post a v5 of the patch shortly to the other email thread.

On Wed, Mar 8, 2017 at 9:25 PM, kernel test robot <xiaolong.ye@intel.com> wrote:
>
> FYI, we noticed the following commit:
>
> commit: ad63af3cb70378a7f780dbef2387a6d13e53a6c9 ("blkcg: allocate struct blkcg_gq outside request queue spinlock")
> url: https://github.com/0day-ci/linux/commits/Tahsin-Erdogan/blkcg-allocate-struct-blkcg_gq-outside-request-queue-spinlock/20170307-030921
> base: https://git.kernel.org/cgit/linux/kernel/git/axboe/linux-block.git for-next
>
> in testcase: boot
>
> on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -m 512M
>
> caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
>
>
> +----------------------------------------------------------------+------------+------------+
> |                                                                | 3695539290 | ad63af3cb7 |
> +----------------------------------------------------------------+------------+------------+
> | boot_successes                                                 | 12         | 12         |
> | boot_failures                                                  | 8          | 20         |
> | BUG:kernel_hang_in_test_stage                                  | 8          |            |
> | BUG:sleeping_function_called_from_invalid_context_at_mm/slab.h | 0          | 20         |
> +----------------------------------------------------------------+------------+------------+
>
>
>
> [   23.511528] BUG: sleeping function called from invalid context at mm/slab.h:408
> [   23.543085] in_atomic(): 1, irqs_disabled(): 0, pid: 130, name: udevd
> [   23.563283] CPU: 0 PID: 130 Comm: udevd Not tainted 4.10.0-rc7-00236-gad63af3 #1
> [   23.592056] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-20161025_171302-gandalf 04/01/2014
> [   23.625535] Call Trace:
> [   23.638139]  dump_stack+0x63/0x8a
> [   23.652227]  ___might_sleep+0xd3/0x120
> [   23.667071]  __might_sleep+0x4a/0x80
> [   23.681760]  kmem_cache_alloc_node_trace+0x1d6/0x1f0
> [   23.699087]  blkg_alloc+0x4b/0x230
> [   23.713446]  blkg_lookup_create+0x3dc/0x610
> [   23.729228]  ? blkg_alloc+0x158/0x230
> [   23.744066]  blkcg_init_queue+0x62/0x100
> [   23.759658]  blk_alloc_queue_node+0x25a/0x2c0
> [   23.775866]  ? set_fdc+0x130/0x130 [floppy]
> [   23.791487]  blk_init_queue_node+0x20/0x60
> [   23.807233]  blk_init_queue+0x13/0x20
> [   23.822104]  floppy_module_init+0x234/0xee2 [floppy]
> [   23.838763]  ? vunmap_page_range+0x221/0x390
> [   23.853604]  ? set_cmos+0x68/0x68 [floppy]
> [   23.868345]  do_one_initcall+0x43/0x180
> [   23.883582]  ? __might_sleep+0x4a/0x80
> [   23.898621]  ? kmem_cache_alloc_trace+0x163/0x1b0
> [   23.915435]  do_init_module+0x5f/0x1f8
> [   23.930273]  load_module+0x149e/0x1b10
> [   23.945372]  ? __symbol_put+0x40/0x40
> [   23.960025]  ? kernel_read_file+0x1a3/0x1c0
> [   23.975740]  ? kernel_read_file_from_fd+0x49/0x80
> [   23.992559]  SYSC_finit_module+0xbc/0xf0
> [   24.007991]  SyS_finit_module+0xe/0x10
> [   24.022920]  entry_SYSCALL_64_fastpath+0x1a/0xa9
> [   24.039327] RIP: 0033:0x7f9380e1f4a9
> [   24.053934] RSP: 002b:00007ffc09ff8698 EFLAGS: 00000202 ORIG_RAX: 0000000000000139
> [   24.082614] RAX: ffffffffffffffda RBX: 0000000000a64c20 RCX: 00007f9380e1f4a9
> [   24.103686] RDX: 0000000000000000 RSI: 00007f93810eb0aa RDI: 0000000000000007
> [   24.124749] RBP: 00007ffc09ff8690 R08: 0000000000000000 R09: 0000000000a63490
> [   24.146260] R10: 0000000000000007 R11: 0000000000000202 R12: 0000000000a64c20
> [   24.167489] R13: 0000000000000000 R14: 0000000000a5a2f0 R15: 0000000000a63490
> [   24.191631] parport_pc 00:04: reported by Plug and Play ACPI
> [   24.254724] piix4_smbus 0000:00:01.3: SMBus Host Controller at 0x700, revision 0
> [   24.344624] libata version 3.00 loaded.
> [   24.355991] ata_piix 0000:00:01.1: version 2.13
> [   24.380594] input: PC Speaker as /devices/platform/pcspkr/input/input4
> [   24.454024] scsi host0: ata_piix
> [   24.460472] scsi host1: ata_piix
> [   24.460962] ata1: PATA max MWDMA2 cmd 0x1f0 ctl 0x3f6 bmdma 0xc080 irq 14
> [   24.460985] ata2: PATA max MWDMA2 cmd 0x170 ctl 0x376 bmdma 0xc088 irq 15
> [   24.483223] Error: Driver 'pcspkr' is already registered, aborting...
> [   24.640449] ata2.01: NODEV after polling detection
> [   24.643436] ata2.00: ATAPI: QEMU DVD-ROM, 2.5+, max UDMA/100
> [   24.649435] ata2.00: configured for MWDMA2
> [   24.653487] BUG: sleeping function called from invalid context at mm/slab.h:408
> [   24.653510] in_atomic(): 1, irqs_disabled(): 0, pid: 5, name: kworker/u2:0
> [   24.653515] CPU: 0 PID: 5 Comm: kworker/u2:0 Tainted: G        W       4.10.0-rc7-00236-gad63af3 #1
> [   24.653518] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-20161025_171302-gandalf 04/01/2014
> [   24.653548] Workqueue: events_unbound async_run_entry_fn
> [   24.653551] Call Trace:
> [   24.653584]  dump_stack+0x63/0x8a
> [   24.653609]  ___might_sleep+0xd3/0x120
> [   24.653611]  __might_sleep+0x4a/0x80
> [   24.653616]  kmem_cache_alloc_node_trace+0x1d6/0x1f0
> [   24.653643]  blkg_alloc+0x4b/0x230
> [   24.653647]  blkg_lookup_create+0x3dc/0x610
> [   24.653670]  ? blkg_alloc+0x158/0x230
> [   24.653674]  blkcg_init_queue+0x62/0x100
> [   24.653678]  blk_alloc_queue_node+0x25a/0x2c0
> [   24.653706]  scsi_alloc_queue+0x1e/0xf0
> [   24.653709]  scsi_alloc_sdev+0x1d3/0x300
> [   24.653712]  scsi_probe_and_add_lun+0x99d/0xe70
> [   24.653738]  ? rpm_resume+0x1a6/0x6e0
> [   24.653742]  ? __pm_runtime_resume+0x5b/0x90
> [   24.653745]  __scsi_add_device+0xff/0x110
> [   24.653876]  ata_scsi_scan_host+0xa3/0x1d0 [libata]
> [   24.653937]  async_port_probe+0x43/0x60 [libata]
> [   24.653941]  async_run_entry_fn+0x39/0x170
> [   24.653966]  process_one_work+0x1a3/0x480
> [   24.653970]  ? try_to_del_timer_sync+0x4b/0x60
> [   24.653973]  worker_thread+0x4e/0x4d0
> [   24.653997]  kthread+0x10c/0x140
> [   24.654000]  ? process_one_work+0x480/0x480
> [   24.654003]  ? kthread_create_on_node+0x40/0x40
> [   24.654028]  ret_from_fork+0x2c/0x40
> [   24.656370] scsi 1:0:0:0: CD-ROM            QEMU     QEMU DVD-ROM     2.5+ PQ: 0 ANSI: 5
> [   24.745810] ppdev: user-space parallel port driver
> [   24.849575] sr 1:0:0:0: [sr0] scsi3-mmc drive: 4x/4x cd/rw xa/form2 tray
> [   24.849599] cdrom: Uniform CD-ROM driver Revision: 3.20
> [   24.854112] sr 1:0:0:0: Attached scsi CD-ROM sr0
> [   25.653532] BUG: sleeping function called from invalid context at mm/slab.h:408
> [   25.653536] in_atomic(): 1, irqs_disabled(): 0, pid: 130, name: udevd
> [   25.653542] CPU: 0 PID: 130 Comm: udevd Tainted: G        W       4.10.0-rc7-00236-gad63af3 #1
> [   25.653564] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-20161025_171302-gandalf 04/01/2014
> [   25.653566] Call Trace:
> [   25.653603]  dump_stack+0x63/0x8a
> [   25.653630]  ___might_sleep+0xd3/0x120
> [   25.653634]  __might_sleep+0x4a/0x80
> [   25.653639]  kmem_cache_alloc_node_trace+0x1d6/0x1f0
> [   25.653666]  blkg_alloc+0x4b/0x230
> [   25.653671]  blkg_lookup_create+0x3dc/0x610
> [   25.653695]  ? blkg_alloc+0x158/0x230
> [   25.653701]  blkcg_init_queue+0x62/0x100
> [   25.653726]  blk_alloc_queue_node+0x25a/0x2c0
> [   25.653760]  ? set_fdc+0x130/0x130 [floppy]
> [   25.653802]  blk_init_queue_node+0x20/0x60
> [   25.653827]  blk_init_queue+0x13/0x20
> [   25.653856]  floppy_module_init+0x234/0xee2 [floppy]
> [   25.653861]  ? vunmap_page_range+0x221/0x390
> [   25.653889]  ? set_cmos+0x68/0x68 [floppy]
> [   25.653895]  do_one_initcall+0x43/0x180
> [   25.653898]  ? __might_sleep+0x4a/0x80
> [   25.653923]  ? kmem_cache_alloc_trace+0x163/0x1b0
> [   25.653932]  do_init_module+0x5f/0x1f8
> [   25.653960]  load_module+0x149e/0x1b10
> [   25.653964]  ? __symbol_put+0x40/0x40
> [   25.653990]  ? kernel_read_file+0x1a3/0x1c0
> [   25.653995]  ? kernel_read_file_from_fd+0x49/0x80
> [   25.654020]  SYSC_finit_module+0xbc/0xf0
> [   25.654025]  SyS_finit_module+0xe/0x10
> [   25.654054]  entry_SYSCALL_64_fastpath+0x1a/0xa9
> [   25.654058] RIP: 0033:0x7f9380e1f4a9
> [   25.654060] RSP: 002b:00007ffc09ff8698 EFLAGS: 00000202 ORIG_RAX: 0000000000000139
> [   25.654085] RAX: ffffffffffffffda RBX: 0000000000a64c20 RCX: 00007f9380e1f4a9
> [   25.654088] RDX: 0000000000000000 RSI: 00007f93810eb0aa RDI: 0000000000000007
> [   25.654090] RBP: 00007ffc09ff8690 R08: 0000000000000000 R09: 0000000000a63490
> [   25.654092] R10: 0000000000000007 R11: 0000000000000202 R12: 0000000000a64c20
> [   25.654115] R13: 0000000000000000 R14: 0000000000a5a2f0 R15: 0000000000a63490
> [   25.669580] Floppy drive(s): fd0 is 2.88M AMI BIOS
> [   25.691745] FDC 0 is a S82078B
> [   26.450439] parport0: PC-style at 0x378, irq 7 [PCSPP,TRISTATE]
>
>
> To reproduce:
>
>         git clone git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
>         cd lkp-tests
>         bin/lkp qemu -k <bzImage> job-script  # job-script is attached in this email
>
>
>
> Thanks,
> Xiaolong
diff mbox

Patch

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 295e98c2c8cc..9bc2b10f3b5a 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -164,16 +164,17 @@  struct blkcg_gq *blkg_lookup_slowpath(struct blkcg *blkcg,
 EXPORT_SYMBOL_GPL(blkg_lookup_slowpath);
 
 /*
- * If @new_blkg is %NULL, this function tries to allocate a new one as
- * necessary using %GFP_NOWAIT.  @new_blkg is always consumed on return.
+ * If gfp mask allows blocking, this function temporarily drops rcu and queue
+ * locks to allocate memory.
  */
 static struct blkcg_gq *blkg_create(struct blkcg *blkcg,
-				    struct request_queue *q,
-				    struct blkcg_gq *new_blkg)
+				    struct request_queue *q, gfp_t gfp,
+				    const struct blkcg_policy *pol)
 {
-	struct blkcg_gq *blkg;
+	struct blkcg_gq *blkg = NULL;
 	struct bdi_writeback_congested *wb_congested;
 	int i, ret;
+	const bool drop_locks = gfpflags_allow_blocking(gfp);
 
 	WARN_ON_ONCE(!rcu_read_lock_held());
 	lockdep_assert_held(q->queue_lock);
@@ -184,24 +185,48 @@  static struct blkcg_gq *blkg_create(struct blkcg *blkcg,
 		goto err_free_blkg;
 	}
 
+	if (drop_locks) {
+		spin_unlock_irq(q->queue_lock);
+		rcu_read_unlock();
+	}
+
 	wb_congested = wb_congested_get_create(q->backing_dev_info,
-					       blkcg->css.id,
-					       GFP_NOWAIT | __GFP_NOWARN);
-	if (!wb_congested) {
+					       blkcg->css.id, gfp);
+	blkg = blkg_alloc(blkcg, q, gfp);
+
+	if (drop_locks) {
+		rcu_read_lock();
+		spin_lock_irq(q->queue_lock);
+	}
+
+	if (unlikely(!wb_congested)) {
 		ret = -ENOMEM;
 		goto err_put_css;
+	} else if (unlikely(!blkg)) {
+		ret = -ENOMEM;
+		goto err_put_congested;
 	}
 
-	/* allocate */
-	if (!new_blkg) {
-		new_blkg = blkg_alloc(blkcg, q, GFP_NOWAIT | __GFP_NOWARN);
-		if (unlikely(!new_blkg)) {
-			ret = -ENOMEM;
+	blkg->wb_congested = wb_congested;
+
+	if (pol) {
+		WARN_ON(!drop_locks);
+
+		if (!blkcg_policy_enabled(q, pol)) {
+			ret = -EOPNOTSUPP;
+			goto err_put_congested;
+		}
+
+		/*
+		 * This could be the first entry point of blkcg implementation
+		 * and we shouldn't allow anything to go through for a bypassing
+		 * queue.
+		 */
+		if (unlikely(blk_queue_bypass(q))) {
+			ret = blk_queue_dying(q) ? -ENODEV : -EBUSY;
 			goto err_put_congested;
 		}
 	}
-	blkg = new_blkg;
-	blkg->wb_congested = wb_congested;
 
 	/* link parent */
 	if (blkcg_parent(blkcg)) {
@@ -250,7 +275,7 @@  static struct blkcg_gq *blkg_create(struct blkcg *blkcg,
 err_put_css:
 	css_put(&blkcg->css);
 err_free_blkg:
-	blkg_free(new_blkg);
+	blkg_free(blkg);
 	return ERR_PTR(ret);
 }
 
@@ -258,31 +283,30 @@  static struct blkcg_gq *blkg_create(struct blkcg *blkcg,
  * blkg_lookup_create - lookup blkg, try to create one if not there
  * @blkcg: blkcg of interest
  * @q: request_queue of interest
+ * @gfp: gfp mask
+ * @pol: blkcg policy (optional)
  *
  * Lookup blkg for the @blkcg - @q pair.  If it doesn't exist, try to
  * create one.  blkg creation is performed recursively from blkcg_root such
  * that all non-root blkg's have access to the parent blkg.  This function
  * should be called under RCU read lock and @q->queue_lock.
  *
+ * When gfp mask allows blocking, rcu and queue locks may be dropped for
+ * allocating memory. In this case, the locks will be reacquired on return.
+ *
  * Returns pointer to the looked up or created blkg on success, ERR_PTR()
  * value on error.  If @q is dead, returns ERR_PTR(-EINVAL).  If @q is not
  * dead and bypassing, returns ERR_PTR(-EBUSY).
  */
 struct blkcg_gq *blkg_lookup_create(struct blkcg *blkcg,
-				    struct request_queue *q)
+				    struct request_queue *q, gfp_t gfp,
+				    const struct blkcg_policy *pol)
 {
 	struct blkcg_gq *blkg;
 
 	WARN_ON_ONCE(!rcu_read_lock_held());
 	lockdep_assert_held(q->queue_lock);
 
-	/*
-	 * This could be the first entry point of blkcg implementation and
-	 * we shouldn't allow anything to go through for a bypassing queue.
-	 */
-	if (unlikely(blk_queue_bypass(q)))
-		return ERR_PTR(blk_queue_dying(q) ? -ENODEV : -EBUSY);
-
 	blkg = __blkg_lookup(blkcg, q, true);
 	if (blkg)
 		return blkg;
@@ -300,7 +324,7 @@  struct blkcg_gq *blkg_lookup_create(struct blkcg *blkcg,
 			parent = blkcg_parent(parent);
 		}
 
-		blkg = blkg_create(pos, q, NULL);
+		blkg = blkg_create(pos, q, gfp, pol);
 		if (pos == blkcg || IS_ERR(blkg))
 			return blkg;
 	}
@@ -789,6 +813,7 @@  int blkg_conf_prep(struct blkcg *blkcg, const struct blkcg_policy *pol,
 {
 	struct gendisk *disk;
 	struct blkcg_gq *blkg;
+	struct request_queue *q;
 	struct module *owner;
 	unsigned int major, minor;
 	int key_len, part, ret;
@@ -812,18 +837,22 @@  int blkg_conf_prep(struct blkcg *blkcg, const struct blkcg_policy *pol,
 		return -ENODEV;
 	}
 
+	q = disk->queue;
+
 	rcu_read_lock();
-	spin_lock_irq(disk->queue->queue_lock);
+	spin_lock_irq(q->queue_lock);
 
-	if (blkcg_policy_enabled(disk->queue, pol))
-		blkg = blkg_lookup_create(blkcg, disk->queue);
-	else
+	if (!blkcg_policy_enabled(q, pol))
 		blkg = ERR_PTR(-EOPNOTSUPP);
+	else if (unlikely(blk_queue_bypass(q)))
+		blkg = ERR_PTR(blk_queue_dying(q) ? -ENODEV : -EBUSY);
+	else
+		blkg = blkg_lookup_create(blkcg, q, GFP_KERNEL, pol);
 
 	if (IS_ERR(blkg)) {
 		ret = PTR_ERR(blkg);
 		rcu_read_unlock();
-		spin_unlock_irq(disk->queue->queue_lock);
+		spin_unlock_irq(q->queue_lock);
 		owner = disk->fops->owner;
 		put_disk(disk);
 		module_put(owner);
@@ -1065,14 +1094,9 @@  int blkcg_init_queue(struct request_queue *q)
 
 	preloaded = !radix_tree_preload(GFP_KERNEL);
 
-	/*
-	 * Make sure the root blkg exists and count the existing blkgs.  As
-	 * @q is bypassing at this point, blkg_lookup_create() can't be
-	 * used.  Open code insertion.
-	 */
 	rcu_read_lock();
 	spin_lock_irq(q->queue_lock);
-	blkg = blkg_create(&blkcg_root, q, new_blkg);
+	blkg = blkg_lookup_create(&blkcg_root, q, GFP_KERNEL, NULL);
 	spin_unlock_irq(q->queue_lock);
 	rcu_read_unlock();
 
diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h
index 01b62e7bac74..9dbe552c6ea0 100644
--- a/include/linux/blk-cgroup.h
+++ b/include/linux/blk-cgroup.h
@@ -172,7 +172,8 @@  extern struct cgroup_subsys_state * const blkcg_root_css;
 struct blkcg_gq *blkg_lookup_slowpath(struct blkcg *blkcg,
 				      struct request_queue *q, bool update_hint);
 struct blkcg_gq *blkg_lookup_create(struct blkcg *blkcg,
-				    struct request_queue *q);
+				    struct request_queue *q, gfp_t gfp,
+				    const struct blkcg_policy *pol);
 int blkcg_init_queue(struct request_queue *q);
 void blkcg_drain_queue(struct request_queue *q);
 void blkcg_exit_queue(struct request_queue *q);
@@ -694,9 +695,20 @@  static inline bool blkcg_bio_issue_check(struct request_queue *q,
 	blkg = blkg_lookup(blkcg, q);
 	if (unlikely(!blkg)) {
 		spin_lock_irq(q->queue_lock);
-		blkg = blkg_lookup_create(blkcg, q);
-		if (IS_ERR(blkg))
-			blkg = NULL;
+
+		/*
+		 * This could be the first entry point of blkcg implementation
+		 * and we shouldn't allow anything to go through for a bypassing
+		 * queue.
+		 */
+		if (likely(!blk_queue_bypass(q))) {
+			blkg = blkg_lookup_create(blkcg, q,
+						  GFP_NOWAIT | __GFP_NOWARN,
+						  NULL);
+			if (IS_ERR(blkg))
+				blkg = NULL;
+		}
+
 		spin_unlock_irq(q->queue_lock);
 	}