@@ -817,7 +817,19 @@ static bool dd_bio_merge(struct request_queue *q, struct bio *bio,
struct request *free = NULL;
bool ret;
- spin_lock(&dd->lock);
+ /*
+ * bio merging is called for every bio queued, and it's very easy
+ * to run into contention because of that. If we fail getting
+ * the dd lock, just skip this merge attempt. For related IO, the
+ * plug will be the successful merging point. If we get here, we
+ * already failed doing the obvious merge. Chances of actually
+ * getting a merge off this path are a lot slimmer, so skipping an
+ * occasional lookup that will most likely not succeed anyway should
+ * not be a problem.
+ */
+ if (!spin_trylock(&dd->lock))
+ return false;
+
ret = blk_mq_sched_try_merge(q, bio, nr_segs, &free);
spin_unlock(&dd->lock);
We do several stages of merging in the block layer - the most likely one
to work is also the cheap one, merging directly in the per-task plug when
IO is submitted. Getting merges outside of that is a lot less likely, but
IO schedulers may still maintain internal data structures to facilitate
merge lookups outside of the plug.

Make mq-deadline skip these expensive merge lookups if the queue lock is
already contended. Getting a merge here is not very likely to begin with,
so it should not be a problem to skip the attempt in the (also unlikely)
event that the queue lock is already contended.

Perf diff shows the difference between a random read/write workload with
4 threads doing IO, with expensive merges turned on and off:

    25.00%   +61.94%  [kernel.kallsyms]  [k] queued_spin_lock_slowpath

where we almost quadruple the lock contention by attempting these
expensive merges.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 block/mq-deadline.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)
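
For reference, here is a minimal user-space sketch of the trylock-and-skip
pattern the patch applies in dd_bio_merge(). It uses pthread spinlocks as a
stand-in for the kernel spinlock API; expensive_merge_lookup() and
try_merge() are hypothetical placeholders for illustration only, not block
layer functions.

/* Compile with: cc sketch.c -o sketch -lpthread */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_spinlock_t lock;

/* Hypothetical stand-in for the expensive lookup done under dd->lock. */
static bool expensive_merge_lookup(void)
{
	return false;	/* merges off this path rarely succeed */
}

/*
 * Mirrors the dd_bio_merge() change: if the lock is already held by
 * someone else, bail out instead of spinning and adding contention.
 */
static bool try_merge(void)
{
	bool ret;

	if (pthread_spin_trylock(&lock) != 0)
		return false;	/* contended: skip the merge attempt */

	ret = expensive_merge_lookup();
	pthread_spin_unlock(&lock);
	return ret;
}

int main(void)
{
	pthread_spin_init(&lock, PTHREAD_PROCESS_PRIVATE);
	printf("merged: %d\n", try_merge());
	pthread_spin_destroy(&lock);
	return 0;
}

The design point is the same as in the patch: callers that lose the race
simply give up on an optimization that was unlikely to pay off, rather than
queueing up on the lock and making the contention worse.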