diff mbox

HP ProLiant DL360p Gen8 hangs with Linux 4.13+.

Message ID 20180115121659.GA17687@ming.t460p (mailing list archive)
State Not Applicable

Commit Message

Ming Lei Jan. 15, 2018, 12:17 p.m. UTC
On Sun, Jan 14, 2018 at 06:40:40PM -0500, Laurence Oberman wrote:
> On Thu, 2018-01-04 at 14:32 -0800, Vinson Lee wrote:
> > Hi.
> > 
> > HP ProLiant DL360p Gen8 with Smart Array P420i boots to the login
> > prompt and hangs with Linux 4.13 or later. I cannot log in on console
> > or SSH into the machine. Linux 4.12 and older boot fine.
> > 
> > 
> ...
> 
> ...
> 
> This issue bit me for two straight days.
> I was testing Mike Snitzer's combined tree, and this commit crept into
> the latest combined tree.
> 
> commit 84676c1f21e8ff54befe985f4f14dc1edc10046b
> Author: Christoph Hellwig <hch@lst.de>
> Date:   Fri Jan 12 10:53:05 2018 +0800
> 
>     genirq/affinity: assign vectors to all possible CPUs
>    
>     Currently we assign managed interrupt vectors to all present CPUs.  This
>     works fine for systems were we only online/offline CPUs.  But in case of
>     systems that support physical CPU hotplug (or the virtualized version of
>     it) this means the additional CPUs covered for in the ACPI tables or on
>     the command line are not catered for.  To fix this we'd either need to
>     introduce new hotplug CPU states just for this case, or we can start
>     assining vectors to possible but not present CPUs.
>    
>     Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
>     Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
>     Tested-by: Stefan Haberland <sth@linux.vnet.ibm.com>
>     Fixes: 4b855ad37194 ("blk-mq: Create hctx for each present CPU")
>     Cc: linux-kernel@vger.kernel.org
>     Cc: Thomas Gleixner <tglx@linutronix.de>
>     Signed-off-by: Christoph Hellwig <hch@lst.de>
>     Signed-off-by: Jens Axboe <axboe@kernel.dk>
> 
> The reason I never suspected this commit as the cause of the latest hang
> is that I have used Linus' tree all the way to 4.15-rc7 with no issues.
> 
> Vinson reporting it against 4.13 or later did not make sense to me, because
> I had not seen the hang until this weekend.
> 
> I checked, and the commit is in Linus' tree, but it is not an issue in the
> generic 4.15-rc7 for me.

Hi Laurence,

Regarding your issue, I have investigated a bit and found that it happens
because one irq vector may be assigned only offline CPUs; it may not be the
same issue as Vinson's.

The following patch should address your issue. I may prepare a formal
version if no one objects to this approach.

Thomas, Christoph, could you take a look at this patch?

---
 kernel/irq/affinity.c | 69 +++++++++++++++++++++++++++++++++++----------------
 1 file changed, 47 insertions(+), 22 deletions(-)

Comments

Laurence Oberman Jan. 15, 2018, 12:51 p.m. UTC | #1
On Mon, 2018-01-15 at 20:17 +0800, Ming Lei wrote:
> On Sun, Jan 14, 2018 at 06:40:40PM -0500, Laurence Oberman wrote:
> > On Thu, 2018-01-04 at 14:32 -0800, Vinson Lee wrote:
> > ...
> 
> Hi Laurence,
> 
> Regarding your issue, I have investigated a bit and found that it happens
> because one irq vector may be assigned only offline CPUs; it may not be
> the same issue as Vinson's.
> 
> The following patch should address your issue. I may prepare a formal
> version if no one objects to this approach.
> 
> Thomas, Christoph, could you take a look at this patch?
> 
> ---
>  kernel/irq/affinity.c | 69 +++++++++++++++++++++++++++++++++++----------------
>  1 file changed, 47 insertions(+), 22 deletions(-)
> 
> ...
> 

Hello Ming

I will test the patch. I did not spend a lot of time checking whether this
weekend's stalls were an exact match for Vinson's; I just knew that pulling
that patch resolved them.
Perhaps this explains why I was not seeing this on the generic 4.15-rc7.

Thanks
Laurence
Christoph Hellwig Jan. 15, 2018, 3:01 p.m. UTC | #2
Laurence, I'm a little confused.  Is this the same issue we just fixed,
or is this an issue showing up with the fix?

E.g. what kernel versions or trees are affected?
Laurence Oberman Jan. 15, 2018, 4:25 p.m. UTC | #3
On Mon, 2018-01-15 at 07:01 -0800, Hellwig, Christoph wrote:
> Laurence, I'm a little confused.  Is this the same issue we just
> fixed,
> or is this an issue showing up with the fix?
> 
> E.g. what kernel versions or trees are affected?

Hello Christoph

This showed up on a combined tree of Mike's and Jens's
(4.15.0-rc4.block.dm.4.16) that I was testing this weekend, but it was not
apparent on the generic upstream 4.15-rc7 from Linus.
I have to admit that was puzzling me.

When I removed your commit, the issue went away.

Ming has crafted a fix so that your original commit can remain, and I am
testing it now against the same tree that was hanging before.

Ming has a handle on the issue, so I will report back after testing.

The kernel is building now.

Thanks
Laurence

Patch

diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
index a37a3b4b6342..dfc1f6a9c488 100644
--- a/kernel/irq/affinity.c
+++ b/kernel/irq/affinity.c
@@ -94,6 +94,39 @@  static int get_nodes_in_cpumask(cpumask_var_t *node_to_possible_cpumask,
 	return nodes;
 }
 
+/*
+ * Spread the affinity of @nmsk across at most @max_irqmsks irq vectors
+ * starting at @irqmsk; returns the number of vectors actually used.
+ */
+static int irq_vecs_spread_affinity(struct cpumask *irqmsk,
+				    int max_irqmsks,
+				    struct cpumask *nmsk,
+				    int max_ncpus)
+{
+	int v, ncpus;
+	int vecs_to_assign, extra_vecs;
+
+	/* Calculate the number of cpus per vector */
+	ncpus = cpumask_weight(nmsk);
+	vecs_to_assign = min(max_ncpus, ncpus);
+
+	/* Account for rounding errors */
+	extra_vecs = ncpus - vecs_to_assign * (ncpus / vecs_to_assign);
+
+	for (v = 0; v < min(max_irqmsks, vecs_to_assign); v++) {
+		int cpus_per_vec = ncpus / vecs_to_assign;
+
+		/* Account for extra vectors to compensate rounding errors */
+		if (extra_vecs) {
+			cpus_per_vec++;
+			--extra_vecs;
+		}
+		irq_spread_init_one(irqmsk + v, nmsk, cpus_per_vec);
+	}
+
+	return v;
+}
+
 /**
  * irq_create_affinity_masks - Create affinity masks for multiqueue spreading
  * @nvecs:	The total number of vectors
@@ -104,7 +137,7 @@  static int get_nodes_in_cpumask(cpumask_var_t *node_to_possible_cpumask,
 struct cpumask *
 irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
 {
-	int n, nodes, cpus_per_vec, extra_vecs, curvec;
+	int n, nodes, curvec;
 	int affv = nvecs - affd->pre_vectors - affd->post_vectors;
 	int last_affv = affv + affd->pre_vectors;
 	nodemask_t nodemsk = NODE_MASK_NONE;
@@ -154,33 +187,25 @@  irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
 	}
 
 	for_each_node_mask(n, nodemsk) {
-		int ncpus, v, vecs_to_assign, vecs_per_node;
+		int vecs_per_node;
 
 		/* Spread the vectors per node */
 		vecs_per_node = (affv - (curvec - affd->pre_vectors)) / nodes;
 
-		/* Get the cpus on this node which are in the mask */
-		cpumask_and(nmsk, cpu_possible_mask, node_to_possible_cpumask[n]);
 
-		/* Calculate the number of cpus per vector */
-		ncpus = cpumask_weight(nmsk);
-		vecs_to_assign = min(vecs_per_node, ncpus);
-
-		/* Account for rounding errors */
-		extra_vecs = ncpus - vecs_to_assign * (ncpus / vecs_to_assign);
-
-		for (v = 0; curvec < last_affv && v < vecs_to_assign;
-		     curvec++, v++) {
-			cpus_per_vec = ncpus / vecs_to_assign;
-
-			/* Account for extra vectors to compensate rounding errors */
-			if (extra_vecs) {
-				cpus_per_vec++;
-				--extra_vecs;
-			}
-			irq_spread_init_one(masks + curvec, nmsk, cpus_per_vec);
-		}
+		/* spread non-online possible cpus */
+		cpumask_andnot(nmsk, node_to_possible_cpumask[n], cpu_online_mask);
+		irq_vecs_spread_affinity(&masks[curvec], last_affv - curvec,
+					 nmsk, vecs_per_node);
 
+		/*
+		 * spread online possible cpus to make sure each vector
+		 * can get one online cpu to handle
+		 */
+		cpumask_and(nmsk, node_to_possible_cpumask[n], cpu_online_mask);
+		curvec += irq_vecs_spread_affinity(&masks[curvec],
+						   last_affv - curvec,
+						   nmsk, vecs_per_node);
 		if (curvec >= last_affv)
 			break;
 		--nodes;