
[v10,1/4] ipc: Allow boot time extension of IPCMNI from 32k to 8M

Message ID 1541432626-27780-2-git-send-email-longman@redhat.com (mailing list archive)
State New, archived
Series ipc: Increase IPCMNI limit & IPC id generation modes

Commit Message

Waiman Long Nov. 5, 2018, 3:43 p.m. UTC
The maximum number of unique System V IPC identifiers was limited to
32k.  That limit should be big enough for most use cases.

However, some users are requesting more, especially those migrating
from Solaris, which uses 24 bits for unique identifiers. To satisfy
those users, a new boot-time kernel option "ipcmni_extend" is added to
extend the IPCMNI value to 8M. This is a 256X increase, which hopefully
is big enough for them.

The use of this new option will change the pattern of the IPC identifiers
returned by functions like shmget(2). An application that depends on
such a pattern may not work properly, so the option should only be used
if users really need more than 32k unique IPC numbers.

This new option does have the side effect of reducing the maximum number
of unique sequence numbers from 64k down to 256. So it is a trade-off.

The computation of a new IPC id is not done in a performance-critical
path, so a little additional overhead shouldn't have any real
performance impact.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 Documentation/admin-guide/kernel-parameters.txt |  3 ++
 ipc/ipc_sysctl.c                                | 12 ++++++-
 ipc/util.c                                      | 10 +++---
 ipc/util.h                                      | 44 ++++++++++++++++++++-----
 4 files changed, 54 insertions(+), 15 deletions(-)
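
The ID layout described in the commit message can be illustrated with a
small user-space sketch. The two shift values are taken from the patch;
the helper names (ipc_make_id, ipc_id_to_idx, ipc_id_to_seq, ipc_seq_max)
are hypothetical and take the shift as an explicit argument, whereas the
patch itself uses macros keyed on the ipc_mni_shift variable.

```c
#include <assert.h>
#include <limits.h>

#define IPCMNI_SHIFT        15  /* default: 32k indices (from the patch) */
#define IPCMNI_EXTEND_SHIFT 23  /* ipcmni_extend: 8M indices (from the patch) */

/* Hypothetical helper: pack a sequence number and an index into one id. */
static int ipc_make_id(int seq, int idx, int shift)
{
	assert(idx >= 0 && idx < (1 << shift));
	assert(seq >= 0 && seq <= (INT_MAX >> shift));
	return (seq << shift) + idx;
}

/* Hypothetical helper: low "shift" bits of the id are the index. */
static int ipc_id_to_idx(int id, int shift)
{
	return id & ((1 << shift) - 1);
}

/* Hypothetical helper: the remaining high bits are the sequence number. */
static int ipc_id_to_seq(int id, int shift)
{
	return id >> shift;
}

/* Largest sequence number that still keeps the id a non-negative int. */
static int ipc_seq_max(int shift)
{
	return INT_MAX >> shift;
}
```

With these helpers, ipc_seq_max() is 65535 for the default shift and 255
for the extended shift, which is the 64k-to-256 trade-off mentioned above.
The same numeric id also decodes differently in the two modes: 98309 is
(seq 3, index 5) by default but (seq 0, index 98309) in extended mode,
which is one way an application that interprets the id pattern could be
affected.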

Comments

Matthew Wilcox Nov. 6, 2018, 1:20 p.m. UTC | #1
On Mon, Nov 05, 2018 at 10:43:43AM -0500, Waiman Long wrote:
> The maximum number of unique System V IPC identifiers was limited to
> 32k.  That limit should be big enough for most use cases.
> 
> However, there are some users out there requesting for more, especially
> those that are migrating from Solaris which uses 24 bits for unique
> identifiers. To satisfy the need of those users, a new boot time kernel
> option "ipcmni_extend" is added to extend the IPCMNI value to 8M. This
> is a 256X increase which hopefully is big enough for them.

Why go to 23 bits when people are coming from systems with 24 bits?
Let's just go to 24 bits.  This happens to fit well with the underlying
data structure which uses 6 bits per layer of the tree.

> The use of this new option will change the pattern of the IPC identifiers
> returned by functions like shmget(2). An application that depends on
> such pattern may not work properly.  So it should only be used if the
> users really need more than 32k of unique IPC numbers.

Are there applications out there that rely on the internal structure of
the IPC identifiers?!

How about scrapping all this and just doing the following:

Allocate 24 bits of the ID cyclically.  Increment the top 7 bits of the
ID every time the cursor wraps.  That's not going to give us a perfect
progression from 0-2 billion, because it'll skip the ones in use.
But it'll ensure the ID isn't reused particularly quickly unless the
application is really using millions of IDs.
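
One possible reading of the cyclic scheme above, as a user-space sketch
only: a cursor walks the index space, skips indices that are in use, and
a wrap counter kept in the top 7 bits is bumped each time the cursor
wraps. The struct, the function names and the caller-provided in_use
array are all hypothetical; a kernel implementation would presumably sit
on top of the existing idr instead of a flat array.

```c
#include <assert.h>
#include <stdbool.h>

#define SEQ_BITS 7	/* top bits bumped on every wrap, per the proposal */

struct cyc_alloc {
	unsigned int idx_bits;	/* 24 in the proposal; smaller eases testing */
	unsigned int cursor;	/* next index to try */
	unsigned int gen;	/* wrap counter, kept within SEQ_BITS */
	bool *in_use;		/* caller-provided, 1 << idx_bits entries */
};

/* Returns a non-negative id, or -1 if every index is in use. */
static int cyc_alloc_id(struct cyc_alloc *a)
{
	unsigned int n = 1u << a->idx_bits;

	for (unsigned int tried = 0; tried < n; tried++) {
		unsigned int idx = a->cursor;
		unsigned int gen = a->gen;	/* generation for this slot */

		/* Advance the cursor; bump the generation when it wraps. */
		if (++a->cursor == n) {
			a->cursor = 0;
			a->gen = (a->gen + 1) & ((1u << SEQ_BITS) - 1);
		}
		if (!a->in_use[idx]) {
			a->in_use[idx] = true;
			return (int)((gen << a->idx_bits) | idx);
		}
	}
	return -1;
}

/* Releases the index encoded in the low idx_bits of the id. */
static void cyc_free_id(struct cyc_alloc *a, int id)
{
	unsigned int idx = (unsigned int)id & ((1u << a->idx_bits) - 1);

	assert(a->in_use[idx]);
	a->in_use[idx] = false;
}
```

In this sketch a freed index is not handed out again until the cursor
comes back around, and when it is, the id carries a different top-bits
value, so a recently freed id is not reused quickly. Note that a failed
allocation still walks the whole index space and therefore also bumps the
wrap counter; whether that is desirable is a design choice the proposal
does not address.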

Waiman Long Nov. 8, 2018, 9:29 p.m. UTC | #2
On 11/06/2018 08:20 AM, Matthew Wilcox wrote:
> On Mon, Nov 05, 2018 at 10:43:43AM -0500, Waiman Long wrote:
>> The maximum number of unique System V IPC identifiers was limited to
>> 32k.  That limit should be big enough for most use cases.
>>
>> However, there are some users out there requesting for more, especially
>> those that are migrating from Solaris which uses 24 bits for unique
>> identifiers. To satisfy the need of those users, a new boot time kernel
>> option "ipcmni_extend" is added to extend the IPCMNI value to 8M. This
>> is a 256X increase which hopefully is big enough for them.
> Why go to 23 bits when people are coming from systems with 24 bits?
> Let's just go to 24 bits.  This happens to fit well with the underlying
> data structure which uses 6 bits per layer of the tree.

Sure. I can move it up to 24 bits and leave 7 bits for the sequence number.

>
>> The use of this new option will change the pattern of the IPC identifiers
>> returned by functions like shmget(2). An application that depends on
>> such pattern may not work properly.  So it should only be used if the
>> users really need more than 32k of unique IPC numbers.
> Are there applications out there that rely on the internal structure of
> the IPC identifiers?!

That is a question that may not have a clear answer. Most applications
won't do that, but there are always some outliers that do crazy things.
So you never know for sure.

>
> How about scrapping all this and just doing the following:
>
> Allocate 24 bits of the ID cyclically.  Increment the top 7 bits of the
> ID every time the cursor wraps.  That's not going to give us a perfect
> progression from 0-2 billion, because it'll skip the ones in use.
> But it'll ensure the ID isn't reused particularly quickly unless the
> application is really using millions of IDs.

Eric Biederman had sent out a patch somewhat like that before. Again,
there is a slight chance that it may break existing applications. So the
question is whether we are willing to take the risk. I won't mind if
upstream decides to go this route.

Cheers,
Longman

Patch

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index b90fe3b..0449e0c 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1795,6 +1795,9 @@ 
 	ip=		[IP_PNP]
 			See Documentation/filesystems/nfs/nfsroot.txt.
 
+	ipcmni_extend	[KNL] Extend the maximum number of unique System V
+			IPC identifiers from 32,768 to 8,388,608.
+
 	irqaffinity=	[SMP] Set the default irq affinity mask
 			The argument is a cpu list, as described above.
 
diff --git a/ipc/ipc_sysctl.c b/ipc/ipc_sysctl.c
index 49f9bf4..73b7782 100644
--- a/ipc/ipc_sysctl.c
+++ b/ipc/ipc_sysctl.c
@@ -120,7 +120,8 @@  static int proc_ipc_sem_dointvec(struct ctl_table *table, int write,
 static int zero;
 static int one = 1;
 static int int_max = INT_MAX;
-static int ipc_mni = IPCMNI;
+int ipc_mni = IPCMNI;
+int ipc_mni_shift = IPCMNI_SHIFT;
 
 static struct ctl_table ipc_kern_table[] = {
 	{
@@ -246,3 +247,12 @@  static int __init ipc_sysctl_init(void)
 }
 
 device_initcall(ipc_sysctl_init);
+
+static int __init ipc_mni_extend(char *str)
+{
+	ipc_mni = IPCMNI_EXTEND;
+	ipc_mni_shift = IPCMNI_EXTEND_SHIFT;
+	pr_info("IPCMNI extended to %d.\n", ipc_mni);
+	return 0;
+}
+early_param("ipcmni_extend", ipc_mni_extend);
diff --git a/ipc/util.c b/ipc/util.c
index 0af0575..07ae117 100644
--- a/ipc/util.c
+++ b/ipc/util.c
@@ -110,7 +110,7 @@  static int __init ipc_init(void)
  * @ids: ipc identifier set
  *
  * Set up the sequence range to use for the ipc identifier range (limited
- * below IPCMNI) then initialise the keys hashtable and ids idr.
+ * below ipc_mni) then initialise the keys hashtable and ids idr.
  */
 void ipc_init_ids(struct ipc_ids *ids)
 {
@@ -226,7 +226,7 @@  static inline int ipc_idr_alloc(struct ipc_ids *ids, struct kern_ipc_perm *new)
 				0, GFP_NOWAIT);
 	}
 	if (idx >= 0)
-		new->id = SEQ_MULTIPLIER * new->seq + idx;
+		new->id = (new->seq << IPCMNI_SEQ_SHIFT) + idx;
 	return idx;
 }
 
@@ -254,8 +254,8 @@  int ipc_addid(struct ipc_ids *ids, struct kern_ipc_perm *new, int limit)
 	/* 1) Initialize the refcount so that ipc_rcu_putref works */
 	refcount_set(&new->refcount, 1);
 
-	if (limit > IPCMNI)
-		limit = IPCMNI;
+	if (limit > ipc_mni)
+		limit = ipc_mni;
 
 	if (ids->in_use >= limit)
 		return -ENOSPC;
@@ -738,7 +738,7 @@  static struct kern_ipc_perm *sysvipc_find_ipc(struct ipc_ids *ids, loff_t pos,
 	if (total >= ids->in_use)
 		return NULL;
 
-	for (; pos < IPCMNI; pos++) {
+	for (; pos < ipc_mni; pos++) {
 		ipc = idr_find(&ids->ipcs_idr, pos);
 		if (ipc != NULL) {
 			*new_pos = pos + 1;
diff --git a/ipc/util.h b/ipc/util.h
index d768fdb..640f916 100644
--- a/ipc/util.h
+++ b/ipc/util.h
@@ -15,8 +15,34 @@ 
 #include <linux/err.h>
 #include <linux/ipc_namespace.h>
 
-#define IPCMNI 32768  /* <= MAX_INT limit for ipc arrays (including sysctl changes) */
-#define SEQ_MULTIPLIER	(IPCMNI)
+/*
+ * The IPC ID contains 2 separate numbers - index and sequence number.
+ * By default,
+ *   bits  0-14: index (32k, 15 bits)
+ *   bits 15-30: sequence number (64k, 16 bits)
+ *
+ * When IPCMNI extension mode is turned on, the composition changes:
+ *   bits  0-22: index (8M, 23 bits)
+ *   bits 23-30: sequence number (256, 8 bits)
+ */
+#define IPCMNI_SHIFT		15
+#define IPCMNI_EXTEND_SHIFT	23
+#define IPCMNI			(1 << IPCMNI_SHIFT)
+#define IPCMNI_EXTEND		(1 << IPCMNI_EXTEND_SHIFT)
+
+#ifdef CONFIG_SYSVIPC_SYSCTL
+extern int ipc_mni;
+extern int ipc_mni_shift;
+
+#define IPCMNI_SEQ_SHIFT	ipc_mni_shift
+#define IPCMNI_IDX_MASK		((1 << ipc_mni_shift) - 1)
+
+#else /* CONFIG_SYSVIPC_SYSCTL */
+
+#define ipc_mni		IPCMNI
+#define IPCMNI_SEQ_SHIFT	IPCMNI_SHIFT
+#define IPCMNI_IDX_MASK		((1 << IPCMNI_SHIFT) - 1)
+#endif /* CONFIG_SYSVIPC_SYSCTL */
 
 void sem_init(void);
 void msg_init(void);
@@ -96,9 +122,9 @@  void __init ipc_init_proc_interface(const char *path, const char *header,
 #define IPC_MSG_IDS	1
 #define IPC_SHM_IDS	2
 
-#define ipcid_to_idx(id) ((id) % SEQ_MULTIPLIER)
-#define ipcid_to_seqx(id) ((id) / SEQ_MULTIPLIER)
-#define IPCID_SEQ_MAX min_t(int, INT_MAX/SEQ_MULTIPLIER, USHRT_MAX)
+#define ipcid_to_idx(id)  ((id) & IPCMNI_IDX_MASK)
+#define ipcid_to_seqx(id) ((id) >> IPCMNI_SEQ_SHIFT)
+#define IPCID_SEQ_MAX	  (INT_MAX >> IPCMNI_SEQ_SHIFT)
 
 /* must be called with ids->rwsem acquired for writing */
 int ipc_addid(struct ipc_ids *, struct kern_ipc_perm *, int);
@@ -123,8 +149,8 @@  static inline int ipc_get_maxidx(struct ipc_ids *ids)
 	if (ids->in_use == 0)
 		return -1;
 
-	if (ids->in_use == IPCMNI)
-		return IPCMNI - 1;
+	if (ids->in_use == ipc_mni)
+		return ipc_mni - 1;
 
 	return ids->max_idx;
 }
@@ -219,10 +245,10 @@  void free_ipcs(struct ipc_namespace *ns, struct ipc_ids *ids,
 
 static inline int sem_check_semmni(struct ipc_namespace *ns) {
 	/*
-	 * Check semmni range [0, IPCMNI]
+	 * Check semmni range [0, ipc_mni]
 	 * semmni is the last element of sem_ctls[4] array
 	 */
-	return ((ns->sem_ctls[3] < 0) || (ns->sem_ctls[3] > IPCMNI))
+	return ((ns->sem_ctls[3] < 0) || (ns->sem_ctls[3] > ipc_mni))
 		? -ERANGE : 0;
 }