ocfs2: fix a panic problem caused by o2cb_ctl
diff mbox series

Message ID 00174f72-6e5f-e049-2144-d0bb177e24a5@huawei.com
State New
Headers show
Series
  • ocfs2: fix a panic problem caused by o2cb_ctl
Related show

Commit Message

Jia Guo Jan. 25, 2019, 6:05 a.m. UTC
In the process of creating a node, it will cause NULL pointer reference
in kernel if o2cb_ctl failed in the interval
(mkdir, o2cb_set_node_attribute(node_num)] in function o2cb_add_node.

The node num is initialized to 0 in function o2nm_node_group_make_item,
o2nm_node_group_drop_item will mistake the node number 0 for a
valid node number when we delete the node before the node number is set
correctly. If the local node number of the current host happens to be 0,
cluster->cl_local_node will be set to O2NM_INVALID_NODE_NUM while
o2hb_thread still running. The panic stack is generated as follows:

o2hb_thread
    \-o2hb_do_disk_heartbeat
        \-o2hb_check_own_slot
            |-slot = &reg->hr_slots[o2nm_this_node()];
            //o2nm_this_node() return O2NM_INVALID_NODE_NUM

We need to check whether the node number is set when we delete the node.

Signed-off-by: Jia Guo <guojia12@huawei.com>
---
 fs/ocfs2/cluster/nodemanager.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

Comments

Joseph Qi Jan. 25, 2019, 12:30 p.m. UTC | #1
On 19/1/25 14:05, Jia Guo wrote:
> In the process of creating a node, it will cause NULL pointer reference

NULL pointer dereference

> in kernel if o2cb_ctl failed in the interval
> (mkdir, o2cb_set_node_attribute(node_num)] in function o2cb_add_node.
> 
> The node num is initialized to 0 in function o2nm_node_group_make_item,
> o2nm_node_group_drop_item will mistake the node number 0 for a
> valid node number when we delete the node before the node number is set
> correctly. If the local node number of the current host happens to be 0,
> cluster->cl_local_node will be set to O2NM_INVALID_NODE_NUM while
> o2hb_thread still running. The panic stack is generated as follows:
> 
> o2hb_thread
>     \-o2hb_do_disk_heartbeat
>         \-o2hb_check_own_slot
>             |-slot = &reg->hr_slots[o2nm_this_node()];
>             //o2nm_this_node() return O2NM_INVALID_NODE_NUM
> 
> We need to check whether the node number is set when we delete the node.
> 
> Signed-off-by: Jia Guo <guojia12@huawei.com>
> ---
>  fs/ocfs2/cluster/nodemanager.c | 13 +++++++++----
>  1 file changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/ocfs2/cluster/nodemanager.c b/fs/ocfs2/cluster/nodemanager.c
> index 0e4166c..992ccc0 100644
> --- a/fs/ocfs2/cluster/nodemanager.c
> +++ b/fs/ocfs2/cluster/nodemanager.c
> @@ -620,10 +620,16 @@ static void o2nm_node_group_drop_item(struct config_group *group,
>  {
>  	struct o2nm_node *node = to_o2nm_node(item);
>  	struct o2nm_cluster *cluster = to_o2nm_cluster(group->cg_item.ci_parent);
> +	int nd_num_set = 0;
> 
> -	o2net_disconnect_node(node);
> +	/* nd_num might be 0 if the node number hasn't been set.. */
> +	if (cluster->cl_nodes[node->nd_num] == node) {
> +		nd_num_set = 1;
> +
> +	if (nd_num_set)
> +		o2net_disconnect_node(node);
> 
> -	if (cluster->cl_has_local &&
> +	if (nd_num_set && cluster->cl_has_local &&
>  	    (cluster->cl_local_node == node->nd_num)) {
>  		cluster->cl_has_local = 0;
>  		cluster->cl_local_node = O2NM_INVALID_NODE_NUM;
> @@ -638,8 +644,7 @@ static void o2nm_node_group_drop_item(struct config_group *group,
>  	if (node->nd_ipv4_address)
>  		rb_erase(&node->nd_ip_node, &cluster->cl_node_ip_tree);

Why we have to this if node number hasn't been set?
If this is also under the condition, we may rewrite this patch like:
if (nd_num_set) {
	do_something();
}

Thanks,
Joseph

> 
> -	/* nd_num might be 0 if the node number hasn't been set.. */
> -	if (cluster->cl_nodes[node->nd_num] == node) {
> +	if (nd_num_set) {
>  		cluster->cl_nodes[node->nd_num] = NULL;
>  		clear_bit(node->nd_num, cluster->cl_nodes_bitmap);
>  	}
>
Jia Guo Jan. 26, 2019, 1:49 a.m. UTC | #2
On 2019/1/25 20:30, Joseph Qi wrote:
> 
> 
> On 19/1/25 14:05, Jia Guo wrote:
>> In the process of creating a node, it will cause NULL pointer reference
> 
> NULL pointer dereference
Fix it in patch v2.
> 
>> in kernel if o2cb_ctl failed in the interval
>> (mkdir, o2cb_set_node_attribute(node_num)] in function o2cb_add_node.
>>
>> The node num is initialized to 0 in function o2nm_node_group_make_item,
>> o2nm_node_group_drop_item will mistake the node number 0 for a
>> valid node number when we delete the node before the node number is set
>> correctly. If the local node number of the current host happens to be 0,
>> cluster->cl_local_node will be set to O2NM_INVALID_NODE_NUM while
>> o2hb_thread still running. The panic stack is generated as follows:
>>
>> o2hb_thread
>>     \-o2hb_do_disk_heartbeat
>>         \-o2hb_check_own_slot
>>             |-slot = &reg->hr_slots[o2nm_this_node()];
>>             //o2nm_this_node() return O2NM_INVALID_NODE_NUM
>>
>> We need to check whether the node number is set when we delete the node.
>>
>> Signed-off-by: Jia Guo <guojia12@huawei.com>
>> ---
>>  fs/ocfs2/cluster/nodemanager.c | 13 +++++++++----
>>  1 file changed, 9 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/ocfs2/cluster/nodemanager.c b/fs/ocfs2/cluster/nodemanager.c
>> index 0e4166c..992ccc0 100644
>> --- a/fs/ocfs2/cluster/nodemanager.c
>> +++ b/fs/ocfs2/cluster/nodemanager.c
>> @@ -620,10 +620,16 @@ static void o2nm_node_group_drop_item(struct config_group *group,
>>  {
>>  	struct o2nm_node *node = to_o2nm_node(item);
>>  	struct o2nm_cluster *cluster = to_o2nm_cluster(group->cg_item.ci_parent);
>> +	int nd_num_set = 0;
>>
>> -	o2net_disconnect_node(node);
>> +	/* nd_num might be 0 if the node number hasn't been set.. */
>> +	if (cluster->cl_nodes[node->nd_num] == node) {
>> +		nd_num_set = 1;
>> +
>> +	if (nd_num_set)
>> +		o2net_disconnect_node(node);
>>
>> -	if (cluster->cl_has_local &&
>> +	if (nd_num_set && cluster->cl_has_local &&
>>  	    (cluster->cl_local_node == node->nd_num)) {
>>  		cluster->cl_has_local = 0;
>>  		cluster->cl_local_node = O2NM_INVALID_NODE_NUM;
>> @@ -638,8 +644,7 @@ static void o2nm_node_group_drop_item(struct config_group *group,
>>  	if (node->nd_ipv4_address)
>>  		rb_erase(&node->nd_ip_node, &cluster->cl_node_ip_tree);
> 
> Why we have to this if node number hasn't been set?
> If this is also under the condition, we may rewrite this patch like:
> if (nd_num_set) {
> 	do_something();
> }
o2cb_ctl set attribute in the following order:
1、mkdir, create configfs item
2、set IP port attribute
3、set IP address attribute
4、set node number attribute
IP address is set before the node number, we need to do erase whether the node number
is set successfully or not.

This patch can still rewrite as you suggest, fix it in patch v2.

Thanks,
Jia
> 
> Thanks,
> Joseph
> 
>>
>> -	/* nd_num might be 0 if the node number hasn't been set.. */
>> -	if (cluster->cl_nodes[node->nd_num] == node) {
>> +	if (nd_num_set) {
>>  		cluster->cl_nodes[node->nd_num] = NULL;
>>  		clear_bit(node->nd_num, cluster->cl_nodes_bitmap);
>>  	}
>>
> .
>

Patch
diff mbox series

diff --git a/fs/ocfs2/cluster/nodemanager.c b/fs/ocfs2/cluster/nodemanager.c
index 0e4166c..992ccc0 100644
--- a/fs/ocfs2/cluster/nodemanager.c
+++ b/fs/ocfs2/cluster/nodemanager.c
@@ -620,10 +620,16 @@  static void o2nm_node_group_drop_item(struct config_group *group,
 {
 	struct o2nm_node *node = to_o2nm_node(item);
 	struct o2nm_cluster *cluster = to_o2nm_cluster(group->cg_item.ci_parent);
+	int nd_num_set = 0;

-	o2net_disconnect_node(node);
+	/* nd_num might be 0 if the node number hasn't been set.. */
+	if (cluster->cl_nodes[node->nd_num] == node) {
+		nd_num_set = 1;
+
+	if (nd_num_set)
+		o2net_disconnect_node(node);

-	if (cluster->cl_has_local &&
+	if (nd_num_set && cluster->cl_has_local &&
 	    (cluster->cl_local_node == node->nd_num)) {
 		cluster->cl_has_local = 0;
 		cluster->cl_local_node = O2NM_INVALID_NODE_NUM;
@@ -638,8 +644,7 @@  static void o2nm_node_group_drop_item(struct config_group *group,
 	if (node->nd_ipv4_address)
 		rb_erase(&node->nd_ip_node, &cluster->cl_node_ip_tree);

-	/* nd_num might be 0 if the node number hasn't been set.. */
-	if (cluster->cl_nodes[node->nd_num] == node) {
+	if (nd_num_set) {
 		cluster->cl_nodes[node->nd_num] = NULL;
 		clear_bit(node->nd_num, cluster->cl_nodes_bitmap);
 	}