[09/15] ocfs2: add functions to add and remove inode in orphan dir
diff mbox

Message ID 548f65e2.GaTIVhOoT5YiOhC8%akpm@linux-foundation.org
State New, archived
Headers show

Commit Message

Andrew Morton Dec. 15, 2014, 10:51 p.m. UTC
From: Weiwei Wang <wangww631@huawei.com>
Subject: ocfs2: add functions to add and remove inode in orphan dir

Add functions to add inode to orphan dir and remove inode in orphan dir. 
Here we do not call ocfs2_prepare_orphan_dir and ocfs2_orphan_add
directly.  Because append O_DIRECT will add inode to orphan two and may
result in more than one orphan entry for the same inode.

[akpm@linux-foundation.org: fix warning]
Signed-off-by: Weiwei Wang <wangww631@huawei.com>
Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/ocfs2/journal.h     |    5 
 fs/ocfs2/namei.c       |  315 +++++++++++++++++++++++++++++++++++++++
 fs/ocfs2/namei.h       |    5 
 fs/ocfs2/ocfs2_fs.h    |    5 
 fs/ocfs2/ocfs2_trace.h |    3 
 5 files changed, 332 insertions(+), 1 deletion(-)

Comments

Mark Fasheh Dec. 18, 2014, 11:18 p.m. UTC | #1
Thanks once again for all of this by the way.

On Mon, Dec 15, 2014 at 02:51:14PM -0800, Andrew Morton wrote:
> From: Weiwei Wang <wangww631@huawei.com>
> Subject: ocfs2: add functions to add and remove inode in orphan dir
> 
> Add functions to add inode to orphan dir and remove inode in orphan dir. 
> Here we do not call ocfs2_prepare_orphan_dir and ocfs2_orphan_add
> directly.  Because append O_DIRECT will add inode to orphan two and may
> result in more than one orphan entry for the same inode.

Just so I understand -  your problem is that ocfs2_prepare_orphan_dir() will
EEXIST when it finds the inode already there.

The solution you have below has a couple problems, stemming from the fact
that it uses the same name for an inode that is orphaned because of
nlink == 0 as that of one that is orphaned because it is undergoing direct
IO.

Having identical names in a directory is always considered an error, even
for system directories like the orphan dir. In my honest opinion we should
not do that here either.

Instead you could use a prefix for those entries - something like "dio-".

If you modify ocfs2_prepare_orphan_dir() and __ocfs2_prepare_orphan_dir()
to pass down an optional prefix which is prepended to the name you don't need a
special dio prepare function. Then we don't need ocfs2_dio_prepare_orphan_dir()
any more.

In fact, the EEXIST check now becomes valid again and you don't need to
do all that extra work in ocfs2_dio_orphan_add() to figure out whether the
name is there because of nlink == 0 or we're doing dio (or both!).

Actually, it really becomes trivial to special case dio in
ocfs2_orphan_add() so I would suggest just adding a boolean argument and
doing the bits for dio there, thus allowing us to drop
ocfs2_dio_orphan_add() too.


I hope this all helps.
	--Mark

--
Mark Fasheh
Joseph Qi Dec. 19, 2014, 3:37 a.m. UTC | #2
Hi Mark,
I much appreciate your pertinent comments on this feature.

The reason we use orphan dir is to make sure allocating blocks and
updating inode size consistent, and this will result an entry in orphan
is not truly deleted one.
To distinguish dio from unlink, we introduce a new dinode flag
OCFS2_DIO_ORPHANED_FL.
So after this change, once unlink and dio happen concurrently, there
may be two kinds of results:
1. On the same node, only one entry will be added with flag
OCFS2_ORPHANED_FL, and flag OCFS2_DIO_ORPHANED_FL will be set and clear
during dio.
2. On different nodes, two entries will be added with each flag
OCFS2_ORPHANED_FL or OCFS2_DIO_ORPHANED_FL.

Since entry name is stringified from blkno and the length is fixed
OCFS2_ORPHAN_NAMELEN, so adding a prefix as you suggested may affect
too much.

On 2014/12/19 7:18, Mark Fasheh wrote:
> Thanks once again for all of this by the way.
> 
> On Mon, Dec 15, 2014 at 02:51:14PM -0800, Andrew Morton wrote:
>> From: Weiwei Wang <wangww631@huawei.com>
>> Subject: ocfs2: add functions to add and remove inode in orphan dir
>>
>> Add functions to add inode to orphan dir and remove inode in orphan dir. 
>> Here we do not call ocfs2_prepare_orphan_dir and ocfs2_orphan_add
>> directly.  Because append O_DIRECT will add inode to orphan two and may
>> result in more than one orphan entry for the same inode.
> 
> Just so I understand -  your problem is that ocfs2_prepare_orphan_dir() will
> EEXIST when it finds the inode already there.
> 
> The solution you have below has a couple problems, stemming from the fact
> that it uses the same name for an inode that is orphaned because of
> nlink == 0 as that of one that is orphaned because it is undergoing direct
> IO.
> 
> Having identical names in a directory is always considered an error, even
> for system directories like the orphan dir. In my honest opinion we should
> not do that here either.
> 
> Instead you could use a prefix for those entries - something like "dio-".
> 
> If you modify ocfs2_prepare_orphan_dir() and __ocfs2_prepare_orphan_dir()
> to pass down an optional prefix which is prepended to the name you don't need a
> special dio prepare function. Then we don't need ocfs2_dio_prepare_orphan_dir()
> any more.
> 
> In fact, the EEXIST check now becomes valid again and you don't need to
> do all that extra work in ocfs2_dio_orphan_add() to figure out whether the
> name is there because of nlink == 0 or we're doing dio (or both!).
> 
> Actually, it really becomes trivial to special case dio in
> ocfs2_orphan_add() so I would suggest just adding a boolean argument and
> doing the bits for dio there, thus allowing us to drop
> ocfs2_dio_orphan_add() too.
> 
> 
> I hope this all helps.
> 	--Mark
> 
> --
> Mark Fasheh
> 
> .
>
Mark Fasheh Dec. 19, 2014, 8:33 p.m. UTC | #3
On Fri, Dec 19, 2014 at 11:37:56AM +0800, Joseph Qi wrote:
> Hi Mark,
> I much appreciate your pertinent comments on this feature.
> 
> The reason we use orphan dir is to make sure allocating blocks and
> updating inode size consistent, and this will result an entry in orphan
> is not truly deleted one.

Right, I got that from the patch. Good idea by the way.


> To distinguish dio from unlink, we introduce a new dinode flag
> OCFS2_DIO_ORPHANED_FL.

Ok, that's also good.


> So after this change, once unlink and dio happen concurrently, there
> may be two kinds of results:
> 1. On the same node, only one entry will be added with flag
> OCFS2_ORPHANED_FL, and flag OCFS2_DIO_ORPHANED_FL will be set and clear
> during dio.
> 2. On different nodes, two entries will be added with each flag
> OCFS2_ORPHANED_FL or OCFS2_DIO_ORPHANED_FL.
> 
> Since entry name is stringified from blkno and the length is fixed
> OCFS2_ORPHAN_NAMELEN, so adding a prefix as you suggested may affect
> too much.

No, adding the prefix is much more preferable to intentionally corrupting
the conents of a directory. That OCFS2_ORPHAN_NAMELEN is a #define does not
matter, you are free to make it whatever reasonable value fits the name. The
directory code naturally handles this, whtether the length is
OCFS2_ORPHAN_NAMELEN or OCFS2_ORPHAN_NAMELEN + PREFIX_NAMELEN.

Also if you were to run fsck against the file system it will (should) complain
about having duplciate entries in the orphan dir.

Aside from the corruption issue, there are usability issues too. Someone
running "ls //orphan_dir:0000" will not be able to tell whether a name is
there for dio or nlink==0. If you prefix it, then debugging becomes much
easier.


Btw, did you see my note about a readonly superblock flag for this? We're
going to need to handle backward compatibility too.
	--Mark

--
Mark Fasheh
Joseph Qi Dec. 22, 2014, 1:04 a.m. UTC | #4
On 2014/12/20 4:33, Mark Fasheh wrote:
> On Fri, Dec 19, 2014 at 11:37:56AM +0800, Joseph Qi wrote:
>> Hi Mark,
>> I much appreciate your pertinent comments on this feature.
>>
>> The reason we use orphan dir is to make sure allocating blocks and
>> updating inode size consistent, and this will result an entry in orphan
>> is not truly deleted one.
> 
> Right, I got that from the patch. Good idea by the way.
> 
> 
>> To distinguish dio from unlink, we introduce a new dinode flag
>> OCFS2_DIO_ORPHANED_FL.
> 
> Ok, that's also good.
> 
> 
>> So after this change, once unlink and dio happen concurrently, there
>> may be two kinds of results:
>> 1. On the same node, only one entry will be added with flag
>> OCFS2_ORPHANED_FL, and flag OCFS2_DIO_ORPHANED_FL will be set and clear
>> during dio.
>> 2. On different nodes, two entries will be added with each flag
>> OCFS2_ORPHANED_FL or OCFS2_DIO_ORPHANED_FL.
>>
>> Since entry name is stringified from blkno and the length is fixed
>> OCFS2_ORPHAN_NAMELEN, so adding a prefix as you suggested may affect
>> too much.
> 
> No, adding the prefix is much more preferable to intentionally corrupting
> the conents of a directory. That OCFS2_ORPHAN_NAMELEN is a #define does not
> matter, you are free to make it whatever reasonable value fits the name. The
> directory code naturally handles this, whtether the length is
> OCFS2_ORPHAN_NAMELEN or OCFS2_ORPHAN_NAMELEN + PREFIX_NAMELEN.
> 
> Also if you were to run fsck against the file system it will (should) complain
> about having duplciate entries in the orphan dir.
> 
> Aside from the corruption issue, there are usability issues too. Someone
> running "ls //orphan_dir:0000" will not be able to tell whether a name is
> there for dio or nlink==0. If you prefix it, then debugging becomes much
> easier.
> 
Okay, I will try to take your advice and make a change.

> 
> Btw, did you see my note about a readonly superblock flag for this? We're
> going to need to handle backward compatibility too.
Yes. At my first glance, it may need a lot change to support feature ro compat
at my first glance. I need to investigate on it before do the actual change.
Thanks.

> 	--Mark
> 
> --
> Mark Fasheh
> 
> .
>
Mark Fasheh Dec. 22, 2014, 7:06 a.m. UTC | #5
On Mon, Dec 22, 2014 at 09:04:25AM +0800, Joseph Qi wrote:
> >> So after this change, once unlink and dio happen concurrently, there
> >> may be two kinds of results:
> >> 1. On the same node, only one entry will be added with flag
> >> OCFS2_ORPHANED_FL, and flag OCFS2_DIO_ORPHANED_FL will be set and clear
> >> during dio.
> >> 2. On different nodes, two entries will be added with each flag
> >> OCFS2_ORPHANED_FL or OCFS2_DIO_ORPHANED_FL.
> >>
> >> Since entry name is stringified from blkno and the length is fixed
> >> OCFS2_ORPHAN_NAMELEN, so adding a prefix as you suggested may affect
> >> too much.
> > 
> > No, adding the prefix is much more preferable to intentionally corrupting
> > the conents of a directory. That OCFS2_ORPHAN_NAMELEN is a #define does not
> > matter, you are free to make it whatever reasonable value fits the name. The
> > directory code naturally handles this, whtether the length is
> > OCFS2_ORPHAN_NAMELEN or OCFS2_ORPHAN_NAMELEN + PREFIX_NAMELEN.
> > 
> > Also if you were to run fsck against the file system it will (should) complain
> > about having duplciate entries in the orphan dir.
> > 
> > Aside from the corruption issue, there are usability issues too. Someone
> > running "ls //orphan_dir:0000" will not be able to tell whether a name is
> > there for dio or nlink==0. If you prefix it, then debugging becomes much
> > easier.
> > 
> Okay, I will try to take your advice and make a change.

Great, thank you.


> > Btw, did you see my note about a readonly superblock flag for this? We're
> > going to need to handle backward compatibility too.
> Yes. At my first glance, it may need a lot change to support feature ro compat
> at my first glance. I need to investigate on it before do the actual change.
> Thanks.

Kernel should be straight forward - if the flag isn't there just use the
existing fallback. Otherwise we can use the new direct_IO code.

>From memory, in ocfs2-tools you'll have to update mkfs to write the flag,
tunefs to turn it on/off for an existing filesystem and fsck to truncate the
direct-io orphans it finds.
	--Mark

--
Mark Fasheh
Joseph Qi Jan. 13, 2015, 8:36 a.m. UTC | #6
Hi Mark,

On 2014/12/22 15:06, Mark Fasheh wrote:
> On Mon, Dec 22, 2014 at 09:04:25AM +0800, Joseph Qi wrote:
>>>> So after this change, once unlink and dio happen concurrently, there
>>>> may be two kinds of results:
>>>> 1. On the same node, only one entry will be added with flag
>>>> OCFS2_ORPHANED_FL, and flag OCFS2_DIO_ORPHANED_FL will be set and clear
>>>> during dio.
>>>> 2. On different nodes, two entries will be added with each flag
>>>> OCFS2_ORPHANED_FL or OCFS2_DIO_ORPHANED_FL.
>>>>
>>>> Since entry name is stringified from blkno and the length is fixed
>>>> OCFS2_ORPHAN_NAMELEN, so adding a prefix as you suggested may affect
>>>> too much.
>>>
>>> No, adding the prefix is much more preferable to intentionally corrupting
>>> the conents of a directory. That OCFS2_ORPHAN_NAMELEN is a #define does not
>>> matter, you are free to make it whatever reasonable value fits the name. The
>>> directory code naturally handles this, whtether the length is
>>> OCFS2_ORPHAN_NAMELEN or OCFS2_ORPHAN_NAMELEN + PREFIX_NAMELEN.
>>>
>>> Also if you were to run fsck against the file system it will (should) complain
>>> about having duplciate entries in the orphan dir.
>>>
>>> Aside from the corruption issue, there are usability issues too. Someone
>>> running "ls //orphan_dir:0000" will not be able to tell whether a name is
>>> there for dio or nlink==0. If you prefix it, then debugging becomes much
>>> easier.
>>>
>> Okay, I will try to take your advice and make a change.
> 
> Great, thank you.
> 
> 
>>> Btw, did you see my note about a readonly superblock flag for this? We're
>>> going to need to handle backward compatibility too.
>> Yes. At my first glance, it may need a lot change to support feature ro compat
>> at my first glance. I need to investigate on it before do the actual change.
>> Thanks.
> 
> Kernel should be straight forward - if the flag isn't there just use the
> existing fallback. Otherwise we can use the new direct_IO code.
> 
>>From memory, in ocfs2-tools you'll have to update mkfs to write the flag,
> tunefs to turn it on/off for an existing filesystem and fsck to truncate the
> direct-io orphans it finds.
> 	--Mark
> 
For the comments about setting it to a ro compat feature, I still have
one more questions.
If I check this feature flag in ocfs2_direct_IO, that means the existing
formatted ocfs2 volume won't take this feature except we explicitly turn
it on using tunefs.
But I don't think there is any problem if feature is default on.
Do you mean we have to treat this feature as optional and let user
choose?

> --
> Mark Fasheh
> 
> .
>
Mark Fasheh Jan. 16, 2015, 10:56 p.m. UTC | #7
On Tue, Jan 13, 2015 at 04:36:14PM +0800, Joseph Qi wrote:
> Hi Mark,
> 
> On 2014/12/22 15:06, Mark Fasheh wrote:
> > On Mon, Dec 22, 2014 at 09:04:25AM +0800, Joseph Qi wrote:
> >>>> So after this change, once unlink and dio happen concurrently, there
> >>>> may be two kinds of results:
> >>>> 1. On the same node, only one entry will be added with flag
> >>>> OCFS2_ORPHANED_FL, and flag OCFS2_DIO_ORPHANED_FL will be set and clear
> >>>> during dio.
> >>>> 2. On different nodes, two entries will be added with each flag
> >>>> OCFS2_ORPHANED_FL or OCFS2_DIO_ORPHANED_FL.
> >>>>
> >>>> Since entry name is stringified from blkno and the length is fixed
> >>>> OCFS2_ORPHAN_NAMELEN, so adding a prefix as you suggested may affect
> >>>> too much.
> >>>
> >>> No, adding the prefix is much more preferable to intentionally corrupting
> >>> the conents of a directory. That OCFS2_ORPHAN_NAMELEN is a #define does not
> >>> matter, you are free to make it whatever reasonable value fits the name. The
> >>> directory code naturally handles this, whtether the length is
> >>> OCFS2_ORPHAN_NAMELEN or OCFS2_ORPHAN_NAMELEN + PREFIX_NAMELEN.
> >>>
> >>> Also if you were to run fsck against the file system it will (should) complain
> >>> about having duplciate entries in the orphan dir.
> >>>
> >>> Aside from the corruption issue, there are usability issues too. Someone
> >>> running "ls //orphan_dir:0000" will not be able to tell whether a name is
> >>> there for dio or nlink==0. If you prefix it, then debugging becomes much
> >>> easier.
> >>>
> >> Okay, I will try to take your advice and make a change.
> > 
> > Great, thank you.
> > 
> > 
> >>> Btw, did you see my note about a readonly superblock flag for this? We're
> >>> going to need to handle backward compatibility too.
> >> Yes. At my first glance, it may need a lot change to support feature ro compat
> >> at my first glance. I need to investigate on it before do the actual change.
> >> Thanks.
> > 
> > Kernel should be straight forward - if the flag isn't there just use the
> > existing fallback. Otherwise we can use the new direct_IO code.
> > 
> >>From memory, in ocfs2-tools you'll have to update mkfs to write the flag,
> > tunefs to turn it on/off for an existing filesystem and fsck to truncate the
> > direct-io orphans it finds.
> > 	--Mark
> > 
> For the comments about setting it to a ro compat feature, I still have
> one more questions.
> If I check this feature flag in ocfs2_direct_IO, that means the existing
> formatted ocfs2 volume won't take this feature except we explicitly turn
> it on using tunefs.

Correct.


> But I don't think there is any problem if feature is default on.
> Do you mean we have to treat this feature as optional and let user
> choose?

Yes it should be optional, though we can always have mkfs.ocfs2 turn it on by default.

If the user uses this, and the node crashes with inodes in the orphan dir
they might be deleted by another node with an ealier version of the fs code.
We can not allow that and therefore have to guard with an incompat bit.
	--Mark

--
Mark Fasheh

Patch
diff mbox

diff -puN fs/ocfs2/journal.h~ocfs2-add-functions-to-add-and-remove-inode-in-orphan-dir fs/ocfs2/journal.h
--- a/fs/ocfs2/journal.h~ocfs2-add-functions-to-add-and-remove-inode-in-orphan-dir
+++ a/fs/ocfs2/journal.h
@@ -472,6 +472,11 @@  static inline int ocfs2_unlink_credits(s
  * orphan dir index leaf */
 #define OCFS2_DELETE_INODE_CREDITS (3 * OCFS2_INODE_UPDATE_CREDITS + 4)
 
+/* dinode + orphan dir dinode + extent tree leaf block + orphan dir entry +
+ * orphan dir index root + orphan dir index leaf */
+#define OCFS2_INODE_ADD_TO_ORPHAN_CREDITS  (2 * OCFS2_INODE_UPDATE_CREDITS + 4)
+#define OCFS2_INODE_DEL_FROM_ORPHAN_CREDITS  OCFS2_INODE_ADD_TO_ORPHAN_CREDITS
+
 /* dinode update, old dir dinode update, new dir dinode update, old
  * dir dir entry, new dir dir entry, dir entry update for renaming
  * directory + target unlink + 3 x dir index leaves */
diff -puN fs/ocfs2/namei.c~ocfs2-add-functions-to-add-and-remove-inode-in-orphan-dir fs/ocfs2/namei.c
--- a/fs/ocfs2/namei.c~ocfs2-add-functions-to-add-and-remove-inode-in-orphan-dir
+++ a/fs/ocfs2/namei.c
@@ -81,6 +81,12 @@  static int ocfs2_prepare_orphan_dir(stru
 				    char *name,
 				    struct ocfs2_dir_lookup_result *lookup);
 
+static int ocfs2_dio_prepare_orphan_dir(struct ocfs2_super *osb,
+		struct inode **ret_orphan_dir,
+		u64 blkno,
+		char *name,
+		struct ocfs2_dir_lookup_result *lookup);
+
 static int ocfs2_orphan_add(struct ocfs2_super *osb,
 			    handle_t *handle,
 			    struct inode *inode,
@@ -89,6 +95,15 @@  static int ocfs2_orphan_add(struct ocfs2
 			    struct ocfs2_dir_lookup_result *lookup,
 			    struct inode *orphan_dir_inode);
 
+static int ocfs2_dio_orphan_add(struct ocfs2_super *osb,
+		handle_t *handle,
+		struct inode *inode,
+		struct buffer_head *fe_bh,
+		char *name,
+		struct ocfs2_dir_lookup_result *lookup,
+		struct inode *orphan_dir_inode,
+		bool orphaned);
+
 static int ocfs2_create_symlink_data(struct ocfs2_super *osb,
 				     handle_t *handle,
 				     struct inode *inode,
@@ -2137,6 +2152,51 @@  out:
 	return ret;
 }
 
+/**
+ * Copy from ocfs2_prepare_orphan_dir(). The difference:
+ * It will still lock orphan dir if entry exists.
+ * Caller must take care of -EEXIST and responsible for unlock.
+*/
+static int ocfs2_dio_prepare_orphan_dir(struct ocfs2_super *osb,
+				    struct inode **ret_orphan_dir,
+				    u64 blkno,
+				    char *name,
+				    struct ocfs2_dir_lookup_result *lookup)
+{
+	struct inode *orphan_dir_inode = NULL;
+	struct buffer_head *orphan_dir_bh = NULL;
+	int ret = 0;
+
+	ret = ocfs2_lookup_lock_orphan_dir(osb, &orphan_dir_inode,
+					   &orphan_dir_bh);
+	if (ret < 0) {
+		mlog_errno(ret);
+		return ret;
+	}
+
+	ret = __ocfs2_prepare_orphan_dir(orphan_dir_inode, orphan_dir_bh,
+					 blkno, name, lookup);
+	if (ret < 0 && ret != -EEXIST) {
+		mlog_errno(ret);
+		goto out;
+	}
+
+	*ret_orphan_dir = orphan_dir_inode;
+
+out:
+	brelse(orphan_dir_bh);
+
+	if (ret && ret != -EEXIST) {
+		ocfs2_inode_unlock(orphan_dir_inode, 1);
+		mutex_unlock(&orphan_dir_inode->i_mutex);
+		iput(orphan_dir_inode);
+	}
+
+	if (ret && ret != -EEXIST)
+		mlog_errno(ret);
+	return ret;
+}
+
 static int ocfs2_orphan_add(struct ocfs2_super *osb,
 			    handle_t *handle,
 			    struct inode *inode,
@@ -2226,6 +2286,100 @@  leave:
 	return status;
 }
 
+/**
+ * Copy from ocfs2_orphan_add, the difference:
+ * 1. Do not add entry if already added.
+ * 2. Update di flags OCFS2_DIO_ORPHANED_FL and record the
+ * orphan slot.
+*/
+static int ocfs2_dio_orphan_add(struct ocfs2_super *osb,
+			    handle_t *handle,
+			    struct inode *inode,
+			    struct buffer_head *fe_bh,
+			    char *name,
+			    struct ocfs2_dir_lookup_result *lookup,
+			    struct inode *orphan_dir_inode,
+			    bool orphaned)
+{
+	struct buffer_head *orphan_dir_bh = NULL;
+	int status = 0;
+	struct ocfs2_dinode *orphan_fe;
+	struct ocfs2_dinode *fe = (struct ocfs2_dinode *) fe_bh->b_data;
+
+	trace_ocfs2_dio_orphan_add_begin(
+				(unsigned long long)OCFS2_I(inode)->ip_blkno);
+
+	status = ocfs2_read_inode_block(orphan_dir_inode, &orphan_dir_bh);
+	if (status < 0) {
+		mlog_errno(status);
+		goto leave;
+	}
+
+	status = ocfs2_journal_access_di(handle,
+					 INODE_CACHE(orphan_dir_inode),
+					 orphan_dir_bh,
+					 OCFS2_JOURNAL_ACCESS_WRITE);
+	if (status < 0) {
+		mlog_errno(status);
+		goto leave;
+	}
+
+	/*
+	 * We're going to journal the change of i_flags and i_dio_orphaned_slot.
+	 * It's safe anyway, though some callers may duplicate the journaling.
+	 * Journaling within the func just make the logic look more
+	 * straightforward.
+	 */
+	status = ocfs2_journal_access_di(handle,
+					 INODE_CACHE(inode),
+					 fe_bh,
+					 OCFS2_JOURNAL_ACCESS_WRITE);
+	if (status < 0) {
+		mlog_errno(status);
+		goto leave;
+	}
+
+	/* we're a cluster, and nlink can change on disk from
+	 * underneath us... */
+	orphan_fe = (struct ocfs2_dinode *) orphan_dir_bh->b_data;
+	if (S_ISDIR(inode->i_mode))
+		ocfs2_add_links_count(orphan_fe, 1);
+	set_nlink(orphan_dir_inode, ocfs2_read_links_count(orphan_fe));
+	ocfs2_journal_dirty(handle, orphan_dir_bh);
+
+	/* It may already be orphaned by ocfs2_unlink/ocfs2_rename */
+	if (!orphaned) {
+		status = __ocfs2_add_entry(handle, orphan_dir_inode, name,
+				   OCFS2_ORPHAN_NAMELEN, inode,
+				   OCFS2_I(inode)->ip_blkno,
+				   orphan_dir_bh, lookup);
+		if (status < 0) {
+			mlog_errno(status);
+			goto rollback;
+		}
+	}
+
+	/* Update flag OCFS2_DIO_ORPHANED_FL and record the orphan slot */
+	fe->i_flags |= cpu_to_le32(OCFS2_DIO_ORPHANED_FL);
+	fe->i_dio_orphaned_slot = cpu_to_le16(osb->slot_num);
+
+	ocfs2_journal_dirty(handle, fe_bh);
+
+	trace_ocfs2_dio_orphan_add_end((unsigned long long)OCFS2_I(inode)->ip_blkno,
+				   osb->slot_num);
+
+rollback:
+	if (status < 0) {
+		if (S_ISDIR(inode->i_mode))
+			ocfs2_add_links_count(orphan_fe, -1);
+		set_nlink(orphan_dir_inode, ocfs2_read_links_count(orphan_fe));
+	}
+
+leave:
+	brelse(orphan_dir_bh);
+
+	return status;
+}
 /* unlike orphan_add, we expect the orphan dir to already be locked here. */
 int ocfs2_orphan_del(struct ocfs2_super *osb,
 		     handle_t *handle,
@@ -2500,6 +2654,167 @@  leave:
 	return status;
 }
 
+int ocfs2_add_inode_to_orphan(struct ocfs2_super *osb,
+	struct inode *inode)
+{
+	char orphan_name[OCFS2_ORPHAN_NAMELEN + 1];
+	struct inode *orphan_dir_inode = NULL;
+	struct ocfs2_dir_lookup_result orphan_insert = { NULL, };
+	struct buffer_head *di_bh = NULL;
+	int status = 0;
+	handle_t *handle = NULL;
+	struct ocfs2_dinode *di = NULL;
+	bool orphaned = false;
+
+	status = ocfs2_inode_lock(inode, &di_bh, 1);
+	if (status < 0) {
+		mlog_errno(status);
+		goto bail;
+	}
+
+	status = ocfs2_dio_prepare_orphan_dir(osb, &orphan_dir_inode,
+			OCFS2_I(inode)->ip_blkno,
+			orphan_name,
+			&orphan_insert);
+	if (status < 0 && status != -EEXIST) {
+		mlog_errno(status);
+		goto bail_unlock_inode;
+	} else if (status == -EEXIST) {
+		mlog(ML_NOTICE, "inode %llu already added to "
+				"orphan dir %llu.\n",
+				OCFS2_I(inode)->ip_blkno,
+				OCFS2_I(orphan_dir_inode)->ip_blkno);
+		di = (struct ocfs2_dinode *) di_bh->b_data;
+		if (!(di->i_flags & le32_to_cpu(OCFS2_ORPHANED_FL))) {
+			mlog_errno(status);
+			goto bail_unlock_orphan;
+		}
+		orphaned = true;
+	}
+
+	handle = ocfs2_start_trans(osb,
+			OCFS2_INODE_ADD_TO_ORPHAN_CREDITS);
+	if (IS_ERR(handle)) {
+		status = PTR_ERR(handle);
+		goto bail_unlock_orphan;
+	}
+
+	status = ocfs2_dio_orphan_add(osb, handle, inode, di_bh, orphan_name,
+			&orphan_insert, orphan_dir_inode, orphaned);
+	if (status)
+		mlog_errno(status);
+
+	ocfs2_commit_trans(osb, handle);
+
+bail_unlock_orphan:
+	ocfs2_inode_unlock(orphan_dir_inode, 1);
+	mutex_unlock(&orphan_dir_inode->i_mutex);
+	iput(orphan_dir_inode);
+
+	ocfs2_free_dir_lookup_result(&orphan_insert);
+
+bail_unlock_inode:
+	ocfs2_inode_unlock(inode, 1);
+	brelse(di_bh);
+
+bail:
+	return status;
+}
+
+int ocfs2_del_inode_from_orphan(struct ocfs2_super *osb,
+		struct inode *inode, int update_isize,
+		loff_t end)
+{
+	struct inode *orphan_dir_inode = NULL;
+	struct buffer_head *orphan_dir_bh = NULL;
+	struct buffer_head *di_bh = NULL;
+	struct ocfs2_dinode *di = NULL;
+	handle_t *handle = NULL;
+	int status = 0;
+
+	status = ocfs2_inode_lock(inode, &di_bh, 1);
+	if (status < 0) {
+		mlog_errno(status);
+		goto bail;
+	}
+	di = (struct ocfs2_dinode *) di_bh->b_data;
+
+	orphan_dir_inode = ocfs2_get_system_file_inode(osb,
+			ORPHAN_DIR_SYSTEM_INODE,
+			le16_to_cpu(di->i_dio_orphaned_slot));
+	if (!orphan_dir_inode) {
+		status = -EEXIST;
+		mlog_errno(status);
+		goto bail_unlock_inode;
+	}
+
+	mutex_lock(&orphan_dir_inode->i_mutex);
+	status = ocfs2_inode_lock(orphan_dir_inode, &orphan_dir_bh, 1);
+	if (status < 0) {
+		mutex_unlock(&orphan_dir_inode->i_mutex);
+		iput(orphan_dir_inode);
+		mlog_errno(status);
+		goto bail_unlock_inode;
+	}
+
+	handle = ocfs2_start_trans(osb,
+			OCFS2_INODE_DEL_FROM_ORPHAN_CREDITS);
+	if (IS_ERR(handle)) {
+		status = PTR_ERR(handle);
+		goto bail_unlock_orphan;
+	}
+
+	BUG_ON(!(di->i_flags & cpu_to_le32(OCFS2_DIO_ORPHANED_FL)));
+
+	/* Only delete entry if OCFS2_ORPHANED_FL not set, or
+	 * there are two entries added */
+	if (!(di->i_flags & cpu_to_le32(OCFS2_ORPHANED_FL)) ||
+			(di->i_flags & cpu_to_le32(OCFS2_ORPHANED_FL) &&
+			(di->i_orphaned_slot != di->i_dio_orphaned_slot))) {
+		status = ocfs2_orphan_del(osb, handle, orphan_dir_inode,
+				inode, orphan_dir_bh);
+		if (status < 0) {
+			mlog_errno(status);
+			goto bail_commit;
+		}
+	}
+
+	status = ocfs2_journal_access_di(handle,
+			INODE_CACHE(inode),
+			di_bh,
+			OCFS2_JOURNAL_ACCESS_WRITE);
+	if (status < 0) {
+		mlog_errno(status);
+		goto bail_commit;
+	}
+
+	di->i_flags &= ~cpu_to_le32(OCFS2_DIO_ORPHANED_FL);
+	di->i_dio_orphaned_slot = 0;
+
+	if (update_isize) {
+		status = ocfs2_set_inode_size(handle, inode, di_bh, end);
+		if (status)
+			mlog_errno(status);
+	} else
+		ocfs2_journal_dirty(handle, di_bh);
+
+bail_commit:
+	ocfs2_commit_trans(osb, handle);
+
+bail_unlock_orphan:
+	ocfs2_inode_unlock(orphan_dir_inode, 1);
+	mutex_unlock(&orphan_dir_inode->i_mutex);
+	brelse(orphan_dir_bh);
+	iput(orphan_dir_inode);
+
+bail_unlock_inode:
+	ocfs2_inode_unlock(inode, 1);
+	brelse(di_bh);
+
+bail:
+	return status;
+}
+
 int ocfs2_mv_orphaned_inode_to_new(struct inode *dir,
 				   struct inode *inode,
 				   struct dentry *dentry)
diff -puN fs/ocfs2/namei.h~ocfs2-add-functions-to-add-and-remove-inode-in-orphan-dir fs/ocfs2/namei.h
--- a/fs/ocfs2/namei.h~ocfs2-add-functions-to-add-and-remove-inode-in-orphan-dir
+++ a/fs/ocfs2/namei.h
@@ -38,6 +38,11 @@  int ocfs2_orphan_del(struct ocfs2_super
 int ocfs2_create_inode_in_orphan(struct inode *dir,
 				 int mode,
 				 struct inode **new_inode);
+int ocfs2_add_inode_to_orphan(struct ocfs2_super *osb,
+		struct inode *inode);
+int ocfs2_del_inode_from_orphan(struct ocfs2_super *osb,
+		struct inode *inode, int update_isize,
+		loff_t end);
 int ocfs2_mv_orphaned_inode_to_new(struct inode *dir,
 				   struct inode *new_inode,
 				   struct dentry *new_dentry);
diff -puN fs/ocfs2/ocfs2_fs.h~ocfs2-add-functions-to-add-and-remove-inode-in-orphan-dir fs/ocfs2/ocfs2_fs.h
--- a/fs/ocfs2/ocfs2_fs.h~ocfs2-add-functions-to-add-and-remove-inode-in-orphan-dir
+++ a/fs/ocfs2/ocfs2_fs.h
@@ -229,6 +229,7 @@ 
 #define OCFS2_CHAIN_FL		(0x00000400)	/* Chain allocator */
 #define OCFS2_DEALLOC_FL	(0x00000800)	/* Truncate log */
 #define OCFS2_QUOTA_FL		(0x00001000)	/* Quota file */
+#define OCFS2_DIO_ORPHANED_FL	(0X00002000)	/* On the orphan list especially for dio */
 
 /*
  * Flags on ocfs2_dinode.i_dyn_features
@@ -729,7 +730,9 @@  struct ocfs2_dinode {
 					   inode belongs to.  Only valid
 					   if allocated from a
 					   discontiguous block group */
-/*A0*/	__le64 i_reserved2[3];
+/*A0*/	__le16 i_dio_orphaned_slot;	/* only used for append dio write */
+	__le16 i_reserved1[3];
+	__le64 i_reserved2[2];
 /*B8*/	union {
 		__le64 i_pad1;		/* Generic way to refer to this
 					   64bit union */
diff -puN fs/ocfs2/ocfs2_trace.h~ocfs2-add-functions-to-add-and-remove-inode-in-orphan-dir fs/ocfs2/ocfs2_trace.h
--- a/fs/ocfs2/ocfs2_trace.h~ocfs2-add-functions-to-add-and-remove-inode-in-orphan-dir
+++ a/fs/ocfs2/ocfs2_trace.h
@@ -2373,6 +2373,9 @@  DEFINE_OCFS2_ULL_EVENT(ocfs2_orphan_add_
 
 DEFINE_OCFS2_ULL_UINT_EVENT(ocfs2_orphan_add_end);
 
+DEFINE_OCFS2_ULL_EVENT(ocfs2_dio_orphan_add_begin);
+DEFINE_OCFS2_ULL_UINT_EVENT(ocfs2_dio_orphan_add_end);
+
 TRACE_EVENT(ocfs2_orphan_del,
 	TP_PROTO(unsigned long long dir, const char *name, int namelen),
 	TP_ARGS(dir, name, namelen),