Recovering filesystem with large number of orphaned inodes?
John D. Baker
2014-04-15 02:34:07 UTC
I had occasion to deal with a system whose hardware RAID-5 had lost a
component and operated in degraded mode for some time. Following a
power failure, the machine (a DELL PowerEdge 2550, IIRC) refused to boot
from the degraded but operational RAID.

The failed component was replaced and the RAID card's firmware utility
was used to reconstruct the logical volume. The machine then proceeded
to boot, but the filesystem check revealed very damaged filesystems.

Fortunately, the owner of the system had practiced good separation of
infrastructure vs application. I ultimately declared the OS filesystems
a total loss and installed a fresh 6.1_STABLE from sometime in the
middle of 2013 (recently updated, and then upgraded to i386-6.99.40).

The remaining filesystem is where the user's data resides. From my
previous attempts at salvaging the OS filesystems, I can expect more
orphaned files than can be referenced in a single "lost+found" directory.
From a brief perusal of the "fsck_ffs" source code, it appears that if the
"linkup()" routine returns 0 (zero) for any reason, the inode being
processed is simply cleared.

I'd like to give the best possible chance to recover data but I don't
really feel like having to approve 65534 reconnections. I'd like to use
the "-y" option, but have fsck exit if it can't attach the orphan file.
Then I can move the "lost+found" directory out of the way and start over
with a new one.

That seem reasonable?
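
The "move lost+found out of the way and start over" step described above
could be sketched roughly as follows. This is only an illustration, not
anything fsck does itself: it assumes the filesystem is mounted at
"mount_point" between fsck runs, and the function name and the
"lost+found.N" naming scheme are made up for the example.

```python
import os


def rotate_lost_found(mount_point: str) -> str:
    """Rename <mount_point>/lost+found to the first unused lost+found.N
    and create a fresh, empty lost+found in its place.

    Returns the path the old directory was moved to.
    The ".N" suffix scheme is a hypothetical convention for this sketch.
    """
    current = os.path.join(mount_point, "lost+found")

    # Find the first suffix that is not already taken.
    n = 0
    while os.path.exists(f"{current}.{n}"):
        n += 1
    saved = f"{current}.{n}"

    # A rename within one filesystem only rewrites directory entries;
    # the recovered files themselves are not copied.
    os.rename(current, saved)
    os.mkdir(current, 0o700)
    return saved
```

Each call sets the current batch of reconnected files aside, so the next
fsck run would start with an empty lost+found.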
--
|/"\ John D. Baker, KN5UKS NetBSD Darwin/MacOS X
|\ / jdbaker[snail]mylinuxisp[flyspeck]com OpenBSD FreeBSD
| X No HTML/proprietary data in email. BSD just sits there and works!
|/ \ GPGkeyID: D703 4A7E 479F 63F8 D3F4 BD99 9572 8F23 E4AD 1645
Andy Ruhl
2014-04-15 03:09:33 UTC
Post by John D. Baker
I had occasion to deal with a system whose hardware RAID-5 had lost a
component and operated in degraded mode for some time. Following a
power failure, the machine (a DELL PowerEdge 2550, IIRC) refused to boot
from the degraded, but operational RAID.
The failed component was replaced and the RAID card's firmware utility
was used to reconstruct the logical volume. The machine then proceeded
to boot, but the filesystem check revealed very damaged filesystems.
Fortunately, the owner of the system had practiced good separation of
infrastructure vs application. I ultimately declared the OS filesystems
a total loss and installed a fresh 6.1_STABLE from sometime in the
middle of 2013 (and recently updated, then updated to i386-6.99.40).
The remaining filesystem is where the user's data resides. From my
previous attempts at salvaging the OS filesystems, I can expect more
orphaned files than can be referenced in a single "lost+found" directory.
From a brief perusal of "fsck_ffs" sourcecode, it appears that if the
"linkup()" routine returns 0 (zero) for any reason, the inode being
processed is simply cleared.
I'd like to give the best possible chance to recover data but I don't
really feel like having to approve 65534 reconnections. I'd like to use
the "-y" option, but have fsck exit if it can't attach the orphan file.
Then I can move the "lost+found" directory out of the way and start over
with a new one.
That seem reasonable?
Sorry to go off topic a bit.

What is your goal? To recover newer stuff than the owner was able to
get back? In this case you probably need to go through every inode I
guess. What's it worth? At some point it becomes harder than the data
is worth... You probably know that.

I assume that when you say "routine returns 0" you mean it can't link the
inode and it just removes it? Sorry, I didn't look at the code...

If you're not using -y, I guess you would have to script your way
through using -n and moving the bad stuff out of the way every time
you hit it, unless you have a better idea. I'd like to hear it.

Andy
Martin Husemann
2014-04-15 07:18:30 UTC
Post by John D. Baker
I'd like to give the best possible chance to recover data but I don't
really feel like having to approve 65534 reconnections. I'd like to use
the "-y" option, but have fsck exit if it can't attach the orphan file.
Then I can move the "lost+found" directory out of the way and start over
with a new one.
Not helpful for you directly, but:
I wonder if we should modify fsck to stop linking there at (say) 0xf000 links
and start creating sub-directories where to link further inodes.
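
To make the idea concrete, here is a minimal sketch of how reconnected
orphans might be spread over sub-directories once the threshold is
reached. It is written in Python purely for illustration and is not
fsck_ffs code; the function name, the "subNNNN" directory names and the
"#<inode>" entry names are assumptions made for the example.

```python
LINKS_PER_DIR = 0xF000  # threshold suggested above


def orphan_path(seq: int, inum: int, per_dir: int = LINKS_PER_DIR) -> str:
    """Return the path at which the seq-th reconnected orphan (0-based)
    with inode number inum would be linked.

    The first per_dir orphans go directly into lost+found; later ones
    go into numbered sub-directories holding per_dir entries each.
    """
    if seq < 0:
        raise ValueError("seq must be non-negative")
    if seq < per_dir:
        return f"lost+found/#{inum}"
    bucket = seq // per_dir  # 1 for the first overflow directory
    return f"lost+found/sub{bucket:04d}/#{inum}"
```

For example, orphan_path(0, 1234) gives "lost+found/#1234", while
orphan_path(0xF000, 7) gives "lost+found/sub0001/#7".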

Martin
Brett Lymn
2014-04-15 09:47:34 UTC
Post by Martin Husemann
Post by John D. Baker
I'd like to give the best possible chance to recover data but I don't
really feel like having to approve 65534 reconnections. I'd like to use
the "-y" option, but have fsck exit if it can't attach the orphan file.
Then I can move the "lost+found" directory out of the way and start over
with a new one.
I wonder if we should modify fsck to stop linking there at (say) 0xf000 links
and start creating sub-directories where to link further inodes.
The problem with doing that is that you are allocating blocks on a
broken file system: at best it may fail in "interesting" ways; at worst
you could stomp on some of the data that you are trying to rescue. It is
too late now, but the common wisdom was, while the filesystem was still
healthy, to cd into lost+found, create a lot of files and then remove
them, so that the directory has a lot of "pre-allocated" slots in which
to create entries.
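
For a healthy filesystem, that pre-allocation trick might look something
like the sketch below. The function name, the default count and the
"prealloc.NNNNNN" file names are arbitrary choices for illustration, and
the approach relies on the filesystem not shrinking the directory when
the entries are removed again, which is an assumption worth verifying.

```python
import os


def preallocate_slots(directory: str, count: int = 10000) -> None:
    """Create `count` empty files in `directory`, then remove them all,
    leaving the directory empty but (ideally) with its blocks still
    allocated for future entries.
    """
    names = [os.path.join(directory, f"prealloc.{i:06d}")
             for i in range(count)]

    # Mode "x" fails if a name already exists, so nothing that is
    # already in the directory gets overwritten or deleted.
    for name in names:
        with open(name, "x"):
            pass

    for name in names:
        os.unlink(name)
```

It would be run against the lost+found directory of each filesystem,
e.g. preallocate_slots("/home/lost+found"), where the path is just an
example.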
--
Brett Lymn
Greg Troxel
2014-04-15 12:27:59 UTC
Post by John D. Baker
From a brief perusal of "fsck_ffs" sourcecode, it appears that if the
"linkup()" routine returns 0 (zero) for any reason, the inode being
processed is simply cleared.
I'd like to give the best possible chance to recover data but I don't
really feel like having to approve 65534 reconnections. I'd like to use
the "-y" option, but have fsck exit if it can't attach the orphan file.
Then I can move the "lost+found" directory out of the way and start over
with a new one.
It seems like a bug for fsck to clear an inode, merely because there's
no space, when it could otherwise have been reattached. Brett's point
about writing to a damaged fs is valid, but it seems better for fsck to
error out rather than delete data, at least by default.
Ottavio Caruso
2014-04-15 13:54:20 UTC
Post by John D. Baker
I'd like to give the best possible chance to recover data but I don't
really feel like having to approve 65534 reconnections. I'd like to use
the "-y" option, but have fsck exit if it can't attach the orphan file.
Then I can move the "lost+found" directory out of the way and start over
with a new one.
I don't know if TestDisk:
http://www.cgsecurity.org/wiki/TestDisk

can recover an FFS filesystem, but I'd give it a try.
--
Ottavio