Database Administration: Starting PostgreSQL server after a HDD crash results in FAILED STATE

The real answer will be in the PostgreSQL logs, in /var/lib/pgsql/data/pg_log.

However, before you take any action: It is vital that you take a file system level copy of your database before attempting repair if any of your data is valuable to you. Seehttp://wiki.postgresql.org/wiki/Corruption . You must copy the whole data directory. On Fedora that's/var/lib/pgsql/data by default, but verify that's correct for your install.

Based on the logs you've posted you certainly have some degree of database corruption. The storage that the database is on (the hard drive or file system) is most likely damaged. Take a copy NOW, and put it on a different hard drive or system.

Only once you have made a full file-system level copy of your data directory, try using pg_resetxlog to clear the damaged transaction logs and start your database. Even if it starts it is highly likely to be corrupt; you should pg_dump it then re-initdb it and restore the dump to the fresh instance.

If you still can't start it after a pg_resetxlog then post an updated log of the startup attempt after resetxlog. It's possible you'll need to start Pg in stand-alone mode with:

sudo -u postgres postgres --single -D /var/lib/pgsql/data -P -f i postgres

If that works, giving you a backend> prompt, try again after replacing the last "postgres" with the name of the DB you want to connect to. You should be able to SELECT, COPY data from tables, etc.

If that doesn't work, ie you can't start a standalone backend, then it's probably time to restore from backups - since you're sensible enough to have them. If anyone else reading this is in the same position,contact an experienced PostgreSQL consultant to see if they can recover data from your database. Be prepared to pay for their time and expertise.

Your file system is probably damaged

The severity of the damage to the PostgreSQL install suggests that your whole file system is probably damaged. You may wish to consider restoring the whole system from a backup or reinstalling it.

I would not trust this file system, fsck or no fsck.

SMART-test your drive

I also recommend that you run a SMART check on your hard drive with smartctl from smartmontools; assuming it's /dev/hda that'd be smartctl -d ata -a /dev/sda | less . Look for a failed health test,uncorrectable_sectors, a high read error rate, a reallocated_sector_count of more than 2 or 3, or a non-zero current_pending_sector. Run smartctl -d ata -t long /dev/sda to execute a non-destructive self test on your HDD; it won't interrupt normal functioning of the system. When the estimated time has elapsed run smartctl -d ata /dev/sda again and look at the self test log to see if it passed.

If anything looks less than perfect, replace the drive.

In future, consider automating this testing via smartd for early warning of drive failures.

(Content in this post was obsoleted by updates to the question. If you're troubleshooting a similar problem, look at this answer's edit history).

Database Administration

Pages

Sunday, January 19, 2014

Starting PostgreSQL server after a HDD crash results in FAILED STATE

Your file system is probably damaged

SMART-test your drive

No comments:

Post a Comment

Translate Language

Learn Database Guide