Monday, December 31, 2012

Cassandra Backup & Restore Procedure



 Locations

o   commitlog_directory_location
Default out of the box location -  /var/lib/cassandra/commitlog

o   data_directory_location
Default out of the box location -  /var/lib/cassandra/data

o   Snapshot location
<data_directory_location>/<keyspace_name>/<columnfamilyName>/snapshots/<snapshot_name>
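Because snapshots sit at a fixed depth below the data directory, they can be listed with find. The following is a minimal sketch, not an official tool; the function name list_snapshots is made up for illustration, and it assumes a find that supports -mindepth and -maxdepth (GNU and BSD find both do).

```shell
# List every snapshot directory under a Cassandra data directory.
# Layout assumed: <data_dir>/<keyspace>/<column_family>/snapshots/<snapshot_name>
# list_snapshots is a hypothetical helper name, not part of Cassandra.
list_snapshots() {
    data_dir="$1"
    # Depth 4 below the data directory is the snapshot directory itself.
    find "$data_dir" -mindepth 4 -maxdepth 4 -type d -path '*/snapshots/*'
}

# Usage (default data directory from above):
#   list_snapshots /var/lib/cassandra/data
```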

Commands

o   Snapshot  command (Basic)
nodetool snapshot
o   Snapshot  command (Advanced)
nodetool -h <host> -p <port> snapshot [keyspaces...] -cf [columnfamilyName] -t [snapshotName]
o   Clearing Snapshot Files (Basic)
nodetool clearsnapshot
o   Clearing Snapshot Files (Advanced)
nodetool -h <host> -p <port> clearsnapshot [keyspaces...] -t [snapshotName]

Backup procedure

o   Setup Parameter

Set incremental_backups to true in cassandra.yaml.
Stop and start Cassandra so that the new setting takes effect.
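The parameter change can be scripted. This is a sketch under assumptions: the helper name enable_incremental_backups is hypothetical, the location of cassandra.yaml depends on the installation and is passed as an argument, and the file is assumed to already contain an uncommented incremental_backups: line.

```shell
# Set incremental_backups to true in the given cassandra.yaml.
# enable_incremental_backups is a hypothetical helper name.
enable_incremental_backups() {
    yaml="$1"
    tmp="$(mktemp)" || return 1
    # Rewrite only the incremental_backups line; write back with cat so the
    # original file keeps its owner and permissions.
    sed 's/^incremental_backups:.*/incremental_backups: true/' "$yaml" > "$tmp" \
        && cat "$tmp" > "$yaml"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Usage (the path is an assumption, adjust to your installation):
#   enable_incremental_backups /etc/cassandra/cassandra.yaml
```

Cassandra still has to be restarted afterwards, as described above.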

o   Backup

nodetool snapshot [keyspaces...] -cf [columnfamilyName] -t [snapshotName]
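For scheduled backups it may help to give each snapshot a timestamped name. The sketch below wraps the snapshot command shown above; the function name take_snapshot, the backup- prefix and the NODETOOL override variable are illustrative assumptions, not part of Cassandra.

```shell
# Take a snapshot of one keyspace with a timestamped snapshot name.
# take_snapshot and the NODETOOL variable are hypothetical; set NODETOOL=echo
# to print the arguments instead of running nodetool (dry run).
take_snapshot() {
    keyspace="$1"
    name="backup-$(date +%Y%m%d-%H%M%S)"
    "${NODETOOL:-nodetool}" snapshot "$keyspace" -t "$name"
}

# Usage:
#   take_snapshot <keyspace_name>
#   NODETOOL=echo take_snapshot <keyspace_name>   # dry run
```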

Restore

To restore a node from a snapshot and incremental backups:
  1. Shut down the node to be restored (in a cluster, shut down all nodes).
  2. Clear all files in the <commitlog_directory_location>.
  3. Clear all *.db files in <data_directory_location>/<keyspace_name>/<columnfamilyName>, but DO NOT delete the /snapshots and /backups subdirectories.
  4. If restoring to a new server, create the keyspace and column family directories with the same names as on the source.
  5. Locate the most recent snapshot folder in <data_directory_location>/<keyspace_name>/<columnfamilyName>/snapshots/<snapshot_name>, and copy its contents into <data_directory_location>/<keyspace_name>/<columnfamilyName>.
  6. If using incremental backups as well, copy all contents of <data_directory_location>/<keyspace_name>/<columnfamilyName>/backups into <data_directory_location>/<keyspace_name>/<columnfamilyName>.
  7. Restart the node (in a cluster, start all nodes), keeping in mind that a temporary burst of I/O activity will consume a large amount of CPU resources.
  8. If the moved keyspace is new to the target host, you will need to write its information to the system keyspace (i.e. create its definition) as follows:

On the source host:
Create a file called 'schema_export' and add the following lines inside:

use <keyspace_name>;
show schema;
Then run the script with cassandra-cli and save the output in another file:
cassandra-cli -f schema_export > my.schema
Edit the file and remove non-command lines like "Connected to: "Test Cluster" on 127.0.0.1/9160".
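The manual edit can also be done with grep. This sketch only filters the "Connected to:" line quoted above; the function name clean_schema is hypothetical, and any other non-command lines in the output would still need to be removed by hand.

```shell
# Print a schema dump without the cassandra-cli connection banner.
# clean_schema is a hypothetical helper name.
clean_schema() {
    grep -v '^Connected to: ' "$1"
}

# Usage: write to a second file, then replace the original.
#   clean_schema my.schema > my.schema.clean && mv my.schema.clean my.schema
```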

On the target host:
cassandra-cli -f my.schema
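
Steps 2, 3, 5 and 6 of the restore procedure can be sketched as a shell function for a single column family. This is an illustration under assumptions, not a tested production script: the name restore_column_family and its argument order are made up here, it must only be run while the node is shut down, and it assumes a find that supports -mindepth and -maxdepth.

```shell
# Restore one column family from a snapshot (restore steps 2, 3, 5 and 6).
# restore_column_family is a hypothetical helper; run it only on a stopped node.
# Arguments: commitlog dir, data dir, keyspace, column family, snapshot name.
restore_column_family() {
    [ $# -eq 5 ] || return 2
    commitlog_dir="$1"; data_dir="$2"; keyspace="$3"; cf="$4"; snapshot="$5"
    cf_dir="$data_dir/$keyspace/$cf"

    # Refuse to delete anything if the snapshot is missing.
    [ -d "$cf_dir/snapshots/$snapshot" ] || return 1

    # Step 2: clear the files in the commit log directory.
    find "$commitlog_dir" -mindepth 1 -maxdepth 1 -type f -exec rm -f {} +

    # Step 3: remove top-level *.db files only; snapshots/ and backups/ are untouched.
    find "$cf_dir" -mindepth 1 -maxdepth 1 -type f -name '*.db' -exec rm -f {} +

    # Step 5: copy the snapshot contents into the column family directory.
    cp -p "$cf_dir/snapshots/$snapshot"/* "$cf_dir"/ || return 1

    # Step 6: copy incremental backups as well, if there are any.
    if [ -d "$cf_dir/backups" ] && [ -n "$(ls -A "$cf_dir/backups")" ]; then
        cp -p "$cf_dir/backups"/* "$cf_dir"/ || return 1
    fi
}

# Usage (default locations from above):
#   restore_column_family /var/lib/cassandra/commitlog /var/lib/cassandra/data \
#       <keyspace_name> <columnfamilyName> <snapshot_name>
```

The snapshot directory is checked before anything is deleted, so a mistyped snapshot name leaves the existing data in place.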