Take a Mandatory Snapshot of Hive Tables

Taking a snapshot of Hive tables is mandatory before upgrading. Also record how many tables you have before the upgrade so that you can compare the count after the upgrade.
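One way to record the table count is to pipe the table list through a small counting helper. This is a minimal sketch, not part of the documented procedure: the function name count_tables, the output file name, and the database name in the usage comment are hypothetical, and the Beeline invocation assumes the same embedded jdbc:hive2:// connection used later in this procedure.

```shell
#!/bin/sh
# Hypothetical helper: counts the non-blank lines on stdin,
# assuming one table name per line.
count_tables() {
  grep -c -v '^[[:space:]]*$'
}

# Assumed usage (URL, database, and file name are placeholders):
#   beeline -u jdbc:hive2:// --silent=true --showHeader=false \
#     --outputformat=tsv2 -e 'SHOW TABLES IN my_database' \
#     | count_tables > tables_before_upgrade.txt
```

Running the same command after the upgrade and comparing the two numbers gives a quick check that no tables went missing.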

  1. In Ambari, go to Services/Hive/Configs and check the value of hive.metastore.warehouse.dir to determine the location of the Hive warehouse. The default location is /apps/hive/warehouse.
  2. On any node in the cluster, as the HDFS superuser, enable snapshots.
    $ sudo su - hdfs
    $ hdfs dfsadmin -allowSnapshot /apps/hive/warehouse
    Allowing snapshot on /apps/hive/warehouse succeeded
  3. Create a snapshot of the Hive warehouse.
    $ hdfs dfs -createSnapshot /apps/hive/warehouse
    
    Output includes the name and location of the snapshot.
    Created snapshot /apps/hive/warehouse/.snapshot/s20181204-164645.898
    
  4. Start Hive as a user who has SELECT privileges on the tables.
    $ beeline
    beeline> !connect jdbc:hive2://
    Enter username for jdbc:hive2://: hive
    Enter password for jdbc:hive2://: *********
    Connected to: Apache Hive (version 1.2.1000.2.6.5.0-292)
    Driver: Hive JDBC (version 1.2.1000.2.6.5.0-292)
    
  5. List the tables in each database so that you can identify any that are stored outside /apps/hive/warehouse/.
    hive> USE my_database;
    hive> SHOW TABLES;
  6. Determine the location of each table using the DESCRIBE command. For example:
    hive> DESCRIBE FORMATTED my_table partition (dt='20201130');
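  7. Create a snapshot of the directory shown in the Location field of the output. As with the warehouse directory, snapshots must first be enabled on that directory with hdfs dfsadmin -allowSnapshot.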
  8. Repeat steps 5-7 for each database and its tables outside /apps/hive/warehouse/.
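Once the table locations are collected, the snapshot commands for steps 5 through 7 can be generated rather than typed by hand. The following is a minimal sketch under stated assumptions, not part of the documented procedure: it assumes the Location value of each table is supplied on stdin, one per line, and the function name print_snapshot_commands and the WAREHOUSE variable are hypothetical. It only prints the hdfs commands for directories outside the warehouse, so that you can review them before running anything as the HDFS superuser.

```shell
#!/bin/sh
# Warehouse path; override if hive.metastore.warehouse.dir differs.
WAREHOUSE="${WAREHOUSE:-/apps/hive/warehouse}"

# Hypothetical helper: reads table locations from stdin and prints the
# allowSnapshot/createSnapshot commands for those outside the warehouse.
print_snapshot_commands() {
  # Strip an optional hdfs://host:port prefix, drop warehouse paths,
  # and de-duplicate the remaining directories.
  sed -E 's#^hdfs://[^/]+##' |
    grep -v -E "^${WAREHOUSE}(/|\$)" |
    sort -u |
    while IFS= read -r dir; do
      [ -n "$dir" ] || continue
      echo "hdfs dfsadmin -allowSnapshot $dir"
      echo "hdfs dfs -createSnapshot $dir"
    done
}
```

For example, print_snapshot_commands < locations.txt, where locations.txt is a hypothetical file of Location values, prints one pair of commands per external directory. Printing instead of executing is a deliberate choice in this sketch, because HDFS rejects nested snapshottable directories and the list may need manual pruning first.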