Hive is a data warehousing infrastructure based on Apache Hadoop.

Hive is not designed for online transaction processing. It is best used for traditional data warehousing tasks.

Getting Started

Set up Hadoop and Hive using Docker

Pull Hadoop Docker Image:

docker pull sequenceiq/hadoop-docker

Run a single-node Hadoop Container:

docker run -dit --name hadoop --privileged=true -p 50070:50070 -p 8088:8088 -p 9000:9000 sequenceiq/hadoop-docker /etc/bootstrap.sh -bash

Verify that Hadoop is running by opening a shell in the container, adding the Hadoop binaries to PATH, and listing the HDFS root:

docker exec -it hadoop /bin/bash
PATH=$PATH:/usr/local/hadoop/bin/
hdfs dfs -ls /
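
The same check can be run without an interactive shell, which may be convenient in scripts. The following is a minimal sketch, assuming the container name hadoop and the /usr/local/hadoop install path used above. The function name hdfs_ls and the HADOOP_CONTAINER variable are illustrative, not part of Hadoop or Docker.

```shell
# Hypothetical helper: list an HDFS path inside the Hadoop container
# without opening an interactive shell.
# Assumes the container name "hadoop" and the /usr/local/hadoop install
# path from the steps above; override the name via HADOOP_CONTAINER.
hdfs_ls() {
  local path="${1:-/}"
  local container="${HADOOP_CONTAINER:-hadoop}"
  # Call hdfs by its full path so PATH does not need to be modified.
  docker exec "$container" /usr/local/hadoop/bin/hdfs dfs -ls "$path"
}

# Example: hdfs_ls /
```

A non-zero exit status from hdfs_ls suggests that the container or HDFS is not up yet.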

Pull Hive Docker Image:

docker pull apache/hive:4.0.0-alpha-2

Export the Hive version:

export HIVE_VERSION=4.0.0-alpha-2

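Run Hive Container, replacing <hadoop-container-name> with the name of the Hadoop container (hadoop in the example above):

docker run -d -p 10000:10000 -p 10002:10002 --env SERVICE_NAME=hiveserver2 --name hive4 --link <hadoop-container-name>:hadoop apache/hive:${HIVE_VERSION}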
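
Because the container is started in detached mode, HiveServer2 may not accept connections the moment docker run returns. The following is a minimal sketch of a generic retry helper that a script could use to wait for it. The name wait_until and its arguments are illustrative assumptions, not part of Hive or Docker.

```shell
# Hypothetical retry helper: run a command until it succeeds or the
# attempts run out.
# Usage: wait_until <attempts> <delay-seconds> <command> [args...]
wait_until() {
  local attempts="$1"
  local delay="$2"
  shift 2
  local i=1
  while [ "$i" -le "$attempts" ]; do
    # Discard the command's output; only its exit status matters here.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Example (assumes the container name hive4 from the step above):
# wait_until 30 2 docker exec hive4 beeline -u 'jdbc:hive2://localhost:10000/' -e 'show databases;'
```

The attempt count and delay in the example are arbitrary; adjust them to how long startup takes on your machine.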

Connect to Beeline, using the container name given to docker run above (hive4):

docker exec -it hive4 beeline -u 'jdbc:hive2://localhost:10000/'
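
Beeline can also run statements non-interactively through its -e option, which is useful as a quick smoke test. The following is a sketch under stated assumptions: the container is named hive4 as in the docker run step, Beeline connects to localhost:10000 from inside that container, and the function name hive_smoke_test and the table smoke_test are made up for illustration.

```shell
# Hypothetical smoke test: run a few HiveQL statements through beeline
# inside the HiveServer2 container.
# Assumes the container name "hive4"; override it via HIVE_CONTAINER.
hive_smoke_test() {
  local container="${HIVE_CONTAINER:-hive4}"
  # Each -e passes one statement; beeline runs them in order.
  docker exec "$container" beeline -u 'jdbc:hive2://localhost:10000/' \
    -e "CREATE TABLE IF NOT EXISTS smoke_test (id INT, name STRING);" \
    -e "INSERT INTO smoke_test VALUES (1, 'hive');" \
    -e "SELECT * FROM smoke_test;" \
    -e "DROP TABLE smoke_test;"
}

# Example: hive_smoke_test
```

If the SELECT prints the inserted row, HiveServer2 and its embedded metastore are working together.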

Launch Standalone Metastore

To use a standalone Metastore with Derby:

docker run -d -p 9083:9083 --env SERVICE_NAME=metastore --name metastore-standalone apache/hive:${HIVE_VERSION}

Install and Configure Hive

Reference

Quickstart with Hive

Hive Getting Started

Hive Tutorial