Hive is a data warehousing infrastructure based on Apache Hadoop.
Hive is not designed for online transaction processing. It is best used for traditional data warehousing tasks.
Getting Started
Set up Hadoop and Hive using Docker
Pull Hadoop Docker Image:
    docker pull sequenceiq/hadoop-docker
Run a single-node Hadoop Container:
    docker run -dit --name hadoop --privileged=true -p 50070:50070 -p 8088:8088 -p 9000:9000 sequenceiq/hadoop-docker /etc/bootstrap.sh -bash
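The run command publishes three ports. The sketch below names what each one is usually for and probes the two web UIs from the host. The port roles are Hadoop 2.x defaults and are assumed, not confirmed, for this image; `hadoop_port_role` is a hypothetical helper introduced only for illustration.

```shell
# Hypothetical helper: describe the ports published by the docker run command above.
# Roles are Hadoop 2.x defaults and are an assumption about this image.
hadoop_port_role() {
  case "$1" in
    50070) echo "NameNode web UI" ;;
    8088)  echo "YARN ResourceManager web UI" ;;
    9000)  echo "HDFS NameNode RPC" ;;
    *)     echo "unknown" ;;
  esac
}

# Probe the two HTTP ports from the host; the daemons can take a while to start.
for port in 50070 8088; do
  if curl -s -o /dev/null --max-time 3 "http://localhost:${port}/"; then
    echo "${port} ($(hadoop_port_role "$port")): reachable"
  else
    echo "${port} ($(hadoop_port_role "$port")): not reachable yet"
  fi
done
```

If a port stays unreachable, `docker logs hadoop` shows the bootstrap output of the container.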
To verify that Hadoop is running, open a shell in the container and add the Hadoop binaries to PATH so that Hadoop commands can be run:
    docker exec -it hadoop /bin/bash
    PATH=$PATH:/usr/local/hadoop/bin/
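With the Hadoop binaries on PATH, a few read-only commands are enough to confirm that HDFS is up. This is a minimal sketch meant to be run inside the container shell; `check_hadoop` is a hypothetical helper, and the install location is assumed from the PATH line above.

```shell
# Run inside the container shell opened with `docker exec -it hadoop /bin/bash`.
# Assumes Hadoop lives under /usr/local/hadoop, as the PATH line above implies.
export PATH="$PATH:/usr/local/hadoop/bin"

# Hypothetical helper: stop at the first command that fails.
check_hadoop() {
  if ! command -v hadoop >/dev/null 2>&1; then
    echo "hadoop not found on PATH"
    return 1
  fi
  hadoop version &&        # prints the installed Hadoop release
  hdfs dfs -ls / &&        # lists the HDFS root; fails if the NameNode is down
  hdfs dfsadmin -report    # summarizes live DataNodes and capacity
}

check_hadoop || echo "Hadoop check failed"
```

A failure of `hdfs dfs -ls /` usually means the NameNode has not finished starting or did not start at all.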
Pull Hive Docker Image:
    docker pull apache/hive:4.0.0-alpha-2
Export the Hive version:
    export HIVE_VERSION=4.0.0-alpha-2
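Run Hive Container (replace <hadoop-container-name> with the name given to the Hadoop container, which is hadoop in the run command above):
    docker run -d -p 10000:10000 -p 10002:10002 --env SERVICE_NAME=hiveserver2 --name hive4 --link <hadoop-container-name>:hadoop apache/hive:${HIVE_VERSION}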
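HiveServer2 may need some time after the container starts before it accepts JDBC connections on port 10000. The following is a minimal sketch of a wait loop, not part of the Hive tooling; `wait_for_port` is a hypothetical helper and relies on bash's `/dev/tcp` redirection.

```shell
# Hypothetical helper: poll a TCP port until it accepts connections or a timeout expires.
# Uses bash's /dev/tcp redirection, so it needs bash rather than plain sh.
wait_for_port() {
  local host="$1" port="$2" timeout="${3:-60}" elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    # The subshell opens and immediately closes a connection; errors are silenced.
    if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
      echo "${host}:${port} is accepting connections"
      return 0
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  echo "timed out after ${timeout}s waiting for ${host}:${port}" >&2
  return 1
}

# Example: wait up to 120 seconds for HiveServer2 on the port published above.
# wait_for_port localhost 10000 120
```

An open port only shows that the process is listening; the first Beeline connection can still fail briefly while the service finishes initializing.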
Connect with Beeline, running it inside the hive4 container started above:
    docker exec -it hive4 beeline -u 'jdbc:hive2://localhost:10000/'
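Beeline can also run statements non-interactively with its `-e` option, which is convenient for a quick smoke test. The sketch below assumes the container name hive4 from the run command and uses localhost because Beeline runs inside the HiveServer2 container itself. `run_hiveql` and the table name `hive_example` are illustrative only.

```shell
# Assumed names: the container is called hive4 (from the docker run command above);
# localhost is used because beeline runs inside the HiveServer2 container itself.
HIVE_CONTAINER="hive4"
JDBC_URL="jdbc:hive2://localhost:10000/"

# Hypothetical helper: run one HiveQL statement non-interactively via beeline -e.
run_hiveql() {
  docker exec "$HIVE_CONTAINER" beeline -u "$JDBC_URL" -e "$1"
}

# Only talk to Docker when the container is actually running.
if command -v docker >/dev/null 2>&1 &&
   docker ps --format '{{.Names}}' 2>/dev/null | grep -qx "$HIVE_CONTAINER"; then
  run_hiveql "SHOW DATABASES;"
  run_hiveql "CREATE TABLE IF NOT EXISTS hive_example (a STRING, b INT) PARTITIONED BY (c INT);"
  run_hiveql "SHOW TABLES;"
else
  echo "container ${HIVE_CONTAINER} is not running; start it first"
fi
```

Each call opens a new Beeline session, which is slow but keeps the example simple; for longer scripts, Beeline's `-f` option reads statements from a file.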
Launch Standalone Metastore
To use standalone Metastore with Derby:
    docker run -d -p 9083:9083 --env SERVICE_NAME=metastore --name metastore-standalone apache/hive:${HIVE_VERSION}