Configuring Hive in a Pseudo-Distributed Environment

1. Concepts

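The Hive Metastore can be configured in three ways, distinguished by how the metadata is stored (the metadata is normally kept in a relational database):

  1. Embedded Metastore Database (Derby): embedded mode

    The metadata is stored in an embedded Derby database, which does not support multiple concurrent sessions.

  2. Local Metastore Server: local metastore (the mode I chose)

    The metadata is stored in a MySQL database on the local machine.

  3. Remote Metastore Server: remote metastore

    The metastore is split out into a separate service and the metadata is stored in MySQL on a remote host, so MySQL does not have to be installed on every client.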
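
A compact way to see how the three modes differ is to list the hive-site.xml properties that set each one apart. The sketch below is illustrative rather than complete: the Derby and MySQL values are the ones used later in this post, `metastore-host` is a hypothetical host name, and 9083 is the default port of the metastore's Thrift service.

```python
# Distinguishing hive-site.xml properties for each metastore mode (sketch).
# "metastore-host" is a hypothetical host name, used only for illustration.
METASTORE_MODES = {
    "embedded": {
        "javax.jdo.option.ConnectionURL":
            "jdbc:derby:;databaseName=metastore_db;create=true",
        "javax.jdo.option.ConnectionDriverName":
            "org.apache.derby.jdbc.EmbeddedDriver",
    },
    "local": {
        "javax.jdo.option.ConnectionURL":
            "jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true",
        "javax.jdo.option.ConnectionDriverName": "com.mysql.jdbc.Driver",
    },
    "remote": {
        # Clients only point at the standalone metastore service; the
        # javax.jdo.option.* settings live on the metastore server instead.
        "hive.metastore.uris": "thrift://metastore-host:9083",
    },
}


def properties_for(mode: str) -> dict:
    """Return a copy of the distinguishing properties for a metastore mode."""
    if mode not in METASTORE_MODES:
        raise ValueError(f"unknown metastore mode: {mode!r}")
    return dict(METASTORE_MODES[mode])


if __name__ == "__main__":
    for name in METASTORE_MODES:
        print(name, properties_for(name))
```

This post only configures the embedded and local modes.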

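Embedded mode installation

The main steps are:

  • Install MySQL
  • Create a MySQL account
  • Create the Hive metastore database
  • Download the MySQL driver
  • Configure Hive's files
  • Start the hive shell
  • Inspect the metastore database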

1) Install MySQL

sudo apt-get install mysql-server mysql-client

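2) Create the hive account (its password must match javax.jdo.option.ConnectionPassword in hive-site.xml below; this post uses hive)

create user 'hive'@'localhost' identified by 'hive';

3) Create the hive database, which will hold Hive's metadata

create database hive;

4) Grant the hive account all privileges on the hive database

grant all privileges on hive.* to 'hive'@'localhost';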

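5) Download the MySQL JDBC driver (Connector/J) and copy the jar into Hive's lib directory, /home/aa/jike/apache-hive-1.2.2-bin/lib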
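
Hive only loads the driver if the jar is in that lib directory. A minimal sketch of a check, assuming the jar keeps Connector/J's usual `mysql-connector-java*.jar` naming (adjust the pattern if yours differs); `find_mysql_driver` is a hypothetical helper name:

```python
import glob
import os


def find_mysql_driver(hive_home: str) -> list:
    """Return the MySQL Connector/J jars found in <hive_home>/lib, sorted.

    Assumes the usual "mysql-connector-java*.jar" file name.
    """
    pattern = os.path.join(hive_home, "lib", "mysql-connector-java*.jar")
    return sorted(glob.glob(pattern))


if __name__ == "__main__":
    hive_home = "/home/aa/jike/apache-hive-1.2.2-bin"  # path used in this post
    jars = find_mysql_driver(hive_home)
    if jars:
        print("MySQL driver found:", ", ".join(jars))
    else:
        print("no mysql-connector-java*.jar in", os.path.join(hive_home, "lib"))
```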

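6) Go to /home/aa/jike/apache-hive-1.2.2-bin/conf and create Hive's configuration files from the template

aa@ubuntu:~$ cd /home/aa/jike/apache-hive-1.2.2-bin/conf
aa@ubuntu:~/jike/apache-hive-1.2.2-bin/conf$ cp hive-default.xml.template hive-site.xml
aa@ubuntu:~/jike/apache-hive-1.2.2-bin/conf$ cp hive-default.xml.template hive-default.xml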

Running hive now fails with the following error. Hive needs permissions on its HDFS scratch directory, /tmp/hive:

aa@ubuntu:~$ hive
Logging initialized using configuration in jar:file:/home/aa/jike/apache-hive-1.2.2-bin/lib/hive-common-1.2.2.jar!/hive-log4j.properties
Exception in thread "main" java.lang.RuntimeException: java.io.IOException: Filesystem closed
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:677)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
Caused by: java.io.IOException: Filesystem closed
    at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:323)
    at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1057)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:554)
    at org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:599)
    at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:554)
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:508)
    ... 7 more

Grant the permissions:

aa@ubuntu:~$ hadoop fs -chmod -R 777 /tmp/hive

Check again; output like the following shows the permissions have been granted:

aa@ubuntu:~$ hadoop fs -ls /tmp/
Found 1 items
drwxrwxrwx   - aa supergroup          0 2017-05-06 23:27 /tmp/hive
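
The same check can be scripted, for example as a guard before starting Hive. A minimal sketch, assuming the column layout shown above (permissions, replication, owner, group, size, date, time, path); the helper names are hypothetical:

```python
def parse_ls_line(line: str) -> tuple:
    """Split one `hadoop fs -ls` entry into (permissions, path)."""
    # maxsplit=7 keeps a path that contains spaces in one piece
    fields = line.strip().split(None, 7)
    if len(fields) != 8:
        raise ValueError(f"unexpected `hadoop fs -ls` line: {line!r}")
    return fields[0], fields[7]


def others_can_write(permissions: str) -> bool:
    """True if the 'others' write bit is set, e.g. drwxrwxrwx or drwxrwxrwt."""
    # Layout: type char + owner rwx + group rwx + others rwx; an ACL may add "+"
    return len(permissions) >= 10 and permissions[8] == "w"


if __name__ == "__main__":
    sample = "drwxrwxrwx   - aa supergroup          0 2017-05-06 23:27 /tmp/hive"
    perms, path = parse_ls_line(sample)
    print(path, "writable by others:", others_can_write(perms))
```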

Running hive again still fails, this time because of a URI configuration problem:

aa@ubuntu:~$ hive
Logging initialized using configuration in jar:file:/home/aa/jike/apache-hive-1.2.2-bin/lib/hive-common-1.2.2.jar!/hive-log4j.properties
Exception in thread "main" java.lang.RuntimeException: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:677)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
Caused by: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D
    at org.apache.hadoop.fs.Path.initialize(Path.java:148)
    at org.apache.hadoop.fs.Path.<init>(Path.java:126)
    at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:563)
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:508)
    ... 7 more
Caused by: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D
    at java.net.URI.checkPath(URI.java:1823)
    at java.net.URI.<init>(URI.java:745)
    at org.apache.hadoop.fs.Path.initialize(Path.java:145)
    ... 10 more

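The error shows that the ${system:...} placeholders were passed through unexpanded. Create a hive directory under tmp, then in hive-site.xml replace every occurrence of ${system:java.io.tmpdir}/${system:user.name}, and every remaining ${system:java.io.tmpdir}, with /home/aa/jike/tmp/hive: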

aa@ubuntu:~$ mkdir -p /home/aa/jike/tmp/hive
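
Editing every occurrence by hand is easy to get wrong. A minimal sketch of the substitution in Python, assuming hive-site.xml is the file copied from the template in step 6 and the target is the directory just created; the longer placeholder is replaced first so that no `/${system:user.name}` suffix is left behind:

```python
import sys

TMP_DIR = "/home/aa/jike/tmp/hive"  # the directory created with mkdir above

# Longest placeholder first, so the user-name suffix is replaced together
# with the tmpdir instead of being left behind as "/.../${system:user.name}".
PLACEHOLDERS = (
    "${system:java.io.tmpdir}/${system:user.name}",
    "${system:java.io.tmpdir}",
)


def fix_tmpdir_placeholders(text: str, tmp_dir: str = TMP_DIR) -> str:
    """Replace the unresolved ${system:...} temp-dir placeholders with tmp_dir."""
    for placeholder in PLACEHOLDERS:
        text = text.replace(placeholder, tmp_dir)
    return text


def fix_file(path: str, tmp_dir: str = TMP_DIR) -> int:
    """Rewrite a config file in place; return how many placeholders were replaced."""
    with open(path, encoding="utf-8") as f:
        original = f.read()
    # Every replaced placeholder contains exactly one ${system:java.io.tmpdir}
    count = original.count("${system:java.io.tmpdir}")
    with open(path, "w", encoding="utf-8") as f:
        f.write(fix_tmpdir_placeholders(original, tmp_dir))
    return count


if __name__ == "__main__" and len(sys.argv) > 1:
    n = fix_file(sys.argv[1])
    print(f"replaced {n} placeholder(s) in {sys.argv[1]}")
```

Run it from the conf directory as, e.g., `python3 fix_tmpdir.py hive-site.xml` (the script name is arbitrary). The file is rewritten in place, so keep a copy first.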

Try again: hive now starts, which means embedded mode works.

aa@ubuntu:~$ hive
Logging initialized using configuration in jar:file:/home/simon/jike/apache-hive-1.2.2-bin/lib/hive-common-1.2.2.jar!/hive-log4j.properties
hive>

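Local metastore installation

This builds on the embedded-mode setup above; only the metastore connection is switched to MySQL.

Edit /home/aa/jike/apache-hive-1.2.2-bin/conf/hive-site.xml and change ConnectionURL, ConnectionDriverName, ConnectionUserName and ConnectionPassword: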

aa@ubuntu:~$ vim /home/aa/jike/apache-hive-1.2.2-bin/conf/hive-site.xml

## Change ConnectionURL
## Original
...
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:;databaseName=metastore_db;create=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
...

## Change to
...
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
...

## Change ConnectionDriverName
## Original
...
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>org.apache.derby.jdbc.EmbeddedDriver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
...

## Change to
...
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
...

## Change ConnectionUserName
## Original
...
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>APP</value>
    <description>Username to use against metastore database</description>
  </property>
...

## Change to
...
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value> <!-- the MySQL user created earlier -->
    <description>Username to use against metastore database</description>
  </property>
...

## Change ConnectionPassword
## Original
...
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>mine</value>
    <description>password to use against metastore database</description>
  </property>
...

## Change to
...
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive</value> <!-- password of that MySQL user -->
    <description>password to use against metastore database</description>
  </property>
...

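Because the template is long, the four changes above can also be applied with a script. A minimal sketch using Python's standard `xml.etree.ElementTree`, assuming the usual `<configuration>`/`<property>`/`<name>`/`<value>` layout and the account from step 2 (user and password `hive`); `set_properties` and `update_file` are hypothetical names. ElementTree drops XML comments when it rewrites the file and leaves `<description>` untouched, so keep a backup of the original.

```python
import sys
import xml.etree.ElementTree as ET

# Values used in this post (MySQL account from step 2); adjust as needed.
METASTORE_SETTINGS = {
    "javax.jdo.option.ConnectionURL":
        "jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true",
    "javax.jdo.option.ConnectionDriverName": "com.mysql.jdbc.Driver",
    "javax.jdo.option.ConnectionUserName": "hive",
    "javax.jdo.option.ConnectionPassword": "hive",
}


def set_properties(root: ET.Element, settings: dict) -> None:
    """Set <value> for each named <property>; append properties that are missing."""
    remaining = dict(settings)
    for prop in root.findall("property"):
        name = (prop.findtext("name") or "").strip()
        if name in remaining:
            value = prop.find("value")
            if value is None:
                value = ET.SubElement(prop, "value")
            value.text = remaining.pop(name)
    # Anything not found in the file is appended as a new <property>.
    for name, text in remaining.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = text


def update_file(path: str, settings: dict = METASTORE_SETTINGS) -> None:
    """Rewrite a hive-site.xml in place (XML comments are not preserved)."""
    tree = ET.parse(path)
    set_properties(tree.getroot(), settings)
    tree.write(path, encoding="UTF-8", xml_declaration=True)


if __name__ == "__main__" and len(sys.argv) > 1:
    update_file(sys.argv[1])
```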

Try again: hive now starts, which means the local metastore mode works.

aa@ubuntu:~$ hive
Logging initialized using configuration in jar:file:/home/simon/jike/apache-hive-1.2.2-bin/lib/hive-common-1.2.2.jar!/hive-log4j.properties
hive>

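Log in to MySQL as the hive user and switch to the hive database; the metastore tables Hive created are listed:

aa@ubuntu:~$ mysql -u hive -p
mysql> use hive;
mysql> show tables;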
+---------------------------+
| Tables_in_hive            |
+---------------------------+
| BUCKETING_COLS            |
| CDS                       |
| COLUMNS_V2                |
| DATABASE_PARAMS           |
| DBS                       |
| FUNCS                     |
| FUNC_RU                   |
| GLOBAL_PRIVS              |
| PARTITIONS                |
| PARTITION_KEYS            |
| PARTITION_KEY_VALS        |
| PARTITION_PARAMS          |
| PART_COL_STATS            |
| ROLES                     |
| SDS                       |
| SD_PARAMS                 |
| SEQUENCE_TABLE            |
| SERDES                    |
| SERDE_PARAMS              |
| SKEWED_COL_NAMES          |
| SKEWED_COL_VALUE_LOC_MAP  |
| SKEWED_STRING_LIST        |
| SKEWED_STRING_LIST_VALUES |
| SKEWED_VALUES             |
| SORT_COLS                 |
| TABLE_PARAMS              |
| TAB_COL_STATS             |
| TBLS                      |
| VERSION                   |
+---------------------------+
29 rows in set (0.00 sec)
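
To check the schema from a script rather than by eye, the `show tables` output can be parsed. A minimal sketch, assuming the mysql client's boxed output format shown above; the four tables it looks for (DBS, TBLS, SDS, VERSION) are a hand-picked subset of core metastore tables, not an official checklist:

```python
import sys

# A hand-picked subset of core metastore tables: DBS (databases),
# TBLS (tables), SDS (storage descriptors), VERSION (schema version).
CORE_TABLES = {"DBS", "TBLS", "SDS", "VERSION"}


def parse_show_tables(output: str) -> list:
    """Extract table names from the mysql client's boxed `show tables` output."""
    names = []
    for line in output.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip +----+ borders and the "N rows in set" footer
        cell = line.strip("|").strip()
        if cell.startswith("Tables_in_"):
            continue  # header row, e.g. "Tables_in_hive"
        names.append(cell)
    return names


def missing_core_tables(output: str) -> set:
    """Return the core metastore tables that are absent from the output."""
    return CORE_TABLES - set(parse_show_tables(output))


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as f:
        missing = missing_core_tables(f.read())
    print("core metastore tables present" if not missing
          else "missing: " + ", ".join(sorted(missing)))
```

For example, save the output with `mysql -u hive -p -t -e "show tables" hive > tables.txt` (`-t` keeps the boxed format even when output is redirected) and pass `tables.txt` to the script.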


2017-05-07 22:54 13 Tips