HBase Operations Summary (Big Data Technology)
HBase basic command reference (hands-on)
Start the shell: hbase shell
Method 1: operate HBase from the command-line shell
1. General commands
version   show version information
status    show current cluster status
whoami    show the logged-in user
help      show help
2. HBase DDL operations (object-level)
2.1 namespace (roughly equivalent to a database)
# 1. List all existing namespaces
list_namespace
---------------------------
NAMESPACE
default
hbase
hbase_test
test_hbase
4 row(s)
Took 0.0631 seconds
---------------------------
# 2. Create a namespace
create_namespace "test_hbase"
# 3. List the tables in a given namespace (database)
list_namespace_tables "test_hbase"
---------------------------
TABLE
0 row(s)
Took 0.0301 seconds
=> []
---------------------------
# 4. Describe a namespace
describe_namespace "test_hbase"
---------------------------
DESCRIPTION
{NAME => 'test_hbase'}
Quota is disabled
---------------------------
# 5. Drop a namespace (it must contain no tables)
drop_namespace "test_hbase"
2.2 Tables
# 1. List all tables
list
---------------------------
TABLE
hbase_test:student_info
1 row(s)
Took 0.0202 seconds
=> ["hbase_test:student_info"]
---------------------------
# 2. Check whether a table exists
exists "test_hbase:test_table"
---------------------------
Table test_hbase:test_table does exist
Took 0.0114 seconds
=> true
---------------------------
# 3. Create a table
1. Full form:
create "test_hbase:test_table",{NAME => 'base', BLOOMFILTER => 'ROW', IN_MEMORY => 'false', VERSIONS => '1', KEEP_DELETED_CELLS => 'TRUE', DATA_BLOCK_ENCODING => 'NONE', COMPRESSION => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'},{NAME => 'sources', BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'false', VERSIONS => '3', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', COMPRESSION => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '655360', REPLICATION_SCOPE => '0'}
Notes:
BLOOMFILTER takes one of three values => ROW, ROWCOL, NONE
ROW: Bloom filter on the row key only
ROWCOL: Bloom filter on the row key and column qualifier
NONE: no Bloom filter; the default value is ROW
TTL: the TTL value is in seconds
2. Short form: ✔
create "test_hbase:test_table","base","sources"
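The Bloom filter choice above decides which lookups can skip store files entirely. A toy Python sketch of the idea (a conceptual illustration, not HBase's actual implementation; the class and key names here are hypothetical):

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: lookups may return false positives,
    but never false negatives for keys that were added."""
    def __init__(self, size=1024, hashes=3):
        self.size, self.hashes, self.bits = size, hashes, 0

    def _positions(self, key):
        for i in range(self.hashes):
            h = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(h, 16) % self.size

    def add(self, key):
        for p in self._positions(key):
            self.bits |= 1 << p

    def might_contain(self, key):
        return all(self.bits >> p & 1 for p in self._positions(key))

bf = BloomFilter()
bf.add("row1")        # BLOOMFILTER => 'ROW' indexes only the row key
bf.add("row1:colA")   # 'ROWCOL' indexes row key + column qualifier
assert bf.might_contain("row1")
assert bf.might_contain("row1:colA")
```

Lookups of keys that were never added usually return False, which is what lets HBase skip reading a store file for a get.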
# 4. Describe a table
desc "test_hbase:test_table"
---------------------------
Table test_hbase:test_table is ENABLED
test_hbase:test_table
COLUMN FAMILIES DESCRIPTION
{NAME => 'base', BLOOMFILTER => 'ROW', IN_MEMORY => 'false', VERSIONS => '1', KEEP_DE
LETED_CELLS => 'TRUE', DATA_BLOCK_ENCODING => 'NONE', COMPRESSION => 'NONE', TTL => '
FOREVER', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATIO
N_SCOPE => '0'}
{NAME => 'sources', BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'false', VERSIONS => '3', K
EEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', COMPRESSION => 'NONE', T
TL => 'FOREVER', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '655360', RE
PLICATION_SCOPE => '0'}
---------------------------
# 5. Table status
is_enabled "test_hbase:test_table"   # is the table enabled?
is_disabled "test_hbase:test_table"  # is the table disabled?
enable "test_hbase:test_table"       # enable the table
disable "test_hbase:test_table"      # disable the table
# 6. Drop a table [only a disabled table can be dropped]
disable "test_hbase:test_table"
drop "test_hbase:test_table"
3. HBase DML operations (data-level)
# 1. Insert data => column-at-a-time [a single put writes only one cell]
Syntax: put "table","rowkey","family:qualifier","value"
Example: single-cell inserts
put "test_hbase:test_table","1","base:name","胡桃"
put "test_hbase:test_table","1","base:age",17
put "test_hbase:test_table","1","base:gender","女"
put "test_hbase:test_table","1","sources:English",82
put "test_hbase:test_table","1","sources:Math",90
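HBase stores every cell value as raw bytes, and the shell escapes non-ASCII bytes when printing. A small Python check of why "胡桃" appears as `\xE8\x83\xA1\xE6\xA1\x83` in the scan output:

```python
# HBase stores cell values as raw bytes; the shell prints non-ASCII
# bytes in \xNN escaped form, so "胡桃" scans as \xE8\x83\xA1\xE6\xA1\x83.
value = "胡桃".encode("utf-8")
assert value == b"\xe8\x83\xa1\xe6\xa1\x83"
escaped = "".join(f"\\x{b:02X}" for b in value)
assert escaped == "\\xE8\\x83\\xA1\\xE6\\xA1\\x83"
```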
# 2. Scan the whole table [full table scan]
scan "test_hbase:test_table"
---------------------------
ROW COLUMN+CELL
1 column=base:age, timestamp=2024-03-07T15:07:10.339, value=17
1 column=base:gender, timestamp=2024-03-07T15:07:14.510, value=\
xE5\xA5\xB3
1 column=base:name, timestamp=2024-03-07T15:07:06.009, value=\xE
8\x83\xA1\xE6\xA1\x83
1                    column=sources:English, timestamp=2024-03-07T15:07:17.987, val
ue=82
1                    column=sources:Math, timestamp=2024-03-07T15:07:21.874, value=
90
---------------------------
# 3. Count the records [rows] in a table
count "test_hbase:test_table"
---------------------------
1 row(s)
Took 0.0194 seconds
=> 1
---------------------------
# 4. Read values
4.1 Get one row
get "test_hbase:test_table","1"
---------------------------
COLUMN CELL
base:age timestamp=2024-03-07T15:36:03.061, value=17
base:gender timestamp=2024-03-07T15:36:03.115, value=\xE5\xA5\xB3
base:name timestamp=2024-03-07T15:36:03.001, value=\xE8\x83\xA1\xE6\xA1\
x83
sources:English timestamp=2024-03-07T15:36:03.156, value=82
sources:Math timestamp=2024-03-07T15:36:03.192, value=90
---------------------------
4.2 Get one column family of one row
get "test_hbase:test_table","1","sources"
---------------------------
COLUMN CELL
sources:English timestamp=2024-03-07T15:36:03.156, value=82
sources:Math timestamp=2024-03-07T15:36:03.192, value=90
---------------------------
4.3 Get one column of one column family in one row
get "test_hbase:test_table","1","sources:English"
---------------------------
COLUMN CELL
sources:English timestamp=2024-03-07T15:36:03.156, value=82
---------------------------
# 5. Delete data
5.1 Delete [a single cell]
deleteall | delete "test_hbase:test_table","1","base:name"
5.2 Delete [an entire row]
deleteall "test_hbase:test_table","2"
5.3 ROWPREFIXFILTER: bulk delete by row-key prefix; CACHE: batch size
deleteall "test_hbase:test_table",{ROWPREFIXFILTER=>"row-key prefix (timestamp or string)",CACHE=>100}
5.4 Delete [all data] in a table
disable "test_hbase:test_table"
truncate "test_hbase:test_table"
# 6. Counters
-- The first incr must target a column that does not yet exist; incr on an existing non-counter column fails with: Field is not a long, it's 10 bytes wide
-- Subsequent incr calls can then target that [newly created counter column]
6.1 Syntax
Increment: incr "[namespace:]table","rowkey","family:counter_column",N
Read:      get_counter "[namespace:]table","rowkey","family:counter_column"
6.2 Example
scan "test_hbase:test_table"
---------------------------
ROW COLUMN+CELL
1 column=base:age, timestamp=2024-03-07T15:36:03.061, value=17
1 column=base:gender, timestamp=2024-03-07T15:36:03.115, value=\
xE5\xA5\xB3
1 column=base:name, timestamp=2024-03-07T15:36:03.001, value=\xE
8\x83\xA1\xE6\xA1\x83
1 column=sources:English, timestamp=2024-03-07T15:36:03.156, val
ue=82
1 column=sources:Math, timestamp=2024-03-07T15:36:03.192, value=
90
---------------------------
incr "test_hbase:test_table","1","sources:count",2
---------------------------
ROW COLUMN+CELL
1 column=base:age, timestamp=2024-03-07T15:36:03.061, value=17
1 column=base:gender, timestamp=2024-03-07T15:36:03.115, value=\
xE5\xA5\xB3
1 column=base:name, timestamp=2024-03-07T15:36:03.001, value=\xE
8\x83\xA1\xE6\xA1\x83
1 column=sources:English, timestamp=2024-03-07T15:36:03.156, val
ue=82
1 column=sources:Math, timestamp=2024-03-07T15:36:03.192, value=
90
1 column=sources:count, timestamp=2024-03-11T20:01:16.651, value
=\x00\x00\x00\x00\x00\x00\x00\x02
---------------------------
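The counter value 2 scans as `\x00\x00\x00\x00\x00\x00\x00\x02` because incr stores counters as 8-byte big-endian signed longs; this also explains the "Field is not a long" error when incr hits a column whose value has a different width. A quick Python check of the encoding:

```python
import struct

# incr stores counters as 8-byte big-endian signed longs, which is why
# a counter of 2 scans as \x00\x00\x00\x00\x00\x00\x00\x02, and why a
# value that is not exactly 8 bytes wide cannot be incremented.
raw = struct.pack(">q", 2)
assert raw == b"\x00\x00\x00\x00\x00\x00\x00\x02"
assert struct.unpack(">q", raw)[0] == 2
```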
# 7. Pre-splitting regions (an HBase optimization)
7.1 Pre-split at table creation
Strategy 1: [NUMREGIONS: number of regions; SPLITALGO: the split algorithm]
create "test_hbase:test_split","t1","t2",{NUMREGIONS=>3,SPLITALGO=>"UniformSplit"}
Strategy 2: [SPLITS: explicit row-key split points (letters or digits)]
### Resulting key ranges (lexicographic): < "100", "100"~"200", "200"~"300", >= "300"
create "test_hbase:test_rowkey_split","cf1","cf2",SPLITS=>["100","200","300"]
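A row key is routed to a region by byte-wise (lexicographic) comparison against the split points, which a Python sketch with `bisect` can mimic (the `region_for` helper is hypothetical, for illustration only):

```python
import bisect

# With SPLITS => ["100","200","300"] the table has 4 regions; a row key
# is routed by lexicographic comparison against the split points.
splits = ["100", "200", "300"]

def region_for(rowkey):
    # region 0: keys < "100"; region 1: "100" <= key < "200"; etc.
    return bisect.bisect_right(splits, rowkey)

assert region_for("050") == 0
assert region_for("100") == 1
assert region_for("1000") == 1   # lexicographic, not numeric!
assert region_for("250") == 2
assert region_for("999") == 3
```

The `"1000"` case shows why split points must be chosen for the byte order of the keys, not their numeric value.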
7.2 Inspect the regions
scan "hbase:meta",{STARTROW=>"test_hbase:test_rowkey_split",LIMIT=>10}
---------------------------
# HDFS storage layout (one directory per region)
#drwxr-xr-x root supergroup 0 B Mar 11 20:31 0 0 B .tabledesc
#drwxr-xr-x root supergroup 0 B Mar 11 20:31 0 0 B .tmp
#drwxr-xr-x root supergroup 0 B Mar 11 20:31 0 0 B 28c38ce5ff401333122c00c05e521ae3
#drwxr-xr-x root supergroup 0 B Mar 11 20:31 0 0 B 4493f765702cc8979678f14cbcff17ff
#drwxr-xr-x root supergroup 0 B Mar 11 20:31 0 0 B 540c8c1f386356cab11f824e74d33fad
#drwxr-xr-x root supergroup 0 B Mar 11 20:31 0 0 B 867157c4f6ab39ba52ac6b3b58e6cbf4
---------------------------
4. TOOLS
## Minor compaction: merge several small store files into a larger one
1. compact "[namespace:]table"
## Major compaction: merge all store files into a single large file
2. major_compact "[namespace:]table"
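At its core a compaction is a k-way merge of sorted store files (HFiles). A minimal Python sketch of that merge step (the sample rows are hypothetical; real compactions also drop deleted and expired cells, which is not modeled here):

```python
import heapq

# Store files keep cells sorted by key, so compaction can stream-merge
# them; major compaction merges all files of a region into one.
hfile1 = [("row1", "a"), ("row3", "c")]
hfile2 = [("row2", "b"), ("row4", "d")]
merged = list(heapq.merge(hfile1, hfile2))
assert merged == [("row1", "a"), ("row2", "b"), ("row3", "c"), ("row4", "d")]
```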
Method 2: operate HBase via Hive (map HBase data into Hive)
1. Import data into HBase
## General format
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
-Dimporttsv.separator="<separator>" \
-Dimporttsv.columns="HBASE_ROW_KEY,family:qualifier..." \
"namespace:table" \
<file path>
## Example (run from the OS shell, not inside hbase shell)
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
-Dimporttsv.separator="|" \
-Dimporttsv.columns=HBASE_ROW_KEY,base:name,base:age,sources:English,sources:Math \
test_hbase:test_table \
file:///root/file/hbase_file/students_for_import_2.csv
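ImportTsv expects each input line to carry the fields in the order of `-Dimporttsv.columns`, with the row key first. A Python sketch of generating such a '|'-separated file (the sample rows here are made up for illustration, not taken from the source data):

```python
import csv, io

# Each line: rowkey|name|age|English|Math, matching
# -Dimporttsv.columns=HBASE_ROW_KEY,base:name,base:age,sources:English,sources:Math
rows = [
    ("1", "Alice", "17", "82", "90"),   # hypothetical sample data
    ("2", "Bob",   "18", "75", "88"),
]
buf = io.StringIO()
writer = csv.writer(buf, delimiter="|", lineterminator="\n")
writer.writerows(rows)
assert buf.getvalue().splitlines()[0] == "1|Alice|17|82|90"
```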
2. Map an HBase table to a Hive table (run in Hive)
# Create a Hive table over the HBase data [maps the HBase data into Hive]
create external table yb12211.student_from_hbase(
stu_id int,
stu_name string,
stu_age int,
score_English int,
score_Math int
)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
with serdeproperties("hbase.columns.mapping"=":key,base:name,base:age,sources:English,sources:Math")
tblproperties("hbase.table.name"="test_hbase:test_table");
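The serde pairs Hive columns with the `hbase.columns.mapping` entries by position, and `:key` binds to the HBase row key. A small Python check of that positional pairing:

```python
# Hive columns pair positionally with hbase.columns.mapping entries;
# ":key" maps the first Hive column to the HBase row key.
hive_cols = ["stu_id", "stu_name", "stu_age", "score_English", "score_Math"]
mapping = ":key,base:name,base:age,sources:English,sources:Math".split(",")
pairs = dict(zip(hive_cols, mapping))
assert pairs["stu_id"] == ":key"
assert pairs["score_Math"] == "sources:Math"
```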
Method 3: operate HBase from Java (data migration)
1. Use cases
Java operates HBase through HBase's native API.
The core use case is data migration:
1. Use the native HBase API together with the Java JDBC API to import data from a traditional relational database (MySQL) into HBase.
2. Use file streams to import data from plain files into HBase.
2. Initial setup
2.1 Create the Maven project
Create the Maven project from the quickstart archetype
2.2 Initial configuration
Step 1: remove the url element
Step 2: properties
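Use case 1 boils down to: read rows over a SQL connection, then emit one put per column. A Python sketch of that flow (sqlite3 stands in for MySQL/JDBC here; the table and data are hypothetical):

```python
import sqlite3

# Read rows from a relational source and turn each column into one
# (rowkey, "family:qualifier", value) put, mirroring the shell puts above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id TEXT, name TEXT, age INT)")
conn.execute("INSERT INTO student VALUES ('1', 'Alice', 17)")

puts = []
for sid, name, age in conn.execute("SELECT id, name, age FROM student"):
    puts.append((sid, "base:name", name))
    puts.append((sid, "base:age", str(age)))

# Each tuple corresponds to: put "namespace:table", rowkey, column, value
assert puts[0] == ("1", "base:name", "Alice")
assert len(puts) == 2
```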
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<maven.compiler.source>1.8</maven.compiler.source>
<maven.compiler.target>1.8</maven.compiler.target>
</properties>
Step 3: basic check, make sure every Java version setting is 1.8 (Java 8)
Step 4: dependencies (replace the defaults)
<!-- MySQL driver -->
<dependency>
<groupId>mysql</groupId>
<artifactId>mysql-connector-java</artifactId>
<version>8.0.29</version>
</dependency>
<!-- HBase client -->
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-client</artifactId>
<version>2.3.5</version>
</dependency>
<!-- Hadoop -->
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>3.1.3</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-auth</artifactId>
<version>3.1.3</version>
</dependency>
<!-- zookeeper -->
<dependency>
<groupId>org.apache.zookeeper</groupId>
<artifactId>zookeeper</artifactId>
<version>3.6.3</version>
</dependency>
<!-- log4j logging -->
<dependency>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
<version>1.2.17</version>
</dependency>
<!-- JSON tool -->
<dependency>
<groupId>com.alibaba</groupId>
<artifactId>fastjson</artifactId>
<version>2.0.47</version>
</dependency>
3. Passing program arguments (verification)
Run-configuration setup: program arguments
Step 1: click the green hammer (build), then open the Edit Configurations option
Step 2: fill in the configuration