
Spark, HDFS Client Operations 2

news 2025/8/25 23:34:19

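1) Creating a Directory
In this section we use the Hadoop API to create a directory from code. The goal is to create a directory named maven under the root directory.

The API to use is fs.mkdirs.

The core code is as follows: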

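import java.io.IOException;
import java.net.URISyntaxException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public void testMkdirs() throws IOException, URISyntaxException, InterruptedException {

    // 1 Get the file system
    Configuration conf = new Configuration();
    conf.set("fs.defaultFS", "hdfs://hadoop100:8020"); // hadoop100 is the node running the NameNode
    conf.set("hadoop.job.ugi", "root"); // legacy setting; current Hadoop versions may ignore it
    FileSystem fs = FileSystem.get(conf);

    // 2 Create the directory
    fs.mkdirs(new Path("/maven"));

    // 3 Release resources
    fs.close();
}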

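If the call fails with a permission error, set the login user name before obtaining the file system:

System.setProperty("HADOOP_USER_NAME", "root");

Then run the code again and check in the HDFS web UI that the directory was created.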

2) Uploading a File to HDFS
Next, we upload a file to /maven. The API to use is copyFromLocalFile (the counterpart of the put shell command). The core code is as follows:

import java.net.URI;

public void testCopyFromLocalFile() throws IOException, InterruptedException, URISyntaxException {
    // 1 Get the file system
    Configuration configuration = new Configuration();
    FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:8020"), configuration, "root");

    // 2 Upload the file
    fs.copyFromLocalFile(new Path("d:/sunwukong.txt"), new Path("/maven"));

    // 3 Release resources
    fs.close();
}

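3) Setting the Replication Factor Dynamically (Parameter Priority)
By default, an uploaded file is stored as 3 replicas. This setting can be changed whenever needed.

Parameter priority, from highest to lowest: (1) values set in client code > (2) the server's custom configuration (xxx-site.xml) > (3) the server's default configuration (xxx-default.xml).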

Reference code:

Method 1: set the value in client code.

@Test
public void testCopyFromLocalFile() throws IOException, InterruptedException, URISyntaxException {
    // 1 Get the file system
    Configuration configuration = new Configuration();
    configuration.set("dfs.replication", "2"); // overrides the default of 3 for this client
    FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:8020"), configuration, "root");

    // 2 Upload the file
    fs.copyFromLocalFile(new Path("d:/sunwukong.txt"), new Path("/xiyou/huaguoshan"));

    // 3 Release resources
    fs.close();
}

Method 2: copy hdfs-site.xml into the project's resources directory.

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
</configuration>
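
The priority order described in this section can be modeled with plain JDK maps. The sketch below is only an illustration of "higher layers override lower layers"; it does not use Hadoop's Configuration class, and the class name ReplicationPriorityDemo, the resolve method, and the sample values are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

class ReplicationPriorityDemo {

    // Applies the layers from lowest to highest priority.
    // A later putAll overwrites any value set by an earlier layer.
    static String resolve(Map<String, String> serverDefault,
                          Map<String, String> serverSite,
                          Map<String, String> clientCode,
                          String key) {
        Map<String, String> effective = new HashMap<>();
        effective.putAll(serverDefault); // (3) xxx-default.xml
        effective.putAll(serverSite);    // (2) xxx-site.xml
        effective.putAll(clientCode);    // (1) values set in client code
        return effective.get(key);
    }

    public static void main(String[] args) {
        // Hypothetical values for each layer.
        Map<String, String> serverDefault = Map.of("dfs.replication", "3");
        Map<String, String> serverSite = Map.of("dfs.replication", "1");
        Map<String, String> clientCode = Map.of("dfs.replication", "2");

        // All three layers set the key: the client code value wins.
        System.out.println(resolve(serverDefault, serverSite, clientCode, "dfs.replication")); // 2
        // No client value: the site configuration wins.
        System.out.println(resolve(serverDefault, serverSite, Map.of(), "dfs.replication"));   // 1
        // Only the default is present.
        System.out.println(resolve(serverDefault, Map.of(), Map.of(), "dfs.replication"));     // 3
    }
}
```

In terms of the two methods above, Method 1 corresponds to putting a value in the clientCode layer.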

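4) Downloading a File from HDFS
Next, we look at how to download a file. This uses the copyToLocalFile API. The test code is as follows:

@Test
public void testCopyToLocalFile() throws IOException, InterruptedException, URISyntaxException {

    // 1 Get the file system
    Configuration configuration = new Configuration();
    FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:8020"), configuration, "root");

    // 2 Perform the download
    // boolean delSrc: whether to delete the source file
    // Path src: path of the file to download
    // Path dst: local path to download the file to
    // boolean useRawLocalFileSystem: true writes through the raw local file system (no local .crc checksum file);
    //                                false uses the checksummed local file system
    fs.copyToLocalFile(false, new Path("/xiyou/huaguoshan/sunwukong.txt"), new Path("d:/sunwukong2.txt"), true);

    // 3 Release resources
    fs.close();
}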

Note: if the code above fails to download the file, your machine may be missing some Microsoft runtime libraries; install them and try again.
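
The last argument of copyToLocalFile relates to checksum handling for the local copy. As a rough illustration of what checksum verification means, here is a minimal sketch using the JDK's java.util.zip.CRC32. It is not Hadoop's implementation; the class name ChecksumDemo and its methods are assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

class ChecksumDemo {

    // Computes a CRC32 checksum over the whole byte array.
    static long checksum(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    // Returns true when the data still matches the checksum recorded earlier.
    static boolean verify(byte[] data, long expected) {
        return checksum(data) == expected;
    }

    public static void main(String[] args) {
        byte[] content = "sunwukong".getBytes(StandardCharsets.UTF_8);
        long stored = checksum(content);             // recorded when the data is written

        System.out.println(verify(content, stored)); // true: data is intact

        content[0] ^= 1;                             // simulate a single flipped bit
        System.out.println(verify(content, stored)); // false: corruption is detected
    }
}
```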

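5) Deleting Files and Directories in HDFS
Now we look at how to delete files. The API to use is fs.delete, which deletes a file or directory in HDFS.

The basic steps are:

1. Get the file system.
2. Call fs.delete to delete the given file or directory.
3. Close the file system connection.
Reference code:

@Test
public void testDelete() throws IOException, InterruptedException, URISyntaxException {
    // 1 Get the file system
    Configuration configuration = new Configuration();
    FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:8020"), configuration, "root");

    // 2 Perform the delete; the second argument (true) deletes directories recursively
    fs.delete(new Path("/xiyou"), true);

    // 3 Release resources
    fs.close();
}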

6) Viewing File Details in HDFS
Next, we look at how to view a file's details, such as its name, permissions, length, and block information.

The API to use is fs.listFiles.

The basic steps are:

1. Get the file system.
2. Call fs.listFiles to get information about the files under the given directory.
3. Loop over the results with the returned iterator.
4. Close the file system connection.
Reference code:

import java.util.Arrays;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.RemoteIterator;

@Test
public void testListFiles() throws IOException, InterruptedException, URISyntaxException {

    // 1 Get the file system
    Configuration configuration = new Configuration();
    FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:8020"), configuration, "root");

    // 2 Get file details; true lists files recursively
    RemoteIterator<LocatedFileStatus> listFiles = fs.listFiles(new Path("/"), true);

    while (listFiles.hasNext()) {
        LocatedFileStatus fileStatus = listFiles.next();

        System.out.println("========" + fileStatus.getPath() + "=========");
        System.out.println(fileStatus.getPermission());
        System.out.println(fileStatus.getOwner());
        System.out.println(fileStatus.getGroup());
        System.out.println(fileStatus.getLen());
        System.out.println(fileStatus.getModificationTime());
        System.out.println(fileStatus.getReplication());
        System.out.println(fileStatus.getBlockSize());
        System.out.println(fileStatus.getPath().getName());

        // Get block information
        BlockLocation[] blockLocations = fileStatus.getBlockLocations();
        System.out.println(Arrays.toString(blockLocations));
    }

    // 3 Release resources
    fs.close();
}
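
The length and block size printed above determine how many blocks a file occupies. The following sketch shows that arithmetic in plain Java. The class name BlockCountDemo and the sample sizes (a 300 MB file with a 128 MB block size) are assumptions chosen for illustration, not values taken from the cluster in this article.

```java
class BlockCountDemo {

    // Number of blocks needed for a file: ceil(fileLength / blockSize).
    static long blockCount(long fileLength, long blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        return (fileLength + blockSize - 1) / blockSize;
    }

    // Length of the final block; only the last block may be shorter than blockSize.
    static long lastBlockLength(long fileLength, long blockSize) {
        if (fileLength == 0) {
            return 0;
        }
        long remainder = fileLength % blockSize;
        return remainder == 0 ? blockSize : remainder;
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024;  // 134217728 bytes (assumed block size)
        long fileLength = 300L * 1024 * 1024; // 314572800 bytes (hypothetical file)

        System.out.println(blockCount(fileLength, blockSize));      // 3
        System.out.println(lastBlockLength(fileLength, blockSize)); // 46137344 (44 MB)
    }
}
```

With these sample values, the array returned by getBlockLocations would be expected to contain three entries: two full blocks and one partial block.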

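7) Distinguishing Files from Directories in HDFS
Next, we look at how to tell whether each entry under a path is a file or a directory.

The API to use is fs.listStatus.

The basic steps are:

1. Get the file system.
2. Call fs.listStatus to get the status of each entry in the given directory.
3. Loop over the returned array and check each entry with isFile().
4. Close the file system connection.
Reference code:

import org.apache.hadoop.fs.FileStatus;

@Test
public void testListStatus() throws IOException, InterruptedException, URISyntaxException {
    // 1 Get the file system
    Configuration configuration = new Configuration();
    FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:8020"), configuration, "root");

    // 2 Check whether each entry is a file or a directory
    FileStatus[] listStatus = fs.listStatus(new Path("/"));

    for (FileStatus fileStatus : listStatus) {

        if (fileStatus.isFile()) {
            // The entry is a file
            System.out.println("f:" + fileStatus.getPath().getName());
        } else {
            // The entry is a directory
            System.out.println("d:" + fileStatus.getPath().getName());
        }
    }

    // 3 Release resources
    fs.close();
}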

8) Renaming and Moving Files in HDFS
Renaming and moving a file are essentially the same thing: both update the path through which the file is accessed. Both operations use the rename API.

The code is as follows:

@Test
public void testRename() throws IOException, InterruptedException, URISyntaxException {

    // 1 Get the file system
    Configuration configuration = new Configuration();
    FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:8020"), configuration, "root");

    // 2 Rename the file
    fs.rename(new Path("/xiyou/huaguoshan/sunwukong.txt"), new Path("/xiyou/huaguoshan/meihouwang.txt"));

    // 3 Release resources
    fs.close();
}

As before, after running the code, go to the web UI and check the result.
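
The point that renaming and moving are one operation can be tried without a cluster. The sketch below is a local file system analogy using java.nio.file.Files.move, not the HDFS API; the class name RenameVsMoveDemo and the directory name archive are hypothetical. Note that java.nio.file.Path here is a different class from org.apache.hadoop.fs.Path.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class RenameVsMoveDemo {

    // Returns {renameSucceeded, moveSucceeded}; both steps use the same call.
    static boolean[] demo() {
        try {
            Path root = Files.createTempDirectory("rename-demo");
            Path huaguoshan = Files.createDirectory(root.resolve("huaguoshan"));
            Path archive = Files.createDirectory(root.resolve("archive"));
            Path original = huaguoshan.resolve("sunwukong.txt");
            Files.write(original, "hello".getBytes(StandardCharsets.UTF_8));

            // Rename: the target has the same parent directory and a new name.
            Path renamed = Files.move(original, huaguoshan.resolve("meihouwang.txt"));
            boolean renameOk = Files.exists(renamed) && Files.notExists(original);

            // Move: the target has a different parent directory.
            Path moved = Files.move(renamed, archive.resolve("meihouwang.txt"));
            boolean moveOk = Files.exists(moved) && Files.notExists(renamed);

            // Clean up the temporary files and directories.
            Files.delete(moved);
            Files.delete(archive);
            Files.delete(huaguoshan);
            Files.delete(root);
            return new boolean[] {renameOk, moveOk};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] result = demo();
        System.out.println("rename ok: " + result[0] + ", move ok: " + result[1]);
    }
}
```

In both steps only the target path changes, which mirrors how fs.rename is used for either purpose.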

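9) HDFS Read and Write Flows
The sections above covered hands-on operations. Next, we look at the read and write flows in more detail.

1. Write flow

2. Read flow

Reading data is the same process as downloading.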
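————————————————

Copyright notice: this is an original article by the blogger, licensed under CC 4.0 BY-SA. Please include the original source link and this notice when reposting.

Original link: https://blog.csdn.net/2401_87076452/article/details/147380350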
