
Python: File Operations, StringIO and BytesIO, Path Operations, the shutil Module, csv and ini, Serialization and Deserialization, Using argparse (Study Notes)

ds 2025/7/13 17:39:53

Preface

These are study notes I have owed myself for four years, dedicated to my future self.

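File Operations

The von Neumann Architecture

The CPU consists of an arithmetic unit and a control unit.
Arithmetic unit: performs arithmetic operations, logical operations, data transfer, and other data processing.
Control unit: controls program execution.
Memory: stores programs and data, for example RAM.
Input devices: feed data or programs into the computer, for example a keyboard or mouse.
Output devices: present data or processing results to the user, for example a monitor or printer.

When people say "IO operations" without qualification, they usually mean file IO. Network IO is normally named explicitly.

Common File IO Operations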

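Opening a file

open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)

Opens a file and returns a file object (a stream object). If the file cannot be opened, an exception is raised.

Basic usage:

Create a file named test, open it, and close it when done.

f = open("test")  # file object
# windows <_io.TextIOWrapper name='test' mode='r' encoding='cp936'>
# linux <_io.TextIOWrapper name='test' mode='r' encoding='UTF-8'>
print(f.read())  # read the file
f.close()  # close the file

Reading and writing are the most common file operations.

A file can be accessed in one of two modes: text mode or binary mode. The available operations and their results differ between the two.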

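Parameters of open

file

The name of the file to open or create. If no directory is given, the current directory is used.

mode

The example above shows that the default is text mode, opened read-only.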

# r mode
f = open('test')  # read-only or write-only?
f.read()
# f.write('abc')
# Traceback (most recent call last):
#   File "/Users/quyixiao/pp/python_lesson/function1/function15.py", line 7, in <module>
#     f.write('abc')
#     ~~~~~~~^^^^^^^
# io.UnsupportedOperation: not writable
f.close()

f = open('test', 'r')  # read-only
# f.write('abc')
# Traceback (most recent call last):
#   File "/Users/quyixiao/pp/python_lesson/function1/function15.py", line 17, in <module>
#     f.write('abc')
#     ~~~~~~~^^^^^^^
# io.UnsupportedOperation: not writable
f.close()

# f = open('test1', 'r')  # read-only; raises FileNotFoundError if test1 does not exist

# w mode
f = open('test', 'w')  # write-only
f.write('abc')
f.close()
# >>> cat test  # check the content

f = open('test', mode='w')
f.close()
# >>> cat test  # check the content (now empty)

f = open('test1', mode='w')
f.write('123')
f.close()
# >>> cat test1  # check the content

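r

open defaults to read-only mode r and opens an existing file.

If a file is opened read-only, calling write raises an exception.

If the file does not exist, FileNotFoundError is raised.

w

Opens the file write-only; reading from it raises an exception.

If the file does not exist, it is created.

If the file exists, its content is truncated.

f = open('test2', 'x')
# f.read()  # not readable: x is write-only
f.write('abcd')
f.close()
f = open('test2', 'x')  # raises on the second open
Traceback (most recent call last):
  File "/Users/quyixiao/pp/python_lesson/function1/function15.py", line 9, in <module>
    f = open('test2', 'x')  #
FileExistsError: [Errno 17] File exists: 'test2'

x
If the file does not exist, it is created and opened write-only.
If the file exists, FileExistsError is raised.

a

If the file exists, it is opened write-only and writes are appended.

If the file does not exist, it is created, opened write-only, and writes are appended.

r is read-only; w, x, and a are write-only.

w, x, and a can all produce a new file. w always yields a file with entirely new content, whether or not the file existed. a always appends at the end of the file, whether or not it existed. x requires that the file does not exist beforehand and creates it itself.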
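The rules above can be checked in a scratch directory. This is a minimal sketch: the helper try_open, the file name demo.txt, and the temporary directory are illustrative names, not part of the original notes.

```python
import os
import tempfile


def try_open(path, mode):
    """Open path with the given mode; return 'ok' or the exception class name."""
    try:
        with open(path, mode):
            return 'ok'
    except OSError as e:
        return type(e).__name__


workdir = tempfile.mkdtemp()                 # scratch directory for the demo
target = os.path.join(workdir, 'demo.txt')   # hypothetical file name

results = {
    'r_missing': try_open(target, 'r'),      # r needs an existing file
    'x_new': try_open(target, 'x'),          # x creates the file
    'x_again': try_open(target, 'x'),        # x refuses an existing file
}

with open(target, 'w') as f:                 # w truncates, then writes
    f.write('abc')
with open(target, 'a') as f:                 # a appends at EOF
    f.write('def')
with open(target) as f:
    content_after_append = f.read()

with open(target, 'w'):                      # reopening with w empties the file
    pass
size_after_w = os.path.getsize(target)

print(results, content_after_append, size_after_w)
```

Wrapping open in a small helper keeps the three failure cases side by side, which makes the difference between r, w, x, and a easier to compare than separate tracebacks.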

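Text mode t

A character stream: the bytes of the file are interpreted with some character encoding and handled as characters. The default mode of open is rt.

Binary mode b

A byte stream: the file is treated as plain bytes, independent of any character encoding. In binary mode, data is read and written as bytes objects.

f = open('test3', 'rb')  # binary, read-only
s = f.read()
print(type(s))  # <class 'bytes'>
print(s)        # b'' if the file is empty
f.close()  # close the file

f = open("test3", 'wb')  # IO object
s = f.write("马哥教育".encode())
print('~' * 30)
print(s)  # number of bytes written: 12
f.close()

Output:
<class 'bytes'>
b'\xe9\xa9\xac\xe5\x93\xa5\xe6\x95\x99\xe8\x82\xb2'
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
12

+

Adds the missing read or write capability to r, w, a, and x. The file object is still obtained according to the rules of r, w, a, or x itself.

+ cannot be used on its own. Think of it as an enhancement to the mode character in front of it.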

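File pointer

The examples above already imply that a file object keeps a pointer.

The file pointer marks the current byte position.

mode=r: the pointer starts at 0

mode=a: the pointer starts at EOF

tell() returns the current pointer position

seek(offset[, whence])

Moves the file pointer. offset is the number of bytes to move; whence is the reference point.

In text mode

whence 0 (default): from the start of the file; offset must be a non-negative integer
whence 1: from the current position; offset must be 0
whence 2: from EOF; offset must be 0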

# text mode
f = open('test4', 'r+')
f.tell()  # start
f.read()
f.tell()  # EOF
f.seek(0)  # start
f.read()
f.seek(2, 0)
f.read()
f.seek(2, 0)
# f.seek(2, 1)  # offset must be 0
# Traceback (most recent call last):
#   File "/Users/quyixiao/pp/python_lesson/function1/function15.py", line 20, in <module>
#     f.seek(2, 1)  # offset必须 0
#     ~~~~~~^^^^^^
# io.UnsupportedOperation: can't do nonzero cur-relative seeks
# f.seek(2, 2)  # offset must be 0
# Traceback (most recent call last):
#   File "/Users/quyixiao/pp/python_lesson/function1/function15.py", line 27, in <module>
#     f.seek(2, 2)  # offset 必须为 0
#     ~~~~~~^^^^^^
# io.UnsupportedOperation: can't do nonzero end-relative seeks
f.close()

# Chinese text
f = open('test4', 'w+')
f.write('马哥教育')
f.tell()
f.close()

f = open('test4', 'r+')
f.read(3)
f.seek(1)
f.tell()
f.read()  # with a multi-byte encoding such as UTF-8 this raises UnicodeDecodeError: offset 1 is inside a character
# f.seek(2) has the same problem in UTF-8; f.seek(3) lands on a character boundary
f.close()

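Text mode only supports offsets measured forward from the start of the file.

whence=1 means relative to the current position, but only offset 0 is accepted. That leaves the pointer where it is, so it is of little use.

whence=2 means relative to EOF, and only offset 0 is accepted, which moves the pointer to EOF.

seek counts in bytes, not characters.

In binary mode

whence 0 (default): from the start of the file; offset must be a non-negative integer

whence 1: from the current position; offset may be positive or negative

whence 2: from EOF; offset may be positive or negative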

# binary mode
f = open('test4', 'rb+')
f.tell()  # start
f.read()
f.tell()  # EOF
f.write(b'abc')
f.seek(0)  # start
f.seek(2, 1)  # from the current position, forward 2
f.read()
f.seek(-2, 1)  # from the current position, back 2
f.seek(2, 2)  # from EOF, forward 2
f.seek(0)
f.seek(-2, 2)  # from EOF, back 2
f.read()
# f.seek(-20, 2)  # OSError
# Traceback (most recent call last):
#   File "/Users/quyixiao/pp/python_lesson/function1/function15.py", line 29, in <module>
#     f.seek(-20, 2)  # OSError
#     ~~~~~~^^^^^^^^
# OSError: [Errno 22] Invalid argument
f.close()

Binary mode supports offsets from any reference point: the start, the end, or the current position.

Seeking forward past the end of the file is allowed. Seeking backward past the start is not and raises an exception.
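The binary-mode rules can be verified on a ten-byte file. A minimal sketch, assuming a temporary file created with tempfile; the variable names are illustrative.

```python
import io
import os
import tempfile

fd, path = tempfile.mkstemp()      # scratch file for the demo
os.close(fd)
with open(path, 'wb') as f:
    f.write(b'0123456789')

with open(path, 'rb') as f:
    f.seek(2)                      # whence=0: absolute position 2
    a = f.read(3)                  # bytes 2..4, pointer is now at 5
    f.seek(-2, io.SEEK_CUR)        # whence=1: back 2 from the current position
    b = f.read(2)                  # bytes 3..4
    f.seek(-3, io.SEEK_END)        # whence=2: 3 bytes before EOF
    c = f.read()                   # the last 3 bytes
    f.seek(5, io.SEEK_END)         # seeking past EOF is allowed
    pos_past_eof = f.tell()
    d = f.read()                   # nothing to read there
    try:
        f.seek(-20, io.SEEK_END)   # before the start of the file
        before_start = 'ok'
    except OSError:
        before_start = 'OSError'

with open(path) as f:              # text mode rejects nonzero relative seeks
    try:
        f.seek(2, io.SEEK_CUR)
        text_relative = 'ok'
    except io.UnsupportedOperation:
        text_relative = 'UnsupportedOperation'

os.remove(path)
print(a, b, c, pos_past_eof, d, before_start, text_relative)
```

io.SEEK_CUR and io.SEEK_END are the named forms of whence values 1 and 2; they read better than bare integers in real code.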

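buffering: the buffer

-1 selects the default buffer size. In binary mode this is io.DEFAULT_BUFFER_SIZE, typically 4096 or 8192 bytes. In text mode, a terminal device is line-buffered; anything else follows the binary-mode policy.

0: binary mode only; disables buffering
1: text mode only; selects line buffering, which flushes whenever a newline is written
greater than 1: sets the buffer size in bytes

buffer

A buffer is a region of memory, generally a FIFO queue. Data is flushed to disk only when the buffer is full or a threshold is reached.

flush() writes the buffered data to disk

close() calls flush() before closing

io.DEFAULT_BUFFER_SIZE is the default buffer size in bytes

Binary mode first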

import io

f = open('test4', 'w+b')
print(io.DEFAULT_BUFFER_SIZE)
f.write("magedu.com".encode())
# cat test4
f.seek(0)
# cat test4
f.write("www.magedu.com".encode())
f.flush()
f.close()

f = open('test4', 'w+b', 4)
f.write(b"mag")
# cat test4
f.write(b'edu')
# cat test4
f.close()

Output:
8192

Text mode

import io

# buffering=1: line buffering
f = open('test4', 'w+', 1)
f.write("mag")  # cat test4
f.write("magedu" * 4)  # cat test4
f.write('\n')  # cat test4
f.write('Hello\nPython')  # cat test4
f.close()

# buffering>1: use a buffer of the given size
f = open('test4', 'w+', 15)
f.write("mag")  # cat test4
f.write('edu')  # cat test4
f.write('Hello\n')  # cat test4
f.write('\nPython')  # cat test4
f.write('a' * (io.DEFAULT_BUFFER_SIZE - 28))  # a value greater than 1 makes little difference here
f.write('\nwww.magedu.com/python')
f.close()

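buffering=0

This is a special case available only in binary mode. No in-memory buffer is used, so each write goes straight to the file, which can be thought of as a FIFO.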

f = open('test4', 'wb+', 0)
f.write(b"m")  # cat test4
f.write(b"a")  # cat test4
f.write(b"g")  # cat test4
f.write(b"magedu" * 4)  # cat test4
f.write(b'\n')  # cat test4
f.write(b'Hello\nPython')
f.close()

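This looks complicated, but in general it is enough to remember:

In text mode, use the default buffer size.
In binary mode, data is handled byte by byte, and a buffer size can be specified.
The default buffer size is usually a good choice. Do not change it unless you know exactly why.
In everyday code, when you know the data must reach the disk, call flush explicitly rather than waiting for an automatic flush or for close.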
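The effect of the buffer can be observed by checking the file size on disk before and after flush. This is a small sketch under the assumption that the written data is far smaller than the default buffer, which holds on typical platforms; the temporary file is illustrative.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()      # scratch file for the demo
os.close(fd)

f = open(path, 'wb')               # default buffering
f.write(b'magedu')                 # 6 bytes, far below the buffer size
size_buffered = os.path.getsize(path)    # data is still in the buffer
f.flush()
size_flushed = os.path.getsize(path)     # now it is in the file
f.close()

f = open(path, 'wb', buffering=0)  # unbuffered binary file
f.write(b'abc')                    # goes to the file immediately
size_unbuffered = os.path.getsize(path)
f.close()

os.remove(path)
print(size_buffered, size_flushed, size_unbuffered)
```

This is the programmatic equivalent of running cat test4 after each write in the examples above.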

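encoding: the character encoding, text mode only

None selects the default encoding, which depends on the operating system. Try the following code on Windows and Linux.

f = open('test1', 'w')
f.write('啊')
f.close()

On Windows the default is GBK (0xB0A1); on Linux the default is UTF-8 (0xE5958A).

Other parameters

errors

Controls how encoding errors are handled.

None and strict mean that an encoding error raises ValueError; ignore means errors are ignored.

newline

Controls newline translation in text mode. It can be None, '' (the empty string), '\r', '\n', or '\r\n'.

When reading: None means '\r', '\n', and '\r\n' are all translated to '\n'. '' means universal newlines are recognized but not translated. Any other legal value means that only the given string ends a line, and lines are split on it.

When writing: None means every '\n' is replaced with the system default line separator os.linesep. '\n' or '' means '\n' is left as is. Any other legal value means '\n' is replaced with the given string.

f = open('testxx', 'w')
f.write('python\rwww.python.org\nwww.magedu.com\r\npython3')
f.close()

newlines = [None, '', '\n', '\r\n']
for nl in newlines:
    f = open('testxx', 'r+', newline=nl)  # None (the default) translates all line endings
    print(f.readlines())
    f.close()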
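The reading rules can be made concrete by writing mixed line endings without translation and reading them back with each newline value. A minimal sketch; the temporary file and the dictionary name lines are illustrative.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()      # scratch file for the demo
os.close(fd)

# newline='' on write keeps '\n' untranslated, so the bytes are the same on every OS
with open(path, 'w', newline='') as f:
    f.write('a\rb\nc\r\nd')

lines = {}
for nl in (None, '', '\n', '\r\n'):
    with open(path, newline=nl) as f:
        lines[nl] = f.readlines()

os.remove(path)
for nl, result in lines.items():
    print(repr(nl), result)
```

Writing with newline='' matters for the experiment: with the default None, Windows would turn each '\n' into '\r\n' and the file content would differ between platforms.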

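closefd

Controls whether the underlying file descriptor is closed. True closes it. False keeps the descriptor open after the file object is closed, which is only allowed when file was given as a descriptor rather than a name. Use fileobj.fileno() to see the descriptor.

read

read(size=-1)

size is the number of characters (text mode) or bytes (binary mode) to read. A negative value or None reads to EOF.

f = open('test4', 'r+')  # text mode cannot be unbuffered, so buffering=0 is not allowed here
f.write("magedu")
f.write('\n')
f.write('马哥教育')
f.seek(0)
f.read(7)
f.close()

# binary
f = open('test5', 'rb+')
f.read(7)
f.read(1)
f.close()

Reading lines

readline(size=-1)
Reads the file one line at a time. size limits how many characters or bytes of the line are read in one call.
readlines(hint=-1)
Returns a list of all lines. If hint is given, reading stops once the total size of the lines read so far exceeds hint characters or bytes.

# iterate line by line
f = open('test')  # the file object is iterable
for line in f:
    print(line)
f.close()

write

write(s) writes the string s to the file and returns the number of characters written.
writelines(lines) writes a list of strings to the file.

f = open('test', 'w+')
lines = ['abc', '123\n', 'magedu']  # writelines adds no line separators; supply them yourself
f.writelines(lines)
f.seek(0)
print(f.read())
f.close()

close

Flushes and closes the file object.

Closing a file that is already closed has no effect.

Other

seekable(): whether the file supports seek
readable(): whether the file is readable
writable(): whether the file is writable
closed: whether the file has been closed (an attribute, not a method)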

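Context management

Where the problem comes from

On Linux, run:

# keep the references in a list so the files stay open
lst = []
for _ in range(2000):
    lst.append(open('test'))
# OSError: [Errno 24] Too many open files: 'test'
print(len(lst))

lsof lists open files. If it is missing, install it with # yum install lsof

$ lsof -p 1427 | grep test | wc -l

lsof -p takes a process ID.

ulimit -a shows all limits. "open files" is the limit on the number of open files, 1024 by default.

for x in lst:
    x.close()

Close all the files in one pass and new files can be opened again. Check lsof once more.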

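How to solve it?

1. Exception handling

Catch the exception when it occurs. However, many operations can raise OSError, so it is hard to tell whether a given exception was caused by the resource limit.

f = open('test')
try:
    f.write("abc")  # the file is read-only, so the write fails
finally:
    f.close()  # this is what guarantees the close

Using finally guarantees that an opened file gets closed.

2. Context management

A special syntax that hands the release of the file object over to the interpreter.

with open('test') as f:
    f.write("abc")  # the file is read-only, so the write fails

# check whether f is closed
f.closed  # f is still in scope here

Notes on context management:

Use the with ... as keywords.
The with block does not open a new scope.
When the with block finishes, the file object is closed automatically.

Another way to write it:

f1 = open('test')
with f1:
    f1.write("abc")  # the file is read-only, so the write fails

# check whether f1 is closed
f1.closed  # f1 is still in scope here

IO objects that behave like file objects should generally be closed or released when no longer needed, to free their resources.

Opening an IO object acquires a file descriptor. Computer resources are finite, so every operating system imposes limits. The limits protect the machine from being drained completely, because resources are shared rather than exclusive.

In general, unless you know the resource situation precisely, do not raise the limits to work around the problem.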
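The claim that with closes the file even when the block fails can be checked directly. A minimal sketch, using a temporary file as a stand-in for test.

```python
import io
import os
import tempfile

fd, path = tempfile.mkstemp()      # scratch file standing in for 'test'
os.close(fd)

error = None
try:
    with open(path) as f:          # opened read-only
        f.write('abc')             # fails: the file is not writable
except io.UnsupportedOperation as e:
    error = str(e)

# The with block opened no new scope, so f is still visible here,
# and the file was closed on the way out even though an exception was raised.
closed_after_with = f.closed

os.remove(path)
print(error, closed_after_with)
```

This is why the with form is usually preferred over try/finally: the close happens on every exit path without any extra code.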

Exercises

Given a source file, copy it to a target location.

For example, copy /tmp/test.txt to /tmp/test1.txt.

filename1 = 'test.txt'
filename2 = 'test1.txt'

f = open(filename1, 'w+')
lines = ['abc', '123', 'magedu']
f.writelines('\n'.join(lines))
f.seek(0)
print(f.read())
f.close()

def copy(src, dest):
    with open(src) as f1:
        with open(dest, 'w') as f2:
            f2.write(f1.read())

copy(filename1, filename2)

Given a file, count its words case-insensitively and show the 10 most frequent words.

d = {}
with open('sample', encoding='utf8') as f:
    for line in f:
        words = line.split()
        for word in map(str.lower, words):
            d[word] = d.get(word, 0) + 1

print(sorted(d.items(), key=lambda item: item[1], reverse=True)[:10])  # keep only the top 10
# Output: [('c', 3), ('iewie', 1), ('a', 1), ('b', 1), ('f', 1)]

When the sample is the os.path section of the help documentation, the word path should appear many times.

for k in d.keys():
    if k.find('path') > -1:
        print(k)

The code above shows that many keys contain path. For example, os.path.exists(path) can be regarded as containing path twice.

def makekey(s: str):
    chars = set(r"""!'"#./\()[],*-""")
    key = s.lower()
    ret = []
    for i, c in enumerate(key):
        if c in chars:
            ret.append(' ')  # replace special characters with a space
        else:
            ret.append(c)
    return ''.join(ret).split()

d = {}
with open('sample', encoding='utf8') as f:
    for line in f:
        words = line.split()
        for wordlist in map(makekey, words):
            for word in wordlist:
                d[word] = d.get(word, 0) + 1

for k, v in sorted(d.items(), key=lambda item: item[1], reverse=True):
    print(k, v)

# After this extra processing of the words, the counts are:
# c 3
# iewie 1
# a 1
# b 1
# f 1

Another way to split the key

def makekey(s: str):
    chars = set(r"""!'"#./\()[],*-""")
    key = s.lower()
    ret = []
    start = 0
    for i, c in enumerate(key):
        if c in chars:
            if start == i:  # a special character directly after another: start equals i
                start += 1  # advance by 1 and continue
                continue
            ret.append(key[start:i])
            start = i + 1  # add 1 to skip the unwanted special character c
    else:
        if start < len(key):  # valid characters remain, running to the end
            ret.append(key[start:])
    return ret

print(makekey('os.path.exists(path)'))
print(makekey('os.path.-exists(path))'))
print(makekey('path.os...'))
print(makekey('path'))
print(makekey('path-p'))
print(makekey('***...'))
print(makekey(''))

Output:
['os', 'path', 'exists', 'path']
['os', 'path', 'exists', 'path']
['path', 'os']
['path']
['path', 'p']
[]
[]

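StringIO and BytesIO

StringIO

A class in the io module
from io import StringIO
It allocates a text-mode buffer in memory that can be used like a file object.
When close is called, the buffer is released.

StringIO operations

getvalue() returns the entire content, regardless of the file pointer.

from io import StringIO

# build it in memory
sio = StringIO()  # use it like a file object
print(sio.readable(), sio.writable(), sio.seekable())
sio.write("magedu\nPython")
sio.seek(0)
print(sio.readline())
print(sio.getvalue())  # ignores the pointer and returns everything
sio.close()

Benefits of StringIO

Disk operations are generally much slower than memory operations. When memory is sufficient, a common optimization is to avoid writing to disk and reduce disk IO, which can improve performance considerably.

BytesIO

A class in the io module
from io import BytesIO
It allocates a binary-mode buffer in memory that can be used like a file object.
When close is called, the buffer is released.

BytesIO operations

from io import BytesIO

# build it in memory
bio = BytesIO()
print(bio.readable(), bio.writable(), bio.seekable())  # True True True
bio.write(b" magedu\nPython")
bio.seek(0)
print(bio.readline())  # b' magedu\n'
print(bio.getvalue())  # ignores the pointer and returns everything: b' magedu\nPython'
bio.close()

File-like objects

A file-like object can be used the same way as a file object.
Socket objects and the standard input and output objects (stdin, stdout) are file-like objects.

from sys import stdout

f = stdout
print(type(f))  # <class '_io.TextIOWrapper'>
f.write('magedu.com')  # magedu.com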
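Because these objects share the file interface, a function written against a file object also accepts an in-memory buffer. A minimal sketch; count_lines is a hypothetical helper, not something from the original notes.

```python
import io


def count_lines(fileobj):
    """Count lines in any file-like object that supports iteration."""
    return sum(1 for _ in fileobj)


# A StringIO can stand in for a text file, for example in tests
text_buf = io.StringIO('magedu\nPython\n')
n_text = count_lines(text_buf)

# A BytesIO holds bytes; encoding and decoding stay explicit
bytes_buf = io.BytesIO()
bytes_buf.write('马哥'.encode('utf-8'))    # two characters, three bytes each in UTF-8
raw = bytes_buf.getvalue()                 # whole content, pointer ignored
bytes_buf.seek(0)
first_char = bytes_buf.read(3).decode('utf-8')

print(n_text, len(raw), first_char)
```

This is a common reason to reach for StringIO or BytesIO: code that expects a file can be exercised without touching the disk.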

Path Operations

Path modules

Before version 3.4

The os.path module

from os import path

p = path.join('/etc', 'sysconfig', 'network')
print(type(p), p)       # <class 'str'> /etc/sysconfig/network
print(path.exists(p))   # False
print(path.split(p))    # ('/etc/sysconfig', 'network')
print(path.abspath('.'))  # /Users/quyixiao/pp/python_lesson/function1

p = path.join('o:/', p, 'test.txt')  # p is absolute, so the leading 'o:/' is discarded on POSIX
print('*' * 30)
print(path.dirname(p))      # /etc/sysconfig/network
print(path.basename(p))     # test.txt
print(path.splitdrive(p))   # ('', '/etc/sysconfig/network/test.txt')

p1 = path.abspath(__file__)
print(p1, path.basename(p1))  # /Users/quyixiao/pp/python_lesson/function1/function15.py function15.py

while p1 != path.dirname(p1):
    p1 = path.dirname(p1)
    print('---', p1, path.basename(p1))
# --- /Users/quyixiao/pp/python_lesson/function1 function1
# --- /Users/quyixiao/pp/python_lesson python_lesson
# --- /Users/quyixiao/pp pp
# --- /Users/quyixiao quyixiao
# --- /Users Users
# --- /

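From version 3.4 on

The pathlib module is recommended. It provides Path objects for working with both directories and files.

The pathlib module

from pathlib import Path

Directory operations

Initialization

from pathlib import Path

p = Path()  # the current directory
p = Path('a', 'b', 'c/d')  # a/b/c/d under the current directory
print(p)  # a/b/c/d
p = Path('/etc')  # the etc directory under the root
print(p)  # /etc

Joining and splitting paths

The / operator

Path object / Path object

Path object / string, or string / Path object

Splitting

The parts attribute returns every component of the path.

joinpath

joinpath(*other) joins several strings onto a Path object.

from pathlib import Path

p = Path()
p = p / 'a'
p1 = 'b' / p
p2 = Path('c')
p3 = p2 / p1
print(p3.parts)  # ('c', 'b', 'a')
p3.joinpath('etc', 'init.d', Path('httpd'))

Getting the path

str returns the path as a string

bytes returns the path string as bytes

from pathlib import Path

p = Path('/etc')
print(str(p), bytes(p))  # /etc b'/etc'

Parent directories

parent is the logical parent directory
parents is the sequence of ancestors; index 0 is the direct parent

from pathlib import Path

p = Path('/a/b/c/d')
print(p.parent.parent)
for x in p.parents:
    print(x)
# Output
# /a/b
# /a/b/c
# /a/b
# /a
# /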

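name, stem, suffix, suffixes, with_suffix(suffix), with_name(name)

name: the last component of the path
suffix: the extension of the last component

stem: the last component without its suffix

suffixes: a list of all extensions

with_suffix(suffix): returns a new path with the extension replaced; if the path has no extension, the suffix is appended
with_name(name): returns a new path with the last component replaced

from pathlib import Path

p = Path('/magedu/mysqlinstall/mysql.tar.gz')
print(p.name)      # mysql.tar.gz
print(p.suffix)    # .gz
print(p.suffixes)  # ['.tar', '.gz']
print(p.stem)      # mysql.tar
print(p.with_name('mysql-5.tgz'))  # /magedu/mysqlinstall/mysql-5.tgz
p = Path('README')
print(p.with_suffix('.txt'))  # README.txt

cwd(): returns the current working directory
home(): returns the current user's home directory
is_dir(): whether the path is a directory; True if it exists and is a directory
is_file(): whether the path is a regular file; True if it exists and is a file
is_symlink(): whether the path is a symbolic link
is_socket(): whether the path is a socket file
is_block_device(): whether the path is a block device
is_char_device(): whether the path is a character device
is_absolute(): whether the path is absolute
resolve(): returns a new path that is the absolute path of the current Path object; symbolic links are resolved
absolute(): also returns an absolute path, but resolve() is recommended
exists(): whether the directory or file exists
rmdir(): removes an empty directory; there is no method for testing whether a directory is empty
touch(mode=0o666, exist_ok=True): creates a file
as_uri(): returns the path as a URI, for example 'file:///etc/passwd'
mkdir(mode=0o777, parents=False, exist_ok=False)
parents: whether to create parent directories. True is equivalent to mkdir -p. With False, a missing parent raises FileNotFoundError.
exist_ok: added in version 3.5. With False, an existing path raises FileExistsError. With True, FileExistsError is suppressed.
iterdir(): iterates over the entries of the directory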

from pathlib import Path

p = Path()
p /= 'a/b/c/d'
print(p.exists())  # False as long as a/b/c/d has not been created yet
# p.mkdir()  # FileNotFoundError: the parent directories do not exist
p.mkdir(parents=True)
p.exists()  # True
# p.mkdir(parents=True)  # FileExistsError
p.mkdir(parents=True, exist_ok=True)
p /= 'readme.txt'
p.parent.rmdir()  # removes the empty directory a/b/c/d
p.parent.exists()  # False
# p.mkdir()  # FileNotFoundError
p.mkdir(parents=True)  # succeeds

# Iterate, determine each entry's type, and for a directory check whether it is empty
for x in p.parents[len(p.parents) - 1].iterdir():
    print(x, end='\t')
    if x.is_dir():
        flag = False
        for _ in x.iterdir():
            flag = True
            break
        # could a for...else clause be used here?
        print('dir', 'Not Empty' if flag else 'Empty', sep='\t')
    elif x.is_file():
        print('file')
    else:
        print('other file')

Wildcards

glob(pattern): matches the given pattern

rglob(pattern): matches the given pattern, recursing into directories

Both return a generator.

from pathlib import Path

p = Path()
print(list(p.glob('test*')))  # entries in the current directory whose names start with test
# [PosixPath('test1.txt'), PosixPath('test'), PosixPath('test1'), PosixPath('testxx'), PosixPath('test.txt'), PosixPath('test3'), PosixPath('test2')]
print(list(p.glob('**/*.py')))  # recurse into all directories, equivalent to rglob
# [PosixPath('function16.py'), PosixPath('function8.py'), PosixPath('function12.py'), PosixPath('function9.py'), PosixPath('function13.py'), PosixPath('function17.py'), PosixPath('function6.py'), PosixPath('function2.py'), PosixPath('function3.py'), PosixPath('__init__.py'), PosixPath('function7.py'), PosixPath('function4.py'), PosixPath('function5.py'), PosixPath('function1.py'), PosixPath('function10.py'), PosixPath('function15.py'), PosixPath('function11.py')]
g = p.rglob('*.py')  # generator
print(next(g))  # function16.py

Matching

match(pattern)

Pattern matching; returns True on success.

from pathlib import Path

print(Path('a/b.py').match('*.py'))  # True
Path('/a/b/c.py').match('b/*.py')  # True
Path('/a/b/c.py').match('a/*.py')  # False
Path('/a/b/c.py').match('a/*/*.py')  # True
Path('/a/b/c.py').match('a/**/*.py')  # True
Path('/a/b/c.py').match('**/*.py')  # True

stat() is the equivalent of the stat command

lstat() is the same as stat(), except that for a symbolic link it reports the link itself

# $ ln -s test t
from pathlib import Path

p = Path('a')
p.stat()
p1 = Path('t')
p1.stat()
p1.lstat()

File operations

open(mode='r', buffering=-1, encoding=None, errors=None, newline=None)
Works like the built-in open function and returns a file object.

Functions added in 3.5

read_bytes()

Reads the file at the path in 'rb' mode and returns its content as bytes.

read_text(encoding=None, errors=None)
Reads the file at the path in 'rt' mode and returns its text.

Path.write_bytes(data)

Writes data to the file at the path in 'wb' mode.

write_text(data, encoding=None, errors=None)
Writes a string to the file at the path in 'wt' mode.

from pathlib import Path

p = Path('my_binary_file')
p.write_bytes(b'Binary file contents')
p.read_bytes()  # b'Binary file contents'

p = Path('my_text_file')
p.write_text('Text file contents')
p.read_text()  # 'Text file contents'

from pathlib import Path

p = Path('test.py')
p.write_text('hello python')
print(p.read_text())
with p.open() as f:
    print(f.read(5))

Output:
hello python
hello
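The pathlib pieces above fit together naturally in a throwaway directory. A minimal sketch; the directory layout and the file name app.tar.gz are made up for illustration.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    archive = base / 'a' / 'b' / 'app.tar.gz'            # the / operator joins components
    archive.parent.mkdir(parents=True, exist_ok=True)    # like mkdir -p
    archive.write_text('hello python', encoding='utf-8')

    info = {
        'name': archive.name,
        'stem': archive.stem,
        'suffixes': archive.suffixes,
        'renamed': archive.with_suffix('.zip').name,     # replaces only the last suffix
        'found': sorted(p.name for p in base.rglob('*.gz')),
        'text': archive.read_text(encoding='utf-8'),
        'is_file': archive.is_file(),
    }

print(info)
```

Passing encoding explicitly to write_text and read_text avoids depending on the platform default discussed in the encoding section.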

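The os Module

Operating system platform

os.listdir('temp') returns a list of the directory's entries.

os also has open, read, write, and similar functions, but they are low level. Prefer the built-in open function and the read and write methods of the file object, which are used in a similar way.

$ ln -s test t1 creates a symbolic link

os.stat(path, *, dir_fd=None, follow_symlinks=True) essentially calls the Linux stat system call.

path: the path as a string or bytes, or an open file descriptor

follow_symlinks: True returns information about the target file. With False, if the path is a symbolic link, information about the link itself is returned.

import os

print(os.stat('test'))
# os.stat_result(st_mode=33188, st_ino=35292987, st_dev=16777230, st_nlink=1, st_uid=501, st_gid=20, st_size=3, st_atime=1751974762, st_mtime=1751974760, st_ctime=1751974760)
# st_mode = 33206 => 0o100666
print(os.stat('test'))
# os.stat_result(st_mode=33204, st_ino=3407875, st_dev=64768, st_nlink=1, st_uid=500, st_gid=500, st_size=3,
#               st_atime=1508690220, st_mtime=1508690177, st_ctime=1508690177)

os.chmod(path, mode, *, dir_fd=None, follow_symlinks=True)

import os

os.chmod('test', 0o777)

os.chown(path, uid, gid)

Changes the owner and group of a file; sufficient privileges are required.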

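The shutil Module

So far

File copying has been done by opening two file objects, reading from the source, and writing to the target. This loses the stat metadata (permissions and so on), because none of it is ever copied.

And what about copying a directory?

Python provides a convenient library for this: shutil (high-level file operations).

copy

copyfileobj(fsrc, fdst[, length])

Copies between file objects. fsrc and fdst are file objects opened with open, and the content is copied. fdst must be writable.

length is the size of the buffer.

import shutil
import os

with open('test', 'r+') as f1:
    f1.write('abcd\n1234')
    f1.flush()
    with open('test1', 'w+') as f2:
        shutil.copyfileobj(f1, f2)  # Is the content copied? Why not, and how can it be fixed?
        # Nothing is copied: after write() the pointer of f1 is at EOF. Call f1.seek(0) first.

print(os.stat('test'))
print(os.stat('test1'))
# Output: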
# os.stat_result(st_mode=33188, st_ino=35292987, st_dev=16777230, st_nlink=1, st_uid=501, st_gid=20, st_size=9, st_atime=1752020998, st_mtime=1752020998, st_ctime=1752020998)
# os.stat_result(st_mode=33188, st_ino=35293203, st_dev=16777230, st_nlink=1, st_uid=501, st_gid=20, st_size=0, st_atime=1752020952, st_mtime=1752020998, st_ctime=1752020998)

copyfile(src, dst, *, follow_symlinks=True)

Copies the file content without metadata. src and dst are path strings.
Internally it relies on copyfileobj, so it is a binary copy of the content with no metadata.

copymode(src, dst, *, follow_symlinks=True) copies only the permissions.

import shutil
import os

shutil.copymode('test1', 'test')
print(os.stat('test1'))
print(os.stat('test'))
# Output:
# os.stat_result(st_mode=33188, st_ino=35293203, st_dev=16777230, st_nlink=1, st_uid=501, st_gid=20, st_size=0, st_atime=1752020998, st_mtime=1752020998, st_ctime=1752020998)
# os.stat_result(st_mode=33188, st_ino=35292987, st_dev=16777230, st_nlink=1, st_uid=501, st_gid=20, st_size=9, st_atime=1752020999, st_mtime=1752020998, st_ctime=1752021117)

copystat(src, dst, *, follow_symlinks=True) copies the metadata; stat includes the permissions.

copy(src, dst, *, follow_symlinks=True)

Copies the file content and permissions, but not the other metadata such as creation and modification times. Internally it calls

copyfile(src, dst, follow_symlinks=follow_symlinks)

copymode(src, dst, follow_symlinks=follow_symlinks)

copy2 does what copy does and additionally copies all metadata, as far as the platform supports it.

Internally it calls

copyfile(src, dst, follow_symlinks=follow_symlinks)

copystat(src, dst, follow_symlinks=follow_symlinks)

copytree(src, dst, symlinks=False, ignore=None, copy_function=copy2, ignore_dangling_symlinks=False)
Copies a directory recursively. copy2 is used by default, so more metadata is carried over.

src and dst must be directories. src must exist and dst must not exist.

ignore=func takes a callable(src, names) -> ignored_names. The function you supply is called for each directory: src is the source directory, and names is the result of os.listdir(src), that is, the names inside src. The return value is a set of the names to skip.

import shutil
import os

# run in o:/temp, which contains directory a (the target b must not exist yet)
def ignore(src, names):
    ig = filter(lambda x: x.startswith('x'), names)
    return set(ig)

shutil.copytree('a', 'b', ignore=ignore)

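rm: delete

shutil.rmtree(path, ignore_errors=False, onerror=None)

Deletes a directory tree recursively. It is as dangerous as rm -rf, so use it with care.

It is not atomic. If an error occurs partway through, the operation stops, and whatever has already been deleted stays deleted.

If ignore_errors is true, errors are ignored. If it is False or omitted, onerror takes effect.

onerror is a callable that receives three arguments: function, path, and excinfo.

import shutil

shutil.rmtree('b')  # like rm -rf

move

move(src, dst, copy_function=copy2) recursively moves a file or directory to the destination and returns the destination.

It uses os.rename when possible.

If rename is not possible (for example, across file systems), a directory is first copied with copytree and then the source is removed.

copy2 is the default copy function.

import shutil
import os

os.rename('o:/t.txt', 'o:/temp/t')
os.rename('test3', '/tmp/py/test300')

shutil can also create archives: it builds tar files and compresses them, and supports zip, gz, bz, and xz.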
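A short round trip through copytree, make_archive, and rmtree shows how these functions combine. This is a sketch in a temporary directory; the names src, dst, backup, and the x* pattern are illustrative, and shutil.ignore_patterns is used here as a ready-made alternative to a hand-written ignore function.

```python
import os
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp, 'src')
    (src / 'sub').mkdir(parents=True)
    (src / 'keep.txt').write_text('data')
    (src / 'sub' / 'x_skip.txt').write_text('skip me')

    dst = Path(tmp, 'dst')                               # must not exist yet
    # ignore_patterns builds an ignore callable from glob-style patterns
    shutil.copytree(src, dst, ignore=shutil.ignore_patterns('x*'))
    copied = sorted(p.name for p in dst.rglob('*'))

    # Pack the copy into backup.zip; make_archive returns the archive path
    archive = shutil.make_archive(str(Path(tmp, 'backup')), 'zip', root_dir=dst)
    archive_name = os.path.basename(archive)

    shutil.rmtree(dst)                                   # recursive delete
    dst_exists = dst.exists()

print(copied, archive_name, dst_exists)
```

Working inside TemporaryDirectory keeps the rmtree call harmless, which is a sensible habit when experimenting with recursive deletion.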

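CSV Files

Introduction to CSV

See RFC 4180

http://www.ietf.org/rfc/rfc4180.txt

CSV stands for Comma-Separated Values.

A CSV file is a text file divided into rows and columns by a row separator and a column separator.

CSV does not specify a character encoding.

The row separator is \r\n; the last row may omit it.

The column separator is usually a comma or a tab.

Each row is called a record.

A field may or may not be enclosed in double quotes. A field that contains a double quote, a comma, or a line break must be enclosed in double quotes. A double quote inside a field is escaped by writing two double quotes.

The header is optional; it only needs to line up with the field columns.

Creating a CSV file by hand

from pathlib import Path

p = Path('a/test.csv')
parent = p.parent
if not parent.exists():
    parent.mkdir(parents=True)

csv_body = '''\
id,name,age,comment
1,zs,18,"I'm 18"
2,ls,20,"this is a ""test"" string."
3,ww,23,"你好计算机
"
'''
p.write_text(csv_body)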

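The csv module

reader(csvfile, dialect='excel', **fmtparams)

Returns a reader object, which is an iterator over rows.

delimiter: the column separator, a comma by default

lineterminator: the row separator, \r\n by default

quotechar: the character used to quote fields, a double quote by default

doublequote: controls how a quotechar that appears inside a field is handled. The default is True, which writes it as two quote characters. False prefixes it with the escape character instead.

escapechar: an escape character; the default is None.

quoting: sets the quoting rule. QUOTE_ALL quotes every field; QUOTE_MINIMAL quotes only fields containing special characters;

QUOTE_NONNUMERIC quotes non-numeric fields; QUOTE_NONE never quotes.

writer(csvfile, dialect='excel', **fmtparams) returns a writer object.

Its main methods are writerow and writerows.

writerow(iterable)

import csv
from pathlib import Path

p = Path('a/test.csv')
with open(str(p), newline='') as f:  # newline='' is recommended for files used with csv
    reader = csv.reader(f)
    print(next(reader))
    print(next(reader))

rows = [
    [4, 'tom', 22, 'tom'],
    (5, 'jerry', 24, 'jerry'),
    (6, 'justin', 22, 'just\t"in'),
    "abcdefghi",
    ((1,), (2,))
]
row = rows[0]
print(row)
with open(str(p), 'a', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(row)
    writer.writerows(rows)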
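The quoting rules are easiest to see in a round trip through an in-memory buffer. A minimal sketch; the rows are made-up data, and StringIO replaces the file on disk.

```python
import csv
import io

rows = [
    ['id', 'name', 'comment'],
    [1, 'zs', "I'm 18"],
    [2, 'ls', 'this is a "test" string.'],   # embedded double quotes
    [3, 'ww', 'two\nlines, one field'],      # embedded newline and comma
]

buf = io.StringIO(newline='')        # newline='' leaves line endings untouched
csv.writer(buf).writerows(rows)      # default dialect: excel, QUOTE_MINIMAL
text = buf.getvalue()

buf.seek(0)
parsed = list(csv.reader(buf))       # every value comes back as a string

print(text)
print(parsed)
```

Only the fields that need it are quoted, the embedded quotes are doubled, and the reader undoes both. Numbers are not restored: the reader returns '1', not 1.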

Handling ini Files

The ini format is a popular choice for configuration files.

[DEFAULT]
a = test

[mysql]
default-character-set=utf8

[mysqld]
datadir =/dbserver/data
port = 33060
character-set-server=utf8
sql_mode=NO_ENGINE_SUBSTITUTION, STRICT_TRANS_TABLES

The part in square brackets is called a section.

Inside each section are key=value pairs; the key is called an option.

Note that DEFAULT is the name of the default section and must be written in upper case.

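configparser

The ConfigParser class of the configparser module is used to work with ini files.

A section can be treated as a key whose value is a dictionary of key-value pairs, so an ini file can be treated as a nested dictionary. The dictionary type used by default preserves insertion order.

read(filenames, encoding=None)

Reads ini files. The argument can be a single file or a list of files. The file encoding can be specified.

sections() returns the list of sections. The default section is not included.

add_section(section_name) adds a section.

has_section(section_name) tests whether the section exists.

options(section) returns all options of the section, including the options of the default section.

has_option(section, option) tests whether the section has the option.

get(section, option, *, raw=False, vars=None[, fallback])

Gets the value of the option in the given section. If the option is not found there, the DEFAULT section is searched.

getint(section, option, *, raw=False, vars=None[, fallback])

getfloat(section, option, *, raw=False, vars=None[, fallback])

getboolean(section, option, *, raw=False, vars=None[, fallback])

These three methods work like get but return the value converted to the given type.

items(raw=False, vars=None)

items(section, raw=False, vars=None)

Without a section, returns every section name together with its section object. With a section, returns that section's key-value pairs as two-element tuples.

set(section, option, value)

If the section exists, writes option=value. Both option and value must be strings.

remove_section(section)

Removes the section and all of its options.

remove_option(section, option) removes the option from the section.

write(fileobject, space_around_delimiters=True)

Writes the entire current configuration to fileobject, usually a file opened with open in w mode.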

from configparser import ConfigParser

filename = 'test.ini'
newfilename = 'mysql.ini'

cfg = ConfigParser()
cfg.read(filename)
print(cfg.sections())
print(cfg.has_section('client'))

print(cfg.items('mysqld'))
for k, v in cfg.items():
    print(k, type(v))
    print(k, cfg.items(k))

tmp = cfg.get('mysqld', 'port')
print(type(tmp), tmp)
print(cfg.get('mysqld', 'a'))
# print(cfg.get('mysqld', 'magedu'))
print(cfg.get('mysqld', 'magedu', fallback='python'))

tmp = cfg.getint('mysqld', 'port')
print(type(tmp), tmp)

if cfg.has_section('test'):
    cfg.remove_section('test')

cfg.add_section('test')
cfg.set('test', 'test1', '1')
cfg.set('test', 'test2', '2')

with open(newfilename, 'w') as f:
    cfg.write(f)

print(cfg.getint('test', 'test2'))

cfg.remove_option('test', 'test2')

# dictionary-style access is simpler
cfg['test']['x'] = '100'  # the key does not exist yet
cfg['test2'] = {'test2': '1000'}  # the section does not exist yet

print('x' in cfg['test'])
print('x' in cfg['test2'])

print(cfg._dict)  # the dictionary type used internally (an ordered dictionary)

# write again after modifying
with open(newfilename, 'w') as f:
    cfg.write(f)
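The DEFAULT fallback and the typed getters can be tried without any file on disk by parsing a string. A minimal sketch; the ini text is a shortened, made-up version of the example above, and read_string is used in place of read.

```python
from configparser import ConfigParser

ini_text = """
[DEFAULT]
a = test

[mysqld]
port = 33060
datadir = /dbserver/data
"""

cfg = ConfigParser()
cfg.read_string(ini_text)                  # parse from a string instead of a file

sections = cfg.sections()                  # DEFAULT is not listed
port = cfg.getint('mysqld', 'port')        # converted to int
inherited = cfg.get('mysqld', 'a')         # not in mysqld, found in DEFAULT
missing = cfg.get('mysqld', 'magedu', fallback='python')

# Flatten into plain dictionaries, e.g. before dumping to JSON
as_dict = {s: dict(cfg.items(s)) for s in cfg.sections()}

print(sections, port, inherited, missing)
print(as_dict)
```

Note that the DEFAULT options show up inside every section when it is flattened, which is worth knowing before converting a configuration to another format.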

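Serialization and Deserialization

Why serialize

How can dictionaries, lists, sets, and other in-memory objects be saved to a file?

How can an instance of a user-defined class be saved to a file?

How can data be read back from a file and turned into instances of the right classes in memory again?

This requires a protocol: a set of rules for saving in-memory data to a file. A file is a sequence of bytes, so the data must be converted to a byte sequence and written out. That is serialization. Restoring in-memory objects from the byte sequence of a file is deserialization.

Definitions

serialization

Stores an in-memory object by turning it into a sequence of bytes. -> binary

deserialization

Restores an in-memory object from the bytes of a file. <- binary

Saving serialized data to a file is persistence.

Serialized data can be persisted or sent over a network. Byte sequences read from a file or received from a network can be deserialized.

Python provides the pickle library.

The pickle library

The serialization and deserialization module in Python.

dumps: serializes an object to a bytes object

dump: serializes an object to a file object, that is, stores it in a file

loads: deserializes from a bytes object

load: deserializes an object by reading from a file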

import pickle

# serialize to a file and deserialize from it
filename = 'test'

d = {'a': 1, 'b': 'abc', 'c': [1, 2, 3]}
l = list('123')
i = 99

with open(filename, 'wb') as f:
    pickle.dump(d, f)
    pickle.dump(l, f)
    pickle.dump(i, f)

with open(filename, 'rb') as f:
    print(f.read(), f.seek(0))
    for _ in range(3):
        x = pickle.load(f)
        print(type(x), x)
# Output
# b'\x80\x04\x95#\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x01a\x94K\x01\x8c\x01b\x94\x8c\x03abc\x94\x8c\x01c\x94]\x94(K\x01K\x02K\x03eu.\x80\x04\x95\x11\x00\x00\x00\x00\x00\x00\x00]\x94(\x8c\x011\x94\x8c\x012\x94\x8c\x013\x94e.\x80\x04Kc.' 0
# <class 'dict'> {'a': 1, 'b': 'abc', 'c': [1, 2, 3]}
# <class 'list'> ['1', '2', '3']
# <class 'int'> 99

import pickle

# define a class
class AAA:
    def __init__(self):
        self.tttt = 'abc'

# create an instance of AAA
a1 = AAA()

# serialize
ser = pickle.dumps(a1)
print('ser={}'.format(ser))

# deserialize
a2 = pickle.loads(ser)
print(a2, type(a2))
print(a2.tttt)
print(id(a1), id(a2))
# Output:
# ser=b'\x80\x04\x95(\x00\x00\x00\x00\x00\x00\x00\x8c\x08__main__\x94\x8c\x03AAA\x94\x93\x94)\x81\x94}\x94\x8c\x04tttt\x94\x8c\x03abc\x94sb.'
# <__main__.AAA object at 0x102eb9a90> <class '__main__.AAA'>
# abc
# 4345922288 4343962256

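The output shows that, besides the class name AAA, which must be saved, tttt and abc were serialized as well. These are the instance's own attributes and differ from one object to the next, so they have to be part of the serialized data.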

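A serialization and deserialization experiment

Define the class AAA and serialize an instance to a file.

import pickle

# experiment
class AAA:
    def __init__(self):
        self.tttt = 'abc'

aaa = AAA()
sr = pickle.dumps(aaa)
print(len(sr))

file = 'test'
with open(file, 'wb') as f:
    pickle.dump(aaa, f)

Send the resulting serialized file to another node.

Add a file x.py with the following content, then run the script with $ python x.py

import pickle

with open('test', 'rb') as f:
    a = pickle.load(f)  # raises
# Traceback (most recent call last):
#   File "/Users/quyixiao/pp/python_lesson/function1/function15.py", line 4, in <module>
#     a = pickle.load(f)  # 异常
# AttributeError: Can't get attribute 'AAA' on <module '__main__' from '/Users/quyixiao/pp/python_lesson/function1/function15.py'>

This raises AttributeError: Can't get attribute 'AAA' on <module '__main__' from 't.py'>.

The exception means that the definition of class AAA cannot be found. Adding the class definition fixes it.

Deserialization succeeds only if the definition of AAA can be found. Otherwise an exception is raised.

One way to think about it: during deserialization the class is the mold and the byte sequence is the molten iron.

import pickle

class AAA:
    def show(self):
        print('xyz')

with open('test', 'rb') as f:
    a = pickle.load(f)
    print(a)
    a.show()
# Output:
# <__main__.AAA object at 0x10149c2f0>
# xyz

Here class AAA is defined, and the code above runs successfully.

Note: this definition of AAA is completely different from the original one.

Serialization and deserialization must therefore use the same class definitions, or the results are unpredictable.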

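Where serialization is used

Purely local serialization is relatively rare. Most use cases involve network transmission.

Data is serialized and sent over the network to a remote node, where a service deserializes the received data and can then use it.

One point needs attention: the receiving side must have the corresponding data types when deserializing, or an error is raised. Custom classes in particular must have a consistent definition on the remote side.

Most projects today are neither single-machine nor single-service. Data has to be sent over the network to other nodes, which involves a great deal of serialization and deserialization.

Between Python programs, pickle can handle serialization and deserialization. Across platforms, languages, and protocols, pickle is not a good fit and a common protocol is needed, for example XML, JSON, or Protocol Buffers.

Different protocols differ in efficiency and learning curve and suit different scenarios. Choose based on an analysis of the situation at hand.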

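JSON

JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is based on a subset of ECMAScript (the JavaScript standard published by Ecma International) and uses a text format that is completely independent of any programming language to store and represent data.

http://json.org/

JSON data types

Values

value

A double-quoted string, a number, true and false, null, an object, and an array are all values.

Strings

Any combination of characters enclosed in double quotes; escape sequences are allowed.

Numbers

Positive or negative; integers and floating-point numbers.

Objects

An unordered collection of key-value pairs.

Format: {key1: value1, ..., keyn: valuen}

A key must be a string enclosed in double quotes.

A value can be any legal value.

Example

{
  "person": [
    {"name": "tom", "age": 18},
    {"name": "jerry", "age": 16}
  ],
  "total": 2
}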

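The json module

Python and JSON

Python supports conversion between a small number of built-in data types and JSON types.

Common methods

import json

d = {'name': 'Tom', 'age': 20, 'interest': ['music', 'movie']}
j = json.dumps(d)
print(j)  # note how the quotes change: {"name": "Tom", "age": 20, "interest": ["music", "movie"]}

d1 = json.loads(j)
print(d1)  # {'name': 'Tom', 'age': 20, 'interest': ['music', 'movie']}

JSON-encoded data is rarely written to disk; it is usually sent over the network. When transmitting it, consider compressing it.

In essence it is just text, a string.

JSON is simple and supported by almost every programming language, so it is very widely used.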
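A round trip makes the type conversions visible, including the ones that do not survive. A minimal sketch with made-up data; dump and load are shown against a StringIO instead of a real file.

```python
import io
import json

record = {
    'name': 'Tom',
    'age': 20,
    'interest': ('music', 'movie'),   # tuple
    'vip': True,
    'note': None,
}

text = json.dumps(record)             # object -> str
back = json.loads(text)               # str -> object

# dump/load do the same against a file-like object
buf = io.StringIO()
json.dump(record, buf)
buf.seek(0)
from_file = json.load(buf)

# Non-string keys are turned into strings on the way out
int_keys = json.loads(json.dumps({1: 'a'}))

print(text)
print(back, int_keys)
```

The tuple comes back as a list and the integer key comes back as a string, so a JSON round trip is not guaranteed to reproduce the original Python object exactly.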

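MessagePack

MessagePack is an efficient binary object serialization library that can be used for cross-language communication.

Like JSON, it can exchange structured objects between many languages.

It is faster and more compact than JSON.

It supports Python, Ruby, Java, C/C++, and many other languages. It is claimed to be four times faster than Google Protocol Buffers.

Its Python API is compatible with json and pickle.

# 82 bytes
{"person": [{"name": "tom", "age": 18}, {"name": "jerry", "age": 16}], "total": 2}
# 48 bytes
# 82 a6 70 65 72 73 6f 6e 92 82 a4 6e 61 6d 65 a3 74 6f 6d a3 61 67 65 12 82 a4 6e 61 6d 65 a5 6a 65 72 72 79 a3 61 67 65 10 a5 74 6f 74 61 6c 02

The space saving is substantial.

Installation

pip3 install msgpack

(Older releases were published under the name msgpack-python.)

Common methods

packb: serializes an object. dumps is provided for compatibility with pickle and json.
unpackb: deserializes an object. loads is provided for compatibility.
pack: serializes an object and writes it to a file object. dump is provided for compatibility.
unpack: deserializes an object read from a file object. load is provided for compatibility.

import msgpack
import json

# source data
d = {"person": [{"name": "tom", "age": 18}, {"name": "jerry", "age": 16}], "total": 2}
j = json.dumps(d)
m = msgpack.dumps(d)  # in essence packb
print("json = {}, msgpack = {}".format(len(j), len(m)))  # json = 82, msgpack = 48
print(j.encode(), len(j.encode()))  # b'{"person": [{"name": "tom", "age": 18}, {"name": "jerry", "age": 16}], "total": 2}' 82
print(m)  # b'\x82\xa6person\x92\x82\xa4name\xa3tom\xa3age\x12\x82\xa4name\xa5jerry\xa3age\x10\xa5total\x02'

u = msgpack.unpackb(m, raw=True)  # keep strings as bytes (the default in old releases)
print(type(u), u)  # <class 'dict'> {b'person': [{b'name': b'tom', b'age': 18}, {b'name': b'jerry', b'age': 16}], b'total': 2}

u = msgpack.unpackb(m, raw=False)  # decode strings to str; replaces the old encoding='utf8' argument, removed in msgpack 1.0
print(type(u), u)  # <class 'dict'> {'person': [{'name': 'tom', 'age': 18}, {'name': 'jerry', 'age': 16}], 'total': 2}

MessagePack is simple to use, compact, and supported by many languages.

That makes it a good choice for serialization as well.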

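Exercises

Word count with ignored words

Count the words in the sample file case-insensitively.

The user must be able to exclude certain words from the count. Words such as a, the, and of carry no real meaning in the statistics and should be ignored.

All code must be wrapped in functions and run through function calls.

Splitting words in the earlier code was too cumbersome. The makekey function can already split a whole line into words, so the code above is repackaged as follows.

def makekey2(line: str, chars=set("""!'"#./()[] ,*-\r\n""")):
    start = 0
    for i, c in enumerate(line):
        if c in chars:
            if start == i:  # a special character directly after another: start equals i
                start += 1  # advance by 1 and continue
                continue
            yield line[start:i]
            start = i + 1  # add 1 to skip the unwanted special character c
    else:
        if start < len(line):  # valid characters remain, running to the end
            yield line[start:]

def wordcount(filename, encoding='utf8', ignore=set()):
    d = {}
    with open(filename, encoding=encoding) as f:
        for line in f:
            for word in map(str.lower, makekey2(line)):
                if word not in ignore:
                    d[word] = d.get(word, 0) + 1
    return d

def top(d: dict, n=10):
    for i, (k, v) in enumerate(sorted(d.items(), key=lambda item: item[1], reverse=True)):
        if i >= n:  # stop after n entries
            break
        print(k, v)

wordx = wordcount('sample', ignore={'the', 'a'})
print(wordx)
# the most frequent words
top(wordx)

Output:
{'iewie': 1, 'b': 1, 'c': 3, 'f': 1}
c 3
iewie 1
b 1
f 1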
Converting to a JSON file

Given a configuration file test.ini with the content below, convert it to a JSON file.

[DEFAULT]
a = test

[mysql]
default-character-set=utf8
a = 1000

[mysqld]
datadir =/dbserver/data
port = 33060
character-set-server=utf8
sql_mode=NO _ENGINE_SUBSTITUTION, STRICT_TRANS_TABLES

Iterating over the dictionary view of the ini file is enough.

from configparser import ConfigParser
import json

filename = 'test.ini'
jsonname = 'test.json'

cfg = ConfigParser()
cfg.read(filename)

dest = {}
for sect in cfg.sections():
    print(sect, cfg.items(sect))
    dest[sect] = dict(cfg.items(sect))

with open(jsonname, 'w') as f:  # close the file once the JSON has been written
    json.dump(dest, f)

# mysql [('a', '1000'), ('default-character-set', 'utf8')]
# mysqld [('a', 'test'), ('datadir', '/dbserver/data'), ('port', '33060'), ('character-set-server', 'utf8'), ('sql_mode', 'NO _ENGINE_SUBSTITUTION, STRICT_TRANS_TABLES')]

The argparse module

Any executable or script can accept arguments.

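How are these arguments passed to the program?

Since version 3.2, Python has shipped argparse, a module for parsing arguments.

Kinds of arguments

Arguments fall into two groups:

Positional arguments: the position where the argument is written matters, and each one fills one argument slot. For example, /etc fills one slot.

Optional arguments: they must be introduced by a short option starting with - or a long option starting with --, and what follows counts as that option's argument. A short option may also take no argument at all.

In an invocation such as ls -l /etc, /etc is the positional argument and -l is the option.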

ls -alh src
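
To make the two kinds concrete, here is a small sketch of how argparse would see ls -alh src. The parser below is an assumption built only for this illustration (three flags and one positional); add_help=False is used so that -h is free to act as an ordinary flag.

```python
import argparse

# add_help=False frees -h so it can be an ordinary flag in this sketch
parser = argparse.ArgumentParser(prog="ls", add_help=False)
parser.add_argument("-a", action="store_true")  # option, takes no value
parser.add_argument("-l", action="store_true")  # option, takes no value
parser.add_argument("-h", action="store_true")  # option, takes no value
parser.add_argument("path")                     # positional argument

# Equivalent to running: ls -alh src
args = parser.parse_args(["-alh", "src"])
print(args)
```

Short options that take no value can be joined behind a single -, which is why -alh sets all three flags while src lands in path.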

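Basic parsing

Start with the simplest possible program:

import argparse

parser = argparse.ArgumentParser()  # get an argument parser
args = parser.parse_args()  # parse the arguments
parser.print_help()  # print the help text
# usage: function15.py [-h]
#
# options:
#   -h, --help  show this help message and exit

argparse not only defines and parses the arguments, it also generates the help text automatically. The usage line in particular shows whether the arguments defined so far are the ones you intended.

Parser parameters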


parser = argparse.ArgumentParser(prog='ls', add_help=True, description='list directory contents')
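
A quick way to see what these three parameters do is to compare the generated usage and help strings. The sketch below builds two throwaway parsers (the variable names with_help and without_help are made up) and uses format_usage() and format_help(), which return the text instead of printing it.

```python
import argparse

with_help = argparse.ArgumentParser(prog="ls", add_help=True, description="list directory contents")
without_help = argparse.ArgumentParser(prog="ls", add_help=False, description="list directory contents")

# prog replaces the script name in the usage line; add_help controls the -h/--help option
print(with_help.format_usage(), end="")
print(without_help.format_usage(), end="")

# description is shown between the usage line and the argument list
print(with_help.format_help())
```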

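function1.py

x = 5


def a():
    # x += 1  # would fail: x becomes local and is read before it is assigned
    print(x)


a()
print(x)
print('------------------------------------------------')


# A name that is not defined locally is looked up outside: outer names are visible inside.
# Assigning to a name inside a function defines a new local variable.
# The o assigned in inner() is inner's own variable; the outer o is left untouched.
def out2():
    o = 65

    def inner():
        o = 97
        print('inner {}'.format(o))
        print(chr(o))

    inner()
    print('outer {}'.format(o))


out2()

x = 5


def foo():
    y = x + 1


def foo1():
    x = 1


def foo2():  # x is assigned in this block, so x is local throughout the block
    x = 2
    x += 1


# def foo5():
#     global z  # NameError: name 'z' is not defined
#     z = z + 1
#
# foo5()


def foo5():
    global z  # z is a global variable
    z = 20
    print(z)
    z = z + 1


foo5()
print(z)

# Summary of global
# x += 1 fails because the name is read before it is assigned; in Python, a dynamic language,
# a name only exists once it has been assigned and only then can it be read. The fix is to add
# an assignment such as x = 0 before that statement, or to use global so that the inner scope
# looks up and assigns the variable in the global scope.
# Rules for using global
# Variables of the outer scope are visible in the inner scope, but they should not be used
# directly there, because the point of a function is encapsulation:
# keep it isolated from the outside as far as possible.
# If the function does need one,
# Closures
# Free variable: a variable not defined in the local scope, for example one defined in the
# scope of the function that encloses an inner function.
# A closure is a concept that appears with nested functions: when the inner function references
# a free variable of the outer function, a closure is formed. Many languages have this concept;
# the best-known one is JavaScript.
# Look at the code below first.
print("*" * 30)


def counter():
    c = [0]  # the value stored inside c is changed, c itself is never rebound

    def inc():
        c[0] += 1
        return c[0]

    return inc


foo = counter()
print(type(foo))  # <class 'function'>
# print(type(foo()))  # <class 'int'>
# A tuple would not work here, because tuples cannot be modified.
print(callable(foo))  # True
print(foo(), foo())  # 1 2
c = 100  # this global c has no effect on the calls below; the captured c is kept alive
print(foo())  # 3: counter() has returned, but c lives on as a free variable used through c[0], which forms the closure
# An inner function that uses a free variable of the outer function creates a closure; global is not involved here.
# inc() itself is not visible from the outside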
python function1.py help


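Parsing positional arguments

The basic job of ls is to print the contents of a directory.

Printing needs a directory path, so a positional argument is required.

import argparse

# Get an argument parser
parser = argparse.ArgumentParser(prog='ls', add_help=True, description='list directory contents')
parser.add_argument('path')
args = parser.parse_args()  # parse the arguments
parser.print_help()  # print the help text
Output:
usage: ls [-h] path
ls: error: the following arguments are required: path

The program is now defined as:

ls [-h] path

-h is the help option and may be omitted

path is a positional argument and must be provided

Passing arguments

parse_args(args=None, namespace=None)

args is the argument list, an iterable. Internally the iterable is converted to a list. If it is None, the arguments from the command line are used; if it is not None, the iterable passed as args is used.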
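
A minimal sketch of both cases, assuming a parser with a single required positional named path: an explicit list is parsed instead of the command line, and a missing argument ends in SystemExit, because argparse prints the usage message to stderr and then exits. The name exit_code is only used for this illustration.

```python
import argparse

parser = argparse.ArgumentParser(prog="ls", description="list directory contents")
parser.add_argument("path")

# An explicit list is parsed instead of sys.argv[1:]
args = parser.parse_args(["/etc"])
print(args.path)

# A missing required positional makes argparse print usage to stderr and exit
try:
    parser.parse_args([])
except SystemExit as exc:
    exit_code = exc.code
    print("parse failed, exit code", exit_code)
```

Passing an explicit list like this is also handy for testing a parser without touching the real command line.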

import argparse

# Get an argument parser
parser = argparse.ArgumentParser(prog='ls', add_help=True, description='list directory contents')
parser.add_argument('path')  # positional argument
args = parser.parse_args(('//Users/quyixiao/pp/python_lesson/function1',))  # parse the iterable passed in instead of the command line
print(args)  # print the arguments collected in the namespace
parser.print_help()  # print the help text
Output:
usage: ls [-h] path

list directory contents

positional arguments:
  path

options:
  -h, --help  show this help message and exit

A result such as Namespace(path='/etc') shows that the path argument is stored as an attribute of a Namespace object, so it can be read through that attribute, for example args.path.

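Optional positional arguments

The code above requires the positional argument; without it, an error is reported.

Sometimes, however, running ls without any path should list the files of the current directory.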

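import argparse

# Get an argument parser
parser = argparse.ArgumentParser(prog='ls', add_help=True, description='list directory contents')
parser.add_argument('path', nargs='?', default='.', help="path help")  # positional argument
args = parser.parse_args()  # parse the command line
print(args)  # print the arguments collected in the namespace
parser.print_help()  # print the help text
Output:
Namespace(path='.')
usage: ls [-h] [path]

list directory contents

positional arguments:
  path        path help

options:
  -h, --help  show this help message and exit

path has become an optional positional argument. When it is not provided, the default value . is used, where the dot stands for the current directory.

help is the description of this argument shown in the help text
nargs says how many values this argument accepts: ? means zero or one, + means at least one, * means any number, and an integer means exactly that many
default is the value used when the argument is not provided. It is usually combined with ? or *, because both allow the positional argument to be omitted, in which case the default applies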
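
The nargs variants listed above can be tried side by side. The helper parse below is hypothetical: it builds a throwaway parser with one positional named path and returns the parsed value, so only the nargs setting changes between calls.

```python
import argparse


def parse(nargs, argv, default=None):
    """Build a throwaway parser with one positional 'path' and parse argv."""
    parser = argparse.ArgumentParser(prog="demo")
    parser.add_argument("path", nargs=nargs, default=default)
    return parser.parse_args(argv).path


optional_missing = parse("?", [], default=".")      # nothing given, default is used
optional_given = parse("?", ["/etc"], default=".")  # a single value, not a list
one_or_more = parse("+", ["a", "b"])                # a list with at least one item
any_number = parse("*", ["a", "b", "c"])            # a list of any length
exactly_two = parse(2, ["x", "y"])                  # a list of exactly two items

print(optional_missing, optional_given, one_or_more, any_number, exactly_two)
```

Note that ? yields a plain value, while +, * and an integer always yield a list.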

Optional arguments

Implementing -l
parser.add_argument('-l') adds an optional argument, and the usage becomes ls [-h] [-l L] [path]

This differs a little from what we want. We expect [-l], so how can that be solved?

Can nargs solve it?

parser.add_argument('-l', nargs='?')

ls [-h] [-l [L]] [path]

Now L is optional, rather than -l

Then set nargs=0 directly, meaning that the option accepts zero values: parser.add_argument('-l', nargs=0)

The result is an exception:

ValueError: nargs for store actions must be > 0; if you have nothing to store, actions such as store true or store const may be more appropriate

To solve this, use the action parameter.

With parser.add_argument('-l', action='store_true') the usage becomes ls [-h] [-l] [path]. When the -l option is given, for example

ls -l yields Namespace(l=True, path='.')

ls yields Namespace(l=False, path='.')

So True and False tell whether the user supplied the option.
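
Both outcomes described above can be reproduced in a few lines. The sketch parses explicit lists so that it does not depend on the real command line, and it also triggers the nargs=0 error on a separate throwaway parser; the names with_flag, without_flag and nargs_error are made up for the example.

```python
import argparse

parser = argparse.ArgumentParser(prog="ls")
parser.add_argument("-l", action="store_true")  # a flag: present means True, absent means False
parser.add_argument("path", nargs="?", default=".")

with_flag = parser.parse_args(["-l"])
without_flag = parser.parse_args([])
print(with_flag.l, without_flag.l)

# nargs=0 with the default "store" action is rejected as soon as the argument is defined
try:
    argparse.ArgumentParser().add_argument("-l", nargs=0)
except ValueError as exc:
    nargs_error = str(exc)
    print(nargs_error)
```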

Implementing -a

parser.add_argument('-a', '--all', action='store_true')
A short and a long option can be given together.
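
When both forms are given, the attribute name on the namespace comes from the long option, with the leading dashes removed and inner dashes turned into underscores. A small sketch, assuming a parser with add_help=False so that -h can be paired with --human-readable:

```python
import argparse

parser = argparse.ArgumentParser(prog="ls", add_help=False)
parser.add_argument("-a", "--all", action="store_true")
parser.add_argument("-h", "--human-readable", action="store_true")

short = parser.parse_args(["-a"])
long_form = parser.parse_args(["--all", "--human-readable"])

# --all is stored as .all, --human-readable as .human_readable
print(short.all, short.human_readable)
print(long_form.all, long_form.human_readable)
```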

Code

import argparse

# Get an argument parser
parser = argparse.ArgumentParser(prog='ls', add_help=True, description='list directory contents')
parser.add_argument('path', nargs='?', default='.', help="directory")  # positional argument: optional, with a default and help text
parser.add_argument('-l', action='store_true', help='use a long listing format')
parser.add_argument('-a', '--all', action='store_true', help='show all files, do not ignore entries starting with .')
args = parser.parse_args()  # parse the command line
print(args)  # print the arguments collected in the namespace
parser.print_help()  # print the help text
# Result
Namespace(path='.', l=False, all=False)
usage: ls [-h] [-l] [-a] [path]

list directory contents

positional arguments:
  path        directory

options:
  -h, --help  show this help message and exit
  -l          use a long listing format
  -a, --all   show all files, do not ignore entries starting with .

Implementing the ls functionality

Defining and passing the arguments is solved. What remains is the actual functionality:

List all files under the given path, non-recursively by default
-a shows all files, including hidden ones
-l shows a detailed listing

Implementation

import argparse
from pathlib import Path
from datetime import datetime

# Get an argument parser
parser = argparse.ArgumentParser(prog='ls', add_help=True, description='list directory contents')
parser.add_argument('path', nargs='?', default='.', help="directory")  # positional argument: optional, with a default and help text
parser.add_argument('-l', action='store_true', help='use a long listing format')
parser.add_argument('-a', '--all', action='store_true', help='show all files, do not ignore entries starting with .')
args = parser.parse_args()  # parse the command line
print(args)  # print the arguments collected in the namespace
parser.print_help()  # print the help text


def listdir(path, all=False):
    """List the files in this directory."""
    p = Path(path)
    for i in p.iterdir():
        if not all and i.name.startswith('.'):  # skip hidden files
            continue
        yield i.name


print(list(listdir(args.path)))


# Get the file type
def _getfiletype(f: Path):
    if f.is_dir():
        return 'd'
    elif f.is_block_device():
        return 'b'
    elif f.is_char_device():
        return 'c'
    elif f.is_socket():
        return 's'
    elif f.is_symlink():
        return 'l'  # the letter l, not the digit 1
    else:
        return '-'


# -rw-rw-r-- 1 python python 5 Oct 25 00:07 test4
def listdirdetail(path, all=False):
    """List this directory in detail."""
    p = Path(path)
    for i in p.iterdir():
        if not all and i.name.startswith('.'):  # skip hidden files
            continue
        # mode, hard links, owner, group, bytes, time, name
        stat = i.stat()
        t = _getfiletype(i)
        mode = oct(stat.st_mode)[-3:]
        atime = datetime.fromtimestamp(stat.st_atime).strftime('%Y %m %d %H:%M:%S')
        yield (t, mode, stat.st_uid, stat.st_gid, stat.st_size, atime, i.name)


print(list(listdirdetail(args.path)))
Output:
usage: ls [-h] [-l] [-a] [path]

list directory contents

positional arguments:
  path        directory

options:
  -h, --help  show this help message and exit
  -l          use a long listing format
  -a, --all   show all files, do not ignore entries starting with .
['function16.py', 'function8.py', 'function12.py', 'function9.py', 'test1.txt', 'my_binary_file', 'function13.py', 'test', 'function17.py', 'function6.py', 'function2.py', 'test1', 'function3.py', 'test.ini', '__init__.py', 'test.json', 'test.py', 'function7.py', 'a', 'testxx', 'function4.py', 'function5.py', 'sample', 'function1.py', 'test.txt', 'test3', 'function10.py', 'test2', 'function15.py', 'b', 'sample.txt', 'my_text_file', 'function11.py']
[('-', '644', 501, 20, 413, '2025 06 19 22:44:03', 'function16.py'), ('-', '644', 501, 20, 330, '2025 06 19 22:44:03', 'function8.py'), ('-', '644', 501, 20, 351, '2025 06 19 22:44:03', 'function12.py'), ('-', '644', 501, 20, 301, '2025 06 19 22:44:03', 'function9.py'), ('-', '644', 501, 20, 14, '2025 07 09 08:20:14', 'test1.txt'), ('-', '644', 501, 20, 20, '2025 07 08 21:02:57', 'my_binary_file'), ('-', '644', 501, 20, 661, '2025 06 19 22:44:03', 'function13.py'), ('-', '644', 501, 20, 51, '2025 07 09 09:34:23', 'test'), ('-', '644', 501, 20, 794, '2025 06 19 22:44:03', 'function17.py'), ('-', '644', 501, 20, 196, '2025 06 19 22:44:03', 'function6.py'), ('-', '644', 501, 20, 268, '2025 06 19 22:44:03', 'function2.py'), ('-', '644', 501, 20, 0, '2025 07 09 08:29:58', 'test1'), ('-', '644', 501, 20, 169, '2025 06 19 22:44:03', 'function3.py'), ('-', '644', 501, 20, 197, '2025 07 09 17:56:12', 'test.ini'), ('-', '644', 501, 20, 0, '2025 06 19 22:44:02', '__init__.py'), ('-', '644', 501, 20, 220, '2025 07 09 17:57:31', 'test.json'), ('-', '644', 501, 20, 41, '2025 07 09 13:05:14', 'test.py'), ('-', '644', 501, 20, 125, '2025 06 19 22:44:03', 'function7.py'), ('d', '755', 501, 20, 128, '2025 07 09 09:08:29', 'a'), ('-', '644', 501, 20, 45, '2025 07 08 19:19:59', 'testxx'), ('-', '644', 501, 20, 196, '2025 06 19 22:44:03', 'function4.py'), ('-', '644', 501, 20, 1242, '2025 06 19 22:44:03', 'function5.py'), ('-', '644', 501, 20, 17, '2025 07 08 19:45:14', 'sample'), ('-', '644', 501, 20, 2841, '2025 07 09 18:08:29', 'function1.py'), ('-', '644', 501, 20, 14, '2025 07 09 08:20:14', 'test.txt'), ('-', '644', 501, 20, 12, '2025 07 08 18:48:56', 'test3'), ('-', '644', 501, 20, 344, '2025 06 19 22:44:03', 'function10.py'), ('-', '644', 501, 20, 4, '2025 07 08 09:17:27', 'test2'), ('-', '644', 501, 20, 1893, '2025 07 09 19:04:28', 'function15.py'), ('d', '755', 501, 20, 64, '2025 07 09 08:48:44', 'b'), ('-', '600', 501, 20, 13099, '2025 07 09 17:58:16', 'sample.txt'), ('-', 
'644', 501, 20, 18, '2025 07 08 21:02:57', 'my_text_file'), ('-', '644', 501, 20, 803, '2025 06 19 22:44:03', 'function11.py')]

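mode is an integer whose octal digits describe the permissions. In the end it should be displayed in rwx format.

Method 1

modelist = ['r', 'w', 'x', 'r', 'w', 'x', 'r', 'w', 'x']


def _getmodestr(mode: int):
    m = mode & 0o777
    mstr = ''
    # Format as exactly nine binary digits; bin(m)[-9:] would pick up the '0b' prefix for small values
    for i, v in enumerate('{:09b}'.format(m)):
        if v == '1':
            mstr += modelist[i]
        else:
            mstr += '-'
    return mstr


print(_getmodestr(5))  # ------r-x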

Method 2

modelist = dict(zip(range(9), ['r', 'w', 'x', 'r', 'w', 'x', 'r', 'w', 'x']))
print(modelist)


def _getmodestr(mode: int):
    m = mode & 0o777
    mstr = ''
    for i in range(8, -1, -1):  # test the bits from the highest (owner r) down to the lowest (others x)
        if m >> i & 1:
            mstr += modelist[8 - i]
        else:
            mstr += '-'
    return mstr


print(_getmodestr(5))
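
The bit test used in method 2 can be checked against a few well-known modes. The sketch below is a compact rewrite of the same idea, not the code from the notes; the name mode_to_str is made up.

```python
def mode_to_str(mode: int) -> str:
    """Render the low nine permission bits as an rwx string."""
    letters = "rwxrwxrwx"
    bits = mode & 0o777  # drop file-type and special bits
    # Position 0 in letters corresponds to bit 8 (owner r), position 8 to bit 0 (others x)
    return "".join(ch if bits >> (8 - i) & 1 else "-" for i, ch in enumerate(letters))


for m in (0o644, 0o755, 0o5):
    print(oct(m), mode_to_str(m))
```

0o644 is the usual mode of a regular file and 0o755 that of a directory or executable, which makes them convenient reference values when checking the output by eye.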

Merging the listing functions

listdirdetail and listdir are almost identical and repeat too much, so they are merged.

import argparse
from pathlib import Path
from datetime import datetime

# Get an argument parser
parser = argparse.ArgumentParser(prog='ls', add_help=True, description='list directory contents')
parser.add_argument('path', nargs='?', default='.', help="directory")  # positional argument: optional, with a default and help text
parser.add_argument('-l', action='store_true', help='use a long listing format')
parser.add_argument('-a', '--all', action='store_true', help='show all files, do not ignore entries starting with .')
args = parser.parse_args()  # parse the command line
print(args)  # print the arguments collected in the namespace
parser.print_help()  # print the help text


def _getfiletype(f: Path):
    """Get the file type."""
    if f.is_dir():
        return 'd'
    elif f.is_block_device():
        return 'b'
    elif f.is_char_device():
        return 'c'
    elif f.is_socket():
        return 's'
    elif f.is_symlink():
        return 'l'
    elif f.is_fifo():  # pipe
        return 'p'
    else:
        return '-'


modelist = dict(zip(range(9), ['r', 'w', 'x', 'r', 'w', 'x', 'r', 'w', 'x']))


def _getmodestr(mode: int):
    m = mode & 0o777
    mstr = ''
    for i in range(8, -1, -1):
        if m >> i & 1:
            mstr += modelist[8 - i]
        else:
            mstr += '-'
    return mstr


def listdir(path, all=False, detail=False):
    """List this directory, optionally in detail."""
    p = Path(path)
    for i in p.iterdir():
        if not all and i.name.startswith('.'):  # skip hidden files
            continue
        if not detail:
            yield (i.name,)
        else:
            # -rw-rw-r-- 1 python python 5 Oct 25 00:07 test4
            # mode, hard links, owner, group, bytes, time, name
            stat = i.stat()
            mode = _getfiletype(i) + _getmodestr(stat.st_mode)
            atime = datetime.fromtimestamp(stat.st_atime).strftime('%Y %m %d %H:%M:%S')
            yield (mode, stat.st_nlink, stat.st_uid, stat.st_gid, stat.st_size, atime, i.name)


for x in listdir(args.path, detail=True):
    print(x)
Output:
Namespace(path='.', l=False, all=False)
usage: ls [-h] [-l] [-a] [path]

list directory contents

positional arguments:
  path        directory

options:
  -h, --help  show this help message and exit
  -l          use a long listing format
  -a, --all   show all files, do not ignore entries starting with .
('-rwxrwxrwx', 1, 501, 20, 413, '2025 06 19 22:44:03', 'function16.py')
('-rwxrwxrwx', 1, 501, 20, 330, '2025 06 19 22:44:03', 'function8.py')
('-rwxrwxrwx', 1, 501, 20, 351, '2025 06 19 22:44:03', 'function12.py')
('-rwxrwxrwx', 1, 501, 20, 301, '2025 06 19 22:44:03', 'function9.py')
('-rwxrwxrwx', 1, 501, 20, 14, '2025 07 09 08:20:14', 'test1.txt')
('-rwxrwxrwx', 1, 501, 20, 20, '2025 07 08 21:02:57', 'my_binary_file')
('-rwxrwxrwx', 1, 501, 20, 661, '2025 06 19 22:44:03', 'function13.py')
('-rwxrwxrwx', 1, 501, 20, 51, '2025 07 09 09:34:23', 'test')
('-rwxrwxrwx', 1, 501, 20, 794, '2025 06 19 22:44:03', 'function17.py')
('-rwxrwxrwx', 1, 501, 20, 196, '2025 06 19 22:44:03', 'function6.py')
('-rwxrwxrwx', 1, 501, 20, 268, '2025 06 19 22:44:03', 'function2.py')
('-rwxrwxrwx', 1, 501, 20, 0, '2025 07 09 08:29:58', 'test1')
('-rwxrwxrwx', 1, 501, 20, 169, '2025 06 19 22:44:03', 'function3.py')
('-rwxrwxrwx', 1, 501, 20, 197, '2025 07 09 17:56:12', 'test.ini')
('-rwxrwxrwx', 1, 501, 20, 0, '2025 06 19 22:44:02', '__init__.py')
('-rwxrwxrwx', 1, 501, 20, 220, '2025 07 09 17:57:31', 'test.json')
('-rwxrwxrwx', 1, 501, 20, 41, '2025 07 09 13:05:14', 'test.py')
('-rwxrwxrwx', 1, 501, 20, 125, '2025 06 19 22:44:03', 'function7.py')
('drwxrwxrwx', 4, 501, 20, 128, '2025 07 09 09:08:29', 'a')
('-rwxrwxrwx', 1, 501, 20, 45, '2025 07 08 19:19:59', 'testxx')
('-rwxrwxrwx', 1, 501, 20, 196, '2025 06 19 22:44:03', 'function4.py')
('-rwxrwxrwx', 1, 501, 20, 1242, '2025 06 19 22:44:03', 'function5.py')
('-rwxrwxrwx', 1, 501, 20, 17, '2025 07 08 19:45:14', 'sample')
('-rwxrwxrwx', 1, 501, 20, 2841, '2025 07 09 18:08:29', 'function1.py')
('-rwxrwxrwx', 1, 501, 20, 14, '2025 07 09 08:20:14', 'test.txt')
('-rwxrwxrwx', 1, 501, 20, 12, '2025 07 08 18:48:56', 'test3')
('-rwxrwxrwx', 1, 501, 20, 344, '2025 06 19 22:44:03', 'function10.py')
('-rwxrwxrwx', 1, 501, 20, 4, '2025 07 08 09:17:27', 'test2')
('-rwxrwxrwx', 1, 501, 20, 2116, '2025 07 09 19:21:12', 'function15.py')
('drwxrwxrwx', 2, 501, 20, 64, '2025 07 09 08:48:44', 'b')
('-rwxrwxrwx', 1, 501, 20, 13099, '2025 07 09 17:58:16', 'sample.txt')
('-rwxrwxrwx', 1, 501, 20, 18, '2025 07 08 21:02:57', 'my_text_file')
('-rwxrwxrwx', 1, 501, 20, 803, '2025 06 19 22:44:03', 'function11.py')

Sorting

ls prints its entries sorted by file name in ascending order.

for x in sorted(listdir(args.path, detail=True), key=lambda x: x[-1]):  # the file name is the last field
    print(x)

Complete code

Refactor the code once more.

import argparse
from pathlib import Path
from datetime import datetime

# Get an argument parser
parser = argparse.ArgumentParser(prog='ls', add_help=True, description='list directory contents')
parser.add_argument('path', nargs='?', default='.', help="directory")  # positional argument: optional, with a default and help text
parser.add_argument('-l', action='store_true', help='use a long listing format')
parser.add_argument('-a', '--all', action='store_true', help='show all files, do not ignore entries starting with .')


def listdir(path, all=False, detail=False):
    def _getfiletype(f: Path):
        """Get the file type."""
        if f.is_dir():
            return 'd'
        elif f.is_block_device():
            return 'b'
        elif f.is_char_device():
            return 'c'
        elif f.is_socket():
            return 's'
        elif f.is_symlink():
            return 'l'
        elif f.is_fifo():  # pipe
            return 'p'
        else:
            return '-'

    modelist = dict(zip(range(9), ['r', 'w', 'x', 'r', 'w', 'x', 'r', 'w', 'x']))

    def _getmodestr(mode: int):
        m = mode & 0o777
        mstr = ''
        for i in range(8, -1, -1):
            if m >> i & 1:
                mstr += modelist[8 - i]
            else:
                mstr += '-'
        return mstr

    def _listdir(path, all=False, detail=False):
        """List this directory, optionally in detail."""
        p = Path(path)
        for i in p.iterdir():
            if not all and i.name.startswith('.'):  # skip hidden files
                continue
            if not detail:
                yield (i.name,)
            else:
                # -rw-rw-r-- 1 python python 5 Oct 25 00:07 test4
                # mode, hard links, owner, group, bytes, time, name
                stat = i.stat()
                mode = _getfiletype(i) + _getmodestr(stat.st_mode)
                atime = datetime.fromtimestamp(stat.st_atime).strftime('%Y %m %d %H:%M:%S')
                yield (mode, stat.st_nlink, stat.st_uid, stat.st_gid, stat.st_size, atime, i.name)

    # Sort by file name, the last field
    yield from sorted(_listdir(path, all, detail), key=lambda x: x[-1])


if __name__ == '__main__':
    args = parser.parse_args()  # parse the command line once, here
    print(args)  # print the arguments collected in the namespace
    parser.print_help()  # print the help text
    files = listdir(args.path, args.all, detail=True)  # detail is forced on for this demo; pass args.l to honor -l
    for x in files:
        print(x)
Output:
Namespace(path='.', l=False, all=False)
usage: ls [-h] [-l] [-a] [path]

list directory contents

positional arguments:
  path        directory

options:
  -h, --help  show this help message and exit
  -l          use a long listing format
  -a, --all   show all files, do not ignore entries starting with .
('-rwxrwxrwx', 1, 501, 20, 0, '2025 06 19 22:44:02', '__init__.py')
('drwxrwxrwx', 4, 501, 20, 128, '2025 07 09 09:08:29', 'a')
('drwxrwxrwx', 2, 501, 20, 64, '2025 07 09 08:48:44', 'b')
('-rwxrwxrwx', 1, 501, 20, 2841, '2025 07 09 18:08:29', 'function1.py')
('-rwxrwxrwx', 1, 501, 20, 344, '2025 06 19 22:44:03', 'function10.py')
('-rwxrwxrwx', 1, 501, 20, 803, '2025 06 19 22:44:03', 'function11.py')
('-rwxrwxrwx', 1, 501, 20, 351, '2025 06 19 22:44:03', 'function12.py')
('-rwxrwxrwx', 1, 501, 20, 661, '2025 06 19 22:44:03', 'function13.py')
('-rwxrwxrwx', 1, 501, 20, 2659, '2025 07 09 19:31:13', 'function15.py')
('-rwxrwxrwx', 1, 501, 20, 413, '2025 06 19 22:44:03', 'function16.py')
('-rwxrwxrwx', 1, 501, 20, 794, '2025 06 19 22:44:03', 'function17.py')
('-rwxrwxrwx', 1, 501, 20, 268, '2025 06 19 22:44:03', 'function2.py')
('-rwxrwxrwx', 1, 501, 20, 169, '2025 06 19 22:44:03', 'function3.py')
('-rwxrwxrwx', 1, 501, 20, 196, '2025 06 19 22:44:03', 'function4.py')
('-rwxrwxrwx', 1, 501, 20, 1242, '2025 06 19 22:44:03', 'function5.py')
('-rwxrwxrwx', 1, 501, 20, 196, '2025 06 19 22:44:03', 'function6.py')
('-rwxrwxrwx', 1, 501, 20, 125, '2025 06 19 22:44:03', 'function7.py')
('-rwxrwxrwx', 1, 501, 20, 330, '2025 06 19 22:44:03', 'function8.py')
('-rwxrwxrwx', 1, 501, 20, 301, '2025 06 19 22:44:03', 'function9.py')
('-rwxrwxrwx', 1, 501, 20, 20, '2025 07 08 21:02:57', 'my_binary_file')
('-rwxrwxrwx', 1, 501, 20, 18, '2025 07 08 21:02:57', 'my_text_file')
('-rwxrwxrwx', 1, 501, 20, 17, '2025 07 08 19:45:14', 'sample')
('-rwxrwxrwx', 1, 501, 20, 13099, '2025 07 09 17:58:16', 'sample.txt')
('-rwxrwxrwx', 1, 501, 20, 51, '2025 07 09 09:34:23', 'test')
('-rwxrwxrwx', 1, 501, 20, 197, '2025 07 09 17:56:12', 'test.ini')
('-rwxrwxrwx', 1, 501, 20, 220, '2025 07 09 17:57:31', 'test.json')
('-rwxrwxrwx', 1, 501, 20, 41, '2025 07 09 13:05:14', 'test.py')
('-rwxrwxrwx', 1, 501, 20, 14, '2025 07 09 08:20:14', 'test.txt')
('-rwxrwxrwx', 1, 501, 20, 0, '2025 07 09 08:29:58', 'test1')
('-rwxrwxrwx', 1, 501, 20, 14, '2025 07 09 08:20:14', 'test1.txt')
('-rwxrwxrwx', 1, 501, 20, 4, '2025 07 08 09:17:27', 'test2')
('-rwxrwxrwx', 1, 501, 20, 12, '2025 07 08 18:48:56', 'test3')
('-rwxrwxrwx', 1, 501, 20, 45, '2025 07 08 19:19:59', 'testxx')

Implementing -h

-h, --human-readable: takes effect only when -l is present.

1. Add the optional argument

parser = argparse.ArgumentParser(prog='ls', description='list directory contents', add_help=False)
parser.add_argument('-h', '--human-readable', action='store_true', help='with -l, print sizes in human readable format')

2. Add a function that converts the size into a unit

def _gethuman(size: int):
    units = ' KMGTP'
    depth = 0
    while size >= 1000:
        size = size // 1000
        depth += 1
    return '{}{}'.format(size, units[depth])

3. Add the handling to the -l branch

size = stat.st_size if not human else _gethuman(stat.st_size)
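
_gethuman truncates with integer division, so 2841 bytes is shown as 2K. A possible variant that keeps one decimal place and never indexes past the last unit is sketched below. It is an assumption-laden alternative, not the code used in these notes: the name human_size, the base parameter and the one-decimal format are all choices made for the illustration.

```python
def human_size(size: int, base: int = 1000) -> str:
    """Format a byte count with one decimal place and a unit suffix."""
    units = ["", "K", "M", "G", "T", "P"]
    value = float(size)
    depth = 0
    # Stop at the last unit so the index can never run off the list
    while value >= base and depth < len(units) - 1:
        value /= base
        depth += 1
    if depth == 0:
        return str(size)  # plain byte counts are shown without a suffix
    return "{:.1f}{}".format(value, units[depth])


for n in (344, 2841, 13099, 5 * 1000 ** 3):
    print(n, human_size(n))
```

Passing base=1024 would switch to binary units; which base ls-like tools should use is a matter of convention.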

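Other improvements

Converting uid and gid

The pwd module ("The password database") gives access to the password file on Linux and Unix. It is not available on Windows.

pwd.getpwuid(Path().stat().st_uid).pw_name

The grp module retrieves group information on Linux and Unix. It is not available on Windows either.

grp.getgrgid(Path().stat().st_gid).gr_name

With the pathlib module, Path().group() or Path().owner() also works; underneath, they call the pwd and grp modules.

Because Windows does not support this, the uid and gid conversion can be left out this time.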
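
If the conversion is wanted anyway, one way to stay portable is to fall back to the numeric ids whenever the lookup is unavailable. The sketch below is an assumption, not part of the final program: the helper name owner_and_group is made up, and it relies on Path.owner() and Path.group() raising NotImplementedError where pwd and grp are missing and KeyError when an id has no database entry.

```python
from pathlib import Path


def owner_and_group(p: Path):
    """Return (owner, group) names, falling back to the numeric ids as strings."""
    st = p.stat()
    try:
        owner = p.owner()
    except (NotImplementedError, KeyError, ImportError):
        owner = str(st.st_uid)  # no pwd support, or the uid has no passwd entry
    try:
        group = p.group()
    except (NotImplementedError, KeyError, ImportError):
        group = str(st.st_gid)  # no grp support, or the gid has no group entry
    return owner, group


print(owner_and_group(Path(".")))
```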

import argparse
from pathlib import Path
from datetime import datetime

# Get an argument parser
parser = argparse.ArgumentParser(prog='ls', add_help=True, description='list directory contents')
parser.add_argument('path', nargs='?', default='.', help="directory")  # positional argument: optional, with a default and help text
parser.add_argument('-l', action='store_true', help='use a long listing format')
parser.add_argument('-a', '--all', action='store_true', help='show all files, do not ignore entries starting with .')
# -hr is used here because add_help=True already claims -h
parser.add_argument('-hr', '--human-readable', action='store_true', help='with -l, print sizes in human readable format')


def listdir(path, all=False, detail=False, human=False):
    def _getfiletype(f: Path):
        """Get the file type."""
        if f.is_dir():
            return 'd'
        elif f.is_block_device():
            return 'b'
        elif f.is_char_device():
            return 'c'
        elif f.is_socket():
            return 's'
        elif f.is_symlink():
            return 'l'
        elif f.is_fifo():  # pipe
            return 'p'
        else:
            return '-'

    modelist = dict(zip(range(9), ['r', 'w', 'x', 'r', 'w', 'x', 'r', 'w', 'x']))

    def _gethuman(size: int):
        units = ' KMGTP'
        depth = 0
        while size >= 1000:
            size = size // 1000
            depth += 1
        return '{}{}'.format(size, units[depth])

    def _getmodestr(mode: int):
        m = mode & 0o777
        mstr = ''
        for i in range(8, -1, -1):
            if m >> i & 1:
                mstr += modelist[8 - i]
            else:
                mstr += '-'
        return mstr

    def _listdir(path, all=False, detail=False, human=False):
        """List this directory, optionally in detail."""
        p = Path(path)
        for i in p.iterdir():
            if not all and i.name.startswith('.'):  # skip hidden files
                continue
            if not detail:
                yield (i.name,)
            else:
                # -rw-rw-r-- 1 python python 5 Oct 25 00:07 test4
                # mode, hard links, owner, group, bytes, time, name
                stat = i.stat()
                mode = _getfiletype(i) + _getmodestr(stat.st_mode)
                atime = datetime.fromtimestamp(stat.st_atime).strftime('%Y %m %d %H:%M:%S')
                size = stat.st_size if not human else _gethuman(stat.st_size)
                # both the raw size and the converted size are yielded here for comparison
                yield (mode, stat.st_nlink, stat.st_uid, stat.st_gid, stat.st_size, size, atime, i.name)

    # Sort by file name, the last field
    yield from sorted(_listdir(path, all, detail, human), key=lambda x: x[-1])


if __name__ == '__main__':
    args = parser.parse_args()  # parse the command line once, here
    print(args)  # print the arguments collected in the namespace
    print('---------------------')
    parser.print_help()  # print the help text
    files = listdir(args.path, args.all, detail=True, human=True)  # detail and human are forced on for this demo
    for x in files:
        print(x)
Output:
Namespace(path='.', l=False, all=False, human_readable=False)
---------------------
usage: ls [-h] [-l] [-a] [-hr] [path]

list directory contents

positional arguments:
  path                  directory

options:
  -h, --help            show this help message and exit
  -l                    use a long listing format
  -a, --all             show all files, do not ignore entries starting with .
  -hr, --human-readable
                        with -l, print sizes in human readable format
('-rw-r--r--', 1, 501, 20, 0, '0 ', '2025 06 19 22:44:02', '__init__.py')
('drwxr-xr-x', 4, 501, 20, 128, '128 ', '2025 07 09 09:08:29', 'a')
('drwxr-xr-x', 2, 501, 20, 64, '64 ', '2025 07 09 08:48:44', 'b')
('-rw-r--r--', 1, 501, 20, 2841, '2K', '2025 07 09 18:08:29', 'function1.py')
('-rw-r--r--', 1, 501, 20, 344, '344 ', '2025 06 19 22:44:03', 'function10.py')

Improving the mode handling

Use the stat module.

import stat
from pathlib import Path

stat.filemode(Path().stat().st_mode)
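
stat.filemode converts the file type and the permission bits in one call, which replaces both _getfiletype and _getmodestr. A short sketch with hand-built mode values (the variable names regular and directory are made up), so that the result does not depend on any particular file:

```python
import stat

# S_IFREG and S_IFDIR are the file-type bits that st_mode carries alongside the permissions
regular = stat.filemode(stat.S_IFREG | 0o644)
directory = stat.filemode(stat.S_IFDIR | 0o755)

print(regular)
print(directory)
```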

Final code

import argparse
import stat
from pathlib import Path
from datetime import datetime

# Get an argument parser
parser = argparse.ArgumentParser(prog='ls', add_help=True, description='list directory contents')
parser.add_argument('path', nargs='?', default='.', help="directory")  # positional argument: optional, with a default and help text
parser.add_argument('-l', action='store_true', help='use a long listing format')
parser.add_argument('-a', '--all', action='store_true', help='show all files, do not ignore entries starting with .')
# -hr is used here because add_help=True already claims -h
parser.add_argument('-hr', '--human-readable', action='store_true', help='with -l, print sizes in human readable format')


def listdir(path, all=False, detail=False, human=False):
    def _gethuman(size: int):
        units = ' KMGTP'
        depth = 0
        while size >= 1000:
            size = size // 1000
            depth += 1
        return '{}{}'.format(size, units[depth])

    def _listdir(path, all=False, detail=False, human=False):
        """List this directory, optionally in detail."""
        p = Path(path)
        for i in p.iterdir():
            if not all and i.name.startswith('.'):  # skip hidden files
                continue
            if not detail:
                yield (i.name,)
            else:
                # -rw-rw-r-- 1 python python 5 Oct 25 00:07 test4
                # mode, hard links, owner, group, bytes, time, name
                st = i.stat()
                mode = stat.filemode(st.st_mode)  # replaces the hand-written type and rwx helpers
                atime = datetime.fromtimestamp(st.st_atime).strftime('%Y %m %d %H:%M:%S')
                size = str(st.st_size) if not human else _gethuman(st.st_size)
                yield (mode, st.st_nlink, st.st_uid, st.st_gid, size, atime, i.name)

    # Sort by file name, the last field
    yield from sorted(_listdir(path, all, detail, human), key=lambda x: x[-1])


if __name__ == '__main__':
    args = parser.parse_args()  # parse the command line once, here
    print(args)  # print the arguments collected in the namespace
    print('---------------------')
    parser.print_help()  # print the help text
    files = listdir(args.path, args.all, detail=True, human=True)  # detail and human are forced on for this demo
    for x in files:
        print(x)
Output:
Namespace(path='.', l=False, all=False, human_readable=False)
---------------------
usage: ls [-h] [-l] [-a] [-hr] [path]

list directory contents

positional arguments:
  path                  directory

options:
  -h, --help            show this help message and exit
  -l                    use a long listing format
  -a, --all             show all files, do not ignore entries starting with .
  -hr, --human-readable
                        with -l, print sizes in human readable format
('-rw-r--r--', 1, 501, 20, '0 ', '2025 06 19 22:44:02', '__init__.py')
('drwxr-xr-x', 4, 501, 20, '128 ', '2025 07 09 09:08:29', 'a')
('drwxr-xr-x', 2, 501, 20, '64 ', '2025 07 09 08:48:44', 'b')
('-rw-r--r--', 1, 501, 20, '2K', '2025 07 09 18:08:29', 'function1.py')
('-rw-r--r--', 1, 501, 20, '344 ', '2025 06 19 22:44:03', 'function10.py')
('-rw-r--r--', 1, 501, 20, '803 ', '2025 06 19 22:44:03', 'function11.py')
('-rw-r--r--', 1, 501, 20, '351 ', '2025 06 19 22:44:03', 'function12.py')
('-rw-r--r--', 1, 501, 20, '661 ', '2025 06 19 22:44:03', 'function13.py')
('-rw-r--r--', 1, 501, 20, '3K', '2025 07 09 19:53:41', 'function15.py')
('-rw-r--r--', 1, 501, 20, '413 ', '2025 06 19 22:44:03', 'function16.py')
('-rw-r--r--', 1, 501, 20, '794 ', '2025 06 19 22:44:03', 'function17.py')