MIT 6.S081 Lab 11 networking

web 2025/6/25 11:15:07

Contents

Lecture 21: Networking
- Ethernet packet
- tcpdump
- ARP
- IP
- TCP/UDP
Lab: networking
- Background
- Your Job
- Hints
- Implementation
  - The E1000 network card
  - Code

Lecture 21: Networking

Ethernet packet

The Ethernet header as defined in xv6:

// kernel/net.h
#define ETHADDR_LEN 6

// an Ethernet packet header (start of the packet).
struct eth {
  uint8  dhost[ETHADDR_LEN];
  uint8  shost[ETHADDR_LEN];
  uint16 type;
} __attribute__((packed));

#define ETHTYPE_IP  0x0800 // Internet protocol
#define ETHTYPE_ARP 0x0806 // Address resolution protocol

tcpdump

Use the command tcpdump -XXnr packets.pcap to view the captured packets:

reading from file packets.pcap, link-type EN10MB (Ethernet)
15:27:40.861988 IP 10.0.2.15.2000 > 10.0.2.2.25603: UDP, length 19
    0x0000:  ffff ffff ffff 5254 0012 3456 0800 4500  ......RT..4V..E.
    0x0010:  002f 0000 0000 6411 3eae 0a00 020f 0a00  ./....d.>.......
    0x0020:  0202 07d0 6403 001b 0000 6120 6d65 7373  ....d.....a.mess
    0x0030:  6167 6520 6672 6f6d 2078 7636 21         age.from.xv6!

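The destination address is ff:ff:ff:ff:ff:ff, so this is a broadcast frame. The source address is 52:54:00:12:34:56, and the type field is 0x0800, which means IP.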
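
The reading of the first 14 bytes can be checked with a small user-space sketch. It reuses the layout of struct eth from kernel/net.h, with <stdint.h> types in place of xv6's uint8 and uint16. The helpers is_broadcast() and eth_type() are hypothetical names introduced only for this illustration; they are not part of xv6.

```c
#include <stddef.h>
#include <stdint.h>

#define ETHADDR_LEN 6

// Same layout as struct eth in kernel/net.h, using <stdint.h> types.
struct eth {
  uint8_t  dhost[ETHADDR_LEN];
  uint8_t  shost[ETHADDR_LEN];
  uint16_t type;               // stored big-endian on the wire
} __attribute__((packed));

// First 14 bytes of the captured UDP frame.
static const uint8_t frame[14] = {
  0xff, 0xff, 0xff, 0xff, 0xff, 0xff,   // dhost
  0x52, 0x54, 0x00, 0x12, 0x34, 0x56,   // shost
  0x08, 0x00,                           // type
};

// Hypothetical helper: returns 1 if addr is the Ethernet broadcast address.
int is_broadcast(const uint8_t *addr)
{
  for (int i = 0; i < ETHADDR_LEN; i++)
    if (addr[i] != 0xff)
      return 0;
  return 1;
}

// Hypothetical helper: reads the EtherType of a frame in host byte order.
uint16_t eth_type(const uint8_t *buf)
{
  const uint8_t *t = buf + offsetof(struct eth, type);
  return (uint16_t)((t[0] << 8) | t[1]);
}
```

Reading the type field byte by byte avoids depending on the host's endianness, which is the same job ntohs() does in the xv6 network stack.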

ARP

// kernel/net.h
// an ARP packet (comes after an Ethernet header).
struct arp {
  uint16 hrd; // format of hardware address
  uint16 pro; // format of protocol address
  uint8  hln; // length of hardware address
  uint8  pln; // length of protocol address
  uint16 op;  // operation

  char   sha[ETHADDR_LEN]; // sender hardware address
  uint32 sip;              // sender IP address
  char   tha[ETHADDR_LEN]; // target hardware address
  uint32 tip;              // target IP address
} __attribute__((packed));

#define ARP_HRD_ETHER 1 // Ethernet

enum {
  ARP_OP_REQUEST = 1, // requests hw addr given protocol addr
  ARP_OP_REPLY = 2,   // replies a hw addr given protocol addr
};

15:27:40.862370 ARP, Request who-has 10.0.2.15 tell 10.0.2.2, length 28
    0x0000:  ffff ffff ffff 5255 0a00 0202 0806 0001  ......RU........
    0x0010:  0800 0604 0001 5255 0a00 0202 0a00 0202  ......RU........
    0x0020:  0000 0000 0000 0a00 020f
15:27:40.862844 ARP, Reply 10.0.2.15 is-at 52:54:00:12:34:56, length 28
    0x0000:  ffff ffff ffff 5254 0012 3456 0806 0001  ......RT..4V....
    0x0010:  0800 0604 0002 5254 0012 3456 0a00 020f  ......RT..4V....
    0x0020:  5255 0a00 0202 0a00 0202                 RU........

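Packet 1: the destination address is ff:ff:ff:ff:ff:ff (broadcast), the source address is 52:55:0a:00:02:02, and the type is 0x0806, which means ARP.
The tip field is 0x0a00020f (10.0.2.15); the remaining fields can be read off the same way.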
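
To make "read off the same way" concrete, here is a minimal sketch that decodes the 28-byte ARP body of packet 1 (the bytes that follow the 14-byte Ethernet header). The struct mirrors struct arp from kernel/net.h with <stdint.h> types. The helpers be16(), be32(), arp_op(), arp_sip() and arp_tip() are hypothetical names used only here.

```c
#include <stddef.h>
#include <stdint.h>

#define ETHADDR_LEN 6

// Same layout as struct arp in kernel/net.h, using <stdint.h> types.
struct arp {
  uint16_t hrd;
  uint16_t pro;
  uint8_t  hln;
  uint8_t  pln;
  uint16_t op;
  char     sha[ETHADDR_LEN];
  uint32_t sip;
  char     tha[ETHADDR_LEN];
  uint32_t tip;
} __attribute__((packed));

// ARP body of packet 1, copied from the tcpdump output.
static const uint8_t arp_request[28] = {
  0x00, 0x01,                           // hrd: Ethernet
  0x08, 0x00,                           // pro: IP
  0x06, 0x04,                           // hln, pln
  0x00, 0x01,                           // op: request
  0x52, 0x55, 0x0a, 0x00, 0x02, 0x02,   // sha
  0x0a, 0x00, 0x02, 0x02,               // sip: 10.0.2.2
  0x00, 0x00, 0x00, 0x00, 0x00, 0x00,   // tha: unknown, being asked for
  0x0a, 0x00, 0x02, 0x0f,               // tip: 10.0.2.15
};

// Hypothetical helpers: read big-endian (network order) integers.
uint16_t be16(const uint8_t *p) { return (uint16_t)((p[0] << 8) | p[1]); }

uint32_t be32(const uint8_t *p)
{
  return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
         ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

// Field accessors that locate each field with offsetof on the packed struct.
uint16_t arp_op(const uint8_t *b)  { return be16(b + offsetof(struct arp, op)); }
uint32_t arp_sip(const uint8_t *b) { return be32(b + offsetof(struct arp, sip)); }
uint32_t arp_tip(const uint8_t *b) { return be32(b + offsetof(struct arp, tip)); }
```

With these bytes, arp_op() yields 1 (ARP_OP_REQUEST), arp_sip() yields 0x0a000202 (10.0.2.2) and arp_tip() yields 0x0a00020f (10.0.2.15), matching the "who-has 10.0.2.15 tell 10.0.2.2" summary line.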

IP

IP header

// an IP packet header (comes after an Ethernet header).
struct ip {
  uint8  ip_vhl; // version << 4 | header length >> 2
  uint8  ip_tos; // type of service
  uint16 ip_len; // total length
  uint16 ip_id;  // identification
  uint16 ip_off; // fragment offset field
  uint8  ip_ttl; // time to live
  uint8  ip_p;   // protocol
  uint16 ip_sum; // checksum
  uint32 ip_src, ip_dst;
};

#define IPPROTO_ICMP 1  // Control message protocol
#define IPPROTO_TCP  6  // Transmission control protocol
#define IPPROTO_UDP  17 // User datagram protocol

#define MAKE_IP_ADDR(a, b, c, d)           \
  (((uint32)a << 24) | ((uint32)b << 16) | \
   ((uint32)c << 8) | (uint32)d)
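
As a worked example of the ip_sum field, the sketch below recomputes the checksum of the 20-byte IP header from the UDP packet in the tcpdump section (bytes 0x0e to 0x21 of that frame). It uses the standard Internet checksum, a one's-complement sum of 16-bit words. The function name inet_checksum() is an assumption for this illustration and is not the name used in kernel/net.c; MAKE_IP_ADDR is restated with uint32_t in place of xv6's uint32.

```c
#include <stddef.h>
#include <stdint.h>

#define MAKE_IP_ADDR(a, b, c, d)                       \
  (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) |     \
   ((uint32_t)(c) << 8) | (uint32_t)(d))

// IP header of the captured UDP packet, copied from the tcpdump output.
static const uint8_t ip_header[20] = {
  0x45, 0x00, 0x00, 0x2f,   // version/header length, tos, total length 47
  0x00, 0x00, 0x00, 0x00,   // id, fragment offset
  0x64, 0x11, 0x3e, 0xae,   // ttl 100, protocol 17 (UDP), checksum
  0x0a, 0x00, 0x02, 0x0f,   // source 10.0.2.15
  0x0a, 0x00, 0x02, 0x02,   // destination 10.0.2.2
};

// Hypothetical helper: Internet checksum over len bytes of data.
uint16_t inet_checksum(const uint8_t *data, size_t len)
{
  uint32_t sum = 0;

  // Add the data as big-endian 16-bit words.
  for (size_t i = 0; i + 1 < len; i += 2)
    sum += (uint32_t)((data[i] << 8) | data[i + 1]);
  // An odd trailing byte is padded with a zero byte.
  if (len & 1)
    sum += (uint32_t)(data[len - 1] << 8);
  // Fold the carries back into the low 16 bits.
  while (sum >> 16)
    sum = (sum & 0xffff) + (sum >> 16);

  return (uint16_t)~sum;
}
```

Computed over the header with the checksum bytes zeroed, the function gives 0x3eae, the value seen in the capture; computed over the header as captured, it gives 0, which is how a receiver validates the header. MAKE_IP_ADDR(10, 0, 2, 15) evaluates to 0x0a00020f, the source address bytes above.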

TCP/UDP

UDP header

// a UDP packet header (comes after an IP header).
struct udp {
  uint16 sport; // source port
  uint16 dport; // destination port
  uint16 ulen;  // length, including udp header, not including IP header
  uint16 sum;   // checksum
};
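
The UDP header of the captured packet (bytes 0x22 to 0x29 in the tcpdump output) can be decoded the same way. This is a small sketch under the assumption that the header is read byte by byte in network order; the helper names be16(), udp_sport(), udp_dport() and udp_payload_len() are hypothetical and do not come from xv6.

```c
#include <stddef.h>
#include <stdint.h>

// Same layout as struct udp in kernel/net.h, using <stdint.h> types.
struct udp {
  uint16_t sport; // source port
  uint16_t dport; // destination port
  uint16_t ulen;  // length, including udp header, not including IP header
  uint16_t sum;   // checksum
};

// UDP header of the captured packet, copied from the tcpdump output.
static const uint8_t udp_header[8] = {
  0x07, 0xd0,   // sport
  0x64, 0x03,   // dport
  0x00, 0x1b,   // ulen
  0x00, 0x00,   // sum (zero in this capture)
};

// Hypothetical helper: read a big-endian 16-bit integer.
uint16_t be16(const uint8_t *p) { return (uint16_t)((p[0] << 8) | p[1]); }

uint16_t udp_sport(const uint8_t *b) { return be16(b + offsetof(struct udp, sport)); }
uint16_t udp_dport(const uint8_t *b) { return be16(b + offsetof(struct udp, dport)); }

// Payload length is the UDP length minus the 8-byte UDP header.
uint16_t udp_payload_len(const uint8_t *b)
{
  return (uint16_t)(be16(b + offsetof(struct udp, ulen)) - sizeof(struct udp));
}
```

For this packet the ports decode to 2000 and 25603 and the payload length to 19, which agrees with the tcpdump summary "10.0.2.15.2000 > 10.0.2.2.25603: UDP, length 19".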

Lab: networking

Background

You’ll use a network device called the E1000 to handle network communication. To xv6 (and the driver you write), the E1000 looks like a real piece of hardware connected to a real Ethernet local area network (LAN). In fact, the E1000 your driver will talk to is an emulation provided by qemu, connected to a LAN that is also emulated by qemu. On this emulated LAN, xv6 (the “guest”) has an IP address of 10.0.2.15. Qemu also arranges for the computer running qemu to appear on the LAN with IP address 10.0.2.2. When xv6 uses the E1000 to send a packet to 10.0.2.2, qemu delivers the packet to the appropriate application on the (real) computer on which you’re running qemu (the “host”).

You will use QEMU’s “user-mode network stack”. QEMU’s documentation has more about the user-mode stack here. We’ve updated the Makefile to enable QEMU’s user-mode network stack and the E1000 network card.

The Makefile configures QEMU to record all incoming and outgoing packets to the file packets.pcap in your lab directory. It may be helpful to review these recordings to confirm that xv6 is transmitting and receiving the packets you expect. To display the recorded packets:

tcpdump -XXnr packets.pcap

We’ve added some files to the xv6 repository for this lab. The file kernel/e1000.c contains initialization code for the E1000 as well as empty functions for transmitting and receiving packets, which you’ll fill in. kernel/e1000_dev.h contains definitions for registers and flag bits defined by the E1000 and described in the Intel E1000 Software Developer’s Manual. kernel/net.c and kernel/net.h contain a simple network stack that implements the IP, UDP, and ARP protocols. These files also contain code for a flexible data structure to hold packets, called an mbuf. Finally, kernel/pci.c contains code that searches for an E1000 card on the PCI bus when xv6 boots.

Your Job

Your job is to complete e1000_transmit() and e1000_recv(), both in kernel/e1000.c, so that the driver can transmit and receive packets. You are done when make grade says your solution passes all the tests.

While writing your code, you’ll find yourself referring to the E1000 Software Developer’s Manual. Of particular help may be the following sections:

Section 2 is essential and gives an overview of the entire device.
Section 3.2 gives an overview of packet receiving.
Section 3.3 gives an overview of packet transmission, alongside section 3.4.
Section 13 gives an overview of the registers used by the E1000.
Section 14 may help you understand the init code that we’ve provided.

Hints

Start by adding print statements to e1000_transmit() and e1000_recv(), and running make server and (in xv6) nettests. You should see from your print statements that nettests generates a call to e1000_transmit.

Some hints for implementing e1000_transmit:

First ask the E1000 for the TX ring index at which it’s expecting the next packet, by reading the E1000_TDT control register.
Then check if the ring is overflowing. If E1000_TXD_STAT_DD is not set in the descriptor indexed by E1000_TDT, the E1000 hasn’t finished the corresponding previous transmission request, so return an error.
Otherwise, use mbuffree() to free the last mbuf that was transmitted from that descriptor (if there was one).
Then fill in the descriptor. m->head points to the packet’s content in memory, and m->len is the packet length. Set the necessary cmd flags (look at Section 3.3 in the E1000 manual) and stash away a pointer to the mbuf for later freeing.
Finally, update the ring position by adding one to E1000_TDT modulo TX_RING_SIZE.
If e1000_transmit() added the mbuf successfully to the ring, return 0. On failure (e.g., there is no descriptor available to transmit the mbuf), return -1 so that the caller knows to free the mbuf.

Some hints for implementing e1000_recv:

First ask the E1000 for the ring index at which the next waiting received packet (if any) is located, by fetching the E1000_RDT control register and adding one modulo RX_RING_SIZE.
Then check if a new packet is available by checking for the E1000_RXD_STAT_DD bit in the status portion of the descriptor. If not, stop.
Otherwise, update the mbuf’s m->len to the length reported in the descriptor. Deliver the mbuf to the network stack using net_rx().
Then allocate a new mbuf using mbufalloc() to replace the one just given to net_rx(). Program its data pointer (m->head) into the descriptor. Clear the descriptor’s status bits to zero.
Finally, update the E1000_RDT register to be the index of the last ring descriptor processed.
e1000_init() initializes the RX ring with mbufs, and you’ll want to look at how it does that and perhaps borrow code.
At some point the total number of packets that have ever arrived will exceed the ring size (16); make sure your code can handle that.

You’ll need locks to cope with the possibility that xv6 might use the E1000 from more than one process, or might be using the E1000 in a kernel thread when an interrupt arrives.

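Implementation

The E1000 network card

References: the E1000 Software Developer's Manual and a blog post.
Basic structure of the E1000 NIC
An Ethernet NIC covers two layers of the OSI model: the physical layer and the data link layer. The physical layer is handled by the PHY chip, which defines the electrical and optical signals, clock reference, and so on needed to send and receive data. The data link layer is handled by the MAC chip, which provides addressing, frame construction, and a standard data interface to the network layer.
Figure: block diagram from the PCI bus to the MAC chip.
DMA (Direct Memory Access) is a mechanism for accessing memory without going through the CPU. During a DMA transfer, the DMA engine controls the PCI bus and moves data between memory and the FIFO data buffer (64 KB).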

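The transmit path:
The CPU places the IP packet in memory and notifies the DMA engine, which transfers the data into the FIFO data buffer. The MAC sends the data as Ethernet frames of at least 64 bytes and at most 1518 bytes. Each frame carries the destination MAC address, the NIC's own MAC address, the protocol type of the payload, and a CRC checksum. The destination MAC address is obtained through ARP. The PHY takes the data from the MAC, converts it from parallel to serial, encodes it, and then turns it into an analog signal for transmission.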

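Ring
The TX and RX buffers in RAM are each organized as a ring with two pointers, head and tail. The position of head is controlled by the NIC. When transmitting, the NIC advances head by one slot (a descriptor and its mbuf) each time it finishes sending a packet, and software advances tail by one slot whenever it queues a packet for sending. When receiving, the NIC advances head by one slot for each packet it receives, and software processes the slot following tail and then advances tail to it. After the last slot, both pointers start again from the beginning, which gives the ring its wrap-around structure.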
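
The head and tail movement on the transmit side can be sketched as a small user-space simulation. This is a simplified model, not driver code: the names struct ring, ring_init(), ring_enqueue() and ring_complete() are hypothetical, STAT_DD stands in for E1000_TXD_STAT_DD, and the model clears the DD bit when a slot is queued so that an unsent slot can be told apart from a finished one, which is an assumption of the simulation.

```c
#include <stdint.h>

#define RING_SIZE 16
#define STAT_DD   0x01   // "descriptor done", stand-in for E1000_TXD_STAT_DD

// Simplified model of a TX ring: one status byte per slot.
struct ring {
  uint8_t  status[RING_SIZE];
  uint32_t head;   // advanced by the simulated NIC
  uint32_t tail;   // advanced by software
};

// Every slot starts out free, i.e. with DD set.
void ring_init(struct ring *r)
{
  for (int i = 0; i < RING_SIZE; i++)
    r->status[i] = STAT_DD;
  r->head = 0;
  r->tail = 0;
}

// Software side: queue one packet at tail.
// Returns 0 on success, -1 if the slot at tail has not been sent yet.
int ring_enqueue(struct ring *r)
{
  uint32_t i = r->tail;

  if ((r->status[i] & STAT_DD) == 0)
    return -1;
  r->status[i] = 0;                  // simulation only: mark slot in flight
  r->tail = (i + 1) % RING_SIZE;     // wrap around after the last slot
  return 0;
}

// NIC side: finish sending the packet at head, if one is pending.
// Returns 1 if a packet was completed, 0 if nothing was pending.
int ring_complete(struct ring *r)
{
  uint32_t i = r->head;

  if (r->status[i] & STAT_DD)
    return 0;
  r->status[i] |= STAT_DD;           // report completion to software
  r->head = (i + 1) % RING_SIZE;
  return 1;
}
```

In this model, queueing RING_SIZE packets in a row wraps tail back to index 0, and a further ring_enqueue() fails with -1 until ring_complete() has freed the slot at head. That is the overflow case the transmit hints describe.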

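Legacy mode
For each packet:

The Ethernet, IP, and TCP/UDP headers are prepared by the network stack.
The stack interfaces with the software device driver and commands the driver to send the individual packet.
The driver takes the frame and interfaces with the hardware.
The hardware reads the packet from host memory via DMA transfers.
When the hardware has completed the DMA transfer of the frame (indicated by an interrupt), the driver returns ownership of the packet to the network operating system (NOS).

When the Ethernet controller receives a packet, the hardware stores the packet data into the indicated buffer and writes the length, checksum, status, and error fields. The length covers the data written into the receive buffer, including the CRC bytes if present. Software must read multiple descriptors to determine the complete length of a packet that spans multiple receive buffers.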

Code

// kernel/e1000.c
int
e1000_transmit(struct mbuf *m)
{
  //
  // Your code here.
  //
  // the mbuf contains an ethernet frame; program it into
  // the TX descriptor ring so that the e1000 sends it. Stash
  // a pointer so that it can be freed after sending.
  //
  acquire(&e1000_lock);

  uint32 tx_index = regs[E1000_TDT]; // tail pointer: next descriptor to fill

  if((tx_ring[tx_index].status & E1000_TXD_STAT_DD) == 0){
    // the E1000 has not finished the previous transmission from this
    // descriptor, so no descriptor is available
    release(&e1000_lock);
    return -1;
  }

  // free the mbuf previously transmitted from this descriptor, if any
  if(tx_mbufs[tx_index] != 0)
    mbuffree(tx_mbufs[tx_index]);

  // remember the mbuf so that it can be freed later
  tx_mbufs[tx_index] = m;

  // fill in the descriptor
  tx_ring[tx_index].addr = (uint64) m->head;
  tx_ring[tx_index].length = m->len;
  tx_ring[tx_index].cmd = E1000_TXD_CMD_EOP | E1000_TXD_CMD_RS;

  // update the position of the ring
  regs[E1000_TDT] = (tx_index + 1) % TX_RING_SIZE;

  release(&e1000_lock);
  return 0;
}

static void
e1000_recv(void)
{
  //
  // Your code here.
  //
  // Check for packets that have arrived from the e1000
  // Create and deliver an mbuf for each packet (using net_rx()).
  //
  while(1){
    // the next waiting received packet, if any, is at RDT + 1
    uint32 rx_index = (regs[E1000_RDT] + 1) % RX_RING_SIZE;

    // DD in the descriptor status means a new packet is available
    if((rx_ring[rx_index].status & E1000_RXD_STAT_DD) == 0)
      break;

    // set the length and deliver the mbuf to the network stack
    rx_mbufs[rx_index]->len = rx_ring[rx_index].length;
    net_rx(rx_mbufs[rx_index]);

    // allocate a new mbuf to replace the one just given to net_rx()
    rx_mbufs[rx_index] = mbufalloc(0);
    if(rx_mbufs[rx_index] == 0)
      panic("e1000_recv");

    // program its data pointer into the descriptor
    rx_ring[rx_index].addr = (uint64) rx_mbufs[rx_index]->head;
    // clear the status bits
    rx_ring[rx_index].status = 0;

    // update E1000_RDT to the last descriptor processed
    regs[E1000_RDT] = rx_index;
  }
}