
OpenJDK internals (HotSpot): memory layout of assembly instructions

news 2025/6/4 11:16:46

Memory segmentation

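In a von Neumann machine, memory is one contiguous block. Data is placed in a data segment and code in a code segment, and the two are kept separate. Assume memory is laid out as follows.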

[Figure in the original post: assumed memory layout with data and code segments]

Implementation in the JVM

The CodeSection class

To model this mechanism, the JVM first defines the CodeSection class, which describes a single section.

[Figure in the original post: CodeSection class diagram]

Leaving relocation handling aside, the key fields of the class are:

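Field	Description
_start	Start address of the section's memory
_end	Current end address of the data written so far
_limit	Last address allocated to the section
_mark	A mark set within the section, usually the beginning of an instruction
_index	Identifies which kind of section this is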
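The relationship between these fields can be illustrated with a minimal sketch. MiniSection and its member functions are hypothetical names made up for this example, not HotSpot's actual declarations, and the model assumes that _limit marks the end of the allocated range.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical, simplified stand-in for CodeSection.
// Only the address-tracking fields from the table are modelled.
struct MiniSection {
  uint8_t* start = nullptr;  // first byte of the section
  uint8_t* end   = nullptr;  // one past the last byte written so far
  uint8_t* limit = nullptr;  // end of the space allocated to the section
  uint8_t* mark  = nullptr;  // optional user mark, unset in this sketch
  int8_t   index = -1;       // section kind, -1 meaning "none"

  // Attach the section to a caller-owned memory range.
  void initialize(uint8_t* base, std::size_t capacity, int8_t kind) {
    assert(base != nullptr);
    start = base;
    end   = base;             // nothing written yet, so end == start
    limit = base + capacity;
    index = kind;
  }

  // Bytes written so far.
  std::size_t size() const      { return static_cast<std::size_t>(end - start); }
  // Total bytes available to the section.
  std::size_t capacity() const  { return static_cast<std::size_t>(limit - start); }
  // Bytes that can still be written before reaching limit.
  std::size_t remaining() const { return static_cast<std::size_t>(limit - end); }
};
```

Right after initialize, size() is 0 and remaining() equals capacity(), which matches the state in which _start and _end hold the same address.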


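Note that in Java or C#, a fixed set of kinds is normally declared with the enum keyword. Because both languages are strongly typed, a field of that kind is declared with the enum type and holds one of its named values. For example, the section kinds could be defined in C# as follows: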

public enum EnumCodeSectionType
{
    SECT_CONSTS,    // holds constant values
    SECT_INST,      // holds instructions
    SECT_STUB,      // holds stub code
    SECT_NONE = -1  // not a code section
}

public class CodeSection
{
    private EnumCodeSectionType _codeSectionType;
    ...
}

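C and C++, by contrast, are weakly typed in this respect: an enumeration is essentially an integer. The section enum is therefore declared as: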

enum {
  // Here is the list of all possible sections.  The order reflects
  // the final layout.
  SECT_FIRST = 0,
  SECT_CONSTS = SECT_FIRST, // Non-instruction data:  Floats, jump tables, etc.
  SECT_INSTS,               // Executable instructions.
  SECT_STUBS,               // Outbound trampolines for supporting call sites.
  SECT_LIMIT, SECT_NONE = -1
};

The CodeSection class stores this enum value in its _index field.

class CodeSection {
 private:
  int _index;  // set by the constructor; if left uninitialized it holds whatever was in memory
};
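To make the "enum is really an integer" point concrete, here is a small sketch that reuses the enumerators quoted above. The section_name helper and the IndexHolder struct are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Same enumerators as the enum quoted above.
enum {
  SECT_FIRST = 0,
  SECT_CONSTS = SECT_FIRST,
  SECT_INSTS,
  SECT_STUBS,
  SECT_LIMIT,
  SECT_NONE = -1
};

// Hypothetical helper: map a raw section index to a readable name.
inline const char* section_name(int index) {
  switch (index) {
    case SECT_CONSTS: return "consts";
    case SECT_INSTS:  return "insts";
    case SECT_STUBS:  return "stubs";
    default:          return "none";
  }
}

// An unscoped enumerator converts implicitly to an integer type,
// so a plain integer field can hold the section kind.
struct IndexHolder {
  int8_t index = SECT_NONE;
};
```

Because the enumerators are plain integers, a debugger prints the field as a number (for example 1), and the reader has to map it back to SECT_INSTS by hand.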

The CodeBuffer class

The JVM defines the CodeBuffer class to hold the various CodeSections.

[Figure in the original post: CodeBuffer class diagram]

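The class contains three CodeSection objects: _consts, _insts and _stubs. These three fields correspond to the three kinds of CodeSection. Their arrangement in memory is shown in the figure below.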

[Figure in the original post: layout of the three sections in memory]

The other relevant fields are:

Field	Purpose
_name	Name of the CodeBuffer, used when printing information
_total_start	Start address of the whole CodeBuffer
_total_size	Size of the whole CodeBuffer
_last_insn	The last instruction
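The ownership structure described here can be sketched as follows. OwnedSection and MiniBuffer are hypothetical names, and the decision to hand the whole block to the instruction section at construction time is an assumption of this sketch, not a statement about how CodeBuffer allocates memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct MiniBuffer;  // forward declaration for the back-pointer

// Hypothetical section descriptor: an address range plus its kind and owner.
struct OwnedSection {
  uint8_t*    start = nullptr;
  uint8_t*    end   = nullptr;
  uint8_t*    limit = nullptr;
  int8_t      index = -1;
  MiniBuffer* outer = nullptr;  // the buffer that owns this section
};

// Hypothetical stand-in for CodeBuffer: one memory block, three sections.
struct MiniBuffer {
  const char*  name;
  OwnedSection consts, insts, stubs;
  uint8_t*     total_start;
  std::size_t  total_size;

  MiniBuffer(const char* n, uint8_t* base, std::size_t size)
      : name(n), total_start(base), total_size(size) {
    assert(base != nullptr);
    consts.index = 0;
    insts.index  = 1;
    stubs.index  = 2;
    consts.outer = insts.outer = stubs.outer = this;
    // Assumption of this sketch: the whole block initially goes to insts,
    // and the other two sections stay empty.
    insts.start = insts.end = base;
    insts.limit = base + size;
  }

  // Look a section up by its index (0 = consts, 1 = insts, 2 = stubs).
  OwnedSection* section(int n) {
    switch (n) {
      case 0:  return &consts;
      case 1:  return &insts;
      case 2:  return &stubs;
      default: return nullptr;
    }
  }
};
```

Each section keeps a pointer back to its owner, so code that only holds a section can still reach the buffer-wide fields such as the name and total size.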

How instructions are inserted into the _insts field

Assuming an x86 machine, we debug the java -version command with gdb.
Create a .gdbinit file in the build/linux-x86_64-normal-server-slowdebug/jdk/bin directory:

handle SIGSEGV pass noprint nostop
handle SIGUSR1 pass noprint nostop
handle SIGUSR2 pass noprint nostop
set logging on
set breakpoint pending on
b main
r -Xint -version

The -Xint option runs the JVM in pure interpreter mode.
Start gdb with the following command:

gdb -command=.gdbinit ./java

Once started, the program stops at the main function:

Breakpoint 1, main (argc=3, argv=0x7fffffffe088)
    at /home/jx/src/openjdk/src/java.base/share/native/launcher/main.c:98
98

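Now, following the earlier article "OpenJDK internals (HotSpot) assembly instruction calls (4): sending instructions to memory", set a breakpoint on the function below: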

//assembler.hpp
void emit_int8(   int8_t  x) { code_section()->emit_int8(   x); }

gdb prints the following:

(gdb) b emit_int8
Function "emit_int8" not defined.
Breakpoint 2 (emit_int8) pending.
(gdb) c
Continuing.
[New Thread 0x7ffff7fe0700 (LWP 12619)]
[Switching to Thread 0x7ffff7fe0700 (LWP 12619)]

Thread 2 "java" hit Breakpoint 2, AbstractAssembler::emit_int8 (this=0x7ffff001bbb0, x=-123 '\205') at /home/jx/src/openjdk/src/hotspot/share/asm/assembler.hpp:273
273       void emit_int8(   int8_t  x) { code_section()->emit_int8(   x); }
(gdb) s
AbstractAssembler::code_section (this=0x7ffff001bbb0)
    at /home/jx/src/openjdk/src/hotspot/share/asm/assembler.hpp:306
306       CodeSection*  code_section() const   { return _code_section; }
(gdb) print *_code_section
$1 = {_start = 0x7fffec9c20a0 '\314' <repeats 63 times>, <incomplete sequence \314>, _mark = 0x0, _end = 0x7fffec9c20a0 '\314' <repeats 63 times>, <incomplete sequence \314>, _limit = 0x7fffec9c20e0 "", _locs_start = 0x0, _locs_end = 0x0, _locs_limit = 0x0, _locs_point = 0x7fffec9c20a0 '\314' <repeats 63 times>, <incomplete sequence \314>, _locs_own = false, _frozen = false, _scratch_emit = false, _index = 1 '\001', _outer = 0x7ffff7fdf890
}

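This shows the state of the CodeSection object.

Here _index = 1. Combined with the class diagram and the section enum above, this means the current CodeSection is of kind SECT_INSTS, i.e. the code section.

The start and current end addresses of the section are given by _start and _end. Since this is the first instruction being inserted and nothing has been written yet, the two are equal: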

_start = 0x7fffec9c20a0
_end   = 0x7fffec9c20a0

The section can grow no further than _limit:

_limit = 0x7fffec9c20e0

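The corresponding memory layout is shown below.
[Figure in the original post: memory layout of the section from _start to _limit]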

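The code section has two notable properties.

The start address is divisible by 8 (it ends in 0xa0, so it is in fact divisible by 32), which is favorable for CPU memory access speed.
The whole section is 64 bytes long (_limit - _start = 0x40), which is also favorable for CPU memory access speed.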
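Both properties can be checked directly from the addresses in the gdb output. The sketch below hard-codes those two addresses; is_aligned and section_capacity are hypothetical helper names introduced for this example.

```cpp
#include <cassert>
#include <cstdint>

// Addresses copied from the gdb output above.
constexpr uint64_t kStart = 0x7fffec9c20a0ULL;
constexpr uint64_t kLimit = 0x7fffec9c20e0ULL;

// True when addr is a multiple of alignment (alignment must be a power of two).
constexpr bool is_aligned(uint64_t addr, uint64_t alignment) {
  return (addr & (alignment - 1)) == 0;
}

// Number of bytes between start and limit.
inline uint64_t section_capacity(uint64_t start, uint64_t limit) {
  assert(limit >= start);
  return limit - start;
}

static_assert(is_aligned(kStart, 8),   "start is 8-byte aligned");
static_assert(is_aligned(kStart, 32),  "start is also 32-byte aligned");
static_assert(!is_aligned(kStart, 64), "start is not 64-byte aligned");
static_assert(kLimit - kStart == 64,   "capacity is 64 bytes");
```

The checks are written as static_assert so that a wrong address or size would already fail at compile time.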

Now let the program continue until it hits emit_int8 again:

Thread 2 "java" hit Breakpoint 2, AbstractAssembler::emit_int8 (this=0x7ffff001bbb0, x=-10 '\366') at /home/jx/src/openjdk/src/hotspot/share/asm/assembler.hpp:273
273       void emit_int8(   int8_t  x) { code_section()->emit_int8(   x); }
(gdb) s
AbstractAssembler::code_section (this=0x7ffff001bbb0)
    at /home/jx/src/openjdk/src/hotspot/share/asm/assembler.hpp:306
306       CodeSection*  code_section() const   { return _code_section; }
(gdb) print *_code_section
$2 = {_start = 0x7fffec9c20a0 "\205", '\314' <repeats 62 times>, <incomplete sequence \314>, _mark = 0x0, _end = 0x7fffec9c20a1 '\314' <repeats 62 times>, <incomplete sequence \314>, _limit = 0x7fffec9c20e0 "", _locs_start = 0x0, _locs_end = 0x0, _locs_limit = 0x0, _locs_point = 0x7fffec9c20a0 "\205", '\314' <repeats 62 times>, <incomplete sequence \314>, _locs_own = false, _frozen = false, _scratch_emit = false, _index = 1 '\001', _outer = 0x7ffff7fdf890
}

The value of _end has now changed:

_start = 0x7fffec9c20a0
_end   = 0x7fffec9c20a1

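_end has advanced by 1 byte (8 bits), and the byte at _start is now '\205' (0x85), the value x = -123 passed to the first emit_int8 call.
[Figure in the original post: memory layout after the first byte has been written]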
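The two snapshots can be replayed with a small model. ByteSection is a hypothetical type written for this example. It assumes that emitting a byte means "store at end, then advance end by one", which is what the dumps suggest, and it fills the buffer with 0xCC only to mimic the '\314' bytes seen in gdb.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical byte section used only to replay the two gdb snapshots.
struct ByteSection {
  uint8_t* start;
  uint8_t* end;
  uint8_t* limit;

  ByteSection(uint8_t* base, std::size_t capacity)
      : start(base), end(base), limit(base + capacity) {
    // Assumption: pre-fill with 0xCC to mimic the '\314' bytes in the dump.
    std::memset(base, 0xCC, capacity);
  }

  // Store one byte at end and advance end by one.
  void emit_int8(int8_t x) {
    assert(end < limit);  // the section must not grow past limit
    *end = static_cast<uint8_t>(x);
    ++end;
  }

  // Bytes written so far.
  std::size_t size() const { return static_cast<std::size_t>(end - start); }
};
```

Calling emit_int8(-123) on a fresh 64-byte ByteSection leaves 0x85 in the first byte and moves end one byte past start, the same change seen between the first and second print of *_code_section.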

Now look at the _outer field. It refers to the CodeBuffer, that is, the CodeBuffer that owns this CodeSection.

(gdb) print *_code_section->_outer
$4 = {<StackObj> = {<AllocatedObj> = {_vptr.AllocatedObj = 0x7ffff6fb7110 <vtable for CodeBuffer+16>}, <No data fields>}, members of CodeBuffer: _name = 0x7ffff6540ab4 "static buffer", _consts = {_start = 0x0, _mark = 0x0, _end = 0x0, _limit = 0x0, _locs_start = 0x0, _locs_end = 0x0, _locs_limit = 0x0, _locs_point = 0x0, _locs_own = false, _frozen = false, _scratch_emit = false, _index = 0 '\000', _outer = 0x7ffff7fdf890}, _insts = {_start = 0x7fffec9c20a0 "\205", '\314' <repeats 62 times>, <incomplete sequence \314>, _mark = 0x0, _end = 0x7fffec9c20a1 '\314' <repeats 62 times>, <incomplete sequence \314>, _limit = 0x7fffec9c20e0 "", _locs_start = 0x0, _locs_end = 0x0, _locs_limit = 0x0, _locs_point = 0x7fffec9c20a0 "\205", '\314' <repeats 62 times>, <incomplete sequence \314>, _locs_own = false, _frozen = false, _scratch_emit = false, _index = 1 '\001', _outer = 0x7ffff7fdf890}, _stubs = {_start = 0x0, _mark = 0x0, _end = 0x0, _limit = 0x0, _locs_start = 0x0, _locs_end = 0x0, _locs_limit = 0x0, _locs_point = 0x0, _locs_own = false, _frozen = false, _scratch_emit = false, _index = 2 '\002', _outer = 0x7ffff7fdf890},