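An opening gripe: Java fooled me back then. So much for "automatic memory management": in the end you still have to learn all of this anyway. I might as well have gone straight to C++.
Chapter 6: Understanding Garbage Collection
Mark and sweep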
for each object in allocatedObjectList:
    clear the mark bit
DFS starting from GC roots:
    set the mark bit of each reached object
for each object in allocatedObjectList:
    if the mark bit is not set:
        remove it from allocatedObjectList
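The pseudocode above can be turned into a small runnable sketch. This is a toy model, not how HotSpot stores objects: the `Obj` class, its `marked` field, and the `collect` method are names made up for illustration, and the "heap" is just a Java list.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

final class MarkSweepSketch {
    /** A toy heap object: a mark bit plus outgoing references (hypothetical model). */
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        boolean marked;

        Obj(String name) {
            this.name = name;
        }
    }

    /** Runs one mark-sweep cycle and returns the number of objects reclaimed. */
    static int collect(List<Obj> allocated, List<Obj> roots) {
        // Step 1: clear every mark bit.
        for (Obj o : allocated) {
            o.marked = false;
        }
        // Step 2: depth-first traversal from the GC roots, using an explicit stack.
        Deque<Obj> stack = new ArrayDeque<>(roots);
        while (!stack.isEmpty()) {
            Obj o = stack.pop();
            if (o.marked) {
                continue; // already visited, which also makes cycles safe
            }
            o.marked = true;
            for (Obj ref : o.refs) {
                if (!ref.marked) {
                    stack.push(ref);
                }
            }
        }
        // Step 3: sweep every object whose mark bit was never set.
        int before = allocated.size();
        allocated.removeIf(o -> !o.marked);
        return before - allocated.size();
    }

    public static void main(String[] args) {
        Obj a = new Obj("a"), b = new Obj("b"), c = new Obj("c"), d = new Obj("d");
        a.refs.add(b);
        c.refs.add(d);
        d.refs.add(c); // c and d form a cycle that no root can reach
        List<Obj> heap = new ArrayList<>(Arrays.asList(a, b, c, d));
        int freed = collect(heap, Arrays.asList(a));
        System.out.println("freed=" + freed + ", live=" + heap.size()); // prints "freed=2, live=2"
    }
}
```

Note that the unreachable cycle is reclaimed, which is the main advantage of tracing collection over naive reference counting.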
[Figure: memory layout]

jmap -histo [pid]
num #instances #bytes class name
----------------------------------------------
1: 20839 14983608 [B
2: 118743 12370760 [C
3: 14528 9385360 [I
4: 282 6461584 [D
5: 115231 3687392 java.util.HashMap$Node
6: 102237 2453688 java.lang.String
7: 68388 2188416 java.util.Hashtable$Entry
8: 8708 1764328 [Ljava.util.HashMap$Node
9: 39047 1561880 jdk.nashorn.internal.runtime.CompiledFunction
10: 23688 1516032 com.mysql.jdbc.ConnectionPropertiesImpl$BooleanConnectionProperty
11: 24217 1356152 jdk.nashorn.internal.runtime.ScriptFunction
12: 27344 1301896 [Ljava.lang.Object
13: 10040 1107896 java.lang.Class
14: 44090 1058160 java.util.LinkedList$Node
15: 29375 940000 java.util.LinkedList
16: 25944 830208 jdk.nashorn.internal.runtime.FinalScriptFunctionData
17: 20 655680 [Lscala.concurrent.forkjoin.ForkJoinTask
18: 19943 638176 java.util.concurrent.ConcurrentHashMap$Node
19: 730 614744 [Ljava.util.Hashtable$Entry
20: 24022 578560 [Ljava.lang.Class
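The HotSpot runtime
Ordinary Object Pointer (oop): this is how a Java object is represented inside the JVM. Every object starts with a header of two machine words: the mark word points to instance-specific metadata (such as the hashcode), and the klass word points to class-wide metadata (held in PermGen in Java 7 and earlier).
The -XX:+UseCompressedOops switch stores oops as 32-bit values on a 64-bit JVM, which shrinks the klass word and reference fields. It is enabled by default in Java 7 and later.
KlassOops and Class objects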

Inheritance hierarchy of oops
oop (abstract base)
|-instanceOop (instance objects)
|-methodOop (representations of methods)
|-arrayOop (array abstract base)
|-symbolOop (internal symbol / string class)
|-klassOop (klass Header) (Java 7 and before only)
|-markOop
GC Roots
- Stack frames
- JNI
- Registers
- Code roots (from the JVM code cache)
- Globals
- Class metadata from loaded classes
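GC in HotSpot
The Weak Generational Hypothesis observes that the vast majority of objects are very short-lived, and only a small fraction survive for long. HotSpot builds on this:
- It tracks the age of every object (the number of GCs it has survived)
- Objects are allocated in Eden first; those that survive a collection are moved to a Survivor space
- A separate memory area, the old generation, holds the long-lived objects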

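To avoid scanning the whole old generation during a young collection, HotSpot maintains a data structure called the card table, which records which old-generation objects may point to young-generation objects. Each entry in the table corresponds to a 512-byte area of the old generation, so when a reference field of an old object is updated, its card is marked dirty: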
cards[*instanceOop >> 9] = 0;
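To make the indexing concrete, here is a minimal sketch of a card table in plain Java. It assumes a byte-per-card table where 0 means dirty (as in the line above); the class name, the `CLEAN` value, and the use of plain `long` addresses relative to a hypothetical old-generation base are all illustrative assumptions, not HotSpot's actual implementation.

```java
import java.util.Arrays;

final class CardTableSketch {
    static final int CARD_SHIFT = 9;   // 2^9 = 512 bytes of old generation per card
    static final byte DIRTY = 0;       // same value as cards[...] = 0 above
    static final byte CLEAN = -1;      // assumption for this sketch

    private final byte[] cards;
    private final long oldGenBase;     // hypothetical start address of the old generation

    CardTableSketch(long oldGenBase, long oldGenBytes) {
        this.oldGenBase = oldGenBase;
        // Round up so that a partially covered last card still gets an entry.
        int cardCount = (int) ((oldGenBytes + (1L << CARD_SHIFT) - 1) >>> CARD_SHIFT);
        this.cards = new byte[cardCount];
        Arrays.fill(cards, CLEAN);
    }

    /** Write barrier: call after a reference field of the old object at this address changes. */
    void markDirty(long address) {
        cards[(int) ((address - oldGenBase) >>> CARD_SHIFT)] = DIRTY;
    }

    boolean isDirty(long address) {
        return cards[(int) ((address - oldGenBase) >>> CARD_SHIFT)] == DIRTY;
    }

    /** A young collection only needs to scan the old-generation areas behind dirty cards. */
    int countDirty() {
        int dirty = 0;
        for (byte card : cards) {
            if (card == DIRTY) {
                dirty++;
            }
        }
        return dirty;
    }

    public static void main(String[] args) {
        CardTableSketch table = new CardTableSketch(0L, 4096L); // 8 cards
        table.markDirty(0L);
        table.markDirty(511L); // same 512-byte card as address 0
        table.markDirty(512L); // next card
        System.out.println("dirty cards: " + table.countDirty()); // prints "dirty cards: 2"
    }
}
```

The trade-off is precision for speed: the barrier is a single shift and store, but the collector learns only that some object in a 512-byte area changed, not which one.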
TLABs (thread-local allocation buffers): each thread allocates objects in a buffer that belongs to it alone.

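Parallel collectors
In Java 8 and earlier, the default collectors are the parallel collectors, so both young collections (YGC) and full collections (FGC) are fully stop-the-world (STW). The parallel collectors are designed for throughput: once the application is stopped, the collector uses all available CPU to finish reclaiming memory as quickly as possible.
- ParallelGC: the simplest collector for the young generation
- ParNew: differs only slightly from ParallelGC; it exists mainly to be used together with CMS
- ParallelOld: the parallel collector for the old generation (including PermGen)
Young parallel collection: when an allocation in Eden fails, the JVM stops the application threads and runs a garbage collection.

Old parallel collection: unlike the young generation, the old generation provides the allocation guarantee (promotion space) for the young generation, and it is a single contiguous block of memory. It therefore has no spare area to evacuate objects into, which is why ParallelOld uses a mark-compact algorithm.
Copying algorithm vs. compacting algorithm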

JVM memory allocation example
Heap layout
| Heap Area | Size |
|---|---|
| Overall | 2G |
| Old Gen | 1.5G |
| Young Gen | 500M |
| Eden | 400M |
| S1 | 50M |
| S2 | 50M |
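GC data
| Metric | Value |
|---|---|
| Allocation Rate | 100M/s |
| YGC time | 2ms |
| FGC time | 100ms |
| Object lifetime | 200ms |
The allocation rate is 100 MB/s and Eden is 400 MB, so Eden fills up in 4 s; in other words, a YGC happens roughly every 4 s. With an object lifetime of 200 ms, only what was allocated in the last 0.2 s is still live at each collection, which is 100 MB/s × 0.2 s = 20 MB.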
| GC | Time | Data movement |
|---|---|---|
| GC0 | 4s | 20M Eden -> S1 (20M) |
| GC1 | 8.002s | 20M Eden -> S2 (20M) |
| GC2 | 12.004s | 20M Eden -> S1 (20M) |
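The numbers in the example above follow from three small formulas. The sketch below reproduces them under the same simplifying assumptions (a perfectly steady allocation rate and a fixed object lifetime); the class and method names are made up for illustration.

```java
final class YoungGcModel {
    /** Seconds needed to fill Eden at a steady allocation rate. */
    static double secondsToFillEden(double edenMb, double allocMbPerSec) {
        return edenMb / allocMbPerSec;
    }

    /** MB still live at a young GC: only objects younger than one lifetime survive. */
    static double survivingMb(double allocMbPerSec, double lifetimeMs) {
        return allocMbPerSec * lifetimeMs / 1000.0;
    }

    /** Time in seconds at which young GC number n (0-based) starts. */
    static double gcStartSeconds(int n, double fillSeconds, double ygcPauseMs) {
        // n + 1 Eden fills have completed, plus the pauses of the n earlier collections.
        return (n + 1) * fillSeconds + n * ygcPauseMs / 1000.0;
    }

    public static void main(String[] args) {
        double fill = secondsToFillEden(400, 100);
        double live = survivingMb(100, 200);
        for (int n = 0; n < 3; n++) {
            String target = (n % 2 == 0) ? "S1" : "S2"; // survivors alternate between S1 and S2
            System.out.println(String.format("GC%d at %.3fs: %.0fM Eden -> %s",
                    n, gcStartSeconds(n, fill, 2), live, target));
        }
    }
}
```

Since 20 MB fits comfortably in a 50 MB Survivor space, nothing is promoted prematurely in this idealized model.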
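import java.util.concurrent.Executor;
import java.util.concurrent.Executors;

// Models a steady allocation load: mostly short-lived objects plus a few long-lived ones.
public class ModelAllocator implements Runnable {
    private volatile boolean shutdown = false;

    private double chanceOfLongLived = 0.02;  // probability that an allocation is long-lived
    private int multiplierForLongLived = 20;  // long-lived objects live this many times longer
    private int x = 1024;                     // dimensions of each allocated int[x][y]
    private int y = 1024;
    private int mbPerSec = 50;                // allocations issued per second
    private int shortLivedMs = 100;
    private int nThreads = 8;
    private Executor exec = Executors.newFixedThreadPool(nThreads);

    public void run() {
        // Sleep between allocations so that mbPerSec allocations take about one second
        final int mainSleep = (int) (1000.0 / mbPerSec);

        while (!shutdown) {
            for (int i = 0; i < mbPerSec; i++) {
                ModelObjectAllocation to = new ModelObjectAllocation(x, y, lifetime());
                exec.execute(to);
                try {
                    Thread.sleep(mainSleep);
                } catch (InterruptedException ex) {
                    // Stop after the current batch of allocations
                    shutdown = true;
                }
            }
        }
    }

    // Returns the lifetime in ms: usually short, occasionally much longer
    public int lifetime() {
        if (Math.random() < chanceOfLongLived) {
            return multiplierForLongLived * shortLivedMs;
        }
        return shortLivedMs;
    }

    // Holds on to an array for lifeTime ms, then lets it become garbage
    static class ModelObjectAllocation implements Runnable {
        private final int[][] allocated;
        private final int lifeTime;

        public ModelObjectAllocation(final int x, final int y, final int liveFor) {
            allocated = new int[x][y];
            lifeTime = liveFor;
        }

        @Override
        public void run() {
            try {
                Thread.sleep(lifeTime);
                System.err.println(System.currentTimeMillis() + ": " + allocated.length);
            } catch (InterruptedException ex) {
                // Interrupted while holding the array: just let it go
            }
        }
    }
}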
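Chapter 7: Advanced Garbage Collection
Criteria for choosing a GC
- Pause time
- Throughput (as a percentage of GC time to application run time)
- Pause frequency
- Reclamation efficiency (how much memory one pause cycle can reclaim)
- Pause consistency (whether all pauses are roughly the same length)
Big data applications should care more about throughput than pause time. For some batch jobs even a 10 s pause hardly matters; what matters for the GC algorithm there is CPU efficiency and throughput.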
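Concurrent GC theory
Safepoint: a point at which a thread can be paused when the JVM needs to run GC
- The JVM cannot force a thread into the safepoint state
- The JVM can prevent a thread from leaving the safepoint state
How threads reach a safepoint
- The JVM sets a global "time to safepoint" flag
- Each application thread polls this flag
- The application threads pause and wait to be woken up
Safepoint cases
- A thread is automatically at a safepoint when it is blocked on a lock
- A thread is automatically at a safepoint when it is executing JNI code
- A thread is not necessarily at a safepoint when it has been interrupted by the OS
- A thread is not necessarily at a safepoint when it is partway through executing a bytecode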
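The flag-polling protocol described above can be imitated in ordinary Java. The following is only a cooperative simulation built on a monitor, under the assumption that every worker calls `poll()` regularly; the real JVM implements polling inside generated code, and all names here (`SafepointSketch`, `poll`, `runAtSafepoint`, `demo`) are hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

final class SafepointSketch {
    private final Object lock = new Object();
    private volatile boolean safepointRequested; // the global "time to safepoint" flag
    private int parkedThreads;                   // guarded by lock

    /** Safepoint poll: application threads call this between units of work. */
    void poll() throws InterruptedException {
        if (!safepointRequested) {
            return; // fast path: nothing requested
        }
        synchronized (lock) {
            parkedThreads++;
            lock.notifyAll(); // let the VM thread re-check the count
            while (safepointRequested) {
                lock.wait();  // paused until the VM thread releases the safepoint
            }
            parkedThreads--;
        }
    }

    /** VM thread: raise the flag, wait until all threads are parked, run, release. */
    void runAtSafepoint(int appThreads, Runnable vmOperation) throws InterruptedException {
        synchronized (lock) {
            safepointRequested = true;
            while (parkedThreads < appThreads) {
                lock.wait();
            }
            try {
                vmOperation.run();
            } finally {
                safepointRequested = false;
                lock.notifyAll();
            }
        }
    }

    /** Returns true if no worker made progress while the VM operation was running. */
    static boolean demo() {
        SafepointSketch sp = new SafepointSketch();
        AtomicLong progress = new AtomicLong();
        Runnable worker = () -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    progress.incrementAndGet(); // one unit of "application work"
                    sp.poll();
                }
            } catch (InterruptedException e) {
                // interrupted while parked: exit the worker
            }
        };
        Thread t1 = new Thread(worker);
        Thread t2 = new Thread(worker);
        t1.start();
        t2.start();
        boolean[] frozen = new boolean[1];
        try {
            sp.runAtSafepoint(2, () -> {
                long before = progress.get();
                long deadline = System.nanoTime() + 20_000_000L; // spin for about 20 ms
                while (System.nanoTime() < deadline) { /* stand-in for GC work */ }
                frozen[0] = (before == progress.get());
            });
            t1.interrupt();
            t2.interrupt();
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return frozen[0];
    }

    public static void main(String[] args) {
        System.out.println("workers frozen during VM operation: " + demo());
    }
}
```

The sketch also shows why the JVM cannot force a thread into a safepoint: a worker that never calls `poll()` would keep `runAtSafepoint` waiting indefinitely.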
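Tri-color marking
- GC roots are colored grey
- All other objects are colored white
- The marking thread takes a grey node and colors every white node it references grey
- Once a grey node has no white children left, it is colored black
- Marking repeats until no grey nodes remain
- All nodes that are still white are reclaimed

- A mutator (application) thread can invalidate the marking: an object that a marking thread has already colored black may later be updated to point at a white object.
- Invariant to preserve: during concurrent marking, no black object may hold a reference to a white object.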
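The marking steps above can be sketched as a short single-threaded program. It deliberately leaves out the hard part (mutator threads changing the graph while marking runs, which is what the invariant protects against), and the `Node` class and `mark` method are illustrative names rather than anything from HotSpot.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

final class TriColorSketch {
    enum Color { WHITE, GREY, BLACK }

    /** A toy heap node with a color and outgoing references (hypothetical model). */
    static final class Node {
        final List<Node> refs = new ArrayList<>();
        Color color = Color.WHITE;
    }

    /** Marks from the roots and returns the nodes left white, i.e. the garbage. */
    static List<Node> mark(List<Node> heap, List<Node> roots) {
        for (Node n : heap) {
            n.color = Color.WHITE;          // everything starts white
        }
        Deque<Node> grey = new ArrayDeque<>();
        for (Node r : roots) {
            r.color = Color.GREY;           // roots start grey
            grey.add(r);
        }
        while (!grey.isEmpty()) {
            Node n = grey.poll();
            for (Node child : n.refs) {
                if (child.color == Color.WHITE) {
                    child.color = Color.GREY;
                    grey.add(child);
                }
            }
            n.color = Color.BLACK;          // no white children remain
        }
        List<Node> garbage = new ArrayList<>();
        for (Node n : heap) {
            if (n.color == Color.WHITE) {
                garbage.add(n);
            }
        }
        return garbage;
    }

    public static void main(String[] args) {
        Node a = new Node(), b = new Node(), c = new Node();
        a.refs.add(b);                      // c is unreachable
        List<Node> garbage = mark(Arrays.asList(a, b, c), Arrays.asList(a));
        System.out.println("garbage nodes: " + garbage.size()); // prints "garbage nodes: 1"
    }
}
```

In this single-threaded form the invariant holds trivially, because a node only turns black after all of its children have been greyed.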
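CMS
Phases
- Initial Mark (STW)
- Concurrent Mark
- Concurrent Preclean
- Remark (STW)
- Concurrent Sweep
- Concurrent Reset
Concurrent mode failure (CMF)

- If the old generation is too full and too many objects are being promoted from the young generation,
- the JVM falls back to ParallelOld, which is a fully STW collection.

- By default, CMS starts a collection when the old generation is 75% full
- CMS does not compact the old generation when it collects, so the free space becomes fragmented
- If the old generation has no contiguous free space large enough, the JVM also falls back to ParallelOld
- CMS is enabled with -XX:+UseConcMarkSweepGC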
Chapter 8: GC Logging, Monitoring, Tuning, and Tools
Introduction to GC logging
-Xloggc:gc.log -XX:+PrintGCDetails
-XX:+PrintTenuringDistribution
-XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps
| Effect | Flag |
|---|---|
| Controls which file to log GC events to | -Xloggc:gc.log |
| Logs GC event details | -XX:+PrintGCDetails |
| Prints the wallclock time that GC events occurred at | -XX:+PrintGCDateStamps |
| Prints the time (in secs since VM start) that GC events occurred at | -XX:+PrintGCTimeStamps |
| Adds extra GC event detail that is vital for tooling | -XX:+PrintTenuringDistribution |
| Switches on log file rotation | -XX:+UseGCLogFileRotation |
| Sets the maximum number of log files to keep | -XX:NumberOfGCLogFiles=<n> |
| Sets the maximum size of each file before rotation | -XX:GCLogFileSize=<size> |
Log analysis tools
Basic tuning
Table 8-3. GC heap sizing flags
| Effect | Flag |
|---|---|
| Set the minimum size reserved for the heap | -Xms<size> |
| Set the maximum size reserved for the heap | -Xmx<size> |
| Set the maximum size permitted for PermGen (Java 7) | -XX:MaxPermSize=<size> |
| Set the maximum size permitted for Metaspace (Java 8) | -XX:MaxMetaspaceSize=<size> |
| Object size threshold for pretenuring (larger objects are allocated directly in the old generation) | -XX:PretenureSizeThreshold=N |
| Minimum TLAB size | -XX:MinTLABSize=N |
GC test code
import java.util.Random;
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

// JMH benchmark: how long does it take to scan a card table sized for a 20 GB heap?
@State(Scope.Benchmark)
@BenchmarkMode(Mode.Throughput)
@Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@OutputTimeUnit(TimeUnit.SECONDS)
@Fork(1)
public class SimulateCardTable {
    private static final int SIZE_FOR_20_GIG_HEAP = 15 * 2 * 1024 * 1024;
    private static final byte[] cards = new byte[SIZE_FOR_20_GIG_HEAP];

    @Setup
    public static final void setup() {
        // Mark up to 100,000 randomly chosen cards
        final Random r = new Random(System.nanoTime());
        for (int i = 0; i < 100_000; i++) {
            cards[r.nextInt(SIZE_FOR_20_GIG_HEAP)] = 1;
        }
    }

    @Benchmark
    public int scanCardTable() {
        // Linear scan over the whole table, counting the marked cards
        int found = 0;
        for (int i = 0; i < SIZE_FOR_20_GIG_HEAP; i++) {
            if (cards[i] > 0)
                found++;
        }
        return found;
    }
}
Tuning the parallel collectors
| Effect | Flag |
|---|---|
| (Old flag) Set ratio of YoungGen to Heap | -XX:NewRatio=N |
| (Old flag) Set ratio of Survivor spaces to YoungGen | -XX:SurvivorRatio=N |
| (Old flag) Set min size of YoungGen | -XX:NewSize=N |
| (Old flag) Set max size of YoungGen | -XX:MaxNewSize=N |
| (Old flag) Set min % of heap free after GC to avoid expanding | -XX:MinHeapFreeRatio |
| (Old flag) Set max % of heap free after GC to avoid shrinking | -XX:MaxHeapFreeRatio |
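Flags set:
-XX:NewRatio=N
-XX:SurvivorRatio=K
YoungGen = 1 / (N+1) of heap
OldGen = N / (N+1) of heap
Eden = K / (K+2) of YoungGen
Survivor1 = 1 / (K+2) of YoungGen
Survivor2 = 1 / (K+2) of YoungGen
(NewRatio is the ratio OldGen : YoungGen, and SurvivorRatio is the ratio Eden : one Survivor space.)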
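The sizing rules for -XX:NewRatio and -XX:SurvivorRatio can be checked with a little arithmetic. The sketch below assumes HotSpot's documented meaning of the two flags (NewRatio = OldGen / YoungGen, SurvivorRatio = Eden / one Survivor space) and ignores the alignment and adaptive sizing a real JVM applies; the class and method names are made up for illustration.

```java
final class GenerationSizing {
    /** YoungGen = heap / (NewRatio + 1). */
    static long youngGen(long heap, int newRatio) {
        return heap / (newRatio + 1);
    }

    /** OldGen is whatever the young generation leaves over. */
    static long oldGen(long heap, int newRatio) {
        return heap - youngGen(heap, newRatio);
    }

    /** Each Survivor space = YoungGen / (SurvivorRatio + 2). */
    static long survivor(long young, int survivorRatio) {
        return young / (survivorRatio + 2);
    }

    /** Eden = YoungGen minus the two Survivor spaces. */
    static long eden(long young, int survivorRatio) {
        return young - 2 * survivor(young, survivorRatio);
    }

    public static void main(String[] args) {
        long heapMb = 2000; // roughly the 2G heap of the earlier allocation example
        long young = youngGen(heapMb, 3);
        System.out.println("YoungGen=" + young + "M OldGen=" + oldGen(heapMb, 3) + "M");
        System.out.println("Eden=" + eden(young, 8) + "M Survivor=" + survivor(young, 8) + "M each");
    }
}
```

With NewRatio=3 and SurvivorRatio=8 this yields 500M young, 1500M old, 400M Eden, and two 50M Survivor spaces, which matches the heap layout used in the Chapter 6 example.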
Chapter 9: Code Execution on the JVM
......
Chapter 10: Understanding JIT Compilation
JITWatch
https://github.com/AdoptOpenJDK/jitwatch/
-XX:+UnlockDiagnosticVMOptions
-XX:+TraceClassLoading
-XX:+LogCompilation
hsdis
-XX:+PrintAssembly
Inlining
| Switch | Default (JDK 8, Linux x86_64) | Explanation |
|---|---|---|
| -XX:MaxInlineSize=n | 35 bytes of bytecode | Inline methods up to this size |
| -XX:FreqInlineSize=n | 325 bytes of bytecode | Inline "hot" (frequently called) methods up to this size |
| -XX:InlineSmallCode=n | 1000 bytes of native code (non-Tiered); 2000 bytes of native code (Tiered) | Do not inline methods where there is already a final-tier compilation that occupies more than this amount of space in the code cache |
| -XX:MaxInlineLevel | 9 | Maximum number of call frames to inline |