Primer
Learning with questions in mind gives the study a sense of purpose. I first listed some of my doubts about CMS. I hope this learning note resolves them and helps you as well.
1. What is the original purpose and background of CMS?
2. What scenarios is CMS suitable for?
3. What are the trade-offs of CMS (strengths, weaknesses, and costs)?
4. Which regions does CMS collect?
5. What objects do the CMS GC Roots include?
6. What is the CMS process?
7. Are CMS and Full GC the same thing?
8. When is CMS triggered?
9. How to analyze CMS logs?
10. How to tune CMS?
11. Which objects does CMS scan?
12. What is the difference between CMS and the CMS collector?
13. What are the recommended parameters for CMS?
14. Why can ParNew be used with CMS, but not Parallel Scavenge?
1、 Basic knowledge
CMS collector: the Mostly Concurrent collector, also known as the Concurrent Mark Sweep GC (CMS collector), manages the young generation in the same way as the Parallel collector and the Serial collector, while in the old generation it runs concurrently with the application as much as possible, with only two short pauses per garbage collection cycle. My previous understanding was that CMS was purely an old generation collector. Today I checked the Java Performance Optimization Authority Guide and the Java Performance Authority Guide, and confirmed that this understanding was wrong. The original purpose of CMS is to eliminate the long pauses of the Throughput collector and the Serial collector during Full GC cycles. Applicable scenarios of CMS: if your application needs faster response times, cannot tolerate long pauses, and has plenty of CPU resources, the CMS collector is a good fit.
2、 Process of CMS
Normal process of CMS
Here we will first take a look at the normal completion of the CMS concurrent collection cycle.
1. (STW) Initial marking: this stage marks the old generation objects directly reachable from GC Roots and the old generation objects referenced from the young generation, shown as the gray points in the following figure. This process is single-threaded in JDK 7 and earlier and parallel from JDK 8 onward; it can be adjusted with the CMSParallelInitialMarkEnabled parameter.
2. Concurrent marking: starting from the objects marked in the previous phase, trace and mark all reachable objects. In this phase the garbage collection threads and the application threads run simultaneously, as shown by the gray points in the figure above. Because the application threads keep running during concurrent marking, some objects are promoted from the young generation to the old generation, some references held by old generation objects change, and some objects are allocated directly in the old generation. The cards of the affected old generation objects are marked dirty so that they can be rescanned in the remark phase. The possible reasons for a card being marked dirty in this stage are the green lines in the following figure:
3. Pre-cleaning: pre-cleaning also marks surviving objects of the old generation, in order to make the STW pause of the remark phase as short as possible. The goal of this phase is to mark the old generation objects affected by application threads during concurrent marking, including: (1) old generation objects whose card is dirty; (2) old generation objects referenced from the survivor spaces (from and to). Therefore this stage also needs to scan both the young generation and the old generation. [PS: Are the objects in Eden scanned as well? From the source code I guess not, but this still needs to be verified.]
4. Abortable (interruptible) pre-cleaning: the goal of this phase is the same as that of the pre-cleaning phase, namely to reduce the workload of the remark phase. Its value is that it tries to wait for a minor GC before entering the remark phase, which shortens the remark pause. In addition, it tries to schedule the remark for the moment when Eden is about 50% full, roughly halfway to the next minor GC. This has another benefit: it avoids two consecutive pauses (a minor GC and a remark) within a short time, as shown in the following figure:
Whether abortable pre-cleaning actually runs, and when it ends, is controlled by the following parameters (see the figure below for the source code). After the pre-cleaning step, if Eden's used space is not larger than CMSScheduleRemarkEdenSizeThreshold (default 2M), abortable pre-cleaning is skipped and the remark phase starts directly. Otherwise abortable pre-cleaning starts, and it may loop many times. As I read the HotSpot source, it exits when any of the following holds: Eden occupancy reaches CMSScheduleRemarkEdenPenetration (default 50%); CMSMaxAbortablePrecleanLoops is set and the number of iterations reaches it (default 0, meaning no limit); the phase has run longer than CMSMaxAbortablePrecleanTime (default 5000 milliseconds).
If the phase exits because of the time limit, the GC log contains the message "CMS: abort preclean due to time".
Abortable pre-cleaning can therefore end without having waited for a minor GC. If the remark phase starts at that point, many live objects may still be in the young generation, which makes the STW pause longer. For this reason CMS also provides the CMSScavengeBeforeRemark parameter, which forces one minor GC before entering the remark phase.
5. (STW) Remark: rescan the objects in the heap, perform reachability analysis, and mark live objects. The scan targets in this stage are: young generation objects + GC Roots + the old generation objects on cards previously marked dirty. If pre-cleaning was not done well, this step spends a lot of time scanning the young generation, which results in a long pause. This process is multi-threaded.
6. Concurrent sweep: application threads resume, and objects that were not marked live are reclaimed;
7. Concurrent reset: CMS resets its internal state and prepares for the next concurrent collection cycle.
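The dirty-card bookkeeping mentioned in steps 2, 3 and 5 can be pictured with a small sketch. This is only an illustration, not HotSpot code: the class name `CardTableSketch`, its methods, the byte values used for clean and dirty, and the use of plain `long` offsets as addresses are all assumptions made for the example. The 512-byte card size is, as far as I know, HotSpot's default, but treat it as an assumption too.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy card table: one byte per 512-byte slice of a hypothetical old generation. */
public class CardTableSketch {
    private static final int CARD_SHIFT = 9;   // 2^9 = 512 bytes per card (assumed)
    private static final byte CLEAN = 0;       // illustrative values, not HotSpot's
    private static final byte DIRTY = 1;

    private final byte[] cards;

    public CardTableSketch(long oldGenBytes) {
        // Round up so that a partially filled last card is still covered.
        long cardCount = (oldGenBytes + (1L << CARD_SHIFT) - 1) >>> CARD_SHIFT;
        this.cards = new byte[(int) cardCount];
    }

    /** What a write barrier would do when a reference field at this offset changes. */
    public void markDirty(long offsetInOldGen) {
        cards[(int) (offsetInOldGen >>> CARD_SHIFT)] = DIRTY;
    }

    /** What pre-cleaning or remark would do: collect the dirty cards and reset them. */
    public List<Integer> drainDirtyCards() {
        List<Integer> dirty = new ArrayList<>();
        for (int i = 0; i < cards.length; i++) {
            if (cards[i] == DIRTY) {
                dirty.add(i);
                cards[i] = CLEAN;
            }
        }
        return dirty;
    }

    public static void main(String[] args) {
        CardTableSketch table = new CardTableSketch(4096); // 8 cards
        table.markDirty(0);
        table.markDirty(511);   // same card as offset 0
        table.markDirty(1024);  // a different card
        System.out.println("dirty cards to rescan: " + table.drainDirtyCards());
        System.out.println("after draining: " + table.drainDirtyCards());
    }
}
```

The point of the sketch is that the collector only has to revisit the cards touched while it was running concurrently, instead of rescanning the whole old generation during the remark pause.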
CMS abnormalities
The above describes the normal completion of the CMS concurrency cycle, but there are several cases of CMS concurrency cycle failure:
Concurrent mode failure: the goal of CMS is to avoid stopping all application threads while collecting old generation objects. Application threads keep running during the concurrent cycle. If an application thread asks the old generation for more space than is reserved (guarantee failure), a concurrent mode failure is triggered, and the concurrent cycle of CMS is replaced by a Full GC that stops all application threads, collects garbage, and compacts the space. If we set the UseCMSInitiatingOccupancyOnly and CMSInitiatingOccupancyFraction parameters, with CMSInitiatingOccupancyFraction set to 70, the reserved space is 30% of the old generation.
Promotion failed: when the young generation performs a minor GC, the CMS guarantee mechanism checks whether the old generation has enough space for the objects to be promoted. If the guarantee mechanism finds that there is not enough space, a concurrent mode failure is reported. If it judges the space sufficient, but the allocation then fails because of fragmentation, a promotion failed is reported.
Exhaustion of the permanent generation (or Metaspace in Java 8): by default, CMS does not collect the permanent generation. Once the permanent generation is exhausted, a Full GC is triggered.
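The difference between the first two failure cases, and the "70 means 30% reserved" arithmetic, can be sketched as follows. This is a deliberately simplified model, not the real HotSpot guarantee check (which works from promotion statistics): the class `CmsFailureSketch`, the `Outcome` enum, and all parameters are hypothetical names introduced for illustration.

```java
/** Simplified model of how a promotion attempt into the old generation can fail. */
public class CmsFailureSketch {
    public enum Outcome { OK, CONCURRENT_MODE_FAILURE, PROMOTION_FAILED }

    /** Space left for promotions if a cycle starts at the given occupancy percentage. */
    public static long reservedBytes(long oldGenCapacity, int initiatingOccupancyFraction) {
        return oldGenCapacity * (100 - initiatingOccupancyFraction) / 100;
    }

    public static Outcome classify(long totalPromotionBytes, long largestObjectBytes,
                                   long freeBytes, long largestFreeChunkBytes) {
        // Guarantee check: not even the total free space is enough.
        if (totalPromotionBytes > freeBytes) {
            return Outcome.CONCURRENT_MODE_FAILURE;
        }
        // Enough space in total, but fragmentation leaves no chunk large enough.
        if (largestObjectBytes > largestFreeChunkBytes) {
            return Outcome.PROMOTION_FAILED;
        }
        return Outcome.OK;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        System.out.println("reserved MB: " + reservedBytes(1000 * mb, 70) / mb);
        System.out.println(classify(400 * mb, 10 * mb, 300 * mb, 300 * mb));
        System.out.println(classify(100 * mb, 50 * mb, 300 * mb, 20 * mb));
        System.out.println(classify(100 * mb, 10 * mb, 300 * mb, 20 * mb));
    }
}
```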
3、 CMS tuning
When tuning for excessive pause times, first determine which pause is responsible, and then tune for that specific cause. A JVM using the CMS collector may pause in the following cases: (1) minor GC pauses; (2) the initial mark pause in the concurrent cycle; (3) the remark pause in the concurrent cycle; (4) pauses while Serial Old collects the old generation; (5) Full GC pauses. Concurrent mode failure leads to case (4), while promotion failure and permanent generation exhaustion lead to case (5).
Tuning for concurrent mode failure:
Increase the space of the old generation: increase the size of the entire heap, or reduce the size of the young generation.
Run the background collection thread more frequently, that is, increase the frequency of CMS concurrent cycles: set the UseCMSInitiatingOccupancyOnly and CMSInitiatingOccupancyFraction parameters and lower the value of CMSInitiatingOccupancyFraction, but not too far. A value that is too low causes too many useless concurrent cycles, which consume CPU time and add unnecessary pauses. Generally speaking, this takes several iterations, but there is a routine to follow. See the recommendation given in the Java Performance Authority Guide. The excerpt is as follows:
> For a specific application, a better value for this flag can be derived from the GC log, based on when the CMS cycle was started before it failed for the first time. Concretely, find the concurrent mode failure in the garbage collection log, then search backward for the most recent CMS cycle start record. Calculate the old generation occupancy from that record, and set a value smaller than it.
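The procedure in the excerpt above can be sketched in a few lines of Java. The sketch assumes the classic `CMS-initial-mark: usedK(capacityK)` record printed with PrintGCDetails; the sample log line uses invented numbers, and the class name, method names, and the safety margin of 10 percentage points are my own assumptions, not a recommendation from the book.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Derives a candidate CMSInitiatingOccupancyFraction from an initial-mark log record. */
public class InitiatingOccupancyFromLog {
    // Captures the old generation figures: used K, then capacity K in parentheses.
    private static final Pattern INITIAL_MARK =
            Pattern.compile("CMS-initial-mark: (\\d+)K\\((\\d+)K\\)");

    /** Old generation occupancy in percent, or -1 if the line is not an initial-mark record. */
    public static int occupancyPercent(String logLine) {
        Matcher m = INITIAL_MARK.matcher(logLine);
        if (!m.find()) {
            return -1;
        }
        long used = Long.parseLong(m.group(1));
        long capacity = Long.parseLong(m.group(2));
        return (int) (used * 100 / capacity);
    }

    /** Picks a value below the occupancy at which the failing cycle was started. */
    public static int suggestFraction(int occupancyAtCycleStart, int marginPoints) {
        return Math.max(0, occupancyAtCycleStart - marginPoints);
    }

    public static void main(String[] args) {
        // Invented sample: the last cycle start found before a concurrent mode failure.
        String line = "[GC [1 CMS-initial-mark: 840000K(1000000K)] 900000K(1500000K), 0.0100000 secs]";
        int occupancy = occupancyPercent(line);
        System.out.println("occupancy at cycle start: " + occupancy + "%");
        System.out.println("candidate fraction: " + suggestFraction(occupancy, 10));
    }
}
```

The candidate value is only a starting point for the next iteration; the log has to be checked again after the change.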
Increase the number of collection threads: the default number of background garbage collection threads in CMS is (number of CPUs + 3) / 4, using integer division. This formula means that when the number of CPUs is greater than 4, the background garbage collection threads occupy at least 25% of the CPU resources. For example, with 1 to 4 CPU cores, one core is used for garbage collection; with 5 to 8 cores, two cores are used for garbage collection for long periods.
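The formula is easy to check with a small program. This is a minimal sketch of the arithmetic only: the class and method names are made up, and it takes the CPU count directly as input. As far as I know, HotSpot actually derives the value from ParallelGCThreads, which equals the CPU count only on machines with up to 8 CPUs, so the loop stops at 8.

```java
/** Evaluates the default background thread formula quoted above for 1 to 8 CPUs. */
public class CmsThreadEstimate {
    /** (cpus + 3) / 4 with integer division, as quoted in the text. */
    public static int backgroundThreads(int cpus) {
        return (cpus + 3) / 4;
    }

    public static void main(String[] args) {
        for (int cpus = 1; cpus <= 8; cpus++) {
            int threads = backgroundThreads(cpus);
            long sharePercent = Math.round(100.0 * threads / cpus);
            System.out.println(cpus + " CPUs -> " + threads
                    + " background thread(s), about " + sharePercent + "% of the CPUs");
        }
    }
}
```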
Tuning for the permanent generation: if the permanent generation needs to be collected (or Metaspace needs to be expanded), a Full GC is triggered. By default, CMS does not process garbage in the permanent generation. You can enable this with the CMSPermGenSweepingEnabled flag; once it is enabled, a group of background threads collects garbage in the permanent generation. Note that the threshold that triggers collection of the permanent generation is independent of the threshold that triggers collection of the old generation. The permanent generation threshold can be set through the CMSInitiatingPermOccupancyFraction parameter, whose default value is 80%. Enabling collection of the permanent generation is only one step: another parameter, CMSClassUnloadingEnabled, must also be enabled so that unused classes are unloaded during garbage collection.
4、 What is the trade off of CMS?
Advantages
Low-latency collector: there are almost no long pauses; the application pauses only very briefly during minor GC and while the background threads scan the old generation.
Disadvantages
(1) Higher CPU usage: there must be enough CPU resources to run the background garbage collection threads, which scan the heap while the application threads are running. [PS: CPU resources on today's servers are basically not a problem, so this can be ignored.]
(2) Fragmentation: when the CMS collector collects the old generation, it does not compact it, which means the old generation becomes fragmented as the application runs. Too much fragmentation affects the allocation of large objects: although plenty of space is left in the old generation, there is no contiguous space for a large object, and a Full GC is triggered. CMS provides two parameters for this problem: UseCMSCompactAtFullCollection, which compacts memory when a Full GC is required, and CMSFullGCsBeforeCompaction, which runs a compacting Full GC after the given number of non-compacting Full GCs.
(3) Floating garbage: application threads are still running during the concurrent sweep phase, and space must be reserved for them to use. Therefore CMS requires more heap space than other collectors.
5、 Answers to several questions
Why can ParNew be used with CMS, but not Parallel Scavenge?
Answer: This is related to the history of the HotSpot VM. Parallel Scavenge was not developed under the "generational framework", while ParNew and CMS were.
Do minor GC and major GC occur sequentially in CMS?
Answer: No, they can interleave. That is, a minor GC can occur while the concurrent cycle is running. This can be observed in any GC log.
When is the concurrent collection cycle of CMS triggered?
As can be seen from the figure below, there are two conditions for triggering the CMS concurrent cycle:
Threshold check mechanism: the used space of the old generation reaches a certain threshold. The JVM default is 92% (68% up to JDK 1.5, 92% from JDK 1.6), and it can also be set through two parameters: CMSInitiatingOccupancyFraction and UseCMSInitiatingOccupancyOnly. The right value depends on the application scenario. If it is set too low, CMS cycles occur too frequently. If it is set too high, too many concurrent mode failures occur.
Dynamic check mechanism: the JVM estimates when the old generation will next be exhausted, based on recent collection history. When that time approaches, a concurrent cycle is started. This feature can be turned off by setting the parameter UseCMSInitiatingOccupancyOnly.
What objects are scanned in the concurrent collection cycle of CMS? What objects are collected?
Answer: The CMS concurrent cycle only collects old generation objects. However, when marking the surviving objects of the old generation, some of them may be referenced by young generation objects, so the whole heap has to be scanned.
What objects do the CMS GC Roots include?
Answer: First, how to understand the concept of GC Roots in JVM garbage collection (see R's explanation of the concept of GC roots, Reference 5). Second, in the concurrent collection cycle of CMS, how do we determine whether old generation objects are alive? As mentioned earlier, in the CMS concurrent cycle, scanning only the objects directly reachable from GC Roots would miss some objects, so young generation objects need to be scanned too. As shown in the blue font in the following figure, the young generation and the old generation in CMS are collected separately, so when judging the liveness of young generation objects, the old generation must be treated as part of their GC Roots. It is not necessary to scan every old generation object for this; the card table data structure is used instead. If an old generation object references a young generation object, the value in the corresponding card is set to a special value. Conversely, when judging the liveness of old generation objects, the young generation must also be treated as part of their GC Roots. This process has been discussed in the third section.
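Where the default of 92% in the threshold check comes from can be shown with a short calculation. As I understand the HotSpot source and the Stack Overflow discussion in Reference 10, when CMSInitiatingOccupancyFraction is not set the threshold is derived from MinHeapFreeRatio (default 40) and CMSTriggerRatio (default 80). The class and method names below are hypothetical, and the sketch covers only the threshold check, not the dynamic check.

```java
/** Reproduces the default CMS initiating occupancy from two other JVM defaults. */
public class DefaultInitiatingOccupancy {
    /** (100 - MinHeapFreeRatio) + CMSTriggerRatio * MinHeapFreeRatio / 100, in percent. */
    public static double defaultThresholdPercent(int minHeapFreeRatio, int cmsTriggerRatio) {
        return (100 - minHeapFreeRatio) + (double) cmsTriggerRatio * minHeapFreeRatio / 100.0;
    }

    /** Threshold check: has the old generation occupancy reached the threshold? */
    public static boolean thresholdReached(long usedBytes, long capacityBytes, double thresholdPercent) {
        return usedBytes * 100.0 / capacityBytes >= thresholdPercent;
    }

    public static void main(String[] args) {
        double threshold = defaultThresholdPercent(40, 80);
        System.out.println("default threshold: " + threshold + "%");
        System.out.println("cycle starts at 950 of 1000: " + thresholdReached(950, 1000, threshold));
        System.out.println("cycle starts at 700 of 1000: " + thresholdReached(700, 1000, threshold));
    }
}
```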
If my application decides to use the CMS collector, what are the recommended JVM parameters?
My own application uses the following parameters, which were generated by PerfMa's xxfox. You can also use this product to tune your JVM parameters:
-Xmx4096M -Xms4096M -Xmn1536M
-XX:MaxMetaspaceSize=512M -XX:MetaspaceSize=512M
-XX:+UseConcMarkSweepGC
-XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSInitiatingOccupancyFraction=70
-XX:+ExplicitGCInvokesConcurrentAndUnloadsClasses
-XX:+CMSClassUnloadingEnabled
-XX:+ParallelRefProcEnabled
-XX:+CMSScavengeBeforeRemark
-XX:ErrorFile=/home/admin/logs/xelephant/hs_err_pid%p.log
-Xloggc:/home/admin/logs/xelephant/gc.log
-XX:HeapDumpPath=/home/admin/logs/xelephant
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+HeapDumpOnOutOfMemoryError
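After starting the application with these flags, it can be worth confirming that CMS is really the active collector. A minimal sketch using the standard `java.lang.management` API follows. The collector names "ParNew" and "ConcurrentMarkSweep" are what HotSpot reports when CMS is in use, to the best of my knowledge; the class and method names are my own. Note that this only applies to JDKs that still ship CMS (it was removed in JDK 14).

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Lists the collectors of the running JVM and checks whether they look like ParNew + CMS. */
public class ActiveCollectors {
    public static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(bean.getName());
        }
        return names;
    }

    /** True if both the young (ParNew) and old (ConcurrentMarkSweep) collectors are present. */
    public static boolean looksLikeCms(List<String> names) {
        return names.contains("ParNew") && names.contains("ConcurrentMarkSweep");
    }

    public static void main(String[] args) {
        List<String> names = collectorNames();
        System.out.println("collectors: " + names);
        System.out.println("CMS in use: " + looksLikeCms(names));
    }
}
```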
Summary of CMS related parameters (note that I have not taken many JDK versions into account here. Default values may differ between JDK 1.7 and JDK 1.8, so the specific settings need to be checked against the version you actually use)

| No. | Parameter name | Explanation |
| --- | --- | --- |
|1 | UseConcMarkSweepGC | Enable CMS collector|
|2 | UseCMSInitiatingOccupancyOnly | Turn off the dynamic check mechanism of CMS, and decide whether to start the concurrent collection cycle only by the preset threshold|
|3 | CMSInitiatingOccupancyFraction | The percentage of old generation space in use at which the concurrent collection cycle starts; use it together with UseCMSInitiatingOccupancyOnly|
|4 | ExplicitGCInvokesConcurrentAndUnloadsClasses | Convert the Full GC triggered by System.gc() into a CMS concurrent collection, and unload unneeded classes in the Perm (Metaspace) area in this collection cycle|
|5 | CMSClassUnloadingEnabled | Whether to unload classes in the CMS collection cycle|
|6 | ParallelRefProcEnabled | Enable parallel reference processing|
|7 | CMSScavengeBeforeRemark | If this parameter is enabled, the minor gc will be forced to trigger once before entering the remarking phase|
Reference material
1. Talk about GC optimization of Java applications from actual cases
//tech.meituan.com/jvm_optimize.html
2. Understand CMS garbage collection logs
//ifeve.com/jvm-cms-log/
3. Diagram CMS garbage collection mechanism, you deserve to have
//www.jianshu.com/p/2a1b2f17d3e4
4. Why does CMS scan the Cenozoic even though it is the old gc?
//www.zhihu.com/question/279580656/answer/408089811
5. R explains the concept of GC roots
//www.zhihu.com/question/53613423/answer/135743258
6. Introduce to CMS Collector
//medium.com/@robiplus/introduce-to-cms-collector-47b4400665c3
7. Understanding Java Virtual Machine
8. Java Performance Authority Guide
9. Oracle's GC Tuning Manual
//docs.oracle.com/javase/8/docs/technotes/guides/vm/gctuning/cms.html
10. What is the threshold for CMS old gc to be triggered
//stackoverflow.com/questions/33557644/what-is-the-threshold-for-cms-old-gc-to-be-triggered
11. Frequently Asked Questions about Garbage Collection in the Hotspot Java Virtual Machine
//www.oracle.com/technetwork/java/faq-140837.html
12. Java SE HotSpot at a Glance
//www.oracle.com/technetwork/java/javase/tech/index-jsp-136373.html
13. Xxfox: PerfMa's parameter tuning artifact
//xxfox.perfma.com/
14. Explain the CMS garbage collection mechanism in detail
//www.cnblogs.com/littleLord/p/5380624.html
15. ParNew and PSYoungGen and DefNew are the same thing?
//hllvm.group.iteye.com/group/topic/37095
16. Java SE Memory Management White Paper
//www.oracle.com/technetwork/java/javase/memorymanagement-whitepaper-150215.pdf
17. Garbage Collection in Elasticsearch and the G1GC
//medium.com/naukri-engineering/garbage-collection-in-elasticsearch-and-the-g1gc-16b79a447181
18. A Heap of Trouble
//www.elastic.co/cn/blog/a-heap-of-trouble
19. Bi Xuan's article: Why not suggest
//hellojava.info/
20. Full Interpretation of SystemGC for JVM Source Code Analysis
//lovestblog.cn/blog/2015/05/07/system-gc/
Reader discussion
The following figure is misleading about the collection range of the CMS collector. According to the official documentation, the CMS collector covers both the young generation and the old generation, but its collection strategy for the young generation is the same as ParNew; see page 11 of Reference 16.
What is the difference between the Full GC triggered by concurrent mode failure and the one triggered by promotion failed? (This question came out of my discussion with Ah Fei and Jiang Xiaofeng.)
Answer: The "Full GC" triggered by concurrent mode failure is not what we usually call Full GC. A normal Full GC is the whole GC process, including ygc and cms gc. In other words, the question itself is flawed. Here we need to cover one point that was missed earlier: the concurrent cycle of CMS gc has two modes, foreground and background. Concurrent mode failure triggers the foreground mode, which suspends the entire application, skips some of the concurrent phases, and performs an old generation collection that behaves like Serial Old. Whether compaction is performed in this process depends on three conditions: (1) UseCMSCompactAtFullCollection and CMSFullGCsBeforeCompaction are set; the former is set to true, and the default value of the latter is 0. The former means that compaction is performed at Full GC, and the latter means after how many Full GCs compaction is performed. With the default value of 0, compaction is performed at every Full GC; (2) the user called System.gc(), and DisableExplicitGC is not enabled; (3) the young generation reports that the next incremental collection will fail.
Promotion failed triggers what we usually call Full GC, which collects and compacts both the young and old generations.
What are the different trigger reasons for promotion failed and concurrent mode failure?
Promotion failed means that the guarantee mechanism checks whether the old generation has enough space for the new objects and says it does, but the actual allocation fails because no contiguous space can be found due to fragmentation. Concurrent mode failure means that application threads request more space than is reserved before the concurrent cycle completes, that is, the background collection threads cannot keep up with the allocation rate of the application threads.
When should I choose to use the CMS collector?
My previous idea was that CMS should be used for heaps smaller than 8G, and G1 for heaps larger than 8G. Jiang Xiaofeng discussed this with me and put forward some other ideas, which I think are also reasonable and record here. Besides throughput and latency, the choice also depends on the specific application. For example, ES (Lucene) is not compatible with G1, so its default collector is CMS. See References 17 and 18 for details. For heaps smaller than 3G, CMS is not recommended unless there is a particularly strict latency requirement. This is mainly due to several shortcomings of CMS: (1) the trigger ratio of the concurrent cycle is not easy to set; (2) it competes with the application for CPU time; (3) the guarantee check slows down YGC; (4) the fragmentation problem. For a more detailed discussion, see Reference 19.
Author: Du Qi